Replication Technologies for High Availability and Disaster Recovery – A Technical Overview

作者: Maclean Liu , post on December 24th, 2010 , English Version
【本站文章除注明转载外,均为本站原创编译】
转载请注明:文章转载自: Oracle Clinic – Maclean Liu的个人技术博客 [http://www.oracledatabase12g.com/]
本文标题: Replication Technologies for High Availability and Disaster Recovery – A Technical Overview
本文永久地址: http://www.oracledatabase12g.com/archives/replication-technologies-for-high-availability-and-disaster-recovery-a-technical-overview.html

What did they do?

Data Guard was already implemented
Physical Standby, Maximum Availability
Data Guard prevented corruption from impacting the standby database

Failed over to the standby database
New production database up in minutes, no loss of data

Resolved corruptions without impacting availability
Problem traced to faulty storage array component
Took a few days to investigate and fix the problem

Utilize primary and standby systems, storage, and software – for productive purposes regardless of role

Complexity is the sum total of:
Efficiency – standby resources (cpu, memory, I/o) required to maintain synchronization
Simplicity – amount of management attention potentially required and degree to which the replication solution is application transparent in terms of performance and implementation.
Cost – a function of resource consumption required to maintain synchronization, management overhead, and acquisition/implementation costs

Active Meter – the degree to which the standby systems, storage, and software can be utilized while they are providing HA/DR

DR Meter – the sum total of Data Protection, High Performance, and the degree to which use of the technology does not entail additional considerations impacting data protection and availability

There have been a number of replication technologies marketed over the years by third parties and database vendors aimed at addressing the shortcomings of storage remote-mirroring.
10gR2 to 11g
85% gain for LOBs (CPU bound)
5-30% gain for OLTP and insert bound (CPU and IO bound)

9i -> 10g = 25% 10g -> 10gR2 = 19% Both just OLTP – No other workloads back then ;-)

Comprehensive data protection and availability
Hardware and OS agnostic – enables utilization of low cost servers and storage
Failover in seconds to a synchronized standby database
Full utilization of standby servers and storage

Prior to Oracle Database 11g, RMAN-detected block corruptions were recorded in V$DATABASE_BLOCK_CORRUPTION. In Oracle Database 11g, several database components and utilities, including RMAN, can now detect a corrupt block and record it in that view. Oracle Database automatically updates this view when block corruptions are detected or repaired (for example, using block media recovery or data file recovery). The benefit is that the time it takes to discover block corruptions is shortened.
In addition, you can use the DB_ULTRA_SAFE initialization parameter to automatically configure the appropriate data protection block checking level in the database. The performance impact may vary depending on the application and available system resources, but the effect can vary from 1% to 10%.
The DB_ULTRA_SAFE initialization parameter:
Controls the setting of other related initialization parameters, including DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM, and DB_LOST_WRITE_PROTECT
Controls other data protection behavior in the Oracle Database, such as requiring ASM to perform sequential mirror writes
By making it possible to detect data corruptions in a timely manner, these features provide critical high availability benefits for the Oracle database.

Before Failover is traditionally viewed as a last resort
Seldom tested, time consuming, manual and error prone process

Shows the reductions in failover time as new technologies were introduced.

Pre-Data Guard
Clock is ticking while the problem is identified, notifications are sent, and DBAs respond
Resolution involved custom scripts and complex procedures

Data Guard Fast Start Failover
Observer identifies the problem (primary not available),
responds appropriately (failover or not depending upon conditions),
and resolves the outage in under two minutes (if failover will not result in data loss)
Major improvement in availability

Database failover is fast, at Amazon.com . . .
Database failover completes in 10 seconds
Total time to redirect clients to new primary database is less than 2 minutes

Database fails – primary site and application tier remain viable
Execute Data Guard failover to recovery site
Fast-Start Failover recommended for optimum availability (best RTO)
Oracle Database Services used to automate relocation of the production service to the new primary database
Oracle Fast Application Notification (FAN) used to avoid tcp timeouts
Requires Oracle OCI and/or JDBC clients configured for FAN
Absent FAN support – use trigger on DB_ROLE_CHANGE system event to automatically restart middle tier and break clients out of tcp timeout
Application tier automatically connects to the new production database
For more details:
MAA Best Practices for Client Failover in a Data Guard Configuration
Demonstrations – Automatic Client Failover with Fast-Start Failover

Active Data Guard is a capability requested by many customers and generates immediate benefit for physical standby users

Satisfies specific requirements communicated by customers:
Telecom – service schedules for technicians
Medical – access patient medical reports
Finance – ad-hoc queries and reports
Transportation – package tracking queries
Web-business – scale-out read access for catalog browsing

Significant advantage compared to storage mirroring
Mirror volumes are offline during mirroring

RMAN block change tracking on standby database
Fast incremental backups complete 20x faster

Logical standby allows you to move reporting and other applications to the logical that require read-write access to Oracle, and that depend on primary database’s data but do not modify it. For instance, say you are a TELCO that has two separate databases, one running an application to measure a customer’s usage of the network, and the other application that bills the customer for the usage. You will have the primary database running your call-measurement application, and the offload the billing application to your logical standby. Same issue with toll collecting with EZPass: one database to record when you entered and exited from the highway, and the other application on logical standby database that actually charges your account (or credit card). We have customers who are using logical in both the variations that I descibed above. This capability, although not new, is more critical today  to differentiate between a logical and a physical standby database, now that we have Active Data Guard that allows queries to a physical standby database. The other thing important benefit/use casae for logical standby compared to over physical standby with Active Data Guard, is that logical allows you to maintain additional indexes and materialized views at the logical standby. At face value this does not seem like much. But there are indexes that are quite expensive to maintain, and can be very valuable  for reporting and browsing activities like a soundex index (this allows you to search on a name even if you do not know the exact spelling of the name, but you know what the name sounds like).  You can add such indexes to a music catalog say, such that the index gets maintained only on the logical standby. This allows me to mis-spell the name of the artist and still get a reasonable match. For instance,  say you were looking for me in the employee directory, but did not know how to spell my last name: with an existing soundex index you can query for a surname “koondoo” and still get the record that actually is associated with me (e.g, select first_name, last_name from hr.employees where soundex(last_name) = soundex(‘koondoo’); )

For tables that have only supported data types, changes are propagated to the standby site normally with SQL Apply.

For tables that require Extended Data Type Support, three additional database objects are created
a log table that is similar in structure to the base table except that data types not supported by SQL Apply are represented by supported data types.
For example, a simple object type would be flattened and represented in the log table as its scalar equivalents only
Varrays are represented as BLOBs
a trigger on the base table – fires on primary only
DML on the base table fires this trigger, which inserts rows into the log table containing DML type and relevant column values
a trigger on the log table – fires on standby only
When Apply inserts a row into the log table, this trigger fires, which executes original DML that was performed on the base table at the primary

Performing a rolling upgrade with SQL Apply provides several advantages:
The upgrade incurs very little downtime. The overall downtime can be as little as the time it takes to perform a switchover.
Application downtime due to PL/SQL recompilation is eliminated.
It is possible to validate the upgraded release without affecting the primary database.
Step Description 1 Stop SQL Apply and upgrade the logical standby database. 2 Restart SQL Apply. 3 Monitor events on the upgraded standby database. 4 Begin a switchover. 5 Determine if unsupported objects were modified during the upgrade. 6 Complete the switchover and activate user applications. 7 Upgrade the former primary database. Foot 1
8 Start SQL Apply. 9 Optionally, raise the compatibility level on both databases. 10 Monitor events on the new logical standby database. 11 Optionally, perform another switchover.

Thomson used Data Guard SQL Apply to execute a complete technology refresh, (servers and storage), a rolling database upgrade (10.1.0.3 to 10.1.0.4), and implement several database changes (initrans . . .), with zero downtime for consumers of their online information services (i.e. westlaw and other services), and minimal downtime for their content load processes.
It was Linux -> Linux. They changed the underlying hardware – more CPU’s, more Memory, etc… and I believe they even changed the version of the O/S, but it was still SUSE, just a new version of SUSE.

1. They began with 10.1.0.3 primary and logical standby.

2. They create a “temporary” logical standby running on the new servers and storage on 10.1.0.4 and perform the required database maintenance tasks – all while production is running on the original systems.

3. They create a “temporary” physical standby using new servers and storage for the temporary logical.

4. They do a Data Guard switchover of production from the original 10.1.0.3 primary, transitioning the 10.1.0.4 logical on the new servers and storage to the primary role. During this process, readers continue to have uninterrupted access to the database, there is a short outage for read-write access while the switchover completes.

5. The temporary physical standby running at 10.1.0.4 on the new servers and storage is converted to a logical standby.

6. The original systems on the old hardware running at 10.1.0.3 are decommissioned.

The Streams capture process captures changes from the redo logs and places them in a staging queue. This requires extra database objects and resources at the source database, which can sometimes have a negative impact on database performance.
With downstream capture, you can create the capture process and supporting objects on a different database than the source database, referred to as the downstream database. The redo logs generated at the source database can be transported to the downstream database through the Oracle database log transport service or any other supported method of transporting the redo logs, such as FTP. If you do not use the log transport service, you must add the log files to capture process by using the command ALTER DATABASE REGISTER LOGICAL LOGFILE FOR .
This configuration can provide the following benefits:
Overhead for supporting Streams on the source database greatly reduced
Easier capture process administration, as you can locate capture processes with different source sites on the same database
Configuring multiple capture processes for the same source database at a downstream database provides additional scalability and flexibility in configuring your Streams environment
Shipping the redo logs to a downstream database provides additional protection against database failure and corruption
Because the source and downstream databases use copies of the same archived log file, both databases must run on the same operating system and must both be running Oracle Database 10g. If the downstream database propagates events to another destination by using Oracle Streams (Streams), then this other destination database is not required to be on the same operating system.
The ADD_*_RULES procedures of DBMS_STREAMS_ADM cannot be used to create a downstream capture process. To implement downstream capture, the capture process must be created by using DBMS_CAPTURE_ADM.CREATE_CAPTURE() before invoking the ADD_*_RULES procedures.

IF: THEN
  1. Very high primary database throughput
  2. Simplicity of a physical replica
  3. Maximum protection from physical corruptions
  4. Read-only access to synchronized standby can offload production
Data Guard

Physical Standby

  1. Simpler, one-way logical replication
  2. Standby requires local tables, additional schemas, indexes and mv’s
  3. Offload apps that need read-write access, but don’t modify primary data
Data Guard

Logical

Standby

  1. Fine grained, N-way multimaster, hub&spoke or many to one replication
  2. Load balance update workload across sites
  3. Accommodations in app design and management are acceptable in

    return for access to advanced replication features

Oracle Streams
  1. Highest level of availability for server or computer room failure
  2. HA benefit and workload balancing outweighs performance concerns
  3. Willing to make additional provisions for remote data protection
Oracle RAC

Extended Distance Cluster

© 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.

相关文章 | Related posts:

  1. Oracle Database 11g: High Availability Student Guide
  2. Oracle Active Data Guard-Technical Overview
  3. Introduction to Oracle GoldenGate: An Introduction to High Availability Solutions & Real-Time Data Integration
  4. Oracle GoldenGate: Disaster Recovery Solution
  5. Compare Oracle High Availability to Microsoft SQL Server 2008
  6. Best Practices for Automatic Failover Using Oracle Data Guard 10g Release 2
  7. ACFS Technical Overview and Deployment Guide
  8. Oracle Database 11gr1/10gr2 Automatic Storage Management Overview and Technical Best Practices
  9. Database Upgrade using Transportable Tablespaces
  10. How To Exclude A Table From Schema Capture And Replication When Using Schema Level Streams Replication

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>