Very Large Database (VLDB) Backup & Recovery Strategies

作者: Maclean Liu , post on December 27th, 2010 , English Version
【本站文章除注明转载外,均为本站原创编译】
转载请注明:文章转载自: Oracle Clinic – Maclean Liu的个人技术博客 [http://www.oracledatabase12g.com/]
本文标题: Very Large Database (VLDB) Backup & Recovery Strategies
本文永久地址: http://www.oracledatabase12g.com/archives/very-large-database-vldb-backup-recovery-strategies.html

Administrators must back up larger volumes of data in less and less time, due to stringent service-level agreements. This session outlines best practices for meeting the challenges associated with backing up and recovering very large databases (VLDBs) to protect against media corruption, data failures, and human and application errors.

VLDB:
Old: >= 1 TB
New: >= Tens-hundreds TB

What is definition of VLDB?
Rapidly growing databases (100 TB++)1
Growth of business applications (ERP, CRM)2
Proliferation of collaboration applications (email, IM, voice)
Creation of digital content — audio, image, video

VLOLTP deployments also rising (tens of TB range)
UK national payment clearinghouse
5B+ transactions in 2005 worth $6.5 trillion
60,000+ reports/day against online database
Major disk drive manufacturer
60 TB in ’08 -> 100 TB in ’09
2 TB redo/day

See most recent external Winter report (2005) for VLDB profiles:

http://www.oracle.com/solutions/business_intelligence/docs/spotlight2005_6live.pdf

For DW, there are certain B&R strategies that can be optimized vs. OLTP

First step is to start with how critical lost or unavailable data is to your business – how much will the company lose in $x/hr of downtime, not to mention in productivity and labor costs, if a database is down? This cost is important as it will justify HW and storage expenditures to prevent and recover from those outages.

Once the business cost and impact is determined for unavailable data, next step is to classify those databases and potentially data sets (recall database consolidation trend), according to criticality. E.g. you may have large DW reporting system that can tolerate 12 hours worth of lost data, as batch loads can be re-run, with several hours of acceptable downtime. This system may fit well with a disk/tape backup strategy. You may also have a critical OLTP system that can tolerate no more than a few mins worth of lost data and several mins of acceptable downtime, e.g. online sales processing, which can cost the company $100K/hr in lost revenue, if unavailable. In this case, traditional backups will not suffice and minimal downtime solution, e.g. Data Guard, should be evaluated.

First key requirement is RPO – how much data loss is acceptable? Are there some databases and data sets that require more frequent backups than others? If I need PITR, have I provisioned sufficient disk space for archived logs?

Second key requirement is RTO – how much recovery time can I tolerate? Can I tolerate hours, or only minutes?

* Recovery Object Granularity (ROG). ROG measures the level of objects that a solution is capable of recovering. We are the best there because we can recover just table or row.
The other important characteristics for DW databases is Recovery Event Granularity (REG). REG measures the capability of a recovery management solution to track events and to recover a failed application or missing data to a specific event. E.g., if event is an ETL load, you should be able to easily revert back if ETL load is bad.

What are your Recovery Time Objectives (RTO) – i.e. how quickly must you recover from a data failure?
RTO may vary by database and/or server
Consider combined disk and tape backup strategy for critical databases to meet more pressing RTO requirements
Determine desired recovery capability from disk and tape
What time period represents most frequent recovery needs?
Do you need to retain data on a long-term basis?
Review Oracle Suggested Strategy
Utilize the benefits of integrated disk and tape backup using technologies such as Flash Recovery Area, RMAN and Oracle Secure Backup

Quantum LTO-3 compressed offers 800 GB/volume, 88 MB/s throughput (1 TB/hr backup with 4 tape drives)
Virtual Tape Libraries (VTL) are disk arrays that emulate tape devices. Achieves the performance advantage of disk but seamlessly deployed into backup infrastructure

Oracle9i introduced Flashback Query to provide a simple, powerful and completely non-disruptive mechanism for recovering from human errors. It allows users to view the state of data at a point in time in the past without requiring any structural changes to the database.
Oracle Database 10g extended the Flashback Technology to provide fast and easy recovery at the database, table, row, and transaction level. Flashback Technology revolutionizes recovery by operating just on the changed data. The time it takes to recover the error is now equal to the same amount of time it took to make the mistake.

Flashback Query:
Ensure that the database is using an undo tablespace. The setting the UNDO_MANAGEMENT initialization parameter to AUTO specifies this.
Set the UNDO_RETENTION initialization parameter (secs, default=900) to a value that causes undo to be kept for a length of time that allows success of your longest query back in time or to recovery from human errors.
To guarantee that unexpired undo will not be overwritten, set the RETENTION GUARANTEE clause for the undo tablespace.

Flashback Version Query Pseudocolumn Description:
VERSIONS_STARTSCN, VERSIONS_STARTTIME – Starting System Change Number (SCN) or TIMESTAMP when the row version was created. This identifies the time when the data first took on the values reflected in the row version. You can use this to identify the past target time for a Flashback Table or Flashback Query operation. If this is NULL, then the row version was created before the lower time bound of the query BETWEEN clause.
VERSIONS_ENDSCN, VERSIONS_ENDTIME – SCN or TIMESTAMP when the row version expired. This identifies the row expiration time. If this is NULL, then either the row version was still current at the time of the query or the row corresponds to a DELETE operation.
VERSIONS_XID – Identifier of the transaction that created the row version.
VERSIONS_OPERATION – Operation performed by the transaction: I for insertion, D for deletion, or U for update. The version is that of the row that was inserted, deleted, or updated; that is, the row after an INSERT operation, the row before a DELETE operation, or the row affected by an UPDATE operation. Note: For user updates of an index key, a Flashback Version Query may treat an UPDATE operation as two operations, DELETE plus INSERT, represented as two version rows with a D followed by an I.

The following statement queries the FLASHBACK_TRANSACTION_QUERY view for transaction information, including the transaction ID, the operation, the operation start and end SCNs, the user responsible for the operation, and the SQL code to undo the operation:
SELECT xid, operation, start_scn,commit_scn, logon_user, undo_sql
FROM flashback_transaction_query
WHERE xid = HEXTORAW(’000200030000002D’);

Flashback Database – restore database to point-in-time

Imagine a bad batch job that has impacted your entire database, incorrectly modifying data across many tables and even certain PL/SQL procedures. The only corrective action would be to import a previous good data set, provided you have them, or to restore a good backup and roll the database forward to a point-in-time prior to the time of the batch job. Both can potentially entail long recovery times into the hours or days.

Oracle Database 10g features Flashback Database, a unique database point-in-time recovery capability, which allows the database to be ‘rewound’ to a prior point-in-time within seconds or minutes. Because only the affected data blocks are restored and recovered, it’s faster than traditional recovery methods. The command is simply ‘flashback database to my point-in-time’

We accomplish this fast recovery by utilizing flashback logs which record old block versions. When writes are issued to disk and flashback database is enabled, the old block version is written to the flashback log while the new block version is written to datafiles. When a flashback database command is issued, only the changed blocks are retrieved from the flashback logs and then recovered with appropriate archived logs to the required point-in-time. You control how far back to retain flashback logging to support your recovery requirements. For example you might enable flashback database for 24 hrs, and then rely on backups for recovery past 24 hrs.

Flashback Table – recover contents of tables to point-in-time (undo-based)
Flashback Drop – restore accidentally dropped tables (based on free space in tablespace)
Flashback Transaction – back out transaction and all subsequent conflicting transactions (redo-based)

The important part of snapshots that they don’t protect against “physical” errors such as disk array failure, and “disasters”.

Netapp snapshot:
Extra block read required during sequential scan for copy-on-write (scatter allocation)

For example, if the full db was backed up in June 2006 then read only tablespaces were not backed up moving forward, What happens in March 2007 when they must restore….they need tapes from June 2006 and all tape cooresponding to backed up tablespaces.  We should recommend that periodically users do full backup to include read only tablespace so a more recent set of tapes is available.  Depending on retention policies for tape, it could be problematic to require such old tapes and new ones.

Oracle Database 9i first introduced Table Compression several years ago to allow data to be compressed during bulk load operations such as direct path load, CREATE TABLE AS SELECT…. (CTAS), etc.. This form of compression was ideally suited for data warehousing environments where most data is loaded in the database using batch processes.
Oracle uses a unique compression algorithm specifically designed to work with relational data. The algorithm works by eliminating duplicate values within a database block, even across multiple columns. Compressed blocks contain a structure called a symbol table that maintains compression metadata. When a block is compressed, duplicate values are eliminated by first adding a single copy of the duplicate value to the symbol table. Each duplicate value is then replaced by a short reference to the appropriate entry in the symbol table. Through this innovative design, compressed data is self-contained within the database block as the metadata used to translate compressed data into its original state is contained within the block. When compared with competing compression algorithms that maintain a global database symbol table, Oracle’s unique approach offers significant performance benefits by not introducing additional I/O when accessing compressed data.
The compression ratio achieved in a given environment depends on the nature of data being compressed, specifically the cardinality of the data. In general, customers can expect to reduce their storage space consumption by 2 – 3 times by using the table compression feature. That is, the amount of space consumed by uncompressed data volume will be two to three times larger than that of the compressed data. The benefits of compression go beyond just on-disk storage savings. One significant advantage is Oracle’s ability to read compressed blocks directly without having to first uncompress the block. Therefore, there is no measurable performance degradation for accessing compressed data. In fact, in many cases performance may improve due to the reduction in I/O since Oracle will have to access fewer blocks. Further, you can achieve a more efficient buffer cache by storing more data in the cache without having to increase your memory footprint.
As stated above, the Table Compression feature has no adverse impact on read operations. However, compression requires additional work to be performed while writing the data making it unavoidable to eliminate performance overhead for write operations. However, Oracle has put in a significant amount of work to minimize such overhead for OLTP Table Compression. Oracle compresses a block in batch mode rather than compressing data every single time a write operation takes place. A newly initialized block remains uncompressed until data in the block reaches an internally controlled threshold. When a transaction causes the data in the block to reach this threshold, all contents of the block are compressed. Subsequently, as more data is added to the block and the threshold is again reached, the entire block is recompressed to achieve the highest level of compression. This process repeats until Oracle determines that the block can no longer benefit from further compression. Only transactions that trigger the compression of the block will experience the minimal compression overhead. Therefore, a majority of OLTP transactions on compressed blocks will have the exact same performance as they would with uncompressed blocks.

Media Failure
Restore database files to a new location
Validate restoring database, tablespace, data file
Block Corruption
Validate data file block corruption repair with Block Media Recovery
Validate archived log apply with Trial Recovery (detect corrupt and missing logs)
User Error
Validate correction of user error using Flashback Query (Flashback Table, Flashback Drop, Flashback Database also available in Oracle Database 10g)
Perform tablespace point-in-time recovery (TSPITR) to recover original data prior to an erroneous transaction
Research corrupt or erroneous transactions with LogMiner
Disaster
Validate restoring all files to another host
Test switchover/failover procedures to standby database in Data Guard configuration

Backup and Recovery Design Architect Backup Environment

Disk Tape Library Virtual Tape Library (VTL)
Compatibility,

Provisioning

- OS, drivers, storage must be compatible

- Requires ongoing capacity monitoring

- Well-known interface across heterogeneous systems

- Tapes easily added, as needed

- Emulates standard tape devices

  1. Reduced administrative complexity than disk
  2. Eases transition from tape to disk
Performance - Fast, reliable random I/O access - Slower, sequential access - Fast, reliable random I/O access
Disaster Recovery - Optional block-level mirroring - Offsite tapes for long-term archival and/or DR - Optional file-based replication with deduplication
Cost - Price/capacity starts at few dollars/GB (ATA) - Best price/capacity, e.g. LTO-3 ($60 list) holds 800 GB compressed - VTL license + disk cost (appliance-based)

Backup and Recovery Deployment Develop Recovery Strategies

Recovery Scenario Oracle Technologies
Media Failure RMAN – restore all files to new storage location
Block Corruption RMAN Block Media Recovery, Trial Recovery, LogMiner
User/Logical Error Flashback Technologies, RMAN TSPITR, LogMiner
Disaster Data Guard; RMAN — restore all files to new host/storage


© 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.

相关文章 | Related posts:

  1. Best Practices for Improving Oracle Database Backup and Recovery SLAs
  2. Recovery Manager (RMAN) and Oracle Secure Backup (OSB) Secure Your Data, Protect Your Budget
  3. Database Backup with the Sun ZFS Storage Appliance
  4. Advanced Compression in Oracle Database 11g
  5. Database Hangs Because SMON is taking 100% CPU doing transaction recovery
  6. How to recover and open the database if the archivelog required for recovery is either missing, lost or corrupted?
  7. Dream features in Oracle Database 12g?
  8. Oracle Database 11g: High Availability Student Guide
  9. EVENT: 10013 “Instance Recovery”
  10. Oracle Advanced Compression:Throw Away Half of Your Disks and Run Your Database Faster

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>