Best Practices for Avoiding and Detecting Corruption

作者: Maclean Liu , post on January 19th, 2010 , English Version
【本站文章除注明转载外,均为本站原创编译】
转载请注明:文章转载自: Oracle Clinic – Maclean Liu的个人技术博客 [http://www.oracledatabase12g.com/]
本文标题: Best Practices for Avoiding and Detecting Corruption
本文永久地址: http://www.oracledatabase12g.com/archives/best-practices-for-avoiding-and-detecting-corruption.html

Applies to:

Oracle Server – Enterprise Edition – Version: 7.1.1.0 to 11.2.0.1.0 – Release: 7.1.1 to 11.2
Information in this document applies to any platform.
Oracle Server Enterprise Edition – Version: 7.1.1.0 to 11.2.0.1.0
“Checked for relevance on 17-Mar-2009″

Purpose

The purpose of this Note is to explain good practices that can help to avoid the corruption scenarios. Also will explain about various methods to check for physical and logical corruption and basic steps to solve the corruption problems.

Scope and Application

The article is intended for Database Administrators who need to check the
consistency of the data in an Oracle7, Oracle8, Oracle8i, or Oracle9i and 10g databases,
and various methods to identify the corruption and explain basic steps to resolve the data corruption

Best Practices for Avoiding and Detecting Corruption

About Corruption:
~~~~~~~~~~~~~~
Two types of corruption can be reported in database

a. Physical corruption:

This can be defined as a damage to the internal data structure in a way that prevents Oracle from being able to read the data in the corrupted blocks. Physical corruption can be due to a hardware or software and can occur in MEMORY or on DISK. The information in the block doesn’t make any sense after the read from disk (also called: media corrupt).

b. Logical Corruption:

The Oracle RDBMS is able to find the data, but the data values are incorrect: the block is marked as corrupt after detecting an inconsistency (also called: soft corrupt).

Best Practices for Avoiding Corruption:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using supported and documented combinations:

Look at the supportability matrix while choosing the components / subcomponents while installing Oracle software. These combinations have been tested rigorously and have been certified to work in combination. Error free working of other combinations have not been tested and hence Oracle may not be able to predict normal working of these combinations, which at times may throw errors, which may not have explanations.

Following documented installation and usage guides:

Following documented steps and best practices while installing, configuring and working on the Oracle database will help minimizing chances of corruption and also if the corruption happens, minimizing the impact of it.

Having a sound backup and DR strategy:

A sound backup and disaster recovery strategy is always helpful in such scenarios. With Oracle 10g, we have a quite a few new options that help us to detect the corruption closer to the time of occurrence and also help in recovering from corruptions.

Testing new application before it is moved into production:

It is also advisable to test new applications before they are moved to production.

Be on the latest patchset and the latest release:

The latest patchsets have fixes for the bugs that have been identified on the earlier releases / patchsets. These bug fixes have been regression tested and grouped into patchsets. Thus being on the latest patchset will help avoid errors (and may be corruptions) resulting from these known bugs. This will also help getting a new patch if Oracle Support research indicates it to be a new bug altogether.

Regular and proactive checking of the database for corruptions:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Run the DBVerify utility against the datafile at a periodic intervals, to check for any physical corruption.

dbv file=system01.dbf blocksize=8192

Refer Note 35512.1 -> DBVERIFY – Database file Verification Utility (7.3.2 – 10.2)
for more commands, especially for runnign DBVerify on raw devices.

2. Run object level analyze command at a periodic intervals. This will check for logical inconsistencies and
even detect physical corruptions. It is important to note that this command locks the object, which is
being  analyzed so need to be performed at off-peak hours. The online option available with analyze helps
in removing this restriction to some extent.

analyze <table_name> valiadate structure cascade [online];
analyze <index_name/cluster_name> valiadate structure;
analyze table <table_name> partition <partition_name> validate structure cascade into invalid_rows;
(For partition table analyze, the utlvalid.sql script must be run to create invalid_rows table)

3. A full database export (dump file location to /dev/null on UNIX and to nul on windows platform, if space
is a constraint) can be used to check the consistency of a database. This full database export implies a full
table scan on all the tables to retrieve the data. This is a best method for checking the logical consistency     and detecting both for physical corruption (in used blocks) and for logical corruption (data-dictionary            issues).Further diagnosis to be done according to the errors thrown and symptoms indicating corruptions     in any table.

a. Example for Windows platforms

exp system/manager full=y log=db_chk.log file=nul

b. Example for UNIX platforms

exp system/manager full=y log=db_chk.log file=/dev/null

4. Ensuring the data free of corruptions when doing a bulk load. After the dataload, perform normal
validations(select/analyze/export) to detect corruptions, if introduced while loading.

5. We can also use RMAN to check the existence of physical and logical corruption

To validate the complete database

RMAN> BACKUP VALIDATE DATABASE;

To validate only a particular file, example file# 1

RMAN> BACKUP VALIDATE DATAFILE 1;

For more informations about RMAN validation and commands refer

Note 283053.1 How To Use RMAN To Check For Logical & Physical Database Corruption

6. You can use DBMS_REPAIR package to verify corruption in an object and can use this package to mark the block as soft corrupt.

DBMS_REPAIR.CHECK_OBJECT :CHECK_OBJECT procedure checks the specified object and populates the repair
table with information about corruption and repair directive(s). Validation
consists of block checking all blocks in the object. All blocks previously
marked corrupt will be skipped.
For more details review the note below.

Note.68013.1 Ext/Pub DBMS_REPAIR example

7. Run Memory/Hardware diagnostics periodically.

8. Set block checking parameters.

DB_BLOCK_CHECKING = TRUE (FULL from 10.2 onwards)
DB_BLOCK_CHECKSUM = TRUE (FULL from 10.2 onwards)
_DB_BLOCK_CHECK_FOR_DEBUG = TRUE

DB_BLOCK_CHECKING controls whether Oracle performs block checking for data blocks. When this parameter is set to true, Oracle performs block checking for all data blocks. When it is set to false, Oracle does not perform block checking for blocks in the user tablespaces. However, block checking for the SYSTEM tablespace is always turned on.
DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk. Checksums are verified when a block is read only if this parameter is true and the last write of the block stored a checksum. In addition, Oracle gives every log block a checksum before writing it to the current log.
If this parameter is set to false, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces.
Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. Turning on this feature typically causes only an additional 1% to 2% overhead. Therefore, Oracle Corporation recommends that you set DB_BLOCK_CHECKSUM to true.
_DB_BLOCK_CHECK_FOR_DEBUG Check more and dump block before image for debugging.

This can have some performance impact, so needs to be checked first.
Recovery from corruptions
~~~~~~~~~~~~~~~~~~~~
1. Find the object throwing the error messages.
o If not known explicitly, identify the datablock throwing corruption (from the error message, the objects
involved in the SQL throwing the error message or the trace file produced)
o Map it to the object it belongs to.

2. If it is index, drop and recreate the index, if feasible. However, if it is a table and you have data to
populate the table, you can drop, recreate and then reload the data. In 10g, the feature of flashback table
can be useful as well to take you to a point before corruptions. If above methods are not possible, then
we need to extract the data from the corrupted table by skipping the corrupted block(s) using various
methods. One of the method is to salvage the data using rowid range scan. Refer
Note 61685.1 -> Extracting Data from a Corrupt Table using ROWID Range Scans in Oracle8 and    higher

3. If above steps are not feasible or not helping getting rid of corruptions, we need to use recovery
techniques. For this, find the datafile / tablespace containing the object. Offline this datafile(s)/tablespace.
Take a backup of these files. Restore the datafiles(s) from the last good backup and recover completely
using archive logs and the online redo logs.

4. If the object having corruption belongs to system tablespace, restore and recovery of the system datafile
to be done while the database is mounted. For user objects, the rest of the tablespaces can be online
when restore and recovery of the affected datafile / tablespace is being implemented.

If the errors introduced are due to Oracle bugs, they may get re-introduced while applying the archive/redo logs, when you are doing a full recovery. The best option in this case is to do a PIT recovery to the time just before the corruptions were introduced.

© 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.

相关文章 | Related posts:

  1. Collecting Diagnostic information for Oracle block corruption
  2. Oracle Block Cache Layer And Its Corresponding Corruption Error Messages.
  3. Dealing with Database Corruption
  4. FAQ: Physical Corruption
  5. FAQ Memory Corruption
  6. REDO LOG CORRUPTION – DROPPING REDO LOGS NOT POSSIBLE – CLEAR LOGFILE
  7. Using DUL to Recover From Database Corruption (with some examples)
  8. 11g New Feature: Health monitor

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>