PURPOSE
=======
Often times after a corruption is encountered and even worked around, client’s
management start asking questions on how the corruption occurred and what they
can do to prevent it.
The purpose of this document is to guide the reader through the steps of
gathering the minimum amount of information needed for Oracle to do Root Cause
Analysis (RCA) that will help address customer’s concerns for those corruption
cases.
It is EXTREMELY important and a good practice to gather ALL the information
needed on the FIRST corruption occurrence even if client is not asking for RCA
and just wants to workaround the problem. Often times, the same customer will
ask for the RCA at a later time!
SCOPE & APPLICATION
===================
The article is intended for Oracle Support Analysts.
CORRUPTION OVERVIEW
===================
Each block of an Oracle data file is formatted with a fixed header that
contains information about the particular block. This information provides a
means to ensure the integrity for each block and in turn, the entire Oracle
database.
When Oracle reads a block from disk into the Buffer Cache or writes a block
from the Buffer Cache to disk, there are certain checks done on the block.
For example, one of the checks is to compare the lower order 2 bytes of SCNBase
(byte 10 & 11), the block Type (byte 0), and the Seq number (byte 14) and see
if it matches with the Tail (Byte C, D, E, F).
Oracle8 data block layout:
~~~~~~~~~~~~~~~~~~~~~~~~~
| | | | | | | | | | | | | | | | |
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |10 |11 |12 |13 |14 |15 |
+——-+——-+—————+—————+—————+
|Typ|Fmt|Filler | RDBA | SCNBase |SCNWrap|Seq|Flg|
|——-+——-+—————+—————+—————|
|ChkVal |Filler | <—— |
|——-+——-+ |
: Body of :
: Data Block :
: :
| +—————|
| ———> | Tail |
+—————————————————————+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
| | | | | | | | | | | | | | | | |
There are various types of corruption that can occur in an Oracle environment:
o Physical Corruption:
Can be defined as a damage to the internal data structure in a
way that prevents Oracle from being able to read the data in the
corrupted block/s.
Physical corruption can be due to a hardware or software and can
occur in MEMORY — including various I/O buffers and Oracle Buffer
Cache — or on DISK.
+ Memory Corruption – Memory corruption is usually identified by
a background or shadow process whenever it tries to access the
part of memory that is corrupted. For example, when a shadow
process tries to read or update a block in the Oracle Buffer
Cache OR when the DBWR background process tries to write
dirtied block from the Cache to Disk.
Common causes include:
- OS bugs causing bad reads or bad writes.
- Hardware issues.
- Oracle Bug.
- Non-Oracle program attaching and illegally writing
to the same shared memory address space.
+ Disk/Media Corruption – Media failures are failures that are
caused due to a hardware problems, operating system problems,
controller problems, logical volume manager problems, etc.
The data in the Media corrupted block will be lost.
o Logical Corruption:
Involves Oracle being able to find the data, but the data values
are incorrect.
For the purpose of this document our only concern is Physical Corruption.
CORRUPTION ERRORS:
=================
Following are some corruption related ERRORS or Messages that you might be
dealing with:
I. Getting an ORA-01578 :
ORA-01578: “ORACLE data block corrupted (file # %s, block # %s)”
Cause: The data block indicated was corrupted, mostly due to
software errors.
Action: Try to restore the segment containing the block indicated.
This may involve dropping the segment and recreating it.
If there is a trace file, report the errors in it to your
ORACLE representative.
II. Getting an ORA-00600 with first argument in the range of 2000-8000
Code Layer Range
—————– ———–
Cache Layer 2000 - 4000
Transaction Layer 4000 - 6000
Data Layer 6000 - 8000
Generic Layer 17000 – 18000 — Heap Corruption.
IV. You found a trace files similar to this:
Corrupt block dba: 0x160c5958 . found during
buffer read on disk type:6. ver:1. dba: 0x160c5958 inc:0x0000da78
seq:0x000000fb incseq:0xda7800fc
Reread of block=160c5958 file=22. blocknum=809304. found same
corrupted data
V. You ran ‘dbv’ against a file and you got messages similar to this:
Example_one:
DBVERIFY – Verification starting : FILE = /orad01_A/b91prd02/system01.dbf
kdbchk: the amount of space used is not equal to block size
used=213 fsc=12 avsp=7870 dtl=8096
Page 1147 failed with check code 10
DBVERIFY – Verification complete
Example_two:
DBVERIFY – Verification starting : FILE = /oracle/datafile01.dbf
kdbchk: a row ends in the middle of another
tab=0 slot=10 begin=0x2a6 len=0x15e
Page 56167 failed with check code 5
DBVERIFY – Verification complete
o You analyzed an object and failed with :
ORA-01498 : “block check failure – see trace file”
OR
ORA-01499 : “table/index cross reference failure – see trace file”
BEFORE YOU WORKAROUND THE CORRUPTION
====================================
I suggest that you gather the following information before you attempt to work
around the corruption problem:
o The database Alert log.
o The database init<SID>.ora
o Oracle trace files generated around the corruption time.
o Operating System’s system logs.
o Logical Volume Manager (LVM) logs.
o Thorough history of events that led to the corruption.
o Find out if there are any noticeable changes to the environment such
as new init<SID>.ora parameters, new code, Oracle patches, OS patches,
Oracle upgrade, and OS upgrade.
o DBVERIFY on the corrupted file (assuming that the corruption
was not detected by running DBVERIFY in the first place):
% dbv file=<corrupted_file_name> blocksize=<oracle_block_size_in_bytes>
Example:
% dbv file=/orad01_A/b91prd02/system01.dbf blocksize=4096
DBVERIFY – Verification starting : FILE = /orad01_A/b91prd02/system01.dbf
kdbchk: the amount of space used is not equal to block size
used=213 fsc=12 avsp=7870 dtl=8096
Page 1147 failed with check code 10
Total Pages Examined : 6400
Total Pages Processed (Data) : 1690
Total Pages Failing (Data) : 1
Total Pages Processed (Index): 709
Total Pages Failing (Index): 0
Total Pages Empty : 2969
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
o ANALYZE on the object reporting the corruption (assuming that the
corruption was not detected by running the ANALYZE command) :
1_ Identify the corrupted object:
First, identify the FILE# and the BLOCK# of the
corrupted block:
a_ If ORA-01578 :
File# and Block# is part of the error message.
b_ If ORA-00600 :
Find if one of the ORA-00600 arguments represent
a Data Block Address (DBA) in Oracle7 or the
Relative Data Block Address (RDBA) for Oracle8 :
++ If it does :
Example:
ORA-00600 [3339] [805306632] [805306633] (Oracle7)
Plug in the DBA/RDBA decimal value in WebIV –
DBA Calculations Utility — to get the corresponding
File# and Block#.
In this example argument 2 and 3 correspond to
DBA addresses .. lets take DBA 805306633 for an
example. If you plug in the DBA decimal value in
the DBA Calculations Utility you will get the
following results.
Converting DBA 805306633 (dec) ==>
DBA (Hex) = File#, Block#
805306633 (0×30000109) = 12, 265
So File# = 12
Block# = 265
++ If it doesn’t:
Find out if the ORA-00600 trace file has an Oracle
Block Dump in it .. you can search the trace
file for ‘Block header’.
Example of dump:
*** SESSION ID:(13.1324) 2000.04.11.11.57.04.036
Block header dump: dba: 0×30000109
Object id on Block? Y
seg/obj: 0xf92 csc: 0×00.32f3b9a1 itc: 2 flg: -
typ: 2 – INDEX fsl: 0 fnx: 0×0
….
Plug in the DBA/RDBA hex value in WebIV — DBA
Calculations Utility — to get the corresponding
File# and Block#.
In this example the DBA == 0×30000109 (hex),
plugging in this value in DBA Calculations Utility
we get the corresponding File#, Block# as follows:
Converting 0×30000109 (hex) ==>
DBA (Hex) = File#, Block#
805306633 (0×30000109) = 12, 265
So File# = 12
Block# = 265
c_ If you get a trace similar to the one listed under
CORRUPTION ERRORS — number IV — above:
File# and Block# are part of the trace — assuming
that the DBA itself is not corrupted!
Ex.
Corrupt block dba: 0x160c5958 . found during
buffer read on disk type:6. ver:1. dba: 0x160c5958
inc:0x0000da78 seq:0x000000fb incseq:0xda7800fc
Reread of block=160c5958 file=22. blocknum=809304. found same
corrupted data
Here the File# = 22
Block# = 809304
d_ If the corruption was detected by running DBVERIFY:
File# is the corresponding file# of the file
you ran dbverify against.
select name, file# from v$datafile;
Block# is the Page# that failed from running
dbverify against the file.
Example:
DBVERIFY – Verification starting : FILE = system01.dbf
kdbchk: the amount of space used is not equal to block
size used=213 fsc=12 avsp=7870 dtl=8096
Page 1147 failed with check code 10
DBVERIFY – Verification complete
SVRMGR> select name, file# from v$datafile;
NAME FILE#
————————— —-
/oracle/system01.dbf 1
File# for system01.dbf ==> 1
Block# = failed Page# ==> 1147
Once you identify the File# and Block#, you need to run the
following SQL statement to identify the corrupted object:
SELECT owner, segment_name, segment_type
FROM dba_extents
WHERE file_id = <File#>
AND <Block#> BETWEEN block_id AND block_id + blocks – 1;
2_ Analyze the object:
IF segment_type …
INDEX:
—–
ANALYZE INDEX <index> VALIDATE STRUCTURE;
DATA:
—–
ANALYZE TABLE <table> VALIDATE STRUCTURE;
or
ANALYZE TABLE <table> VALIDATE STRUCTURE CASCADE;
Note down the errors and get all the trace files generated
from the analyze command/s.
o Dump the corrupted block/s at the ORACLE level — assuming
that the trace files generated with the corruption error
does not have that dump already:
For detailed information on how to get a block dump at the
ORACLE level, please reference:
Note 45852.1 (Oracle8):
————————-
Example:
ALTER SYSTEM DUMP DATAFILE {‘filename’}|{filenumber}
|—BLOCK MIN {blockno} BLOCK MAX {blockno}|–>
| |
|—-BLOCK {blockno}———————–|
Eg:
ALTER SYSTEM DUMP DATAFILE 1 BLOCK 5586;
ALTER SYSTEM DUMP DATAFILE 1 BLOCK MIN 5585 BLOCK MAX 5586;
ALTER SYSTEM DUMP DATAFILE ‘/u01/oradata/MYDB/system01.dbf’ BLOCK 98;
Note 43142.1 (Oracle7):
————————-
Example:
alter session set events ‘immediate trace name BLOCKDUMP level <DBA>’;
Where the DBA value is in DECIMAL.
o Dump the corrupted block/s at the OS level:
For detailed information on how to get a block dumps at the
OS level, please reference Note 43142.1.
If data files are on ASM, please review the following notes:
Note 293095.1
Note 387001.1
Note 294727.1
EXAMPLES:
========
UNIX:
—-
% dd if= <data_file> bs= <oracle_block_size> skip=<BlockNo> \
count= <how_many_blocks> of=<output file>
where
if: For Input File — the file we are dumping from.
of: For Output File — the file we are dumping to
bs: Oracle block size.
skip: number of blocks to skip == the block# we are trying to
dump. It is useful to dump the block BEFORE and
AFTER the one required.
Eg: skip=Block# -1 count=3
count: The number of block to dump.
Ex.
The following command dumps the BEFORE block, block#2345, and the
AFTER block in file ‘dbfile.dbf’ :
% dd if= dbfile.dbf bs=2k skip=2344 count=3 of=blocks.dd
VMS:
—
Dump the block at the OS level in VMS:
$ DUMP/BLOCKS=(start:XXXX,end:YYYY)/out=dump.out <data-file>
Where XXXX or YYYY =
Operating system block number (in 512 byte chunks). To calculate these
numbers multiply the block number you are dumping by
‘DB_BLOCK_SIZE/512′.
Ex.
The following command dumps the BEFORE block#231822, the
block#231823, and the AFTER block 231824 in file ‘dbfile.dbf’
assuming the Oracle data block size is 2k :
$ DUMP/BLOCKS=(start:927288,end:927296)/out=dump.out dbfile.dbf
927288 = 231822 * (DB_BLOCK_SIZE/512) = 231822 * (2k/512)
927296 = 231824 * (DB_BLOCK_SIZE/512) = 231824 * (2k/512)
o If the database is running in ARCHIVE mode:
Dump the corrupted block from the ONLINE + ARCHIVE redo
logs. It is hard to determine how far back in time we
need to dump the redo info for the corrupted block unless
you know a point in time when the block was *not* corrupted.
Potentially we might need all the redo dump for the block
back to that time.
Please reference Note 28989.1 for more detail
instruction on how to dump the redo info.
Example:
ALTER SYSTEM dump logfile ‘filename’
dba min <filno> . <blkno>
dba max <filno> . <blkno>;
Where ‘filename’ is the ONLINE or ARCHIVE redo log we are
currently dumping the block info from.
Note: There must be a space each side of the ‘.’ in the DBA
Ex.
The following command will dump block#100 of file#14
from the online redo log, log2a.dbf.
ALTER SYSTEM dump logfile ‘log4b.dbf’
dba min 14 . 100
dba max 14 . 100;
(NOTE: You need to run the above alter command *multiple*
times — each time with different online or archive
log (logfile) depending from which redo log you are
dumping the block from. )
A trace file will be generated in the USER_DUMP_DEST directory.
Following is an example of how the beginning of the dump output
will look like:
*** SESSION ID:(9.32) 2000.05.24.15.02.30.000
DUMP OF REDO FROM FILE ‘/database/805/V805/log4b.rdo’
Opcodes *.*
DBA’s: (file # 14, block # 100) thru (file # 14, block # 100)
RBA’s: 0×000000.00000000.0000 thru 0xffffffff.ffffffff.ffff
SCN’s scn: 0×0000.00000000 thru scn: 0xffff.ffffffff
….
After you take the redo dumps and if it is all possible,
have client restore the corrupted file from a backup from
time were he/she thinks the block was not corrupted –
this could be verified by running DBVERIFY on the restored
file — and apply recovery on the file to bring it in
sync with the rest of the database files.
In addition to this simple recovery procedure of having the
potential of working around the problem, it will be a valuable
test for our root cause analysis process to find out whether
restoring the file and recovering it will re-introduces the
corruption problem or not? Please make sure that you start
with a clean copy of the file, one where the block was not
corrupted. Again, running a simple DBVERIFY test on the
restored file will identify whether the file is corrupted
or not!
REFERENCES:
==========
o ”Prevention, Detection and Repair of Database Corruption”
COE Technical Report — Janani Mahalingam
© 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.
相关文章 | Related posts:
- Collecting Diagnostic information for Oracle block corruption
- Oracle Block Cache Layer And Its Corresponding Corruption Error Messages.
- FAQ Memory Corruption
- 9iRAC Tuning Best Practices (INTERNAL ONLY)
- How to Force the Database Open ** INTERNAL ONLY **
- ORA-00600 [KCBZPB_1], [59033077], [4], [1], [] example
- Script to Collect RAC Diagnostic Information (racdiag.sql)
- 11g New Feature: Health monitor




最新评论