Applies to:
Oracle Server – Enterprise Edition – Version: 9.2.0.1 to 9.2.0.8 – Release: 9.2 to 9.2
Oracle Server – Enterprise Edition – Version: 10.1.0.2 to 10.1.0.5 [Release: 10.1 to 10.1]
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 10.2.0.4 [Release: 10.2 to 10.2]
Oracle Server – Enterprise Edition – Version: 11.1.0.6 to 11.1.0.7 [Release: 11.1 to 11.1]
Information in this document applies to any platform.
Purpose
A performance issue will typically present itself in transactions
not appearing on the Apply sides tables as quickly as expected
thus there will be latency or lag present.
It may be unclear in the first instance whether the
issue is Capture , Propagation or Apply related – most likely it
will either be a Capture or Apply problem.
A heartbeat table is a simple way to monitor latency of Streams.
This Note explains what diagnostics are required by DBAs when
troubleshooting Streams processes. The details to be collected
should be forwarded to Oracle for further scrutiny if it is not
obvious what the issue may be.
In addition, pointers / notes are suggested to help identify where
a performance problem may lie.
Please also refer to :
The Oracle Streams Performance Tuning Best Practices: Oracle
Database 10g Release 10.2 which provides a useful guide to
performance analysis and identification.
(http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_10gr2_streams_performance.pdf)
This Notes explains what diagnostics are required by DBAs when troubleshooting the Streams processes
Last updated 13 February 2009
Last Review Date
November 15, 2008
Instructions for the Reader
Troubleshooting Details
Prerequisites
- Verify that Streams Best Practices are implemented (9.2, 10.1, 10.2, 11.1);
Please refer to Note 335516.1 for Streams performance recommendations;
- Confirm that Recommended Patches are installed
Details to collect
The following presents a summary of the information Oracle Support
Services require to investigate a performance issue.
All the following should be collected if possible.
- Use STRMMON at both source and target to analyze processing
rates(default output) and top events;
- Run Healthcheck at source and target ;
- High process CPU usage at the OS level;
- Generate AWR output for the time period;
- Check SQL connected with high level of buffer gets;
- Generate ASH output for the time period;
- Oracle Streams Performance Advisor (> 11.1);
- Generate process stack information;
- Investigate memory usage by Streams Processes;
- Optional 10046 / level 12 (sql trace);
- Gather and upload details collected
Troubleshooting detail
Verify that Streams Best Practices are implemented (9.2, 10.1, 10.2, 11.1)
Capture :
Make sure that the documented Streams best practices are implemented, especially
the following:
Alter the capture parameter CHECKPOINT_RETENTION_TIME to a value suitable for
the database. Typically this value can be set to 7 (days) from 60 (days). It is
best practice to maintain at least 1 day of Streams logminer information in the
database.
Make sure that the hidden capture parameter _CHECKPOINT_FREQUENCY is set to
1000 (new default value for 11g). This will reduce the number of logminer (not
database) checkpoints taken which has the side effect of reducing the size of
the SYSAUX tablespace and redo generation as well.
Ensure that the Streams pool in configured appropriately to allow buffered
LCRs to be processed on Capture and Apply databases.
If the capture appears to be progressing through a single redo log very slowly,
check the v$logminer_stats view for the statistic ‘resume from low memory’. If
the value for this statistic is increasing, consider setting the hidden
capture parameter _SGA_SIZE. Customers who have encountered the
“resume from low memory” issue report improvements by increasing this value
to 50 or 100 (Mb) from 10. If the STREAMS_POOL_SIZE is explicitly set
and the sga_target is not (i.e., no autotuning), be sure to increase
the this parameter by a similar amount.
Apply :
The Apply process needs to be able to uniquely identify a row.
Where there is no primary key, the Apply process may need to
be configured to identify relevant columns to allow it to effectively
determine a substitute primary key.
This is a basic requirement which will avoid Full Table Scan (FTS) operations
for update and delete operations. Refer to the Steams Replication
Administrator’s guide for recommendations on the use of set_key_columns
and Supplemental logging in relation to tables which do not have primary keys.
Hint : if an Application does not fully make use of primary keys, this is
a potential issue, refer to AWR collection and Sql plan generation below.
Confirm that Recommended Patches are installed
There are a number of patches which are relevant to performance issues.
Given the complexity of Streams and logminer operations, Streams users
should be mindful recommended patches as outlined in : Note 437838.1.
These patches address performance related issues as well as address
Streams stability issues.
Use STRMMON at both source and target to analyze processing
rates(default output) and top events
Strmmon is useful to help identify :
- key event summary details associated with a process;
- latency;
- bottlenecks in processing activity;
- memory usage by Streams;
- LCR throughput; as well as
- performance activity over time / rate at which LCRs are processed
The default output indicates the related Streams process throughput rates
The ‘-long’ option presents additional information.
Note : To simplify viewing, the default (not -long) strmmon output file
can be opened in MS Excel. Select ‘other’ for the separator and specify
the unix pipe symbol “|” as the separator.
===========
Important events detailed by strmmon are :
%I reported by strmmon indicates that the process is effectively
idle performing no activity ;
%F indicates that the Capture process is in flow control
Hint : where flow control is a persistent observation, the performance issue
is likely Apply side related.
Capture latency is a measure of when the message was captured in relation
to the time at which the message was created.
v$streams_capture.capture_time – v$streams_capture.capture_message_create_time
Apply latency is a measure of when the message was dequeued by the
Apply reader in relation to when the message was created :
v$streams_apply_reader.dequeue_time -
v$streams_apply_reader.dequeued_message_create_time
Hint : If there is latency, is Capture in paused for flow control
state a substantial part of the time. If this is the case,
Capture is likely throttling back due to the Apply side not keeping up
and therefore the performance issue is likely an apply side problem.
Hint : If there is latency on both sides with little or no paused
for flow control shown by Capture, then this may suggest a Capture
side issue or , less likely, that perhaps both sides may have a problem.
Bottlenecks give a clue to where an issue may lie. This is identified by
‘<B>‘ in the strmmon output.
A reported Capture bottleneck indicates that
v$streams_capture.capture_message_number < v$logmnr_session.read_scn
Hint : If this is consistently observed, this is likely a Capture
side issue. This may suggest that the builder/preparer processes
activities are slower than expected.
A reported Apply bottleneck indicates :
(v$streams_apply_coordinator.total_received -
v$streams_apply_coordinator.total_assigned) > 10
Hint : If this is consistently observed and there are fewer than 10
apply servers then this might suggest an Apply side issue.
This may suggest that the Apply servers are slower than expected
and that the Apply coordinator is receiving messages faster than
server processes can process them.
High memory usage may suggest that the Streams pool is not large
enough or that there is significant spillover of transactions from memory
to disk.
Please refer to Note 746247.1 for more information on
potential issues associated with Paused For Flow Control problems.
Example of strmmon output and interpretation is detailed in :
The Oracle Streams Performance Tuning Best Practices: Oracle
Database 10g Release 10.2 on otn.
Run Healthcheck at source and target
The Streams healthcheck provides :
- configuration, as well as
- performance related detail
Gather this on the Capture as well as Apply related databases – Note 273674.1.
Relevant sections (under heading detailed) worthy of inspection are :
‘constraints on tables configured in streams’
Hint : This section is applicable to the Apply side. As outlined in the
prerequisites section, determine those Apply side tables which may
not have a primary key. Determine if any of these tables may be subject to
full table scans by referring to an AWR report.
‘performance checks’
Hint : If there are obvious issues and hard coded thresholds in the
script are exceeded, details will be output; in particular details regarding
the following will be output as an advisory warning :
- process latency information;
- complex rules for : capture, propagation and apply; complex rules should be
avoided as this involves sql activity in rule evaluation;
- high Streams pool usage;
- logminer spill
‘history’
‘Streams Buffered Queue History for last day’/
‘Streams Buffered Subscriber History for last day’/
‘Streams Capture History for last day’
The above give a useful appreciation of how the buffer queue is being processed
over time.
There can be > 1 subscriber per queue / capture process .
Check whether spill to disk is a common activity.
Also, inspect the ‘current’ and ‘cumulative columns to understand how messages
have been processed or spilled.
‘capture statistics’
The elapsed time detail come from the v$streams_capture view.
Check the elapsed time for the various activities listed.
Hint : if ‘elapsed time rule evaluation’ column is relatively high examine rule
definitions. Bear in mind that the elapsed enqueue time for capture includes
the time for propagation rule evaluation.
Hint : ‘full evaluations’ column should be small or zero.
Double check that the rules are simple. If at all possible, eliminate rules
for propagation. Check v$rule_set information for the capture and propagation
rule sets. Make sure that SQL_EXECUTIONS is 0. If non-zero, one or more
rules in the rule set may require full evaluation.
Check for get_tag clauses in rules that do not include HEXTORAW.
e.g :dml.get_tag() = HEXTORAW(’09′).
Explicit use of HEXTORAW in get_tag comparisons will avoid SQL execution
and improve rule evaluation performance.
Hint : A high value in relation to other elapsed time values for
‘Elapsed Time Paused’ column would suggest an Apply side issue;
‘elapsed time redo wait’ column indicates that Capture is waiting for
redo. If this is high relative to other values this might indicate
redo is not available (downstream capture) or there is not a lot of
activity on the system.
Inspecting the ‘elapsed time capture’ column you should consider the number of
LCRs created versus the number of LCRs enqueued.
This is a measure of the number of LCRs Capture is not interested in.
‘logminer statistics’
Hint : ‘bytes paged out’ column (from v$logmnr_stats) should be zero.
If not, increase _SGA_SIZE. If this persists, the AWR may identify the
sql statement related to what is being transferred to disk.
‘streams Process Wait Analysis’
- attempt to isolate whether one process is waiting significantly longer
on a resource than might be expected. ‘time waited’ column indicates the
extent to which the process is waiting on the
event. Disregard the ‘rdbms ipc message’ event.
Hint : a process which has an empty / null event name is effectively using
CPU or is waiting for CPU and therefore is active.
Where this is high refer to the AWR report as the next step.
The following events may suggest that the process is waiting
but is constrained / restricted , i.e events such as :
- ‘Streams capture: waiting for subscribers to catch up’
- ‘Streams capture: resolve low memory condition’
- ‘Streams capture: waiting for archive log’
(the first of the above suggests an Apply side issue).
Hint : Events such as :
- ‘db file scattered read’ suggest Full Table Scan activity.
This should be avoided at all cost.
To isolate the cause of this, refer to the AWR report in the first
instance; as explain plans.
- ‘db file sequential read’ is expected as this implies index access
‘apply statistics’
If the Apply side is operating correctly there should be relatively few
messages spilled by the Apply reader.
‘apply spilled transactions’
Hint : if there are large number of transactions spilled on the apply side
this implies :
- that the transactions may be very large;
- that the apply servers may not be working as effectively as expected
and that this might be an Apply side issue. i.e that they are being
spilled to disk.
From 10.2 onwards, large transactions (> 10000 LCRs) and
long running transaction (no further LCRs received for over 10 minutes)
are spilled to disk on the Apply database. Does the issue correspond
with these transaction types.
Check progress of the various Apply processes :
reader/coordinator and Apply servers.
‘apply server statistics’
The same applies here as with ‘streams Process Wait Analysis’ above.
Hint : If there is any kind of latching wait events (latch:*) listed
above i/o type wait events such as ‘d/b file sequential read’ this
would be unexpected and therefore gathering stack samples from the
relevant process may be of benefit.
If high latch waits are observed , as well as collecting, process
stacks it may also be useful to collect 10046 / level trace as detailed
below.
High process CPU usage at the OS level
Streams related OS process ids are recorded in the alert.log file.
Note: Allowance for 11g CCA is yet to be made in queries below.
The following also identifies memory usage associated with Streams processes.
connect / as sysdba
set pagesize 1000
-- Capture
define capture_name='<substitute relevant capture name>'
select p.spid Spid, 'C00'||c.capture#||' '||upper(lp.role) "Capture Process ", c.capture_name "Capture Name", p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from v$streams_capture c, v$logmnr_process lp, v$session s, v$process p
where c.logminer_id = lp.session_id
and lp.role in ('reader','preparer','builder')
and lp.sid = s.sid
and lp.serial# = s.serial#
and s.paddr = p.addr
and c.capture_name = '&capture_name'
union
select p.spid, 'C00'||c.capture#||' Coordinator', c.capture_name, p.pga_used_mem, p.pga_alloc_mem, p.pga_max_mem
from v$streams_capture c, v$session s, v$process p
where c.sid = s.sid
and c.serial# = s.serial#
and s.paddr = p.addr
and c.capture_name = '&capture_name'
order by 6,5;
connect / as sysdba
set pagesize 1000
-- Apply
define apply_name='<substitute relevant apply name>'
select p.spid Spid, 'AP0'||sac.apply#||' Apply Coordinator' Process, sac.apply_name "Apply Name", p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from v$streams_apply_coordinator sac, v$session s, v$process p
where sac.sid = s.sid
and sac.serial# = s.serial#
and s.paddr = p.addr
and sac.apply_name = '&apply_name'
union
select p.spid, 'AP0'||sas.apply#||' Server' Process, sas.apply_name "Apply Name", p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from v$streams_apply_server sas, v$session s, v$process p
where sas.sid = s.sid
and sas.serial# = s.serial#
and s.paddr = p.addr
and sas.apply_name = '&apply_name'
union
select p.spid, 'AP0'||sar.apply#||' Reader' Process, sar.apply_name "Apply Name", p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from v$streams_apply_reader sar, v$session s, v$process p
where sar.sid = s.sid
and sar.serial# = s.serial#
and s.paddr = p.addr
and sar.apply_name = '&apply_name'
order by 6,5;
connect / as sysdba
set pagesize 1000
-- Propagation
define prop_name='<substitute relevant propagation name>'
select p.spid Spid, qs.process_name "Process ", pr.propagation_name "Propagation Name", pr.source_queue_owner ||'.'||
pr.source_queue_name ||'@'||
g.global_name "source queue", pr.destination_queue_owner ||'.'||
pr.destination_queue_name ||'@'||
pr.destination_dblink "Destination Queue",
p.pga_used_mem "PGA Memory Used", p.pga_alloc_mem "PGA Memory Allocated", p.pga_max_mem "PGA Maximum Memory"
from dba_queue_schedules qs, dba_propagation pr, v$session s, v$process p, global_name g
where qs.session_id = s.sid||', '||s.serial#
and qs.schema = pr.source_queue_owner
and qs.qname = pr.source_queue_name
and s.paddr = p.addr
and qs.message_delivery_mode = 'BUFFERED'
and pr.propagation_name = '&prop_name'
order by 8,7;
Having determined the relevant process ids, determine CPU usage as
follows for individual Streams’ processes; spid1,spid2,spid3, etc
are identified from above queries.
| Platform | Command | Example |
|---|---|---|
| Linux | top | top -b -d2 -n30 -p spid1,spid2,spid3,etc > cpu.top 30 snapshots taken 2 seconds apart for the Streams processes : spid1, spid2, spid3, etc. Output directed to file. |
| hp-ux | top | top -f cpu.top -s2 -d30 -n20
record 30 snapshots each showing 20 processes taken 2 seconds apart. Second phase involves searching for Streams processes in output : egrep -i “CPU.*TTY|spid1|spid2|spid3|etc” cpu.top |
| AIX | top | Note: it is difficult to capture the output of topas to a file which is in a readable form. Therefore, the following achieves the same as above :
sc=0 egrep -i “CPU.*COMMAND|spid1|spid2|spid3|etc” /tmp/cap.out |
| Solaris | prstat | prstat -p spid1,spid2,spid3,etc 2 30 > cpu.top |
| Windows | - | use relevant Windows tool |
Hint : Are these processes exhibiting more CPU than would typically
be observed?. If there is high CPU , what is the process type and
does the Healthcheck report also show that the process is using
greater then usual CPU. i.e. is the event name null.
Generate AWR output for the time period
An AWR snapshot can be collected a the start and end of a monitoring
period using :
connect / as sysdba
execute dbms_workload_repository.create_snapshot; The AWR report can be collected using :
connect / as sysdba
Note: Choose report type html.
The AWR report will provide important information in relation to
overall process activity.
As a guide the most revealing details in the output are likely to be :
- SQL Statistics / ‘SQL ordered by Gets’. Poor performance may typically
be associated with the execution of a particular statement or statements
with poor execution plan(s). This will typically present itself by a high
number of buffers or blocks retrieved by a particular statement – per
execution of that statement. These may be linked to statements of form :
update /*+ streams restrict_all_ref_cons */ ….
delete /*+ restrict_all_ref_cons */ …
Also consider ‘Wait Events’ as well as ‘Top 5 timed Events’ – this summary
can sometimes present no obvious clues as to what is going on
although values such as ‘CPU Time’ will likely correspond to sql statements
and buffer gets access.
Check the other SQL related details also : ‘SQL ordered by CPU’ should
then be checked to see if there is a correlation in information presented.
Hint : for slow Apply side activity check for high values associated
with ‘SQL ordered by Gets’ on streams related tables.
- ‘Streams Statistics’ – this can identify high CPU/IO activity with a
particular Streams process. This will likely cross reference the SQL
ordered by Gets section and indicate which process is encountering the problem;
- ‘Streams Capture’ / ‘Streams Apply’ statistics show activity at an LCR
level which should also be cross referenced with the Streams Healthcheck.
Hint : In addition to buffer gets associated with
Application level tables , access by Streams to underlying tables
on disk tables can and will affect performance. e.g.
- High buffer gets associated with access to : aq$<queue_table>_p indicates
that there is queue spill and that messages are being written to disk as
they cannot go into memory.
- High buffer gets associated with access to :
sys.streams$_apply_spill_messages / streams$_apply_spill_msgs_part indicate
Apply spill. Here messages have been aged out to disk as a result of :
- Apply server processes being slow ;
- large transactions being processed on the source which would otherwise
consume / exhaust the Streams pool
- High buffer gets to system.logminer_restart_ckpt$ may indicate that
checkpoints are not configured correctly as per Streams prerequisites.
Other issues may be possible.
Check SQL connected with high level of buffer gets
Where there is a high ‘buffer gets’ statistic in relation
to the number of executions , this may suggest a Full Table Scan (FTS) issue.
FTS will show as wait event : ‘db file scattered read’.
This could be either a configuation issue as discussed under prerequisites
or due to the Oracle’s optimizer dynamic statistic sampling setting.
Hint : in order to understand what the issue may be , generate
the explain plan for each of the queries at the top of the
‘SQL ordered by Gets’ list where ‘Gets per Exec’ is high.
This is likely to be more useful on the Apply side.
connect / as sysdba
set pagesize 1000
spool plans.lst
- select default html report type;
- enter snapshot begin/end ids;
- importantly , enter the sql id as shown in the AWR report
spool off
Generate ASH output for the time period
The Streams Healthcheck in 9.2 does not summarise the Active Session History
thus it may be useful to collect this under circumstances :
- when 9.2 is being used;
- when ASH details are collected over a short period of time
The Streams healthcheck shows active session history information for the
last 30 minutes where information is available.
connect / as sysdba
Oracle Streams Performance Advisor (>= 11.1)
In versions >= 11.1, the Oracle Streams performance advisor package
gives further insight into a potential performance issue.
You do not have to do anything extra to gather this information as the
Streams Healthcheck for 11.1 and above outputs specific details related to
the performance advisor under headings beginning : ’Streams Topology’.
Generate process stack information
When to gather this information :
- High latching activity observed which could be associated with Streams
processes;
- High CPU with no corresponding SQL buffer gets activity; this might
suggest some other PGA or SGA related structure references which are
inefficient;
When collecting stack information, aim to collect a sample large enough
such that a profile of the execution of the process can be gleaned.
Using this information Oracle Support Services can collate the details
to understand where a process is typically executing.
This may be useful in identifying the cause of difficult to understand
performance related issues.
Stack collection can be achieved with the scripts provided as follows :
Note: Oradebug will work on all platforms. pstack (Solaris, Linux) could
also be used to achieve the same. In which case, replace :
- ./stack.sh $1 -> pstack $1
create the following 2 scripts :
stack.sh
#!/bin/ksh
sqlplus -s '/ as sysdba' << EOF
spool stacks.out append
oradebug setospid $1
oradebug unlimit
oradebug SHORT_STACK
EOF
genstacks.sh
#!/bin/ksh
count=$2
sc=0
while [ $sc -lt $count ]
do
sc=`expr $sc + 1`
./stack.sh $1
sleep 1
done
With the OS process ids or threads consuming CPU (refer above)
or using higher than usual CPU.
For each process , with spid (OS process/thread id) X, do following :
chmod +x stack.sh
chmod +x genstacks.sh
script X.out
./genstacks.sh X 600
This will generate 600 stacks from the process : 1 stack per second.
On Unix platforms, the command script can be used to record
For Windows platforms, cygwin can be downloaded from http://www.cygwin.com
to allow Unix shell compatibility.
Investigate memory usage by Streams Processes
If the queries used above to identify Streams processes or OS
tools indicate high memory usage then consider generating a
heapdump of the process.
It may be that most of the memory allocated within the process is
linked to a particular allocation chunk name which suggests a
particular area of functionality.
A performance issue may be connected with the traversal of a large
memory structure or a leak which will eventually cause OS memory
and swap issues.
Hint : this section is really only applicable where :
- there is poor performance, and
- it is observed that a process is using a large amount of memory
The Spid above identifies the related process.
connect / as sysdba
oradebug setospid <Spid>
oradebug unlimit
#top PGA/top UGA/Current call/User call
# 29 + 268435456 = 268435485
oradebug dump heapdump 268435485
oradebug tracefile_name
Note on some platforms, the tracefile name may not be correctly returned.
Optional 10046 / level 12 (sql trace)
Sql trace can be collected from the process to understand more.
This may generate a large trace file therefore it is optional.
First of all try and isolate the processing using high CPU.
connect / as sysdba
oradebug setospid X (X is the OS process id/thread)
oradebug unlimit
oradebug event 10046 trace name context forever, level 12
wait 5 minutes
oradebug event 10046 trace name context off
Gather and upload details collected
Collect and upload to Oracle Support Services details from the above.
Please also try and summarise any any key findings or conclusions.
Upload , where possible :
- strmmon output from source Capture and destination Apply;
- Streams Healthcheck from source and destination;
- evidence of High CPU at OS level;
- AWR output;
- any explain plans collected;
- ASH output for the time period;
- process stack information;
- trace file associated with 10046/level 12 or heapdump
(if relevant).
Please also upload alert.log files from Capture database and
Apply database.
© 2009 – 2010, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.
相关文章 | Related posts:
- Oracle Streams Performance Advisor
- Streams Combined Capture and Apply in 11g
- Troubleshooting Queries in Streams
- Troubleshooting Streams Capture when status is Paused For Flow Control
- Split and Merge of a Streams Destination
- Streams Performance Recommendations
- Scripts for Oracle Streams Performance Tuning Best Practice: Oracle 10g Release 10.2
- How To Set The _TXN_BUFFER_SIZE Parameter For a Streams Apply Process




最新评论