Checklist for Performance Problems with Parallel Execution

作者: Maclean Liu , post on April 24th, 2009 , English Version
【本站文章除注明转载外,均为本站原创编译】
转载请注明:文章转载自: Oracle Clinic – Maclean Liu的个人技术博客 [http://www.oracledatabase12g.com/]
本文标题: Checklist for Performance Problems with Parallel Execution
本文永久地址: http://www.oracledatabase12g.com/archives/checklist-for-performance-problems-with-parallel-execution.html

PURPOSE
——-
This note gives an overview of tuning parallel operations at
session and database level.

SCOPE & APPLICATION
——————-
DBA’s and Support analysts.

Checklist for parallel execution
———————————

Tune a parallel operation at session level.
——————————————–

To tune a statements that run in parallel, you have to understand
the basics of parallel execution.

We split SQL statement in smaller operations. Each of this operations
can done in parallel.

As example

select c.unit_price
from costs c, products p
where c.prod_id = p.prod_id;

We can scan the tables (costs,products) in parallel and we can
do the join in parallel. This means we have 3 smaller operations
that run in parallel.

We have slaves that are grouped in slave sets. 1 slave set ( called
Producer) only works at a certain time and the other slave set
( called Consumer ) waits to get data from the first slave set.
After it got all data it starts to work, it is now the Producer
slave set.
The slaves are controlled by the user background process (called
query coordinator QC ). In RAC environment the slaves for one
query maybe be are spawned on different Nodes.

We can use different methods to send data from one slave set
to another. we can send all rows from one slave to all other slaves
in the other slave set. This we call a ‘BROADCAST’ distribution.
Another example is a join on a partitioned table. Here every slave
in the slave set that does the join works only on range of partitions.
Then you see a ‘PART’ distribution method.

A slave set can do 1 or more operation in parallel. As example
a slave set that scans the table COSTS can also do the join.

Here is the execution plan of the example.

-----------------------------------------------------------------
|Id| Operation               |Name    |  TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------
| 0| SELECT STATEMENT        |        |     |      |            |
| 1|  PX COORDINATOR         |        |     |      |            |
| 2|   PX SEND QC (RANDOM)   |:TQ10001|Q1,01| P->S | QC (RAND)  |
| 3|    HASH JOIN            |        |Q1,01| PCWP |            |
| 4|     PX RECEIVE          |        |Q1,01| PCWP |            |
| 5|      PX SEND BROADCAST  |:TQ10000|Q1,00| P->P | BROADCAST  |
| 6|       PX BLOCK ITERATOR |        |Q1,00| PCWC |            |
| 7|        TABLE ACCESS FULL|PRODUCTS|Q1,00| PCWP |            |
| 8|     PX BLOCK ITERATOR   |        |Q1,01| PCWC |            |
| 9|      TABLE ACCESS FULL  |COSTS   |Q1,01| PCWP |            |
-----------------------------------------------------------------

We scan both tables COSTS,PRODUCTS in parallel. The slave set Q1,01
that scans table COSTS does also the hash join. The PRODUCTS table
is scaned by a other slave set Q1,00 and send via a BROADCAST distribution
method to the other slave set.

Some operations are running in serial faster than in parallel. Please verify
if you query is faster with parallel execution.
It can happen that the degree of parallelism was changed on tables or indexes
implicit by some operations. As example a import of data or a
“ALTER INDEX REBUILD PARALLEL” can change the degree of parallelism.

The basic information about parallel execution you can find
in the Data Warehousing Guide.

See also
Note.184417.1 Where you find Information about Parallel Execution
in the Oracle Documentation

In 9.x you see the distribution method in the “PX distributed” column
when you run the utlxplp script to format the explain plan output.
From 10g onwards you will see this in the “Operation” column, so it will
also appear in the output of the utlxpls.sql script.
See Note
Note.235530.1 Recommended Method for Obtaining a Formatted Explain Plan
Note.236026.1 Display Execution Plans with dbms_xplan.display

1.) Check the execution plan.

a.) Check if the query is well tuned.

b.) Are steps ( e.q join, read the tables ,..) running on the same slave set?
(Hint: You can see this in the IN-OUT column. If you have often a
PARALLEL_COMBINED_WITH_PARENT PCWP you are on the same slave set.)
This is good for the performance because we do not have to sent data from
one slave set to the other slave set.

E.g We want to join a 2 tables. For the scan of the 2 table we need 2
slave sets. It can happened that the slave set that scanned one table
can also do the join operation. We do not have to send the data of the
2 tables to a new slave set. You will see this in the IN-OUT column of
formatted execution plan output. See also the example above.

Features like full partition wise joins and partial partitions wise joins
can help to achieve that we have PCWP.

c.) Check ff some parts of the operation are running in serial
(e.g full table scan,index fast full scan )

(Hint: You can see this in the IN-OUT column. If you see a P->S or a
S->P you do some operations in serial.)

To scan small tables in serial is maybe faster than to do it in parallel.
Please check if the objects have a suitable degree setup.

2.) Check if any data are skewed when we send them between slave sets.

Run the following query after the parallel SQL statement is finished.

select DFO_NUMBER, tq_id, SERVER_TYPE, NUM_ROWS, PROCESS
from V$PQ_TQSTAT
order by 1,2, 3;

Please have in mind that you can query V$PQ_TQSTAT after the query
is finished not during the run of the query.

An example.

The same SQL statement as before. We force only a HASH distribution.
That all three steps ( scan both tables and the join) done on
different slave sets.
The prod_id columns has only 2 values in the costs tables.

   PROD_ID   COUNT(*)
---------- ----------
       148        720
       147      81392

select  /*+ pq_distribute(c hash,hash) parallel(p,4) parallel(c,4)  */
         c.unit_price
   from  costs c, products p
   where c.prod_id = p.prod_id;

------------------------------------------------------------------
|Id| Operation               | Name    |  TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------
| 0| SELECT STATEMENT        |         |     |      |            |
| 1|  PX COORDINATOR         |         |     |      |            |
| 2|   PX SEND QC (RANDOM)   |:TQ10002 |Q1,02| P->S | QC (RAND)  |
| 3|    HASH JOIN            |         |Q1,02| PCWP |            |
| 4|     PX RECEIVE          |         |Q1,02| PCWP |            |
| 5|      PX SEND HASH       |:TQ10000 |Q1,00| P->P | HASH       |
| 6|       PX BLOCK ITERATOR |         |Q1,00| PCWC |            |
| 7|        TABLE ACCESS FULL|PRODUCTS |Q1,00| PCWP |            |
| 8|     PX RECEIVE          |         |Q1,02| PCWP |            |
| 9|      PX SEND HASH       |:TQ10001 |Q1,01| P->P | HASH       |
|10|       PX BLOCK ITERATOR |         |Q1,01| PCWC |            |
|11|        TABLE ACCESS FULL|COSTS    |Q1,01| PCWP |            |
------------------------------------------------------------------

 select  DFO_NUMBER, TQ_ID, SERVER_TYPE,  NUM_ROWS ,BYTES
      from v$pq_tqstat
     order by dfo_number , tq_id , server_type;

DFO_NUMBER      TQ_ID SERVER_TYP   NUM_ROWS      BYTES
---------- ---------- ---------- ---------- ----------
............
         1          2 Producer        81392 P002
         1          2 Producer            0 P003
         1          2 Producer          720 P001
         1          2 Producer            0 P000

You see only 2 slaves have work to do.

Tune a parallel operation at database level.
———————————————

It is not so easy to tune parallel execution on database level. You can
tune some parameters like

– tune PARALLEL_MAX_SERVERS
– tune PARALLEL_EXECUTION_MESSAGE_SIZE
– set PARALLEL_ADAPTIVE_MULTI_USER

on a RAC System you can limit the number of instances with
– PARALLEL_INSTANCE_GROUP/INSTANCE_GROUPS

It is important that the CPU utilization is monitored. If the CPU utilization
is always high then you can try reducing the DOP or PARALLEL_MAX_SERVERS.
If your CPU utilization is high you get no benefit from parallel execution.
In same cases the usage of parallel execution under a high CPU utilization
can hurt the performance.

On Platform where OSWatcher is available it should be install and started
to monitor the system. For more information please have a look into
Note 301137.1 OS Watcher User Guide
This can help to check if there was a problem with high CPU utilization.

A good start value for parallel_max_servers is 10*(#CPU’s in the system).

1.) Degree of parallelism (DOP)
Tuning the DOP is a balance between query performance and overall
database performance. If the query runs fine standalone it maybe
has a problem during the day when a lot of queries running in parallel
and we have not enough slaves that the query get the request degree.
You can be check via v$px_session for a single query or via
v$sysstat you can monitor how many parallel operations are downgraded.

The conclusion is that with a high DOP on a table/index only a few
queries get the requested DOP, this depends also of the setting of
PARALLEL_MAX_SERVERS. You can use Resource manager to limit
the degree for a group of users. As example your OTLP users get a low
DOP and your power users ( batch jobs ,..) get higher degrees.
The parameter PARALLEL_ADAPTIVE_MULTI_USER can also help.

A high DOP and a high value for PARALLEL_MAX_SERVERS can hurt your system
and cause a 100% CPU utilization.

Consider the use of different number for the DOP on your tables.
On large tables and their indexes use high degree like #CPU.
For smaller tables use DOP (#CPU)/2 as start value.

a.) Check if you get the request degree for your query in v$px_session

select distinct qcsid, degree , req_degree from v$px_session;

QCSID DEGREE REQ_DEGREE
———- ———- ———-
141 3 4

As you can see, we wanted a degree of 4 and we got a degree of 3.

b.) You can check in statspack if some of your queries have a
reduced the DOP

See columns
DFO trees parallelized
………..
Parallel operations downgraded 1
Parallel operations downgraded 25
Parallel operations downgraded 50
Parallel operations downgraded 75
Parallel operations downgraded to
Parallel operations not downgrade

or use
select name , value
from v$sysstat
where name like ‘%arallel%’;

If a lot Parallel operations downgraded, please consider to
reduce the DOP.

2.) Settings of parallel execution parameters

We already discussed PARALLEL_MAX_SERVERS and PARALLEL_ADAPTIVE_MULTI_USER.
PARALLEL_EXECUTION_MESSAGE_SIZE is the size of the buffer that we need
to exchange data/messages between the slaves or between the slaves
and the QC. High values can improve the overall performance, but we maybe
need much more memory.
See Note:238680.1 Parallel Execution the Large/Shared Pool and ORA-4031

You can check the setting with the commands below.

a.) Use the following statements to check the setting of the important init.ora parameters.

select substr(name,1,30) name , substr(value,1,15) value
from v$parameter
where name like ‘%arallel%’;

b.) With the command
select * from V$PX_PROCESS_SYSSTAT;

we could see the high water mark of parallel execution server.

3.) Checkpoint tuning

Parallel queries read large objects with a direct path read. Before we can do
the read we have to do a tablespace checkpoint to write the dirty buffers
back to the datafiles. So a checkpoint tuning and help to improve the
performance of parallel operations.

See also
Note.237328.1 Parallel Execution and Direct Reads

Parallel Execution with RAC systems
————————————

PX is also automatically controlled on RAC systems.

Parallel execution does not allocate slaves randomly across
the available instances, but rather will start by allocating on the
least loaded instance. The goal is to both minimize inter-node traffic
and at the same time try to minimize any imbalance of work across
the instances.

The INSTANCE_GROUPS/PARALLEL_INSTANCE_GROUP parameter can be used to
restrict allocation of query slaves to specific instances in a RAC
configuration and over-ride the automatic allocation across instances.
This can improve the performance when there are problem with the
inter-connect.

Note: 181489.1 Tuning Inter-Instance Performance in RAC and OPS

© 2009, www.oracledatabase12g.com. 版权所有.文章允许转载,但必须以链接方式注明源地址,否则追究法律责任.

相关文章 | Related posts:

  1. Tracing Parallel Execution with _px_trace. Part I
  2. TROUBLESHOOTING: Possible Causes of Poor SQL Performance
  3. Checklist for Slow Performance of Export Data Pump (expdp) and Import DataPump (impdp)
  4. "hcheck.sql" script to check for known problems in Oracle8i, Oracle9i, Oracle10g and Oracle 11g
  5. Interpreting HANGANALYZE trace files to diagnose hanging and performance problems
  6. Performance Tuning Guide for Siebel CRM Application on Oracle
  7. SQL*Net PERFORMANCE TUNING UTILIZING UNDERLYING NETWORK PROTOCOL
  8. Yet Another Performance Profiling Method (Or YAPP-Method)
  9. FAQ's about Parallel/Noparallel Hints.
  10. THE GAINS AND PAINS OF NOLOGGING OPERATIONS

3 comments to Checklist for Performance Problems with Parallel Execution

  • admin

    Tuning Inter-Instance Performance in RAC and OPS

    Applies to:

    Oracle Server – Enterprise Edition – Version: 9.0.1.0 to 10.2.0.4 – Release: 9.0.1 to 10.2
    Generic UNIX
    Purpose

    This note was written to help DBAs and Support Analysts Understand Inter-Instance
    Performance and Tuning in RAC.
    Scope and Application

    Real Application Clusters uses the interconnect to transfer blocks and messages
    between instances. If inter-instance performance is bad, almost all database
    operations can be delayed. This note describes methods of identifying and
    resolving inter-instance performance issues.
    Tuning Inter-Instance Performance in RAC and OPS

    TUNING INTER-INSTANCE PERFORMANCE IN RAC

    @ ARCHIVED

    SYMPTOMS OF INTER-INSTANCE PERFORMANCE PROBLEMS
    ———————————————–

    The best way to monitor inter-instance performance is to take AWR or statspack
    snaps on each instance (at the same time) at regular intervals.

    If there are severe inter-instance performance issues or hung sessions, you
    may also want to run the racdiag.sql script from the following note
    to collect additional RAC specific data:

    Note: 135714.1
    Script to Collect RAC Diagnostic Information (racdiag.sql)

    The output of the script has tips for how to read the output.

    Within the AWR, statspack report, or racdiag.sql output, you can use the wait
    events and global cache statistics to monitor inter-instance performance. It
    will be important to look for symptoms of inter-instance performance issues.
    These symptoms include:

    1. The average cr or current block receive time will be high. This value is calculated
    by dividing the ‘global cache cr block receive time’ statistic by the
    ‘global cache cr blocks received’ statistic (in the case of cr blocks):

    global cache cr block receive time
    ———————————-
    global cache cr blocks received

    Multiply this calculation by 10 to find the average number of milliseconds. In a
    9.2 statspack report you can also use the following Global Cache Service Workload
    characteristics:

    Ave receive time for CR block (ms): 4.1

    The following query can also be run to monitor the average cr block receive time
    since the last startup:

    set numwidth 20
    column “AVG CR BLOCK RECEIVE TIME (ms)” format 9999999.9
    select b1.inst_id, b2.value “GCS CR BLOCKS RECEIVED”,
    b1.value “GCS CR BLOCK RECEIVE TIME”,
    ((b1.value / b2.value) * 10) “AVG CR BLOCK RECEIVE TIME (ms)”
    from gv$sysstat b1, gv$sysstat b2
    where b1.name = ‘global cache cr block receive time’ and
    b2.name = ‘global cache cr blocks received’ and b1.inst_id = b2.inst_id ;

    The average cr block receive time or current block receive time should typically be
    less than 15 milliseconds depending on your system configuration and volume, is the
    average latency of a consistent-read request round-trip from the requesting instance
    to the holding instance and back to the requesting instance.

    Please note that if you are on 9i and the global cache current block receive
    time is abnormally high and the average wait time for the ‘global cache null
    to x’ wait event is low (under 15ms) then you are likely hitting bug 2130923
    (statistics bug). This is a problem in the way statistics are reported and does
    not impact performance.

    More about that issue is documented in the following note:

    Note: 243593.1
    RAC: Ave Receive Time for Current Block is Abnormally High in Statspack

    2. “Global cache” or “gc” events will be the top wait event. Some of these wait
    events show the amount of time that an instance has requested a data block for a
    consistent read or current block via the global cache.

    When a consistent read buffer cannot be found in the local cache, an attempt is
    made to find a usable version in another instance. There are 3 possible outcomes,
    depending on whether any instance in the cluster has the requested data block
    cached or not:

    a) A cr block is received (i.e. another instance found or managed to produce the
    wanted version). The “global cache cr blocks received” statistic is incremented.
    b) No other instance has the block cached and therefor the requesting instance
    needs to read from disk, but a shared lock will be granted to the requestor
    The “global cache gets” statistic is incremented
    c) A current block is received (the current block is good enough for
    the query ). The ” global cache current blocks received” statistic is
    incremented.

    In all three cases, the requesting process may wait for global cache cr request.
    The view X$KCLCRST (CR Statistics) may be helpful in debugging ‘global cache cr
    request’ wait issues. It will return the number of requests that were handled for
    data or undo header blocks, the number of requests resulting in the shipment of a
    block (cr or current), and the number of times a read from disk status is returned.

    It should be noted that having ‘global cache’ or ‘gc’ waits does not always
    indicate an inter-instance performance issue. Many times this wait is
    completely normal if data is read and modified concurrently on multiple
    instances. Global cache statistics should also be examined to determine if
    there is an inter-instance performance problem.

    3. The GCS/GES may run out of tickets. When viewing the racdiag.sql output
    Note: 135714.1 or querying the gv$ges_traffic_controller or
    gv$dlm_traffic_controller views, you may find that the TCKT_AVAIL shows ’0′. To
    find out the available network buffer space we introduce the concepts of tickets.
    The maximum number of tickets available is a function of the network send buffer
    size. In the case of lmd and lmon, they always buffer their messages in case of
    ticket unavailability. A node relies on messages to come back from the remote
    node to release tickets for reuse.

    4. The above information should be enough to identify an inter-instance performance
    problem but additional calculations can be made to monitor inter-instance
    performance can be found in the documentation.

    IDENTIFYING AND RESOLVING INTER-INSTANCE PERFORMANCE PROBLEMS
    ————————————————————-

    Inter-Instance performance issues can be caused by:

    1. Under configured network settings at the OS. Check UDP, or other network protocol
    settings and tune them. See your OS specific documentation for instructions on how
    to do this. If using UDP, make sure the parameters relating to send buffer space,
    receive buffer space, send highwater, and receive highwater are set well above the
    OS default. The alert.log will indicate what protocol is being used. Example:

    cluster interconnect IPC version:Oracle RDG
    IPC Vendor 1 proto 2 Version 1.0

    Changing network parameters to optimal values:

    Sun (UDP Protcol)
    UDP related OS parameters can be queried with the following command:
    ndd /dev/udp udp_xmit_hiwat
    ndd /dev/udp udp_recv_hiwat
    ndd /dev/udp udp_max_buf
    Set the udp_xmit_hiwat and udp_recv_hiwat to the OS maximum with:
    ndd -set /dev/udp udp_xmit_hiwat
    ndd -set /dev/udp udp_recv_hiwat
    ndd -set /dev/udp udp_max_buf <1M or higher>
    IBM AIX (UDP Protocol)
    UDP related OS parameters can be queried with the following command:
    no -a
    Set the udp_sendspace and udp_recvspace to the OS maximum with:
    no -o
    Linux (edit files)
    /proc/sys/net/core/rmem_default
    /proc/sys/net/core/rmem_max
    /proc/sys/net/core/wmem_default
    /proc/sys/net/core/wmem_max
    HP-UX (HMP Protocol):
    The file /opt/clic/lib/skgxp/skclic.conf contains the Hyper Messaging Protocol (HMP)
    configuration parameters that are relevant for Oracle:
    - CLIC_ATTR_APPL_MAX_PROCS Maximum number of Oracle processes. This includes
    the background and shadow processes. It does not
    include non-IPC processes like SQL client processes.
    - CLIC_ATTR_APPL_MAX_NQS This is a derivative of the first parameter; it will
    be removed in the next release. For the time being, this should be set to
    the value of CLIC_ATTR_APPL_MAX_PROCS.
    - CLIC_ATTR_APPL_MAX_MEM_EPTS Maximum number of Buffer descriptors. Oracle
    seems to require about 1500-5000 of them depending on the block size (8K or
    2K). You can choose the maximum value indicated above.
    - CLIC_ATTR_APPL_MAX_RECV_EPTS Maximum number of Oracle Ports. Typically,
    Oracle requires as many ports as there are processes. Thus it should be
    identical to CLIC_ATTR_APPL_MAX_PROCS.
    - CLIC_ATTR_APPL_DEFLT_PROC_SENDS Maximum number of outstanding sends. You
    can leave it at the default value of 1024.
    - CLIC_ATTR_APPL_DEFLT_NQ_RECVS Maximum number of outstanding receives on a
    port or buffer. You can leave it at the default value of 1024.
    HP-UX (UDP Protcol):
    Not tunable before HP-UX 11i Version 1.6
    For HP-UX 11i Version 1.6 or later be able to use below command to set socket_udp_rcvbuf_default & socket_udp_sndbuf_default
    ndd -set /dev/udp socket_udp_rcvbuf_default 1048576
    echo $?
    ndd -set /dev/udp socket_udp_sndbuf_default 1048576
    echo $?
    HP Tru64 (RDG Protocol):
    RDG related OS parameters are queried with the following command:
    /sbin/sysconfig -q rdg
    The most important parameters and settings are:
    - rdg_max_auto_msg_wires – MUST be set to zero.
    - max_objs – Should be set to at least < # of Oracle processes * 5> and up to
    the larger of 10240 or < # of Oracle processes * 70>. Example: 5120
    - msg_size – Needs to set to at least , but we recommend
    setting it to 32768, since Oracle9i supports different block sizes for each
    tablespace.
    - max_async_req – Should be set to at least 100 but 256+ may provide better
    performance.
    - max_sessions – Should be set to at least < # of Oracle processes + 20>,
    example: 500
    HP TRU64 (UDP Protocol):
    UDP related OS parameters are queried with the following command:
    /sbin/sysconfig -q udp
    udp_recvspace
    udp_sendspace

    2. If the interconnect is slow, busy, or faulty, you can look for dropped packets,
    retransmits, or cyclic redundancy check errors (CRC). You can use netstat commands
    to check the networks. On Unix, check the man page for netstat for a list of options.
    Also check the OS logs for any errors and make sure that inter-instance traffic is
    not routed through a public network.

    With most network protcols, you can use ‘oradebug ipc’ to see which interconnects
    the database is using:

    SQL> oradebug setmypid
    SQL> oradebug ipc

    This will dump a trace file to the user_dump_dest. The output will look something
    like this:

    SSKGXPT 0x1a2932c flags SSKGXPT_READPENDING info for network 0
    socket no 10 IP 172.16.193.1 UDP 43749
    sflags SSKGXPT_WRITESSKGXPT_UP info for network 1
    socket no 0 IP 0.0.0.0 UDP 0…

    So you can see that we are using IP 172.16.193.1 with a UDP protocol.

    3. A large number of processes in the run queue waiting for CPU or scheduling
    delays. If your CPU has limited idle time and your system typically processes
    long-running queries, then latency may be higher. Ensure that LMSx processes get
    enough CPU.

    4. Latency can be influenced by a high value for the DB_FILE_MULTIBLOCK_READ_COUNT
    parameter. This is because a requesting process can issue more than one request
    for a block depending on the setting of this parameter.

    ADDITIONAL RAC AND OPS PERFORMANCE TIPS
    —————————————

    1. Poor SQL or bad optimization paths can cause additional block gets via the
    interconnect just as it would via I/O.

    2. Tuning normal single instance wait events and statistics is also very
    important.

    3. A poor gc_files_to_locks or db_files_to_locks setting can cause problems. In
    almost all cases in RAC, gc_files_to_locks does not need to set at all.

    4. The use of locally managed tablespaces (instead of dictionary managed) with
    the ‘SEGMENT SPACE MANAGEMENT AUTO’ option can reduce dictionary and freelist
    block contention. Symptoms of this could include ‘buffer busy’ waits.

  • admin

    Network Protocols and Real Application Clusters
    PURPOSE
    ——-

    Purpose of this document is to give DBAs and Systems Administrators interested
    in Real Application Clusters an overview and a comparison of network protocols
    available.

    SCOPE & APPLICATION
    ——————-

    DBAs and IT professionals may use this note for their physical cluster layout
    and the options they have available on their platform.

    NETWORK PROTOCOLS AND REAL APPLICATION CLUSTERS
    ———————————————–

    With Real Application Clusters, there are many different interconnect protocols
    available to use with the high speed interconnect.

    On Unix platforms, Oracle typically recommends Infiniband (RDS) where it is certified. See
    Note: 751343.1 for more information on the RDS protocol. Here are the hardware and RAC protocol
    alternatives for each platform (not including RDS):

    Operating System Clusterware Network Hardware RAC Protocol
    —————- ———– —————- ——–
    HP OpenVMS HP OpenVMS Gigabit Ethernet TCP (UDP for 10gR1 and above)
    HP Tru64 HP TruCluster Memory Channel RDG
    HP Tru64 HP TruCluster Memory Channel UDP
    HP Tru64 HP TruCluster Gigabit Ethernet RDG
    HP Tru64 HP TruCluster Gigabit Ethernet UDP
    HP-UX Oracle Clusterware Hyperfabric UDP
    HP-UX Oracle Clusterware Gigabit Ethernet UDP
    HP-UX HP ServiceGuard Hyperfabric UDP
    HP-UX HP ServiceGuard Gigabit Ethernet UDP
    HP-UX Veritas Cluster Gigabit Ethernet LLT
    HP-UX Veritas Cluster Gigabit Ethernet UDP
    IBM AIX Oracle Clusterware Gigabit Ethernet (FDDI) UDP
    IBM AIX HACMP Gigabit Ethernet (FDDI) UDP
    Linux Oracle Clusterware Gigabit Ethernet UDP
    Microsoft Windows Oracle Clusterware Gigabit Ethernet TCP
    Sun Solaris Oracle Clusterware Gigabit Ethernet UDP
    Sun Solaris Fujitsu Primecluster Gigabit Ethernet ICF
    Sun Solaris Sun Cluster SCI Interconnect RSM
    Sun Solaris Sun Cluster Firelink interconnect RSM
    Sun Solaris Sun Cluster Gigabit Ethernet UDP
    Sun Solaris Veritas Cluster Gigabit Ethernet LLT
    Sun Solaris Veritas Cluster Gigabit Ethernet UDP

    Oracle Corporation primarily tests and supports the RDS and UDP network libraries (and
    TCP for Windows). For other proprietary protocols (RDG, ICF, and LLT), the
    cluster vendor writes and supports the network libraries.

    Oracle Corporation has done extensive testing on the Oracle provided RDS and UDP
    libraries (and TCP for Windows). Based on this testing and extensive experience
    with production customer deployments, at this time Oracle Support strongly recommends
    the use of RDS or UDP (or TCP on Windows) for RAC environments.

    RE-LINKING TO CHANGE PROTCOLS
    —————————–

    To switch to the udp protocol, shut down the instances and re-link on each node
    with:

    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk ipc_g ioracle

    9i command:

    cd $ORACLE_HOME/rdbms/lib
    make -f ins_rdbms.mk ipc_udp ioracle

    To confirm that the UDP protcol is being used, run the ‘oradebug ipc’ command
    as described in Note 181489.1 . Note 181489.1 also contains some tuning
    recommendations for various protocols.

    BEST PRACTICES FOR UDP
    ———————-

    - Have at least a gigabit ethernet for optimal performance
    - Do not use crossover cables (use a switch)
    - Increase the udp buffer sizes to the OS maximum
    - Turn on udp checksumming

  • How to Get a 10046 trace for a Parallel Query
    Applies to:
    Oracle Server – Enterprise Edition – Version: 8.1.5.0 and later [Release: 8.1.5 and later ]
    Information in this document applies to any platform.
    Goal

    The Notes gives instruction how to gather 10046 traces for parallel queries.
    For use by Developers and everyone who is interested in tuning.
    It also expain where you find this trace files.

    I assume that the reader is familar with Parallel execution and the Terminology.
    Here is a short overview, more information are in the documentation.

    Parallel execution coordinator/query corridantor (QC):
    The user background process that controls the query.
    Parallel execution servers/slave (QS);
    Do the work and pass results back to the QC
    Parallel execution server set:
    Collection of QS that execute one operation

    Solution
    In Note 376442.1; Master Note: Recommended Method for Obtaining 10046 trace for TuningWe explain how to get a 10046 trace for a serial query. Most of the instructions are valid for a parallel query, but there are some important differences.

    In a parallel query the query corridantor and the slaves execute the SQL command and every process traces the work that he does. That means as example if a slave waits for I/O and there are wait events then the wait events are only visible in the slave trace file and not in the QC trace file.

    If the 10046 trace event is executed before the parallel query is started then the events is automatically propagate to the slaves. This means the slaves also create trace files.

    In 10.2 and previous versions of the database the 10046 trace files will created in the locations of user_dump_dest for the user processes and background_dump_dest for the background processes. Slave are background processes and the traces can be found in the location of background_dump_dest; To show the locaction of the traces files, the following command can be used:
    show parameter dump_dest

    In 11.1 and newer version of the database the traces are in one folder.
    You can use the adrci tool to get an overview over the last creates trace files.You can use the following command:

    adrci> show tracefile -t

    Session Tracing
    This tracing can be used where the session is accessible to the user prior to the start of the statement(s) to be traced.
    To gather 10046 trace at the session level:

    alter session set tracefile_identifier=’10046′;

    alter session set timed_statistics = true;
    alter session set statistics_level=all;
    alter session set max_dump_file_size = unlimited;

    alter session set events ’10046 trace name context forever,level 12′;

    – Execute the queries or operations to be traced here –
    alter session set events ’10046 trace name context off’;
    select * from dual;
    exit;

    In 10.2 and previous version the tracefile_identifier and the procedures of the package dbms_application_info will not change the slave trace names nor it puts information in the trace file. As example:

    exec dbms_application_info.set_action (‘PX Trace’)
    alter session set tracefile_identifier=’10046′;

    Then you will not see a line like the following in the slave traces

    *** ACTION NAME:(PX Trace)

    Where I find the slave traces:

    In some case it might be not obviously what slaves are involved in parallel query and on what node(s) in a RAC system the traces files are created.
    In this situation the following tracing can help

    alter session set tracefile_identifier=’10046′;

    alter session set timed_statistics = true;
    alter session set statistics_level=all;
    alter session set “_px_trace” = low , messaging;
    alter session set max_dump_file_size = unlimited;

    alter session set events ’10046 trace name context forever,level 12′;

    – Execute the queries or operations to be traced here –
    alter session set events ’10046 trace name context off’;
    alter session set “_px_trace” = none;
    select * from dual;
    exit

    In this case the QC trace file contains lines like:

    GROUP GET
    Acquired 4 slaves in 1 set q serial 3073
    P000 inst 1 spid 22332
    P001 inst 1 spid 22334
    P003 inst 1 spid 22336
    P004 inst 1 spid 22336

    This shows what slaves on what instance are involved in the parallel query and in this example we have to look
    on instance 1 in the trace folder for filenames that contain the words ..p001..trc,..p002..trc,..p003..trc and ..p004..trc.

    Tracing a process after it has started

    If you start tracing after the parallel query started with one of the following commands:

    SQL> exec DBMS_MONITOR.SESSION_TRACE_ENABLE(..)
    SQL> oradebug event 10046 trace name context forever, level 12
    SQL> exec DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(..)
    SQL> exec DBMS_SYSTEM.SET_EV(..)

    then you have to start tracing for all slaves and the QC, If you only started to trace the QC there are no traces created by the slaves.

    The following topic from Note 376442.1 can be used in the same way for parallel eexcution.
    Instance wide tracing
    Initialisation parameter setting
    Via a Logon Trigger

    Tracing in 11g and higher versions
    In 11 g there is a new tracing interface. This allows us to trace only some SQL command or processes.
    or do the tracing in the same way we did it in previous versions.

    alter session set tracefile_identifier=’10046′;

    alter session set timed_statistics = true;
    alter session set statistics_level=all;
    alter session set max_dump_file_size = unlimited;

    alter session set events ‘sql_trace level 12′;

    – Execute the queries or operations to be traced here –
    alter session set events ‘sql_trace off’;
    select * from dual;
    exit;

    We can also limit the trace to certain SQL’s. Let us assume we would like to trace 2 SQL’s with the slq_id
    g3yc1js3g2689 and 7ujay4u33g337.Then we can use the following command:

    alter session set tracefile_identifier=’10046′;

    alter session set timed_statistics = true;
    alter session set statistics_level=all;
    alter session set max_dump_file_size = unlimited;

    alter session set events ‘sql_trace [sql:g3yc1js3g2689|7ujay4u33g337] level 12′;

    – Execute the queries or operations to be traced here –
    alter session set events ‘sql_trace off’;

    select * from dual;
    exit;

Leave a Reply

  

  

  

You can use these HTML tags

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>