RAC Performance Experts Reveal All

Author: Maclean Liu, posted on September 12th, 2009, English Version
[Unless marked as a repost, articles on this site are original work or original translations by this site]
When reposting, please credit the source: Oracle Clinic – Maclean Liu's personal technical blog [http://www.oracledatabase12g.com/]
Permanent link: http://www.oracledatabase12g.com/archives/rac-performance-experts-reveal-all.html

From the point of view of process architecture, one or more block server processes, called LMS, handle the bulk of the message traffic. The LMS processes are Oracle background processes. When a shadow process makes a request for data, it sends a message directly to an LMS process on another node, which in turn returns either the data or a grant (permission to read the data block from disk or to write to it) directly to the requester.

The state objects used for globally cached data are maintained in the SGA and are accessed by all processes in an instance which need to maintain and manipulate global data consistently.

The Global Cache Service (GCS) manages data cached in the buffer caches of all instances which are part of the database cluster. In conjunction with an IPC transport layer, it initiates and handles:

• memory transfers for write access (CURRENT) or read access (CR)
• transfers for all block types (e.g. data, index, undo, headers)
• globally managed access permissions to cached data

The GCS can determine if and where a data block is cached and forwards data requests to the appropriate instances. It minimizes the access time to data, as the response time on a private network is faster than a read from disk. The message protocol scales and will at most involve 3 hops in a cluster comprised of more than 2 nodes.
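The 3-hop upper bound can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Oracle's implementation: the function name `message_hops` and the roles `master` (the node that tracks where a block is cached) and `holder` (the node that currently caches it) are hypothetical names introduced for this example.

```python
from itertools import product
from typing import Optional


def message_hops(requester: str, master: str, holder: Optional[str]) -> int:
    """Count interconnect messages needed to satisfy one block request.

    Toy model (assumption): `master` tracks where the block is cached,
    `holder` is the node caching it, or None if no node caches it.
    """
    if holder == requester:
        return 0  # block is already in the local buffer cache
    hops = 0
    if master != requester:
        hops += 1  # requester -> master
    if holder is None:
        # No node caches the block: a remote master answers with a grant
        # to read from disk; a local master needs no message at all.
        if master != requester:
            hops += 1  # master -> requester (grant)
        return hops
    if holder != master:
        hops += 1  # master -> holder (forwarded request)
    hops += 1  # holder -> requester (block transfer)
    return hops


# The worst case stays at 3 messages no matter how many nodes there are.
nodes = ["n1", "n2", "n3", "n4", "n5"]
worst = max(
    message_hops(r, m, h) for r, m, h in product(nodes, nodes, nodes + [None])
)
print(worst)
```

In this model the worst case arises only when requester, master and holder are three different nodes, which is why it needs a cluster of more than 2 nodes.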

Cache Fusion and the GCS constitute the infrastructure which allows the scale-out of a database tier by adding commodity servers.

 

In its simplest case, a data request involves a message to the instance where the data block is cached. The request message is usually small, approx. 200 bytes in size. The requesting shadow process initiates the send and then waits until the response arrives. The message is sent to an LMS process on a remote instance. The LMS process receives the message, executes a handler to process it, and eventually sends either the data block or a grant message.
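The request/response exchange described above can be sketched as a toy handler. Everything here is hypothetical and for illustration only: `BlockRequest`, `lms_handle` and the dictionary standing in for a buffer cache are assumptions of this sketch and say nothing about how LMS is really implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class BlockRequest:
    """Small request message sent by a shadow process (hypothetical shape)."""
    block_id: int
    mode: str  # "CR" for read access, "CURRENT" for write access


def lms_handle(
    buffer_cache: Dict[int, bytes], req: BlockRequest
) -> Tuple[str, Optional[bytes]]:
    """Reply with the cached block, or with a grant if it is not cached."""
    block = buffer_cache.get(req.block_id)
    if block is not None:
        return ("block", block)  # ship the data block to the requester
    return ("grant", None)  # requester may read the block from disk


# Usage: one cached 8K block, one block that is not cached anywhere.
cache = {42: b"\x00" * 8192}
hit = lms_handle(cache, BlockRequest(block_id=42, mode="CR"))
miss = lms_handle(cache, BlockRequest(block_id=7, mode="CR"))
print(hit[0], len(hit[1]), miss)
```

The requester blocks between sending the request and receiving the reply, so any time the handler spends waiting to run is added directly to the roundtrip time.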

The minimum roundtrip time involving an 8K data block is about 400 microseconds. It is obvious that the pure wire time consumes only an insignificant portion of the total time. It should also be clear that the key factors for performance are the time it takes to send, receive and process the data, which makes the responsiveness of LMS under load a critical factor.

 

The factors which influence block access cost, and therefore performance, are listed here, ordered from lowest to highest cost in terms of CPU cycles and delays:

The pure message propagation delay or wire time constitutes only about 6% of the roundtrip time and is usually smaller than the CPU time used for sending, receiving and processing the messages.

The processing time or cycles in the OS stack typically consume about 52% of the time, which includes process scheduling and wakeup.

The block server process load and scheduling have the biggest impact on the latencies and may inflate them. If an LMS is busy (or “congested”, i.e. it cannot pick up incoming requests fast enough), queueing time for requests increases. Similarly, when the CPUs on a node are busy and run queues are long, the LMS spends more time delayed in a queue. If its priority is inverted, then it can be starved.

A saturation of the physical bandwidth and/or the buffering resources in NICs and switches can have a severe performance impact. Overflowing buffers, faulty links and dropped frames require retries, which can be expensive.
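To get a feel for how a busy LMS inflates latencies, here is a rough illustration using a textbook M/M/1 queue. The choice of M/M/1 is an assumption of this sketch (the article does not name a queueing model), and the 100 microsecond service time is a made-up figure, not a measurement.

```python
def mean_response_us(service_us: float, utilization: float) -> float:
    """Mean time in an M/M/1 system: service time / (1 - utilization).

    Assumption: one server with random arrivals and service times,
    used here only as a stand-in for a single LMS process.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_us / (1.0 - utilization)


# Hypothetical service time of 100 microseconds per request.
for busy in (0.10, 0.50, 0.90, 0.95):
    print(f"LMS {busy:.0%} busy -> {mean_response_us(100.0, busy):.0f} us")
```

Under this model the response time doubles at 50% utilization and grows roughly tenfold at 90%, which matches the qualitative point that queueing, not wire time, dominates under load.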

The important things to bear in mind when comparing interconnect latencies are that they are defined as roundtrip times and include the processing times, CPU cycles for send and receive, and propagation delays.
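As a worked example, the approximate shares quoted above (about 6% wire time, about 52% OS stack) can be applied to the roughly 400 microsecond roundtrip for an 8K block. Combining these figures is an assumption of this sketch, since the article does not state that the percentages were measured on that same roundtrip; `roundtrip_breakdown` is a hypothetical helper.

```python
from typing import Dict


def roundtrip_breakdown(
    roundtrip_us: float, wire_pct: float = 6, os_stack_pct: float = 52
) -> Dict[str, float]:
    """Split a roundtrip time into wire, OS stack and remaining time."""
    wire = roundtrip_us * wire_pct / 100
    os_stack = roundtrip_us * os_stack_pct / 100
    # Whatever is left over: message processing and any queueing delay.
    remainder = roundtrip_us - wire - os_stack
    return {"wire_us": wire, "os_stack_us": os_stack, "remainder_us": remainder}


parts = roundtrip_breakdown(400)
print(parts)
```

With these inputs the wire accounts for about 24 microseconds, the OS stack for about 208, and about 168 remain for everything else, which is why a faster network alone buys little.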

The CPU cost varies with message size and is smaller for 2K blocks than for 8K blocks.

The average latency reported in Statspack or AWR is based on roundtrip, end-to-end latencies measured at the buffer cache layer. The individual latencies can have a “narrower” or a “wider” dispersion. Larger variation and a tendency for higher delays can be caused by the load on a node, particularly affecting the block server processes, or by interconnect bandwidth saturation.

The lower bound for a block transfer of 4KB is about 300 microseconds. What one sees in Statspack and AWR reports are usually averages, which are based on a particular distribution. Normally, latencies at the 90th percentile fall in the < 500 microsecond bucket, but this can be significantly impacted by the system load.
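The difference between an average and a percentile can be shown with a handful of latency samples. The sample values below are invented for illustration and are not taken from any Statspack or AWR report; the nearest-rank percentile method is likewise just one common definition chosen for this sketch.

```python
import math
from typing import List


def percentile_nearest_rank(samples: List[float], pct: int) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical roundtrip latencies in microseconds, with one slow outlier.
latencies_us = [320, 340, 350, 360, 380, 400, 420, 450, 480, 5000]

average = sum(latencies_us) / len(latencies_us)
p90 = percentile_nearest_rank(latencies_us, 90)
print(f"average = {average:.0f} us, 90th percentile = {p90} us")
```

Here a single slow transfer pushes the average to 850 microseconds even though 90% of the samples stay under 500, so a high reported average does not by itself show that typical transfers are slow.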

© 2009 – 2011, www.oracledatabase12g.com. All rights reserved. This article may be reposted, but only with a link to the source address; otherwise legal action may be taken.
