Managing Performance and Availability for 25,000 Siebel Contact Center Users with Oracle Real Application Clusters

Author: Maclean Liu, posted on November 9th, 2010, English Version
[Unless marked as reposted, all articles on this site are original or compiled by this site.]
When reposting, please credit: reposted from Oracle Clinic – Maclean Liu's personal technical blog [http://www.oracledatabase12g.com/]
Title: Managing Performance and Availability for 25,000 Siebel Contact Center Users with Oracle Real Application Clusters
Permanent link: http://www.oracledatabase12g.com/archives/managing-performance-availability-for-25000-siebel-contact-center-users-with-oracle-real-application-cluster.html

Agenda

  • Introductions
  • Project Overview & Business Drivers
    • Scope and Usage of the system
  • Technical Footprint – Handling 25,000 Users
    • Architecture & Infrastructure
    • Session Management
    • RAC Performance
    • System Monitoring
    • Reporting
  • System Interfaces
  • Lessons Learned
  • Future plans

Throughout the whole country, we support all our call centers plus our stores (T-Shops) as well as our partners. They are all connected to the central CRM system, which itself is connected to various legacy systems.

The data centers are 150 km away from each other. They are located in Bielefeld and Krefeld (unconfirmed).
There are 26 Legacy Systems directly connected to the CRM-T System
The SOA T-ESB handles several million messages every day.

26 legacy systems
Facts of the OH TESB: up to 11 TB (6 TB Siebel data)
Up to 10,000 orders per hour
20 call centers within Germany
7,000 T-Shop users
20,000 concurrent users on the system

So where is this massive amount of workload coming from?

Our RAC infrastructure does not consolidate applications.
It is purely for scalability and availability.
Heavy I/O rates are handled by Solid State Disks, where several critical, frequently changing tables are stored.

Naming of FC Switches.
Streams: one RAC node sends changes into Stage, with synchronization in under 1 minute (Juergen/Thysen: regarding apply per day).

Managing Sessions on the RAC
Session Handling
In Siebel we see up to 20,000 concurrent users plus hundreds of sessions from other components like EAI, batch, …
This huge number of sessions heavily affected the stability of our RAC cluster because of memory shortages (each session used up to 15 MB).
Two possible solutions were considered:
Oracle Multi-Threaded Server (shared server), which pools the users inside the database. We did not have much experience with it.
Siebel connection pooling, which pools several Siebel users into a single database connection. We had a lot of experience with it from our old, migrated system.
We chose Siebel connection pooling because it was the easier solution to implement: we only had to change the component parameters to a ratio of 1 to 5, so one database connection handles five Siebel user sessions. For other components we still use dedicated sessions because, unlike user sessions, they have no think time.
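The effect of the 1:5 pooling ratio on session memory can be estimated with simple arithmetic. The sketch below is an upper-bound estimate that assumes every session reaches the 15 MB maximum quoted above and that a pooled connection needs no more memory than a dedicated one (both are assumptions). It covers only the Siebel user sessions, not the EAI or batch components.

```python
# Upper-bound estimate of database session memory for Siebel user sessions.
# Assumptions: every session uses the 15 MB maximum quoted in the text, and a
# pooled connection needs no more memory than a dedicated one.
USERS = 20_000
MB_PER_SESSION = 15
USERS_PER_CONNECTION = 5  # the 1:5 Siebel connection pooling ratio


def session_memory_gb(users: int, mb_per_session: float, users_per_connection: int = 1) -> float:
    """Memory in GB for the database sessions needed to serve `users`."""
    connections = -(-users // users_per_connection)  # ceiling division
    return connections * mb_per_session / 1024


dedicated = session_memory_gb(USERS, MB_PER_SESSION)
pooled = session_memory_gb(USERS, MB_PER_SESSION, USERS_PER_CONNECTION)
print(f"dedicated: {dedicated:.1f} GB, pooled 1:{USERS_PER_CONNECTION}: {pooled:.1f} GB")
```

With these figures, dedicated sessions come to about 293 GB and pooled connections to about 59 GB, which illustrates why pooling relieved the memory pressure. A pooled connection may in practice need somewhat more memory than a single-user session, so the real saving is likely smaller.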

Session Management:
All sessions are connected via a single Database Service, where we define the Nodes, the failover strategy as well as the session distribution over the nodes.
A single service has several disadvantages, which is why we have planned (but not yet implemented) to use multiple services in order to:
Separate the different components from each other so that we can manage them independently.
This will allow us to prioritize the different components: for example, batch jobs will get a higher priority during the night, while call center users will get the highest priority from 7 am to 5 pm.
We will also be able to split the user sessions over different RAC nodes, so that we can separate call center agents from everything else. They will then have dedicated nodes (CPU, memory, …), and only in case of a failure will other nodes be used, where the resources are shared with other users or components.
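The planned split of nodes between call center agents and other components can be pictured with a small model. The sketch below uses a simplified, hypothetical placement rule (not Oracle Clusterware's exact algorithm): a service runs on its surviving preferred nodes and, after a failure, is topped up from its available nodes, so call center agents keep the same number of nodes but start sharing one with other components. The service and node names are invented for illustration; in Oracle RAC the preferred and available instances are defined per service, for example with `srvctl add service`.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical model of a database service with preferred and available nodes."""
    name: str
    preferred: list[str]
    available: list[str] = field(default_factory=list)

    def placement(self, up: set[str]) -> list[str]:
        """Return the nodes the service runs on, given the set of nodes that are up.

        Simplified rule (an assumption): run on every surviving preferred node,
        then top up from the available list until the original number of
        preferred nodes is reached again.
        """
        running = [n for n in self.preferred if n in up]
        for node in self.available:
            if len(running) >= len(self.preferred):
                break
            if node in up and node not in running:
                running.append(node)
        return running


# Hypothetical layout: call center agents own two nodes, everything else the other two.
CALLCENTER = Service("CALLCENTER", preferred=["rac1", "rac2"], available=["rac3", "rac4"])
OTHERS = Service("EAI_BATCH", preferred=["rac3", "rac4"], available=["rac1", "rac2"])

all_up = {"rac1", "rac2", "rac3", "rac4"}
print(CALLCENTER.placement(all_up))             # ['rac1', 'rac2']
print(CALLCENTER.placement(all_up - {"rac2"}))  # ['rac1', 'rac3']
```

After rac2 fails, the call center service moves its second instance to rac3, where it shares resources with the EAI/batch service, matching the failure behavior described above.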

Resource Profiles:
Finally, we currently use profiles to implement resource limits.
During the rollout we faced several bad SQL statements that affected the overall performance of the system.
Because Siebel handles the loss of a session by reconnecting and sending the statement again, we were not able to stop long-running SQL statements effectively.
To solve this problem we used Oracle resource profiles to limit the CPU time per call, so long-running queries stopped being a problem.
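A sketch of the kind of profile described above, written as a Python helper that only builds the SQL text. The profile name, user name and 60-second limit are hypothetical, since the text does not give the values used. `CPU_PER_CALL` is expressed in hundredths of a second, and kernel limits in profiles are enforced only when the `RESOURCE_LIMIT` initialization parameter is TRUE (its default was FALSE in the releases current at the time). When a call-level limit is exceeded, Oracle stops and rolls back the current statement and returns an error while the session stays connected, which is why this works even though Siebel reconnects lost sessions.

```python
def cpu_per_call_limit(seconds: float) -> int:
    """CPU_PER_CALL is specified in hundredths of a second."""
    return round(seconds * 100)


def profile_statements(profile: str, max_cpu_seconds: float, users: list[str]) -> list[str]:
    """Build the SQL for a CPU-per-call limit (profile and user names are hypothetical)."""
    statements = [
        # Kernel resource limits in profiles are only enforced when RESOURCE_LIMIT is TRUE.
        "ALTER SYSTEM SET RESOURCE_LIMIT = TRUE",
        f"CREATE PROFILE {profile} LIMIT CPU_PER_CALL {cpu_per_call_limit(max_cpu_seconds)}",
    ]
    # Assign the profile to each database user whose calls should be limited.
    statements += [f"ALTER USER {user} PROFILE {profile}" for user in users]
    return statements


for sql in profile_statements("SIEBEL_LIMITED", 60, ["SIEBEL_APP"]):
    print(sql + ";")
```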

One big task from the beginning was sizing the RAC cluster.
This was hard to do because:
Not many RAC environments are used purely for scalability
Not many RAC environments handle that many OLTP users
Not many RAC environments of this size exist for a Siebel Enterprise
The internal Siebel architecture was not defined in detail.

So there were not many comparable examples.
The tests were done on a system with 1/20 of the real database size.

Now after the rollout we are much smarter.

Current SGA Target in Production is 50 GB (System Global Area – Data Cache)
Current PGA Target in Production is 32 GB (Program Global Area – sort and other work areas)
Current Memory utilization is about 60% (240 GB of memory)

Memory was one of the key factors, and not only for performance.
Insufficient memory led to RAC instability due to AIX memory paging: the nodes became unreachable for the cluster software, which resulted in node evictions.

Although we have not really tested the load on a single server, in the future we would plan for 10% to 20% more physical memory for a RAC environment.
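The memory figures above can be turned into a simple planning calculation. This is a sketch under two assumptions: that the 240 GB is the installed memory to which the 60% utilization refers, and that the combined SGA and PGA targets serve as the single-instance baseline to which the 10% to 20% RAC surcharge is applied. Real sizing also needs room for the operating system, the cluster software and per-process memory.

```python
def rac_memory_range_gb(single_instance_gb: float,
                        low: float = 0.10, high: float = 0.20) -> tuple[float, float]:
    """Return (low, high) memory estimates including the 10-20% RAC surcharge."""
    return single_instance_gb * (1 + low), single_instance_gb * (1 + high)


def used_memory_gb(installed_gb: float, utilization: float) -> float:
    """Memory in use for a given installed size and utilization fraction."""
    return installed_gb * utilization


# Figures from the text; using SGA + PGA targets as the single-instance baseline
# is an assumption made only for this illustration.
sga_gb, pga_gb = 50, 32
low, high = rac_memory_range_gb(sga_gb + pga_gb)
print(f"SGA+PGA targets: {sga_gb + pga_gb} GB -> RAC estimate {low:.1f}-{high:.1f} GB")
print(f"Used at 60% of 240 GB: {used_memory_gb(240, 0.60):.0f} GB")
```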

—————————————————–
V1009: changed the order of More and Less nodes.
How many nodes should we have in the RAC environment?

Oracle RAC lets infrastructure architects choose between 2 and 64/128 nodes, which is one more degree of freedom that requires a decision.
As with memory, the number of nodes had to be estimated during the sizing exercises.
The number of CPUs needed for the expected load, combined with the chosen hardware, determines the minimum number of servers.
RAC can certainly be extended with new nodes. However, performance does not always improve with more nodes, because more nodes can increase interconnect traffic. For example, database triggers may fire on all nodes, so the data blocks they change are shipped between the nodes across the interconnect.

More nodes provide better availability: if one node fails, a smaller share of the resources is lost. At the same time, the mean time between failures for the cluster as a whole becomes shorter, because there are more nodes that can fail.
However, more nodes also increase the management overhead and require more work on workload distribution (which service should run where).

So in our case, 4 nodes were chosen based on load and hardware restrictions; everything else came from lessons learned.
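The trade-off between more and fewer nodes can be quantified roughly. The sketch assumes identical nodes that fail independently with exponentially distributed failure times, so their failure rates add up and the cluster sees a node failure N times as often as a single node does; the per-node MTBF is a hypothetical number. For the chosen 4 nodes, one failure leaves 75% of the capacity.

```python
def surviving_capacity(nodes: int, failed: int = 1) -> float:
    """Fraction of cluster capacity left after `failed` identical nodes go down."""
    return (nodes - failed) / nodes


def cluster_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """Mean time until *some* node fails, assuming independent nodes with
    exponentially distributed failures (failure rates add up)."""
    return node_mtbf_hours / nodes


NODE_MTBF = 50_000  # hypothetical per-node MTBF in hours, for illustration only
for n in (2, 4, 8):
    print(f"{n} nodes: {surviving_capacity(n):.0%} capacity after one failure, "
          f"a node failure every ~{cluster_mtbf_hours(NODE_MTBF, n):,.0f} h")
```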

Interconnect
The interconnect is the heart of the cluster. We now understand that it needs a dedicated infrastructure, because we found that even sharing the switches was a possible cause of node evictions.
Even though the bandwidth is more than sufficient and has not been a problem so far, latency remains a possible concern.

V1009: Failover interfaces / channel bundling should be considered in order to reach the necessary availability and throughput.

V1009: In the beginning we considered a stretch cluster, but a workshop with Oracle clarified our limitations. These limitations are hard facts for the current hardware, because latency becomes the bottleneck: the recommended distance between cluster nodes should not exceed 20 km. As our data centers are much farther apart, we did not build a stretch cluster and address disaster scenarios via backups instead.
V1010 Disaster Recovery: Although SOX conformity will be a goal for the future, we will not be able to use a stretch cluster to fulfill the requirements, because the nodes would have to be too far apart. Therefore other methods would be necessary, such as disk mirroring (which we currently use) or Data Guard with a standby database.
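The distance limit follows from signal propagation. Light in optical fibre travels at roughly two thirds of the speed of light in vacuum, about 5 microseconds per kilometre, so every operation that must cross between sites (such as a Cache Fusion block transfer) pays at least the round-trip delay computed below. This is a lower bound that ignores switches, protocol overhead and fibre routes being longer than the straight-line distance.

```python
# Light in optical fibre travels at roughly 2/3 of c, i.e. about 200,000 km/s,
# which is about 5 microseconds per km in each direction.
MICROSECONDS_PER_KM = 5


def fibre_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds, ignoring switches,
    protocol overhead and the extra length of real fibre routes."""
    return 2 * distance_km * MICROSECONDS_PER_KM / 1000


for km in (20, 150):
    print(f"{km:>4} km: >= {fibre_rtt_ms(km):.1f} ms round trip")
```

At 150 km the propagation delay alone is 1.5 ms per round trip, 7.5 times the 0.2 ms at the recommended 20 km.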

V1009: Consider rolling upgrades, so that the downtime for installing patches can be minimized. It is also important to have a strategy for this up front.
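A rolling patch strategy takes one node out at a time while the others keep serving users. The sketch below only enumerates such a plan for four nodes (node names hypothetical); it assumes the patch is certified for rolling installation, which is not true of every patch, and that services are relocated off each node before it is stopped.

```python
def rolling_patch_plan(nodes: list[str]) -> list[tuple[str, list[str]]]:
    """One step per node: (node being patched, nodes still serving users).

    Only one node is taken down at a time, so the database stays open
    throughout; the price is reduced capacity during each step.
    """
    plan = []
    for node in nodes:
        serving = [n for n in nodes if n != node]
        plan.append((node, serving))
    return plan


for patched, serving in rolling_patch_plan(["rac1", "rac2", "rac3", "rac4"]):
    print(f"patch {patched}: services relocated to {', '.join(serving)}")
```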

V1009: Consider using partitioning. It certainly helps to tune SQL statements and reduce the amount of data to be handled, but it also helps to keep data blocks on as few nodes as possible, especially blocks that are changed frequently from several nodes (for example by database triggers).
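The node-affinity benefit of partitioning can be illustrated by routing work by partition. In the sketch below, the hash partitioning on a hypothetical ACCOUNT_ID key, the partition count and the routing rules are all assumptions for illustration. It compares how many nodes end up modifying each partition when requests are routed by partition versus plain round-robin load balancing; fewer writers per partition means fewer data blocks shipped across the interconnect.

```python
NODES = ["rac1", "rac2", "rac3", "rac4"]
PARTITIONS = 16  # hypothetical number of hash partitions


def partition_of(account_id: int) -> int:
    """Stand-in for hash partitioning on a hypothetical ACCOUNT_ID column."""
    return account_id % PARTITIONS


def node_by_partition(request_no: int, account_id: int) -> str:
    """Affinity routing: every request for a partition goes to the same node."""
    return NODES[partition_of(account_id) % len(NODES)]


def node_round_robin(request_no: int, account_id: int) -> str:
    """Plain load balancing: requests are spread over nodes regardless of data."""
    return NODES[request_no % len(NODES)]


def writers_per_partition(route, requests: list[int]) -> dict[int, set[str]]:
    """Map each partition to the set of nodes that modified it."""
    writers: dict[int, set[str]] = {}
    for request_no, account_id in enumerate(requests):
        writers.setdefault(partition_of(account_id), set()).add(route(request_no, account_id))
    return writers


# 1,000 hypothetical update requests cycling over 50 accounts.
requests = [i % 50 for i in range(1_000)]
affinity = writers_per_partition(node_by_partition, requests)
spread = writers_per_partition(node_round_robin, requests)
print("max writers per partition, affinity:", max(len(s) for s in affinity.values()))
print("max writers per partition, round robin:", max(len(s) for s in spread.values()))
```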

V1009: Moved Database Service into this Slide as it fits better:
Session management: as discussed previously, this is an important feature.
Our future design contains more than 20 Services for the different Siebel components.
Use database services as early as possible. They allow the load balancer to spread out the load without different workloads competing for the same resources.
Services give the DBAs in particular the freedom to move workload around within the cluster, separate workloads on demand, and limit the CPU resources or the availability at session granularity.
For example, our call center agents are our most important users, so they should have the highest throughput and therefore get dedicated resources, while our batch components could get only limited resources during the day. Without services these goals are hard to achieve.
Services are also an aspect of availability: in case of a failure we can survive if the EAI components get only half of the database CPU power, but call center users should again have full resources.
So services allow us to manage both performance and availability.
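The day/night prioritization can be sketched as a time-dependent plan of CPU shares per workload. The group names and percentages are hypothetical; the text only states that call center users should come first from 7 am to 5 pm and that batch jobs should get more priority at night. In Oracle this would typically be implemented with Database Resource Manager, which can map sessions to consumer groups by service name, combined with Scheduler windows that switch the active resource plan.

```python
# Hypothetical CPU shares (percent) per workload group; illustrative numbers only.
DAY_PLAN = {"CALLCENTER": 70, "EAI": 20, "BATCH": 10}
NIGHT_PLAN = {"BATCH": 60, "EAI": 25, "CALLCENTER": 15}


def active_plan(hour: int) -> dict[str, int]:
    """Return the CPU shares for a given hour (0-23); the day window is 07:00-17:00."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return DAY_PLAN if 7 <= hour < 17 else NIGHT_PLAN


def top_priority(hour: int) -> str:
    """Workload group with the largest CPU share at the given hour."""
    plan = active_plan(hour)
    return max(plan, key=plan.get)


print(top_priority(10), top_priority(23))
```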
V1009: Watch out for architectural bottlenecks and consider the increased concurrency, for example database sequences that must be used concurrently by all nodes, as well as centralized tables (like the Siebel S_SSA_ID table, which Siebel uses to generate unique ROW_IDs, or other centrally used internal tables such as S_SRM_REQUEST and S_ESCL_REQ).
Reduce the use of such central points, tune them where that is sufficient, or move them to the fastest disks, as we did with Solid State Disks.
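Why a shared sequence becomes a bottleneck, and how caching relieves it, can be shown with a small simulation. In Oracle RAC, a sequence created with CACHE and NOORDER lets each instance reserve its own range of numbers, so the shared sequence is touched once per range rather than once per value; the values stay unique but are no longer in global order. The model below is a simplification of that idea, not Oracle's internal implementation, and it does not model Siebel's own S_SSA_ID mechanism.

```python
class CentralSequence:
    """Shared counter: each access stands for a contended, cluster-wide operation."""

    def __init__(self) -> None:
        self.next_value = 1
        self.fetches = 0

    def reserve(self, count: int) -> tuple[int, int]:
        """Hand out the half-open range [start, start + count)."""
        start = self.next_value
        self.next_value += count
        self.fetches += 1
        return start, start + count


class InstanceCache:
    """Per-node cache of sequence values, in the spirit of CACHE ... NOORDER."""

    def __init__(self, central: CentralSequence, cache_size: int) -> None:
        self.central = central
        self.cache_size = cache_size
        self.current = 0
        self.limit = 0

    def nextval(self) -> int:
        if self.current >= self.limit:  # cache exhausted: go to the shared counter
            self.current, self.limit = self.central.reserve(self.cache_size)
        value = self.current
        self.current += 1
        return value


def simulate(nodes: int, calls: int, cache_size: int) -> tuple[int, list[int]]:
    """Round-robin `calls` NEXTVAL requests over `nodes`; return (shared fetches, values)."""
    central = CentralSequence()
    caches = [InstanceCache(central, cache_size) for _ in range(nodes)]
    values = [caches[i % nodes].nextval() for i in range(calls)]
    return central.fetches, values


print(simulate(4, 10_000, 1)[0], simulate(4, 10_000, 1_000)[0])
```

With 10,000 requests spread over four nodes, the shared counter is accessed 10,000 times with a cache size of 1 but only 12 times with a cache size of 1,000.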

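© 2010, www.oracledatabase12g.com. All rights reserved. Reposting is permitted, but the source address must be credited with a link; otherwise legal liability will be pursued.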
