Agenda
- Services
- Creating and Using
- Service Measurement
- Performance Diagnosis
- Load Balancing
- Service High Availability with RAC
- Planned and unplanned outages
- Notifications for end to end HA
Customers require services that are uninterrupted and continuously available, similar to the dial tone when picking up the phone receiver.
Services allow applications to benefit from the reliability of the redundant parts of the cluster.
The services hide the complexity of the cluster from the client by providing a single system image for managing work.
A number of RDBMS features act in concert to support services. The Automatic Workload Repository (AWR) manages the performance of services. The AWR records service performance, including SQL execution times, wait classes and the resources consumed by each service. AWR alerts warn when service response time thresholds are exceeded. The dynamic views report current service status with one hour of history. The Database Resource Manager maps services (in place of users) to consumer groups, automatically managing the priority of one service relative to others. The RAC high availability features keep the services available, including when some or all components are unavailable. Data Guard Broker, in conjunction with RAC, migrates the primary service across Data Guard sites for disaster tolerance.
Application services
Used for application workloads, including job scheduler and parallel query; optionally qualified by db_domain.
All high availability and workload management features and services.
Different Kinds of Application Services
Application services are planned to describe applications, application functions and data ranges, as follows:
Functional services. Functional services are the most common mapping of workloads. Sessions using a particular function are grouped together. For Oracle Applications, the ERP, CRM and iSupport functions create a functional division of the work. For SAP, the dialog and update functions create a functional division of the work.
Data-dependent services. Data-dependent routing routes sessions to services based on data keys. The mapping of work requests to services occurs in the object relational mapping layer for application servers and TP monitors. Because the database is shared, these ranges can be completely dynamic, for example based on demand. This is a major differentiation over shared-nothing systems.
Pre-connect services. The preconnect service spans the set of instances that are available to support a service in the event of a failure. When a service is added using DBCA or SRVCTL, the preconnect service is created automatically and is then managed by the clusterware. This service name
When a service is associated with a function, this is termed function-dependent routing. When a service is associated with a data partition, this is termed data-dependent routing.
Internal Services
In addition to application services, the RDBMS also supports two internal services. SYS$BACKGROUND is used by the background processes only. SYS$USERS is the default service for user sessions that are not associated with application services.
SYS$USERS
Default for sessions using the database without application services – db_name.db_domain.
All workload management features. Cannot be stopped or disabled.
SYS$BACKGROUND
Oracle background processes only.
All workload management features. Cannot be stopped or disabled.
Each service has the following attributes:
globally unique name – identifies the service in the local cluster and globally for Data Guard.
quality of service thresholds – for response time and CPU consumption.
priority – relative to other services, defined in terms of either ratio of resource consumption or priority.
In a RAC environment, services have two additional attributes:
preferred configuration for High Availability – a description of how to distribute the services when the system first starts.
TAF policy for High Availability – none, basic, or preconnect; managed automatically using services.
The companion paper, Application Continuous Services by Example, provides a complete description of the commands for configuring and using services in single instance and RAC environments.
To configure services in single Oracle instance environments use the DBMS_SERVICE package. For backward compatibility, services are also created implicitly when the service_names parameter is set for the instance. To configure the service level thresholds and Database Resource Manager for services use Enterprise Manager or PL/SQL.
To configure the high availability features of services in Oracle 10g RAC environments, use the DBCA and NETCA or the Server Control (SRVCTL) utility. This definition process creates a number of HA resources that are managed by the clusterware to keep the services available. The DBCA and SRVCTL interfaces for services ask the administrator to enter the list of instances that are the preferred location for the service, plus any additional instances that are available to support the service in the event of an outage or planned operation. The definition of the preferred set of instances is a hint to the HA framework integrated with RAC 10g as to how to spread out the service when the system first starts. For example, on a three node cluster the preferred configuration may offer the PAYROLL service from instance one, and the ERP service from instances two and three. Some or all instances of the database may be configured to support the service. Once the service is created, Database Resource Manager can then be used to create consumer groups that control the priority for the service.
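The preferred and available lists described above can be pictured with a small model. This is only a sketch of the idea, not how the clusterware decides placement: the function name `place_service`, the instance names, and the rule of filling one gap with one spare are all assumptions made for illustration.

```python
def place_service(preferred, available, running):
    """Return the instances that should offer a service (illustrative model).

    preferred: instances where the service starts by default.
    available: spare instances used only when a preferred one is down.
    running:   set of instances that are currently up.
    """
    # Keep every preferred instance that is still running.
    placed = [inst for inst in preferred if inst in running]
    # Count how many preferred instances are missing.
    missing = len(preferred) - len(placed)
    # Fill each gap with a running spare, in the order the spares were listed.
    spares = [inst for inst in available if inst in running]
    return placed + spares[:missing]


# ERP prefers instances two and three; instance one is a hypothetical spare.
print(place_service(["inst2", "inst3"], ["inst1"], {"inst1", "inst2", "inst3"}))
print(place_service(["inst2", "inst3"], ["inst1"], {"inst1", "inst2"}))
```

With all three instances up, the service stays on its preferred instances. When `inst3` is lost, the spare `inst1` takes its place.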
The SQL*Net connect is created by DBCA.
Client-Side Usage
Middle-tier applications and client-server applications use a service by specifying the service as part of the connection in the TNS connect data. This may be in the tnsnames.ora file for thick Net drivers, in the URL specification for thin drivers, or may be maintained in the Oracle Internet Directory. For example, data sources for the application server are set to route to a service. Using Oracle Net Easy Connect in 10g, this connection needs only the service and network address, for example scott/tiger@//myVIP/myservice. For Oracle eBusiness Suite, the service is also maintained in the application database identifier and in the cookie for the ICX parameters.
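As a small illustration of the address form used in the example above, the following sketch assembles the //host/service part of the connect string. The helper name `easy_connect` and the optional port argument are assumptions for illustration; the host and service names come from the example in the text.

```python
def easy_connect(host, service, port=None):
    """Build an identifier of the form //host[:port]/service."""
    # Append the port only when one is given; otherwise the default is used.
    address = host if port is None else f"{host}:{port}"
    return f"//{address}/{service}"


print(easy_connect("myVIP", "myservice"))
print(easy_connect("myVIP", "myservice", 1521))
```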
Server-side Usage
Server-side work, such as the Job Scheduler, Parallel Query, and Oracle Streams Advanced Queuing, sets the service name as part of the workload definition.
For the Job Scheduler, the service that the job class uses is defined when the job class is created. During execution, jobs are assigned to job classes and job classes run within services. Using services with job classes ensures that the job scheduler work is identified for workload management and performance tuning. For high availability, DBMS_JOB previously used the instance name to define where jobs should execute. This approach resulted in jobs not running when the instance was unavailable. With the new job scheduler, setting the service in the job class ensures that the job executes when the service is running anywhere in the cluster or grid.
For parallel query and parallel DML, the query coordinator connects to a service just like any other client, and the parallel query slaves inherit the service for the duration of the execution. At the end of the query execution, the slaves revert to the default service. For Oracle Streams Advanced Queuing, streams queues are accessed by service, achieving location transparency.
Database Resource Manager
The Database Resource Manager manages the relative priority of services within an instance. In previous releases of Oracle, the Database Resource Manager identified work based on the USER. In Oracle 10g, the Database Resource Manager is enhanced to identify work using services.
The Database Resource Manager binds services directly to consumer groups. When a work request connects using a service, the consumer group is assigned transparently at connect time. This lets Resource Manager manage the work requests by service in the order of their importance. For example, an installation could define separate services for high priority online users and lower priority internal reporting applications. Alternatively, using importance alone, an installation could define gold, silver, and bronze services to prioritize the order in which work requests are serviced within the same application. Using ratios, the Database Resource Manager can provide two-thirds of resource to the payroll service and one-third of resource to the CRM service. Using priorities, the Database Resource Manager can satisfy the highest priority services first, followed by the next priority services, and so on.
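The two allocation styles mentioned above, ratios and priorities, can be contrasted with a short model. This is a sketch of the arithmetic only, not of Resource Manager internals; the function names, the demand figures, and the rule that services within one level are served in listed order are assumptions.

```python
from fractions import Fraction


def share_by_ratio(weights):
    """Split the resource among services in proportion to their weights."""
    total = sum(weights.values())
    return {svc: Fraction(w, total) for svc, w in weights.items()}


def serve_by_priority(demand, levels, capacity):
    """Satisfy demand level by level, highest priority level first.

    levels is an ordered list of lists of service names.
    """
    granted = {}
    for level in levels:
        for svc in level:
            # A service receives what it asks for, up to what is left.
            granted[svc] = min(demand[svc], capacity)
            capacity -= granted[svc]
    return granted


# Two-thirds to payroll and one-third to CRM, as in the text.
print(share_by_ratio({"payroll": 2, "crm": 1}))
# Gold is satisfied first, then silver, then bronze gets the remainder.
print(serve_by_priority({"gold": 50, "silver": 40, "bronze": 30},
                        [["gold"], ["silver"], ["bronze"]], 100))
```

Ratios guarantee every service a share even under contention, whereas strict priorities can leave the lowest level with only what remains.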
Use Case
Consider an application called iSupport with three service levels: Gold, Silver and Bronze. The application server will have three data sources for these services, each connected to application services at the database with the same name. In addition, there is an authentication service used when a client identifies their credentials. During authentication the support level is accessed transparently, and subsequent requests are routed to the data source with the matching service level. At the server side, each of these four services maps to a consumer group: Gold, Silver, Bronze, and authentication. The resource plan implements either the priorities or relative resource consumption for each group.
Job Scheduler
In Oracle 10g, Oracle implements a new job scheduler through the DBMS_SCHEDULER package.
DBMS_SCHEDULER provides far more functionality than the DBMS_JOB package, which was the previous Oracle Database job scheduler. Collectively, DBMS_SCHEDULER and all of its programs are referred to as the Scheduler. The Scheduler is able to take advantage of services and the benefits they offer in a clustered environment. The service that a specific job class uses is defined when the job class is created. During execution, jobs are assigned to job classes and job classes run within services. Using services with job classes ensures that the Scheduler work is identified for workload management and performance tuning. For instance, jobs inherit AWR alerts and performance thresholds for the service they run under. For high availability, the Scheduler offers service affinity instead of instance affinity. Jobs are not scheduled to run on any specific instance. They are scheduled to run under a service. So, if an instance dies, the job can still run on any other instance in the cluster that offers the service. The following example shows how jobs run within a service. A job class with the same name (AP) as the service it runs under is created. A job is created to run within that job class.
BEGIN
  -- Job class named after the service (AP) that its jobs run under.
  DBMS_SCHEDULER.CREATE_JOB_CLASS(JOB_CLASS_NAME => 'AP',
    RESOURCE_CONSUMER_GROUP => NULL, SERVICE => 'AP',
    LOGGING_LEVEL => DBMS_SCHEDULER.LOGGING_RUNS, LOG_HISTORY => 30,
    COMMENTS => 'Job class for AP jobs');
  -- Job in that class. It is created disabled because its four arguments
  -- must be set before the job can be enabled.
  -- The original interval of 5 carried no unit; every 5 days is assumed here.
  DBMS_SCHEDULER.CREATE_JOB(JOB_NAME => 'name_for_the_job', JOB_TYPE => 'STORED_PROCEDURE',
    JOB_ACTION => 'package_name.procedure', NUMBER_OF_ARGUMENTS => 4,
    START_DATE => SYSDATE + 1, REPEAT_INTERVAL => 'FREQ=DAILY;INTERVAL=5', END_DATE => SYSDATE + 30,
    JOB_CLASS => 'AP', ENABLED => FALSE, AUTO_DROP => FALSE,
    COMMENTS => 'Job to do some sort of AP related work.');
END;
/
For more information on the Scheduler, see Oracle Database Administrator’s Guide and PL/SQL Packages and Types Reference.
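The difference between instance affinity and service affinity can be shown with a toy model. This is a sketch under stated assumptions, not Scheduler code: the function name `instances_for_job`, the instance names, and the GL service are hypothetical, while AP is the service from the example above.

```python
def instances_for_job(job_service, services_by_instance):
    """Return the instances on which a job bound to job_service may run."""
    # A job follows its service, so any instance offering the service qualifies.
    return sorted(inst for inst, services in services_by_instance.items()
                  if job_service in services)


cluster = {"inst1": {"AP"}, "inst2": {"AP", "GL"}, "inst3": {"GL"}}
print(instances_for_job("AP", cluster))

# inst1 fails: the AP job can still run because inst2 offers the service.
del cluster["inst1"]
print(instances_for_job("AP", cluster))
```

A job pinned to `inst1` by name would have no place to run after the failure, which is the problem service affinity avoids.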
The service, MODULE, and ACTION names are visible in the V$SESSION, V$ACTIVE_SESSION_HISTORY, and V$SQL views. The call times and performance statistics are visible in V$SERVICE_STATS, V$SERVICE_EVENT, V$SERVICE_WAIT_CLASS, V$SERVICEMETRIC, and V$SERVICEMETRIC_HISTORY. When statistics collection for “important” transactions is enabled, the call speed is available for each service, MODULE, and ACTION name at each database instance in V$SERV_MOD_ACT_STATS.
In the information business, the movement of resources in the corporation, dynamically, to where they are needed is essential. Hands-free sharing of resources based on business rules, not on hardware or software limitations, is fundamental. Better utilization of resources through automated routing of workloads in response to load, availability and priority is the competitive advantage.
The workload measurement features are fully integrated with the Oracle RDBMS for single instance and Real Application Cluster environments.
Focus on Response Time
The approach to managing services is focused on response time:
R = C + W
· R = response time, named DB time
· C = CPU/service, named CPU time
· W = wait time, in v$service_wait_class and v$service_event
The basic idea is to look at all of the time that is spent in the service, both efficiency and delay, and to rate this against the goals set for the service and for the system.
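The relation R = C + W can be worked through with a few numbers. The sketch below is illustrative only: the function names and the sample figures are assumptions, and the wait class names are used simply as labels.

```python
def response_time(cpu_time, wait_times):
    """R = C + W, where W is the sum of the time spent in each wait class."""
    return cpu_time + sum(wait_times.values())


def breakdown(cpu_time, wait_times):
    """Return each component's fraction of the response time."""
    total = response_time(cpu_time, wait_times)
    if total == 0:
        return {}
    shares = {"CPU": cpu_time / total}
    for wait_class, seconds in wait_times.items():
        shares[wait_class] = seconds / total
    return shares


waits = {"User I/O": 0.5, "Cluster": 0.25}
print(response_time(0.25, waits))
print(breakdown(0.25, waits))
```

Here a quarter of the response time is CPU and three quarters is waiting, which points tuning effort at the waits rather than at CPU efficiency.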
Measuring Workloads by Service
The Automatic Workload Repository (AWR) automatically maintains performance statistics, including response times, resource consumption and wait events, for all services and for the work that is being done in the system. Selected metrics, statistics, wait events, and wait classes, plus SQL level traces, are maintained for each service, optionally augmented by MODULE and ACTION name. The statistics aggregation and tracing by service are new in their global scope for RAC and in their persistence across instance restarts and service relocation for both RAC and single instance Oracle.
By default, statistics are collected for the work attributed to every service. Each service can be further qualified by MODULE and ACTION name to identify the important transactions within the service. This removes the noise associated with measuring the time for all calls for the service. The service, MODULE and ACTION name provide a user-explicable unit for measuring elapsed time per call, and for setting response time and resource consumption thresholds. One way to think of demarcated transactions is as a litmus test for measuring the current service levels. Since they are normally business transactions, using demarcated transactions to report service quality is a superior technique to using synthetic queries.
MODULE and ACTION name are a way of tying a portion of the application code to the database work done on its behalf. The MODULE name is set to a user recognizable name for the program that is currently executing (script or form). The ACTION name is set to a specific action that a user is performing within a module ('reading mail' or 'entering a new customer'). Setting these tags using OCI in 10g results in no additional round trip to the database.
Use the DBMS_MONITOR package to control the gathering of statistics that quantify the performance of services, modules and actions. Also use DBMS_MONITOR for tracing services, modules, and actions.
Service Thresholds and Alerts
Service level thresholds permit the comparison of achieved service levels against the accepted minimum required level. This provides accountability with respect to delivery or failure to deliver an agreed service level. The end goal is a predictable system that achieves service levels and will continue to do so. There is no requirement to perform as fast as possible with minimum resource consumption; the requirement is to meet the quality of service.
An installation can specify explicitly two performance thresholds for each service – the response time for calls (SERVICE_ELAPSED_TIME) and the CPU time for calls (SERVICE_CPU_TIME). The response time goal indicates a desire for the elapsed time to be, at most, a certain value. The response time represents the wall clock time. It is a fundamental measure that reflects all delays and faults blocking the call from running on behalf of the user, and differences in node power across a RAC cluster.
The service time and CPU time are calculated as the moving average of the elapsed, server-side call time. The Automatic Workload Repository (AWR) monitors the service time (and also the CPU time) and publishes AWR alerts when the performance exceeds the thresholds. You can then respond to these alerts by changing the priority of a job, stopping overloaded processes, or relocating, expanding, shrinking, starting or stopping a service. This permits you to maintain service availability despite changes in demand.
Use Case
To check the thresholds for the payroll service, use the AWR report. Take the report over successive intervals in which the system is running well. For example, for an email server the AWR report is used for each Monday with peak times between 10am and 2pm. The AWR report contains the response time (“DB time”) and the CPU consumption (“CPU time”) for calls for each service, along with a breakdown of the work done and wait times contributing to this response time.
Using DBMS_SERVER_ALERT, set a warning threshold for the payroll service at 0.2 seconds and a critical threshold for the payroll service at 0.5 seconds. In 10g Release 1, these thresholds must be set at all instances. Actions can be scheduled using EM jobs for alerts, or taken programmatically when the alert is received.
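The way a moving average of call time is compared with the warning and critical thresholds can be sketched as follows. This is a simplified model, not the AWR alerting mechanism: the class name `ServiceThreshold`, the window size, and the alert labels are assumptions, while the 0.2 and 0.5 second values come from the use case above.

```python
from collections import deque


class ServiceThreshold:
    """Track a moving average of elapsed time per call for one service."""

    def __init__(self, warning, critical, window=5):
        self.warning = warning
        self.critical = critical
        # Only the most recent `window` samples contribute to the average.
        self.samples = deque(maxlen=window)

    def record(self, elapsed_per_call):
        """Add a sample and return the resulting alert level."""
        self.samples.append(elapsed_per_call)
        average = sum(self.samples) / len(self.samples)
        if average >= self.critical:
            return "critical"
        if average >= self.warning:
            return "warning"
        return "clear"


payroll = ServiceThreshold(warning=0.2, critical=0.5, window=2)
for sample in (0.1, 0.4, 0.8):
    print(sample, payroll.record(sample))
```

Averaging over a window keeps a single slow call from raising an alert while still reacting when the service degrades for a sustained period.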
Statistics aggregation across general dimensions – for example, services, modules and actions – is frequently more useful than the session-based aggregation that is supported today. Performance management by the service aggregation makes sense when monitoring by sessions may not. For example, in systems using connection pools or transaction processing monitors, the sessions are shared, making accountability difficult.
Service, module, and action tags identify operations at the server. (MODULE and ACTION name are set by the application for finer grain reporting.) The DBMS_MONITOR package enables aggregation and tracing at the service, module, and action levels to identify high load operations. The service, module, and action tags provide major and minor boundaries to discriminate the work and the processing flow. This new aggregation level facilitates tuning groups of SQL that run together (at service, module, and action levels).
The new aggregation level can be used to manage service quality, to assess resource consumption, to adjust priorities of services relative to other services, and to point to places where tuning is required. With Oracle RAC 10g (RAC), services can be provisioned on different instances based on their current performance.
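Aggregating by service, module, and action instead of by session can be pictured with a few lines of code. This is a conceptual sketch only; the function name `aggregate_calls`, the record layout, and the sample module and action names are hypothetical.

```python
from collections import defaultdict


def aggregate_calls(calls):
    """Group call timings by (service, module, action).

    calls: iterable of (service, module, action, elapsed_seconds) tuples,
    regardless of which pooled session executed each call.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for service, module, action, elapsed in calls:
        entry = totals[(service, module, action)]
        entry[0] += 1        # number of calls
        entry[1] += elapsed  # total elapsed time
    return {key: {"calls": count, "avg_elapsed": total / count}
            for key, (count, total) in totals.items()}


calls = [
    ("PAYROLL", "PAYSLIP", "PRINT", 0.25),
    ("PAYROLL", "PAYSLIP", "PRINT", 0.75),
    ("PAYROLL", "TAX", "CALC", 0.5),
]
for key, stats in aggregate_calls(calls).items():
    print(key, stats)
```

Because the key is the tag triple rather than the session, the figures stay meaningful when a connection pool shares sessions among many users.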
Workload Balancing
Managing work across multiple servers as a single unit increases the utilization of the available resources. Work requests can be distributed across instances offering a service according to the current service performance. Balancing of work requests occurs at two different times: at connect time and at runtime.
Connection Load Balancing
Reasonable applications connect once to the database server and stay connected. This is best practice for all applications – connection pools, client/server, and server side. Since connections are relatively static, the method for balancing connections across a service should not depend on metrics that vary widely during the lifetime of the connection. Three metrics are available for the listeners to use when selecting the best instance, as follows:
Session count by instance – For symmetric services and same capacity nodes, the absolute session count evenly distributes the sessions.
Run queue length of the node – For asymmetric services or different capacity nodes, the run queue length places more sessions on the node with least load – at the time of connection.
Goodness by service – For all services and any capacity nodes, the goodness of the service is a ranking of the quality of service that the service is experiencing at an instance. The ranking compares the service time to the threshold value for the service. It also considers states such as restricted access to an instance. Example goodness ratings are excellent, average, violating, and restricted. (For internal use; available on Grid.)
To avoid a listener routing all connections to the “excellent” instance between updates to the goodness values, each listener adjusts its local ratings by a delta as connections are distributed. The delta value used is the resource consumed by each connection using a service.
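The effect of the local delta adjustment can be seen in a small simulation. This is a sketch under stated assumptions rather than listener code: goodness is modelled as a number where lower is better, and the class name `Listener`, the ratings, and the delta values are all hypothetical.

```python
class Listener:
    """Model a listener that adjusts its local ratings between updates."""

    def __init__(self, goodness, delta):
        # goodness: instance -> rating (assumed here: lower is better).
        # delta: instance -> estimated cost of one more connection.
        self.goodness = dict(goodness)
        self.delta = dict(delta)

    def update(self, goodness):
        """Replace the local ratings with freshly published values."""
        self.goodness = dict(goodness)

    def connect(self):
        """Pick the best instance, then penalize it locally by its delta."""
        best = min(self.goodness, key=self.goodness.get)
        self.goodness[best] += self.delta[best]
        return best


listener = Listener({"inst1": 10, "inst2": 14}, {"inst1": 3, "inst2": 3})
print([listener.connect() for _ in range(4)])
```

Without the delta, all four connections would land on `inst1` until the next update; with it, the listener starts sharing connections with `inst2` once the local rating of `inst1` has risen past it.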
Runtime Load Balancing
Runtime load balancing is used when selecting connections from connection pools. For connection pools that serve services at one instance only, the first available connection in the pool is adequate. When connection pools serve services that span multiple instances, runtime load balancing needs a metric that is responsive to the current state of each service at each instance.
The Automatic Workload Repository measures the response time and CPU consumption (and service goodness) for each service. The V$SERVICEMETRIC and V$SERVICEMETRIC_HISTORY views contain the service time for every service, updated every 60 seconds with one hour of history. These views are available to applications to use for their own runtime load balancing. For example, a middle tier using connection pools can use the service metrics when routing the runtime requests to instances offering a service.
Using the service time for load balancing is a major improvement over earlier approaches that use round robin or run queue length. The run queue length metric does not consider sessions that are blocked in wait and unable to execute, for example in interconnect wait, IO wait, or application wait. Run queue length and session count metrics also do not consider session priority. Using service time for load balancing recognizes machine power differences, sessions that are blocked in wait, failures that block processing, as well as competing services of different importance. Using service time prevents sending work to nodes that are overworked, hung or failed.
Use Case
The response time (“DB time”) and “CPU time” are available for each service at each instance in V$SERVICEMETRIC_HISTORY. Just as the listener uses this data to deal connections across the service, the connection pool algorithm can use the data when selecting connections from the connection pool. This approach distributes the work proportionally across instances that are serving the service well, and avoids sending work to slow, hung, failed and restricted instances.
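One way a connection pool might turn per-instance service times into routing shares is sketched below. This is an assumption-laden illustration, not a documented algorithm: the function name `routing_weights`, the inverse-time weighting, and the use of the service threshold as an eligibility cut-off are choices made for this example.

```python
def routing_weights(service_time, threshold, restricted=()):
    """Return each eligible instance's share of runtime requests.

    service_time: instance -> recent service time in seconds.
    threshold:    instances slower than this receive no work.
    restricted:   instances excluded regardless of their timing.
    """
    eligible = {inst: t for inst, t in service_time.items()
                if inst not in restricted and 0 < t <= threshold}
    if not eligible:
        return {}
    # Faster instances receive proportionally more of the work.
    inverse = {inst: 1.0 / t for inst, t in eligible.items()}
    total = sum(inverse.values())
    return {inst: weight / total for inst, weight in inverse.items()}


times = {"inst1": 0.25, "inst2": 0.5, "inst3": 5.0}
print(routing_weights(times, threshold=1.0))
```

The instance answering in 0.25 seconds receives twice the share of the one answering in 0.5 seconds, and the instance at 5 seconds is skipped until its service time recovers.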
With Real Application Clusters, the high availability focus is on protecting the logically defined application services. This focus is more flexible and more cost effective than other high availability approaches in the market place that focus on the availability of single physical systems. The application services virtualize the cluster. Services are made available continuously with load shared across one or more instances in the RAC. Any server in the cluster can offer services in response to runtime demands, in response to failures, and in response to planned maintenance. Services are always available somewhere in the cluster.
Oracle RAC 10g introduces a portable, full stack clusterware, Cluster Ready Services (CRS), which has all the features that are provided with OS clusterware: node membership, group services, global locking and HA resources. CRS is tightly integrated with the Oracle RAC 10g database so that it is easier to configure and manage on every platform that Oracle supports. There is support for a large number of nodes and capacity can be added on demand.
No longer are there any inconsistencies in clusterware requirements, high availability service levels, time to recover, number of nodes supported and the myriad of other discrepancies that make clusters so hard to manage. There is no need to purchase additional software from other vendors to support the cluster and, since Oracle supplies the whole clusterware stack, there is only one vendor to call for support. With CRS, users can expect a fast, consistent result to high availability issues, such as server node failures and interconnect failures, with no dependence on any platform specific clusterware. For example, instead of waiting for the vendor’s node management services to determine that a node has failed or disconnected, CRS reports the failure back to the applications within 15 seconds.
Another feature of Oracle RAC 10g is the high availability framework (RAC HA) that is used to deliver the continuous services for the Grid. For high availability, CRS defines the application resource profiles for the application services with their dependencies. The RAC HA framework monitors the Oracle RAC 10g database and the services through a set of common APIs and sends events to CRS to recover and balance services according to the configuration. The RAC HA framework also sends events to applications to notify them of changes in service status. The application can then take immediate actions, such as redistribution of connection pools, when a failure occurs, when a system is restarted, and also when additional capacity is available.
Composite Resources
resource representing a class of resources
exist whenever a member resource is running.
applications depend on the composite resource rather than on the member resources.
HA events (up/down/not restarting) occur for the composite resource.
monitors are local to the members, the corrective action may arbitrate across the class.
examples – database and service groups
Oracle RAC 10g Sharing the Workload
Rather than hardware being idle, it is more cost effective to have all instances share in providing the services. The management functions to distribute the work across the instances that provide service are necessarily more complex, and this complexity is fully concealed in the RAC HA framework. When all nodes support services, there is a potential for the service time to degrade when service providers have failed or are disabled. To overcome this problem, a set of M Oracle RAC 10g instances is defined as the minimum set to support the services. The hardware redundancy is established at N Oracle RAC 10g instances, where N is greater than or equal to M, and the preferred topology for the services spreads the services across all N instances. There are many permutations for workload sharing, as follows.
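The relationship between M and N can be made concrete with simple arithmetic. The sketch below is illustrative only; the function names and the sample sizes are assumptions.

```python
def tolerable_failures(n_instances, m_minimum):
    """Number of instances that can be lost while M instances remain."""
    if n_instances < m_minimum:
        raise ValueError("N must be greater than or equal to M")
    return n_instances - m_minimum


def meets_service_levels(running, m_minimum):
    """True while at least the minimum set of M instances is running."""
    return running >= m_minimum


# Four instances provisioned, three needed to carry the workload.
print(tolerable_failures(4, 3))
print(meets_service_levels(running=3, m_minimum=3))
print(meets_service_levels(running=2, m_minimum=3))
```

Sizing N above M is what lets all instances share the work in normal operation without the service time degrading after a single failure.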
Oracle RAC 10g with Symmetric Workloads
Oracle RAC 10g databases on two or more nodes, with services uniformly spanning each database on the cluster:
This configuration for Oracle RAC 10g consists of two or more nodes. The configuration supports one or more RAC databases, with each database being active on all instances and with the same set of services on all instances. An example of this solution is a three node cluster in which the plant one and plant two services are provided on RAC instances on nodes 1, 2, and 3, with each instance providing failover and load sharing capabilities.
Oracle RAC 10g with ACTIVE/Asymmetric workloads
Oracle Real Application Clusters databases on two or more nodes, with workload distributed by service:
In this configuration, the application services span one or more instances of the database. During normal operation, work requests are balanced across each service. After a failure, CRS distributes the application services amongst the remaining instances supporting the service. When failed components are restored (often immediately by CRS) the work is rebalanced to use the additional capacity. This configuration of RAC achieves the highest availability and highest performance by transparently maintaining independence based on service. Schema and data are divided into logical databases and separate fault domains. Planned and unplanned outages on one fault domain can be isolated from others and the affected service can be recovered or upgraded. Clients of the same service are presumed to have similar access patterns, therefore routing them together results in higher cache locality than distributing them randomly across the instances. This is a differentiator over shared-nothing approaches, where the applications need to change to route requests to different places for each transaction.
An example of this solution is the Oracle eBusiness Suite. Responsibilities are mapped to service, and these services are managed across RAC transparently by CRS. Common services can span all RAC instances efficiently. Services with lower capacity needs can be defined with single cardinality and all instances capable of providing the service. Note that active services can also use a subset of instances of the database without declaring spare instances.
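© 2011, www.oracledatabase12g.com. All rights reserved. This article may be reproduced, but the source address must be credited with a link; otherwise legal action will be pursued.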