Everything you ever wanted to know about the Cluster Health Monitor (CHM)

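Author: Maclean Liu, posted on January 10th, 2010, English version
[Unless marked as a repost, all articles on this site are original compilations by this site]
When reposting, please credit the source: Oracle Clinic – Maclean Liu's personal technical blog [http://www.oracledatabase12g.com/]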
Permanent link to this article: http://www.oracledatabase12g.com/archives/everything-you-ever-wanted-to-know-about-the-cluster-health-monitor-chm.html

What is the Cluster Health Monitor (CHM)?

Introduction
The Cluster Health Monitor (CHM), formerly known as the Instantaneous Problem Detector for Clusters (IPD/OS), is designed to detect and analyze operating system (OS) and cluster resource related degradation and failures. Its goal is to bring more explanatory power to issues, such as node evictions, that occur in clusters running Oracle Clusterware and/or Oracle RAC.

Why should you use CHM?
Because there is always a Monday morning.
Assume the following scenario:
You leave the office on Friday night.
On Sunday morning, you get an email that one node in the cluster rebooted.
On Monday, your manager asks why that node rebooted.

Typical way of addressing this question:
Gather and analyze Oracle Clusterware and operating system logs (e.g. following MOS doc 330358.1 – CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide)
Open a Service Request with Oracle Support

Possible outcomes:
Oracle Support finds the answer in one of the logs
Oracle Support needs more node specific information to answer the question

For the latter: This is why you need the Cluster Health Monitor (CHM).

Based on the previous scenario:
It is determined that the reboot was caused by an abnormally high CPU load in conjunction with extreme IO waits.
Your manager asks you: What caused the high CPU load? What can we do to prevent this in the future?
For the latter: CHM provides a historical view on collected data:
>crfgui -d "00:05:00" -m 192.168.2.8
Cluster Health Analyzer V1.10
Look for Loggerd via node 192.168.2.8
…reading 300 sec from the past
Connected to Loggerd on rac1
Note: Node rac1 is now up
Cluster 'MyCluster', 2 nodes. Ext time=2010-08-18 23:22:30
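
In the call above, the -d value "00:05:00" lines up with the "reading 300 sec from the past" message in the output. The following is a minimal Python sketch of that arithmetic only, assuming the value is an HH:MM:SS string; the function name window_to_seconds is hypothetical and is not part of CHM.

```python
def window_to_seconds(window: str) -> int:
    """Convert an "HH:MM:SS" string into a number of seconds.

    Assumption: the -d argument of crfgui is an HH:MM:SS duration,
    as the example "00:05:00" -> "300 sec" in the output suggests.
    """
    # Unpacking raises ValueError if there are not exactly three fields.
    hours, minutes, seconds = (int(part) for part in window.split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid HH:MM:SS window: {window!r}")
    return hours * 3600 + minutes * 60 + seconds


print(window_to_seconds("00:05:00"))  # 300
```

To look further back, for example across the Sunday morning of the scenario, a larger window would be passed in the same format, provided the data is still within the retention period of the repository.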

How to Install CHM?
Use the documentation:
Overview of Cluster Health Monitor (CHM) (http://www.oracle.com/technetwork/database/enterprise-edition/ipd-overview-130032.pdf)
Summary of installation steps:
Download the software
Unzip the downloaded file
  Do not install from a shared file system
Set up an OS user for CHM
  The user must have passwordless SSH access to all nodes
  The user can be the same as the Oracle Grid Infrastructure owner
Install the software
  $CHM_install_DIR/install/crfinst.pl -i {node1,node2…} -b /BDBdirectory
  Do not use a shared destination for the location of the BDBdirectory
  The software is distributed automatically across all nodes specified under -i
Define one of the nodes as the master node
Run "crfinst.pl -f -b /u01/orachmbdb/" as root on all nodes to enable the tool
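
The command-line parts of the steps above can be sketched as a small Python helper that only assembles the commands and does not execute them. This is an illustration under stated assumptions, not an official installer wrapper: the function names, the node name rac2, and the install directory are hypothetical; the braces in the -i argument are assumed to be placeholder notation for a comma-separated node list; and the ssh BatchMode check is one common way to confirm passwordless access, not something the CHM documentation prescribes.

```python
from typing import List


def ssh_check_command(node: str) -> List[str]:
    """Command that succeeds only if passwordless SSH to `node` works.

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    return ["ssh", "-o", "BatchMode=yes", node, "true"]


def install_command(install_dir: str, nodes: List[str], bdb_dir: str) -> List[str]:
    """Build the crfinst.pl install call using the -i and -b flags from the steps.

    Assumption: -i takes a comma-separated node list.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    script = f"{install_dir}/install/crfinst.pl"
    return [script, "-i", ",".join(nodes), "-b", bdb_dir]


def enable_command(bdb_dir: str) -> List[str]:
    """Build the call that is run as root on every node to enable the tool."""
    return ["crfinst.pl", "-f", "-b", bdb_dir]


# Hypothetical values for illustration only.
nodes = ["rac1", "rac2"]
install_dir = "/home/chmuser/chm"   # stands in for $CHM_install_DIR
bdb_dir = "/u01/orachmbdb/"         # must not be a shared location

for node in nodes:
    print(" ".join(ssh_check_command(node)))
print(" ".join(install_command(install_dir, nodes, bdb_dir)))
print(" ".join(enable_command(bdb_dir)))
```

Keeping the commands as argument lists means they could later be handed to a process runner without shell quoting problems, once the values have been checked against the CHM documentation.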

Future Development of CHM
What you will find in Oracle Grid Infrastructure 11.2.0.2
Cluster Health Monitor is planned to be integrated with Oracle Grid Infrastructure starting with 11.2.0.2 as follows:
The data gathering part of the tool will be part of the standard installation
CHM will therefore be installed into the Oracle Grid Infrastructure home
The Berkeley DB will be installed in the Oracle Grid Infrastructure home (default)
The GUI remains as a separately downloadable item
Changes in some parts of the architecture are possible, but the principles remain
The tool will provide more configuration options on the command line
The tool will be enabled by default with a default (adjustable) retention time

Going forward, all operating systems supported for Oracle Grid Infrastructure will be supported for Cluster Health Monitor.
More operating systems are planned to be supported for CHM as 11.2.0.2 becomes available on them (completion is planned for 11.2.0.3).

