Configuring OPS Clusters with MC/LockManager: Chapter 8, Troubleshooting Your Cluster

Solving Package Problems

Problems with packages fall into three categories:

  • System administration errors.

  • Package movement errors.

  • Node and network failures.

The first two categories result from incorrect configuration of MC/LockManager. The last category covers "normal" failures, to which MC/LockManager is designed to react so that the packages containing your applications remain available.

System Administration Errors

Some errors made while configuring MC/LockManager do not show up when you start the cluster. The cluster can be running, and everything can appear to be fine, until a hardware or software failure occurs and control of your packages is not transferred to another node as you expected.

These problems are caused specifically by errors in the cluster configuration file and package configuration scripts. Examples include:

  • Volume groups not defined on adoptive node.

  • Mount point does not exist on adoptive node.

  • Network errors on adoptive node (configuration errors).

  • User information not correct on adoptive node.
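Two of the conditions above, a missing mount point and a missing user, can be checked ahead of time on each adoptive node. The following is a minimal sketch, not part of MC/LockManager; the function name check_adoptive_node and the arguments passed to it are hypothetical and must be replaced with the values your package actually uses.

```shell
#!/bin/sh
# Hypothetical pre-flight check to run on an adoptive node.
# $1: mount point directory the package expects
# $2: user name the application runs as
check_adoptive_node() {
    rc=0
    # The mount point must already exist as a directory on this node.
    [ -d "$1" ] || { echo "missing mount point: $1"; rc=1; }
    # The user must be known to this node.
    id "$2" >/dev/null 2>&1 || { echo "unknown user: $2"; rc=1; }
    return $rc
}
```

Run it on every node that can adopt the package, for example check_adoptive_node /your/mount/point youruser, and compare the results across nodes. A non-zero return status means that at least one check failed.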

You can use the following commands to check the status of your disks:

  • bdf - to see whether the file systems in your package's volume group are mounted.

  • vgdisplay -v - to see if all volumes are present.

  • lvdisplay -v - to view logical volume status.

  • strings /etc/lvmtab - to ensure that the configuration is correct.

  • ioscan -fnC disk - to see physical disks.

  • diskinfo -v /dev/rdsk/cxtydz - to display disk information.
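The disk checks above can be collected into one script so that they are run the same way on every node. This is a sketch under assumptions: the helper names run_check and disk_checks are made up for illustration, and lvdisplay and diskinfo are left out because they are normally given a specific logical volume or device file, which is site-specific.

```shell
#!/bin/sh
# Run one check and label its output. A command that is not installed
# on this host is skipped rather than treated as an error.
run_check() {
    # $1 is the command name; all arguments are passed through unchanged.
    if command -v "$1" >/dev/null 2>&1; then
        echo "=== $* ==="
        "$@" 2>&1 || echo "(exit status $?)"
    else
        echo "=== $1: not found, skipped ==="
    fi
}

# The argument-free disk checks from the list above, in the same order.
disk_checks() {
    run_check bdf
    run_check vgdisplay -v
    run_check strings /etc/lvmtab
    run_check ioscan -fnC disk
}

disk_checks
```

Redirecting the output to a file on each node makes it easier to compare the primary node with the adoptive node.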

Package Movement Errors

These errors are similar to system administration errors, except that they are caused specifically by errors in the package control script. The best way to prevent them is to test your package control script before putting your high availability application online.

Running your script with the -x shell option traces each command as it is executed, which shows where the script may be failing.
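As an illustration of the -x option, the sketch below traces a small stand-in script. The stand-in is not a real package control script; its contents and the deliberately missing mount point are assumptions chosen so that the example fails in a predictable place.

```shell
#!/bin/sh
# Create a throwaway stand-in for a package control script. The real
# script and its location are site-specific.
script=$(mktemp)
cat > "$script" <<'EOF'
MOUNT_POINT=/no/such/mount/point
echo "starting package"
test -d "$MOUNT_POINT" || exit 1
echo "package started"
EOF

# With -x the shell writes each command to stderr, prefixed with "+",
# before running it. The last traced lines show where the script stopped.
status=0
trace=$(sh -x "$script" 2>&1) || status=$?
rm -f "$script"

echo "$trace"
echo "exit status: $status"
```

Here the trace ends at the failing test command and the exit that follows it, and the final echo is never reached. With a real control script, run it the same way with sh -x and whatever arguments it expects, and save the trace to a file for later inspection.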

Node and Network Failures

Node and network failures cause MC/LockManager to transfer control of a package to another node. This is the normal action of MC/LockManager, but you must be able to recognize when a transfer has taken place and decide whether to leave the cluster in its current condition or restore it to its original condition. Node failures can be caused by the following conditions:

  • HPMC (High Priority Machine Check)

  • TOC (Transfer of Control)

  • Panics

  • Hangs

  • Power failures

In the event of a TOC, a system dump is performed on the failed node and numerous messages are also displayed on the console.

You can use the following commands to check the status of your network and subnets:

  • netstat -in - to display LAN status and check whether the package IP address is stacked on the LAN card.

  • lanscan - to see if the LAN is on the primary interface or has switched to the standby interface.

  • arp -a - to check the ARP tables.

  • landiag - to display, test, and reset the LAN cards.
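One way to recognize that a transfer has taken place is to save the output of the network checks while the cluster is healthy and compare it with the output after a suspected failure. The following is a sketch only: the function name snapshot and the file names are hypothetical, and landiag is omitted because it is an interactive tool.

```shell
#!/bin/sh
# Write the labelled output of each given command line to a file.
# $1: output file; remaining arguments: one quoted command line each.
snapshot() {
    snapshot_file=$1
    shift
    for cmd in "$@"; do
        echo "=== $cmd ==="
        # ${cmd%% *} is the first word of the command line, i.e. its name.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            sh -c "$cmd" 2>&1 || echo "(exit status $?)"
        else
            echo "(not installed on this host)"
        fi
    done > "$snapshot_file"
}
```

For example, snapshot /tmp/net.before "netstat -in" "lanscan" "arp -a" records the healthy state. Taking a second snapshot into /tmp/net.after and running diff on the two files then shows whether the package IP address has moved or the LAN has switched to the standby interface.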

Because every cluster is unique, there are no cookbook solutions to possible problems. However, if you apply these checks and commands and work your way through the log files, you should be able to identify and solve most problems.

© 1999 Hewlett-Packard Development Company, L.P.