Configuring OPS Clusters with ServiceGuard OPS Edition > Chapter 8 Troubleshooting Your Cluster

Troubleshooting Approaches
The following sections offer a few suggestions for troubleshooting by reviewing the state of the running system and by examining cluster status data, log files, and configuration files.

The netstat -in command can be used to examine the LAN configuration. If run on node 1 after node 2 has been halted, the command shows that the package IP addresses are assigned to lan0 on node 1, along with the heartbeat IP address.
All the components of ServiceGuard produce messages at different times, indicating the completion of a step or an error or warning condition. Messages generated by SAM are displayed to the user in a message box; messages from HP-UX commands are normally displayed on the standard output. Some information may also be written to log files, depending on which software component generates the message. Messages from the cluster manager are found in the system log file, /var/adm/syslog/syslog.log.

Messages from the Cluster Manager and Package Manager are written to the system log file, whose default location is /var/adm/syslog/syslog.log. Each message is accompanied by a timestamp showing the date and time the message was written and by the name of the process that generated it. You can distinguish messages from the following daemon processes:

You can examine the syslog.log file periodically for messages relating to the configuration. In SAM, use the following steps:
You can also browse the syslog file directly:
The cluster manager employs several types of messages to convey information about the running system. Each message is accompanied by a prefix that identifies the message type. The categories are as follows:
Messages are intended to be self-explanatory, but occasionally it is necessary to study several messages together in context to determine the appropriate corrective action. In some cases, no action is required because the message is purely informative, as when a message reports successful completion of a task. In other cases, the only action may be to gather information from the running system for use in diagnosis of the problem by HP field personnel.

The following entries from the file /var/adm/syslog/syslog.log show a package that failed to run due to a problem in the pkg5_run script. You would look at the pkg5_run.log for details.
The following is an example of a successful package starting:
Runtime errors appear in the syslog file. If a message contains the keywords cmgmsd and ERROR, a hardware or software defect has occurred. Send a copy of the syslog file when requested by HP support. The following is an example:
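Apart from the sample entry, the keyword check described above can be done with two chained grep filters. The following is a minimal sketch and is not taken from the HP documentation: the function name find_runtime_errors is hypothetical, and the default path is simply the syslog location named in this section.

```shell
#!/bin/sh
# find_runtime_errors: print syslog lines containing both "cmgmsd" and "ERROR".
# Hypothetical helper name; the default path is the syslog location
# given in this section and can be overridden with the first argument.
find_runtime_errors() {
    logfile="${1:-/var/adm/syslog/syslog.log}"
    # Filter on the daemon name first, then on the ERROR keyword.
    grep 'cmgmsd' "$logfile" | grep 'ERROR'
}
```

Calling find_runtime_errors with no argument searches the default log; passing a path, for example a saved copy of the log, searches that file instead. The function returns a non-zero status when no matching line is found.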
The ServiceGuard Object Manager daemon cmomd logs messages to the file /var/opt/cmom/cmomd.log. You can review these messages using the cmreadlog command, as follows:

# cmreadlog /var/opt/cmom/cmomd.log

Messages from cmomd include information about the processes that request data from the Object Manager, including type of data, timestamp, etc. An example of a client that requests data from Object Manager is ServiceGuard Manager.

ServiceGuard Manager maintains a log file of user activity. This file is stored in the HP-UX directory /var/opt/sgmgr or the Windows directory X:\Program Files\Hewlett-Packard\ServiceGuard Manager\log (where X refers to the drive on which you have installed ServiceGuard Manager). You can review these messages using the cmreadlog command, as in the following HP-UX example:

# cmreadlog /var/opt/sgmgr/929917sgmgr.log

Messages from ServiceGuard Manager include information about the login date and time, Object Manager server system, timestamp, etc.

Review the following configuration files:
Ensure that the package control script is found on all nodes where the package can run and that the file is identical on all nodes. Ensure that the script is executable on all nodes. Ensure that the name of the control script appears in the package configuration file, and ensure that all services named in the package configuration file also appear in the package control script.

Information about the starting and halting of each package is found in the package's control script log. This log provides the history of the operation of the package control script and documents all package run and halt activities. It is found at /etc/cmcluster/package_name/control.sh.log. If you have written a separate run and halt script for a package, each script will have its own log.

In addition, cmquerycl and cmcheckconf can be used to troubleshoot your cluster just as they were used to verify its configuration. The following example shows the commands used to verify the existing cluster configuration on node 1 and node 2:

# cmquerycl -v -C /etc/cmcluster/verify.asc -n node1 -n node2
# cmcheckconf -v -C /etc/cmcluster/verify.asc

The cmcheckconf command checks the following:
The cmcheckconf command does not check the following:
The command cmscancl displays information about all the nodes in a cluster in a structured report that allows you to compare such items as IP addresses or subnets, physical volume names for disks, and other node-specific items for all nodes in the cluster. cmscancl actually runs several different HP-UX commands on all nodes and gathers the output into a report on the node where you run the command.

The following are the types of configuration data that cmscancl displays for each node:

Table 8-1 Data Displayed by the cmscancl Command
cmviewconf allows you to examine the binary cluster configuration file, even when the cluster is not running. The command displays the content of this file on the node where you run the command.

To display the current configuration of a shared volume group, use the vgdisplay -v command. An example is as follows:
The output includes a list of all volume groups, together with the logical volumes configured in them and all the physical volumes associated with them. Physical volume groups are also included.

This section describes some approaches to solving problems that may occur with VxVM disk groups in a cluster environment. For most problems, it is helpful to use the vxdg list command to display the disk groups currently imported on a specific node. Also, you should consult the package control script log files for messages associated with importing and deporting disk groups on particular nodes.

After certain failures, packages configured with VxVM disk groups will fail to start, and the following error will be seen in the package log file:
This can happen if a package is running on a node which then fails before the package control script can deport the disk group. In these cases, the host name of the node that had failed is still written on the disk group header. When the package starts up on another node in the cluster, a series of messages is printed as in the following example (the hostname of the failed system is ftsys9, and the disk group is dg_01):
Follow the instructions in the message to use the force import option (-C) to allow the current node to import the disk group. Then deport the disk group, after which it can be used again by the package. Example:

# vxdg -tfC import dg_01
# vxdg deport dg_01

The force import will clear the host name currently written on the disks in the disk group, after which you can deport the disk group without error so it can then be imported by a package running on a different node.
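Before and after the forced import, it can help to confirm whether the disk group is currently imported on the local node, using the vxdg list output mentioned earlier. The following is a minimal sketch and is not part of the HP documentation: the function name dg_is_listed is hypothetical, and it assumes that vxdg list prints one header line beginning with NAME followed by one line per disk group, with the disk group name in the first column.

```shell
#!/bin/sh
# dg_is_listed: read "vxdg list" output on standard input and succeed
# (exit status 0) if the named disk group appears in the first column.
# Hypothetical helper; assumes a header line whose first field is "NAME".
dg_is_listed() {
    awk -v dg="$1" '
        $1 == "NAME" { next }        # skip the header line
        $1 == dg     { found = 1 }   # first column matches the disk group
        END          { exit(found ? 0 : 1) }
    '
}
```

A typical use would be to pipe the command output into the function, for example: vxdg list | dg_is_listed dg_01 && echo "dg_01 is imported on this node". Reading from standard input keeps the check separate from the vxdg call, so saved output can be checked the same way.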