The following sections offer a few suggestions for troubleshooting
by reviewing the state of the running system and by examining cluster status
data, log files, and configuration files. Topics include:
Reviewing Package IP Addresses
Reviewing the System Log File
Reviewing Configuration Files
Reviewing the Package Control Script
Using cmquerycl and cmcheckconf
Using cmscancl and cmviewcl
Reviewing the LAN Configuration
NOTE: The use of Serviceguard Manager is recommended for observing
the current status of a cluster and viewing the properties of cluster
objects. See “Using Serviceguard Manager” in Chapter
7 for information about running Serviceguard Manager.
Reviewing Package IP Addresses
The netstat -in command can be used to examine the LAN configuration.
When run on ftsys9 after node ftsys10 has been halted, the command shows that
the package IP addresses are assigned to lan0 on ftsys9 along with the heartbeat
IP address.
Name     Mtu   Network      Address        Ipkts    Opkts
ni0*     0     none         none           0        0
ni1*     0     none         none           0        0
lo0      4608  127          127.0.0.1      10114    10114
lan0     1500  15.13.168    15.13.171.14   959269   305189
lan0     1500  15.13.168    15.13.171.23   959269   305189
lan0     1500  15.13.168    15.13.171.20   959269   305189
lan1*    1500  none         none           418623   41716
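When an interface carries several addresses, it can help to pull out only the addresses bound to one interface. The following is a minimal sketch, not part of Serviceguard: the function name addresses_on_interface is hypothetical, and the script assumes the six-column layout (Name, Mtu, Network, Address, Ipkts, Opkts) of the netstat -in output shown in this section. Other releases may print additional columns.

```shell
#!/bin/sh
# Hypothetical helper: print the IP addresses bound to one interface.
# Reads netstat -in style output on standard input and assumes the
# column order Name, Mtu, Network, Address, Ipkts, Opkts.
addresses_on_interface() {
    # $1 = interface name, for example lan0
    awk -v ifname="$1" '$1 == ifname && $4 != "none" { print $4 }'
}

# Sample data copied from the netstat -in output in this section.
sample_netstat='Name Mtu Network Address Ipkts Opkts
ni0* 0 none none 0 0
ni1* 0 none none 0 0
lo0 4608 127 127.0.0.1 10114 10114
lan0 1500 15.13.168 15.13.171.14 959269 305189
lan0 1500 15.13.168 15.13.171.23 959269 305189
lan0 1500 15.13.168 15.13.171.20 959269 305189
lan1* 1500 none none 418623 41716'

# Lists the three addresses on lan0, one per line.
printf '%s\n' "$sample_netstat" | addresses_on_interface lan0
```

On a running node the same function could be fed directly, for example with netstat -in | addresses_on_interface lan0.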
Reviewing the System Log File
Messages from the Cluster Manager and Package Manager are
written to the system log file. The default location of the log
file is /var/adm/syslog/syslog.log. You can use a text editor, such as vi, or the more command to view the log file for historical information
on your cluster.
This log provides information on the following:
Commands executed and their outcome.
Major cluster events which may, or may not, be errors.
Cluster status information.
NOTE: Many other products running on HP-UX in addition to
Serviceguard use the syslog.log file to save messages. The HP-UX Managing
Systems and Workgroups manual provides additional information
on using the system log.
Sample System Log Entries
The following entries from the file /var/adm/syslog/syslog.log show a package that failed to run because of a problem
in the pkg5_run script. For details, examine the pkg5_run.log file.
Dec 14 14:33:48 star04 cmcld[2048]: Starting cluster management protocols.
Dec 14 14:33:48 star04 cmcld[2048]: Attempting to form a new cluster
Dec 14 14:33:53 star04 cmcld[2048]: 3 nodes have formed a new cluster
Dec 14 14:33:53 star04 cmcld[2048]: The new active cluster membership is: star04(id=1) , star05(id=2), star06(id=3)
Dec 14 17:33:53 star04 cmlvmd[2049]: Clvmd initialized successfully.
Dec 14 14:34:44 star04 CM-CMD[2054]: cmrunpkg -v pkg5
Dec 14 14:34:44 star04 cmcld[2048]: Request from node star04 to start package pkg5 on node star04.
Dec 14 14:34:44 star04 cmcld[2048]: Executing '/etc/cmcluster/pkg5/pkg5_run start' for package pkg5.
Dec 14 14:34:45 star04 LVM[2066]: vgchange -a n /dev/vg02
Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART.
Dec 14 14:34:45 star04 cmcld[2048]: Examine the file /etc/cmcluster/pkg5/pkg5_run.log for more details.
The following entries show a package starting successfully:
Dec 14 14:39:27 star04 CM-CMD[2096]: cmruncl
Dec 14 14:39:27 star04 cmcld[2098]: Starting cluster management protocols.
Dec 14 14:39:27 star04 cmcld[2098]: Attempting to form a new cluster
Dec 14 14:39:27 star04 cmclconfd[2097]: Command execution message
Dec 14 14:39:33 star04 cmcld[2098]: 3 nodes have formed a new cluster
Dec 14 14:39:33 star04 cmcld[2098]: The new active cluster membership is: star04(id=1), star05(id=2), star06(id=3)
Dec 14 17:39:33 star04 cmlvmd[2099]: Clvmd initialized successfully.
Dec 14 14:39:34 star04 cmcld[2098]: Executing '/etc/cmcluster/pkg4/pkg4_run start' for package pkg4.
Dec 14 14:39:34 star04 LVM[2107]: vgchange /dev/vg01
Dec 14 14:39:35 star04 CM-pkg4[2124]: cmmodnet -a -i 15.13.168.0 15.13.168.4
Dec 14 14:39:36 star04 CM-pkg4[2127]: cmrunserv Service4 /vg01/MyPing 127.0.0.1 >>/dev/null
Dec 14 14:39:36 star04 cmcld[2098]: Started package pkg4 on node star04.
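Because many products write to syslog.log, filtering the file for the messages of interest can save time. The sketch below is one possible approach and not a Serviceguard tool: the function name failed_run_scripts is hypothetical, and the match relies on the wording "Package <name> run script exited with" seen in the sample entries in this section, which may differ in other releases.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each package whose run script
# exit was reported by cmcld. Reads syslog-style lines on standard input
# and assumes the message wording shown in the sample entries.
failed_run_scripts() {
    awk '/cmcld/ && /run script exited with/ {
        for (i = 1; i < NF; i++) {
            # The package name is the word that follows "Package".
            if ($i == "Package") { print $(i + 1); break }
        }
    }'
}

# Three lines copied from the sample log entries in this section.
sample_log='Dec 14 14:34:44 star04 CM-CMD[2054]: cmrunpkg -v pkg5
Dec 14 14:34:45 star04 cmcld[2048]: Package pkg5 run script exited with NO_RESTART.
Dec 14 14:39:36 star04 cmcld[2098]: Started package pkg4 on node star04.'

printf '%s\n' "$sample_log" | failed_run_scripts
```

Against the live log, the function could be run as failed_run_scripts < /var/adm/syslog/syslog.log, after which the named package's run log is the next place to look.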
Reviewing Object Manager Log Files
The Serviceguard Object Manager daemon cmomd logs messages to the file /var/opt/cmom/cmomd.log. You can review these messages using the cmreadlog command, as follows:
# cmreadlog /var/opt/cmom/cmomd.log
Messages from cmomd include information about the processes that request
data from the Object Manager, including type of data, timestamp,
etc. An example of a client that requests data from Object Manager is Serviceguard
Manager.
Reviewing Serviceguard Manager Log Files
Serviceguard Manager maintains a log file of user activity.
This file is stored in the HP-UX directory /var/opt/sgmgr or the Windows directory X:\Program Files\Hewlett-Packard\Serviceguard Manager\log (where X refers to the drive on which you have installed
Serviceguard Manager). You can review these messages using the cmreadlog command, as in the following HP-UX example:
# cmreadlog /var/opt/sgmgr/929917sgmgr.log
Messages from Serviceguard Manager include information about
the login date and time, Object Manager server system, timestamp,
etc.
Reviewing Configuration Files
Review the following ASCII configuration files:
Cluster configuration file.
Package configuration files.
Ensure that the files are complete and correct according to
your configuration planning worksheets.
Reviewing the Package Control Script
Ensure that the package control script is found on all nodes
where the package can run and that the file is identical on all
nodes. Ensure that the script is executable on all nodes. Ensure
that the name of the control script appears in the package configuration
file, and ensure that all services named in the package configuration
file also appear in the package control script.
Information about the starting and halting of each package
is found in the package’s control script log. This log
provides the history of the operation of the package control script.
It is found at /etc/cmcluster/package_name/control_script.log. This log documents all package run and halt activities.
If you have written a separate run and halt script for a package,
each script will have its own log.
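The checks described above (the script exists, is identical everywhere, and is executable) can be reduced to a one-line fingerprint that is easy to compare between nodes. The following is a minimal sketch under stated assumptions: the function name script_fingerprint is hypothetical, the cksum utility is assumed to be available, and the script must be run on each node separately, with the resulting lines compared by hand or by a tool of your choice.

```shell
#!/bin/sh
# Hypothetical helper: print a fingerprint for a control script so that
# the output from each node can be compared. The line contains the
# cksum checksum, the size in bytes, and whether the file is executable.
script_fingerprint() {
    # $1 = path to the control script
    if [ ! -f "$1" ]; then
        echo "missing"
        return 1
    fi
    fp_sum=$(cksum < "$1")          # "checksum size", no file name
    if [ -x "$1" ]; then
        fp_mode=executable
    else
        fp_mode=not-executable
    fi
    printf '%s %s\n' "$fp_sum" "$fp_mode"
}

# Demonstration on a throwaway file standing in for a control script.
demo_script=$(mktemp)
printf '#!/bin/sh\necho start\n' > "$demo_script"
chmod 755 "$demo_script"
script_fingerprint "$demo_script"
rm -f "$demo_script"
```

If the fingerprint lines from all nodes match and end in "executable", the script is identical and executable everywhere. A differing checksum or a "missing" line points to the node that needs attention.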
Using the cmcheckconf Command
The cmcheckconf command can also be used to troubleshoot your cluster, just
as it was used to verify the configuration.
The following example shows the commands used to verify the
existing cluster configuration on ftsys9 and ftsys10:
# cmquerycl -v -C /etc/cmcluster/verify.ascii -n ftsys9 -n ftsys10
# cmcheckconf -v -C /etc/cmcluster/verify.ascii
The cmcheckconf command checks:
The network addresses and connections.
The cluster lock disk connectivity.
The validity of configuration parameters of the
cluster and packages for:
The existence and permission of scripts.
It doesn’t check:
The correct setup of the power circuits.
The correctness of the package configuration script.
Using the cmscancl Command
The command cmscancl displays information about all the nodes in a cluster
in a structured report that allows you to compare such items as IP
addresses or subnets, physical volume names for disks, and other node-specific
items for all nodes in the cluster. cmscancl actually runs several different HP-UX commands
on all nodes and gathers the output into a report on the node where
you run the command.
The following are the types of configuration data that cmscancl displays for each node:
Table 8-1 Data Displayed by the cmscancl Command
| Description | Source of Data |
|---|---|
| LAN device configuration and status | lanscan command |
| network status and interfaces | netstat command |
| file systems | mount command |
| LVM configuration | /etc/lvmtab file |
| LVM physical volume group data | /etc/lvmpvg file |
| link level connectivity for all links | linkloop command |
| binary configuration file | cmviewconf command |
Using the cmviewconf Command
cmviewconf allows you to examine the binary cluster configuration
file, even when the cluster is not running. The command displays
the content of this file on the node where you run the command.
Reviewing the LAN Configuration
The following networking commands can be used to diagnose
problems:
netstat -in can be used to examine the LAN configuration. This command
lists all IP addresses assigned to each LAN interface card.
lanscan can also be used to examine the LAN configuration. This command
lists the MAC addresses and status of all LAN interface cards on
the node.
arp -a can be used to check the arp tables.
landiag is useful to display, diagnose, and reset LAN card information.
linkloop verifies the communication between LAN cards at MAC address
levels. For example, if you enter
# linkloop -i4 0x08000993AB72
you should see the following message:
Link Connectivity to LAN station: 0x08000993AB72 OK
cmscancl can be used to verify that primary and standby LANs are on
the same bridged net.
cmviewcl -v shows the status of primary and standby LANs.
Use these commands on all nodes.
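When linkloop is run against many MAC addresses, checking each result by eye becomes tedious. The sketch below shows one way to test the output in a script. The function name linkloop_reports_ok is hypothetical, and the pattern assumes the success message appears on a single line in the form shown above ("... LAN station: <address> OK"). The exact wording and line layout may differ on your system, so adjust the pattern to the output you actually see.

```shell
#!/bin/sh
# Hypothetical helper: succeed if linkloop-style output on standard
# input contains a success line of the form shown in this section.
linkloop_reports_ok() {
    grep -q 'LAN station: .*OK'
}

# Sample message copied from the linkloop example in this section.
sample_output='Link Connectivity to LAN station: 0x08000993AB72 OK'

if printf '%s\n' "$sample_output" | linkloop_reports_ok; then
    echo "link check passed"
else
    echo "link check failed"
fi
```

On a node, the command output could be piped in directly, for example linkloop -i4 0x08000993AB72 | linkloop_reports_ok, and the exit status used to decide whether to report a problem.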