Using High Availability Monitors > Chapter 2 Monitoring Disk Resources

Creating Disk Monitoring Requests


There are two ways to create HA Disk Monitor requests:

  • From the EMS GUI, to send alerts to HP OpenView ITO, Network Node Manager, email addresses, the console, a textlog file, or the system log.

  • From ServiceGuard, to configure any HA Disk Monitor resource as a package dependency.

These requests are not exclusive. You can configure the HA Disk Monitor from both ServiceGuard and EMS. If you are using EMS to monitor disks for ServiceGuard package dependencies, it is recommended you also configure EMS to send events to alert you when something threatens data or application availability.

NOTE: When ServiceGuard packages are configured, the package configuration (including package dependencies) is automatically distributed to all nodes in the cluster. If you also want to send event notification to your system monitoring software (IT/O), you must configure the same monitoring requests on all nodes in the cluster using the EMS GUI.

The following sections take some common disk configurations in a high availability environment and give examples of the types of monitor requests you might want to create.

Disk Monitoring Request Suggestions

The examples listed in Table 2-8 “Suggestions for Creating Disk Monitor Requests” are valid for both RAID and mirrored configurations.

Table 2-8 Suggestions for Creating Disk Monitor Requests

| To be alerted when ... | Resources to monitor | Monitoring Parameters: Notify | Monitoring Parameters: Value | Monitoring Parameters: Option |
|---|---|---|---|---|
| you are at risk for data loss (most common for use with ServiceGuard) | pv_summary | when value is | >= SUSPECT | |
| | lv_summary | when value is | >= INACTIVE_DOWN | |
| any disks fail | pv_pvlink/status/* | when value is | not equal UP | |
| any disks fail, and you want to know when they are back up | pv_pvlink/status/* | when value is | not equal UP | RETURN |
| you want regular reminders to fix inoperative disks, controllers, buses, and host adapters, and you want notification when they are fixed | pv_pvlink/status/* | at each interval (use a long polling interval, 1 hour or more) | = DOWN | REPEAT RETURN |
| any logical volume becomes unavailable | lv/status/* | when value is | not equal UP | |
| you have lost a mirror in your 2-way mirroring environment | lv/copies/* | when value is | < 2 | |

 

The following series of screens provides a sample process for creating an HA Disk Monitor request. These samples use the EMS GUI, though the Package Dependency screens in ServiceGuard are similar. Refer to Using the Event Monitoring Service (HP Part Number B7612-90015) for specific instructions.

Assume you want to be alerted when any disks fail and when they are back up. Figure 2-2 “Example: Selecting All Instances of /system/filesystem/availMb” shows you can select all instances of pv_pvlink, so you only have to enter the parameters once for each volume group. You still need to create multiple pv_pvlink requests, one for each volume group on your system. Click OK to set monitoring parameters.

Figure 2-2 Example: Selecting All Instances of /system/filesystem/availMb


The parameters for the monitoring request in Figure 2-3 “Example: Configuring /vg/vg01/pv_pvlink/status Parameters to Notify When Disks Fail” request an event notification when the resource value is not equal to UP. The polling interval for checking the resource's value is 300 seconds. The notification method is an SNMP trap with a minor severity level. No initial, repeat, or return values are requested.

Figure 2-3 Example: Configuring /vg/vg01/pv_pvlink/status Parameters to Notify When Disks Fail


All requests are created in a similar way. You need to make sure you perform these steps for all instances in all volume groups you want to monitor.
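The per-volume-group repetition described above can be sketched in Python. This sketch only builds the wildcard resource names, one per volume group; the function name and the sample volume group names are hypothetical, and the requests themselves still have to be created through the EMS GUI or ServiceGuard.

```python
def pvlink_wildcard_resources(volume_groups):
    """Return one wildcard pv_pvlink resource name per volume group.

    Hypothetical helper for illustration; it does not talk to EMS.
    """
    resources = []
    for vg in volume_groups:
        # Accept either "vg01" or "/dev/vg01" and keep only the group name.
        name = vg.rstrip("/").split("/")[-1]
        resources.append(f"/vg/{name}/pv_pvlink/status/*")
    return resources


if __name__ == "__main__":
    for resource in pvlink_wildcard_resources(["vg01", "/dev/vg02"]):
        print(resource)
```

Each name printed corresponds to one request whose monitoring parameters you would enter once for that volume group.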

Resources to Monitor for RAID Arrays

These considerations are relevant to all supported RAID configurations listed at the beginning of this chapter. To adequately monitor a RAID system, create requests to monitor at least the following resources for all volume groups on a node:

Table 2-9 Resources to Monitor for RAID Arrays

| Resource | Description |
|---|---|
| /vg/vgName/pv_summary | This gives you an overview of the status of the entire physical volume group and is recommended when using EMS in conjunction with ServiceGuard; see “Rules for Using the HA Disk Monitor with ServiceGuard”. |
| /vg/vgName/pv_pvlink/status/* | This gives you the status of each PV link in the array and is redundant to pv_summary. It is recommended when using EMS outside of the ServiceGuard environment, or if you require specific status on each physical device. |
| /vg/vgName/lv_summary | This gives you the status of data availability on the array. |

 

Figure 2-4 “RAID Array Example” represents a node with two RAID arrays and two PV links.

Figure 2-4 RAID Array Example


Each LUN on the RAID array is in its own volume group: vgdance and vgsing. Assume this is one node in a 2-node cluster and you want to be notified when there is a failover, when any physical device fails, and when any logical volume becomes unavailable.

If you do not have ServiceGuard Manager installed with OpenView, to be notified when a package fails over, you must configure an EMS request that is the same as the package dependency you configured in ServiceGuard. See Using the Event Monitoring Service (HP Part Number B7612-90015). For this example, assume the package UP values were set as UP and PVG_UP.

To configure the EMS alerts, create the following requests:

Table 2-10 Sample Disk Monitoring Requests

| Resource | Monitoring Parameters: Notify | Monitoring Parameters: Condition | Monitoring Parameters: Option |
|---|---|---|---|
| /vg/vgdance/pv_summary | when value is | > PVG_UP | RETURN |
| /vg/vgsing/pv_summary | when value is | > PVG_UP | RETURN |
| /vg/vgdance/lv_summary | when value is | >= INACTIVE | RETURN |
| /vg/vgsing/lv_summary | when value is | >= INACTIVE | RETURN |

 

If pv_summary is SUSPECT, you know a physical device has failed, and you may want to look at lv_summary to see whether you can still access all data. If lv_summary is DOWN or INACTIVE_DOWN, you do not have a complete copy of data.
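The comparisons used in Table 2-10 (for example, "> PVG_UP" or ">= INACTIVE") treat status values as ordered from best to worst. The following Python sketch shows how such a condition could be evaluated. The orderings in `PV_SUMMARY` and `LV_SUMMARY` are assumptions inferred from the tables in this section, and `condition_met` is a hypothetical helper, not part of EMS.

```python
import operator

# ASSUMPTION: values are listed from best to worst, inferred from the
# comparisons used in this section; check the monitor's documentation.
PV_SUMMARY = ["UP", "PVG_UP", "SUSPECT", "DOWN"]
LV_SUMMARY = ["UP", "INACTIVE", "INACTIVE_DOWN", "DOWN"]

OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "not equal": operator.ne,
}


def condition_met(scale, current, op, threshold):
    """Return True if 'current <op> threshold' holds on the ordered scale."""
    return OPERATORS[op](scale.index(current), scale.index(threshold))


if __name__ == "__main__":
    # Request: /vg/vgdance/pv_summary, when value is > PVG_UP
    for value in PV_SUMMARY:
        print("pv_summary", value, condition_met(PV_SUMMARY, value, ">", "PVG_UP"))
    # Request: /vg/vgsing/lv_summary, when value is >= INACTIVE
    for value in LV_SUMMARY:
        print("lv_summary", value, condition_met(LV_SUMMARY, value, ">=", "INACTIVE"))
```

Under these assumed orderings, the pv_summary request fires only for SUSPECT and DOWN, which is consistent with treating UP and PVG_UP as the package UP values.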

Resources to Monitor for Mirrored Disks

This section is valid for mirrored disks created with MirrorDisk/UX. Mirroring must be PVG-strict if you are using the HA Disk Monitor. Mirrored configurations that are not PVG-strict will not give you a correct pv_summary.

To adequately monitor mirrored disks, create requests for the following resources for all volume groups on a node:

Table 2-11 Resources to Monitor for Mirrored Disks

| Resource | Description |
|---|---|
| /vg/vgName/pv_summary | This gives you summary status of all physical volumes in a volume group. A high availability system must be configured PVG-strict. If not, pv_summary cannot accurately determine disk availability. |
| /vg/vgName/pv_pvlink/status/* | This gives you the status of each physical disk and each link. |
| /vg/vgName/lv_summary | This gives you the status of data: whether it is available on the logical volumes. |
| /vg/vgName/lv/copies/* | This gives you the total number of copies of data currently available (copies in addition to the original copy). |

 

Figure 2-5 “Mirrored Disks Example” represents two nodes with a 2-way mirrored configuration with 10 disks on 2 buses. Both copies are in a single volume group. Assume you want to be notified when:

  • Any physical device fails

  • You only have one copy of data

  • There is a ServiceGuard failover.

To configure this last request, you must duplicate your ServiceGuard package dependency.

Figure 2-5 Mirrored Disks Example


To configure the EMS alerts, create the requests listed in Table 2-12 “EMS Alert Requests” on each node:

Table 2-12 EMS Alert Requests

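| Resource | Monitoring Parameters: Notify | Monitoring Parameters: Condition | Monitoring Parameters: Option |
|---|---|---|---|
| /vg/vg01/pv_summary | when value is | >= PVG_UP | RETURN |
| /vg/vg01/lv_summary | when value is | >= INACTIVE | RETURN |
| /vg/vg01/lv/copies/* | when value is | <= 1 | RETURN |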

 

Alerts need to be interpreted in relation to each other. With the requests in the table above, you would get an alert when pv_summary is PVG_UP. Although all data is available, the condition PVG_UP implies there are physical volumes that are not functioning and need to be fixed. See Table 2-15 “Root Volumes Monitoring Requests”. You may want to examine lv/copies to see how many copies of data are accessible and determine how urgently you need to repair the failures. For example, if you have 3-way mirroring and only one copy of data is available, you may want to correct the failure immediately to eliminate the single point of failure. Table 2-13 “Example for Interpreting the pv_summary for Mirrored Disks” is an example of how the HA Disk Monitor determines whether data is available in a mirrored configuration with 5 disks on each bus.

Table 2-13 Example for Interpreting the pv_summary for Mirrored Disks

| Number of Valid Devices | Meaning | pv_summary Value |
|---|---|---|
| 10 | all PVs and data accessible | UP |
| 9 | 1 PV down, all data accessible | PVG_UP |
| 8-5 | if 5 PVs are from the same PVG, then all data is available | PVG_UP |
| 8-5 | if 2 or more physical volumes from different PVGs are DOWN, the HA Disk Monitor cannot conclude that all data is available | SUSPECT |
| 4-1 | some data missing | DOWN |
| 0 | no data available | DOWN |
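The classification in Table 2-13 can be sketched as a small Python function. This is an illustration of the table for PVG-strict mirroring with equally sized PVGs, not the HA Disk Monitor's actual algorithm; the function name and parameters are hypothetical.

```python
def pv_summary(valid_per_pvg, pvg_size=5):
    """Classify disk availability the way Table 2-13 does.

    valid_per_pvg: number of working physical volumes in each PVG,
                   for example (5, 4) for two buses of five disks.
    pvg_size:      number of physical volumes configured in each PVG.
    """
    total_valid = sum(valid_per_pvg)
    total_configured = pvg_size * len(valid_per_pvg)

    if total_valid == total_configured:
        return "UP"        # every PV is accessible
    if any(valid == pvg_size for valid in valid_per_pvg):
        return "PVG_UP"    # one complete PVG, so one full copy of the data
    if total_valid >= pvg_size:
        return "SUSPECT"   # failures in different PVGs; data may be incomplete
    return "DOWN"          # too few PVs to hold a complete copy


if __name__ == "__main__":
    for sample in [(5, 5), (5, 4), (5, 0), (4, 4), (3, 2), (2, 2), (0, 0)]:
        print(sample, pv_summary(sample))
```

The SUSPECT branch reflects the point made in the table: when both PVGs have lost disks, counting devices alone cannot establish that a complete copy of the data survives.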

 

Resources to Monitor for Lock Disks

Lock disks are used as a tie-breaker in forming or reforming a cluster. If the lock disk is unavailable during cluster formation, the cluster may fail to reform. If you are using a lock disk with your cluster, you should configure a monitoring request for that disk and send an alert to your system management software if the lock disk is unavailable. Requests to monitor the lock disk might look like those listed in Table 2-14 “Lock Disk Monitoring Requests”:

Table 2-14 Lock Disk Monitoring Requests

| Resource | Monitoring Parameters: Notify | Monitoring Parameters: Condition | Monitoring Parameters: Option |
|---|---|---|---|
| /vg/vg02/pv_pvlink/c0t0d1 | when value is | >= BUSY | REPEAT |

 

The REPEAT option sends repeated alerts until the lock disk is available again.
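The effect of the REPEAT and RETURN options on a series of polled values can be modeled with a short Python sketch. This is a simplified model under stated assumptions, not EMS behavior as specified: it assumes one event when the condition first becomes true, one further event per polling interval with REPEAT, and one event with RETURN when the condition stops being true. The ordering in `PVLINK_STATUS` and the function name are also assumptions.

```python
# ASSUMPTION: pv_pvlink status values ordered from best to worst.
PVLINK_STATUS = ["UP", "BUSY", "DOWN"]


def notifications(polled_values, condition, repeat=False, notify_return=False):
    """Return (poll_index, kind, value) tuples for a sequence of polls."""
    events = []
    was_true = False
    for index, value in enumerate(polled_values):
        is_true = condition(value)
        if is_true and not was_true:
            events.append((index, "ALERT", value))    # condition just became true
        elif is_true and repeat:
            events.append((index, "REPEAT", value))   # still true at this interval
        elif was_true and not is_true and notify_return:
            events.append((index, "RETURN", value))   # condition no longer true
        was_true = is_true
    return events


def at_least_busy(value):
    """Condition 'when value is >= BUSY' on the assumed ordering."""
    return PVLINK_STATUS.index(value) >= PVLINK_STATUS.index("BUSY")


if __name__ == "__main__":
    polls = ["UP", "BUSY", "DOWN", "DOWN", "UP"]
    for event in notifications(polls, at_least_busy, repeat=True, notify_return=True):
        print(event)
```

With REPEAT alone, as in the lock disk request, the model keeps producing events while the disk is unavailable and stops silently once it recovers; adding RETURN produces an explicit recovery event.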

You need to create a request on each node in the cluster. Because the bus name and SCSI path to the lock disk may be different on each node, the resource instance may have a different name. It is merely a different path to the same lock disk.

Resources to Monitor for Root Volumes

In a high availability system, it is recommended that you mirror your root volume and place the mirror copies on separate links in separate PVGs. Note that the root volume should always be ACTIVE. Requests to monitor the root volume might look like those listed in Table 2-15 “Root Volumes Monitoring Requests”:

Table 2-15 Root Volumes Monitoring Requests

| Resource | Monitoring Parameters: Notify | Monitoring Parameters: Condition | Monitoring Parameters: Option |
|---|---|---|---|
| /vg/vg00/pv_pvlink/c0t0d0 | when value is | >= BUSY | REPEAT |
| /vg/vg00/pv_pvlink/c1t0d0 | when value is | >= BUSY | REPEAT |
| /vg/vg00/lv_summary | when value is | not equal UP | RETURN |
| /vg/vg00/lv/copies/lv01 | when value is | < 1 | RETURN |

 

If one of the root volumes is unavailable, you are alerted and told which one has failed (pv_pvlink/status). You are alerted if you lose a root disk mirror. With the RETURN option, you are also notified when the mirror is restored.

Excluding Volume Groups from being Monitored

The HA Disk Monitor supports an optional config file for excluding volume groups. If the config file is not present when the HA Disk Monitor starts, the HA Disk Monitor processes all volume groups found in /etc/lvmtab. This is the default behavior.

If the HA Disk Monitor config file is present, the HA Disk Monitor reads the config file. Any volume groups in the file that are found are excluded (filtered out) from monitoring by the HA Disk Monitor. The config file must be located in the directory /etc/opt/resmon/monitors and be named diskmond.config. You must create this file in order to enable the filtering-out. There is a sample config file supplied in /etc/opt/resmon/monitors/diskmond.config.sample. The file and format are also documented in the diskmond manpage.

The diskmond.config file contains:

exclude_vg=/dev/vgxx

where vgxx is the volume group to be excluded.

Add one line for each volume group to exclude. Ownership on the config file should be bin:bin. If the config file is present but empty, HA Disk Monitor assumes the default behavior and monitors all the disks. If there are typographical errors in it, HA Disk Monitor attempts to honor the exclude_vg's that it can find, if any.
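The filtering rules above can be illustrated with a short Python sketch. It is not the diskmond implementation: the function names are hypothetical, and the handling of malformed lines (skipping them) is an assumption modeled on the statement that the monitor honors the exclude_vg entries it can find. See the diskmond manpage for the authoritative file format.

```python
def parse_excluded_vgs(config_text):
    """Collect volume groups named on well-formed exclude_vg= lines."""
    excluded = []
    for line in config_text.splitlines():
        key, separator, value = line.strip().partition("=")
        # ASSUMPTION: lines that do not match "exclude_vg=<path>" are skipped.
        if separator and key.strip() == "exclude_vg" and value.strip():
            excluded.append(value.strip())
    return excluded


def monitored_vgs(all_vgs, config_text=None):
    """Return the volume groups that would still be monitored.

    config_text is None when no config file exists; an empty string
    models a config file that is present but empty.
    """
    if config_text is None:
        return list(all_vgs)
    excluded = set(parse_excluded_vgs(config_text))
    return [vg for vg in all_vgs if vg not in excluded]


if __name__ == "__main__":
    vgs = ["/dev/vg00", "/dev/vg01", "/dev/vg02"]
    sample = "exclude_vg=/dev/vg01\nexclud_vg=/dev/vg02\n"  # second line has a typo
    print(monitored_vgs(vgs, sample))
```

In this sketch the misspelled second line is ignored, so only /dev/vg01 is filtered out, and both a missing file and an empty file leave every volume group monitored.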

© 1997, 2003 Hewlett-Packard Development Company, L.P.