HP A5856A RAID 4Si PCI 4-Channel Ultra2 SCSI Controller: Installation and Administration Guide > Chapter 4 Managing the HP RAID 4Si Product

Monitoring the HP RAID 4Si Product (irmd)


irmd, the Internal RAID Monitoring Daemon, monitors the HP RAID 4Si controllers and reports controller, logical drive, and physical drive state changes. For example, irmd reports when a controller's battery is low and when a physical disk drive fails. When irmd detects such a state change, it writes a corresponding message to the system log file (/var/adm/syslog/syslog.log). You can also configure irmd to send those messages to an e-mail address, the HP-UX system console, or both.

NOTE: Because the HP RAID 4Si product contains its own monitoring daemon (irmd), the Event Monitoring System (EMS) is not supported for RAID 4Si logical drives.

Configuring irmd

You configure irmd by changing the IRMD_OPTS variable in the /sbin/init.d/i2o_raid script. The format of the IRMD_OPTS variable is as follows:

-e email_address -c -p polling_interval

where

  • -e specifies that irmd is to send messages to email_address. If you do not want irmd to send messages to any e-mail address, delete -e email_address.

  • -c specifies that irmd is to send messages to the HP-UX system console (/dev/console). If you do not want irmd to send messages to the system console, delete -c.

  • -p specifies that irmd is to poll the HP RAID 4Si controllers every polling_interval seconds. The default is 5. If you do not need irmd to poll the controllers as often as every 5 seconds, set polling_interval to the number of seconds you want. If you delete -p polling_interval, irmd uses the default.

See the irmd(1M) man page for details.
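
As an illustration of the option format described above, the following Python sketch assembles an IRMD_OPTS value from the three documented options. The function name build_irmd_opts and the sample e-mail address are hypothetical and are not part of the product. The check that the polling interval is a positive number is also an assumption; see irmd(1M) for the values irmd actually accepts. You would still place the resulting string in /sbin/init.d/i2o_raid yourself.

```python
def build_irmd_opts(email_address=None, console=False, polling_interval=None):
    """Assemble an IRMD_OPTS value from the documented -e, -c, and -p options.

    Hypothetical helper for illustration only; it is not shipped with irmd.
    """
    parts = []
    if email_address:
        # -e: send messages to this e-mail address
        parts += ["-e", email_address]
    if console:
        # -c: also send messages to the HP-UX system console
        parts.append("-c")
    if polling_interval is not None:
        seconds = int(polling_interval)
        if seconds <= 0:
            # Assumption: a polling interval must be a positive number of seconds.
            raise ValueError("polling_interval must be a positive number of seconds")
        # -p: poll the controllers every polling_interval seconds
        parts += ["-p", str(seconds)]
    return " ".join(parts)


# Example: e-mail plus console, polling every 30 seconds.
print(build_irmd_opts("admin@example.com", console=True, polling_interval=30))
```

Omitting an argument corresponds to deleting that option from IRMD_OPTS, so calling the function with no polling_interval leaves irmd at its default.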

irmd Messages

This section lists the messages irmd generates. They are grouped as follows: controller, battery, logical drive, and physical drive and disk enclosure.
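
If you post-process the log, the four groups can be told apart by the leading text of each message. The Python sketch below is one possible way to do that. It assumes that you have already stripped any syslog prefix, so that the string starts with the irmd message text. The function name message_group is hypothetical, and the hardware path 0/4/0 in the example is a made-up value.

```python
import re

# Patterns are taken from the message texts listed in this section.
_GROUP_PATTERNS = [
    ("battery", re.compile(r"^Adapter .+: Battery ")),
    ("controller", re.compile(
        r"^(No Internal RAID adapters found"
        r"|Error opening device file"
        r"|ioctl error on device file"
        r"|Adapter .+ is (NOT )?RESPONDING"
        r"|Configuration Change Detected)")),
    ("logical drive", re.compile(r"^LDrv ")),
    ("physical drive and disk enclosure", re.compile(
        r"^(PDrv |Enclosure |Rebuild Progress Check Failed)")),
]


def message_group(text):
    """Return the group name for an irmd message, or None if unrecognized."""
    for name, pattern in _GROUP_PATTERNS:
        if pattern.search(text):
            return name
    return None


# Example with a made-up hardware path.
print(message_group("Adapter 0/4/0 is NOT RESPONDING"))
```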

Controller Messages

This section lists the irmd controller-related messages.



No Internal RAID adapters found, exiting

Cause: The HP RAID 4Si controller is not properly installed.

Action: Power down the system and reinstall the controller.



Error opening device file, exiting

Cause: /dev/iopiop_number does not exist or a driver problem has occurred.

Action: Check that /dev/iopiop_number exists and has the correct major number.



ioctl error on device file, exiting

Cause: Kernel misconfiguration or driver problem.

Action: Contact your Hewlett-Packard Support representative.



Adapter hardware_path is NOT RESPONDING

Cause: One minute has elapsed since the firmware was downloaded to the HP RAID 4Si controller, and the system has not been rebooted. This message is displayed only once. It is not repeated as a reminder while you wait to reboot the system.

Action: Wait for one irmd poll interval (as specified in the control script /sbin/init.d/i2o_raid). If the message Adapter hardware_path is RESPONDING does not appear, reboot the system.



Configuration Change Detected (device file)

Cause: The RAID configuration has been changed.

Action: No action is needed.

Battery Messages

This section lists the irmd battery-related messages.

TIP: The first action for each of the battery-related messages is "Disable the cache." As explained in each message's text, this means to run IRM and change the Write Policy of each logical drive to WRTHRU. This is necessary because (as explained in “Planning Your Configuration”) the battery is used to hold the data in the cache for up to 72 hours if a system power outage occurs. If an abnormal battery condition persists, the battery might not be able to power the cache memory, and the data might be lost. See “Changing a Logical Drive's Write Policy” for the steps for changing the Write Policy.

You can re-enable the cache (set each logical drive's Write Policy back to WRBACK) when the Adapter hardware_path: Battery is fully charged message appears (see the last battery-related message listed in this section).
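
The rule described in this tip can be summarized in a few lines of Python. This is only a sketch of the decision, not a tool that changes any setting: you still change the Write Policy with IRM. The function name recommended_write_policy is hypothetical, and the hardware path 0/4/0 in the example is a made-up value.

```python
def recommended_write_policy(message):
    """Return "WRTHRU", "WRBACK", or None for an irmd message.

    None means the message is not one of the battery-related messages.
    """
    if ": Battery " not in message:
        return None
    if "Battery is fully charged" in message:
        # The cache may be re-enabled, if desired.
        return "WRBACK"
    # Every other battery-related message calls for disabling the cache.
    return "WRTHRU"


# Example with a made-up hardware path.
print(recommended_write_policy("Adapter 0/4/0: Battery temperature high."))
```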



Adapter hardware_path: Battery not present. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: The battery pack is absent.

Action: Disable the cache and contact your Hewlett-Packard Support representative.



Adapter hardware_path: Battery is not plugged into the J22 socket. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: The battery is disconnected or malfunctioning.

Action: (1) Disable the cache or (2) power down the system and check the battery connection.



Adapter hardware_path: Battery temperature high. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: A high battery temperature has been detected.

Action: Disable the cache and reduce the ambient (surrounding) temperature.



Adapter hardware_path: Battery failed to charge. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: The battery fast charge failed.

Action: Disable the cache and contact your Hewlett-Packard Support representative.



Adapter hardware_path: Battery is charging. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: A low battery charge has been detected.

Action: Disable the cache and wait 3-5 hours for the charge to complete.



Adapter hardware_path: Battery has low voltage. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: The battery voltage is out of range.

Action: Disable the cache and contact your Hewlett-Packard Support representative.



Adapter hardware_path: Battery is in unknown state. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.

Cause: The battery is in an unknown state.

Action: Disable the cache and contact your Hewlett-Packard Support representative.



Adapter hardware_path: Battery is fully charged. It is safe to set the cache policy to WRBACK if desired. In order to do that, please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of the desired logical drives to WRBACK.

Cause: The battery's condition is good.

Action: Set the Write Policy of the logical drives to WRBACK, if desired (to increase the disk write performance). See “Changing a Logical Drive's Write Policy” for the steps for changing the Write Policy.

Logical Drive Messages

This section lists the irmd logical drive-related messages.

Note that a logical drive can be in one of these states:

  • OPTIMAL—The logical drive is in good condition. All of the configured physical disks are online.

  • DEGRADED—The logical drive is functioning but is not in optimal condition. One of the configured physical drives has failed.

  • OFFLINE—The logical drive is no longer functioning. One or more of the configured physical drives have failed.



LDrv logical_drive_number state at startup is DEGRADED

Cause: The logical drive's condition is not optimal. One of the configured physical drives has failed.

Action: Run IRM to find the failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drive.



LDrv logical_drive_number state at startup is OFFLINE

Cause: The logical drive is no longer functioning. One or more of the configured physical drives have failed.

Action: Run IRM to find each failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drives.



LDrv logical_drive_number State Change from previous_state to OPTIMAL

Cause: The logical drive's condition is good. All of the configured physical drives are online.

Action: No action is needed.



LDrv logical_drive_number State Change from previous_state to DEGRADED

Cause: The logical drive's condition is not optimal. One of the configured physical drives has failed.

Action: Run IRM to find the failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drive.



LDrv logical_drive_number State Change from previous_state to OFFLINE

Cause: The logical drive is no longer functioning. One or more of the configured physical drives have failed.

Action: Run IRM to find each failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drives.
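
If you scan the log with a script, the logical drive messages above can be reduced to a drive number, a previous state, a new state, and whether an action is needed. The following Python sketch shows one way to do this. It assumes that logical_drive_number appears as a decimal integer, which this guide does not state explicitly, and the function name parse_ldrv_message is hypothetical.

```python
import re

_LDRV_CHANGE = re.compile(r"LDrv (\d+) State Change from (\w+) to (\w+)")
_LDRV_STARTUP = re.compile(r"LDrv (\d+) state at startup is (\w+)")


def parse_ldrv_message(text):
    """Return (drive_number, previous_state, new_state, needs_action) or None.

    previous_state is None for the "state at startup" messages.
    """
    match = _LDRV_CHANGE.search(text)
    if match:
        number, previous, new = int(match.group(1)), match.group(2), match.group(3)
    else:
        match = _LDRV_STARTUP.search(text)
        if not match:
            return None
        number, previous, new = int(match.group(1)), None, match.group(2)
    # DEGRADED and OFFLINE both call for finding and replacing failed drives.
    return number, previous, new, new in ("DEGRADED", "OFFLINE")


print(parse_ldrv_message("LDrv 2 State Change from OPTIMAL to DEGRADED"))
```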

Physical Drive and Disk Enclosure Messages

This section lists the irmd messages related to physical drives and disk enclosures.

A physical drive can be in one of these states:

  • READY—The drive is functioning normally but is not part of a configured logical drive, and is not designated as a Hot Spare.

  • ONLINE—The drive is functioning normally and is part of a configured logical drive.

  • REBUILD—The drive is being rebuilt with data from a failed drive.

  • HOTSPARE—The drive is functioning normally and is designated as a Hot Spare drive, to be used if an online drive fails.

  • FAILED—A fault has occurred in the drive, placing the drive out of service.

In each of these messages, each physical device is specified as either PDrv x:y or Enclosure x:y, where x is the channel number (0 through 3) and y is the SCSI ID of the physical device.
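
As a sketch of how the x:y notation can be read by a script, the following Python code extracts the device kind, channel number, and SCSI ID from a message and checks that the channel is in the range 0 through 3. The names PhysicalDevice and parse_device are hypothetical. The sketch does not validate the SCSI ID, because this guide does not give its range.

```python
import re
from typing import NamedTuple, Optional


class PhysicalDevice(NamedTuple):
    kind: str      # "PDrv" or "Enclosure"
    channel: int   # x, 0 through 3
    scsi_id: int   # y, the SCSI ID of the physical device


_DEVICE = re.compile(r"\b(PDrv|Enclosure) (\d+):(\d+)\b")


def parse_device(text) -> Optional[PhysicalDevice]:
    """Return the device named in an irmd message, or None if there is none."""
    match = _DEVICE.search(text)
    if not match:
        return None
    channel = int(match.group(2))
    if not 0 <= channel <= 3:
        raise ValueError("channel number must be 0 through 3, got %d" % channel)
    return PhysicalDevice(match.group(1), channel, int(match.group(3)))


print(parse_device("PDrv 1:4 State Change from ONLINE to FAILED"))
```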

Figure 4-1 “Mapping of Controller Channels to irmd Channel Numbers” below shows how to determine which channel on the HP RAID 4Si controller corresponds to the channel number (x) in the irmd message text. Note that, on the controller, the channels are identified by the letters A, B, C, and D.

Figure 4-1 Mapping of Controller Channels to irmd Channel Numbers


To map a SCSI ID to a disk drive slot, see the SC10 documentation.



Enclosure x:y is in SAF-TE mode (unsupported)

Cause: The disk enclosure is in SAF-TE mode.

Action: See the SC10 documentation for information about how to put the enclosure in the supported (SES) mode.



PDrv x:y at startup is in REBUILD state

Cause: The physical drive is being rebuilt with data from a failed physical drive.

Action: No action is needed.



PDrv x:y at startup is in FAILED state

Cause: A fault has occurred in the physical drive, placing it out of service.

Action: Run IRM to confirm that the drive has failed. See “Determining Which Physical Drive Has Failed” for the steps for doing this. Once you have confirmed it has failed, replace the drive.



PDrv x:y State Change from previous_state to READY

Cause: The physical drive is functioning normally, but is not part of a configured logical drive and is not designated as a Hot Spare.

Action: No action is needed.



PDrv x:y State Change from previous_state to ONLINE

Cause: The physical drive is functioning normally and is part of a configured logical drive.

Action: No action is needed.



PDrv x:y State Change from previous_state to REBUILD

Cause: The physical drive is being rebuilt with data from a failed physical drive.

Action: No action is needed.



PDrv x:y Rebuild Completed

Cause: The physical drive's rebuild has completed successfully.

Action: No action is needed.



PDrv x:y Rebuild percentage% Complete

Cause: The physical drive's rebuild is in progress.

Action: No action is needed.
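
To follow a rebuild from a script, you can extract the percentage from the two rebuild messages above. The Python sketch below is one way to do so. It assumes that percentage is logged as a whole number, and it reports the Rebuild Completed message as 100 for convenience; both are assumptions made for this example. The function name rebuild_progress is hypothetical.

```python
import re

_PROGRESS = re.compile(r"PDrv (\d+):(\d+) Rebuild (\d+)% Complete")
_DONE = re.compile(r"PDrv (\d+):(\d+) Rebuild Completed")


def rebuild_progress(text):
    """Return (channel, scsi_id, percent) for a rebuild message, else None."""
    match = _PROGRESS.search(text)
    if match:
        return int(match.group(1)), int(match.group(2)), int(match.group(3))
    match = _DONE.search(text)
    if match:
        # Assumption: treat "Rebuild Completed" as 100 percent.
        return int(match.group(1)), int(match.group(2)), 100
    return None


print(rebuild_progress("PDrv 0:3 Rebuild 45% Complete"))
```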



Rebuild Progress Check Failed (device file)

Cause: The controller failed to respond to the rebuild progress check.

Action: Run IRM to check the rebuild's progress on the physical drives. See “Rebuilding a Failed Physical Drive” for more information.



PDrv x:y State Change from previous_state to HOTSPARE

Cause: The physical drive is powered up and ready for use as a Hot Spare in case an online physical drive fails.

Action: No action is needed.



PDrv x:y is no longer configured as HOTSPARE. It is now automatically configured as part of logical drive(s) and in REBUILD state.

Cause: The physical drive was configured as a Hot Spare, and is now being rebuilt as part of the logical drive(s).

Action: Manually configure another physical drive as a Hot Spare, if needed. See “Configuring a Physical Drive As a Hot Spare”.



PDrv x:y State Change from previous_state to FAILED

Cause: A fault has occurred in the physical drive, placing it out of service.

Action: Run IRM to confirm that the physical drive has failed. See “Determining Which Physical Drive Has Failed” for the steps for doing this. Once you have confirmed it has failed, replace the drive.



PDrv x:y SCSI Sense Data

Cause: A SCSI sense occurrence was detected, and sense data was retrieved from the physical drive.

Action: Analyze the SCSI sense data to determine if further action is needed.

© 2002, - Hewlett-Packard Development Company, L.P.