irmd, the Internal RAID Monitoring Daemon, monitors the HP RAID 4Si controllers and reports controller, logical drive, and physical drive state changes. Two examples of state changes that irmd reports are a low controller battery and a failed physical disk drive. When irmd detects such a state change, it writes a corresponding message to the system log file (/var/adm/syslog/syslog.log). You can configure irmd to also send those messages to an e-mail address, the HP-UX system console, or both.

NOTE: Because the HP RAID 4Si product contains its own monitoring daemon (irmd), the Event Monitoring System (EMS) is not supported for RAID 4Si logical drives.

Configuring irmd

You configure irmd by changing the IRMD_OPTS variable in the /sbin/init.d/i2o_raid script. The format of the IRMD_OPTS variable is as follows:

-e email_address -c -p polling_interval

where:

-e email_address
    Specifies that irmd is to send messages to email_address. If you do not want irmd to send messages to any e-mail address, delete -e email_address.
-c
    Specifies that irmd is to send messages to the HP-UX system console (/dev/console). If you do not want irmd to send messages to the system console, delete -c.
-p polling_interval
    Specifies that irmd is to poll the HP RAID 4Si controllers every polling_interval seconds. The default is 5. If you do not need irmd to poll the controllers as often as every 5 seconds, change polling_interval to the number of seconds you want. If you delete -p polling_interval, irmd polls the controllers every 5 seconds (the default).
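
As an illustration, the following sketch shows what an IRMD_OPTS setting might look like. The e-mail address and the 30-second interval are placeholder values, and the quoting assumes that the script sets the variable with an ordinary shell assignment; check /sbin/init.d/i2o_raid on your system for the exact form it uses.

```shell
# Hypothetical setting: mail messages to root@example.com (placeholder
# address), copy them to the system console, and poll every 30 seconds.
IRMD_OPTS="-e root@example.com -c -p 30"

# Alternative (commented out): console only, default 5-second polling.
# IRMD_OPTS="-c"
```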
See the irmd(1M) man page for details.

irmd Messages

This section lists the messages irmd generates. They are grouped as follows: controller, battery, logical drive, and physical drive and disk enclosure.

Controller Messages

This section lists the irmd controller-related messages.

No Internal RAID adapters found, exiting
    Cause: The HP RAID 4Si controller is not properly installed.
    Action: Power down the system and reinstall the controller.
Error opening device file, exiting
    Cause: /dev/iopiop_number does not exist or a driver problem has occurred.
    Action: Check whether /dev/iopiop_number exists and has the correct major number.
ioctl error on device file, exiting
    Cause: Kernel misconfiguration or driver problem.
    Action: Contact your Hewlett-Packard Support representative.
Adapter hardware_path is NOT RESPONDING
    Cause: One minute has elapsed since the firmware was downloaded to the HP RAID 4Si controller, and the system has not been rebooted. This message is displayed only once; it is not repeated as a reminder until the system is rebooted.
    Action: Wait for one irmd poll interval (as specified in the control script /sbin/init.d/i2o_raid). If the message Adapter hardware_path is RESPONDING does not appear, reboot the system.
Configuration Change Detected (device file)
    Cause: The RAID configuration has been changed.
    Action: No action is needed.

Battery Messages

This section lists the irmd battery-related messages.

Adapter hardware_path: Battery not present. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: The battery pack is absent.
    Action: Disable the cache and contact your Hewlett-Packard Support representative.
Adapter hardware_path: Battery is not plugged into the J22 socket. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: The battery is disconnected or malfunctioning.
    Action: (1) Disable the cache or (2) power down the system and check the battery connection.
Adapter hardware_path: Battery temperature high. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: A high battery temperature has been detected.
    Action: Disable the cache and reduce the ambient (surrounding) temperature.
Adapter hardware_path: Battery failed to charge. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: The battery fast charge failed.
    Action: Disable the cache and contact your Hewlett-Packard Support representative.
Adapter hardware_path: Battery is charging. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: A low battery charge has been detected.
    Action: Disable the cache and wait 3 to 5 hours for the charge to complete.
Adapter hardware_path: Battery has low voltage. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: The battery voltage is out of range.
    Action: Disable the cache and contact your Hewlett-Packard Support representative.
Adapter hardware_path: Battery is in unknown state. Please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of all logical drives to WRTHRU to prevent data loss in case of system power outage.
    Cause: The battery is in an unknown state.
    Action: Disable the cache and contact your Hewlett-Packard Support representative.
Adapter hardware_path: Battery is fully charged. It is safe to set the cache policy to WRBACK if desired. In order to do that, please run irm. Select the RAID adapter at /dev/iopiop_number and change the cache policy of the desired logical drives to WRBACK.
    Cause: The battery's condition is good.
    Action: Set the Write Policy of the logical drives to WRBACK, if desired, to increase disk write performance. See “Changing a Logical Drive's Write Policy” for the steps.
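
The battery messages above fall into two groups: every fault or charging message asks for the WRTHRU cache policy, and only the fully charged message allows WRBACK. The following is a minimal sketch of that classification, not part of irmd. The function name recommended_policy is hypothetical, and the sketch assumes the message text is passed as a single argument.

```shell
# Print the cache policy that an irmd battery message recommends.
# The matched phrases are taken from the message texts listed above.
recommended_policy() {
    case "$1" in
        *"Battery is fully charged"*) echo "WRBACK" ;;   # safe to enable write-back
        *"Battery"*"WRTHRU"*)         echo "WRTHRU" ;;   # any battery fault or charging state
        *)                            echo "UNKNOWN" ;;  # not a battery message
    esac
}
```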

Logical Drive Messages

This section lists the irmd logical drive-related messages. A logical drive can be in one of these states:

OPTIMAL
    The logical drive is in good condition. All of the configured physical disks are online.
DEGRADED
    The logical drive is functioning but is not in optimal condition. One of the configured physical drives has failed.
OFFLINE
    The logical drive is no longer functioning. One or more of the configured physical drives have failed.
LDrv logical_drive_number state at startup is DEGRADED
    Cause: The logical drive's condition is not optimal. One of the configured physical drives has failed.
    Action: Run IRM to find the failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drive.
LDrv logical_drive_number state at startup is OFFLINE
    Cause: The logical drive is no longer functioning. One or more of the configured physical drives have failed.
    Action: Run IRM to find each failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drives.
LDrv logical_drive_number State Change from previous_state to OPTIMAL
    Cause: The logical drive's condition is good. All of the configured physical drives are online.
    Action: No action is needed.
LDrv logical_drive_number State Change from previous_state to DEGRADED
    Cause: The logical drive's condition is not optimal. One of the configured physical drives has failed.
    Action: Run IRM to find the failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drive.
LDrv logical_drive_number State Change from previous_state to OFFLINE
    Cause: The logical drive is no longer functioning. One or more of the configured physical drives have failed.
    Action: Run IRM to find each failed physical drive, then replace it with a drive in good condition. See “Determining Which Physical Drive Has Failed” for the steps for finding the failed drives.
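
In every logical drive message listed above, the last word is the drive's new state, and the state determines the action. The following is a minimal sketch of that lookup, not part of irmd. The function name ldrv_action is hypothetical, and the sketch assumes the message text is passed as a single argument with no trailing text after the state.

```shell
# Print the new state of a logical drive and the action the text above
# prescribes for it. The state is the last word of the message.
ldrv_action() {
    state=${1##* }    # strip everything up to the last space
    case "$state" in
        OPTIMAL)  echo "OPTIMAL: no action is needed" ;;
        DEGRADED) echo "DEGRADED: run IRM to find the failed physical drive, then replace it" ;;
        OFFLINE)  echo "OFFLINE: run IRM to find each failed physical drive, then replace it" ;;
        *)        echo "unrecognized state: $state" >&2; return 1 ;;
    esac
}
```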

Physical Drive and Disk Enclosure Messages

This section lists the irmd messages related to physical drives and disk enclosures. A physical drive can be in one of these states:

READY
    The drive is functioning normally but is not part of a configured logical drive, and is not designated as a Hot Spare.
ONLINE
    The drive is functioning normally and is part of a configured logical drive.
REBUILD
    The drive is being rebuilt with data from a failed drive.
HOTSPARE
    The drive is functioning normally and is designated as a Hot Spare drive, to be used if an online drive fails.
FAILED
    A fault has occurred in the drive, placing the drive out of service.

In each of these messages, a physical device is specified as either PDrv x:y or Enclosure x:y, where x is the channel number (0 through 3) and y is the SCSI ID of the physical device. Figure 4-1 “Mapping of Controller Channels to irmd Channel Numbers” shows how to determine which channel on the HP RAID 4Si controller corresponds to the channel number (x) in the irmd message text. Note that, on the controller, the channels are identified by the letters A, B, C, and D.
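
The PDrv x:y and Enclosure x:y notation can be taken apart with ordinary shell parameter expansion. The following is a minimal sketch, not part of irmd: parse_device is a hypothetical helper name, and it assumes the reference is passed exactly in the form PDrv x:y or Enclosure x:y. It does not translate the channel number into the controller's A through D labels, because that mapping is given only by Figure 4-1.

```shell
# Split a device reference such as "PDrv 2:5" into its kind, channel
# number, and SCSI ID. Rejects channel numbers outside 0 through 3.
parse_device() {
    kind=${1%% *}          # "PDrv" or "Enclosure"
    addr=${1#* }           # "x:y"
    channel=${addr%%:*}    # x, the channel number
    scsi_id=${addr#*:}     # y, the SCSI ID
    case "$channel" in
        [0-3]) echo "$kind channel=$channel scsi_id=$scsi_id" ;;
        *)     echo "invalid channel: $channel" >&2; return 1 ;;
    esac
}
```

For example, parse_device "PDrv 2:5" prints PDrv channel=2 scsi_id=5.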
To map a SCSI ID to a disk drive slot, see the SC10 documentation.

Enclosure x:y is in SAF-TE mode (unsupported)
    Cause: The disk enclosure is in SAF-TE mode.
    Action: See the SC10 documentation for information about how to put the enclosure in the supported (SES) mode.
PDrv x:y at startup is in REBUILD state
    Cause: The physical drive is being rebuilt with data from a failed physical drive.
    Action: No action is needed.
PDrv x:y at startup is in FAILED state
    Cause: A fault has occurred in the physical drive, placing it out of service.
    Action: Run IRM to confirm that the drive has failed. See “Determining Which Physical Drive Has Failed” for the steps. Once you have confirmed that it has failed, replace the drive.
PDrv x:y State Change from previous_state to READY
    Cause: The physical drive is functioning normally, but is not part of a configured logical drive and is not designated as a Hot Spare.
    Action: No action is needed.
PDrv x:y State Change from previous_state to ONLINE
    Cause: The physical drive is functioning normally and is part of a configured logical drive.
    Action: No action is needed.
PDrv x:y State Change from previous_state to REBUILD
    Cause: The physical drive is being rebuilt with data from a failed physical drive.
    Action: No action is needed.
PDrv x:y Rebuild Completed
    Cause: The physical drive's rebuild has completed successfully.
    Action: No action is needed.
PDrv x:y Rebuild percentage% Complete
    Cause: The physical drive's rebuild is in progress.
    Action: No action is needed.
Rebuild Progress Check Failed (device file)
    Cause: The controller failed to respond to the rebuild progress check.
    Action: Run IRM to check the rebuild's progress on the physical drives. See “Rebuilding a Failed Physical Drive” for more information.
PDrv x:y State Change from previous_state to HOTSPARE
    Cause: The physical drive is powered up and ready for use as a Hot Spare in case an online physical drive fails.
    Action: No action is needed.
PDrv x:y is no longer configured as HOTSPARE. It is now automatically configured as part of logical drive(s) and in REBUILD state.
    Cause: The physical drive was configured as a Hot Spare, and is now being rebuilt as part of the logical drive(s).
    Action: Manually configure another physical drive as a Hot Spare, if needed. See “Configuring a Physical Drive As a Hot Spare”.
PDrv x:y State Change from previous_state to FAILED
    Cause: A fault has occurred in the physical drive, placing it out of service.
    Action: Run IRM to confirm that the physical drive has failed. See “Determining Which Physical Drive Has Failed” for the steps. Once you have confirmed that it has failed, replace the drive.
PDrv x:y SCSI Sense Data
    Cause: A SCSI sense occurrence was detected, and sense data was retrieved from the physical drive.
    Action: Analyze the SCSI sense data to determine whether further action is needed.
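
Because irmd writes its messages to the system log file, the two FAILED messages listed above can be searched for there. The following is a minimal sketch, not part of irmd. The function name failed_drives is hypothetical, and the sketch assumes that each log line contains the message text verbatim somewhere after whatever prefix the system logger adds.

```shell
# List the x:y address of every physical drive that a log file reports
# as FAILED, either at startup or through a state change. Each address
# is printed once.
failed_drives() {
    grep -E 'PDrv [0-3]:[0-9]+ (State Change from [A-Z]+ to FAILED|at startup is in FAILED state)' "$1" |
        sed 's/.*PDrv \([0-3]:[0-9][0-9]*\) .*/\1/' |
        sort -u
}
```

A typical call would be failed_drives /var/adm/syslog/syslog.log. Each address reported this way still needs to be confirmed in IRM before the drive is replaced, as the Action entries above describe.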