Event 3
- Severity: Information
- Summary: Array with S/N XXXXXXX no longer being monitored.
- Description of Event: This is generated whenever an array first becomes inaccessible to the ARMServer.
- Probable Cause / Recommended Action: Ensure that the named array is intended to be inaccessible.
Event 4
- Severity: Major Warning
- Summary: The computer can not connect to the ARMServer. The status on the disk array is not currently available.
- Description of Event: This event message is displayed when a connection cannot be made to the ARMServer. The resource monitor will continue to attempt to connect to the ARMServer at regular polling intervals. When the ARMServer is restored, the connection will succeed and normal operation will continue. Until the ARMServer is operating normally, it is not possible to obtain the current status of any of the disk arrays.
Possible causes include:
- The ARMServer is not running.
- The wrong version of the ARMServer is running.
- The correct ARMServer is running, but is hung.
- Probable Cause / Recommended Action: Start the ARMServer. If it is already running, stop it and restart it.
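The retry behavior described for this event (the resource monitor keeps trying to connect at regular polling intervals until the ARMServer is restored) can be sketched as a simple polling loop. This is a minimal illustration, not the resource monitor's actual code: `poll_until_connected`, the `connect` callable, `poll_interval` and `max_attempts` are all hypothetical names, and the real polling interval is not documented here.

```python
import time
from typing import Callable, Optional


def poll_until_connected(
    connect: Callable[[], bool],
    poll_interval: float,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call the hypothetical `connect` once per polling interval until it
    returns True (ARMServer reachable).

    Returns the number of attempts made. Raises TimeoutError if
    `max_attempts` is given and every attempt fails.
    """
    attempts = 0
    while True:
        attempts += 1
        if connect():
            # ARMServer reachable again: status reporting can resume.
            return attempts
        # Still disconnected: disk array status remains unavailable (Event 4).
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"ARMServer unreachable after {attempts} attempts")
        sleep(poll_interval)
```

Injecting `sleep` keeps the loop testable without real delays; leaving `max_attempts` unset mirrors the described behavior of retrying indefinitely.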
Event 5
- Severity: Major Warning
- Summary: The controller in slot X is not accessible.
- Description of Event: A check of the access path to the indicated controller has failed. The firmware does not report any trouble with the indicated controller.
Possible causes include:
- SCSI Cable
- Host Bus Adapter
- Undetected failure of the indicated controller.
- Probable Cause / Recommended Action: Check the path to the indicated controller.
Event 6
- Severity: Serious
- Summary: The controller in slot Y has failed, or one of the SIMMs in the controller in slot Y has failed.
- Description of Event: This event message is displayed under one of several conditions:
- One of the controllers in a two-controller configuration has failed. Possible causes are any number of hardware and/or firmware related malfunctions.
- Cross-controller communication is broken: each controller seems able to operate, yet certain communication attempts between the controllers have failed. The controller in slot Y has been disabled, and the controller in slot X is the only active controller. By firmware-defined convention, this event message means that the controller in slot X (index value 0) is active and the controller in slot Y (index value 1) is not active.
- An anomaly occurred while manipulating mirrored/shared memory. The firmware in the controller in slot X assumes control of the disk array and abandons further attempts to communicate with the controller in slot Y (the firmware in the controller in slot Y shuts controller Y down). This situation differs from a complete failure of one controller: a controller that fails completely can be recognized as a controller failure, and the normal dual-controller failover logic takes care of the situation.
- Probable Cause / Recommended Action: Replace the indicated controller.
Event 8
- Severity: Major Warning
- Summary: Secondary controller revision is different than primary controller revision.
- Description of Event: This event message is displayed when a secondary controller reports a different firmware revision than the primary controller. The primary controller operates independently of the secondary controller. This error can only be fixed by installing controllers with matching firmware, or by downloading (copying) firmware from one controller to the other (or downloading firmware to both controllers from the host).
- Probable Cause / Recommended Action: Replace the secondary controller with a controller that matches the primary controller's firmware.
Event 9
- Severity: Minor Warning
- Summary: The disk array is configured for dual controllers, but only one controller is present.
- Description of Event: This event message is displayed when the array has a single controller present. Because of this, mapping information is exposed to a single point of failure. Follow the procedure in the HP AutoRAID Disk Array Model 12H Users and Service Manual (HP Part Number C5445-90901) for installing a second controller. Otherwise, to disable this warning, type: arraymgr -J SingleController
- Probable Cause / Recommended Action: Install a second controller.
Event 10
- Severity: Major Warning
- Summary: Both controllers do not have the same RAM image. There is a loss of NVRAM redundancy.
- Description of Event: This event message is displayed when the mirroring hardware is unable to keep the two RAM images synchronized across the two controllers.
Possible causes include:
- A controller has failed, but no obvious failure mode is evident.
- A controller is not communicating at all.
- Probable Cause / Recommended Action: Investigate the hardware log and use other warning conditions as a guide to the problem. You may need to replace the controller.
Event 11
- Severity: Serious
- Summary: Disk in slot XX has failed.
- Description of Event: This event message is displayed when the controller has determined that the indicated drive is included, but is not sufficiently functional to be used. The indicated drive could also have failed during initialization.
- Probable Cause / Recommended Action: Replace the disk in the indicated slot.
Event 12
- Severity: Major Warning
- Summary: SMART event for drive in slot XX.
- Description of Event: This event message is displayed when the indicated drive emits a SMART event. This indicates that the drive may fail and should be replaced.
- Probable Cause / Recommended Action: Replace the drive in the indicated slot.
Event 13
- Severity: Minor Warning
- Summary: Disk in slot XX belongs to a different disk set.
- Description of Event: This event message is displayed when the indicated drive contains information that was previously used by another disk array controller. This can happen if:
- A user removed the drive from another disk array and inserted it into the disk array reporting the event.
- One or more installed disks contain a disk format that is incompatible with this disk array.
Either format the disk array to erase the previous disk format (erasing ALL user data) or remove the disk(s) with the incompatible format.
- Probable Cause / Recommended Action: Include the disk (using the System Administration Manager), or replace the disk with a new disk with the correct format.
Event 14
- Severity: Serious
- Summary: Fan in slot XX is missing or has failed.
- Description of Event: This event message is displayed when the indicated fan has failed or has been removed. If two or more fans fail or are removed, the disk array shuts down immediately. If one fan fails or is removed, the disk array shuts down within 15 minutes.
Possible causes include:
- Fan motor or bearing wearing out.
- A foreign object stuck in the fan blade, a fan door stuck closed, and so on.
- Probable Cause / Recommended Action: Replace or install the fan in the indicated slot.
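The shutdown rule above depends only on how many fans have failed or are missing, so it fits in a small lookup function. This is a sketch of the documented behavior, not firmware logic; the function name and the convention of returning minutes (with `None` meaning no fan-related shutdown) are assumptions made for illustration.

```python
from typing import Optional


def fan_shutdown_deadline_minutes(failed_or_missing_fans: int) -> Optional[float]:
    """Hypothetical helper: shutdown behavior per the Event 14 description.

    None -> all fans present and working, no fan-related shutdown
    15.0 -> one fan failed or removed: shutdown within (at most) 15 minutes
    0.0  -> two or more fans failed or removed: immediate shutdown
    """
    if failed_or_missing_fans < 0:
        raise ValueError("fan count cannot be negative")
    if failed_or_missing_fans == 0:
        return None
    if failed_or_missing_fans == 1:
        return 15.0
    return 0.0
```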
Event 15
- Severity: Serious
- Summary: Power supply in slot XX is missing or has failed.
- Description of Event: This event message is displayed when the indicated power supply has failed or has been removed. If two or more power supplies fail, or are removed, the disk array shuts down immediately.
Possible causes include:
- A power supply component failure.
- An external circuit problem.
- Probable Cause / Recommended Action: Replace or install the power supply in the indicated slot.
Event 16
- Severity: Major Warning
- Summary: One or more of the batteries in the Controller in slot X have failed.
- Description of Event: This event message is displayed when one of the battery packs in the indicated controller does not hold enough charge to maintain NVRAM. Each controller contains two battery packs. If neither battery pack can provide the minimum protection for the NVRAM, the controller refuses to boot. Without NVRAM protection, all user data may be lost. When a fully charged battery pack enters the warning state, it may last only about an hour or less if called upon to protect NVRAM.
Possible causes include:
- Power has never been applied, and the batteries need to be charged.
- Probable Cause / Recommended Action: Replace both batteries (replace them one at a time, to preserve NVRAM) in the indicated controller.
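The relationship between the two battery packs and the controller's behavior can be summarized as a three-way classification. The sketch below assumes each pack is reported simply as able or unable to protect NVRAM; `BatteryState` and `controller_battery_state` are hypothetical names, and the actual charge thresholds used by the firmware are not described here.

```python
from enum import Enum
from typing import Sequence


class BatteryState(Enum):
    OK = "ok"
    WARNING = "warning"   # one pack cannot protect NVRAM: Event 16 is raised
    NO_BOOT = "no_boot"   # neither pack can protect NVRAM: controller refuses to boot


def controller_battery_state(pack_ok: Sequence[bool]) -> BatteryState:
    """Classify a controller's two battery packs.

    pack_ok[i] is True if pack i holds enough charge to maintain NVRAM.
    """
    if len(pack_ok) != 2:
        raise ValueError("each controller contains exactly two battery packs")
    good_packs = sum(1 for ok in pack_ok if ok)
    if good_packs == 2:
        return BatteryState.OK
    if good_packs == 1:
        return BatteryState.WARNING
    return BatteryState.NO_BOOT
```

This also shows why the batteries are replaced one at a time: while one pack is out, the other keeps the controller out of the `NO_BOOT` state.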
Event 17
- Severity: Serious
- Summary: Firmware revisions on the two controllers do not match.
- Description of Event: This event message is displayed under one of several conditions:
- The secondary controller has a firmware revision that does not match the primary controller. The primary controller operates independently of the secondary controller. This error can be fixed by downloading firmware from the host through the other controller (a download from the host updates both controllers).
- The disk array has lost microcode. This state is entered when the FLASH ROM is erased in preparation for a firmware download, and persists until the disk array is reset following a successful firmware download. A possible cause is that controllers with different firmware revisions have been installed in the disk array.
- Probable Cause / Recommended Action: Copy firmware from primary to secondary controller, or download newer firmware from the host to both controllers.
Event 18
- Severity: Critical
- Summary: Less than half of the Disks are present.
- Description of Event: This event message is displayed when the Part field (in the disk array parameters mode page) is clear, and not enough disks are available to continue without partitioning the volume set. Continued read/write access when no quorum is available may lead to partitioning of the volume set (a condition where the volume set that existed on one controller now exists on two controllers). A partitioned volume set may diverge in its content.
- Probable Cause / Recommended Action: Install the missing disks, or format the array.
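The quorum condition can be written as a single predicate. This is a sketch under stated assumptions: it takes "less than half of the disks" literally as a strict comparison and treats the Part field as a boolean; the firmware's exact handling of edge cases (for example, exactly half of the disks present) is not documented here, and `quorum_lost` is a hypothetical name.

```python
def quorum_lost(disks_present: int, disks_in_set: int, part_field_set: bool) -> bool:
    """Hypothetical Event 18 check: the Part field is clear and fewer than
    half of the disks in the set are present."""
    if disks_in_set <= 0 or not 0 <= disks_present <= disks_in_set:
        raise ValueError("invalid disk counts")
    # Compare 2 * present < total in integers so odd set sizes need no rounding.
    return not part_field_set and 2 * disks_present < disks_in_set
```

For a 12-disk set with the Part field clear, 5 present disks trips the condition, while 6 does not.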
Event 19
- Severity: Critical
- Summary: Some user data is unavailable.
- Description of Event: This event message is displayed when the disk array is not able to provide some user data in response to host requests. This may be due to the failure or removal of so many disk drives that it is impossible to access or reconstruct user data. The 'Data Unavailability' condition takes into account disk drives only (not other components such as controllers or fans). Thus, the failure of all fans or all controllers does NOT cause 'Data Unavailability'. The only possible cause is that more than one disk from the volume set is inaccessible.
- Probable Cause / Recommended Action: Replace the drives and restore the data from a backup.
Event 20
- Severity: Critical
- Summary: Internal Metadata tables not available.
- Description of Event: This event message is displayed when the disk array lacks internal metadata critical to its continued operation. The data structures necessary to perform host data storage commands have been lost or were never created. When the disk array reaches this state, recovering data is difficult with firmware revisions prior to HP40 (ignoring the case where disks were simply removed and are reinserted). With firmware prior to HP40, recovery of data can be attempted (with some success) using disk pass-through commands. With HP40 and later firmware, use the arrayrecover utility to restore the metadata.
- Probable Cause / Recommended Action: Correct the bad configuration by using the Recover utility, or format the array.
Event 21
- Severity: Major Warning
- Summary: Data Redundancy has been lost.
- Description of Event: This event message is displayed when there is not sufficient information available to reconstruct user data should another disk fail or be removed from the volume set. 'Data Redundancy Loss' represents a degraded state, whereas 'Data Unavailability' represents a total inability to access user data. The 'Data Redundancy Loss' condition takes into account disk drives only, and not other components such as controllers or fans. Thus, the failure of just one fan or controller in a configuration with redundant components does NOT cause 'Data Redundancy Loss'. The only cause is that one disk from the volume set is inaccessible.
- Probable Cause / Recommended Action: Perform a rebuild on the disk array. You may need to add more disks.
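Events 19 and 21 differ only in how many disks of the volume set are inaccessible, and both ignore fans and controllers. The sketch below encodes that distinction; it assumes no rebuild has completed between disk failures (a completed rebuild restores redundancy), and `VolumeDataState` and `volume_data_state` are hypothetical names.

```python
from enum import Enum


class VolumeDataState(Enum):
    NORMAL = "normal"
    REDUNDANCY_LOST = "Data Redundancy Loss"  # Event 21
    UNAVAILABLE = "Data Unavailability"       # Event 19


def volume_data_state(inaccessible_disks: int) -> VolumeDataState:
    """Classify a volume set by its inaccessible disk drives only.

    Fans and controllers are deliberately not inputs, since neither
    condition takes them into account.
    """
    if inaccessible_disks < 0:
        raise ValueError("disk count cannot be negative")
    if inaccessible_disks == 0:
        return VolumeDataState.NORMAL
    if inaccessible_disks == 1:
        return VolumeDataState.REDUNDANCY_LOST
    return VolumeDataState.UNAVAILABLE
```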
Event 22
- Severity: Major Warning
- Summary: Active Hot Spare request can not be accomplished. Not enough physical space to provide active hot spare.
- Description of Event: This event message is displayed when the available disk space is not sufficient to rebuild data redundancy if the largest surviving disk drive in the volume fails. This event can occur only if the active hot spare option (reserving enough space to ensure that a rebuild operation will not fail) is set.
Possible causes include:
- Any 'down' or missing disk may have removed too much available space, preventing any attempt at a rebuild operation.
- Too much available disk space was allocated to SCSI LUNs before the active spare reservation was requested.
- Probable Cause / Recommended Action: Add more disks to the array to increase capacity.
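A simplified version of the space check behind this event is sketched below. It makes two conservative assumptions that the event text does not spell out: the full capacity of the largest surviving disk must be re-created on the other disks, and any free space on that disk is lost when it fails. The real firmware accounting (which depends on how data is laid out across the array) is more involved; `active_hot_spare_ok` is a hypothetical name.

```python
from typing import Sequence, Tuple


def active_hot_spare_ok(disks: Sequence[Tuple[float, float]]) -> bool:
    """Simplified Event 22 test.

    `disks` holds (capacity, free_space) for each surviving disk, in any one
    unit. Returns True if the free space on the other disks could absorb the
    full capacity of the largest disk.
    """
    if not disks:
        return False
    largest_capacity, largest_free = max(disks, key=lambda d: d[0])
    free_on_others = sum(free for _, free in disks) - largest_free
    return free_on_others >= largest_capacity
```

Under this model both listed causes lower the result: a down or missing disk drops out of `disks`, and space allocated to SCSI LUNs reduces each disk's free space.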
Event 23
- Severity: Major Warning
- Summary: Rebuild process terminated abnormally.
- Description of Event: This event message is displayed when the process of rebuilding redundant information for protection of user data could not be completed.
Possible causes include:
- Restore process encountered a 'Data Unavailability' condition.
- Insufficient disk array capacity.
- Probable Cause / Recommended Action: Investigate the error messages in the hardware log to determine the problem.
Event 24
- Severity: Serious
- Summary: A map recovery operation failed.
- Description of Event: This event message is displayed when a recovery operation is started, but does not complete successfully.
Possible causes include:
- Power failure during recovery.
- Reset during recovery.
- Corrupt recovery data.
- Failure of parity scan phase of recovery.
- Removal of power supply, fan, disks or controller during recovery.
- Probable Cause / Recommended Action: Examine the hardware logs. If the message is due to an interruption, then re-issue the Recover command.
Event 25
- Severity: Major Warning
- Summary: Controller event logs are full.
- Description of Event: This event message is displayed when one of the disk array log pages has filled or a log parameter has reached its maximum value. In this state, important diagnostic data from the firmware is no longer logged.
- Probable Cause / Recommended Action: Events are being added to the hardware logs too quickly, which indicates a problem with the disk array. Use the READLOG utility to examine the logs for intermittent failures.
Event 26
- Severity: Major Warning
- Summary: Both Recoverable Mode is set to HIGH and Write Cache Enable are TRUE. High Recoverable Mode requires Write Cache Enable to be FALSE to be effective.
- Description of Event: This event message is displayed when the resiliency parameters are set to a high recoverability mode and the Write Cache Enable mode bit is set to TRUE. To obtain high resiliency to data loss, Write Cache Enable must be set to FALSE. The method for changing this parameter depends on the operating system and/or SCSI driver. Possible causes include a user request (using a client program to change resiliency parameters) or the operating system and/or SCSI driver setting Write Cache Enable.
- Probable Cause / Recommended Action: Use the specific methods for your operating system to set Write Cache Enable to FALSE for the disk array, or set Recoverable Mode to NORMAL.
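The conflicting combination can be expressed as a one-line configuration check, which a client could run before applying resiliency settings. This is an illustrative sketch only: representing Recoverable Mode as the strings "HIGH" or "NORMAL" and Write Cache Enable as a boolean is an assumption, and `resiliency_conflict` is a hypothetical name, not part of any HP utility.

```python
def resiliency_conflict(recoverable_mode: str, write_cache_enable: bool) -> bool:
    """True for the Event 26 combination: Recoverable Mode HIGH while
    Write Cache Enable is TRUE (either change clears the conflict)."""
    return recoverable_mode.strip().upper() == "HIGH" and write_cache_enable
```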
Event 100
- Severity: Information
- Event Summary: Disk at hardware path x/x.x.x : Device added to monitoring
- Event Class: I/O
- Problem Description:
The device has been added to the set of devices being monitored by this monitor.
- Probable Cause / Recommended Action:
The device was added to the system, or has started responding to the system.
The /var/stm/data/os_decode_xref file was modified to add information indicating that this device is now supported by this monitor.
If the device is in the DOWN state and this monitor does not control setting the state of the device to UP (check by running the "/etc/opt/resmon/lbin/set_fixed -L" command), use the "/etc/opt/resmon/lbin/set_fixed -n <RESOURCE_NAME>" command to set the state of the device to UP.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Event 101
- Severity: Critical
- Event Summary: Disk at hardware path x/xx/x.x.x : Device removed from monitoring
- Event Class: I/O
- Problem Description:
The device has been removed from the list of devices being monitored by this monitor.
- Probable Cause / Recommended Action:
The device was removed from the system, has stopped responding to the system, or has been replaced with a device that is not supported by this monitor.
Run ioscan to determine the state and type of the device.
Check the /var/stm/data/os_decode_xref file for information indicating which devices are supported by this monitor.
Check other monitors to determine if they are now monitoring the device by running /etc/opt/resmon/lbin/monconfig and using the "Check monitoring" command.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Event 103
- Severity: Information
- Event Summary: Disk at hardware path x/xx/x.x.x : Test event
- Event Class: Memory
- Problem Description:
This is a test message from the monitor to test the communication path from the monitor to the notification mechanism.
- Probable Cause / Recommended Action:
No action required.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Events 100000-1nnnnn
Events 100000-1nnnnn are default SCSI Events. All are asynchronous. For a list of these events, see:
- New List of SCSI Events (June 2000 and later versions of diagnostics)
- Old List of SCSI Events (Feb 1999 - March 2000 versions of diagnostics)