This document covers different versions of the Hardware Monitors:
- Events 1-3: Releases from Feb 99 (IPR 9902) to March 00 (IPR 0003)
- Events 1000 and up: Releases from June 00 (IPR 0006) forward
Note: An exclamation point (!) in the text is replaced by a value in the text actually displayed in an event message. For example:
"The Page Deallocation Table (PDT) is !% full"
might actually appear in an event message as:
"The Page Deallocation Table (PDT) is 70% full."
Event 1
- Severity: Major Warning
- Description: The Page De-allocation Table (PDT) is XX% full.
- Cause: The operating system was not able to disable one or more pages, leaving them in pending status. Because these pages were not disabled, errors will continue to be reported from them.
- Action: Run logtool for detailed information and look for any pages that have pending status. The system must be rebooted to disable pages in pending status.
Event 2
- Severity: Serious
- Description: The Page De-allocation Table (PDT) is XX% full.
- Cause: The operating system was not able to disable one or more pages, leaving them in pending status. Because these pages were not disabled, errors will continue to be reported from them.
- Action: Run logtool for detailed information and look for any pages that have pending status. The system must be rebooted to disable pages in pending status.
Event 3
- Severity: Critical
- Description: The Page De-allocation Table (PDT) is XX% full.
- Cause: The operating system was not able to disable one or more pages, leaving them in pending status. Because these pages were not disabled, errors will continue to be reported from them.
- Action: Run logtool for detailed information and look for any pages that have pending status. The system must be rebooted to disable pages in pending status.
Event 100
- Severity: Information
- Event Summary: Device added for monitoring
- Event Class: Memory
- Problem Description: The device has been added to the set of devices being monitored by this monitor.
- Cause / Action: The device was added to the system or has started responding to the system. The /var/stm/data/os_decode_xref file was modified to add information indicating that this device is now supported by this monitor.
If the device is in the DOWN state and this monitor does not control setting the state of the device to UP (check by running the "/etc/opt/resmon/lbin/set_fixed -L" command), use the "/etc/opt/resmon/lbin/set_fixed -n <resource_name>" command to set the state of the device to UP.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
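The recovery step for Event 100 uses the two set_fixed invocations quoted above: `-L` to check whether this monitor controls the device state, and `-n <resource_name>` to set the device to UP. The sketch below only assembles those command lines and wraps `subprocess.run`; the helper names are hypothetical, no other set_fixed options are assumed, and deciding whether to run the second command still requires reading the output of the first.

```python
from __future__ import annotations

import subprocess

# Path and flags are the ones quoted in Event 100; the helper names are hypothetical.
SET_FIXED = "/etc/opt/resmon/lbin/set_fixed"


def list_controlled_resources_cmd() -> list[str]:
    """Command that shows which device states this monitor controls ("set_fixed -L")."""
    return [SET_FIXED, "-L"]


def set_resource_up_cmd(resource_name: str) -> list[str]:
    """Command that sets the named resource to UP ("set_fixed -n <resource_name>")."""
    if not resource_name:
        raise ValueError("resource_name must be non-empty")
    return [SET_FIXED, "-n", resource_name]


def run(cmd: list[str]) -> str:
    """Run a command and return its standard output; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

On an affected system, an administrator would first inspect `run(list_controlled_resources_cmd())` and only then, if appropriate, call `run(set_resource_up_cmd(...))` with the actual resource name.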
Event 101
- Severity: Critical
- Event Summary: XXXX at hardware path x/xx/x.x.x : Device removed from monitoring
- Event Class: Memory
- Problem Description:
The device has been removed from the list of devices being monitored by this monitor.
- Probable Cause / Recommended Action:
The device was removed from the system, has stopped responding to the system, or has been replaced with a device that is not supported by this monitor.
Run ioscan to determine the state and type of the device.
Check the /var/stm/data/os_decode_xref file for information indicating which devices are supported by this monitor.
Check other monitors to determine whether they are now monitoring the device by running /etc/opt/resmon/lbin/monconfig and using the "Check monitoring" command.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Event 102
- Severity: Information
- Event Summary: XXXX at hardware path x/x.x.x : Device added to monitoring
- Event Class: Memory
- Problem Description:
The device has been added to the set of devices being monitored by this monitor.
- Probable Cause / Recommended Action:
The device was added to the system or has started responding to the system.
The /var/stm/data/os_decode_xref file was modified to add information indicating that this device is now supported by this monitor.
If the device is in the DOWN state and this monitor does not control setting the state of the device to UP (check by running the "/etc/opt/resmon/lbin/set_fixed -L" command), use the "/etc/opt/resmon/lbin/set_fixed -n <RESOURCE_NAME>" command to set the state of the device to UP.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Event 103
- Severity: Information
- Event Summary: Test event
- Event Class: Memory
- Problem Description:
This is a test message from the monitor to test the communication path from the monitor to the notification mechanism.
- Probable Cause / Recommended Action:
No action required.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
Event 1000
Event Details: None.
- Severity: Informational
- Event Summary: A memory page has been deallocated and entered into the Page Deallocation Table (PDT).
- Event Class: Memory
- Problem Description:
Correctable single bit memory errors have been detected, resulting in the deallocation of a memory page.
- Cause / Action:
The occasional deallocation of memory pages is to be expected. No action is required. Should the rate of pages being deallocated become excessive, an event of higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: Never
- Fault Notifier Event Generation Threshold: Never
Event 1100
Event Details: None.
- Severity: Major Warning
- Event Summary: A memory page has been deallocated and entered into the Page Deallocation Table (PDT).
- Event Class: Memory
- Problem Description:
The Page Deallocation Table (PDT) is !% full.
PDT Entries Used: !
PDT Entries Free: !
PDT Total Size: !
A large number of memory pages have been deallocated because excessive correctable single bit errors were detected.
- Cause / Action:
Although the PDT is not in immediate danger of overflowing, it may be advisable to monitor the situation. If pages continue to be deallocated, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
- Fault Notifier Event Generation Threshold: 1 occurrence
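The Note at the top of this document says each "!" is replaced by a value when the message is displayed. As a minimal sketch of that substitution, assuming values are filled strictly left to right and that the template contains no literal "!" of its own, the Event 1100 description could be rendered as follows. The function name `fill_placeholders` and the sample values are hypothetical.

```python
from __future__ import annotations

from collections.abc import Sequence


def fill_placeholders(template: str, values: Sequence[object]) -> str:
    """Replace each "!" in template, left to right, with the next value (hypothetical helper)."""
    parts = template.split("!")
    expected = len(parts) - 1
    if len(values) != expected:
        raise ValueError(f"template has {expected} placeholders, got {len(values)} values")
    out = [parts[0]]
    for value, tail in zip(values, parts[1:]):
        out.append(str(value))  # value that replaces this "!"
        out.append(tail)        # literal text up to the next "!"
    return "".join(out)


# Problem Description text of Event 1100, with its four "!" placeholders.
EVENT_1100_TEMPLATE = (
    "The Page Deallocation Table (PDT) is !% full.\n"
    "PDT Entries Used: !\n"
    "PDT Entries Free: !\n"
    "PDT Total Size: !"
)
```

With invented values, `fill_placeholders(EVENT_1100_TEMPLATE, [70, 35, 15, 50])` yields a first line of "The Page Deallocation Table (PDT) is 70% full." followed by the three PDT figure lines.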
Event 1200
Event Details: None.
- Severity: Serious
- Event Summary: A memory page has been deallocated and entered into the Page Deallocation Table (PDT).
- Event Class: Memory
- Problem Description:
The Page Deallocation Table (PDT) is !% full.
PDT Entries Used: !
PDT Entries Free: !
PDT Total Size: !
An excessive number of memory pages have been deallocated because excessive correctable single bit errors were detected.
- Cause / Action:
Although the PDT is not in immediate danger of overflowing, it is advisable to closely monitor the situation. The single bit errors are being corrected, but this condition indicates a potential problem. If pages continue to be deallocated, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
- Fault Notifier Event Generation Threshold: 1 occurrence
Event 1300
Event Details: None.
- Severity: Critical
- Event Summary: A memory page has been deallocated and entered into the Page Deallocation Table (PDT).
- Event Class: Memory
- Problem Description:
The Page Deallocation Table (PDT) is !% full.
PDT Entries Used: !
PDT Entries Free: !
PDT Total Size: !
An excessive number of memory pages have been deallocated because excessive correctable single bit errors were detected. Although the errors are being corrected, this condition indicates a potential problem.
- Cause / Action:
The Page Deallocation Table (PDT) is in danger of overflowing; it is strongly advisable to closely monitor the situation. Contact your HP support representative to check the memory boards.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence
- Fault Notifier Event Generation Threshold: 1 occurrence
Event 1400
Event Details: None.
- Severity: Critical
- Event Summary: A memory page has been deallocated and entered into the Page Deallocation Table (PDT).
- Event Class: Memory
- Problem Description:
The Page Deallocation Table (PDT) is !% full.
PDT Entries Used: !
PDT Entries Free: !
PDT Total Size: !
A large number of memory pages have been deallocated because excessive correctable single bit errors were detected. Because the PDT is 100% full, no more entries can be added to it.
- Cause / Action:
The Page Deallocation Table (PDT) is full; it is strongly advisable to monitor the situation. Although the errors are being corrected, this condition indicates a potential problem. Contact your HP support representative to check the memory boards.
- Automated Recovery: None
- Event Generation Threshold: 1 occurrence in 24 hours.
- Fault Notifier Event Generation Threshold: 1 occurrence in 24 hours.
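Events 1100 through 1400 all report the same four PDT figures. A short sketch of how those figures relate is shown below, assuming that every PDT entry is either used or free and that the percentage is rounded down (neither is stated in this document). The percentage cutoffs that separate Events 1100, 1200 and 1300 are not given here, so the sketch does not guess them; it only flags the 100% full case described by Event 1400. The class name `PdtStatus` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdtStatus:
    """PDT usage as reported by Events 1100-1400 (hypothetical helper)."""

    used: int   # "PDT Entries Used"
    total: int  # "PDT Total Size"

    def __post_init__(self) -> None:
        if self.total <= 0 or not 0 <= self.used <= self.total:
            raise ValueError("expected total > 0 and 0 <= used <= total")

    @property
    def free(self) -> int:
        # "PDT Entries Free", assuming every entry is either used or free.
        return self.total - self.used

    @property
    def percent_full(self) -> int:
        # The "!% full" figure; rounding down is an assumption.
        return self.used * 100 // self.total

    @property
    def is_full(self) -> bool:
        # Event 1400: at 100% full, no more entries can be added.
        return self.used == self.total
```

For example, `PdtStatus(used=35, total=50)` reports 15 free entries and 70% full, while `PdtStatus(used=50, total=50).is_full` is `True`. The figures are illustrative only.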
Event 3000
Event Details: None.
- Severity: Informational
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
A correctable single bit memory error has been detected and logged.
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
has generated a single bit memory error and the error information has been logged in the memlog file.
- Cause / Action:
The occasional correctable single bit memory error is to be expected. No action is required. Should the rate of single bit errors become excessive, an event of higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: Never
- Fault Notifier Event Generation Threshold: Never
Event 3100
- Severity: Major Warning
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing correctable single bit errors at a single address on the same component. Since the page has not been deallocated and is still active, this indicates that the address is located in kernel or reserved memory.
- Cause / Action: Although the single bit errors are being corrected, it may be advisable to evaluate whether the system should be rebooted. Rebooting the system allows the memory page to be deallocated so that it is no longer referenced. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None.
- Event Generation Threshold: 500 occurrences.
- Fault Notifier Event Generation Threshold: Never.
Note: This event is disabled by default. If this event is enabled, be aware that there is no time frame within which the threshold must be met. Therefore, this event is generated whenever the number of single bit errors at the same address reaches the threshold, regardless of how long they took to accumulate.
Event Details: None.
Event 3200
- Severity: Serious
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing a higher than expected number of correctable single bit errors. Since the page has not been deallocated and is still active, this indicates that the address is located in kernel or reserved memory.
- Cause / Action: Although the single bit errors are being corrected, it is advisable to evaluate whether the system should be rebooted. Rebooting the system allows the memory page to be deallocated so that it is no longer referenced. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 1000 occurrences.
- Fault Notifier Event Generation Threshold: Never.
Note: This event is disabled by default. If this event is enabled, be aware that there is no time frame within which the threshold must be met. Therefore, this event is generated whenever the number of single bit errors at the same address reaches the threshold, regardless of how long they took to accumulate.
Event Details: None.
Event 3300
- Severity: Critical
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing an excessive number of single bit errors (SBE) at the same address on the same component. Since the page has not been deallocated and is still active, this indicates that the address is located in kernel or reserved memory.
- Cause / Action: Although the single bit errors are being corrected, it is strongly advisable to evaluate whether the system should be rebooted. Rebooting the system allows the memory page to be deallocated so that it is no longer referenced. This condition indicates a potential problem.
- Automated Recovery: None
- Event Generation Threshold: 1500 occurrences.
- Fault Notifier Event Generation Threshold: Never.
Note: This event is disabled by default. If this event is enabled, be aware that there is no time frame within which the threshold must be met. Therefore, this event is generated whenever the number of single bit errors at the same address reaches the threshold, regardless of how long they took to accumulate.
Event Details: None.
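Events 3100, 3200 and 3300 count single bit errors per address with no time window, as the notes above explain. The counter below is one way to model that rule; it assumes each event is generated once, at the moment the per-address count reaches its threshold, and that counts are never reset. The class name and the address key are hypothetical. Because the events are disabled by default, the sketch starts with no events enabled.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Iterable

# (threshold, event number) pairs from Events 3100, 3200 and 3300.
# These counts have no time window: they accumulate indefinitely.
SAME_ADDRESS_THRESHOLDS = ((500, 3100), (1000, 3200), (1500, 3300))


class SameAddressSbeCounter:
    """Hypothetical per-address SBE counter modeled on Events 3100-3300."""

    def __init__(self, enabled_events: Iterable[int] = ()) -> None:
        # The events are disabled by default, so nothing is enabled unless requested.
        self.enabled = set(enabled_events)
        self.counts: Counter = Counter()

    def record(self, address: Hashable) -> list[int]:
        """Count one SBE at address and return the events whose threshold was just reached."""
        self.counts[address] += 1
        n = self.counts[address]
        return [event for threshold, event in SAME_ADDRESS_THRESHOLDS
                if n == threshold and event in self.enabled]
```

Recording 1500 errors at one address with all three events enabled reports 3100, 3200 and 3300 once each, at the 500th, 1000th and 1500th error, however far apart in time those errors occurred.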
Event 4000
Event Details: None.
- Severity: Major Warning
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing correctable single bit errors (SBE) on a single component.
- Cause / Action:
Although the single bit errors are being corrected, it may be advisable to monitor the situation. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 20 occurrences within 24 hours
- Fault Notifier Event Generation Threshold: 20 occurrences within 24 hours
Event 4100
Event Details: None.
- Severity: Serious
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing a high rate of correctable single bit errors on a single component.
- Cause / Action: Although the single bit errors are being corrected, it is advisable to closely monitor the situation. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 50 occurrences in 24 hours
- Fault Notifier Event Generation Threshold: 50 occurrences in 24 hours
Event 4200
Event Details: None.
- Severity: Critical
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing an excessive rate of single bit errors on a single component.
- Cause / Action: Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation. This condition indicates a potential problem. Contact your HP support representative to check the memory boards.
- Automated Recovery: None
- Event Generation Threshold: 120 occurrences in 24 hours
- Fault Notifier Event Generation Threshold: 120 occurrences in 24 hours
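Events 4000, 4100 and 4200 use a rolling 24-hour window per memory component. A sliding-window sketch of that rule follows, assuming timestamps (in seconds) arrive in nondecreasing order, that an error exactly 24 hours old has left the window, and that an event is reported when the in-window count reaches its threshold exactly. The class name and the tuple used to identify a component (for example cell, MC/EXT and DIMM) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict, deque
from collections.abc import Hashable

WINDOW_SECONDS = 24 * 60 * 60
# (threshold, event number) pairs from Events 4000, 4100 and 4200.
DAILY_THRESHOLDS = ((20, 4000), (50, 4100), (120, 4200))


class DailySbeRateTracker:
    """Hypothetical rolling 24-hour SBE counter, one window per memory component."""

    def __init__(self) -> None:
        self._windows: defaultdict[Hashable, deque[float]] = defaultdict(deque)

    def record(self, component: Hashable, timestamp: float) -> list[int]:
        """Add one SBE for component and return events whose threshold was just reached."""
        window = self._windows[component]
        window.append(timestamp)
        # Evict errors that are 24 hours old or older relative to this one.
        # The newest timestamp is never evicted, so the deque stays non-empty.
        while window[0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        count = len(window)
        return [event for threshold, event in DAILY_THRESHOLDS if count == threshold]
```

A deque per component keeps each update cheap: old timestamps leave from the left as new ones arrive on the right. With this rule, an event can be reported again if the count drops below a threshold and later climbs back to it.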
Event 4300
Event Details: None.
- Severity: Warning
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing correctable single bit errors (SBE) on a single component.
- Cause / Action: Although the single bit errors are being corrected, it may be advisable to monitor the situation. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 70 occurrences within 7 days
- Fault Notifier Event Generation Threshold: 70 occurrences within 7 days
Event 4400
Event Details: None.
- Severity: Serious
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing a high rate of correctable single bit errors on a single component.
- Cause / Action: Although the single bit errors are being corrected, it is advisable to closely monitor the situation. If an excessive rate of single bit errors occurs, an event with higher severity will be generated.
- Automated Recovery: None
- Event Generation Threshold: 120 occurrences in 7 days
- Fault Notifier Event Generation Threshold: 120 occurrences in 7 days
Event 4500
Event Details: None.
- Severity: Critical
- Event Summary: Single bit error (SBE) event. A correctable single bit error has been detected and logged.
- Event Class: Memory
- Problem Description:
The memory component:
Cab/Cell or Node: !
MC/EXT: !
DIMM: !
Serial Number: !
Part Number: !
is experiencing an excessive number of single bit errors.
- Cause / Action: Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation. This condition indicates a potential problem. Contact your HP support representative to check the memory boards.
- Automated Recovery: None
- Event Generation Threshold: 200 occurrences in 7 days
- Fault Notifier Event Generation Threshold: 200 occurrences in 7 days
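Events 4300, 4400 and 4500 apply the same idea over 7 days. Instead of tracking errors as they arrive, the sketch below evaluates a batch of timestamps after the fact and returns the highest-severity 7-day event whose threshold is met, or `None`. It assumes timestamps in seconds and a window of (now - 7 days, now]; the function name is hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from typing import Optional

SEVEN_DAYS = 7 * 24 * 60 * 60
# (threshold, event number) pairs from Events 4500, 4400 and 4300, highest first.
WEEKLY_THRESHOLDS = ((200, 4500), (120, 4400), (70, 4300))


def weekly_event(timestamps: list[float], now: float) -> Optional[int]:
    """Return the highest 7-day SBE event met by errors in (now - 7 days, now], else None."""
    ordered = sorted(timestamps)
    # Count timestamps t with now - 7 days < t <= now using two binary searches.
    count = bisect_right(ordered, now) - bisect_right(ordered, now - SEVEN_DAYS)
    for threshold, event in WEEKLY_THRESHOLDS:
        if count >= threshold:
            return event
    return None
```

For instance, 100 errors spread over the last 100 hours meet the Event 4300 threshold (70) but not the Event 4400 threshold (120), so the function returns 4300.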
Event 6000
Event Details: None.
- Severity: Critical
- Event Summary: The memory.debug file is present.
- Event Class: Memory
- Problem Description:
The monitor detected the file "memory.debug" on the system. The memory.debug file is created when the data written into memory has a single bit error. This is a critical problem because it indicates that some hardware on the bus is generating single bit errors.
- Cause / Action: Although the errors are being corrected, it is strongly advisable to evaluate the file memory.debug located in the directory /var/stm/logs/os. This condition indicates a potential problem. Contact your HP support representative to check the hardware.
- Automated Recovery: None
- Event Generation Threshold: 1 detection
- Fault Notifier Event Generation Threshold: 1 detection
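Event 6000 is triggered by the presence of the memory.debug file. A minimal check for that file in the directory named above might look like this. The function name is hypothetical, and the directory is a parameter so the check can be pointed elsewhere, for example during testing.

```python
from pathlib import Path

# Directory given in Event 6000 for the memory.debug file.
DEFAULT_LOG_DIR = "/var/stm/logs/os"


def memory_debug_present(log_dir: str = DEFAULT_LOG_DIR) -> bool:
    """Return True if a regular file named memory.debug exists in log_dir (hypothetical helper)."""
    return (Path(log_dir) / "memory.debug").is_file()
```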
Event 6100
Event Details: None.
- Severity: Critical
- Event Summary: A correctable single bit error was detected in the memory controller cache.
- Event Class: Memory
- Problem Description:
A correctable single bit error was detected when reading the data to be returned to a processor or I/O from the Order Access Queue (OAQ). The OAQ is located within the memory controller.
- Cause / Action: Although the single bit errors are being corrected, it is strongly advisable to closely monitor the situation.
- Automated Recovery: None
- Event Generation Threshold: 1 detection
- Fault Notifier Event Generation Threshold: 1 detection