HP XC System Software Release Notes, Chapter 14: Documentation Notes

Information Omitted From the HP XC System Software Administration Guide
The following sections provide information that was omitted from the HP XC System Software Administration Guide.

This new functionality was delivered to your HP XC system through the PK01 patch for Version 3.0. Each hardware platform provided by HP supplies an event logging mechanism that captures platform-specific events to track hardware states and changes. Information in the system event log (SEL) varies by platform. Event logs are stored by the firmware and can fill up over time. On some hardware models, the logs must be cleared regularly to avoid losing critical events. In addition, errors that indicate the failure or pending failure of a component must be brought to the operator's immediate attention.

The HP XC system event log functionality manages all log types on supported HP platforms. Log information is read and archived at regular intervals and is used to generate Nagios alerts when applicable. Logs that approach a critical size are cleared to prevent event loss.

Event logs are typically accessed through the management port, which requires platform- or protocol-specific user authentication as well as network access to the console port (cp-nxxx, where nxxx is the node number). The system event log history is captured in the /hptc_cluster/adm/logs/sel/sel-nxxx.log file, where nxxx is the name of the individual node. System event logs are managed by the standard logrotate functionality; for more information, see logrotate(8).

This new functionality was delivered to your HP XC system through the PK01 patch for Version 3.0. On platforms without an iLO management port, system event log and hardware sensor information is gathered through the IPMI interface. Some platforms require additional user name and password setup to allow access to the BMC/IPMI connection on the console port.
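The naming conventions above (console port cp-nxxx, archive file sel-nxxx.log) can be illustrated with a short Python sketch. It derives both names for a node and builds, without running, a command that would read that node's SEL or sensors over the network through the BMC on the console port. The use of ipmitool is an assumption: these notes do not say which tool HP XC uses internally. The interface type (lanplus), the user name admin, and the node name n16 are hypothetical placeholders.

```python
import posixpath
import shlex
from typing import List, Sequence

# Archive directory named in the release notes.
SEL_LOG_DIR = "/hptc_cluster/adm/logs/sel"


def console_port_host(node: str) -> str:
    """Console port hostname for a node, following the cp-nxxx convention."""
    return f"cp-{node}"


def sel_log_path(node: str) -> str:
    """Path of the archived system event log for a node (sel-nxxx.log)."""
    return posixpath.join(SEL_LOG_DIR, f"sel-{node}.log")


def ipmitool_command(node: str, user: str,
                     subcommand: Sequence[str] = ("sel", "list")) -> List[str]:
    """Build (but do not run) an ipmitool command aimed at the node's BMC.

    Assumptions (not from the release notes): ipmitool is installed, the BMC
    accepts IPMI v2.0 LAN sessions ("lanplus"), and the BMC password is
    exported in IPMI_PASSWORD, which ipmitool reads when given -E.
    """
    return ["ipmitool", "-I", "lanplus", "-H", console_port_host(node),
            "-U", user, "-E", *subcommand]


if __name__ == "__main__":
    node = "n16"  # hypothetical node name
    print(console_port_host(node))  # cp-n16
    print(sel_log_path(node))       # /hptc_cluster/adm/logs/sel/sel-n16.log
    # Read the event log, then the hardware sensors ("admin" is a placeholder user).
    print(shlex.join(ipmitool_command(node, "admin")))
    print(shlex.join(ipmitool_command(node, "admin", ("sensor", "list"))))
```

Passing -E rather than -P keeps the BMC password off the command line, where other users could see it in the process list.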
In addition, depending on how your head node is attached to the network, you might need to set up additional passwords. On HP XC systems whose head node is either an HP CP6000 (HP Integrity) system or an HP CP4000 (HP ProLiant) system, you can obtain the sensor and system event log information only remotely. See “Required Task: Configure the BMC Password On Itanium Systems” for instructions on configuring the BMC password on HP Integrity systems.

This new functionality was delivered to your HP XC system through the PK01 patch for Version 3.0. You can use logrotate to change how the system event logs are rotated, and you can change the rules that trigger Nagios alerts. In this release (with all patches installed), Nagios can alert you to power, memory, voltage, and automatic system recovery (ASR) events. Nagios alert rules are defined in the /opt/hptc/nagios/etc/selRules file; edit this file to modify the alert rules.

The following procedure is not documented in the HP XC System Software Administration Guide, but it will be included in a future version. To move the SLURM and LSF daemons from their primary node to their backup node (for example, when the primary node needs maintenance), follow this procedure:
If you set another node to be the BackupController for SLURM, you can log in to that node and run the slurmctld command. The new backup node must be assigned the resource_management role for this configuration to persist after future runs of the cluster_config command. To move LSF and SLURM back to the original primary node, follow the same procedure, treating the original primary node as the backup node and the original backup node as the primary node.
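The BackupController mentioned above is a slurm.conf setting. As a rough Python sketch, the following reads the primary (ControlMachine) and backup (BackupController) controller names from slurm.conf text and picks the node on which to start slurmctld when one controller is taken down for maintenance; moving the daemons back is the same call with the roles swapped. The slurm.conf excerpt and node names are hypothetical, the parameter names are those used by SLURM releases of this era (later releases replaced them with SlurmctldHost), and the sketch neither handles LSF nor runs any commands.

```python
from typing import Optional, Tuple

Controllers = Tuple[Optional[str], Optional[str]]


def slurm_controllers(conf_text: str) -> Controllers:
    """Return (primary, backup) controller hostnames found in slurm.conf text.

    Only ControlMachine and BackupController are examined; parameter names
    are matched case-insensitively and '#' comments are ignored.
    """
    primary: Optional[str] = None
    backup: Optional[str] = None
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0]  # drop trailing comments
        for token in line.split():
            key, sep, value = token.partition("=")
            if not sep:
                continue  # not a Key=Value pair
            key = key.lower()
            if key == "controlmachine":
                primary = value
            elif key == "backupcontroller":
                backup = value
    return primary, backup


def slurmctld_target(controllers: Controllers, maintenance_node: str) -> str:
    """Node to log in to and run slurmctld on while maintenance_node is serviced.

    Returning the *other* controller covers both directions: moving to the
    backup, and later moving back with the roles swapped.
    """
    primary, backup = controllers
    if primary is None or backup is None:
        raise ValueError("both ControlMachine and BackupController must be set")
    if maintenance_node == primary:
        return backup
    if maintenance_node == backup:
        return primary
    raise ValueError(f"{maintenance_node} is not a SLURM controller node")


if __name__ == "__main__":
    # Hypothetical slurm.conf excerpt; real files contain many more settings.
    sample = """
ControlMachine=n16      # primary controller
BackupController=n15
NodeName=n[1-16] Procs=2
"""
    ctl = slurm_controllers(sample)
    print(ctl)                           # ('n16', 'n15')
    print(slurmctld_target(ctl, "n16"))  # n15
    print(slurmctld_target(ctl, "n15"))  # n16
```

Remember that, as noted above, the resource_management role must also be assigned to the new backup node; editing slurm.conf alone does not survive later runs of cluster_config.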