| Once, after initial installation and configuration | Create a system log book for monitoring configuration changes to your system. | N/A |
Run the ovp utility. | Chapter 7: “Monitoring the System” |
Run the sys_check command to establish a baseline. |
Run the dgemm command to detect any nodes that are not performing at their peak. |
Frequently | Consult the Nagios Web interface to monitor the system status. | Chapter 4: “Managing and Customizing System Services” |
Ensure that the following services are running: | Chapter 6: “Managing the Configuration and Management Database” Monitoring the System |
Regularly | Back up the head node's disks. | |
Back up the configuration and management database. | Chapter 20: “Using Diagnostic Tools” |
Archive or purge metrics data from the configuration and management database; a cron script is provided for this purpose. |
Run the sys_check utility at a time that does not interfere with users' jobs. | Chapter 11: “Distributing Software Throughout the System” |
Run the dgemm utility to detect any nodes that are not performing at their peak. |
For systems that use the Myrinet system interconnect, run the gm_drain_test when it does not interfere with users' jobs. |
For systems that use the Quadrics system interconnect, run the qsnet2_drain_test when it does not interfere with users' jobs. |
Monitor the /hptc_cluster/adm/logs/consolidated.log file for potential errors. | Chapter 7: “Monitoring the System” |
| After installing additional software or changing the system configuration | Ensure that the golden image is updated. | Chapter 11: “Distributing Software Throughout the System” |
Run the ovp command. | Chapter 20: “Using Diagnostic Tools” |
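
The regular task of monitoring /hptc_cluster/adm/logs/consolidated.log for potential errors can be partly automated. The following is a minimal sketch in Python, not a tool supplied with the system: the function names and the keyword pattern are hypothetical, and the log is assumed to be plain text with one message per line. Adjust the pattern to the messages your system actually logs.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical keyword pattern (an assumption, not taken from the product):
# extend or narrow it to match the messages that matter on your system.
ERROR_PATTERN = re.compile(
    r"\b(errors?|fail(ed|ure)?|critical|panic)\b", re.IGNORECASE
)


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches ERROR_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(
    path: str = "/hptc_cluster/adm/logs/consolidated.log",
) -> List[Tuple[int, str]]:
    """Scan the consolidated log file and return the suspect lines."""
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_log() on the head node returns a list of line numbers and messages to review. A keyword match is only a hint that a line deserves attention; it does not replace reading the log in context.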