| Once, after initial installation and configuration | Create a system log book for monitoring configuration changes to your system. | No reference available |
| | Run the ovp command. | Chapter 6: “Monitoring the System” |
| | Run the sys_check command to establish a baseline. | |
| | Run the dgemm command to detect any nodes that are not performing at their peak performance. | |
| Frequently | Consult the Nagios Web interface to monitor the system status. | Chapter 3: “Managing System Services” |
| | Ensure that the following services are running: | Chapter 5: “Managing the Configuration and Management Database” |
| Regularly | Back up the head node's disks. | |
| | Back up the configuration and management database. | Chapter 16: “Using Diagnostic Tools” |
| | Archive or purge metrics data from the configuration and management database; a cron script is provided for this purpose. | |
| | Run the sys_check command at a time that does not interfere with users' jobs. | Chapter 8: “Distributing Software Throughout the System” |
| | Run the dgemm command to detect any nodes that are not performing at their peak performance. | |
| | For systems that use the Myrinet system interconnect, run the gm_drain_test when it does not interfere with users' jobs. | |
| | For systems that use the Quadrics system interconnect, run the qsnet2_drain_test when it does not interfere with users' jobs. | |
| | Monitor the /hptc_cluster/adm/logs/consolidated.log for potential errors. | Chapter 6: “Monitoring the System” |
| After a software installation or a change to the configuration | Ensure that the golden image is updated. | Chapter 8: “Distributing Software Throughout the System” |
| | Run the ovp command. | Chapter 16: “Using Diagnostic Tools” |
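
The regular check of /hptc_cluster/adm/logs/consolidated.log listed in the table can be partly scripted. The following is a minimal Python sketch, not a tool supplied with the system. The keyword list in `SUSPECT` and the helper name `find_suspect_lines` are assumptions made for illustration; the actual message format of the consolidated log is not described here, so the pattern would need adjusting to the messages seen on a given system.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Log path taken from the task table.
LOG_PATH = Path("/hptc_cluster/adm/logs/consolidated.log")

# Assumed keywords (hypothetical); tune these to the messages in your log.
SUSPECT = re.compile(r"\b(error|fail(ed|ure)?|critical|panic)\b", re.IGNORECASE)


def find_suspect_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, text) for each line containing a suspect keyword."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")


def main() -> None:
    if not LOG_PATH.exists():
        print(f"{LOG_PATH} not found")
        return
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with LOG_PATH.open("r", errors="replace") as log:
        for number, text in find_suspect_lines(log):
            print(f"{number}: {text}")


if __name__ == "__main__":
    main()
```

A keyword scan of this kind only narrows down what to read; it does not replace reviewing the log or the Nagios Web interface.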