| Once, after initial installation and configuration | Create a system log book for monitoring configuration changes to your system. | N/A |
| | Run the ovp utility. | Chapter 7: “Monitoring the System” |
| | Run the sys_check command to establish a baseline. | |
| | Run the dgemm command to detect any nodes that are not performing at their peak. | |
| Frequently | Consult the Nagios Web interface to monitor the system status. | Chapter 4: “Managing and Customizing System Services” |
| | Ensure that the following services are running: | Chapter 6: “Managing the Configuration and Management Database”; Chapter 7: “Monitoring the System” |
| Regularly | Back up the head node's disks. | |
| | Back up the configuration and management database. | Chapter 19: “Using Diagnostic Tools” |
| | Archive or purge metrics data from the configuration and management database; a cron script is provided for this purpose. | |
| | Run the sys_check utility at a time that does not interfere with users' jobs. | Chapter 10: “Distributing Software Throughout the System” |
| | Run the dgemm utility to detect any nodes that are not performing at their peak. | |
| | For systems that use the Myrinet system interconnect, run gm_drain_test at a time that does not interfere with users' jobs. | |
| | For systems that use the Quadrics system interconnect, run qsnet2_drain_test at a time that does not interfere with users' jobs. | |
| | Monitor the /hptc_cluster/adm/logs/consolidated.log file for potential errors. | Chapter 7: “Monitoring the System” |
| After installing additional software or changing the system configuration | Ensure that the golden image is updated. | Chapter 10: “Distributing Software Throughout the System” |
| | Run the ovp command. | Chapter 19: “Using Diagnostic Tools” |
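
The table lists no required format for the system log book of configuration changes. One possible approach, shown here only as an assumption and not as an HP XC convention, is an append-only file with one JSON record per change. The function names `format_entry` and `append_entry` and the record fields are hypothetical:

```python
import json
from datetime import datetime, timezone
from typing import Optional


def format_entry(author: str, change: str, when: Optional[datetime] = None) -> str:
    """Render one hypothetical log book record as a single JSON line."""
    # Default to the current UTC time so entries from different shells sort consistently.
    stamp = (when or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return json.dumps({"time": stamp, "author": author, "change": change}) + "\n"


def append_entry(path: str, author: str, change: str) -> None:
    """Append a record to the log book at `path`; earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as book:
        book.write(format_entry(author, change))
```

For example, `append_entry("logbook.jsonl", "root", "Updated golden image")` adds one line to a file of your choosing. Opening the file in append mode keeps the history intact, which is the main property a change log book needs.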
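
The dgemm tasks aim to find nodes that run below their peak. A minimal sketch of the comparison step is shown below. It assumes you have already collected one GFLOP/s figure per node into a dictionary. That collection format is hypothetical and is not the dgemm utility's actual output. It also assumes that the nodes have identical hardware. Because no peak figure is given here, the sketch uses the median across nodes as a stand-in reference. The 0.90 tolerance is an assumed value:

```python
from statistics import median
from typing import Dict, List


def slow_nodes(gflops: Dict[str, float], tolerance: float = 0.90) -> List[str]:
    """Return nodes whose DGEMM rate is below tolerance * median rate, sorted by name.

    `gflops` maps a node name to its measured rate (hypothetical input format).
    The median serves as a stand-in for peak, which assumes identical nodes.
    """
    if not gflops:
        return []
    cutoff = tolerance * median(gflops.values())
    return sorted(node for node, rate in gflops.items() if rate < cutoff)
```

The median is used rather than the mean because a single very slow node barely moves it, so that node still stands out against the cutoff.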
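
Monitoring /hptc_cluster/adm/logs/consolidated.log can be partly automated with a keyword scan. The sketch below assumes the file is plain text with one message per line. The keyword pattern is a hypothetical starting point and is not taken from the HP XC documentation, so tune it to the messages your system actually logs:

```python
import re
from typing import Iterable, List, Tuple

# Path named in the task table.
LOG_PATH = "/hptc_cluster/adm/logs/consolidated.log"

# Hypothetical keyword list; matching is case-insensitive and on whole words.
SUSPECT = re.compile(r"\b(errors?|fail(ed|ure|s)?|panic|critical)\b", re.IGNORECASE)


def find_suspect_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that matches SUSPECT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str = LOG_PATH) -> List[Tuple[int, str]]:
    """Scan a log file; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_suspect_lines(log)
```

For example, `for number, text in scan_file(): print(f"{number}: {text}")` prints each flagged line with its position. A keyword scan only narrows the search, so flagged lines still need a human to decide whether they are real problems.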