HP XC System Software : Administration Guide > Chapter 7 Monitoring the System

The collectl Utility


The collectl utility collects data on the nodes of the HP XC system. As a development or debug tool, the collectl utility typically gathers more detail more frequently than the supermon utility. The collectl utility does have some overhead, but for most situations, it consumes less than 0.1 percent of the CPU and has minimal effect on user applications. However, even this low level can have a significant impact on some applications, so use the collectl utility with care.

The collectl utility also enables you to play back the collected data, either as raw ASCII characters or in a plot format that can be displayed with GnuPlot or Microsoft Excel. Figure 7-3 shows an example of a graph plotted from CPU data collected by the collectl utility. Example 7-1 illustrates the collectl utility's ASCII output.

Figure 7-3 Plotted Output from the collectl Utility


You can use any of the following methods to run the collectl utility:

  • From the command line

  • As a service

  • In a batch job submission

Running the collectl Utility from the Command Line

The default action of this utility is to collect data at 10-second intervals and to display the data in ASCII characters on the terminal screen. Example 7-1 shows the invocation and first record reported from the collectl utility. The information has been edited to fit horizontally on the page.

Example 7-1 Using the collectl Utility from the Command Line

# collectl
waiting for 10 second sample...

### RECORD    1 >>> n3 <<< (m.n) (date and time stamp) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER NICE SYS IDLE WAIT INTR CTXSW PROC RUNQ RUN AVG1 AVG5 AVG15
     0    0   0   99    0 1055    65    0  151   0 0.02 0.04  0.00

# DISK SUMMARY (/sec)
#Reads  R-Merged  R-KBytes   Writes  W-Merged  W-KBytes
     0         0         0        5         7        51

# MEMORY STATISTICS
#<---------------------Physical Memory------------------->
#   TOTAL    USED    FREE    BUFF  CACHED    SLAB  MAPPED
    3965M   1255M   2710M 129800K 920732K  89484K 157068K

<-----------Swap----------><-Inactive-><Pages/sec>
   TOTAL    USED    FREE     TOTAL     IN    OUT
   6141M       0   6141M   368352K      0     51

# NETWORK SUMMARY (/sec)
#InPck  InErr OutPck OutErr   Mult   ICmp   OCmp    IKB    OKB
     8      0      3      0      0      0      0      0      0

# SOCKET STATISTICS
#      <-------------Tcp------------->   Udp   Raw   <---Frag-->
#Used  Inuse Orphan    Tw  Alloc   Mem  Inuse Inuse  Inuse   Mem
  146     33      0    13     51     0     27     1      0     0

# TCP SUMMARY (/sec)
# PureAcks HPAcks   Loss FTrans
         1      0      0      0
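
As a rough illustration of working with this ASCII output, the following shell sketch pulls the IDLE value out of the CPU summary of a saved record. The function name cpu_idle is hypothetical, and the parsing assumes the column layout shown in Example 7-1, which may differ between collectl versions.

```shell
# Print the IDLE value from the first CPU SUMMARY block read on stdin.
# Assumes the layout of Example 7-1: a "# CPU SUMMARY" line, a header
# line beginning with "# USER", and then one line of values.
cpu_idle() {
    awk '
        /^# CPU SUMMARY/ { in_cpu = 1; next }
        in_cpu && /^# *USER/ {
            # The header carries a leading "#" field that the data line lacks.
            off = ($1 == "#") ? 1 : 0
            for (i = 1; i <= NF; i++)
                if ($i == "IDLE") col = i - off
            next
        }
        in_cpu && col && NF > 0 { print $col; exit }
    '
}
```

For example, if the output of Example 7-1 were saved in a file named record.txt (a hypothetical name), `cpu_idle < record.txt` would print the idle percentage for that record.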

The collectl utility provides alternate output formats:

  • Use the --M 1 option to display the output on a single line in a more compact, easier-to-read format. Be aware that this option might not display all the fields.

  • Use the --oT option to timestamp the data.

  • Use the --oD and --od options to provide two formats of a date and timestamp.

For a discussion of the options to the collectl utility and a description of its output, see collectl(1).

Running the collectl Utility as a Service

After it is enabled, the collectl utility can be run as a service. You can use the service command to stop and start the collectl service. You can also obtain the current status of this service, as shown in the following example:

# service collectl status
collectl (pid process_id) is running...
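
A script that needs to know whether the service is running could parse this status line. The following is a minimal sketch, assuming the status text has exactly the form shown above with a numeric process ID; the function name collectl_pid_from_status is hypothetical.

```shell
# Extract the PID from a status line of the form:
#   collectl (pid 1234) is running...
# Prints the PID and returns 0 if the line matches; returns 1 otherwise.
collectl_pid_from_status() {
    status_line=$1
    pid=$(printf '%s\n' "$status_line" |
          sed -n 's/^collectl (pid \([0-9][0-9 ]*\)) is running.*$/\1/p')
    [ -n "$pid" ] && printf '%s\n' "$pid"
}
```

A typical call would be `collectl_pid_from_status "$(service collectl status)"`.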

The collectl service is set up to collect the normally reported summary data and to write it to a compressed text file in the /var/log/collectl directory.

The actions of the collectl service are specified by the /opt/hptc/config/services/collectl.ini file.

By default, the collectl service gathers information on the following subsystems:

  • CPU

  • Disk

  • Inode and file system

  • Lustre file system

  • Memory

  • Networks

  • Sockets

  • TCP

  • Interconnect

The collectl(1) manpage discusses running the collectl utility as a service.
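
Because the service writes its data to the /var/log/collectl directory, a small helper can locate the most recent file for later playback. This is only a sketch: the function name latest_collectl_log is hypothetical, and no assumption is made about how the service names its files, so the newest entry in the directory is returned.

```shell
# Print the path of the most recently modified entry in a log directory.
# The directory defaults to the one named in the text; pass another
# directory as the first argument to override it.
latest_collectl_log() {
    logdir=${1:-/var/log/collectl}
    newest=$(ls -t "$logdir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s/%s\n' "$logdir" "$newest"
}
```

Assuming the -p option selects playback as described in collectl(1), the result could be used as `collectl -p "$(latest_collectl_log)"`.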

Running the collectl Utility in a Batch Job Submission

You can run the collectl utility as part of a batch job submission, where its purpose is to monitor the nodes while the batch job runs. You must modify the job submission script, as follows:

  1. Determine on which node the collectl utility is to be run.

  2. Decide which options you need. Typically, you specify the following:

    • The output file, specified with the -f option.

    • The subsystem data to collect, specified with the -s option. The subsystems include the following:

      • CPU

      • Disk

      • Inode and File System

      • Interconnect

      • Memory

      • Networks

      • NFS V3 data

      • TCP

    • The number of seconds in the sampling interval, specified with the -i option.

  3. Start the collectl utility on each node with the ssh utility.

    Be sure to run the collectl utility in the background so that the script does not hang while waiting for the collectl utility to complete.

    Collect the process ID for the collectl utility on each node.

    Allow the collectl utility from 5 to 10 seconds to start and quiesce.

  4. Start the batch job and allow it to complete.

  5. When the batch job completes, stop the collectl process on each node by killing its process ID. The collectl process traps the SIGINT signal and shuts down cleanly.

  6. Copy the files that the collectl process created on the node's disk, and store them in a separate location for later review.

    Delete the files that the collectl process created.
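
The steps above can be sketched as a pair of shell functions for use in a job submission script. This is an illustration under stated assumptions, not a supported script: OUTDIR, SAVEDIR, start_collectl, and monitor_job are hypothetical names, the -s letters c, d, m, and n are assumed to select CPU, disk, memory, and network data, and passwordless ssh and scp access to each node is assumed. Check collectl(1) for the options that apply to your system.

```shell
#!/bin/sh
# Hypothetical locations; adjust for your site.
OUTDIR=${OUTDIR:-/tmp/collectl_job}     # where collectl writes on each node
SAVEDIR=${SAVEDIR:-./collectl_runs}     # where the files are kept afterward

# Step 3: start collectl in the background on one node and print its PID.
# Redirecting all three standard streams lets ssh return immediately.
start_collectl() {
    remote="mkdir -p $OUTDIR; nohup collectl -f $OUTDIR -s cdmn -i 10"
    remote="$remote </dev/null >/dev/null 2>&1 & echo \$!"
    ssh "$1" "$remote"
}

# Usage: monitor_job "node1 node2 ..." job-command [arguments...]
monitor_job() {
    nodes=$1; shift
    started=""
    for node in $nodes; do
        pid=$(start_collectl "$node") || return 1
        started="$started $node:$pid"   # remember the PID for each node
    done
    sleep 10                            # let collectl start and quiesce

    "$@"                                # step 4: run the batch job
    job_status=$?

    for entry in $started; do           # step 5: stop collectl with SIGINT
        ssh "${entry%%:*}" "kill -INT ${entry#*:}"
    done
    sleep 2                             # give collectl time to close its files

    for entry in $started; do           # step 6: copy, then delete
        node=${entry%%:*}
        mkdir -p "$SAVEDIR/$node"
        scp -q "$node:$OUTDIR/*" "$SAVEDIR/$node/" &&
            ssh "$node" "rm -f $OUTDIR/*"
    done
    return $job_status
}
```

A job script might then call `monitor_job "n3 n4" ./my_application`, where the node names and the application are placeholders. The remote files are deleted only if the copy succeeds, so a failed transfer does not lose data.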

Alternatively, log in to one of the compute nodes used by the application and run the collectl utility from the command line.

© 2003 Hewlett-Packard Development Company, L.P.