HP XC System Software : Administration Guide > Chapter 8 Monitoring the System with Nagios

Adjusting the Nagios Configuration


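You can adjust Nagios by stopping the nagios service, updating a configuration file, and restarting the nagios service. This section describes the procedures for adjusting a Nagios configuration: stopping and restarting Nagios, updating the configuration, forwarding e-mail alerts, changing sensor thresholds, adjusting metrics collection intervals and the service check timeout, changing the default Nagios user name, and disabling individual plug-ins.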

Stopping and Restarting Nagios

Nagios can record a multitude of alerts on large systems when many nodes undergo known maintenance operations. These operations can include restarting or shutting down the HP XC system.

To avoid these alerts, shut down Nagios immediately before these maintenance operations by running the following command on the head node:

# pdsh -a "service nagios stop"

To restart Nagios after a maintenance operation, use the following command:

# pdsh -a "service nagios start"

To restart Nagios after changing its configuration, use the following command:

# pdsh -a "service nagios restart"
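The three commands above differ only in the action passed to the nagios service, so a maintenance script can route them through one helper. The following is a minimal sketch, not part of the HP XC software: the function name nagios_all and the DRY_RUN variable are assumptions made for illustration, and only the pdsh command line comes from this section.

```shell
# Hypothetical helper: run "service nagios <action>" on all nodes through pdsh.
# With DRY_RUN=1 the command is printed instead of executed.
nagios_all() {
    action=$1
    case $action in
        stop|start|restart) ;;
        *) echo "usage: nagios_all stop|start|restart" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "pdsh -a \"service nagios $action\""
    else
        pdsh -a "service nagios $action"
    fi
}
```

A maintenance script would call nagios_all stop before the operation and nagios_all start afterward. Setting DRY_RUN=1 first shows the command without touching the cluster.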

Improved Availability Is in Effect

If improved availability is in effect, you must restart the nagios service (that is, the Nagios master) using the system's availability tool. Following is an example of how to restart the nagios service using HP Serviceguard. This example restarts the Nagios master, which is running on node n128.

  1. Run the /usr/local/cmcluster/bin/cmviewcl command to determine which node the Nagios master is running on:

    # /usr/local/cmcluster/bin/cmviewcl | grep nagios
    nagios.n128     up           running      enabled      n128

  2. Stop or restart the Nagios package, as needed:

    • To stop Nagios, use this Serviceguard command:

      # /usr/local/cmcluster/bin/cmhaltpkg nagios.n128

    • To restart Nagios, use the following Serviceguard commands:

      # /usr/local/cmcluster/bin/cmhaltpkg nagios.n128
      # /usr/local/cmcluster/bin/cmrunpkg -n n128 nagios.n128
      # /usr/local/cmcluster/bin/cmmodpkg -e nagios.n128
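The node name used in the Serviceguard commands can be read from the cmviewcl output rather than copied by hand. The following sketch assumes that the output has the column layout of the sample line above, with the package name in the first column and the node in the last. The function name nagios_master_node is hypothetical.

```shell
# Hypothetical helper: read cmviewcl output on stdin and print the node
# (last column) of the first package whose name starts with "nagios.".
# Assumes the column layout of the sample output in this section.
nagios_master_node() {
    awk '$1 ~ /^nagios\./ { print $NF; exit }'
}
```

For example, piping the output of /usr/local/cmcluster/bin/cmviewcl into nagios_master_node would print the node to pass to cmrunpkg with the -n option.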

Updating the Nagios Configuration

Most of the following sections provide you with information on which template files to update to accomplish a given task. The nagios_vars.ini file contains most of the parameters that define the Nagios configuration. Editing this file is key for most of the configuration updates you want to perform. The HP XC System Software also features Nagios template files that define configurable parameters.

As shown in Figure 8-8, the template files, the nagios_vars.ini file, and data from the configuration and management database (CMDB) are processed by a Nagios configurator to generate include files that form the basis for the configured Nagios application.

NOTE: If you change the nagios_vars.ini file, you must propagate it to all nodes. For more information, see Chapter 10.

Figure 8-8 Nagios Configuration

Configuring Nagios

When you change the Nagios configuration, you must perform the following tasks:

  1. Read the Nagios documentation carefully.

  2. Change the template files accordingly.

  3. Stop the Nagios service. For instructions on how to stop the Nagios service, see “Stopping and Restarting Nagios”.

  4. Repopulate the Nagios configuration files you changed throughout the HP XC system. Use the pdcp command to copy the files onto all the nodes from the head node immediately, and use the updateimage utility to make this change permanent.

  5. Restart the Nagios service. For instructions on how to restart the Nagios service, see “Stopping and Restarting Nagios”.

  6. Verify the results by using the Nagios Web interface.
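Steps 3 through 5 of this procedure can be sketched as a short command plan for one changed file. This is an illustration under stated assumptions, not an HP-supplied script: the function name nagios_update_plan is hypothetical, the pdcp form is the one used elsewhere in this section, and the updateimage step is left out because its options are not described here.

```shell
# Hypothetical helper: print, in order, the commands for steps 3 to 5 for one
# changed file. Nothing is executed; review the plan, then run it by hand.
# The updateimage step is not included because its options are not shown here.
nagios_update_plan() {
    file=$1
    dir=$(dirname "$file")
    echo "pdsh -a \"service nagios stop\""
    echo "pdcp -a $file $dir/"
    echo "pdsh -a \"service nagios start\""
}
```

Printing the plan instead of running it keeps the sketch safe to try on a production head node.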

Forwarding Nagios e-mail Alerts

Nagios sends e-mail by default to the nagios user.

The simplest method to forward e-mail alerts is to log in as the Nagios user and to create a .forward file in the Nagios user's directory (usually /home/nagios) to redirect e-mail alert messages from Nagios to another e-mail account. This method assures that the .forward file's permissions are correct.

NOTE:

Ensure that the sendmail utility is running. For information on the implementation of the sendmail utility on the HP XC system, see “Modifying Sendmail”.
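A .forward file is a plain text file that contains the destination address. The following minimal sketch creates one. The function name write_forward and the address admin@example.com are placeholders, and mode 644 is an assumption chosen so that the file is not writable by group or others.

```shell
# Hypothetical helper: write a .forward file in the given home directory that
# redirects mail to the given address. Run it as the nagios user so that the
# file is owned by that user.
write_forward() {
    home_dir=$1
    address=$2
    printf '%s\n' "$address" > "$home_dir/.forward" &&
    chmod 644 "$home_dir/.forward"
}
```

Logged in as the nagios user, a call such as write_forward /home/nagios admin@example.com would redirect the alert messages to that account.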

You can customize the Nagios configuration to specify whom to contact by editing the /opt/hptc/nagios/etc/contacts.cfg file. The main portion of this file is shown here:

# 'nagios' contact definition
define contact{
        contact_name                    nagios
        alias                           Nagios Admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email,notify-by-epager
        host_notification_commands      host-notify-by-email,host-notify-by-epager
        email                           nagios@localhost.localdomain
        pager                           nagios@localhost.localdomain
        }

Changing the values for email and pager to reflect your system's name enables Nagios to send notifications through the sendmail utility. For example, change nagios@localhost.localdomain to nagios@example.com.
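The address change can also be made with a stream edit instead of a text editor. The following sketch is one way to do it. The function name set_contact_address is hypothetical, and the filter assumes that the default address appears exactly as shown in the listing above.

```shell
# Hypothetical filter: replace the default contact address in the text read
# from stdin. $1 is the new address, for example nagios@example.com.
# The address must not contain "/" or "&", which are special to sed here.
set_contact_address() {
    sed "s/nagios@localhost\.localdomain/$1/"
}
```

Redirect the contacts.cfg file into the filter and write the result to a new file, then compare the two files before replacing the original. Both the email and the pager lines are rewritten because each contains the default address once.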

NOTE: Nagios can generate many e-mail messages. You can use the open source Nan utility to help control these messages. For more information, see “Nan Notification Aggregator and Delimiter”.

Changing Sensor Thresholds

Job loads, usage patterns, process types, counts, memory, cache, disk subsystems, and so on all contribute input to Nagios. Nagios uses threshold values to determine whether or not to send an alert, and, if so, whether that alert is critical or a warning. Nagios monitors the sensor thresholds and generates alerts when a threshold is reached. Depending on your specific site configuration and use, some default thresholds might not be appropriate for your system.

The platform-dependent default thresholds provided in the HP XC system serve as a baseline, but they might not be optimal for your site. As system administrator, you need to determine the threshold values appropriate for your site and customize the Nagios configuration.

The /opt/hptc/nagios/etc/nagios_vars.ini file defines various constants and variables used throughout the HP XC system's plug-ins and the Nagios configurations. You can edit this file to customize the thresholds. Changing these values changes the point at which Nagios alerts you that a subsystem has reached a threshold.

The nagios_vars.ini file also contains variables that are commented out. Examine the content of the file to determine if those variables are appropriate for your system. If so, remove the comment characters accordingly. This portion of the nagios_vars.ini file is an example:

#   Note any sensors matched by the following patterns will
#   be individually archived and viewable via shownode metrics sensors
#   any sensors not matched will be reported as a single group
#   status when it is within threshold.  Any sensor reporting
#   outside of its thresholds will always be individually archived.
 
#       SENSORPRINT0 = CPU[0-9]+ TEMP
#       SENSORPRINT1 = SYS TEMP

If you change the nagios_vars.ini file, be sure to propagate the file to the appropriate nodes, usually the management hubs, on your system; see Chapter 10 for more information. “Updating the Nagios Configuration” describes the overall procedure for updating the Nagios configuration.
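Removing the comment characters from the SENSORPRINT lines shown earlier can be sketched as a stream edit. The function name uncomment_sensorprint is hypothetical. The sketch also removes the white space between the comment character and the variable name, which is an assumption about the layout the file expects.

```shell
# Hypothetical filter: uncomment lines of the form "#   SENSORPRINTn = ..."
# in the text read from stdin. Other comment lines are left unchanged.
uncomment_sensorprint() {
    sed 's/^#[[:space:]]*\(SENSORPRINT[0-9][0-9]*[[:space:]]*=\)/\1/'
}
```

Only lines whose first word after the comment character is a SENSORPRINT variable are changed, so the explanatory comments in the file stay commented out.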

Adjusting the Time Allotted for Metrics Collection

Table 8-1 displays the default collection intervals for the Supermon Metrics Monitor service.

The Supermon Metrics Monitor schedules and collects individual metrics at a specified interval. You can change an interval. The interval must be a multiple of the time specified by the value of the normal_check_interval parameter defined in the /opt/hptc/nagios/etc/templates/nagios_template.cfg or /opt/hptc/nagios/etc/templates/nagios_monitor.cfg template file.

Table 8-1 Supermon Metrics Collection Intervals

Metric Name    Collection Interval
paging         default*
cpuinfo        default*
cputype        default*
btime          default*
processes      default*
netinfo        default*
meminfo        default*
swapinfo       default*
time           default*
switch         default*
cputotal       default*
avenrun        %LOADAVECOLLECTIONPERIOD%**
mdadm          %MDADMCOLLECTIONPERIOD%**

*  The default is set to 5 minutes.

** These values are specified in the /opt/hptc/nagios/etc/nagios_vars.ini file.
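Because a collection interval must be a multiple of the normal_check_interval value, a candidate interval can be checked before it is written to a configuration file. The following sketch shows the arithmetic. The function name is_valid_interval is hypothetical, and both arguments are assumed to be whole numbers in the same unit.

```shell
# Hypothetical check: succeed if interval is a positive whole multiple of base.
# Both arguments must be integers expressed in the same unit.
is_valid_interval() {
    interval=$1
    base=$2
    [ "$interval" -gt 0 ] && [ "$base" -gt 0 ] && [ $((interval % base)) -eq 0 ]
}
```

With a base of 5, an interval of 15 passes the check and an interval of 7 fails it.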

 

Global Service Check Timeout Limit

The master Nagios configuration file, nagios.cfg, has a number of global settings that control overall behavior. One of these is the service_check_timeout interval. Nagios limits the execution time of plug-ins to this interval. If a plug-in is still running when the interval expires, Nagios terminates the plug-in and shows the result as a Service check timeout error.

For systems with fewer than 256 nodes, the default value of 180 seconds should be adequate. However, warning or critical messages can occur if the service_check_timeout interval ends before the metrics gathering is complete. If your system has more nodes, consider increasing the value for the service_check_timeout parameter.
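Before changing the timeout, it can be useful to read the value currently in effect. The following sketch extracts it from text in the key=value form used by nagios.cfg. The function name get_service_check_timeout is hypothetical.

```shell
# Hypothetical helper: print the service_check_timeout value found in
# nagios.cfg-style text (lines of the form key=value) read from stdin.
# Commented-out lines are ignored because they do not start with the key.
get_service_check_timeout() {
    sed -n 's/^[[:space:]]*service_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Redirecting /opt/hptc/nagios/etc/nagios.cfg into the function prints the configured number of seconds, which you can compare with the time your metrics collection actually takes.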

Changing the Default Nagios User Name

Often the Nagios user name and user ID are established during the initial system configuration, that is, when the cluster_config utility is run. If a Nagios user name is found at that time, the HP XC system uses that user name and user ID instead of creating the default user name and user ID. However, you can configure the HP XC system to use an alternate nagios user and group account.

Use the following procedure to change the default Nagios user name.

  1. Stop the Nagios service if the HP XC system is running. For instructions on how to stop the Nagios service, see “Stopping and Restarting Nagios”.

  2. Verify the Nagios user ID:

    # grep nagios /etc/passwd
    nagios:x:222:222::/home/nagios:/bin/bash
    NOTE: The default Nagios user account ID, nagios, is 222.
  3. Use the standard user account utilities to delete the nagios user account, then add another:

    # userdel -r nagios
    # useradd -u 222 -g hpadm newname

    Alternatively, you can use NIS to change the user account name if this is appropriate for your site.

    NOTE: This example retains the default user ID for Nagios.
  4. Change the line:

    nagios_user=nagios

    to

    nagios_user=newname

    in each of the following files:

    • /opt/hptc/nagios/etc/nagios.cfg

    • /opt/hptc/nagios/etc/nagios_monitor.cfg

    • /opt/hptc/nagios/etc/nrpe.cfg

    • /opt/hptc/nagios/etc/nsca.cfg

    NOTE: Complete steps 5 through 10 only for a new user name that was added after the cluster_config utility was run.
  5. To change the ownership of Nagios files to the newname user, perform the following steps:

    1. Change to the root directory:

       # cd /  
    2. Use the following command to change the file ownership from user nagios to newname:

       # find . -mount -user nagios | xargs chown newname
    IMPORTANT: If the /hptc_cluster file system is a Lustre file system (SFS), run the following command to change file ownership on that file system separately:
     # find /hptc_cluster -user nagios | xargs chown newname
  6. Ensure that the password files are synchronized throughout the HP XC system:

    # pdcp -a /etc/passwd /etc/passwd
  7. Create the ssh keys for the newname user account:

    # /opt/hptc/bin/ssh_create_shared_keys -user newname
  8. Capture the Nagios keys and replicate them across the HP XC system:

    # tar cvf /hptc_cluster/newname_keys.tar /home/newname
    # pdsh -a -x nh "tar xvf /hptc_cluster/newname_keys.tar"
  9. Verify that you can log in to a random node as the newname user:

    # ssh any_node -l newname
  10. Use the nconfigure utility to reconfigure Nagios across the HP XC system:

    # pdsh -a "service nagios nconfigure"
  11. Restart Nagios. For instructions on how to restart Nagios, see “Stopping and Restarting Nagios”.
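The edit in step 4 is the same in all four files, so it can be expressed once as a stream edit. The following is a sketch only. The function name set_nagios_user is hypothetical, and the filter assumes that the line reads exactly nagios_user=nagios with no surrounding white space.

```shell
# Hypothetical filter for step 4: rewrite the nagios_user line in the text
# read from stdin. $1 is the new user name. All other lines pass unchanged.
set_nagios_user() {
    sed "s/^nagios_user=nagios\$/nagios_user=$1/"
}
```

Apply the filter to each of the four files in turn, write the result to a new file, and review the difference before replacing the original.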

Disabling Individual Nagios Plug-Ins

All the Nagios plug-ins developed for the HP XC system are enabled by default. However, you can modify the /opt/hptc/nagios/etc/templates/*_template.cfg files to customize the service checks as needed.

IMPORTANT: Do not modify files in the /opt/hptc/nagios/etc directory with file names of the form *_local.cfg or xc_*.cfg.

Use the following procedure to disable a specific Nagios plug-in:

  1. Log in as superuser (root) on the head node.

  2. Change directory to the /opt/hptc/nagios/etc/templates/ directory:

    # cd /opt/hptc/nagios/etc/templates
  3. Determine the appropriate template file to disable the plug-in.

    This procedure uses the nagios_template.cfg file as an example.

  4. Use the text editor of your choice to modify the template file.

  5. Use the pdcp command to copy the template file to all the nodes in the HP XC system.

    # pdcp -a nagios_template.cfg /opt/hptc/nagios/etc/templates/
  6. Stop the Nagios service on all the nodes. For instructions on how to stop Nagios, see “Stopping and Restarting Nagios”.

  7. Reconfigure Nagios on all the nodes:

    # pdsh -a "service nagios nconfigure"
  8. Restart the Nagios service on all the nodes. For instructions on how to restart Nagios, see “Stopping and Restarting Nagios”.

Update the golden image with the Nagios template file to ensure a permanent change. See Chapter 10 for more information.
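This procedure does not say what edit disables a plug-in. One common approach in Nagios object files, assumed here, is to comment out the entire define service block for the check. The following sketch does that for one service. The function name comment_out_service is hypothetical, and the sketch assumes that each block closes with a brace on its own line.

```shell
# Hypothetical filter: comment out the "define service{ ... }" block whose
# service_description equals $1. Reads a template on stdin and writes the
# edited text to stdout. Other blocks and lines pass through unchanged.
comment_out_service() {
    awk -v want="$1" '
        !inblock && /^[ \t]*define[ \t]+service[ \t]*[{]/ {
            inblock = 1; n = 0; hit = 0
        }
        inblock {
            buf[++n] = $0                      # hold the block until it closes
            if ($1 == "service_description") {
                desc = $0
                sub(/^[ \t]*service_description[ \t]+/, "", desc)
                sub(/[ \t]+$/, "", desc)
                if (desc == want) hit = 1
            }
            if ($0 ~ /^[ \t]*[}]/) {           # closing brace ends the block
                prefix = hit ? "#" : ""
                for (i = 1; i <= n; i++) print prefix buf[i]
                inblock = 0
            }
            next
        }
        { print }
        END { if (inblock) for (i = 1; i <= n; i++) print buf[i] }
    '
}
```

Run the filter on a copy of the template file, compare the result with the original, and then continue with the copy and reconfigure steps of the procedure. Check the service names in your own template files, because this section does not list them.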

© 2003 Hewlett-Packard Development Company, L.P.