9.6. Troubleshooting Nagios


This section lists the errors that can occur while configuring Nagios Service Check Acceptor (NSCA) and Nagios Remote Plug-in Executor (NRPE), along with the steps to troubleshoot them.
Troubleshooting NSCA Configuration Issues

  • Check Firewall and Port Settings on Nagios Server
    If port 5667 is not open on the Nagios server's firewall, a timeout error is displayed. Ensure that port 5667 is open.
    1. Log in as root and run the following command on the Nagios server to list the current iptables rules:
      # iptables -L
    2. If the port is open, the output contains a line similar to the following:
      ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:5667
    3. If the port is not open, add the following rule to the /etc/sysconfig/iptables file:
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
    4. Restart the iptables service using the following command:
      # service iptables restart
    5. Restart the NSCA service using the following command:
      # service nsca restart
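The rule check in steps 1 and 2 can also be scripted. The following is a minimal sketch, not part of the product: it assumes the output of iptables -L has already been captured as text and that ports appear in numeric form (for example, as printed by iptables -L -n). The function name port_accepted is hypothetical.

```python
import re


def port_accepted(iptables_output: str, port: int) -> bool:
    """Return True if an ACCEPT rule for the given TCP destination port is listed."""
    # \b keeps dpt:5667 from matching a longer number such as dpt:56670.
    pattern = re.compile(rf"\bdpt:{port}\b")
    for line in iptables_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "ACCEPT" and "tcp" in fields and pattern.search(line):
            return True
    return False


# Sample line taken from the expected output shown in step 2.
sample = "ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:5667"
print(port_accepted(sample, 5667))  # True
print(port_accepted(sample, 5666))  # False
```

If the function returns False for port 5667, continue with steps 3 to 5 to add the rule and restart the services.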
  • Check the Configuration File on Red Hat Storage Node
    Messages cannot be sent to the NSCA server if the Nagios server IP address or FQDN, the cluster name, or the host name (as configured in the Nagios server) is not set correctly.
    Open the configuration file /etc/nagios/nagios_server.conf and verify that the settings are correct, as shown below:
    # NAGIOS SERVER 
    # The nagios server IP address or FQDN to which the NSCA command 
    # needs to be sent 
    [NAGIOS-SERVER] 
    nagios_server=NagiosServerIPAddress 
     
     
    # CLUSTER NAME 
    # The host name of the logical cluster configured in Nagios under which 
    # the gluster volume services reside 
    [NAGIOS-DEFINTIONS] 
    cluster_name=cluster_auto 
     
     
    # LOCAL HOST NAME 
    # Host name given in the nagios server 
    [HOST-NAME] 
    hostname_in_nagios=NagiosServerHostName
    If the host name is updated, restart the NSCA service using the following command:
    # service nsca restart
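As a rough aid for the verification described above, the sketch below uses Python's standard configparser module to report which of the three settings are missing or empty. It assumes the file follows the INI layout shown in the listing, including the section name spelled NAGIOS-DEFINTIONS as it appears there. The function name missing_settings and the address 192.0.2.10 are hypothetical.

```python
import configparser

# Section -> key pairs taken from the listing above.
# The section name "NAGIOS-DEFINTIONS" is spelled as it appears in that listing.
REQUIRED = {
    "NAGIOS-SERVER": "nagios_server",
    "NAGIOS-DEFINTIONS": "cluster_name",
    "HOST-NAME": "hostname_in_nagios",
}


def missing_settings(conf_text: str) -> list:
    """Return 'SECTION/key' entries that are absent or have an empty value."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    missing = []
    for section, key in REQUIRED.items():
        # fallback="" covers both a missing section and a missing key.
        value = parser.get(section, key, fallback="").strip()
        if not value:
            missing.append(f"{section}/{key}")
    return missing


# Hypothetical file content with the host name left empty.
sample = """\
[NAGIOS-SERVER]
nagios_server=192.0.2.10

[NAGIOS-DEFINTIONS]
cluster_name=cluster_auto

[HOST-NAME]
hostname_in_nagios=
"""
print(missing_settings(sample))  # ['HOST-NAME/hostname_in_nagios']
```

The sketch only detects missing values. Whether a value matches what is configured on the Nagios server still has to be checked by hand.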

Troubleshooting NRPE Configuration Issues

  • CHECK_NRPE: Error - Could Not Complete SSL Handshake
    This error occurs if the IP address of the Nagios server is not defined in the nrpe.cfg file of the Red Hat Storage node. To fix this issue, follow the steps given below:
    1. Add the Nagios server IP address to the allowed_hosts line in the /etc/nagios/nrpe.cfg file, as shown below:
      allowed_hosts=127.0.0.1, NagiosServerIP
      The allowed_hosts setting lists the IP addresses that are permitted to execute NRPE commands.
    2. Save the nrpe.cfg file and restart the NRPE service using the following command:
      # service nrpe restart
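To confirm that the edit in step 1 took effect, the allowed_hosts line can be read back and compared with the Nagios server address. The sketch below is illustrative only: it assumes the nrpe.cfg content is passed in as text and compares literal entries, so network ranges or host names in the list are not interpreted. The function name allowed_hosts and the address 192.0.2.10 are hypothetical.

```python
def allowed_hosts(nrpe_cfg_text: str) -> list:
    """Return the entries of the last uncommented allowed_hosts line, or [] if none."""
    hosts = []
    for line in nrpe_cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, _, value = line.partition("=")
        if key.strip() == "allowed_hosts":
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


# Hypothetical nrpe.cfg line in the format shown in step 1.
sample = "allowed_hosts=127.0.0.1, 192.0.2.10"
print("192.0.2.10" in allowed_hosts(sample))  # True
```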
  • CHECK_NRPE: Socket Timeout After n Seconds
    To resolve this issue, perform the steps given below:
    On Nagios Server:
    The default timeout value for NRPE calls is 10 seconds. If no response is received within 10 seconds, the Nagios GUI displays an error stating that the NRPE call timed out after 10 seconds. To fix this issue, change the timeout value for NRPE calls by modifying the command definition configuration files.
    1. Change the NRPE timeout for services that directly invoke check_nrpe.
      For the services that directly invoke check_nrpe (check_disk_and_inode, check_cpu_multicore, and check_memory), modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds to the command line, as shown below:
      define command {
             command_name check_disk_and_inode
             command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t TimeInSeconds
      }
    2. Change the NRPE timeout for the services in the nagios-server-addons package that make NRPE calls through code.
      The services that invoke /usr/lib64/nagios/plugins/gluster/check_vol_server.py (check_vol_utilization, check_vol_status, check_vol_quota_status, check_vol_heal_status, and check_vol_georep_status) make NRPE calls to the Red Hat Storage nodes through code to retrieve details. To change the timeout for these NRPE calls, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds to the command line, as shown below:
      define command {
            command_name check_vol_utilization
            command_line $USER1$/gluster/check_vol_server.py $ARG1$ $ARG2$ -w $ARG3$ -c $ARG4$ -o utilization -t TimeInSeconds
      }
      The auto configuration service gluster_auto_discovery makes NRPE calls to the Red Hat Storage nodes to retrieve configuration details. To change the NRPE timeout value for the auto configuration service, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds to the command line, as shown below:
      define command{
              command_name    gluster_auto_discovery
              command_line    sudo $USER1$/gluster/configure-gluster-nagios.py -H $ARG1$ -c $HOSTNAME$ -m auto -n $ARG2$ -t TimeInSeconds
      }
    3. Restart the Nagios service using the following command:
      # service nagios restart
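The command_line changes in steps 1 and 2 all follow the same pattern: append -t and a number of seconds, or update the value if -t is already present. The following sketch shows that pattern on a single command_line string. It is an illustration only and does not edit gluster-commands.cfg itself. The function name set_timeout is hypothetical.

```python
import re

# Matches a standalone "-t <value>" token pair; (?<!\S) requires -t to start a token.
_TIMEOUT_OPTION = re.compile(r"(?<!\S)-t\s+\S+")


def set_timeout(command_line: str, seconds: int) -> str:
    """Return command_line with '-t <seconds>' appended, or updated if already present."""
    if _TIMEOUT_OPTION.search(command_line):
        return _TIMEOUT_OPTION.sub(f"-t {seconds}", command_line, count=1)
    return f"{command_line.rstrip()} -t {seconds}"


# command_line value from the check_disk_and_inode definition, without the -t option.
original = "$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode"
print(set_timeout(original, 30))
# $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t 30
```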
    On Red Hat Storage node:
    1. Add the Nagios server IP address as described in the CHECK_NRPE: Error - Could Not Complete SSL Handshake entry of the Troubleshooting NRPE Configuration Issues section.
    2. Edit the nrpe.cfg file using the following command:
      # vi /etc/nagios/nrpe.cfg
    3. Search for the command_timeout and connection_timeout settings and change their values. The command_timeout value must be greater than or equal to the timeout value set on the Nagios server.
      For example, the timeouts can be set to connection_timeout=300 and command_timeout=60 (both in seconds).
    4. Restart the NRPE service using the following command:
      # service nrpe restart
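The constraint in step 3 (command_timeout on the node must be at least the timeout set on the Nagios server) can be checked with a short script. This is a sketch under stated assumptions: the nrpe.cfg content is supplied as text, both settings are plain integers without trailing comments, and a missing command_timeout is treated as failing the check because the sketch does not assume any default. The function names read_timeouts and command_timeout_ok are hypothetical.

```python
def read_timeouts(nrpe_cfg_text: str) -> dict:
    """Collect command_timeout and connection_timeout (in seconds) from nrpe.cfg text."""
    timeouts = {}
    for line in nrpe_cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, _, value = line.partition("=")
        key = key.strip()
        if key in ("command_timeout", "connection_timeout"):
            timeouts[key] = int(value.strip())
    return timeouts


def command_timeout_ok(nrpe_cfg_text: str, server_timeout: int) -> bool:
    """True if command_timeout is set and is >= the timeout used on the Nagios server."""
    timeouts = read_timeouts(nrpe_cfg_text)
    # A missing setting counts as a failure; no default value is assumed here.
    return timeouts.get("command_timeout", 0) >= server_timeout


# Values from the example in step 3.
sample = "command_timeout=60\nconnection_timeout=300\n"
print(command_timeout_ok(sample, 30))  # True
print(command_timeout_ok(sample, 90))  # False
```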
  • Check the NRPE Service Status
    NRPE checks fail if the NRPE service is not running. To resolve this issue, perform the steps given below:
    1. Log in to the Red Hat Storage node as root and verify the status of the NRPE service using the following command:
      # service nrpe status
    2. If NRPE is not running, start the service using the following command:
      # service nrpe start
  • Check Firewall and Port Settings
    A timeout error is displayed if NRPE traffic cannot traverse a firewall, or if port 5666 is not open on the Red Hat Storage node.
    Ensure that port 5666 is open on the Red Hat Storage node.
    1. Run the check_nrpe command from the Nagios server to verify that the port is open and that NRPE is running on the Red Hat Storage node.
    2. Log in to the Nagios server as root and run the following command:
      # /usr/lib64/nagios/plugins/check_nrpe -H RedHatStorageNodeIP
    3. If the port is open and NRPE is running, the output is similar to the following:
      NRPE v2.14
    If this output is not displayed, ensure that port 5666 is open on the Red Hat Storage node.
    1. Run the following command on the Red Hat Storage node as root to get a listing of the current iptables rules:
      # iptables -L
    2. If the port is open, the output contains a line similar to the following:
      ACCEPT - tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5666
  • If the port is not open, add an iptables rule for it.
    1. To add the iptables rule, open the iptables file for editing:
      # vi /etc/sysconfig/iptables
    2. Add the following line to the file, then save the file:
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
    3. Restart the iptables service using the following command:
      # service iptables restart
    4. Restart the NRPE service using the following command:
      # service nrpe restart
  • Checking Port 5666 From the Nagios Server with Telnet
    Use telnet to verify that port 5666 on the Red Hat Storage node is reachable from the Nagios server:
    1. Log in to the Nagios server as root.
    2. Test the connection on port 5666 from the Nagios server to the Red Hat Storage node using the following command:
      # telnet RedHatStorageNodeIP 5666
    3. If the port is reachable, the output is similar to the following:
      telnet 10.70.36.49 5666 
      Trying 10.70.36.49... 
      Connected to 10.70.36.49. 
      Escape character is '^]'.
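Where telnet is not installed, the same reachability test can be approximated with a plain TCP connection attempt. The sketch below uses Python's standard socket module. It only confirms that something accepts connections on the port and does not perform an NRPE handshake. The function name port_is_open and the 5-second default timeout are hypothetical choices.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call, using the node address from the telnet output above:
#   port_is_open("10.70.36.49", 5666)
```

A False result corresponds to the cases covered by the firewall and NRPE service checks earlier in this section.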
  • Connection Refused By Host
    This error is caused by port or firewall issues, or by an incorrectly configured allowed_hosts directive. For troubleshooting steps, see CHECK_NRPE: Error - Could Not Complete SSL Handshake and CHECK_NRPE: Socket Timeout After n Seconds.

9.6.2. Troubleshooting General Issues

This section describes the troubleshooting procedures for general issues related to Nagios.

  • All Cluster Services Are in Warning State and Status Information Is Displayed as (null)
    Set SELinux to permissive mode and restart the Nagios server.
  • Graphs Are Not Displayed in the Trends Tab
    Ensure that the host name given in the Name field of the Add Host window matches the host name given while configuring Nagios. Auto-discovery uses the host name of the node when configuring the Nagios server.
