17.7. Troubleshooting Nagios

17.7.1. Troubleshooting NSCA and NRPE Configuration Issues

This section lists the errors that can occur while configuring Nagios Service Check Acceptor (NSCA) and Nagios Remote Plug-in Executor (NRPE), along with the troubleshooting steps for each.

Troubleshooting NSCA Configuration Issues

Check Firewall and Port Settings on Nagios Server
If port 5667 is not open on the Nagios server host's firewall, a timeout error is displayed. Ensure that port 5667 is open.
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 6:
1. Log in as root and run the following command on the Red Hat Gluster Storage node to get the list of current iptables rules:
  # iptables -L
2. If the port is open, the following appears in your output:
  ACCEPT tcp -- anywhere anywhere tcp dpt:5667
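The check in step 2 can also be scripted. The following Python sketch is illustrative only and is not part of Red Hat Gluster Storage; the function name iptables_accepts_port is hypothetical. It assumes that the listing shows numeric ports (dpt:5667), for example as produced by iptables -L -n, and it does not handle port ranges or multiport rules.

```python
def iptables_accepts_port(listing: str, port: int) -> bool:
    """Return True if an ACCEPT rule for tcp names dpt:<port> in the listing."""
    wanted = f"dpt:{port}"
    for line in listing.splitlines():
        fields = line.split()
        # Rule lines start with the target (ACCEPT) followed by the protocol.
        if len(fields) >= 2 and fields[0] == "ACCEPT" and fields[1] == "tcp":
            # Exact token match, so dpt:56670 does not count as dpt:5667.
            if wanted in fields:
                return True
    return False


# The sample line is the one shown in the procedure above.
sample = "ACCEPT tcp -- anywhere anywhere tcp dpt:5667"
print(iptables_accepts_port(sample, 5667))
```

The listing text can be captured from the iptables command by any means; the function only inspects the text it is given.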
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 7:
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of the current firewall rules:
  # firewall-cmd --list-all-zones
2. If the port is open, 5667/tcp is listed beside ports: under one or more zones in your output.
If the port is not open, add a firewall rule for the port:
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 6:
1. Add an iptables rule by adding the following line to the /etc/sysconfig/iptables file:
  -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
2. Restart the iptables service using the following command:
  # service iptables restart
3. Restart the NSCA service using the following command:
  # service nsca restart
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 7:
1. Run the following commands to open the port:
  # firewall-cmd --zone=public --add-port=5667/tcp
  # firewall-cmd --zone=public --add-port=5667/tcp --permanent

Check the Configuration File on Red Hat Gluster Storage Node

Messages cannot be sent to the NSCA server if the Nagios server IP address or FQDN, the cluster name, and the host name (as configured on the Nagios server) are not configured correctly.

Open the Nagios server configuration file /etc/nagios/nagios_server.conf and verify that the settings are correct, as shown below:

# NAGIOS SERVER
# The nagios server IP address or FQDN to which the NSCA command
# needs to be sent
[NAGIOS-SERVER]
nagios_server=NagiosServerIPAddress


# CLUSTER NAME
# The host name of the logical cluster configured in Nagios under which
# the gluster volume services reside
[NAGIOS-DEFINTIONS]
cluster_name=cluster_auto


# LOCAL HOST NAME
# Host name given in the nagios server
[HOST-NAME]
hostname_in_nagios=NagiosServerHostName

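As a rough aid for the verification above, the following Python sketch reports which of the three settings are missing or empty. It is an illustration, not a Red Hat tool: the function name missing_settings and the sample values (including the 192.0.2.10 address) are hypothetical. It assumes that the file can be read as an INI-style file with the section and option names shown above.

```python
import configparser

# Section and option names as they appear in /etc/nagios/nagios_server.conf.
REQUIRED = {
    "NAGIOS-SERVER": "nagios_server",
    "NAGIOS-DEFINTIONS": "cluster_name",
    "HOST-NAME": "hostname_in_nagios",
}


def missing_settings(conf_text: str) -> list:
    """Return the required settings that are absent or empty in conf_text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    missing = []
    for section, option in REQUIRED.items():
        # fallback="" covers both a missing section and a missing option.
        value = parser.get(section, option, fallback="").strip()
        if not value:
            missing.append(f"[{section}] {option}")
    return missing


# Hypothetical sample in which the host name was left empty.
sample = "\n".join([
    "# NAGIOS SERVER",
    "[NAGIOS-SERVER]",
    "nagios_server=192.0.2.10",
    "[NAGIOS-DEFINTIONS]",
    "cluster_name=cluster_auto",
    "[HOST-NAME]",
    "hostname_in_nagios=",
])
print(missing_settings(sample))
```

The sketch only checks that values are present. It cannot confirm that they match what is configured on the Nagios server.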

If the host name is updated, restart the NSCA service using the following command:

# service nsca restart


Troubleshooting NRPE Configuration Issues

CHECK_NRPE: Error - Could Not Complete SSL Handshake
This error occurs if the IP address of the Nagios server is not defined in the nrpe.cfg file of the Red Hat Gluster Storage node. To fix this issue, follow the steps given below:
1. Add the Nagios server IP address to the allowed_hosts line in the /etc/nagios/nrpe.cfg file as shown below:
  allowed_hosts=127.0.0.1, NagiosServerIP
  
  The allowed_hosts setting lists the IP addresses that are permitted to execute NRPE commands.
2. Save the nrpe.cfg file and restart the NRPE service using the following command:
  # service nrpe restart
  
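Before restarting NRPE, it can be useful to confirm that the Nagios server address really is in the allowed_hosts line. The following Python sketch is a hypothetical helper, not part of NRPE. It assumes a single comma-separated allowed_hosts line and compares entries as plain strings, so host names and network ranges are not resolved. The address 192.0.2.10 is a placeholder.

```python
def allowed_hosts(nrpe_cfg_text: str) -> list:
    """Return the entries of the allowed_hosts line in nrpe.cfg text."""
    hosts = []
    for raw in nrpe_cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not settings
        key, _, value = line.partition("=")
        if key.strip() == "allowed_hosts":
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


def is_allowed(nrpe_cfg_text: str, server_ip: str) -> bool:
    """Return True if server_ip appears verbatim in allowed_hosts."""
    return server_ip in allowed_hosts(nrpe_cfg_text)


sample = "# hypothetical excerpt\nallowed_hosts=127.0.0.1, 192.0.2.10\n"
print(allowed_hosts(sample))
print(is_allowed(sample, "192.0.2.10"))
```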
CHECK_NRPE: Socket Timeout After n Seconds
To resolve this issue, perform the steps given below:
On Nagios Server:
The default timeout for NRPE calls is 10 seconds. If no response is received within 10 seconds, the Nagios server GUI displays an error stating that the NRPE call timed out after 10 seconds. To fix this issue, change the timeout value for NRPE calls by modifying the command definition configuration files.
1. Change the NRPE timeout for services that directly invoke check_nrpe.
  For the services that directly invoke check_nrpe (check_disk_and_inode, check_cpu_multicore, and check_memory), modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
  
  define command {
         command_name check_disk_and_inode
         command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t TimeInSeconds
  }
2. Change the NRPE timeout for the services in the nagios-server-addons package that make NRPE calls through code.
  The services that invoke /usr/lib64/nagios/plugins/gluster/check_vol_server.py (check_vol_utilization, check_vol_status, check_vol_quota_status, check_vol_heal_status, and check_vol_georep_status) make NRPE calls to the Red Hat Gluster Storage nodes for the details through code. To change the timeout for these NRPE calls, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
  
  define command {
         command_name check_vol_utilization
         command_line $USER1$/gluster/check_vol_server.py $ARG1$ $ARG2$ -w $ARG3$ -c $ARG4$ -o utilization -t TimeInSeconds
  }
  The auto configuration service gluster_auto_discovery makes NRPE calls for the configuration details from the Red Hat Gluster Storage nodes. To change the NRPE timeout value for the auto configuration service, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
  
  define command {
         command_name gluster_auto_discovery
         command_line sudo $USER1$/gluster/configure-gluster-nagios.py -H $ARG1$ -c $HOSTNAME$ -m auto -n $ARG2$ -t TimeInSeconds
  }
3. Restart the Nagios service using the following command:
  
  # service nagios restart
  
On Red Hat Gluster Storage node:
1. Add the Nagios server IP address as described in the CHECK_NRPE: Error - Could Not Complete SSL Handshake section under Troubleshooting NRPE Configuration Issues.
2. Edit the nrpe.cfg file using the following command:
  # vi /etc/nagios/nrpe.cfg
  
3. Search for the command_timeout and connection_timeout settings and change their values. The command_timeout value must be greater than or equal to the timeout value set on the Nagios server.
  For example, the timeouts can be set as connection_timeout=300 and command_timeout=60 (both values are in seconds).
4. Restart the NRPE service using the following command:
  # service nrpe restart
  
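The rule in step 3 (command_timeout on the node must be at least the -t value used on the Nagios server) can be expressed as a small check. The following Python sketch is illustrative only. The function names are hypothetical, the server-side timeout is passed in by the caller rather than read from gluster-commands.cfg, and a missing or non-numeric command_timeout is reported as failing because the sketch does not assume any default.

```python
def parse_settings(cfg_text: str) -> dict:
    """Collect key=value pairs from nrpe.cfg text, ignoring comments."""
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def command_timeout_ok(nrpe_cfg_text: str, server_timeout: int) -> bool:
    """Return True if command_timeout >= the timeout set on the Nagios server."""
    value = parse_settings(nrpe_cfg_text).get("command_timeout")
    if value is None or not value.isdigit():
        return False  # no usable value found; needs manual review
    return int(value) >= server_timeout


# Values taken from the example in step 3.
sample = "command_timeout=60\nconnection_timeout=300\n"
print(command_timeout_ok(sample, 30))
print(command_timeout_ok(sample, 90))
```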
Check the NRPE Service Status
This error occurs if the NRPE service is not running. To resolve this issue, perform the steps given below:
1. Log in as root to the Red Hat Gluster Storage node and run the following command to verify the status of NRPE service:
  # service nrpe status
  
2. If NRPE is not running, start the service using the following command:
  # service nrpe start
  
Check Firewall and Port Settings
This error is associated with firewalls and ports. The timeout error is displayed if NRPE traffic cannot traverse a firewall, or if port 5666 is not open on the Red Hat Gluster Storage node.
Ensure that port 5666 is open on the Red Hat Gluster Storage node.
1. Run the check_nrpe command from the Nagios server to verify that the port is open and that NRPE is running on the Red Hat Gluster Storage node.
2. Log in to the Nagios server as root and run the following command:
  # /usr/lib64/nagios/plugins/check_nrpe -H RedHatStorageNodeIP
3. If the port is open and NRPE is running, the NRPE version is displayed, as given below:
  NRPE v2.14
If the version is not displayed, ensure that port 5666 is open on the Red Hat Gluster Storage node.
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 6:
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of the current iptables rules:
  # iptables -L
2. If the port is open, the following appears in your output.
  ACCEPT tcp -- anywhere anywhere tcp dpt:5666
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 7:
1. Run the following command on the Red Hat Gluster Storage node as root to get a listing of the current firewall rules:
  # firewall-cmd --list-all-zones
2. If the port is open, 5666/tcp is listed beside ports: under one or more zones in your output.
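For the Red Hat Enterprise Linux 7 case, the inspection of the firewall-cmd listing can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: the function name zones_with_port is hypothetical, and the sketch assumes that each zone name starts an unindented line and that the zone's open ports follow on an indented line beginning with ports:, as the step above describes. The sample listing is abbreviated and invented for the example.

```python
def zones_with_port(listing: str, port: str) -> list:
    """Return the zones whose 'ports:' line contains port (for example '5666/tcp')."""
    zones = []
    current = None
    for line in listing.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new zone, e.g. "public (active)".
            current = line.split()[0]
        elif current is not None:
            key, _, value = line.strip().partition(":")
            # Exact key match, so forward-ports and source-ports are ignored.
            if key == "ports" and port in value.split():
                zones.append(current)
    return zones


# Abbreviated, hypothetical listing for illustration.
sample = "\n".join([
    "internal",
    "  interfaces:",
    "  ports:",
    "",
    "public (active)",
    "  interfaces: eth0",
    "  services: dhcpv6-client ssh",
    "  ports: 5666/tcp 5667/tcp",
])
print(zones_with_port(sample, "5666/tcp"))
```

An empty list means the port was not found under any zone in the listing, in which case a firewall rule for the port needs to be added.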
If the port is not open, add a firewall rule for the port.
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 6:
1. To add an iptables rule, open the iptables file for editing as shown below:
  # vi /etc/sysconfig/iptables
  
2. Add the following line to the file and save the file:
  -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
  
3. Restart the iptables service using the following command:
  # service iptables restart
  
4. Restart the NRPE service using the following command:
  # service nrpe restart
  
On Red Hat Gluster Storage based on Red Hat Enterprise Linux 7:
1. Run the following commands to open the port:
  # firewall-cmd --zone=public --add-port=5666/tcp
  # firewall-cmd --zone=public --add-port=5666/tcp --permanent
Checking Port 5666 From the Nagios Server with Telnet
Use telnet to verify that port 5666 on the Red Hat Gluster Storage node is reachable from the Nagios server. Perform the steps given below:
1. Log in as root on the Nagios server.
2. Test the connection on port 5666 from the Nagios server to the Red Hat Gluster Storage node using the following command:
  # telnet RedHatStorageNodeIP 5666
  
3. The output displayed is similar to:
  telnet 10.70.36.49 5666
  Trying 10.70.36.49...
  Connected to 10.70.36.49.
  Escape character is '^]'.
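Where telnet is not installed, a plain TCP connection attempt gives the same information. The following Python sketch is an assumption-laden stand-in for the telnet test, not an NRPE client: the function name port_is_open is hypothetical, and a successful connection only shows that something accepts connections on the port, not that NRPE itself answers correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, with RedHatStorageNodeIP replaced by the node's address:
#   port_is_open("RedHatStorageNodeIP", 5666)
```

A result of False corresponds to telnet failing to connect, and points to the firewall and NRPE service checks described earlier in this section.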
Connection Refused By Host
This error is due to port/firewall issues or incorrectly configured allowed_hosts directives. See the sections CHECK_NRPE: Error - Could Not Complete SSL Handshake and CHECK_NRPE: Socket Timeout After n Seconds for troubleshooting steps.
