9.6. Troubleshooting Nagios
9.6.1. Troubleshooting NSCA and NRPE Configuration Issues
- Check Firewall and Port Settings on Nagios Server
  If port 5667 is not open on the server host's firewall, a timeout error is displayed. Ensure that port 5667 is open.
  - Log in as root and run the following command on the Red Hat Storage node to list the current iptables rules:
      # iptables -L
  - The output is displayed as shown below:
      ACCEPT tcp -- anywhere anywhere tcp dpt:5667
  - If the port is not open, add an iptables rule by adding the following line to the /etc/sysconfig/iptables file:
      -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
  - Restart the iptables service using the following command:
      # service iptables restart
  - Restart the NSCA service using the following command:
      # service nsca restart
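The check on the rules file can also be scripted. The following is a minimal sketch, not part of the product: the helper name has_accept_rule and the sample rules text are hypothetical, and the function only looks for a matching line in text such as the contents of /etc/sysconfig/iptables. It does not evaluate rule ordering or confirm that the rule is loaded in the running firewall.

```python
import re


def has_accept_rule(rules_text: str, port: int) -> bool:
    """Return True if an uncommented INPUT rule accepts TCP traffic on `port`."""
    # Match "--dport <port>" followed later on the same line by "-j ACCEPT".
    pattern = re.compile(r"--dport\s+%d\b.*-j\s+ACCEPT\b" % port)
    for line in rules_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # skip commented-out rules
        if line.startswith("-A INPUT") and "-p tcp" in line and pattern.search(line):
            return True
    return False


# Hypothetical sample of a rules file, used only for illustration.
sample = """\
*filter
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
COMMIT
"""

print(has_accept_rule(sample, 5667))  # prints True
print(has_accept_rule(sample, 5666))  # prints False
```

The same helper can be pointed at port 5666 when checking the NRPE port on a Red Hat Storage node.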
 
- Check the Configuration File on Red Hat Storage Node
  Messages cannot be sent to the NSCA server if the Nagios server IP or FQDN, cluster name, and host name (as configured in the Nagios server) are not configured correctly.
  Open the Nagios server configuration file /etc/nagios/nagios_server.conf and verify that the Nagios server IP or FQDN, cluster name, and host name are set correctly.
  If the host name is updated, restart the NSCA service using the following command:
      # service nsca restart
- CHECK_NRPE: Error - Could Not Complete SSL Handshake
  This error occurs if the IP address of the Nagios server is not defined in the nrpe.cfg file of the Red Hat Storage node. To fix this issue, follow the steps given below:
  - Add the Nagios server IP address to the allowed_hosts line in the /etc/nagios/nrpe.cfg file as shown below:
      allowed_hosts=127.0.0.1, NagiosServerIP
    The allowed_hosts setting is the list of IP addresses which can execute NRPE commands.
  - Save the nrpe.cfg file and restart the NRPE service using the following command:
      # service nrpe restart
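To confirm that the Nagios server address is actually listed, the allowed_hosts line can be parsed with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name parse_allowed_hosts is hypothetical, 192.0.2.10 stands in for NagiosServerIP, the file is assumed to contain a single allowed_hosts line, and only exact string matches are considered.

```python
def parse_allowed_hosts(cfg_text: str) -> list:
    """Return the entries of the allowed_hosts line in nrpe.cfg-style text."""
    hosts = []
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = line.split("=", 1)
        if key.strip() == "allowed_hosts":
            # Entries are comma separated; surrounding spaces are dropped.
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts


# Hypothetical excerpt; 192.0.2.10 is a placeholder for NagiosServerIP.
sample_cfg = """\
# allowed_hosts=10.0.0.1
allowed_hosts=127.0.0.1, 192.0.2.10
"""

print(parse_allowed_hosts(sample_cfg))  # prints ['127.0.0.1', '192.0.2.10']
print("192.0.2.10" in parse_allowed_hosts(sample_cfg))  # prints True
```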
 
- CHECK_NRPE: Socket Timeout After n Seconds
  To resolve this issue, perform the steps given below:
  On Nagios Server:
  The default timeout value for NRPE calls is 10 seconds. If the server does not respond within 10 seconds, the Nagios GUI displays an error that the NRPE call has timed out in 10 seconds. To fix this issue, change the timeout value for NRPE calls by modifying the command definition configuration files.
  - Changing the NRPE timeout for services which directly invoke check_nrpe.
    For the services which directly invoke check_nrpe (check_disk_and_inode, check_cpu_multicore, and check_memory), modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
      define command {
             command_name check_disk_and_inode
             command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t TimeInSeconds
      }
  - Changing the NRPE timeout for the services in the nagios-server-addons package which invoke NRPE calls through code.
    The services which invoke /usr/lib64/nagios/plugins/gluster/check_vol_server.py (check_vol_utilization, check_vol_status, check_vol_quota_status, check_vol_heal_status, and check_vol_georep_status) make NRPE calls to the Red Hat Storage nodes for the details through code. To change the timeout for the NRPE calls, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
      define command {
             command_name check_vol_utilization
             command_line $USER1$/gluster/check_vol_server.py $ARG1$ $ARG2$ -w $ARG3$ -c $ARG4$ -o utilization -t TimeInSeconds
      }
    The auto configuration service gluster_auto_discovery makes NRPE calls for the configuration details from the Red Hat Storage nodes. To change the NRPE timeout value for the auto configuration service, modify the command definition configuration file /etc/nagios/gluster/gluster-commands.cfg by adding -t TimeInSeconds as shown below:
      define command{
             command_name gluster_auto_discovery
             command_line sudo $USER1$/gluster/configure-gluster-nagios.py -H $ARG1$ -c $HOSTNAME$ -m auto -n $ARG2$ -t TimeInSeconds
      }
  - Restart the Nagios service using the following command:
      # service nagios restart
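Because several command definitions may need the -t option, it can help to list the ones that do not carry it yet. The following is a minimal sketch, not a supported tool: the helper name commands_without_timeout is hypothetical, the sample text only mimics the layout of gluster-commands.cfg, and the value 30 is an arbitrary stand-in for TimeInSeconds.

```python
import re

# Matches "define command { ... }" blocks, with or without a space before "{".
DEFINE_RE = re.compile(r"define\s+command\s*\{(.*?)\}", re.DOTALL)


def commands_without_timeout(cfg_text: str) -> list:
    """Return command_name values whose command_line has no -t option."""
    missing = []
    for body in DEFINE_RE.findall(cfg_text):
        fields = {}
        for line in body.splitlines():
            parts = line.strip().split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                fields[parts[0]] = parts[1]
        command_line = fields.get("command_line", "")
        if "-t" not in command_line.split():
            missing.append(fields.get("command_name", "?"))
    return missing


# Hypothetical sample in the style of gluster-commands.cfg.
sample_cfg = """\
define command {
       command_name check_disk_and_inode
       command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk_and_inode -t 30
}
define command{
       command_name check_memory
       command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_memory
}
"""

print(commands_without_timeout(sample_cfg))  # prints ['check_memory']
```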
  On Red Hat Storage node:
  - Add the Nagios server IP address as described in CHECK_NRPE: Error - Could Not Complete SSL Handshake under Troubleshooting NSCA and NRPE Configuration Issues.
  - Edit the nrpe.cfg file using the following command:
      # vi /etc/nagios/nrpe.cfg
  - Search for the command_timeout and connection_timeout settings and change their values. The command_timeout value must be greater than or equal to the timeout value set on the Nagios server.
    For example, the timeouts can be set to connection_timeout=300 and command_timeout=60 (values in seconds).
  - Restart the NRPE service using the following command:
      # service nrpe restart
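The rule that command_timeout must be greater than or equal to the timeout set on the Nagios server can be checked mechanically. This is a minimal sketch: the helper names read_settings and command_timeout_ok are hypothetical, and the Nagios-side timeout is passed in by hand rather than read from gluster-commands.cfg.

```python
def read_settings(cfg_text: str) -> dict:
    """Parse key=value lines from nrpe.cfg-style text, ignoring comments."""
    settings = {}
    for line in cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def command_timeout_ok(cfg_text: str, nagios_timeout: int) -> bool:
    """True if command_timeout is set and is >= the Nagios-side -t value."""
    value = read_settings(cfg_text).get("command_timeout")
    return value is not None and int(value) >= nagios_timeout


# Example values taken from the text above.
sample_cfg = """\
command_timeout=60
connection_timeout=300
"""

print(command_timeout_ok(sample_cfg, 30))  # prints True
print(command_timeout_ok(sample_cfg, 90))  # prints False
```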
 
- Check the NRPE Service Status
  This error occurs if the NRPE service is not running. To resolve this issue, perform the steps given below:
  - Verify the status of the NRPE service by logging in to the Red Hat Storage node as root and running the following command:
      # service nrpe status
  - If NRPE is not running, start the service using the following command:
      # service nrpe start
 
- Check Firewall and Port Settings
  This error is associated with firewalls and ports. The timeout error is displayed if NRPE traffic cannot traverse a firewall, or if port 5666 is not open on the Red Hat Storage node.
  Ensure that port 5666 is open on the Red Hat Storage node.
  - Run the check_nrpe command from the Nagios server to verify that the port is open and that NRPE is running on the Red Hat Storage node.
  - Log in to the Nagios server as root and run the following command:
      # /usr/lib64/nagios/plugins/check_nrpe -H RedHatStorageNodeIP
  - The output is displayed as given below:
      NRPE v2.14
  If not, ensure that port 5666 is open on the Red Hat Storage node.
  - Run the following command on the Red Hat Storage node as root to list the current iptables rules:
      # iptables -L
  - The output is displayed as shown below:
      ACCEPT - tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:5666
 
  - If the port is not open, add an iptables rule for it.
    - To add the iptables rule, edit the iptables file as shown below:
        # vi /etc/sysconfig/iptables
    - Add the following line to the file:
        -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
    - Save the file and restart the iptables service using the following command:
        # service iptables restart
    - Restart the NRPE service:
        # service nrpe restart
 
- Checking Port 5666 From the Nagios Server with Telnet
  Use telnet to verify that port 5666 on the Red Hat Storage node is reachable by performing the steps given below:
  - Log in as root on the Nagios server.
  - Test the connection on port 5666 from the Nagios server to the Red Hat Storage node using the following command:
      # telnet RedHatStorageNodeIP 5666
  - The output displayed is similar to:
      telnet 10.70.36.49 5666
      Trying 10.70.36.49...
      Connected to 10.70.36.49.
      Escape character is '^]'.
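Where telnet is not installed, the same reachability test can be approximated with Python's standard socket module. This is a minimal sketch: the helper name port_is_open is hypothetical, and a successful TCP connection only shows that something is listening on the port, not that NRPE will accept the request.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call, to be run on the Nagios server (replace the placeholder first):
#   port_is_open("RedHatStorageNodeIP", 5666)
```

A False result matches the cases described in this section: the port is closed, a firewall drops the traffic, or the NRPE service is not running.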
 
- Connection Refused By Host
  This error is due to port/firewall issues or incorrectly configured allowed_hosts directives. See the sections CHECK_NRPE: Error - Could Not Complete SSL Handshake and CHECK_NRPE: Socket Timeout After n Seconds for troubleshooting steps.
9.6.2. Troubleshooting General Issues
Set SELinux to permissive and restart the Nagios server.
Ensure that the host name given in the Name field of the Add Host window matches the host name given while configuring Nagios. The host name of the node is used while configuring the Nagios server using auto-discovery.