Monitoring Ceph with Nagios Guide
Monitoring Ceph with Nagios Core.
Abstract
Chapter 1. Nagios and Ceph
Nagios Core is an open source solution for monitoring nodes. Large Red Hat Ceph Storage clusters benefit from distributed monitoring systems such as Nagios Core. Nagios Core checks each node in a cluster, including the health of the underlying operating system and the health of the Red Hat Ceph Storage cluster daemons.
Deploying Nagios Core with Ceph requires:
- A running Red Hat Ceph Storage cluster.
Instead of Nagios Core, you can substitute the more feature-rich commercial version, Nagios XI.
Red Hat does not provide the Nagios packages.
Red Hat works with our technology partners to provide this documentation as a service to our customers. However, Red Hat does not provide support for this product. If you need technical assistance for this product, then contact Nagios for support.
Chapter 2. Nagios Core installation and configuration
As a storage administrator, you can install Nagios Core by downloading the Nagios Core source code, and then configuring, compiling, and installing it on the node that will run the Nagios Core instance.
2.1. Installing and configuring the Nagios Core server from source
There is no Red Hat Enterprise Linux package for the Nagios Core software, so it must be compiled from source.
Prerequisites
- Access to OpenSSL.
- Internet access.
Procedure
Install the prerequisites:
[user@nagios]# yum install -y httpd php php-cli gcc glibc glibc-common gd gd-devel net-snmp openssl openssl-devel wget unzip
Open port 80 for httpd:
[user@nagios]# firewall-cmd --zone=public --add-port=80/tcp
[user@nagios]# firewall-cmd --zone=public --add-port=80/tcp --permanent
Create a user and group for Nagios Core:
[user@nagios]# useradd nagios
[user@nagios]# passwd nagios
[user@nagios]# groupadd nagcmd
[user@nagios]# usermod -a -G nagcmd nagios
[user@nagios]# usermod -a -G nagcmd apache
Download the latest version of Nagios Core and Plug-ins:
[user@nagios]# wget --inet4-only https://assets.nagios.com/downloads/nagioscore/releases/nagios-4.3.1.tar.gz
[user@nagios]# wget --inet4-only http://www.nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
[user@nagios]# tar zxf nagios-4.3.1.tar.gz
[user@nagios]# tar zxf nagios-plugins-2.2.1.tar.gz
[user@nagios]# cd nagios-4.3.1
Run ./configure:
[user@nagios]# ./configure --with-command-group=nagcmd
Compile the Nagios Core source code:
[user@nagios]# make all
Install Nagios source code:
[user@nagios]# make install
[user@nagios]# make install-init
[user@nagios]# make install-config
[user@nagios]# make install-commandmode
[user@nagios]# make install-webconf
Copy the event handlers and change their ownership:
[user@nagios]# cp -R contrib/eventhandlers/ /usr/local/nagios/libexec/
[user@nagios]# chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers
Run the pre-flight check:
[user@nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Make and install the Nagios Core plug-ins:
[user@nagios]# cd ../nagios-plugins-2.2.1
[user@nagios]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
[user@nagios]# make
[user@nagios]# make install
Create a user for the Nagios Core user interface:
[user@nagios]$ sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
Important
If adding a user other than nagiosadmin, ensure the /usr/local/nagios/etc/cgi.cfg file gets updated with the username too.
Also modify the /usr/local/nagios/etc/objects/contacts.cfg file with the user name, full name and email address as needed.
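The cgi.cfg update mentioned in the note above can be scripted. The following is a minimal sketch, not part of the official procedure: add_cgi_user is a hypothetical helper name, and it assumes the permissions in cgi.cfg are expressed as uncommented authorized_for_* lines with comma-separated user names, as in the sample file that make install-config writes.

```shell
#!/bin/sh
# Sketch only: add_cgi_user is a hypothetical helper, not a Nagios tool.
# It appends USER to every uncommented authorized_for_* line that does
# not already list that user, and leaves all other lines untouched.
# Usage: add_cgi_user /usr/local/nagios/etc/cgi.cfg alice
add_cgi_user() (
    cfg=$1
    user=$2
    tmp=$(mktemp) || exit 1
    awk -v user="$user" '
        /^authorized_for_[a-z_]+=/ {
            value = substr($0, index($0, "=") + 1)
            n = split(value, names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == user) found = 1
            if (!found) $0 = $0 (value == "" ? "" : ",") user
        }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"   # cat keeps owner and mode
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

Running the helper twice with the same user name leaves the file unchanged the second time, so it is safe to rerun. Back up cgi.cfg before trying it on a real server.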
2.2. Starting the Nagios Core service
Start the Nagios Core service to monitor the Red Hat Ceph Storage cluster health.
Prerequisites
- Root-level access to the Nagios Core service.
Procedure
Add Nagios Core as a service and enable it:
[user@nagios]# chkconfig --add nagios
[user@nagios]# chkconfig --level 35 nagios on
Start the Nagios Core daemon and Apache:
[user@nagios]# systemctl start nagios
[user@nagios]# systemctl enable httpd
[user@nagios]# systemctl start httpd
2.3. Logging into the Nagios Core server
Log in to the Nagios Core server to view the health status of the Red Hat Ceph Storage cluster.
Prerequisites
- User name and password for the Nagios web interface.
Procedure
With Nagios up and running, log in to the web user interface:
http://IP_ADDRESS/nagios
Nagios Core will prompt for a user name and password.
- Input the login and password of the default Nagios Core user.
Chapter 3. Nagios remote plug-in executor installation
As a storage administrator, you can monitor the Ceph storage cluster nodes by installing the Nagios plug-ins, the Ceph plug-ins, and the Nagios Remote Plug-in Executor (NRPE) add-on on each of the Ceph nodes.
For demonstration purposes, this section adds NRPE to a Ceph Monitor node with the hostname mon. Repeat the remaining procedures on all Ceph nodes that Nagios should monitor.
3.1. Installing and configuring Nagios Remote Plug-In Executor
Install the Nagios Remote Plug-in Executor (NRPE) and configure it to communicate with the Nagios Core server.
Prerequisites
- Access to OpenSSL.
- User-level access to Ceph Monitor node.
Procedure
Install these packages on the node:
[user@mon]# yum install openssl openssl-devel gcc make git
NRPE installation requires a Nagios user, so create the user first:
[user@mon]# useradd nagios
[user@mon]# passwd nagios
Download the latest version of the Nagios plug-ins. Then, make and install them:
[user@mon]# wget http://www.nagios-plugins.org/download/nagios-plugins-2.2.1.tar.gz
[user@mon]# tar zxf nagios-plugins-2.2.1.tar.gz
[user@mon]# cd nagios-plugins-2.2.1
[user@mon]# ./configure
[user@mon]# make
[user@mon]# make install
NRPE uses xinetd for communication. Install it before installing the NRPE module:
[user@mon]# yum install xinetd
Download the latest version of the Ceph plug-ins:
[user@mon]# cd ~
[user@mon]# git clone --recursive https://github.com/valerytschopp/ceph-nagios-plugins.git
[user@mon]# cd ceph-nagios-plugins
[user@mon]# make dist
[user@mon]# make install
Download, make and install Nagios NRPE:
[user@mon]# cd ~
[user@mon]# wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.1.0/nrpe-3.1.0.tar.gz
[user@mon]# tar xvfz nrpe-3.1.0.tar.gz
[user@mon]# cd nrpe-3.1.0
[user@mon]# ./configure
[user@mon]# make all
[user@mon]# make install-groups-users
[user@mon]# make install
[user@mon]# make install-config
[user@mon]# make install-init
Edit the /etc/services file, and add the service string nrpe 5666/tcp.
Open port 5666 to allow communication with NRPE:
[user@mon]# firewall-cmd --zone=public --add-port=5666/tcp
[user@mon]# firewall-cmd --zone=public --add-port=5666/tcp --permanent
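The /etc/services edit in this procedure can also be made non-interactively. The sketch below is one possible approach rather than the documented method: ensure_nrpe_service is a hypothetical helper name, and the trailing comment text on the appended line is illustrative. It assumes the file ends with a newline.

```shell
#!/bin/sh
# Sketch only: ensure_nrpe_service is a hypothetical helper name.
# It appends the "nrpe 5666/tcp" entry unless an uncommented nrpe entry
# for 5666/tcp is already present, so repeated runs add nothing.
# Usage (as root): ensure_nrpe_service /etc/services
ensure_nrpe_service() (
    services_file=${1:-/etc/services}
    if grep -Eq '^nrpe[[:space:]]+5666/tcp' "$services_file"; then
        exit 0
    fi
    printf 'nrpe\t\t5666/tcp\t\t# Nagios Remote Plug-in Executor\n' >> "$services_file"
)
```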
Additional Resources
- See https://github.com/valerytschopp/ceph-nagios-plugins for details.
3.2. Starting the Nagios Remote Plug-in Executor service
Start the Nagios Remote Plug-in Executor service to collect data and report it back to the Nagios Core server.
Prerequisites
- User-level access to the Ceph Monitor node
Procedure
Enable, restart, and reload xinetd:
[user@mon]# systemctl enable xinetd
[user@mon]# systemctl restart xinetd
[user@mon]# systemctl reload xinetd
Enable and start NRPE:
[user@mon]# systemctl enable nrpe
[user@mon]# systemctl start nrpe
3.3. Configuring Nagios Core server access to remote nodes
For the Nagios Core server to access the Nagios Remote Plugin Executor (NRPE) on a remote machine, the remote machine’s xinetd and NRPE configurations must be updated with the IP address of the Nagios Core server.
Prerequisites
- User-level access to the Nagios Core server.
- Internet access.
- Access to the Nagios Remote Plugin Executor.
Procedure
Edit the xinetd configuration with the Nagios server’s IP address:
[user@mon]# vi /etc/xinetd.d/nrpe
# default: off
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
    disable = yes
    socket_type = stream
    port = 5666
    wait = no
    user = nagios
    group = nagios
    server = /usr/local/nagios/bin/nrpe
    server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
    only_from = 127.0.0.1,IP_ADDRESS_OF_NAGIOS_CORE_SERVER
    log_on_success =
}
After adding the IP address of the Nagios Core server to the only_from option, restart the xinetd service:
[user@mon]# systemctl restart xinetd
Edit the NRPE configuration with the Nagios server’s IP address:
[user@mon]# vi /usr/local/nagios/etc/nrpe.cfg
allowed_hosts=127.0.0.1,IP_ADDRESS_OF_NAGIOS_CORE_SERVER
Add the IP address of the Nagios Core server to the allowed_hosts setting. Then, restart nrpe:
[user@mon]# systemctl restart nrpe
Test the installation:
[user@host]# /usr/local/nagios/libexec/check_nrpe -H localhost
The check should echo NRPE v3.1.0-rc1 if it is working correctly.
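Before testing connectivity, it can help to confirm that both files really list the Nagios Core server. The following read-only check is a sketch under stated assumptions: setting_lists_ip is a hypothetical helper name, the address 192.0.2.10 in the usage comment is a placeholder, and the parser only understands the simple "name = a,b" and "name=a,b" forms shown in this section.

```shell
#!/bin/sh
# Sketch only: setting_lists_ip is a hypothetical helper name.
# It succeeds (exit status 0) when the named setting in the given file
# contains the exact IP address, and fails otherwise. Commented-out
# lines are ignored because they do not start with the setting name.
# Usage:
#   setting_lists_ip /etc/xinetd.d/nrpe only_from 192.0.2.10
#   setting_lists_ip /usr/local/nagios/etc/nrpe.cfg allowed_hosts 192.0.2.10
setting_lists_ip() {
    awk -v key="$2" -v ip="$3" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            if (index(line, key) != 1) next     # not the setting we want
            rest = substr(line, length(key) + 1)
            if (rest !~ /^[ \t]*=/) next        # e.g. a longer setting name
            sub(/^[ \t]*=[ \t]*/, "", rest)
            n = split(rest, items, /[ \t,]+/)
            for (i = 1; i <= n; i++) if (items[i] == ip) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

The comparison is on whole list items, so 192.0.2.1 does not match an entry for 192.0.2.10.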
Chapter 4. Configuring the remote node on the Nagios Core server
Configure the Nagios Core server to be aware of the remote nodes.
Prerequisites
- User-level access to the remote node on the Nagios Core server.
- Internet access.
Procedure
Install the check_nrpe plug-in:
[user@nagios]# cd ~
[user@nagios]# wget https://github.com/NagiosEnterprises/nrpe/releases/download/nrpe-3.1.0/nrpe-3.1.0.tar.gz
[user@nagios]# tar xvfz nrpe-3.1.0.tar.gz
[user@nagios]# cd nrpe-3.1.0
[user@nagios]# ./configure
[user@nagios]# make check_nrpe
[user@nagios]# make install-plugin
Create a configuration for the remote host:
[user@nagios]# cd /usr/local/nagios/etc/objects
[user@nagios]# cp localhost.cfg mon.cfg
Replace localhost with the hostname of the remote host, and the loopback IP address with the IP address of the remote host. Finally, delete or comment out the Host Group definition.
Change the file ownership to Nagios:
[user@nagios]# chown nagios:nagios mon.cfg
Add a cfg_file= reference to the mon.cfg file in /usr/local/nagios/etc/nagios.cfg:
[user@nagios]# vi /usr/local/nagios/etc/nagios.cfg
Example
cfg_file=/usr/local/nagios/etc/objects/mon.cfg
Restart the Nagios server:
[user@nagios]# systemctl restart nagios
Ensure that the make and install procedures worked and that there is connectivity between the Nagios Core server and the remote host containing NRPE:
[user@nagios]# /usr/local/nagios/libexec/check_nrpe -H IP_ADDRESS_OF_REMOTE_HOST
It should echo NRPE v3.1.0-rc1 if it is working correctly.
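The manual edits to the copied localhost.cfg (replace the host name, replace the loopback address, comment out the host group) can be approximated with a small filter. This is a rough sketch, not the documented method: make_host_cfg is a hypothetical helper name, 192.0.2.20 is a placeholder address, and the filter assumes the template uses the literal strings localhost and 127.0.0.1 and closes each definition with a } on its own line. Review the generated file before using it.

```shell
#!/bin/sh
# Sketch only: make_host_cfg is a hypothetical helper name.
# Reads a localhost.cfg-style template and prints a copy in which
#   - every "localhost" is replaced by the remote host name,
#   - every "127.0.0.1" is replaced by the remote IP address,
#   - each "define hostgroup { ... }" block is commented out.
# Usage: make_host_cfg localhost.cfg mon 192.0.2.20 > mon.cfg
make_host_cfg() {
    awk -v host="$2" -v ip="$3" '
        /^[ \t]*define[ \t]+hostgroup/ { in_group = 1 }
        {
            gsub(/localhost/, host)
            gsub(/127\.0\.0\.1/, ip)
            if (in_group) print "#" $0; else print
        }
        in_group && /}/ { in_group = 0 }
    ' "$1"
}
```

Because the output goes to standard output, the template itself is never modified.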
Chapter 5. Configuring the Nagios Plugins for Ceph
Configure the Nagios plug-ins for Red Hat Ceph Storage cluster.
Prerequisites
- User-level access to the Ceph Monitor node.
- A running Red Hat Ceph Storage cluster.
- Access to the Nagios Core Server.
Procedure
Log in to the monitor server and create a Ceph key and keyring for Nagios.
[user@mon]# ssh mon
[user@mon]# cd /etc/ceph
[user@mon]# ceph auth get-or-create client.nagios mon 'allow r' > client.nagios.keyring
Each plug-in will require authentication. Repeat this procedure for each node that contains a plug-in.
Add a command for the check_ceph_health plug-in:
[user@mon]# vi /usr/local/nagios/etc/nrpe.cfg
Example
command[check_ceph_health]=/usr/lib/nagios/plugins/check_ceph_health --id nagios --keyring /etc/ceph/client.nagios.keyring
Enable and restart the nrpe service:
[user@mon]# systemctl enable nrpe
[user@mon]# systemctl restart nrpe
Repeat this procedure for each Ceph plug-in applicable to the node.
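When several Ceph plug-ins are added to nrpe.cfg, generating the command lines keeps the plug-in path, client ID, and keyring consistent. The sketch below assumes every plug-in accepts the same --id and --keyring options as check_ceph_health in the example above, which may not hold for all of them, so check each plug-in's usage first. nrpe_command_line and the PLUGIN_DIR, CEPH_ID, and KEYRING variables are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: nrpe_command_line and the three variables below are
# hypothetical names. Defaults mirror the example in this chapter.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/lib/nagios/plugins}
CEPH_ID=${CEPH_ID:-nagios}
KEYRING=${KEYRING:-/etc/ceph/client.nagios.keyring}

# Print one nrpe.cfg "command[...]" line for the named plug-in.
# Any further arguments are appended verbatim as plug-in options.
# Usage: nrpe_command_line check_ceph_health >> /usr/local/nagios/etc/nrpe.cfg
nrpe_command_line() (
    name=$1
    shift
    printf 'command[%s]=%s/%s --id %s --keyring %s' \
        "$name" "$PLUGIN_DIR" "$name" "$CEPH_ID" "$KEYRING"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
)
```

The bracketed command name is what the Nagios Core server later passes to check_nrpe with the -c option.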
Return to the Nagios Core server and define a check_nrpe command for the NRPE plug-in:
[user@nagios]# cd /usr/local/nagios/etc/objects
[user@nagios]# vi commands.cfg
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
On the Nagios Core server, edit the configuration file for the node and add a service for the Ceph plug-in.
Example
[user@nagios]# vi /usr/local/nagios/etc/objects/mon.cfg
define service {
    use generic-service
    host_name mon
    service_description Ceph Health Check
    check_command check_nrpe!check_ceph_health
}
Note
The check_command setting uses check_nrpe! before the Ceph plug-in name. This tells NRPE to execute the check_ceph_health command on the remote node.
- Repeat this procedure for each plug-in applicable to the node.
Restart the Nagios Core server:
[user@nagios]# systemctl restart nagios
Before proceeding with additional configuration, ensure that the plug-ins are working.
Example
[user@mon]# /usr/lib/nagios/plugins/check_ceph_health --id nagios --keyring /etc/ceph/client.nagios.keyring
Note
The check_ceph_health plug-in performs the equivalent of the ceph health command.
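To illustrate how a health summary relates to what Nagios displays, the sketch below maps a ceph health status line onto the standard Nagios plug-in exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It is an illustration only and not the plug-in's actual implementation: health_to_nagios is a hypothetical function name, and the pairing of HEALTH_WARN with WARNING and HEALTH_ERR with CRITICAL is an assumption based on that convention.

```shell
#!/bin/sh
# Sketch only: health_to_nagios is a hypothetical function, and the
# mapping below is an assumed convention, not code from the plug-in.
# Prints a one-line status and returns the matching Nagios exit code.
# Usage: health_to_nagios "$(ceph health --id nagios --keyring /etc/ceph/client.nagios.keyring)"
health_to_nagios() {
    case $1 in
        HEALTH_OK*)   echo "OK: $1";       return 0 ;;
        HEALTH_WARN*) echo "WARNING: $1";  return 1 ;;
        HEALTH_ERR*)  echo "CRITICAL: $1"; return 2 ;;
        *)            echo "UNKNOWN: $1";  return 3 ;;
    esac
}
```

Nagios Core derives the service state from the exit code, and shows the printed line as the status information for the service.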
Additional Resources
- See the Ceph Nagios plugins web page for usage.