17.6. Configuring Nagios Manually
You can configure the Nagios server and node manually to monitor a Red Hat Gluster Storage trusted storage pool.
Note
Configuring Nagios using Auto-Discovery is recommended. For more information on configuring Nagios using Auto-Discovery, see Section 17.3.1, “Configuring Nagios”.
For more information on Nagios configuration files, see Chapter 21, Nagios Configuration Files.
Configuring Nagios Server
- In the /etc/nagios/gluster directory, create a directory with the cluster name. All configurations for the cluster are added in this directory.
- In the /etc/nagios/gluster/cluster-name directory, create a file named clustername.cfg to specify the host and hostgroup configurations. The service configurations for all the cluster and volume level services are added in this file.
  Note
  The cluster is configured as a host and a host group in Nagios.
  In the clustername.cfg file, add the following definitions:
  - Define a host group with the cluster name as shown below:
      define hostgroup {
          hostgroup_name    cluster-name
          alias             cluster-name
      }
  - Define a host with the cluster name as shown below:
      define host {
          host_name    cluster-name
          alias        cluster-name
          use          gluster-cluster
          address      cluster-name
      }
  - Define the Cluster - Quorum service to monitor cluster quorum status as shown below:
      define service {
          service_description    Cluster - Quorum
          use                    gluster-passive-service
          host_name              cluster-name
      }
  - Define the Cluster Utilization service to monitor cluster utilization as shown below:
      define service {
          service_description    Cluster Utilization
          use                    gluster-service-with-graph
          check_command          check_cluster_vol_usage!warning-threshold!critical-threshold
          host_name              cluster-name
      }
  - Add the following service definitions for each volume in the cluster:
    - Volume Status service to monitor the status of the volume as shown below:
        define service {
            service_description    Volume Status - volume-name
            host_name              cluster-name
            use                    gluster-service-without-graph
            _VOL_NAME              volume-name
            notes                  Volume type : Volume-Type
            check_command          check_vol_status!cluster-name!volume-name
        }
    - Volume Utilization service to monitor the volume utilization as shown below:
        define service {
            service_description    Volume Utilization - volume-name
            host_name              cluster-name
            use                    gluster-service-with-graph
            _VOL_NAME              volume-name
            notes                  Volume type : Volume-Type
            check_command          check_vol_utilization!cluster-name!volume-name!warning-threshold!critical-threshold
        }
    - Volume Split-brain service to monitor split-brain status as shown below:
        define service {
            service_description    Volume Split-brain status - volume-name
            host_name              cluster-name
            use                    gluster-service-without-graph
            _VOL_NAME              volume-name
            check_command          check_vol_heal_status!cluster-name!volume-name
        }
    - Volume Quota service to monitor the volume quota status as shown below:
        define service {
            service_description    Volume Quota - volume-name
            host_name              cluster-name
            use                    gluster-service-without-graph
            _VOL_NAME              volume-name
            check_command          check_vol_quota_status!cluster-name!volume-name
            notes                  Volume type : Volume-Type
        }
    - Volume Geo-Replication service to monitor geo-replication status as shown below:
        define service {
            service_description    Volume Geo Replication - volume-name
            host_name              cluster-name
            use                    gluster-service-without-graph
            _VOL_NAME              volume-name
            check_command          check_vol_georep_status!cluster-name!volume-name
        }
- In the /etc/nagios/gluster/cluster-name directory, create a file named host-name.cfg. The host configuration for the node and the service configurations for all the bricks on the node are added in this file.
  In the host-name.cfg file, add the following definitions:
  - Define a host for the node as shown below:
      define host {
          use           gluster-host
          hostgroups    gluster_hosts,cluster-name
          alias         host-name
          host_name     host-name       ; Name given by user to identify the node in Nagios
          _HOST_UUID    host-uuid       ; Host UUID returned by gluster peer status
          address       host-address    ; This can be FQDN or IP address of the host
      }
  - Create the following services for each brick in the node:
    - Add the Brick Utilization service as shown below:
        define service {
            service_description    Brick Utilization - brick-path
            host_name              host-name    ; Host name given in host definition
            use                    brick-service
            _VOL_NAME              Volume-Name
            notes                  Volume : Volume-Name
            _BRICK_DIR             brick-path
        }
    - Add the Brick Status service as shown below:
        define service {
            service_description    Brick - brick-path
            host_name              host-name    ; Host name given in host definition
            use                    gluster-brick-status-service
            _VOL_NAME              Volume-Name
            notes                  Volume : Volume-Name
            _BRICK_DIR             brick-path
        }
- Repeat the previous step to add the host and service configurations for every node in the cluster.
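Because the server-side definitions follow a regular pattern, they can be generated instead of typed by hand. The following is a minimal Python sketch, not a tool shipped with Red Hat Gluster Storage. The helper names (define, render_cluster_cfg, render_host_cfg) and all sample values (cluster1, vol1, node1, the thresholds, the UUID, and the brick path) are hypothetical. Only the template names, check commands, and custom variables are taken from the definitions in this section. For brevity it renders only the Volume Status and Volume Utilization services for each volume.

```python
"""Sketch: render Nagios object definitions for a Gluster cluster.

Helper names and sample values are illustrative assumptions; the templates
and check commands are the ones used in the definitions in this section.
"""


def define(kind, fields):
    """Render one Nagios object definition from (directive, value) pairs."""
    width = max(len(key) for key, _ in fields)
    body = "\n".join(f"    {key.ljust(width)}  {value}" for key, value in fields)
    return f"define {kind} {{\n{body}\n}}\n"


def render_cluster_cfg(cluster, volumes, warn, crit):
    """Return the text of clustername.cfg; volumes maps name -> volume type."""
    blocks = [
        define("hostgroup", [("hostgroup_name", cluster), ("alias", cluster)]),
        define("host", [("host_name", cluster), ("alias", cluster),
                        ("use", "gluster-cluster"), ("address", cluster)]),
        define("service", [("service_description", "Cluster - Quorum"),
                           ("use", "gluster-passive-service"),
                           ("host_name", cluster)]),
        define("service", [("service_description", "Cluster Utilization"),
                           ("use", "gluster-service-with-graph"),
                           ("check_command",
                            f"check_cluster_vol_usage!{warn}!{crit}"),
                           ("host_name", cluster)]),
    ]
    for name, vol_type in volumes.items():
        blocks.append(define("service", [
            ("service_description", f"Volume Status - {name}"),
            ("host_name", cluster),
            ("use", "gluster-service-without-graph"),
            ("_VOL_NAME", name),
            ("notes", f"Volume type : {vol_type}"),
            ("check_command", f"check_vol_status!{cluster}!{name}"),
        ]))
        blocks.append(define("service", [
            ("service_description", f"Volume Utilization - {name}"),
            ("host_name", cluster),
            ("use", "gluster-service-with-graph"),
            ("_VOL_NAME", name),
            ("notes", f"Volume type : {vol_type}"),
            ("check_command",
             f"check_vol_utilization!{cluster}!{name}!{warn}!{crit}"),
        ]))
    return "\n".join(blocks)


def render_host_cfg(cluster, host, uuid, address, bricks):
    """Return the text of host-name.cfg; bricks is a list of (volume, path)."""
    blocks = [define("host", [
        ("use", "gluster-host"),
        ("hostgroups", f"gluster_hosts,{cluster}"),
        ("alias", host),
        ("host_name", host),
        ("_HOST_UUID", uuid),
        ("address", address),
    ])]
    services = (("Brick Utilization", "brick-service"),
                ("Brick", "gluster-brick-status-service"))
    for volume, path in bricks:
        for description, template in services:
            blocks.append(define("service", [
                ("service_description", f"{description} - {path}"),
                ("host_name", host),
                ("use", template),
                ("_VOL_NAME", volume),
                ("notes", f"Volume : {volume}"),
                ("_BRICK_DIR", path),
            ]))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sample values; replace them with your own cluster details.
    print(render_cluster_cfg("cluster1", {"vol1": "Replicate"}, 80, 90))
    print(render_host_cfg("cluster1", "node1", "hypothetical-host-uuid",
                          "node1.example.com", [("vol1", "/bricks/b1")]))
```

The output of each function would be saved as clustername.cfg or host-name.cfg in the /etc/nagios/gluster/cluster-name directory. Review the generated text against the definitions above before loading it into Nagios.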
Configuring Red Hat Gluster Storage node
- In the /etc/nagios directory of each Red Hat Gluster Storage node, edit the nagios_server.conf file by setting the configurations as shown below:
    # NAGIOS SERVER
    # The nagios server IP address or FQDN to which the NSCA command
    # needs to be sent
    [NAGIOS-SERVER]
    nagios_server=NagiosServerIPAddress

    # CLUSTER NAME
    # The host name of the logical cluster configured in Nagios under which
    # the gluster volume services reside
    [NAGIOS-DEFINTIONS]
    cluster_name=cluster_auto

    # LOCAL HOST NAME
    # Host name given in the nagios server
    [HOST-NAME]
    hostname_in_nagios=NameOfTheHostInNagios

    # LOCAL HOST CONFIGURATION
    # Process monitoring sleeping interval
    [HOST-CONF]
    proc-mon-sleep-time=TimeInSeconds
  The nagios_server.conf file is used by the glusterpmd service to get the server name, host name, and the process monitoring interval time.
- Start the glusterpmd service using the following command:
    # service glusterpmd start
Changing Nagios Monitoring time interval
By default, the active Red Hat Gluster Storage services are monitored every 10 minutes. You can change the time interval for monitoring by editing the gluster-templates.cfg file.
- In the /etc/nagios/gluster/gluster-templates.cfg file, edit the service named gluster-service.
- Add normal_check_interval and set the time interval to 1 to check all Red Hat Gluster Storage services every 1 minute as shown below:
    define service {
        name                     gluster-service
        use                      generic-service
        notifications_enabled    1
        notification_period      24x7
        notification_options     w,u,c,r,f,s
        notification_interval    120
        register                 0
        contacts                 +ovirt,snmp
        _GLUSTER_ENTITY          HOST_SERVICE
        normal_check_interval    1
    }
- To change the interval for an individual service, add this property to the required service definition as shown below:
    define service {
        name                     gluster-brick-status-service
        use                      gluster-service
        register                 0
        event_handler            brick_status_event_handler
        check_command            check_brick_status
        normal_check_interval    1
    }
The length of one check_interval unit is controlled by the global directive interval_length, which defaults to 60 seconds. You can change it in /etc/nagios/nagios.cfg as shown below:
    # INTERVAL LENGTH
    # This is the seconds per unit interval as used in the
    # host/contact/service configuration files. Setting this to 60 means
    # that each interval is one minute long (60 seconds). Other settings
    # have not been tested much, so your mileage is likely to vary...
    interval_length=TimeInSeconds
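The time between checks is the service interval multiplied by interval_length. The following Python sketch works through that arithmetic. The function names read_interval_length and effective_check_seconds are hypothetical, and the parser assumes that interval_length appears in nagios.cfg as a plain name=value line.

```python
import re


def read_interval_length(nagios_cfg_text, default=60):
    """Return interval_length from nagios.cfg text, or the 60-second default."""
    match = re.search(r"^\s*interval_length\s*=\s*(\d+)\s*$",
                      nagios_cfg_text, re.MULTILINE)
    return int(match.group(1)) if match else default


def effective_check_seconds(normal_check_interval, interval_length=60):
    """Seconds between checks: interval units times seconds per unit."""
    return normal_check_interval * interval_length


if __name__ == "__main__":
    length = read_interval_length("interval_length=60\n")
    # normal_check_interval 1 with a 60-second unit: one check per minute.
    print(effective_check_seconds(1, length))
    # An interval of 10 units corresponds to a check every 10 minutes.
    print(effective_check_seconds(10, length))
```

Because interval_length is a global directive, changing it changes the unit for every interval value in the host, contact, and service configuration files, not only for the Red Hat Gluster Storage services.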