Chapter 10. Monitoring and analyzing GFS2 file systems using Performance Co-Pilot (PCP)
Performance Co-Pilot (PCP) can help with monitoring and analyzing GFS2 file systems. In Red Hat Enterprise Linux, GFS2 monitoring in PCP is provided by the GFS2 PMDA module, which is available through the pcp-pmda-gfs2 package.
The GFS2 PMDA provides a number of metrics derived from the GFS2 statistics in the debugfs subsystem. When installed, the PMDA exposes values from the glocks, glstats, and sbstats files, which report sets of statistics for each mounted GFS2 file system. The PMDA also makes use of the GFS2 kernel tracepoints exposed by the Kernel Function Tracer (ftrace).
10.1. Installing the GFS2 PMDA
To operate correctly, the GFS2 PMDA requires that the debugfs file system is mounted. If the debugfs file system is not mounted, run the following commands before installing the GFS2 PMDA:
# mkdir /sys/kernel/debug
# mount -t debugfs none /sys/kernel/debug
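One way to find out whether the mount step is needed is to look for a debugfs entry in the kernel mount table. The following is a minimal sketch, not part of the PCP tooling; the function name debugfs_is_mounted is hypothetical, and it assumes the usual /proc/mounts layout in which the third field is the file system type.

```shell
# Succeed if a debugfs entry appears in the given mount table
# (defaults to /proc/mounts). The function name is illustrative only.
debugfs_is_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

if ! debugfs_is_mounted; then
    echo "debugfs is not mounted; mount it before installing the GFS2 PMDA" >&2
fi
```

The optional argument exists only so that the check can be exercised against a saved copy of a mount table.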
The GFS2 PMDA is not enabled as part of the default installation. To use GFS2 metric monitoring through PCP, you must enable it after installation.
Run the following commands to install PCP and enable the GFS2 PMDA. Note that the PMDA install script must be run as root.
# yum install pcp pcp-pmda-gfs2
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check gfs2 metrics have appeared ... 346 metrics and 255 values
10.2. Displaying information about the available performance metrics with the pminfo tool
The pminfo tool displays information about the available performance metrics. The following examples show different GFS2 metrics you can display with this tool.
10.2.1. Examining the number of glock structures that currently exist per file system
The GFS2 glock metrics give insight into the number of glock structures currently incore for each mounted GFS2 file system and their locking states. In GFS2, a glock is a data structure that brings together the DLM and caching into a single state machine. Each glock has a 1:1 mapping with a single DLM lock and caches the lock state, so that repetitive operations carried out on a single node do not have to call the DLM each time, which reduces unnecessary network traffic.
The following pminfo command displays the number of glocks per mounted GFS2 file system by lock mode.
# pminfo -f gfs2.glocks
gfs2.glocks.total
inst [0 or "afc_cluster:data"] value 43680
inst [1 or "afc_cluster:bin"] value 2091
gfs2.glocks.shared
inst [0 or "afc_cluster:data"] value 25
inst [1 or "afc_cluster:bin"] value 25
gfs2.glocks.unlocked
inst [0 or "afc_cluster:data"] value 43652
inst [1 or "afc_cluster:bin"] value 2063
gfs2.glocks.deferred
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.glocks.exclusive
inst [0 or "afc_cluster:data"] value 3
inst [1 or "afc_cluster:bin"] value 3
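For scripting, it can help to flatten this output into one row per metric and instance. The sketch below assumes the text layout shown above (a metric name on its own line, followed by inst lines ending in the value); the function name pminfo_to_rows is hypothetical.

```shell
# Convert `pminfo -f` text on stdin into "metric<TAB>instance<TAB>value" rows.
pminfo_to_rows() {
    awk '
        # A single-word line that is not an instance line names the metric.
        $1 != "inst" && NF == 1 { metric = $1; next }
        # Instance lines look like: inst [0 or "name"] value N
        $1 == "inst" && match($0, /"[^"]*"/) {
            name = substr($0, RSTART + 1, RLENGTH - 2)
            printf "%s\t%s\t%s\n", metric, name, $NF
        }
    '
}

# Example (on a node with the GFS2 PMDA installed):
#   pminfo -f gfs2.glocks | pminfo_to_rows
```

The tab-separated rows can then be passed to tools such as sort or cut.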
10.2.2. Examining the number of glock structures that exist per file system by type
The GFS2 glstats metrics give counts of each type of glock that exists for each file system. A large number of these will normally be of either the inode (inode and metadata) or resource group (resource group metadata) type.
The following pminfo command displays the number of each type of glock per mounted GFS2 file system.
# pminfo -f gfs2.glstats
gfs2.glstats.total
inst [0 or "afc_cluster:data"] value 43680
inst [1 or "afc_cluster:bin"] value 2091
gfs2.glstats.trans
inst [0 or "afc_cluster:data"] value 3
inst [1 or "afc_cluster:bin"] value 3
gfs2.glstats.inode
inst [0 or "afc_cluster:data"] value 17
inst [1 or "afc_cluster:bin"] value 17
gfs2.glstats.rgrp
inst [0 or "afc_cluster:data"] value 43642
inst [1 or "afc_cluster:bin"] value 2053
gfs2.glstats.meta
inst [0 or "afc_cluster:data"] value 1
inst [1 or "afc_cluster:bin"] value 1
gfs2.glstats.iopen
inst [0 or "afc_cluster:data"] value 16
inst [1 or "afc_cluster:bin"] value 16
gfs2.glstats.flock
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.glstats.quota
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.glstats.journal
inst [0 or "afc_cluster:data"] value 1
inst [1 or "afc_cluster:bin"] value 1
10.2.3. Checking the number of glock structures that are in a wait state
The most important holder flags are H (holder: indicates that requested lock is granted) and W (wait: set while waiting for request to complete). These flags are set on granted lock requests and queued lock requests, respectively.
The following pminfo command displays the number of glocks with the Wait (W) holder flag for each mounted GFS2 file system.
# pminfo -f gfs2.holders.flags.wait
gfs2.holders.flags.wait
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
If you see a number of waiting requests queued on a resource group lock, there may be several reasons for this. One is that there are a large number of nodes compared to the number of resource groups in the file system. Another is that the file system may be very nearly full (requiring, on average, longer searches for free blocks). The situation in both cases can be improved by adding more storage and using the gfs2_grow command to expand the file system.
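A periodic check can flag file systems whose wait count is above some limit. The following sketch reads pminfo -f output in the layout shown above; the function name waiting_filesystems and the limit argument are hypothetical and are not PCP settings.

```shell
# Print the instance name of every file system whose value exceeds a limit.
# Reads `pminfo -f gfs2.holders.flags.wait` text on stdin.
# The limit (default 0) is an illustrative threshold, not a PCP setting.
waiting_filesystems() {
    awk -v limit="${1:-0}" '
        $1 == "inst" && $NF + 0 > limit + 0 && match($0, /"[^"]*"/) {
            print substr($0, RSTART + 1, RLENGTH - 2)
        }
    '
}

# Example (on a node with the GFS2 PMDA installed):
#   pminfo -f gfs2.holders.flags.wait | waiting_filesystems 10
```

What counts as too many waiters depends on the workload, so the limit is best chosen from values observed during normal operation.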
10.2.4. Checking file system operation latency using the kernel tracepoint based metrics
The GFS2 PMDA supports collecting metrics from the GFS2 kernel tracepoints. By default, reading of these metrics is disabled. Activating these metrics turns on the GFS2 kernel tracepoints while the metrics are collected, in order to populate the metric values. Enabling the kernel tracepoint metrics can have a small effect on throughput.
PCP provides the pmstore tool, which lets you change PMDA settings by writing new values to metrics. The gfs2.control.* metrics allow the toggling of GFS2 kernel tracepoints. The following example uses the pmstore command to enable all of the GFS2 kernel tracepoints.
# pmstore gfs2.control.tracepoints.all 1
gfs2.control.tracepoints.all old value=0 new value=1
When this command is run, the PMDA switches on all of the GFS2 tracepoints in the debugfs file system. The "Complete Metric List" table in Complete listing of available metrics for GFS2 in PCP explains each of the control tracepoints and their usage. An explanation of the effect of each control tracepoint and its available options is also available through the help switch in pminfo.
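Because the tracepoints can affect throughput, you may want them enabled only while a measurement is running. The wrapper below is a sketch under one assumption: that storing 0 in gfs2.control.tracepoints.all switches the tracepoints off again, mirroring the old value=0 shown in the output above. The function name with_gfs2_tracepoints and the workload.sh script are hypothetical.

```shell
# Run a command with all GFS2 tracepoints enabled, then switch them off.
# Assumption: storing 0 disables the tracepoints (the "old value" above).
with_gfs2_tracepoints() {
    pmstore gfs2.control.tracepoints.all 1 || return 1
    "$@"
    status=$?
    # Always attempt to disable the tracepoints, even if the command failed.
    pmstore gfs2.control.tracepoints.all 0
    return "$status"
}

# Example (as root; workload.sh is a placeholder for your own test):
#   with_gfs2_tracepoints sh -c './workload.sh; pminfo -f gfs2.tracepoints.log_flush.total'
```

The wrapper returns the exit status of the wrapped command, so it can be used inside larger scripts without hiding failures.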
The GFS2 promote metrics count the number of promote requests on the file system. These requests are split into those granted on the first attempt and “others”, which are granted after their initial promote request. A drop in the number of first-time promotes together with a rise in “other” promotes can indicate issues with file contention.
The GFS2 demote request metrics, like the promote request metrics, count the number of demote requests that occur on the file system. These, however, are also split between requests that come from the current node and requests that come from other nodes in the cluster. A large number of demote requests from remote nodes can indicate contention between two nodes for a given resource group.
The following pminfo command displays the grant and demote latency metrics for each mounted GFS2 file system.
# pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all
gfs2.latency.grant.all
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.latency.demote.all
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
It is a good idea to record the typical values observed while the workload is running without issues, so that you can notice changes in performance when the values move outside their normal range.
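One simple way to work with such a baseline is to save pminfo -f output to a file while the system is healthy and compare later snapshots against it. The following sketch assumes the output layout shown above; the function name compare_to_baseline and the growth factor are hypothetical, and the default factor of 2 is an arbitrary illustration rather than a recommended threshold.

```shell
# Compare two saved `pminfo -f` snapshots and list values that grew past
# factor x baseline. Usage: compare_to_baseline baseline.txt current.txt [factor]
compare_to_baseline() {
    awk -v factor="${3:-2}" '
        # A single-word line that is not an instance line names the metric.
        $1 != "inst" && NF == 1 { metric = $1; next }
        $1 == "inst" && match($0, /"[^"]*"/) {
            key = metric " " substr($0, RSTART + 1, RLENGTH - 2)
            if (NR == FNR)
                base[key] = $NF        # first file: remember the baseline
            else if ((key in base) && $NF + 0 > base[key] * factor)
                printf "%s: %s -> %s\n", key, base[key], $NF
        }
    ' "$1" "$2"
}

# Example:
#   pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all > baseline.txt
#   ... later ...
#   pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all > current.txt
#   compare_to_baseline baseline.txt current.txt
```

With a baseline of 0, any positive value is reported, so a baseline captured under a representative load gives more useful results.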
For example, you might notice a change in the number of promote requests waiting to complete rather than completing on the first attempt, which the output from the following command would allow you to determine.
# pminfo -f gfs2.tracepoints.promote.other
gfs2.tracepoints.promote.other.null_lock
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.promote.other.concurrent_read
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.promote.other.concurrent_write
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.promote.other.protected_read
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.promote.other.protected_write
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.promote.other.exclusive
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
The output from the following command would allow you to detect a large increase in remote demote requests, that is, requests from other cluster nodes.
# pminfo -f gfs2.tracepoints.demote_rq.requested
gfs2.tracepoints.demote_rq.requested.remote
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
gfs2.tracepoints.demote_rq.requested.local
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
The output from the following command could indicate an unexplained increase in log flushes.
# pminfo -f gfs2.tracepoints.log_flush.total
gfs2.tracepoints.log_flush.total
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
10.3. Complete listing of available metrics for GFS2 in PCP
The following table describes the full list of performance metrics given by the pcp-pmda-gfs2
package for GFS2 file systems.
Metric Name | Description |
---|---|
|
Metrics regarding the information collected from the glock stats file ( |
| Range of metrics counting the number of glocks that exist with the given glocks flags |
|
Metrics regarding the information collected from the glock stats file ( |
| Range of metrics counting the number of glocks holders with the given holder flags |
|
Timing metrics regarding the information collected from the superblock stats file ( |
|
Metrics regarding the information collected from the glock stats file ( |
|
A derived metric making use of the data from both the |
|
A derived metric making use of the data from both the |
|
A derived metric making use of the data from the |
|
A derived metric making use of the data from the |
|
Metrics regarding the output from the GFS2 |
|
Configuration metrics which are used to switch on or off metric recording in the PMDA. Control metrics are toggled by means of the |
10.4. Performing minimal PCP setup to gather file system data
This procedure describes how to install a minimal PCP setup to collect statistics on Red Hat Enterprise Linux. This setup involves adding the minimum number of packages on a production system needed to gather data for further analysis.
The resulting tar.gz archive of the pmlogger output can be analyzed by using further PCP tools and can be compared with other sources of performance information.
Procedure
Install the required PCP packages.
# yum install pcp pcp-pmda-gfs2
Activate the GFS2 module for PCP.
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
Start both the pmcd and pmlogger services.
# systemctl start pmcd.service
# systemctl start pmlogger.service
Perform operations on the GFS2 file system.
Stop both the pmcd and pmlogger services.
# systemctl stop pmcd.service
# systemctl stop pmlogger.service
Collect the output and save it to a tar.gz file named based on the host name and the current date and time.
# cd /var/log/pcp/pmlogger
# tar -czf $(hostname).$(date +%F-%Hh%M).pcp.tar.gz $(hostname)
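The final collection step can also be wrapped in a small function, which makes it easier to repeat on several nodes. This is a sketch only; the function name archive_pmlogger_output is hypothetical, and the defaults assume the /var/log/pcp/pmlogger location and per-host directory used in the procedure above.

```shell
# Archive one host directory of pmlogger output and print the archive path.
# Usage: archive_pmlogger_output [logdir] [host]
archive_pmlogger_output() {
    logdir=${1:-/var/log/pcp/pmlogger}
    host=${2:-$(hostname)}
    # Same naming scheme as the procedure: host, date, hour and minute.
    archive="$host.$(date +%F-%Hh%M).pcp.tar.gz"
    ( cd "$logdir" && tar -czf "$archive" "$host" ) || return 1
    printf '%s\n' "$logdir/$archive"
}

# Example (as root, after stopping pmcd and pmlogger):
#   archive_pmlogger_output
```

Running the tar command in a subshell leaves the caller's working directory unchanged.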