Chapter 10. Monitoring and analyzing GFS2 file systems using Performance Co-Pilot (PCP)
Performance Co-Pilot (PCP) can help with monitoring and analyzing GFS2 file systems. In Red Hat Enterprise Linux, GFS2 monitoring in PCP is provided by the GFS2 PMDA module, which is available through the pcp-pmda-gfs2 package.
The GFS2 PMDA provides a number of metrics derived from the GFS2 statistics in the debugfs subsystem. When installed, the PMDA exposes the values given in the glocks, glstats, and sbstats files. These files report sets of statistics on each mounted GFS2 file system. The PMDA also makes use of the GFS2 kernel tracepoints exposed by the Kernel Function Tracer (ftrace).
10.1. Installing the GFS2 PMDA
To operate correctly, the GFS2 PMDA requires that the debugfs file system is mounted. If the debugfs file system is not mounted, run the following commands before installing the GFS2 PMDA:
# mkdir /sys/kernel/debug
# mount -t debugfs none /sys/kernel/debug
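The mount step can be made conditional on whether debugfs is already mounted. The following is a minimal sketch rather than part of the documented procedure: the helper name `debugfs_mounted` is hypothetical, and it assumes a mounts table in the `/proc/mounts` layout, where the third field is the file system type.

```shell
# Hypothetical helper: succeeds if a debugfs entry appears in a mounts table.
# Assumes the /proc/mounts layout (device, mount point, type, options, ...).
# $1: path to the mounts table (defaults to /proc/mounts)
debugfs_mounted() {
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Example use (as root): mount debugfs only when it is missing.
# debugfs_mounted || mount -t debugfs none /sys/kernel/debug
```

Taking the mounts table as an optional argument keeps the check easy to try against a saved copy of `/proc/mounts` from another node.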
The GFS2 PMDA is not enabled as part of the default installation. In order to make use of GFS2 metric monitoring through PCP you must enable it after installation.
Run the following commands to install PCP and enable the GFS2 PMDA. Note that the PMDA install script must be run as root.
# yum install pcp pcp-pmda-gfs2
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
10.2. Displaying information about the available performance metrics with the pminfo tool
The pminfo tool displays information about the available performance metrics. The following examples show different GFS2 metrics you can display with this tool.
10.2.1. Examining the number of glock structures that currently exist per file system
The GFS2 glock metrics give insight into the number of glock structures currently in core for each mounted GFS2 file system and their locking states. In GFS2, a glock is a data structure that brings together the DLM and caching into a single state machine. Each glock has a 1:1 mapping with a single DLM lock and caches the lock state, so that repeated operations on a single node do not have to call the DLM each time. This reduces unnecessary network traffic.
The following pminfo command displays a list of the number of glocks per mounted GFS2 file system by their lock mode.
10.2.2. Examining the number of glock structures that exist per file system by type
The GFS2 glstats metrics give counts of each type of glock that exists for each file system. A large number of these will normally be of either the inode (inode and metadata) or resource group (resource group metadata) type.
The following pminfo command displays a list of the number of each type of glock per mounted GFS2 file system.
10.2.3. Checking the number of glock structures that are in a wait state
The most important holder flags are H (holder: indicates that requested lock is granted) and W (wait: set while waiting for request to complete). These flags are set on granted lock requests and queued lock requests, respectively.
The following pminfo command displays a list of the number of glocks with the Wait (W) holder flag for each mounted GFS2 file system.
# pminfo -f gfs2.holders.flags.wait
gfs2.holders.flags.wait
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
If you see a number of waiting requests queued on a resource group lock, there may be several reasons for this. One is that there are a large number of nodes compared to the number of resource groups in the file system. Another is that the file system may be very nearly full (requiring, on average, longer searches for free blocks). In both cases, you can improve the situation by adding more storage and using the gfs2_grow command to expand the file system.
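When many file systems are mounted, it can help to show only the instances that currently have waiters. The following is a minimal sketch under stated assumptions: the helper name `pminfo_above` is hypothetical, and it assumes the `pminfo -f` output layout shown above, with one metric name line followed by `inst [N or "name"] value V` lines.

```shell
# Hypothetical helper: reads `pminfo -f` output on stdin and prints
# "metric instance value" for every instance whose value exceeds a limit.
# $1: numeric limit (defaults to 0, so any non-zero count is reported)
pminfo_above() {
    awk -v limit="${1:-0}" '
        $1 == "inst" {
            if ($NF + 0 > limit + 0) {
                split($0, q, "\"")        # q[2] is the quoted instance name
                print metric, q[2], $NF
            }
            next
        }
        NF == 1 { metric = $1 }           # a single-field line is a metric name
    '
}

# Example use:
# pminfo -f gfs2.holders.flags.wait | pminfo_above 0
```

An empty result means that no instance exceeded the limit at the moment the metric was sampled.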
10.2.4. Checking file system operation latency using the kernel tracepoint based metrics
The GFS2 PMDA supports collecting metrics from the GFS2 kernel tracepoints. By default, the reading of these metrics is disabled. Activating these metrics turns on the GFS2 kernel tracepoints while the metrics are collected, in order to populate the metric values. Enabling the kernel tracepoint metrics could have a small effect on throughput.
PCP provides the pmstore tool, which allows you to modify PMDA settings based on metric values. The gfs2.control.* metrics allow the toggling of GFS2 kernel tracepoints. The following example uses the pmstore command to enable all of the GFS2 kernel tracepoints.
# pmstore gfs2.control.tracepoints.all 1
gfs2.control.tracepoints.all old value=0 new value=1
When this command is run, the PMDA switches on all of the GFS2 tracepoints in the debugfs file system. The "Complete Metric List" table in Complete listing of available metrics for GFS2 in PCP explains each of the control tracepoints and their usage. An explanation of the effect of each control tracepoint and its available options is also available through the help switch in pminfo.
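Because the tracepoints can affect throughput, you may want them enabled only for the duration of a test. The following is a minimal sketch, not a documented PCP feature: the wrapper name `with_gfs2_tracepoints` and the `PMSTORE` override variable are hypothetical, and the sketch assumes that storing 0 in gfs2.control.tracepoints.all switches the tracepoints off again, mirroring the old value=0 state shown above.

```shell
# Command used to store metric values. Overridable (for example with
# PMSTORE=echo) to preview what would be written; this is an assumption
# of the sketch, not a pmstore feature.
PMSTORE=${PMSTORE:-pmstore}

# Hypothetical wrapper: enable all GFS2 tracepoints, run the given command,
# then store 0 so that tracing is not left on. Returns the command's status.
with_gfs2_tracepoints() {
    "$PMSTORE" gfs2.control.tracepoints.all 1 || return
    "$@"
    _rc=$?
    "$PMSTORE" gfs2.control.tracepoints.all 0
    return "$_rc"
}

# Example use (as root), sampling one tracepoint metric during a test window:
# with_gfs2_tracepoints sleep 60
```

Restoring the control metric even when the wrapped command fails keeps a failed test run from leaving the tracepoints enabled.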
The GFS2 promote metrics count the number of promote requests on the file system. These requests are split into those granted on the first attempt and “others”, which are granted after their initial promote request. A drop in the number of first-time promotes together with a rise in “other” promotes can indicate issues with file contention.
The GFS2 demote request metrics, like the promote request metrics, count the number of demote requests which occur on the file system. These, however, are also split between requests that have come from the current node and requests that have come from other nodes on the system. A large number of demote requests from remote nodes can indicate contention between two nodes for a given resource group.
The pminfo tool displays information about the available performance metrics, including the tracepoint-based metrics once they are enabled.
It is a good idea to determine the values typically observed while the workload is running without issues, so that you can notice changes in performance when the values move outside their normal range.
For example, you might notice a change in the number of promote requests waiting to complete rather than completing on first attempt, which the output from the following command would allow you to determine.
The output from the following command would allow you to detect a large increase in remote demote requests (especially if from other cluster nodes).
The output from the following command could indicate an unexplained increase in log flushes.
# pminfo -f gfs2.tracepoints.log_flush.total
gfs2.tracepoints.log_flush.total
inst [0 or "afc_cluster:data"] value 0
inst [1 or "afc_cluster:bin"] value 0
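Comparing a saved baseline with a later sample makes changes such as an increase in log flushes easier to spot. The following is a minimal sketch under stated assumptions: the helper name `pminfo_delta` and the file names in the usage comment are hypothetical, and both inputs are assumed to be saved `pminfo -f` output in the layout shown above.

```shell
# Hypothetical helper: compares two saved `pminfo -f` outputs and prints
# "metric instance delta" for each instance whose value changed.
# $1: baseline file, $2: later sample
pminfo_delta() {
    awk '
        $1 == "inst" {
            split($0, q, "\"")                 # q[2] is the instance name
            key = metric " " q[2]
            if (FILENAME == ARGV[1])
                before[key] = $NF + 0          # first file: record baseline
            else if ($NF + 0 != before[key])
                print key, $NF - before[key]   # second file: report change
            next
        }
        NF == 1 { metric = $1 }                # metric name line
    ' "$1" "$2"
}

# Example use:
# pminfo -f gfs2.tracepoints.log_flush.total > baseline.txt
# (run the workload)
# pminfo -f gfs2.tracepoints.log_flush.total > current.txt
# pminfo_delta baseline.txt current.txt
```

Instances that appear only in the later sample are compared against an implicit baseline of 0.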
10.3. Complete listing of available metrics for GFS2 in PCP
The following table describes the full list of performance metrics given by the pcp-pmda-gfs2 package for GFS2 file systems.
Metric Name | Description |
---|---|
| Metrics regarding the information collected from the glock stats file ( |
| Range of metrics counting the number of glocks that exist with the given glock flags |
| Metrics regarding the information collected from the glock stats file ( |
| Range of metrics counting the number of glock holders with the given holder flags |
| Timing metrics regarding the information collected from the superblock stats file ( |
| Metrics regarding the information collected from the glock stats file ( |
| A derived metric making use of the data from both the |
| A derived metric making use of the data from both the |
| A derived metric making use of the data from the |
| A derived metric making use of the data from the |
| Metrics regarding the output from the GFS2 |
| Configuration metrics which are used to switch on or off metric recording in the PMDA. Control metrics are toggled by means of the pmstore tool. |
10.4. Performing minimal PCP setup to gather file system data
This procedure outlines instructions on how to install a minimal PCP setup to collect statistics on Red Hat Enterprise Linux. This setup involves adding the minimum number of packages on a production system needed to gather data for further analysis.
The resulting tar.gz archive of the pmlogger output can be analyzed by using further PCP tools and can be compared with other sources of performance information.
Procedure
1. Install the required PCP packages.
# yum install pcp pcp-pmda-gfs2
2. Activate the GFS2 module for PCP.
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
3. Start both the pmcd and pmlogger services.
# systemctl start pmcd.service
# systemctl start pmlogger.service
4. Perform operations on the GFS2 file system.
5. Stop both the pmcd and pmlogger services.
# systemctl stop pmcd.service
# systemctl stop pmlogger.service
6. Collect the output and save it to a tar.gz file named based on the host name and the current date and time.
# cd /var/log/pcp/pmlogger
# tar -czf $(hostname).$(date +%F-%Hh%M).pcp.tar.gz $(hostname)
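The final collection step can be wrapped in a small function so that the same archive naming is applied on every node. The following is a minimal sketch, assuming the pmlogger output lives in a directory named after the host under the log root: the function name `archive_pmlogger_output` and its two optional arguments are hypothetical.

```shell
# Hypothetical helper: archive one host's pmlogger directory and print the
# path of the archive that was created.
# $1: pmlogger log root (defaults to /var/log/pcp/pmlogger)
# $2: host directory name (defaults to the output of hostname)
archive_pmlogger_output() {
    log_root=${1:-/var/log/pcp/pmlogger}
    host=${2:-$(hostname)}
    archive="$host.$(date +%F-%Hh%M).pcp.tar.gz"
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$log_root" && tar -czf "$archive" "$host" ) || return
    printf '%s\n' "$log_root/$archive"
}

# Example use (as root), after stopping pmcd and pmlogger:
# archive_pmlogger_output
```

Printing the archive path makes it straightforward to pass the result to a copy or upload command.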