Chapter 10. Monitoring and analyzing GFS2 file systems using Performance Co-Pilot (PCP)

Performance Co-Pilot (PCP) can help with monitoring and analyzing GFS2 file systems. In Red Hat Enterprise Linux, monitoring of GFS2 file systems in PCP is provided by the GFS2 PMDA module, which is available through the pcp-pmda-gfs2 package.

The GFS2 PMDA provides a number of metrics derived from the GFS2 statistics in the debugfs subsystem. When installed, the PMDA exposes the values in the glocks, glstats, and sbstats files, which report sets of statistics on each mounted GFS2 file system. The PMDA also makes use of the GFS2 kernel tracepoints exposed by the Kernel Function Tracer (ftrace).

10.1. Installing the GFS2 PMDA

To operate correctly, the GFS2 PMDA requires that the debugfs file system is mounted. If the debugfs file system is not mounted, run the following commands before installing the GFS2 PMDA:

# mkdir /sys/kernel/debug
# mount -t debugfs none /sys/kernel/debug
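
The check-then-mount step above can be sketched as a small shell helper. This is a minimal sketch and not part of the PCP packages: the function names `debugfs_mounted` and `ensure_debugfs` are hypothetical, and the optional second argument (an alternative mounts table) is an assumption added only so that the check can be tried without root access.

```shell
# Hypothetical helper: return 0 if debugfs is mounted at the given mount point.
# $1: mount point (default /sys/kernel/debug)
# $2: mounts table to read (default /proc/mounts)
debugfs_mounted() {
    awk -v mp="${1:-/sys/kernel/debug}" '
        # /proc/mounts fields: device, mount point, file system type, options
        $2 == mp && $3 == "debugfs" { found = 1 }
        END { exit !found }' "${2:-/proc/mounts}"
}

# Hypothetical helper: mount debugfs only if it is missing (requires root).
ensure_debugfs() {
    debugfs_mounted || mount -t debugfs none /sys/kernel/debug
}
```

Calling `ensure_debugfs` as root before running the PMDA install script leaves an existing mount untouched.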

The GFS2 PMDA is not enabled as part of the default installation. To use GFS2 metric monitoring through PCP, you must enable it after installation.

Run the following commands to install PCP and enable the GFS2 PMDA. Note that the PMDA install script must be run as root.

# yum install pcp pcp-pmda-gfs2
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check gfs2 metrics have appeared ... 346 metrics and 255 values

10.2. Displaying information about the available performance metrics with the pminfo tool

The pminfo tool displays information about the available performance metrics. The following examples show different GFS2 metrics you can display with this tool.

10.2.1. Examining the number of glock structures that currently exist per file system

The GFS2 glock metrics give insight into the number of glock structures currently in core for each mounted GFS2 file system and their locking states. In GFS2, a glock is a data structure that brings together the DLM and caching into a single state machine. Each glock has a 1:1 mapping with a single DLM lock and provides caching for the lock states, so that repetitive operations carried out on a single node do not have to repeatedly call the DLM, which reduces unnecessary network traffic.

The following pminfo command displays a list of the number of glocks per mounted GFS2 file system by their lock mode.

# pminfo -f gfs2.glocks

gfs2.glocks.total
    inst [0 or "afc_cluster:data"] value 43680
    inst [1 or "afc_cluster:bin"] value 2091

gfs2.glocks.shared
    inst [0 or "afc_cluster:data"] value 25
    inst [1 or "afc_cluster:bin"] value 25

gfs2.glocks.unlocked
    inst [0 or "afc_cluster:data"] value 43652
    inst [1 or "afc_cluster:bin"] value 2063

gfs2.glocks.deferred
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0

gfs2.glocks.exclusive
    inst [0 or "afc_cluster:data"] value 3
    inst [1 or "afc_cluster:bin"] value 3
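
For scripting, the multi-line `pminfo -f` output shown above can be flattened into one row per metric instance. The following is a minimal sketch that assumes the output layout shown in this section; the function name `pminfo_flatten` is hypothetical, and metrics without instances are not handled.

```shell
# Hypothetical helper: turn `pminfo -f` output (read from stdin) into
# "metric instance value" rows, one per line.
pminfo_flatten() {
    awk '
        /^[^ \t]/ { metric = $1; next }   # unindented line: metric name
        /^[ \t]+inst / {
            # the instance name is the quoted string, the value is the last field
            if (match($0, /"[^"]*"/)) {
                inst = substr($0, RSTART + 1, RLENGTH - 2)
                print metric, inst, $NF
            }
        }'
}
```

For example, `pminfo -f gfs2.glocks | pminfo_flatten` would produce rows such as `gfs2.glocks.total afc_cluster:data 43680`, which are easier to sort, filter, or feed to other tools.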

10.2.2. Examining the number of glock structures that exist per file system by type

The GFS2 glstats metrics give counts of each type of glock that exists for each file system. A large number of these will normally be of either the inode (inode and metadata) or resource group (resource group metadata) type.

The following pminfo command displays a list of the number of each type of glock per mounted GFS2 file system.

# pminfo -f gfs2.glstats

gfs2.glstats.total
    inst [0 or "afc_cluster:data"] value 43680
    inst [1 or "afc_cluster:bin"] value 2091

gfs2.glstats.trans
    inst [0 or "afc_cluster:data"] value 3
    inst [1 or "afc_cluster:bin"] value 3

gfs2.glstats.inode
    inst [0 or "afc_cluster:data"] value 17
    inst [1 or "afc_cluster:bin"] value 17

gfs2.glstats.rgrp
    inst [0 or "afc_cluster:data"] value 43642
    inst [1 or "afc_cluster:bin"] value 2053

gfs2.glstats.meta
    inst [0 or "afc_cluster:data"] value 1
    inst [1 or "afc_cluster:bin"] value 1

gfs2.glstats.iopen
    inst [0 or "afc_cluster:data"] value 16
    inst [1 or "afc_cluster:bin"] value 16

gfs2.glstats.flock
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0

gfs2.glstats.quota
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0

gfs2.glstats.journal
    inst [0 or "afc_cluster:data"] value 1
    inst [1 or "afc_cluster:bin"] value 1
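
To see how strongly one glock type dominates, the counts above can be expressed as a percentage of the total for each file system. This is a minimal sketch that assumes the `pminfo -f gfs2.glstats` output layout shown above; the function name `glstats_share` is hypothetical.

```shell
# Hypothetical helper: print, per file system, the percentage of glocks
# that are of the given type.
# $1: glock type suffix, for example rgrp or inode
# stdin: output of `pminfo -f gfs2.glstats`
glstats_share() {
    awk -v type="$1" '
        /^gfs2\.glstats\./ {
            metric = $1
            sub(/^gfs2\.glstats\./, "", metric)   # keep only the type suffix
            next
        }
        /^[ \t]+inst / && match($0, /"[^"]*"/) {
            fs = substr($0, RSTART + 1, RLENGTH - 2)
            if (metric == "total") total[fs] = $NF + 0
            else if (metric == type) { count[fs] = $NF + 0; order[++n] = fs }
        }
        END {
            # report in the order the file systems appeared in the input
            for (i = 1; i <= n; i++) {
                fs = order[i]
                if (total[fs] > 0)
                    printf "%s %.1f\n", fs, 100 * count[fs] / total[fs]
            }
        }'
}
```

With the sample values above, `pminfo -f gfs2.glstats | glstats_share rgrp` reports roughly 99.9 percent for `afc_cluster:data` and 98.2 percent for `afc_cluster:bin`.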

10.2.3. Checking the number of glock structures that are in a wait state

The most important holder flags are H (holder: indicates that requested lock is granted) and W (wait: set while waiting for request to complete). These flags are set on granted lock requests and queued lock requests, respectively.

The following pminfo command displays a list of the number of glocks with the Wait (W) holder flag for each mounted GFS2 file system.

# pminfo -f gfs2.holders.flags.wait

gfs2.holders.flags.wait
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0
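
When this check runs from a script, it can help to report only the file systems that actually have waiting holders. The following is a minimal sketch that assumes the output layout shown above; the function name `gfs2_waiting` is hypothetical.

```shell
# Hypothetical helper: print each file system whose wait-flag holder count
# is non-zero, followed by that count.
# stdin: output of `pminfo -f gfs2.holders.flags.wait`
# Returns 0 if at least one file system has waiters, 1 otherwise.
gfs2_waiting() {
    awk '
        /^[ \t]+inst / && $NF + 0 > 0 && match($0, /"[^"]*"/) {
            print substr($0, RSTART + 1, RLENGTH - 2), $NF
            found = 1
        }
        END { exit !found }'
}
```

A usage example is `pminfo -f gfs2.holders.flags.wait | gfs2_waiting && echo "waiters present"`.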

If you do see a number of waiting requests queued on a resource group lock, there may be several reasons for this. One is that there are a large number of nodes compared to the number of resource groups in the file system. Another is that the file system may be very nearly full (requiring, on average, longer searches for free blocks). The situation in both cases can be improved by adding more storage and using the gfs2_grow command to expand the file system.

10.2.4. Checking file system operation latency using the kernel tracepoint based metrics

The GFS2 PMDA supports collecting metrics from the GFS2 kernel tracepoints. By default, reading these metrics is disabled. Activating them turns on the GFS2 kernel tracepoints whenever the metrics are collected, so that the metric values can be populated. Enabling the kernel tracepoint metrics can have a small effect on throughput.

PCP provides the pmstore tool, which allows you to modify PMDA settings based on metric values. The gfs2.control.* metrics allow the toggling of GFS2 kernel tracepoints. The following example uses the pmstore command to enable all of the GFS2 kernel tracepoints.

# pmstore gfs2.control.tracepoints.all 1
gfs2.control.tracepoints.all old value=0 new value=1

When this command is run, the PMDA switches on all of the GFS2 tracepoints in the debugfs file system. The "Complete Metric List" table in Complete listing of available metrics for GFS2 in PCP explains each of the control tracepoints and their usage. An explanation of the effect of each control tracepoint and its available options is also available through the help switch in pminfo.

The GFS2 promote metrics count the number of promote requests on the file system. These requests are split into requests that were granted on the first attempt and "others", which were granted after their initial promote request. A drop in the number of first-time promotes together with a rise in "other" promotes can indicate issues with file contention.

The GFS2 demote request metrics, like the promote request metrics, count the number of demote requests that occur on the file system. These, however, are also split between requests that come from the current node and requests that come from other nodes in the cluster. A large number of demote requests from remote nodes can indicate contention between two nodes for a given resource group.

The following pminfo command displays the grant and demote latency metrics for each mounted GFS2 file system.

# pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all

gfs2.latency.grant.all
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0

gfs2.latency.demote.all
    inst [0 or "afc_cluster:data"] value 0
    inst [1 or "afc_cluster:bin"] value 0
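
Because the tracepoint metrics are disabled by default and can affect throughput, one option is to enable them only while a measurement runs. The following is a minimal sketch, not a PCP feature: the wrapper name `with_gfs2_tracepoints` is hypothetical, and storing `0` to switch the tracepoints off again is an assumption based on the old and new values that `pmstore` reports above.

```shell
# Hypothetical wrapper: enable all GFS2 tracepoints, run the given command,
# then switch the tracepoints off again and return the command's exit status.
with_gfs2_tracepoints() {
    pmstore gfs2.control.tracepoints.all 1 >/dev/null || return 1
    "$@"
    rc=$?
    # assumption: writing 0 restores the default (disabled) state
    pmstore gfs2.control.tracepoints.all 0 >/dev/null
    return "$rc"
}
```

For example, `with_gfs2_tracepoints pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all` collects the latency values with the tracepoints enabled only for the duration of that command. A short measurement window may see few tracepoint events, so the values can remain at 0.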

It is a good idea to determine the general values observed when the workload is running without issues to be able to notice changes in performance when these values differ from their normal range.

For example, you might notice a change in the number of promote requests waiting to complete rather than completing on the first attempt, which the output from the following command allows you to determine.

# pminfo -f gfs2.tracepoints.promote.other

gfs2.tracepoints.promote.other.null_lock
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.promote.other.concurrent_read
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.promote.other.concurrent_write
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.promote.other.protected_read
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.promote.other.protected_write
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.promote.other.exclusive
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

The output from the following command allows you to detect a large increase in remote demote requests (especially if from other cluster nodes).

# pminfo -f gfs2.tracepoints.demote_rq.requested

gfs2.tracepoints.demote_rq.requested.remote
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

gfs2.tracepoints.demote_rq.requested.local
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0

The output from the following command could indicate an unexplained increase in log flushes.

# pminfo -f gfs2.tracepoints.log_flush.total

gfs2.tracepoints.log_flush.total
     inst [0 or "afc_cluster:data"] value 0
     inst [1 or "afc_cluster:bin"] value 0
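
The earlier advice about knowing the normal range of these values can be sketched as a comparison between two saved `pminfo -f` outputs, one taken while the workload runs without issues and one taken later. This is a minimal sketch under stated assumptions: the function name `pminfo_regressions`, the file arguments, and the default factor of 2 are all hypothetical choices, and a suitable threshold depends on the workload.

```shell
# Hypothetical helper: print every metric instance whose current value
# exceeds factor times its baseline value.
# $1: file with baseline `pminfo -f` output
# $2: file with current `pminfo -f` output
# $3: factor (default 2)
# Output columns: metric, instance, baseline value, current value
pminfo_regressions() {
    awk -v factor="${3:-2}" '
        /^[^ \t]/ { metric = $1; next }
        /^[ \t]+inst / && match($0, /"[^"]*"/) {
            key = metric " " substr($0, RSTART + 1, RLENGTH - 2)
            if (FNR == NR)                      # still reading the baseline file
                base[key] = $NF + 0
            else if (key in base && $NF + 0 > factor * base[key])
                print key, base[key], $NF
        }' "$1" "$2"
}
```

For example, save `pminfo -f gfs2.latency.grant.all gfs2.latency.demote.all > baseline.txt` during normal operation, capture the same command into `current.txt` later, and run `pminfo_regressions baseline.txt current.txt`.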

10.3. Complete listing of available metrics for GFS2 in PCP

The following table describes the full list of performance metrics given by the pcp-pmda-gfs2 package for GFS2 file systems.

Table 10.1. Complete Metric List
Metric Name	Description
`gfs2.glocks.*`	Metrics collected from the glock stats file (`glocks`) that count the number of glocks in each state that currently exist for each mounted GFS2 file system.
`gfs2.glocks.flags.*`	Range of metrics counting the number of glocks that exist with the given glock flags.
`gfs2.holders.*`	Metrics collected from the glock stats file (`glocks`) that count the number of glocks with holders in each lock state that currently exist for each mounted GFS2 file system.
`gfs2.holders.flags.*`	Range of metrics counting the number of glock holders with the given holder flags.
`gfs2.sbstats.*`	Timing metrics collected from the superblock stats file (`sbstats`) for each mounted GFS2 file system.
`gfs2.glstats.*`	Metrics collected from the glock stats file (`glstats`) that count the number of each type of glock that currently exists for each mounted GFS2 file system.
`gfs2.latency.grant.*`	A derived metric that uses data from both the `gfs2_glock_queue` and `gfs2_glock_state_change` tracepoints to calculate the average latency, in microseconds, for glock grant requests to complete on each mounted file system. This metric is useful for discovering potential slowdowns on the file system when the grant latency increases.
`gfs2.latency.demote.*`	A derived metric that uses data from both the `gfs2_glock_state_change` and `gfs2_demote_rq` tracepoints to calculate the average latency, in microseconds, for glock demote requests to complete on each mounted file system. This metric is useful for discovering potential slowdowns on the file system when the demote latency increases.
`gfs2.latency.queue.*`	A derived metric that uses data from the `gfs2_glock_queue` tracepoint to calculate the average latency, in microseconds, for glock queue requests to complete on each mounted file system.
`gfs2.worst_glock.*`	A derived metric that uses data from the `gfs2_glock_lock_time` tracepoint to calculate a perceived "current worst glock" for each mounted file system. This metric is useful for discovering potential lock contention and file system slowdown if the same lock is suggested multiple times.
`gfs2.tracepoints.*`	Metrics regarding the output from the GFS2 `debugfs` tracepoints for each mounted file system. Each sub-type of these metrics (one for each GFS2 tracepoint) can be switched on or off individually by using the control metrics.
`gfs2.control.*`	Configuration metrics that switch metric recording in the PMDA on or off. Control metrics are toggled by means of the `pmstore` tool.

10.4. Performing minimal PCP setup to gather file system data

This procedure describes how to install a minimal PCP setup to collect statistics on Red Hat Enterprise Linux. This setup involves adding the minimum number of packages on a production system needed to gather data for further analysis.

The resulting tar.gz archive of the pmlogger output can be analyzed by using further PCP tools and can be compared with other sources of performance information.

Procedure

Install the required PCP packages.
```
# yum install pcp pcp-pmda-gfs2
```
Activate the GFS2 module for PCP.
```
# cd /var/lib/pcp/pmdas/gfs2
# ./Install
```

Start both the pmcd and pmlogger services.

# systemctl start pmcd.service
# systemctl start pmlogger.service

Perform operations on the GFS2 file system.

Stop both the pmcd and pmlogger services.

# systemctl stop pmcd.service
# systemctl stop pmlogger.service

Collect the output and save it to a tar.gz file named based on the host name and the current date and time.

# cd /var/log/pcp/pmlogger
# tar -czf $(hostname).$(date +%F-%Hh%M).pcp.tar.gz $(hostname)
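
The collection step above can be wrapped in a function that checks that the host's log directory exists before archiving it. This is a minimal sketch: the function name `archive_pmlogger_logs` and its optional arguments are hypothetical, and the defaults follow the paths used in this procedure.

```shell
# Hypothetical helper: archive the pmlogger directory for one host and
# print the path of the resulting tar.gz file.
# $1: pmlogger log directory (default /var/log/pcp/pmlogger)
# $2: host subdirectory to archive (default: output of hostname)
archive_pmlogger_logs() {
    logdir=${1:-/var/log/pcp/pmlogger}
    host=${2:-$(hostname)}
    archive="$host.$(date +%F-%Hh%M).pcp.tar.gz"
    if [ ! -d "$logdir/$host" ]; then
        echo "no pmlogger logs for $host in $logdir" >&2
        return 1
    fi
    # create the archive next to the host directory, as in the step above
    (cd "$logdir" && tar -czf "$archive" "$host") && echo "$logdir/$archive"
}
```

Running `archive_pmlogger_logs` as root with no arguments archives the same directory as the commands in the step above.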
