Chapter 4. Configuring performance monitoring with PCP by using RHEL system roles
Performance Co-Pilot (PCP) is a system performance analysis toolkit. You can use it to record and analyze performance data from many components on a Red Hat Enterprise Linux system.
You can use the metrics RHEL system role to automate the installation and configuration of PCP, and the role can configure Grafana to visualize PCP metrics.
4.1. Configuring Performance Co-Pilot by using the metrics RHEL system role
You can use Performance Co-Pilot (PCP) to monitor many metrics, such as CPU utilization and memory usage. Monitoring these metrics can help you identify resource and performance bottlenecks. By using the metrics RHEL system role, you can remotely configure PCP on multiple hosts to record metrics.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
- The account you use to connect to the managed nodes has sudo permissions on them.
Procedure
Create a playbook file, for example ~/playbook.yml, with the following content:
---
- name: Monitoring performance metrics
  hosts: managed-node-01.example.com
  tasks:
    - name: Configure Performance Co-Pilot
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_retention_days: 14
        metrics_manage_firewall: true
        metrics_manage_selinux: true
The settings specified in the example playbook include the following:
metrics_retention_days: <number>
- Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
metrics_manage_firewall: <true|false>
- Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.
For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.
Validate the playbook syntax:
$ ansible-playbook --syntax-check ~/playbook.yml
Note that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
$ ansible-playbook ~/playbook.yml
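The role is intended to configure PCP on several hosts at once, whereas the example playbook targets a single node. As a rough sketch, assuming a YAML inventory whose file name (inventory.yml) and group name (pcp_nodes) are both hypothetical, the managed nodes could be grouped like this:

```yaml
# inventory.yml (hypothetical file name; pcp_nodes is an illustrative group name)
all:
  children:
    pcp_nodes:
      hosts:
        managed-node-01.example.com:
        managed-node-02.example.com:
```

With such an inventory, setting "hosts: pcp_nodes" in the playbook and running "ansible-playbook -i inventory.yml ~/playbook.yml" would apply the same PCP settings to every host in the group.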
Verification
Query a metric, for example:
# ansible managed-node-01.example.com -m command -a 'pminfo -f kernel.all.load'
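If you prefer to keep the verification in a playbook instead of an ad hoc command, the same query can be sketched as follows. The play name, task names, and the pminfo_result variable are illustrative and not part of the metrics role:

```yaml
---
- name: Verify that PCP returns metrics
  hosts: managed-node-01.example.com
  tasks:
    - name: Query the load average metric
      ansible.builtin.command:
        cmd: pminfo -f kernel.all.load
      register: pminfo_result   # illustrative variable name
      changed_when: false       # a read-only query never changes the host

    - name: Fail if the metric name is missing from the output
      ansible.builtin.assert:
        that:
          - "'kernel.all.load' in pminfo_result.stdout"
```

The assertion only checks that pminfo printed the metric name. It does not judge whether the reported values are plausible.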
Additional resources
- /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
- /usr/share/doc/rhel-system-roles/metrics/ directory
4.2. Configuring Performance Co-Pilot with authentication by using the metrics RHEL system role
You can enable authentication in Performance Co-Pilot (PCP) so that the pmcd service and Performance Metrics Domain Agents (PMDAs) can determine whether the user running the monitoring tools is allowed to perform an action. Authenticated users have access to metrics with sensitive information. Additionally, certain agents require authentication. For example, the bpftrace agent uses authentication to identify whether a user is allowed to load bpftrace scripts into the kernel to generate metrics.
By using the metrics RHEL system role, you can remotely configure PCP with authentication on multiple hosts.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
- The account you use to connect to the managed nodes has sudo permissions on them.
Procedure
Store your sensitive variables in an encrypted file:
- Create the vault:
$ ansible-vault create vault.yml
New Vault password: <vault_password>
Confirm New Vault password: <vault_password>
- After the ansible-vault create command opens an editor, enter the sensitive data in the <key>: <value> format:
metrics_usr: <username>
metrics_pwd: <password>
- Save the changes, and close the editor. Ansible encrypts the data in the vault.
Create a playbook file, for example ~/playbook.yml, with the following content:
---
- name: Monitoring performance metrics
  hosts: managed-node-01.example.com
  vars_files:
    - vault.yml
  tasks:
    - name: Configure Performance Co-Pilot
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_retention_days: 14
        metrics_manage_firewall: true
        metrics_manage_selinux: true
        metrics_username: "{{ metrics_usr }}"
        metrics_password: "{{ metrics_pwd }}"
The settings specified in the example playbook include the following:
metrics_retention_days: <number>
- Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
metrics_manage_firewall: <true|false>
- Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.
metrics_username: <username>
- The role creates this user locally on the managed node, adds the credentials to the /etc/pcp/passwd.db Simple Authentication and Security Layer (SASL) database, and configures authentication in PCP. Additionally, if you set metrics_from_bpftrace: true in the playbook, PCP uses this account to register bpftrace scripts.
For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.
Validate the playbook syntax:
$ ansible-playbook --ask-vault-pass --syntax-check ~/playbook.yml
Note that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
$ ansible-playbook --ask-vault-pass ~/playbook.yml
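The description of metrics_username mentions that PCP uses the account to register bpftrace scripts when metrics_from_bpftrace is enabled. A minimal sketch of such a playbook might look like the following. It assumes the vault.yml file created in this procedure sits next to the playbook, and the play and task names are illustrative:

```yaml
---
- name: Monitoring performance metrics with bpftrace
  hosts: managed-node-01.example.com
  vars_files:
    - vault.yml   # assumed to contain metrics_usr and metrics_pwd
  tasks:
    - name: Configure Performance Co-Pilot with authentication and bpftrace
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_from_bpftrace: true
        metrics_username: "{{ metrics_usr }}"
        metrics_password: "{{ metrics_pwd }}"
        metrics_manage_firewall: true
        metrics_manage_selinux: true
```

Check the README.md file of the role for the exact behavior of metrics_from_bpftrace in your version before relying on this combination.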
Verification
On a host with the pcp package installed, query a metric that requires authentication:
- Query the metrics by using the credentials that you used in the playbook:
# pminfo -fmdt -h pcp://managed-node-01.example.com?username=<user> proc.fd.count
Password: <password>
proc.fd.count
    inst [844 or "000844 /var/lib/pcp/pmdas/proc/pmdaproc"] value 5
If the command succeeds, it returns the value of the proc.fd.count metric.
- Run the command again, but omit the username to verify that the command fails for unauthenticated users:
# pminfo -fmdt -h pcp://managed-node-01.example.com proc.fd.count
proc.fd.count
Error: No permission to perform requested operation
Additional resources
- /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
- /usr/share/doc/rhel-system-roles/metrics/ directory
- Ansible vault
4.3. Setting up Grafana by using the metrics RHEL system role to monitor multiple hosts with Performance Co-Pilot
If you have already configured Performance Co-Pilot (PCP) on multiple hosts, you can use an instance of Grafana to visualize the metrics for these hosts. You can display live data and, if the PCP data is stored in a Redis database, historical data as well.
By using the metrics RHEL system role, you can automate the process of setting up Grafana, the PCP plug-in, the optional Redis database, and the configuration of the data sources.
If you use the metrics role to install Grafana on a host, the role also automatically installs PCP on this host.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
- The account you use to connect to the managed nodes has sudo permissions on them.
- PCP is configured for remote access on the hosts you want to monitor.
- The host on which you want to install Grafana can access port 44321 on the PCP nodes you plan to monitor.
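The last prerequisite can be checked before you run the role. The following sketch, which is only one possible approach, uses the ansible.builtin.wait_for module to test from the future Grafana host whether TCP port 44321 answers on each PCP node. The host names are the same placeholders used in the procedure, and the 10-second timeout is an arbitrary choice:

```yaml
---
- name: Check that the PCP nodes are reachable from the Grafana host
  hosts: managed-node-01.example.com
  tasks:
    - name: Test TCP port 44321 on each PCP node
      ansible.builtin.wait_for:
        host: "{{ item }}"
        port: 44321
        timeout: 10   # arbitrary value; the task fails if the port stays closed
      loop:
        - "<pcp_host_1.example.com>"
        - "<pcp_host_2.example.com>"
```

A successful run shows only that the port accepts connections. It does not confirm that pmcd on the node permits the metric queries you need.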
Procedure
Store your sensitive variables in an encrypted file:
- Create the vault:
$ ansible-vault create vault.yml
New Vault password: <vault_password>
Confirm New Vault password: <vault_password>
- After the ansible-vault create command opens an editor, enter the sensitive data in the <key>: <value> format:
grafana_admin_pwd: <password>
- Save the changes, and close the editor. Ansible encrypts the data in the vault.
Create a playbook file, for example ~/playbook.yml, with the following content:
---
- name: Monitoring performance metrics
  hosts: managed-node-01.example.com
  vars_files:
    - vault.yml
  tasks:
    - name: Set up Grafana to monitor multiple hosts
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_graph_service: true
        metrics_query_service: true
        metrics_monitored_hosts:
          - <pcp_host_1.example.com>
          - <pcp_host_2.example.com>
        metrics_manage_firewall: true
        metrics_manage_selinux: true

    - name: Set Grafana admin password
      ansible.builtin.shell:
        cmd: grafana-cli admin reset-admin-password "{{ grafana_admin_pwd }}"
The settings specified in the example playbook include the following:
metrics_graph_service: true
- Installs Grafana and the PCP plug-in. Additionally, the role adds the PCP Vector, PCP Redis, and PCP bpftrace data sources to Grafana.
metrics_query_service: <true|false>
- Defines whether the role should install and configure Redis for centralized metric recording. If enabled, data collected from PCP clients is stored in Redis and, as a result, you can also display historical data instead of only live data.
metrics_monitored_hosts: <list_of_hosts>
- Defines the list of hosts to monitor. In Grafana, you can then display the data of these hosts and, additionally, the host that runs Grafana.
metrics_manage_firewall: <true|false>
- Defines whether the role should open the required ports in the firewalld service. If you set this variable to true, you can, for example, access Grafana remotely.
For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.
Validate the playbook syntax:
$ ansible-playbook --ask-vault-pass --syntax-check ~/playbook.yml
Note that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
$ ansible-playbook --ask-vault-pass ~/playbook.yml
Verification
- Open http://<grafana_server_IP_or_hostname>:3000 in your browser, and log in as the admin user with the password you set in the procedure.
- Display monitoring data:
To display live data:
-
Click
-
By default, the graphs display metrics from the host that runs Grafana. To switch to a different host, enter the hostname in the
hostspec
field and press Enter.
-
Click
- To display historical data stored in a Redis database: Create a panel with a PCP Redis data source. This requires that you set metrics_query_service: true in the playbook.
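In addition to the browser check, you can probe Grafana from a playbook. The sketch below assumes that Grafana listens on port 3000 of the managed node, as in the URL above, and that its /api/health endpoint responds without authentication. Confirm both for your Grafana version. The grafana_health variable name is illustrative:

```yaml
---
- name: Check that Grafana responds
  hosts: managed-node-01.example.com
  tasks:
    - name: Query the Grafana health endpoint
      ansible.builtin.uri:
        url: http://localhost:3000/api/health   # assumed endpoint and port
        status_code: 200
        return_content: true
      register: grafana_health   # illustrative variable name

    - name: Show the response
      ansible.builtin.debug:
        var: grafana_health.json
```

The task fails if Grafana does not return HTTP status 200. It does not verify that the PCP data sources are configured or return data.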
Additional resources
- /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
- /usr/share/doc/rhel-system-roles/metrics/ directory
- Ansible vault
4.4. Configuring web hooks in Performance Co-Pilot by using the metrics RHEL system role
The Performance Co-Pilot (PCP) suite contains the performance metrics inference engine (PMIE) service. This service evaluates performance rules in real time. For example, you can use the default rules to detect excessive swap activities.
You can configure a host as a central PCP management site that collects the monitoring data from multiple PCP nodes. If a rule matches, this central host sends a notification to a web hook to notify other services. For example, the web hook can trigger Event-Driven Ansible to run an Ansible Automation Platform template or playbook on the host that caused the event.
By using the metrics RHEL system role, you can automate the configuration of a central PCP management host that notifies a web hook.
Prerequisites
- You have prepared the control node and the managed nodes.
- You are logged in to the control node as a user who can run playbooks on the managed nodes.
- The account you use to connect to the managed nodes has sudo permissions on them.
- PCP is configured for remote access on the hosts you want to monitor.
- The host on which you want to configure PMIE can access port 44321 on the PCP nodes you plan to monitor.
Procedure
Create a playbook file, for example ~/playbook.yml, with the following content:
---
- name: Monitoring performance metrics
  hosts: managed-node-01.example.com
  tasks:
    - name: Configure PMIE web hooks
      ansible.builtin.include_role:
        name: redhat.rhel_system_roles.metrics
      vars:
        metrics_manage_firewall: true
        metrics_retention_days: 7
        metrics_monitored_hosts:
          - pcp-node-01.example.com
          - pcp-node-02.example.com
        metrics_webhook_endpoint: "https://<webserver>:<port>/<endpoint>"
The settings specified in the example playbook include the following:
metrics_retention_days: <number>
- Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
metrics_manage_firewall: <true|false>
- Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.
metrics_monitored_hosts: <list_of_hosts>
- Specifies the hosts to observe.
metrics_webhook_endpoint: <URL>
- Sets the web hook endpoint to which the performance metrics inference engine (PMIE) sends notifications about detected performance issues. By default, these issues are logged to the local system only.
For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.
Validate the playbook syntax:
$ ansible-playbook --syntax-check ~/playbook.yml
Note that this command only validates the syntax and does not protect against a wrong but valid configuration.
Run the playbook:
$ ansible-playbook ~/playbook.yml
Verification
Check the configuration summary on managed-node-01.example.com:
# ansible managed-node-01.example.com -m command -a 'pcp summary'
Performance Co-Pilot configuration on managed-node-01.example.com:
 platform: Linux managed-node-01.example.com 5.14.0-427.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 23 01:51:18 EST 2024 x86_64
 hardware: 8 cpus, 1 disk, 1 node, 1773MB RAM
 timezone: CEST-2
 services: pmcd pmproxy
     pmcd: Version 6.2.0-1, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/managed-node-01.example.com/20240510.16.25
           pcp-node-01.example.com: /var/log/pmlogger/pcp-node-01.example.com/20240510.16.25
           pcp-node-02.example.com: /var/log/pmlogger/pcp-node-02.example.com/20240510.16.25
     pmie: primary engine: /var/log/pcp/pmie/managed-node-01.example.com/pmie.log
           pcp-node-01.example.com: : /var/log/pcp/pmie/pcp-node-01.example.com/pmie.log
           pcp-node-02.example.com: : /var/log/pcp/pmie/pcp-node-02.example.com/pmie.log
The last three lines confirm that PMIE is configured to monitor three systems.
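To complement the summary output, a playbook can check that the inference engine is active. This is a sketch under the assumption that the primary PMIE instance runs as the pmie.service systemd unit on the central host. Depending on the PCP version, the instances for remote hosts may run under a different unit, so treat the unit name as an assumption:

```yaml
---
- name: Check the PMIE service state
  hosts: managed-node-01.example.com
  tasks:
    - name: Gather service facts
      ansible.builtin.service_facts:

    - name: Verify that the pmie service is running
      ansible.builtin.assert:
        that:
          # pmie.service is an assumed unit name; adjust it if your system differs
          - ansible_facts.services['pmie.service'] is defined
          - ansible_facts.services['pmie.service'].state == 'running'
```

This confirms only that the service is running. To test the web hook itself, trigger or wait for a rule match and inspect the receiving endpoint.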
Additional resources
- /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
- /usr/share/doc/rhel-system-roles/metrics/ directory
- Automate performance management with Performance Co-Pilot using Event-Driven Ansible blog post