Chapter 4. Configuring performance monitoring with PCP by using RHEL system roles


Performance Co-Pilot (PCP) is a system performance analysis toolkit. You can use it to record and analyze performance data from many components on a Red Hat Enterprise Linux system.

You can use the metrics RHEL system role to automate the installation and configuration of PCP, and the role can configure Grafana to visualize PCP metrics.

4.1. Configuring Performance Co-Pilot by using the metrics RHEL system role

You can use Performance Co-Pilot (PCP) to monitor many metrics, such as CPU utilization and memory usage. Monitoring these metrics can help you identify resource and performance bottlenecks. By using the metrics RHEL system role, you can remotely configure PCP on multiple hosts to record metrics.

Prerequisites

Procedure

  1. Create a playbook file, for example ~/playbook.yml, with the following content:

    ---
    - name: Monitoring performance metrics
      hosts: managed-node-01.example.com
      tasks:
        - name: Configure Performance Co-Pilot
          ansible.builtin.include_role:
            name: rhel-system-roles.metrics
          vars:
            metrics_retention_days: 14
            metrics_manage_firewall: true
            metrics_manage_selinux: true

    The settings specified in the example playbook include the following:

    metrics_retention_days: <number>
    Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
    metrics_manage_firewall: <true|false>
    Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.

  2. Validate the playbook syntax:

    $ ansible-playbook --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  3. Run the playbook:

    $ ansible-playbook ~/playbook.yml
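
The introduction to this section mentions configuring PCP on multiple hosts, while the example playbook targets a single node. The following is a minimal sketch of the same role invocation applied to an inventory group. The group name pcp_nodes is hypothetical and must exist in your inventory; the variables are the ones used in the example playbook.

```yaml
---
# Sketch only: "pcp_nodes" is a hypothetical inventory group name.
- name: Monitoring performance metrics on several hosts
  hosts: pcp_nodes
  tasks:
    - name: Configure Performance Co-Pilot
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_retention_days: 14      # remove archives older than 14 days
        metrics_manage_firewall: true   # open the required firewalld ports
        metrics_manage_selinux: true
```

You would validate and run this playbook with the same ansible-playbook commands as in the procedure.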

Verification

  • Query a metric, for example:

    # ansible managed-node-01.example.com -m command -a 'pminfo -f kernel.all.load'
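
If you prefer to keep the verification in a playbook instead of an ad hoc command, the query can be sketched as follows. The play and the variable name pminfo_result are illustrative assumptions; the pminfo command is the one shown above.

```yaml
---
- name: Verify that PCP answers metric queries
  hosts: managed-node-01.example.com
  tasks:
    - name: Query the load average metric
      ansible.builtin.command:
        cmd: pminfo -f kernel.all.load
      register: pminfo_result   # hypothetical variable name
      changed_when: false       # read-only query, never reports a change

    - name: Show the values that pminfo returned
      ansible.builtin.debug:
        var: pminfo_result.stdout_lines
```

The first task fails if pminfo exits with a non-zero status, so a successful run suggests that PCP on the managed node responds to the query.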

Additional resources

  • /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
  • /usr/share/doc/rhel-system-roles/metrics/ directory

4.2. Configuring Performance Co-Pilot with authentication by using the metrics RHEL system role

You can enable authentication in Performance Co-Pilot (PCP) so that the pmcd service and Performance Metrics Domain Agents (PMDAs) can determine whether the user running the monitoring tools is allowed to perform an action. Authenticated users have access to metrics with sensitive information. Additionally, certain agents require authentication. For example, the bpftrace agent uses authentication to identify whether a user is allowed to load bpftrace scripts into the kernel to generate metrics.

By using the metrics RHEL system role, you can remotely configure PCP with authentication on multiple hosts.

Prerequisites

Procedure

  1. Store your sensitive variables in an encrypted file:

    1. Create the vault:

      $ ansible-vault create vault.yml
      New Vault password: <vault_password>
      Confirm New Vault password: <vault_password>
    2. After the ansible-vault create command opens an editor, enter the sensitive data in the <key>: <value> format:

      metrics_usr: <username>
      metrics_pwd: <password>
    3. Save the changes, and close the editor. Ansible encrypts the data in the vault.
  2. Create a playbook file, for example ~/playbook.yml, with the following content:

    ---
    - name: Monitoring performance metrics
      hosts: managed-node-01.example.com
      vars_files:
        - vault.yml
      tasks:
        - name: Configure Performance Co-Pilot
          ansible.builtin.include_role:
            name: rhel-system-roles.metrics
          vars:
            metrics_retention_days: 14
            metrics_manage_firewall: true
            metrics_manage_selinux: true
            metrics_username: "{{ metrics_usr }}"
            metrics_password: "{{ metrics_pwd }}"

    The settings specified in the example playbook include the following:

    metrics_retention_days: <number>
    Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
    metrics_manage_firewall: <true|false>
    Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.
    metrics_username: <username>
    The role creates this user locally on the managed node, adds the credentials to the /etc/pcp/passwd.db Simple Authentication and Security Layer (SASL) database, and configures authentication in PCP. Additionally, if you set metrics_from_bpftrace: true in the playbook, PCP uses this account to register bpftrace scripts.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.

  3. Validate the playbook syntax:

    $ ansible-playbook --ask-vault-pass --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  4. Run the playbook:

    $ ansible-playbook --ask-vault-pass ~/playbook.yml
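
The description of metrics_username mentions the metrics_from_bpftrace variable. The following sketch shows how the two settings might be combined in one play. It assumes that vault.yml defines metrics_usr and metrics_pwd as in step 1; see the role's README.md file for the authoritative description of metrics_from_bpftrace.

```yaml
---
- name: Monitoring performance metrics with the bpftrace agent
  hosts: managed-node-01.example.com
  vars_files:
    - vault.yml                       # provides metrics_usr and metrics_pwd
  tasks:
    - name: Configure Performance Co-Pilot with authentication and bpftrace
      ansible.builtin.include_role:
        name: rhel-system-roles.metrics
      vars:
        metrics_username: "{{ metrics_usr }}"
        metrics_password: "{{ metrics_pwd }}"
        metrics_from_bpftrace: true   # PCP uses the account above to register bpftrace scripts
```

Because the play reads the vault, run it with the --ask-vault-pass option as in the procedure.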

Verification

  • On a host with the pcp package installed, query a metric that requires authentication:

    1. Query the metrics by using the credentials that you used in the playbook:

      # pminfo -fmdt -h pcp://managed-node-01.example.com?username=<user> proc.fd.count
      Password: <password>
      
      proc.fd.count
          inst [844 or "000844 /var/lib/pcp/pmdas/proc/pmdaproc"] value 5

      If the command succeeds, it returns the value of the proc.fd.count metric.

    2. Run the command again, but omit the username to verify that the command fails for unauthenticated users:

      # pminfo -fmdt -h pcp://managed-node-01.example.com proc.fd.count
      
      proc.fd.count
      Error: No permission to perform requested operation

Additional resources

  • /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
  • /usr/share/doc/rhel-system-roles/metrics/ directory
  • Ansible vault

4.3. Setting up Grafana by using the metrics RHEL system role to monitor multiple hosts with Performance Co-Pilot

If you have already configured Performance Co-Pilot (PCP) on multiple hosts, you can use an instance of Grafana to visualize the metrics for these hosts. You can display live data and, if the PCP data is stored in a Redis database, historical data as well.

By using the metrics RHEL system role, you can automate the process of setting up Grafana, the PCP plug-in, the optional Redis database, and the configuration of the data sources.

Note

If you use the metrics role to install Grafana on a host, the role also automatically installs PCP on this host.

Prerequisites

Procedure

  1. Store your sensitive variables in an encrypted file:

    1. Create the vault:

      $ ansible-vault create vault.yml
      New Vault password: <vault_password>
      Confirm New Vault password: <vault_password>
    2. After the ansible-vault create command opens an editor, enter the sensitive data in the <key>: <value> format:

      grafana_admin_pwd: <password>
    3. Save the changes, and close the editor. Ansible encrypts the data in the vault.
  2. Create a playbook file, for example ~/playbook.yml, with the following content:

    ---
    - name: Monitoring performance metrics
      hosts: managed-node-01.example.com
      vars_files:
        - vault.yml
      tasks:
        - name: Set up Grafana to monitor multiple hosts
          ansible.builtin.include_role:
            name: rhel-system-roles.metrics
          vars:
            metrics_graph_service: true
            metrics_query_service: true
            metrics_monitored_hosts:
              - <pcp_host_1.example.com>
              - <pcp_host_2.example.com>
            metrics_manage_firewall: true
            metrics_manage_selinux: true
    
        - name: Set Grafana admin password
          ansible.builtin.shell:
            cmd: grafana-cli admin reset-admin-password "{{ grafana_admin_pwd }}"

    The settings specified in the example playbook include the following:

    metrics_graph_service: true
    Installs Grafana and the PCP plug-in. Additionally, the role adds the PCP Vector, PCP Redis, and PCP bpftrace data sources to Grafana.
    metrics_query_service: <true|false>
    Defines whether the role should install and configure Redis for centralized metric recording. If enabled, data collected from PCP clients is stored in Redis and, as a result, you can also display historical data instead of only live data.
    metrics_monitored_hosts: <list_of_hosts>
    Defines the list of hosts to monitor. In Grafana, you can then display the data of these hosts and, additionally, the host that runs Grafana.
    metrics_manage_firewall: <true|false>
    Defines whether the role should open the required ports in the firewalld service. If you set this variable to true, you can, for example, access Grafana remotely.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.

  3. Validate the playbook syntax:

    $ ansible-playbook --ask-vault-pass --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  4. Run the playbook:

    $ ansible-playbook --ask-vault-pass ~/playbook.yml
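
The password-reset task in the example playbook passes a vault secret to a command. As one possible hardening, the following sketch runs the same grafana-cli command through the command module and sets no_log so that Ansible does not print the password in task output. It assumes that vault.yml defines grafana_admin_pwd as in step 1; it is a variation, not part of the documented procedure.

```yaml
---
- name: Set the Grafana admin password without echoing the secret
  hosts: managed-node-01.example.com
  vars_files:
    - vault.yml
  tasks:
    - name: Set Grafana admin password
      ansible.builtin.command:
        argv:                  # pass arguments as a list, no shell involved
          - grafana-cli
          - admin
          - reset-admin-password
          - "{{ grafana_admin_pwd }}"
      no_log: true             # keep the password out of task output and logs
      changed_when: true       # the command resets the password on every run
```

Note that no_log only hides the value in Ansible's own output; the password is still passed as a command-line argument on the managed node.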

Verification

  1. Open http://<grafana_server_IP_or_hostname>:3000 in your browser, and log in as the admin user with the password you set in the procedure.
  2. Display monitoring data:

    • To display live data:

      1. Click Menu → Apps → Performance Co-Pilot → PCP Vector Checklist.
      2. By default, the graphs display metrics from the host that runs Grafana. To switch to a different host, enter the hostname in the hostspec field and press Enter.
    • To display historical data stored in a Redis database: Create a panel with a PCP Redis data source. This requires that you set metrics_query_service: true in the playbook.
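
Before you open the browser, you can check from Ansible that Grafana is responding. The following sketch queries Grafana's /api/health endpoint on the default port 3000 from the Grafana host itself. The play and the variable name grafana_health are illustrative assumptions and not part of the role.

```yaml
---
- name: Check that Grafana responds
  hosts: managed-node-01.example.com
  tasks:
    - name: Query the Grafana health endpoint
      ansible.builtin.uri:
        url: http://localhost:3000/api/health
      register: grafana_health   # hypothetical variable name

    - name: Show the health response
      ansible.builtin.debug:
        var: grafana_health.json
```

The uri module fails the task unless the server answers with HTTP status 200, so a successful run indicates that the Grafana service is up.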

Additional resources

  • /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
  • /usr/share/doc/rhel-system-roles/metrics/ directory
  • Ansible vault

4.4. Configuring web hooks in Performance Co-Pilot by using the metrics RHEL system role

The Performance Co-Pilot (PCP) suite contains the performance metrics inference engine (PMIE) service. This service evaluates performance rules in real time. For example, you can use the default rules to detect excessive swap activities.

You can configure a host as a central PCP management site that collects the monitoring data from multiple PCP nodes. If a rule matches, this central host sends a notification to a web hook to notify other services. For example, the web hook can trigger Event-Driven Ansible to run an Ansible Automation Platform template or playbook on the host that caused the event.

By using the metrics RHEL system role, you can automate the configuration of a central PCP management host that notifies a web hook.

Prerequisites

Procedure

  1. Create a playbook file, for example ~/playbook.yml, with the following content:

    ---
    - name: Monitoring performance metrics
      hosts: managed-node-01.example.com
      tasks:
        - name: Configure PMIE web hooks
          ansible.builtin.include_role:
            name: rhel-system-roles.metrics
          vars:
            metrics_manage_firewall: true
            metrics_retention_days: 7
            metrics_monitored_hosts:
              - pcp-node-01.example.com
              - pcp-node-02.example.com
            metrics_webhook_endpoint: "https://<webserver>:<port>/<endpoint>"

    The settings specified in the example playbook include the following:

    metrics_retention_days: <number>
    Sets the number of days after which the pmlogger_daily systemd timer removes old PCP archives.
    metrics_manage_firewall: <true|false>
    Defines whether the role should open the required ports in the firewalld service. If you want to remotely access PCP on the managed nodes, set this variable to true.
    metrics_monitored_hosts: <list_of_hosts>
    Specifies the hosts to observe.
    metrics_webhook_endpoint: <URL>
    Sets the web hook endpoint to which the performance metrics inference engine (PMIE) sends notifications about detected performance issues. By default, these issues are logged to the local system only.

    For details about all variables used in the playbook, see the /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file on the control node.

  2. Validate the playbook syntax:

    $ ansible-playbook --syntax-check ~/playbook.yml

    Note that this command only validates the syntax and does not protect against a wrong but valid configuration.

  3. Run the playbook:

    $ ansible-playbook ~/playbook.yml

Verification

  1. Check the configuration summary on managed-node-01.example.com:

    # ansible managed-node-01.example.com -m command -a 'pcp summary'
    Performance Co-Pilot configuration on managed-node-01.example.com:
    
     platform: Linux managed-node-01.example.com 5.14.0-427.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb 23 01:51:18 EST 2024 x86_64
     hardware: 8 cpus, 1 disk, 1 node, 1773MB RAM
     timezone: CEST-2
     services: pmcd pmproxy
     pmcd: Version 6.2.0-1, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
     pmlogger: primary logger: /var/log/pcp/pmlogger/managed-node-01.example.com/20240510.16.25
               pcp-node-01.example.com: /var/log/pmlogger/pcp-node-01.example.com/20240510.16.25
               pcp-node-02.example.com: /var/log/pmlogger/pcp-node-02.example.com/20240510.16.25
     pmie: primary engine: /var/log/pcp/pmie/managed-node-01.example.com/pmie.log
           pcp-node-01.example.com: : /var/log/pcp/pmie/pcp-node-01.example.com/pmie.log
           pcp-node-02.example.com: : /var/log/pcp/pmie/pcp-node-02.example.com/pmie.log

    The last three lines confirm that PMIE is configured to monitor three systems.
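
The same check can be automated. The following sketch runs pcp summary on the central host and asserts that the output lists a PMIE log for each monitored node. The log path fragments are taken from the example output above, and the variable name pcp_summary is hypothetical; adjust the hostnames to your environment.

```yaml
---
- name: Verify PMIE on the central PCP host
  hosts: managed-node-01.example.com
  tasks:
    - name: Collect the PCP configuration summary
      ansible.builtin.command:
        cmd: pcp summary
      register: pcp_summary   # hypothetical variable name
      changed_when: false     # read-only query

    - name: Check that a PMIE log is listed for each monitored host
      ansible.builtin.assert:
        that:
          - "'/pmie/pcp-node-01.example.com/pmie.log' in pcp_summary.stdout"
          - "'/pmie/pcp-node-02.example.com/pmie.log' in pcp_summary.stdout"
        fail_msg: The pcp summary output does not list PMIE for all monitored hosts.
```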

Additional resources

  • /usr/share/ansible/roles/rhel-system-roles.metrics/README.md file
  • /usr/share/doc/rhel-system-roles/metrics/ directory
