Assessing and Monitoring RHEL Resource Optimization with Insights for Red Hat Enterprise Linux
Understanding RHEL resource-usage statistics
Abstract
Chapter 1. The resource-optimization service for public-cloud systems
The Red Hat Insights for Red Hat Enterprise Linux resource optimization service enables RHEL customers to assess and monitor their public RHEL cloud usage and optimization. The service shows metrics for the following:
- CPU
- Memory
- Disk usage
It analyzes those metrics and compares them to resource limits recommended by your public cloud provider. Leveraging data from the past day, the resource optimization service considers each resource parameter in several distinct ways and returns actionable data. This data enables better resource allocation and helps you to save money on your public cloud investment.
Features
The service reveals the following information:
- Utilization and optimization data for existing systems in the Insights for Red Hat Enterprise Linux inventory.
- Range of systems running in the public cloud.
- Overview of system characteristics.
- Potential issues.
- Suggestions for issue resolution.
1.1. Resource optimization service core concepts
1.1.1. The resource optimization service performance rules
Use the resource optimization service to view performance metrics from your managed hosts that run in the supported public cloud, Amazon Web Services (AWS). The service uses a framework called the Performance Co-Pilot (PCP) toolkit to record performance metrics. These metrics empower you to make better business decisions.
Insights performance rules
The performance rules are sets of conditions that are applied to the data collected by PCP. They identify the following system states:
- Undersized. The undersized state is determined by examining CPU, RAM and disk input/output (I/O) usage, and combining that with CPU idle time, over a period of 24 hours. If that results in a high score, the resource optimization service labels the system as too small for its workload. A system will be reported as undersized whenever any of the dimensions are undersized.
- Oversized. The oversized state is determined by examining CPU, RAM and disk I/O usage, and combining that with CPU idle time, over a period of 24 hours. If that results in a low score, the resource optimization service labels the system as too big for its workload. A system will be reported as oversized only if all of the dimensions are oversized.
- Idling. The idling state is determined by examining CPU, RAM and disk I/O usage, and combining that with CPU idle time, over a period of 24 hours. If that results in very low utilization, the resource optimization service labels the system as appropriate for its workload but underused. The idling condition can be viewed as a needs improvement scenario.
- Optimized. The optimized state is determined by examining CPU, RAM and disk I/O usage, and combining that with CPU idle time, over a period of 24 hours. If that results in a middle point, the resource optimization service labels the system as optimized.
- Under pressure. This state is only active when Kernel Pressure Stall Information (PSI) has been enabled. Systems are labeled as under pressure when they are optimized utilization-wise, but some pressure condition persists.
The resource optimization service assigns a score to the system by evaluating its measured state against the desired performance criteria that you have set.
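The difference between the undersized rule (any dimension) and the oversized rule (all dimensions) can be sketched in a few lines of shell. This is an illustration only, not the service's implementation: the function name classify_state and the per-dimension labels "undersized", "oversized", and "ok" are hypothetical, and the sketch leaves out the idling and under pressure states.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: combine per-dimension labels into one system state.
# Each argument is the label for one dimension (CPU, memory, disk I/O):
# "undersized", "oversized", or "ok". The labels are illustrative only.
classify_state() {
    local dim all_oversized=1
    for dim in "$@"; do
        # Any single undersized dimension marks the whole system as undersized.
        if [ "$dim" = "undersized" ]; then
            echo "undersized"
            return 0
        fi
        [ "$dim" = "oversized" ] || all_oversized=0
    done
    # Oversized is reported only when every dimension is oversized.
    if [ "$#" -gt 0 ] && [ "$all_oversized" -eq 1 ]; then
        echo "oversized"
    else
        echo "optimized"
    fi
}
```

For example, classify_state oversized ok oversized prints optimized, because one dimension is not oversized.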
1.1.2. Data security guarantee for the resource optimization service
The resource optimization service adheres to the data and application security practices for Red Hat Insights for Red Hat Enterprise Linux services. For more details see Security.
1.1.3. Performance metrics for resource optimization
The resource optimization service installs the pcp package on your system and runs two services, pmcd and pmlogger. Both are part of the Performance Co-Pilot (PCP) toolkit, which monitors and processes specific metrics on your system. Metrics are stored in an archive, which the Insights client uploads to Red Hat Insights for Red Hat Enterprise Linux.
1.1.4. Access usage metrics for the resource optimization service
The resource optimization service captures data from the previous day and provides system utilization metrics after 24 hours. By default, the archive is uploaded to Insights for Red Hat Enterprise Linux at 12:00am +/- 1 hour, local system time. However, the time when this data is uploaded can be configured in the Performance Co-Pilot (PCP) toolkit configuration.
1.1.5. User Access settings in the Red Hat Hybrid Cloud Console
User Access is the Red Hat implementation of role-based access control (RBAC). Your Organization Administrator uses User Access to configure what users can see and do on the Red Hat Hybrid Cloud Console (the console):
- Control user access by organizing roles instead of assigning permissions individually to users.
- Create groups that include roles and their corresponding permissions.
- Assign users to these groups, allowing them to inherit the permissions associated with their group’s roles.
1.1.5.1. Predefined User Access groups and roles
To make groups and roles easier to manage, Red Hat provides two predefined groups and a set of predefined roles:
Predefined groups
The Default access group contains all users in your organization. Many predefined roles are assigned to this group. It is automatically updated by Red Hat.
Note: If the Organization Administrator makes changes to the Default access group, its name changes to Custom default access group and it is no longer updated by Red Hat.
The Default admin access group contains only users who have Organization Administrator permissions. This group is automatically maintained and users and roles in this group cannot be changed.
On the Hybrid Cloud Console navigate to Red Hat Hybrid Cloud Console > the Settings icon (⚙) > Identity & Access Management > User Access > Groups to see the current groups in your account. This view is limited to the Organization Administrator.
Predefined roles assigned to groups
The Default access group contains many of the predefined roles. Because all users in your organization are members of the Default access group, they inherit all permissions assigned to that group.
The Default admin access group includes many (but not all) predefined roles that provide update and delete permissions. The roles in this group usually include administrator in their name.
On the Hybrid Cloud Console navigate to Red Hat Hybrid Cloud Console > the Settings icon (⚙) > Identity & Access Management > User Access > Roles to see the current roles in your account. You can see how many groups each role is assigned to. This view is limited to the Organization Administrator.
1.1.5.2. Access permissions
The Prerequisites for each procedure list which predefined role provides the permissions you must have. As a user, you can navigate to Red Hat Hybrid Cloud Console > the Settings icon (⚙) > My User Access to view the roles and application permissions currently inherited by you.
If you try to access Insights for Red Hat Enterprise Linux features and see a message that you do not have permission to perform this action, you must obtain additional permissions. The Organization Administrator or the User Access administrator for your organization configures those permissions.
Use the Red Hat Hybrid Cloud Console Virtual Assistant to ask "Contact my Organization Administrator". The assistant sends an email to the Organization Administrator on your behalf.
Additional resources
For more information about user access and permissions, see User Access Configuration Guide for Role-based Access Control (RBAC).
1.1.5.3. User Access roles for resource optimization users
The following roles enable standard or enhanced access to resource optimization service features in Insights for Red Hat Enterprise Linux:
- Resource optimization viewer. Read any resource optimization service resource.
- Resource optimization administrator. Perform any available operation against any resource optimization service resource.
Chapter 2. Installing and configuring the resource-optimization components
Installing resource optimization involves installing packages, configuring settings and enabling local services. This can be done manually, or with an Ansible playbook provided by Red Hat.
Pay as you go (PAYG) customers need to register the Insights client with subscription-manager (RHSM). There are two ways to register with subscription-manager:
- Using activation keys (recommended)
- Using your user name and password
For more information about how to register the Insights client, refer to Client Configuration Guide for Red Hat Insights.
| RHEL Versions | Cloud Provider | Resource Optimization Compatibility |
|---|---|---|
| 8.x-9.x | AWS | Yes (x86_64 and ARM 64-bit) |
| 7.7-7.9 | AWS | Yes (x86_64 and ARM 64-bit) |
| 7.0-7.6 | AWS | No |
| 6.x | AWS | No |
Prerequisites
The following applications and configurations need to be installed or confirmed before the resource optimization service can be used:
- Cloud marketplace RHEL instance is configured.
- The Insights client is installed on the system and is operational.
If you want to use Ansible to install or uninstall the resource optimization service:
- The Ansible repository is enabled and the Ansible client is installed on each system.
- The system administrator can run Ansible Playbooks.
2.1. Installing and configuring enhanced data collection for resource optimization
The Red Hat Enterprise Linux Performance Co-Pilot (PCP) utility selects and gathers resource optimization (ROS) client-side usage metrics for Insights for Red Hat Enterprise Linux.
The Insights client detects the presence of the ROS PCP configuration file (/var/lib/pcp/config/pmlogger/config.ros). When it finds the file, it launches pmlogger to collect performance metrics data. PCP compiles these metrics into daily summaries. The RHEL pmlogger service then uploads the summaries to Insights for analysis.
Using config.ros results in one data point per upload with a subset of available metrics. This leads to limited information available to Insights recommendations. To enhance the data available to Insights and to improve recommendations, install the RHEL pcp-zeroconf package for your version of RHEL.
The pcp-zeroconf package automates PCP installation. Once you install pcp-zeroconf, it automatically starts pmlogger. pmlogger gathers raw data on the default set of usage metrics and records the raw data into archive files.
2.1.1. Installing insights-client-ros and pcp-zeroconf on systems running RHEL 9 and later
The insights-client-ros package includes a dependency for pcp-zeroconf. When you install insights-client-ros on your system, the process also automatically installs pcp-zeroconf. You need to install the insights-client-ros package on each system in your environment.
Once pcp-zeroconf is installed, its default configuration overrides the legacy settings in config.ros. The Insights client then uses the ROS data that pcp-zeroconf collects.
Once you install insights-client-ros, pcp-zeroconf automatically starts pmlogger. pmlogger gathers raw data on the default set of usage metrics, and records the raw data into archive files.
Prerequisites
- You are logged in to the system as root, or have elevated permissions using sudo.
- Your system is registered using subscription-manager and has access to the rhel-<RHEL-Version>-for-<arch>-appstream-rpms repository. <RHEL-Version> is the version of RHEL installed on the system, and <arch> is the system architecture, such as x86_64.
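As a small illustration of how the repository placeholders expand, the following sketch builds the repository ID from a version and an architecture. The function name appstream_repo_id is hypothetical, and the sketch assumes that <RHEL-Version> is the major version number (for example, 9).

```shell
# Hypothetical helper: build the AppStream repository ID.
# Assumes $1 is the RHEL major version and $2 is the architecture.
appstream_repo_id() {
    local version="$1" arch="$2"
    printf 'rhel-%s-for-%s-appstream-rpms\n' "$version" "$arch"
}
```

One possible use is to compare the result with the output of subscription-manager repos --list-enabled to confirm that the repository is available.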
Procedure
- Use dnf to install the insights-client-ros package.
    # dnf install insights-client-ros -y
- Verify that pmlogger is running.
    # pcp | grep pmlogger
    pmlogger: primary logger: /var/log/pcp/pmlogger/<hostname>/<date.hours.minutes>
- Open the /etc/insights-client/insights-client.conf file with an editor.
- Look for the ros_collect setting in the file. Edit the file to set ros_collect to true.
    Note: If ros_collect is set to false, pcp-zeroconf does not collect and report ROS data.
- Save the file and exit the editor.
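The pmlogger verification can also be scripted. The following is a minimal sketch, assuming the pcp summary prints a "pmlogger: primary logger:" line as in the sample output of the procedure. The function name has_primary_pmlogger is hypothetical.

```shell
# Hypothetical helper: succeeds when the text on standard input contains
# a pmlogger status line such as "pmlogger: primary logger: <path>".
has_primary_pmlogger() {
    grep -q 'pmlogger:[[:space:]]*primary logger:'
}
```

For example, pcp | has_primary_pmlogger && echo "pmlogger is running" prints the message only when the primary logger is listed.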
2.1.2. Removing the ros directory on systems running RHEL 9 and later
insights-core collects the previous day's (today-1) data from the system. Before you remove the legacy ros directory, install pcp-zeroconf, and then use the UI or API to verify that ROS displays details based on the latest uploads for your system.
Prerequisites
- You are logged in to the system as root, or have elevated permissions using sudo.
Procedure
- Remove the ros directory.
    # rm -rf /var/log/pcp/pmlogger/ros
2.1.3. Installing pcp-zeroconf on systems running RHEL 7 or RHEL 8
In RHEL versions 7 and 8, you must explicitly install pcp-zeroconf on each system in your environment. Once you install pcp-zeroconf, its default configuration overrides the legacy settings in config.ros. The Insights client then uses the ROS data that pcp-zeroconf collects.
Prerequisites
- You are logged in to the system as root, or have elevated permissions using sudo.
- Your system is registered using subscription-manager and has access to the rhel-<RHEL-Version>-for-<arch>-appstream-rpms repository. <RHEL-Version> is the version of RHEL installed on the system, and <arch> is the system architecture, such as x86_64.
Procedure
- Use dnf to install the pcp-zeroconf package. On RHEL 7, use yum in place of dnf.
    # dnf install pcp-zeroconf -y
- Open the /etc/insights-client/insights-client.conf file with an editor.
- Look for the ros_collect setting in the file. Edit the file to set ros_collect to true. If the ros_collect setting does not exist, add a new setting with its value set to true.
    Note: If ros_collect is set to false, pcp-zeroconf does not collect and report ROS data.
- Save the file and exit the editor.
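The edit to ros_collect can be made repeatable with a short script. The following is a sketch under stated assumptions, not a Red Hat-provided tool: the function name set_ros_collect is hypothetical, the value is written as True on the assumption that it follows the capitalization of other boolean settings in the file, and a missing setting is appended at the end of the file on the assumption that the file ends inside the [insights-client] section.

```shell
# Hypothetical helper: set ros_collect=True in an Insights client config file.
# Usage: set_ros_collect /path/to/insights-client.conf
set_ros_collect() {
    local conf="$1"
    if grep -Eq '^[#[:space:]]*ros_collect[[:space:]]*=' "$conf"; then
        # Rewrite an existing (possibly commented-out) setting in place.
        sed -E -i 's/^[#[:space:]]*ros_collect[[:space:]]*=.*/ros_collect=True/' "$conf"
    else
        # Append the setting; assumes the file ends in the [insights-client] section.
        printf 'ros_collect=True\n' >> "$conf"
    fi
}
```

Running the function on a copy of the file first, and comparing the copy with the original, is a low-risk way to confirm the result before changing /etc/insights-client/insights-client.conf.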
2.1.4. Removing the ros directory and config.ros on systems running RHEL 7 or RHEL 8
insights-core collects the previous day's (today-1) data from the system. Before you remove the legacy ros directory, install pcp-zeroconf, and then use the UI or API to verify that ROS displays details based on the latest uploads for your system.
Prerequisites
- You are logged in to the system as root, or have elevated permissions using sudo.
Procedure
- Remove the config.ros file.
    # rm /var/lib/pcp/config/pmlogger/config.ros
- Remove the ros directory.
    # rm -rf /var/log/pcp/pmlogger/ros
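The two removals can be wrapped in one function so that they can be tried on scratch copies before touching the real paths. This is a minimal sketch; the function name remove_legacy_ros is hypothetical.

```shell
# Hypothetical helper: remove the legacy ROS logger config file and archive directory.
# Usage: remove_legacy_ros <config-file> <archive-dir>
remove_legacy_ros() {
    local config_file="$1" archive_dir="$2"
    # Refuse to run with missing arguments.
    [ -n "$config_file" ] && [ -n "$archive_dir" ] || return 2
    # -f keeps rm quiet if the file is already gone.
    rm -f -- "$config_file"
    # -r is required because the archive location is a directory.
    rm -rf -- "$archive_dir"
}
```

On a real system, the call would be remove_legacy_ros /var/lib/pcp/config/pmlogger/config.ros /var/log/pcp/pmlogger/ros, run as root.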
Additional resources
- Installing and using the pcp-zeroconf package for Performance Co-Pilot (PCP)
- Setting up PCP with pcp-zeroconf
- For more information about PCP and pcp-zeroconf, refer to the performance guide for your version of RHEL.
- For more information about pmlogger, see the pmlogger man page on your system.
2.1.5. Troubleshooting
If pcp-zeroconf is not installed or the pmlogger service is not running, the log file /var/log/insights-client/insights-client.log might contain warnings for the related upload(s).
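A quick way to look for such warnings is to filter the log for PCP-related lines. The following is a sketch only: the exact wording of the warnings is not given here, so the search pattern (any line that mentions pcp or pmlogger, ignoring case) is an assumption, and the function name ros_log_hits is hypothetical.

```shell
# Hypothetical helper: print log lines that mention PCP or pmlogger.
# Usage: ros_log_hits /var/log/insights-client/insights-client.log
ros_log_hits() {
    # -i ignores case; -E enables the alternation in the pattern.
    grep -iE 'pcp|pmlogger' "$1"
}
```

The function returns a non-zero status when no line matches, which makes it usable in a conditional.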
2.2. Enabling Kernel Pressure Stall Information (PSI)
PSI provides a canonical way to see resource pressure increases as they develop. There are pressure metrics for three major resources: memory, CPU, and input/output (I/O). PSI is available on RHEL 8 and newer versions, and is disabled by default.
When PSI is enabled, the resource optimization service can augment its findings and provide more details and better suggestions. Enabling PSI is strongly recommended to identify peaks.
Procedure
- To enable Kernel PSI on RHEL, update all kernel entries with the following command:
    $ sudo grubby --update-kernel=ALL --args="psi=1"
- Reboot the system.
Important: Enabling PSI incurs a slight (<1%) performance hit.
Verification
- When PSI is enabled, files for CPU, memory, and I/O appear under /proc/pressure.
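The verification can be scripted by checking for the three pressure files. This is a minimal sketch; the function name psi_enabled is hypothetical, and the optional directory argument exists only so that the check can be exercised against a scratch directory.

```shell
# Hypothetical helper: succeeds when the cpu, memory, and io pressure files all exist.
# The directory defaults to /proc/pressure.
psi_enabled() {
    local dir="${1:-/proc/pressure}" name
    for name in cpu memory io; do
        [ -e "$dir/$name" ] || return 1
    done
}
```

For example, psi_enabled && echo "PSI is enabled" prints the message only after a reboot with psi=1 in effect.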
2.3. Enabling notifications and integrations in the resource optimization service
You can enable the notifications service on Red Hat Hybrid Cloud Console to send notifications whenever the resource optimization service detects an issue and generates a suggestion. Using the notifications service frees you from having to continually check the Red Hat Insights for Red Hat Enterprise Linux dashboard for recommendations.
For example, you can configure the notifications service to automatically send an email message whenever the resource optimization service generates a suggestion.
Enabling the notifications service requires three main steps:
- First, an Organization Administrator creates a User access group with the Notifications administrator role, and then adds account members to the group.
- Next, a Notifications administrator sets up behavior groups for events in the notifications service. Behavior groups specify the delivery method for each notification. For example, a behavior group can specify whether email notifications are sent to all users, or just to Organization administrators.
- Finally, users who receive email notifications from events must set their user preferences so that they receive individual emails for each event.
In addition to sending email messages, you can configure the notifications service to pull event data in other ways:
- Using an authenticated client to query Red Hat Insights APIs for event data.
- Using webhooks to send events to third-party applications that accept inbound requests.
- Integrating notifications with applications such as Splunk to route resource optimization recommendations to the application dashboard.
Chapter 3. Viewing resource optimization reports
Historical data reports are available to help you assess your level of optimization over time, in order to make informed decisions about your future public cloud investment.
3.1. Viewing historical utilization data
The resource optimization service enables you to see how your system utilization scores have been trending over the last 7-45 days. The service displays a bar chart that indicates CPU Utilization and Memory Utilization percentages on a daily basis.
Complete the following steps to view, filter, and sort system historical utilization data:
Procedure
- Navigate to the Business > Resource Optimization page. The system states screen opens.
- Click on the Name header on the left side of the page to filter by Name, State or Operating system. Use the sort arrow to the right of each column name to sort by OS, CPU, Memory Utilization, I/O Output, Suggestions, State, and Last Reported. Clicking once sorts the column so that optimized systems are displayed first. Clicking a second time sorts the column so that systems categorized as Waiting for data are displayed first.
- Systems that have been analyzed render in blue. Click on the blue system name for a more detailed view.
- Click on the Actions dropdown to see the system’s properties in Inventory, such as operating system, infrastructure, configuration, BIOS and other data.
- By default, the resource optimization system displays 7 days of utilization results. Click on the dropdown labeled Last 7 Days to view 45 days of utilization data. To view specific days and the utilization scores for those days, use the mouse wheel and buttons to pan and zoom across the bar chart.
- Scroll down to see specific suggestions for that system.
3.2. Downloading resource optimization service reports
You can download the resource optimization reports for all registered systems. The report identifies the following data gathered over the last 7-45 days:
- Registered systems. This section details the number of systems that are optimal, non-optimal, and stale. The optimized state is determined by examining CPU, RAM, and disk I/O usage, combined with CPU idle time, over a period of 24 hours. If the calculation, based on the examination of the three factors, results in a middle point, the resource optimization service labels the system as optimized. A stale system is defined as one that has not submitted data to the resource optimization service in 7 days.
- Kernel pressure stall information (PSI). This is an analysis of the number of systems that have PSI enabled and the number of systems that have NOT enabled PSI. PSI allows you to receive better system recommendations since it can identify resource pressure increases as they develop.
- System performance issues. Specific performance issues such as RAM or CPU related peaks are identified along with the number of occurrences.
- Most used current instance types. The service will evaluate and display your top 5 most frequently used instance types across all registered systems.
- Suggested instance types. The service identifies the top 5 frequently suggested instance types based on the most recent utilization metrics. This may indicate that a change is necessary for better resource allocation.
- Suggested instance types in 45 days. This metric displays the top 5 frequently suggested instance types based on 45 days of historical data. You can also view the effectiveness of changes you have made in the recent past.
Prerequisites
The following prerequisites and conditions must be met to create a PDF of the executive report:
- The Insights client is installed on the system and is operational.
- Performance Co-Pilot is installed and correctly configured.
- At least one system is registered and sending data to the resource optimization service.
The longer your systems have been sending information to the resource optimization service, the more accurate and valuable the recommendations will be.
Procedure
- Navigate to Business > Resource Optimization.
- In the top right corner, click on Download executive report.
- A dialog box displays the message Export successful, and the downloaded PDF file appears in your taskbar.
Chapter 4. Disabling the resource optimization service
4.1. Removing resource optimization files and data
Using Ansible to disable the resource optimization service
Perform the following steps on each system to disable and uninstall the resource optimization service.
Procedure
- Download the Ansible Playbook with the following command:
    $ curl -O https://raw.githubusercontent.com/RedHatInsights/ros-backend/v1.0/ansible-playbooks/ros_disable.yml
- Run the Ansible Playbook that you downloaded:
    # ansible-playbook -c local ros_disable.yml
Running the playbook does not stop or remove the Performance Co-Pilot (PCP) toolkit. Note that PCP may support multiple applications. If you use PCP exclusively for the resource optimization service and want to remove PCP as well, you have two options: stop and disable the pmlogger and pmcd services, or remove PCP completely by uninstalling the pcp package from the system.
Manually disabling the resource optimization service without the use of Ansible
The use of Ansible is recommended to expedite the uninstallation process. If you choose to not use Ansible, use the manual procedure that follows:
Procedure
- Disable resource optimization service metrics collection by removing this line from /etc/pcp/pmlogger/control.d/local:
    LOCALHOSTNAME n y PCP_LOG_DIR/pmlogger/ros -r -T24h10m -c config.ros -v 100Mb
- Restart PCP so that resource optimization service metrics collection is effectively stopped:
    $ sudo systemctl restart pmcd pmlogger
- Remove the resource optimization service configuration file:
    $ sudo rm /var/lib/pcp/config/pmlogger/config.ros
- Remove the resource optimization data from the system:
    $ sudo rm -rf /var/log/pcp/pmlogger/ros
- If you are not using PCP for anything else, you can remove it from your system:
    $ sudo yum remove pcp
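The first manual step, removing the control entry, can be done without an editor. The following is a sketch rather than a supported tool: the function name drop_ros_control_line is hypothetical, and it assumes that the entry to remove is the only line that passes -c config.ros to pmlogger.

```shell
# Hypothetical helper: delete the pmlogger control entry that references config.ros.
# Usage: drop_ros_control_line /etc/pcp/pmlogger/control.d/local
drop_ros_control_line() {
    local control_file="$1"
    # Delete only lines that pass "-c config.ros"; all other entries are kept.
    sed -E -i '/-c[[:space:]]+config\.ros([[:space:]]|$)/d' "$control_file"
}
```

Making a backup copy of the control file before running the function keeps the change easy to revert.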
4.2. Disabling kernel pressure stall information (PSI)
Procedure
- Edit the /etc/default/grub file and remove psi=1 from the GRUB_CMDLINE_LINUX line.
- Regenerate the grub configuration file.
    $ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
- Reboot the system.
Verification step
When PSI is disabled, /proc/pressure does not exist.
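The edit to /etc/default/grub can be sketched as a sed command that removes only the psi=1 argument and tidies the leftover spacing. The function name strip_psi_arg is hypothetical, and the sketch assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX value on a single line.

```shell
# Hypothetical helper: remove psi=1 from the GRUB_CMDLINE_LINUX line.
# Usage: strip_psi_arg /etc/default/grub
strip_psi_arg() {
    local grub_file="$1"
    sed -E -i '/^GRUB_CMDLINE_LINUX=/ {
        # Drop psi=1 when it is a whole argument, keeping its delimiters.
        s/(^|[[:space:]"])psi=1([[:space:]"]|$)/\1\2/
        # Trim spaces left just inside the opening and closing quotes.
        s/"[[:space:]]+/"/
        s/[[:space:]]+"$/"/
        # Collapse runs of spaces between the remaining arguments.
        s/[[:space:]]{2,}/ /g
    }' "$grub_file"
}
```

After running the function on the real file, the remaining steps (regenerating the grub configuration and rebooting) still apply.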
Providing feedback on Red Hat documentation
We appreciate and prioritize your feedback regarding our documentation. Provide as much detail as possible, so that your request can be quickly addressed.
Prerequisites
- You are logged in to the Red Hat Customer Portal.
Procedure
To provide feedback, perform the following steps:
- Click the following link: Create Issue
- Describe the issue or enhancement in the Summary text box.
- Provide details about the issue or requested enhancement in the Description text box.
- Type your name in the Reporter text box.
- Click the Create button.
This action creates a documentation ticket and routes it to the appropriate documentation team. Thank you for taking the time to provide feedback.