Chapter 3. Resource optimization for OpenShift optimization reports
Access resource optimization for OpenShift from the Red Hat Hybrid Cloud Console to see detailed recommendations for how to optimize your Red Hat OpenShift clusters.
3.1. Enabling optimization
To receive resource optimization recommendations for your namespaces, you must first enable each namespace. To enable a namespace, label it with cost_management_optimizations='true'.
In the CLI, run the following command, replacing NAMESPACE with the name of your namespace: oc label namespace NAMESPACE cost_management_optimizations="true" --overwrite=true
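If you need to enable several namespaces, you can generate the same command once per namespace. The following Python sketch only builds and prints the commands and does not run them. The helper name `label_command` and the namespace names are hypothetical placeholders, not part of the product.

```python
import shlex

# Label taken from the command above, written without shell quoting.
LABEL = "cost_management_optimizations=true"

def label_command(namespace):
    """Return the argument list for labeling one namespace (hypothetical helper)."""
    return ["oc", "label", "namespace", namespace, LABEL, "--overwrite=true"]

# Placeholder namespace names, not taken from a real cluster.
namespaces = ["team-a", "team-b"]

for ns in namespaces:
    # Print only; pass the list to subprocess.run() if you want to execute it.
    print(shlex.join(label_command(ns)))
```

Printing the commands first lets you review them before you apply the label to a live cluster.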
3.2. Viewing optimization reports
Prerequisites
- You added an OpenShift integration to Red Hat Hybrid Cloud Console.
- You uploaded at least 24 hours of data from the operator.
- You logged in to the Red Hat Hybrid Cloud Console.
Procedure
- In cost management, click the Optimizations tab.
- Search for an optimization or use the filter. Click the link to the optimization that you selected.
- View details about the recommendation and toggle between Cost optimizations and Performance optimizations. For more information, see Optimizing for cost or for performance.
3.3. Understanding efficiency scores
The efficiency scores of your CPU and memory resource use translate technical optimization into business value. Efficiency scores put a monetary value on savings and waste.
Efficiency is calculated as usage divided by the request, expressed as a percentage, so an efficiency score can exceed 100% when usage is higher than the requested resources. Wasted cost is the total cost minus the cost of the used capacity. This figure currently includes capacity that is intentionally left unallocated for future growth and disaster recovery. When efficiency exceeds 100%, wasted cost is shown as $0.
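The arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the formula that the console uses: in particular, it assumes that the cost of used capacity is the total cost multiplied by the ratio of usage to request. The function names and sample figures are hypothetical.

```python
def efficiency_percent(usage, request):
    """Usage as a percentage of the request; the result can exceed 100."""
    return usage / request * 100

def wasted_cost(total_cost, usage, request):
    """Total cost minus the cost of used capacity, never below 0.

    Assumption: cost of used capacity = total_cost * (usage / request).
    """
    used_cost = total_cost * (usage / request)
    return max(0.0, total_cost - used_cost)

# Hypothetical figures: 2 cores used out of 4 requested, costing $1,000.
print(efficiency_percent(2, 4))    # 50.0
print(wasted_cost(1000.0, 2, 4))   # 500.0

# Usage above the request: efficiency exceeds 100% and wasted cost is 0.
print(efficiency_percent(5, 4))    # 125.0
print(wasted_cost(1000.0, 5, 4))   # 0.0
```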
To reduce spending, you must optimize resource allocation by reducing memory requests, adjusting CPU limits, or both. However, the technical details of optimization, such as changes in megabytes of memory, are often misunderstood. Metrics such as being 66% cost efficient or wasting $20,000 in a cluster are more tangible. These figures help financial departments justify reallocating money by identifying the largest resource offenders and assigning development teams recommendations to act upon.
However, it is important to interpret efficiency scores within the context of your organization’s availability strategy. A seemingly low efficiency score, for example 51% CPU efficient, might be appropriate depending on the number of availability zones (AZs) deployed. If you have two AZs, a 50% cluster utilization (or 51% efficiency) is ideal because the spare capacity can absorb the workload of a failing cluster, but if you have three AZs, you need a 66% utilization rate. In addition, high efficiency is good for utilization, but it leaves little room for cluster growth.
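The two-AZ and three-AZ figures follow a simple pattern that can be sketched in Python. The generalization to any number of zones is an inference from those two examples, and it assumes that workloads are spread evenly and that only one zone fails at a time. The function name `max_safe_utilization` is hypothetical.

```python
def max_safe_utilization(zones):
    """Fraction of capacity you can use if the other zones must absorb one failed zone.

    Assumption: workloads are spread evenly and only one zone fails at a time.
    """
    if zones < 2:
        raise ValueError("need at least two zones to absorb a failure")
    return (zones - 1) / zones

for zones in (2, 3):
    print(zones, f"{max_safe_utilization(zones):.1%}")
```

For two zones the result is 50%, and for three zones it is roughly two thirds, which corresponds to the 66% figure above.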
On the other hand, efficiency scores exceeding 100% can indicate a failure risk, especially with memory usage. In this case, more resources are being used than are available or guaranteed by requests. For memory, using more than 100% of the guaranteed request can cause OpenShift to kill the application entirely, which makes scores greater than 100% efficiency risky, particularly if all applications are operating above 100% efficiency. To interpret an efficiency score, you must understand your environment, including the kind of resource being over-allocated and the room available for other workloads.
OpenShift permits applications to use more resources than they request by defining a request (the guaranteed resource amount) and a limit (the maximum amount allowed). CPU usage beyond the request is throttled when it reaches the limit, so the application runs slower. Memory usage beyond the request can cause OpenShift to kill the application, depending on the resources available on the node. A common industry recommendation to mitigate memory risk is to set the memory limit equal to the memory request.
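To make the request and limit relationship concrete, the following Python sketch builds a container `resources` stanza in which the memory limit equals the memory request, as the recommendation above suggests. The helper name `build_resources` and all values are placeholders for illustration, not recommended settings.

```python
import json

def build_resources(cpu_request, cpu_limit, memory_request):
    """Build a container `resources` stanza with memory limit equal to memory request."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {"cpu": cpu_limit, "memory": memory_request},
    }

# Placeholder values for illustration only.
resources = build_resources("250m", "500m", "512Mi")
print(json.dumps(resources, indent=2))
```

In this sketch the CPU limit stays higher than the CPU request, because exceeding a CPU request leads to throttling rather than termination.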
3.3.1. Viewing efficiency scores
Use the efficiency scores of your CPU and memory resource usage to translate technical optimization into business value.
Procedure
- Navigate to Cost Management > Optimizations.
- Click the Efficiency tab. This tab contains the CPU workload efficiency and Memory workload efficiency tables, which display the efficiency percentage and the amount of any wasted cost.
- Use the Group-by and Filter menus to organize and narrow your data by project or cluster.
- Hover over the tooltip next to CPU workload efficiency or Memory workload efficiency to see the formulas used in these calculations.
3.4. Optimizing for cost or for performance
After you select an optimization, you can toggle between two tabs called Cost optimizations and Performance optimizations. Optimizing for cost uses fewer resources and is useful when you are performing tests where there is no impact on users. Optimizing for performance provides all the resources possible and is helpful for apps running in a production cluster.
In Cost optimizations, recommendations are generated when CPU usage is at or above the 60th percentile and memory usage is at the 100th percentile. In Performance optimizations, recommendations are generated when CPU usage is at or above the 98th percentile and memory usage is at the 100th percentile.
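To see how far apart these percentile thresholds can be, the following Python sketch computes the 60th, 98th, and 100th percentile of a set of usage samples. It is not the service's recommendation algorithm: the nearest-rank percentile method, the function name, and the sample values are all assumptions chosen for illustration.

```python
import math

def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of the samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical usage samples: CPU in millicpu, memory in Mi.
cpu = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
memory = [256, 300, 310, 280, 512, 290, 305, 295, 300, 285]

cost = {
    "cpu": nearest_rank_percentile(cpu, 60),
    "memory": nearest_rank_percentile(memory, 100),
}
performance = {
    "cpu": nearest_rank_percentile(cpu, 98),
    "memory": nearest_rank_percentile(memory, 100),
}
print(cost)
print(performance)
```

With these samples, the CPU value for the performance profile is much higher than the value for the cost profile, while the memory value is the observed maximum in both cases.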
3.5. Understanding box plots
On the Optimizations page, there are two box plots for your Current CPU utilization and your Current memory utilization. These visualizations can help you understand resource distribution and identify outliers in your data. You can export the data in CSV and JSON format.
The box plots display the following data points in millicpu (m) for CPU and in mebibytes (Mi) for memory:
- Minimum
- Maximum
- Median
- First quartile (Q1): the value below which 25% of the data points fall when they are arranged in increasing order
- Third quartile (Q3): the value below which 75% of the data points fall when they are arranged in increasing order
- Recommended limit
- Recommended request
The data points are based on the time period that you selected:
- 1 day = 4 data points
- 7 days = 7 data points
- 14 days = 14 data points
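The statistical data points in a box plot can be reproduced from exported data with the Python standard library. The following sketch uses seven hypothetical CPU values, matching the seven data points of a 7-day period. The quartile method that the console uses is not stated here, so the default method of `statistics.quantiles` is an assumption, and other methods can give slightly different Q1 and Q3 values.

```python
import statistics

def box_plot_summary(samples):
    """Return minimum, Q1, median, Q3, and maximum for a list of samples."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {
        "min": min(samples),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(samples),
    }

# Hypothetical CPU usage in millicpu, one value per day for a 7-day period.
cpu_millicpu = [100, 150, 200, 250, 300, 350, 400]
print(box_plot_summary(cpu_millicpu))
```

The recommended limit and recommended request are not derived in this sketch, because they come from the optimization recommendations rather than from the usage distribution alone.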