This documentation is for a release that is no longer maintained. See the documentation for the latest supported version 3 or the latest supported version 4.

Chapter 8. Troubleshooting and debugging metering
Use the following sections to help troubleshoot and debug specific issues with metering.
In addition to the information in this section, be sure to review the following topics:
8.1. Troubleshooting metering
A common issue with metering is pods failing to start. Pods might fail to start because of a lack of resources, or because they depend on a resource that does not exist, such as a StorageClass or Secret resource.
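To see which metering pods are not starting, the STATUS column of oc get pods can be filtered. The following is a minimal sketch rather than part of the product; it assumes the default table layout of oc get pods, and the function name list_unhealthy_pods is hypothetical:

```shell
#!/bin/sh
# Sketch: print pods whose STATUS is neither Running nor Completed.
# Reads the default table output of "oc get pods" on stdin and assumes
# STATUS is the third whitespace-separated column (NAME READY STATUS ...).
list_unhealthy_pods() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 ": " $3 }'
}

# Example usage (requires access to the cluster):
#   oc get pods -n openshift-metering | list_unhealthy_pods
```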
8.1.1. Not enough compute resources
A common issue when installing or running metering is a lack of compute resources. As the cluster grows and more reports are created, the Reporting Operator pod requires more memory. If memory usage reaches the pod limit, the cluster considers the pod out of memory (OOM) and terminates it with an OOMKilled status. Ensure that metering is allocated the minimum resource requirements described in the installation prerequisites.
The Metering Operator does not autoscale the Reporting Operator based on the load in the cluster. Therefore, CPU usage for the Reporting Operator pod does not increase as the cluster grows.
To determine if the issue is with resources or scheduling, follow the troubleshooting instructions included in the Kubernetes document Managing Compute Resources for Containers.
To troubleshoot issues due to a lack of compute resources, check the following within the openshift-metering namespace.
Prerequisites
You are currently in the openshift-metering namespace. Change to the openshift-metering namespace by running:

$ oc project openshift-metering
Procedure
Check for metering Report resources that fail to complete and show the status of ReportingPeriodUnmetDependencies:

$ oc get reports

Example output

NAME                                 QUERY                         SCHEDULE  RUNNING                           FAILED  LAST REPORT TIME      AGE
namespace-cpu-utilization-adhoc-10   namespace-cpu-utilization               Finished                                  2020-10-31T00:00:00Z  2m38s
namespace-cpu-utilization-adhoc-11   namespace-cpu-utilization               ReportingPeriodUnmetDependencies                                2m23s
namespace-memory-utilization-202010  namespace-memory-utilization            ReportingPeriodUnmetDependencies                                26s
namespace-memory-utilization-202011  namespace-memory-utilization            ReportingPeriodUnmetDependencies                                14s
Check the ReportDataSource resources where the NEWEST METRIC is less than the report end date:

$ oc get reportdatasource
Check the health of the reporting-operator Pod resource for a high number of pod restarts:

$ oc get pods -l app=reporting-operator

Example output

NAME                                  READY   STATUS    RESTARTS   AGE
reporting-operator-84f7c9b7b6-fr697   2/2     Running   542        8d

In this example, the RESTARTS value of 542 shows that the Reporting Operator pod is restarting at a high rate.
Check the reporting-operator Pod resource for an OOMKilled termination:

$ oc describe pod/reporting-operator-84f7c9b7b6-fr697

If the Reporting Operator pod was terminated due to an OOM kill, the output reports OOMKilled as the reason for the container's last termination.
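The checks in this procedure can also be scripted over the command output. The following is a minimal sketch, assuming the default table layouts shown in the example output above and that oc describe pod prints OOMKilled as a termination reason. The function names and the restart threshold of 10 in the usage example are hypothetical; this documentation does not define what counts as a high number of restarts. For the ReportDataSource check, metric_behind_end compares a NEWEST METRIC value with a report end date, assuming both use the same UTC timestamp format as the LAST REPORT TIME column.

```shell
#!/bin/sh
# Sketch of the metering troubleshooting checks. All function names and
# the restart threshold are hypothetical; they are not part of oc.

# Print the names of reports whose row shows ReportingPeriodUnmetDependencies.
# Input: the table output of "oc get reports" on stdin.
unmet_reports() {
  awk 'NR > 1 && /ReportingPeriodUnmetDependencies/ { print $1 }'
}

# Succeed (exit status 0) if the NEWEST METRIC timestamp ($1) is earlier
# than the report end date ($2). Assumes both use the same UTC format,
# for example 2020-10-31T00:00:00Z, which sorts correctly as plain text.
# Concatenating "" forces a string comparison instead of a numeric one.
metric_behind_end() {
  LC_ALL=C awk -v newest="$1" -v end="$2" \
    'BEGIN { exit !((newest "") < (end "")) }'
}

# Print pods whose RESTARTS column (fourth field of "oc get pods" output)
# exceeds the threshold given as $1.
high_restart_pods() {
  awk -v max="$1" 'NR > 1 && $4 + 0 > max + 0 { print $1 " (" $4 " restarts)" }'
}

# Succeed if "oc describe pod" output on stdin mentions an OOMKilled termination.
was_oom_killed() {
  grep -q 'OOMKilled'
}

# Example usage (requires access to the cluster):
#   oc get reports | unmet_reports
#   oc get pods -l app=reporting-operator | high_restart_pods 10
#   oc describe pod/reporting-operator-84f7c9b7b6-fr697 | was_oom_killed \
#     && echo "Reporting Operator pod was OOM killed"
```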
Increasing the reporting-operator pod memory limit
If you are experiencing an increase in pod restarts and OOM kill events, check the current memory limit set for the Reporting Operator pod. Increasing the memory limit allows the Reporting Operator pod to update the report data sources. If necessary, increase the memory limit in your MeteringConfig resource by 25% to 50%.
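The suggested increase is simple arithmetic: a 1Gi (1024Mi) limit raised by 25% becomes 1280Mi, and raised by 50% becomes 1536Mi. A minimal sketch of the calculation, assuming the current limit is already expressed in MiB; the function name raise_limit_mib is hypothetical:

```shell
#!/bin/sh
# Sketch: compute a raised memory limit in MiB. Shell arithmetic is
# integer-only, so the result is rounded down to the nearest MiB.
#   $1 = current limit in MiB (for example 1024 for a 1Gi limit)
#   $2 = percentage increase (the documentation suggests 25 to 50)
raise_limit_mib() {
  echo $(( $1 + $1 * $2 / 100 ))
}

# Example: raise_limit_mib 1024 25 prints 1280; raise_limit_mib 1024 50 prints 1536.
```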
Procedure
Check the current memory limits of the reporting-operator Pod resource:

$ oc describe pod reporting-operator-67d6f57c56-79mrt

In the output, the Limits section for the container shows the current memory limit for the Reporting Operator pod.
Edit the MeteringConfig resource to update the memory limit:

$ oc edit meteringconfig/operator-metering

Add or increase memory limits within the resources field of the MeteringConfig resource.
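As an alternative to editing the resource interactively with oc edit, the same change can be sketched with oc patch and a JSON merge patch. This sketch assumes that the Reporting Operator resources are set under spec.reporting-operator.spec.resources in the MeteringConfig resource, that the container in the pod is named reporting-operator, and that the memory limit uses an integer Gi, Mi, or Ki quantity; verify the field path against the example configuration for your version before applying it. The helper names to_mib and memory_limit_patch are hypothetical.

```shell
#!/bin/sh
# Sketch: helpers for raising the Reporting Operator memory limit with
# "oc patch" instead of "oc edit". Function names are hypothetical.

# Convert an integer memory quantity with a Gi, Mi, or Ki suffix to MiB
# (Ki values are rounded down). Other forms, such as 1G, 1.5Gi, or plain
# bytes, are rejected rather than guessed.
to_mib() {
  case "$1" in
    *[GMK]i) num=${1%??} ;;
    *) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
  case "$num" in
    ''|*[!0-9]*) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
  case "$1" in
    *Gi) echo $(( num * 1024 )) ;;
    *Mi) echo "$num" ;;
    *Ki) echo $(( num / 1024 )) ;;
  esac
}

# Build a JSON merge patch that sets only the Reporting Operator memory
# limit. ASSUMPTION: the limit lives under
# spec.reporting-operator.spec.resources.limits in the MeteringConfig resource.
memory_limit_patch() {
  printf '{"spec":{"reporting-operator":{"spec":{"resources":{"limits":{"memory":"%s"}}}}}}\n' "$1"
}

# Example usage (requires access to the cluster; the container name
# reporting-operator is an assumption):
#   current=$(oc get pod reporting-operator-67d6f57c56-79mrt \
#     -o jsonpath='{.spec.containers[?(@.name=="reporting-operator")].resources.limits.memory}')
#   new="$(( $(to_mib "$current") * 125 / 100 ))Mi"   # 25% increase
#   oc patch meteringconfig/operator-metering --type merge \
#     -p "$(memory_limit_patch "$new")"
```

A JSON merge patch merges nested objects, so this patch changes only the memory limit and leaves other values under resources, such as CPU limits and requests, unchanged.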
Note: If numerous OOM kill events continue after you increase the memory limits, a different issue might be causing the reports to remain in a pending state.
8.1.2. StorageClass resource not configured
Metering requires that a default StorageClass resource be configured for dynamic provisioning.

See the documentation on configuring metering for information on how to check if there are any StorageClass resources configured for the cluster, how to set the default, and how to configure metering to use a storage class other than the default.
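To check whether the cluster has a default storage class, look for the (default) marker that oc get storageclass prints next to the name of the default class; the marker reflects the storageclass.kubernetes.io/is-default-class annotation. A minimal sketch over that output, with a hypothetical function name:

```shell
#!/bin/sh
# Sketch: succeed if "oc get storageclass" output on stdin marks a class
# as the default with "(default)". The function name is hypothetical.
has_default_storageclass() {
  # NR > 1 skips the header row; exit status 0 only if a marker was found.
  awk 'NR > 1 && /\(default\)/ { found = 1 } END { exit !found }'
}

# Example usage (requires access to the cluster):
#   oc get storageclass | has_default_storageclass || echo "no default StorageClass"
```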
8.1.3. Secret not configured correctly
A common issue with metering is providing an incorrect secret when configuring your persistent storage. Be sure to review the example configuration files and create your secret according to the guidelines for your storage provider.
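Before configuring persistent storage, you can confirm that the secret contains the keys your storage provider expects. A minimal sketch, assuming you pass the required key names yourself (they differ between providers and are not listed here); the function name secret_has_keys and the placeholder names in the usage example are hypothetical:

```shell
#!/bin/sh
# Sketch: check that a secret contains every data key passed as an argument.
# Reads key names, one per line, on stdin. The required key names depend on
# your storage provider; take them from the provider's example configuration.
secret_has_keys() {
  keys=$(cat)
  missing=0
  for k in "$@"; do
    # -x matches whole lines, -F treats the key as a literal string.
    if ! printf '%s\n' "$keys" | grep -qxF -- "$k"; then
      echo "missing key: $k" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (requires access to the cluster; SECRET_NAME, KEY_ONE, and
# KEY_TWO are placeholders):
#   oc get secret SECRET_NAME -n openshift-metering \
#     -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}' \
#     | secret_has_keys KEY_ONE KEY_TWO
```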