Chapter 8. Troubleshooting and debugging metering
Use the following sections to help troubleshoot and debug specific issues with metering.
In addition to the information in this section, be sure to review the following topics:
8.1. Troubleshooting metering
A common issue with metering is pods failing to start. Pods might fail to start due to lack of resources or if they have a dependency on a resource that does not exist, such as a StorageClass or Secret resource.
8.1.1. Not enough compute resources
A common issue when installing or running metering is a lack of compute resources. As the cluster grows and more reports are created, the Reporting Operator pod requires more memory. If memory usage reaches the pod limit, the cluster considers the pod out of memory (OOM) and terminates it with an OOMKilled status. Ensure that metering is allocated the minimum resource requirements described in the installation prerequisites.
The Metering Operator does not autoscale the Reporting Operator based on the load in the cluster. Therefore, CPU usage for the Reporting Operator pod does not increase as the cluster grows.
To determine if the issue is with resources or scheduling, follow the troubleshooting instructions included in the Kubernetes document Managing Compute Resources for Containers.
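If you suspect a resource shortage, it can also help to compare the Reporting Operator pod's actual usage with its requests and limits. The following check is a convenience rather than part of the documented procedure, and it assumes the cluster monitoring stack is available to serve pod metrics; the label selector matches the pod examples later in this section:

$ oc adm top pods -l app=reporting-operator -n openshift-metering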
To troubleshoot issues due to a lack of compute resources, check the following within the openshift-metering namespace.
Prerequisites
You are currently in the openshift-metering namespace. Change to the openshift-metering namespace by running:

$ oc project openshift-metering
Procedure
Check for metering Report resources that fail to complete and show the status of ReportingPeriodUnmetDependencies:

$ oc get reports
Example output
NAME                                  QUERY                          SCHEDULE   RUNNING                            FAILED   LAST REPORT TIME       AGE
namespace-cpu-utilization-adhoc-10    namespace-cpu-utilization                 Finished                                    2020-10-31T00:00:00Z   2m38s
namespace-cpu-utilization-adhoc-11    namespace-cpu-utilization                 ReportingPeriodUnmetDependencies                                   2m23s
namespace-memory-utilization-202010   namespace-memory-utilization              ReportingPeriodUnmetDependencies                                   26s
namespace-memory-utilization-202011   namespace-memory-utilization              ReportingPeriodUnmetDependencies                                   14s
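If a report shows ReportingPeriodUnmetDependencies, describing it can reveal why the reporting period is unmet; the status conditions usually name the data sources that have not imported enough metrics yet. This step is a suggestion rather than part of the documented procedure, and the report name is taken from the example output above:

$ oc describe report namespace-memory-utilization-202011 -n openshift-metering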
Check the ReportDataSource resources where the NEWEST METRIC is less than the report end date:

$ oc get reportdatasource
Example output
NAME                            EARLIEST METRIC        NEWEST METRIC          IMPORT START           IMPORT END             LAST IMPORT TIME       AGE
...
node-allocatable-cpu-cores      2020-04-23T09:14:00Z   2020-08-31T10:07:00Z   2020-04-23T09:14:00Z   2020-10-15T17:13:00Z   2020-12-09T12:45:10Z   230d
node-allocatable-memory-bytes   2020-04-23T09:14:00Z   2020-08-30T05:19:00Z   2020-04-23T09:14:00Z   2020-10-14T08:01:00Z   2020-12-09T12:45:12Z   230d
...
pod-usage-memory-bytes          2020-04-23T09:14:00Z   2020-08-24T20:25:00Z   2020-04-23T09:14:00Z   2020-10-09T23:31:00Z   2020-12-09T12:45:12Z   230d
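To look more closely at a data source whose NEWEST METRIC has fallen behind, describe it; the status shows the same import timestamps as the table above in more detail, which can help identify when imports stalled. This is a convenience check rather than part of the documented procedure, and the data source name is taken from the example output above:

$ oc describe reportdatasource pod-usage-memory-bytes -n openshift-metering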
Check the health of the reporting-operator Pod resource for a high number of pod restarts:

$ oc get pods -l app=reporting-operator
Example output
NAME                                  READY   STATUS    RESTARTS   AGE
reporting-operator-84f7c9b7b6-fr697   2/2     Running   542 1      8d

1  The Reporting Operator pod is restarting at a high rate.
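When the restart count is high, the logs from the previous container instance often show why the pod was terminated. This command is a suggestion rather than part of the documented procedure; the container name is an assumption, and running the command without -c lists the containers in the pod:

# The -c container name below is an assumption; adjust it to match your pod.
$ oc logs --previous reporting-operator-84f7c9b7b6-fr697 -c reporting-operator -n openshift-metering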
Check the reporting-operator Pod resource for an OOMKilled termination:

$ oc describe pod/reporting-operator-84f7c9b7b6-fr697
Example output
Name:       reporting-operator-84f7c9b7b6-fr697
Namespace:  openshift-metering
Priority:   0
Node:       ip-10-xx-xx-xx.ap-southeast-1.compute.internal/10.xx.xx.xx
...
    Ports:          8080/TCP, 6060/TCP, 8082/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Thu, 03 Dec 2020 20:59:45 +1000
    Last State:     Terminated
      Reason:       OOMKilled 1
      Exit Code:    137
      Started:      Thu, 03 Dec 2020 20:38:05 +1000
      Finished:     Thu, 03 Dec 2020 20:59:43 +1000

1  The Reporting Operator pod was terminated due to OOM kill.
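You can also review recent events for the pod, which can surface related warnings such as repeated back-off restarts. This check is a convenience rather than part of the documented procedure; the field selector filters events to the pod from the example above:

$ oc get events -n openshift-metering --field-selector involvedObject.name=reporting-operator-84f7c9b7b6-fr697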
Increasing the reporting-operator pod memory limit
If you are experiencing an increase in pod restarts and OOM kill events, you can check the current memory limit set for the Reporting Operator pod. Increasing the memory limit allows the Reporting Operator pod to update the report data sources. If necessary, increase the memory limit in your MeteringConfig resource by 25% to 50%.
Procedure
Check the current memory limits of the reporting-operator Pod resource:

$ oc describe pod reporting-operator-67d6f57c56-79mrt
Example output
Name:         reporting-operator-67d6f57c56-79mrt
Namespace:    openshift-metering
Priority:     0
...
    Ports:          8080/TCP, 6060/TCP, 8082/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Tue, 08 Dec 2020 14:26:21 +1000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  500Mi 1
    Requests:
      cpu:     500m
      memory:  250Mi
    Environment:
...

1  The current memory limit for the Reporting Operator pod.
Edit the MeteringConfig resource to update the memory limit:

$ oc edit meteringconfig/operator-metering
Example MeteringConfig resource

kind: MeteringConfig
metadata:
  name: operator-metering
  namespace: openshift-metering
spec:
  reporting-operator:
    spec:
      resources: 1
        limits:
          cpu: 1
          memory: 750Mi
        requests:
          cpu: 500m
          memory: 500Mi
...

1  Add or increase memory limits within the resources field of the MeteringConfig resource.
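After you save the change, the Metering Operator applies the new resource settings to the Reporting Operator. As a quick check that the restarted pod picked up the new limit, you can print the container memory limits; this JSONPath query is a convenience rather than part of the documented procedure:

$ oc get pods -l app=reporting-operator -n openshift-metering -o jsonpath='{.items[*].spec.containers[*].resources.limits.memory}'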
Note
If numerous OOM kill events continue after the memory limits are increased, this might indicate that a different issue is causing the reports to remain in a pending state.
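In that case, the Reporting Operator logs are usually the quickest way to find the underlying error. The following command assumes the Reporting Operator runs as a deployment named reporting-operator with a container of the same name, matching the pod names shown above; adjust the names if your cluster differs:

# The deployment and container names below are assumptions based on the pod names above.
$ oc logs deploy/reporting-operator -c reporting-operator -n openshift-metering | grep -i error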
8.1.2. StorageClass resource not configured
Metering requires that a default StorageClass resource be configured for dynamic provisioning.
See the documentation on configuring metering for information on how to check if there are any StorageClass resources configured for the cluster, how to set the default, and how to configure metering to use a storage class other than the default.
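As a quick check, you can list the storage classes in the cluster; the default class carries the standard storageclass.kubernetes.io/is-default-class: "true" annotation and is shown with (default) next to its name in the output. This check is a convenience; follow the configuring metering documentation for the full procedure:

$ oc get storageclass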
8.1.3. Secret not configured correctly
A common issue with metering is providing the incorrect secret when configuring your persistent storage. Be sure to review the example configuration files and create your secret according to the guidelines for your storage provider.
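As an illustration only, a secret for S3-backed persistent storage is often created along the following lines. The secret name and key names here are assumptions and must match what your MeteringConfig storage configuration references, so follow your storage provider's example files rather than copying this verbatim:

# The secret name and key names below are assumptions; use the values from your provider's example files.
$ oc create secret generic your-aws-secret -n openshift-metering \
    --from-literal=aws-access-key-id=<your_access_key> \
    --from-literal=aws-secret-access-key=<your_secret_key>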