Este contenido no está disponible en el idioma seleccionado.
Chapter 8. Troubleshooting and debugging metering
Metering is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.
For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.
Use the following sections to help troubleshoot and debug specific issues with metering.
In addition to the information in this section, be sure to review the following topics:
8.1. Troubleshooting metering Copiar enlaceEnlace copiado en el portapapeles!
A common issue with metering is pods failing to start. Pods might fail to start due to lack of resources or if they have a dependency on a resource that does not exist, such as a
StorageClass
Secret
8.1.1. Not enough compute resources Copiar enlaceEnlace copiado en el portapapeles!
A common issue when installing or running metering is a lack of compute resources. As the cluster grows and more reports are created, the Reporting Operator pod requires more memory. If memory usage reaches the pod limit, the cluster considers the pod out of memory (OOM) and terminates it with an
OOMKilled
The Metering Operator does not autoscale the Reporting Operator based on the load in the cluster. Therefore, CPU usage for the Reporting Operator pod does not increase as the cluster grows.
To determine if the issue is with resources or scheduling, follow the troubleshooting instructions included in the Kubernetes document Managing Compute Resources for Containers.
To troubleshoot issues due to a lack of compute resources, check the following within the
openshift-metering
Prerequisites
You are currently in the
namespace. Change to theopenshift-meteringnamespace by running:openshift-metering$ oc project openshift-metering
Procedure
Check for metering
resources that fail to complete and show the status ofReport:ReportingPeriodUnmetDependencies$ oc get reportsExample output
NAME QUERY SCHEDULE RUNNING FAILED LAST REPORT TIME AGE namespace-cpu-utilization-adhoc-10 namespace-cpu-utilization Finished 2020-10-31T00:00:00Z 2m38s namespace-cpu-utilization-adhoc-11 namespace-cpu-utilization ReportingPeriodUnmetDependencies 2m23s namespace-memory-utilization-202010 namespace-memory-utilization ReportingPeriodUnmetDependencies 26s namespace-memory-utilization-202011 namespace-memory-utilization ReportingPeriodUnmetDependencies 14sCheck the
resources where theReportDataSourceis less than the report end date:NEWEST METRIC$ oc get reportdatasourceExample output
NAME EARLIEST METRIC NEWEST METRIC IMPORT START IMPORT END LAST IMPORT TIME AGE ... node-allocatable-cpu-cores 2020-04-23T09:14:00Z 2020-08-31T10:07:00Z 2020-04-23T09:14:00Z 2020-10-15T17:13:00Z 2020-12-09T12:45:10Z 230d node-allocatable-memory-bytes 2020-04-23T09:14:00Z 2020-08-30T05:19:00Z 2020-04-23T09:14:00Z 2020-10-14T08:01:00Z 2020-12-09T12:45:12Z 230d ... pod-usage-memory-bytes 2020-04-23T09:14:00Z 2020-08-24T20:25:00Z 2020-04-23T09:14:00Z 2020-10-09T23:31:00Z 2020-12-09T12:45:12Z 230dCheck the health of the
reporting-operatorresource for a high number of pod restarts:Pod$ oc get pods -l app=reporting-operatorExample output
NAME READY STATUS RESTARTS AGE reporting-operator-84f7c9b7b6-fr697 2/2 Running 542 8d1 - 1
- The Reporting Operator pod is restarting at a high rate.
Check the
reporting-operatorresource for anPodtermination:OOMKilled$ oc describe pod/reporting-operator-84f7c9b7b6-fr697Example output
Name: reporting-operator-84f7c9b7b6-fr697 Namespace: openshift-metering Priority: 0 Node: ip-10-xx-xx-xx.ap-southeast-1.compute.internal/10.xx.xx.xx ... Ports: 8080/TCP, 6060/TCP, 8082/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP State: Running Started: Thu, 03 Dec 2020 20:59:45 +1000 Last State: Terminated Reason: OOMKilled1 Exit Code: 137 Started: Thu, 03 Dec 2020 20:38:05 +1000 Finished: Thu, 03 Dec 2020 20:59:43 +1000- 1
- The Reporting Operator pod was terminated due to OOM kill.
Increasing the reporting-operator pod memory limit
If you are experiencing an increase in pod restarts and OOM kill events, you can check the current memory limit set for the Reporting Operator pod. Increasing the memory limit allows the Reporting Operator pod to update the report data sources. If necessary, increase the memory limit in your
MeteringConfig
Procedure
Check the current memory limits of the
reporting-operatorresource:Pod$ oc describe pod reporting-operator-67d6f57c56-79mrtExample output
Name: reporting-operator-67d6f57c56-79mrt Namespace: openshift-metering Priority: 0 ... Ports: 8080/TCP, 6060/TCP, 8082/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP State: Running Started: Tue, 08 Dec 2020 14:26:21 +1000 Ready: True Restart Count: 0 Limits: cpu: 1 memory: 500Mi1 Requests: cpu: 500m memory: 250Mi Environment: ...- 1
- The current memory limit for the Reporting Operator pod.
Edit the
resource to update the memory limit:MeteringConfig$ oc edit meteringconfig/operator-meteringExample
MeteringConfigresourcekind: MeteringConfig metadata: name: operator-metering namespace: openshift-metering spec: reporting-operator: spec: resources:1 limits: cpu: 1 memory: 750Mi requests: cpu: 500m memory: 500Mi ...- 1
- Add or increase memory limits within the
resourcesfield of theMeteringConfigresource.
NoteIf there continue to be numerous OOM killed events after memory limits are increased, this might indicate that a different issue is causing the reports to be in a pending state.
8.1.2. StorageClass resource not configured Copiar enlaceEnlace copiado en el portapapeles!
Metering requires that a default
StorageClass
See the documentation on configuring metering for information on how to check if there are any
StorageClass
8.1.3. Secret not configured correctly Copiar enlaceEnlace copiado en el portapapeles!
A common issue with metering is providing the incorrect secret when configuring your persistent storage. Be sure to review the example configuration files and create you secret according to the guidelines for your storage provider.
8.2. Debugging metering Copiar enlaceEnlace copiado en el portapapeles!
Debugging metering is much easier when you interact directly with the various components. The sections below detail how you can connect and query Presto and Hive as well as view the dashboards of the Presto and HDFS components.
All of the commands in this section assume you have installed metering through OperatorHub in the
openshift-metering
8.2.1. Get reporting operator logs Copiar enlaceEnlace copiado en el portapapeles!
Use the command below to follow the logs of the
reporting-operator
$ oc -n openshift-metering logs -f "$(oc -n openshift-metering get pods -l app=reporting-operator -o name | cut -c 5-)" -c reporting-operator
8.2.2. Query Presto using presto-cli Copiar enlaceEnlace copiado en el portapapeles!
The following command opens an interactive presto-cli session where you can query Presto. This session runs in the same container as Presto and launches an additional Java instance, which can create memory limits for the pod. If this occurs, you should increase the memory request and limits of the Presto pod.
By default, Presto is configured to communicate using TLS. You must use the following command to run Presto queries:
$ oc -n openshift-metering exec -it "$(oc -n openshift-metering get pods -l app=presto,presto=coordinator -o name | cut -d/ -f2)" \
-- /usr/local/bin/presto-cli --server https://presto:8080 --catalog hive --schema default --user root --keystore-path /opt/presto/tls/keystore.pem
Once you run this command, a prompt appears where you can run queries. Use the
show tables from metering;
$ presto:default> show tables from metering;
Example output
Table
datasource_your_namespace_cluster_cpu_capacity_raw
datasource_your_namespace_cluster_cpu_usage_raw
datasource_your_namespace_cluster_memory_capacity_raw
datasource_your_namespace_cluster_memory_usage_raw
datasource_your_namespace_node_allocatable_cpu_cores
datasource_your_namespace_node_allocatable_memory_bytes
datasource_your_namespace_node_capacity_cpu_cores
datasource_your_namespace_node_capacity_memory_bytes
datasource_your_namespace_node_cpu_allocatable_raw
datasource_your_namespace_node_cpu_capacity_raw
datasource_your_namespace_node_memory_allocatable_raw
datasource_your_namespace_node_memory_capacity_raw
datasource_your_namespace_persistentvolumeclaim_capacity_bytes
datasource_your_namespace_persistentvolumeclaim_capacity_raw
datasource_your_namespace_persistentvolumeclaim_phase
datasource_your_namespace_persistentvolumeclaim_phase_raw
datasource_your_namespace_persistentvolumeclaim_request_bytes
datasource_your_namespace_persistentvolumeclaim_request_raw
datasource_your_namespace_persistentvolumeclaim_usage_bytes
datasource_your_namespace_persistentvolumeclaim_usage_raw
datasource_your_namespace_persistentvolumeclaim_usage_with_phase_raw
datasource_your_namespace_pod_cpu_request_raw
datasource_your_namespace_pod_cpu_usage_raw
datasource_your_namespace_pod_limit_cpu_cores
datasource_your_namespace_pod_limit_memory_bytes
datasource_your_namespace_pod_memory_request_raw
datasource_your_namespace_pod_memory_usage_raw
datasource_your_namespace_pod_persistentvolumeclaim_request_info
datasource_your_namespace_pod_request_cpu_cores
datasource_your_namespace_pod_request_memory_bytes
datasource_your_namespace_pod_usage_cpu_cores
datasource_your_namespace_pod_usage_memory_bytes
(32 rows)
Query 20210503_175727_00107_3venm, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:02 [32 rows, 2.23KB] [19 rows/s, 1.37KB/s]
presto:default>
8.2.3. Query Hive using beeline Copiar enlaceEnlace copiado en el portapapeles!
The following opens an interactive beeline session where you can query Hive. This session runs in the same container as Hive and launches an additional Java instance, which can create memory limits for the pod. If this occurs, you should increase the memory request and limits of the Hive pod.
$ oc -n openshift-metering exec -it $(oc -n openshift-metering get pods -l app=hive,hive=server -o name | cut -d/ -f2) \
-c hiveserver2 -- beeline -u 'jdbc:hive2://127.0.0.1:10000/default;auth=noSasl'
Once you run this command, a prompt appears where you can run queries. Use the
show tables;
$ 0: jdbc:hive2://127.0.0.1:10000/default> show tables from metering;
Example output
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| datasource_your_namespace_cluster_cpu_capacity_raw |
| datasource_your_namespace_cluster_cpu_usage_raw |
| datasource_your_namespace_cluster_memory_capacity_raw |
| datasource_your_namespace_cluster_memory_usage_raw |
| datasource_your_namespace_node_allocatable_cpu_cores |
| datasource_your_namespace_node_allocatable_memory_bytes |
| datasource_your_namespace_node_capacity_cpu_cores |
| datasource_your_namespace_node_capacity_memory_bytes |
| datasource_your_namespace_node_cpu_allocatable_raw |
| datasource_your_namespace_node_cpu_capacity_raw |
| datasource_your_namespace_node_memory_allocatable_raw |
| datasource_your_namespace_node_memory_capacity_raw |
| datasource_your_namespace_persistentvolumeclaim_capacity_bytes |
| datasource_your_namespace_persistentvolumeclaim_capacity_raw |
| datasource_your_namespace_persistentvolumeclaim_phase |
| datasource_your_namespace_persistentvolumeclaim_phase_raw |
| datasource_your_namespace_persistentvolumeclaim_request_bytes |
| datasource_your_namespace_persistentvolumeclaim_request_raw |
| datasource_your_namespace_persistentvolumeclaim_usage_bytes |
| datasource_your_namespace_persistentvolumeclaim_usage_raw |
| datasource_your_namespace_persistentvolumeclaim_usage_with_phase_raw |
| datasource_your_namespace_pod_cpu_request_raw |
| datasource_your_namespace_pod_cpu_usage_raw |
| datasource_your_namespace_pod_limit_cpu_cores |
| datasource_your_namespace_pod_limit_memory_bytes |
| datasource_your_namespace_pod_memory_request_raw |
| datasource_your_namespace_pod_memory_usage_raw |
| datasource_your_namespace_pod_persistentvolumeclaim_request_info |
| datasource_your_namespace_pod_request_cpu_cores |
| datasource_your_namespace_pod_request_memory_bytes |
| datasource_your_namespace_pod_usage_cpu_cores |
| datasource_your_namespace_pod_usage_memory_bytes |
+----------------------------------------------------+
32 rows selected (13.101 seconds)
0: jdbc:hive2://127.0.0.1:10000/default>
8.2.4. Port-forward to the Hive web UI Copiar enlaceEnlace copiado en el portapapeles!
Run the following command to port-forward to the Hive web UI:
$ oc -n openshift-metering port-forward hive-server-0 10002
You can now open http://127.0.0.1:10002 in your browser window to view the Hive web interface.
8.2.5. Port-forward to HDFS Copiar enlaceEnlace copiado en el portapapeles!
Run the following command to port-forward to the HDFS namenode:
$ oc -n openshift-metering port-forward hdfs-namenode-0 9870
You can now open http://127.0.0.1:9870 in your browser window to view the HDFS web interface.
Run the following command to port-forward to the first HDFS datanode:
$ oc -n openshift-metering port-forward hdfs-datanode-0 9864
- 1
- To check other datanodes, replace
hdfs-datanode-0with the pod you want to view information on.
8.2.6. Metering Ansible Operator Copiar enlaceEnlace copiado en el portapapeles!
Metering uses the Ansible Operator to watch and reconcile resources in a cluster environment. When debugging a failed metering installation, it can be helpful to view the Ansible logs or status of your
MeteringConfig
8.2.6.1. Accessing Ansible logs Copiar enlaceEnlace copiado en el portapapeles!
In the default installation, the Metering Operator is deployed as a pod. In this case, you can check the logs of the Ansible container within this pod:
$ oc -n openshift-metering logs $(oc -n openshift-metering get pods -l app=metering-operator -o name | cut -d/ -f2) -c ansible
Alternatively, you can view the logs of the Operator container (replace
-c ansible
-c operator
8.2.6.2. Checking the MeteringConfig Status Copiar enlaceEnlace copiado en el portapapeles!
It can be helpful to view the
.status
MeteringConfig
Invalid
$ oc -n openshift-metering get meteringconfig operator-metering -o=jsonpath='{.status.conditions[?(@.type=="Invalid")].message}'
8.2.6.3. Checking MeteringConfig Events Copiar enlaceEnlace copiado en el portapapeles!
Check events that the Metering Operator is generating. This can be helpful during installation or upgrade to debug any resource failures. Sort events by the last timestamp:
$ oc -n openshift-metering get events --field-selector involvedObject.kind=MeteringConfig --sort-by='.lastTimestamp'
Example output with latest changes in the MeteringConfig resources
LAST SEEN TYPE REASON OBJECT MESSAGE
4m40s Normal Validating meteringconfig/operator-metering Validating the user-provided configuration
4m30s Normal Started meteringconfig/operator-metering Configuring storage for the metering-ansible-operator
4m26s Normal Started meteringconfig/operator-metering Configuring TLS for the metering-ansible-operator
3m58s Normal Started meteringconfig/operator-metering Configuring reporting for the metering-ansible-operator
3m53s Normal Reconciling meteringconfig/operator-metering Reconciling metering resources
3m47s Normal Reconciling meteringconfig/operator-metering Reconciling monitoring resources
3m41s Normal Reconciling meteringconfig/operator-metering Reconciling HDFS resources
3m23s Normal Reconciling meteringconfig/operator-metering Reconciling Hive resources
2m59s Normal Reconciling meteringconfig/operator-metering Reconciling Presto resources
2m35s Normal Reconciling meteringconfig/operator-metering Reconciling reporting-operator resources
2m14s Normal Reconciling meteringconfig/operator-metering Reconciling reporting resources