Chapter 11. Troubleshooting

11.1. Collecting diagnostic data from the command line

When submitting a support case, it is helpful to provide Red Hat Support with diagnostic information about your cluster. You can use the oc adm must-gather tool to gather diagnostic data for resources of various types, such as OpenTelemetryCollector and Instrumentation, and for the resources that the Operator creates, such as Deployment, Pod, or ConfigMap. The oc adm must-gather tool creates a new pod that collects this data.

Procedure

From the directory where you want to save the collected data, run the oc adm must-gather command to collect the data:

$ oc adm must-gather --image=ghcr.io/open-telemetry/opentelemetry-operator/must-gather -- \
/usr/bin/must-gather --operator-namespace <operator_namespace>

Replace <operator_namespace> with the namespace where the Operator is installed. The default namespace is openshift-opentelemetry-operator.

Verification

Verify that the new directory is created and contains the collected data.
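
The verification step can also be scripted. The following is a minimal sketch, not part of the product tooling. It assumes that the data was written to a directory whose name matches must-gather.local.*, which is the name that oc adm must-gather uses when no --dest-dir option is given. The function name summarize_must_gather is hypothetical.

```python
from pathlib import Path


def summarize_must_gather(base_dir="."):
    """Count the files under each must-gather.local.* directory in base_dir.

    Assumption: the default output directory naming is in use. If you passed
    --dest-dir, point this function at the parent of that directory and adjust
    the glob pattern accordingly.
    """
    summary = {}
    for directory in sorted(Path(base_dir).glob("must-gather.local.*")):
        if directory.is_dir():
            # Count regular files at any depth below the directory.
            summary[directory.name] = sum(
                1 for path in directory.rglob("*") if path.is_file()
            )
    return summary


if __name__ == "__main__":
    found = summarize_must_gather(".")
    if not found:
        print("No must-gather.local.* directory found in the current directory.")
    for name, file_count in found.items():
        print(f"{name}: {file_count} files")
```

An empty result, or a directory that contains zero files, suggests that the collection did not complete and that the command output is worth checking for errors.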

11.2. Getting the OpenTelemetry Collector logs

You can get the logs for the OpenTelemetry Collector as follows.

Procedure

Set the relevant log level in the OpenTelemetryCollector custom resource (CR):
```
  config:
    service:
      telemetry:
        logs:
          level: debug
```
The level field sets the Collector's log level. Supported values include info, warn, error, and debug. The default is info.
Use the oc logs command or the web console to retrieve the logs.

11.3. Exposing the metrics

The OpenTelemetry Collector exposes the following metrics about the data volumes it has processed:

otelcol_receiver_accepted_spans: The number of spans successfully pushed into the pipeline.
otelcol_receiver_refused_spans: The number of spans that could not be pushed into the pipeline.
otelcol_exporter_sent_spans: The number of spans successfully sent to the destination.
otelcol_exporter_enqueue_failed_spans: The number of spans that failed to be added to the sending queue.
otelcol_receiver_accepted_logs: The number of logs successfully pushed into the pipeline.
otelcol_receiver_refused_logs: The number of logs that could not be pushed into the pipeline.
otelcol_exporter_sent_logs: The number of logs successfully sent to the destination.
otelcol_exporter_enqueue_failed_logs: The number of logs that failed to be added to the sending queue.
otelcol_receiver_accepted_metrics: The number of metrics successfully pushed into the pipeline.
otelcol_receiver_refused_metrics: The number of metrics that could not be pushed into the pipeline.
otelcol_exporter_sent_metrics: The number of metrics successfully sent to the destination.
otelcol_exporter_enqueue_failed_metrics: The number of metrics that failed to be added to the sending queue.

You can use these metrics to troubleshoot issues with your Collector. For example, if the otelcol_receiver_refused_spans metric has a high value, it indicates that the Collector is not able to process incoming spans.
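
To make "a high value" concrete, you can compare the refused count with the accepted count. The following is a minimal sketch under stated assumptions: it expects text that was already downloaded from the metrics endpoint in the Prometheus text format, without timestamps on the sample lines. The helper names parse_counters and refused_ratio and the sample values are made up for illustration. Some Collector versions might expose the counters with a _total suffix, so the sketch strips that suffix if it is present.

```python
from collections import defaultdict


def parse_counters(text):
    """Sum the sample values per metric name across all label sets."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: a sample line is "<name>{<labels>} <value>" with no timestamp.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name.endswith("_total"):
            name = name[: -len("_total")]  # assumption: optional counter suffix
        try:
            totals[name] += float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return dict(totals)


def refused_ratio(totals, signal="spans"):
    """Return refused / (accepted + refused) for spans, logs, or metrics."""
    accepted = totals.get(f"otelcol_receiver_accepted_{signal}", 0.0)
    refused = totals.get(f"otelcol_receiver_refused_{signal}", 0.0)
    received = accepted + refused
    return refused / received if received else 0.0


# Made-up sample data, for illustration only.
SAMPLE = """\
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp",transport="grpc"} 75
# TYPE otelcol_receiver_refused_spans counter
otelcol_receiver_refused_spans{receiver="otlp",transport="grpc"} 25
"""

if __name__ == "__main__":
    print(refused_ratio(parse_counters(SAMPLE)))
```

A ratio that stays close to zero is the expected case. A ratio that grows over time points at the receiver or at a later stage of the pipeline that is pushing back.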

The Operator creates a <cr_name>-collector-monitoring telemetry service that you can use to scrape the metrics endpoint.

Procedure

Enable the telemetry service by adding the following lines in the OpenTelemetryCollector custom resource (CR):

# ...
  config:
    service:
      telemetry:
        metrics:
          readers:
          - pull:
              exporter:
                prometheus:
                  host: 0.0.0.0
                  port: 8888
# ...

The port field sets the port at which the internal Collector metrics are exposed. The default is 8888.

Retrieve the metrics by running the following command, which forwards a local port to the metrics port of the Collector pod:
```
$ oc port-forward <collector_pod> 8888
```
In the OpenTelemetryCollector CR, set the enableMetrics field to true to scrape internal metrics:
```
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
spec:
# ...
  mode: deployment
  observability:
    metrics:
      enableMetrics: true
# ...
```
Depending on the deployment mode of the OpenTelemetry Collector, the internal metrics are scraped by using PodMonitors or ServiceMonitors.
Note
Alternatively, if you do not set the enableMetrics field to true, you can access the metrics endpoint at http://localhost:8888/metrics.
Optional: If the User Workload Monitoring feature is enabled in the web console, go to Observe → Dashboards in the web console, and then select the OpenTelemetry Collector dashboard from the drop-down list to view it. For more information about the User Workload Monitoring feature, see "Enabling monitoring for user-defined projects" in Monitoring.
Tip
You can filter the visualized data such as spans or metrics by the Collector instance, namespace, or OpenTelemetry components such as processors, receivers, or exporters.

11.4. Debug Exporter

You can configure the Debug Exporter to export the collected data to the standard output.

Procedure

Configure the OpenTelemetryCollector custom resource as follows:

  config:
    exporters:
      debug:
        verbosity: detailed
    service:
      pipelines:
        traces:
          exporters: [debug]
        metrics:
          exporters: [debug]
        logs:
          exporters: [debug]


Use the oc logs command or the web console to view the data that the Debug Exporter writes to the standard output of the Collector.

11.5. Disabling network policies

The Red Hat build of OpenTelemetry Operator creates network policies to control the traffic for the Operator and operands to improve security. By default, the network policies are enabled and configured to allow traffic to all the required components. No additional configuration is needed.

If you are experiencing traffic issues for the OpenTelemetry Collector or its Target Allocator component, the problem might be caused by the default network policy configuration. You can disable network policies for the OpenTelemetry Collector to troubleshoot the issue.

Prerequisites

You have access to the cluster as a cluster administrator with the cluster-admin role.

Procedure

Disable the network policy for the OpenTelemetry Collector by configuring the OpenTelemetryCollector custom resource (CR):

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: observability
spec:
  networkPolicy:
    enabled: false
  # ...

The networkPolicy.enabled field specifies whether network policies are enabled. Set it to true (default) or false. Setting it to false disables the creation of network policies.

11.6. Using the Network Observability Operator for troubleshooting

You can debug the traffic between your observability components by visualizing it with the Network Observability Operator.

Prerequisites

You have installed the Network Observability Operator as explained in "Installing the Network Observability Operator".

Procedure

In the OpenShift Container Platform web console, go to Observe → Network Traffic → Topology.
Select Namespace to filter the workloads by the namespace in which your OpenTelemetry Collector is deployed.
Use the network traffic visuals to troubleshoot possible issues. See "Observing the network traffic from the Topology view" for more details.

11.7. Troubleshooting the instrumentation

To troubleshoot the instrumentation, look for any of the following issues:

Issues with instrumentation injection into your workload
Issues with data generation by the instrumentation libraries

11.7.1. Troubleshooting instrumentation injection into your workload

To troubleshoot instrumentation injection, you can perform the following activities:

Checking if the Instrumentation object was created
Checking if the init-container started
Checking if the resources were deployed in the correct order
Searching for errors in the Operator logs
Double-checking the pod annotations

Procedure

Run the following command to verify that the Instrumentation object was successfully created:
```
$ oc get instrumentation -n <workload_project>
```
Replace <workload_project> with the namespace where the instrumentation was created.
Run the following command to verify that the opentelemetry-auto-instrumentation init-container successfully started, which is a prerequisite for instrumentation injection into workloads:
```
$ oc get events -n <workload_project>
```
Replace <workload_project> with the namespace where the instrumentation is injected for workloads.
Example output
```
... Created container opentelemetry-auto-instrumentation
... Started container opentelemetry-auto-instrumentation
```
Verify that the resources were deployed in the correct order for the auto-instrumentation to work correctly. The correct order is to deploy the Instrumentation custom resource (CR) before the application. For information about the Instrumentation CR, see the section "Configuring the instrumentation".
Note
When the pod starts, the Red Hat build of OpenTelemetry Operator checks the Instrumentation CR for annotations containing instructions for injecting auto-instrumentation. Generally, the Operator then adds an init-container to the application’s pod that injects the auto-instrumentation and environment variables into the application’s container. If the Instrumentation CR is not available to the Operator when the application is deployed, the Operator is unable to inject the auto-instrumentation.
Fixing the order of deployment requires the following steps:
1. Update the instrumentation settings.
2. Delete the instrumentation object.
3. Redeploy the application.

Run the following command to inspect the Operator logs for instrumentation errors:

$ oc logs -l app.kubernetes.io/name=opentelemetry-operator --container manager -n openshift-opentelemetry-operator --follow

Troubleshoot pod annotations for the instrumentations for a specific programming language. See the required annotation fields and values in "Configuring the instrumentation".
1. Verify that the application pods that you are instrumenting have the correct annotations and that the appropriate auto-instrumentation settings have been applied.
  Example
  instrumentation.opentelemetry.io/inject-python="true"
  
  Example command to get pod annotations for an instrumented Python application
  $ oc get pods -n <workload_project> -o jsonpath='{range .items[?(@.metadata.annotations["instrumentation.opentelemetry.io/inject-python"]=="true")]}{.metadata.name}{"\n"}{end}'
  
2. Verify that the annotation applied to the instrumentation object is correct for the programming language that you are instrumenting.
3. If there are multiple instrumentations in the same namespace, specify the name of the Instrumentation object in their annotations.
  Example
  instrumentation.opentelemetry.io/inject-nodejs: "<instrumentation_object>"
  
4. If the Instrumentation object is in a different namespace, specify the namespace in the annotation.
  Example
  instrumentation.opentelemetry.io/inject-nodejs: "<other_namespace>/<instrumentation_object>"
  
5. Verify that the OpenTelemetryCollector custom resource specifies the auto-instrumentation annotations under spec.template.metadata.annotations. If the auto-instrumentation annotations are in spec.metadata.annotations instead, move them into spec.template.metadata.annotations.
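
The annotation value forms described in the steps above can be summarized in code. The following is a minimal sketch of how such a value might be interpreted, not the Operator's actual implementation. The function name resolve_instrumentation is hypothetical, and the handling of "false" as "do not inject" is an assumption based on the upstream Operator's documented behavior.

```python
def resolve_instrumentation(value, pod_namespace):
    """Interpret an instrumentation.opentelemetry.io/inject-<language> value.

    Returns None when injection is disabled, otherwise a (namespace, name)
    tuple. The name is None when the value is "true", which means that the
    Instrumentation object is looked up in the namespace of the pod.
    """
    value = value.strip()
    if value.lower() == "false":  # assumption: "false" disables injection
        return None
    if value.lower() == "true":
        return (pod_namespace, None)
    if "/" in value:
        # "<other_namespace>/<instrumentation_object>"
        namespace, name = value.split("/", 1)
        return (namespace, name)
    # "<instrumentation_object>" in the same namespace as the pod
    return (pod_namespace, value)


if __name__ == "__main__":
    for example in ("true", "my-instrumentation", "other-ns/my-instrumentation"):
        print(example, "->", resolve_instrumentation(example, "workload-ns"))
```

Comparing the resolved namespace and name with the output of oc get instrumentation in that namespace is a quick way to spot a misspelled or misplaced reference.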

11.7.2. Troubleshooting telemetry data generation by the instrumentation libraries

You can troubleshoot telemetry data generation by the instrumentation libraries by checking the endpoint, looking for errors in your application logs, and verifying that the Collector is receiving the telemetry data.

Procedure

Verify that the instrumentation is transmitting data to the correct endpoint:
```
$ oc get instrumentation <instrumentation_name> -n <workload_project> -o jsonpath='{.spec.endpoint}'
```
The default endpoint http://localhost:4317 for the Instrumentation object is only applicable to a Collector instance that is deployed as a sidecar in your application pod. If you are using an incorrect endpoint, correct it by editing the Instrumentation object and redeploying your application.
Inspect your application logs for error messages that might indicate that the instrumentation is malfunctioning:
```
$ oc logs <application_pod> -n <workload_project>
```
If the application logs contain error messages that indicate that the instrumentation might be malfunctioning, install the OpenTelemetry SDK and libraries locally. Then run your application locally, outside of OpenShift Container Platform, and troubleshoot issues between the instrumentation libraries and your application.
Use the Debug Exporter to verify that the telemetry data is reaching the destination OpenTelemetry Collector instance. For more information, see "Debug Exporter".
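
The endpoint check in the first step of this procedure can be partly automated. The following is a minimal sketch that flags an endpoint that points at the loopback interface, which only works when the Collector runs as a sidecar in the application pod. The function name is_loopback_endpoint is hypothetical, and the sketch assumes that the endpoint value includes a scheme such as http://.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(endpoint):
    """Return True if the endpoint host is localhost or a loopback IP address."""
    host = urlparse(endpoint).hostname
    if host is None:
        return False  # no host found, for example when the scheme is missing
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name other than localhost


if __name__ == "__main__":
    # Paste the value returned by the oc get instrumentation command here.
    endpoint = "http://localhost:4317"
    if is_loopback_endpoint(endpoint):
        print("Loopback endpoint: valid only for a sidecar Collector.")
    else:
        print("Non-loopback endpoint: check that it matches your Collector service.")
```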
