Chapter 10. Troubleshooting
The OpenTelemetry Collector offers multiple ways to measure its health as well as investigate data ingestion issues.
10.1. Collecting diagnostic data from the command line
When submitting a support case, it is helpful to provide diagnostic information about your cluster to Red Hat Support. You can use the oc adm must-gather tool to gather diagnostic data for resources of various types, such as OpenTelemetryCollector, Instrumentation, and the created resources like Deployment, Pod, or ConfigMap. The oc adm must-gather tool creates a new pod that collects this data.
Procedure
- From the directory where you want to save the collected data, run the oc adm must-gather command to collect the data:

    $ oc adm must-gather --image=ghcr.io/open-telemetry/opentelemetry-operator/must-gather -- \
        /usr/bin/must-gather --operator-namespace <operator_namespace> # (1)

  (1) The default namespace where the Operator is installed is openshift-opentelemetry-operator.
Verification
- Verify that the new directory is created and contains the collected data.
10.2. Getting the OpenTelemetry Collector logs
You can get the logs for the OpenTelemetry Collector as follows.
Procedure
- Set the relevant log level in the OpenTelemetryCollector custom resource (CR):

    config: |
      service:
        telemetry:
          logs:
            level: debug # (1)

  (1) Collector's log level. Supported values include info, warn, error, and debug. Defaults to info.

- Use the oc logs command or the web console to retrieve the logs.
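With the log level set to debug, the output can become large. The following is a minimal sketch of narrowing the retrieved logs to warnings and errors. The function name filter_problems is hypothetical, and the sketch assumes that the Collector uses its console log encoding, in which fields are separated by tabs and the level is the second field. The deployment name <cr_name>-collector in the usage comment is also an assumption that applies to the deployment mode; adjust it to match your resources.

```shell
# Keep only warn and error lines from Collector logs read on stdin.
# Assumption: console log encoding, tab-separated, with the level in field 2.
filter_problems() {
  awk -F '\t' '$2 == "warn" || $2 == "error"'
}

# Hypothetical usage (replace the placeholders before running):
#   oc logs deployment/<cr_name>-collector -n <namespace> | filter_problems
```

If the filter returns nothing although you expect problems, check whether the Collector is configured with a different log encoding, in which case the field position assumed here does not apply.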
10.3. Exposing the metrics
The OpenTelemetry Collector exposes metrics about the volume of data that it has processed. The following metrics are for spans; similar metrics are exposed for the metrics and logs signals:
otelcol_receiver_accepted_spans
- The number of spans successfully pushed into the pipeline.
otelcol_receiver_refused_spans
- The number of spans that could not be pushed into the pipeline.
otelcol_exporter_sent_spans
- The number of spans successfully sent to the destination.
otelcol_exporter_enqueue_failed_spans
- The number of spans that failed to be added to the sending queue.
The Operator creates a <cr_name>-collector-monitoring telemetry service that you can use to scrape the metrics endpoint.
Procedure
- Enable the telemetry service by adding the following lines in the OpenTelemetryCollector custom resource (CR):

    # ...
      config: |
        service:
          telemetry:
            metrics:
              address: ":8888" # (1)
    # ...

  (1) The address at which the internal collector metrics are exposed. Defaults to :8888.
- Retrieve the metrics by running the following command, which forwards local port 8888 to the metrics port of the Collector pod:

    $ oc port-forward <collector_pod> 8888:8888
- In the OpenTelemetryCollector CR, set the enableMetrics field to true to scrape internal metrics:

    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    spec:
    # ...
      mode: deployment
      observability:
        metrics:
          enableMetrics: true
    # ...
  Depending on the deployment mode of the OpenTelemetry Collector, the internal metrics are scraped by using PodMonitors or ServiceMonitors.
  Note: Alternatively, if you do not set the enableMetrics field to true, you can access the metrics endpoint at http://localhost:8888/metrics.
- On the Observe page in the web console, enable User Workload Monitoring to visualize the scraped metrics.
  Note: Not all processors expose the required metrics.
- In the web console, go to Observe → Dashboards and select the OpenTelemetry Collector dashboard from the drop-down list to view it.
  Tip: You can filter the visualized data, such as spans or metrics, by the Collector instance, namespace, or OpenTelemetry components such as processors, receivers, or exporters.
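The span metrics listed in this section can also be checked from the command line. The following is a minimal sketch, not an official tool: the function name sum_metric is hypothetical, and the sketch assumes that the endpoint returns the Prometheus text format without timestamps, so that the sample value is the last field on each line. It adds up all samples of one metric across label sets, for example across receivers and transports.

```shell
# Sum every sample of one metric from Prometheus text-format input on stdin.
# Assumption: each sample line ends with its value (no trailing timestamp).
sum_metric() {
  awk -v name="$1" '
    /^#/ { next }                         # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)            # drop the label set, if present
      if (metric == name) total += $NF    # last field is the sample value
    }
    END { printf "%d\n", total }          # prints 0 when nothing matched
  '
}

# Hypothetical usage while the port-forward to the Collector pod is active:
#   curl -s http://localhost:8888/metrics | sum_metric otelcol_receiver_refused_spans
```

A nonzero total for otelcol_receiver_refused_spans or otelcol_exporter_enqueue_failed_spans suggests that data is being dropped before it reaches the destination. If the function prints 0 for a metric that you expect to be present, inspect the raw endpoint output, because the exact metric names can differ between Collector versions.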
10.4. Debug exporter
You can configure the debug exporter to export the collected data to the standard output.
Procedure
- Configure the OpenTelemetryCollector custom resource as follows:

    config: |
      exporters:
        debug:
          verbosity: detailed
      service:
        pipelines:
          traces:
            exporters: [debug]
          metrics:
            exporters: [debug]
          logs:
            exporters: [debug]

- Use the oc logs command or the web console to view the data that the debug exporter writes to the standard output.
10.5. Using the Network Observability Operator for troubleshooting
You can debug the traffic between your observability components by visualizing it with the Network Observability Operator.
Prerequisites
- You have installed the Network Observability Operator as explained in "Installing the Network Observability Operator".
Procedure
- In the OpenShift Container Platform web console, go to Observe → Network Traffic → Topology.
- Select Namespace to filter the workloads by the namespace in which your OpenTelemetry Collector is deployed.
- Use the network traffic visuals to troubleshoot possible issues. See "Observing the network traffic from the Topology view" for more details.