Chapter 4. Tracing
4.1. Tracing requests
Distributed tracing records the path of a request through the various services that make up an application. It ties together information about different units of work, which might run in different processes or on different hosts, so that you can understand the whole chain of events in a distributed transaction.
4.1.1. Distributed tracing overview
As a service owner, you can use distributed tracing to instrument your services to gather insights into your service architecture. You can use distributed tracing for monitoring, network profiling, and troubleshooting the interaction between components in modern, cloud-native, microservices-based applications.
With distributed tracing you can perform the following functions:
- Monitor distributed transactions
- Optimize performance and latency
- Perform root cause analysis
Red Hat OpenShift distributed tracing consists of two main components:
- Red Hat OpenShift distributed tracing platform - This component is based on the open source Jaeger project.
- Red Hat OpenShift distributed tracing data collection - This component is based on the open source OpenTelemetry project.
Both of these components are based on the vendor-neutral OpenTracing APIs and instrumentation.
4.1.2. Additional resources for OpenShift Container Platform
4.2. Using Red Hat OpenShift distributed tracing
You can use Red Hat OpenShift distributed tracing with OpenShift Serverless to monitor and troubleshoot serverless applications.
4.2.1. Using Red Hat OpenShift distributed tracing to enable distributed tracing
Red Hat OpenShift distributed tracing is made up of several components that work together to collect, store, and display tracing data.
Prerequisites
- You have access to an OpenShift Container Platform account with cluster administrator access.
- You have installed Red Hat OpenShift distributed tracing by following the OpenShift Container Platform "Installing distributed tracing" documentation.
- You have installed the OpenShift CLI (oc).
- You have created a project or have access to a project with the appropriate roles and permissions to create applications and other workloads in OpenShift Container Platform.
Procedure
Create an OpenTelemetryCollector custom resource (CR):
Example OpenTelemetryCollector CR
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: cluster-collector
  namespace: <namespace>
spec:
  mode: deployment
  config: |
    receivers:
      zipkin:
    processors:
    exporters:
      jaeger:
        endpoint: jaeger-all-in-one-inmemory-collector-headless.tracing-system.svc:14250
        tls:
          ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
      logging:
    service:
      pipelines:
        traces:
          receivers: [zipkin]
          processors: []
          exporters: [jaeger, logging]
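In the collector configuration above, the traces pipeline refers to components by name (zipkin, jaeger, logging), and each name must also be declared in the matching top-level section. The following Python sketch illustrates that relationship. It assumes the configuration has already been loaded into a dictionary, and the helper name undeclared_components is hypothetical, not part of any OpenTelemetry tooling.

```python
# Dictionary mirroring the example collector config (illustrative copy).
config = {
    "receivers": {"zipkin": None},
    "processors": None,  # declared but empty, as in the example CR
    "exporters": {
        "jaeger": {
            "endpoint": "jaeger-all-in-one-inmemory-collector-headless.tracing-system.svc:14250",
        },
        "logging": None,
    },
    "service": {
        "pipelines": {
            "traces": {
                "receivers": ["zipkin"],
                "processors": [],
                "exporters": ["jaeger", "logging"],
            }
        }
    },
}


def undeclared_components(cfg: dict) -> list:
    """Return (pipeline, section, name) for names a pipeline uses but the config never declares."""
    missing = []
    for pipeline_name, pipeline in cfg["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            declared = cfg.get(section) or {}
            for name in pipeline.get(section) or []:
                if name not in declared:
                    missing.append((pipeline_name, section, name))
    return missing


if __name__ == "__main__":
    print(undeclared_components(config))
```

An empty result means every component named in a pipeline is declared. A typo in the exporters list, for example, would show up as a tuple naming the pipeline, the section, and the unknown component.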
Verify that you have two pods running in the namespace where Red Hat OpenShift distributed tracing is installed:
$ oc get pods -n <namespace>
Example output
NAME                                          READY   STATUS    RESTARTS   AGE
cluster-collector-collector-85c766b5c-b5g99   1/1     Running   0          5m56s
jaeger-all-in-one-inmemory-ccbc9df4b-ndkl5    2/2     Running   0          15m
Verify that the following headless services have been created:
$ oc get svc -n <namespace> | grep headless
Example output
cluster-collector-collector-headless            ClusterIP   None   <none>   9411/TCP                                 7m28s
jaeger-all-in-one-inmemory-collector-headless   ClusterIP   None   <none>   9411/TCP,14250/TCP,14267/TCP,14268/TCP   16m
These services are used to configure Jaeger, Knative Serving, and Knative Eventing. The name of the Jaeger service may vary.
Install the OpenShift Serverless Operator by following the "Installing the OpenShift Serverless Operator" documentation.
Install Knative Serving by creating the following KnativeServing CR:
Example KnativeServing CR
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    tracing:
      backend: "zipkin"
      zipkin-endpoint: "http://cluster-collector-collector-headless.tracing-system.svc:9411/api/v2/spans"
      debug: "false"
      sample-rate: "0.1" # (1)
(1) The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
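To make the meaning of a sampling probability concrete, the following Python sketch simulates a per-request sampling decision. It is only an illustration of probabilistic sampling under the assumption that each request is decided independently; it is not Knative's actual sampler, and the names is_sampled and count_sampled are hypothetical.

```python
import random


def is_sampled(sample_rate: float, rng: random.Random) -> bool:
    """Return True with probability sample_rate (illustrative only)."""
    return rng.random() < sample_rate


def count_sampled(requests: int, sample_rate: float, seed: int = 0) -> int:
    """Count how many of `requests` simulated requests would be sampled."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    return sum(is_sampled(sample_rate, rng) for _ in range(requests))


if __name__ == "__main__":
    total = 10_000
    kept = count_sampled(total, 0.1)
    print(f"{kept} of {total} simulated requests were sampled")
```

With a rate of 0.1 the count lands near, but rarely exactly on, one tenth of the requests, which is why "1 in 10" should be read as an average rather than a guarantee.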
Install Knative Eventing by creating the following KnativeEventing CR:
Example KnativeEventing CR
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  config:
    tracing:
      backend: "zipkin"
      zipkin-endpoint: "http://cluster-collector-collector-headless.tracing-system.svc:9411/api/v2/spans"
      debug: "false"
      sample-rate: "0.1" # (1)
(1) The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
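Both CRs point zipkin-endpoint at a path ending in /api/v2/spans, which is where a Zipkin-compatible receiver accepts spans as a JSON array. As a rough illustration of what such a payload can look like, the following Python sketch builds one minimal span in the Zipkin v2 JSON shape. The service name, operation name, and duration are made-up values, the helper example_span is hypothetical, and this is not a reproduction of the spans that Knative itself emits.

```python
import json
import secrets
import time


def example_span(service_name: str, operation: str, duration_us: int) -> dict:
    """Build one illustrative span in the Zipkin v2 JSON shape."""
    return {
        "traceId": secrets.token_hex(16),      # 32 lower-hex characters
        "id": secrets.token_hex(8),            # 16 lower-hex characters
        "name": operation,
        "kind": "SERVER",
        "timestamp": time.time_ns() // 1000,   # start time, microseconds since epoch
        "duration": duration_us,               # microseconds
        "localEndpoint": {"serviceName": service_name},
    }


if __name__ == "__main__":
    # The endpoint expects a JSON array of spans, so wrap the span in a list.
    payload = json.dumps([example_span("helloworld-go", "get /", 1500)], indent=2)
    print(payload)
```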
Create a Knative service:
Example service
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/target: "1"
    spec:
      containers:
      - image: quay.io/openshift-knative/helloworld:v1.2
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "200m"
        env:
        - name: TARGET
          value: "Go Sample v1"
Make some requests to the service:
Example HTTPS request
$ curl https://helloworld-go.example.com
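Because the example CRs set sample-rate to "0.1", a single request will often produce no trace at all. The following Python sketch gives a rough estimate of how many requests to send before at least one trace is likely to appear. It assumes each request is sampled independently with probability equal to sample-rate; the function name requests_needed and the default confidence level are hypothetical choices made for illustration.

```python
import math


def requests_needed(sample_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one of n requests is sampled) >= confidence."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")
    if sample_rate == 1:
        return 1  # every request is sampled
    # P(no request sampled) = (1 - sample_rate) ** n must drop to 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - sample_rate))


if __name__ == "__main__":
    print(requests_needed(0.1))        # 95% confidence
    print(requests_needed(0.1, 0.5))   # even odds
```

Under these assumptions, roughly 29 requests give a 95% chance of seeing at least one trace at a rate of 0.1, so sending a few dozen requests is a reasonable starting point.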
Get the URL for the Jaeger web console:
Example command
$ oc get route jaeger-all-in-one-inmemory -o jsonpath='{.spec.host}' -n <namespace>
You can now examine traces by using the Jaeger console.
4.3. Using Jaeger distributed tracing
If you do not want to install all of the components of Red Hat OpenShift distributed tracing, you can still use distributed tracing on OpenShift Container Platform with OpenShift Serverless.
4.3.1. Configuring Jaeger to enable distributed tracing
To enable distributed tracing using Jaeger, you must install and configure Jaeger as a standalone integration.
Prerequisites
- You have cluster administrator permissions on OpenShift Container Platform, or you have cluster or dedicated administrator permissions on Red Hat OpenShift Service on AWS or OpenShift Dedicated.
- You have installed the OpenShift Serverless Operator, Knative Serving, and Knative Eventing.
- You have installed the Red Hat OpenShift distributed tracing platform Operator.
- You have installed the OpenShift CLI (oc).
- You have created a project or have access to a project with the appropriate roles and permissions to create applications and other workloads.
Procedure
Create and apply a Jaeger custom resource (CR) that contains the following:
Jaeger CR
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: default
Enable tracing for Knative Serving by editing the KnativeServing CR and adding a YAML configuration for tracing:
Tracing YAML example for Serving
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    tracing:
      sample-rate: "0.1" # (1)
      backend: zipkin # (2)
      zipkin-endpoint: "http://jaeger-collector.default.svc.cluster.local:9411/api/v2/spans" # (3)
      debug: "false" # (4)
(1) The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
(2) backend must be set to zipkin.
(3) The zipkin-endpoint must point to your jaeger-collector service endpoint. To get this endpoint, substitute the namespace where the Jaeger CR is applied.
(4) Debugging should be set to false. Enabling debug mode by setting debug: "true" allows all spans to be sent to the server, bypassing sampling.
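The zipkin-endpoint value depends on the namespace where the Jaeger CR is applied. The following Python sketch shows that substitution. It assumes the collector service is named after the Jaeger CR with a -collector suffix and listens for Zipkin traffic on port 9411, as in the example above; the helper name zipkin_endpoint is hypothetical.

```python
def zipkin_endpoint(jaeger_name: str, namespace: str, port: int = 9411) -> str:
    """Build the in-cluster Zipkin v2 spans URL for a Jaeger collector service."""
    # Assumed naming pattern: <jaeger-name>-collector.<namespace>.svc.cluster.local
    host = f"{jaeger_name}-collector.{namespace}.svc.cluster.local"
    return f"http://{host}:{port}/api/v2/spans"


if __name__ == "__main__":
    # Matches the example CR: a Jaeger named "jaeger" in the "default" namespace.
    print(zipkin_endpoint("jaeger", "default"))
```

If the Jaeger CR lives in another namespace, only the namespace argument changes. Check the actual service name with oc get svc before relying on the assumed pattern.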
Enable tracing for Knative Eventing by editing the KnativeEventing CR:
Tracing YAML example for Eventing
apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  config:
    tracing:
      sample-rate: "0.1" # (1)
      backend: zipkin # (2)
      zipkin-endpoint: "http://jaeger-collector.default.svc.cluster.local:9411/api/v2/spans" # (3)
      debug: "false" # (4)
(1) The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
(2) Set backend to zipkin.
(3) Point the zipkin-endpoint to your jaeger-collector service endpoint. To get this endpoint, substitute the namespace where the Jaeger CR is applied.
(4) Debugging should be set to false. Enabling debug mode by setting debug: "true" allows all spans to be sent to the server, bypassing sampling.
Verification
You can access the Jaeger web console to see tracing data by using the jaeger route.
Get the jaeger route's hostname by entering the following command:
$ oc get route jaeger -n default
Example output
NAME     HOST/PORT                         PATH   SERVICES       PORT    TERMINATION   WILDCARD
jaeger   jaeger-default.apps.example.com          jaeger-query   <all>   reencrypt     None
Open the endpoint address in your browser to view the console.