Chapter 5. Deploying the Jaeger streaming strategy from the web console
The streaming
deployment strategy is intended for production environments, where long term storage of trace data is important, as well as a more scalable and highly available architecture is required.
The streaming
strategy provides a streaming capability that sits between the collector and the storage (Elasticsearch). This reduces the pressure on the storage under high load situations, and enables other trace post-processing capabilities to tap into the real time span data directly from the streaming platform (Kafka).
The streaming strategy requires an additional Red Hat subscription for AMQ Streams. If you do not have an AMQ Streams subscription, contact your sales representative for more information.
Prerequisites
- The AMQ Streams Operator must be installed. If using version 1.4.0 or higher you can use self-provisioning. If otherwise, you need to create the Kafka instance.
- The Jaeger Operator must be installed.
- Review the instructions for how to customize the Jaeger installation.
-
An account with the
cluster-admin
role.
Procedure
-
Log in to the OpenShift Container Platform web console as a user with the
cluster-admin
role. Create a new project, for example
jaeger-system
.-
Navigate to Home
Projects. - Click Create Project.
-
Enter
jaeger-system
in the Name field. - Click Create.
-
Navigate to Home
-
Navigate to Operators
Installed Operators. -
If necessary, select
jaeger-system
from the Project menu. You may have to wait a few moments for the Operators to be copied to the new project. - Click the Jaeger Operator. On the Overview tab, under Provided APIs, the Operator provides a single link.
- Under Jaeger click Create Instance.
-
On the Create Jaeger page, replace the default
all-in-one
yaml text with your streaming YAML configuration, for example:
Example jaeger-streaming.yaml file
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: jaeger-streaming spec: strategy: streaming collector: options: kafka: producer: topic: jaeger-spans #Note: If brokers are not defined,AMQStreams 1.4.0+ will self-provision Kafka. brokers: my-cluster-kafka-brokers.kafka:9092 storage: type: elasticsearch ingester: options: kafka: consumer: topic: jaeger-spans brokers: my-cluster-kafka-brokers.kafka:9092
- Click Create to create the Jaeger instance.
-
On the Jaegers page, click the name of the Jaeger instance, for example,
jaeger-streaming
. - On the Jaeger Details page, click the Resources tab. Wait until all the pods have a status of "Running" before continuing.
5.1. Deploying Jaeger streaming from the CLI
Follow this procedure to create an instance of Jaeger from the command line.
Prerequisites
- An installed, verified OpenShift Jaeger Operator.
-
Access to the OpenShift CLI (
oc
). -
An account with the
cluster-admin
role.
Procedure
Log in to the OpenShift Container Platform CLI as a user with the
cluster-admin
role.$ oc login https://{HOSTNAME}:8443
Create a new project named
jaeger-system
.$ oc new-project jaeger-system
-
Create a custom resource file named
jaeger-streaming.yaml
that contains the text of the example file in the previous procedure. Run the following command to deploy Jaeger:
$ oc create -n jaeger-system -f jaeger-streaming.yaml
Run the following command to watch the progress of the Pods during the installation process:
$ oc get pods -n jaeger-system -w
Once the installation process has completed, you should see output similar to the following:
NAME READY STATUS RESTARTS AGE elasticsearch-cdm-jaegersystemjaegerstreaming-1-697b66d6fcztcnn 2/2 Running 0 5m40s elasticsearch-cdm-jaegersystemjaegerstreaming-2-5f4b95c78b9gckz 2/2 Running 0 5m37s elasticsearch-cdm-jaegersystemjaegerstreaming-3-7b6d964576nnz97 2/2 Running 0 5m5s jaeger-streaming-collector-6f6db7f99f-rtcfm 1/1 Running 0 80s jaeger-streaming-entity-operator-6b6d67cc99-4lm9q 3/3 Running 2 2m18s jaeger-streaming-ingester-7d479847f8-5h8kc 1/1 Running 0 80s jaeger-streaming-kafka-0 2/2 Running 0 3m1s jaeger-streaming-query-65bf5bb854-ncnc7 3/3 Running 0 80s jaeger-streaming-zookeeper-0 2/2 Running 0 3m39s
5.2. Customizing Jaeger deployment
5.2.1. Jaeger default configuration options
The Jaeger custom resource (CR) defines the architecture and settings to be used when creating the Jaeger resources. You can modify these parameters to customize your Jaeger implementation to your business needs.
Jaeger generic YAML example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: name spec: strategy: <deployment_strategy> allInOne: options: {} resources: {} agent: options: {} resources: {} collector: options: {} resources: {} sampling: options: {} storage: type: options: {} query: options: {} resources: {} ingester: options: {} resources: {} options: {}
Parameter | Description | Values | Default value |
---|---|---|---|
| Version of the Application Program Interface to use when creating the object. |
|
|
| Defines the kind of Kubernetes object to create. |
| |
|
Data that helps uniquely identify the object, including a |
OpenShift automatically generates the | |
| Name for the object. | The name of your Jaeger instance. |
|
| Specification for the object to be created. | Contains all of the configuration parameters for your Jaeger instance. When a common definition (for all Jaeger components) is required, it is defined under the spec node. When the definition relates to an individual component, it is placed under the spec/<component> node. | N/A |
| Jaeger deployment strategy |
|
|
| Because the allInOne image deploys the agent, collector, query, ingester, Jaeger UI in a single pod, configuration for this deployment should nest component configuration under the allInOne parameter. | ||
| Configuration options that define the Jaeger agent. | ||
| Configuration options that define the Jaeger Collector. | ||
| Configuration options that define the sampling strategies for tracing. | ||
|
Configuration options that define the storage. All storage related options should be placed under | ||
| Configuration options that define the Query service. | ||
| Configuration options that define the Ingester service. |
The following example YAML is the minimum required to create a Jaeger instance using the default settings.
Example minimum required jaeger-all-in-one.yaml
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: jaeger-all-in-one-inmemory
5.2.2. Jaeger Collector configuration options
The Jaeger Collector is the component responsible for receiving the spans that were captured by the tracer and writing them to a persistent storage (Elasticsearch) when using the production
strategy, or to AMQ Streams when using the streaming
strategy.
The collectors are stateless and thus many instances of Jaeger Collector can be run in parallel. Collectors require almost no configuration, except for the location of the Elasticsearch cluster.
Parameter | Description | Values |
---|---|---|
spec: collector: options: {} | Configuration options that define the Jaeger Collector. | |
autoscale: |
This parameter controls whether to enable/disable autoscaling for the Collector. Set to |
|
kafka: producer: topic: jaeger-spans |
The | Label for the producer |
kafka: producer: brokers: my-cluster-kafka-brokers.kafka:9092 | Identifies the Kafka configuration used by the Collector to produce the messages. If brokers are not specified, and you have AMQ Streams 1.4.0+ installed, Jaeger will self-provision Kafka. | |
log-level: | Logging level for the collector. |
|
maxReplicas: | Specifies the maximum number of replicas to create when autoscaling the Collector. |
Integer, for example, |
num-workers: | The number of workers pulling from the queue. |
Integer, for example, |
queue-size: | The size of the Collector queue. |
Integer, for example, |
replicas: | Specifies the number of Collector replicas to create. |
Integer, for example, |
5.2.2.1. Configuring the Collector for autoscaling
You can configure the Collector to autoscale; the Collector will scale up or down based on the CPU and/or memory consumption. Configuring the Collector to autoscale can help you ensure your Jaeger environment scales up during times of increased load, and scales down when less resources are needed, saving on costs. You configure autoscaling by setting the autoscale
parameter to true
and specifying a value for .spec.collector.maxReplicas
along with a reasonable value for the resources that you expect the Collector’s pod to consume. If you do not set a value for .spec.collector.maxReplicas
the Operator will set it to 100
.
By default, when there is no value provided for .spec.collector.replicas
, the Jaeger Operator creates a Horizontal Pod Autoscaler (HPA) configuration for the Collector. For more information about HPA, refer to the Kubernetes documentation.
The following is an example autoscaling configuration, setting the Collector’s limits as well as the maximum number of replicas:
Collector autoscaling example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simple-prod spec: strategy: production collector: maxReplicas: 5 resources: limits: cpu: 100m memory: 128Mi
5.2.3. Jaeger sampling configuration options
The Operator can be used to define sampling strategies that will be supplied to tracers that have been configured to use a remote sampler.
While all traces are generated, only a few are sampled. Sampling a trace marks the trace for further processing and storage.
This is not relevant if a trace was started by the Istio proxy as the sampling decision is made there. The Jaeger sampling decision is only relevant when the trace is started by an application using the Jaeger tracer.
When a service receives a request that contains no trace context, the Jaeger tracer will start a new trace, assign it a random trace ID, and make a sampling decision based on the currently installed sampling strategy. The sampling decision is propagated to all subsequent requests in the trace, so that other services are not making the sampling decision again.
Jaeger libraries support the following samplers:
- Constant - The sampler always makes the same decision for all traces. It either samples all traces (sampling.param=1) or none of them (sampling.param=0).
-
Probabilistic - The sampler makes a random sampling decision with the probability of sampling equal to the value of the
sampling.param
property. For example, with sampling.param=0.1 approximately 1 in 10 traces will be sampled. - Rate Limiting - The sampler uses a leaky bucket rate limiter to ensure that traces are sampled with a certain constant rate. For example, when sampling.param=2.0 it will sample requests with the rate of 2 traces per second.
- Remote - The sampler consults the Jaeger agent for the appropriate sampling strategy to use in the current service. This allows controlling the sampling strategies in the services from a central configuration in the Jaeger backend.
Parameter | Description | Values | Default value |
---|---|---|---|
spec: sampling: options: {} | Configuration options that define the sampling strategies for tracing. | ||
sampling: type: | Sampling strategy to use. (See descriptions above.) |
Valid values are |
|
sampling: options: type: param: | Parameters for the selected sampling strategy. (See examples above.) | Decimal and integer values (0, .1, 1, 10) | N/A |
This example defines a default sampling strategy that is probabilistic, with a 50% chance of the trace instances being sampled.
Probabilistic sampling example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: with-sampling spec: strategy: allInOne sampling: options: default_strategy: type: probabilistic param: 50
5.2.4. Jaeger storage configuration options
You configure storage for the Collector, Ingester, and Query services under spec:storage
. Multiple instances of each of these components can be provisioned as required for performance and resilience purposes.
Restrictions
- There can be only one Jaeger with self-provisioned Elasticsearch instance per namespace.
- There can be only one Elasticsearch per namespace.
- You cannot share or reuse a OpenShift Jaeger logging Elasticsearch instance with Jaeger. The Elasticsearch cluster is meant to be dedicated for a single Jaeger instance.
If you already have installed Elasticsearch as part of OpenShift logging, the Jaeger Operator can use the installed Elasticsearch Operator to provision storage.
Parameter | Description | Values | Default value |
---|---|---|---|
spec: storage: options: {} | Configuration options that define the storage. | ||
storage: type: | Type of storage to use for the deployment. |
|
|
Parameter | Description | Values | Default value |
---|---|---|---|
General Elasticsearch configuration settings | |||
elasticsearch: server-urls: | URL of the Elasticsearch instance. |
The fully-qualified domain name of the Elasticsearch server. If you have specified | |
es: max-num-spans: | The maximum number of spans to fetch at a time, per query, in Elasticsearch. | 10000 | |
es: max-span-age: | The maximum lookback for spans in Elasticsearch. | 72h0m0s | |
elasticsearch: secretname: |
Name of the secret, for example | N/A | |
es: sniffer: | The sniffer configuration for Elasticsearch. The client uses the sniffing process to find all nodes automatically. Disabled by default. |
|
|
es: timeout: | Timeout used for queries. When set to zero there is no timeout. | 0s | |
es: username: |
The username required by Elasticsearch. The basic authentication also loads CA if it is specified. See also | ||
es: password: |
The password required by Elasticsearch. See also, | ||
es: version: | The major Elasticsearch version. If not specified, the value will be auto-detected from Elasticsearch. | 0 | |
Elasticsearch resource configuration settings | |||
elasticsearch: nodeCount: | Number of Elasticsearch nodes. For high availability use at least 3 nodes. Do not use 2 nodes as “split brain” problem can happen. | Integer value. For example, Proof of concept = 1, Minimum deployment =3 | 1 |
elasticsearch: resources: requests: cpu: | Number of central processing units for requests, based on your environment’s configuration. | Specified in cores or millicores (for example, 200m, 0.5, 1). For example, Proof of concept = 500m, Minimum deployment =1 | 1Gi |
elasticsearch: resources: requests: memory: | Available memory for requests, based on your environment’s configuration. | Specified in bytes (for example, 200Ki, 50Mi, 5Gi). For example, Proof of concept = 1Gi, Minimum deployment = 16Gi* | 500m |
elasticsearch: resources: limits: cpu: | Limit on number of central processing units, based on your environment’s configuration. | Specified in cores or millicores (for example, 200m, 0.5, 1). For example, Proof of concept = 500m, Minimum deployment =1 | |
elasticsearch: resources: limits: memory: | Available memory limit based on your environment’s configuration. | Specified in bytes (for example, 200Ki, 50Mi, 5Gi). For example, Proof of concept = 1Gi, Minimum deployment = 16Gi* | |
*Each Elasticsearch node can operate with a lower memory setting though this is NOT recommended for production deployments. For production use, you should have no less than 16Gi allocated to each Pod by default, but preferably allocate as much as you can, up to 64Gi per Pod. | |||
Elasticsearch data replication options | |||
elasticsearch: redundancyPolicy: | Data replication policy defines how Elasticsearch shards are replicated across data nodes in the cluster. If not specified,the Jaeger Operator automatically determines the most appropriate replication based on number of nodes. |
| |
es: num-replicas: | The number of replicas per index in Elasticsearch. | 1 | |
es: num-shards: | The number of shards per index in Elasticsearch. | 5 | |
Elasticsearch index and index cleaner configuration options | |||
es: create-index-templates: |
Automatically create index templates at application startup when set to |
|
|
es: index-prefix: | Optional prefix for Jaeger indices. For example, setting this to "production" creates indices named "production-jaeger-*". | ||
esIndexCleaner: enabled: | When using Elasticsearch storage, by default a job is created to clean old traces from the index. This parameter enables or disables the index cleaner job. |
|
|
esIndexCleaner: numberOfDays: | Number of days to wait before deleting an index. | Integer value |
|
esIndexCleaner: schedule: | Defines the schedule for how often to clean the Elasticsearch index. | Cron expression | "55 23 * * *" |
esRollover: schedule: | Defines the schedule for how often to rollover to a new Elasticsearch index. | Cron expression | '*/30 * * * *' |
Configuration settings for Elasticsearch bulk processor | |||
es: bulk: actions: | The number of requests that can be added to the queue before the bulk processor decides to commit updates to disk. | 1000 | |
es: bulk: flush-interval: |
A | 200ms | |
es: bulk: size: | The number of bytes that the bulk requests can take up before the bulk processor decides to commit updates to disk. | 5000000 | |
es: bulk: workers: | The number of workers that are able to receive and commit bulk requests to Elasticsearch. | 1 | |
Elasticsearch TLS configuration settings | |||
es: tls: ca: | Path to a TLS Certification Authority (CA) file used to verify the remote server(s). | Will use the system truststore by default. | |
es: tls: cert: | Path to a TLS Certificate file, used to identify this process to the remote server(s). | ||
es: tls: enabled: | Enable transport layer security (TLS) when talking to the remote server(s). Disabled by default. |
|
|
es: tls: key: | Path to a TLS Private Key file, used to identify this process to the remote server(s). | ||
es: tls: server-name: | Override the expected TLS server name in the certificate of the remote server(s). | ||
es: token-file: | Path to a file containing the bearer token. This flag also loads the Certification Authority (CA) file if it is specified. | ||
Elasticsearch archive configuration settings | |||
es-archive: bulk: actions: | The number of requests that can be added to the queue before the bulk processor decides to commit updates to disk. | 0 | |
es-archive: bulk: flush-interval: |
A | 0s | |
es-archive: bulk: size: | The number of bytes that the bulk requests can take up before the bulk processor decides to commit updates to disk. | 0 | |
es-archive: bulk: workers: | The number of workers that are able to receive and commit bulk requests to Elasticsearch. | 0 | |
es-archive: create-index-templates: |
Automatically create index templates at application startup when set to |
|
|
es-archive: enabled: | Enable extra storage. |
|
|
es-archive: index-prefix: | Optional prefix for Jaeger indices. For example, setting this to "production" creates indices named "production-jaeger-*". | ||
es-archive: max-num-spans: | The maximum number of spans to fetch at a time, per query, in Elasticsearch. | 0 | |
es-archive: max-span-age: | The maximum lookback for spans in Elasticsearch. | 0s | |
es-archive: num-replicas: | The number of replicas per index in Elasticsearch. | 0 | |
es-archive: num-shards: | The number of shards per index in Elasticsearch. | 0 | |
es-archive: password: |
The password required by Elasticsearch. See also, | ||
es-archive: server-urls: |
The comma-separated list of Elasticsearch servers. Must be specified as fully qualified URLs, for example, | ||
es-archive: sniffer: | The sniffer configuration for Elasticsearch. The client uses the sniffing process to find all nodes automatically. Disabled by default. |
|
|
es-archive: timeout: | Timeout used for queries. When set to zero there is no timeout. | 0s | |
es-archive: tls: ca: | Path to a TLS Certification Authority (CA) file used to verify the remote server(s). | Will use the system truststore by default. | |
es-archive: tls: cert: | Path to a TLS Certificate file, used to identify this process to the remote server(s). | ||
es-archive: tls: enabled: | Enable transport layer security (TLS) when talking to the remote server(s). Disabled by default. |
|
|
es-archive: tls: key: | Path to a TLS Private Key file, used to identify this process to the remote server(s). | ||
es-archive: tls: server-name: | Override the expected TLS server name in the certificate of the remote server(s). | ||
es-archive: token-file: | Path to a file containing the bearer token. This flag also loads the Certification Authority (CA) file if it is specified. | ||
es-archive: username: |
The username required by Elasticsearch. The basic authentication also loads CA if it is specified. See also | ||
es-archive: version: | The major Elasticsearch version. If not specified, the value will be auto-detected from Elasticsearch. | 0 |
Production storage example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simple-prod spec: strategy: production storage: type: elasticsearch elasticsearch: nodeCount: 3 resources: requests: cpu: 1 memory: 16Gi limits: memory: 16Gi
Storage example with volume mounts
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simple-prod spec: strategy: production storage: type: elasticsearch options: es: server-urls: https://quickstart-es-http.default.svc:9200 index-prefix: my-prefix tls: ca: /es/certificates/ca.crt secretName: jaeger-secret volumeMounts: - name: certificates mountPath: /es/certificates/ readOnly: true volumes: - name: certificates secret: secretName: quickstart-es-http-certs-public
Storage example with persistent storage:
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: simple-prod
spec:
strategy: production
storage:
type: elasticsearch
elasticsearch:
nodeCount: 1
storage: 1
storageClassName: gp2
size: 5Gi
resources:
requests:
cpu: 200m
memory: 4Gi
limits:
memory: 4Gi
redundancyPolicy: ZeroRedundancy
- 1
- Persistent storage configuration. In this case AWS
gp2
with5Gi
size. When no value is specified, Jaeger usesemptyDir
. The Elasticsearch Operator provisionsPersistentVolumeClaim
andPersistentVolume
which are not removed with Jaeger instance. You can mount the same volumes if you create a Jaeger instance with the same name and namespace.
5.2.5. Jaeger Query configuration options
Query is a service that retrieves traces from storage and hosts the user interface to display them.
Parameter | Description | Values | Default value |
---|---|---|---|
spec: query: options: {} resources: {} | Configuration options that define the Query service. | ||
query: additional-headers: | Additional HTTP response headers. Can be specified multiple times. | Format: "Key: Value" | |
query: base-path: |
The base path for all jaeger-query HTTP routes can be set to a non-root value, for example, | /{path} | |
query: port: | The port for the query service. | 16686 | |
options: log-level: | Logging level for Query. |
Possible values: |
Sample Query configuration
apiVersion: jaegertracing.io/v1 kind: "Jaeger" metadata: name: "my-jaeger" spec: strategy: allInOne allInOne: options: log-level: debug query: base-path: /jaeger
5.2.6. Jaeger Ingester configuration options
Ingester is a service that reads from a Kafka topic and writes to another storage backend (Elasticsearch). If you are using the allInOne
or production
deployment strategies, you do not need to configure the Ingester service.
Parameter | Description | Values |
---|---|---|
spec: strategy: streaming ingester: options: {} | Configuration options that define the Ingester service. | |
autoscale: |
This parameter controls whether to enable/disable autoscaling for the Ingester. Autoscaling is enabled by default. Set to |
|
kafka: consumer: topic: |
The |
Label for the consumer. For example, |
kafka: consumer: brokers: | Identifies the Kafka configuration used by the Ingester to consume the messages. |
Label for the broker, for example, |
ingester: deadlockInterval: | Specifies the interval (in seconds or minutes) that the Ingester should wait for a message before terminating. The deadlock interval is disabled by default (set to 0), to avoid the Ingester being terminated when no messages arrive while the system is being initialized. |
Minutes and seconds, for example, |
log-level: | Logging level for the Ingester. |
Possible values: |
maxReplicas: | Specifies the maximum number of replicas to create when autoscaling the Ingester. |
Integer, for example, |
Streaming Collector and Ingester example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simple-streaming spec: strategy: streaming collector: options: kafka: producer: topic: jaeger-spans brokers: my-cluster-kafka-brokers.kafka:9092 ingester: options: kafka: consumer: topic: jaeger-spans brokers: my-cluster-kafka-brokers.kafka:9092 ingester: deadlockInterval: 5 storage: type: elasticsearch options: es: server-urls: http://elasticsearch:9200
5.2.6.1. Configuring Ingester for autoscaling
You can configure the Ingester to autoscale; the Ingester will scale up or down based on the CPU and/or memory consumption. Configuring the Ingester to autoscale can help you ensure your Jaeger environment scales up during times of increased load, and scales down when less resources are needed, saving on costs. You configure autoscaling by setting the autoscale
parameter to true
and specifying a value for .spec.ingester.maxReplicas
along with a reasonable value for the resources that you expect the Ingester’s pod to consume. If you do not set a value for .spec.ingester.maxReplicas
the Operator will set it to 100
.
By default, when there is no value provided for .spec.ingester.replicas
, the Jaeger Operator creates a Horizontal Pod Autoscaler (HPA) configuration for the Ingester. For more information about HPA, refer to the Kubernetes documentation.
The following is an example autoscaling configuration, setting the Ingester’s limits as well as the maximum number of replicas:
Ingester autoscaling example
apiVersion: jaegertracing.io/v1 kind: Jaeger metadata: name: simple-streaming spec: strategy: streaming ingester: maxReplicas: 8 resources: limits: cpu: 100m memory: 128Mi
5.3. Injecting sidecars
OpenShift Jaeger relies on a proxy sidecar within the application’s pod to provide the agent. The Jaeger Operator can inject Jaeger Agent sidecars into Deployment workloads. You can enable automatic sidecar injection or manage it manually.