Observability
Abstract
Observability features including administrator and developer metrics, cluster logging, and tracing
Chapter 1. Administrator metrics
1.1. Serverless administrator metrics
Metrics enable cluster administrators to monitor how OpenShift Serverless cluster components and workloads are performing.
You can view different metrics for OpenShift Serverless by navigating to Dashboards in the web console Administrator perspective.
1.1.1. Prerequisites
- See the OpenShift Container Platform documentation on Managing metrics for information about enabling metrics for your cluster.
- You have access to an account with cluster administrator access (or dedicated administrator access for OpenShift Dedicated or Red Hat OpenShift Service on AWS).
- You have access to the Administrator perspective in the web console.
If Service Mesh is enabled with mTLS, metrics for Knative Serving are disabled by default because Service Mesh prevents Prometheus from scraping metrics.
For information about resolving this issue, see Enabling Knative Serving metrics when using Service Mesh with mTLS.
Scraping the metrics does not affect autoscaling of a Knative service, because scraping requests do not go through the activator. Consequently, no scraping takes place if no pods are running.
1.2. Serverless controller metrics
The following metrics are emitted by any component that implements controller logic. These metrics provide details about reconciliation operations and about the work queue to which reconciliation requests are added.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The depth of the work queue. | Gauge | | Integer (no units) |
| | The number of reconcile operations. | Counter | | Integer (no units) |
| | The latency of reconcile operations. | Histogram | | Milliseconds |
| | The total number of add actions handled by the work queue. | Counter | | Integer (no units) |
| | The length of time an item stays in the work queue before being requested. | Histogram | | Seconds |
| | The total number of retries that have been handled by the work queue. | Counter | | Integer (no units) |
| | The length of time it takes to process an item from the work queue. | Histogram | | Seconds |
| | The length of time that outstanding work queue items have been in progress. | Histogram | | Seconds |
| | The length of time that the longest outstanding work queue item has been in progress. | Histogram | | Seconds |
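For example, a steadily growing work queue depth or a rising reconcile rate can indicate a controller that is falling behind. The metric and tag names in the following sketch are illustrative only; check the controller's metrics endpoint for the exact series names exposed in your cluster:
# illustrative PromQL; substitute the metric and tag names your cluster exposes
sum by (reconciler) (rate(reconcile_count[5m]))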
1.3. Webhook metrics
Webhook metrics report useful information about operations. For example, if a large number of operations fail, this might indicate an issue with a user-created resource.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The number of requests that are routed to the webhook. | Counter | | Integer (no units) |
| | The response time for a webhook request. | Histogram | | Milliseconds |
1.4. Knative Eventing metrics
Cluster administrators can view the following metrics for Knative Eventing components.
By aggregating the metrics by HTTP response code, events can be separated into two categories: successful events (2xx) and failed events (5xx).
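For example, assuming the broker ingress event counter is exposed to Prometheus as event_count (verify the exact metric name in your cluster), a PromQL query similar to the following separates event rates by HTTP code class:
# hypothetical metric name; groups delivered (2xx) and failed (5xx) events
sum by (response_code_class) (rate(event_count[5m]))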
1.4.1. Broker ingress metrics
You can use the following metrics to debug the broker ingress, see how it is performing, and see which events are being dispatched by the ingress component.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events received by a broker. | Counter | | Integer (no units) |
| | The time taken to dispatch an event to a channel. | Histogram | | Milliseconds |
1.4.2. Broker filter metrics
You can use the following metrics to debug broker filters, see how they are performing, and see which events are being dispatched by the filters. You can also measure the latency of the filtering action on an event.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events received by a broker. | Counter | | Integer (no units) |
| | The time taken to dispatch an event to a channel. | Histogram | | Milliseconds |
| | The time it takes to process an event before it is dispatched to a trigger subscriber. | Histogram | | Milliseconds |
1.4.3. InMemoryChannel dispatcher metrics
You can use the following metrics to debug InMemoryChannel channels, see how they are performing, and see which events are being dispatched by the channels.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events dispatched by InMemoryChannel channels. | Counter | | Integer (no units) |
| | The time taken to dispatch an event from an InMemoryChannel channel. | Histogram | | Milliseconds |
1.4.4. Event source metrics
You can use the following metrics to verify that events have been delivered from the event source to the connected event sink.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events sent by the event source. | Counter | | Integer (no units) |
| | Number of retried events sent by the event source after initially failing to be delivered. | Counter | | Integer (no units) |
1.4.5. Knative Kafka broker metrics
You can use the following metrics to debug and visualize the performance of the Kafka broker.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events received by a broker | Counter | | Dimensionless |
| | The time spent dispatching an event to a Kafka cluster | Histogram | | Milliseconds |
| | Number of expected replicas for a given Kafka consumer group resource. Note: in this context, resources refer to user-facing entities such as Kafka source, trigger, and subscription; avoid using internal or generated names when referring to these resources. | Gauge | | Dimensionless |
| | Number of ready replicas for a given Kafka consumer group resource. Note: in this context, resources refer to user-facing entities such as Kafka source, trigger, and subscription; avoid using internal or generated names when referring to these resources. | Gauge | | Dimensionless |
1.4.6. Knative Kafka trigger metrics
You can use the following metrics to debug and visualize the performance of Kafka triggers.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events dispatched by a trigger to a subscriber | Counter | | Dimensionless |
| | The time spent dispatching an event to a subscriber | Histogram | | Milliseconds |
| | The time spent processing and filtering an event | Histogram | | Milliseconds |
1.4.7. Knative Kafka channel metrics
You can use the following metrics to debug and visualize the performance of Kafka channels.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events received by a Kafka channel | Counter | | Dimensionless |
| | The time spent dispatching an event to a Kafka cluster | Histogram | | Milliseconds |
1.4.8. Knative Kafka subscription metrics
You can use the following metrics to debug and visualize the performance of subscriptions associated with the Kafka channel.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events dispatched by a subscription to a subscriber | Counter | | Dimensionless |
| | The time spent dispatching an event to a subscriber | Histogram | | Milliseconds |
| | The time spent processing an event | Histogram | | Dimensionless |
1.4.9. Knative Kafka source metrics
You can use the following metrics to debug and visualize the performance of Kafka sources.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events dispatched by a Kafka source | Counter | | Dimensionless |
| | The time spent dispatching an event to a sink | Histogram | | Milliseconds |
| | The time spent processing an event | Histogram | | Milliseconds |
| | Number of expected replicas for a given Kafka consumer group resource. Note: in this context, resources refer to user-facing entities such as Kafka source, trigger, and subscription; avoid using internal or generated names when referring to these resources. | Gauge | | Dimensionless |
| | Number of ready replicas for a given Kafka consumer group resource. Note: in this context, resources refer to user-facing entities such as Kafka source, trigger, and subscription; avoid using internal or generated names when referring to these resources. | Gauge | | Dimensionless |
1.4.10. Knative Kafka sink metrics
You can use the following metrics to debug and visualize the performance of Kafka sinks.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | Number of events received by a broker | Counter | | Dimensionless |
| | The time spent dispatching an event to a Kafka cluster | Histogram | | Milliseconds |
1.5. Knative Serving metrics
Cluster administrators can view the following metrics for Knative Serving components.
1.5.1. Activator metrics
You can use the following metrics to understand how applications respond when traffic passes through the activator.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The number of concurrent requests that are routed to the activator, or average concurrency over a reporting period. | Gauge | | Integer (no units) |
| | The number of requests that are routed to the activator. These are requests that have been fulfilled from the activator handler. | Counter | | Integer (no units) |
| | The response time in milliseconds for a fulfilled, routed request. | Histogram | | Milliseconds |
1.5.2. Autoscaler metrics
The autoscaler component exposes a number of metrics related to autoscaler behavior for each revision. For example, at any given time, you can monitor the targeted number of pods the autoscaler tries to allocate for a service, the average number of requests per second during the stable window, or whether the autoscaler is in panic mode if you are using the Knative pod autoscaler (KPA).
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The number of pods the autoscaler tries to allocate for a service. | Gauge | | Integer (no units) |
| | The excess burst capacity served over the stable window. | Gauge | | Integer (no units) |
| | The average number of requests for each observed pod over the stable window. | Gauge | | Integer (no units) |
| | The average number of requests for each observed pod over the panic window. | Gauge | | Integer (no units) |
| | The number of concurrent requests that the autoscaler tries to send to each pod. | Gauge | | Integer (no units) |
| | The average number of requests-per-second for each observed pod over the stable window. | Gauge | | Integer (no units) |
| | The average number of requests-per-second for each observed pod over the panic window. | Gauge | | Integer (no units) |
| | The number of requests-per-second that the autoscaler targets for each pod. | Gauge | | Integer (no units) |
| | This value is 1 if the autoscaler is in panic mode, or 0 if it is not. | Gauge | | Integer (no units) |
| | The number of pods that the autoscaler has requested from the Kubernetes cluster. | Gauge | | Integer (no units) |
| | The number of pods that are allocated and currently have a ready state. | Gauge | | Integer (no units) |
| | The number of pods that have a not ready state. | Gauge | | Integer (no units) |
| | The number of pods that are currently pending. | Gauge | | Integer (no units) |
| | The number of pods that are currently terminating. | Gauge | | Integer (no units) |
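For example, the following PromQL sketch compares the number of pods the autoscaler wants with the number of pods that are ready for one configuration. The metric names here are assumptions shown for illustration; the namespace_name and configuration_name tags follow the tag names listed in the developer metrics chapter:
# illustrative metric names; compare desired versus ready pods for one configuration
autoscaler_desired_pods{namespace_name="default", configuration_name="helloworld-go"}
autoscaler_actual_pods{namespace_name="default", configuration_name="helloworld-go"}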
1.5.3. Go runtime metrics
Each Knative Serving control plane process emits a number of Go runtime memory statistics (MemStats).
The name tag for each metric is empty.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The number of bytes of allocated heap objects. This metric is the same as heap_alloc. | Gauge | | Integer (no units) |
| | The cumulative bytes allocated for heap objects. | Gauge | | Integer (no units) |
| | The total bytes of memory obtained from the operating system. | Gauge | | Integer (no units) |
| | The number of pointer lookups performed by the runtime. | Gauge | | Integer (no units) |
| | The cumulative count of heap objects allocated. | Gauge | | Integer (no units) |
| | The cumulative count of heap objects that have been freed. | Gauge | | Integer (no units) |
| | The number of bytes of allocated heap objects. | Gauge | | Integer (no units) |
| | The number of bytes of heap memory obtained from the operating system. | Gauge | | Integer (no units) |
| | The number of bytes in idle, unused spans. | Gauge | | Integer (no units) |
| | The number of bytes in spans that are currently in use. | Gauge | | Integer (no units) |
| | The number of bytes of physical memory returned to the operating system. | Gauge | | Integer (no units) |
| | The number of allocated heap objects. | Gauge | | Integer (no units) |
| | The number of bytes in stack spans that are currently in use. | Gauge | | Integer (no units) |
| | The number of bytes of stack memory obtained from the operating system. | Gauge | | Integer (no units) |
| | The number of bytes of allocated mspan structures. | Gauge | | Integer (no units) |
| | The number of bytes of memory obtained from the operating system for mspan structures. | Gauge | | Integer (no units) |
| | The number of bytes of allocated mcache structures. | Gauge | | Integer (no units) |
| | The number of bytes of memory obtained from the operating system for mcache structures. | Gauge | | Integer (no units) |
| | The number of bytes of memory in profiling bucket hash tables. | Gauge | | Integer (no units) |
| | The number of bytes of memory in garbage collection metadata. | Gauge | | Integer (no units) |
| | The number of bytes of memory in miscellaneous, off-heap runtime allocations. | Gauge | | Integer (no units) |
| | The target heap size of the next garbage collection cycle. | Gauge | | Integer (no units) |
| | The time that the last garbage collection was completed, in Epoch or Unix time. | Gauge | | Nanoseconds |
| | The cumulative time in garbage collection stop-the-world pauses since the program started. | Gauge | | Nanoseconds |
| | The number of completed garbage collection cycles. | Gauge | | Integer (no units) |
| | The number of garbage collection cycles that were forced due to an application calling the garbage collection function. | Gauge | | Integer (no units) |
| | The fraction of the available CPU time of the program that has been used by the garbage collector since the program started. | Gauge | | Integer (no units) |
Chapter 2. Developer metrics
2.1. Serverless developer metrics overview
Metrics enable developers to monitor how Knative services are performing. You can use the OpenShift Container Platform monitoring stack to record and view health checks and metrics for your Knative services.
You can view different metrics for OpenShift Serverless by navigating to Dashboards in the web console Developer perspective.
If Service Mesh is enabled with mTLS, metrics for Knative Serving are disabled by default because Service Mesh prevents Prometheus from scraping metrics.
For information about resolving this issue, see Enabling Knative Serving metrics when using Service Mesh with mTLS.
Scraping the metrics does not affect autoscaling of a Knative service, because scraping requests do not go through the activator. Consequently, no scraping takes place if no pods are running.
2.2. Knative service metrics exposed by default
Metric name, unit, and type | Description | Metric tags |
---|---|---|
Metric unit: dimensionless Metric type: counter | The number of requests that are routed to queue-proxy. | configuration_name="event-display", container_name="queue-proxy", namespace_name="apiserversource1", pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5", response_code="200", response_code_class="2xx", revision_name="event-display-00001", service_name="event-display" |
Metric unit: milliseconds Metric type: histogram | The response time in milliseconds. | configuration_name="event-display", container_name="queue-proxy", namespace_name="apiserversource1", pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5", response_code="200", response_code_class="2xx", revision_name="event-display-00001", service_name="event-display" |
Metric unit: dimensionless Metric type: counter | The number of requests that are routed to the application. | configuration_name="event-display", container_name="queue-proxy", namespace_name="apiserversource1", pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5", response_code="200", response_code_class="2xx", revision_name="event-display-00001", service_name="event-display" |
Metric unit: milliseconds Metric type: histogram | The response time in milliseconds. | configuration_name="event-display", container_name="queue-proxy", namespace_name="apiserversource1", pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5", response_code="200", response_code_class="2xx", revision_name="event-display-00001", service_name="event-display" |
Metric unit: dimensionless Metric type: gauge | The current number of items in the serving and waiting queue, or not reported if unlimited concurrency. | configuration_name="event-display", container_name="queue-proxy", namespace_name="apiserversource1", pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5", response_code="200", response_code_class="2xx", revision_name="event-display-00001", service_name="event-display" |
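For example, using the revision_app_request_count metric from the query examples later in this chapter, together with the service_name and response_code_class tags listed above, the following query reports the per-second rate of successful application requests for a service (the label values are illustrative):
# per-second rate of 2xx application requests for the event-display service
sum(rate(revision_app_request_count{service_name="event-display", response_code_class="2xx"}[5m]))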
2.3. Knative service with custom application metrics
You can extend the set of metrics exported by a Knative service. The exact implementation depends on your application and the language used.
The following listing implements a sample Go application that exports a custom metric counting the number of processed events.
package main
import (
"fmt"
"log"
"net/http"
"os"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"github.com/prometheus/client_golang/prometheus/promhttp"
)
var (
opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
Name: "myapp_processed_ops_total",
Help: "The total number of processed events",
})
)
func handler(w http.ResponseWriter, r *http.Request) {
log.Print("helloworld: received a request")
target := os.Getenv("TARGET")
if target == "" {
target = "World"
}
fmt.Fprintf(w, "Hello %s!\n", target)
opsProcessed.Inc()
}
func main() {
log.Print("helloworld: starting server...")
port := os.Getenv("PORT")
if port == "" {
port = "8080"
}
http.HandleFunc("/", handler)
// Separate server for metrics requests
go func() {
mux := http.NewServeMux()
server := &http.Server{
Addr: fmt.Sprintf(":%s", "9095"),
Handler: mux,
}
mux.Handle("/metrics", promhttp.Handler())
log.Printf("prometheus: listening on port %s", 9095)
log.Fatal(server.ListenAndServe())
}()
// Use same port as normal requests for metrics
//http.Handle("/metrics", promhttp.Handler())
log.Printf("helloworld: listening on port %s", port)
log.Fatal(http.ListenAndServe(fmt.Sprintf(":%s", port), nil))
}
2.4. Configuration for scraping custom metrics
Custom metrics scraping is performed by an instance of Prometheus intended for user workload monitoring. After you enable user workload monitoring and create the application, you need a configuration that defines how the monitoring stack scrapes the metrics.
The following sample configuration defines the ksvc for your application and configures the service monitor. The exact configuration depends on your application and how it exports the metrics.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: helloworld-go
spec:
template:
metadata:
labels:
app: helloworld-go
annotations:
spec:
containers:
- image: docker.io/skonto/helloworld-go:metrics
resources:
requests:
cpu: "200m"
env:
- name: TARGET
value: "Go Sample v1"
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
labels:
name: helloworld-go-sm
spec:
endpoints:
- port: queue-proxy-metrics
scheme: http
- port: app-metrics
scheme: http
namespaceSelector: {}
selector:
matchLabels:
name: helloworld-go-sm
---
apiVersion: v1
kind: Service
metadata:
labels:
name: helloworld-go-sm
name: helloworld-go-sm
spec:
ports:
- name: queue-proxy-metrics
port: 9091
protocol: TCP
targetPort: 9091
- name: app-metrics
port: 9095
protocol: TCP
targetPort: 9095
selector:
serving.knative.dev/service: helloworld-go
type: ClusterIP
2.5. Examining metrics of a service
After you have configured the application to export the metrics and the monitoring stack to scrape them, you can examine the metrics in the web console.
Prerequisites
- You have logged in to the OpenShift Container Platform web console.
- You have installed the OpenShift Serverless Operator and Knative Serving.
Procedure
Optional: Run requests against your application to generate data that you can see in the metrics:
$ hello_route=$(oc get ksvc helloworld-go -n ns1 -o jsonpath='{.status.url}') && \
curl $hello_route
Example output
Hello Go Sample v1!
- In the web console, navigate to the Observe → Metrics interface.
In the input field, enter the query for the metric you want to observe, for example:
revision_app_request_count{namespace="ns1", job="helloworld-go-sm"}
Another example:
myapp_processed_ops_total{namespace="ns1", job="helloworld-go-sm"}
Observe the visualized metrics.
2.5.1. Queue proxy metrics
Each Knative service has a proxy container that proxies the connections to the application container. A number of metrics are reported for the queue proxy performance.
You can use the following metrics to measure if requests are queued at the proxy side and the actual delay in serving requests at the application side.
Metric name | Description | Type | Tags | Unit |
---|---|---|---|---|
| | The number of requests that are routed to queue-proxy. | Counter | | Integer (no units) |
| | The response time of revision requests. | Histogram | | Milliseconds |
| | The number of requests that are routed to the application. | Counter | | Integer (no units) |
| | The response time of revision app requests. | Histogram | | Milliseconds |
| | The current number of items in the serving and waiting queue, or not reported if unlimited concurrency. | Gauge | | Integer (no units) |
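For example, to estimate the 95th percentile of application response time for each revision, you might run a histogram_quantile query similar to the following. The series name revision_app_request_latencies_bucket is an assumption based on how Prometheus exposes histogram metrics; confirm the exact name in your cluster:
# assumed series name; p95 application latency per revision over the last 5 minutes
histogram_quantile(0.95, sum by (le, revision_name) (rate(revision_app_request_latencies_bucket[5m])))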
2.6. Dashboard for service metrics
You can examine the metrics using a dedicated dashboard that aggregates queue proxy metrics by namespace.
2.6.1. Examining metrics of a service in the dashboard
Prerequisites
- You have logged in to the OpenShift Container Platform web console.
- You have installed the OpenShift Serverless Operator and Knative Serving.
Procedure
- In the web console, navigate to the Observe → Metrics interface.
- Select the Knative User Services (Queue Proxy metrics) dashboard.
- Select the Namespace, Configuration, and Revision that correspond to your application.
- Observe the visualized metrics.
Chapter 3. Cluster logging
3.1. Configuring log settings for Serving and Eventing
You can configure logging for OpenShift Serverless Serving and OpenShift Serverless Eventing by using the KnativeServing and KnativeEventing custom resources (CRs). The level of logging is determined by the specified loglevel value.
3.1.1. Supported log levels
The following loglevel values are supported:
Log level | Description |
---|---|
debug | Fine-grained debugging |
info | Normal logging |
warn | Unexpected but non-critical errors |
error | Critical errors; unexpected during normal operation |
dpanic | In debug mode, trigger a panic (crash) |
Using the debug level for production might negatively affect performance.
3.1.2. Configuring log settings
You can configure logging for Serving and Eventing in the KnativeServing custom resource (CR) and the KnativeEventing CR.
Procedure
Configure the log settings for Serving and Eventing by setting or modifying the loglevel value in the KnativeServing and KnativeEventing CRs, respectively. The following two example configurations set all available logging options to the info level:

KnativeServing CR

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    logging:
      loglevel.controller: "info"
      loglevel.autoscaler: "info"
      loglevel.queueproxy: "info"
      loglevel.webhook: "info"
      loglevel.activator: "info"
      loglevel.hpaautoscaler: "info"
      loglevel.net-certmanager-controller: "info"
      loglevel.net-istio-controller: "info"
      loglevel.net-kourier-controller: "info"

KnativeEventing CR

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  config:
    logging:
      loglevel.controller: "info"
      loglevel.eventing-webhook: "info"
      loglevel.inmemorychannel-dispatcher: "info"
      loglevel.inmemorychannel-webhook: "info"
      loglevel.mt-broker-controller: "info"
      loglevel.mt_broker_filter: "info"
      loglevel.mt_broker_ingress: "info"
      loglevel.pingsource-mt-adapter: "info"
3.1.3. Configuring request log settings
You can configure request logging for your service in the observability field of your KnativeServing custom resource (CR).
For information about the available parameters for configuring request logging, see "Parameters of request logging".
Procedure
Configure request logging for your service by modifying the observability field in your KnativeServing CR:

Example KnativeServing CR

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
# ...
spec:
  config:
    observability:
      logging.enable-request-log: true
      logging.enable-probe-request-log: true
      logging.request-log-template: '{"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}'
# ...
3.1.4. Parameters of request logging
The following table describes parameters used to configure request logging.
Parameter | Type | Description |
---|---|---|
logging.enable-request-log | Boolean (true or false) | Set to true to enable request logging. |
logging.enable-probe-request-log | Boolean (true or false) | Set to true to enable probe request logging. |
logging.request-log-template | Go template | Determines the shape of the request logs. Use a single line to prevent logs from being split into multiple records. |
The logging.request-log-template parameter includes the following functions:
- Request is an http.Request representing an HTTP request received by the server.
- Response represents the HTTP response and includes the following fields:
  - Code is the HTTP status code.
  - Size is the response size in bytes.
  - Latency is the response latency in seconds.
- Revision contains revision details and includes the following fields:
  - Name is the name of the revision.
  - Namespace is the namespace of the revision.
  - Service is the name of the service.
  - Configuration is the name of the configuration.
  - PodName is the name of the pod hosting the revision.
  - PodIP is the IP address of the hosting pod.
Chapter 4. Tracing
4.1. Tracing requests
Distributed tracing records the path of a request through the various services that make up an application. It is used to tie information about different units of work together, to understand a whole chain of events in a distributed transaction. The units of work might be executed in different processes or hosts.
4.1.1. Distributed tracing overview
As a service owner, you can use distributed tracing to instrument your services to gather insights into your service architecture. You can use distributed tracing for monitoring, network profiling, and troubleshooting the interaction between components in modern, cloud-native, microservices-based applications.
With distributed tracing you can perform the following functions:
- Monitor distributed transactions
- Optimize performance and latency
- Perform root cause analysis
Red Hat OpenShift distributed tracing consists of two main components:
- Red Hat OpenShift distributed tracing platform - This component is based on the open source Jaeger project.
- Red Hat OpenShift distributed tracing data collection - This component is based on the open source OpenTelemetry project.
Both of these components are based on the vendor-neutral OpenTracing APIs and instrumentation.
4.2. Using Red Hat OpenShift distributed tracing
You can use Red Hat OpenShift distributed tracing with OpenShift Serverless to monitor and troubleshoot serverless applications.
4.2.1. Using Red Hat OpenShift distributed tracing to enable distributed tracing
Red Hat OpenShift distributed tracing is made up of several components that work together to collect, store, and display tracing data.
Prerequisites
- You have access to an OpenShift Container Platform account with cluster administrator access.
- You have installed Red Hat OpenShift distributed tracing by following the OpenShift Container Platform "Installing distributed tracing" documentation.
- You have installed the OpenShift CLI (oc).
- You have created a project or have access to a project with the appropriate roles and permissions to create applications and other workloads in OpenShift Container Platform.
Procedure
Create an OpenTelemetryCollector custom resource (CR):

Example OpenTelemetryCollector CR

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: cluster-collector
  namespace: <namespace>
spec:
  mode: deployment
  config: |
    receivers:
      zipkin:
    processors:
    exporters:
      jaeger:
        endpoint: jaeger-all-in-one-inmemory-collector-headless.tracing-system.svc:14250
        tls:
          ca_file: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
      logging:
    service:
      pipelines:
        traces:
          receivers: [zipkin]
          processors: []
          exporters: [jaeger, logging]

Verify that you have two pods running in the namespace where Red Hat OpenShift distributed tracing is installed:

$ oc get pods -n <namespace>

Example output

NAME                                          READY   STATUS    RESTARTS   AGE
cluster-collector-collector-85c766b5c-b5g99   1/1     Running   0          5m56s
jaeger-all-in-one-inmemory-ccbc9df4b-ndkl5    2/2     Running   0          15m

Verify that the following headless services have been created:

$ oc get svc -n <namespace> | grep headless

Example output

cluster-collector-collector-headless            ClusterIP   None   <none>   9411/TCP                                 7m28s
jaeger-all-in-one-inmemory-collector-headless   ClusterIP   None   <none>   9411/TCP,14250/TCP,14267/TCP,14268/TCP   16m

These services are used to configure Jaeger, Knative Serving, and Knative Eventing. The name of the Jaeger service may vary.

Install the OpenShift Serverless Operator by following the "Installing the OpenShift Serverless Operator" documentation.

Install Knative Serving by creating the following KnativeServing CR:

Example KnativeServing CR

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    tracing:
      backend: "zipkin"
      zipkin-endpoint: "http://cluster-collector-collector-headless.tracing-system.svc:9411/api/v2/spans"
      debug: "false"
      sample-rate: "0.1" 1

1 The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.

Install Knative Eventing by creating the following KnativeEventing CR:

Example KnativeEventing CR

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  config:
    tracing:
      backend: "zipkin"
      zipkin-endpoint: "http://cluster-collector-collector-headless.tracing-system.svc:9411/api/v2/spans"
      debug: "false"
      sample-rate: "0.1" 1

1 The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.

Create a Knative service:

Example service

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
spec:
  template:
    metadata:
      labels:
        app: helloworld-go
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/target: "1"
    spec:
      containers:
        - image: quay.io/openshift-knative/helloworld:v1.2
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "200m"
          env:
            - name: TARGET
              value: "Go Sample v1"

Make some requests to the service:

Example HTTPS request

$ curl https://helloworld-go.example.com

Get the URL for the Jaeger web console:

Example command

$ oc get route jaeger-all-in-one-inmemory -o jsonpath='{.spec.host}' -n <namespace>

You can now examine traces by using the Jaeger console.
4.3. Using Jaeger distributed tracing
If you do not want to install all of the components of Red Hat OpenShift distributed tracing, you can still use distributed tracing on OpenShift Container Platform with OpenShift Serverless.
4.3.1. Configuring Jaeger to enable distributed tracing
To enable distributed tracing using Jaeger, you must install and configure Jaeger as a standalone integration.
Prerequisites
- You have cluster administrator permissions on OpenShift Container Platform, or you have cluster or dedicated administrator permissions on Red Hat OpenShift Service on AWS or OpenShift Dedicated.
- You have installed the OpenShift Serverless Operator, Knative Serving, and Knative Eventing.
- You have installed the Red Hat OpenShift distributed tracing platform Operator.
- You have installed the OpenShift CLI (oc).
- You have created a project or have access to a project with the appropriate roles and permissions to create applications and other workloads.
Procedure
Create and apply a Jaeger custom resource (CR) that contains the following:

Jaeger CR

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
  namespace: default

Enable tracing for Knative Serving by editing the KnativeServing CR and adding a YAML configuration for tracing:

Tracing YAML example for Serving

apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  config:
    tracing:
      sample-rate: "0.1" 1
      backend: zipkin 2
      zipkin-endpoint: "http://jaeger-collector.default.svc.cluster.local:9411/api/v2/spans" 3
      debug: "false" 4

1 The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
2 backend must be set to zipkin.
3 The zipkin-endpoint must point to your jaeger-collector service endpoint. To get this endpoint, substitute the namespace where the Jaeger CR is applied.
4 Debugging should be set to false. Enabling debug mode by setting debug: "true" allows all spans to be sent to the server, bypassing sampling.

Enable tracing for Knative Eventing by editing the KnativeEventing CR:

Tracing YAML example for Eventing

apiVersion: operator.knative.dev/v1beta1
kind: KnativeEventing
metadata:
  name: knative-eventing
  namespace: knative-eventing
spec:
  config:
    tracing:
      sample-rate: "0.1" 1
      backend: zipkin 2
      zipkin-endpoint: "http://jaeger-collector.default.svc.cluster.local:9411/api/v2/spans" 3
      debug: "false" 4

1 The sample-rate defines sampling probability. Using sample-rate: "0.1" means that 1 in 10 traces are sampled.
2 Set backend to zipkin.
3 Point the zipkin-endpoint to your jaeger-collector service endpoint. To get this endpoint, substitute the namespace where the Jaeger CR is applied.
4 Debugging should be set to false. Enabling debug mode by setting debug: "true" allows all spans to be sent to the server, bypassing sampling.

Verification

You can access the Jaeger web console to see tracing data by using the jaeger route.

Get the jaeger route’s hostname by entering the following command:

$ oc get route jaeger -n default

Example output

NAME     HOST/PORT                         PATH   SERVICES       PORT    TERMINATION   WILDCARD
jaeger   jaeger-default.apps.example.com          jaeger-query   <all>   reencrypt     None

- Open the endpoint address in your browser to view the console.