Chapter 25. Introducing metrics
Metrics provide insight into the health, performance, and capacity of your Kafka deployment. By observing these metrics, you can detect potential issues early, optimize resource usage, and improve reliability.
Metrics from Streams for Apache Kafka components can be collected using Prometheus and visualized in Grafana. Prometheus collects metrics from running pods in your cluster, while Grafana provides an intuitive interface for exploring and visualizing these metrics.
The Streams for Apache Kafka operators expose Prometheus metrics by default, including reconciliation counts, durations, JVM data, and resource processing statistics. Other components and features require configuration to expose their metrics.
In addition, you can use kube-state-metrics (KSM) to monitor the state and readiness of Streams for Apache Kafka custom resources, and to raise alerts for warning conditions.
Streams for Apache Kafka provides example Prometheus rules, Grafana dashboards, and kube-state-metrics configuration files that you can adapt to your environment. You can also track message flow across components by setting up distributed tracing.
Streams for Apache Kafka provides example installation files for Prometheus, Grafana, and kube-state-metrics to help get you started. For further information, refer to the resources of those projects.
25.1. Example metrics files
You can find example Grafana dashboards and other metrics configuration files in the example configuration files provided by Streams for Apache Kafka.
Example metrics files provided with Streams for Apache Kafka
metrics
├── grafana-dashboards (1)
│   ├── strimzi-cruise-control.json
│   ├── strimzi-kafka-bridge.json
│   ├── strimzi-kafka-connect.json
│   ├── strimzi-kafka-exporter.json
│   ├── strimzi-kafka-mirror-maker-2.json
│   ├── strimzi-kafka-oauth.json
│   ├── strimzi-kafka.json
│   ├── strimzi-kraft.json
│   └── strimzi-operators.json
├── grafana-install (2)
│   └── grafana.yaml
├── kube-state-metrics (3)
│   ├── configmap.yaml
│   ├── ksm.yaml
│   └── prometheus-rules.yaml
├── prometheus-additional-properties (4)
│   └── prometheus-additional.yaml
├── prometheus-alertmanager-config (5)
│   └── alert-manager-config.yaml
├── prometheus-install
│   ├── alert-manager.yaml (6)
│   ├── pod-monitors (7)
│   │   ├── bridge-metrics.yaml
│   │   ├── cluster-operator-metrics.yaml
│   │   ├── entity-operator-metrics.yaml
│   │   └── kafka-resources-metrics.yaml
│   ├── prometheus-rules (8)
│   │   ├── prometheus-kafka-bridge-rules.yaml
│   │   ├── prometheus-kafka-certificate-rules.yaml
│   │   ├── prometheus-kafka-connect-rules.yaml
│   │   ├── prometheus-kafka-rules.yaml
│   │   ├── prometheus-kafka-exporter-topic-rules.yaml
│   │   ├── prometheus-mirrormaker2-rules.yaml
│   │   ├── prometheus-strimzi-cluster-operator-rules.yaml
│   │   └── prometheus-strimzi-entity-operator-rules.yaml
│   └── prometheus.yaml (9)
├── strimzi-metrics-reporter
│   ├── grafana-dashboards (10)
│   │   ├── strimzi-kafka.json
│   │   ├── strimzi-kafka-bridge.json
│   │   ├── strimzi-kafka-connect.json
│   │   ├── strimzi-kafka-mirror-maker-2.json
│   │   └── strimzi-kraft.json
│   ├── kafka-bridge-metrics.yaml (11)
│   ├── kafka-connect-metrics.yaml (12)
│   ├── kafka-metrics.yaml (13)
│   └── kafka-mirror-maker-2-metrics.yaml (14)
├── kafka-bridge-metrics.yaml (15)
├── kafka-connect-metrics.yaml (16)
├── kafka-cruise-control-metrics.yaml (17)
├── kafka-metrics.yaml (18)
├── kafka-mirror-maker-2-metrics.yaml (19)
└── oauth-metrics.yaml (20)
- (1) Grafana dashboards for components using the JMX Exporter.
- (2) Installation file for the Grafana image.
- (3) Kube-state-metrics deployment and configuration files for custom resource monitoring.
- (4) Additional configuration to scrape metrics for CPU, memory and disk volume usage, which comes directly from the OpenShift cAdvisor agent and kubelet on the nodes.
- (5) Hook definitions for sending notifications through Alertmanager.
- (6) Resources for deploying and configuring Alertmanager.
- (7) PodMonitor definitions translated by the Prometheus Operator into jobs for the Prometheus server to be able to scrape metrics data directly from pods.
- (8) Alerting rules examples for use with Prometheus Alertmanager (deployed with Prometheus).
- (9) Installation resource file for the Prometheus image.
- (10) Grafana dashboards for components using the Streams for Apache Kafka Metrics Reporter.
- (11) KafkaBridge resource for deploying Kafka Bridge with Streams for Apache Kafka Metrics Reporter.
- (12) KafkaConnect resource for deploying Kafka Connect with Streams for Apache Kafka Metrics Reporter.
- (13) Kafka resource for deploying Kafka with Streams for Apache Kafka Metrics Reporter.
- (14) KafkaMirrorMaker2 resource for deploying MirrorMaker 2 with Streams for Apache Kafka Metrics Reporter.
- (15) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for Kafka Bridge.
- (16) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for Kafka Connect.
- (17) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for Cruise Control.
- (18) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for Kafka.
- (19) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for MirrorMaker 2.
- (20) Metrics configuration that defines Prometheus JMX Exporter relabeling rules for OAuth 2.0.
The Prometheus JMX Exporter collects Java Management Extensions (JMX) data from Kafka components and converts it into Prometheus metrics. You do not need jmxOptions configuration in the custom resource of the component for the Prometheus JMX Exporter to function. jmxOptions is only required if you need direct access to JMX from Kafka components.
Streams for Apache Kafka uses the Streams for Apache Kafka Metrics Reporter or the Prometheus JMX Exporter to expose metrics through an HTTP endpoint, which can be scraped by the Prometheus server.
Streams for Apache Kafka provides example custom resource configuration YAML files with relabeling rules. When deploying Prometheus metrics configuration, you can deploy the example custom resource or copy the metrics configuration to your own custom resource definition.
Grafana dashboards with Prometheus JMX Exporter are dependent on relabeling rules, which are defined for Streams for Apache Kafka components in the custom resource configuration.
A label is a name-value pair. Relabeling is the process of writing a label dynamically. For example, the value of a label may be derived from the name of a Kafka server and client ID.
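As a rough illustration of relabeling, the following shell sketch derives a metric name and a clientId label from a JMX-style MBean name by using a regular expression. The relabel function, the sample MBean string, and the output naming scheme are assumptions made for this example only; the rules that the JMX Exporter actually applies are the ones defined in the example YAML files.

```shell
# Hypothetical helper (not part of Streams for Apache Kafka): rewrite a
# JMX-style name of the form
#   kafka.server<type=T, name=N, clientId=C><>Value
# into a Prometheus-style series name with a clientId label.
relabel() {
  printf '%s\n' "$1" | sed -E \
    's/^kafka\.server<type=([^,]+), name=([^,]+), clientId=([^>]+)><>Value$/kafka_server_\1_\2{clientId="\3"}/'
}

# Sample input chosen for illustration.
relabel 'kafka.server<type=ReplicaFetcherManager, name=MaxLag, clientId=Replica><>Value'
# prints: kafka_server_ReplicaFetcherManager_MaxLag{clientId="Replica"}
```

The capture groups play the role of the dynamic part of relabeling: the label value is not written by hand but taken from the name being matched.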
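| Component | Custom resource | Example YAML file |
|---|---|---|
| Kafka nodes | Kafka | kafka-metrics.yaml |
| Kafka Connect | KafkaConnect | kafka-connect-metrics.yaml |
| Kafka MirrorMaker 2 | KafkaMirrorMaker2 | kafka-mirror-maker-2-metrics.yaml |
| Kafka Bridge | KafkaBridge | kafka-bridge-metrics.yaml |
| Cruise Control | Kafka | kafka-cruise-control-metrics.yaml |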
25.1.2. Example Metrics Reporter metrics configuration
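| Component | Custom resource | Example YAML file |
|---|---|---|
| Kafka nodes | Kafka | strimzi-metrics-reporter/kafka-metrics.yaml |
| Kafka Connect | KafkaConnect | strimzi-metrics-reporter/kafka-connect-metrics.yaml |
| Kafka MirrorMaker 2 | KafkaMirrorMaker2 | strimzi-metrics-reporter/kafka-mirror-maker-2-metrics.yaml |
| Kafka Bridge | KafkaBridge | strimzi-metrics-reporter/kafka-bridge-metrics.yaml |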
25.1.3. Example Prometheus rules for alert notifications
Example Prometheus rules for alert notifications are included in the example metrics configuration files provided by Streams for Apache Kafka. The rules are specified in the example prometheus-rules directory for use in a Prometheus deployment.
The prometheus-rules directory contains example rules for the following components:
- Kafka
- Entity Operator
- Kafka Connect
- Kafka Bridge
- MirrorMaker 2
- Kafka Exporter
A description of each of the example rules is provided in the file.
Alerting rules provide notifications about specific conditions observed in metrics. Rules are declared on the Prometheus server, but Prometheus Alertmanager is responsible for alert notifications.
Prometheus alerting rules describe conditions using PromQL expressions that are continuously evaluated.
When an alert expression becomes true, the condition is met and the Prometheus server sends alert data to the Alertmanager. Alertmanager then sends out a notification using the communication method configured for its deployment.
General points about the alerting rule definitions:
- A for property is used with the rules to determine the period of time a condition must persist before an alert is triggered.
- The availability of the KafkaRunningOutOfSpace metric and alert is dependent on the OpenShift configuration and storage implementation used. Storage implementations for certain platforms may not be able to supply the information on available space required for the metric to provide an alert.
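The effect of the for property can be pictured with a simplified model in which a condition must hold for a number of consecutive evaluations before the alert fires. The first_firing function below is a hypothetical helper written for this illustration. Prometheus itself tracks elapsed time rather than counting evaluations, so treat the sketch as an approximation of the behavior, not as how the server is implemented.

```shell
# Hypothetical helper: given the number of consecutive true evaluations
# required, followed by a sequence of condition results (1 = true,
# 0 = false), print the 1-based evaluation at which the alert would fire,
# or "never" if the condition never persists long enough.
first_firing() {
  required=$1
  shift
  streak=0
  index=0
  for result in "$@"; do
    index=$((index + 1))
    if [ "$result" -eq 1 ]; then
      streak=$((streak + 1))
    else
      streak=0   # the condition cleared, so the waiting period starts again
    fi
    if [ "$streak" -ge "$required" ]; then
      echo "$index"
      return 0
    fi
  done
  echo "never"
}

# The brief recovery at evaluation 3 resets the streak.
first_firing 3 1 1 0 1 1 1
# prints: 6
```

A longer persistence requirement suppresses alerts for short spikes, at the cost of a later notification for real problems.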
Alertmanager can be configured to use email, chat messages or other notification methods. Adapt the default configuration of the example rules according to your specific needs.
25.1.4. Example Grafana dashboards
If you deploy Prometheus to provide metrics, you can use the example Grafana dashboards provided with Streams for Apache Kafka to monitor Streams for Apache Kafka components.
Example dashboards are provided in the examples/metrics/grafana-dashboards directory as JSON files.
All dashboards provide JVM metrics, as well as metrics specific to the component. For example, the Grafana dashboard for Streams for Apache Kafka operators provides information on the number of reconciliations or custom resources they are processing.
The example dashboards don’t show all the metrics supported by Kafka. The dashboards are populated with a representative set of metrics for monitoring.
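| Component | Example JSON file |
|---|---|
| Streams for Apache Kafka operators | strimzi-operators.json |
| Kafka | strimzi-kafka.json |
| Kafka KRaft controllers | strimzi-kraft.json |
| Kafka Connect | strimzi-kafka-connect.json |
| Kafka MirrorMaker 2 | strimzi-kafka-mirror-maker-2.json |
| Kafka Bridge | strimzi-kafka-bridge.json |
| Cruise Control | strimzi-cruise-control.json |
| Kafka Exporter | strimzi-kafka-exporter.json |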
When metrics are not available to the Kafka Exporter, because there is no traffic in the cluster yet, the Kafka Exporter Grafana dashboard will show N/A for numeric fields and No data to show for graphs.
25.2. Using Prometheus with Streams for Apache Kafka
Use Prometheus to provide monitoring data for the example Grafana dashboards provided with Streams for Apache Kafka.
To expose metrics in Prometheus format, you add configuration to a custom resource. You must also make sure that the metrics are scraped by your monitoring stack. Prometheus and Prometheus Alertmanager are used in the examples provided by Streams for Apache Kafka, but you can also use other compatible tools.
You can expose metrics using one of the following approaches:
- Prometheus JMX Exporter
- Streams for Apache Kafka Metrics Reporter
After enabling metrics, you can integrate with Prometheus by setting up monitoring for user-defined projects on OpenShift Container Platform.
Streams for Apache Kafka provides example Grafana dashboards to display visualizations of metrics. The exposed metrics provide the monitoring data when you enable the Grafana dashboards.
25.2.1. Enabling Prometheus JMX Exporter
To enable and expose metrics in Streams for Apache Kafka for Prometheus, configure the appropriate properties in the custom resources for the components you want to monitor.
Use metricsConfig to expose metrics for these components:
- Kafka
- Kafka Connect
- Kafka MirrorMaker 2
- Kafka Bridge
- Cruise Control
This enables the Prometheus JMX Exporter, which exposes metrics on port 9404 through an HTTP endpoint. Prometheus scrapes this endpoint to collect Kafka metrics.
Set enableMetrics to true to expose metrics for OAuth 2.0:
- Configure in the Kafka resource for oauth or keycloak cluster authorization, or oauth listener authentication.
- Configure in the KafkaBridge, KafkaConnect, or KafkaMirrorMaker2 resources for oauth authentication.
You can create your own Prometheus configuration or use the example custom resource files provided with Streams for Apache Kafka:
- kafka-metrics.yaml
- kafka-connect-metrics.yaml
- kafka-mirror-maker-2-metrics.yaml
- kafka-bridge-metrics.yaml
- kafka-cruise-control-metrics.yaml
- oauth-metrics.yaml
These files include relabeling rules and example metrics configuration, and are a good starting point for trying Prometheus with Streams for Apache Kafka.
This procedure shows how to deploy example Prometheus metrics configuration to the Kafka resource. The same steps apply when deploying the example files for other resources.
Procedure
Deploy the example custom resource with the Prometheus configuration.

For example, for each Kafka resource you can apply the kafka-metrics.yaml file.

Deploying the example configuration

    oc apply -f kafka-metrics.yaml

Alternatively, copy the example configuration in kafka-metrics.yaml to your own Kafka resource.

Copying the example configuration

    oc edit kafka <kafka_configuration_file>

Copy the metricsConfig property and the ConfigMap it references to your Kafka resource.

Example metrics configuration for Kafka

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        # ...
        metricsConfig:
          type: jmxPrometheusExporter
          valueFrom:
            configMapKeyRef:
              name: kafka-metrics
              key: kafka-metrics-config.yml
    ---
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: kafka-metrics
      labels:
        app: strimzi
    data:
      kafka-metrics-config.yml: |
        # metrics configuration...

Enabling metrics for OAuth 2.0
To expose metrics for OAuth 2.0, set the enableMetrics property to true in the appropriate custom resource.
- In the Kafka resource for:
  - Cluster authorization (oauth or keycloak)
  - Listener authentication (oauth only)
- In the KafkaBridge, KafkaConnect, or KafkaMirrorMaker2 resources for oauth authentication
In the following example, metrics are enabled for OAuth 2.0 listener authentication and OAuth 2.0 (keycloak) cluster authorization.
Example configuration with OAuth 2.0 metrics enabled

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
      namespace: myproject
    spec:
      kafka:
        # ...
        listeners:
          - name: external
            port: 9094
            type: loadbalancer
            tls: true
            authentication:
              type: oauth
              enableMetrics: true
            configuration:
              #...
        authorization:
          type: keycloak
          enableMetrics: true
      # ...

To use OAuth 2.0 metrics with Prometheus, copy the ConfigMap configuration from the oauth-metrics.yaml file to the same Kafka resource configuration file where you enabled metrics for OAuth 2.0 and then apply the configuration.
You can also enable metrics for the type: opa authorization option in the same way as for OAuth 2.0 authorization. However, type: opa is deprecated and will be removed in a future release. To continue using the Open Policy Agent authorizer, use the type: custom authorization configuration.
The Streams for Apache Kafka Metrics Reporter is a technology preview and is not intended for a production environment. For more information, see the release notes.
To enable and expose metrics in Streams for Apache Kafka for Prometheus, configure the appropriate properties in the custom resources for the components you want to monitor.
Use metricsConfig configuration to expose metrics for these components:
- Kafka
- Kafka Bridge
- Kafka Connect
- Kafka MirrorMaker 2
This enables the Streams for Apache Kafka Metrics Reporter, which exposes metrics on port 9404 through an HTTP endpoint. Prometheus scrapes this endpoint to collect Kafka metrics.
You can create your own Prometheus configuration or use the example custom resource file provided with Streams for Apache Kafka:
- kafka-metrics.yaml
- kafka-connect-metrics.yaml
- kafka-mirror-maker-2-metrics.yaml
- kafka-bridge-metrics.yaml
These files contain the necessary configuration, and are a good starting point for trying Prometheus with Streams for Apache Kafka.
This procedure shows how to deploy example Prometheus metrics configuration to the Kafka resource.
Procedure
Deploy the example custom resource with the Prometheus configuration.

For example, for each Kafka resource you can apply the kafka-metrics.yaml file.

Deploying the example configuration

    oc apply -f kafka-metrics.yaml

Alternatively, copy the example configuration in kafka-metrics.yaml to your own Kafka resource.

Copying the example configuration

    oc edit kafka <kafka_configuration_file>

Copy the metricsConfig property to your Kafka resource.

Example metrics configuration for Kafka

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        # ...
        metricsConfig:
          type: strimziMetricsReporter
          values:
            allowList:
              - "kafka_log."
              - "kafka_network."

Set up monitoring for your Streams for Apache Kafka cluster on OpenShift Container Platform. Use OpenShift’s feature for monitoring user-defined projects, which provides a dedicated Prometheus instance to collect metrics from your Kafka components. Deploy a Grafana instance to visualize these metrics using pre-built dashboards.
The monitoring stack in the openshift-monitoring project is for core platform components only. Do not use the Prometheus and Grafana instances from this project for your Kafka monitoring. You must use the user workload monitoring stack described here.
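With the prerequisites in place, set up Streams for Apache Kafka monitoring in OpenShift by following these procedures in order:
1. Deploying the Prometheus resources
2. Creating a service account for Grafana
3. Deploying Grafana with a Prometheus datasource
4. Creating a route to the Grafana Service
5. Importing the example Grafana dashboards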
25.2.3.1. Prerequisites
- Monitoring for user-defined projects is enabled. A cluster administrator has created a cluster-monitoring-config ConfigMap resource in your OpenShift cluster.
- You have been granted permissions to create monitoring resources in your project. This is typically done by assigning your user the monitoring-rules-edit or monitoring-edit role.
- Metrics are exposed using either the Prometheus JMX Exporter or the Streams for Apache Kafka Metrics Reporter.
For more information on creating a cluster-monitoring-config config map and granting users permission to monitor user-defined projects, see the OpenShift documentation.
25.2.3.2. Deploying the Prometheus resources
Use Prometheus to obtain monitoring data in your Kafka cluster.
You can use your own Prometheus deployment or deploy Prometheus using the example metrics configuration files provided by Streams for Apache Kafka. To use the example files, you configure and deploy the PodMonitor resources. The PodMonitors scrape data directly from pods for Apache Kafka, Operators, the Kafka Bridge, and Cruise Control.
You then deploy the example alerting rules for Alertmanager.
Prerequisites
- A running Kafka cluster.
- Check the example alerting rules provided with Streams for Apache Kafka.
Procedure
1. Check that monitoring for user-defined projects is enabled:

       oc get pods -n openshift-user-workload-monitoring

   If enabled, pods for the monitoring components are returned. For example:

       NAME                                   READY   STATUS    RESTARTS   AGE
       prometheus-operator-5cc59f9bc6-kgcq8   1/1     Running   0          25s
       prometheus-user-workload-0             5/5     Running   1          14s
       prometheus-user-workload-1             5/5     Running   1          14s
       thanos-ruler-user-workload-0           3/3     Running   0          14s
       thanos-ruler-user-workload-1           3/3     Running   0          14s

   If no pods are returned, monitoring for user-defined projects is disabled. See Section 25.2.3.1, “Prerequisites”.

2. Multiple PodMonitor resources are defined in examples/metrics/prometheus-install/strimzi-pod-monitor.yaml.

   For each PodMonitor resource, edit the spec.namespaceSelector.matchNames property:

       apiVersion: monitoring.coreos.com/v1
       kind: PodMonitor
       metadata:
         name: cluster-operator-metrics
         labels:
           app: strimzi
       spec:
         selector:
           matchLabels:
             strimzi.io/kind: cluster-operator
         namespaceSelector:
           matchNames:
             - <project-name> # (1)
         podMetricsEndpoints:
         - path: /metrics
           port: http
       # ...

   - (1) The project where the pods to scrape the metrics from are running, for example, Kafka.

3. Deploy the strimzi-pod-monitor.yaml file to the project where your Kafka cluster is running:

       oc apply -f strimzi-pod-monitor.yaml -n <my-project>

4. Deploy the example Prometheus rules to the same project:

       oc apply -f prometheus-rules.yaml -n <my-project>

25.2.3.3. Creating a service account for Grafana
A Grafana instance for Streams for Apache Kafka needs to run with a service account that is assigned the cluster-monitoring-view role.
Create a service account if you are using Grafana to present metrics for monitoring.
Prerequisites
Procedure
1. Create a ServiceAccount for Grafana in the project containing your Kafka cluster:

       oc create sa grafana-service-account -n my-project

   In this example, a service account named grafana-service-account is created in the my-project namespace.

2. Create a ClusterRoleBinding resource that assigns the cluster-monitoring-view role to the Grafana ServiceAccount. Here the resource is named grafana-cluster-monitoring-binding.

       apiVersion: rbac.authorization.k8s.io/v1
       kind: ClusterRoleBinding
       metadata:
         name: grafana-cluster-monitoring-binding
         labels:
           app: strimzi
       subjects:
         - kind: ServiceAccount
           name: grafana-service-account
           namespace: my-project
       roleRef:
         kind: ClusterRole
         name: cluster-monitoring-view
         apiGroup: rbac.authorization.k8s.io

3. Deploy the ClusterRoleBinding to the same project:

       oc apply -f grafana-cluster-monitoring-binding.yaml -n my-project

4. Create a token secret for the service account:

       apiVersion: v1
       kind: Secret
       metadata:
         name: secret-sa
         annotations:
           kubernetes.io/service-account.name: "grafana-service-account"
       type: kubernetes.io/service-account-token

5. Create the Secret object and access token:

       oc create -f <secret_configuration>.yaml

   You need the access token when deploying Grafana.

25.2.3.4. Deploying Grafana with a Prometheus datasource
Deploy Grafana to present Prometheus metrics. A Grafana application requires configuration for the OpenShift Container Platform monitoring stack.
OpenShift Container Platform includes a Thanos Querier instance in the openshift-monitoring project. Thanos Querier is used to aggregate platform metrics.
To consume the required platform metrics, your Grafana instance requires a Prometheus data source that can connect to Thanos Querier. To configure this connection, you create a config map that authenticates, by using a token, to the oauth-proxy sidecar that runs alongside Thanos Querier. A datasource.yaml file is used as the source of the config map.
Finally, you deploy the Grafana application with the config map mounted as a volume to the project containing your Kafka cluster.
Prerequisites
Procedure
1. Get the access token of the Grafana ServiceAccount:

       oc describe sa/grafana-service-account | grep Tokens:
       oc describe secret grafana-service-account-token-mmlp9 | grep token:

   In this example, the service account is named grafana-service-account. Copy the access token to use in the next step.

2. Create a datasource.yaml file containing the Thanos Querier configuration for Grafana.

   Paste the access token into the httpHeaderValue1 property as indicated.

       apiVersion: 1

       datasources:
       - name: Prometheus
         type: prometheus
         url: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
         access: proxy
         basicAuth: false
         withCredentials: false
         isDefault: true
         jsonData:
           timeInterval: 5s
           tlsSkipVerify: true
           httpHeaderName1: "Authorization"
         secureJsonData:
           httpHeaderValue1: "Bearer ${GRAFANA-ACCESS-TOKEN}" # (1)
         editable: true

   - (1) GRAFANA-ACCESS-TOKEN: The value of the access token for the Grafana ServiceAccount.

3. Create a config map named grafana-config from the datasource.yaml file:

       oc create configmap grafana-config --from-file=datasource.yaml -n my-project

   In this example, a config map named grafana-config is created in the my-project namespace.

4. Create a Grafana application consisting of a Deployment and a Service.

   The grafana-config config map is mounted as a volume for the datasource configuration.

       apiVersion: apps/v1
       kind: Deployment
       metadata:
         name: grafana
         labels:
           app: strimzi
       spec:
         replicas: 1
         selector:
           matchLabels:
             name: grafana
         template:
           metadata:
             labels:
               name: grafana
           spec:
             serviceAccountName: grafana-service-account
             containers:
             - name: grafana
               image: grafana/grafana:12.3.0
               ports:
               - name: grafana
                 containerPort: 3000
                 protocol: TCP
               volumeMounts:
               - name: grafana-data
                 mountPath: /var/lib/grafana
               - name: grafana-logs
                 mountPath: /var/log/grafana
               - name: grafana-config
                 mountPath: /etc/grafana/provisioning/datasources/datasource.yaml
                 readOnly: true
                 subPath: datasource.yaml
               readinessProbe:
                 httpGet:
                   path: /api/health
                   port: 3000
                 initialDelaySeconds: 5
                 periodSeconds: 10
               livenessProbe:
                 httpGet:
                   path: /api/health
                   port: 3000
                 initialDelaySeconds: 15
                 periodSeconds: 20
             volumes:
             - name: grafana-data
               emptyDir: {}
             - name: grafana-logs
               emptyDir: {}
             - name: grafana-config
               configMap:
                 name: grafana-config
       ---
       apiVersion: v1
       kind: Service
       metadata:
         name: grafana
         labels:
           app: strimzi
       spec:
         ports:
         - name: grafana
           port: 3000
           targetPort: 3000
           protocol: TCP
         selector:
           name: grafana
         type: ClusterIP

5. Deploy the Grafana application to the project containing your Kafka cluster:

       oc apply -f <grafana-application> -n my-project

25.2.3.5. Creating a route to the Grafana Service
You can access the Grafana user interface through a Route that exposes the Grafana service.
Prerequisites
Procedure
Create an edge route to the grafana service:

    oc create route edge <my-grafana-route> --service=grafana --namespace=KAFKA-NAMESPACE

25.2.3.6. Importing the example Grafana dashboards
Use Grafana to provide visualizations of Prometheus metrics on customizable dashboards.
Streams for Apache Kafka provides example dashboard configuration files for Grafana in JSON format.
- examples/metrics/grafana-dashboards
This procedure uses the example Grafana dashboards.
The example dashboards are a good starting point for monitoring key metrics, but they don’t show all the metrics supported by Kafka. You can modify the example dashboards or add other metrics, depending on your infrastructure.
Prerequisites
Procedure
1. Get the details of the Route to the Grafana Service. For example:

       oc get routes

       NAME               HOST/PORT                          PATH  SERVICES
       MY-GRAFANA-ROUTE   MY-GRAFANA-ROUTE-amq-streams.net         grafana

2. In a web browser, access the Grafana login screen using the URL for the Route host and port.
3. Enter your user name and password, and then click Log In.

   The default Grafana user name and password are both admin. After logging in for the first time, you can change the password.

4. In Configuration > Data Sources, check that the Prometheus data source was created. The data source was created in Section 25.2.3.4, “Deploying Grafana with a Prometheus datasource”.
5. Click the + icon and then click Import.
6. In examples/metrics/grafana-dashboards, copy the JSON of the dashboard to import.
7. Paste the JSON into the text box, and then click Load.
8. Repeat steps 5-7 for the other example Grafana dashboards.

The imported Grafana dashboards are available to view from the Dashboards home page.
Use kube-state-metrics (KSM) to monitor the state and readiness of custom resources managed by Streams for Apache Kafka. KSM is a scalable OpenShift native service that listens to the Kubernetes API server and exports metrics about the state of the objects. Monitoring through KSM allows you to track the health of Streams for Apache Kafka components, identify resources that are not ready, and configure alerts for warning conditions.
You can use your own KSM deployment or deploy KSM using the example metrics configuration files provided by Streams for Apache Kafka. The example files include a configuration file for a KSM deployment:
- examples/metrics/kube-state-metrics/ksm.yaml
If you are using the example deployment file, update its namespace to match your own. In the following commands, replace myproject with the name of your namespace.
On Linux, use:
sed -E -i '/[[:space:]]*namespace: [a-zA-Z0-9-]*$/s/namespace:[[:space:]]*[a-zA-Z0-9-]*$/namespace: myproject/' examples/metrics/kube-state-metrics/ksm.yaml
On macOS, use:
sed -i '' -e '/[[:space:]]*namespace: [a-zA-Z0-9-]*$/s/namespace:[[:space:]]*[a-zA-Z0-9-]*$/namespace: myproject/' examples/metrics/kube-state-metrics/ksm.yaml
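Before editing the file in place, it can help to preview the substitution on sample input. The following sketch applies the same sed expression, without the -i option, to a few made-up manifest lines. The preview_namespace function, the sample lines, and the my-kafka namespace are assumptions for illustration.

```shell
# Hypothetical helper: read manifest lines on stdin and print them with any
# "namespace: <value>" entry rewritten to the namespace given as $1.
# Without -i, sed writes to stdout and leaves files untouched.
preview_namespace() {
  sed -E "/[[:space:]]*namespace: [a-zA-Z0-9-]*\$/s/namespace:[[:space:]]*[a-zA-Z0-9-]*\$/namespace: $1/"
}

# Sample input, not the real contents of ksm.yaml.
printf 'metadata:\n  name: strimzi-kube-state-metrics\n  namespace: example-ns\n' \
  | preview_namespace my-kafka
# prints:
# metadata:
#   name: strimzi-kube-state-metrics
#   namespace: my-kafka
```

Only lines that end in a namespace value are changed, so keys such as name are left as they are.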
Streams for Apache Kafka also provides an example configuration ConfigMap for KSM:
- examples/metrics/kube-state-metrics/configmap.yaml
This procedure uses the example KSM deployment and configuration file.
Prerequisites
Procedure
1. Deploy KSM:

       oc apply -f configmap.yaml
       oc apply -f ksm.yaml

2. Verify that Prometheus is scraping KSM metrics.

   Metrics are scraped from the strimzi-kube-state-metrics service using the ServiceMonitor configured for the KSM deployment.

3. For alerting on these metrics, check the provided PrometheusRule resources:
   - examples/metrics/kube-state-metrics/prometheus-rules.yaml
25.4. Consumer lag monitoring
Kafka Exporter is an open source project that enhances the monitoring of Apache Kafka brokers and clients. Kafka Exporter extracts additional metrics data from Kafka brokers related to consumer groups, consumer lag, topic offsets, and partitions. The metrics are exposed in Prometheus format and can be collected by Prometheus, then visualized in Grafana.
Kafka Exporter relies on data from the __consumer_offsets topic to report consumer lag metrics. This topic only contains information if consumer groups are actively committing offsets. Consumer groups must therefore be in use for Kafka Exporter to function correctly.
Consumer lag indicates the difference in the rate of production and consumption of messages. Specifically, consumer lag for a given consumer group indicates the delay between the last message in the partition and the message being currently picked up by that consumer.
The lag reflects the position of the consumer offset in relation to the end of the partition log.
Consumer lag between the producer and consumer offset
This difference is sometimes referred to as the delta between the producer offset and consumer offset: the read and write positions in the Kafka broker topic partitions.
Suppose a topic streams 100 messages a second. A lag of 1000 messages between the producer offset (the topic partition head) and the last offset the consumer has read means a 10-second delay.
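The arithmetic can be sketched in shell as follows. The lag_messages and lag_seconds functions and the offset values 51000 and 50000 are made up for this illustration; only the rate of 100 messages a second and the lag of 1000 messages come from the example above.

```shell
# lag_messages END_OFFSET CONSUMER_OFFSET
# Number of messages written to the partition but not yet read.
lag_messages() {
  echo $(( $1 - $2 ))
}

# lag_seconds LAG RATE
# Approximate delay in whole seconds for a steady message rate
# (integer division, so fractions of a second are dropped).
lag_seconds() {
  echo $(( $1 / $2 ))
}

lag=$(lag_messages 51000 50000)
echo "lag: ${lag} messages, delay: $(lag_seconds "$lag" 100) seconds"
# prints: lag: 1000 messages, delay: 10 seconds
```

The same lag in messages translates into a longer delay on a slower topic, which is why lag is usually read together with throughput.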
Why monitor consumer lag?
For applications that rely on the processing of (near) real-time data, it is critical to monitor consumer lag to check that it does not become too big. The greater the lag becomes, the further the process moves from the real-time processing objective.
Consumer lag might result, for example, from consuming too much old data that has not been purged, or from unplanned shutdowns.
Reducing consumer lag
Use the Grafana charts to analyze lag and to check if actions to reduce lag are having an impact on an affected consumer group. If, for example, Kafka brokers are adjusted to reduce lag, the dashboard will show the Lag by consumer group chart going down and the Messages consumed per minute chart going up.
Typical actions to reduce lag include:
- Scaling up consumer groups by adding new consumers
- Increasing the retention time for a message to remain in a topic
- Adding more disk capacity to increase the message buffer
Actions to reduce consumer lag depend on the underlying infrastructure and the use cases Streams for Apache Kafka is supporting. For instance, a lagging consumer is less likely to benefit from the broker being able to service a fetch request from its disk cache. And in certain cases, it might be acceptable to automatically drop messages until a consumer has caught up.
25.4.1. Deploying Kafka Exporter
To monitor consumer lag in your Kafka cluster, configure Kafka Exporter in the Kafka custom resource. Kafka Exporter exposes lag data as Prometheus metrics, which can be visualized in Grafana. A Grafana dashboard for Kafka Exporter is included in the example Grafana dashboards provided by Streams for Apache Kafka.
Kafka Exporter provides only metrics related to consumer groups and lag. To collect general Kafka metrics, configure metrics on the Kafka brokers. For more information, see Section 25.2, “Using Prometheus with Streams for Apache Kafka”.
Prerequisites
- Consumer groups must be in use. Kafka Exporter relies on data from the __consumer_offsets topic to report consumer lag metrics. This topic only contains information if consumer groups are actively committing offsets.
Procedure
1. Add kafkaExporter configuration to the spec section of the Kafka resource.

   Example configuration for deploying Kafka Exporter

   apiVersion: kafka.strimzi.io/v1beta2
   kind: Kafka
   metadata:
     name: my-cluster
   spec:
     # ...
     kafkaExporter:
       # Collection filters (recommended)
       groupRegex: ".*" # (1)
       topicRegex: ".*" # (2)
       groupExcludeRegex: "^excluded-.*" # (3)
       topicExcludeRegex: "^excluded-.*" # (4)
       # Resources requests and limits (recommended)
       resources: # (5)
         requests:
           cpu: 200m
           memory: 64Mi
         limits:
           cpu: 500m
           memory: 128Mi
       # Metrics for all consumers (optional)
       showAllOffsets: false # (6)
       # Logging configuration (optional)
       logging: debug # (7)
       # Sarama logging (optional)
       enableSaramaLogging: true # (8)
       # Readiness probe (optional)
       readinessProbe: # (9)
         initialDelaySeconds: 15
         timeoutSeconds: 5
       # Liveness probe (optional)
       livenessProbe: # (10)
         initialDelaySeconds: 15
         timeoutSeconds: 5
       # Pod template (optional)
       template: # (11)
         pod:
           metadata:
             labels:
               label1: value1
           imagePullSecrets:
             - name: my-docker-credentials
           securityContext:
             runAsUser: 1000001
             fsGroup: 0
           terminationGracePeriodSeconds: 120
       # Custom image (optional)
       image: my-registry.io/my-org/my-exporter-cluster:latest # (12)
     # ...

   (1) Regular expression to specify consumer groups to include in metrics.
   (2) Regular expression to specify topics to include in metrics.
   (3) Regular expression to specify consumer groups to exclude from metrics.
   (4) Regular expression to specify topics to exclude from metrics.
   (5) CPU and memory resources to reserve.
   (6) By default, metrics are collected for all consumers regardless of connection status. Setting showAllOffsets to false stops collecting metrics for disconnected consumers.
   (7) Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above.
   (8) Boolean to enable Sarama logging, which provides detailed logs from the Go client library used by Kafka Exporter. Useful for debugging Kafka client interactions.
   (9) Readiness probe to check when Kafka Exporter is ready to serve metrics.
   (10) Liveness probe to detect and restart Kafka Exporter if it becomes unresponsive.
   (11) Template customization. Here a pod is scheduled with additional security attributes.
   (12) ADVANCED OPTION: Container image configuration, which is recommended only in special situations.

2. Apply the changes to the Kafka configuration.

   Resources, including a Service and Pod, are created for the Kafka Exporter with the naming convention <kafka_cluster_name>-kafka-exporter.

3. Configure Prometheus to scrape metrics from the Kafka Exporter endpoint.

   If you are using the example Prometheus deployment, it is already set up to discover and scrape Kafka Exporter metrics. The PodMonitor resource named kafka-resources-metrics matches the strimzi.io/kind: Kafka label used to identify the Kafka Exporter. For more information, see Section 25.2.3.2, “Deploying the Prometheus resources”.

4. Import the Kafka Exporter dashboard into Grafana to visualize consumer lag.

   For more information, see Section 25.2.3.6, “Importing the example Grafana dashboards”.

Tip: Use the Lag by consumer group and Messages consumed per second panels to evaluate lag and the impact of tuning actions.
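Once Prometheus is scraping Kafka Exporter, the same lag data can also drive an alert. The following is a minimal sketch rather than a rule shipped with the product. It assumes that the Prometheus Operator PrometheusRule resource is available, that Kafka Exporter exposes lag as kafka_consumergroup_lag with consumergroup and topic labels, and that your Prometheus instance selects rules by the labels shown. The resource name, alert name, threshold, and duration are hypothetical values to adapt.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kafka-exporter-lag-rules   # hypothetical name
  labels:
    # Assumed labels: they must match the ruleSelector of your Prometheus resource
    role: alert-rules
    app: strimzi
spec:
  groups:
    - name: kafka-exporter
      rules:
        - alert: ConsumerGroupLagTooHigh   # hypothetical alert name
          # Total lag per consumer group and topic, summed over partitions.
          # The threshold of 1000 messages is illustrative only.
          expr: sum by (consumergroup, topic) (kafka_consumergroup_lag) > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Consumer group lag is high"
            description: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}."
```

A suitable threshold depends on message rates and on how much delay the consuming application can tolerate, so derive it from the values you observe in the dashboard panels.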
25.5. Cruise Control operations monitoring
Cruise Control monitors Kafka brokers in order to track the utilization of brokers, topics, and partitions. Cruise Control also provides a set of metrics for monitoring its own performance.
The Cruise Control metrics reporter collects raw metrics data from Kafka brokers. The data is produced to topics that are automatically created by Cruise Control. The metrics are used to generate optimization proposals for Kafka clusters.
Cruise Control metrics are available for real-time monitoring of Cruise Control operations. For example, you can use Cruise Control metrics to monitor the status of rebalancing operations that are running or provide alerts on any anomalies that are detected in an operation’s performance.
For more information on exposing Cruise Control metrics, see Using Prometheus with Strimzi.
For a full list of available Cruise Control metrics, which are known as sensors, see the Cruise Control documentation.
25.5.1. Monitoring balancedness scores
Cruise Control metrics include a balancedness score. Balancedness is the measure of how evenly a workload is distributed in a Kafka cluster.
The Cruise Control metric for balancedness score (balancedness-score) might differ from the balancedness score in the KafkaRebalance resource. Cruise Control calculates the metric using anomaly.detection.goals, which might not be the same as the default.goals used in the KafkaRebalance resource. The anomaly.detection.goals are specified in the spec.cruiseControl.config of the Kafka custom resource.
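As an illustration of where this setting lives, the following sketch sets anomaly.detection.goals in the Kafka resource and leaves default.goals unset, so the two goal lists can differ. The choice of goals is an assumption made for the example, not a recommendation. Check the goal names against the goals supported by your Cruise Control version.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  # ...
  cruiseControl:
    config:
      # Goals used to compute the balancedness-score metric.
      # Illustrative selection only. default.goals is not set here,
      # so it keeps its default value and can differ from this list.
      anomaly.detection.goals: >
        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal
  # ...
```

With a configuration like this, a difference between the metric and the score reported in a KafkaRebalance resource reflects the different goal lists and does not indicate a fault.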
Refreshing the KafkaRebalance resource fetches an optimization proposal. The latest cached optimization proposal is fetched if one of the following conditions applies:
- KafkaRebalance goals match the goals configured in the default.goals section of the Kafka resource
- KafkaRebalance goals are not specified
Otherwise, Cruise Control generates a new optimization proposal based on KafkaRebalance goals. If new proposals are generated with each refresh, this can impact performance monitoring.
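The two cases can be sketched with a pair of KafkaRebalance resources. The resource names and the goals listed are hypothetical. The comments restate the conditions described above and do not add any behavior.

```yaml
# Case 1: no goals specified. On refresh, the latest cached
# optimization proposal is fetched.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance-default   # hypothetical name
  labels:
    strimzi.io/cluster: my-cluster
spec: {}
---
# Case 2: goals specified. Unless this list matches default.goals,
# Cruise Control generates a new proposal on each refresh.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-rebalance-custom    # hypothetical name
  labels:
    strimzi.io/cluster: my-cluster
spec:
  goals:
    # Illustrative goal selection
    - RackAwareGoal
    - ReplicaCapacityGoal
```

If you refresh a resource like the second one frequently, take the extra proposal generation into account when you interpret Cruise Control performance metrics.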
25.5.2. Setting up alerts for anomaly detection
Cruise Control’s anomaly detector provides metrics data for conditions that block the generation of optimization goals, such as broker failures. If you want more visibility, you can use the metrics provided by the anomaly detector to set up alerts and send out notifications. You can set up Cruise Control’s anomaly notifier to route alerts based on these metrics through a specified notification channel. Alternatively, you can set up Prometheus to scrape the metrics data provided by the anomaly detector and generate alerts. Prometheus Alertmanager can then route the alerts generated by Prometheus.
The Cruise Control configurations documentation provides information on AnomalyDetector metrics and the anomaly notifier.
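For the Prometheus route, an alerting rule on an anomaly detector metric might look like the following sketch. It assumes the Prometheus Operator PrometheusRule resource and assumes that the balancedness-score sensor is exposed to Prometheus as kafka_cruisecontrol_anomalydetector_balancedness_score. The actual metric name depends on the JMX exporter mapping in your Cruise Control metrics configuration, so confirm it in Prometheus before relying on the rule. The resource name, alert name, threshold, and duration are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cruise-control-anomaly-rules   # hypothetical name
  labels:
    # Assumed labels: they must match the ruleSelector of your Prometheus resource
    role: alert-rules
    app: strimzi
spec:
  groups:
    - name: cruise-control
      rules:
        - alert: LowBalancednessScore   # hypothetical alert name
          # Assumed metric name. The threshold of 80 is illustrative only.
          expr: kafka_cruisecontrol_anomalydetector_balancedness_score < 80
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Cruise Control balancedness score is low"
            description: "The balancedness score has stayed below 80 for 10 minutes."
```

Rules for other anomaly detector sensors follow the same pattern once you have confirmed the exported metric names. Alertmanager then routes the resulting alerts according to its own configuration.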