Chapter 16. Kafka Exporter
Kafka Exporter is an open source project to enhance monitoring of Apache Kafka brokers and clients.
Kafka Exporter is provided with AMQ Streams for deployment with a Kafka cluster to extract additional metrics data from Kafka brokers related to offsets, consumer groups, consumer lag, and topics.
The metrics data is used, for example, to help identify slow consumers.
Lag data is exposed as Prometheus metrics, which can then be presented in Grafana for analysis.
If you are already using Prometheus and Grafana for monitoring of built-in Kafka metrics, you can configure Prometheus to also scrape the Kafka Exporter Prometheus endpoint.
Additional resources
Kafka exposes metrics through JMX, which can then be exported as Prometheus metrics.
16.1. Consumer lag
Consumer lag indicates the difference in the rate of production and consumption of messages. Specifically, consumer lag for a given consumer group indicates the delay between the last message in the partition and the message being currently picked up by that consumer. The lag reflects the position of the consumer offset in relation to the end of the partition log.
This difference is sometimes referred to as the delta between the producer offset and the consumer offset: the write and read positions in the Kafka broker topic partitions.
Suppose a topic streams 100 messages a second. A lag of 1000 messages between the producer offset (the topic partition head) and the last offset the consumer has read means a 10-second delay.
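The arithmetic in this example can be sketched in Python. This is a minimal illustration, assuming you already have the partition head offset and the consumer group's offset for that partition. The function names are hypothetical and are not part of Kafka Exporter, and the delay estimate assumes a steady production rate.

```python
def partition_lag(head_offset: int, consumer_offset: int) -> int:
    """Messages between the partition head (producer offset) and the consumer offset.

    Hypothetical helper for illustration only.
    """
    # The two offsets can be observed at slightly different moments, so clamp at zero.
    return max(head_offset - consumer_offset, 0)


def delay_seconds(lag: int, messages_per_second: float) -> float:
    """Approximate time the consumer is behind, assuming a steady message rate."""
    if messages_per_second <= 0:
        raise ValueError("messages_per_second must be positive")
    return lag / messages_per_second


if __name__ == "__main__":
    lag = partition_lag(head_offset=51_000, consumer_offset=50_000)
    print(lag, delay_seconds(lag, messages_per_second=100))  # 1000 10.0
```

With 100 messages a second, a lag of 1000 messages gives the 10-second delay described above.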
The importance of monitoring consumer lag
For applications that rely on the processing of (near) real-time data, it is critical to monitor consumer lag to check that it does not become too big. The greater the lag becomes, the further the process moves from the real-time processing objective.
Consumer lag might be the result of, for example, consuming too much old data that has not been purged, or of unplanned shutdowns.
Reducing consumer lag
Typical actions to reduce lag include:
- Scaling up consumer groups by adding new consumers
- Increasing the retention time for a message to remain in a topic
- Adding more disk capacity to increase the message buffer
Actions to reduce consumer lag depend on the underlying infrastructure and the use cases AMQ Streams is supporting. For instance, a lagging consumer is less likely to benefit from the broker being able to service a fetch request from its disk cache. In certain cases, it might be acceptable to automatically drop messages until a consumer has caught up.
16.2. Kafka Exporter alerting rule examples
The sample alert notification rules specific to Kafka Exporter are as follows:
UnderReplicatedPartition
- An alert to warn that a topic is under-replicated and the broker is not replicating enough partitions. The default configuration is for an alert if there are one or more under-replicated partitions for a topic. The alert might signify that a Kafka instance is down or the Kafka cluster is overloaded. A planned restart of the Kafka broker may be required to restart the replication process.
TooLargeConsumerGroupLag
- An alert to warn that the lag on a consumer group is too large for a specific topic partition. The default configuration is 1000 records. A large lag might indicate that consumers are too slow and are falling behind the producers.
NoMessageForTooLong
- An alert to warn that a topic has not received messages for a period of time. The default configuration for the time period is 10 minutes. The delay might be a result of a configuration issue preventing a producer from publishing messages to the topic.
You can adapt alerting rules according to your specific needs.
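To make the default thresholds concrete, the following Python sketch evaluates the three conditions against an in-memory snapshot. It is not how the sample rules work (they are Prometheus alerting rules evaluated by Prometheus). The PartitionState structure, the seconds_since_last_message field, the strict greater-than comparison for lag, and treating a topic as idle only when all of its partitions are idle are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Default thresholds described for the sample rules.
LAG_THRESHOLD = 1000          # TooLargeConsumerGroupLag: records (strict ">" is an assumption)
IDLE_THRESHOLD_SECONDS = 600  # NoMessageForTooLong: 10 minutes


@dataclass
class PartitionState:
    """Hypothetical snapshot of one topic partition, assembled from exporter metrics."""
    topic: str
    partition: int
    under_replicated: bool
    seconds_since_last_message: float
    group_lag: Dict[str, int] = field(default_factory=dict)  # consumer group -> lag


def evaluate_alerts(states: List[PartitionState]) -> List[Tuple[str, str]]:
    """Return (alert name, subject) pairs for the conditions that currently hold."""
    alerts: List[Tuple[str, str]] = []
    under_replicated: Dict[str, int] = defaultdict(int)
    idle_for: Dict[str, float] = {}
    for s in states:
        if s.under_replicated:
            under_replicated[s.topic] += 1
        # Assumption: a topic is idle only if its most recently written partition is idle.
        idle_for[s.topic] = min(idle_for.get(s.topic, float("inf")),
                                s.seconds_since_last_message)
        for group, lag in s.group_lag.items():
            if lag > LAG_THRESHOLD:
                alerts.append(("TooLargeConsumerGroupLag",
                               f"{group} on {s.topic}/{s.partition}"))
    # UnderReplicatedPartition: one or more under-replicated partitions for a topic.
    for topic in under_replicated:
        alerts.append(("UnderReplicatedPartition", topic))
    for topic, idle in idle_for.items():
        if idle >= IDLE_THRESHOLD_SECONDS:
            alerts.append(("NoMessageForTooLong", topic))
    return alerts


if __name__ == "__main__":
    snapshot = [
        PartitionState("orders", 0, False, 5, {"billing": 1500}),
        PartitionState("orders", 1, True, 700, {"billing": 10}),
        PartitionState("audit", 0, False, 900),
    ]
    for name, subject in evaluate_alerts(snapshot):
        print(name, subject)
```

In this snapshot, the billing group's lag on orders partition 0 exceeds 1000 records, orders has an under-replicated partition, and audit has received no messages for 15 minutes, so each of the three alerts fires once.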
Additional resources
For more information about setting up alerting rules, see Configuration in the Prometheus documentation.
16.3. Kafka Exporter metrics
Lag information is exposed by Kafka Exporter as Prometheus metrics for presentation in Grafana.
Kafka Exporter exposes metrics data for brokers, topics, and consumer groups.
Broker metrics

| Name | Information |
|---|---|
| | Number of brokers in the Kafka cluster |

Topic metrics

| Name | Information |
|---|---|
| | Number of partitions for a topic |
| | Current topic partition offset for a broker |
| | Oldest topic partition offset for a broker |
| | Number of in-sync replicas for a topic partition |
| | Leader broker ID of a topic partition |
| | Shows |
| | Number of replicas for this topic partition |
| | Shows |

Consumer group metrics

| Name | Information |
|---|---|
| | Current topic partition offset for a consumer group |
| | Current approximate lag for a consumer group at a topic partition |
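The metrics are served in the Prometheus text exposition format, so they can also be read directly, for example to cross-check a Grafana panel. The following sketch totals the approximate lag per consumer group. It assumes the lag metric is named kafka_consumergroup_lag and carries a consumergroup label, as in the upstream Kafka Exporter project; confirm both names against the output of your own metrics endpoint. The parser is deliberately simplified and does not cover every case allowed by the exposition format.

```python
import re
from collections import defaultdict
from typing import Dict

# Assumed metric and label names (upstream Kafka Exporter naming);
# check them against your own /metrics output before relying on this.
LAG_METRIC = "kafka_consumergroup_lag"
GROUP_LABEL = "consumergroup"

# Simplified sample line: name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def lag_by_group(exposition: str) -> Dict[str, float]:
    """Sum the per-partition lag samples for each consumer group."""
    totals: Dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != LAG_METRIC:
            continue  # not a lag sample
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(GROUP_LABEL, "")] += float(match.group("value"))
    return dict(totals)
```

To use it against a running exporter, pass it the body of an HTTP GET request to the metrics endpoint, for example http://localhost:9308/metrics when the default web.listen-address and web.telemetry-path values are used.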
16.4. Running Kafka Exporter
Kafka Exporter is provided with the download archive used for Installing AMQ Streams.
You can run it to expose Prometheus metrics for presentation in a Grafana dashboard.
Prerequisites
This procedure assumes that you already have access to a Grafana user interface, and that Prometheus is deployed and added as a data source.
Procedure
Run the Kafka Exporter script using appropriate configuration parameter values.
./bin/kafka_exporter --kafka.server=<kafka-bootstrap-address>:9092 --kafka.version=2.8.0 --<my-other-parameters>
The parameters require a double-hyphen convention, such as --kafka.server.

Table 16.4. Kafka Exporter configuration parameters

| Option | Description | Default |
|---|---|---|
| kafka.server | Host/port address of the Kafka server. | kafka:9092 |
| kafka.version | Kafka broker version. | 1.0.0 |
| group.filter | A regular expression to specify the consumer groups to include in the metrics. | .* (all) |
| topic.filter | A regular expression to specify the topics to include in the metrics. | .* (all) |
| sasl.<parameter> | Parameters to enable and connect to the Kafka cluster using SASL/PLAIN authentication, with user name and password. | false |
| tls.<parameter> | Parameters to enable and connect to the Kafka cluster using TLS authentication, with optional certificate and key. | false |
| web.listen-address | Listen address (host and port) on which to expose the metrics. | :9308 |
| web.telemetry-path | Path for the exposed metrics. | /metrics |
| log.level | Logging configuration, to log messages with a given severity (debug, info, warn, error, fatal) or above. | info |
| log.enable-sarama | Boolean to enable Sarama logging. Sarama is the Go client library used by Kafka Exporter. | false |
| legacy.partitions | Boolean to enable metrics to be fetched from inactive topic partitions as well as from active partitions. If you want Kafka Exporter to return metrics for inactive partitions, set to true. | false |
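When you script the exporter start-up, it can help to build the command from a set of option values. The following is a minimal sketch, assuming the ./bin/kafka_exporter path from the example command; the build_exporter_command helper is hypothetical. It applies the double-hyphen --name=value convention and omits value options that match the defaults in Table 16.4. Boolean options such as log.enable-sarama are left out because the sketch makes no assumption about how the exporter parses boolean values.

```python
import shlex
from typing import Dict, List

# Default values from Table 16.4 (value options only).
DEFAULTS: Dict[str, str] = {
    "kafka.server": "kafka:9092",
    "kafka.version": "1.0.0",
    "group.filter": ".*",
    "topic.filter": ".*",
    "web.listen-address": ":9308",
    "web.telemetry-path": "/metrics",
    "log.level": "info",
}


def build_exporter_command(options: Dict[str, str],
                           binary: str = "./bin/kafka_exporter") -> List[str]:
    """Return an argv list such as ['./bin/kafka_exporter', '--kafka.server=...']."""
    argv = [binary]
    for name, value in options.items():
        if DEFAULTS.get(name) == value:
            continue  # the exporter already uses this value, so the flag is redundant
        argv.append(f"--{name}={value}")  # double-hyphen convention
    return argv


if __name__ == "__main__":
    cmd = build_exporter_command({
        "kafka.server": "my-cluster-kafka-bootstrap:9092",  # hypothetical bootstrap address
        "kafka.version": "2.8.0",
        "group.filter": "^billing-.*",                      # hypothetical consumer group prefix
        "log.level": "info",                                # default value, so not emitted
    })
    # Print a shell-quoted command line; the argv list can also be passed to subprocess.run().
    print(shlex.join(cmd))
```

Skipping default values keeps the command short; passing them explicitly would also work and documents the intended configuration more visibly. shlex.join requires Python 3.8 or later.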
You can use kafka_exporter --help for information on the properties.
Configure Prometheus to monitor the Kafka Exporter metrics.
For more information on configuring Prometheus, see the Prometheus documentation.
Enable Grafana to present the Kafka Exporter metrics data exposed by Prometheus.
For more information, see Presenting Kafka Exporter metrics in Grafana.
16.5. Presenting Kafka Exporter metrics in Grafana
Using Kafka Exporter Prometheus metrics as a data source, you can create a dashboard of Grafana charts.
For example, from the metrics you can create the following Grafana charts:
- Messages in per second (from topics)
- Messages in per minute (from topics)
- Lag by consumer group
- Messages consumed per minute (by consumer groups)
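Message-rate charts like these can be derived from how far offsets move between two samples: partition head offsets for messages in, and a consumer group's offsets for messages consumed. The sketch below shows that arithmetic for a single topic or consumer group. The offset_rate helper and its input shape (partition number mapped to offset) are hypothetical, and in a real dashboard the calculation would normally be expressed in the Grafana panel's Prometheus query rather than in Python.

```python
from typing import Mapping


def offset_rate(before: Mapping[int, int], after: Mapping[int, int],
                interval_seconds: float, per: float = 1.0) -> float:
    """Messages per `per` seconds, summed over partitions, between two offset samples.

    `before` and `after` map partition number -> offset. Use head offsets for
    messages in, or a consumer group's offsets for messages consumed.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    moved = 0
    for partition, offset in after.items():
        # A partition missing from the first sample contributes nothing.
        previous = before.get(partition, offset)
        moved += max(offset - previous, 0)
    return moved / interval_seconds * per


if __name__ == "__main__":
    before = {0: 1_000, 1: 2_000}
    after = {0: 1_600, 1: 2_600}
    print(offset_rate(before, after, 60))          # messages per second
    print(offset_rate(before, after, 60, per=60))  # messages per minute
```

Here 1200 messages arrive across two partitions in 60 seconds, which is 20 messages per second or 1200 messages per minute.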
When metrics data has been collected for some time, the Kafka Exporter charts are populated.
Use the Grafana charts to analyze lag and to check if actions to reduce lag are having an impact on an affected consumer group. If, for example, Kafka brokers are adjusted to reduce lag, the dashboard will show the Lag by consumer group chart going down and the Messages consumed per minute chart going up.