Chapter 18. Monitoring your cluster using JMX
ZooKeeper, the Kafka broker, Kafka Connect, and the Kafka clients all expose management information using Java Management Extensions (JMX). Most management information is in the form of metrics that are useful for monitoring the condition and performance of your Kafka cluster. Like other Java applications, Kafka provides this management information through managed beans or MBeans.
JMX works at the level of the JVM (Java Virtual Machine). To obtain management information, external tools can connect to the JVM that is running ZooKeeper, the Kafka broker, and so on. By default, only tools on the same machine and running as the same user as the JVM are able to connect.
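Inside a JVM, the same management information is available through the platform MBean server, and JMX tools select MBeans with ObjectName patterns. The following minimal Java sketch lists the MBeans that match a pattern. The class name `ListMBeans` is illustrative. The sketch sees only its own JVM, so in a plain JVM only the standard `java.lang` MBeans match. The same kind of query, run over a connection to a broker or client JVM, would match patterns such as `kafka.server:*` or `kafka.producer:*`.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class ListMBeans {
    // Returns the canonical names of the MBeans registered in this JVM's
    // platform MBean server that match an ObjectName pattern.
    public static Set<String> list(String pattern) {
        ObjectName query;
        try {
            query = new ObjectName(pattern);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad ObjectName pattern: " + pattern, e);
        }
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Set<String> names = new TreeSet<>();
        for (ObjectName name : server.queryNames(query, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Defaults to the standard java.lang MBeans, which every JVM registers.
        String pattern = args.length > 0 ? args[0] : "java.lang:*";
        list(pattern).forEach(System.out::println);
    }
}
```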
Management information for ZooKeeper is not documented here. You can view ZooKeeper metrics in JConsole. For more information, see Monitoring using JConsole.
18.1. JMX configuration options
You configure JMX using JVM system properties. The scripts provided with AMQ Streams (bin/kafka-server-start.sh, bin/connect-distributed.sh, and so on) use the KAFKA_JMX_OPTS environment variable to set these system properties. Kafka producer, consumer, and Streams applications typically start the JVM in a different way, but the system properties for configuring JMX are the same.
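The JMX settings are ordinary `-D` system properties. This means a producer or consumer application that you start directly with `java` accepts the same flags as the AMQ Streams scripts, for example `java -Dcom.sun.management.jmxremote.port=9999 -jar my-producer.jar`, where the port and JAR name are placeholders. A running JVM can report the flags it was started with through `RuntimeMXBean.getInputArguments()`. The sketch below filters those flags down to the JMX-related ones. The class and method names are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class JmxFlags {
    // Keeps only the JVM arguments that configure the JMX agent.
    public static List<String> jmxFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-Dcom.sun.management.jmxremote"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM (not the arguments passed to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(jmxFlags(jvmArgs));
    }
}
```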
18.2. Disabling the JMX agent
You can prevent local JMX tools from connecting to the JVM (for example, for compliance reasons) by disabling the JMX agent for an AMQ Streams component. The following procedure disables the JMX agent for a Kafka broker.
Procedure
- Use the KAFKA_JMX_OPTS environment variable to set com.sun.management.jmxremote to false.
    export KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote=false
- Start the JVM.
    bin/kafka-server-start.sh config/server.properties
18.3. Connecting to the JVM from a different machine
You can connect to the JVM from a different machine by configuring the port that the JMX agent listens on. The configuration in this procedure is insecure: it disables authentication and SSL, so any JMX tool that can reach the port can connect without credentials.
Procedure
- Use the KAFKA_JMX_OPTS environment variable to set -Dcom.sun.management.jmxremote.port=<port>. For <port>, enter the number of the port on which you want the Kafka broker to listen for JMX connections.
    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=<port> -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
- Start the JVM.
    bin/kafka-server-start.sh config/server.properties
It is recommended that you configure authentication and SSL to ensure that the remote JMX connection is secure. For more information about the system properties needed to do this, see the JMX documentation.
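Before you expose a broker, you might want to check a KAFKA_JMX_OPTS value for the insecure combination used in the procedure above. The following hypothetical helper does this. `JmxOptsCheck` is not part of AMQ Streams or Kafka. It parses the `-D` flags and warns when a remote port is set while authentication or SSL is explicitly disabled. It relies on the JDK agent defaults, under which `com.sun.management.jmxremote.authenticate` and `com.sun.management.jmxremote.ssl` are true when they are not set. It does not handle quoted values that contain spaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JmxOptsCheck {
    // Collects -Dkey=value tokens from a whitespace-separated options string.
    // Simplification: quoted values containing spaces are not supported.
    public static Map<String, String> parse(String opts) {
        Map<String, String> props = new HashMap<>();
        for (String token : opts.trim().split("\\s+")) {
            if (!token.startsWith("-D")) continue;
            String body = token.substring(2);
            int eq = body.indexOf('=');
            if (eq < 0) props.put(body, "");
            else props.put(body.substring(0, eq), body.substring(eq + 1));
        }
        return props;
    }

    // Returns warnings for a remote JMX configuration that turns security off.
    public static List<String> warnings(String opts) {
        Map<String, String> props = parse(opts);
        List<String> result = new ArrayList<>();
        if (!props.containsKey("com.sun.management.jmxremote.port")) {
            return result; // no remote port configured
        }
        // Absent properties default to true in the JDK agent, so only "false" is flagged.
        if ("false".equals(props.get("com.sun.management.jmxremote.authenticate"))) {
            result.add("authentication disabled");
        }
        if ("false".equals(props.get("com.sun.management.jmxremote.ssl"))) {
            result.add("SSL disabled");
        }
        return result;
    }

    public static void main(String[] args) {
        String opts = System.getenv().getOrDefault("KAFKA_JMX_OPTS", "");
        System.out.println(warnings(opts));
    }
}
```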
18.4. Monitoring using JConsole
The JConsole tool is distributed with the Java Development Kit (JDK). You can use JConsole to connect to a local or remote JVM and discover and display management information from Java applications. When you use JConsole to connect to a local JVM, the names of the JVM processes correspond to the AMQ Streams components.
AMQ Streams component | JVM process |
---|---|
ZooKeeper | |
Kafka broker | |
Kafka Connect standalone | |
Kafka Connect distributed | |
Kafka MirrorMaker 2.0 | |
Kafka MirrorMaker | |
Kafka Bridge | |
A Kafka producer, consumer, or Streams application | The name of the class containing the |
When using JConsole to connect to a remote JVM, use the appropriate hostname and JMX port.
Many other tools and monitoring products can be used to fetch the metrics using JMX and provide monitoring and alerting based on those metrics. Refer to the product documentation for those tools.
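Tools that connect to a remote JVM, such as JConsole, use a JMX service URL. For the JDK's built-in agent started with `-Dcom.sun.management.jmxremote.port=<port>`, that URL has the form `service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi`. The following Java sketch reads one attribute over such a connection using the standard `javax.management.remote` API. So that it runs without a broker, `demo()` exports the sketch's own JVM through an in-process RMI connector server. Against a broker, you would pass the broker host and JMX port instead and read, for example, the `Value` attribute of `kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions`. The class name is illustrative.

```java
import java.lang.management.ManagementFactory;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteJmxRead {
    // Service URL of the JDK's built-in agent when the JVM was started with
    // -Dcom.sun.management.jmxremote.port=<port>.
    public static JMXServiceURL agentUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Connects, reads a single attribute, and closes the connection again.
    public static Object read(JMXServiceURL url, String objectName, String attribute) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return conn.getAttribute(new ObjectName(objectName), attribute);
        }
    }

    // Self-contained demo: export this JVM's platform MBean server through an
    // RMI connector server and read from it as a remote client would. With an
    // empty URL path, the server encodes its stub in the address, so no RMI
    // registry is needed.
    public static Object demo() throws Exception {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1"); // loopback only
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://127.0.0.1"), null,
                ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            return read(server.getAddress(), "java.lang:type=Runtime", "VmName");
        } finally {
            server.stop();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            // Usage: java RemoteJmxRead <host> <port>
            System.out.println(read(agentUrl(args[0], Integer.parseInt(args[1])),
                    "java.lang:type=Runtime", "VmName"));
        } else {
            System.out.println(demo());
        }
    }
}
```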
18.5. Important Kafka broker metrics
Kafka provides many MBeans for monitoring the performance of the brokers in your Kafka cluster. These apply to an individual broker rather than the entire cluster.
The following tables present a selection of these broker-level MBeans organized into server, network, logging, and controller metrics.
18.5.1. Kafka server metrics
The following table shows a selection of metrics that report information about the Kafka server.
Metric | MBean | Description | Expected value |
---|---|---|---|
Messages in per second | | The rate at which individual messages are received by the broker. | Approximately the same as the other brokers in the cluster. |
Bytes in per second | | The rate at which data sent from producers is received by the broker. | Approximately the same as the other brokers in the cluster. |
Replication bytes in per second | | The rate at which data sent from other brokers is received by the follower broker. | N/A |
Bytes out per second | | The rate at which data is fetched and read from the broker by consumers. | N/A |
Replication bytes out per second | | The rate at which data is sent from the broker to other brokers. This metric is useful to monitor if the broker is a leader for a group of partitions. | N/A |
Under-replicated partitions | | The number of partitions that have not been fully replicated in the follower replicas. | Zero |
Under minimum ISR partition count | | The number of partitions under the minimum In-Sync Replica (ISR) count. The ISR is the set of replicas that are up-to-date with the leader. | Zero |
Partition count | | The number of partitions on the broker. | Approximately even when compared with the other brokers. |
Leader count | | The number of replicas for which this broker is the leader. | Approximately the same as the other brokers in the cluster. |
ISR shrinks per second | | The rate at which the number of ISRs in the broker decreases. | Zero |
ISR expands per second | | The rate at which the number of ISRs in the broker increases. | Zero |
Maximum lag | | The maximum lag, in messages, between the leader replicas and their follower replicas. | Proportional to the maximum batch size of a produce request. |
Requests in producer purgatory | | The number of send requests in the producer purgatory. | N/A |
Requests in fetch purgatory | | The number of fetch requests in the fetch purgatory. | N/A |
Request handler average idle percent | | Indicates the percentage of time that the request handler (IO) threads are not in use. | A lower value indicates that the workload of the broker is high. |
Request (Requests exempt from throttling) | | The number of requests that are exempt from throttling. | N/A |
ZooKeeper request latency in milliseconds | | The latency for ZooKeeper requests from the broker, in milliseconds. | N/A |
ZooKeeper session state | | The status of the broker’s connection to ZooKeeper. | CONNECTED |
18.5.2. Kafka network metrics
The following table shows a selection of metrics that report information about requests.
Metric | MBean | Description | Expected value |
---|---|---|---|
Requests per second | | The total number of requests made for the request type per second. The | N/A |
Request bytes (request size in bytes) | | The size of requests, in bytes, made for the request type identified by the | N/A |
Temporary memory size in bytes | | The amount of temporary memory used for converting message formats and decompressing messages. | N/A |
Message conversions time | | Time, in milliseconds, spent on converting message formats. | N/A |
Total request time in milliseconds | | Total time, in milliseconds, spent processing requests. | N/A |
Request queue time in milliseconds | | The time, in milliseconds, that a request currently spends in the queue for the request type given in the | N/A |
Local time (leader local processing time) in milliseconds | | The time taken, in milliseconds, for the leader to process the request. | N/A |
Remote time (leader remote processing time) in milliseconds | | The length of time, in milliseconds, that the request waits for the follower. Separate MBeans for all available request types are listed under the | N/A |
Response queue time in milliseconds | | The length of time, in milliseconds, that the request waits in the response queue. | N/A |
Response send time in milliseconds | | The time taken, in milliseconds, to send the response. | N/A |
Network processor average idle percent | | The average percentage of time that the network processors are idle. | Between zero and one. |
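The request metrics are registered once per request type, with the type in the `request` key of the ObjectName, for example `kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce`. A property-list pattern that ends in `,*` matches all request types at once. The following sketch collects one timer attribute, such as the `99thPercentile` of `TotalTimeMs`, for every request type. The stand-in timer in `demo()` models only two attributes. It exists so that the code runs without a broker, and the class names are illustrative.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class RequestTimes {
    // For one RequestMetrics name (for example "TotalTimeMs"), returns the given
    // timer attribute for every request type, keyed by the ObjectName's request key.
    public static Map<String, Double> byRequestType(MBeanServerConnection conn, String metric, String attribute) {
        Map<String, Double> result = new TreeMap<>();
        try {
            ObjectName pattern = new ObjectName("kafka.network:type=RequestMetrics,name=" + metric + ",*");
            for (ObjectName name : conn.queryNames(pattern, null)) {
                String request = name.getKeyProperty("request");
                if (request != null) {
                    result.put(request, ((Number) conn.getAttribute(name, attribute)).doubleValue());
                }
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    // Stand-in for a Yammer timer MBean; only two of its attributes are modelled.
    public interface TimerStandIn {
        double getMean();
        double get99thPercentile();
    }

    static TimerStandIn timer(double mean, double p99) {
        return new TimerStandIn() {
            public double getMean() { return mean; }
            public double get99thPercentile() { return p99; }
        };
    }

    public static Map<String, Double> demo() {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            server.registerMBean(new StandardMBean(timer(2.0, 15.0), TimerStandIn.class),
                    new ObjectName("kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce"));
            server.registerMBean(new StandardMBean(timer(1.0, 520.0), TimerStandIn.class),
                    new ObjectName("kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer"));
            return byRequestType(server, "TotalTimeMs", "99thPercentile");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```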
18.5.3. Kafka log metrics
The following table shows a selection of metrics that report information about logging.
Metric | MBean | Description | Expected value |
---|---|---|---|
Log flush rate and time in milliseconds | | The rate of log flushes to disk, and the time each flush takes, in milliseconds. | N/A |
Offline log directory count | | The number of offline log directories (for example, after a hardware failure). | Zero |
18.5.4. Kafka controller metrics
The following table shows a selection of metrics that report information about the controller of the cluster.
Metric | MBean | Description | Expected value |
---|---|---|---|
Active controller count | | Whether this broker is the active controller: 1 if it is, otherwise 0. | One indicates that the broker is the controller for the cluster. Across all brokers, the values should add up to exactly one. |
Leader election rate and time in milliseconds | | The rate at which new leader replicas are elected, and the time taken to elect them, in milliseconds. | Zero |
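Each broker reports only its own active controller count, so a cluster-level check has to combine the readings from every broker: exactly one broker should report 1. The following minimal sketch shows that aggregation. It leaves out how the per-broker values are collected, for example with one JMX connection per broker. The broker identifiers are hypothetical.

```java
import java.util.Map;

public class ControllerCheck {
    // Returns true when the ActiveControllerCount readings, one per broker,
    // add up to exactly one, meaning a single broker is the active controller.
    public static boolean exactlyOneController(Map<String, Integer> activeControllerCountByBroker) {
        int total = 0;
        for (int count : activeControllerCountByBroker.values()) {
            total += count;
        }
        return total == 1;
    }

    public static void main(String[] args) {
        // Hypothetical readings from a three-broker cluster.
        System.out.println(exactlyOneController(Map.of("broker-0", 0, "broker-1", 1, "broker-2", 0)));
    }
}
```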
18.5.5. Yammer metrics
Metrics that express a rate or unit of time are provided as Yammer metrics. The class name of an MBean that uses Yammer metrics is prefixed with com.yammer.metrics.
Yammer rate metrics have the following attributes for monitoring requests:
- Count
- EventType (Bytes)
- FifteenMinuteRate
- RateUnit (Seconds)
- MeanRate
- OneMinuteRate
- FiveMinuteRate
Yammer time metrics have the following attributes for monitoring requests:
- Max
- Min
- Mean
- StdDev
- 75/95/98/99/99.9th Percentile
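Yammer rate metrics keep a cumulative `Count` alongside the moving-average rates. If your monitoring tool polls `Count`, the per-second rate over a polling interval is the difference between two samples divided by the elapsed time. The helper below sketches that arithmetic. It assumes that both samples come from the same process lifetime, because `Count` starts again from zero when the JVM restarts. The class name is illustrative.

```java
public class MeterRate {
    // Per-second rate between two samples of a Yammer meter's cumulative Count
    // attribute, for example two polls of the bytes-in-per-second meter.
    public static double perSecond(long countBefore, long countAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        if (countAfter < countBefore) {
            // Count is cumulative for the life of the JVM; a drop suggests a restart.
            throw new IllegalArgumentException("counter reset between samples");
        }
        return (countAfter - countBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        // 6,000 events over 60 seconds.
        System.out.println(perSecond(10_000, 16_000, 60_000));
    }
}
```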
18.6. Producer MBeans
The following MBeans are present in Kafka producer applications, including Kafka Streams applications and Kafka Connect with source connectors.
Producer metrics
Attribute | Description |
---|---|
batch-size-avg | The average number of bytes sent per partition per request. |
batch-size-max | The maximum number of bytes sent per partition per request. |
batch-split-rate | The average number of batch splits per second. |
batch-split-total | The total number of batch splits. |
buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). |
buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). |
bufferpool-wait-time | The fraction of time an appender waits for space allocation. |
bufferpool-wait-time-ns-total | The total time an appender waits for space allocation in nanoseconds. |
bufferpool-wait-time-total | Deprecated. The total time an appender waits for space allocation in nanoseconds. Replacement is bufferpool-wait-time-ns-total. |
compression-rate-avg | The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size. |
connection-close-rate | Connections closed per second in the window. |
connection-close-total | Total connections closed in the window. |
connection-count | The current number of active connections. |
connection-creation-rate | New connections established per second in the window. |
connection-creation-total | Total new connections established in the window. |
failed-authentication-rate | Connections per second that failed authentication. |
failed-authentication-total | Total connections that failed authentication. |
failed-reauthentication-rate | Connections per second that failed re-authentication. |
failed-reauthentication-total | Total connections that failed re-authentication. |
flush-time-ns-total | The total time the Producer spent in Producer.flush in nanoseconds. |
incoming-byte-rate | Bytes/second read off all sockets. |
incoming-byte-total | Total bytes read off all sockets. |
io-ratio | The fraction of time the I/O thread spent doing I/O. |
io-time-ns-avg | The average length of time for I/O per select call in nanoseconds. |
io-time-ns-total | The total time the I/O thread spent doing I/O in nanoseconds. |
io-wait-ratio | The fraction of time the I/O thread spent waiting. |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. |
io-wait-time-ns-total | The total time the I/O thread spent waiting in nanoseconds. |
io-waittime-total | Deprecated. The total time the I/O thread spent waiting in nanoseconds. Replacement is io-wait-time-ns-total. |
iotime-total | Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Replacement is io-time-ns-total. |
metadata-age | The age in seconds of the current producer metadata being used. |
network-io-rate | The average number of network operations (reads or writes) on all connections per second. |
network-io-total | The total number of network operations (reads or writes) on all connections. |
outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. |
outgoing-byte-total | The total number of outgoing bytes sent to all servers. |
produce-throttle-time-avg | The average time in ms a request was throttled by a broker. |
produce-throttle-time-max | The maximum time in ms a request was throttled by a broker. |
reauthentication-latency-avg | The average latency in ms observed due to re-authentication. |
reauthentication-latency-max | The maximum latency in ms observed due to re-authentication. |
record-error-rate | The average per-second number of record sends that resulted in errors. |
record-error-total | The total number of record sends that resulted in errors. |
record-queue-time-avg | The average time in ms record batches spent in the send buffer. |
record-queue-time-max | The maximum time in ms record batches spent in the send buffer. |
record-retry-rate | The average per-second number of retried record sends. |
record-retry-total | The total number of retried record sends. |
record-send-rate | The average number of records sent per second. |
record-send-total | The total number of records sent. |
record-size-avg | The average record size. |
record-size-max | The maximum record size. |
records-per-request-avg | The average number of records per request. |
request-latency-avg | The average request latency in ms. |
request-latency-max | The maximum request latency in ms. |
request-rate | The average number of requests sent per second. |
request-size-avg | The average size of all requests in the window. |
request-size-max | The maximum size of any request sent in the window. |
request-total | The total number of requests sent. |
requests-in-flight | The current number of in-flight requests awaiting a response. |
response-rate | Responses received per second. |
response-total | Total responses received. |
select-rate | Number of times the I/O layer checked for new I/O to perform per second. |
select-total | Total number of times the I/O layer checked for new I/O to perform. |
successful-authentication-no-reauth-total | Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero. |
successful-authentication-rate | Connections per second that were successfully authenticated using SASL or SSL. |
successful-authentication-total | Total connections that were successfully authenticated using SASL or SSL. |
successful-reauthentication-rate | Connections per second that were successfully re-authenticated using SASL. |
successful-reauthentication-total | Total connections that were successfully re-authenticated using SASL. |
txn-abort-time-ns-total | The total time the Producer spent aborting transactions in nanoseconds (for EOS). |
txn-begin-time-ns-total | The total time the Producer spent in beginTransaction in nanoseconds (for EOS). |
txn-commit-time-ns-total | The total time the Producer spent committing transactions in nanoseconds (for EOS). |
txn-init-time-ns-total | The total time the Producer spent initializing transactions in nanoseconds (for EOS). |
txn-send-offsets-time-ns-total | The total time the Producer spent sending offsets to transactions in nanoseconds (for EOS). |
waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. |
Producer metrics about broker connections
Attribute | Description |
---|---|
incoming-byte-rate | The average number of bytes received per second for a node. |
incoming-byte-total | The total number of bytes received for a node. |
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. |
outgoing-byte-total | The total number of outgoing bytes sent for a node. |
request-latency-avg | The average request latency in ms for a node. |
request-latency-max | The maximum request latency in ms for a node. |
request-rate | The average number of requests sent per second for a node. |
request-size-avg | The average size of all requests in the window for a node. |
request-size-max | The maximum size of any request sent in the window for a node. |
request-total | The total number of requests sent for a node. |
response-rate | Responses received per second for a node. |
response-total | Total responses received for a node. |
Producer metrics about messages sent to topics
Attribute | Description |
---|---|
byte-rate | The average number of bytes sent per second for a topic. |
byte-total | The total number of bytes sent for a topic. |
compression-rate | The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size over the uncompressed size. |
record-error-rate | The average per-second number of record sends that resulted in errors for a topic. |
record-error-total | The total number of record sends that resulted in errors for a topic. |
record-retry-rate | The average per-second number of retried record sends for a topic. |
record-retry-total | The total number of retried record sends for a topic. |
record-send-rate | The average number of records sent per second for a topic. |
record-send-total | The total number of records sent for a topic. |
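Kafka clients register these metrics as MBeans whose attribute names are the metric names shown in the tables. In Apache Kafka, the producer-level metrics use object names of the form `kafka.producer:type=producer-metrics,client-id=<client-id>`. The per-broker and per-topic tables correspond to `producer-node-metrics` and `producer-topic-metrics`. The following sketch reads chosen attributes for every producer client ID in a JVM. Attribute names such as `record-error-rate` contain hyphens and cannot be expressed as standard MBean getters, so the stand-in used by `demo()` is a `DynamicMBean`. A real producer registers its own MBeans. The class names are illustrative.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.MBeanServerFactory;
import javax.management.ObjectName;

public class ProducerMetrics {
    // Reads the named attributes from every producer-level MBean, keyed by client-id.
    public static Map<String, Map<String, Object>> read(MBeanServerConnection conn, String... attributes) {
        Map<String, Map<String, Object>> result = new TreeMap<>();
        try {
            ObjectName pattern = new ObjectName("kafka.producer:type=producer-metrics,*");
            for (ObjectName name : conn.queryNames(pattern, null)) {
                Map<String, Object> values = new LinkedHashMap<>();
                for (String attribute : attributes) {
                    values.put(attribute, conn.getAttribute(name, attribute));
                }
                result.put(name.getKeyProperty("client-id"), values);
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    // Read-only stand-in MBean backed by a map of attribute values.
    public static final class MapMBean implements DynamicMBean {
        private final Map<String, Object> values;

        public MapMBean(Map<String, Object> values) { this.values = values; }

        @Override
        public Object getAttribute(String name) throws AttributeNotFoundException {
            if (!values.containsKey(name)) throw new AttributeNotFoundException(name);
            return values.get(name);
        }

        @Override
        public AttributeList getAttributes(String[] names) {
            AttributeList list = new AttributeList();
            for (String n : names) {
                if (values.containsKey(n)) list.add(new Attribute(n, values.get(n)));
            }
            return list;
        }

        @Override
        public void setAttribute(Attribute attribute) { throw new UnsupportedOperationException("read-only"); }

        @Override
        public AttributeList setAttributes(AttributeList attributes) { return new AttributeList(); }

        @Override
        public Object invoke(String action, Object[] params, String[] signature) {
            throw new UnsupportedOperationException(action);
        }

        @Override
        public MBeanInfo getMBeanInfo() {
            MBeanAttributeInfo[] attrs = values.keySet().stream()
                    .map(n -> new MBeanAttributeInfo(n, "java.lang.Double", n, true, false, false))
                    .toArray(MBeanAttributeInfo[]::new);
            return new MBeanInfo(getClass().getName(), "stand-in", attrs, null, null, null);
        }
    }

    // Demo with one stand-in producer registered in a private MBean server.
    public static Map<String, Map<String, Object>> demo() {
        try {
            MBeanServer server = MBeanServerFactory.newMBeanServer();
            Map<String, Object> values = new LinkedHashMap<>();
            values.put("record-error-rate", 0.0);
            values.put("request-latency-avg", 12.5);
            server.registerMBean(new MapMBean(values),
                    new ObjectName("kafka.producer:type=producer-metrics,client-id=producer-1"));
            return read(server, "record-error-rate", "request-latency-avg");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```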
18.7. Consumer MBeans
The following MBeans are present in Kafka consumer applications, including Kafka Streams applications and Kafka Connect with sink connectors.
Consumer metrics
Attribute | Description |
---|---|
connection-close-rate | Connections closed per second in the window. |
connection-close-total | Total connections closed in the window. |
connection-count | The current number of active connections. |
connection-creation-rate | New connections established per second in the window. |
connection-creation-total | Total new connections established in the window. |
failed-authentication-rate | Connections per second that failed authentication. |
failed-authentication-total | Total connections that failed authentication. |
failed-reauthentication-rate | Connections per second that failed re-authentication. |
failed-reauthentication-total | Total connections that failed re-authentication. |
incoming-byte-rate | Bytes/second read off all sockets. |
incoming-byte-total | Total bytes read off all sockets. |
io-ratio | The fraction of time the I/O thread spent doing I/O. |
io-time-ns-avg | The average length of time for I/O per select call in nanoseconds. |
io-time-ns-total | The total time the I/O thread spent doing I/O in nanoseconds. |
io-wait-ratio | The fraction of time the I/O thread spent waiting. |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. |
io-wait-time-ns-total | The total time the I/O thread spent waiting in nanoseconds. |
io-waittime-total | Deprecated. The total time the I/O thread spent waiting in nanoseconds. Replacement is io-wait-time-ns-total. |
iotime-total | Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Replacement is io-time-ns-total. |
network-io-rate | The average number of network operations (reads or writes) on all connections per second. |
network-io-total | The total number of network operations (reads or writes) on all connections. |
outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. |
outgoing-byte-total | The total number of outgoing bytes sent to all servers. |
reauthentication-latency-avg | The average latency in ms observed due to re-authentication. |
reauthentication-latency-max | The maximum latency in ms observed due to re-authentication. |
request-rate | The average number of requests sent per second. |
request-size-avg | The average size of all requests in the window. |
request-size-max | The maximum size of any request sent in the window. |
request-total | The total number of requests sent. |
response-rate | Responses received per second. |
response-total | Total responses received. |
select-rate | Number of times the I/O layer checked for new I/O to perform per second. |
select-total | Total number of times the I/O layer checked for new I/O to perform. |
successful-authentication-no-reauth-total | Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero. |
successful-authentication-rate | Connections per second that were successfully authenticated using SASL or SSL. |
successful-authentication-total | Total connections that were successfully authenticated using SASL or SSL. |
successful-reauthentication-rate | Connections per second that were successfully re-authenticated using SASL. |
successful-reauthentication-total | Total connections that were successfully re-authenticated using SASL. |
Consumer metrics about broker connections
Attribute | Description |
---|---|
incoming-byte-rate | The average number of bytes received per second for a node. |
incoming-byte-total | The total number of bytes received for a node. |
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. |
outgoing-byte-total | The total number of outgoing bytes sent for a node. |
request-latency-avg | The average request latency in ms for a node. |
request-latency-max | The maximum request latency in ms for a node. |
request-rate | The average number of requests sent per second for a node. |
request-size-avg | The average size of all requests in the window for a node. |
request-size-max | The maximum size of any request sent in the window for a node. |
request-total | The total number of requests sent for a node. |
response-rate | Responses received per second for a node. |
response-total | Total responses received for a node. |
Consumer group metrics
Attribute | Description |
---|---|
assigned-partitions | The number of partitions currently assigned to this consumer. |
commit-latency-avg | The average time taken for a commit request. |
commit-latency-max | The max time taken for a commit request. |
commit-rate | The number of commit calls per second. |
commit-total | The total number of commit calls. |
failed-rebalance-rate-per-hour | The number of failed group rebalance events per hour. |
failed-rebalance-total | The total number of failed group rebalances. |
heartbeat-rate | The average number of heartbeats per second. |
heartbeat-response-time-max | The max time taken to receive a response to a heartbeat request. |
heartbeat-total | The total number of heartbeats. |
join-rate | The number of group joins per second. |
join-time-avg | The average time taken for a group rejoin. |
join-time-max | The max time taken for a group rejoin. |
join-total | The total number of group joins. |
last-heartbeat-seconds-ago | The number of seconds since the last coordinator heartbeat was sent. |
last-rebalance-seconds-ago | The number of seconds since the last rebalance event. |
partitions-assigned-latency-avg | The average time taken by the on-partitions-assigned rebalance listener callback. |
partitions-assigned-latency-max | The max time taken by the on-partitions-assigned rebalance listener callback. |
partitions-lost-latency-avg | The average time taken by the on-partitions-lost rebalance listener callback. |
partitions-lost-latency-max | The max time taken by the on-partitions-lost rebalance listener callback. |
partitions-revoked-latency-avg | The average time taken by the on-partitions-revoked rebalance listener callback. |
partitions-revoked-latency-max | The max time taken by the on-partitions-revoked rebalance listener callback. |
rebalance-latency-avg | The average time taken for a group rebalance. |
rebalance-latency-max | The max time taken for a group rebalance. |
rebalance-latency-total | The total time taken for group rebalances so far. |
rebalance-rate-per-hour | The number of group rebalances participated in per hour. |
rebalance-total | The total number of group rebalances participated in. |
sync-rate | The number of group syncs per second. |
sync-time-avg | The average time taken for a group sync. |
sync-time-max | The max time taken for a group sync. |
sync-total | The total number of group syncs. |
Consumer fetcher metrics
Attribute | Description |
---|---|
bytes-consumed-rate | The average number of bytes consumed per second. |
bytes-consumed-total | The total number of bytes consumed. |
fetch-latency-avg | The average time taken for a fetch request. |
fetch-latency-max | The max time taken for any fetch request. |
fetch-rate | The number of fetch requests per second. |
fetch-size-avg | The average number of bytes fetched per request. |
fetch-size-max | The maximum number of bytes fetched per request. |
fetch-throttle-time-avg | The average throttle time in ms. |
fetch-throttle-time-max | The maximum throttle time in ms. |
fetch-total | The total number of fetch requests. |
records-consumed-rate | The average number of records consumed per second. |
records-consumed-total | The total number of records consumed. |
records-lag-max | The maximum lag in terms of number of records for any partition in this window. NOTE: This is based on current offset and not committed offset. |
records-lead-min | The minimum lead in terms of number of records for any partition in this window. |
records-per-request-avg | The average number of records in each request. |
Consumer fetcher metrics at the topic level
Attribute | Description |
---|---|
bytes-consumed-rate | The average number of bytes consumed per second for a topic. |
bytes-consumed-total | The total number of bytes consumed for a topic. |
fetch-size-avg | The average number of bytes fetched per request for a topic. |
fetch-size-max | The maximum number of bytes fetched per request for a topic. |
records-consumed-rate | The average number of records consumed per second for a topic. |
records-consumed-total | The total number of records consumed for a topic. |
records-per-request-avg | The average number of records in each request for a topic. |
Consumer fetcher metrics at the partition level
Attribute | Description |
---|---|
preferred-read-replica | The current read replica for the partition, or -1 if reading from leader. |
records-lag | The latest lag of the partition. |
records-lag-avg | The average lag of the partition. |
records-lag-max | The max lag of the partition. |
records-lead | The latest lead of the partition. |
records-lead-avg | The average lead of the partition. |
records-lead-min | The min lead of the partition. |
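The lag and lead metrics measure distances in offsets from the consumer's current position, not from its committed offset. The sketch below shows the arithmetic for one partition under two assumptions. First, lag is measured to the high watermark, which applies to a consumer using the default `read_uncommitted` isolation level. Second, lead is measured from the log start offset. A lead that approaches zero is a warning sign, because the records nearest the log start offset are the next to be removed by retention. The class name is illustrative.

```java
public class PartitionLag {
    // Lag: records between the consumer's current position and the end of the
    // partition, assuming a read_uncommitted consumer (end = high watermark).
    public static long lag(long highWatermark, long position) {
        return highWatermark - position;
    }

    // Lead: records between the log start offset and the consumer's position.
    // A lead shrinking towards zero means retention may delete unread records.
    public static long lead(long logStartOffset, long position) {
        return position - logStartOffset;
    }

    public static void main(String[] args) {
        long logStartOffset = 1_000;
        long position = 1_200;
        long highWatermark = 1_500;
        System.out.println("lag=" + lag(highWatermark, position) + " lead=" + lead(logStartOffset, position));
    }
}
```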
18.8. Kafka Connect MBeans
Kafka Connect metrics
Attribute | Description |
---|---|
connection-close-rate | Connections closed per second in the window. |
connection-close-total | Total connections closed in the window. |
connection-count | The current number of active connections. |
connection-creation-rate | New connections established per second in the window. |
connection-creation-total | Total new connections established in the window. |
failed-authentication-rate | Connections per second that failed authentication. |
failed-authentication-total | Total connections that failed authentication. |
failed-reauthentication-rate | Connections per second that failed re-authentication. |
failed-reauthentication-total | Total connections that failed re-authentication. |
incoming-byte-rate | Bytes/second read off all sockets. |
incoming-byte-total | Total bytes read off all sockets. |
io-ratio | The fraction of time the I/O thread spent doing I/O. |
io-time-ns-avg | The average length of time for I/O per select call in nanoseconds. |
io-time-ns-total | The total time the I/O thread spent doing I/O in nanoseconds. |
io-wait-ratio | The fraction of time the I/O thread spent waiting. |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. |
io-wait-time-ns-total | The total time the I/O thread spent waiting in nanoseconds. |
io-waittime-total | Deprecated. The total time the I/O thread spent waiting in nanoseconds. Replacement is io-wait-time-ns-total. |
iotime-total | Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Replacement is io-time-ns-total. |
network-io-rate | The average number of network operations (reads or writes) on all connections per second. |
network-io-total | The total number of network operations (reads or writes) on all connections. |
outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. |
outgoing-byte-total | The total number of outgoing bytes sent to all servers. |
reauthentication-latency-avg | The average latency in ms observed due to re-authentication. |
reauthentication-latency-max | The maximum latency in ms observed due to re-authentication. |
request-rate | The average number of requests sent per second. |
request-size-avg | The average size of all requests in the window. |
request-size-max | The maximum size of any request sent in the window. |
request-total | The total number of requests sent. |
response-rate | Responses received per second. |
response-total | Total responses received. |
select-rate | Number of times the I/O layer checked for new I/O to perform per second. |
select-total | Total number of times the I/O layer checked for new I/O to perform. |
successful-authentication-no-reauth-total | Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero. |
successful-authentication-rate | Connections per second that were successfully authenticated using SASL or SSL. |
successful-authentication-total | Total connections that were successfully authenticated using SASL or SSL. |
successful-reauthentication-rate | Connections per second that were successfully re-authenticated using SASL. |
successful-reauthentication-total | Total connections that were successfully re-authenticated using SASL. |
Kafka Connect metrics about broker connections
Attribute | Description |
---|---|
incoming-byte-rate | The average number of bytes received per second for a node. |
incoming-byte-total | The total number of bytes received for a node. |
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. |
outgoing-byte-total | The total number of outgoing bytes sent for a node. |
request-latency-avg | The average request latency in ms for a node. |
request-latency-max | The maximum request latency in ms for a node. |
request-rate | The average number of requests sent per second for a node. |
request-size-avg | The average size of all requests in the window for a node. |
request-size-max | The maximum size of any request sent in the window for a node. |
request-total | The total number of requests sent for a node. |
response-rate | Responses received per second for a node. |
response-total | Total responses received for a node. |
Kafka Connect metrics about workers
Attribute | Description |
---|---|
connector-count | The number of connectors run in this worker. |
connector-startup-attempts-total | The total number of connector startups that this worker has attempted. |
connector-startup-failure-percentage | The average percentage of this worker’s connector starts that failed. |
connector-startup-failure-total | The total number of connector starts that failed. |
connector-startup-success-percentage | The average percentage of this worker’s connectors starts that succeeded. |
connector-startup-success-total | The total number of connector starts that succeeded. |
task-count | The number of tasks run in this worker. |
task-startup-attempts-total | The total number of task startups that this worker has attempted. |
task-startup-failure-percentage | The average percentage of this worker’s task starts that failed. |
task-startup-failure-total | The total number of task starts that failed. |
task-startup-success-percentage | The average percentage of this worker’s task starts that succeeded. |
task-startup-success-total | The total number of task starts that succeeded. |
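The *-percentage attributes are averages, while the *-total counters are cumulative. The following Java sketch derives a cumulative failure ratio from two of the counters. This ratio is not the same value as the windowed percentage attribute. The class name and the alerting threshold are hypothetical.

```java
final class WorkerStartupHealth {

    /** Returns failures / attempts from two *-total counters, or 0.0 before any attempt. */
    static double cumulativeFailureRatio(long attemptsTotal, long failureTotal) {
        if (attemptsTotal < 0 || failureTotal < 0 || failureTotal > attemptsTotal) {
            throw new IllegalArgumentException("counters must satisfy 0 <= failures <= attempts");
        }
        return attemptsTotal == 0 ? 0.0 : (double) failureTotal / attemptsTotal;
    }

    /** Hypothetical alerting rule: flag the worker when the ratio exceeds a threshold. */
    static boolean exceeds(long attemptsTotal, long failureTotal, double threshold) {
        return cumulativeFailureRatio(attemptsTotal, failureTotal) > threshold;
    }

    public static void main(String[] args) {
        // Example values, not real measurements: 40 task starts attempted, 3 failed.
        System.out.println(cumulativeFailureRatio(40, 3)); // 0.075
        System.out.println(exceeds(40, 3, 0.05));          // true
    }
}
```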
Kafka Connect metrics about rebalances
Attribute | Description |
---|---|
completed-rebalances-total | The total number of rebalances completed by this worker. |
connect-protocol | The Connect protocol used by this cluster. |
epoch | The epoch or generation number of this worker. |
leader-name | The name of the group leader. |
rebalance-avg-time-ms | The average time in milliseconds spent by this worker to rebalance. |
rebalance-max-time-ms | The maximum time in milliseconds spent by this worker to rebalance. |
rebalancing | Whether this worker is currently rebalancing. |
time-since-last-rebalance-ms | The time in milliseconds since this worker completed the most recent rebalance. |
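A monitoring tool can sample completed-rebalances-total at regular intervals and turn the difference into a rate, which helps to spot frequent rebalancing. The following Java sketch shows that arithmetic. It assumes the counter restarts from zero when the worker restarts, so the sketch rejects a decreasing value. The class name is hypothetical.

```java
final class RebalanceRate {

    /** Rebalances per hour between two samples of completed-rebalances-total. */
    static double perHour(long earlierTotal, long laterTotal, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        if (laterTotal < earlierTotal) {
            // A decreasing counter most likely means the worker restarted between samples.
            throw new IllegalArgumentException("counter reset between samples");
        }
        return (laterTotal - earlierTotal) * 3_600_000.0 / elapsedMs;
    }

    public static void main(String[] args) {
        // Example: the counter moved from 10 to 16 over 30 minutes.
        System.out.println(perHour(10, 16, 1_800_000)); // 12.0
    }
}
```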
Kafka Connect metrics about connectors
Attribute | Description |
---|---|
connector-class | The name of the connector class. |
connector-type | The type of the connector. One of 'source' or 'sink'. |
connector-version | The version of the connector class, as reported by the connector. |
status | The status of the connector. One of 'unassigned', 'running', 'paused', 'failed', or 'restarting'. |
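The status attribute is a string. The following Java sketch maps the five values in this table to an enum so that a monitoring tool can act on them. The connector task table uses the same five values. The enum name and the alerting rule in needsAttention are hypothetical. The sketch rejects any value outside the five listed.

```java
import java.util.Locale;

enum ConnectorStatus {
    UNASSIGNED, RUNNING, PAUSED, FAILED, RESTARTING;

    /** Parses the lower-case value reported by the status attribute. */
    static ConnectorStatus parse(String value) {
        // valueOf throws IllegalArgumentException for values outside the listed set.
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Hypothetical alerting rule: anything other than running or an intentional pause. */
    boolean needsAttention() {
        return this != RUNNING && this != PAUSED;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"running", "paused", "failed"}) {
            System.out.println(s + " -> needs attention: " + parse(s).needsAttention());
        }
    }
}
```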
Kafka Connect metrics about connector tasks
Attribute | Description |
---|---|
batch-size-avg | The average size of the batches processed by the connector. |
batch-size-max | The maximum size of the batches processed by the connector. |
offset-commit-avg-time-ms | The average time in milliseconds taken by this task to commit offsets. |
offset-commit-failure-percentage | The average percentage of this task’s offset commit attempts that failed. |
offset-commit-max-time-ms | The maximum time in milliseconds taken by this task to commit offsets. |
offset-commit-success-percentage | The average percentage of this task’s offset commit attempts that succeeded. |
pause-ratio | The fraction of time this task has spent in the pause state. |
running-ratio | The fraction of time this task has spent in the running state. |
status | The status of the connector task. One of 'unassigned', 'running', 'paused', 'failed', or 'restarting'. |
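The running-ratio and pause-ratio attributes are both fractions of time. The following Java sketch derives the fraction of time a task spent in neither state. It assumes that both ratios cover the same period, and it clamps small negative rounding errors to zero. The class name is hypothetical.

```java
final class TaskTimeBreakdown {

    /** Fraction of time not accounted for by running-ratio or pause-ratio. */
    static double otherStatesRatio(double runningRatio, double pauseRatio) {
        if (runningRatio < 0 || runningRatio > 1 || pauseRatio < 0 || pauseRatio > 1) {
            throw new IllegalArgumentException("ratios must be between 0 and 1");
        }
        // Clamp at zero: floating-point subtraction can leave a tiny negative remainder.
        return Math.max(0.0, 1.0 - runningRatio - pauseRatio);
    }

    public static void main(String[] args) {
        // Example values: running 75% of the time, paused 20%.
        System.out.printf("other states: %.2f%n", otherStatesRatio(0.75, 0.20));
    }
}
```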
Kafka Connect metrics about sink connectors
Attribute | Description |
---|---|
offset-commit-completion-rate | The average per-second number of offset commit completions that were completed successfully. |
offset-commit-completion-total | The total number of offset commit completions that were completed successfully. |
offset-commit-seq-no | The current sequence number for offset commits. |
offset-commit-skip-rate | The average per-second number of offset commit completions that were received too late and skipped/ignored. |
offset-commit-skip-total | The total number of offset commit completions that were received too late and skipped/ignored. |
partition-count | The number of topic partitions assigned to this task belonging to the named sink connector in this worker. |
put-batch-avg-time-ms | The average time in milliseconds taken by this task to put a batch of sink records. |
put-batch-max-time-ms | The maximum time in milliseconds taken by this task to put a batch of sink records. |
sink-record-active-count | The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task. |
sink-record-active-count-avg | The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task. |
sink-record-active-count-max | The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task. |
sink-record-lag-max | The maximum lag in terms of number of records that the sink task is behind the consumer’s position for any topic partitions. |
sink-record-read-rate | The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied. |
sink-record-read-total | The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted. |
sink-record-send-rate | The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations. |
sink-record-send-total | The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted. |
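Both sink-record-read-total and sink-record-send-total count records since the task was last restarted. The read counter is taken before transformations are applied and the send counter after, so their difference approximates the number of records that the transformations filtered out. The following Java sketch computes that estimate. Records still in flight when the counters are sampled also appear in the difference, so treat the result as an estimate. The class name is hypothetical.

```java
final class SinkTransformStats {

    /** Approximate number of records filtered out by transformations since the last restart. */
    static long approxFiltered(long readTotal, long sendTotal) {
        // Clamp at zero: the two attributes are read separately and can be slightly out of step.
        return Math.max(0L, readTotal - sendTotal);
    }

    /** Approximate fraction of read records that did not reach the sink task. */
    static double filteredFraction(long readTotal, long sendTotal) {
        return readTotal <= 0 ? 0.0 : (double) approxFiltered(readTotal, sendTotal) / readTotal;
    }

    public static void main(String[] args) {
        // Example counter values, not real measurements.
        System.out.println(approxFiltered(1_000, 950));   // 50
        System.out.println(filteredFraction(1_000, 950)); // 0.05
    }
}
```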
Kafka Connect metrics about source connectors
Attribute | Description |
---|---|
poll-batch-avg-time-ms | The average time in milliseconds taken by this task to poll for a batch of source records. |
poll-batch-max-time-ms | The maximum time in milliseconds taken by this task to poll for a batch of source records. |
source-record-active-count | The number of records that have been produced by this task but not yet completely written to Kafka. |
source-record-active-count-avg | The average number of records that have been produced by this task but not yet completely written to Kafka. |
source-record-active-count-max | The maximum number of records that have been produced by this task but not yet completely written to Kafka. |
source-record-poll-rate | The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker. |
source-record-poll-total | The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker. |
source-record-write-rate | The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations. |
source-record-write-total | The total number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted. |
transaction-size-avg | The average number of records in the transactions the task has committed so far. |
transaction-size-max | The number of records in the largest transaction the task has committed so far. |
transaction-size-min | The number of records in the smallest transaction the task has committed so far. |
Kafka Connect metrics about connector errors
Attribute | Description |
---|---|
deadletterqueue-produce-failures | The number of failed writes to the dead letter queue. |
deadletterqueue-produce-requests | The number of attempted writes to the dead letter queue. |
last-error-timestamp | The epoch timestamp when this task last encountered an error. |
total-errors-logged | The number of errors that were logged. |
total-record-errors | The number of record processing errors in this task. |
total-record-failures | The number of record processing failures in this task. |
total-records-skipped | The number of records skipped due to errors. |
total-retries | The number of operations retried. |
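The dead letter queue counters can be combined into a failure ratio, and last-error-timestamp can be turned into the time elapsed since the last error. The following Java sketch shows both calculations. It assumes that last-error-timestamp is in milliseconds since the Unix epoch. The class name is hypothetical.

```java
import java.time.Duration;

final class ConnectorErrorHealth {

    /** Fraction of attempted dead letter queue writes that failed. */
    static double dlqFailureRatio(long produceRequests, long produceFailures) {
        return produceRequests <= 0 ? 0.0 : (double) produceFailures / produceRequests;
    }

    /** Time since the last error, assuming last-error-timestamp is epoch milliseconds. */
    static Duration sinceLastError(long lastErrorTimestampMs, long nowMs) {
        return Duration.ofMillis(Math.max(0L, nowMs - lastErrorTimestampMs));
    }

    public static void main(String[] args) {
        System.out.println(dlqFailureRatio(200, 5)); // 0.025
        long lastError = System.currentTimeMillis() - 90_000; // example: about 90 seconds ago
        System.out.println(sinceLastError(lastError, System.currentTimeMillis()).getSeconds());
    }
}
```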
18.9. Kafka Streams MBeans
Kafka Streams metrics for clients
These metrics are collected when the metrics.recording.level configuration parameter is info or debug.
Attribute | Description |
---|---|
blocked-time-ns-total | The total time, in nanoseconds, that the thread spent blocked on Kafka. |
commit-latency-avg | The average execution time in ms for committing, across all running tasks of this thread. |
commit-latency-max | The maximum execution time in ms for committing, across all running tasks of this thread. |
commit-rate | The average number of commits per second. |
commit-total | The total number of commit calls. |
poll-latency-avg | The average execution time in ms for consumer polling. |
poll-latency-max | The maximum execution time in ms for consumer polling. |
poll-rate | The average number of consumer poll calls per second. |
poll-total | The total number of consumer poll calls. |
process-latency-avg | The average execution time in ms for processing. |
process-latency-max | The maximum execution time in ms for processing. |
process-rate | The average number of processed records per second. |
process-total | The total number of processed records. |
punctuate-latency-avg | The average execution time in ms for punctuating. |
punctuate-latency-max | The maximum execution time in ms for punctuating. |
punctuate-rate | The average number of punctuate calls per second. |
punctuate-total | The total number of punctuate calls. |
task-closed-rate | The average number of tasks closed per second. |
task-closed-total | The total number of tasks closed. |
task-created-rate | The average number of tasks created per second. |
task-created-total | The total number of tasks created. |
thread-start-time | The time that the thread was started. |
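The recording level is part of the Kafka Streams application configuration. The following Java sketch sets metrics.recording.level to DEBUG. Per this section, that level enables the task, processor node, state store, and record cache metrics as well as the client metrics above. The sketch uses the upper-case spelling because the Kafka configuration reference lists the values that way. The application ID and bootstrap address are hypothetical placeholders.

```java
import java.util.Properties;

final class StreamsMetricsConfig {

    // Configuration key named in this section.
    static final String METRICS_RECORDING_LEVEL = "metrics.recording.level";

    /** Returns a copy of the base settings with the metrics recording level set. */
    static Properties withRecordingLevel(Properties base, String level) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(METRICS_RECORDING_LEVEL, level);
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "my-streams-app");     // hypothetical
        base.setProperty("bootstrap.servers", "localhost:9092");  // hypothetical
        Properties props = withRecordingLevel(base, "DEBUG");
        props.list(System.out);
    }
}
```

The resulting Properties would then be passed to the KafkaStreams constructor together with the application's topology.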
Kafka Streams metrics for tasks
These metrics are collected when the metrics.recording.level configuration parameter is debug.
Attribute | Description |
---|---|
active-process-ratio | The fraction of time the stream thread spent on processing this task among all assigned active tasks. |
commit-latency-avg | The average execution time in ns for committing. |
commit-latency-max | The maximum execution time in ns for committing. |
commit-rate | The average number of commit calls per second. |
commit-total | The total number of commit calls. |
dropped-records-rate | The average number of records dropped per second within this task. |
dropped-records-total | The total number of records dropped within this task. |
enforced-processing-rate | The average number of enforced processings per second. |
enforced-processing-total | The total number of enforced processings. |
process-latency-avg | The average execution time in ns for processing. |
process-latency-max | The maximum execution time in ns for processing. |
process-rate | The average number of processed records per second across all source processor nodes of this task. |
process-total | The total number of processed records across all source processor nodes of this task. |
record-lateness-avg | The average observed lateness of records (stream time - record timestamp). |
record-lateness-max | The maximum observed lateness of records (stream time - record timestamp). |
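The lateness formula (stream time - record timestamp) can be illustrated by replaying timestamps in arrival order. The following Java sketch assumes that stream time is the largest record timestamp seen so far. Kafka Streams may take its measurement at a different point in processing, so treat this as an illustration of the formula only. The class name is hypothetical.

```java
import java.util.LongSummaryStatistics;

final class RecordLateness {

    /** Lateness statistics for timestamps (ms) replayed in arrival order. */
    static LongSummaryStatistics lateness(long[] timestampsInArrivalOrder) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        long streamTime = Long.MIN_VALUE;
        for (long ts : timestampsInArrivalOrder) {
            // Stream time only moves forward; an in-order record has lateness 0.
            streamTime = Math.max(streamTime, ts);
            stats.accept(streamTime - ts);
        }
        return stats;
    }

    public static void main(String[] args) {
        LongSummaryStatistics s = lateness(new long[] {100, 200, 150, 250, 120});
        System.out.println("avg=" + s.getAverage() + " max=" + s.getMax()); // avg=36.0 max=130
    }
}
```

In the example, the records with timestamps 150 and 120 arrive after stream time has reached 200 and 250, so they are 50 ms and 130 ms late.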
Kafka Streams metrics for processor nodes
These metrics are collected when the metrics.recording.level configuration parameter is debug.
Attribute | Description |
---|---|
bytes-consumed-total | The total number of bytes consumed by a source processor node. |
bytes-produced-total | The total number of bytes produced by a sink processor node. |
process-rate | The average number of records processed by a source processor node per second. |
process-total | The total number of records processed by a source processor node. |
record-e2e-latency-avg | The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
record-e2e-latency-max | The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
record-e2e-latency-min | The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
records-consumed-total | The total number of records consumed by a source processor node. |
records-produced-total | The total number of records produced by a sink processor node. |
suppression-emit-rate | The rate at which records have been emitted downstream from suppression operation nodes. |
suppression-emit-total | The total number of records that have been emitted downstream from suppression operation nodes. |
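The record-e2e-latency attributes compare each record's timestamp with the system time when the node has fully processed it. The following Java sketch computes the same min, average, and max from paired observations. It assumes that both times are in milliseconds, as Kafka record timestamps are. The class name is hypothetical.

```java
import java.util.LongSummaryStatistics;

final class EndToEndLatency {

    /** Summarizes processedAt - recordTimestamp over paired observations (ms). */
    static LongSummaryStatistics summarize(long[] recordTimestampsMs, long[] processedAtMs) {
        if (recordTimestampsMs.length != processedAtMs.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (int i = 0; i < recordTimestampsMs.length; i++) {
            stats.accept(processedAtMs[i] - recordTimestampsMs[i]);
        }
        return stats;
    }

    public static void main(String[] args) {
        LongSummaryStatistics s = summarize(new long[] {1_000, 2_000, 3_000},
                                            new long[] {1_010, 2_050, 3_020});
        System.out.println("min=" + s.getMin() + " max=" + s.getMax()); // min=10 max=50
    }
}
```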
Kafka Streams metrics for state stores
These metrics are collected when the metrics.recording.level configuration parameter is debug.
Attribute | Description |
---|---|
all-latency-avg | The average execution time in ns of the all operation. |
all-latency-max | The maximum execution time in ns of the all operation. |
all-rate | The average rate of the all operation for this store. |
delete-latency-avg | The average delete execution time in ns. |
delete-latency-max | The maximum delete execution time in ns. |
delete-rate | The average delete rate for this store. |
flush-latency-avg | The average flush execution time in ns. |
flush-latency-max | The maximum flush execution time in ns. |
flush-rate | The average flush rate for this store. |
get-latency-avg | The average get execution time in ns. |
get-latency-max | The maximum get execution time in ns. |
get-rate | The average get rate for this store. |
put-all-latency-avg | The average put-all execution time in ns. |
put-all-latency-max | The maximum put-all execution time in ns. |
put-all-rate | The average put-all rate for this store. |
put-if-absent-latency-avg | The average put-if-absent execution time in ns. |
put-if-absent-latency-max | The maximum put-if-absent execution time in ns. |
put-if-absent-rate | The average put-if-absent rate for this store. |
put-latency-avg | The average put execution time in ns. |
put-latency-max | The maximum put execution time in ns. |
put-rate | The average put rate for this store. |
range-latency-avg | The average range execution time in ns. |
range-latency-max | The maximum range execution time in ns. |
range-rate | The average range rate for this store. |
record-e2e-latency-avg | The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
record-e2e-latency-max | The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
record-e2e-latency-min | The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node. |
restore-latency-avg | The average restore execution time in ns. |
restore-latency-max | The maximum restore execution time in ns. |
restore-rate | The average restore rate for this store. |
suppression-buffer-count-avg | The average number of records buffered over the sampling window. |
suppression-buffer-count-max | The maximum number of records buffered over the sampling window. |
suppression-buffer-size-avg | The average total size, in bytes, of the buffered data over the sampling window. |
suppression-buffer-size-max | The maximum total size, in bytes, of the buffered data over the sampling window. |
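State store latencies are reported in ns, while the thread-level latencies are reported in ms. Multiplying an operation's rate by its average latency gives a rough estimate of how much of each second the store spends on that operation. The following Java sketch shows both conversions. It assumes that the *-rate attributes are operations per second. The class name is hypothetical.

```java
final class StoreOperationCost {

    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /** Rough fraction of each second spent in one operation: rate x avg latency (ns) / 1e9. */
    static double busyFraction(double opsPerSecond, double avgLatencyNs) {
        if (opsPerSecond < 0 || avgLatencyNs < 0) {
            throw new IllegalArgumentException("rate and latency must be non-negative");
        }
        return opsPerSecond * avgLatencyNs / NANOS_PER_SECOND;
    }

    /** Converts a store latency in ns to ms for comparison with thread-level metrics. */
    static double nanosToMillis(double nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Example values: get-rate 2000 and get-latency-avg 50,000 ns.
        System.out.printf("get busy fraction: %.2f%n", busyFraction(2_000, 50_000));
        System.out.println("get latency: " + nanosToMillis(50_000) + " ms");
    }
}
```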
Kafka Streams metrics for record caches
These metrics are collected when the metrics.recording.level configuration parameter is debug.
Attribute | Description |
---|---|
hit-ratio-avg | The average cache hit ratio defined as the ratio of cache read hits over the total cache read requests. |
hit-ratio-max | The maximum cache hit ratio. |
hit-ratio-min | The minimum cache hit ratio. |
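The hit ratio is defined as cache read hits divided by total cache read requests. The following Java sketch applies that definition to raw counts, for example from a test harness. Returning 0.0 when there were no reads is a choice made in this sketch. The class name is hypothetical.

```java
final class CacheHitRatio {

    /** Cache read hits over total cache read requests. */
    static double hitRatio(long hits, long readRequests) {
        if (hits < 0 || readRequests < 0 || hits > readRequests) {
            throw new IllegalArgumentException("require 0 <= hits <= readRequests");
        }
        // No reads yet: report 0.0 rather than dividing by zero.
        return readRequests == 0 ? 0.0 : (double) hits / readRequests;
    }

    public static void main(String[] args) {
        // Example counts, not real measurements: 750 hits out of 1,000 reads.
        System.out.println(hitRatio(750, 1_000)); // 0.75
    }
}
```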