Chapter 18. Monitoring your cluster using JMX


ZooKeeper, the Kafka broker, Kafka Connect, and the Kafka clients all expose management information using Java Management Extensions (JMX). Most management information is in the form of metrics that are useful for monitoring the condition and performance of your Kafka cluster. Like other Java applications, Kafka provides this management information through managed beans or MBeans.

JMX works at the level of the JVM (Java Virtual Machine). To obtain management information, external tools can connect to the JVM that is running ZooKeeper, the Kafka broker, and so on. By default, only tools on the same machine and running as the same user as the JVM are able to connect.

Note

Management information for ZooKeeper is not documented here. You can view ZooKeeper metrics in JConsole. For more information, see Monitoring using JConsole.

18.1. JMX configuration options

You configure JMX using JVM system properties. The scripts provided with AMQ Streams (such as bin/kafka-server-start.sh and bin/connect-distributed.sh) use the KAFKA_JMX_OPTS environment variable to set these system properties. The system properties for configuring JMX are the same for every component, even though Kafka producer, consumer, and streams applications typically start the JVM in different ways.
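
Because the property names are the same for every component, a client application can report its own JMX settings at startup. The following Java sketch shows one possible way to do that. The class name JmxSettings and the method jmxSettings are illustrative and are not part of Kafka or AMQ Streams. The sketch only sees properties that were passed to the JVM with -D.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Lists the JMX agent system properties that a JVM was started with. */
class JmxSettings {

    // Common prefix of the JMX agent properties used in this chapter.
    static final String PREFIX = "com.sun.management.jmxremote";

    /** Returns every property whose name starts with PREFIX, sorted by name. */
    static Map<String, String> jmxSettings(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                result.put(name, props.getProperty(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints nothing if no JMX properties were passed with -D.
        jmxSettings(System.getProperties())
                .forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```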

18.2. Disabling the JMX agent

You can prevent local JMX tools from connecting to the JVM (for example, for compliance reasons) by disabling the JMX agent for an AMQ Streams component. The following procedure disables the JMX agent for a Kafka broker.

Procedure

  1. Use the KAFKA_JMX_OPTS environment variable to set com.sun.management.jmxremote to false.

    export KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote=false
  2. Start the JVM.

    bin/kafka-server-start.sh

18.3. Connecting to the JVM from a different machine

You can connect to the JVM from a different machine by configuring the port that the JMX agent listens on. This is insecure because it allows JMX tools to connect from anywhere, with no authentication.

Procedure

  1. Use the KAFKA_JMX_OPTS environment variable to set -Dcom.sun.management.jmxremote.port=<port>. For <port>, enter the number of the port on which you want the Kafka broker to listen for JMX connections.

    export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true
      -Dcom.sun.management.jmxremote.port=<port>
      -Dcom.sun.management.jmxremote.authenticate=false
      -Dcom.sun.management.jmxremote.ssl=false"
  2. Start the JVM.

    bin/kafka-server-start.sh

Important

It is recommended that you configure authentication and SSL to ensure that the remote JMX connection is secure. For more information about the system properties needed to do this, see the JMX documentation.
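
Once the agent listens on a port, a Java program can connect to it with the standard JMX remote API. The following sketch assumes the default RMI connector with authentication and SSL turned off, which matches the insecure settings in the procedure above. The class name RemoteJmx is illustrative, and the host and port are supplied by the caller.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class RemoteJmx {

    /** Builds the standard RMI connector URL for a JMX agent on host:port. */
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid JMX host or port", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RemoteJmx <host> <port>");
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]));
        // try-with-resources closes the connector when the block exits.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("MBeans visible: " + connection.getMBeanCount());
        }
    }
}
```

If authentication is enabled, credentials are normally passed in the environment map of JMXConnectorFactory.connect(url, env); that variant is not shown here.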

18.4. Monitoring using JConsole

The JConsole tool is distributed with the Java Development Kit (JDK). You can use JConsole to connect to a local or remote JVM and discover and display management information from Java applications. When you use JConsole to connect to a local JVM, the names of the JVM processes correspond to the AMQ Streams components.

Table 18.1. JVM processes for AMQ Streams components
AMQ Streams component | JVM process

ZooKeeper

org.apache.zookeeper.server.quorum.QuorumPeerMain

Kafka broker

kafka.Kafka

Kafka Connect standalone

org.apache.kafka.connect.cli.ConnectStandalone

Kafka Connect distributed

org.apache.kafka.connect.cli.ConnectDistributed

Kafka MirrorMaker 2.0

org.apache.kafka.connect.mirror.MirrorMaker

Kafka MirrorMaker

kafka.tools.MirrorMaker

Kafka Bridge

io.strimzi.kafka.bridge.Application

A Kafka producer, consumer, or Streams application

The name of the class containing the main method for the application.

When using JConsole to connect to a remote JVM, use the appropriate hostname and JMX port.

Many other tools and monitoring products can be used to fetch the metrics using JMX and provide monitoring and alerting based on those metrics. Refer to the product documentation for those tools.
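
Tools of this kind typically start by discovering which MBeans exist, much as JConsole does when it builds its MBean tree. The following Java sketch lists the MBean names in one JMX domain. It is a minimal illustration: the class name MBeanBrowser is made up, and the platform MBean server of the current JVM stands in for a connection to a broker. Against a broker connection you would pass a Kafka domain such as kafka.server instead of java.lang.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class MBeanBrowser {

    /** Returns the canonical names of all MBeans registered under a domain. */
    static Set<String> namesInDomain(MBeanServerConnection connection, String domain) {
        try {
            // "<domain>:*" is an ObjectName pattern that matches any key properties.
            ObjectName pattern = new ObjectName(domain + ":*");
            Set<String> names = new TreeSet<>();
            for (ObjectName name : connection.queryNames(pattern, null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid domain: " + domain, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Local stand-in for a remote MBeanServerConnection.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        namesInDomain(local, "java.lang").forEach(System.out::println);
    }
}
```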

18.5. Important Kafka broker metrics

Kafka provides many MBeans for monitoring the performance of the brokers in your Kafka cluster. These apply to an individual broker rather than the entire cluster.

The following tables present a selection of these broker-level MBeans organized into server, network, logging, and controller metrics.

18.5.1. Kafka server metrics

The following table shows a selection of metrics that report information about the Kafka server.

Table 18.2. Metrics for the Kafka server
Metric | MBean | Description | Expected value

Messages in per second

kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

The rate at which individual messages are consumed by the broker.

Approximately the same as the other brokers in the cluster.

Bytes in per second

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec

The rate at which data sent from producers is consumed by the broker.

Approximately the same as the other brokers in the cluster.

Replication bytes in per second

kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec

The rate at which data sent from other brokers is consumed by the follower broker.

N/A

Bytes out per second

kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

The rate at which data is fetched and read from the broker by consumers.

N/A

Replication bytes out per second

kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec

The rate at which data is sent from the broker to other brokers. This metric is useful to monitor if the broker is a leader for a group of partitions.

N/A

Under-replicated partitions

kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

The number of partitions that have not been fully replicated in the follower replicas.

Zero

Under minimum ISR partition count

kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount

The number of partitions under the minimum In-Sync Replica (ISR) count. The ISR is the set of replicas that are up-to-date with the leader.

Zero

Partition count

kafka.server:type=ReplicaManager,name=PartitionCount

The number of partitions in the broker.

Approximately even when compared with the other brokers.

Leader count

kafka.server:type=ReplicaManager,name=LeaderCount

The number of replicas for which this broker is the leader.

Approximately the same as the other brokers in the cluster.

ISR shrinks per second

kafka.server:type=ReplicaManager,name=IsrShrinksPerSec

The rate at which the number of ISRs in the broker decreases.

Zero

ISR expands per second

kafka.server:type=ReplicaManager,name=IsrExpandsPerSec

The rate at which the number of ISRs in the broker increases.

Zero

Maximum lag

kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica

The maximum lag between the time that messages are received by the leader replica and by the follower replicas.

Proportional to the maximum batch size of a produce request.

Requests in producer purgatory

kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce

The number of send requests in the producer purgatory.

N/A

Requests in fetch purgatory

kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch

The number of fetch requests in the fetch purgatory.

N/A

Request handler average idle percent

kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent

Indicates the percentage of time that the request handler (IO) threads are not in use.

A lower value indicates that the workload of the broker is high.

Request (Requests exempt from throttling)

kafka.server:type=Request

The number of requests that are exempt from throttling.

N/A

ZooKeeper request latency in milliseconds

kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs

The latency for ZooKeeper requests from the broker, in milliseconds.

N/A

ZooKeeper session state

kafka.server:type=SessionExpireListener,name=SessionState

The status of the broker’s connection to ZooKeeper.

CONNECTED
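
Several metrics in the table above have an expected value of zero, which makes them natural candidates for alerting. The following Java sketch shows one possible check: given readings keyed by MBean name, it returns the names whose value is not zero. The class name BrokerHealth and the sample readings are made up for illustration. Reading the values from a broker is left out of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BrokerHealth {

    /** Returns the names of metrics whose value is not zero, in input order. */
    static List<String> nonZero(Map<String, ? extends Number> readings) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, ? extends Number> reading : readings.entrySet()) {
            if (reading.getValue().doubleValue() != 0.0) {
                offenders.add(reading.getKey());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Made-up sample readings keyed by MBean names from Table 18.2.
        Map<String, Number> sample = new LinkedHashMap<>();
        sample.put("kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions", 3);
        sample.put("kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount", 0);
        System.out.println("Needs attention: " + nonZero(sample));
    }
}
```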

18.5.2. Kafka network metrics

The following table shows a selection of metrics that report information about requests.

Metric | MBean | Description | Expected value

Requests per second

kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}

The total number of requests made for the request type per second. The Produce, FetchConsumer, and FetchFollower request types each have their own MBeans.

N/A

Request bytes (request size in bytes)

kafka.network:type=RequestMetrics,name=RequestBytes,request=([-.\w]+)

The size of requests, in bytes, made for the request type identified by the request property of the MBean name. Separate MBeans for all available request types are listed under the RequestBytes node.

N/A

Temporary memory size in bytes

kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request={Produce|Fetch}

The amount of temporary memory used for converting message formats and decompressing messages.

N/A

Message conversions time

kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch}

Time, in milliseconds, spent on converting message formats.

N/A

Total request time in milliseconds

kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}

Total time, in milliseconds, spent processing requests.

N/A

Request queue time in milliseconds

kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}

The time, in milliseconds, that a request currently spends in the queue for the request type given in the request property.

N/A

Local time (leader local processing time) in milliseconds

kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}

The time taken, in milliseconds, for the leader to process the request.

N/A

Remote time (leader remote processing time) in milliseconds

kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}

The length of time, in milliseconds, that the request waits for the follower. Separate MBeans for all available request types are listed under the RemoteTimeMs node.

N/A

Response queue time in milliseconds

kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request={Produce|FetchConsumer|FetchFollower}

The length of time, in milliseconds, that the request waits in the response queue.

N/A

Response send time in milliseconds

kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}

The time taken, in milliseconds, to send the response.

N/A

Network processor average idle percent

kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent

The average percentage of time that the network processors are idle.

Between zero and one.
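
In the table above, a notation such as request={Produce|FetchConsumer|FetchFollower} stands for one MBean per request type. The following Java sketch expands a metric name into the corresponding ObjectName instances. The class name RequestMetricNames and the method forRequests are illustrative; the domain, type, and request values are taken from the table.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class RequestMetricNames {

    /** Builds one kafka.network RequestMetrics ObjectName per request type. */
    static List<ObjectName> forRequests(String metric, String... requestTypes) {
        List<ObjectName> names = new ArrayList<>();
        for (String request : requestTypes) {
            try {
                names.add(new ObjectName("kafka.network:type=RequestMetrics,name="
                        + metric + ",request=" + request));
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException("Bad metric or request type", e);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // One name for each request type listed for TotalTimeMs.
        forRequests("TotalTimeMs", "Produce", "FetchConsumer", "FetchFollower")
                .forEach(System.out::println);
    }
}
```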

18.5.3. Kafka log metrics

The following table shows a selection of metrics that report information about logging.

Metric | MBean | Description | Expected value

Log flush rate and time in milliseconds

kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs

The rate at which log data is flushed to disk, and the time taken by each flush in milliseconds.

N/A

Offline log directory count

kafka.log:type=LogManager,name=OfflineLogDirectoryCount

The number of offline log directories (for example, after a hardware failure).

Zero

18.5.4. Kafka controller metrics

The following table shows a selection of metrics that report information about the controller of the cluster.

Metric | MBean | Description | Expected value

Active controller count

kafka.controller:type=KafkaController,name=ActiveControllerCount

The number of brokers designated as controllers.

One indicates that the broker is the controller for the cluster.

Leader election rate and time in milliseconds

kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs

The rate at which new leader replicas are elected.

Zero

18.5.5. Yammer metrics

Metrics that express a rate or unit of time are provided as Yammer metrics. The class name of an MBean that uses Yammer metrics is prefixed with com.yammer.metrics.

Yammer rate metrics have the following attributes for monitoring requests:

  • Count
  • EventType (Bytes)
  • FifteenMinuteRate
  • RateUnit (Seconds)
  • MeanRate
  • OneMinuteRate
  • FiveMinuteRate

Yammer time metrics have the following attributes for monitoring requests:

  • Max
  • Min
  • Mean
  • StdDev
  • 75/95/98/99/99.9th Percentile
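
Because each rate or time MBean carries several attributes, it is often convenient to read them in a single call. The following Java sketch does this with MBeanServerConnection.getAttributes. The class name AttributeReader is illustrative, and the example in main reads a JVM runtime MBean on the local platform MBean server as a stand-in for a broker. Against a broker connection, you would pass a Kafka MBean name and attribute names from the lists above, such as Count, OneMinuteRate, and MeanRate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class AttributeReader {

    /** Reads several attributes of one MBean in a single call. */
    static Map<String, Object> read(MBeanServerConnection connection,
                                    String mbeanName, String... attributes) {
        try {
            Map<String, Object> values = new LinkedHashMap<>();
            // Attributes that cannot be read are omitted from the returned list.
            for (Attribute attribute : connection.getAttributes(
                    new ObjectName(mbeanName), attributes).asList()) {
                values.put(attribute.getName(), attribute.getValue());
            }
            return values;
        } catch (JMException e) {
            throw new IllegalArgumentException("Cannot read " + mbeanName, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Local stand-in for a broker connection.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println(read(local, "java.lang:type=Runtime", "VmName", "Uptime"));
    }
}
```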

18.6. Producer MBeans

MBeans are present in Kafka producer applications, including Kafka Streams applications and Kafka Connect with source connectors.

Producer metrics

Table 18.3. MBeans matching kafka.producer:type=producer-metrics,client-id=*
Attribute | Description

batch-size-avg

The average number of bytes sent per partition per-request.

batch-size-max

The max number of bytes sent per partition per-request.

batch-split-rate

The average number of batch splits per second.

batch-split-total

The total number of batch splits.

buffer-available-bytes

The total amount of buffer memory that is not being used (either unallocated or in the free list).

buffer-total-bytes

The maximum amount of buffer memory the client can use (whether or not it is currently used).

bufferpool-wait-time

The fraction of time an appender waits for space allocation.

bufferpool-wait-time-ns-total

The total time an appender waits for space allocation in nanoseconds.

bufferpool-wait-time-total

Deprecated. The total time an appender waits for space allocation in nanoseconds. Use bufferpool-wait-time-ns-total instead.

compression-rate-avg

The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size.

connection-close-rate

Connections closed per second in the window.

connection-close-total

Total connections closed in the window.

connection-count

The current number of active connections.

connection-creation-rate

New connections established per second in the window.

connection-creation-total

Total new connections established in the window.

failed-authentication-rate

Connections per second that failed authentication.

failed-authentication-total

Total connections that failed authentication.

failed-reauthentication-rate

Connections per second that failed re-authentication.

failed-reauthentication-total

Total connections that failed re-authentication.

flush-time-ns-total

The total time the Producer spent in Producer.flush in nanoseconds.

incoming-byte-rate

Bytes/second read off all sockets.

incoming-byte-total

Total bytes read off all sockets.

io-ratio

The fraction of time the I/O thread spent doing I/O.

io-time-ns-avg

The average length of time for I/O per select call in nanoseconds.

io-time-ns-total

The total time the I/O thread spent doing I/O in nanoseconds.

io-wait-ratio

The fraction of time the I/O thread spent waiting.

io-wait-time-ns-avg

The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.

io-wait-time-ns-total

The total time the I/O thread spent waiting in nanoseconds.

io-waittime-total

Deprecated. The total time the I/O thread spent waiting in nanoseconds. Use io-wait-time-ns-total instead.

iotime-total

Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Use io-time-ns-total instead.

metadata-age

The age in seconds of the current producer metadata being used.

network-io-rate

The average number of network operations (reads or writes) on all connections per second.

network-io-total

The total number of network operations (reads or writes) on all connections.

outgoing-byte-rate

The average number of outgoing bytes sent per second to all servers.

outgoing-byte-total

The total number of outgoing bytes sent to all servers.

produce-throttle-time-avg

The average time in ms a request was throttled by a broker.

produce-throttle-time-max

The maximum time in ms a request was throttled by a broker.

reauthentication-latency-avg

The average latency in ms observed due to re-authentication.

reauthentication-latency-max

The maximum latency in ms observed due to re-authentication.

record-error-rate

The average per-second number of record sends that resulted in errors.

record-error-total

The total number of record sends that resulted in errors.

record-queue-time-avg

The average time in ms record batches spent in the send buffer.

record-queue-time-max

The maximum time in ms record batches spent in the send buffer.

record-retry-rate

The average per-second number of retried record sends.

record-retry-total

The total number of retried record sends.

record-send-rate

The average number of records sent per second.

record-send-total

The total number of records sent.

record-size-avg

The average record size.

record-size-max

The maximum record size.

records-per-request-avg

The average number of records per request.

request-latency-avg

The average request latency in ms.

request-latency-max

The maximum request latency in ms.

request-rate

The average number of requests sent per second.

request-size-avg

The average size of all requests in the window.

request-size-max

The maximum size of any request sent in the window.

request-total

The total number of requests sent.

requests-in-flight

The current number of in-flight requests awaiting a response.

response-rate

Responses received per second.

response-total

Total responses received.

select-rate

Number of times the I/O layer checked for new I/O to perform per second.

select-total

Total number of times the I/O layer checked for new I/O to perform.

successful-authentication-no-reauth-total

Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero.

successful-authentication-rate

Connections per second that were successfully authenticated using SASL or SSL.

successful-authentication-total

Total connections that were successfully authenticated using SASL or SSL.

successful-reauthentication-rate

Connections per second that were successfully re-authenticated using SASL.

successful-reauthentication-total

Total connections that were successfully re-authenticated using SASL.

txn-abort-time-ns-total

The total time the Producer spent aborting transactions in nanoseconds (for EOS).

txn-begin-time-ns-total

The total time the Producer spent in beginTransaction in nanoseconds (for EOS).

txn-commit-time-ns-total

The total time the Producer spent committing transactions in nanoseconds (for EOS).

txn-init-time-ns-total

The total time the Producer spent initializing transactions in nanoseconds (for EOS).

txn-send-offsets-time-ns-total

The total time the Producer spent sending offsets to transactions in nanoseconds (for EOS).

waiting-threads

The number of user threads blocked waiting for buffer memory to enqueue their records.

Producer metrics about broker connections

Table 18.4. MBeans matching kafka.producer:type=producer-metrics,client-id=*,node-id=*
Attribute | Description

incoming-byte-rate

The average number of bytes received per second for a node.

incoming-byte-total

The total number of bytes received for a node.

outgoing-byte-rate

The average number of outgoing bytes sent per second for a node.

outgoing-byte-total

The total number of outgoing bytes sent for a node.

request-latency-avg

The average request latency in ms for a node.

request-latency-max

The maximum request latency in ms for a node.

request-rate

The average number of requests sent per second for a node.

request-size-avg

The average size of all requests in the window for a node.

request-size-max

The maximum size of any request sent in the window for a node.

request-total

The total number of requests sent for a node.

response-rate

Responses received per second for a node.

response-total

Total responses received for a node.

Producer metrics about messages sent to topics

Table 18.5. MBeans matching kafka.producer:type=producer-topic-metrics,client-id=*,topic=*
Attribute | Description

byte-rate

The average number of bytes sent per second for a topic.

byte-total

The total number of bytes sent for a topic.

compression-rate

The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size over the uncompressed size.

record-error-rate

The average per-second number of record sends that resulted in errors for a topic.

record-error-total

The total number of record sends that resulted in errors for a topic.

record-retry-rate

The average per-second number of retried record sends for a topic.

record-retry-total

The total number of retried record sends for a topic.

record-send-rate

The average number of records sent per second for a topic.

record-send-total

The total number of records sent for a topic.
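
The table captions in this section are JMX ObjectName patterns. A pattern that lists its keys explicitly, without a trailing wildcard, only matches names that have exactly those keys, so the client-level pattern does not also select the per-node MBeans. The following Java sketch demonstrates this with ObjectName.apply. The class name ProducerPatterns and the client ID producer-1 and node ID node-1 are hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

class ProducerPatterns {

    /** Returns true if the MBean name matches the ObjectName pattern. */
    static boolean matches(String pattern, String mbeanName) {
        try {
            return new ObjectName(pattern).apply(new ObjectName(mbeanName));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid ObjectName", e);
        }
    }

    public static void main(String[] args) {
        String clientLevel = "kafka.producer:type=producer-metrics,client-id=*";
        String nodeLevel = "kafka.producer:type=producer-metrics,client-id=*,node-id=*";
        // Hypothetical per-node MBean name for a producer with client ID producer-1.
        String perNode =
                "kafka.producer:type=producer-metrics,client-id=producer-1,node-id=node-1";

        // The extra node-id key prevents a match with the client-level pattern.
        System.out.println("client-level matches: " + matches(clientLevel, perNode));
        System.out.println("node-level matches:   " + matches(nodeLevel, perNode));
    }
}
```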

18.7. Consumer MBeans

MBeans are present in Kafka consumer applications, including Kafka Streams applications and Kafka Connect with sink connectors.

Consumer metrics

Table 18.6. MBeans matching kafka.consumer:type=consumer-metrics,client-id=*
Attribute | Description

connection-close-rate

Connections closed per second in the window.

connection-close-total

Total connections closed in the window.

connection-count

The current number of active connections.

connection-creation-rate

New connections established per second in the window.

connection-creation-total

Total new connections established in the window.

failed-authentication-rate

Connections per second that failed authentication.

failed-authentication-total

Total connections that failed authentication.

failed-reauthentication-rate

Connections per second that failed re-authentication.

failed-reauthentication-total

Total connections that failed re-authentication.

incoming-byte-rate

Bytes/second read off all sockets.

incoming-byte-total

Total bytes read off all sockets.

io-ratio

The fraction of time the I/O thread spent doing I/O.

io-time-ns-avg

The average length of time for I/O per select call in nanoseconds.

io-time-ns-total

The total time the I/O thread spent doing I/O in nanoseconds.

io-wait-ratio

The fraction of time the I/O thread spent waiting.

io-wait-time-ns-avg

The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.

io-wait-time-ns-total

The total time the I/O thread spent waiting in nanoseconds.

io-waittime-total

Deprecated. The total time the I/O thread spent waiting in nanoseconds. Use io-wait-time-ns-total instead.

iotime-total

Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Use io-time-ns-total instead.

network-io-rate

The average number of network operations (reads or writes) on all connections per second.

network-io-total

The total number of network operations (reads or writes) on all connections.

outgoing-byte-rate

The average number of outgoing bytes sent per second to all servers.

outgoing-byte-total

The total number of outgoing bytes sent to all servers.

reauthentication-latency-avg

The average latency in ms observed due to re-authentication.

reauthentication-latency-max

The maximum latency in ms observed due to re-authentication.

request-rate

The average number of requests sent per second.

request-size-avg

The average size of all requests in the window.

request-size-max

The maximum size of any request sent in the window.

request-total

The total number of requests sent.

response-rate

Responses received per second.

response-total

Total responses received.

select-rate

Number of times the I/O layer checked for new I/O to perform per second.

select-total

Total number of times the I/O layer checked for new I/O to perform.

successful-authentication-no-reauth-total

Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero.

successful-authentication-rate

Connections per second that were successfully authenticated using SASL or SSL.

successful-authentication-total

Total connections that were successfully authenticated using SASL or SSL.

successful-reauthentication-rate

Connections per second that were successfully re-authenticated using SASL.

successful-reauthentication-total

Total connections that were successfully re-authenticated using SASL.

Consumer metrics about broker connections

Table 18.7. MBeans matching kafka.consumer:type=consumer-metrics,client-id=*,node-id=*
Attribute | Description

incoming-byte-rate

The average number of bytes received per second for a node.

incoming-byte-total

The total number of bytes received for a node.

outgoing-byte-rate

The average number of outgoing bytes sent per second for a node.

outgoing-byte-total

The total number of outgoing bytes sent for a node.

request-latency-avg

The average request latency in ms for a node.

request-latency-max

The maximum request latency in ms for a node.

request-rate

The average number of requests sent per second for a node.

request-size-avg

The average size of all requests in the window for a node.

request-size-max

The maximum size of any request sent in the window for a node.

request-total

The total number of requests sent for a node.

response-rate

Responses received per second for a node.

response-total

Total responses received for a node.

Consumer group metrics

Table 18.8. MBeans matching kafka.consumer:type=consumer-coordinator-metrics,client-id=*
Attribute | Description

assigned-partitions

The number of partitions currently assigned to this consumer.

commit-latency-avg

The average time taken for a commit request.

commit-latency-max

The max time taken for a commit request.

commit-rate

The number of commit calls per second.

commit-total

The total number of commit calls.

failed-rebalance-rate-per-hour

The number of failed group rebalance events per hour.

failed-rebalance-total

The total number of failed group rebalances.

heartbeat-rate

The average number of heartbeats per second.

heartbeat-response-time-max

The max time taken to receive a response to a heartbeat request.

heartbeat-total

The total number of heartbeats.

join-rate

The number of group joins per second.

join-time-avg

The average time taken for a group rejoin.

join-time-max

The max time taken for a group rejoin.

join-total

The total number of group joins.

last-heartbeat-seconds-ago

The number of seconds since the last heartbeat was sent to the group coordinator.

last-rebalance-seconds-ago

The number of seconds since the last rebalance event.

partitions-assigned-latency-avg

The average time taken by the on-partitions-assigned rebalance listener callback.

partitions-assigned-latency-max

The max time taken by the on-partitions-assigned rebalance listener callback.

partitions-lost-latency-avg

The average time taken by the on-partitions-lost rebalance listener callback.

partitions-lost-latency-max

The max time taken by the on-partitions-lost rebalance listener callback.

partitions-revoked-latency-avg

The average time taken by the on-partitions-revoked rebalance listener callback.

partitions-revoked-latency-max

The max time taken by the on-partitions-revoked rebalance listener callback.

rebalance-latency-avg

The average time taken for a group rebalance.

rebalance-latency-max

The max time taken for a group rebalance.

rebalance-latency-total

The total time taken for group rebalances so far.

rebalance-rate-per-hour

The number of group rebalances that the consumer participated in per hour.

rebalance-total

The total number of group rebalances that the consumer participated in.

sync-rate

The number of group syncs per second.

sync-time-avg

The average time taken for a group sync.

sync-time-max

The max time taken for a group sync.

sync-total

The total number of group syncs.

Consumer fetcher metrics

Table 18.9. MBeans matching kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*
Attribute | Description

bytes-consumed-rate

The average number of bytes consumed per second.

bytes-consumed-total

The total number of bytes consumed.

fetch-latency-avg

The average time taken for a fetch request.

fetch-latency-max

The max time taken for any fetch request.

fetch-rate

The number of fetch requests per second.

fetch-size-avg

The average number of bytes fetched per request.

fetch-size-max

The maximum number of bytes fetched per request.

fetch-throttle-time-avg

The average throttle time in ms.

fetch-throttle-time-max

The maximum throttle time in ms.

fetch-total

The total number of fetch requests.

records-consumed-rate

The average number of records consumed per second.

records-consumed-total

The total number of records consumed.

records-lag-max

The maximum lag in terms of number of records for any partition in this window. NOTE: This is based on current offset and not committed offset.

records-lead-min

The minimum lead in terms of number of records for any partition in this window.

records-per-request-avg

The average number of records in each request.

Consumer fetcher metrics at the topic level

Table 18.10. MBeans matching kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*
Attribute | Description

bytes-consumed-rate

The average number of bytes consumed per second for a topic.

bytes-consumed-total

The total number of bytes consumed for a topic.

fetch-size-avg

The average number of bytes fetched per request for a topic.

fetch-size-max

The maximum number of bytes fetched per request for a topic.

records-consumed-rate

The average number of records consumed per second for a topic.

records-consumed-total

The total number of records consumed for a topic.

records-per-request-avg

The average number of records in each request for a topic.

Consumer fetcher metrics at the partition level

Table 18.11. MBeans matching kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*,partition=*
Attribute | Description

preferred-read-replica

The current read replica for the partition, or -1 if reading from leader.

records-lag

The latest lag of the partition.

records-lag-avg

The average lag of the partition.

records-lag-max

The max lag of the partition.

records-lead

The latest lead of the partition.

records-lead-avg

The average lead of the partition.

records-lead-min

The min lead of the partition.
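
A consumer application can also read its own MBeans in-process, without a remote connection, because they are registered with the JVM it runs in. The following Java sketch scans the partition-level MBeans from Table 18.11 and returns the largest records-lag value. The class name ConsumerLag is illustrative, and the sketch assumes that records-lag is exposed as a numeric attribute. Non-numeric and NaN values are skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.OptionalDouble;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class ConsumerLag {

    static final String PARTITION_PATTERN =
            "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*,partition=*";

    /** Returns the largest records-lag over all partition-level MBeans, if any. */
    static OptionalDouble maxRecordsLag(MBeanServerConnection connection) {
        try {
            boolean found = false;
            double max = Double.NEGATIVE_INFINITY;
            for (ObjectName name
                    : connection.queryNames(new ObjectName(PARTITION_PATTERN), null)) {
                Object value = connection.getAttribute(name, "records-lag");
                // Assumption: the attribute is numeric; skip anything else, and NaN.
                if (value instanceof Number) {
                    double lag = ((Number) value).doubleValue();
                    if (!Double.isNaN(lag)) {
                        max = Math.max(max, lag);
                        found = true;
                    }
                }
            }
            return found ? OptionalDouble.of(max) : OptionalDouble.empty();
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read consumer lag", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In a JVM with no running consumer, this prints OptionalDouble.empty.
        System.out.println(maxRecordsLag(ManagementFactory.getPlatformMBeanServer()));
    }
}
```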

18.8. Kafka Connect MBeans

Note

Kafka Connect contains the producer MBeans for source connectors and the consumer MBeans for sink connectors, in addition to those documented here.

Kafka Connect metrics

Table 18.12. MBeans matching kafka.connect:type=connect-metrics,client-id=*
Attribute | Description

connection-close-rate

Connections closed per second in the window.

connection-close-total

Total connections closed in the window.

connection-count

The current number of active connections.

connection-creation-rate

New connections established per second in the window.

connection-creation-total

Total new connections established in the window.

failed-authentication-rate

Connections per second that failed authentication.

failed-authentication-total

Total connections that failed authentication.

failed-reauthentication-rate

Connections per second that failed re-authentication.

failed-reauthentication-total

Total connections that failed re-authentication.

incoming-byte-rate

Bytes/second read off all sockets.

incoming-byte-total

Total bytes read off all sockets.

io-ratio

The fraction of time the I/O thread spent doing I/O.

io-time-ns-avg

The average length of time for I/O per select call in nanoseconds.

io-time-ns-total

The total time the I/O thread spent doing I/O in nanoseconds.

io-wait-ratio

The fraction of time the I/O thread spent waiting.

io-wait-time-ns-avg

The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds.

io-wait-time-ns-total

The total time the I/O thread spent waiting in nanoseconds.

io-waittime-total

Deprecated. The total time the I/O thread spent waiting in nanoseconds. Use io-wait-time-ns-total instead.

iotime-total

Deprecated. The total time the I/O thread spent doing I/O in nanoseconds. Use io-time-ns-total instead.

network-io-rate

The average number of network operations (reads or writes) on all connections per second.

network-io-total

The total number of network operations (reads or writes) on all connections.

outgoing-byte-rate

The average number of outgoing bytes sent per second to all servers.

outgoing-byte-total

The total number of outgoing bytes sent to all servers.

reauthentication-latency-avg

The average latency in ms observed due to re-authentication.

reauthentication-latency-max

The maximum latency in ms observed due to re-authentication.

request-rate

The average number of requests sent per second.

request-size-avg

The average size of all requests in the window.

request-size-max

The maximum size of any request sent in the window.

request-total

The total number of requests sent.

response-rate

Responses received per second.

response-total

Total responses received.

select-rate

Number of times the I/O layer checked for new I/O to perform per second.

select-total

Total number of times the I/O layer checked for new I/O to perform.

successful-authentication-no-reauth-total

Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. May only be non-zero.

successful-authentication-rate

Connections per second that were successfully authenticated using SASL or SSL.

successful-authentication-total

Total connections that were successfully authenticated using SASL or SSL.

successful-reauthentication-rate

Connections per second that were successfully re-authenticated using SASL.

successful-reauthentication-total

Total connections that were successfully re-authenticated using SASL.
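
As a minimal sketch of how the attributes in this table could be read programmatically, the following Java class queries an MBean server for every MBean that matches a pattern and collects one attribute from each. The class name ConnectMetricsReader and the method readAttribute are hypothetical and are not part of Kafka or AMQ Streams. The sketch assumes that the code runs inside the Kafka Connect JVM; a connection to a remote JVM could be passed in instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper, not part of Kafka or AMQ Streams. */
public class ConnectMetricsReader {

    /**
     * Reads one attribute from every MBean whose name matches the pattern.
     * Returns a map from the MBean's canonical name to the attribute value.
     */
    public static Map<String, Object> readAttribute(
            MBeanServerConnection connection, String pattern, String attribute)
            throws Exception {
        Map<String, Object> values = new TreeMap<>();
        // queryNames accepts an ObjectName pattern; a null query means "no extra filter".
        for (ObjectName name : connection.queryNames(new ObjectName(pattern), null)) {
            values.put(name.getCanonicalName(), connection.getAttribute(name, attribute));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: this runs in the same JVM as the Kafka Connect worker.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        Map<String, Object> counts = readAttribute(
                local, "kafka.connect:type=connect-metrics,client-id=*", "connection-count");
        counts.forEach((mbean, value) -> System.out.println(mbean + " = " + value));
    }
}
```

In a JVM that does not run Kafka Connect, the query matches nothing and the map is empty.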

Kafka Connect metrics about broker connections

Table 18.13. MBeans matching kafka.connect:type=connect-metrics,client-id=*,node-id=*
Attribute

Description

incoming-byte-rate

The average number of bytes received per second for a node.

incoming-byte-total

The total number of bytes received for a node.

outgoing-byte-rate

The average number of outgoing bytes sent per second for a node.

outgoing-byte-total

The total number of outgoing bytes sent for a node.

request-latency-avg

The average request latency in ms for a node.

request-latency-max

The maximum request latency in ms for a node.

request-rate

The average number of requests sent per second for a node.

request-size-avg

The average size of all requests in the window for a node.

request-size-max

The maximum size of any request sent in the window for a node.

request-total

The total number of requests sent for a node.

response-rate

Responses received per second for a node.

response-total

Total responses received for a node.
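
The patterns for Table 18.12 and Table 18.13 differ only in the node-id key. The following sketch uses the standard javax.management.ObjectName class to illustrate that a pattern without a trailing ",*" selects only MBeans with exactly the listed keys, so the two patterns select disjoint sets of MBeans. The class name ConnectPatternCheck and the example client and node IDs are hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical illustration of ObjectName pattern matching. */
public class ConnectPatternCheck {

    /** Returns true if the concrete MBean name is selected by the pattern. */
    public static boolean matches(String pattern, String mbeanName)
            throws MalformedObjectNameException {
        return new ObjectName(pattern).apply(new ObjectName(mbeanName));
    }

    public static void main(String[] args) throws MalformedObjectNameException {
        String workerPattern = "kafka.connect:type=connect-metrics,client-id=*";
        String nodePattern = "kafka.connect:type=connect-metrics,client-id=*,node-id=*";
        // Hypothetical MBean names; real client and node IDs depend on the deployment.
        String workerBean = "kafka.connect:type=connect-metrics,client-id=connect-1";
        String nodeBean =
                "kafka.connect:type=connect-metrics,client-id=connect-1,node-id=node-0";

        System.out.println(matches(workerPattern, workerBean)); // prints true
        System.out.println(matches(workerPattern, nodeBean));   // prints false: extra node-id key
        System.out.println(matches(nodePattern, nodeBean));     // prints true
    }
}
```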

Kafka Connect metrics about workers

Table 18.14. MBeans matching kafka.connect:type=connect-worker-metrics
Attribute

Description

connector-count

The number of connectors run in this worker.

connector-startup-attempts-total

The total number of connector startups that this worker has attempted.

connector-startup-failure-percentage

The average percentage of this worker’s connector starts that failed.

connector-startup-failure-total

The total number of connector starts that failed.

connector-startup-success-percentage

The average percentage of this worker’s connector starts that succeeded.

connector-startup-success-total

The total number of connector starts that succeeded.

task-count

The number of tasks run in this worker.

task-startup-attempts-total

The total number of task startups that this worker has attempted.

task-startup-failure-percentage

The average percentage of this worker’s task starts that failed.

task-startup-failure-total

The total number of task starts that failed.

task-startup-success-percentage

The average percentage of this worker’s task starts that succeeded.

task-startup-success-total

The total number of task starts that succeeded.

Kafka Connect metrics about rebalances

Table 18.15. MBeans matching kafka.connect:type=connect-worker-rebalance-metrics
Attribute

Description

completed-rebalances-total

The total number of rebalances completed by this worker.

connect-protocol

The Connect protocol used by this cluster.

epoch

The epoch or generation number of this worker.

leader-name

The name of the group leader.

rebalance-avg-time-ms

The average time in milliseconds spent by this worker to rebalance.

rebalance-max-time-ms

The maximum time in milliseconds spent by this worker to rebalance.

rebalancing

Whether this worker is currently rebalancing.

time-since-last-rebalance-ms

The time in milliseconds since this worker completed the most recent rebalance.

Kafka Connect metrics about connectors

Table 18.16. MBeans matching kafka.connect:type=connector-metrics,connector=*
Attribute

Description

connector-class

The name of the connector class.

connector-type

The type of the connector. One of 'source' or 'sink'.

connector-version

The version of the connector class, as reported by the connector.

status

The status of the connector. One of 'unassigned', 'running', 'paused', 'failed', or 'restarting'.

Kafka Connect metrics about connector tasks

Table 18.17. MBeans matching kafka.connect:type=connector-task-metrics,connector=*,task=*
Attribute

Description

batch-size-avg

The average size of the batches processed by the connector.

batch-size-max

The maximum size of the batches processed by the connector.

offset-commit-avg-time-ms

The average time in milliseconds taken by this task to commit offsets.

offset-commit-failure-percentage

The average percentage of this task’s offset commit attempts that failed.

offset-commit-max-time-ms

The maximum time in milliseconds taken by this task to commit offsets.

offset-commit-success-percentage

The average percentage of this task’s offset commit attempts that succeeded.

pause-ratio

The fraction of time this task has spent in the pause state.

running-ratio

The fraction of time this task has spent in the running state.

status

The status of the connector task. One of 'unassigned', 'running', 'paused', 'failed', or 'restarting'.

Kafka Connect metrics about sink connectors

Table 18.18. MBeans matching kafka.connect:type=sink-task-metrics,connector=*,task=*
Attribute

Description

offset-commit-completion-rate

The average per-second number of offset commit completions that were completed successfully.

offset-commit-completion-total

The total number of offset commit completions that were completed successfully.

offset-commit-seq-no

The current sequence number for offset commits.

offset-commit-skip-rate

The average per-second number of offset commit completions that were received too late and skipped/ignored.

offset-commit-skip-total

The total number of offset commit completions that were received too late and skipped/ignored.

partition-count

The number of topic partitions assigned to this task belonging to the named sink connector in this worker.

put-batch-avg-time-ms

The average time in milliseconds taken by this task to put a batch of sink records.

put-batch-max-time-ms

The maximum time in milliseconds taken by this task to put a batch of sink records.

sink-record-active-count

The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.

sink-record-active-count-avg

The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.

sink-record-active-count-max

The maximum number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.

sink-record-lag-max

The maximum lag in terms of number of records that the sink task is behind the consumer’s position for any topic partitions.

sink-record-read-rate

The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.

sink-record-read-total

The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.

sink-record-send-rate

The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.

sink-record-send-total

The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.

Kafka Connect metrics about source connectors

Table 18.19. MBeans matching kafka.connect:type=source-task-metrics,connector=*,task=*
Attribute

Description

poll-batch-avg-time-ms

The average time in milliseconds taken by this task to poll for a batch of source records.

poll-batch-max-time-ms

The maximum time in milliseconds taken by this task to poll for a batch of source records.

source-record-active-count

The number of records that have been produced by this task but not yet completely written to Kafka.

source-record-active-count-avg

The average number of records that have been produced by this task but not yet completely written to Kafka.

source-record-active-count-max

The maximum number of records that have been produced by this task but not yet completely written to Kafka.

source-record-poll-rate

The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.

source-record-poll-total

The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.

source-record-write-rate

The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.

source-record-write-total

The total number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.

transaction-size-avg

The average number of records in the transactions the task has committed so far.

transaction-size-max

The number of records in the largest transaction the task has committed so far.

transaction-size-min

The number of records in the smallest transaction the task has committed so far.

Kafka Connect metrics about connector errors

Table 18.20. MBeans matching kafka.connect:type=task-error-metrics,connector=*,task=*
Attribute

Description

deadletterqueue-produce-failures

The number of failed writes to the dead letter queue.

deadletterqueue-produce-requests

The number of attempted writes to the dead letter queue.

last-error-timestamp

The epoch timestamp when this task last encountered an error.

total-errors-logged

The number of errors that were logged.

total-record-errors

The number of record processing errors in this task.

total-record-failures

The number of record processing failures in this task.

total-records-skipped

The number of records skipped due to errors.

total-retries

The number of operations retried.

18.9. Kafka Streams MBeans

Note

In addition to the MBeans documented here, a Streams application contains the producer and consumer MBeans.

Kafka Streams metrics for clients

These metrics are collected when the metrics.recording.level configuration parameter is info or debug.

Table 18.21. MBeans matching kafka.streams:type=stream-metrics,client-id=*
Attribute

Description

blocked-time-ns-total

The total time in nanoseconds the thread spent blocked on Kafka.

commit-latency-avg

The average execution time in ms, for committing, across all running tasks of this thread.

commit-latency-max

The maximum execution time in ms, for committing, across all running tasks of this thread.

commit-rate

The average number of commits per second.

commit-total

The total number of commit calls.

poll-latency-avg

The average execution time in ms, for consumer polling.

poll-latency-max

The maximum execution time in ms, for consumer polling.

poll-rate

The average number of consumer poll calls per second.

poll-total

The total number of consumer poll calls.

process-latency-avg

The average execution time in ms, for processing.

process-latency-max

The maximum execution time in ms, for processing.

process-rate

The average number of processed records per second.

process-total

The total number of processed records.

punctuate-latency-avg

The average execution time in ms, for punctuating.

punctuate-latency-max

The maximum execution time in ms, for punctuating.

punctuate-rate

The average number of punctuate calls per second.

punctuate-total

The total number of punctuate calls.

task-closed-rate

The average number of tasks closed per second.

task-closed-total

The total number of tasks closed.

task-created-rate

The average number of tasks created per second.

task-created-total

The total number of tasks created.

thread-start-time

The time that the thread was started.

Kafka Streams metrics for tasks

These metrics are collected when the metrics.recording.level configuration parameter is debug.

Table 18.22. MBeans matching kafka.streams:type=stream-task-metrics,client-id=*,task-id=*
Attribute

Description

active-process-ratio

The fraction of time the stream thread spent on processing this task among all assigned active tasks.

commit-latency-avg

The average execution time in ns, for committing.

commit-latency-max

The maximum execution time in ns, for committing.

commit-rate

The average number of commit calls per second.

commit-total

The total number of commit calls.

dropped-records-rate

The average number of records dropped per second within this task.

dropped-records-total

The total number of records dropped within this task.

enforced-processing-rate

The average number of enforced processings per second.

enforced-processing-total

The total number of enforced processings.

process-latency-avg

The average execution time in ns, for processing.

process-latency-max

The maximum execution time in ns, for processing.

process-rate

The average number of processed records per second across all source processor nodes of this task.

process-total

The total number of processed records across all source processor nodes of this task.

record-lateness-avg

The average observed lateness of records (stream time - record timestamp).

record-lateness-max

The max observed lateness of records (stream time - record timestamp).
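
The client-level MBeans in Table 18.21 are available at the info recording level, whereas the task-level MBeans in Table 18.22 require debug. The following sketch shows one way the level might be set when building the configuration for a Streams application. The class name StreamsMetricsConfig and the application.id and bootstrap.servers values are placeholders. The value is written in upper case on the assumption that Kafka validates it against the upper-case names it documents.

```java
import java.util.Properties;

/** Hypothetical configuration builder for a Kafka Streams application. */
public class StreamsMetricsConfig {

    /** Builds client properties with the given metrics recording level. */
    public static Properties withRecordingLevel(String level) {
        Properties props = new Properties();
        // Placeholder values; replace them with the application's real settings.
        props.put("application.id", "example-streams-app");
        props.put("bootstrap.servers", "localhost:9092");
        // INFO exposes the client-level MBeans; DEBUG also exposes the task-level MBeans.
        props.put("metrics.recording.level", level);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withRecordingLevel("DEBUG");
        System.out.println(props.getProperty("metrics.recording.level")); // prints DEBUG
    }
}
```

The resulting Properties object would typically be passed to the KafkaStreams constructor together with the application's topology.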

Kafka Streams metrics for processor nodes

These metrics are collected when the metrics.recording.level configuration parameter is debug.

Table 18.23. MBeans matching kafka.streams:type=stream-processor-node-metrics,client-id=*,task-id=*,processor-node-id=*
Attribute

Description

bytes-consumed-total

The total number of bytes consumed by a source processor node.

bytes-produced-total

The total number of bytes produced by a sink processor node.

process-rate

The average number of records processed by a source processor node per second.

process-total

The total number of records processed by a source processor node.

record-e2e-latency-avg

The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

record-e2e-latency-max

The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

record-e2e-latency-min

The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

records-consumed-total

The total number of records consumed by a source processor node.

records-produced-total

The total number of records produced by a sink processor node.

suppression-emit-rate

The rate at which records have been emitted downstream from suppression operation nodes.

suppression-emit-total

The total number of records that have been emitted downstream from suppression operation nodes.

Kafka Streams metrics for state stores

These metrics are collected when the metrics.recording.level configuration parameter is debug.

Table 18.24. MBeans matching kafka.streams:type=stream-[store-scope]-metrics,client-id=*,task-id=*,[store-scope]-id=*
Attribute

Description

all-latency-avg

The average all operation execution time in ns.

all-latency-max

The maximum all operation execution time in ns.

all-rate

The average all operation rate for this store.

delete-latency-avg

The average delete execution time in ns.

delete-latency-max

The maximum delete execution time in ns.

delete-rate

The average delete rate for this store.

flush-latency-avg

The average flush execution time in ns.

flush-latency-max

The maximum flush execution time in ns.

flush-rate

The average flush rate for this store.

get-latency-avg

The average get execution time in ns.

get-latency-max

The maximum get execution time in ns.

get-rate

The average get rate for this store.

put-all-latency-avg

The average put-all execution time in ns.

put-all-latency-max

The maximum put-all execution time in ns.

put-all-rate

The average put-all rate for this store.

put-if-absent-latency-avg

The average put-if-absent execution time in ns.

put-if-absent-latency-max

The maximum put-if-absent execution time in ns.

put-if-absent-rate

The average put-if-absent rate for this store.

put-latency-avg

The average put execution time in ns.

put-latency-max

The maximum put execution time in ns.

put-rate

The average put rate for this store.

range-latency-avg

The average range execution time in ns.

range-latency-max

The maximum range execution time in ns.

range-rate

The average range rate for this store.

record-e2e-latency-avg

The average end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

record-e2e-latency-max

The maximum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

record-e2e-latency-min

The minimum end-to-end latency of a record, measured by comparing the record timestamp with the system time when it has been fully processed by the node.

restore-latency-avg

The average restore execution time in ns.

restore-latency-max

The maximum restore execution time in ns.

restore-rate

The average restore rate for this store.

suppression-buffer-count-avg

The average number of records buffered over the sampling window.

suppression-buffer-count-max

The maximum number of records buffered over the sampling window.

suppression-buffer-size-avg

The average total size, in bytes, of the buffered data over the sampling window.

suppression-buffer-size-max

The maximum total size, in bytes, of the buffered data over the sampling window.

Kafka Streams metrics for record caches

These metrics are collected when the metrics.recording.level configuration parameter is debug.

Table 18.25. MBeans matching kafka.streams:type=stream-record-cache-metrics,client-id=*,task-id=*,record-cache-id=*
Attribute

Description

hit-ratio-avg

The average cache hit ratio defined as the ratio of cache read hits over the total cache read requests.

hit-ratio-max

The maximum cache hit ratio.

hit-ratio-min

The minimum cache hit ratio.
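
The hit ratio defined above is the number of cache read hits divided by the total number of cache read requests. The following sketch works through that arithmetic. The class name CacheHitRatio is hypothetical, and returning 0.0 when there were no read requests is an assumption of this sketch, not documented Kafka behavior.

```java
/** Hypothetical worked example of the cache hit ratio definition. */
public class CacheHitRatio {

    /** Returns read hits divided by total read requests, or 0.0 if there were no requests. */
    public static double hitRatio(long readHits, long readRequests) {
        if (readHits < 0 || readRequests < 0 || readHits > readRequests) {
            throw new IllegalArgumentException("hits must be between 0 and the request count");
        }
        // Assumption: an idle cache is reported as 0.0 rather than NaN.
        return readRequests == 0 ? 0.0 : (double) readHits / readRequests;
    }

    public static void main(String[] args) {
        System.out.println(hitRatio(75, 100)); // prints 0.75
    }
}
```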

© 2024 Red Hat, Inc.