
Chapter 8. Known issues


Known issues in Streams for Apache Kafka 3.1 on OpenShift.

8.1. Apache Kafka

A summary of known issues for Apache Kafka.

8.1.1. Intra-broker log directory reassignment can cause a log directory to go offline

When using multiple log directories per broker (JBOD) and performing intra-broker log directory reassignment (moving replicas between log directories on the same broker), Apache Kafka can incorrectly mark a log directory as failed if a transient filesystem or I/O error occurs during the operation.

This issue is caused by a race condition between background log flush operations and file deletion during replica movement. Under these conditions, Kafka may encounter a NoSuchFileException or a related I/O error and treat it as a fatal storage failure. As a result, the broker takes the entire log directory offline to protect data integrity, and any partitions stored on that directory become unavailable. The log directory can remain marked as failed even after the reassignment completes.

This behavior affects intra-broker log directory reassignment only. Inter-broker partition reassignment is not affected.

Workaround

Restart the affected Kafka broker. On restart, the broker re-scans the log directories and marks the disk as healthy if no underlying filesystem issue is present.

This is a known issue in Apache Kafka. Work to address this issue is being tracked in the Apache Kafka issue tracker (KAFKA-19571). A fix will be included in a future release of Red Hat Streams for Apache Kafka once it is available in the underlying Apache Kafka distribution.
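The restart can be performed through the operator so that the broker is rolled in a controlled way. A minimal sketch, assuming the cluster is named my-cluster, the affected broker is pod my-cluster-kafka-0, and the cluster runs in namespace kafka (adjust all names to your environment):

```shell
# Trigger a controlled restart of the affected broker by annotating its pod;
# the operator rolls the pod on its next reconciliation.
oc annotate pod my-cluster-kafka-0 -n kafka strimzi.io/manual-rolling-update="true"

# Alternatively, delete the pod and let the operator recreate it:
# oc delete pod my-cluster-kafka-0 -n kafka
```

After the restart, confirm that the partitions on the previously offline log directory are back online before proceeding with further reassignments.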

8.1.2. Enabling eligible.leader.replicas.version

A change in Apache Kafka 4.0.0 introduces restrictions on how the min.insync.replicas configuration is managed when the Eligible Leader Replicas (ELR) feature is enabled by setting eligible.leader.replicas.version to 1.

ELR is disabled by default. Enabling ELR without understanding these restrictions can lead to configuration reconciliation issues or cluster instability.

When ELR is enabled, the behavior of min.insync.replicas changes as follows:

  • Cluster-level (default):

    • Cannot be removed
    • Changing it will clear ELR configuration from all topics
  • Broker-level:

    • Ignored entirely
    • You cannot change it
    • It is recommended to remove all broker-level min.insync.replicas before enabling ELR
  • Topic-level:

    • Changing min.insync.replicas clears the ELR configuration for that topic

Workaround

Only set eligible.leader.replicas.version to 1 if you have a specific use case that requires it.

If you must use ELR:

  • Remove all broker-level min.insync.replicas configurations. These settings are ignored when ELR is enabled but can lead to reconciliation errors and, in some cases, leaderless partitions during rolling updates or broker restarts.
  • Avoid changing the cluster-level default unless you intend to remove ELR configuration from all topics.
  • Avoid changing the topic-level configuration unless you want to remove ELR configuration from that topic.
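The preparation steps can be sketched with the standard Kafka admin tools. This is an illustration only; the bootstrap address my-cluster-kafka-bootstrap:9092 and broker ID 0 are assumptions, and ELR is enabled by raising the feature level rather than by setting a broker configuration:

```shell
# 1. Remove any broker-level min.insync.replicas override before enabling ELR
#    (shown for broker 0; repeat for each broker that has an override):
bin/kafka-configs.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --entity-type brokers --entity-name 0 \
  --alter --delete-config min.insync.replicas

# 2. Enable ELR by upgrading the feature level:
bin/kafka-features.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  upgrade --feature eligible.leader.replicas.version=1
```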

8.2. Streams for Apache Kafka

A summary of known issues for Streams for Apache Kafka.

8.2.1. Multi-version upgrades from the OperatorHub LTS channel

Currently, multi-version upgrades between Long Term Support (LTS) versions are not supported through the Operator Lifecycle Manager (OLM) when using the OperatorHub LTS channel.

For example, you cannot directly upgrade from version 2.2 LTS to version 2.9 LTS. Instead, you must perform incremental upgrades, stepping through each intermediate minor version to reach version 2.9.

8.2.2. Cruise Control CPU utilization estimation

Cruise Control for Streams for Apache Kafka has a known issue relating to the estimation of CPU utilization. CPU utilization is calculated as a percentage of the defined capacity of a broker pod. The issue occurs when Kafka brokers run on nodes with varying numbers of CPU cores. For example, node1 might have 2 CPU cores and node2 might have 4 CPU cores. In this situation, Cruise Control can underestimate or overestimate the CPU load of brokers, which can prevent cluster rebalances when a pod is under heavy load.

There are two workarounds for this issue.

Workaround one

Equal CPU requests and limits: You can set CPU requests equal to CPU limits in Kafka.spec.kafka.resources. That way, all CPU resources are reserved upfront and are always available. This configuration allows Cruise Control to properly evaluate the CPU utilization when preparing the rebalance proposals based on CPU goals.
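A minimal sketch of this configuration, with an assumed cluster name and resource sizes; CPU requests equal CPU limits so the full capacity is reserved upfront and Cruise Control evaluates utilization against a fixed value:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    resources:
      requests:
        cpu: "2"       # requests equal to limits:
        memory: 8Gi    # the full CPU capacity is guaranteed to the pod
      limits:
        cpu: "2"
        memory: 8Gi
    # ...
```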

Workaround two

Exclude CPU goals: You can exclude CPU goals from the hard and default goals specified in the Cruise Control configuration:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
  annotations:
    strimzi.io/node-pools: enabled
    strimzi.io/kraft: enabled
spec:
  kafka:
    # ...
  entityOperator:
    topicOperator: {}
    userOperator: {}
  cruiseControl:
    brokerCapacity:
      inboundNetwork: 10000KB/s
      outboundNetwork: 10000KB/s
    config:
      hard.goals: >
        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal
      default.goals: >
        com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.MinTopicLeadersPerBrokerGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,
        com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal

For more information, see Insufficient CPU capacity.

8.2.3. JMX authentication when running in FIPS mode

When running Streams for Apache Kafka in FIPS mode with JMX authentication enabled, clients may fail authentication. To work around this issue, do not enable JMX authentication while running in FIPS mode. We are investigating the issue and working to resolve it in a future release.
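In practice, the workaround amounts to omitting the authentication block from the JMX configuration in the Kafka custom resource. A sketch, with the cluster name assumed:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    jmxOptions: {}   # no 'authentication' block; JMX is exposed without authentication
    # ...
```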

8.3. Kafka Bridge

There are no new or existing known issues for the Kafka Bridge.

8.4. Proxy

A summary of known issues for Streams for Apache Kafka Proxy.

8.4.1. Proxy pod may restart without a user-initiated configuration change

The Streams for Apache Kafka Proxy Operator may trigger unnecessary restarts of the proxy pod even when no user-initiated configuration changes have occurred.

This issue stems from how the Operator tracks changes to ConfigMap and Secret resources. These resources do not include a generation field, so the Operator falls back to using the resourceVersion and UUID to detect changes. However, resourceVersion can be incremented by non-user activity, such as etcd performing automatic encryption key rotation (typically weekly), resulting in a new resourceVersion.

As a result, the operator may interpret system-driven updates as user changes and trigger a restart of the proxy pod. This issue will be addressed in a future release.

8.4.2. Message production fails for records ~1MB with Record Encryption filter

When using the Record Encryption filter, producing Kafka messages with a size of approximately 1MB (specifically 1,048,319 bytes or larger) will fail. This is caused by an internal limitation in the encryption filter that incorrectly calculates the required encryption operations for records of this size, even if the Kafka broker is configured to accept larger messages.

Symptom

  • Kafka producers will repeatedly fail with NETWORK_EXCEPTION errors and disconnect from the proxy.
  • The proxy logs will show a WARN with a DekUsageException error, stating: The Encryptor has no more operations allowed. The connection is then closed.

Workaround

Ensure individual Kafka records are kept below the 1,048,319-byte threshold when the Record Encryption filter is active.
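One way to enforce the limit is on the producer side, so that oversized records fail fast with a clear client-side error rather than with NETWORK_EXCEPTION at the proxy. A producer configuration sketch (note that max.request.size also bounds the overall request, including protocol overhead, so the exact usable record size is slightly smaller):

```properties
# Cap record size below the Record Encryption filter's 1,048,319-byte threshold;
# records larger than this are rejected by the producer with RecordTooLargeException.
max.request.size=1048318
```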

8.4.3. Describing consumer groups fails when using the Authorization filter

When using the Authorization filter, attempts to describe consumer groups through the proxy fail.

This issue occurs when running the kafka-consumer-groups --describe command against a proxy configured with the Authorization filter.

Symptom

  • The kafka-consumer-groups command fails with the following error:
org.apache.kafka.common.errors.UnsupportedVersionException: The node does not support DESCRIBE_GROUPS

Workaround

Describe consumer groups by connecting directly to the Kafka cluster rather than through the proxy.
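For example, the same command succeeds when pointed at the cluster's own bootstrap address instead of the proxy listener. The bootstrap address and group name below are assumptions:

```shell
bin/kafka-consumer-groups.sh \
  --bootstrap-server my-cluster-kafka-bootstrap:9092 \
  --describe --group my-group
```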

8.4.4. SASL Inspection filter cannot include SASL subject builder configuration

The proxy cannot be configured with a SASL Inspection filter that includes SASL subject builder configuration. This issue is caused by a configuration parsing defect.

If such a configuration is applied, the proxy fails during startup.

Symptom

  • The proxy fails to start and throws a PluginConfigurationException with a ClassCastException, indicating that a LinkedHashMap cannot be cast to the expected subject builder configuration class.

Impact

This issue prevents mapping the SASL authorized ID. As a result, it may not be possible to define concise or targeted ACL rules.

Workaround

There is no workaround. Do not configure SASL subject builder settings in the SASL Inspection filter until this issue is resolved.

8.5. Console

A summary of known issues for Streams for Apache Kafka Console.

8.5.1. RBAC permissions do not distinguish Kafka clusters by namespace

The console allows Kafka clusters to have the same name in different namespaces. However, when configuring a role to grant role-based access control (RBAC) permissions to a specific Kafka cluster by name, the configuration does not account for the cluster’s namespace. As a result, permissions granted to a resource by name apply to all Kafka clusters with that name, across all namespaces.

Symptom

A role intended to grant access to a single Kafka cluster (my-kafka in namespace1) inadvertently grants access to all other clusters with the same name in different namespaces, such as my-kafka in namespace2. It is not possible to configure a role that distinguishes between Kafka clusters with identical names located in separate namespaces.

Workaround

To isolate permissions for a specific Kafka cluster, ensure that all cluster names are globally unique across all namespaces.

© 2026 Red Hat