Chapter 20. Scaling clusters by adding or removing brokers


Scaling Kafka clusters by adding brokers can improve performance and reliability. Increasing the number of brokers provides more resources, enabling the cluster to handle larger workloads and process more messages. It also enhances fault tolerance by providing additional replicas. Conversely, removing underutilized brokers can reduce resource consumption and increase efficiency. Scaling must be done carefully to avoid disruption or data loss. Redistributing partitions across brokers reduces the load on individual brokers, increasing the overall throughput of the cluster.

Adjusting the replicas configuration changes the number of brokers in a cluster. For example, a node pool with replicas set to 3 runs three broker nodes. With three brokers available, each topic partition can be replicated across three brokers, ensuring fault tolerance in case of broker failure:

Example node pool configuration for the number of replicas

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: my-node-pool
  labels:
    strimzi.io/cluster: my-cluster
spec:
  replicas: 3
  # ...

The actual replication factor for topics depends on the number of available brokers and how many brokers store replicas of each topic partition (configured by default.replication.factor). The minimum number of replicas that must acknowledge a write for it to be considered successful is defined by min.insync.replicas:

Example configuration for topic replication

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    config:
      default.replication.factor: 3
      min.insync.replicas: 2
    # ...

When adding brokers by changing the number of replicas, node IDs start at 0, and the Cluster Operator assigns the next lowest available ID to new brokers. Removing brokers starts with the pod that has the highest node ID. Additionally, when scaling clusters with node pools, you can assign node IDs for scaling operations.
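When using node pools, node ID assignment for a scaling operation can be controlled with annotations on the KafkaNodePool resource. The following sketch uses the strimzi.io/next-node-ids and strimzi.io/remove-node-ids annotations; the ID ranges shown are illustrative:

Example node pool annotations for node ID ranges

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: my-node-pool
  labels:
    strimzi.io/cluster: my-cluster
  annotations:
    # Assign IDs from this range to brokers added on scale-up
    strimzi.io/next-node-ids: "[10-15]"
    # Prefer removing brokers with these IDs on scale-down
    strimzi.io/remove-node-ids: "[15-10]"
spec:
  replicas: 3
  # ...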

Streams for Apache Kafka can automatically reassign partitions when brokers are added or removed if Cruise Control is deployed and auto-rebalancing is enabled in the Kafka resource. If auto-rebalancing is disabled, you can use Cruise Control to generate optimization proposals before manually rebalancing the cluster.

Cruise Control provides add-brokers and remove-brokers modes for scaling:

  • Use the add-brokers mode after scaling up to move partition replicas to the new brokers.
  • Use the remove-brokers mode before scaling down to move partition replicas off the brokers being removed.

With auto-rebalancing, these modes run automatically using the default Cruise Control configuration or custom settings from a rebalancing template.

Note

To increase the throughput of a Kafka topic, you can increase the number of partitions for that topic, distributing the load across multiple brokers. However, if all brokers are constrained by a resource (such as I/O), adding more partitions won’t improve throughput, and adding more brokers is necessary.
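The number of partitions is set on the topic, for example in a KafkaTopic resource managed by the Topic Operator (the topic name and counts shown here are illustrative):

Example topic configuration with increased partitions

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 12 # distribute the load across more brokers
  replicas: 3
  # ...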

20.1. Triggering auto-rebalances when scaling clusters

Set up auto-rebalancing to automatically redistribute topic partitions when scaling a cluster. You can scale a Kafka cluster by adjusting the number of brokers using the spec.replicas property in the Kafka or KafkaNodePool custom resource used in deployment. When auto-rebalancing is enabled, the cluster is rebalanced without further intervention.

  • After adding brokers, topic partitions are redistributed across the new brokers.
  • Before removing brokers, partitions are moved off the brokers being removed.

Auto-rebalancing helps maintain balanced load distribution across Kafka brokers during scaling operations, depending on how the rebalancing configuration is set up.

Scaling operates in two modes: add-brokers and remove-brokers. Each mode can have its own auto-rebalancing configuration specified in the Kafka resource under spec.cruiseControl.autoRebalance properties. Use the template property to specify a predefined KafkaRebalance resource, which serves as a rebalance configuration template. If a template is not specified in the autoRebalance configuration, the default Cruise Control rebalancing configuration is used. You can apply the same template configuration for both scaling modes, use different configurations for each, or enable auto-rebalancing for only one mode. If autoRebalance configuration is not set for a mode, auto-rebalancing does not occur for that mode.

The template KafkaRebalance resource must include the strimzi.io/rebalance-template: "true" annotation. The template does not represent an actual rebalance request but holds the rebalancing configuration. During scaling, the Cluster Operator creates a KafkaRebalance resource based on this template, named <cluster_name>-auto-rebalancing-<mode>, where <mode> is either add-brokers or remove-brokers. The Cluster Operator applies a finalizer (strimzi.io/auto-rebalancing) to prevent the resource’s deletion during the rebalancing process.
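Based on this naming scheme and finalizer, the KafkaRebalance resource created by the Cluster Operator during a scale-up might look as follows. This is a sketch: the broker IDs are illustrative, and the spec contents come from the template or the Cruise Control defaults.

Example auto-created KafkaRebalance resource

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-cluster-auto-rebalancing-add-brokers
  finalizers:
    # Prevents deletion while the rebalance is in progress
    - strimzi.io/auto-rebalancing
spec:
  mode: add-brokers
  brokers: [3, 4]
  # ... configuration from the template or Cruise Control defaults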

Progress is reflected in the status of the Kafka resource. The status.autoRebalance property indicates the state of the rebalance. A modes property lists the brokers being added or removed during the operation to help track progress across reconciliations.

Prerequisites

  • Cruise Control is deployed with the Kafka cluster.

Procedure

  1. Create a rebalancing template for the auto-rebalancing operation (if required).

    Configure a KafkaRebalance resource with the strimzi.io/rebalance-template: "true" annotation. Unlike when generating an optimization proposal for rebalancing, the rebalance configuration template does not require the mode and brokers properties.

    Example rebalancing template configuration

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaRebalance
    metadata:
      name: my-add-remove-brokers-rebalancing-template
      annotations:
        strimzi.io/rebalance-template: "true" # (1)
    spec:
      goals:
        - CpuCapacityGoal
        - NetworkInboundCapacityGoal
        - DiskCapacityGoal
        - RackAwareGoal
        - MinTopicLeadersPerBrokerGoal
        - NetworkOutboundCapacityGoal
        - ReplicaCapacityGoal
      skipHardGoalCheck: true
      # ... other rebalancing configuration

    (1) The annotation designates the resource as a rebalance configuration template.
  2. Apply the configuration to create the template.
  3. Add auto-rebalancing configuration to the Kafka resource.

    In this example, the same template is used for adding and removing brokers.

    Example using template specifications for auto-rebalancing

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        # ...
      cruiseControl:
        autoRebalance:
          - mode: add-brokers
            template:
              name: my-add-remove-brokers-rebalancing-template
          - mode: remove-brokers
            template:
              name: my-add-remove-brokers-rebalancing-template

    To use the default Cruise Control configuration for rebalancing, omit the template configuration. In this example, the default configuration is used when adding brokers.

    Example using default specifications for auto-rebalancing

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        # ...
      cruiseControl:
        autoRebalance:
          - mode: add-brokers
          - mode: remove-brokers
            template:
              name: my-add-remove-brokers-rebalancing-template

  4. Apply the changes to the Kafka configuration.
    Wait for the Cluster Operator to update the cluster.
  5. Scale the cluster by adjusting the spec.replicas property representing the number of brokers in the cluster.

    The following example shows a node pool configuration for a cluster using three brokers (replicas: 3).

    Example node pool configuration

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaNodePool
    metadata:
      name: pool-b
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      replicas: 3
      roles:
        - broker
      storage:
        type: jbod
        volumes:
          - id: 0
            type: persistent-claim
            size: 100Gi
            deleteClaim: false
      # ...

    For more information, see the documentation on scaling clusters through node pools.

  6. Check the rebalance status.
    The status is visible in the Kafka resource.

    Example status for auto-rebalancing

    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
    spec:
      kafka:
        # ...
      cruiseControl:
        autoRebalance:
          - mode: add-brokers
            template:
              name: my-add-remove-brokers-rebalancing-template
          - mode: remove-brokers
            template:
              name: my-add-remove-brokers-rebalancing-template
    status:
      autoRebalance:
        lastTransitionTime: <timestamp_for_last_rebalance_state>
        state: RebalanceOnScaleDown # (1)
        modes: # (2)
          - mode: add-brokers
            brokers: <broker_ids>
          - mode: remove-brokers
            brokers: <broker_ids>

    (1) The state of the rebalance: RebalanceOnScaleUp when adding brokers and RebalanceOnScaleDown when removing brokers. Scale-down operations take precedence. The initial state, and the final state after a failed or successful rebalance, is Idle.
    (2) Rebalance operations grouped by mode, with a list of the nodes being added or removed.
Note

During a rebalance, the status of the KafkaRebalance resource used for the rebalance is checked, and the auto-rebalance state is adjusted accordingly.

20.2. Skipping checks on scale-down operations

By default, Streams for Apache Kafka performs a check to ensure that there are no partition replicas on brokers before initiating a scale-down operation on a Kafka cluster. The check applies to nodes in node pools that perform the role of broker only or a dual role of broker and controller.

If replicas are found, the scale-down is blocked to prevent potential data loss. Before attempting the scale-down again, move all partition replicas off the brokers being removed.

However, there may be scenarios where you want to bypass this mechanism. Disabling the check might be necessary on busy clusters where, for example, new topics keep creating replicas on the broker. This situation can block the scale-down indefinitely, even when brokers are nearly empty. Overriding the blocking mechanism in this way has an impact: the presence of topics on the broker being scaled down will likely cause a reconciliation failure for the Kafka cluster.

You can bypass the blocking mechanism by annotating the Kafka resource for the Kafka cluster. Annotate the resource by setting the strimzi.io/skip-broker-scaledown-check annotation to true:

Adding the annotation to skip checks on scale-down operations

oc annotate Kafka my-kafka-cluster strimzi.io/skip-broker-scaledown-check="true"

This annotation instructs Streams for Apache Kafka to skip the scale-down check. Replace my-kafka-cluster with the name of your specific Kafka resource.

To restore the check for scale-down operations, remove the annotation:

Removing the annotation to skip checks on scale-down operations

oc annotate Kafka my-kafka-cluster strimzi.io/skip-broker-scaledown-check-
