Chapter 20. Scaling clusters by adding or removing brokers
Scaling Kafka clusters by adding brokers can improve performance and reliability. Increasing the number of brokers provides more resources, enabling the cluster to handle larger workloads and process more messages. Redistributing partitions across the additional brokers reduces the load on individual brokers, increasing the overall throughput of the cluster. More brokers also improve fault tolerance, because partition replicas can be spread across more nodes. Conversely, removing underutilized brokers can reduce resource consumption and increase efficiency. Scaling must be done carefully to avoid disruption or data loss.
Adjusting the replicas configuration changes the number of brokers in a cluster. The number of brokers limits the replication factor you can use. With three brokers, a replication factor of 3 means each partition is replicated across all three brokers, ensuring fault tolerance in case of broker failure:
Example node pool configuration for the number of replicas
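A minimal sketch of such a node pool, assuming a hypothetical KafkaNodePool named pool-a that belongs to a Kafka cluster named my-cluster and uses persistent JBOD storage. Adapt the names, roles, and storage to your deployment:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: pool-a                      # hypothetical node pool name
  labels:
    strimzi.io/cluster: my-cluster  # hypothetical name of the Kafka cluster
spec:
  replicas: 3                       # number of broker nodes in this pool
  roles:
    - broker
  storage:
    type: jbod
    volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi                 # example size only
        deleteClaim: false
```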
The actual replication factor for topics depends on the number of available brokers and on how many brokers store replicas of each topic partition, as configured by default.replication.factor. The minimum number of replicas that must acknowledge a write for it to be considered successful is defined by min.insync.replicas:
Example configuration for topic replication
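Both properties are set in the broker configuration of the Kafka resource. A sketch, assuming a hypothetical cluster named my-cluster. With a replication factor of 3 and min.insync.replicas: 2, writes from producers using acks=all can still succeed while one of the three replicas is unavailable:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                  # hypothetical cluster name
spec:
  kafka:
    # ...
    config:
      default.replication.factor: 3 # replication factor for new topics that do not set one
      min.insync.replicas: 2        # in-sync replicas required to acknowledge an acks=all write
  # ...
```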
When adding brokers by changing the number of replicas, node IDs start at 0, and the Cluster Operator assigns the next lowest available ID to new brokers. Removing brokers starts with the pod that has the highest node ID. Additionally, when scaling clusters with node pools, you can assign node IDs for scaling operations.
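A sketch of assigning node IDs through node pool annotations. It assumes the strimzi.io/next-node-ids and strimzi.io/remove-node-ids annotations available in recent Strimzi-based releases; check the node pool documentation for your version. The pool name, cluster name, and ID ranges are hypothetical:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: pool-a                      # hypothetical node pool name
  labels:
    strimzi.io/cluster: my-cluster  # hypothetical cluster name
  annotations:
    # IDs or ID ranges to use, in order, when the pool is scaled up
    strimzi.io/next-node-ids: "[10-20]"
    # IDs or ID ranges to remove, in order, when the pool is scaled down
    strimzi.io/remove-node-ids: "[20-10]"
spec:
  replicas: 3
  roles:
    - broker
  # ...
```

Without these annotations, the default behavior applies: new brokers get the next lowest available ID, and the broker with the highest ID is removed first.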
Streams for Apache Kafka can automatically reassign partitions when brokers are added or removed if Cruise Control is deployed and auto-rebalancing is enabled in the Kafka resource. If auto-rebalancing is disabled, you can use Cruise Control to generate optimization proposals before manually rebalancing the cluster.
Cruise Control provides add-brokers and remove-brokers modes for scaling:
- Use the add-brokers mode after scaling up to move partition replicas to the new brokers.
- Use the remove-brokers mode before scaling down to move partition replicas off the brokers being removed.
With auto-rebalancing, these modes run automatically using the default Cruise Control configuration or custom settings from a rebalancing template.
To increase the throughput of a Kafka topic, you can increase the number of partitions for that topic, distributing the load across multiple brokers. However, if all brokers are constrained by a resource (such as I/O), adding more partitions won’t improve throughput, and adding more brokers is necessary.
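If topics are managed through the Topic Operator, the partition count is set in the KafkaTopic resource. A sketch, assuming a hypothetical topic named my-topic in a cluster named my-cluster. Kafka allows the partition count of a topic to be increased but not decreased:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic                    # hypothetical topic name
  labels:
    strimzi.io/cluster: my-cluster  # hypothetical cluster name
spec:
  partitions: 6                     # raised to spread the topic load across more brokers
  replicas: 3                       # replication factor for the topic
```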
20.1. Triggering auto-rebalances when scaling clusters
Set up auto-rebalancing to automatically redistribute topic partitions when scaling a cluster. You scale a Kafka cluster by adjusting the number of brokers through the replicas property of the custom resource used to deploy the cluster: spec.replicas in a KafkaNodePool resource, or spec.kafka.replicas in a Kafka resource. When auto-rebalancing is enabled, the cluster is rebalanced without further intervention.
- After adding brokers, topic partitions are redistributed across the new brokers.
- Before removing brokers, partitions are moved off the brokers being removed.
Auto-rebalancing helps maintain balanced load distribution across Kafka brokers during scaling operations, depending on how the rebalancing configuration is set up.
Scaling operates in two modes: add-brokers and remove-brokers. Each mode can have its own auto-rebalancing configuration specified in the Kafka resource under spec.cruiseControl.autoRebalance properties. Use the template property to specify a predefined KafkaRebalance resource, which serves as a rebalance configuration template. If a template is not specified in the autoRebalance configuration, the default Cruise Control rebalancing configuration is used. You can apply the same template configuration for both scaling modes, use different configurations for each, or enable auto-rebalancing for only one mode. If autoRebalance configuration is not set for a mode, auto-rebalancing does not occur for that mode.
The template KafkaRebalance resource must include the strimzi.io/rebalance-template: "true" annotation. The template does not represent an actual rebalance request but holds the rebalancing configuration. During scaling, the Cluster Operator creates a KafkaRebalance resource based on this template, named <cluster_name>-auto-rebalancing-<mode>, where <mode> is either add-brokers or remove-brokers. The Cluster Operator applies a finalizer (strimzi.io/auto-rebalancing) to prevent the resource’s deletion during the rebalancing process.
Progress is reflected in the status of the Kafka resource. The status.autoRebalance property indicates the state of the rebalance. A modes property lists the brokers being added or removed during the operation to help track progress across reconciliations.
Prerequisites
- The Cluster Operator must be deployed.
- Cruise Control is deployed with Kafka.
- You have configured optimization goals and, optionally, capacity limits on broker resources.
Procedure
Create a rebalancing template for the auto-rebalancing operation (if required).
Configure a KafkaRebalance resource with the strimzi.io/rebalance-template: "true" annotation. Unlike the configuration used to generate an optimization proposal for rebalancing, the rebalance configuration template does not require mode and brokers properties.
Example rebalancing template configuration
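A sketch of a template, assuming the hypothetical name my-add-remove-brokers-rebalancing-template. The goals are standard Cruise Control goal names, but this selection is only an example, not a recommendation for your cluster:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: my-add-remove-brokers-rebalancing-template  # hypothetical template name
  annotations:
    strimzi.io/rebalance-template: "true"  # marks this resource as a template, not a rebalance request
spec:
  goals:
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - MinTopicLeadersPerBrokerGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal
  skipHardGoalCheck: true
  # ... other rebalancing configuration
```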
The strimzi.io/rebalance-template: "true" annotation designates the resource as a rebalance configuration template.
- Apply the configuration to create the template.
Add auto-rebalancing configuration to the Kafka resource. In this example, the same template is used for adding and removing brokers.
Example using template specifications for auto-rebalancing
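A sketch, assuming a cluster named my-cluster and a template KafkaRebalance resource with the hypothetical name my-add-remove-brokers-rebalancing-template in the same namespace as the Kafka resource:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                  # hypothetical cluster name
spec:
  kafka:
    # ...
  cruiseControl:
    autoRebalance:
      # Rebalance onto new brokers after scaling up
      - mode: add-brokers
        template:
          name: my-add-remove-brokers-rebalancing-template  # hypothetical template name
      # Move replicas off brokers before scaling down
      - mode: remove-brokers
        template:
          name: my-add-remove-brokers-rebalancing-template
```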
To use the default Cruise Control configuration for rebalancing, omit the template configuration. In this example, the default configuration is used when adding brokers.
Example using default specifications for auto-rebalancing
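A sketch that enables auto-rebalancing with the default Cruise Control configuration for scale-up only, assuming a cluster named my-cluster. Because there is no remove-brokers entry, no auto-rebalance runs when brokers are removed:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                  # hypothetical cluster name
spec:
  kafka:
    # ...
  cruiseControl:
    autoRebalance:
      # No template: the default Cruise Control rebalancing configuration is used
      - mode: add-brokers
```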
- Apply the changes to the Kafka configuration.
Wait for the Cluster Operator to update the cluster.
Scale the cluster by adjusting the spec.replicas property, which represents the number of brokers in the cluster. The following example shows a node pool configuration for a cluster using three brokers (replicas: 3).
Example node pool configuration
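A sketch of the scaling change, assuming a hypothetical node pool named pool-a. Only replicas needs to change to scale the pool; the other fields are elided:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: pool-a                      # hypothetical node pool name
  labels:
    strimzi.io/cluster: my-cluster  # hypothetical cluster name
spec:
  # Current broker count; for example, set to 4 to add a broker or 2 to remove one
  replicas: 3
  roles:
    - broker
  # ...
```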
For more information on scaling through node pools, see the following:
Check the rebalance status.
The status is visible in the Kafka resource.
Example status for auto-rebalancing
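An illustrative sketch of the status while a scale-down rebalance is in progress, assuming a cluster named my-cluster. The broker IDs are hypothetical, and the exact status fields may vary between versions:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster                  # hypothetical cluster name
spec:
  # ...
status:
  autoRebalance:
    state: RebalanceOnScaleDown     # scale-down takes precedence over a pending scale-up
    modes:
      - mode: add-brokers
        brokers: [3, 4]             # hypothetical IDs of brokers being added
      - mode: remove-brokers
        brokers: [2]                # hypothetical ID of the broker being removed
```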
- state: The state of the rebalance, which shows RebalanceOnScaleUp when adding brokers and RebalanceOnScaleDown when removing brokers. Scale-down operations take precedence. The initial and final state (whether the rebalance failed or succeeded) shows as Idle.
- modes: Rebalance operations grouped by mode, with a list of the nodes to be added or removed.
During a rebalance, the status of the KafkaRebalance resource used for the rebalance is checked, and the auto-rebalance state is adjusted accordingly.
20.2. Skipping checks on scale-down operations
By default, Streams for Apache Kafka checks that there are no partition replicas on the brokers being removed before initiating a scale-down operation on a Kafka cluster. The check applies to nodes in node pools that perform the role of broker only or a dual role of broker and controller.
If replicas are found, the scale-down is not performed, to prevent potential data loss. To scale down the cluster, move all replicas off the broker before trying to scale it down again.
However, there may be scenarios where you want to bypass this mechanism. Disabling the check might be necessary on busy clusters, for example, because new topics keep generating replicas for the broker. This situation can indefinitely block the scale-down, even when brokers are nearly empty. Overriding the blocking mechanism in this way has an impact: the presence of topics on the broker being scaled down will likely cause a reconciliation failure for the Kafka cluster.
You can bypass the blocking mechanism by annotating the Kafka resource for the Kafka cluster. Annotate the resource by setting the strimzi.io/skip-broker-scaledown-check annotation to true:
Adding the annotation to skip checks on scale-down operations
oc annotate Kafka my-kafka-cluster strimzi.io/skip-broker-scaledown-check="true"
This annotation instructs Streams for Apache Kafka to skip the scale-down check. Replace my-kafka-cluster with the name of your specific Kafka resource.
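If the Kafka resource is managed declaratively, for example from manifests kept in version control, the same annotation can be set in the resource metadata instead of with oc annotate. A sketch, assuming the cluster name my-kafka-cluster used in the command above:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-kafka-cluster
  annotations:
    # Skip the check for partition replicas on brokers being removed
    strimzi.io/skip-broker-scaledown-check: "true"
spec:
  # ...
```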
To restore the check for scale-down operations, remove the annotation:
Removing the annotation to skip checks on scale-down operations
oc annotate Kafka my-kafka-cluster strimzi.io/skip-broker-scaledown-check-