Chapter 15. Using Cruise Control for cluster rebalancing
Cruise Control is an open-source application designed to run alongside Kafka to help optimize use of cluster resources by doing the following:
- Monitoring cluster workload
- Rebalancing partitions based on predefined constraints
Cruise Control operations help with running a more balanced Kafka cluster that uses brokers more efficiently.
As Kafka clusters evolve, some brokers may become overloaded while others remain underutilized. Cruise Control addresses this imbalance by modeling resource utilization at the replica level (including CPU, disk, and network load) and generating optimization proposals, which you can approve or reject, for balanced partition assignments based on configurable optimization goals.
The cruisecontrol.properties file contains the configuration for Cruise Control. You can specify and configure all the properties listed in the Configurations section of the Cruise Control Wiki.
15.1. Cruise Control components and features
Cruise Control comprises four main components:
- Load Monitor: Collects the metrics and analyzes cluster workload data.
- Analyzer: Generates optimization proposals based on collected data and configured goals.
- Anomaly Detector: Identifies and reports irregularities in cluster behavior.
- Executor: Applies approved optimization proposals to the cluster.
Cruise Control also provides a REST API for client interactions, which Streams for Apache Kafka uses to support these features:
- Generating optimization proposals from optimization goals
- Rebalancing a Kafka cluster based on an optimization proposal
- Changing topic replication factor
Other Cruise Control features are not currently supported, including self-healing, notifications, and write-your-own goals.
15.1.1. Optimization goals
Optimization goals define objectives for rebalancing, such as distributing topic replicas evenly across brokers.
They are categorized as follows:
- Supported goals are a list of goals supported by the Cruise Control instance that can be used in its operations. By default, this list includes all goals included with Cruise Control. For a goal to be used in other categories, such as default or hard goals, it must first be listed in supported goals. To prevent a goal’s usage, remove it from this list.
- Hard goals are preset and must be satisfied for a proposal to succeed.
- Soft goals are preset goals with objectives that are prioritized during optimization as much as possible, without preventing a proposal from being created if all hard goals are satisfied.
- Default goals refer to the goals used by default when generating proposals. They match the supported goals unless specifically set by the user.
- Intra-broker goals refer to the goals used specifically for rebalances on the same broker.
- Proposal-specific goals are a subset of supported goals configured for specific proposals.
Set proposal-specific goals at runtime. Specify other optimization goals in a configuration properties file using their fully qualified class names, listed in descending priority order.
The config/cruisecontrol.properties file contains the configuration for Cruise Control. Use the following properties to manage goals:
- Supported goals: goals property
- Hard goals: hard.goals property
- Default goals: default.goals property
- Intra-broker goals: intra.broker.goals property
15.1.1.1. Supported goals
Supported goals are predefined and available to use for generating Cruise Control optimization proposals. Goals not listed as supported goals cannot be used in Cruise Control operations. Some supported goals are preset as hard goals.
Configure supported goals in cruisecontrol.properties:
- To modify supported goals, specify the goals in the goals property. You can adjust the priority order in the goals configuration.
- You must specify at least one supported goal.
15.1.1.2. Hard and soft goals
Hard goals must be satisfied for optimization proposals to be generated. Soft goals are best-effort objectives that Cruise Control tries to meet after all hard goals are satisfied. The classification of hard and soft goals is fixed in Cruise Control code and cannot be changed.
Cruise Control first prioritizes satisfying hard goals, and then addresses soft goals in the order they are listed. A proposal meeting all hard goals is valid, even if it violates some soft goals.
For example, a soft goal might be to evenly distribute a topic’s replicas. Cruise Control continues to generate an optimization proposal even if the soft goal isn’t completely satisfied.
Configure hard goals in cruisecontrol.properties:
- To modify hard goals, specify a subset of supported goals in the hard.goals property. You can adjust the priority order in the hard goals configuration.
- To exclude a hard goal, ensure it’s not in either default.goals or hard.goals.
Increasing the number of configured hard goals will reduce the likelihood of Cruise Control generating optimization proposals.
15.1.1.3. Default goals
Cruise Control uses default goals to generate an optimization proposal. Default goals must be a subset of the supported optimization goals.
An optimization proposal based on the default goals list is then generated and cached.
Configure default goals in cruisecontrol.properties:
- To modify default goals, specify a subset of supported goals in the default.goals property. You can adjust the priority order in the default goals configuration.
- You must specify at least one default goal.
15.1.1.4. Intra-broker goals
Cruise Control uses intra-broker goals to balance data between disks on the same broker, which is useful for deployments with JBOD storage and multiple disks.
Configure intra-broker goals in cruisecontrol.properties:
- To modify intra-broker goals, list the supported goals in the intra.broker.goals property. You can adjust the priority order in the intra-broker goals configuration.
15.1.1.5. Proposal-specific goals
Proposal-specific optimization goals support the creation of optimization proposals based on a specific list of goals. If proposal-specific goals are not set, then default goals are used.
Specify proposal-specific goals at runtime as a subset of supported optimization goals for customization.
For example, you can optimize topic leader replica distribution across the Kafka cluster without considering disk capacity or utilization by defining a single proposal-specific goal.
When specifying proposal-specific goals, include all configured hard goals, or an error occurs.
To ignore the configured hard goals in an optimization proposal, add the skip_hard_goal_check=true parameter to the request.
15.1.1.6. Goals order of priority
Unless you change the configuration, Streams for Apache Kafka inherits goals from Cruise Control.
The following list shows supported goals inherited by Streams for Apache Kafka from Cruise Control in descending priority order. Goals labeled as hard are mandatory constraints that must be satisfied for optimization proposals.
- RackAwareGoal (hard)
- MinTopicLeadersPerBrokerGoal (hard)
- ReplicaCapacityGoal (hard)
- DiskCapacityGoal (hard)
- NetworkInboundCapacityGoal (hard)
- NetworkOutboundCapacityGoal (hard)
- CpuCapacityGoal (hard)
- ReplicaDistributionGoal
- PotentialNwOutGoal
- DiskUsageDistributionGoal
- NetworkInboundUsageDistributionGoal
- NetworkOutboundUsageDistributionGoal
- CpuUsageDistributionGoal
- TopicReplicaDistributionGoal
- LeaderReplicaDistributionGoal
- LeaderBytesInDistributionGoal
- PreferredLeaderElectionGoal
- IntraBrokerDiskCapacityGoal (hard)
- IntraBrokerDiskUsageDistributionGoal
Resource distribution goals are subject to capacity limits on broker resources.
For more information on each optimization goal, see Goals in the Cruise Control Wiki.
"Write your own" goals and Kafka assigner goals are not supported.
Example configuration for default and hard goals
default.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.PotentialNwOutGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuUsageDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.TopicReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderReplicaDistributionGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.LeaderBytesInDistributionGoal
hard.goals=com.linkedin.kafka.cruisecontrol.analyzer.goals.RackAwareGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.ReplicaCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.DiskCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkInboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.NetworkOutboundCapacityGoal,com.linkedin.kafka.cruisecontrol.analyzer.goals.CpuCapacityGoal
Ensure that the supported goals, default.goals, and (unless skip_hard_goal_check is set to true) proposal-specific goals include all hard goals specified in hard.goals. Otherwise, errors occur when generating optimization proposals.
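The subset requirement can be checked before Cruise Control is started. The following is a minimal sketch, not part of the product tooling: the missing_hard_goals function name is hypothetical, and it assumes each property sits on a single line with no spaces around the commas.

```shell
# Print every goal listed in hard.goals that is absent from default.goals.
# $1: path to a properties file with one "key=value" entry per line
missing_hard_goals() {
  hard=$(sed -n 's/^hard\.goals=//p' "$1" | tr ',' '\n')
  defaults=$(sed -n 's/^default\.goals=//p' "$1" | tr ',' '\n')
  for goal in $hard; do
    # -x matches the whole line, -F treats the class name as a fixed string
    printf '%s\n' "$defaults" | grep -qxF "$goal" || printf '%s\n' "$goal"
  done
}

# Demonstration against a scratch file (illustrative values only).
props=$(mktemp)
pkg=com.linkedin.kafka.cruisecontrol.analyzer.goals
cat > "$props" <<EOF
default.goals=${pkg}.RackAwareGoal,${pkg}.ReplicaDistributionGoal
hard.goals=${pkg}.RackAwareGoal,${pkg}.DiskCapacityGoal
EOF
missing_hard_goals "$props"
```

In this demonstration, DiskCapacityGoal is reported because it appears in hard.goals but not in default.goals. An empty result means the default goals cover all hard goals.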
Example request with proposal-specific goals
curl -v -X POST 'http://<cc_host>:<cc_port>/kafkacruisecontrol/rebalance?goals=RackAwareGoal,ReplicaCapacityGoal,ReplicaDistributionGoal&skip_hard_goal_check=true'
15.1.1.7. Skipping hard goal checks
If skip_hard_goal_check=true is specified in a request, Cruise Control does not verify that the proposal-specific goals include all the configured hard goals. This allows for more flexibility in generating optimization proposals, but may lead to proposals that do not satisfy all hard goals.
However, any hard goals included in the proposal-specific goals are still treated as hard goals by Cruise Control, even with skip_hard_goal_check=true.
15.1.2. Optimization proposals
Optimization proposals are summaries of proposed changes based on the defined optimization goals, assessed in a specific order of priority. You can approve or reject proposals and rerun them with adjusted goals if needed.
With Cruise Control deployed for use in Streams for Apache Kafka, the process to generate and approve an optimization proposal is as follows:
- Make a request to generate an optimization proposal. This request triggers Cruise Control to initiate the optimization proposal generation process.
- A Cruise Control Metrics Reporter runs in every Kafka broker, collecting raw metrics and publishing them to a dedicated Kafka topic (__CruiseControlMetrics). Metrics for brokers, topics, and partitions are aggregated, sampled, and stored in other topics automatically created when Cruise Control is deployed.
- Load Monitor collects, processes, and stores the metrics as a workload model, including CPU, disk, and network utilization data, which is used by the Analyzer and Anomaly Detector.
- Anomaly Detector continuously monitors the health and performance of the Kafka cluster, checking for issues such as broker failures or disk capacity problems that could impact cluster stability.
- Analyzer creates optimization proposals based on the workload model from the Load Monitor. Based on configured goals and capacities, it generates an optimization proposal for balancing partitions across brokers. Through the REST API, a summary of the proposal is returned in the response to the request.
- The optimization proposal is approved or rejected based on its alignment with cluster management goals.
- If approved, the Executor applies the optimization proposal to rebalance the Kafka cluster. This involves reassigning partitions and redistributing workload across brokers according to the approved proposal.
Cruise Control optimization process
Optimization proposals comprise a list of partition reassignment mappings. When you approve a proposal, the Cruise Control server applies these partition reassignments to the Kafka cluster.
A partition reassignment consists of either of the following types of operations:
- Partition movement: Involves transferring the partition replica and its data to a new location. Partition movements can take one of two forms:
  - Inter-broker movement: The partition replica is moved to a log directory on a different broker.
  - Intra-broker movement: The partition replica is moved to a different log directory on the same broker.
- Leadership movement: Involves switching the leader of the partition’s replicas.
Cruise Control issues partition reassignments to the Kafka cluster in batches. The performance of the cluster during the rebalance is affected by the number and magnitude of each type of movement contained in each batch.
15.1.2.1. Rebalancing endpoints
Proposals for rebalances can be generated by making a request to one of three endpoints.
- /rebalance endpoint: A request to this endpoint runs a full rebalance by moving replicas across all the brokers in the cluster.
- /add_broker endpoint: This endpoint is used after scaling up a Kafka cluster by adding one or more brokers. Normally, after scaling up a Kafka cluster, new brokers are used to host only the partitions of newly created topics. If no new topics are created, the newly added brokers are not used and the existing brokers remain under the same load. By using the add_broker endpoint immediately after adding brokers to the cluster, the rebalancing operation moves replicas from existing brokers to the newly added brokers. You specify the new brokers in the request as a list of broker IDs.
- /remove_broker endpoint: This endpoint is used before scaling down a Kafka cluster by removing one or more brokers. The operation moves replicas off the brokers that are going to be removed. When these brokers are not hosting replicas anymore, you can safely run the scaling down operation. You specify the brokers you’re removing as a list of broker IDs.
In general, use the full rebalance endpoint to rebalance a Kafka cluster by spreading the load across brokers. Use the add_broker and remove_broker endpoints only if you want to scale your cluster up or down and rebalance the replicas accordingly.
The procedure to run a rebalance is actually the same across the three different endpoints. The only difference is with specifying the endpoint in the request and, if needed, listing brokers that have been added or will be removed.
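As an illustration of how the broker list might be passed, the sketch below builds request URLs for the add_broker and remove_broker endpoints and prints the resulting curl commands instead of sending them. The brokerid parameter name comes from the upstream Cruise Control REST API rather than from this section. The host, port, broker IDs, and the broker_request_url helper are hypothetical placeholders.

```shell
# Base URL of the Cruise Control REST API (placeholder host and port).
CC_BASE="${CC_BASE:-http://localhost:9090/kafkacruisecontrol}"

# Build the request URL for a scaling endpoint.
# $1: endpoint name (add_broker or remove_broker)
# $2: comma-separated list of broker IDs
broker_request_url() {
  printf '%s/%s?brokerid=%s\n' "$CC_BASE" "$1" "$2"
}

# After scaling up: move replicas onto new brokers 3 and 4.
echo "curl -v -X POST '$(broker_request_url add_broker 3,4)'"

# Before scaling down: move replicas off broker 4.
echo "curl -v -X POST '$(broker_request_url remove_broker 4)'"
```

Printing the commands first makes it possible to review the broker IDs before running anything against a live cluster.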
15.1.2.2. The results of an optimization proposal
When an optimization proposal is generated, a summary of the proposed changes is returned in the response to an HTTP request through the Cruise Control API. The summary gives an overview of the full optimization proposal and indicates the scale of the changes involved.
For example, when you make a POST request to the /rebalance endpoint, an optimization proposal summary is returned in the response.
Returning an optimization proposal summary
curl -v -X POST 'http://<cc_host>:<cc_port>/kafkacruisecontrol/rebalance'
Use the summary to decide whether to approve or reject an optimization proposal.
- Approving an optimization proposal: You approve the optimization proposal by making a POST request to the /rebalance endpoint and setting the dryrun parameter to false (default true). Cruise Control applies the proposal to the Kafka cluster and starts a cluster rebalance operation.
- Rejecting an optimization proposal: If you choose not to approve an optimization proposal, you can change the optimization goals or update any of the rebalance performance tuning options, and then generate another proposal. You can resend a request without the dryrun parameter to generate a new optimization proposal.
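The review-then-approve flow can be sketched as two requests that differ only in the dryrun value. The host, port, and rebalance_url helper below are hypothetical placeholders. The commands are printed rather than executed.

```shell
# Base URL of the Cruise Control REST API (placeholder host and port).
CC_BASE="${CC_BASE:-http://localhost:9090/kafkacruisecontrol}"

# Build a /rebalance URL; $1 is the dryrun value (true or false).
rebalance_url() {
  printf '%s/rebalance?dryrun=%s\n' "$CC_BASE" "$1"
}

# Review: returns the proposal summary only (dryrun defaults to true).
echo "curl -v -X POST '$(rebalance_url true)'"

# Approve: the same request with dryrun=false starts the rebalance.
echo "curl -v -X POST '$(rebalance_url false)'"
```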
Use optimization proposals to assess the movements required for a rebalance. For example, a summary describes inter-broker and intra-broker movements. Inter-broker rebalancing moves data between separate brokers. Intra-broker rebalancing moves data between disks on the same broker when you are using a JBOD storage configuration. Such information can be useful even if you don’t go ahead and approve the proposal.
You might reject an optimization proposal, or delay its approval, because of the additional load on a Kafka cluster when rebalancing. If the proposal is delayed for too long, the cluster load may change significantly, so it may be better to request a new proposal.
In the following example, the proposal suggests the rebalancing of data between separate brokers. The rebalance involves the movement of 55 partition replicas, totaling 12 MB of data, across the brokers. The proposal will also move 24 partition leaders to different brokers. This requires a change to the cluster metadata, which has a low impact on performance.
The balancedness scores are measurements of the overall balance of the Kafka cluster before and after the optimization proposal is approved. A balancedness score is based on optimization goals. If all goals are satisfied, the score is 100. The score is reduced for each goal that will not be met. Compare the balancedness scores to see whether the Kafka cluster is less balanced than it could be following a rebalance.
Example optimization proposal summary
Though the inter-broker movement of partition replicas has a high impact on performance, the total amount of data is not large. If the total data were much larger, you could reject the proposal, or time the approval of the rebalance to limit the impact on the performance of the Kafka cluster.
The provision status indicates whether the current cluster configuration supports the optimization goals. Check the provision status to see if you should add or remove brokers.
| Status | Description |
|---|---|
| | The cluster has an appropriate number of brokers to satisfy the optimization goals. |
| | The cluster is under-provisioned and requires more brokers to satisfy the optimization goals. |
| | The cluster is over-provisioned and requires fewer brokers to satisfy the optimization goals. |
| | The status is not relevant or it has not yet been decided. |
15.1.2.4. Optimization proposal summary properties
The following table explains the properties contained in the optimization proposal’s summary.
| Property | Description |
|---|---|
| | <n>: The number of partition replicas that will be moved between separate brokers. Performance impact during rebalance operation: Relatively high. <y> MB: The sum of the size of each partition replica that will be moved to a separate broker. Performance impact during rebalance operation: Variable. The larger the number of MBs, the longer the cluster rebalance will take to complete. |
| | <n>: The total number of partition replicas that will be transferred between the disks of the cluster’s brokers. Performance impact during rebalance operation: Relatively high, but less than for inter-broker movements. <y> MB: The sum of the size of each partition replica that will be moved between disks on the same broker. Performance impact during rebalance operation: Variable. The larger the number, the longer the cluster rebalance will take to complete. Moving a large amount of data between disks on the same broker has less impact than between separate brokers. |
| | The number of topics excluded from the calculation of partition replica/leader movements in the optimization proposal. You can exclude topics by specifying a regular expression in the configuration or in a POST request. Topics that match the regular expression are listed in the response and will be excluded from the cluster rebalance. |
| | <n>: The number of partitions whose leaders will be switched to different replicas. Performance impact during rebalance operation: Relatively low. |
| | <n>: The number of metrics windows upon which the optimization proposal is based. |
| | <n>%: The percentage of partitions in the Kafka cluster covered by the optimization proposal. |
| | Measurements of the overall balance of a Kafka cluster. |
15.1.2.5. Adjusting the cached proposal refresh rate
Cruise Control maintains a cached optimization proposal based on the configured default optimization goals. This proposal is generated from the workload model and updated every 15 minutes to reflect the current state of the Kafka cluster. When you generate an optimization proposal using the default goals, Cruise Control returns the latest cached version.
For clusters with rapidly changing workloads, you may want to shorten the refresh interval to ensure the optimization proposal reflects the most recent state. However, reducing the interval increases the load on the Cruise Control server. To adjust the refresh rate, modify the proposal.expiration.ms setting in the Cruise Control deployment configuration.
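Because the setting is expressed in milliseconds, it can help to derive the value from minutes. The sketch below prints a properties line for a 5-minute refresh interval. The 5-minute figure is only an illustrative choice, and minutes_to_ms is a hypothetical helper.

```shell
# Convert a refresh interval in minutes to milliseconds.
minutes_to_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Illustrative interval, shorter than the 15 minutes described above.
echo "proposal.expiration.ms=$(minutes_to_ms 5)"
```

With these inputs the printed line is proposal.expiration.ms=300000. The 15-minute interval corresponds to 900000.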
15.1.3. Tuning options for rebalances
Configuration options allow you to fine-tune cluster rebalance performance. These settings control the movement of partition replicas and leadership, as well as the bandwidth allocated for rebalances.
15.1.3.1. Selecting replica movement strategies
Cluster rebalance performance is also influenced by the replica movement strategy that is applied to the batches of partition reassignment commands. By default, Cruise Control uses the BaseReplicaMovementStrategy, which applies the reassignments in the order they were generated. However, if large partition reassignments are generated first, this strategy can delay the other partition reassignments queued behind them.
Cruise Control provides four alternative replica movement strategies that can be applied to optimization proposals:
- PrioritizeSmallReplicaMovementStrategy: Reassign smaller partitions first.
- PrioritizeLargeReplicaMovementStrategy: Reassign larger partitions first.
- PostponeUrpReplicaMovementStrategy: Prioritize partitions without out-of-sync replicas.
- PrioritizeMinIsrWithOfflineReplicasStrategy: Prioritize reassignments for partitions at or below their minimum in-sync replicas (MinISR) with offline replicas. Set concurrency.adjuster.min.isr.check.enabled in the Cruise Control configuration to enable this strategy.
These strategies can be configured as a sequence. The first strategy attempts to compare two partition reassignments using its internal logic. If the reassignments are equivalent, then it passes them to the next strategy in the sequence to decide the order, and so on.
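A sequence of strategies might be passed to a rebalance request as a comma-separated list in priority order. In the sketch below, the replica_movement_strategies parameter name comes from the upstream Cruise Control REST API, not from this section. It is assumed that short strategy names are accepted in requests. The host, port, and join_strategies helper are hypothetical. The command is printed rather than sent.

```shell
# Base URL of the Cruise Control REST API (placeholder host and port).
CC_BASE="${CC_BASE:-http://localhost:9090/kafkacruisecontrol}"

# Join the arguments with commas, keeping their order (highest priority first).
join_strategies() {
  # The subshell keeps the IFS change local; "$*" joins on the first IFS character.
  (IFS=,; printf '%s\n' "$*")
}

# Small partitions first; ties are passed to the next strategy in the sequence.
strategies=$(join_strategies \
  PrioritizeSmallReplicaMovementStrategy \
  PostponeUrpReplicaMovementStrategy)

echo "curl -v -X POST '${CC_BASE}/rebalance?replica_movement_strategies=${strategies}'"
```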
15.1.3.2. Rebalance tuning options
You can set rebalance tuning options when configuring Cruise Control or for individual rebalances, using one of the following methods:
- Properties in the cruisecontrol.properties file
- Parameters in POST requests to the /rebalance endpoint
The relevant configurations for both methods are summarized in the following table.
| Cruise Control properties | Rebalance endpoint parameters | Default | Description |
|---|---|---|---|
| | | 5 | The maximum number of inter-broker partition movements in each partition reassignment batch |
| | | 2 | The maximum number of intra-broker partition movements in each partition reassignment batch |
| | | 1000 | The maximum number of partition leadership changes in each partition reassignment batch |
| | | Null (no limit) | The bandwidth (in bytes per second) to assign to partition reassignment |
| | | BaseReplicaMovementStrategy | The list of strategies (in priority order) used to determine the order in which partition reassignment commands are executed for generated proposals |
Changing the default settings affects the length of time that the rebalance takes to complete, as well as the load placed on the Kafka cluster during the rebalance. Using lower values reduces the load but increases the amount of time taken, and vice versa.
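As a sketch of per-request tuning, the example below lowers the inter-broker batch size and applies a bandwidth throttle. The concurrent_partition_movements_per_broker and replication_throttle parameter names come from the upstream Cruise Control REST API, not from this section. The values, host, port, and tuned_rebalance_url helper are hypothetical. The command is printed rather than sent.

```shell
# Base URL of the Cruise Control REST API (placeholder host and port).
CC_BASE="${CC_BASE:-http://localhost:9090/kafkacruisecontrol}"

# Throttle in bytes per second: 10 MiB/s, an illustrative value.
throttle=$((10 * 1024 * 1024))

# Build a /rebalance URL with tuning parameters.
# $1: maximum inter-broker movements per batch
# $2: replication throttle in bytes per second
tuned_rebalance_url() {
  printf '%s/rebalance?concurrent_partition_movements_per_broker=%s&replication_throttle=%s\n' \
    "$CC_BASE" "$1" "$2"
}

# Lower values than the defaults reduce load but lengthen the rebalance.
echo "curl -v -X POST '$(tuned_rebalance_url 2 "$throttle")'"
```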
15.2. Downloading Cruise Control
A ZIP file distribution of Cruise Control is available for download from the Red Hat website. You can download the latest version of Red Hat Streams for Apache Kafka from the Streams for Apache Kafka software downloads page.
Procedure
1. Download the latest version of the Red Hat Streams for Apache Kafka Cruise Control archive from the Red Hat Customer Portal.
2. Create the /opt/cruise-control directory:
    sudo mkdir /opt/cruise-control
3. Extract the contents of the Cruise Control ZIP file to the new directory:
    unzip amq-streams-<version>-cruise-control-bin.zip -d /opt/cruise-control
4. Change the ownership of the /opt/cruise-control directory to the Kafka user:
    sudo chown -R kafka:kafka /opt/cruise-control
15.3. Deploying the Cruise Control Metrics Reporter
Before starting Cruise Control, you must configure the Kafka brokers to use the provided Cruise Control Metrics Reporter. The file for the Metrics Reporter is supplied with the Streams for Apache Kafka installation artifacts.
When loaded at runtime, the Metrics Reporter sends metrics to the __CruiseControlMetrics topic, one of three auto-created topics. Cruise Control uses these metrics to create and update the workload model and to calculate optimization proposals.
Prerequisites
- Streams for Apache Kafka is installed on each host, and the configuration files are available.
- You are logged in to Red Hat Enterprise Linux as the Kafka user.
Procedure
For each broker in the Kafka cluster, one at a time:
1. Stop the Kafka broker:
    ./bin/kafka-server-stop.sh
2. Edit the Kafka configuration properties file to configure the Cruise Control Metrics Reporter.
   - Add the CruiseControlMetricsReporter class to the metric.reporters configuration option. Do not remove any existing Metrics Reporters.
    metric.reporters=com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter
   - Add the following configuration options and values:
    cruise.control.metrics.topic.auto.create=true
    cruise.control.metrics.topic.num.partitions=1
    cruise.control.metrics.topic.replication.factor=1
     These options enable the Cruise Control Metrics Reporter to create the __CruiseControlMetrics topic with a log cleanup policy of DELETE. For more information, see Auto-created topics and Configuring logging and cleanup policy.
3. Configure SSL, if required.
   - In the Kafka configuration properties file, configure SSL between the Cruise Control Metrics Reporter and the Kafka broker by setting the relevant client configuration properties. The Metrics Reporter accepts all standard producer-specific configuration properties with the cruise.control.metrics.reporter prefix. For example: cruise.control.metrics.reporter.ssl.truststore.password.
   - In the Cruise Control properties file (./cruise-control/config/cruisecontrol.properties), configure SSL between the Kafka broker and the Cruise Control server by setting the relevant client configuration properties. Cruise Control inherits SSL client property options from Kafka and uses those properties for all Cruise Control server clients.
4. Restart the Kafka broker:
    ./bin/kafka-server-start.sh -daemon ./config/server.properties
   For information on restarting brokers in a multi-node cluster, see Section 4.3, “Performing a graceful rolling restart of Kafka brokers”.
5. Repeat steps 1-4 for the remaining brokers.
15.4. Configuring and starting Cruise Control
Configure the properties used by Cruise Control and then start the Cruise Control server using the kafka-cruise-control-start.sh script. The server is hosted on a single machine for the whole Kafka cluster.
Three topics are auto-created when Cruise Control starts. For more information, see Auto-created topics.
Prerequisites
- You are logged in to Red Hat Enterprise Linux as the Kafka user.
- You have downloaded Cruise Control.
- You have deployed the Cruise Control Metrics Reporter.
Procedure
- Edit the Cruise Control properties file (./cruise-control/config/cruisecontrol.properties). Configure the properties shown in the following example configuration:
  - 1: Host and port numbers of the Kafka broker (always port 9092).
  - 2: Replication factor of the Kafka metric sample store topic. If you are evaluating Cruise Control in a single-node Kafka and ZooKeeper cluster, set this property to 1. For production use, set this property to 2 or more.
  - 3: The configuration file that sets the maximum capacity limits for broker resources. Use the file that applies to your Kafka deployment configuration. For more information, see Capacity configuration.
  - 4: Comma-separated list of default optimization goals, using fully qualified class names. A number of supported optimization goals (see 5) are already set as default optimization goals; you can add or remove goals if desired.
  - 5: Comma-separated list of supported optimization goals, using fully qualified class names. To completely exclude goals from being used to generate optimization proposals, remove them from the list.
  - 6: Comma-separated list of hard goals, using fully qualified class names. Seven of the supported optimization goals are already set as hard goals; you can add or remove goals if desired.
  - 7: The interval, in milliseconds, for refreshing the cached optimization proposal that is generated from the default optimization goals.
  - 8: Host and port numbers of the ZooKeeper connection (always port 2181).
Start the Cruise Control server. The server starts on port 9090 by default; optionally, specify a different port.
cd ./cruise-control/
./kafka-cruise-control-start.sh config/cruisecontrol.properties <port_number>
To verify that Cruise Control is running, send a GET request to the /state endpoint of the Cruise Control server:
curl -X GET 'http://<cc_host>:<cc_port>/kafkacruisecontrol/state'
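The eight properties described in the first step of the procedure can be sketched as a small Python helper that renders a properties file. This is only an illustration: the property keys are the standard Cruise Control configuration names as I understand them, every value is a placeholder, and the goal lists are left as hypothetical `<...>` markers rather than real class names.

```python
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render key/value pairs as Java-style .properties lines."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Placeholder values only; adjust each one for your own deployment.
example_properties = {
    "bootstrap.servers": "localhost:9092",                # 1: Kafka broker host and port
    "sample.store.topic.replication.factor": "2",         # 2: use 1 only for single-node evaluation
    "capacity.config.file": "config/capacityJBOD.json",   # 3: capacity limits file
    "default.goals": "<default_goals>",                   # 4: hypothetical placeholder
    "goals": "<supported_goals>",                         # 5: hypothetical placeholder
    "hard.goals": "<hard_goals>",                         # 6: hypothetical placeholder
    "proposal.expiration.ms": "60000",                    # 7: cached proposal refresh interval
    "zookeeper.connect": "localhost:2181",                # 8: ZooKeeper host and port
}

print(render_properties(example_properties))
```

The rendered text could be compared against your existing cruisecontrol.properties file before you restart the server.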
Auto-created topics
The following table shows the three topics that are automatically created when Cruise Control starts. These topics are required for Cruise Control to work properly and must not be deleted or changed.
| Auto-created topic | Created by | Function |
|---|---|---|
| __CruiseControlMetrics | Cruise Control Metrics Reporter | Stores the raw metrics from the Metrics Reporter in each Kafka broker. |
| | Cruise Control | Stores the derived metrics for each partition. These are created by the Metric Sample Aggregator. |
| | Cruise Control | Stores the metrics samples used to create the Cluster Workload Model. |
To ensure that log compaction is disabled in the auto-created topics, make sure that you configure the Cruise Control Metrics Reporter as described in Section 15.3, “Deploying the Cruise Control Metrics Reporter”. Log compaction can remove records that are needed by Cruise Control and prevent it from working properly.
15.5. Configuring capacity limits
Cruise Control uses capacity limits to determine if certain resource-based optimization goals are being broken. An attempted optimization fails if one or more of these resource-based goals is set as a hard goal and then broken. This prevents the optimization from being used to generate an optimization proposal.
You specify capacity limits for Kafka broker resources in one of the following three .json files in cruise-control/config:
- capacityJBOD.json: For use in JBOD Kafka deployments (the default file).
- capacity.json: For use in non-JBOD Kafka deployments where each broker has the same number of CPU cores.
- capacityCores.json: For use in non-JBOD Kafka deployments where each broker has varying numbers of CPU cores.
Set the file in the capacity.config.file property in cruisecontrol.properties. The selected file will be used for broker capacity resolution. For example:
capacity.config.file=config/capacityJBOD.json
Capacity limits can be set for the following broker resources in the described units:
- DISK: Disk storage in MB
- CPU: CPU utilization as a percentage (0-100) or as a number of cores
- NW_IN: Inbound network throughput in KB per second
- NW_OUT: Outbound network throughput in KB per second
To apply the same capacity limits to every broker monitored by Cruise Control, set capacity limits for broker ID -1. To set different capacity limits for individual brokers, specify each broker ID and its capacity configuration.
Example capacity limits configuration
For more information, see Populating the Capacity Configuration File in the Cruise Control Wiki.
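To illustrate the broker ID convention described above, the following Python sketch builds a capacity document with a default entry for broker ID -1 and an override for one broker. The JSON layout follows the non-JBOD capacity.json format as I understand it from the Cruise Control Wiki (JBOD files express DISK per log directory instead), and all numeric values are invented for illustration.

```python
import json


def broker_capacity(broker_id, disk_mb, cpu_percent, nw_in_kbps, nw_out_kbps, doc=""):
    """Build one capacity entry; values are written as strings, as in the sample files."""
    return {
        "brokerId": str(broker_id),
        "capacity": {
            "DISK": str(disk_mb),        # MB
            "CPU": str(cpu_percent),     # percentage (0-100)
            "NW_IN": str(nw_in_kbps),    # KB per second
            "NW_OUT": str(nw_out_kbps),  # KB per second
        },
        "doc": doc,
    }


capacities = {
    "brokerCapacities": [
        # Broker ID -1 applies to every broker without its own entry.
        broker_capacity(-1, 100000, 100, 10000, 10000, "Default for all brokers."),
        # Hypothetical override for broker 0.
        broker_capacity(0, 500000, 100, 50000, 50000, "Override for broker 0."),
    ]
}

capacity_json = json.dumps(capacities, indent=2)
print(capacity_json)
```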
15.6. Configuring logging and cleanup policy
Cruise Control uses log4j1 for all server logging. To change the default configuration, edit the log4j.properties file in ./cruise-control/config/log4j.properties.
You must restart the Cruise Control server before the changes take effect.
It is important that the auto-created __CruiseControlMetrics topic (see auto-created topics) has a log cleanup policy of DELETE rather than COMPACT. Otherwise, records that are needed by Cruise Control might be removed.
As described in Section 15.3, “Deploying the Cruise Control Metrics Reporter”, setting the following options in the Kafka configuration file ensures that the log cleanup policy is correctly set to DELETE:
- cruise.control.metrics.topic.auto.create=true
- cruise.control.metrics.topic.num.partitions=1
- cruise.control.metrics.topic.replication.factor=1
If topic auto-creation is disabled in the Cruise Control Metrics Reporter (cruise.control.metrics.topic.auto.create=false), but enabled in the Kafka cluster, then the __CruiseControlMetrics topic is still automatically created by the broker. In this case, you must change the log cleanup policy of the __CruiseControlMetrics topic to DELETE using the kafka-configs.sh tool.
Get the current configuration of the __CruiseControlMetrics topic:
./bin/kafka-configs.sh --bootstrap-server <broker_address> --entity-type topics --entity-name __CruiseControlMetrics --describe
Change the log cleanup policy in the topic configuration:
./bin/kafka-configs.sh --bootstrap-server <broker_address> --entity-type topics --entity-name __CruiseControlMetrics --alter --add-config cleanup.policy=delete
If topic auto-creation is disabled in both the Cruise Control Metrics Reporter and the Kafka cluster, you must create the __CruiseControlMetrics topic manually and then configure it to use the DELETE log cleanup policy using the kafka-configs.sh tool.
For more information, see Section 9.9, “Modifying a topic configuration”.
15.7. Generating optimization proposals
When you make a POST request to the /rebalance endpoint, Cruise Control generates an optimization proposal to rebalance the Kafka cluster based on the optimization goals provided. You can use the results of the optimization proposal to rebalance your Kafka cluster.
You can run the optimization proposal using one of the following endpoints:
- /rebalance
- /add_broker
- /remove_broker
The endpoint you use depends on whether you are rebalancing across all the brokers already running in the Kafka cluster, rebalancing after adding brokers (scaling up), or rebalancing before removing brokers (scaling down).
The optimization proposal is generated as a dry run, unless the dryrun parameter is supplied and set to false. In "dry run mode", Cruise Control generates the optimization proposal and the estimated result, but doesn’t initiate the proposal by rebalancing the cluster.
You can analyze the information returned in the optimization proposal and decide whether to approve it.
Use the following parameters to make requests to the endpoints:
dryrun
type: boolean, default: true
Informs Cruise Control whether you want to generate an optimization proposal only (true), or generate an optimization proposal and perform a cluster rebalance (false).
When dryrun=true (the default), you can also pass the verbose parameter to return more detailed information about the state of the Kafka cluster. This includes metrics for the load on each Kafka broker before and after the optimization proposal is applied, and the differences between the before and after values.
excluded_topics
type: regex
A regular expression that matches the topics to exclude from the calculation of the optimization proposal.
goals
type: list of strings, default: the configured default.goals list
List of user-provided optimization goals to use to prepare the optimization proposal. If goals are not supplied, the configured default.goals list in the cruisecontrol.properties file is used.
skip_hard_goal_check
type: boolean, default: false
By default, Cruise Control checks that the user-provided optimization goals (in the goals parameter) contain all the configured hard goals (in hard.goals). A request fails if the goals you supply do not include all of the configured hard.goals.
Set skip_hard_goal_check to true if you want to generate an optimization proposal with user-provided optimization goals that do not include all the configured hard.goals.
json
type: boolean, default: false
Controls the type of response returned by the Cruise Control server. If not supplied, or set to false, then Cruise Control returns text formatted for display on the command line. If you want to extract elements of the returned information programmatically, set json=true. This will return JSON formatted text that can be piped to tools such as jq, or parsed in scripts and programs.
verbose
type: boolean, default: false
Controls the level of detail in responses that are returned by the Cruise Control server. Can be used with dryrun=true.
Other parameters are available. For more information, see REST APIs in the Cruise Control Wiki.
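As a rough sketch of how the parameters above combine into a request, the following Python helper assembles the query string for one of the proposal endpoints. The helper name `proposal_url` and the `BASE` address are hypothetical; only the endpoint and parameter names come from this section.

```python
from urllib.parse import urlencode

# Placeholder address; replace with your Cruise Control host and port.
BASE = "http://cruise-control-server:9090/kafkacruisecontrol"


def proposal_url(endpoint, dryrun=True, goals=None, brokerids=None, **extra):
    """Build a request URL for /rebalance, /add_broker, or /remove_broker."""
    params = {"dryrun": str(dryrun).lower()}
    if goals:
        params["goals"] = ",".join(goals)
    if brokerids:
        params["brokerid"] = ",".join(str(b) for b in brokerids)
    for name, value in extra.items():
        # Booleans are sent as lowercase true/false; everything else as text.
        params[name] = str(value).lower() if isinstance(value, bool) else str(value)
    # Keep commas literal so goal and broker lists stay readable.
    return f"{BASE}/{endpoint}?{urlencode(params, safe=',')}"


print(proposal_url("rebalance", goals=["RackAwareGoal", "ReplicaCapacityGoal"], json=True))
print(proposal_url("add_broker", dryrun=False, brokerids=[3, 4]))
```

Defaulting `dryrun` to true in the helper mirrors the server default, so a rebalance is only triggered when the caller asks for it explicitly.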
Prerequisites
- Kafka is running.
- You have configured Cruise Control.
- (Optional for scaling up) You have installed new brokers on hosts to include in the rebalance.
Procedure
Generate an optimization proposal using a POST request to the /rebalance, /add_broker, or /remove_broker endpoint.
Example request to /rebalance using default goals
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/rebalance'
The cached optimization proposal is immediately returned.
Note: If NotEnoughValidWindows is returned, Cruise Control has not yet recorded enough metrics data to generate an optimization proposal. Wait a few minutes and then resend the request.
Example request to /rebalance using specified goals
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/rebalance?goals=RackAwareGoal,ReplicaCapacityGoal'
If the cached optimization proposal satisfies the supplied goals, it is returned immediately. Otherwise, a new optimization proposal is generated using the supplied goals; this takes longer to calculate. To force a new proposal to be generated, add the ignore_proposal_cache=true parameter to the request.
Example request to /rebalance using specified goals without hard goals
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/rebalance?goals=RackAwareGoal,ReplicaCapacityGoal,ReplicaDistributionGoal&skip_hard_goal_check=true'
Example request to /add_broker that includes specified brokers
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/add_broker?brokerid=3,4'
The request includes the IDs of the new brokers only. For example, this request adds brokers with the IDs 3 and 4. Replicas are moved to the new brokers from existing brokers when rebalancing.
Example request to /remove_broker that excludes specified brokers
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/remove_broker?brokerid=3,4'
The request includes only the IDs of the brokers being excluded. For example, this request excludes brokers with the IDs 3 and 4. Replicas are moved from the brokers being removed to other existing brokers when rebalancing.
Note: If a broker that is being removed has excluded topics, replicas are still moved.
Review the optimization proposal contained in the response. The properties describe the pending cluster rebalance operation.
The proposal contains a high level summary of the proposed optimization, followed by summaries for each default optimization goal, and the expected cluster state after the proposal has executed.
Pay particular attention to the following information:
- The Cluster load after rebalance summary. If it meets your requirements, you should assess the impact of the proposed changes using the high level summary.
- n inter-broker replica (y MB) moves indicates how much data will be moved across the network between brokers. The higher the value, the greater the potential performance impact on the Kafka cluster during the rebalance.
- n intra-broker replica (y MB) moves indicates how much data will be moved within the brokers themselves (between disks). The higher the value, the greater the potential performance impact on individual brokers (although less than that of n inter-broker replica (y MB) moves).
- The number of leadership moves. This has a negligible impact on the performance of the cluster during the rebalance.
Asynchronous responses
The Cruise Control REST API endpoints time out after 10 seconds by default, although proposal generation continues on the server. A timeout might occur if the most recent cached optimization proposal is not ready, or if user-provided optimization goals were specified with ignore_proposal_cache=true.
To allow you to retrieve the optimization proposal at a later time, take note of the request’s unique identifier, which is given in the header of responses from the /rebalance endpoint.
To display the response headers using curl, specify the verbose (-v) option:
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/rebalance'
Here is an example header:
If an optimization proposal is not ready within the timeout, you can re-submit the POST request, this time including the User-Task-ID of the original request in the header:
curl -v -X POST -H 'User-Task-ID: 274b8095-d739-4840-85b9-f4cfaaf5c201' 'cruise-control-server:9090/kafkacruisecontrol/rebalance'
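The re-submission pattern can also be sketched in Python using only the standard library. The function names `rebalance_request` and `fetch` are hypothetical, the base URL is a placeholder, and the sketch assumes the server returns the identifier in a User-Task-ID response header as described above.

```python
import urllib.request


def rebalance_request(base_url, user_task_id=None):
    """Build a POST to /rebalance, optionally resuming an earlier task."""
    req = urllib.request.Request(f"{base_url}/rebalance", method="POST")
    if user_task_id:
        # HTTP header names are case-insensitive; urllib stores this one
        # internally as "User-task-id".
        req.add_header("User-Task-ID", user_task_id)
    return req


def fetch(req, timeout=15):
    """Send the request and return (task ID from the response header, body text)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("User-Task-ID"), resp.read().decode()


# Usage sketch (needs a running Cruise Control server, so it is not executed here):
#   base = "http://cruise-control-server:9090/kafkacruisecontrol"
#   task_id, _ = fetch(rebalance_request(base))
#   _, proposal = fetch(rebalance_request(base, task_id))
```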
What to do next
15.8. Approving optimization proposals
If you are satisfied with your most recently generated optimization proposal, you can instruct Cruise Control to initiate a cluster rebalance and begin reassigning partitions.
Leave as little time as possible between generating an optimization proposal and initiating the cluster rebalance. If some time has passed since you generated the original optimization proposal, the cluster state might have changed. Therefore, the cluster rebalance that is initiated might be different to the one you reviewed. If in doubt, first generate a new optimization proposal.
Only one cluster rebalance, with a status of "Active", can be in progress at a time.
Prerequisites
- You have generated an optimization proposal from Cruise Control.
Procedure
Send a POST request to the /rebalance, /add_broker, or /remove_broker endpoint with the dryrun=false parameter:
If you used the /add_broker or /remove_broker endpoint to generate a proposal that included or excluded brokers, use the same endpoint to perform the rebalance with or without the specified brokers.
Example request to /rebalance
curl -X POST 'cruise-control-server:9090/kafkacruisecontrol/rebalance?dryrun=false'
Example request to /add_broker
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/add_broker?dryrun=false&brokerid=3,4'
Example request to /remove_broker
curl -v -X POST 'cruise-control-server:9090/kafkacruisecontrol/remove_broker?dryrun=false&brokerid=3,4'
Cruise Control initiates the cluster rebalance and returns the optimization proposal.
- Check the changes that are summarized in the optimization proposal. If the changes are not what you expect, you can stop the rebalance.
Check the progress of the cluster rebalance using the /user_tasks endpoint. The cluster rebalance in progress has a status of "Active".
To view all cluster rebalance tasks executed on the Cruise Control server, send a GET request to the /user_tasks endpoint without any parameters.
To view the status of a particular cluster rebalance task, supply the user_task_ids parameter and the task ID:
curl 'cruise-control-server:9090/kafkacruisecontrol/user_tasks?user_task_ids=c459316f-9eb5-482f-9d2d-97b5a4cd294d'
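If you request the task list with json=true, a script can pick out the rebalance that is still in progress. The sketch below is an assumption-heavy illustration: the field names `userTasks`, `UserTaskId`, and `Status` are my understanding of the JSON response and should be checked against your server's actual output, and the sample payload is invented.

```python
import json


def active_tasks(user_tasks_json):
    """Return the IDs of tasks whose status is "Active" (assumed field names)."""
    payload = json.loads(user_tasks_json)
    return [
        task["UserTaskId"]
        for task in payload.get("userTasks", [])
        if task.get("Status") == "Active"
    ]


# Invented sample response, shaped according to the assumptions above.
sample_response = json.dumps({
    "userTasks": [
        {"UserTaskId": "c459316f-9eb5-482f-9d2d-97b5a4cd294d", "Status": "Active"},
        {"UserTaskId": "274b8095-d739-4840-85b9-f4cfaaf5c201", "Status": "Completed"},
    ]
})

print(active_tasks(sample_response))
```

Because only one rebalance can be active at a time, the returned list is expected to contain at most one ID.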
(Optional) Removing brokers when scaling down
After a successful rebalance you can stop any brokers you excluded in order to scale down the Kafka cluster.
Check that each broker being removed does not have any live partitions in its log directories (log.dirs).
ls -l <LogDir> | grep -E '^d' | grep -vE '[a-zA-Z0-9.-]+\.[a-z0-9]+-delete$'
If a log directory does not match the regular expression \.[a-z0-9]+-delete$, active partitions are still present. If you have active partitions, check that the rebalance has finished, or review the configuration for the optimization proposal. You can run the proposal again. Make sure that there are no active partitions before moving on to the next step.
Stop the broker.
./bin/kafka-server-stop.sh
Confirm that the broker has stopped.
jcmd | grep kafka
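The live-partition check in the first step can be expressed as a short Python function, which may be easier to reuse in automation than the ls and grep pipeline. This is a sketch under the same assumption as the shell command: any subdirectory whose name does not end in a `-delete` suffix is treated as a live partition. The function name and the demonstration directory names are hypothetical.

```python
import os
import re
import tempfile

# Same suffix test as the grep expression in the procedure.
DELETE_SUFFIX = re.compile(r"\.[a-z0-9]+-delete$")


def live_partition_dirs(log_dir):
    """Return subdirectories of log_dir that are not marked for deletion."""
    return sorted(
        entry.name
        for entry in os.scandir(log_dir)
        if entry.is_dir() and not DELETE_SUFFIX.search(entry.name)
    )


# Demonstration against a throwaway directory with invented partition names.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "my-topic-0"))
    os.mkdir(os.path.join(demo_dir, "my-topic-1.abc123-delete"))
    print(live_partition_dirs(demo_dir))
```

An empty result for every directory listed in log.dirs suggests the broker no longer hosts live partitions.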
15.9. Stopping rebalances
You can stop the cluster rebalance that is currently in progress.
This instructs Cruise Control to finish the current batch of partition reassignments and then stop the rebalance. When the rebalance has stopped, completed partition reassignments have already been applied; therefore, the state of the Kafka cluster is different when compared to before the start of the rebalance operation. If further rebalancing is required, you should generate a new optimization proposal.
The performance of the Kafka cluster in the intermediate (stopped) state might be worse than in the initial state.
Prerequisites
- A cluster rebalance is in progress (indicated by a status of "Active").
Procedure
Send a POST request to the /stop_proposal_execution endpoint:
curl -X POST 'cruise-control-server:9090/kafkacruisecontrol/stop_proposal_execution'