Chapter 8. Using AMQ Streams with MirrorMaker 2.0

8.1. MirrorMaker 2.0 data replication
Copy link

MirrorMaker 2.0 consumes messages from a source Kafka cluster and writes them to a target Kafka cluster.

MirrorMaker 2.0 uses:

Source cluster configuration to consume data from the source cluster
Target cluster configuration to output data to the target cluster

MirrorMaker 2.0 is based on the Kafka Connect framework, connectors managing the transfer of data between clusters. A MirrorMaker 2.0 MirrorSourceConnector replicates topics from a source cluster to a target cluster.

The process of mirroring data from one cluster to another cluster is asynchronous. The recommended pattern is for messages to be produced locally alongside the source Kafka cluster, then consumed remotely close to the target Kafka cluster.

MirrorMaker 2.0 can be used with more than one source cluster.

Figure 8.1. Replication across two clusters

By default, a check for new topics in the source cluster is made every 10 minutes. You can change the frequency by adding refresh.topics.interval.seconds to the source connector configuration. However, increasing the frequency of the operation might affect overall performance.

8.2. Cluster configuration
Copy link

You can use MirrorMaker 2.0 in active/passive or active/active cluster configurations.

In an active/active configuration, both clusters are active and provide the same data simultaneously, which is useful if you want to make the same data available locally in different geographical locations.
In an active/passive configuration, the data from an active cluster is replicated in a passive cluster, which remains on standby, for example, for data recovery in the event of system failure.

The expectation is that producers and consumers connect to active clusters only.

A MirrorMaker 2.0 cluster is required at each target destination.

8.2.1. Bidirectional replication (active/active)
Copy link

The MirrorMaker 2.0 architecture supports bidirectional replication in an active/active cluster configuration.

Each cluster replicates the data of the other cluster using the concept of source and remote topics. As the same topics are stored in each cluster, remote topics are automatically renamed by MirrorMaker 2.0 to represent the source cluster. The name of the originating cluster is prepended to the name of the topic.

Figure 8.2. Topic renaming

By flagging the originating cluster, topics are not replicated back to that cluster.

The concept of replication through remote topics is useful when configuring an architecture that requires data aggregation. Consumers can subscribe to source and remote topics within the same cluster, without the need for a separate aggregation cluster.

8.2.2. Unidirectional replication (active/passive)
Copy link

The MirrorMaker 2.0 architecture supports unidirectional replication in an active/passive cluster configuration.

You can use an active/passive cluster configuration to make backups or migrate data to another cluster. In this situation, you might not want automatic renaming of remote topics.

You can override automatic renaming by adding IdentityReplicationPolicy to the source connector configuration. With this configuration applied, topics retain their original names.

8.2.3. Topic configuration synchronization
Copy link

Topic configuration is automatically synchronized between source and target clusters. By synchronizing configuration properties, the need for rebalancing is reduced.

8.2.4. Data integrity
Copy link

MirrorMaker 2.0 monitors source topics and propagates any configuration changes to remote topics, checking for and creating missing partitions. Only MirrorMaker 2.0 can write to remote topics.

8.2.5. Offset tracking
Copy link

MirrorMaker 2.0 tracks offsets for consumer groups using internal topics.

The offset-syncs topic maps the source and target offsets for replicated topic partitions from record metadata
The checkpoints topic maps the last committed offset in the source and target cluster for replicated topic partitions in each consumer group

Offsets for the checkpoints topic are tracked at predetermined intervals through configuration. Both topics enable replication to be fully restored from the correct offset position on failover.

MirrorMaker 2.0 uses its MirrorCheckpointConnector to emit checkpoints for offset tracking.

The location of the offset-syncs topic is the source cluster by default. You can use the offset-syncs.topic.location connector configuration to change this to the target cluster. You need read/write access to the cluster that contains the topic. Using the target cluster as the location of the offset-syncs topic allows you to use MirrorMaker 2.0 even if you have only read access to the source cluster.

8.2.6. Synchronizing consumer group offsets
Copy link

The __consumer_offsets topic stores information on committed offsets, for each consumer group. Offset synchronization periodically transfers the consumer offsets for the consumer groups of a source cluster into the consumer offsets topic of a target cluster.

Offset synchronization is particularly useful in an active/passive configuration. If the active cluster goes down, consumer applications can switch to the passive (standby) cluster and pick up from the last transferred offset position.

To use topic offset synchronization, enable the synchronization by adding sync.group.offsets.enabled to the checkpoint connector configuration, and setting the property to true. Synchronization is disabled by default.

When using the IdentityReplicationPolicy in the source connector, it also has to be configured in the checkpoint connector configuration. This ensures that the mirrored consumer offsets will be applied for the correct topics.

The consumer offsets are only synchronized for consumer groups that are not active in the target cluster.

If enabled, the synchronization of offsets from the source cluster is made periodically. You can change the frequency by adding sync.group.offsets.interval.seconds and emit.checkpoints.interval.seconds to the checkpoint connector configuration. The properties specify the frequency in seconds that the consumer group offsets are synchronized, and the frequency of checkpoints emitted for offset tracking. The default for both properties is 60 seconds. You can also change the frequency of checks for new consumer groups using the refresh.groups.interval.seconds property, which is performed every 10 minutes by default.

Because the synchronization is time-based, any switchover by consumers to a passive cluster will likely result in some duplication of messages.

8.2.7. Connectivity checks
Copy link

A heartbeat internal topic checks connectivity between clusters.

The heartbeat topic is replicated from the source cluster.

Target clusters use the topic to check:

The connector managing connectivity between clusters is running
The source cluster is available

MirrorMaker 2.0 uses its MirrorHeartbeatConnector to emit heartbeats that perform these checks.

8.3. Connector configuration
Copy link

Use Mirrormaker 2.0 connector configuration for the internal connectors that orchestrate the synchronization of data between Kafka clusters.

The following table describes connector properties and the connectors you configure to use them.

Expand

Table 8.1. MirrorMaker 2.0 connector configuration properties
Property	sourceConnector	checkpointConnector	heartbeatConnector
admin.timeout.ms Timeout for admin tasks, such as detecting new topics. Default is `60000` (1 minute).	✓	✓	✓
replication.policy.class Policy to define the remote topic naming convention. Default is `org.apache.kafka.connect.mirror.DefaultReplicationPolicy`.	✓	✓	✓
replication.policy.separator The separator used for topic naming in the target cluster. Default is `.` (dot). It is only used when the `replication.policy.class` is the `DefaultReplicationPolicy`.	✓	✓	✓
consumer.poll.timeout.ms Timeout when polling the source cluster. Default is `1000` (1 second).	✓	✓
offset-syncs.topic.location The location of the `offset-syncs` topic, which can be the `source` (default) or `target` cluster.	✓	✓
topic.filter.class Topic filter to select the topics to replicate. Default is `org.apache.kafka.connect.mirror.DefaultTopicFilter`.	✓	✓
config.property.filter.class Topic filter to select the topic configuration properties to replicate. Default is `org.apache.kafka.connect.mirror.DefaultConfigPropertyFilter`.	✓
config.properties.exclude Topic configuration properties that should not be replicated. Supports comma-separated property names and regular expressions.	✓
offset.lag.max Maximum allowable (out-of-sync) offset lag before a remote partition is synchronized. Default is `100`.	✓
offset-syncs.topic.replication.factor Replication factor for the internal `offset-syncs` topic. Default is `3`.	✓
refresh.topics.enabled Enables check for new topics and partitions. Default is `true`.	✓
refresh.topics.interval.seconds Frequency of topic refresh. Default is `600` (10 minutes).	✓
replication.factor The replication factor for new topics. Default is `2`.	✓
sync.topic.acls.enabled Enables synchronization of ACLs from the source cluster. Default is `true`. Not compatible with the User Operator.	✓
sync.topic.acls.interval.seconds Frequency of ACL synchronization. Default is `600` (10 minutes).	✓
sync.topic.configs.enabled Enables synchronization of topic configuration from the source cluster. Default is `true`.	✓
sync.topic.configs.interval.seconds Frequency of topic configuration synchronization. Default `600` (10 minutes).	✓
checkpoints.topic.replication.factor Replication factor for the internal `checkpoints` topic. Default is `3`.		✓
emit.checkpoints.enabled Enables synchronization of consumer offsets to the target cluster. Default is `true`.		✓
emit.checkpoints.interval.seconds Frequency of consumer offset synchronization. Default is `60` (1 minute).		✓
group.filter.class Group filter to select the consumer groups to replicate. Default is `org.apache.kafka.connect.mirror.DefaultGroupFilter`.		✓
refresh.groups.enabled Enables check for new consumer groups. Default is `true`.		✓
refresh.groups.interval.seconds Frequency of consumer group refresh. Default is `600` (10 minutes).		✓
sync.group.offsets.enabled Enables synchronization of consumer group offsets to the target cluster `__consumer_offsets` topic. Default is `false`.		✓
sync.group.offsets.interval.seconds Frequency of consumer group offset synchronization. Default is `60` (1 minute).		✓
emit.heartbeats.enabled Enables connectivity checks on the target cluster. Default is `true`.			✓
emit.heartbeats.interval.seconds Frequency of connectivity checks. Default is `1` (1 second).			✓
heartbeats.topic.replication.factor Replication factor for the internal `heartbeats` topic. Default is `3`.			✓

8.4. Specifying a maximum number of tasks
Copy link

Connectors create the tasks that are responsible for moving data in and out of Kafka. Each connector comprises one or more tasks that are distributed across a group of workers that run the tasks.

Tasks run in parallel. Workers are assigned one or more tasks. A single task is handled by one worker, so you don’t need more workers than tasks. If there are more tasks than workers, workers handle multiple tasks.

You can specify the maximum number of connector tasks in your MirrorMaker configuration using the tasks.max property. Without specifying a maximum number of tasks, the default setting is a single task. If the infrastructure supports the processing overhead, increasing the number of tasks can improve throughput.

tasks.max configuration for a MirrorMaker connector

clusters=cluster-1,cluster-2
# ...
tasks.max = 10

For the source connector, the maximum number of tasks possible is one for each partition being replicated from the source cluster. For the checkpoint connector, the maximum number of tasks possible is one for each group being replicated from the source cluster. The number of tasks that are started for these connectors is the lower value between the maximum number of possible tasks and the value for tasks.max.

The heartbeat connector always uses a single task.

8.5. Handling high volumes of messages
Copy link

If your MirrorMaker 2.0 deployment is going to be handling a high volume of messages, you might need to adjust its configuration to support it.

The flush pipeline for data replication is source topic (Kafka Connect) source message queue producer buffer target topic. An offset flush timeout period (offset.flush.timeout.ms) is the time to wait for the producer buffer (producer.buffer.memory) to flush and offset data to be committed. Try to avoid a situation where a large producer buffer and an insufficient offset flush timeout period causes a failed to flush or failed to commit offsets type of error.

This type of error means that there are too many messages in the producer buffer, so they can’t all be flushed before the offset flush timeout is reached.

If you are getting this type of error, try the following configuration changes:

Decreasing the default value in bytes of the producer.buffer.memory
Increasing the default value in milliseconds of the offset.flush.timeout.ms

The changes should help to keep the underlying Kafka Connect queue of outstanding messages at a manageable size. You might need to adjust the values to have the desired effect.

If these configuration changes don’t resolve the error, you can try increasing the number of tasks that run in parallel by doing the following.

Increasing the number of tasks using the tasks.max property in your MirrorMaker 2.0 configuration (connect-mirror-maker.properties)
Increasing the number of nodes for the workers that run tasks

Example MirrorMaker 2.0 configuration for handling high volumes of messages

clusters=cluster-1,cluster-2
# ...
cluster-2.offset.flush.timeout.ms = 30000
cluster-2.producer.buffer.memory = 8388608
# ...
tasks.max = 10

8.6. ACL rules synchronization
Copy link

If AclAuthorizer is being used, ACL rules that manage access to brokers also apply to remote topics. Users that can read a source topic can read its remote equivalent.

Note

OAuth 2.0 authorization does not support access to remote topics in this way.

8.7. Synchronizing data between Kafka clusters using MirrorMaker 2.0
Copy link

Use MirrorMaker 2.0 to synchronize data between Kafka clusters through configuration.

The previous version of MirrorMaker continues to be supported, by running MirrorMaker 2.0 in legacy mode.

The configuration must specify:

Each Kafka cluster
Connection information for each cluster, including TLS authentication
The replication flow and direction
- Cluster to cluster
- Topic to topic
Replication rules
Committed offset tracking intervals

This procedure describes how to implement MirrorMaker 2.0 by creating the configuration in a properties file, then passing the properties when using the MirrorMaker script file to set up the connections.

Note

MirrorMaker 2.0 uses Kafka Connect to make the connections to transfer data between clusters. Kafka provides MirrorMaker sink and source connectors for data replication. If you wish to use the connectors instead of the MirrorMaker script, the connectors must be configured in the Kafka Connect cluster. For more information, refer to the Apache Kafka documentation.

Before you begin

A sample configuration properties file is provided in ./config/connect-mirror-maker.properties.

Prerequisites

You need AMQ Streams installed on the hosts of each Kafka cluster node you are replicating.

Procedure

Open the sample properties file in a text editor, or create a new one, and edit the file to include connection information and the replication flows for each Kafka cluster.
The following example shows a configuration to connect two clusters, cluster-1 and cluster-2, bidirectionally. Cluster names are configurable through the clusters property.
```
clusters=cluster-1,cluster-2 
```
1
```
cluster-1.bootstrap.servers=<cluster_name>-kafka-bootstrap-<project_name_one>:443 
```
2
```
cluster-1.security.protocol=SSL 
```
3
```
cluster-1.ssl.truststore.password=<truststore_name>
cluster-1.ssl.truststore.location=<path_to_truststore>/truststore.cluster-1.jks_
cluster-1.ssl.keystore.password=<keystore_name>
cluster-1.ssl.keystore.location=<path_to_keystore>/user.cluster-1.p12_

cluster-2.bootstrap.servers=CLUSTER-NAME-kafka-bootstrap-<project_name_two>:443 
```
4
```
cluster-2.security.protocol=SSL 
```
5
```
cluster-2.ssl.truststore.password=<truststore_name>
cluster-2.ssl.truststore.location=<path_to_truststore>/truststore.cluster-2.jks_
cluster-2.ssl.keystore.password=<keystore_name>
cluster-2.ssl.keystore.location=<path_to_keystore>/user.cluster-2.p12_

cluster-1->cluster-2.enabled=true 
```
6
```
cluster-1->cluster-2.topics=.* 
```
7
```
cluster-2->cluster-1.enabled=true 
```
8
```
cluster-2->cluster-1B->C.topics=.* 
```
9
```
replication.policy.separator=- 
```
10
```
sync.topic.acls.enabled=false 
```
11
```
refresh.topics.interval.seconds=60 
```
12
```
refresh.groups.interval.seconds=60 
```
13
1
Each Kafka cluster is identified with its alias.
2
Connection information for cluster-1, using the bootstrap address and port 443. Both clusters use port 443 to connect to Kafka using OpenShift Routes.
3
The ssl. properties define TLS configuration for cluster-1.
4
Connection information for cluster-2.
5
The ssl. properties define the TLS configuration for cluster-2.
6
Replication flow enabled from the cluster-1 cluster to the cluster-2 cluster.
7
Replicates all topics from the cluster-1 cluster to the cluster-2 cluster.
8
Replication flow enabled from the cluster-2 cluster to the cluster-1 cluster.
9
Replicates specific topics from the cluster-2 cluster to the cluster-1 cluster.
10
Defines the separator used for the renaming of remote topics.
11
When enabled, ACLs are applied to synchronized topics. The default is false.
12
The period between checks for new topics to synchronize.
13
The period between checks for new consumer groups to synchronize.
OPTION: If required, add a policy that overrides the automatic renaming of remote topics. Instead of prepending the name with the name of the source cluster, the topic retains its original name.
This optional setting is used for active/passive backups and data migration.
```
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
```
OPTION: If you want to synchronize consumer group offsets, add configuration to enable and manage the synchronization:
```
refresh.groups.interval.seconds=60
sync.group.offsets.enabled=true 
```
1
```
sync.group.offsets.interval.seconds=60 
```
2
```
emit.checkpoints.interval.seconds=60 
```
3
1
Optional setting to synchronize consumer group offsets, which is useful for recovery in an active/passive configuration. Synchronization is not enabled by default.
2
If the synchronization of consumer group offsets is enabled, you can adjust the frequency of the synchronization.
3
Adjusts the frequency of checks for offset tracking. If you change the frequency of offset synchronization, you might also need to adjust the frequency of these checks.

Start ZooKeeper and Kafka in the target clusters:

su - kafka
/opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties

/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties

Start MirrorMaker with the cluster connection configuration and replication policies you defined in your properties file:
```
/opt/kafka/bin/connect-mirror-maker.sh /config/connect-mirror-maker.properties
```
MirrorMaker sets up connections between the clusters.
For each target cluster, verify that the topics are being replicated:
```
/bin/kafka-topics.sh --bootstrap-server <broker_address> --list
```

8.8. Using MirrorMaker 2.0 in legacy mode
Copy link

This procedure describes how to configure MirrorMaker 2.0 to use it in legacy mode. Legacy mode supports the previous version of MirrorMaker.

The MirrorMaker script /opt/kafka/bin/kafka-mirror-maker.sh can run MirrorMaker 2.0 in legacy mode.

Important

Kafka MirrorMaker 1 (referred to as just MirrorMaker in the documentation) has been deprecated in Apache Kafka 3.0.0 and will be removed in Apache Kafka 4.0.0. As a result, Kafka MirrorMaker 1 has been deprecated in AMQ Streams as well. Kafka MirrorMaker 1 will be removed from AMQ Streams when we adopt Apache Kafka 4.0.0. As a replacement, use MirrorMaker 2.0 with the IdentityReplicationPolicy.

Prerequisites

You need the properties files you currently use with the legacy version of MirrorMaker.

/opt/kafka/config/consumer.properties
/opt/kafka/config/producer.properties

Procedure

Edit the MirrorMaker consumer.properties and producer.properties files to turn off MirrorMaker 2.0 features.

For example:

replication.policy.class=org.apache.kafka.mirror.LegacyReplicationPolicy

1



refresh.topics.enabled=false

2


refresh.groups.enabled=false
emit.checkpoints.enabled=false
emit.heartbeats.enabled=false
sync.topic.configs.enabled=false
sync.topic.acls.enabled=false

1: Emulate the previous version of MirrorMaker.
2: MirrorMaker 2.0 features disabled, including the internal checkpoint and heartbeat topics

Save the changes and restart MirrorMaker with the properties files you used with the previous version of MirrorMaker:
```
su - kafka /opt/kafka/bin/kafka-mirror-maker.sh \
--consumer.config /opt/kafka/config/consumer.properties \
--producer.config /opt/kafka/config/producer.properties \
--num.streams=2
```
The consumer properties provide the configuration for the source cluster and the producer properties provide the target cluster configuration.
MirrorMaker sets up connections between the clusters.

Start ZooKeeper and Kafka in the target cluster:

su - kafka
/opt/kafka/bin/zookeeper-server-start.sh -daemon /opt/kafka/config/zookeeper.properties

su - kafka
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties

For the target cluster, verify that the topics are being replicated:
```
/bin/kafka-topics.sh --bootstrap-server <BrokerAddress> --list
```

8.1. MirrorMaker 2.0 data replication
Copy link

8.2. Cluster configuration
Copy link

8.2.1. Bidirectional replication (active/active)
Copy link

8.2.2. Unidirectional replication (active/passive)
Copy link

8.2.3. Topic configuration synchronization
Copy link

8.2.4. Data integrity
Copy link

8.2.5. Offset tracking
Copy link

8.2.6. Synchronizing consumer group offsets
Copy link

8.2.7. Connectivity checks
Copy link

8.3. Connector configuration
Copy link

8.4. Specifying a maximum number of tasks
Copy link

8.5. Handling high volumes of messages
Copy link

8.6. ACL rules synchronization
Copy link

8.7. Synchronizing data between Kafka clusters using MirrorMaker 2.0
Copy link

8.8. Using MirrorMaker 2.0 in legacy mode
Copy link

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Chapter 8. Using AMQ Streams with MirrorMaker 2.0

8.1. MirrorMaker 2.0 data replicationCopy linkLink copied to clipboard!

8.2. Cluster configurationCopy linkLink copied to clipboard!

8.2.1. Bidirectional replication (active/active)Copy linkLink copied to clipboard!

8.2.2. Unidirectional replication (active/passive)Copy linkLink copied to clipboard!

8.2.3. Topic configuration synchronizationCopy linkLink copied to clipboard!

8.2.4. Data integrityCopy linkLink copied to clipboard!

8.2.5. Offset trackingCopy linkLink copied to clipboard!

8.2.6. Synchronizing consumer group offsetsCopy linkLink copied to clipboard!

8.2.7. Connectivity checksCopy linkLink copied to clipboard!

8.3. Connector configurationCopy linkLink copied to clipboard!

8.4. Specifying a maximum number of tasksCopy linkLink copied to clipboard!

8.5. Handling high volumes of messagesCopy linkLink copied to clipboard!

8.6. ACL rules synchronizationCopy linkLink copied to clipboard!

8.7. Synchronizing data between Kafka clusters using MirrorMaker 2.0Copy linkLink copied to clipboard!

8.8. Using MirrorMaker 2.0 in legacy modeCopy linkLink copied to clipboard!

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

8.1. MirrorMaker 2.0 data replication
Copy link

8.2. Cluster configuration
Copy link

8.2.1. Bidirectional replication (active/active)
Copy link

8.2.2. Unidirectional replication (active/passive)
Copy link

8.2.3. Topic configuration synchronization
Copy link

8.2.4. Data integrity
Copy link

8.2.5. Offset tracking
Copy link

8.2.6. Synchronizing consumer group offsets
Copy link

8.2.7. Connectivity checks
Copy link

8.3. Connector configuration
Copy link

8.4. Specifying a maximum number of tasks
Copy link

8.5. Handling high volumes of messages
Copy link

8.6. ACL rules synchronization
Copy link

8.7. Synchronizing data between Kafka clusters using MirrorMaker 2.0
Copy link

8.8. Using MirrorMaker 2.0 in legacy mode
Copy link