Chapter 2. Installing Debezium connectors
Install Debezium connectors through AMQ Streams by extending Kafka Connect with connector plugins. After you deploy AMQ Streams, you can deploy Debezium as a connector configuration through Kafka Connect.
2.1. Prerequisites
A Debezium installation requires the following:
- Red Hat Enterprise Linux version 7.x or 8.x with an x86_64 architecture.
- Administrative privileges (sudo access).
- AMQ Streams 1.5 on Red Hat Enterprise Linux is installed on the host machine.
  - AMQ Streams must be running in one of the supported JVM versions.
- Credentials for the kafka user that was created when AMQ Streams was installed.
- An AMQ Streams cluster is running.
  - For instructions on running a basic, non-production AMQ Streams cluster containing a single ZooKeeper and a single Kafka node, see Running a single node AMQ Streams cluster.
If you have an earlier version of AMQ Streams, you need to upgrade to AMQ Streams 1.5. For upgrade instructions, see AMQ Streams and Kafka upgrades.
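Before you install anything, you can check the operating system and architecture prerequisites with a short script. The following is a minimal sketch that assumes the host provides /etc/os-release (as RHEL 7 and 8 do); the function names check_arch and check_rhel_version are hypothetical and are not part of AMQ Streams or Debezium. It does not check sudo access, the JVM version, or the state of the AMQ Streams cluster.

```shell
#!/bin/sh
# Preflight sketch for two of the prerequisites listed above.
# check_arch and check_rhel_version are hypothetical helper names.

# check_arch: succeed only on an x86_64 host.
check_arch() {
  local arch
  arch=$(uname -m)
  if [ "$arch" != "x86_64" ]; then
    echo "unsupported architecture: $arch" >&2
    return 1
  fi
}

# check_rhel_version [FILE]: read an os-release file (default
# /etc/os-release) and succeed only for RHEL 7.x or 8.x.
check_rhel_version() {
  local file="${1:-/etc/os-release}" id version major
  # Source the file in subshells so its variables do not leak out.
  id=$(. "$file" && echo "${ID:-}")
  version=$(. "$file" && echo "${VERSION_ID:-}")
  major=${version%%.*}
  if [ "$id" != "rhel" ]; then
    echo "not Red Hat Enterprise Linux (ID=$id)" >&2
    return 1
  fi
  case "$major" in
    7|8) return 0 ;;
    *) echo "unsupported RHEL version: $version" >&2; return 1 ;;
  esac
}
```

For example, after sourcing the file (here assumed to be saved as preflight.sh):
$ . ./preflight.sh
$ check_arch && check_rhel_version && echo "platform OK"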
Additional resources
- For more information about how to install AMQ Streams, see Installing AMQ Streams.
2.2. Kafka topic creation recommendations
Debezium stores data in multiple Kafka topics. Either an administrator creates the topics in advance, or Kafka creates them automatically when topic auto-creation is enabled through the auto.create.topics.enable broker configuration property.
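To find out which of these two cases applies to a broker, you can inspect its configuration file. The sketch below prints the effective value of auto.create.topics.enable, treating an unset property as true, which is Kafka's default. The helper name is hypothetical, the parser handles only simple key=value and key:value lines, and /opt/kafka/config/server.properties in the usage example is an assumed location for the broker configuration.

```shell
#!/bin/sh
# Sketch: report whether topic auto-creation is effectively enabled.
# auto_create_enabled is a hypothetical helper name.

# auto_create_enabled FILE: print "true" or "false".
auto_create_enabled() {
  local value
  # Take the last uncommented assignment, because later lines override
  # earlier ones, then strip the key, the separator, and trailing spaces.
  value=$(grep -E '^[[:space:]]*auto\.create\.topics\.enable[[:space:]]*[=:]' "$1" |
    tail -n 1 |
    sed -E 's/^[^=:]*[=:][[:space:]]*//; s/[[:space:]]*$//')
  # An unset property falls back to Kafka's default, which is true.
  echo "${value:-true}"
}
```

For example:
$ auto_create_enabled /opt/kafka/config/server.properties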
The following list describes limitations and recommendations to consider when creating topics:
- Database history topics (for MySQL and SQL Server connectors)
  - Infinite (or very long) retention.
  - Replication factor of at least 3 in production.
  - Single partition.
- Other topics
  - Optionally, log compaction enabled (if you want to keep only the last change event for a given record). In this case, configure the min.compaction.lag.ms and delete.retention.ms topic-level settings in Apache Kafka so that consumers have enough time to receive all events and delete markers. Specifically, set these values larger than the maximum downtime that you anticipate for the sink connectors (for example, while you update them).
  - Replicated in production.
  - Single partition.
    - You can relax the single partition rule, but your application must then handle out-of-order events for different rows in the database (events for a single row are still totally ordered). If multiple partitions are used, Kafka determines the partition by hashing the key by default. Other partition strategies require Single Message Transforms (SMTs) to set the partition number for each record.
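If you create the topics yourself, the kafka-topics.sh tool in /opt/kafka/bin can apply the settings listed above. The following is a sketch under stated assumptions: the bootstrap address localhost:9092, the helper function names, and the topic names in the usage example are hypothetical. The replication factor defaults to 3, which requires at least three brokers; set REPLICATION_FACTOR=1 on a single-node development cluster. For compacted topics, the sketch uses one value for both min.compaction.lag.ms and delete.retention.ms, derived from the anticipated sink connector downtime in hours.

```shell
#!/bin/sh
# Sketch: create Debezium topics up front with kafka-topics.sh.
# KAFKA_BIN, BOOTSTRAP, REPLICATION_FACTOR and the function names are
# assumptions; override the variables to match your cluster.
KAFKA_BIN="${KAFKA_BIN:-/opt/kafka/bin}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
REPLICATION_FACTOR="${REPLICATION_FACTOR:-3}"

# hours_to_ms HOURS: convert a whole number of hours to milliseconds.
hours_to_ms() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

# create_history_topic NAME: database history topic (MySQL and SQL Server
# connectors) with a single partition and no time-based retention limit.
create_history_topic() {
  "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" --create \
    --topic "$1" --partitions 1 \
    --replication-factor "$REPLICATION_FACTOR" \
    --config retention.ms=-1
}

# create_compacted_topic NAME LAG_MS: single-partition change event topic
# with log compaction. LAG_MS should exceed the longest sink connector
# downtime you expect; it is used for both min.compaction.lag.ms and
# delete.retention.ms.
create_compacted_topic() {
  "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" --create \
    --topic "$1" --partitions 1 \
    --replication-factor "$REPLICATION_FACTOR" \
    --config cleanup.policy=compact \
    --config "min.compaction.lag.ms=$2" \
    --config "delete.retention.ms=$2"
}
```

For example, with a maximum anticipated sink downtime of 12 hours (hours_to_ms 12 prints 43200000):
$ create_history_topic dbhistory.inventory
$ create_compacted_topic dbserver1.inventory.customers "$(hours_to_ms 12)"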
2.3. Deploying Debezium with AMQ Streams on RHEL
This procedure describes how to set up connectors for Debezium on Red Hat Enterprise Linux. Connectors are deployed to an AMQ Streams cluster using Kafka Connect, a framework for streaming data between Apache Kafka and external systems. Kafka Connect must be run in distributed mode rather than standalone mode.
This procedure assumes that AMQ Streams is installed and ZooKeeper and Kafka are running.
Procedure
- Visit the Red Hat Integration download site on the Red Hat Customer Portal and download the Debezium connector or connectors that you want to use. For example, download the Debezium 1.1.0 MySQL Connector to use Debezium with a MySQL database.
- In /opt/kafka, create the connector-plugins directory if it does not already exist for other Kafka Connect plugins:
$ sudo mkdir /opt/kafka/connector-plugins
- Extract the contents of the Debezium connector archive to the /opt/kafka/connector-plugins directory. This example extracts the contents of the MySQL connector:
$ sudo unzip debezium-connector-mysql-1.1.0-plugin.zip -d /opt/kafka/connector-plugins
- Repeat the above step for each connector that you want to install.
- Switch to the kafka user:
$ su - kafka
Password:
- Check whether Kafka Connect is already running in distributed mode. If it is running, stop the associated process on all Kafka Connect worker nodes. For example:
$ jcmd | grep ConnectDistributed
18514 org.apache.kafka.connect.cli.ConnectDistributed /opt/kafka/config/connect-distributed.properties
$ kill 18514
- Edit the connect-distributed.properties file in /opt/kafka/config/ and specify the location of the Debezium connector plugins:
plugin.path=/opt/kafka/connector-plugins
- Run Kafka Connect in distributed mode:
$ /opt/kafka/bin/connect-distributed.sh /opt/kafka/config/connect-distributed.properties
Kafka Connect runs. During startup, it loads the Debezium connectors from the connector-plugins directory.
- Repeat the three preceding steps (stop Kafka Connect, edit connect-distributed.properties, and run Kafka Connect) on each Kafka Connect worker node.
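Because the stop, configure, and run steps must be repeated on every worker node, you might script them. The following sketch assumes the paths used in this procedure, that pgrep (from procps-ng) and curl are installed, and that the Kafka Connect REST API listens on its default port 8083. The file name connect-worker.sh and all function names are hypothetical. set_plugin_path adds the plugin directory to plugin.path without dropping directories that are already listed, and list_connector_plugins queries the /connector-plugins endpoint of the Kafka Connect REST API so that you can confirm that the Debezium connectors were loaded.

```shell
#!/bin/sh
# Sketch of the per-worker stop / configure / restart cycle.
# The function names are hypothetical; paths follow this procedure.

# set_plugin_path FILE DIR: make sure DIR appears in the comma-separated
# plugin.path list in FILE, keeping any directories already listed.
set_plugin_path() {
  local file="$1" dir="$2" current new scratch status=0
  if ! grep -q '^plugin\.path=' "$file"; then
    # Terminate a last line that lacks a newline before appending.
    if [ -n "$(tail -c 1 "$file")" ]; then echo >> "$file"; fi
    printf 'plugin.path=%s\n' "$dir" >> "$file"
    return 0
  fi
  current=$(sed -n 's/^plugin\.path=//p' "$file" | tail -n 1)
  case ",$current," in
    *",$dir,"*) return 0 ;;  # already listed, nothing to do
  esac
  if [ -z "$current" ]; then new="$dir"; else new="$current,$dir"; fi
  # Rewrite through a scratch file, then copy the result back so the
  # original file keeps its owner and permissions.
  scratch=$(mktemp) || return 1
  sed "s|^plugin\.path=.*|plugin.path=$new|" "$file" > "$scratch" &&
    cat "$scratch" > "$file" || status=1
  rm -f "$scratch"
  return "$status"
}

# stop_connect_workers: stop any Kafka Connect distributed worker on this
# host and wait up to 30 seconds for it to exit. The escaped dots mean the
# pattern does not match a command line that merely contains this text.
stop_connect_workers() {
  local pattern='org\.apache\.kafka\.connect\.cli\.ConnectDistributed'
  local pids i=0
  pids=$(pgrep -f "$pattern")
  if [ -z "$pids" ]; then
    echo "Kafka Connect is not running"
    return 0
  fi
  kill $pids  # unquoted on purpose: one argument per PID
  while [ "$i" -lt 30 ] && pgrep -f "$pattern" > /dev/null; do
    sleep 1
    i=$((i + 1))
  done
  if pgrep -f "$pattern" > /dev/null; then
    echo "Kafka Connect is still running" >&2
    return 1
  fi
}

# list_connector_plugins [URL]: ask the Kafka Connect REST API which
# connector plugins this worker loaded.
list_connector_plugins() {
  curl -s "${1:-http://localhost:8083}/connector-plugins"
}
```

For example, on each worker node as the kafka user:
$ . ./connect-worker.sh
$ stop_connect_workers
$ set_plugin_path /opt/kafka/config/connect-distributed.properties /opt/kafka/connector-plugins
$ /opt/kafka/bin/connect-distributed.sh /opt/kafka/config/connect-distributed.properties
From another terminal, the output of list_connector_plugins should then include classes in the io.debezium package.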
Additional resources
Updating Kafka Connect
If you need to update your deployment, amend the Debezium connector JAR files in the /opt/kafka/connector-plugins directory, and then restart Kafka Connect.
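The update can also be scripted. This sketch replaces a connector's directory in /opt/kafka/connector-plugins with the contents of a newer plugin archive; the same function also works for a first-time installation. It assumes that unzip is installed and that each archive contains a top-level directory such as debezium-connector-mysql. The function name, the file name update-plugin.sh, and the archive name in the usage example are hypothetical. Stop Kafka Connect before you run it, and restart Kafka Connect afterwards, on each worker node.

```shell
#!/bin/sh
# Sketch: install or update one Debezium connector plugin from its archive.
# update_connector_plugin is a hypothetical helper name.

# update_connector_plugin ZIP [PLUGIN_DIR]: extract ZIP into a scratch
# directory, then replace each top-level entry of the same name in
# PLUGIN_DIR (default /opt/kafka/connector-plugins).
update_connector_plugin() {
  local zip="$1" plugins="${2:-/opt/kafka/connector-plugins}"
  local staging entry name status=0
  if [ -z "$zip" ] || [ ! -f "$zip" ]; then
    echo "usage: update_connector_plugin ZIP [PLUGIN_DIR]" >&2
    return 2
  fi
  staging=$(mktemp -d) || return 1
  if ! unzip -q "$zip" -d "$staging"; then
    rm -rf "$staging"
    return 1
  fi
  for entry in "$staging"/*; do
    [ -e "$entry" ] || continue  # the archive was empty
    name=$(basename "$entry")
    # Remove the old version first so that stale JAR files do not linger.
    rm -rf "${plugins:?}/$name"
    mv "$entry" "$plugins/$name" || status=1
  done
  rm -rf "$staging"
  return "$status"
}
```

Because the plugin directory was populated with sudo, run the function as root, for example:
$ sudo sh -c '. ./update-plugin.sh && update_connector_plugin debezium-connector-mysql-NEW-plugin.zip'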
Next Steps
The Debezium User Guide describes how to configure each connector and its source database for change data capture. Once configured, a connector will connect to the source database and produce events for each inserted, updated, and deleted row or document.