Installing Debezium on OpenShift
For use with Debezium 1.2 on OpenShift Container Platform
Abstract
This guide describes how to install Debezium connectors on OpenShift Container Platform by deploying them through AMQ Streams and Kafka Connect.
Chapter 1. Debezium Overview
Red Hat Debezium is a distributed platform that captures database operations, creates data change event records for row-level operations, and streams change event records to Kafka topics. Red Hat Debezium is built on Apache Kafka and is deployed and integrated with AMQ Streams.
Debezium captures row-level changes to a database table and passes corresponding change events to AMQ Streams. Applications can read these change event streams and access the change events in the order in which they occurred.
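For context, the value of each change event record follows a standard Debezium envelope with before, after, source, op, and ts_ms fields. A trimmed sketch, shown here in YAML form for readability (real records are serialized as JSON or Avro, carry a schema, and include connector-specific source metadata; the table and column names are placeholders):

payload:
  before: { id: 1001, first_name: "Sally" }  # row state before the change
  after:  { id: 1001, first_name: "Sal" }    # row state after the change
  source: { connector: "mysql", db: "inventory", table: "customers" }
  op: "u"               # c = create, u = update, d = delete
  ts_ms: 1593018069999  # time the connector processed the event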
Debezium has multiple uses, including:
- Data replication
- Updating caches and search indexes
- Simplifying monolithic applications
- Data integration
- Enabling streaming queries
Debezium provides connectors (based on Kafka Connect) for the following common databases:
- MySQL
- PostgreSQL
- SQL Server
- MongoDB
- Db2
Important: The Debezium Db2 connector is a Technology Preview feature. Technology Preview features are not supported with Red Hat production service-level agreements (SLAs) and might not be functionally complete; therefore, Red Hat does not recommend implementing any Technology Preview features in production environments. This Technology Preview feature provides early access to upcoming product innovations, enabling you to test functionality and provide feedback during the development process. For more information about support scope, see Technology Preview Features Support Scope.
Debezium is the upstream community project for Red Hat Debezium.
Chapter 2. Installing Debezium connectors
Install Debezium connectors through AMQ Streams by extending Kafka Connect with connector plug-ins. Following a deployment of AMQ Streams, you can deploy Debezium as a connector configuration through Kafka Connect.
2.1. Prerequisites
A Debezium installation requires the following:
- An OpenShift cluster
- A deployment of AMQ Streams with Kafka Connect
- A user on the OpenShift cluster with cluster-admin permissions to set up the required cluster roles and API services
Java 8 or later is required to run the Debezium connectors.
To install Debezium, the OpenShift Container Platform command-line interface (CLI) is required. For information about how to install the CLI for OpenShift 4.4, see the OpenShift Container Platform 4.4 documentation.
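For example, after installing the CLI, you might log in to the cluster and switch to the project where you plan to deploy Debezium (the server URL, token, and project name shown here are placeholders):

$ oc login https://api.mycluster.example.com:6443 --token=<token>
$ oc project debezium-example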
Additional resources
- For more information about how to install AMQ Streams, see Using AMQ Streams on OpenShift.
- AMQ Streams includes a Cluster Operator to deploy and manage Kafka components. For more information about how to install Kafka components using the AMQ Streams Cluster Operator, see Deploying Kafka Connect to your cluster.
2.2. Kafka topic creation recommendations
Debezium uses multiple Kafka topics for storing data. The topics must be created by an administrator, or by Kafka itself with topic auto-creation enabled through the auto.create.topics.enable broker configuration property.
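If you opt for broker-level auto-creation with AMQ Streams, the property belongs in the Kafka custom resource. A minimal sketch, assuming a cluster named my-cluster (the rest of the spec is elided):

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    config:
      auto.create.topics.enable: "true"  # let Kafka create topics on first use
    # ...
  # ...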
The following list describes limitations and recommendations to consider when creating topics:
- Database history topics (for the MySQL, SQL Server, and Db2 connectors; see the example topic resource after this list)
  - Infinite or very long retention
  - Replication factor of at least three in production
  - Single partition
- Other topics
  - Replicated in production
  - Single partition
    You can relax the single-partition rule, but your application must handle out-of-order events for different rows in the database. Events for a single row are still totally ordered. If you use multiple partitions, the default behavior is that Kafka determines the partition by hashing the key. Other partition strategies require using single message transforms (SMTs) to set the partition number for each record.
  - When Kafka log compaction is enabled because you want to keep only the last change event for a given record, configure the min.compaction.lag.ms and delete.retention.ms topic-level settings in Apache Kafka so that consumers have enough time to receive all events and delete markers. Set these values to be larger than the maximum downtime that you anticipate for the sink connectors, for example, the downtime when you update them.
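As a sketch of the database history recommendations above, a topic managed by the AMQ Streams Topic Operator might look like the following (the topic and cluster names are placeholders; retention.ms: -1 retains the log indefinitely):

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: dbhistory.inventory
  labels:
    strimzi.io/cluster: my-cluster  # Kafka cluster that owns the topic
spec:
  partitions: 1       # database history topics require a single partition
  replicas: 3         # replication factor of at least three in production
  config:
    retention.ms: -1  # infinite retention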
2.3. Deploying Debezium with AMQ Streams
To set up connectors for Debezium on Red Hat OpenShift Container Platform, deploy a Kafka cluster to OpenShift, download and configure Debezium connectors, and deploy Kafka Connect with the connectors.
Prerequisites
- You used Red Hat AMQ Streams to set up Apache Kafka and Kafka Connect on OpenShift. AMQ Streams offers operators and images that bring Kafka to OpenShift.
- Podman is installed.
Procedure
Deploy your Kafka cluster. If you already have a Kafka cluster deployed, skip the following three sub-steps.
- Install the AMQ Streams operator by following the steps in Installing AMQ Streams and deploying components.
- Select the desired configuration and deploy your Kafka cluster.
- Deploy Kafka Connect.
You now have a working Kafka cluster that is running in OpenShift with Kafka Connect.
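If you still need a Kafka Connect deployment, a minimal KafkaConnect resource looks like the following sketch (the names, version, replica count, and bootstrap address are placeholders; the annotation enables managing connectors through KafkaConnector resources):

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.5.0
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092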
Check that your pods are running. The pod names correspond with your AMQ Streams deployment.
$ oc get pods
NAME                                              READY   STATUS
<cluster-name>-entity-operator-7b6b9d4c5f-k7b92   3/3     Running
<cluster-name>-kafka-0                            2/2     Running
<cluster-name>-zookeeper-0                        2/2     Running
<cluster-name>-operator-97cd5cf7b-l58bq           1/1     Running
In addition to running pods, you should have a DeploymentConfig associated with Kafka Connect.
- Go to the Red Hat Integration download site.
- Download the Debezium connector archive(s) for your database(s).
Extract the archive(s) to create a directory structure for the connector plug-in(s). If you downloaded and extracted multiple archives, the structure looks like this:
$ tree ./my-plugins/
./my-plugins/
├── debezium-connector-db2
│   ├── ...
├── debezium-connector-mongodb
│   ├── ...
├── debezium-connector-mysql
│   ├── ...
├── debezium-connector-postgres
│   ├── ...
└── debezium-connector-sqlserver
    ├── ...
Create a new Dockerfile by using registry.redhat.io/amq7/amq-streams-kafka-25-rhel7:1.5.0 as the base image:

FROM registry.redhat.io/amq7/amq-streams-kafka-25-rhel7:1.5.0
USER root:root
COPY ./my-plugins/ /opt/kafka/plugins/
USER 1001
Build the container image. If the Dockerfile you created in the previous step is in the current directory, run the following command:

podman build -t my-new-container-image:latest .
Push your custom image to your container registry:
podman push my-new-container-image:latest
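Note that for the push to reach a remote registry, the image tag typically includes the registry host and namespace. For example, with a hypothetical quay.io organization:

podman build -t quay.io/my-org/my-new-container-image:latest .
podman push quay.io/my-org/my-new-container-image:latest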
Point to the new container image. Do one of the following:
- Edit the spec.image field of the KafkaConnect custom resource. If set, this property overrides the STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE variable in the Cluster Operator. For example:

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
spec:
  #...
  image: my-new-container-image
- In the install/cluster-operator/050-Deployment-strimzi-cluster-operator.yaml file, edit the STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE variable to point to the new container image, and then reinstall the Cluster Operator. If you edit this file, you must apply it to your OpenShift cluster.
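For the second option, the variable lives in the env list of the Cluster Operator Deployment. A sketch of the relevant fragment (only this entry changes; the rest of the file stays as shipped):

# install/cluster-operator/050-Deployment-strimzi-cluster-operator.yaml (fragment)
env:
  - name: STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE
    value: my-new-container-image  # the custom image built in the previous steps

Apply the edited file with oc apply -f install/cluster-operator/050-Deployment-strimzi-cluster-operator.yaml.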
The Kafka Connect deployment starts to use the new image.
Next steps
For each Debezium connector that you want to deploy, create and apply a KafkaConnector custom resource that configures a connector instance (an example resource appears at the end of this section). Applying the resource starts the connector against the configured database. When the connector starts, it connects to the configured database and generates change event records for each inserted, updated, and deleted row or document. Details for deploying a connector are in the following sections:
- Deploying the MySQL connector
- Deploying the MongoDB connector
- Deploying the PostgreSQL connector
- Deploying the SQL Server connector
To use the Db2 connector, you must have a license for the IBM InfoSphere Data Replication (IIDR) product. However, IIDR does not need to be installed.
- For more information about the KafkaConnect.spec.image property and the STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE variable, see Using AMQ Streams on OpenShift.
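As an example of such a KafkaConnector resource, the following sketch configures a MySQL connector instance (the hostnames, credentials, and names shown are placeholders; the configuration properties follow the Debezium 1.2 MySQL connector documentation):

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: inventory-connector
  labels:
    strimzi.io/cluster: my-connect-cluster  # Kafka Connect cluster to run in
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql
    database.port: "3306"
    database.user: debezium
    database.password: dbz
    database.server.id: "184054"
    database.server.name: dbserver1  # logical name; prefixes change event topic names
    database.whitelist: inventory
    database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092
    database.history.kafka.topic: dbhistory.inventory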
Appendix A. Using your subscription
Red Hat Integration is provided through a software subscription. To manage your subscriptions, access your account at the Red Hat Customer Portal.
Accessing your account
- Go to access.redhat.com.
- If you do not already have an account, create one.
- Log in to your account.
Activating a subscription
- Go to access.redhat.com.
- Navigate to My Subscriptions.
- Navigate to Activate a subscription and enter your 16-digit activation number.
Downloading zip and tar files
To access zip or tar files, use the customer portal to find the relevant files for download. If you are using RPM packages, this step is not required.
- Open a browser and log in to the Red Hat Customer Portal Product Downloads page at access.redhat.com/downloads.
- Scroll down to INTEGRATION AND AUTOMATION.
- Click Red Hat Integration to display the Red Hat Integration downloads page.
- Click the Download link for your component.
Revised on 2020-10-02 21:45:37 UTC