Chapter 1. Debezium Connector for MySQL
MySQL has a binary log (binlog) that records all operations in the order in which they are committed to the database. This includes changes to table schemas and the data within tables. MySQL uses the binlog for replication and recovery.
The MySQL connector reads the binlog and produces change events for row-level INSERT
, UPDATE
, and DELETE
operations and records the change events in a Kafka topic. Client applications read those Kafka topics.
As MySQL is typically set up to purge binlogs after a specified period of time, the MySQL connector performs and initial consistent snapshot of each of your databases. The MySQL connector reads the binlog from the point at which the snapshot was made.
1.1. Overview of how the MySQL connector works
The Debezium MySQL connector tracks the structure of the tables, performs snapshots, transforms binlog events into Debezium change events and records where those events are recorded in Kafka.
1.1.1. How the MySQL connector uses database schemas
When a database client queries a database, it uses the database’s current schema. As database schemas often change, the Debezium MySQL connector knows how the schema appeared for each INSERT
, UPDATE
, and DELETE
operation.
MySQL includes both row-level changes and DDL statements in its binlog which the connector reads to parse and update the in-memory representation of each table’s schema. This is used to understand the table structure at the time of each operation, which produces accurate change events.
The connector records all DDL statements along with their position in the binlog in a separate database history so that when the connector restarts (after a possible crash or graceful shutdown), it continues reading the binlog from that specific point in time.
See The MySQL connector and Kafka topics for more on topic naming conventions.
Additional resources
- If you do not use the database history topic described here, check out the schema change topics.
1.1.2. How the MySQL connector performs database snapshots
When your Debezium MySQL connector is first started, it performs an initial consistent snapshot of your database. The following flow describes how this snapshot is completed.
This is the default snapshot mode which is set as initial
in the snapshot.mode
property. For other snapshots modes, please check out the MySQL connector configuration properties.
- The connector…
Step | Action |
---|---|
| Grabs a global read lock that blocks writes by other database clients. Note The snapshot itself does not prevent other clients from applying DDL which might interfere with the connector’s attempt to read the binlog position and table schemas. The global read lock is kept while the binlog position is read before released in a later step. |
| Starts a transaction with repeatable read semantics to ensure that all subsequent reads within the transaction are done against the consistent snapshot. |
| Reads the current binlog position. |
| Reads the schema of the databases and tables allowed by the connector’s configuration. |
| Releases the global read lock. This now allows other database clients to write to the database. |
|
Writes the DDL changes to the schema change topic, including all necessary Note This happens if applicable. |
|
Scans the database tables and generates |
| Commits the transaction. |
| Records the completed snapshot in the connector offsets. |
1.1.2.1. What happens if the connector fails?
If the connector fails, stops, or is rebalanced while making the initial snapshot, the connector creates a new snapshot once restarted. Once that intial snapshot is completed, the Debezium MySQL connector restarts from the same position in the binlog so it does not miss any updates.
If the connector stops for long enough, MySQL could purge old binlog files and the connector’s position would be lost. If the position is lost, the connector reverts to the initial snapshot for its starting position. For more tips on troubleshooting the Debezium MySQL connector, see MySQL connector common issues.
1.1.2.2. What if Global Read Locks are not allowed?
Some environments do not allow a global read lock. If the Debezium MySQL connector detects that global read locks are not permitted, the connector uses table-level locks instead and performs a snapshot with this method.
The user must have LOCK_TABLES
privileges.
- The connector…
Step | Action |
---|---|
| Starts a transaction with repeatable read semantics to ensure that all subsequent reads within the transaction are done against the consistent snapshot. |
| Reads and filters the names of the databases and tables. |
| Reads the current binlog position. |
| Reads the schema of the databases and tables allowed by the connector’s configuration. |
|
Writes the DDL changes to the schema change topic, including all necessary Note This happens if applicable. |
|
Scans the database tables and generates |
| Commits the transaction. |
| Releases the table-level locks. |
| Records the completed snapshot in the connector offsets. |
1.1.3. How the MySQL connector handles schema change topics
You can configure the Debezium MySQL connector to produce schema change events that include all DDL statements applied to databases in the MySQL server. The connector writes all of these events to a Kafka topic named <serverName>
where serverName
is the name of the connector as specified in the database.server.name
configuration property.
If you choose to use schema change events, use the schema change topic and do not consume the database history topic.
Make sure that the num.partitions
configuration for Kafka is set to 1
to ensure schema changes are kept in the correct order.
1.1.3.1. Schema change topic structure
Each message that is written to the schema change topic contains a message key which includes the name of the connected database used when applying DDL statements:
{ "schema": { "type": "struct", "name": "io.debezium.connector.mysql.SchemaChangeKey", "optional": false, "fields": [ { "field": "databaseName", "type": "string", "optional": false } ] }, "payload": { "databaseName": "inventory" } }
The schema change event message value contains a structure that includes the DDL statements, the database to which the statements were applied, and the position in the binlog where the statements appeared:
{ "schema": { "type": "struct", "name": "io.debezium.connector.mysql.SchemaChangeValue", "optional": false, "fields": [ { "field": "databaseName", "type": "string", "optional": false }, { "field": "ddl", "type": "string", "optional": false }, { "field": "source", "type": "struct", "name": "io.debezium.connector.mysql.Source", "optional": false, "fields": [ { "type": "string", "optional": true, "field": "version" }, { "type": "string", "optional": false, "field": "name" }, { "type": "int64", "optional": false, "field": "server_id" }, { "type": "int64", "optional": false, "field": "ts_sec" }, { "type": "string", "optional": true, "field": "gtid" }, { "type": "string", "optional": false, "field": "file" }, { "type": "int64", "optional": false, "field": "pos" }, { "type": "int32", "optional": false, "field": "row" }, { "type": "boolean", "optional": true, "default": false, "field": "snapshot" }, { "type": "int64", "optional": true, "field": "thread" }, { "type": "string", "optional": true, "field": "db" }, { "type": "string", "optional": true, "field": "table" }, { "type": "string", "optional": true, "field": "query" } ] } ] }, "payload": { "databaseName": "inventory", "ddl": "CREATE TABLE products ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL, description VARCHAR(512), weight FLOAT ); ALTER TABLE products AUTO_INCREMENT = 101;", "source" : { "version": "0.10.0.Beta4", "name": "mysql-server-1", "server_id": 0, "ts_sec": 0, "gtid": null, "file": "mysql-bin.000003", "pos": 154, "row": 0, "snapshot": true, "thread": null, "db": null, "table": null, "query": null } } }
1.1.3.1.1. Important tips regarding schema change topics
The ddl
field may contain multiple DDL statements. Every statement applies to the database in the databaseName
field and appears in the same order as they were applied in the database. The source
field is structured exactly as a standard data change event written to table-specific topics. This field is useful to correlate events on different topic.
.... "payload": { "databaseName": "inventory", "ddl": "CREATE TABLE products ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,... "source" : { .... } } ....
- What if a client submits DDL statements to multiple databases?
- If MySQL applies them atomically, the connector takes the DDL statements in order, groups them by database, and creates a schema change event for each group.
- If MySQL applies them individually, the connector creates a separate schema change event for each statement.
Additional resources
- If you do not use the schema change topics detailed here, check out the database history topic.
1.1.4. MySQL connector events
All data change events produced by the Debezium MySQL connector contain a key and a value. The change event key and the change event value each contain a schema and a payload where the schema describes the structure of the payload and the payload contains the data.
The MySQL connector ensures that all Kafka Connect schema names adhere to the Avro schema name format. This is important as any character that is not a latin letter or underscore is replaced by an underscore which can lead to unexpected conflicts in schema names when the logical server names, database names, and table names container other characters that are replaced with these underscores.
1.1.4.1. Change event key
For any given table, the change event’s key has a structure that contains a field for each column in the PRIMARY KEY
(or unique constraint) at the time the event was created. Let us look at an example table and then how the schema and payload would appear for the table.
example table
CREATE TABLE customers ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, first_name VARCHAR(255) NOT NULL, last_name VARCHAR(255) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE KEY ) AUTO_INCREMENT=1001;
example change event key
{ "schema": { 1 "type": "struct", "name": "mysql-server-1.inventory.customers.Key", 2 "optional": false, 3 "fields": [ 4 { "field": "id", "type": "int32", "optional": false } ] }, "payload": { 5 "id": 1001 } }
- 1
- The
schema
describes what is in thepayload
. - 2
- The
mysql-server-1.inventory.customers.Key
is the name of the schema which defines the structure wheremysql-server-1
is the connector name,inventory
is the database, andcustomers
is the table. - 3
- Denotes that the
payload
is not optional. - 4
- Specifies the type of fields expected in the
payload
. - 5
- The payload itself, which in this case only contains a single
id
field.
This key describes the row in the inventory.customers
table which is out from the connector entitled mysql-server-1
whose id
primary key column has a value of 1001
.
1.1.4.2. Change event value
The change event value contains a schema and a payload section. There are three types of change event values which have an envelope structure. The fields in this structure are explained below and marked on each of the change event value examples.
Item | Field name | Description |
---|---|---|
1 |
|
|
2 |
| A mandatory string that describes the type of operation. values
|
3 |
| An optional field that specifies the state of the row before the event occurred. |
4 |
| An optional field that specifies the state of the row after the event occurred. |
5 |
| A mandatory field that describes the source metadata for the event including:
Note
If the binlog_rows_query_log_events option is enabled and the connector has the |
6 |
| An optional field that displays the time at which the connector processed the event. Note The time is based on the system clock in the JVM running the Kafka Connect task. |
Let us look at an example table and then how the schema and payload would appear for the table.
example table
CREATE TABLE customers ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, first_name VARCHAR(255) NOT NULL, last_name VARCHAR(255) NOT NULL, email VARCHAR(255) NOT NULL UNIQUE KEY ) AUTO_INCREMENT=1001;
1.1.4.2.1. Create change event value
This example shows a create event for the customers
table:
{ "schema": { 1 "type": "struct", "fields": [ { "type": "struct", "fields": [ { "type": "int32", "optional": false, "field": "id" }, { "type": "string", "optional": false, "field": "first_name" }, { "type": "string", "optional": false, "field": "last_name" }, { "type": "string", "optional": false, "field": "email" } ], "optional": true, "name": "mysql-server-1.inventory.customers.Value", "field": "before" }, { "type": "struct", "fields": [ { "type": "int32", "optional": false, "field": "id" }, { "type": "string", "optional": false, "field": "first_name" }, { "type": "string", "optional": false, "field": "last_name" }, { "type": "string", "optional": false, "field": "email" } ], "optional": true, "name": "mysql-server-1.inventory.customers.Value", "field": "after" }, { "type": "struct", "fields": [ { "type": "string", "optional": false, "field": "version" }, { "type": "string", "optional": false, "field": "connector" }, { "type": "string", "optional": false, "field": "name" }, { "type": "int64", "optional": false, "field": "ts_ms" }, { "type": "boolean", "optional": true, "default": false, "field": "snapshot" }, { "type": "string", "optional": false, "field": "db" }, { "type": "string", "optional": true, "field": "table" }, { "type": "int64", "optional": false, "field": "server_id" }, { "type": "string", "optional": true, "field": "gtid" }, { "type": "string", "optional": false, "field": "file" }, { "type": "int64", "optional": false, "field": "pos" }, { "type": "int32", "optional": false, "field": "row" }, { "type": "int64", "optional": true, "field": "thread" }, { "type": "string", "optional": true, "field": "query" } ], "optional": false, "name": "io.product.connector.mysql.Source", "field": "source" }, { "type": "string", "optional": false, "field": "op" }, { "type": "int64", "optional": true, "field": "ts_ms" } ], "optional": false, "name": "mysql-server-1.inventory.customers.Envelope" }, "payload": { 2 "op": "c", "ts_ms": 1465491411815, "before": null, "after": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { "version": "1.0.3.Final", "connector": "mysql", "name": "mysql-server-1", "ts_ms": 0, "snapshot": false, "db": "inventory", "table": "customers", "server_id": 0, "gtid": null, "file": "mysql-bin.000003", "pos": 154, "row": 0, "thread": 7, "query": "INSERT INTO customers (first_name, last_name, email) VALUES ('Anne', 'Kretchmar', 'annek@noanswer.org')" } } }
- 1
- The
schema
portion of this event’s value shows the schema for the envelope, the schema for the source structure (which is specific to the MySQL connector and reused across all events), and the table-specific schemas for thebefore
andafter
fields. - 2
- The
payload
portion of this event’s value shows the information in the event, namely that it is describing that the row was created (becauseop=c
), and that theafter
field value contains the values of the new inserted row’sid
,first_name
,last_name
, andemail
columns.
1.1.4.2.2. Update change event value
The value of an update change event on the customers
table has the exact same schema as a create event. The payload is structured the same, but holds different values. Here is an example (formatted for readability):
{ "schema": { ... }, "payload": { "before": { 1 "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "after": { 2 "id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "source": { 3 "version": "1.0.3.Final", "name": "mysql-server-1", "connector": "mysql", "name": "mysql-server-1", "ts_ms": 1465581, "snapshot": false, "db": "inventory", "table": "customers", "server_id": 223344, "gtid": null, "file": "mysql-bin.000003", "pos": 484, "row": 0, "thread": 7, "query": "UPDATE customers SET first_name='Anne Marie' WHERE id=1004" }, "op": "u", 4 "ts_ms": 1465581029523 5 } }
Comparing this to the value in the insert event, you can see a couple of differences in the payload
section:
- 1
- The
before
field now has the state of the row with the values before the database commit. - 2
- The
after
field now has the updated state of the row, and thefirst_name
value is nowAnne Marie
. You can compare thebefore
andafter
structures to determine what actually changed in this row because of the commit. - 3
- The
source
field structure has the same fields as before, but the values are different (this event is from a different position in the binlog). Thesource
structure shows information about MySQL’s record of this change (providing traceability). It also has information you can use to compare to other events in this and other topics to know whether this event occurred before, after, or as part of the same MySQL commit as other events. - 4
- The
op
field value is nowu
, signifying that this row changed because of an update. - 5
- The
ts_ms
field shows the timestamp when Debezium processed this event.
When the columns for a row’s primary or unique key are updated, the value of the row’s key is changed and Debezium outputs three events: a DELETE event and tombstone event with the old key for the row, followed by an INSERT event with the new key for the row.
1.1.4.2.3. Delete change event value
The value of a delete change event on the customers
table has the exact same schema as create and update events. The payload is structured the same, but holds different values. Here is an example (formatted for readability):
{ "schema": { ... }, "payload": { "before": { 1 "id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar", "email": "annek@noanswer.org" }, "after": null, 2 "source": { 3 "version": "1.0.3.Final", "connector": "mysql", "name": "mysql-server-1", "ts_ms": 1465581, "snapshot": false, "db": "inventory", "table": "customers", "server_id": 223344, "gtid": null, "file": "mysql-bin.000003", "pos": 805, "row": 0, "thread": 7, "query": "DELETE FROM customers WHERE id=1004" }, "op": "d", 4 "ts_ms": 1465581902461 5 } }
Comparing the payload
portion to the payloads in the create and update events, you can see some differences:
- 1
- The
before
field now has the state of the row that was deleted with the database commit. - 2
- The
after
field isnull
, signifying that the row no longer exists. - 3
- The
source
field structure has many of the same values as before, except thets_sec
andpos
fields have changed (and the file might have changed in other scenarios). - 4
- The
op
field value is nowd
, signifying that this row was deleted. - 5
- The
ts_ms
shows the timestamp when Debezium processed this event.
This event provides a consumer with the information that it needs to process the removal of this row. The old values are included because some consumers might require them in order to properly handle the removal.
The MySQL connector’s events are designed to work with Kafka log compaction, which allows for the removal of some older messages as long as at least the most recent message for every key is kept. This allows Kafka to reclaim storage space while ensuring the topic contains a complete data set and can be used for reloading key-based state.
When a row is deleted, the delete event value listed above still works with log compaction, because Kafka can still remove all earlier messages with that same key. If the message value is null
, Kafka knows that it can remove all messages with that same key. To make this possible, Debezium’s MySQL connector always follows a delete event with a special tombstone event that has the same key but a null
value.
1.1.5. How the MySQL connector maps data types
The Debezium MySQL connector represents changes to rows with events that are structured like the table in which the row exists. The event contains a field for each column value. The MySQL data type of that column dictates how the value is represented in the event.
Columns that store strings are defined in MySQL with a character set and collation. The MySQL connector uses the column’s character set when reading the binary representation of the column values in the binlog events. The following table shows how the connector maps the MySQL data types to both literal and semantic types.
- literal type : how the value is represented using Kafka Connect schema types
- semantic type : how the Kafka Connect schema captures the meaning of the field (schema name)
MySQL type | Literal type | Semantic type |
---|---|---|
|
| n/a |
|
| n/a |
|
| io.debezium.data.Bits Note
The example (where n is bits)
numBytes = n/8 + (n%8== 0 ? 0 : 1) |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| n/a |
|
| io.debezium.data.Json Note
Contains the string representation of a |
|
| io.debezium.data.Enum Note
The |
|
| io.debezium.data.EnumSet Note
The |
|
| io.debezium.time.Year |
|
| io.debezium.time.ZonedTimestamp Note
In ISO 8601 format with microsecond precision. MySQL allows |
1.1.5.1. Temporal values
Excluding the TIMESTAMP
data type, MySQL temporal types depend on the value of the time.precision.mode
configuration property.
See MySQL connector configuration properties for more details.
Temporal values without timezones are converted from UTC to milliseconds or microseconds (DATETIME
) or to the configured database timezone (TIMESTAMP
).
-
DATETIME
with a value of2018-06-20 06:37:03
becomes1529476623000
. -
TIMESTAMP
with a value of2018-06-20 06:37:03
becomes2018-06-20T13:37:03Z
.
MySQL allows zero-values for DATE
, DATETIME
, and TIMESTAMP
columns, which are sometimes preferred over null values. However, the MySQL connector represents them as null values when the column definition allows nulls, or as the epoch day when the column does not allow nulls.
- time.precision.mode=adaptive_time_microseconds(default)
The MySQL connector determins the literal type and semantic type based on the column’s data type definition so that events represent exactly the values in the database; all time fields are in microseconds.
MySQL type Literal type Semantic type DATE
INT32
io.debezium.time.Date
NoteRepresents the number of days since epoch.
TIME[(M)]
INT64
io.debezium.time.MicroTime
NoteRepresents the time value in microseconds and does not include timezone information. MySQL allows
M
to be in the range of0-6
.DATETIME, DATETIME(0), DATETIME(1), DATETIME(2), DATETIME(3)
INT64
io.debezium.time.Timestamp
NoteRepresents the number of milliseconds past epoch and does not include timezone information.
DATETIME(4), DATETIME(5), DATETIME(6)
INT64
io.debezium.time.MicroTimestamp
NoteRepresents the number of microseconds past epoch and does not include timezone information.
- time.precision.mode=connect
The MySQL connector uses the predefined Kafka Connect logical types. This approach is less precise than the default approach and the events could be less precise if the database column has a fractional second precision value of greater than
3
.MySQL type Literal type Semantic type DATE
INT32
org.apache.kafka.connect.data.Date
NoteRepresents the number of days since epoch.
TIME[(M)]
INT64
org.apache.kafka.connect.data.Time
NoteRepresents the time value in microseconds since midnight and does not include timezone information.
DATETIME[(M)]
INT64
org.apache.kafka.connect.data.Timestamp
NoteRepresents the number of milliseconds since epoch, and does not include timezone information.
1.1.5.2. Decimal values
Decimals are handled via the decimal.handling.mode
property.
See MySQL connector configuration properties for more details.
- decimal.handling.mode=precise
MySQL type Literal type Semantic type NUMERIC[(M[,D])]
BYTES
org.apache.kafka.connect.data.Decimal
NoteThe
scale
schema parameter contains an integer that represents how many digits the decimal point shifted.DECIMAL[(M[,D])]
BYTES
org.apache.kafka.connect.data.Decimal
NoteThe
scale
schema parameter contains an integer that represents how many digits the decimal point shifted.- decimal.handling.mode=double
MySQL type Literal type Semantic type NUMERIC[(M[,D])]
FLOAT64
n/a
DECIMAL[(M[,D])]
FLOAT64
n/a
- decimal.handling.mode=string
MySQL type Literal type Semantic type NUMERIC[(M[,D])]
STRING
n/a
DECIMAL[(M[,D])]
STRING
n/a
1.1.5.3. Spatial data types
Currently, the Debezium MySQL connector supports the following spatial data types:
MySQL type | Literal type | Semantic type |
---|---|---|
|
| io.debezium.data.geometry.Geometry Note Contains a structure with two fields:
|
1.1.6. The MySQL connector and Kafka topics
The Debezium MySQL connector writes events for all INSERT
, UPDATE
, and DELETE
operations from a single table to a single Kafka topic. The Kafka topic naming convention is as follows:
format
serverName.databaseName.tableName
Example 1.1. example
Let us say that fulfillment
is the server name and inventory
is the database which contains three tables of orders
, customers
, and products
. The Debezium MySQL connector produces events on three Kafka topics, one for each table in the database:
fulfillment.inventory.orders fulfillment.inventory.customers fulfillment.inventory.products
1.1.7. MySQL supported topologies
The Debezium MySQL connector supports the following MySQL topologies:
- Standalone
- When a single MySQL server is used, the server must have the binlog enabled (and optionally GTIDs enabled) so the Debezium MySQL connector can monitor the server. This is often acceptable, since the binary log can also be used as an incremental backup. In this case, the MySQL connector always connects to and follows this standalone MySQL server instance.
- Master and slave
The Debezium MySQL connector can follow one of the masters or one of the slaves (if that slave has its binlog enabled), but the connector only sees changes in the cluster that are visible to that server. Generally, this is not a problem except for the multi-master topologies.
The connector records its position in the server’s binlog, which is different on each server in the cluster. Therefore, the connector will need to follow just one MySQL server instance. If that server fails, it must be restarted or recovered before the connector can continue.
- High available clusters
- A variety of high availability solutions exist for MySQL, and they make it far easier to tolerate and almost immediately recover from problems and failures. Most HA MySQL clusters use GTIDs so that slaves are able to keep track of all changes on any of the master.
- Multi-master
A multi-master MySQL topology uses one or more MySQL slaves that each replicate from multiple masters. This is a powerful way to aggregate the replication of multiple MySQL clusters, and requires using GTIDs.
The Debezium MySQL connector can use these multi-master MySQL slaves as sources, and can fail over to different multi-master MySQL slaves as long as thew new slave is caught up to the old slave (e.g., the new slave has all of the transactions that were last seen on the first slave). This works even if the connector is only using a subset of databases and/or tables, as the connector can be configured to include or exclude specific GTID sources when attempting to reconnect to a new multi-master MySQL slave and find the correct position in the binlog.
- Hosted
There is support for the Debezium MySQL connector to use hosted options such as Amazon RDS and Amazon Aurora.
ImportantBecause these hosted options do not allow a global read lock, table-level locks are used to create the consistent snapshot.
1.2. Setting up MySQL server
1.2.1. Creating a MySQL user for Debezium
You have to define a MySQL user with appropriate permissions on all databases that the Debezium MySQL connector monitors.
Prerequisites
- You must have a MySQL server.
- You must know basic SQL commands.
Procedure
- Create the MySQL user:
mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';
- Grant the required permissions to the user:
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password';
See permissions explained for notes on each permission.
If using a hosted option such as Amazon RDS or Amazon Aurora that do not allow a global read lock, table-level locks are used to create the consistent snapshot. In this case, you need to also grant LOCK_TABLES
permissions to the user that you create. See Overview of how the MySQL connector works for more details.
- Finalize the user’s permissions:
mysql> FLUSH PRIVILEGES;
1.2.1.1. Permissions explained
Permission/item | Description |
---|---|
| enables the connector to select rows from tables in databases Note This is only used when performing a snapshot. |
|
enables the connector the use of the Note This is only used when performing a snapshot. |
|
enables the connector to see database names by issuing the Note This is only used when performing a snapshot. |
| enables the connector to connect to and read the MySQL server binlog. |
| enables the connector the use of following statements:
Important This is always required for the connector. |
| Identifies the database to which the permission apply. |
| Specifies the user to which the permissions are granted. |
| Specifies the password for the user. |
1.2.2. Enabling the MySQL binlog for Debezium
You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes.
Prerequisites
- You must have a MySQL server.
- You should have appropriate MySQL user privileges.
Procedure
-
Check if the
log-bin
option is already on or not.
mysql> SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::" FROM information_schema.global_variables WHERE variable_name='log_bin';
-
If
OFF
, configure your MySQL server configuration file with the following:
See Binlog config properties for notes on each property.
server-id = 223344 1 log_bin = mysql-bin 2 binlog_format = ROW 3 binlog_row_image = FULL 4 expire_logs_days = 10 5
- Confirm your changes by checking the binlog status once more.
mysql> SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::" FROM information_schema.global_variables WHERE variable_name='log_bin';
1.2.2.1. Binlog configuration properties
Number | Property | Description |
---|---|---|
1 |
|
The value for the |
2 |
|
The value of |
3 |
|
The |
4 |
|
The |
5 |
|
This is the number of days for automatic binlog file removal. The default is Tip Set the value to match the needs of your environment. |
1.2.3. Enabling MySQL Global Transaction Identifiers for Debezium
Global transaction identifiers (GTIDs) uniquely identify transactions that occur on a server within a cluster. Though not required for the Debezium MySQL connector, using GTIDs simplifies replication and allows you to more easily confirm if master and slave servers are consistent.
GTIDs are only available from MySQL 5.6.5 and later. See the MySQL documentation for more details.
Prerequisites
- You must have a MySQL server.
- You must know basic SQL commands.
- You must have access to the MySQL configuration file.
Procedure
-
Enable
gtid_mode
:
mysql> gtid_mode=ON
-
Enable
enforce_gtid_consistency
:
mysql> enforce_gtid_consistency=ON
- Confirm the changes:
mysql> show global variables like '%GTID%';
response
+--------------------------+-------+ | Variable_name | Value | +--------------------------+-------+ | enforce_gtid_consistency | ON | | gtid_mode | ON | +--------------------------+-------+
1.2.3.1. Options explained
Permission/item | Description |
---|---|
| Boolean which specifies whether GTID mode of the MySQL server is enabled or not.
|
| Boolean which instructs the server whether or not to enforce GTID consistency by allowing the execution of statements that can be logged in a transactionally safe manner; required when using GTIDs.
|
1.2.4. Setting up session timeouts for Debezium
When an initial consistent snapshot is made for large databases, your established connection could timeout while the tables are being read. You can prevent this behavior by configuring interactive_timeout
and wait_timeout
in your MySQL configuration file.
Prerequisites
- You must have a MySQL server.
- You must know basic SQL commands.
- You must have access to the MySQL configuration file.
Procedure
-
Configure
interactive_timeout
:
mysql> interactive_timeout=<duration-in-seconds>
-
Configure
wait_timeout
:
mysql> wait_timeout= <duration-in-seconds>
1.2.4.1. Options explained
Permission/item | Description |
---|---|
| The number of seconds the server waits for activity on an interactive connection before closing it. Tip See MySQL’s documentation for more details. |
| The number of seconds the server waits for activity on a noninteractive connection before closing it. Tip See MySQL’s documentation for more details. |
1.2.5. Enabling query log events for Debezium
You might want to see the original SQL
statement for each binlog event. Enabling the binlog_rows_query_log_events
option in the MySQL configuration file allows you to do this.
This option is only available from MySQL 5.6 and later.
Prerequisites
- You must have a MySQL server.
- You must know basic SQL commands.
- You must have access to the MySQL configuration file.
Procedure
-
Enable
binlog_rows_query_log_events
:
mysql> binlog_rows_query_log_events=ON
1.2.5.1. Options explained
Permission/item | Description |
---|---|
|
Boolean which enables/disables support for including the original
|
1.3. Deploying the MySQL connector
1.3.1. Installing the MySQL connector
Installing the Debezium MySQL connector is a simple process whereby you only need to download the JAR, extract it to your Kafka Connect environment, and ensure the plugin’s parent directory is specified in your Kafka Connect environment.
Prerequisites
- You have Zookeeper, Kafka, and Kafka Connect installed.
- You have MySQL Server installed and setup.
Procedure
- Download the Debezium MySQL connector.
- Extract the files into your Kafka Connect environment.
- Add the plugin’s parent directory to your Kafka Connect plugin path:
plugin.path=/kafka/connect
The above example assumes you have extracted the Debezium MySQL connector to the /kafka/connect/Debezium-connector-mysql
path.
- Restart your Kafka Connect process. This ensures the new JARs are picked up.
1.3.2. Configuring the MySQL connector
Typically, you configure the Debezium MySQL connector in a .yaml
file using the configuration properties available for the connector.
Prerequisites
- You should have completed the installation process for the connector.
Procedure
-
Set the
"name"
of the connector in the.yaml
file. - Set the configuration properties that you require for your Debezium MySQL connector.
For a complete list of configuration properties, see MySQL connector configuration properties.
MySQL connector example configuration
apiVersion: kafka.strimzi.io/v1beta1 kind: KafkaConnector metadata: name: inventory-connector 1 labels: strimzi.io/cluster: my-connect-cluster spec: class: io.debezium.connector.mysql.MySqlConnector tasksMax: 1 2 config: 3 database.hostname: mysql 4 database.port: 3306 database.user: debezium database.password: dbz database.server.id: 184054 5 database.server.name: dbserver1 6 database.whitelist: inventory 7 database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092 8 database.history.kafka.topic: schema-changes.inventory 9
- 1 1
- The name of the connector.
- 2 2
- Only one task should operate at any one time. Because the MySQL connector reads the MySQL server’s
binlog
, using a single connector task ensures proper order and event handling. The Kafka Connect service uses connectors to start one or more tasks that do the work, and it automatically distributes the running tasks across the cluster of Kafka Connect services. If any of the services stop or crash, those tasks will be redistributed to running services. - 3 3
- The connector’s configuration.
- 4 4
- The database host, which is the name of the container running the MySQL server (
mysql
). - 5 5
- A unique server ID and name. The server name is the logical identifier for the MySQL server or cluster of servers.
- 6
- This name will be used as the prefix for all Kafka topics.
- 7
- Only changes in the
inventory
database will be detected. - 8
- The connector will store the history of the database schemas in Kafka using this broker (the same broker to which you are sending events) and topic name.
- 9
- Upon restart, the connector will recover the schemas of the database that existed at the point in time in the
binlog
when the connector should begin reading.
1.3.3. MySQL connector configuration properties
The configuration properties listed here are required to run the Debezium MySQL connector. There are also advanced MySQL connector properties whose default value rarely needs to be changed and therefore, they do not need to be specified in the connector configuration.
The Debezium MySQL connector supports pass-through configuration when creating the Kafka producer and consumer. See the Kafka documentation for more details on pass-through properties.
Property | Default | Description |
---|---|---|
| Unique name for the connector. Attempting to register again with the same name will fail. (This property is required by all Kafka Connect connectors.) | |
|
The name of the Java class for the connector. Always use a value of | |
|
| The maximum number of tasks that should be created for this connector. The MySQL connector always uses a single task and therefore does not use this value, so the default is always acceptable. |
| IP address or hostname of the MySQL database server. | |
|
| Integer port number of the MySQL database server. |
| Name of the MySQL database to use when connecting to the MySQL database server. | |
| Password to use when connecting to the MySQL database server. | |
| Logical name that identifies and provides a namespace for the particular MySQL database server/cluster being monitored. The logical name should be unique across all other connectors, since it is used as a prefix for all Kafka topic names emanating from this connector. Only alphanumeric characters and underscores should be used. | |
| random | A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster. This connector joins the MySQL database cluster as another server (with this unique ID) so it can read the binlog. By default, a random number is generated between 5400 and 6400, though we recommend setting an explicit value. |
| The full name of the Kafka topic where the connector will store the database schema history. | |
| A list of host/port pairs that the connector will use for establishing an initial connection to the Kafka cluster. This connection will be used for retrieving database schema history previously stored by the connector, and for writing each DDL statement read from the source database. This should point to the same Kafka cluster used by the Kafka Connect process. | |
| empty string |
An optional comma-separated list of regular expressions that match database names to be monitored; any database name not included in the whitelist will be excluded from monitoring. By default all databases will be monitored. May not be used with |
| empty string |
An optional comma-separated list of regular expressions that match database names to be excluded from monitoring; any database name not included in the blacklist will be monitored. May not be used with |
| empty string |
An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default the connector will monitor every non-system table in each monitored database. May not be used with |
| empty string |
An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring; any table not included in the blacklist will be monitored. Each identifier is of the form databaseName.tableName. May not be used with |
| empty string | An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName. |
| n/a | An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be truncated in the change event message values if the field values are longer than the specified number of characters. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName. |
| n/a |
An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be replaced in the change event message values with a field value consisting of the specified number of asterisk ( |
| n/a |
An optional comma-separated list of regular expressions that match the fully-qualified names of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters |
|
|
Time, date, and timestamps can be represented with different kinds of precision, including: |
|
|
Specifies how the connector should handle values for |
|
|
Specifies how BIGINT UNSIGNED columns should be represented in change events, including: |
|
|
Boolean value that specifies whether the connector should publish changes in the database schema to a Kafka topic with the same name as the database server ID. Each schema change will be recorded using a key that contains the database name and whose value includes the DDL statement(s). This is independent of how the connector internally records database history. The default is |
|
|
Boolean value that specifies whether the connector should include the original SQL query that generated the change event. |
|
|
Specifies how the connector should react to exceptions during deserialization of binlog events. |
|
|
Specifies how the connector should react to binlog events that relate to tables that are not present in internal schema representation (i.e. internal representation is not consistent with database) |
|
|
Positive integer value that specifies the maximum size of the blocking queue into which change events read from the database log are placed before they are written to Kafka. This queue can provide backpressure to the binlog reader when, for example, writes to Kafka are slower or if Kafka is not available. Events that appear in the queue are not included in the offsets periodically recorded by this connector. Defaults to 8192, and should always be larger than the maximum batch size specified in the |
|
| Positive integer value that specifies the maximum size of each batch of events that should be processed during each iteration of this connector. Defaults to 2048. |
|
| Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear. Defaults to 1000 milliseconds, or 1 second. |
|
| A positive integer value that specifies the maximum time in milliseconds this connector should wait after trying to connect to the MySQL database server before timing out. Defaults to 30 seconds. |
|
A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching one of these include patterns will be used. May not be used with | |
|
A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching none of these exclude patterns will be used. May not be used with | |
|
|
Controls whether a tombstone event should be generated after a delete event. |
| empty string |
A semi-colon list of regular expressions that match fully-qualified tables and columns to map a primary key. |
1.3.3.1. Advanced MySQL connector properties
Property | Default | Description |
---|---|---|
|
| A boolean value that specifies whether a separate thread should be used to ensure the connection to the MySQL server/cluster is kept alive. |
|
| Boolean value that specifies whether built-in system tables should be ignored. This applies regardless of the table whitelist or blacklists. By default system tables are excluded from monitoring, and no events are generated when changes are made to any of the system tables. |
|
| An integer value that specifies the maximum number of milliseconds the connector should wait during startup/recovery while polling for persisted data. The default is 100ms. |
|
|
The maximum number of times that the connector should attempt to read persisted history data before the connector recovery fails with an error. The maximum amount of time to wait after receiving no data is |
|
|
Boolean value that specifies if connector should ignore malformed or unknown database statements or stop processing and let operator to fix the issue. The safe default is |
|
|
Boolean value that specifies if connector should should record all DDL statements or (when |
|
|
Specifies whether to use an encrypted connection. The default is
The
The
The
The |
| 0 |
The size of a look-ahead buffer used by the binlog reader. |
|
|
Specifies the criteria for running a snapshot upon startup of the connector. The default is
|
|
|
Controls if and how long the connector holds onto the global MySQL read lock (preventing any updates to the database) while it is performing a snapshot. There are three possible values
|
|
Controls which rows from tables will be included in snapshot. | |
|
| During a snapshot operation, the connector will query each included table to produce a read event for all rows in that table. This parameter determines whether the MySQL connection will pull all results for a table into memory (which is fast but requires large amounts of memory), or whether the results will instead be streamed (can be slower, but will work for very large tables). The value specifies the minimum number of rows a table must contain before the connector will stream results, and defaults to 1,000. Set this parameter to '0' to skip all table size checks and always stream all results during a snapshot. |
|
|
Controls how frequently the heartbeat messages are sent. |
|
|
Controls the naming of the topic to which heartbeat messages are sent. |
|
A semicolon separated list of SQL statements to be executed when a JDBC connection (not the transaction log reading connection) to the database is established. Use doubled semicolon (';;') to use a semicolon as a character and not as a delimiter. | |
|
An interval in milli-seconds that the connector should wait before taking a snapshot after starting up; | |
| Specifies the maximum number of rows that should be read in one go from each table while taking a snapshot. The connector will read the table contents in multiple batches of this size. | |
|
| Positive integer value that specifies the maximum amount of time (in milliseconds) to wait to obtain table locks when performing a snapshot. If table locks cannot be acquired in this time interval, the snapshot will fail. See How the MySQL connector performs database snapshots. |
|
MySQL allows user to insert year value as either 2-digit or 4-digit. In case of two digits the value is automatically mapped to 1970 - 2069 range. This is usually done by database. | |
|
| Whether field names will be sanitized to adhere to Avro naming requirements. |
1.3.4. MySQL connector monitoring metrics
The Debezium MySQL connector has three metric types in addition to the built-in support for JMX metrics that Zookeeper, Kafka, and Kafka Connect have.
- snapshot metrics; for monitoring the connector when performing snapshots
- binlog metrics; for monitoring the connector when reading CDC table data
- schema history metrics; for monitoring the status of the connector’s schema history
1.3.4.1. Snapshot metrics
The MBean is debezium.mysql:type=connector-metrics,context=snapshot,server=<database.server.name>
.
Attribute | Type | Description |
---|---|---|
|
| The total number of tables that are being included in the snapshot. |
|
| The number of tables that the snapshot has yet to copy. |
|
| Whether the connector currently holds a global or table write lock. |
|
| Whether the snapshot was started. |
|
| Whether the snapshot was aborted. |
|
| Whether the snapshot completed. |
|
| The total number of seconds that the snapshot has taken so far, even if not complete. |
|
| Map containing the number of rows scanned for each table in the snapshot. Tables are incrementally added to the Map during processing. Updates every 10,000 rows scanned and upon completing a table. |
|
| The last snapshot event that the connector has read. |
|
| The number of milliseconds since the connector has read and processed the most recent event. |
|
| The total number of events that this connector has seen since last started or reset. |
|
| The number of events that have been filtered by whitelist or blacklist filtering rules configured on the connector. |
|
| The list of tables that are monitored by the connector. |
|
| The length of the queue used to pass events between snapshot reader and the main Kafka Connect loop. |
|
| The free capacity of the queue used to pass events between snapshot reader and the main Kafka Connect loop. |
1.3.4.2. Binlog metrics
The MBean is debezium.mysql:type=connector-metrics,context=binlog,server=<database.server.name>
.
The transaction-related attributes are only available if binlog event buffering is enabled. See binlog.buffer.size in the advanced connector configuration properties for more details.
Attribute | Type | Description |
---|---|---|
|
| Flag that denotes whether the connector is currently connected to the MySQL server. |
|
| The name of the binlog filename that the connector has most recently read. |
|
| The most recent position (in bytes) within the binlog that the connector has read. |
|
| Flag that denotes whether the connector is currently tracking GTIDs from MySQL server. |
|
| The string representation of the most recent GTID set seen by the connector when reading the binlog. |
|
| The last binlog event that the connector has read. |
|
| The number of milliseconds between the last event’s MySQL timestamp and the connector processing it. The values will incorporate any differences between the clocks on the machines where the MySQL server and the MySQL connector are running. |
|
| The total number of events that this connector has seen since last started or reset. |
|
| The number of events that have been skipped by the MySQL connector. Typically events are skipped due to a malformed or unparseable event from MySQL’s binlog. |
|
| The number of events that have been filtered by whitelist or blacklist filtering rules configured on the connector. |
|
| The number of disconnects by the MySQL connector. |
|
| The coordinates of the last received event. |
|
| Transaction identifier of the last processed transaction. |
|
| The last binlog event that the connector has read. |
|
| The number of milliseconds since the connector has read and processed the most recent event. |
|
| The list of tables that are monitored by Debezium. |
|
| The length of the queue used to pass events between binlog reader and the main Kafka Connect loop. |
|
| The free capacity of the queue used to pass events between binlog reader and the main Kafka Connect loop. |
|
| The number of processed transactions that were committed. |
|
| The number of processed transactions that were rolled back and not streamed. |
|
|
The number of transactions that have not conformed to expected protocol |
|
|
The number of transactions that have not fitted into the look-ahead buffer. Should be significantly smaller than |
1.3.4.3. Schema history metrics
The MBean is debezium.mysql:type=connector-metrics,context=schema-history,server=<database.server.name>
.
Attribute | Type | Description |
---|---|---|
|
|
One of |
|
| The time in epoch seconds at what recovery has started. |
|
| The number of changes that were read during recovery phase. |
|
| The total number of schema changes applie during recovery and runtime. |
|
| The number of milliseconds that elapsed since the last change was recovered from the history store. |
|
| The number of milliseconds that elapsed since the last change was applied. |
|
| The string representation of the last change recovered from the history store. |
|
| The string representation of the last applied change. |
1.4. MySQL connector common issues
1.4.1. Configuration and startup errors
The Debezium MySQL connector fails, reports an error, and stops running when the following startup errors occur:
- The connector’s configuration is invalid.
- The connector cannot connect to the MySQL server using the specified connectivity parameters.
- The connector is attempting to restart at a position in the binlog where MySQL no longer has the history available.
If you receive any of these errors, you receive more details in the error message. The error message also contains workarounds where possible.
1.4.3. Kafka Connect stops
There are three scenarios that cause some issues when Kafka Connect stops:
1.4.3.1. Kafka Connect stops gracefully
When Kafka Connect stops gracefully, there is only a short delay while the Debezium MySQL connector tasks are stopped and restarted on new Kafka Connect processes.
1.4.3.2. Kafka Connect process crashes
If Kafka Connect crashes, the process stops and any Debezium MySQL connector tasks terminate without their most recently-processed offsets being recorded. In distributed mode, Kafka Connect restarts the connector tasks on other processes. However, the MySQL connector resumes from the last offset recorded by the earlier processes. This means that the replacement tasks may generate some of the same events processed prior to the crash, creating duplicate events.
Each change event message includes source-specific information about:
- the event origin
- the MySQL server’s event time
- the binlog filename and position
- GTIDs (if used)
1.4.4. MySQL purges binlog files
If the Debezium MySQL connector stops for too long, the MySQL server purges older binlog files and the connector’s last position may be lost. When the connector is restarted, the MySQL server no longer has the starting point and the connector performs another initial snapshot. If the snapshot is disabled, the connector fails with an error.
See How the MySQL connector performs database snapshots for more information on initial snapshots.