Red Hat Integration
2020-Q3
Debezium User Guide

Chapter 2. Debezium connector for MySQL

MySQL has a binary log (binlog) that records all operations in the order in which they are committed to the database. This includes changes to table schemas and the data within tables. MySQL uses the binlog for replication and recovery.

The MySQL connector reads the binlog and produces change events for row-level INSERT, UPDATE, and DELETE operations and records the change events in a Kafka topic. Client applications read those Kafka topics.

Because MySQL is typically set up to purge binlogs after a specified period of time, the MySQL connector performs an initial consistent snapshot of each of your databases. The MySQL connector then reads the binlog from the point at which the snapshot was made.

2.1. Overview of how the MySQL connector works

The Debezium MySQL connector tracks the structure of the tables, performs snapshots, transforms binlog events into Debezium change events and records where those events are recorded in Kafka.

2.1.1. How the MySQL connector uses database schemas

When a database client queries a database, the client uses the database’s current schema. However, the database schema can be changed at any time, which means that the connector must be able to identify what the schema was at the time each insert, update, or delete operation was recorded. Also, a connector cannot just use the current schema because the connector might be processing events that are relatively old and may have been recorded before the tables' schemas were changed.

To handle this, MySQL includes in the binlog the row-level changes to the data and the DDL statements that are applied to the database. As the connector reads the binlog and comes across these DDL statements, it parses them and updates an in-memory representation of each table’s schema. The connector uses this schema representation to identify the structure of the tables at the time of each insert, update, or delete and to produce the appropriate change event. In a separate database history Kafka topic, the connector also records all DDL statements along with the position in the binlog where each DDL statement appeared.

When the connector restarts after having crashed or been stopped gracefully, the connector starts reading the binlog from a specific position, that is, from a specific point in time. The connector rebuilds the table structures that existed at this point in time by reading the database history Kafka topic and parsing all DDL statements up to the point in the binlog where the connector is starting.
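The restart behavior can be pictured with a minimal sketch. It assumes a simplified, hypothetical history record of (binlog file, binlog position, DDL statement); the real database history topic uses its own record format, and the connector parses each DDL statement rather than just selecting it.

```python
from typing import List, NamedTuple


class HistoryRecord(NamedTuple):
    """One entry of a hypothetical, simplified database history log."""
    binlog_file: str
    binlog_pos: int
    ddl: str


def ddl_to_replay(history: List[HistoryRecord], start_file: str, start_pos: int) -> List[str]:
    """Return the DDL statements recorded at or before the restart position.

    Assumes binlog file names share a zero-padded numeric suffix
    (mysql-bin.000003), so (file, pos) tuples compare in binlog order.
    """
    start = (start_file, start_pos)
    return [r.ddl for r in history if (r.binlog_file, r.binlog_pos) <= start]


history = [
    HistoryRecord("mysql-bin.000002", 120, "CREATE TABLE customers (id INT PRIMARY KEY)"),
    HistoryRecord("mysql-bin.000003", 154, "ALTER TABLE customers ADD email VARCHAR(255)"),
    HistoryRecord("mysql-bin.000003", 900, "ALTER TABLE customers DROP COLUMN email"),
]

# Restarting at mysql-bin.000003, position 484: the DROP COLUMN at 900 is not replayed yet.
replayed = ddl_to_replay(history, "mysql-bin.000003", 484)
```

Replaying only the statements up to the restart position gives the table structure as it was when the events at that position were written, which is what the connector needs to decode them.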

This database history topic is for connector use only. The connector can optionally generate schema change events on a different topic that is intended for consumer applications. This is described in how the MySQL connector exposes schema changes.

When the MySQL connector captures changes in a table to which a schema change tool such as gh-ost or pt-online-schema-change is applied, the helper tables that the tool creates during the migration must be included among the whitelisted tables.

If downstream systems do not need the messages generated for these temporary helper tables, you can write and apply a simple message transform to filter them out.
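As a rough illustration of the filtering logic only, the following sketch drops topics that belong to helper tables. It is not a Kafka Connect transform. The suffixes are assumptions based on the usual naming of these tools (`_gho`, `_ghc`, `_del` for gh-ost; `_new`, `_old` for pt-online-schema-change) and should be checked against the tool version in use. The topic format serverName.databaseName.tableName is also assumed.

```python
import re

# Assumed helper-table naming: a leading underscore, the original table name,
# and one of the tool-specific suffixes. Verify against your tool's behavior.
HELPER_TABLE = re.compile(r"^_.+_(gho|ghc|del|new|old)$")


def is_helper_table_topic(topic: str) -> bool:
    """True if the table segment of serverName.databaseName.tableName looks like a helper table."""
    table = topic.rsplit(".", 1)[-1]
    return HELPER_TABLE.match(table) is not None


topics = [
    "mysql-server-1.inventory.customers",
    "mysql-server-1.inventory._customers_gho",
    "mysql-server-1.inventory._customers_ghc",
]

# Keep only topics for regular tables.
kept = [t for t in topics if not is_helper_table_topic(t)]
```

A pattern like this can also match a regular table that happens to follow the same naming scheme, so treat it as a starting point.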

For information about topic naming conventions, see MySQL connector and Kafka topics.

2.1.2. How the MySQL connector performs database snapshots

When your Debezium MySQL connector is first started, it performs an initial consistent snapshot of your database. The following flow describes how this snapshot is completed.

Note

This is the default snapshot mode, which is set as initial in the snapshot.mode property. For other snapshot modes, see the MySQL connector configuration properties.

The connector…

Step	Action
`1`	Grabs a global read lock that blocks writes by other database clients. Note: The snapshot itself does not prevent other clients from applying DDL, which might interfere with the connector’s attempt to read the binlog position and table schemas. The connector therefore holds the global read lock while it reads the binlog position and table schemas, and releases the lock in a later step.
`2`	Starts a transaction with repeatable read semantics to ensure that all subsequent reads within the transaction are done against the consistent snapshot.
`3`	Reads the current binlog position.
`4`	Reads the schema of the databases and tables allowed by the connector’s configuration.
`5`	Releases the global read lock. This now allows other database clients to write to the database.
`6`	If applicable, writes the DDL changes to the schema change topic, including all necessary `DROP…` and `CREATE…` DDL statements.
`7`	Scans the database tables and generates `CREATE` events on the relevant table-specific Kafka topics for each row.
`8`	Commits the transaction.
`9`	Records the completed snapshot in the connector offsets.
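The database-side steps in the table above can be sketched as an ordered list of MySQL statements. The SQL text is an assumption chosen for readability and is not taken from the connector source. Steps 6 and 9 happen on the Kafka side and are omitted.

```python
# Illustrative mapping of snapshot steps to MySQL statements (assumed SQL text).
# <table> is a placeholder; those statements run once per captured table.
SNAPSHOT_STEPS = [
    (1, "FLUSH TABLES WITH READ LOCK"),                      # take the global read lock
    (2, "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"),  # applies to the next transaction
    (2, "START TRANSACTION WITH CONSISTENT SNAPSHOT"),
    (3, "SHOW MASTER STATUS"),                               # current binlog file and position
    (4, "SHOW CREATE TABLE <table>"),                        # read table schemas
    (5, "UNLOCK TABLES"),                                    # release the global read lock
    (7, "SELECT * FROM <table>"),                            # scan rows inside the transaction
    (8, "COMMIT"),
]


def steps_under_global_lock(steps):
    """Return the step numbers that run while the global read lock is held."""
    locked, held = [], False
    for number, statement in steps:
        if statement == "UNLOCK TABLES":
            held = False
        if held and number not in locked:
            locked.append(number)
        if statement == "FLUSH TABLES WITH READ LOCK":
            held = True
    return locked


locked_steps = steps_under_global_lock(SNAPSHOT_STEPS)
```

The point of the ordering is that writes are blocked only while the binlog position and schemas are read; the row scan, which usually takes longest, runs after the lock is released and relies on the repeatable read transaction for consistency.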

2.1.2.1. What happens if the connector fails?

If the connector fails, stops, or is rebalanced while making the initial snapshot, the connector creates a new snapshot after it restarts. After that initial snapshot is completed, the Debezium MySQL connector restarts from the same position in the binlog so that it does not miss any updates.

Note

If the connector stops for long enough, MySQL could purge old binlog files and the connector’s position would be lost. If the position is lost, the connector reverts to the initial snapshot for its starting position. For more tips on troubleshooting the Debezium MySQL connector, see MySQL connector common issues.

2.1.2.2. What if Global Read Locks are not allowed?

Some environments do not allow a global read lock. If the Debezium MySQL connector detects that global read locks are not permitted, the connector uses table-level locks instead and performs a snapshot with this method.

Important

The database user that the connector uses must have the LOCK TABLES privilege.

The connector…

Step	Action
`1`	Starts a transaction with repeatable read semantics to ensure that all subsequent reads within the transaction are done against the consistent snapshot.
`2`	Reads and filters the names of the databases and tables.
`3`	Reads the current binlog position.
`4`	Reads the schema of the databases and tables allowed by the connector’s configuration.
`5`	If applicable, writes the DDL changes to the schema change topic, including all necessary `DROP…` and `CREATE…` DDL statements.
`6`	Scans the database tables and generates `CREATE` events on the relevant table-specific Kafka topics for each row.
`7`	Commits the transaction.
`8`	Releases the table-level locks.
`9`	Records the completed snapshot in the connector offsets.

2.1.3. How the MySQL connector exposes schema changes

You can configure the Debezium MySQL connector to produce schema change events that include all DDL statements applied to databases in the MySQL server. The connector writes all of these events to a Kafka topic named <serverName> where serverName is the name of the connector as specified in the database.server.name configuration property.

Important

If you choose to use schema change events, use the schema change topic and do not consume the database history topic.

Note

It is vital that there is a global order of the events in the database schema history. Therefore, the database history topic must not be partitioned. This means that a partition count of 1 must be specified when creating this topic. When relying on auto topic creation, make sure that Kafka’s num.partitions configuration option (the default number of partitions) is set to 1.

2.1.3.1. Schema change topic structure

Each message that is written to the schema change topic contains a message key which includes the name of the connected database used when applying DDL statements:

{
  "schema": {
    "type": "struct",
    "name": "io.debezium.connector.mysql.SchemaChangeKey",
    "optional": false,
    "fields": [
      {
        "field": "databaseName",
        "type": "string",
        "optional": false
      }
    ]
  },
  "payload": {
    "databaseName": "inventory"
  }
}

The schema change event message value contains a structure that includes the DDL statements, the database to which the statements were applied, and the position in the binlog where the statements appeared:

{
  "schema": {
    "type": "struct",
    "name": "io.debezium.connector.mysql.SchemaChangeValue",
    "optional": false,
    "fields": [
      {
        "field": "databaseName",
        "type": "string",
        "optional": false
      },
      {
        "field": "ddl",
        "type": "string",
        "optional": false
      },
      {
        "field": "source",
        "type": "struct",
        "name": "io.debezium.connector.mysql.Source",
        "optional": false,
        "fields": [
          {
            "type": "string",
            "optional": true,
            "field": "version"
          },
          {
            "type": "string",
            "optional": false,
            "field": "name"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "server_id"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "ts_ms"
          },
          {
            "type": "string",
            "optional": true,
            "field": "gtid"
          },
          {
            "type": "string",
            "optional": false,
            "field": "file"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "pos"
          },
          {
            "type": "int32",
            "optional": false,
            "field": "row"
          },
          {
            "type": "boolean",
            "optional": true,
            "default": false,
            "field": "snapshot"
          },
          {
            "type": "int64",
            "optional": true,
            "field": "thread"
          },
          {
            "type": "string",
            "optional": true,
            "field": "db"
          },
          {
            "type": "string",
            "optional": true,
            "field": "table"
          },
          {
            "type": "string",
            "optional": true,
            "field": "query"
          }
        ]
      }
    ]
  },
  "payload": {
    "databaseName": "inventory",
    "ddl": "CREATE TABLE products ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL, description VARCHAR(512), weight FLOAT ); ALTER TABLE products AUTO_INCREMENT = 101;",
    "source" : {
      "version": "1.2.4.Final",
      "name": "mysql-server-1",
      "server_id": 0,
      "ts_ms": 0,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 154,
      "row": 0,
      "snapshot": true,
      "thread": null,
      "db": null,
      "table": null,
      "query": null
    }
  }
}

2.1.3.2. Important tips about the schema change topic

The ddl field might contain multiple DDL statements. Each statement applies to the database in the databaseName field, and the statements appear in the same order in which they were applied in the database. The source field has exactly the same structure as in a standard data change event written to table-specific topics. This field is useful for correlating events on different topics.

....
    "payload": {
        "databaseName": "inventory",
        "ddl": "CREATE TABLE products ( id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,...
        "source" : {
            ....
        }
    }
....

What if a client submits DDL statements to multiple databases?

If MySQL applies them atomically, the connector takes the DDL statements in order, groups them by database, and creates a schema change event for each group.
If MySQL applies them individually, the connector creates a separate schema change event for each statement.
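The grouping in the atomic case can be sketched as follows. This is an illustration of the described behavior, not connector code. It assumes that "groups them by database" means grouping consecutive statements for the same database so that the original order is preserved, and it joins grouped statements with "; " purely for display.

```python
from itertools import groupby


def schema_change_events(statements):
    """Group an ordered list of (database, ddl) pairs into one event per consecutive database run."""
    events = []
    for database, run in groupby(statements, key=lambda s: s[0]):
        events.append({
            "databaseName": database,
            "ddl": "; ".join(ddl for _, ddl in run),
        })
    return events


# Hypothetical statements that MySQL applied atomically across two databases.
atomic = [
    ("inventory", "DROP TABLE products"),
    ("inventory", "DROP TABLE orders"),
    ("archive", "DROP TABLE old_orders"),
]

events = schema_change_events(atomic)
```

Here the two inventory statements end up in one event and the archive statement in a second one. If MySQL applied the statements individually, each would arrive as its own one-element list and produce its own event.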

Additional resources

If you do not use the schema change topics detailed here, check out the database history topic.

2.1.4. MySQL connector events

The Debezium MySQL connector generates a data change event for each row-level INSERT, UPDATE, and DELETE operation. Each event contains a key and a value. The structure of the key and the value depends on the table that was changed.

Debezium and Kafka Connect are designed around continuous streams of event messages. However, the structure of these events may change over time, which can be difficult for consumers to handle. To address this, each event contains the schema for its content or, if you are using a schema registry, a schema ID that a consumer can use to obtain the schema from the registry. This makes each event self-contained.

The following skeleton JSON shows the four basic parts of a change event. However, how you configure the Kafka Connect converter that you choose to use in your application determines the representation of these four parts in change events. A schema field is in a change event only when you configure the converter to produce it. Likewise, the event key and event payload are in a change event only if you configure a converter to produce them. If you use the JSON converter and you configure it to produce all four basic change event parts, change events have this structure:

{
  "schema": {
    ...
  },
  "payload": {
    ...
  },
  "schema": {
    ...
  },
  "payload": {
    ...
  }
}

Table 2.1. Overview of change event basic content
Item	Field name	Description
1	`schema`	The first `schema` field is part of the event key. It specifies a Kafka Connect schema that describes what is in the event key’s `payload` portion. In other words, the first `schema` field describes the structure of the primary key, or the unique key if the table does not have a primary key, for the table that was changed. It is possible to override the table’s primary key by setting the `message.key.columns` connector configuration property. In this case, the first schema field describes the structure of the key identified by that property.
2	`payload`	The first `payload` field is part of the event key. It has the structure described by the previous `schema` field and it contains the key for the row that was changed.
3	`schema`	The second `schema` field is part of the event value. It specifies the Kafka Connect schema that describes what is in the event value’s `payload` portion. In other words, the second `schema` describes the structure of the row that was changed. Typically, this schema contains nested schemas.
4	`payload`	The second `payload` field is part of the event value. It has the structure described by the previous `schema` field and it contains the actual data for the row that was changed.

By default, the connector streams change event records to topics with names that are the same as the event’s originating table. See MySQL connector and Kafka topics.

Warning

The MySQL connector ensures that all Kafka Connect schema names adhere to the Avro schema name format. This means that the logical server name must start with a Latin letter or an underscore, that is, a-z, A-Z, or _. Each remaining character in the logical server name and each character in the database and table names must be a Latin letter, a digit, or an underscore, that is, a-z, A-Z, 0-9, or _. If there is an invalid character, it is replaced with an underscore character.

This can lead to unexpected conflicts if the logical server name, a database name, or a table name contains invalid characters, and the only characters that distinguish names from one another are invalid and thus replaced with underscores.
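A minimal sketch of the replacement rule makes the conflict concrete. The function name and its `is_server_name` parameter are hypothetical; the connector's own implementation may differ in detail.

```python
import re


def sanitize(name: str, is_server_name: bool = False) -> str:
    """Replace characters that are not valid in a schema name with underscores.

    Valid characters are a-z, A-Z, 0-9, and _. For a logical server name,
    the first character must additionally not be a digit.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if is_server_name and cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned[1:]
    return cleaned


# Two distinct table names that differ only in an invalid character.
first = sanitize("order-items")
second = sanitize("order.items")

# Both map to the same adjusted name, so their schema names collide.
collision = first == second
```

Because `order-items` and `order.items` both become `order_items`, their schemas can no longer be told apart by name.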

2.1.4.1. Change event keys

A change event’s key contains the schema for the changed table’s key and the changed row’s actual key. Both the schema and its corresponding payload contain a field for each column in the changed table’s PRIMARY KEY (or unique constraint) at the time the connector created the event.

Consider the following customers table, which is followed by an example of a change event key for this table.

Example table

CREATE TABLE customers (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE KEY
) AUTO_INCREMENT=1001;

Example change event key

Every change event that captures a change to the customers table has the same event key schema. For as long as the customers table has the previous definition, every change event that captures a change to the customers table has the following key structure. In JSON, it looks like this:

{
  "schema": {
    "type": "struct",
    "name": "mysql-server-1.inventory.customers.Key",
    "optional": false,
    "fields": [
      {
        "field": "id",
        "type": "int32",
        "optional": false
      }
    ]
  },
  "payload": {
    "id": 1001
  }
}

Table 2.2. Description of change event key
Item	Field name	Description
1	`schema`	The schema portion of the key specifies a Kafka Connect schema that describes what is in the key’s `payload` portion.
2	`mysql-server-1.inventory.customers.Key`	Name of the schema that defines the structure of the key’s payload. This schema describes the structure of the primary key for the table that was changed. Key schema names have the format connector-name.database-name.table-name.`Key`. In this example: `mysql-server-1` is the name of the connector that generated this event; `inventory` is the database that contains the table that was changed; `customers` is the table that was updated.
3	`optional`	Indicates whether the event key must contain a value in its `payload` field. In this example, a value in the key’s payload is required. A value in the key’s payload field is optional when a table does not have a primary key.
4	`fields`	Specifies each field that is expected in the `payload`, including each field’s name, type, and whether it is required.
5	`payload`	Contains the key for the row for which this change event was generated. In this example, the key contains a single `id` field whose value is `1001`.
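As a small consumer-side illustration, the following sketch parses the example key shown above with the JSON representation, splits the schema name into its parts, and checks that every required field is present in the payload. The helper name `describe_key` is hypothetical.

```python
import json

# The example change event key for the customers table, as JSON text.
key_json = """
{
  "schema": {
    "type": "struct",
    "name": "mysql-server-1.inventory.customers.Key",
    "optional": false,
    "fields": [{"field": "id", "type": "int32", "optional": false}]
  },
  "payload": {"id": 1001}
}
"""


def describe_key(raw: str) -> dict:
    """Split the key schema name and verify that required key fields are present."""
    key = json.loads(raw)
    schema, payload = key["schema"], key["payload"]
    # Key schema names have the form connector-name.database-name.table-name.Key.
    server, database, table, suffix = schema["name"].rsplit(".", 3)
    if suffix != "Key":
        raise ValueError("not a key schema: " + schema["name"])
    missing = [f["field"] for f in schema["fields"]
               if not f["optional"] and f["field"] not in payload]
    if missing:
        raise ValueError("missing required key fields: " + ", ".join(missing))
    return {"server": server, "database": database, "table": table, "key": payload}


info = describe_key(key_json)
```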

2.1.4.2. Change event values

The value in a change event is a bit more complicated than the key. Like the key, the value has a schema section and a payload section. The schema section contains the schema that describes the Envelope structure of the payload section, including its nested fields. Change events for operations that create, update or delete data all have a value payload with an envelope structure.

Consider the same sample table that was used to show an example of a change event key:

CREATE TABLE customers (
  id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE KEY
) AUTO_INCREMENT=1001;

The value portion of a change event for a change to this table is described for each event type:

2.1.4.2.1. create events

The following example shows the value portion of a change event that the connector generates for an operation that creates data in the customers table:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "struct",
        "fields": [
          {
            "type": "int32",
            "optional": false,
            "field": "id"
          },
          {
            "type": "string",
            "optional": false,
            "field": "first_name"
          },
          {
            "type": "string",
            "optional": false,
            "field": "last_name"
          },
          {
            "type": "string",
            "optional": false,
            "field": "email"
          }
        ],
        "optional": true,
        "name": "mysql-server-1.inventory.customers.Value",
        "field": "before"
      },
      {
        "type": "struct",
        "fields": [
          {
            "type": "int32",
            "optional": false,
            "field": "id"
          },
          {
            "type": "string",
            "optional": false,
            "field": "first_name"
          },
          {
            "type": "string",
            "optional": false,
            "field": "last_name"
          },
          {
            "type": "string",
            "optional": false,
            "field": "email"
          }
        ],
        "optional": true,
        "name": "mysql-server-1.inventory.customers.Value",
        "field": "after"
      },
      {
        "type": "struct",
        "fields": [
          {
            "type": "string",
            "optional": false,
            "field": "version"
          },
          {
            "type": "string",
            "optional": false,
            "field": "connector"
          },
          {
            "type": "string",
            "optional": false,
            "field": "name"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "ts_ms"
          },
          {
            "type": "boolean",
            "optional": true,
            "default": false,
            "field": "snapshot"
          },
          {
            "type": "string",
            "optional": false,
            "field": "db"
          },
          {
            "type": "string",
            "optional": true,
            "field": "table"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "server_id"
          },
          {
            "type": "string",
            "optional": true,
            "field": "gtid"
          },
          {
            "type": "string",
            "optional": false,
            "field": "file"
          },
          {
            "type": "int64",
            "optional": false,
            "field": "pos"
          },
          {
            "type": "int32",
            "optional": false,
            "field": "row"
          },
          {
            "type": "int64",
            "optional": true,
            "field": "thread"
          },
          {
            "type": "string",
            "optional": true,
            "field": "query"
          }
        ],
        "optional": false,
        "name": "io.debezium.connector.mysql.Source",
        "field": "source"
      },
      {
        "type": "string",
        "optional": false,
        "field": "op"
      },
      {
        "type": "int64",
        "optional": true,
        "field": "ts_ms"
      }
    ],
    "optional": false,
    "name": "mysql-server-1.inventory.customers.Envelope"
  },
  "payload": {
    "op": "c",
    "ts_ms": 1465491411815,
    "before": null,
    "after": {
      "id": 1004,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {
      "version": "1.2.4.Final",
      "connector": "mysql",
      "name": "mysql-server-1",
      "ts_ms": 0,
      "snapshot": false,
      "db": "inventory",
      "table": "customers",
      "server_id": 0,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 154,
      "row": 0,
      "thread": 7,
      "query": "INSERT INTO customers (first_name, last_name, email) VALUES ('Anne', 'Kretchmar', 'annek@noanswer.org')"
    }
  }
}

Table 2.3. Descriptions of create event value fields
Item	Field name	Description
1	`schema`	The value’s schema, which describes the structure of the value’s payload. A change event’s value schema is the same in every change event that the connector generates for a particular table.
2	`name`	In the `schema` section, each `name` field specifies the schema for a field in the value’s payload. `mysql-server-1.inventory.customers.Value` is the schema for the payload’s `before` and `after` fields. This schema is specific to the `customers` table. Names of schemas for `before` and `after` fields are of the form `logicalName.databaseName.tableName.Value`, which ensures that the schema name is unique across databases and tables. This means that when using the Avro converter, the resulting Avro schema for each table in each logical source has its own evolution and history.
3	`name`	`io.debezium.connector.mysql.Source` is the schema for the payload’s `source` field. This schema is specific to the MySQL connector. The connector uses it for all events that it generates.
4	`name`	`mysql-server-1.inventory.customers.Envelope` is the schema for the overall structure of the payload, where `mysql-server-1` is the connector name, `inventory` is the database, and `customers` is the table.
5	`payload`	The value’s actual data. This is the information that the change event is providing. It may appear that the JSON representations of the events are much larger than the rows they describe. This is because the JSON representation must include the schema and the payload portions of the message. However, by using the Avro converter, you can significantly decrease the size of the messages that the connector streams to Kafka topics.
6	`op`	Mandatory string that describes the type of operation that caused the connector to generate the event. In this example, `c` indicates that the operation created a row. Valid values are: `c` = create, `u` = update, `d` = delete, `r` = read (applies only to snapshots).
7	`ts_ms`	Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the `source` object, `ts_ms` indicates the time that the change was made in the database. By comparing the value for `payload.source.ts_ms` with the value for `payload.ts_ms`, you can determine the lag between the source database update and Debezium.
8	`before`	An optional field that specifies the state of the row before the event occurred. When the `op` field is `c` for create, as it is in this example, the `before` field is `null` since this change event is for new content.
9	`after`	An optional field that specifies the state of the row after the event occurred. In this example, the `after` field contains the values of the new row’s `id`, `first_name`, `last_name`, and `email` columns.
10	`source`	Mandatory field that describes the source metadata for the event. This field contains information that you can use to compare this event with other events, with regard to the origin of the events, the order in which the events occurred, and whether events were part of the same transaction. The source metadata includes: Debezium version; connector name; name of the binlog file where the event was recorded; binlog position; row within the event; whether the event was part of a snapshot; name of the database and table that contain the new row; ID of the MySQL thread that created the event (non-snapshot only); MySQL server ID (if available); timestamp for when the change was made in the database. If the `binlog_rows_query_log_events` MySQL configuration option is enabled and the connector configuration `include.query` property is enabled, the `source` field also provides the `query` field, which contains the original SQL statement that caused the change event.
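The lag comparison between `payload.ts_ms` and `payload.source.ts_ms` can be sketched as a small helper. The function name and the sample timestamps are hypothetical and chosen only for illustration.

```python
from typing import Optional


def lag_ms(value: dict) -> Optional[int]:
    """Milliseconds between the database change and connector processing.

    Returns None when the optional payload-level ts_ms field is absent.
    """
    payload = value["payload"]
    processed = payload.get("ts_ms")
    if processed is None:
        return None
    return processed - payload["source"]["ts_ms"]


# Hypothetical event value: the change happened 523 ms before it was processed.
sample = {
    "payload": {
        "op": "c",
        "ts_ms": 1465581029523,
        "source": {"ts_ms": 1465581029000},
    }
}

lag = lag_ms(sample)  # 523
```

The two timestamps come from different clocks (the database host and the JVM running the Kafka Connect task), so clock skew between the machines shows up in this number.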

2.1.4.2.2. update events

The value of a change event for an update in the sample customers table has the same schema as a create event for that table. Likewise, the event value’s payload has the same structure. However, the event value payload contains different values in an update event. Here is an example of a change event value in an event that the connector generates for an update in the customers table:

{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 1004,
      "first_name": "Anne",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "after": {
      "id": 1004,
      "first_name": "Anne Marie",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "source": {
      "version": "1.2.4.Final",
      "connector": "mysql",
      "name": "mysql-server-1",
      "ts_ms": 1465581,
      "snapshot": false,
      "db": "inventory",
      "table": "customers",
      "server_id": 223344,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 484,
      "row": 0,
      "thread": 7,
      "query": "UPDATE customers SET first_name='Anne Marie' WHERE id=1004"
    },
    "op": "u",
    "ts_ms": 1465581029523
  }
}

Table 2.4. Descriptions of update event value fields
Item	Field name	Description
1	`before`	An optional field that specifies the state of the row before the event occurred. In an update event value, the `before` field contains a field for each table column and the value that was in that column before the database commit. In this example, the `first_name` value is `Anne`.
2	`after`	An optional field that specifies the state of the row after the event occurred. You can compare the `before` and `after` structures to determine what the update to this row was. In the example, the `first_name` value is now `Anne Marie`.
3	`source`	Mandatory field that describes the source metadata for the event. The `source` field structure has the same fields as in a create event, but some values are different. For example, the sample update event is from a different position in the binlog. The source metadata includes: Debezium version; connector name; name of the binlog file where the event was recorded; binlog position; row within the event; whether the event was part of a snapshot; name of the database and table that contain the updated row; ID of the MySQL thread that created the event (non-snapshot only); MySQL server ID (if available); timestamp for when the change was made in the database. If the `binlog_rows_query_log_events` MySQL configuration option is enabled and the connector configuration `include.query` property is enabled, the `source` field also provides the `query` field, which contains the original SQL statement that caused the change event.
4	`op`	Mandatory string that describes the type of operation. In an update event value, the `op` field value is `u`, signifying that this row changed because of an update.
5	`ts_ms`	Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the `source` object, `ts_ms` indicates the time that the change was made in the database. By comparing the value for `payload.source.ts_ms` with the value for `payload.ts_ms`, you can determine the lag between the source database update and Debezium.

Note

Updating the columns for a row’s primary/unique key changes the value of the row’s key. When a key changes, Debezium outputs three events: a DELETE event and a tombstone event with the old key for the row, followed by an event with the new key for the row. Details are in the next section.

2.1.4.2.3. Primary key updates

An UPDATE operation that changes a row’s primary key field(s) is known as a primary key change. For a primary key change, in place of an UPDATE event record, the connector emits a DELETE event record for the old key and a CREATE event record for the new (updated) key. These events have the usual structure and content, and in addition, each one has a message header related to the primary key change:

The DELETE event record has __debezium.newkey as a message header. The value of this header is the new primary key for the updated row.
The CREATE event record has __debezium.oldkey as a message header. The value of this header is the previous (old) primary key that the updated row had.
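The event sequence for a primary key change can be sketched as follows. This is an illustration of the described behavior, not connector code: the record layout (plain dictionaries with `key`, `value`, and `headers`) and the function name are hypothetical, and the tombstone is included because a delete with the old key is followed by a tombstone event.

```python
def events_for_update(before: dict, after: dict, key_columns: list) -> list:
    """Return the records emitted for an UPDATE, given the row state before and after."""
    old_key = {c: before[c] for c in key_columns}
    new_key = {c: after[c] for c in key_columns}
    if old_key == new_key:
        # Ordinary update: a single event with op "u".
        return [{"key": old_key, "headers": {},
                 "value": {"op": "u", "before": before, "after": after}}]
    return [
        # DELETE for the old key, carrying the new key in a header.
        {"key": old_key, "headers": {"__debezium.newkey": new_key},
         "value": {"op": "d", "before": before, "after": None}},
        # Tombstone for the old key: same key, null value.
        {"key": old_key, "headers": {}, "value": None},
        # CREATE for the new key, carrying the old key in a header.
        {"key": new_key, "headers": {"__debezium.oldkey": old_key},
         "value": {"op": "c", "before": None, "after": after}},
    ]


before = {"id": 1004, "first_name": "Anne", "last_name": "Kretchmar"}
after = {"id": 2004, "first_name": "Anne", "last_name": "Kretchmar"}
records = events_for_update(before, after, ["id"])
```

The two headers let a consumer link the delete and the create back together even though the records carry different keys and may land in different partitions.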

2.1.4.2.4. delete events

The value in a delete change event has the same schema portion as create and update events for the same table. The payload portion in a delete event for the sample customers table looks like this:

{
  "schema": { ... },
  "payload": {
    "before": {
      "id": 1004,
      "first_name": "Anne Marie",
      "last_name": "Kretchmar",
      "email": "annek@noanswer.org"
    },
    "after": null,
    "source": {
      "version": "1.2.4.Final",
      "connector": "mysql",
      "name": "mysql-server-1",
      "ts_ms": 1465581,
      "snapshot": false,
      "db": "inventory",
      "table": "customers",
      "server_id": 223344,
      "gtid": null,
      "file": "mysql-bin.000003",
      "pos": 805,
      "row": 0,
      "thread": 7,
      "query": "DELETE FROM customers WHERE id=1004"
    },
    "op": "d",


    "ts_ms": 1465581902461

}
}

Table 2.5. Descriptions of delete event value fields
Item	Field name	Description
1	`before`	Optional field that specifies the state of the row before the event occurred. In a delete event value, the `before` field contains the values that were in the row before it was deleted with the database commit.
2	`after`	Optional field that specifies the state of the row after the event occurred. In a delete event value, the `after` field is `null`, signifying that the row no longer exists.
3	`source`	Mandatory field that describes the source metadata for the event. In a delete event value, the `source` field structure is the same as for create and update events for the same table. Many `source` field values are also the same. In a delete event value, the `ts_ms` and `pos` field values, as well as other values, might have changed. But the `source` field in a delete event value provides the same metadata: Debezium version; connector name; binlog name where the event was recorded; binlog position; row within the event; whether the event was part of a snapshot; name of the database and table that contain the deleted row; ID of the MySQL thread that created the event (non-snapshot only); MySQL server ID (if available); timestamp for when the change was made in the database. If the `binlog_rows_query_log_events` MySQL configuration option is enabled and the connector configuration `include.query` property is enabled, the `source` field also provides the `query` field, which contains the original SQL statement that caused the change event.
4	`op`	Mandatory string that describes the type of operation. The `op` field value is `d`, signifying that this row was deleted.
5	`ts_ms`	Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the `source` object, `ts_ms` indicates the time that the change was made in the database. By comparing the value for `payload.source.ts_ms` with the value for `payload.ts_ms`, you can determine the lag between the source database update and Debezium.
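
The lag comparison described for the `ts_ms` fields can be sketched in Python as follows. The function name `replication_lag_ms` is hypothetical, and the sketch assumes that the event value has already been deserialized into a dict and that both timestamps are in milliseconds.

```python
def replication_lag_ms(event_value):
    """Return payload.ts_ms - payload.source.ts_ms, or None if either is missing.

    Assumes `event_value` is a change event value already parsed into a dict
    and that both fields hold epoch milliseconds.
    """
    payload = event_value["payload"]
    processed_at = payload.get("ts_ms")  # optional field
    changed_at = (payload.get("source") or {}).get("ts_ms")
    if processed_at is None or changed_at is None:
        return None
    return processed_at - changed_at


# Illustrative values only, not taken from a real connector run.
example = {"payload": {"ts_ms": 1465581902461,
                       "source": {"ts_ms": 1465581902300}}}
lag = replication_lag_ms(example)  # 161 milliseconds
```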

A delete change event record provides a consumer with the information it needs to process the removal of this row. The old values are included because some consumers might require them in order to properly handle the removal.

MySQL connector events are designed to work with Kafka log compaction. Log compaction enables removal of some older messages as long as at least the most recent message for every key is kept. This lets Kafka reclaim storage space while ensuring that the topic contains a complete data set and can be used for reloading key-based state.

Tombstone events

When a row is deleted, the delete event value still works with log compaction, because Kafka can remove all earlier messages that have that same key. However, for Kafka to remove all messages that have that same key, the message value must be null. To make this possible, after Debezium’s MySQL connector emits a delete event, the connector emits a special tombstone event that has the same key but a null value.
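
To make the relationship between delete events and tombstone events concrete, here is a minimal Python sketch of a consumer that keeps the latest row per key in memory. The function name `apply_change` and the use of a plain dict as the state store are assumptions for illustration; a real consumer would first deserialize the Kafka record key and value.

```python
def apply_change(state, key, value):
    """Apply one change event record to an in-memory key -> row mapping.

    `value` is either None (a tombstone) or a dict with a "payload" entry
    shaped like the event values shown above.
    """
    if value is None:
        # Tombstone: same key, null value. Nothing is kept for this key.
        state.pop(key, None)
        return
    payload = value["payload"]
    if payload["op"] == "d":
        # Delete event: the row no longer exists; "before" holds the old values.
        state.pop(key, None)
    else:
        # Any other operation carries the current row state in "after".
        state[key] = payload["after"]


state = {}
apply_change(state, 1004, {"payload": {"op": "c", "after": {"id": 1004}}})
apply_change(state, 1004, {"payload": {"op": "d", "before": {"id": 1004}, "after": None}})
apply_change(state, 1004, None)  # the tombstone that follows the delete event
```

Both the delete event and the tombstone leave the consumer state unchanged after the first removal, so the consumer is safe whether or not compaction has already dropped the earlier records.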

2.1.5. How the MySQL connector maps data types

The Debezium MySQL connector represents changes to rows with events that are structured like the table in which the row exists. The event contains a field for each column value. The MySQL data type of that column dictates how the value is represented in the event.

Columns that store strings are defined in MySQL with a character set and collation. The MySQL connector uses the column’s character set when reading the binary representation of the column values in the binlog events. The following table shows how the connector maps the MySQL data types to both literal and semantic types.

literal type : how the value is represented using Kafka Connect schema types
semantic type : how the Kafka Connect schema captures the meaning of the field (schema name)

MySQL type	Literal type	Semantic type
`BOOLEAN, BOOL`	`BOOLEAN`	n/a
`BIT(1)`	`BOOLEAN`	n/a
`BIT(>1)`	`BYTES`	`io.debezium.data.Bits` The `length` schema parameter contains an integer that represents the number of bits. The `byte[]` contains the bits in little-endian form and is sized to contain the specified number of bits. For example, where n is bits: `numBytes = n/8 + (n%8== 0 ? 0 : 1)`
`TINYINT`	`INT16`	n/a
`SMALLINT[(M)]`	`INT16`	n/a
`MEDIUMINT[(M)]`	`INT32`	n/a
`INT, INTEGER[(M)]`	`INT32`	n/a
`BIGINT[(M)]`	`INT64`	n/a
`REAL[(M,D)]`	`FLOAT32`	n/a
`FLOAT[(M,D)]`	`FLOAT64`	n/a
`DOUBLE[(M,D)]`	`FLOAT64`	n/a
`CHAR[(M)]`	`STRING`	n/a
`VARCHAR(M)`	`STRING`	n/a
`BINARY[(M)]`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`VARBINARY(M)`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`TINYBLOB`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`TINYTEXT`	`STRING`	n/a
`BLOB`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`TEXT`	`STRING`	n/a
`MEDIUMBLOB`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`MEDIUMTEXT`	`STRING`	n/a
`LONGBLOB`	`BYTES` or `STRING`	n/a Either the raw bytes (the default), a base64-encoded String, or a hex-encoded String, based on the binary handling mode setting
`LONGTEXT`	`STRING`	n/a
`JSON`	`STRING`	`io.debezium.data.Json` Contains the string representation of a `JSON` document, array, or scalar.
`ENUM`	`STRING`	`io.debezium.data.Enum` The `allowed` schema parameter contains the comma-separated list of allowed values.
`SET`	`STRING`	`io.debezium.data.EnumSet` The `allowed` schema parameter contains the comma-separated list of allowed values.
`YEAR[(2|4)]`	`INT32`	`io.debezium.time.Year`
`TIMESTAMP[(M)]`	`STRING`	`io.debezium.time.ZonedTimestamp` In ISO 8601 format with microsecond precision. MySQL allows `M` to be in the range of `0-6`.
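
For the `BIT(>1)` mapping in the table above, the following Python sketch shows the byte-count formula and one way to turn the little-endian bytes back into an integer. The function names are hypothetical, and the sketch assumes the raw bytes are already available to the consumer (for example, after any base64 decoding that a JSON converter might require).

```python
def bits_num_bytes(n):
    """Number of bytes needed for n bits: n/8 + (n%8 == 0 ? 0 : 1)."""
    return n // 8 + (0 if n % 8 == 0 else 1)


def decode_bits(raw, n):
    """Interpret little-endian bytes as an unsigned integer of n bits."""
    value = int.from_bytes(raw, byteorder="little")
    # Keep only the n low-order bits declared by the `length` schema parameter.
    return value & ((1 << n) - 1)


# A BIT(10) column needs 2 bytes; b"\x01\x02" is 0x0201 in little-endian order.
decoded = decode_bits(b"\x01\x02", 10)  # 513
```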

2.1.5.1. Temporal values

Excluding the TIMESTAMP data type, the representation of MySQL temporal types depends on the value of the time.precision.mode configuration property. For TIMESTAMP columns whose default value is specified as CURRENT_TIMESTAMP or NOW, the value 1970-01-01 00:00:00 is used as the default value in the Kafka Connect schema.

MySQL allows zero-values for DATE, DATETIME, and TIMESTAMP columns because zero-values are sometimes preferred over null values. The MySQL connector represents zero-values as null values when the column definition allows null values, or as the epoch day when the column does not allow null values.
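
The zero-value rule can be sketched in Python for a DATE column as follows. The function name `represent_date` and its inputs (the MySQL text form of the value and a nullability flag) are assumptions for illustration; the connector applies the rule internally and emits the result in the representation chosen by time.precision.mode.

```python
from datetime import date

EPOCH_DAY = date(1970, 1, 1)


def represent_date(raw, nullable):
    """Map a MySQL DATE string to a date, applying the zero-value rule."""
    if raw == "0000-00-00":
        # Zero-value: null if the column allows nulls, otherwise the epoch day.
        return None if nullable else EPOCH_DAY
    return date.fromisoformat(raw)
```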

Temporal values without time zones

The DATETIME type represents a local date and time such as "2018-01-13 09:48:27", with no time zone information. Such columns are converted into epoch milliseconds or microseconds, based on the column’s precision, by using UTC. The TIMESTAMP type represents a timestamp without time zone information and is converted by MySQL from the server (or session’s) current time zone into UTC when writing and vice versa when reading back the value. For example:

DATETIME with a value of 2018-06-20 06:37:03 becomes 1529476623000.
TIMESTAMP with a value of 2018-06-20 06:37:03 becomes 2018-06-20T13:37:03Z.

Such columns are converted into an equivalent io.debezium.time.ZonedTimestamp in UTC based on the server (or session’s) current time zone. The time zone will be queried from the server by default. If this fails, it must be specified explicitly by the database.serverTimezone connector configuration property. For example, if the database’s time zone (either globally or configured for the connector by means of the database.serverTimezone property) is "America/Los_Angeles", the TIMESTAMP value "2018-06-20 06:37:03" is represented by a ZonedTimestamp with the value "2018-06-20T13:37:03Z".
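
The two conversions above can be reproduced with a short Python sketch. The function names are hypothetical, and to keep the example self-contained the server time zone is passed as a fixed UTC-7 offset, which is what "America/Los_Angeles" observes on the example date; a real implementation would use a full time zone database.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
FORMAT = "%Y-%m-%d %H:%M:%S"


def datetime_to_epoch_millis(text):
    """DATETIME: treat the local date and time as if it were UTC."""
    as_utc = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return (as_utc - EPOCH) // timedelta(milliseconds=1)


def timestamp_to_zoned_utc(text, server_tz):
    """TIMESTAMP: interpret in the server time zone, then express in UTC."""
    local = datetime.strptime(text, FORMAT).replace(tzinfo=server_tz)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Assumption: fixed offset standing in for America/Los_Angeles in June (PDT).
PDT = timezone(timedelta(hours=-7))

millis = datetime_to_epoch_millis("2018-06-20 06:37:03")    # 1529476623000
zoned = timestamp_to_zoned_utc("2018-06-20 06:37:03", PDT)  # "2018-06-20T13:37:03Z"
```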

Note that the time zone of the JVM running Kafka Connect and Debezium does not affect these conversions.

More details about properties related to temporal values are in the documentation for MySQL connector configuration properties.

time.precision.mode=adaptive_time_microseconds (default)

The MySQL connector determines the literal type and semantic type based on the column’s data type definition so that events represent exactly the values in the database. All time fields are in microseconds. Only positive TIME field values in the range of 00:00:00.000000 to 23:59:59.999999 can be captured correctly.

MySQL type	Literal type	Semantic type
`DATE`	`INT32`	`io.debezium.time.Date` Represents the number of days since the epoch.
`TIME[(M)]`	`INT64`	`io.debezium.time.MicroTime` Represents the time value in microseconds and does not include time zone information. MySQL allows `M` to be in the range of `0-6`.
`DATETIME, DATETIME(0), DATETIME(1), DATETIME(2), DATETIME(3)`	`INT64`	`io.debezium.time.Timestamp` Represents the number of milliseconds since the epoch and does not include time zone information.
`DATETIME(4), DATETIME(5), DATETIME(6)`	`INT64`	`io.debezium.time.MicroTimestamp` Represents the number of microseconds since the epoch and does not include time zone information.
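
A consumer that receives values in this mode has to turn the integers back into temporal values. The following Python sketch shows one way to do that for the four semantic types in the table; the function names are hypothetical, and the results are naive (time-zone-free) objects because the values carry no time zone information.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)
MICROS_PER_DAY = 24 * 60 * 60 * 1_000_000


def decode_date(days):
    """io.debezium.time.Date: days since the epoch."""
    return EPOCH_DATE + timedelta(days=days)


def decode_micro_time(micros):
    """io.debezium.time.MicroTime: microseconds, 00:00:00.000000 to 23:59:59.999999."""
    if not 0 <= micros < MICROS_PER_DAY:
        raise ValueError("TIME value outside the range that can be captured")
    return (datetime.min + timedelta(microseconds=micros)).time()


def decode_timestamp(millis):
    """io.debezium.time.Timestamp: milliseconds since the epoch."""
    return EPOCH_DATETIME + timedelta(milliseconds=millis)


def decode_micro_timestamp(micros):
    """io.debezium.time.MicroTimestamp: microseconds since the epoch."""
    return EPOCH_DATETIME + timedelta(microseconds=micros)
```

For example, `decode_timestamp(1529476623000)` gives the DATETIME value 2018-06-20 06:37:03 used earlier in this section.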

time.precision.mode=connect

The MySQL connector uses the predefined Kafka Connect logical types. This approach is less precise than the default approach and the events could be less precise if the database column has a fractional second precision value of greater than 3. Only values in the range of 00:00:00.000 to 23:59:59.999 can be handled. Set time.precision.mode=connect only if you can ensure that the TIME values in your tables never exceed the supported ranges. The connect setting is expected to be removed in a future version of Debezium.

MySQL type	Literal type	Semantic type
`DATE`	`INT32`	`org.apache.kafka.connect.data.Date` Represents the number of days since the epoch.
`TIME[(M)]`	`INT64`	`org.apache.kafka.connect.data.Time` Represents the time value in microseconds since midnight and does not include time zone information.
`DATETIME[(M)]`	`INT64`	`org.apache.kafka.connect.data.Timestamp` Represents the number of milliseconds since epoch, and does not include time zone information.

2.1.5.2. Decimal values

Decimals are handled via the decimal.handling.mode property. See MySQL connector configuration properties for more details.

decimal.handling.mode=precise

MySQL type	Literal type	Semantic type
`NUMERIC[(M[,D])]`	`BYTES`	`org.apache.kafka.connect.data.Decimal` The `scale` schema parameter contains an integer that represents how many digits the decimal point shifted.
`DECIMAL[(M[,D])]`	`BYTES`	`org.apache.kafka.connect.data.Decimal` The `scale` schema parameter contains an integer that represents how many digits the decimal point shifted.
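
In precise mode, a consumer has to combine the bytes with the `scale` schema parameter to recover the number. The sketch below relies on the Kafka Connect `Decimal` logical type storing the unscaled value as a big-endian two's-complement integer; the function name is hypothetical, and the sketch assumes the raw bytes are already available (a JSON converter, for example, would require base64 decoding first).

```python
from decimal import Decimal


def decode_connect_decimal(raw, scale):
    """Rebuild a decimal from its unscaled bytes and the `scale` parameter."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point `scale` digits to the left.
    return Decimal(unscaled).scaleb(-scale)


# 0x3039 is 12345; with scale 2 this is 123.45.
price = decode_connect_decimal(b"\x30\x39", 2)
```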

decimal.handling.mode=double

MySQL type	Literal type	Semantic type
`NUMERIC[(M[,D])]`	`FLOAT64`	n/a
`DECIMAL[(M[,D])]`	`FLOAT64`	n/a

decimal.handling.mode=string

MySQL type	Literal type	Semantic type
`NUMERIC[(M[,D])]`	`STRING`	n/a
`DECIMAL[(M[,D])]`	`STRING`	n/a

2.1.5.3. Boolean values

MySQL handles the BOOLEAN type internally in a specific way: a BOOLEAN column is mapped to the TINYINT(1) data type. When a table is created during streaming, Debezium receives the original DDL and therefore uses the proper BOOLEAN mapping. During a snapshot, however, Debezium executes SHOW CREATE TABLE to obtain the table definition, which returns TINYINT(1) for both BOOLEAN and TINYINT(1) columns.

Debezium then has no way to obtain the original type mapping, so it maps the column to TINYINT(1).

The following example configuration applies the TinyIntOneToBooleanConverter custom converter to the columns matched by the selector, so that they are mapped to BOOLEAN:

converters=boolean
boolean.type=io.debezium.connector.mysql.converters.TinyIntOneToBooleanConverter
boolean.selector=db1.table1.*, db1.table2.column1

2.1.5.4. Spatial data types

Currently, the Debezium MySQL connector supports the following spatial data types:

MySQL type	Literal type	Semantic type
`GEOMETRY, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, GEOMETRYCOLLECTION`	`STRUCT`	`io.debezium.data.geometry.Geometry` Contains a structure with two fields: `srid (INT32)`: a spatial reference system ID that defines the type of geometry object stored in the structure; `wkb (BYTES)`: a binary representation of the geometry object encoded in the Well-Known-Binary (WKB) format. See the Open Geospatial Consortium for more details.

2.1.6. The MySQL connector and Kafka topics

The Debezium MySQL connector writes events for all INSERT, UPDATE, and DELETE operations from a single table to a single Kafka topic. The Kafka topic naming convention is as follows:

serverName.databaseName.tableName

For example, suppose that fulfillment is the server name and inventory is the database that contains three tables: orders, customers, and products. The Debezium MySQL connector emits events to three Kafka topics, one for each table in the database:

fulfillment.inventory.orders
fulfillment.inventory.customers
fulfillment.inventory.products
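
The naming convention is simple enough to express as a one-line helper, which can be useful when a client application needs to compute the topics to subscribe to. The function name `topic_name` is hypothetical; this is only a sketch of the convention described above.

```python
def topic_name(server_name, database_name, table_name):
    """Build a topic name as serverName.databaseName.tableName."""
    return f"{server_name}.{database_name}.{table_name}"


topics = [topic_name("fulfillment", "inventory", table)
          for table in ("orders", "customers", "products")]
```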

2.1.7. MySQL supported topologies

The Debezium MySQL connector supports the following MySQL topologies:

Standalone

When a single MySQL server is used, the server must have the binlog enabled (and optionally GTIDs enabled) so the Debezium MySQL connector can monitor the server. This is often acceptable, since the binary log can also be used as an incremental backup. In this case, the MySQL connector always connects to and follows this standalone MySQL server instance.

Master and slave

The Debezium MySQL connector can follow one of the masters or one of the slaves (if that slave has its binlog enabled), but the connector only sees changes in the cluster that are visible to that server. Generally, this is not a problem except for the multi-master topologies.

The connector records its position in the server’s binlog, which is different on each server in the cluster. Therefore, the connector will need to follow just one MySQL server instance. If that server fails, it must be restarted or recovered before the connector can continue.

Highly available clusters

A variety of high availability solutions exist for MySQL, and they make it far easier to tolerate and almost immediately recover from problems and failures. Most HA MySQL clusters use GTIDs so that slaves are able to keep track of all changes on any of the masters.

Multi-master

A multi-master MySQL topology uses one or more MySQL slaves that each replicate from multiple masters. This is a powerful way to aggregate the replication of multiple MySQL clusters, and requires using GTIDs.

The Debezium MySQL connector can use these multi-master MySQL slaves as sources, and can fail over to different multi-master MySQL slaves as long as the new slave is caught up to the old slave (e.g., the new slave has all of the transactions that were last seen on the first slave). This works even if the connector is only using a subset of databases and/or tables, as the connector can be configured to include or exclude specific GTID sources when attempting to reconnect to a new multi-master MySQL slave and find the correct position in the binlog.

Hosted

The Debezium MySQL connector supports hosted options such as Amazon RDS and Amazon Aurora.

Important

Because these hosted options do not allow a global read lock, table-level locks are used to create the consistent snapshot.

2.2. Setting up MySQL server

2.2.1. Creating a MySQL user for Debezium

You have to define a MySQL user with appropriate permissions on all databases that the Debezium MySQL connector monitors.

Prerequisites

You must have a MySQL server.
You must know basic SQL commands.

Procedure

Create the MySQL user:

mysql> CREATE USER 'user'@'localhost' IDENTIFIED BY 'password';

Grant the required permissions to the user:
```
mysql> GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'user' IDENTIFIED BY 'password';
```
See permissions explained for notes on each permission.
Important
If you use a hosted option such as Amazon RDS or Amazon Aurora that does not allow a global read lock, table-level locks are used to create the consistent snapshot. In this case, you must also grant the LOCK TABLES permission to the user that you create. See Overview of how the MySQL connector works for more details.

Finalize the user’s permissions:
```
mysql> FLUSH PRIVILEGES;
```

Table 2.6. Permissions explained
Permission/item	Description
`SELECT`	Enables the connector to select rows from tables in databases. Note: This is only used when performing a snapshot.
`RELOAD`	When performing a snapshot, enables the connector to use the `FLUSH` statement to clear or reload internal caches, flush tables, or acquire locks.
`SHOW DATABASES`	When performing a snapshot, enables the connector to see database names by issuing the `SHOW DATABASES` statement.
`REPLICATION SLAVE`	Enables the connector to connect to and read the MySQL server binlog.
`REPLICATION CLIENT`	Enables the connector to run the following commands: `SHOW MASTER STATUS`, `SHOW SLAVE STATUS`, and `SHOW BINARY LOGS`. Important: This is always required for the connector.
`ON`	Identifies the databases to which the permissions apply.
`TO 'user'`	Specifies the user to which the permissions are granted.
`IDENTIFIED BY 'password'`	Specifies the password for the user.

2.2.2. Enabling the MySQL binlog for Debezium

You must enable binary logging for MySQL replication. The binary logs record transaction updates for replication tools to propagate changes.

Prerequisites

You must have a MySQL server.
You should have appropriate MySQL user privileges.

Procedure

Check whether the log-bin option is already on:

mysql> SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::"
FROM information_schema.global_variables WHERE variable_name='log_bin';

If OFF, configure your MySQL server configuration file with the following binlog config properties:

server-id         = 223344
log_bin           = mysql-bin
binlog_format     = ROW
binlog_row_image  = FULL
expire_logs_days  = 10

Confirm your changes by checking the binlog status:

mysql> SELECT variable_value as "BINARY LOGGING STATUS (log-bin) ::"
FROM information_schema.global_variables WHERE variable_name='log_bin';

Table 2.7. Binlog configuration properties
Number	Property	Description
1	`server-id`	The value for the `server-id` must be unique for each server and replication client within the MySQL cluster. When the MySQL connector is set up, it is assigned a unique server ID.
2	`log_bin`	The value of `log_bin` is the base name of the sequence of binlog files.
3	`binlog_format`	The `binlog_format` must be set to `ROW` or `row`.
4	`binlog_row_image`	The `binlog_row_image` must be set to `FULL` or `full`.
5	`expire_logs_days`	This is the number of days for automatic binlog file removal. The default is `0` which means no automatic removal. Set the value to match the needs of your environment.
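
As an optional aid, the requirements in the table can be checked against a configuration file before restarting the server. The following Python sketch is a hypothetical helper, not part of Debezium or MySQL: it assumes the settings live in a `[mysqld]` section and that the file contains only plain `name = value` lines (directives such as `!includedir` are not handled).

```python
import configparser

# Values required by the table above; compared case-insensitively.
REQUIRED_VALUES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_binlog_config(cnf_text):
    """Return a list of problems found in the [mysqld] section of cnf_text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["no [mysqld] section found"]
    # MySQL accepts dashes and underscores interchangeably in option names.
    options = {name.replace("-", "_"): (value or "").strip()
               for name, value in parser.items("mysqld")}
    problems = []
    if not options.get("server_id"):
        problems.append("server_id is not set")
    if "log_bin" not in options:
        problems.append("log_bin is not set")
    for name, expected in REQUIRED_VALUES.items():
        if options.get(name, "").upper() != expected:
            problems.append(f"{name} must be {expected}")
    return problems


sample = """
[mysqld]
server-id         = 223344
log_bin           = mysql-bin
binlog_format     = ROW
binlog_row_image  = FULL
expire_logs_days  = 10
"""
problems = check_binlog_config(sample)  # empty list for the example settings
```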

2.2.3. Enabling MySQL Global Transaction Identifiers for Debezium

Global transaction identifiers (GTIDs) uniquely identify transactions that occur on a server within a cluster. Though not required for the Debezium MySQL connector, using GTIDs simplifies replication and allows you to more easily confirm if master and slave servers are consistent.

Note

GTIDs are available only in MySQL 5.6.5 and later. See the MySQL documentation for more details.

Prerequisites

You must have a MySQL server.
You must know basic SQL commands.
You must have access to the MySQL configuration file.

Procedure

Enable gtid_mode in the MySQL configuration file:
```
gtid_mode=ON
```
Enable enforce_gtid_consistency in the MySQL configuration file:
```
enforce_gtid_consistency=ON
```

Confirm the changes:

mysql> show global variables like '%GTID%';

response

+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| enforce_gtid_consistency | ON    |
| gtid_mode                | ON    |
+--------------------------+-------+

Table 2.8. Options explained
Option	Description
`gtid_mode`	Boolean that specifies whether GTID mode of the MySQL server is enabled or not. `ON` = enabled `OFF` = disabled
`enforce_gtid_consistency`	Boolean that instructs the server whether to enforce GTID consistency by allowing only the execution of statements that can be logged in a transactionally safe manner. Required when using GTIDs. `ON` = enabled `OFF` = disabled

2.2.4. Setting up session timeouts for Debezium

When an initial consistent snapshot is made for large databases, your established connection could time out while the tables are being read. You can prevent this behavior by configuring interactive_timeout and wait_timeout in your MySQL configuration file.

Prerequisites

You must have a MySQL server.
You must know basic SQL commands.
You must have access to the MySQL configuration file.

Procedure

Configure interactive_timeout in the MySQL configuration file:

interactive_timeout=<duration-in-seconds>

Configure wait_timeout in the MySQL configuration file:

wait_timeout=<duration-in-seconds>

Table 2.9. Options explained
Option	Description
`interactive_timeout`	The number of seconds the server waits for activity on an interactive connection before closing it. See MySQL’s documentation for more details.
`wait_timeout`	The number of seconds the server waits for activity on a noninteractive connection before closing it. See MySQL’s documentation for more details.

2.2.5. Enabling query log events for Debezium

You might want to see the original SQL statement for each binlog event. Enabling the binlog_rows_query_log_events option in the MySQL configuration file allows you to do this.

Note

This option is available for MySQL 5.6 and later.

Prerequisites

You must have a MySQL server.
You must know basic SQL commands.
You must have access to the MySQL configuration file.

Procedure

Enable binlog_rows_query_log_events in the MySQL configuration file:
```
binlog_rows_query_log_events=ON
```

Additional information

binlog_rows_query_log_events is set to a Boolean value that enables/disables support for including the original SQL statement in the binlog entry.

ON = enabled
OFF = disabled

2.3. Deploying the MySQL connector

2.3.1. Installing the MySQL connector

Installing the Debezium MySQL connector is a simple process whereby you only need to download the JAR, extract it to your Kafka Connect environment, and ensure the plug-in’s parent directory is specified in your Kafka Connect environment.

Prerequisites

Kafka and Kafka Connect are installed.
MySQL Server is installed and set up to run the Debezium MySQL connector.

Procedure

Download the Debezium MySQL connector.
Extract the files into your Kafka Connect environment.
Add the plug-in’s parent directory to your Kafka Connect plugin.path:
```
plugin.path=/kafka/connect
```
This example assumes you have extracted the Debezium MySQL connector to the /kafka/connect/debezium-connector-mysql path.

Restart your Kafka Connect process. This ensures the new JARs are picked up.

2.3.2. Configuring the MySQL connector

Typically, you configure the Debezium MySQL connector in a .yaml file using the configuration properties available for the connector.

Prerequisites

You should have completed the installation process for the connector.

Procedure

Set the "name" of the connector in the .yaml file.
Set the configuration properties that you require for your Debezium MySQL connector.

Tip

For a complete list of configuration properties, see MySQL connector configuration properties.

MySQL connector example configuration

  apiVersion: kafka.strimzi.io/v1beta1
  kind: KafkaConnector
  metadata:
    name: inventory-connector
    labels:
      strimzi.io/cluster: my-connect-cluster
  spec:
    class: io.debezium.connector.mysql.MySqlConnector
    tasksMax: 1
    config:
      database.hostname: mysql
      database.port: 3306
      database.user: debezium
      database.password: dbz
      database.server.id: 184054
      database.server.name: dbserver1
      database.whitelist: inventory
      database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092
      database.history.kafka.topic: schema-changes.inventory

Table 2.10. Descriptions of connector configuration settings
Item	Description
1	The name of the connector.
2	Only one task should operate at any one time. Because the MySQL connector reads the MySQL server’s `binlog`, using a single connector task ensures proper order and event handling. The Kafka Connect service uses connectors to start one or more tasks that do the work, and it automatically distributes the running tasks across the cluster of Kafka Connect services. If any of the services stop or crash, those tasks will be redistributed to running services.
3	The connector’s configuration.
4	The database host, which is the name of the container running the MySQL server (`mysql`).
5	A unique server ID and name. The server name is the logical identifier for the MySQL server or cluster of servers. This name will be used as the prefix for all Kafka topics.
6	Only changes in the `inventory` database will be detected.
7	The connector will store the history of the database schemas in Kafka using this broker (the same broker to which you are sending events) and topic name. Upon restart, the connector will recover the schemas of the database that existed at the point in time in the `binlog` when the connector should begin reading.

2.3.3. Adding MySQL connector configuration to Kafka Connect

You can use a provided Debezium container to deploy a Debezium MySQL connector. In this procedure, you build a custom Kafka Connect container image for Debezium, configure the Debezium connector as needed, and then add your connector configuration to your Kafka Connect environment.

Prerequisites

Podman or Docker is installed and you have sufficient rights to create and manage containers.
You installed the Debezium MySQL connector archive.

Procedure

Extract the Debezium MySQL connector archive to create a directory structure for the connector plug-in, for example:
```
tree ./my-plugins/
./my-plugins/
├── debezium-connector-mysql
│   ├── ...
```
Create and publish a custom image for running your Debezium connector:
1. Create a new Dockerfile by using registry.redhat.io/amq7/amq-streams-kafka-25-rhel7:1.5.0 as the base image. In the following example, you would replace my-plugins with the name of your plug-ins directory:
  FROM registry.redhat.io/amq7/amq-streams-kafka-25-rhel7:1.5.0
  USER root:root
  COPY ./my-plugins/ /opt/kafka/plugins/
  USER 1001
  Before Kafka Connect starts running the connector, Kafka Connect loads any third-party plug-ins that are in the /opt/kafka/plugins directory.
2. Build the container image. For example, if you saved the Dockerfile that you created in the previous step as debezium-container-for-mysql, and if the Dockerfile is in the current directory, then you would run the following command:
  podman build -t debezium-container-for-mysql:latest .
3. Push your custom image to your container registry, for example:
  podman push debezium-container-for-mysql:latest
4. Point to the new container image. Do one of the following:
  - Edit the spec.image property of the KafkaConnect custom resource. If set, this property overrides the STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE variable in the Cluster Operator. For example:
    
    apiVersion: kafka.strimzi.io/v1beta1
    kind: KafkaConnect
    metadata:
      name: my-connect-cluster
    spec:
      #...
      image: debezium-container-for-mysql
  - In the install/cluster-operator/050-Deployment-strimzi-cluster-operator.yaml file, edit the STRIMZI_DEFAULT_KAFKA_CONNECT_IMAGE variable to point to the new container image and reinstall the Cluster Operator. If you edit this file you must apply it to your OpenShift cluster.
Create a KafkaConnector custom resource that defines your Debezium MySQL connector instance. See the connector configuration example.
Apply the connector instance, for example:
oc apply -f inventory-connector.yaml
This registers inventory-connector and the connector starts to run against the inventory database.
Verify that the connector was created and has started to capture changes in the specified database. You can verify the connector instance by watching the Kafka Connect log output as, for example, inventory-connector starts.
1. Display the Kafka Connect log output:
  oc logs $(oc get pods -o name -l strimzi.io/name=my-connect-cluster-connect)
2. Review the log output to verify that the initial snapshot has been executed. You should see something like the following lines:
  ... INFO Starting snapshot for ...
  ... INFO Snapshot is using user 'debezium' ...

Results

When the connector starts, it performs a consistent snapshot of the MySQL databases that the connector is configured for. The connector then starts generating data change events for row-level operations and streaming change event records to Kafka topics.

2.3.4. MySQL connector configuration properties

The configuration properties listed here are required to run the Debezium MySQL connector. There are also advanced MySQL connector properties whose default value rarely needs to be changed and therefore, they do not need to be specified in the connector configuration.

The Debezium MySQL connector supports pass-through configuration when creating the Kafka producer and consumer. See information about pass-through properties at the end of this section, and also see the Kafka documentation for more details about pass-through properties.

Property	Default	Description
`name`		Unique name for the connector. Attempting to register again with the same name will fail. (This property is required by all Kafka Connect connectors.)
`connector.class`		The name of the Java class for the connector. Always use a value of `io.debezium.connector.mysql.MySqlConnector` for the MySQL connector.
`tasks.max`	`1`	The maximum number of tasks that should be created for this connector. The MySQL connector always uses a single task and therefore does not use this value, so the default is always acceptable.
`database.hostname`		IP address or hostname of the MySQL database server.
`database.port`	`3306`	Integer port number of the MySQL database server.
`database.user`		Name of the MySQL user to use when connecting to the MySQL database server.
`database.password`		Password to use when connecting to the MySQL database server.
`database.server.name`		Logical name that identifies and provides a namespace for the particular MySQL database server/cluster being monitored. The logical name should be unique across all other connectors, since it is used as a prefix for all Kafka topic names emanating from this connector. Only alphanumeric characters and underscores should be used.
`database.server.id`	random	A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster. This connector joins the MySQL database cluster as another server (with this unique ID) so it can read the binlog. By default, a random number is generated between 5400 and 6400, though we recommend setting an explicit value.
`database.history.kafka.topic`		The full name of the Kafka topic where the connector will store the database schema history.
`database.history.kafka.bootstrap.servers`		A list of host/port pairs that the connector will use for establishing an initial connection to the Kafka cluster. This connection will be used for retrieving database schema history previously stored by the connector, and for writing each DDL statement read from the source database. This should point to the same Kafka cluster used by the Kafka Connect process.
`database.whitelist`	empty string	An optional comma-separated list of regular expressions that match database names to be monitored; any database name not included in the whitelist will be excluded from monitoring. By default all databases will be monitored. May not be used with `database.blacklist`.
`database.blacklist`	empty string	An optional comma-separated list of regular expressions that match database names to be excluded from monitoring; any database name not included in the blacklist will be monitored. May not be used with `database.whitelist`.
`table.whitelist`	empty string	An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored; any table not included in the whitelist will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default the connector will monitor every non-system table in each monitored database. May not be used with `table.blacklist`.
`table.blacklist`	empty string	An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring; any table not included in the blacklist will be monitored. Each identifier is of the form databaseName.tableName. May not be used with `table.whitelist`.
`column.blacklist`	empty string	An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.
`column.truncate.to.length.chars`	n/a	An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be truncated in the change event message values if the field values are longer than the specified number of characters. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer. Fully-qualified names for columns are of the form databaseName.tableName.columnName.
`column.mask.with.length.chars`	n/a	An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be replaced in the change event message values with a field value consisting of the specified number of asterisk (`*`) characters. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer or zero. Fully-qualified names for columns are of the form databaseName.tableName.columnName.
`column.mask.hash.hashAlgorithm.with.salt.salt`	n/a	An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns whose values should be pseudonymized in the change event message values, with a field value consisting of the hashed value using the algorithm `hashAlgorithm` and salt `salt`. Based on the hash function used, referential integrity is kept while data is pseudonymized. Supported hash functions are described in the MessageDigest section of the Java Cryptography Architecture Standard Algorithm Name Documentation. The hash is automatically shortened to the length of the column. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer or zero. Fully-qualified names for columns are of the form databaseName.tableName.columnName. Example: `column.mask.hash.SHA-256.with.salt.CzQMA0cB5K = inventory.orders.customerName, inventory.shipment.customerName` where `CzQMA0cB5K` is a randomly selected salt. Note: Depending on the `hashAlgorithm` used, the `salt` selected and the actual data set, the resulting masked data set may not be completely anonymized.
`column.propagate.source.type`	n/a	An optional comma-separated list of regular expressions that match the fully-qualified names of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters `__debezium.source.column.type`, `__debezium.source.column.length` and `__debezium.source.column.scale` will be used to propagate the original type name and length (for variable-width types), respectively. Useful to properly size corresponding columns in sink databases. Fully-qualified names for columns are of the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.
`datatype.propagate.source.type`	n/a	An optional comma-separated list of regular expressions that match the database-specific data type name of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters `__debezium.source.column.type`, `__debezium.source.column.length` and `__debezium.source.column.scale` will be used to propagate the original type name and length (for variable-width types), respectively. Useful to properly size corresponding columns in sink databases. Fully-qualified data type names are of the form databaseName.tableName.typeName, or databaseName.schemaName.tableName.typeName. See how the MySQL connector maps data types for the list of MySQL-specific data type names.
`time.precision.mode`	`adaptive_time_microseconds`	Time, date, and timestamps can be represented with different kinds of precision, including: `adaptive_time_microseconds` (the default) captures the date, datetime and timestamp values exactly as in the database using either millisecond, microsecond, or nanosecond precision values based on the database column’s type, with the exception of TIME type fields, which are always captured as microseconds; or `connect` always represents time and timestamp values using Kafka Connect’s built-in representations for Time, Date, and Timestamp, which uses millisecond precision regardless of the database columns' precision.
`decimal.handling.mode`	`precise`	Specifies how the connector should handle values for `DECIMAL` and `NUMERIC` columns: `precise` (the default) represents them precisely using `java.math.BigDecimal` values represented in change events in a binary form; `double` represents them using `double` values, which may result in a loss of precision but will be far easier to use; `string` encodes values as formatted strings, which are easy to consume, but semantic information about the real type is lost.
`bigint.unsigned.handling.mode`	`long`	Specifies how BIGINT UNSIGNED columns should be represented in change events, including: `precise` uses `java.math.BigDecimal` to represent values, which are encoded in the change events using a binary representation and Kafka Connect’s `org.apache.kafka.connect.data.Decimal` type; `long` (the default) represents values using Java’s `long`, which may not offer the full precision but will be far easier to use in consumers. `long` is usually the preferable setting. Use the `precise` setting only when working with values of 2^63 or larger, because those values cannot be conveyed using `long`.
`include.schema.changes`	`true`	Boolean value that specifies whether the connector should publish changes in the database schema to a Kafka topic with the same name as the logical server name (`database.server.name`). Each schema change will be recorded using a key that contains the database name and whose value includes the DDL statement(s). This is independent of how the connector internally records database history. The default is `true`.
`include.query`	`false`	Boolean value that specifies whether the connector should include the original SQL query that generated the change event. Note: This option requires MySQL be configured with the binlog_rows_query_log_events option set to ON. Query will not be present for events generated from the snapshot process. WARNING: Enabling this option may expose tables or fields explicitly blacklisted or masked by including the original SQL statement in the change event. For this reason this option is defaulted to 'false'.
`event.processing.failure.handling.mode`	`fail`	Specifies how the connector should react to exceptions during deserialization of binlog events. `fail` will propagate the exception (indicating the problematic event and its binlog offset), causing the connector to stop. `warn` will cause the problematic event to be skipped and the problematic event and its binlog offset to be logged. `skip` will cause the problematic event to be skipped.
`inconsistent.schema.handling.mode`	`fail`	Specifies how the connector should react to binlog events that relate to tables that are not present in the internal schema representation (that is, the internal representation is not consistent with the database). `fail` will throw an exception (indicating the problematic event and its binlog offset), causing the connector to stop. `warn` will cause the problematic event to be skipped and the problematic event and its binlog offset to be logged. `skip` will cause the problematic event to be skipped.
`max.queue.size`	`8192`	Positive integer value that specifies the maximum size of the blocking queue into which change events read from the database log are placed before they are written to Kafka. This queue can provide backpressure to the binlog reader when, for example, writes to Kafka are slower or if Kafka is not available. Events that appear in the queue are not included in the offsets periodically recorded by this connector. Defaults to 8192, and should always be larger than the maximum batch size specified in the `max.batch.size` property.
`max.batch.size`	`2048`	Positive integer value that specifies the maximum size of each batch of events that should be processed during each iteration of this connector. Defaults to 2048.
`poll.interval.ms`	`1000`	Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear. Defaults to 1000 milliseconds, or 1 second.
`connect.timeout.ms`	`30000`	A positive integer value that specifies the maximum time in milliseconds this connector should wait after trying to connect to the MySQL database server before timing out. Defaults to 30 seconds.
`gtid.source.includes`		A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching one of these include patterns will be used. May not be used with `gtid.source.excludes`.
`gtid.source.excludes`		A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching none of these exclude patterns will be used. May not be used with `gtid.source.includes`.
`tombstones.on.delete`	`true`	Controls whether a tombstone event should be generated after a delete event. When `true` the delete operations are represented by a delete event and a subsequent tombstone event. When `false` only a delete event is sent. Emitting the tombstone event (the default behavior) allows Kafka to completely delete all events pertaining to the given key once the source record got deleted.
`message.key.columns`	empty string	A semicolon-separated list of regular expressions that match fully-qualified tables and columns to map a primary key. Each item (regular expression) must match the `<fully-qualified table>:<a comma-separated list of columns>` representing the custom key. Fully-qualified tables could be defined as databaseName.tableName.
`binary.handling.mode`	`bytes`	Specifies how binary (`blob`, `binary`, `varbinary`, etc.) columns should be represented in change events, including: `bytes` represents binary data as a byte array (default); `base64` represents binary data as a base64-encoded String; `hex` represents binary data as a hex-encoded (base16) String.
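
As an illustration of how the properties above fit together, the following Python sketch assembles a connector configuration and checks a few of the constraints stated in the table (the allowed characters in `database.server.name`, the mutually exclusive whitelist/blacklist and GTID include/exclude pairs, and `max.queue.size` being larger than `max.batch.size`). The function names, the host names, credentials and topic names are hypothetical, and the `{"name": ..., "config": ...}` payload shape assumes that the connector is registered through the Kafka Connect REST API. The checks are a convenience only and are not the connector's own validation.

```python
import json
import re


def validate(config):
    """Check a few constraints stated in the property descriptions."""
    # "Only alphanumeric characters and underscores should be used."
    if not re.fullmatch(r"[A-Za-z0-9_]+", config["database.server.name"]):
        raise ValueError("database.server.name: use only alphanumerics and underscores")
    # Each of these pairs is documented as mutually exclusive.
    exclusive_pairs = [
        ("database.whitelist", "database.blacklist"),
        ("table.whitelist", "table.blacklist"),
        ("gtid.source.includes", "gtid.source.excludes"),
    ]
    for first, second in exclusive_pairs:
        if config.get(first) and config.get(second):
            raise ValueError(f"{first} may not be used with {second}")
    # The queue should always be larger than the maximum batch size.
    queue_size = int(config.get("max.queue.size", 8192))
    batch_size = int(config.get("max.batch.size", 2048))
    if queue_size <= batch_size:
        raise ValueError("max.queue.size must be larger than max.batch.size")


def build_connector_config(name, hostname, user, password, server_id,
                           server_name, history_topic, bootstrap_servers,
                           extra=None):
    """Assemble a registration payload from the properties in the table."""
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "tasks.max": "1",
        "database.hostname": hostname,
        "database.port": "3306",
        "database.user": user,
        "database.password": password,
        "database.server.id": str(server_id),  # explicit value, as recommended
        "database.server.name": server_name,
        "database.history.kafka.topic": history_topic,
        "database.history.kafka.bootstrap.servers": bootstrap_servers,
    }
    config.update(extra or {})
    validate(config)
    return {"name": name, "config": config}


# Hypothetical values, for illustration only.
payload = build_connector_config(
    name="inventory-connector",
    hostname="mysql.example.com",
    user="debezium",
    password="change-me",
    server_id=184054,
    server_name="dbserver1",
    history_topic="dbhistory.inventory",
    bootstrap_servers="kafka.example.com:9092",
    extra={"table.whitelist": "inventory.orders,inventory.customers"},
)
print(json.dumps(payload, indent=2))
```

Failing early on these constraints in a deployment script can be cheaper than discovering them when the connector starts.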

2.3.4.1. Advanced MySQL connector properties

The following table describes advanced MySQL connector properties.

Property	Default	Description
`connect.keep.alive`	`true`	A boolean value that specifies whether a separate thread should be used to ensure the connection to the MySQL server/cluster is kept alive.
`table.ignore.builtin`	`true`	Boolean value that specifies whether built-in system tables should be ignored. This applies regardless of the table whitelist or blacklists. By default system tables are excluded from monitoring, and no events are generated when changes are made to any of the system tables.
`database.history.kafka.recovery.poll.interval.ms`	`100`	An integer value that specifies the maximum number of milliseconds the connector should wait during startup/recovery while polling for persisted data. The default is 100ms.
`database.history.kafka.recovery.attempts`	`4`	The maximum number of times that the connector should attempt to read persisted history data before the connector recovery fails with an error. The maximum amount of time to wait after receiving no data is `recovery.attempts` x `recovery.poll.interval.ms`.
`database.history.skip.unparseable.ddl`	`false`	Boolean value that specifies whether the connector should ignore malformed or unknown database statements, or stop processing and let the operator fix the issue. The safe default is `false`. Skipping should be used with care, as it can lead to data loss or mangling when the binlog is processed.
`database.history.store.only.monitored.tables.ddl`	`false`	Boolean value that specifies whether the connector should record all DDL statements or (when `true`) only those that are relevant to tables that are monitored by Debezium (via filter configuration). The safe default is `false`. This feature should be used with care, as the missing data might be necessary when the filters are changed.
`database.ssl.mode`	`disabled`	Specifies whether to use an encrypted connection. The default is `disabled`, and specifies to use an unencrypted connection. The `preferred` option establishes an encrypted connection if the server supports secure connections but falls back to an unencrypted connection otherwise. The `required` option establishes an encrypted connection but will fail if one cannot be made for any reason. The `verify_ca` option behaves like `required` but additionally it verifies the server TLS certificate against the configured Certificate Authority (CA) certificates and will fail if it doesn’t match any valid CA certificates. The `verify_identity` option behaves like `verify_ca` but additionally verifies that the server certificate matches the host of the remote connection.
`binlog.buffer.size`	`0`	The size of a look-ahead buffer used by the binlog reader. Under specific conditions, the MySQL binlog can contain uncommitted data that is finished by a `ROLLBACK` statement. Typical examples are using savepoints or mixing temporary and regular table changes in a single transaction. When the beginning of a transaction is detected, Debezium tries to roll forward the binlog position and find either `COMMIT` or `ROLLBACK` so it can decide whether the changes from the transaction will be streamed or not. The size of the buffer defines the maximum number of changes in the transaction that Debezium can buffer while searching for transaction boundaries. If the size of the transaction is larger than the buffer, Debezium needs to rewind and re-read the events that did not fit into the buffer while streaming. Value `0` disables buffering. Disabled by default. Note: This feature should be considered an incubating one. Feedback from customers is welcome, but the feature is not expected to be completely polished.
`snapshot.mode`	`initial`	Specifies the criteria for running a snapshot upon startup of the connector. The default is `initial`, and specifies that the connector runs a snapshot only when no offsets have been recorded for the logical server name. The `when_needed` option specifies that the connector runs a snapshot upon startup whenever it deems it necessary (when no offsets are available, or when a previously recorded offset specifies a binlog location or GTID that is not available in the server). The `never` option specifies that the connector should never use snapshots and that upon first startup with a logical server name the connector should read from the beginning of the binlog; this should be used with care, as it is only valid when the binlog is guaranteed to contain the entire history of the database. If you don’t need the topics to contain a consistent snapshot of the data but only need them to have the changes since the connector was started, you can use the `schema_only` option, where the connector only snapshots the schemas (not the data). `schema_only_recovery` is a recovery option for an existing connector to recover a corrupted or lost database history topic, or to periodically "clean up" a database history topic (which requires infinite retention) that may be growing unexpectedly.
`snapshot.locking.mode`	`minimal`	Controls if and how long the connector holds onto the global MySQL read lock (preventing any updates to the database) while it is performing a snapshot. There are three possible values: `minimal`, `extended`, and `none`. `minimal`: The connector holds the global read lock for just the initial portion of the snapshot while the connector reads the database schemas and other metadata. The remaining work in a snapshot involves selecting all rows from each table, and this can be done in a consistent fashion using the REPEATABLE READ transaction even when the global read lock is no longer held and while other MySQL clients are updating the database. `extended`: In some cases where clients are submitting operations that MySQL excludes from REPEATABLE READ semantics, it may be desirable to block all writes for the entire duration of the snapshot. In such cases, use this option. `none`: Prevents the connector from acquiring any table locks during the snapshot process. This value can be used with all snapshot modes but it is safe to use if and only if no schema changes are happening while the snapshot is taken. Note that for tables defined with the MyISAM engine, the tables are still locked despite this property being set, because MyISAM acquires a table lock. This behavior is unlike the InnoDB engine, which acquires row-level locks.
`snapshot.select.statement.overrides`		Controls which rows from tables will be included in snapshot. This property contains a comma-separated list of fully-qualified tables (DB_NAME.TABLE_NAME). Select statements for the individual tables are specified in further configuration properties, one for each table, identified by the id `snapshot.select.statement.overrides.[DB_NAME].[TABLE_NAME]`. The value of those properties is the SELECT statement to use when retrieving data from the specific table during snapshotting. A possible use case for large append-only tables is setting a specific point where to start (resume) snapshotting, in case a previous snapshotting was interrupted. Note: This setting has impact on snapshots only. Events captured from binlog are not affected by it at all.
`min.row.count.to.stream.results`	`1000`	During a snapshot operation, the connector will query each included table to produce a read event for all rows in that table. This parameter determines whether the MySQL connection will pull all results for a table into memory (which is fast but requires large amounts of memory), or whether the results will instead be streamed (can be slower, but will work for very large tables). The value specifies the minimum number of rows a table must contain before the connector will stream results, and defaults to 1,000. Set this parameter to '0' to skip all table size checks and always stream all results during a snapshot.
`heartbeat.interval.ms`	`0`	Controls how frequently heartbeat messages are sent. This property contains an interval in milliseconds that defines how frequently the connector sends heartbeat messages into a heartbeat topic. Set this parameter to `0` to not send heartbeat messages at all. Disabled by default.
`heartbeat.topics.prefix`	`__debezium-heartbeat`	Controls the naming of the topic to which heartbeat messages are sent. The topic is named according to the pattern `<heartbeat.topics.prefix>.<server.name>`.
`database.initial.statements`		A semicolon separated list of SQL statements to be executed when a JDBC connection (not the transaction log reading connection) to the database is established. Use doubled semicolon (';;') to use a semicolon as a character and not as a delimiter. Note: The connector may establish JDBC connections at its own discretion, so this should typically be used for configuration of session parameters only, but not for executing DML statements.
`snapshot.delay.ms`		An interval in milliseconds that the connector should wait before taking a snapshot after starting up. Can be used to avoid snapshot interruptions when starting multiple connectors in a cluster, which may cause re-balancing of connectors.
`snapshot.fetch.size`		Specifies the maximum number of rows that should be read in one go from each table while taking a snapshot. The connector will read the table contents in multiple batches of this size.
`snapshot.lock.timeout.ms`	`10000`	Positive integer value that specifies the maximum amount of time (in milliseconds) to wait to obtain table locks when performing a snapshot. If table locks cannot be acquired in this time interval, the snapshot will fail. See How the MySQL connector performs database snapshots.
`enable.time.adjuster`	`true`	MySQL allows users to insert year values as either 2-digit or 4-digit. In the case of two digits, the value is automatically mapped to the 1970 - 2069 range. This is usually done by the database. Set to `true` (the default) when Debezium should do the conversion. Set to `false` when conversion is fully delegated to the database.
`sanitize.field.names`	`true` when connector configuration explicitly specifies the `key.converter` or `value.converter` parameters to use Avro, otherwise defaults to `false`.	Whether field names will be sanitized to adhere to Avro naming requirements.
`skipped.operations`		Comma-separated list of operations that will be skipped during streaming. The operations include: `c` for inserts, `u` for updates, and `d` for deletes. By default, no operations are skipped.
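
Two of the properties above use a small encoding convention that is easy to get wrong by hand: `snapshot.select.statement.overrides` needs one extra property per listed table, and `database.initial.statements` needs literal semicolons doubled. The Python sketch below shows one way to generate both values. The helper names, table name and SQL statements are hypothetical and serve only as an illustration.

```python
def snapshot_select_overrides(overrides):
    """Turn {"db.table": "SELECT ..."} into the connector properties.

    The main property lists the fully-qualified tables, and one additional
    property per table carries the SELECT statement for that table.
    """
    props = {"snapshot.select.statement.overrides": ",".join(overrides)}
    for table, statement in overrides.items():
        props[f"snapshot.select.statement.overrides.{table}"] = statement
    return props


def initial_statements(statements):
    """Join SQL statements with ';' and write literal semicolons as ';;'."""
    return ";".join(s.replace(";", ";;") for s in statements)


# Hypothetical example: resume snapshotting a large append-only table.
props = snapshot_select_overrides(
    {"inventory.orders": "SELECT * FROM inventory.orders WHERE id > 100000"}
)
props["database.initial.statements"] = initial_statements(
    ["SET SESSION wait_timeout = 2000", "SET @note = 'a;b'"]
)
for key, value in props.items():
    print(f"{key}={value}")
```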

2.3.4.2. Pass-through configuration properties

The MySQL connector also supports pass-through configuration properties that are used when creating the Kafka producer and consumer. Specifically, all connector configuration properties that begin with the database.history.producer. prefix are used (without the prefix) when creating the Kafka producer that writes to the database history. All properties that begin with the prefix database.history.consumer. are used (without the prefix) when creating the Kafka consumer that reads the database history upon connector start-up.

For example, the following connector configuration properties can be used to secure connections to the Kafka broker:

database.history.producer.security.protocol=SSL
database.history.producer.ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
database.history.producer.ssl.keystore.password=test1234
database.history.producer.ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
database.history.producer.ssl.truststore.password=test1234
database.history.producer.ssl.key.password=test1234
database.history.consumer.security.protocol=SSL
database.history.consumer.ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
database.history.consumer.ssl.keystore.password=test1234
database.history.consumer.ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
database.history.consumer.ssl.truststore.password=test1234
database.history.consumer.ssl.key.password=test1234
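
The prefix handling described above can be pictured with a short Python sketch: properties that start with a given prefix are selected, and the prefix is removed before the remainder is handed to the Kafka client. This is only an illustration of the naming rule, not the connector's actual implementation, and the `pass_through` helper is hypothetical.

```python
def pass_through(config, prefix):
    """Return the properties that start with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in config.items()
        if key.startswith(prefix)
    }


connector_config = {
    "database.hostname": "mysql.example.com",  # hypothetical host
    "database.history.producer.security.protocol": "SSL",
    "database.history.consumer.security.protocol": "SSL",
    "database.history.consumer.ssl.truststore.password": "test1234",
}

producer_props = pass_through(connector_config, "database.history.producer.")
consumer_props = pass_through(connector_config, "database.history.consumer.")
print(producer_props)
print(consumer_props)
```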

2.3.4.3. Pass-through properties for database drivers

In addition to the pass-through properties for the Kafka producer and consumer, there are pass-through properties for database drivers. These properties have the database. prefix. For example, database.tinyInt1isBit=false is passed to the JDBC URL.

2.3.5. MySQL connector monitoring metrics

The Debezium MySQL connector has three metric types in addition to the built-in support for JMX metrics that Zookeeper, Kafka, and Kafka Connect have.

snapshot metrics; for monitoring the connector when performing snapshots
binlog metrics; for monitoring the connector when reading CDC table data
schema history metrics; for monitoring the status of the connector’s schema history

Refer to the monitoring documentation for details of how to expose these metrics via JMX.

2.3.5.1. Snapshot metrics

The MBean is debezium.mysql:type=connector-metrics,context=snapshot,server=<database.server.name>.

Attributes	Type	Description
`LastEvent`	`string`	The last snapshot event that the connector has read.
`MilliSecondsSinceLastEvent`	`long`	The number of milliseconds since the connector has read and processed the most recent event.
`TotalNumberOfEventsSeen`	`long`	The total number of events that this connector has seen since last started or reset.
`NumberOfEventsFiltered`	`long`	The number of events that have been filtered by whitelist or blacklist filtering rules configured on the connector.
`MonitoredTables`	`string[]`	The list of tables that are monitored by the connector.
`QueueTotalCapacity`	`int`	The length of the queue used to pass events between the snapshotter and the main Kafka Connect loop.
`QueueRemainingCapacity`	`int`	The free capacity of the queue used to pass events between the snapshotter and the main Kafka Connect loop.
`TotalTableCount`	`int`	The total number of tables that are being included in the snapshot.
`RemainingTableCount`	`int`	The number of tables that the snapshot has yet to copy.
`SnapshotRunning`	`boolean`	Whether the snapshot was started.
`SnapshotAborted`	`boolean`	Whether the snapshot was aborted.
`SnapshotCompleted`	`boolean`	Whether the snapshot completed.
`SnapshotDurationInSeconds`	`long`	The total number of seconds that the snapshot has taken so far, even if not complete.
`RowsScanned`	`Map<String, Long>`	Map containing the number of rows scanned for each table in the snapshot. Tables are incrementally added to the Map during processing. Updates every 10,000 rows scanned and upon completing a table.

The Debezium MySQL connector also provides the following custom snapshot metrics:

Attribute	Type	Description
`HoldingGlobalLock`	`boolean`	Whether the connector currently holds a global or table write lock.

2.3.5.2. Binlog metrics

The MBean is debezium.mysql:type=connector-metrics,context=binlog,server=<database.server.name>.

Note

The transaction-related attributes are only available if binlog event buffering is enabled. See binlog.buffer.size in the advanced connector configuration properties for more details.

Attributes	Type	Description
`LastEvent`	`string`	The last streaming event that the connector has read.
`MilliSecondsSinceLastEvent`	`long`	The number of milliseconds since the connector has read and processed the most recent event.
`TotalNumberOfEventsSeen`	`long`	The total number of events that this connector has seen since last started or reset.
`NumberOfEventsFiltered`	`long`	The number of events that have been filtered by whitelist or blacklist filtering rules configured on the connector.
`MonitoredTables`	`string[]`	The list of tables that are monitored by the connector.
`QueueTotalCapacity`	`int`	The length of the queue used to pass events between the streamer and the main Kafka Connect loop.
`QueueRemainingCapacity`	`int`	The free capacity of the queue used to pass events between the streamer and the main Kafka Connect loop.
`Connected`	`boolean`	Flag that denotes whether the connector is currently connected to the database server.
`MilliSecondsBehindSource`	`long`	The number of milliseconds between the last change event’s timestamp and the connector processing it. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running.
`NumberOfCommittedTransactions`	`long`	The number of processed transactions that were committed.
`SourceEventPosition`	`Map<String, String>`	The coordinates of the last received event.
`LastTransactionId`	`string`	Transaction identifier of the last processed transaction.

The Debezium MySQL connector also provides the following custom binlog metrics:

Attribute	Type	Description
`BinlogFilename`	`string`	The name of the binlog file that the connector has most recently read.
`BinlogPosition`	`long`	The most recent position (in bytes) within the binlog that the connector has read.
`IsGtidModeEnabled`	`boolean`	Flag that denotes whether the connector is currently tracking GTIDs from MySQL server.
`GtidSet`	`string`	The string representation of the most recent GTID set seen by the connector when reading the binlog.
`NumberOfSkippedEvents`	`long`	The number of events that have been skipped by the MySQL connector. Typically events are skipped due to a malformed or unparseable event from MySQL’s binlog.
`NumberOfDisconnects`	`long`	The number of disconnects by the MySQL connector.
`NumberOfRolledBackTransactions`	`long`	The number of processed transactions that were rolled back and not streamed.
`NumberOfNotWellFormedTransactions`	`long`	The number of transactions that have not conformed to expected protocol `BEGIN` + `COMMIT`/`ROLLBACK`. Should be `0` under normal conditions.
`NumberOfLargeTransactions`	`long`	The number of transactions that have not fitted into the look-ahead buffer. Should be significantly smaller than `NumberOfCommittedTransactions` and `NumberOfRolledBackTransactions` for optimal performance.
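
The guidance for `NumberOfLargeTransactions` can be turned into a simple ratio. The Python sketch below assumes that the attribute values have already been read from the binlog MBean by some JMX client and are available as a dictionary; how they are fetched is outside the scope of the sketch. The sample numbers are made up, and what counts as "significantly smaller" is left to the operator.

```python
def large_transaction_ratio(metrics):
    """Share of processed transactions that did not fit the look-ahead buffer."""
    processed = (
        metrics["NumberOfCommittedTransactions"]
        + metrics["NumberOfRolledBackTransactions"]
    )
    if processed == 0:
        return 0.0  # nothing processed yet, so nothing to compare against
    return metrics["NumberOfLargeTransactions"] / processed


# Made-up attribute values, as they might be read from the binlog MBean.
sample = {
    "NumberOfCommittedTransactions": 990,
    "NumberOfRolledBackTransactions": 10,
    "NumberOfLargeTransactions": 10,
}
ratio = large_transaction_ratio(sample)
print(f"{ratio:.1%} of transactions exceeded binlog.buffer.size")
```

A ratio that stays high over time suggests that `binlog.buffer.size` is too small for the workload.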

2.3.5.3. Schema history metrics

The MBean is debezium.mysql:type=connector-metrics,context=schema-history,server=<database.server.name>.

Attributes	Type	Description
`Status`	`string`	One of `STOPPED`, `RECOVERING` (recovering history from the storage), `RUNNING` describing the state of the database history.
`RecoveryStartTime`	`long`	The time in epoch seconds at which recovery started.
`ChangesRecovered`	`long`	The number of changes that were read during the recovery phase.
`ChangesApplied`	`long`	The total number of schema changes applied during recovery and runtime.
`MilliSecondsSinceLastRecoveredChange`	`long`	The number of milliseconds that elapsed since the last change was recovered from the history store.
`MilliSecondsSinceLastAppliedChange`	`long`	The number of milliseconds that elapsed since the last change was applied.
`LastRecoveredChange`	`string`	The string representation of the last change recovered from the history store.
`LastAppliedChange`	`string`	The string representation of the last applied change.
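
The three MBean names in this section differ only in their `context` key and share the `server` key, which is the connector's `database.server.name`. A small Python helper such as the following hypothetical one can build the names for a monitoring script; `dbserver1` is an example server name.

```python
# The three contexts named in the snapshot, binlog and schema history sections.
CONTEXTS = ("snapshot", "binlog", "schema-history")


def mbean_name(context, server_name):
    """Build the MBean object name for one metrics context."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown metrics context: {context}")
    return (
        "debezium.mysql:type=connector-metrics,"
        f"context={context},server={server_name}"
    )


names = [mbean_name(context, "dbserver1") for context in CONTEXTS]
for name in names:
    print(name)
```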

2.4. MySQL connector common issues

2.4.1. Configuration and startup errors

The Debezium MySQL connector fails, reports an error, and stops running when the following startup errors occur:

The connector’s configuration is invalid.
The connector cannot connect to the MySQL server using the specified connectivity parameters.
The connector is attempting to restart at a position in the binlog where MySQL no longer has the history available.

If any of these errors occur, the error message provides more details and, where possible, suggests workarounds.

2.4.2. MySQL is unavailable

If your MySQL server becomes unavailable, the Debezium MySQL connector fails with an error and the connector stops. Restart the connector when the server is available again.

2.4.2.1. Using GTIDs

If you have GTIDs enabled and a highly available MySQL cluster, you can restart the connector immediately. The connector connects to a different MySQL server in the cluster, finds the location in that server’s binlog that represents the last transaction, and begins reading the new server’s binlog from that location.

2.4.2.2. Not Using GTIDs

If you do not have GTIDs enabled, the connector records only the binlog position of the MySQL server to which it was connected. To restart from the correct binlog position, you must reconnect to that specific server.

2.4.3. Kafka Connect stops

Three scenarios can cause issues when Kafka Connect stops:

2.4.3.1. Kafka Connect stops gracefully

When Kafka Connect stops gracefully, there is only a short delay while the Debezium MySQL connector tasks are stopped and restarted on new Kafka Connect processes.

2.4.3.2. Kafka Connect process crashes

If Kafka Connect crashes, the process stops and any Debezium MySQL connector tasks terminate without recording their most recently processed offsets. In distributed mode, Kafka Connect restarts the connector tasks on other processes. However, the replacement tasks resume from the last offset that the earlier processes recorded, so they may regenerate some of the events that were processed before the crash, which creates duplicate events.

Tip

Each change event message includes source-specific information that can help you identify duplicate events:

the event origin
the MySQL server’s event time
the binlog filename and position
GTIDs (if used)
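The source information listed above can be used by a consumer to discard replayed events. The following is a minimal sketch under several assumptions: events arrive as dictionaries whose `source` block carries `file`, `pos`, and `row` fields, binlog filenames end in a numeric sequence such as `mysql-bin.000003`, all events come from the same MySQL server, and they are consumed in binlog order (for example, from a single partition). The helper names `binlog_key` and `drop_replayed` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

BinlogKey = Tuple[int, int, int]


def binlog_key(event: dict) -> BinlogKey:
    """Order an event by binlog file sequence number, position, and row."""
    source = event["source"]
    file_number = int(source["file"].rsplit(".", 1)[1])  # "mysql-bin.000003" -> 3
    return (file_number, source["pos"], source.get("row", 0))


def drop_replayed(events: Iterable[dict],
                  high_water_mark: Optional[BinlogKey] = None) -> Iterator[dict]:
    """Yield only events that lie beyond the highest binlog key seen so far."""
    for event in events:
        key = binlog_key(event)
        if high_water_mark is not None and key <= high_water_mark:
            continue  # at or before the high-water mark: treat as a replay
        high_water_mark = key
        yield event


# Illustrative events: the third and fourth repeat the first two after a crash.
events = [
    {"source": {"file": "mysql-bin.000003", "pos": 154, "row": 0}},
    {"source": {"file": "mysql-bin.000003", "pos": 154, "row": 1}},
    {"source": {"file": "mysql-bin.000003", "pos": 154, "row": 0}},
    {"source": {"file": "mysql-bin.000003", "pos": 154, "row": 1}},
    {"source": {"file": "mysql-bin.000004", "pos": 4, "row": 0}},
]
fresh = list(drop_replayed(events))
print([binlog_key(e) for e in fresh])
```

A high-water mark keeps memory use constant, but it is only valid while the ordering assumption holds. After a failover to another server, binlog filenames and positions are not comparable, and GTIDs would be the more appropriate identifier.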

2.4.3.3. Kafka becomes unavailable

The Kafka Connect framework records Debezium change events in Kafka using the Kafka producer API. If the Kafka brokers become unavailable, the Debezium MySQL connector pauses until the connection is reestablished, and then resumes where it left off.

2.4.4. MySQL purges binlog files

If the Debezium MySQL connector is stopped for too long, the MySQL server may purge older binlog files, and the connector's last recorded position may be lost. When the connector restarts and the MySQL server no longer has that starting point, the connector performs another initial snapshot. If snapshots are disabled, the connector fails with an error.
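The decision described above can be sketched as a small function. This is an illustration of the described behavior only, not the connector's implementation: the function `restart_outcome` and its parameters are hypothetical, and `server_binlogs` stands for the filenames that the MySQL statement `SHOW BINARY LOGS` would list.

```python
from typing import Sequence


def restart_outcome(offset_file: str, server_binlogs: Sequence[str],
                    snapshot_enabled: bool) -> str:
    """Return what a restart leads to: "resume", "snapshot", or "fail"."""
    if offset_file in server_binlogs:
        return "resume"  # the server still has the recorded binlog file
    # The recorded binlog file was purged, so the position is lost.
    return "snapshot" if snapshot_enabled else "fail"


# Illustrative filenames: mysql-bin.000003 has already been purged.
available = ["mysql-bin.000005", "mysql-bin.000006"]
print(restart_outcome("mysql-bin.000005", available, snapshot_enabled=True))
print(restart_outcome("mysql-bin.000003", available, snapshot_enabled=True))
print(restart_outcome("mysql-bin.000003", available, snapshot_enabled=False))
```

Comparing the connector's recorded binlog filename with the server's current list before a restart can indicate in advance whether a new snapshot is likely.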

Tip

See How the MySQL connector performs database snapshots for more information on initial snapshots.
