Chapter 7. Validating schemas using Kafka serializers/deserializers in Java clients

Service Registry provides client serializers/deserializers (SerDes) for Kafka producer and consumer applications written in Java. Kafka producer applications use serializers to encode messages that conform to a specific event schema. Kafka consumer applications use deserializers to validate that messages have been serialized using the correct schema, based on a specific schema ID. This ensures consistent schema use and helps to prevent data errors at runtime.

This chapter explains how to use the Kafka client SerDes in your producer and consumer client applications.

Prerequisites

You have read Chapter 1, Introduction to Service Registry
You have installed Service Registry
You have created Kafka producer and consumer client applications
For more details on Kafka client applications, see Using AMQ Streams on OpenShift.

7.1. Kafka client applications and Service Registry

Service Registry decouples schema management from client application configuration. You can enable a Java client application to use a schema from Service Registry by specifying its URL in your client code.

You can store the schemas used to serialize and deserialize messages in the registry, and reference them from your client applications to ensure that the messages they send and receive are compatible with those schemas. Kafka client applications can push or pull their schemas from Service Registry at runtime.

Schemas can evolve, so you can define rules in Service Registry, for example, to ensure that schema changes are valid and do not break previous versions used by applications. Service Registry checks for compatibility by comparing a modified schema with previous schema versions.

Service Registry schema technologies

Service Registry provides schema registry support for schema technologies such as:

Avro
Protobuf
JSON Schema

These schema technologies can be used by client applications through the Kafka client serializer/deserializer (SerDe) services provided by Service Registry. The maturity and usage of the SerDe classes provided by Service Registry might vary. The sections that follow provide more details about each schema type.

Producer schema configuration

A producer client application uses a serializer to put the messages that it sends to a specific broker topic into the correct data format.

To enable a producer to use Service Registry for serialization:

Define and register your schema with Service Registry (if it does not already exist).
Configure your producer client code with the following:
- URL of Service Registry
- Service Registry serializer to use with messages
- Strategy to map the Kafka message to a schema artifact in Service Registry
- Strategy to look up or register the schema used for serialization in Service Registry

After registering your schema, when you start Kafka and Service Registry, you can access the schema to format messages sent to the Kafka broker topic by the producer. Alternatively, depending on configuration, the producer can automatically register the schema on first use.

If a schema already exists, you can create a new version using the registry REST API based on compatibility rules defined in Service Registry. Versions are used for compatibility checking as a schema evolves. A group ID, artifact ID, and version represent a unique tuple that identifies a schema.

Consumer schema configuration

A consumer client application uses a deserializer to get the messages that it consumes from a specific broker topic into the correct data format.

To enable a consumer to use Service Registry for deserialization:

Define and register your schema with Service Registry (if it does not already exist)
Configure the consumer client code with the following:
- URL of Service Registry
- Service Registry deserializer to use with the messages
- Input data stream for deserialization

Retrieve schemas using a global ID

By default, the schema is retrieved from Service Registry by the deserializer using a global ID, which is specified in the message being consumed. The schema global ID can be located in the message headers or in the message payload, depending on the configuration of the producer application.

When locating the global ID in the message payload, the format of the data begins with a magic byte, used as a signal to consumers, followed by the global ID, and the message data as normal. For example:

# ...
[MAGIC_BYTE]
[GLOBAL_ID]
[MESSAGE DATA]

Then when you start Kafka and Service Registry, you can access the schema to format messages received from the Kafka broker topic.
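The payload layout described above can be sketched in plain Java. This is a minimal illustration, not the Service Registry implementation: the class name PayloadFraming, the zero magic byte, and the 8-byte big-endian global ID are assumptions made for the example, and the actual widths depend on the ID handler configured in the SerDes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper that mirrors the [MAGIC_BYTE][GLOBAL_ID][MESSAGE DATA] layout.
final class PayloadFraming {
    // Assumed values: a zero magic byte and an 8-byte big-endian global ID.
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_LENGTH = 1 + Long.BYTES;

    // Prepends the magic byte and the global ID to the serialized message data.
    static byte[] frame(long globalId, byte[] messageData) {
        return ByteBuffer.allocate(HEADER_LENGTH + messageData.length)
                .put(MAGIC_BYTE)
                .putLong(globalId)
                .put(messageData)
                .array();
    }

    // Reads the global ID, rejecting payloads that do not start with the magic byte.
    static long readGlobalId(byte[] payload) {
        if (payload.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Payload is shorter than the header");
        }
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buffer.getLong();
    }

    // Returns the message data that follows the header.
    static byte[] readMessageData(byte[] payload) {
        return Arrays.copyOfRange(payload, HEADER_LENGTH, payload.length);
    }

    public static void main(String[] args) {
        byte[] payload = frame(42L, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readGlobalId(payload));
        System.out.println(new String(readMessageData(payload), StandardCharsets.UTF_8));
    }
}
```

A consumer-side deserializer would perform the equivalent of readGlobalId first, fetch the schema for that ID from the registry, and only then decode the remaining bytes.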

Retrieve schemas using a content ID

Alternatively, you can configure schema retrieval from Service Registry to use the content ID, which is the unique ID of the artifact content, whereas the global ID is the unique ID of an artifact version.

The content ID does not uniquely identify a version; it uniquely identifies only the version content. If multiple versions share exactly the same content, they have different global IDs but the same content ID. Confluent Schema Registry uses content ID by default.
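The difference between the two identifiers can be illustrated with a toy in-memory model. This sketch is not how Service Registry stores artifacts; the class IdModelSketch and its methods are hypothetical and exist only to show that each registered version receives a new global ID while identical content shares one content ID.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of ID assignment; all names here are hypothetical.
final class IdModelSketch {
    // IDs assigned to one registered artifact version.
    static final class VersionIds {
        final long globalId;
        final long contentId;

        VersionIds(long globalId, long contentId) {
            this.globalId = globalId;
            this.contentId = contentId;
        }
    }

    private final Map<String, Long> contentIdsByContent = new HashMap<>();
    private long nextGlobalId = 1;
    private long nextContentId = 1;

    // Every version gets a fresh global ID; identical content reuses
    // the content ID assigned the first time that content was seen.
    VersionIds registerVersion(String content) {
        long contentId = contentIdsByContent.computeIfAbsent(content, c -> nextContentId++);
        return new VersionIds(nextGlobalId++, contentId);
    }

    public static void main(String[] args) {
        IdModelSketch registry = new IdModelSketch();
        VersionIds first = registry.registerVersion("{\"type\":\"string\"}");
        VersionIds second = registry.registerVersion("{\"type\":\"string\"}");
        System.out.println(first.globalId + " / " + first.contentId);
        System.out.println(second.globalId + " / " + second.contentId);
    }
}
```

In this model, the two versions registered in main have different global IDs and the same content ID, which matches the behavior described above.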

7.2. Strategies to look up a schema in Service Registry

The Kafka client serializer uses lookup strategies to determine the artifact ID and global ID under which the message schema is registered in Service Registry. For a given topic and message, you can use different implementations of the ArtifactResolverStrategy Java interface to return a reference to an artifact in the registry.

The classes for each strategy are in the io.apicurio.registry.serde.strategy package. Specific strategy classes for Avro SerDe are in the io.apicurio.registry.serde.avro.strategy package. The default strategy is the TopicIdStrategy, which looks for Service Registry artifacts with the same name as the Kafka topic receiving messages.

Example

public ArtifactReference artifactReference(String topic, boolean isKey, T schema) {
    return ArtifactReference.builder()
            .groupId(null)
            .artifactId(String.format("%s-%s", topic, isKey ? "key" : "value"))
            .build();
}

The topic parameter is the name of the Kafka topic receiving the message.
The isKey parameter is true when the message key is serialized, and false when the message value is serialized.
The schema parameter is the schema of the message serialized or deserialized.
The ArtifactReference returned contains the artifact ID under which the schema is registered.

Which lookup strategy you use depends on how and where you store your schema. For example, you might use a strategy that uses a record ID if you have different Kafka topics with the same Avro message type.

ArtifactResolverStrategy interface

The artifact resolver strategy provides a way to map the Kafka topic and message information to an artifact in Service Registry. The common convention for the mapping is to combine the Kafka topic name with the key or value, depending on whether the serializer is used for the Kafka message key or value.

However, you can use alternative conventions for the mapping by using a strategy provided by Service Registry, or by creating a custom Java class that implements io.apicurio.registry.serde.strategy.ArtifactResolverStrategy.

Strategies to return an artifact reference

Service Registry provides the following strategies to return an artifact reference based on an implementation of ArtifactResolverStrategy:

RecordIdStrategy: Avro-specific strategy that uses the full name of the schema.
TopicRecordIdStrategy: Avro-specific strategy that uses the topic name and the full name of the schema.
TopicIdStrategy: Default strategy that uses the topic name and key or value suffix.
SimpleTopicIdStrategy: Simple strategy that only uses the topic name.
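To compare the naming conventions in the list above, the following sketch derives an artifact ID for each strategy from a topic name and a schema full name. The helper class ArtifactIdSketch is hypothetical and only follows the descriptions in this section; the real strategy classes may format or split the identifiers differently (for example, the separator used when combining the topic and schema name is an assumption).

```java
// Hypothetical helpers that follow the strategy descriptions, not the real classes.
final class ArtifactIdSketch {
    // TopicIdStrategy: topic name plus a key or value suffix.
    static String topicId(String topic, boolean isKey) {
        return String.format("%s-%s", topic, isKey ? "key" : "value");
    }

    // SimpleTopicIdStrategy: topic name only.
    static String simpleTopicId(String topic) {
        return topic;
    }

    // RecordIdStrategy: full name of the schema.
    static String recordId(String schemaFullName) {
        return schemaFullName;
    }

    // TopicRecordIdStrategy: topic name and full schema name (separator assumed).
    static String topicRecordId(String topic, String schemaFullName) {
        return topic + "-" + schemaFullName;
    }

    public static void main(String[] args) {
        String topic = "share-prices";
        String schemaFullName = "com.example.price";
        System.out.println(topicId(topic, false));
        System.out.println(simpleTopicId(topic));
        System.out.println(recordId(schemaFullName));
        System.out.println(topicRecordId(topic, schemaFullName));
    }
}
```

The record-based conventions are useful when several topics carry the same message type, because the artifact ID no longer depends on the topic alone.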

DefaultSchemaResolver interface

The default schema resolver locates and identifies the specific version of the schema registered under the artifact reference provided by the artifact resolver strategy. Every version of every artifact has a single globally unique identifier that can be used to retrieve the content of that artifact. This global ID is included in every Kafka message so that a deserializer can properly fetch the schema from Service Registry.

The default schema resolver can look up an existing artifact version, or it can register one if not found, depending on which strategy is used. You can also provide your own strategy by creating a custom Java class that implements io.apicurio.registry.serde.SchemaResolver. However, it is recommended to use the DefaultSchemaResolver and specify configuration properties instead.

Configuration for registry lookup options

When using the DefaultSchemaResolver, you can configure its behavior using application properties. The following table shows some commonly used examples:

Table 7.1. Service Registry lookup configuration options
Property	Type	Description	Default
`apicurio.registry.find-latest`	`boolean`	Specify whether the serializer tries to find the latest artifact in the registry for the corresponding group ID and artifact ID.	`false`
`apicurio.registry.use-id`	`String`	Instructs the serializer to write the specified ID to Kafka and instructs the deserializer to use this ID to find the schema.	None
`apicurio.registry.auto-register`	`boolean`	Specify whether the serializer tries to create an artifact in the registry. The JSON Schema serializer does not support this.	`false`
`apicurio.registry.check-period-ms`	`String`	Specify how long to cache the global ID in milliseconds. If not configured, the global ID is fetched every time.	None
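As a rough illustration, the options in Table 7.1 can be collected in a java.util.Properties object and merged into the producer or consumer configuration. The property names come from the table; the class name LookupOptionsSketch and the chosen values are assumptions for the example, and passing the values as strings is also an assumption about how the SerDes parse them.

```java
import java.util.Properties;

// Hypothetical helper that gathers the lookup options from Table 7.1.
final class LookupOptionsSketch {
    static Properties lookupOptions() {
        Properties props = new Properties();
        // Look up the latest artifact for the resolved group ID and artifact ID.
        props.setProperty("apicurio.registry.find-latest", "true");
        // Do not create artifacts from the serializer (the documented default).
        props.setProperty("apicurio.registry.auto-register", "false");
        // Cache the global ID for 30 seconds instead of fetching it every time.
        props.setProperty("apicurio.registry.check-period-ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        lookupOptions().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Setting a check period is usually worthwhile in high-throughput producers, because without it the global ID is fetched for every message.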

7.3. Registering a schema in Service Registry

After you have defined a schema in the appropriate format, such as Apache Avro, you can add the schema to Service Registry.

You can add the schema using the following approaches:

Service Registry web console
curl command using the Service Registry REST API
Maven plug-in supplied with Service Registry
Schema configuration added to your client code

Client applications cannot use Service Registry until you have registered your schemas.

Service Registry web console

When Service Registry is installed, you can connect to the web console from the ui endpoint:

http://MY-REGISTRY-URL/ui

From the console, you can add, view and configure schemas. You can also create the rules that prevent invalid content being added to the registry.

Curl command example

 curl -X POST -H "Content-type: application/json; artifactType=AVRO" \
   -H "X-Registry-ArtifactId: share-price" \
   --data '{
     "type":"record",
     "name":"price",
     "namespace":"com.example",
     "fields":[{"name":"symbol","type":"string"},
     {"name":"price","type":"string"}]}' \
   https://my-cluster-my-registry-my-project.example.com/apis/registry/v2/groups/my-group/artifacts -s

The --data payload is a simple Avro schema artifact.
The host name in the URL is the OpenShift route name that exposes Service Registry.
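The same registration request can be built from Java with the JDK HTTP client (Java 11 or later). This is a sketch under the assumption that the endpoint, headers, and payload match the curl example; the class name RegisterSchemaRequest is hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds the POST request from the curl example.
final class RegisterSchemaRequest {
    // The Avro schema from the curl example, as a single JSON string.
    static final String SCHEMA =
            "{\"type\":\"record\",\"name\":\"price\",\"namespace\":\"com.example\","
            + "\"fields\":[{\"name\":\"symbol\",\"type\":\"string\"},"
            + "{\"name\":\"price\",\"type\":\"string\"}]}";

    // Builds the request for the artifacts endpoint of a given group.
    static HttpRequest build(String artifactsUrl) {
        return HttpRequest.newBuilder(URI.create(artifactsUrl))
                .header("Content-Type", "application/json; artifactType=AVRO")
                .header("X-Registry-ArtifactId", "share-price")
                .POST(HttpRequest.BodyPublishers.ofString(SCHEMA))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build(
                "https://my-cluster-my-registry-my-project.example.com"
                + "/apis/registry/v2/groups/my-group/artifacts");
        System.out.println(request.method() + " " + request.uri());
        // To send the request:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```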

Maven plug-in example

<plugin>
  <groupId>io.apicurio</groupId>
  <artifactId>apicurio-registry-maven-plugin</artifactId>
  <version>${apicurio.version}</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>register</goal> <!-- 1 -->
      </goals>
      <configuration>
        <registryUrl>http://REGISTRY-URL/apis/registry/v2</registryUrl> <!-- 2 -->
        <artifacts>
          <artifact>
            <groupId>TestGroup</groupId> <!-- 3 -->
            <artifactId>FullNameRecord</artifactId>
            <file>${project.basedir}/src/main/resources/schemas/record.avsc</file>
            <ifExists>FAIL</ifExists>
          </artifact>
          <artifact>
            <groupId>TestGroup</groupId>
            <artifactId>ExampleAPI</artifactId> <!-- 4 -->
            <type>GRAPHQL</type>
            <file>${project.basedir}/src/main/resources/apis/example.graphql</file>
            <ifExists>RETURN_OR_UPDATE</ifExists>
            <canonicalize>true</canonicalize>
          </artifact>
        </artifacts>
      </configuration>
    </execution>
  </executions>
</plugin>

1: Specify register as the execution goal to upload the schema artifact to the registry.
2: Specify the Service Registry URL with the ../apis/registry/v2 endpoint.
3: Specify the Service Registry artifact group ID.
4: You can upload multiple artifacts using the specified group ID, artifact ID, and location.

Configuration using a producer client example

String registryUrl_node1 = PropertiesUtil.property(clientProperties, "registry.url.node1",
    "https://my-cluster-service-registry-myproject.example.com/apis/registry/v2"); // 1
try (RegistryService service = RegistryClient.create(registryUrl_node1)) {
    String artifactId = ApplicationImpl.INPUT_TOPIC + "-value";
    try {
        service.getArtifactMetaData(artifactId); // 2
    } catch (WebApplicationException e) {
        // The artifact was not found, so register the schema
        CompletionStage<ArtifactMetaData> csa = service.createArtifact(
            ArtifactType.AVRO,
            artifactId,
            new ByteArrayInputStream(LogInput.SCHEMA$.toString().getBytes())
        );
        csa.toCompletableFuture().get();
    }
}

1: You can register properties against more than one URL node.
2: Check to see if the schema already exists based on the artifact ID.

7.4. Using a schema from a Kafka consumer client

This procedure describes how to configure a Kafka consumer client written in Java to use a schema from Service Registry.

Prerequisites

Service Registry is installed
The schema is registered with Service Registry

Procedure

Configure the client with the URL of Service Registry. For example:

String registryUrl = "https://registry.example.com/apis/registry/v2";
Properties props = new Properties();
props.putIfAbsent(SerdeConfig.REGISTRY_URL, registryUrl);

Configure the client with the Service Registry deserializer. For example:

// Configure Kafka settings
props.putIfAbsent(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, SERVERS);
props.putIfAbsent(ConsumerConfig.GROUP_ID_CONFIG, "Consumer-" + TOPIC_NAME);
props.putIfAbsent(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.putIfAbsent(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.putIfAbsent(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
// Configure deserializer settings
props.putIfAbsent(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
    AvroKafkaDeserializer.class.getName()); // 1
props.putIfAbsent(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
    AvroKafkaDeserializer.class.getName()); // 2

1: The deserializer provided by Service Registry.
2: The deserialization is in Apache Avro JSON format.

7.5. Using a schema from a Kafka producer client

This procedure describes how to configure a Kafka producer client written in Java to use a schema from Service Registry.

Prerequisites

Service Registry is installed
The schema is registered with Service Registry

Procedure

Configure the client with the URL of Service Registry. For example:

String registryUrl = "https://registry.example.com/apis/registry/v2";
Properties props = new Properties();
props.putIfAbsent(SerdeConfig.REGISTRY_URL, registryUrl);

Configure the client with the serializer, and the strategy to look up the schema in Service Registry. For example:

props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "my-cluster-kafka-bootstrap:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, AvroKafkaSerializer.class.getName()); // 1
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, AvroKafkaSerializer.class.getName()); // 2
props.put(SerdeConfig.FIND_LATEST_ARTIFACT, Boolean.TRUE); // 3

1: The serializer for the message key provided by Service Registry.
2: The serializer for the message value provided by Service Registry.
3: Enables lookup of the latest artifact version to find the global ID for the schema.

7.6. Using a schema from a Kafka Streams application

This procedure describes how to configure a Kafka Streams client written in Java to use an Apache Avro schema from Service Registry.

Prerequisites

Service Registry is installed
The schema is registered with Service Registry

Procedure

Create and configure a Java client with the Service Registry URL:

String registryUrl = "https://registry.example.com/apis/registry/v2";

RegistryService client = RegistryClient.cached(registryUrl);

Configure the serializer and deserializer:

Serializer<LogInput> serializer = new AvroKafkaSerializer<LogInput>(); // 1
Deserializer<LogInput> deserializer = new AvroKafkaDeserializer<LogInput>(); // 2

Serde<LogInput> logSerde = Serdes.serdeFrom(
    serializer,
    deserializer
);

Map<String, Object> config = new HashMap<>();
config.put(SerdeConfig.REGISTRY_URL, registryUrl);
config.put(AvroKafkaSerdeConfig.USE_SPECIFIC_AVRO_READER, true);
logSerde.configure(config, false); // 3

1: The Avro serializer provided by Service Registry.
2: The Avro deserializer provided by Service Registry.
3: Configures the Service Registry URL and the Avro reader for deserialization in Avro format.

Create the Kafka Streams client:

KStream<String, LogInput> input = builder.stream(
    INPUT_TOPIC,
    Consumed.with(Serdes.String(), logSerde)
);
