Chapter 3. Getting started

Streams for Apache Kafka is distributed in a ZIP file that contains installation artifacts for the Kafka components.

Note

The Kafka Bridge has separate installation files. For information on installing and using the Kafka Bridge, see Using the Streams for Apache Kafka Bridge.

3.1. Installation environment

Streams for Apache Kafka runs on Red Hat Enterprise Linux. The host (node) can be a physical or virtual machine (VM). Use the installation files provided with Streams for Apache Kafka to install Kafka components. You can install Kafka in a single-node or multi-node environment.

Single-node environment
A single-node Kafka cluster runs instances of Kafka components on a single host. This configuration is not suitable for a production environment.
Multi-node environment
A multi-node Kafka cluster runs instances of Kafka components on multiple hosts.

We recommend that you run Kafka and other Kafka components, such as Kafka Connect, on separate hosts. Running the components in this way makes each component easier to maintain and upgrade.

Kafka clients establish a connection to the Kafka cluster using the bootstrap.servers configuration property. If you are using Kafka Connect, for example, the Kafka Connect configuration properties must include a bootstrap.servers value that specifies the hostname and port of the hosts where the Kafka brokers are running. If the Kafka cluster is running on more than one host with multiple Kafka brokers, you specify a hostname and port for each broker. Each Kafka broker is identified by a node.id.
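For example, a Kafka Connect worker connecting to a three-broker cluster might include a configuration line like the following. This is a sketch; the hostnames are placeholders for your own broker hosts, and the brokers are assumed to listen on the default port 9092:

```properties
# Hypothetical broker hostnames; replace with your own
bootstrap.servers=kafka0.example.com:9092,kafka1.example.com:9092,kafka2.example.com:9092
```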

3.1.1. Data storage considerations

An efficient data storage infrastructure is essential to the optimal performance of Streams for Apache Kafka.

Block storage is required. File storage, such as NFS, does not work with Kafka.

Choose one of the following options for your block storage:

  • Cloud-based block storage solutions, such as Amazon Elastic Block Store (EBS)
  • Local storage
  • Storage Area Network (SAN) volumes accessed by a protocol such as Fibre Channel or iSCSI

3.1.2. File systems

Kafka uses a file system for storing messages. Streams for Apache Kafka is compatible with the XFS and ext4 file systems, which are commonly used with Kafka. Consider the underlying architecture and requirements of your deployment when choosing and setting up your file system.

For more information, refer to Filesystem Selection in the Kafka documentation.
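As an illustration, an XFS data volume for Kafka is commonly mounted with the noatime option, which avoids access-time writes that Kafka does not need. The device name and mount point below are assumptions for this example:

```
# Hypothetical /etc/fstab entry; /dev/sdb1 is an example device
/dev/sdb1  /var/lib/kafka  xfs  defaults,noatime  0 0
```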

3.2. Downloading Streams for Apache Kafka

A ZIP file distribution of Streams for Apache Kafka is available for download from the Red Hat website. You can download the latest version of Red Hat Streams for Apache Kafka from the Streams for Apache Kafka software downloads page.

  • For Kafka and other Kafka components, download the amq-streams-<version>-bin.zip file.
  • For Kafka Bridge, download the amq-streams-<version>-bridge-bin.zip file.

    For installation instructions, see Using the Streams for Apache Kafka Bridge.

3.3. Installing Kafka

Use the Streams for Apache Kafka ZIP files to install Kafka on Red Hat Enterprise Linux. You can install Kafka in a single-node or multi-node environment. In this procedure, a single Kafka instance is installed on a single host (node).

The Streams for Apache Kafka installation files include the binaries for running other Kafka components, like Kafka Connect, Kafka MirrorMaker 2, and Kafka Bridge. In a single-node environment, you can run these components from the same host where you installed Kafka. However, we recommend that you add the installation files and run other Kafka components on separate hosts.

Prerequisites

  • You have downloaded the Streams for Apache Kafka installation files (see Section 3.2, “Downloading Streams for Apache Kafka”).

Procedure

Install Kafka on your host.

  1. Add a new kafka user and group:

    groupadd kafka
    useradd -g kafka kafka
    passwd kafka
  2. Extract and move the contents of the amq-streams-<version>-bin.zip file into the /opt/kafka directory:

    unzip amq-streams-<version>-bin.zip -d /opt
    mv /opt/kafka*redhat* /opt/kafka
  3. Change the ownership of the /opt/kafka directory to the kafka user:

    chown -R kafka:kafka /opt/kafka
  4. Create directory /var/lib/kafka for storing Kafka data and set its ownership to the kafka user:

    mkdir /var/lib/kafka
    chown -R kafka:kafka /var/lib/kafka

    You can now run a default configuration of Kafka as a single-node cluster.

    You can also use the installation to run other Kafka components, like Kafka Connect, on the same host.

    To run other components, specify the hostname and port to connect to the Kafka broker using the bootstrap.servers property in the component configuration.

    Example bootstrap servers configuration pointing to a single Kafka broker on the same host

    bootstrap.servers=localhost:9092

    However, we recommend installing and running Kafka components on separate hosts.

  5. (Optional) Install Kafka components on separate hosts.

    1. Extract the installation files to the /opt/kafka directory on each host.
    2. Change the ownership of the /opt/kafka directory to the kafka user.
    3. Add bootstrap.servers configuration that connects the component to the host (or hosts in a multi-node environment) running the Kafka brokers.

      Example bootstrap servers configuration pointing to Kafka brokers on different hosts

      bootstrap.servers=kafka0.<host_ip_address>:9092,kafka1.<host_ip_address>:9092,kafka2.<host_ip_address>:9092

      You can use this configuration for Kafka Connect, MirrorMaker 2, and the Kafka Bridge.

3.4. Running a Kafka cluster in KRaft mode

Configure and run Kafka in KRaft mode. You can run Kafka as a single-node or multi-node Kafka cluster. For stability and availability, run a minimum of three broker nodes and three controller nodes, with topic replication across the brokers.

Kafka nodes perform the role of broker, controller, or both.

Broker role
A broker, sometimes referred to as a node or server, orchestrates the storage and passing of messages.
Controller role
A controller coordinates the cluster and manages the metadata used to track the status of brokers and partitions.
Note

Cluster metadata is stored in the internal __cluster_metadata topic.

You can use combined broker and controller nodes, though you might want to separate these functions. Combined nodes can be more convenient in simpler deployments.

To identify a cluster, you create an ID. The ID is used when creating logs for the nodes you add to the cluster.

Specify the following in the configuration of each node:

  • A node ID
  • The roles the node performs (broker, controller, or both)
  • A list of nodes (or voters) that act as controllers

You specify a list of controllers, configured as voters, using the node ID and connection details (hostname and port) for each controller.

You apply the configuration for each node, including its roles, using a configuration properties file. The configuration differs according to role. Three example configuration properties files are provided:

  • /opt/kafka/config/kraft/broker.properties has example configuration for a broker role
  • /opt/kafka/config/kraft/controller.properties has example configuration for a controller role
  • /opt/kafka/config/kraft/server.properties has example configuration for a combined role

You can base your broker configuration on these example properties files. In this procedure, the example server.properties configuration is used.
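Bringing the settings from the procedure below together, a minimal combined-role configuration for a single node on localhost might look as follows. This is a sketch based on the example server.properties file; the log directory is an assumption (see step 3 for the default):

```properties
# Minimal combined-role KRaft configuration (sketch)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
advertised.listeners=PLAINTEXT://localhost:9092
log.dirs=/var/lib/kafka/kraft-combined-logs
```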

Prerequisites

  • Streams for Apache Kafka is installed on the host (see Section 3.3, “Installing Kafka”).

Procedure

  1. Generate a unique ID for the Kafka cluster.

    You can use the kafka-storage tool to do this:

    /opt/kafka/bin/kafka-storage.sh random-uuid

    The command returns an ID. A cluster ID is required in KRaft mode.

  2. Create a configuration properties file for each node in the cluster.

    You can base the file on the examples provided with Kafka.

    1. Specify the role as broker, controller, or both (broker,controller).

      For example, specify process.roles=broker,controller for a combined role.

    2. Specify a unique node.id for each node in the cluster starting from 0.

      For example, node.id=1.

    3. Specify a list of controller.quorum.voters in the format <node_id>@<hostname>:<port>.

      For example, controller.quorum.voters=1@localhost:9093.

    4. Specify listeners:

      • Configure the name, hostname and port for each listener.

        For example, listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093.

      • Configure the listener names used for inter-broker communication.

        For example, inter.broker.listener.name=PLAINTEXT.

      • Configure the listener names used by the controller quorum.

        For example, controller.listener.names=CONTROLLER.

      • Configure the name, hostname and port for each listener that is advertised to clients for connection to Kafka.

        For example, advertised.listeners=PLAINTEXT://localhost:9092.

  3. Set up log directories for each node in your Kafka cluster:

    /opt/kafka/bin/kafka-storage.sh format -t <uuid> -c /opt/kafka/config/kraft/server.properties

    Returns:

    Formatting /tmp/kraft-combined-logs

    Replace <uuid> with the cluster ID you generated. Use the same ID for each node in your cluster.

    Apply the broker configuration using the properties file you created for the broker.

    By default, the log directory (log.dirs) specified in the server.properties configuration file is set to /tmp/kraft-combined-logs. The /tmp directory is typically cleared on each system reboot, making it suitable for development environments only.

    You can set log.dirs to a comma-separated list of directories to use multiple log directories.

  4. Start each Kafka node.

    /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
  5. Check that Kafka is running:

    jcmd | grep kafka

    Returns:

    process ID kafka.Kafka /opt/kafka/config/kraft/server.properties

    Check the logs of each node to ensure that they have successfully joined the KRaft cluster:

    tail -f /opt/kafka/logs/server.log

You can now create topics, and send and receive messages from the brokers.

For brokers passing messages, you can use topic replication across the brokers in a cluster for data durability. Configure topics to have a replication factor of at least three and a minimum number of in-sync replicas set to 1 less than the replication factor. For more information, see Section 7.7, “Creating a topic”.
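For example, the replication settings described above (a replication factor of 3 with min.insync.replicas set to one less) can be derived and passed to kafka-topics.sh. This sketch only prints the command to run against a live cluster; the topic name is hypothetical:

```shell
# Sketch: derive min.insync.replicas (RF - 1) and print the
# kafka-topics.sh command to create a topic with those settings.
RF=3
MIN_ISR=$((RF - 1))
echo "/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 3 --replication-factor $RF --config min.insync.replicas=$MIN_ISR"
```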

3.5. Stopping the Streams for Apache Kafka services

You can stop Kafka services by running a script. After running the script, all connections to the Kafka services are terminated.

Procedure

  1. Stop the Kafka node.

    su - kafka
    /opt/kafka/bin/kafka-server-stop.sh
  2. Confirm that the Kafka node is stopped.

    jcmd | grep kafka

3.6. Performing a graceful rolling restart of Kafka brokers

This procedure shows how to do a graceful rolling restart of brokers in a multi-node cluster. A rolling restart is usually required following an upgrade or change to the Kafka cluster configuration properties.

Note

Some broker configurations do not need a restart of the broker. For more information, see Updating Broker Configs in the Apache Kafka documentation.

After you perform a restart of a broker, check for under-replicated topic partitions to make sure that replica partitions have caught up.

To achieve a graceful restart with no loss of availability, ensure that you are replicating topics and that at least the minimum number of in-sync replicas (min.insync.replicas) are in sync. The min.insync.replicas configuration determines the minimum number of replicas that must acknowledge a write for the write to be considered successful.

For a multi-node cluster, the standard approach is to have a topic replication factor of at least 3 and a minimum number of in-sync replicas set to 1 less than the replication factor. If you are using acks=all in your producer configuration for data durability, check that the broker you restarted is in sync with all the partitions it’s replicating before restarting the next broker.
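For reference, a producer configured for durability in this way might include the following properties. This is a sketch; other settings are left at their defaults:

```properties
# Producer durability settings (sketch)
acks=all
enable.idempotence=true
```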

Single-node clusters are unavailable during a restart, since all partitions are on the same broker.

Prerequisites

  • Streams for Apache Kafka is installed on each host, and the configuration files are available.
  • The Kafka cluster is operating as expected.

    Check for under-replicated partitions or any other issues affecting broker operation. The steps in this procedure describe how to check for under-replicated partitions.

Procedure

Perform the following steps on each Kafka broker. Complete the steps on the first broker before moving on to the next. Perform the steps on the brokers that also act as controllers last. Otherwise, the active controller has to change more than once during the rolling restart.

  1. Stop the Kafka broker:

    /opt/kafka/bin/kafka-server-stop.sh
  2. Make any changes to the broker configuration that require a restart after completion.

  3. Restart the Kafka broker:

    /opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/kraft/server.properties
  4. Check that Kafka is running:

    jcmd | grep kafka

    Returns:

    process ID kafka.Kafka /opt/kafka/config/kraft/server.properties

    Check the logs of each node to ensure that they have successfully joined the KRaft cluster:

    tail -f /opt/kafka/logs/server.log
  5. Wait until the broker has zero under-replicated partitions. You can check from the command line or use metrics.

    • Use the kafka-topics.sh command with the --under-replicated-partitions parameter:

      /opt/kafka/bin/kafka-topics.sh --bootstrap-server <broker_host>:<port> --describe --under-replicated-partitions

      For example:

      /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions

      The command provides a list of topics with under-replicated partitions in a cluster.

      Topics with under-replicated partitions

      Topic: topic3 Partition: 4 Leader: 2 Replicas: 2,3 Isr: 2
      Topic: topic3 Partition: 5 Leader: 1 Replicas: 1,2 Isr: 1
      Topic: topic1 Partition: 1 Leader: 3 Replicas: 1,3 Isr: 3
      # …

      Under-replicated partitions are listed if the ISR (in-sync replica) count is less than the number of replicas. If a list is not returned, there are no under-replicated partitions.

    • Use the UnderReplicatedPartitions metric:

      kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

      The metric provides a count of partitions where replicas have not caught up. Wait until the count is zero.

      Tip

      Use the Kafka Exporter to create an alert when there are one or more under-replicated partitions for a topic.

Checking logs when restarting

If a broker fails to start, check the application logs for information. You can also check the status of a broker shutdown and restart in the /opt/kafka/logs/server.log application log.
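The under-replicated check in step 5 can be scripted. The following sketch counts partitions in kafka-topics.sh --describe --under-replicated-partitions output. Here the output is simulated with a here-document; against a live cluster you would capture the command's output instead:

```shell
# Simulated output of:
#   /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
#     --describe --under-replicated-partitions
cat <<'EOF' > /tmp/under-replicated.txt
Topic: topic3 Partition: 4 Leader: 2 Replicas: 2,3 Isr: 2
Topic: topic1 Partition: 1 Leader: 3 Replicas: 1,3 Isr: 3
EOF

# Each matching line is one under-replicated partition; a count of
# zero means it is safe to restart the next broker.
URP_COUNT=$(grep -c '^Topic:' /tmp/under-replicated.txt)
echo "Under-replicated partitions: $URP_COUNT"
```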

© 2024 Red Hat, Inc.