Chapter 3. Getting started
Streams for Apache Kafka is distributed in a ZIP file that contains installation artifacts for the Kafka components.
The Kafka Bridge has separate installation files. For information on installing and using the Kafka Bridge, see Using the Streams for Apache Kafka Bridge.
3.1. Installation environment
Streams for Apache Kafka runs on Red Hat Enterprise Linux. The host (node) can be a physical or virtual machine (VM). Use the installation files provided with Streams for Apache Kafka to install Kafka components. You can install Kafka in a single-node or multi-node environment.
- Single-node environment
- A single-node Kafka cluster runs instances of Kafka components on a single host. This configuration is not suitable for a production environment.
- Multi-node environment
- A multi-node Kafka cluster runs instances of Kafka components on multiple hosts.
We recommend that you run Kafka and other Kafka components, such as Kafka Connect, on separate hosts. Running the components in this way makes it easier to maintain and upgrade each component.
Kafka clients establish a connection to the Kafka cluster using the bootstrap.servers configuration property. If you are using Kafka Connect, for example, the Kafka Connect configuration properties must include a bootstrap.servers value that specifies the hostname and port of the hosts where the Kafka brokers are running. If the Kafka cluster is running on more than one host with multiple Kafka brokers, you specify a hostname and port for each broker. Each Kafka broker is identified by a node.id.
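For example, a minimal sketch of the bootstrap.servers setting in a Kafka Connect worker configuration connecting to a three-broker cluster might look like this (the hostnames are assumptions for illustration):
bootstrap.servers=kafka0.example.com:9092,kafka1.example.com:9092,kafka2.example.com:9092
Each address only needs to reach one live broker for the client to discover the rest of the cluster, but listing several brokers avoids a single point of failure during the initial connection.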
3.1.1. Data storage considerations
An efficient data storage infrastructure is essential to the optimal performance of Streams for Apache Kafka.
Block storage is required. File storage, such as NFS, does not work with Kafka.
Choose from one of the following options for your block storage:
- Cloud-based block storage solutions, such as Amazon Elastic Block Store (EBS)
- Local storage
- Storage Area Network (SAN) volumes accessed by a protocol such as Fibre Channel or iSCSI
3.1.2. File systems
Kafka uses a file system for storing messages. Streams for Apache Kafka is compatible with the XFS and ext4 file systems, which are commonly used with Kafka. Consider the underlying architecture and requirements of your deployment when choosing and setting up your file system.
For more information, refer to Filesystem Selection in the Kafka documentation.
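For example, the Kafka documentation notes that Kafka does not rely on file access times, so data volumes are commonly mounted with the noatime option to avoid unnecessary metadata writes. A hypothetical /etc/fstab entry for an XFS data volume might look like this (the device name and mount point are assumptions):
/dev/sdb1  /var/lib/kafka  xfs  defaults,noatime  0 0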
3.2. Downloading Streams for Apache Kafka
A ZIP file distribution of Streams for Apache Kafka is available for download from the Red Hat website. You can download the latest version of Red Hat Streams for Apache Kafka from the Streams for Apache Kafka software downloads page.
- For Kafka and other Kafka components, download the amq-streams-<version>-bin.zip file.
- For Kafka Bridge, download the amq-streams-<version>-bridge-bin.zip file. For installation instructions, see Using the Streams for Apache Kafka Bridge.
3.3. Installing Kafka
Use the Streams for Apache Kafka ZIP files to install Kafka on Red Hat Enterprise Linux. You can install Kafka in a single-node or multi-node environment. In this procedure, a single Kafka instance is installed on a single host (node).
The Streams for Apache Kafka installation files include the binaries for running other Kafka components, like Kafka Connect, Kafka MirrorMaker 2, and Kafka Bridge. In a single-node environment, you can run these components from the same host where you installed Kafka. However, we recommend that you extract the installation files to separate hosts and run other Kafka components there.
Prerequisites
- You have downloaded the installation files.
- You have reviewed the supported configurations in the Streams for Apache Kafka 2.7 on Red Hat Enterprise Linux Release Notes.
- You are logged in to Red Hat Enterprise Linux as the admin (root) user.
Procedure
Install Kafka on your host.
Add a new kafka user and group:
groupadd kafka
useradd -g kafka kafka
passwd kafka
Extract and move the contents of the amq-streams-<version>-bin.zip file into the /opt/kafka directory:
unzip amq-streams-<version>-bin.zip -d /opt
mv /opt/kafka*redhat* /opt/kafka
Change the ownership of the /opt/kafka directory to the kafka user:
chown -R kafka:kafka /opt/kafka
Create the /var/lib/kafka directory for storing Kafka data and set its ownership to the kafka user:
mkdir /var/lib/kafka
chown -R kafka:kafka /var/lib/kafka
You can now run a default configuration of Kafka as a single-node cluster.
You can also use the installation to run other Kafka components, like Kafka Connect, on the same host.
To run other components, specify the hostname and port to connect to the Kafka broker using the bootstrap.servers property in the component configuration.
Example bootstrap servers configuration pointing to a single Kafka broker on the same host
bootstrap.servers=localhost:9092
However, we recommend installing and running Kafka components on separate hosts.
(Optional) Install Kafka components on separate hosts.
- Extract the installation files to the /opt/kafka directory on each host.
- Change the ownership of the /opt/kafka directory to the kafka user.
- Add bootstrap.servers configuration that connects the component to the host (or hosts in a multi-node environment) running the Kafka brokers.
Example bootstrap servers configuration pointing to Kafka brokers on different hosts
bootstrap.servers=kafka0.<host_ip_address>:9092,kafka1.<host_ip_address>:9092,kafka2.<host_ip_address>:9092
You can use this configuration for Kafka Connect, MirrorMaker 2, and the Kafka Bridge.
3.4. Running a Kafka cluster in KRaft mode
Configure and run Kafka in KRaft mode. You can run Kafka as a single-node or multi-node Kafka cluster. Run a minimum of three broker and three controller nodes, with topic replication across the brokers, for stability and availability.
Kafka nodes perform the role of broker, controller, or both.
- Broker role
- A broker, sometimes referred to as a node or server, orchestrates the storage and passing of messages.
- Controller role
- A controller coordinates the cluster and manages the metadata used to track the status of brokers and partitions.
Cluster metadata is stored in the internal __cluster_metadata topic.
You can use combined broker and controller nodes, though you might want to separate these functions. Nodes that perform combined roles can be more convenient in simpler deployments.
To identify a cluster, you create an ID. The ID is used when creating logs for the nodes you add to the cluster.
Specify the following in the configuration of each node:
- A node ID
- The roles the node performs
- A list of nodes (or voters) that act as controllers
You specify a list of controllers, configured as voters, using the node ID and connection details (hostname and port) for each controller.
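For example, a sketch of the voters list for a three-controller quorum might look like this (the node IDs and hostnames are assumptions for illustration):
controller.quorum.voters=1@kafka0.example.com:9093,2@kafka1.example.com:9093,3@kafka2.example.com:9093
Every node in the cluster must be configured with the same voters list.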
You apply broker configuration, including the setting of roles, using a configuration properties file. Broker configuration differs according to role. Three example configuration properties files are provided with the installation:
- /opt/kafka/config/kraft/broker.properties has example configuration for a broker role
- /opt/kafka/config/kraft/controller.properties has example configuration for a controller role
- /opt/kafka/config/kraft/server.properties has example configuration for a combined role
You can base your broker configuration on these example properties files. In this procedure, the example server.properties configuration is used.
Prerequisites
- Streams for Apache Kafka is installed on each host, and the configuration files are available.
Procedure
Generate a unique ID for the Kafka cluster.
You can use the kafka-storage tool to do this:
/opt/kafka/bin/kafka-storage.sh random-uuid
The command returns an ID. A cluster ID is required in KRaft mode.
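If you are scripting the setup, you can capture the ID in a shell variable so that the same value is reused when formatting storage on each node (a minimal sketch; the variable name is an assumption):
CLUSTER_ID=$(/opt/kafka/bin/kafka-storage.sh random-uuid)
echo $CLUSTER_ID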
Create a configuration properties file for each node in the cluster.
You can base the file on the examples provided with Kafka.
Specify a role as broker, controller, or broker, controller.
For example, specify process.roles=broker, controller for a combined role.
Specify a unique node.id for each node in the cluster starting from 0.
For example, node.id=1.
Specify a list of controller.quorum.voters in the format <node_id>@<hostname:port>.
For example, controller.quorum.voters=1@localhost:9093.
Specify listeners:
- Configure the name, hostname, and port for each listener.
For example, listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093.
- Configure the listener names used for inter-broker communication.
For example, inter.broker.listener.name=PLAINTEXT.
- Configure the listener names used by the controller quorum.
For example, controller.listener.names=CONTROLLER.
- Configure the name, hostname, and port for each listener that is advertised to clients for connection to Kafka.
For example, advertised.listeners=PLAINTEXT://localhost:9092.
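Taken together, a sketch of the configuration for a single combined-role node might look like the following. The values mirror the examples above and are illustrative only; the shipped example files also set other properties, such as log.dirs, which is discussed later in this procedure.
# Illustrative combined-role node configuration
process.roles=broker, controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
advertised.listeners=PLAINTEXT://localhost:9092
# Map the custom CONTROLLER listener name to a security protocol
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
The listener.security.protocol.map entry maps the CONTROLLER listener name to a security protocol, which is required because CONTROLLER is a listener name rather than a protocol name.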
Set up log directories for each node in your Kafka cluster:
/opt/kafka/bin/kafka-storage.sh format -t <uuid> -c /opt/kafka/config/kraft/server.properties
Returns:
Formatting /tmp/kraft-combined-logs
Replace <uuid> with the cluster ID you generated. Use the same ID for each node in your cluster.
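If you captured the cluster ID in a shell variable as sketched earlier, the format command for each node might look like this (substitute the path to the properties file you created for the node):
/opt/kafka/bin/kafka-storage.sh format -t "$CLUSTER_ID" -c /opt/kafka/config/kraft/server.properties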
Apply the broker configuration using the properties file you created for the broker.
By default, the log directory (log.dirs) specified in the server.properties configuration file is set to /tmp/kraft-combined-logs. The /tmp directory is typically cleared on each system reboot, making it suitable for development environments only. To set up multiple log directories, specify a comma-separated list of directories.
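For a longer-lived setup, you might instead point log.dirs at the data directory created during installation, for example (the subdirectory names are assumptions):
log.dirs=/var/lib/kafka/data-0,/var/lib/kafka/data-1
Placing each log directory on a separate disk spreads I/O across the disks.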
Start each Kafka node.
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/kraft/server.properties
Check that Kafka is running:
jcmd | grep kafka
Returns:
process ID kafka.Kafka /opt/kafka/config/kraft/server.properties
Check the logs of each node to ensure that they have successfully joined the KRaft cluster:
tail -f /opt/kafka/logs/server.log
You can now create topics, and send and receive messages from the brokers.
For brokers passing messages, you can use topic replication across the brokers in a cluster for data durability. Configure topics to have a replication factor of at least three and a minimum number of in-sync replicas set to 1 less than the replication factor. For more information, see Section 7.7, “Creating a topic”.
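As an illustration, the following commands create a replicated topic in line with this guidance and exchange messages using the console clients shipped with Kafka (the topic name is an assumption, and a replication factor of 3 assumes at least three brokers):
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 3 --replication-factor 3 --config min.insync.replicas=2
/opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic my-topic
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning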
3.5. Stopping the Streams for Apache Kafka services
You can stop Kafka services by running a script. After running the script, all connections to the Kafka services are terminated.
Procedure
Stop the Kafka node.
su - kafka
/opt/kafka/bin/kafka-server-stop.sh
Confirm that the Kafka node is stopped.
jcmd | grep kafka
3.6. Performing a graceful rolling restart of Kafka brokers
This procedure shows how to do a graceful rolling restart of brokers in a multi-node cluster. A rolling restart is usually required following an upgrade or change to the Kafka cluster configuration properties.
Some broker configurations do not need a restart of the broker. For more information, see Updating Broker Configs in the Apache Kafka documentation.
After you perform a restart of a broker, check for under-replicated topic partitions to make sure that replica partitions have caught up.
To achieve a graceful restart with no loss of availability, ensure that you are replicating topics and that at least the minimum number of replicas (min.insync.replicas) are in sync. The min.insync.replicas configuration determines the minimum number of replicas that must acknowledge a write for the write to be considered successful.
For a multi-node cluster, the standard approach is to have a topic replication factor of at least 3 and a minimum number of in-sync replicas set to 1 less than the replication factor. If you are using acks=all in your producer configuration for data durability, check that the broker you restarted is in sync with all the partitions it’s replicating before restarting the next broker.
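For reference, a sketch of producer durability settings might look like this in the producer configuration (illustrative; other settings are left at their defaults):
# Wait for all in-sync replicas to acknowledge each write
acks=all
enable.idempotence=true
With acks=all, a write succeeds only after all in-sync replicas acknowledge it; if the number of in-sync replicas falls below min.insync.replicas, the broker rejects the write rather than accepting it with reduced durability.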
Single-node clusters are unavailable during a restart, since all partitions are on the same broker.
Prerequisites
- Streams for Apache Kafka is installed on each host, and the configuration files are available.
- The Kafka cluster is operating as expected.
Check for under-replicated partitions or any other issues affecting broker operation. The steps in this procedure describe how to check for under-replicated partitions.
Procedure
Perform the following steps on each Kafka broker. Complete the steps on the first broker before moving on to the next. Perform the steps last on brokers that also act as controllers. Otherwise, the active controller needs to change on more than one restart.
Stop the Kafka broker:
/opt/kafka/bin/kafka-server-stop.sh
Make any changes to the broker configuration that require a restart after completion.
Restart the Kafka broker:
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/kraft/server.properties
Check that Kafka is running:
jcmd | grep kafka
Returns:
process ID kafka.Kafka /opt/kafka/config/kraft/server.properties
Check the logs of each node to ensure that they have successfully joined the KRaft cluster:
tail -f /opt/kafka/logs/server.log
Wait until the broker has zero under-replicated partitions. You can check from the command line or use metrics.
Use the kafka-topics.sh command with the --under-replicated-partitions parameter:
/opt/kafka/bin/kafka-topics.sh --bootstrap-server <broker_host>:<port> --describe --under-replicated-partitions
For example:
/opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --under-replicated-partitions
The command provides a list of topics with under-replicated partitions in a cluster.
Topics with under-replicated partitions
Topic: topic3 Partition: 4 Leader: 2 Replicas: 2,3 Isr: 2
Topic: topic3 Partition: 5 Leader: 3 Replicas: 1,2 Isr: 1
Topic: topic1 Partition: 1 Leader: 3 Replicas: 1,3 Isr: 3
# …
Under-replicated partitions are listed if the ISR (in-sync replica) count is less than the number of replicas. If a list is not returned, there are no under-replicated partitions.
Use the UnderReplicatedPartitions metric:
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
The metric provides a count of partitions where replicas have not caught up. Wait until the count is zero.
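One way to read this metric from the command line is with the JmxTool class bundled with Kafka; the exact invocation depends on your Kafka version and assumes the broker was started with remote JMX enabled (for example, by setting JMX_PORT=9999 — the port is an assumption):
# Query the under-replicated partition count over JMX (one-shot)
/opt/kafka/bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --one-time true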
Tip: Use the Kafka Exporter to create an alert when there are one or more under-replicated partitions for a topic.
Checking logs when restarting
If a broker fails to start, check the application logs for information. You can also check the status of a broker shutdown and restart in the /opt/kafka/logs/server.log application log.