Chapter 3. Getting started
Streams for Apache Kafka is distributed in a ZIP file that contains installation artifacts for the Kafka components.
The Kafka Bridge has separate installation files. For information on installing and using the Kafka Bridge, see Using the Streams for Apache Kafka Bridge.
3.1. Installation environment
Streams for Apache Kafka runs on Red Hat Enterprise Linux. The host (node) can be a physical or virtual machine (VM). Use the installation files provided with Streams for Apache Kafka to install Kafka components. You can install Kafka in a single-node or multi-node environment.
- Single-node environment
- A single-node Kafka cluster runs instances of Kafka components on a single host. This configuration is not suitable for a production environment.
- Multi-node environment
- A multi-node Kafka cluster runs instances of Kafka components on multiple hosts.
We recommend that you run Kafka and other Kafka components, such as Kafka Connect, on separate hosts. Running the components in this way makes it easier to maintain and upgrade each component.
Kafka clients establish a connection to the Kafka cluster using the bootstrap.servers configuration property. If you are using Kafka Connect, for example, the Kafka Connect configuration properties must include a bootstrap.servers value that specifies the hostname and port of the hosts where the Kafka brokers are running. If the Kafka cluster is running on more than one host with multiple Kafka brokers, you specify a hostname and port for each broker. Each Kafka broker is identified by a node.id.
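To illustrate the format, here is a minimal shell sketch that splits a multi-broker bootstrap.servers value into its host and port pairs. The broker host names are hypothetical placeholders, not part of any real cluster:

```shell
# Split a bootstrap.servers value into host:port pairs (POSIX sh).
# The host names below are hypothetical placeholders.
bootstrap_servers="kafka0.example.com:9092,kafka1.example.com:9092,kafka2.example.com:9092"

broker_count=0
for broker in $(printf '%s' "$bootstrap_servers" | tr ',' ' '); do
  host="${broker%:*}"   # everything before the last colon
  port="${broker##*:}"  # everything after the last colon
  echo "broker: $host port: $port"
  broker_count=$((broker_count + 1))
done
echo "$broker_count brokers configured"
```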
3.1.1. Data storage considerations
An efficient data storage infrastructure is essential to the optimal performance of Streams for Apache Kafka.
Block storage is required. File storage, such as NFS, does not work with Kafka.
Choose from one of the following options for your block storage:
- Cloud-based block storage solutions, such as Amazon Elastic Block Store (EBS)
- Local storage
- Storage Area Network (SAN) volumes accessed by a protocol such as Fibre Channel or iSCSI
3.1.2. File systems
Kafka uses a file system for storing messages. Streams for Apache Kafka is compatible with the XFS and ext4 file systems, which are commonly used with Kafka. Consider the underlying architecture and requirements of your deployment when choosing and setting up your file system.
For more information, refer to Filesystem Selection in the Kafka documentation.
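As a quick check of which file system backs a data directory, you can use the stat command. This sketch uses / as a stand-in path; on a broker you would check the Kafka log directory instead:

```shell
# Print the file system type backing a directory (GNU coreutils stat).
# "/" is a stand-in path; on a broker, check the log.dirs path instead.
data_dir="/"
fs_type="$(stat -f -c %T "$data_dir")"
echo "file system for $data_dir: $fs_type"
```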
3.1.3. Apache Kafka and ZooKeeper storage
Use separate disks for Apache Kafka and ZooKeeper.
Kafka supports JBOD (Just a Bunch of Disks) storage, a data storage configuration of multiple disks or volumes. JBOD provides increased data storage for Kafka brokers. It can also improve performance.
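For example, a JBOD setup maps each disk to its own log directory through the broker's log.dirs property. The mount points below are hypothetical:

```
log.dirs=/mnt/kafka-disk0/data,/mnt/kafka-disk1/data,/mnt/kafka-disk2/data
```

Kafka spreads partitions across the listed directories, so each directory should sit on a separate physical disk or volume.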
Solid-state drives (SSDs), though not essential, can improve the performance of Kafka in large clusters where data is sent to and received from multiple topics asynchronously. SSDs are particularly effective with ZooKeeper, which requires fast, low latency data access.
You do not need to provision replicated storage because Kafka and ZooKeeper both have built-in data replication.
3.2. Downloading Streams for Apache Kafka
A ZIP file distribution of Streams for Apache Kafka is available for download from the Red Hat website. You can download the latest version of Red Hat Streams for Apache Kafka from the Streams for Apache Kafka software downloads page.
- For Kafka and other Kafka components, download the amq-streams-<version>-kafka-bin.zip file.
- For Kafka Bridge, download the amq-streams-<version>-bridge-bin.zip file. For installation instructions, see Using the Streams for Apache Kafka Bridge.
3.3. Installing Kafka
Use the Streams for Apache Kafka ZIP files to install Kafka on Red Hat Enterprise Linux. You can install Kafka in a single-node or multi-node environment. In this procedure, a single Kafka broker and ZooKeeper instance are installed on a single host (node).
The Streams for Apache Kafka installation files include the binaries for running other Kafka components, like Kafka Connect, Kafka MirrorMaker 2, and Kafka Bridge. In a single-node environment, you can run these components from the same host where you installed Kafka. However, we recommend adding the installation files to separate hosts and running the other Kafka components there.
Apache ZooKeeper provides a highly reliable distributed coordination service. Kafka uses ZooKeeper for storing configuration data and for cluster coordination. The ZooKeeper cluster must be ready before you run Kafka.
If you are using a multi-node environment, you install Kafka brokers and ZooKeeper instances on more than one host. Repeat the installation steps for each host. To identify each ZooKeeper instance and broker, you add a unique ID in the configuration. For more information, see Chapter 4, Running a multi-node environment.
Prerequisites
- You have downloaded the installation files.
- You have reviewed the supported configurations in the Streams for Apache Kafka 2.8 on Red Hat Enterprise Linux Release Notes.
- You are logged in to Red Hat Enterprise Linux as the admin (root) user.
Procedure
Install Kafka with ZooKeeper on your host.
Add a new kafka user and group:

groupadd kafka
useradd -g kafka kafka
passwd kafka

Extract and move the contents of the amq-streams-<version>-kafka-bin.zip file into the /opt/kafka directory:

unzip amq-streams-<version>-kafka-bin.zip -d /opt
mv /opt/kafka*redhat* /opt/kafka

Change the ownership of the /opt/kafka directory to the kafka user:

chown -R kafka:kafka /opt/kafka

Create the /var/lib/zookeeper directory for storing ZooKeeper data and set its ownership to the kafka user:

mkdir /var/lib/zookeeper
chown -R kafka:kafka /var/lib/zookeeper

Create the /var/lib/kafka directory for storing Kafka data and set its ownership to the kafka user:

mkdir /var/lib/kafka
chown -R kafka:kafka /var/lib/kafka

You can now run a default configuration of Kafka as a single-node cluster.
You can also use the installation to run other Kafka components, like Kafka Connect, on the same host.
To run other components, specify the hostname and port to connect to the Kafka broker using the bootstrap.servers property in the component configuration.

Example bootstrap servers configuration pointing to a single Kafka broker on the same host:

bootstrap.servers=localhost:9092

However, we recommend installing and running Kafka components on separate hosts.
(Optional) Install Kafka components on separate hosts.
- Repeat the steps to extract and install the installation files to the /opt/kafka directory on each host.
- Add bootstrap.servers configuration that connects the component to the host (or hosts in a multi-node environment) running the Kafka brokers.

Example bootstrap servers configuration pointing to Kafka brokers on different hosts:

bootstrap.servers=kafka0.<host_ip_address>:9092,kafka1.<host_ip_address>:9092,kafka2.<host_ip_address>:9092

You can use this configuration for Kafka Connect, MirrorMaker 2, and the Kafka Bridge.
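Building the comma-separated broker list by hand is error-prone. Here is a small sketch that assembles a bootstrap.servers value from a list of broker hosts (hypothetical host names, all assumed to listen on port 9092):

```shell
# Assemble a bootstrap.servers value from broker hosts (POSIX sh).
# Host names are hypothetical placeholders.
hosts="kafka0.example.com kafka1.example.com kafka2.example.com"
port=9092

bootstrap=""
for h in $hosts; do
  # Prepend a comma only when the list is non-empty.
  bootstrap="${bootstrap:+$bootstrap,}$h:$port"
done
echo "bootstrap.servers=$bootstrap"
```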
3.4. Running a single-node Kafka cluster
This procedure shows how to run a basic Streams for Apache Kafka cluster consisting of a single Apache ZooKeeper node and a single Apache Kafka node, both running on the same host. The default configuration files are used for Kafka.
A single-node Streams for Apache Kafka cluster does not provide reliability and high availability and is suitable only for development purposes.
Prerequisites
- Streams for Apache Kafka is installed on the host
Running the cluster
Generate a unique ID for the Kafka cluster.
You can use the kafka-storage tool to do this:

/opt/kafka/bin/kafka-storage.sh random-uuid

The command returns an ID.

Note: A cluster ID is required in KRaft mode.
Edit the Kafka configuration file /opt/kafka/config/server.properties. Set the log.dirs option to /var/lib/kafka/:

log.dirs=/var/lib/kafka/

Switch to the kafka user:

su - kafka

Start Kafka:

/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties

Check that Kafka is running:

jcmd | grep kafka

Returns:

process ID kafka.Kafka /opt/kafka/config/server.properties
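The log.dirs edit can also be scripted. A minimal sketch, run here against a throwaway copy rather than the real /opt/kafka/config/server.properties:

```shell
# Set log.dirs in a properties file with sed (GNU sed -i).
# A temporary stand-in file is used; a real install edits
# /opt/kafka/config/server.properties as a privileged user.
conf="$(mktemp)"
printf 'log.dirs=/tmp/kafka-logs\nnum.partitions=1\n' > "$conf"

sed -i 's|^log\.dirs=.*|log.dirs=/var/lib/kafka/|' "$conf"

result="$(grep '^log.dirs=' "$conf")"
echo "$result"
```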
3.5. Sending and receiving messages from a topic
This procedure describes how to start the Kafka console producer and consumer clients and use them to send and receive several messages.
A new topic is automatically created in step one. Topic auto-creation is controlled using the auto.create.topics.enable configuration property (set to true by default). Alternatively, you can configure and create topics before using the cluster. For more information, see Topics.
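For instance, to require explicit topic creation instead of relying on auto-creation, you could set the property in the broker configuration (a common choice for production, shown here as a hypothetical server.properties fragment):

```
# server.properties: disable automatic topic creation
auto.create.topics.enable=false
```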
Procedure
Start the Kafka console producer and configure it to send messages to a new topic:
/opt/kafka/bin/kafka-console-producer.sh --broker-list <bootstrap_address> --topic <topic-name>

For example:

/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic

Enter several messages into the console. Press Enter to send each individual message to your new topic:

>message 1
>message 2
>message 3
>message 4

When Kafka creates a new topic automatically, you might receive a warning that the topic does not exist:

WARN Error while fetching metadata with correlation id 39 : {4-3-16-topic1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)

The warning should not reappear after you send further messages.
In a new terminal window, start the Kafka console consumer and configure it to read messages from the beginning of your new topic:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server <bootstrap_address> --topic <topic-name> --from-beginning

For example:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-topic --from-beginning

The incoming messages display in the consumer console.
- Switch to the producer console and send additional messages. Check that they display in the consumer console.
- Stop the Kafka console producer and then the consumer by pressing Ctrl+C.
3.6. Stopping the Streams for Apache Kafka services
You can stop the Kafka and ZooKeeper services by running a script. All connections to the Kafka and ZooKeeper services will be terminated.
Prerequisites
- Streams for Apache Kafka is installed on the host
- ZooKeeper and Kafka are up and running
Procedure
Stop the Kafka broker.
su - kafka
/opt/kafka/bin/kafka-server-stop.sh

Confirm that the Kafka broker is stopped.

jcmd | grep kafka

Stop ZooKeeper.

su - kafka
/opt/kafka/bin/zookeeper-server-stop.sh
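Because the stop script only signals the broker, a shutdown script may want to wait for the Kafka process to exit before stopping ZooKeeper. A sketch of such a wait loop, demonstrated with a short-lived background process standing in for a real broker:

```shell
# Wait until no process matches a pattern, or give up after N tries (POSIX sh).
# In a real shutdown script, the pattern would be "kafka.Kafka".
wait_for_exit() {
  pattern="$1"; tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    pgrep -f "$pattern" >/dev/null || return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Demo: a sleep process stands in for the broker.
sleep 2 &
wait_for_exit "sleep 2" 10
status=$?
echo "process gone: $status"
```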