Chapter 1. Disaster recovery for Apache Kafka


Apache Kafka provides strong resiliency and fault tolerance features out of the box. Kafka can replicate data across multiple brokers, which allows it to survive broker failures without data loss and maintain full availability to clients. Brokers can also be configured with a "rack", via the broker.rack configuration, to indicate a unit of isolation. By default, Kafka spreads partition replicas across as many racks as possible. A rack can represent an actual rack, but also a data center or availability zone. Depending on how you deploy and configure your clusters, this can make them resilient to the failure of a full rack, data center, or availability zone.
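For example, in a three-zone deployment, each broker's server.properties can set broker.rack to the name of its availability zone, so that replicas of each partition are spread across zones. This is a minimal sketch; the zone names are illustrative:

```properties
# server.properties for a broker running in availability zone us-east-1a
broker.id=1
broker.rack=us-east-1a

# With three brokers across three zones and topics created with
# replication.factor=3, each partition gets one replica per zone.
```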

Kafka still requires relatively low latency between all brokers, so it is not recommended to place brokers in different geographical regions and use the rack to represent a region. For use cases that require stronger resiliency, such as surviving the loss of an entire region, you can't rely on inter-broker replication alone; a disaster recovery plan is needed.

A disaster recovery plan for Kafka typically consists of the tools and processes needed to maintain or restore access to data. A plan starts with business requirements, deciding, among other things, a Recovery Time Objective (RTO), which is the acceptable duration of recovery, and a Recovery Point Objective (RPO), which is the maximum acceptable data loss. On the operational side, a solid plan requires understanding the state of both clusters at all times, the decisions involved in setting up and operating the environment and who is responsible for making them, as well as the processes and tools to follow.

In most cases there is no perfect way to handle and fully recover from a disaster. Several decision points involve tradeoffs, for example availability versus consistency, that need to be understood to recover in the way that is best for your business. Consider these carefully in advance and include them in your plan, so that if a disaster strikes you know what actions to take.

Apache Kafka includes MirrorMaker (version 2), a tool that copies data between different Kafka clusters. This is called mirroring, to differentiate it from replication, which refers to data copied between brokers within a single cluster.
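As a sketch, a minimal MirrorMaker 2 configuration that mirrors all topics from a source cluster to a target cluster might look like the following; the cluster aliases and bootstrap addresses are illustrative:

```properties
# mm2.properties: mirror all topics from cluster "primary" to cluster "backup"
clusters = primary, backup
primary.bootstrap.servers = primary-broker1:9092,primary-broker2:9092
backup.bootstrap.servers = backup-broker1:9092,backup-broker2:9092

# Enable mirroring in one direction only
primary->backup.enabled = true
primary->backup.topics = .*

# Replication factor for the mirrored topics on the target cluster
replication.factor = 3
```

A file like this would typically be passed to the connect-mirror-maker.sh script shipped with Kafka to run MirrorMaker 2 in its dedicated mode.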

Cluster migration is a different use case, where MirrorMaker moves all data from one cluster to another. This is needed, for example, when the underlying infrastructure of a cluster is decommissioned: a new cluster must be deployed and all the data copied over from the old cluster. The process for a cluster migration is effectively the same as for a planned failover.
