Chapter 1. Introduction to Metro-DR
Disaster recovery is the ability to recover and continue business-critical applications after natural or human-created disasters. It is a component of the overall business continuance strategy of any major organization, designed to preserve the continuity of business operations during major adverse events.
Metro-DR provides replication of volume persistent data and metadata across sites that are in the same geographical area. In the public cloud, this is similar to protecting against an Availability Zone failure. Metro-DR ensures business continuity with no data loss when a data center becomes unavailable. Business continuity is usually expressed in terms of Recovery Point Objective (RPO) and Recovery Time Objective (RTO).
- RPO is a measure of how frequently you take backups or snapshots of persistent data. In practice, the RPO indicates the amount of data that will be lost or need to be reentered after an outage. The Metro-DR solution ensures that your RPO is zero because data is replicated synchronously.
- RTO is the amount of downtime a business can tolerate. The RTO answers the question, “How long can it take for our system to recover after we are notified of a business disruption?”
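To make the RPO comparison concrete, the following minimal Python sketch contrasts synchronous replication with a hypothetical periodic (snapshot-based) scheme. The `ReplicationMode` class, the `worst_case_rpo_minutes` function, and the 15-minute interval are illustrative assumptions, not part of any Red Hat product API.

```python
from dataclasses import dataclass


@dataclass
class ReplicationMode:
    """Hypothetical description of how persistent data is copied to the peer site."""
    name: str
    synchronous: bool
    interval_minutes: float = 0.0  # only meaningful for periodic (asynchronous) replication


def worst_case_rpo_minutes(mode: ReplicationMode) -> float:
    """Upper bound on data lost if the primary site fails just before the next copy.

    Synchronous replication acknowledges a write only after both sites hold it,
    so no acknowledged write can be lost and the RPO is zero. Periodic replication
    can lose up to one full interval of writes.
    """
    if mode.synchronous:
        return 0.0
    return mode.interval_minutes


metro = ReplicationMode("metro (synchronous)", synchronous=True)
periodic = ReplicationMode("periodic snapshot", synchronous=False, interval_minutes=15)

for mode in (metro, periodic):
    print(f"{mode.name}: worst-case RPO = {worst_case_rpo_minutes(mode)} minutes")
```

The RTO is deliberately not modeled here: it depends on how quickly the application is restarted on the surviving cluster, not on the replication mode.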
The intent of this guide is to detail the Metro Disaster Recovery (Metro-DR) steps and commands necessary to fail over an application from one Red Hat OpenShift Container Platform (RHOCP) cluster to another and then fail back the same application to the original primary cluster. In this case, the RHOCP clusters are created or imported using Red Hat Advanced Cluster Management (RHACM), and the round-trip time (RTT) latency between the RHOCP clusters must be less than 10 ms.
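As a rough pre-flight aid, the latency requirement can be checked against measured RTT samples. This is a minimal sketch, assuming you already collected RTT samples in milliseconds between the two sites with a tool of your choice; `within_rtt_limit` and the sample values are hypothetical and not part of any Red Hat tooling.

```python
# Hypothetical pre-flight check, not a Red Hat tool: compare measured
# round-trip times (RTT) between the two RHOCP sites against the Metro-DR limit.
MANAGED_CLUSTER_RTT_LIMIT_MS = 10.0  # "less than 10 ms RTT" from this guide


def within_rtt_limit(samples_ms: list[float], limit_ms: float) -> bool:
    """Return True only if every sampled RTT is strictly below the limit.

    Using the maximum rather than the average is a conservative choice:
    synchronous replication pays the inter-site latency on every write,
    so occasional spikes matter.
    """
    if not samples_ms:
        raise ValueError("at least one RTT sample is required")
    return max(samples_ms) < limit_ms


# Example RTT samples (ms) between site A and site B; the values are made up.
samples = [3.2, 4.1, 3.8, 5.0]
print(within_rtt_limit(samples, MANAGED_CLUSTER_RTT_LIMIT_MS))
```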
The persistent storage for applications is provided by an external Red Hat Ceph Storage cluster stretched between the two locations, with the RHOCP instances connected to this storage cluster. An arbiter node with a storage monitor service is required at a third location (a different location from where the RHOCP instances are deployed) to establish quorum for the Red Hat Ceph Storage cluster in the case of a site outage. The third location has relaxed latency requirements and supports RTT latency values as high as 100 ms from the storage cluster connected to the RHOCP instances.
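The role of the arbiter follows from majority quorum among Ceph monitors. The sketch assumes a stretch layout of two monitors in each data center plus one tiebreaker monitor on the arbiter node; the monitor names, site labels, and the `has_quorum` helper are hypothetical. It shows that losing either data center still leaves a majority, whereas a two-site layout without the arbiter cannot survive a site outage.

```python
# Assumed monitor layout: two Ceph monitors in each data center and one
# tiebreaker monitor on the arbiter node at the third location.
MONITORS = {
    "mon-a1": "datacenter1",
    "mon-a2": "datacenter1",
    "mon-b1": "datacenter2",
    "mon-b2": "datacenter2",
    "mon-arbiter": "arbiter",
}


def has_quorum(monitors: dict[str, str], failed_sites: set[str]) -> bool:
    """Monitors form a quorum only with a strict majority of all monitors."""
    surviving = sum(1 for site in monitors.values() if site not in failed_sites)
    return surviving > len(monitors) // 2


for failed in [{"datacenter1"}, {"datacenter2"}, {"arbiter"}, {"datacenter1", "arbiter"}]:
    print(sorted(failed), "->", "quorum" if has_quorum(MONITORS, failed) else "no quorum")

# Without the arbiter, the monitors split 2/2 and a site outage leaves no majority.
without_arbiter = {mon: site for mon, site in MONITORS.items() if site != "arbiter"}
print("no arbiter, lose datacenter1 ->", has_quorum(without_arbiter, {"datacenter1"}))
```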
1.1. Components of Metro-DR solution
Metro-DR is composed of Red Hat Advanced Cluster Management for Kubernetes, Red Hat Ceph Storage and OpenShift Data Foundation components to provide application and data mobility across OpenShift Container Platform clusters.
Red Hat Advanced Cluster Management for Kubernetes
Red Hat Advanced Cluster Management (RHACM) provides the ability to manage multiple clusters and application lifecycles. Hence, it serves as a control plane in a multi-cluster environment.
RHACM is split into two parts:
- RHACM Hub: components that run on the multi-cluster control plane
- Managed clusters: components that run on the clusters that are managed
For more information about this product, see RHACM documentation and the RHACM “Manage Applications” documentation.
Red Hat Ceph Storage
Red Hat Ceph Storage is a massively scalable, open, software-defined storage platform that combines the most stable version of the Ceph storage system with a Ceph management platform, deployment utilities, and support services. It significantly lowers the cost of storing enterprise data and helps organizations manage exponential data growth. The software is a robust and modern petabyte-scale storage platform for public or private cloud deployments.
OpenShift Data Foundation
OpenShift Data Foundation provides the ability to provision and manage storage for stateful applications in an OpenShift Container Platform cluster. It is backed by Ceph as the storage provider, whose lifecycle is managed by Rook in the OpenShift Data Foundation component stack. Ceph-CSI provides the provisioning and management of Persistent Volumes for stateful applications.
The OpenShift Data Foundation stack is also enhanced with csi-addons, which provides the ability to manage mirroring per Persistent Volume Claim.
OpenShift DR
OpenShift DR is a disaster recovery orchestrator for stateful applications across a set of peer OpenShift clusters that are deployed and managed using RHACM. It provides cloud-native interfaces to orchestrate the lifecycle of an application’s state on Persistent Volumes. These include:
- Protecting an application state relationship across OpenShift clusters
- Failing over an application’s state to a peer cluster
- Relocating an application’s state to the previously deployed cluster
OpenShift DR is split into two components:
- OpenShift DR Hub Operator: Installed on the hub cluster to manage failover and relocation for applications.
- OpenShift DR Cluster Operator: Installed on each managed cluster to manage the lifecycle of all PVCs of an application.
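The failover and relocate operations can be pictured as transitions of an application between its preferred cluster and a peer. This is a toy model only; `ProtectedApp`, `Action`, and the cluster names `cluster1` and `cluster2` are hypothetical and do not mirror the fields of any OpenShift DR custom resource.

```python
from enum import Enum


class Action(Enum):
    FAILOVER = "failover"  # move the application's state to the peer cluster
    RELOCATE = "relocate"  # move it back to the previously deployed cluster


class ProtectedApp:
    """Toy model of an application protected across two peer clusters."""

    def __init__(self, name: str, preferred: str, peer: str) -> None:
        self.name = name
        self.preferred = preferred  # cluster the application was first deployed on
        self.peer = peer
        self.current = preferred

    def apply(self, action: Action) -> str:
        """Apply a DR action and return the cluster now running the application."""
        if action is Action.FAILOVER:
            if self.current != self.preferred:
                raise RuntimeError(f"{self.name} is already failed over to {self.current}")
            self.current = self.peer
        elif action is Action.RELOCATE:
            if self.current == self.preferred:
                raise RuntimeError(f"{self.name} already runs on {self.preferred}")
            self.current = self.preferred
        return self.current


app = ProtectedApp("sample-app", preferred="cluster1", peer="cluster2")
print(app.apply(Action.FAILOVER))  # cluster2
print(app.apply(Action.RELOCATE))  # cluster1
```

Rejecting a second failover or a relocate to the cluster already in use keeps the model's state unambiguous; in the real product the Hub Operator performs these transitions and the Cluster Operators handle the PVCs on each cluster.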
1.2. Metro-DR deployment workflow
This section provides an overview of the steps required to configure and deploy Metro-DR capabilities using OpenShift Data Foundation version 4.10, Red Hat Ceph Storage (RHCS) 5, and the latest version of RHACM across two distinct OpenShift Container Platform clusters. In addition to the two managed clusters, a third OpenShift Container Platform cluster is required to deploy the RHACM hub.
To configure your infrastructure, perform the following steps in the order given:
- Ensure you meet each of the Metro-DR requirements, which include RHACM operator installation, creation or import of OpenShift Container Platform clusters into the RHACM hub, and network configuration. See Requirements for enabling Metro-DR.
- Ensure you meet the requirements for deploying Red Hat Ceph Storage stretch cluster with arbiter. See Requirements for deploying Red Hat Ceph Storage.
- Configure Red Hat Ceph Storage stretch cluster mode. For instructions on enabling a Ceph cluster across two different data centers using the stretch mode functionality, see Configuring Red Hat Ceph Storage stretch cluster.
- Install OpenShift Data Foundation 4.10 on Primary and Secondary managed clusters. See Installing OpenShift Data Foundation on managed clusters.
- Install the OpenShift DR Hub Operator on the Hub cluster. See Installing OpenShift DR Hub Operator on Hub cluster.
- Configure the managed and Hub clusters. See Configuring managed and hub clusters.
- Create the DRPolicy resource on the hub cluster which is used to deploy, failover, and relocate the workloads across managed clusters. See Creating Disaster Recovery Policy on Hub cluster.
- Enable automatic installation of the OpenShift DR Cluster operator and automatic transfer of S3 secrets on the managed clusters. For instructions, see Enabling automatic install of OpenShift DR cluster operator and Enabling automatic transfer of S3 secrets on managed clusters.
- Create a sample application using the RHACM console for failover and relocation testing. For instructions, see Creating sample application, application failover and relocating an application between managed clusters.
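Because the steps must be performed in order, a simple checklist helper can flag out-of-order progress. This is a hypothetical aid: the step labels paraphrase the list above, and `WORKFLOW` and `next_step` are illustrative names, not part of any Red Hat tool.

```python
from typing import Optional

# The Metro-DR configuration steps from this section, in the required order.
# The labels are paraphrased; the tracking logic is a hypothetical aid.
WORKFLOW = [
    "Meet Metro-DR requirements (RHACM, managed clusters, networking)",
    "Meet Red Hat Ceph Storage stretch cluster requirements",
    "Configure Red Hat Ceph Storage stretch cluster mode",
    "Install OpenShift Data Foundation 4.10 on primary and secondary managed clusters",
    "Install OpenShift DR Hub Operator on the hub cluster",
    "Configure managed and hub clusters",
    "Create DRPolicy on the hub cluster",
    "Enable automatic install of OpenShift DR Cluster operator and S3 secret transfer",
    "Create sample application and test failover and relocation",
]


def next_step(completed: set[int]) -> Optional[str]:
    """Return the first pending step (by zero-based index), refusing out-of-order progress."""
    for index, step in enumerate(WORKFLOW):
        if index not in completed:
            later = [i for i in completed if i > index]
            if later:
                raise ValueError(f"step {min(later) + 1} was done before step {index + 1}")
            return step
    return None  # every step is complete


print(next_step({0, 1, 2}))
```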