Chapter 4. Regional-DR solution for OpenShift Data Foundation

4.1. Components of Regional-DR solution

Regional-DR is composed of Red Hat Advanced Cluster Management for Kubernetes and OpenShift Data Foundation components to provide application and data mobility across Red Hat OpenShift Container Platform clusters.

Red Hat Advanced Cluster Management for Kubernetes

Red Hat Advanced Cluster Management (RHACM) provides the ability to manage multiple clusters and application lifecycles. Hence, it serves as a control plane in a multi-cluster environment.

RHACM is split into two parts:

RHACM Hub: components that run on the multi-cluster control plane.
Managed clusters: components that run on the clusters that are managed.

For more information about this product, see RHACM documentation and the RHACM “Manage Applications” documentation.

OpenShift Data Foundation

OpenShift Data Foundation provides the ability to provision and manage storage for stateful applications in an OpenShift Container Platform cluster.

OpenShift Data Foundation is backed by Ceph as the storage provider, whose lifecycle is managed by Rook in the OpenShift Data Foundation component stack. Ceph-CSI provides the provisioning and management of Persistent Volumes for stateful applications.

OpenShift Data Foundation stack is now enhanced with the following abilities for disaster recovery:

Enable RBD block pools for mirroring across OpenShift Data Foundation instances (clusters)
Mirror specific images within an RBD block pool
Provide csi-addons to manage per Persistent Volume Claim (PVC) mirroring

OpenShift DR

OpenShift DR is a set of orchestrators that configure and manage stateful applications across a set of peer OpenShift clusters that are managed using RHACM. It provides cloud-native interfaces to orchestrate the life cycle of an application’s state on Persistent Volumes. These include:

Protecting an application and its state relationship across OpenShift clusters
Failing over an application and its state to a peer cluster
Relocating an application and its state to the previously deployed cluster

OpenShift DR is split into three components:

ODF Multicluster Orchestrator: Installed on the multi-cluster control plane (RHACM Hub), it orchestrates configuration and peering of OpenShift Data Foundation clusters for Metro and Regional DR relationships.
OpenShift DR Hub Operator: Automatically installed as part of ODF Multicluster Orchestrator installation on the hub cluster to orchestrate failover or relocation of DR enabled applications.
OpenShift DR Cluster Operator: Automatically installed on each managed cluster that is part of a Metro and Regional DR relationship to manage the lifecycle of all PVCs of an application.

4.2. Regional-DR deployment workflow

This section provides an overview of the steps required to configure and deploy Regional-DR capabilities using the latest version of Red Hat OpenShift Data Foundation across two distinct OpenShift Container Platform clusters. In addition to the two managed clusters, a third OpenShift Container Platform cluster is required to deploy Red Hat Advanced Cluster Management (RHACM).

To configure your infrastructure, perform the following steps in the order given:

Ensure that the requirements are met across the three OpenShift Container Platform clusters (Hub, Primary, and Secondary) that are part of the DR solution. See Requirements for enabling Regional-DR.
Install OpenShift Data Foundation operator and create a storage system on Primary and Secondary managed clusters. See Creating OpenShift Data Foundation cluster on managed clusters.
Install the ODF Multicluster Orchestrator on the Hub cluster. See Installing ODF Multicluster Orchestrator on Hub cluster.
Configure SSL access between the Hub, Primary and Secondary clusters. See Configuring SSL access across clusters.
Create a DRPolicy resource for use with applications requiring DR protection across the Primary and Secondary clusters. See Creating Disaster Recovery Policy on Hub cluster.
Note
There can be more than a single policy.
Test your disaster recovery solution with:
1. Subscription-based application:
  - Create Subscription-based applications. See Creating sample application.
  - Test failover and relocate operations using the sample subscription-based application between managed clusters. See Subscription-based application failover and relocating subscription-based application.
2. ApplicationSet-based application:
  - Create sample applications. See Creating ApplicationSet-based applications.
  - Test failover and relocate operations using the sample application between managed clusters. See ApplicationSet-based application failover and relocating ApplicationSet-based application.
3. Discovered applications
  - Ensure all requirements mentioned in Prerequisites are addressed. See Prerequisites for disaster recovery protection of discovered applications.
  - Create a sample discovered application. See Creating a sample discovered application.
  - Enroll the discovered application. See Enrolling a sample discovered application for disaster recovery protection.
  - Test failover and relocate. See Discovered application failover and relocate.

4.3. Requirements for enabling Regional-DR

The prerequisites to installing a disaster recovery solution supported by Red Hat OpenShift Data Foundation are as follows:

You must have three OpenShift clusters that have network reachability between them:
- Hub cluster where Red Hat Advanced Cluster Management (RHACM) for Kubernetes operator is installed.
- Primary managed cluster where OpenShift Data Foundation is running.
- Secondary managed cluster where OpenShift Data Foundation is running.
Note
For configuring hub recovery setup, you need a fourth cluster that acts as the passive hub. The primary managed cluster (Site-1) can be co-located with the active RHACM hub cluster, while the passive hub cluster is located with the secondary managed cluster (Site-2). Alternatively, the active RHACM hub cluster can be placed in a neutral site (Site-3) that is not impacted by a failure of either the primary managed cluster at Site-1 or the secondary managed cluster at Site-2. In this situation, if a passive hub cluster is used, it can be placed with the secondary cluster at Site-2. For more information, see Configuring passive hub cluster for hub recovery.
Ensure that the RHACM operator and MultiClusterHub are installed on the Hub cluster. See the RHACM installation guide for instructions.
After the operator is successfully installed, a popover with a message that the Web console update is available appears on the user interface. Click Refresh web console from this popover for the console changes to reflect.

Important

Ensure that application traffic routing and redirection are configured appropriately.

On the Hub cluster
- Navigate to All Clusters → Infrastructure → Clusters.
- Import or create the Primary managed cluster and the Secondary managed cluster using the RHACM console.
- Choose the appropriate options for your environment.
For instructions, see Creating a cluster and Importing a target managed cluster to the hub cluster.

Connect the private OpenShift cluster and service networks using the RHACM Submariner add-ons. Verify that the two clusters have non-overlapping service and cluster private networks. If the networks overlap, ensure that Globalnet is enabled during the Submariner add-ons installation.

Run the following command for each of the managed clusters to determine if Globalnet needs to be enabled. The example shown here is for non-overlapping cluster and service networks so Globalnet would not be enabled.

$ oc get networks.config.openshift.io cluster -o json | jq .spec

Example output for Primary cluster:

{
  "clusterNetwork": [
    {
      "cidr": "10.5.0.0/16",
      "hostPrefix": 23
    }
  ],
  "externalIP": {
    "policy": {}
  },
  "networkType": "OVNKubernetes",
  "serviceNetwork": [
    "10.15.0.0/16"
  ]
}


Example output for Secondary cluster:

{
  "clusterNetwork": [
    {
      "cidr": "10.6.0.0/16",
      "hostPrefix": 23
    }
  ],
  "externalIP": {
    "policy": {}
  },
  "networkType": "OVNKubernetes",
  "serviceNetwork": [
    "10.16.0.0/16"
  ]
}


For more information, see Submariner documentation.
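
The Globalnet decision described above comes down to whether any cluster or service CIDR on one managed cluster overlaps any CIDR on the other. As a minimal sketch (not part of the product tooling), the following Python uses the standard `ipaddress` module to compare the two `.spec` documents returned by the command above. The function names `cidrs_from_spec` and `needs_globalnet` are illustrative assumptions, and the sample data is the example output shown above, trimmed to the relevant fields.

```python
import ipaddress
import json


def cidrs_from_spec(spec):
    """Collect cluster and service network CIDRs from a networks.config .spec dict."""
    cidrs = [entry["cidr"] for entry in spec.get("clusterNetwork", [])]
    cidrs.extend(spec.get("serviceNetwork", []))
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def needs_globalnet(primary_spec, secondary_spec):
    """Return True if any network on one cluster overlaps any network on the other."""
    primary = cidrs_from_spec(primary_spec)
    secondary = cidrs_from_spec(secondary_spec)
    return any(p.overlaps(s) for p in primary for s in secondary)


# Example outputs from this section, reduced to the fields that matter here.
PRIMARY = json.loads(
    '{"clusterNetwork": [{"cidr": "10.5.0.0/16", "hostPrefix": 23}],'
    ' "serviceNetwork": ["10.15.0.0/16"]}'
)
SECONDARY = json.loads(
    '{"clusterNetwork": [{"cidr": "10.6.0.0/16", "hostPrefix": 23}],'
    ' "serviceNetwork": ["10.16.0.0/16"]}'
)

if __name__ == "__main__":
    print("Globalnet required:", needs_globalnet(PRIMARY, SECONDARY))
```

For the example networks the check reports that Globalnet is not required, which matches the statement above. Treat the result as a quick sanity check only; the Submariner documentation remains the reference for the actual requirements.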

4.4. Creating an OpenShift Data Foundation cluster on managed clusters

To configure storage replication between the two OpenShift Container Platform clusters, create an OpenShift Data Foundation storage system after you install the OpenShift Data Foundation operator.

Note

Refer to the OpenShift Data Foundation deployment guides and instructions that are specific to your infrastructure (AWS, VMware, bare metal, Azure, and so on).

Procedure

Install and configure the latest OpenShift Data Foundation cluster on each of the managed clusters.
For information about the OpenShift Data Foundation deployment, refer to your infrastructure specific deployment guides (for example, AWS, VMware, Bare metal, Azure).
Note
While creating the storage cluster, in the Data Protection step, you must select the Prepare cluster for disaster recovery (Regional-DR only) checkbox.
Validate the successful deployment of OpenShift Data Foundation on each managed cluster with the following command:
```
$ oc get storagecluster -n openshift-storage ocs-storagecluster -o jsonpath='{.status.phase}{"\n"}'
```
For the Multicloud Gateway (MCG):
```
$ oc get noobaa -n openshift-storage noobaa -o jsonpath='{.status.phase}{"\n"}'
```
If the status result is Ready for both queries on the Primary managed cluster and the Secondary managed cluster, then continue with the next step.
In the OpenShift Web Console, navigate to Installed Operators → OpenShift Data Foundation → Storage System → ocs-storagecluster-storagesystem → Resources and verify that the Status of StorageCluster is Ready and has a green tick mark next to it.
[Optional] If Globalnet was enabled when Submariner was installed, then edit the StorageCluster after the OpenShift Data Foundation install finishes.
For Globalnet networks, manually edit the StorageCluster yaml to add the clusterID and set enabled to true. Replace <clustername> with your RHACM imported or newly created managed cluster name. Edit the StorageCluster on both the Primary managed cluster and the Secondary managed cluster.
Warning
Do not make this change in the StorageCluster unless you enabled Globalnet when Submariner was installed.
```
$ oc edit storagecluster -o yaml -n openshift-storage
```
```
spec:
  network:
    multiClusterService:
      clusterID: <clustername>
      enabled: true
```
Important
If multiClusterService is enabled, it cannot be disabled or undone, because it fails over the MONs and restarts the OSDs with Globalnet IP addresses, which cannot be changed once assigned.

After the above changes are made:

Wait for the OSD pods to restart and OSD services to be created.
Wait for all MONs to fail over.

Ensure that the MON and OSD services are exported.

$ oc get serviceexport -n openshift-storage

NAME              AGE
rook-ceph-mon-d   4d14h
rook-ceph-mon-e   4d14h
rook-ceph-mon-f   4d14h
rook-ceph-osd-0   4d14h
rook-ceph-osd-1   4d14h
rook-ceph-osd-2   4d14h


Ensure that the cluster is in a Ready state and that the cluster health has a green tick indicating Health ok. Verify using step 3.
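
When Globalnet is in use, the ServiceExport listing shown earlier in this procedure can also be checked mechanically. The following is a minimal sketch in Python that parses the plain table output of `oc get serviceexport` and counts the exported MON and OSD services. The helper names `exported_names` and `count_by_prefix` are illustrative assumptions, and the expected number of MONs and OSDs depends on your cluster.

```python
def exported_names(table_text):
    """Return the NAME column from `oc get serviceexport` table output."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    # Skip the header row (NAME  AGE) and keep the first column of each remaining row.
    return [line.split()[0] for line in lines[1:]]


def count_by_prefix(names, prefixes=("rook-ceph-mon-", "rook-ceph-osd-")):
    """Count exported services for each expected name prefix."""
    return {prefix: sum(name.startswith(prefix) for name in names) for prefix in prefixes}


# Sample taken from the example output in this section.
SAMPLE = """\
NAME              AGE
rook-ceph-mon-d   4d14h
rook-ceph-mon-e   4d14h
rook-ceph-mon-f   4d14h
rook-ceph-osd-0   4d14h
rook-ceph-osd-1   4d14h
rook-ceph-osd-2   4d14h
"""

if __name__ == "__main__":
    print(count_by_prefix(exported_names(SAMPLE)))
```

A count of zero for either prefix suggests that the corresponding services have not been exported yet.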

4.5. Installing OpenShift Data Foundation Multicluster Orchestrator operator

OpenShift Data Foundation Multicluster Orchestrator is a controller that is installed from OpenShift Container Platform’s OperatorHub on the Hub cluster.

Procedure

On the Hub cluster, navigate to OperatorHub and use the keyword filter to search for ODF Multicluster Orchestrator.
Click the ODF Multicluster Orchestrator tile.
Keep all default settings and click Install.
Ensure that the operator resources are installed in the openshift-operators project and available to all namespaces.
Note
The ODF Multicluster Orchestrator also installs the OpenShift DR Hub Operator on the RHACM hub cluster as a dependency.

Verify that the operator Pods are in a Running state. The OpenShift DR Hub operator is also installed at the same time in the openshift-operators namespace.

$ oc get pods -n openshift-operators

Example output:

NAME                                        READY   STATUS       RESTARTS    AGE
odf-multicluster-console-6845b795b9-blxrn   1/1     Running      0           4d20h
odfmo-controller-manager-f9d9dfb59-jbrsd    1/1     Running      0           4d20h
ramen-hub-operator-6fb887f885-fss4w         2/2     Running      0           4d20h


4.6. Configuring SSL access across clusters

Configure network (SSL) access between the Primary and Secondary clusters so that metadata can be stored on the alternate cluster in a Multicloud Gateway (MCG) object bucket using a secure transport protocol. Configure the same access on the Hub cluster so that it can verify access to the object buckets.

Note

If all of your OpenShift clusters are deployed using a signed and valid set of certificates for your environment then this section can be skipped.

Procedure

Extract the ingress certificate for the Primary managed cluster and save the output to primary.crt.

$ oc get cm default-ingress-cert -n openshift-config-managed -o jsonpath="{['data']['ca-bundle\.crt']}" > primary.crt

Extract the ingress certificate for the Secondary managed cluster and save the output to secondary.crt.

$ oc get cm default-ingress-cert -n openshift-config-managed -o jsonpath="{['data']['ca-bundle\.crt']}" > secondary.crt

Create a new ConfigMap file to hold the remote cluster’s certificate bundle with filename cm-clusters-crt.yaml.

Note

Each cluster could have more or fewer than the three certificates shown in this example file. Also, ensure that the certificate contents are correctly indented after you copy and paste from the primary.crt and secondary.crt files that were created before.

apiVersion: v1
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <copy contents of cert1 from primary.crt here>
    -----END CERTIFICATE-----

    -----BEGIN CERTIFICATE-----
    <copy contents of cert2 from primary.crt here>
    -----END CERTIFICATE-----

    -----BEGIN CERTIFICATE-----
    <copy contents of cert3 from primary.crt here>
    -----END CERTIFICATE-----

    -----BEGIN CERTIFICATE-----
    <copy contents of cert1 from secondary.crt here>
    -----END CERTIFICATE-----

    -----BEGIN CERTIFICATE-----
    <copy contents of cert2 from secondary.crt here>
    -----END CERTIFICATE-----

    -----BEGIN CERTIFICATE-----
    <copy contents of cert3 from secondary.crt here>
    -----END CERTIFICATE-----
kind: ConfigMap
metadata:
  name: user-ca-bundle
  namespace: openshift-config
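
Because indentation errors are easy to introduce when pasting certificates by hand, the file can also be generated. The following is a minimal sketch in Python, assuming that primary.crt and secondary.crt from the previous steps are in the current directory. The function name `render_configmap` is an illustrative assumption, and the generated cm-clusters-crt.yaml should be reviewed before it is applied.

```python
from pathlib import Path


def render_configmap(bundles, name="user-ca-bundle", namespace="openshift-config"):
    """Render a ConfigMap manifest whose ca-bundle.crt holds all given PEM bundles."""
    lines = ["apiVersion: v1", "data:", "  ca-bundle.crt: |"]
    for bundle in bundles:
        for line in bundle.strip().splitlines():
            # Each line of the block scalar is indented by four spaces;
            # blank separator lines are emitted empty.
            lines.append(("    " + line.strip()) if line.strip() else "")
    lines += ["kind: ConfigMap", "metadata:", f"  name: {name}", f"  namespace: {namespace}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sources = [Path("primary.crt"), Path("secondary.crt")]
    if all(source.exists() for source in sources):
        manifest = render_configmap([source.read_text() for source in sources])
        Path("cm-clusters-crt.yaml").write_text(manifest)
    else:
        print("primary.crt and secondary.crt not found; nothing written")
```

The sketch copies every certificate found in both files, so it works regardless of how many certificates each bundle contains.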


Create the ConfigMap on the Primary managed cluster, Secondary managed cluster, and the Hub cluster.
```
$ oc create -f cm-clusters-crt.yaml
```
Example output:
```
configmap/user-ca-bundle created
```

Patch the default proxy resource on the Primary managed cluster, Secondary managed cluster, and the Hub cluster.

$ oc patch proxy cluster --type=merge  --patch='{"spec":{"trustedCA":{"name":"user-ca-bundle"}}}'

Example output:

proxy.config.openshift.io/cluster patched

4.7. Creating Disaster Recovery Policy on Hub cluster

The OpenShift Disaster Recovery Policy (DRPolicy) resource specifies the OpenShift Container Platform clusters participating in the disaster recovery solution and the desired replication interval. DRPolicy is a cluster-scoped resource that users can apply to applications that require a disaster recovery solution.

The ODF MultiCluster Orchestrator Operator facilitates the creation of each DRPolicy and the corresponding DRClusters through the Multicluster Web console.

Prerequisites

Ensure that there is a minimum set of two managed clusters.

Procedure

On the OpenShift console, navigate to All Clusters → Data Services → Disaster recovery.
On the Overview tab, click Create a disaster recovery policy, or navigate to the Policies tab and click Create DRPolicy.
Enter Policy name. Ensure that each DRPolicy has a unique name (for example: ocp4bos1-ocp4bos2-5m).
Select two clusters from the list of managed clusters with which this new policy will be associated.
Note
If you get an error message "OSDs not migrated" after selecting the clusters, then follow the instructions from the knowledgebase article on Migration of existing OSD to the optimized OSD in OpenShift Data Foundation for Regional-DR cluster before proceeding with the next step.
Replication policy is automatically set to Asynchronous (async) based on the OpenShift clusters selected, and a Sync schedule option becomes available.
Set Sync schedule.
Important
For every desired replication interval a new DRPolicy must be created with a unique name (such as: ocp4bos1-ocp4bos2-10m). The same clusters can be selected but the Sync schedule can be configured with a different replication interval in minutes/hours/days. The minimum is one minute.
Click Create.
Verify that the DRPolicy is created successfully. Run this command on the Hub cluster for each of the DRPolicy resources created, where <drpolicy_name> is replaced with your unique name.
```
$ oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'
```
Example output:
```
Succeeded
```
When a DRPolicy is created, along with it, two DRCluster resources are also created. It could take up to 10 minutes for all three resources to be validated and for the status to show as Succeeded.
Note
Editing of SchedulingInterval, ReplicationClassSelector, VolumeSnapshotClassSelector and DRClusters field values is not supported in the DRPolicy.
Verify the object bucket access from the Hub cluster to both the Primary managed cluster and the Secondary managed cluster.
1. Get the names of the DRClusters on the Hub cluster.
  $ oc get drclusters
  Example output:
  NAME       AGE
  ocp4bos1   4m42s
  ocp4bos2   4m42s
2. Check S3 access to each bucket created on each managed cluster. Use the DRCluster validation command, where <drcluster_name> is replaced with your unique name.
  Note
  Editing of Region and S3ProfileName field values is not supported in DRClusters.
  $ oc get drcluster <drcluster_name> -o jsonpath='{.status.conditions[2].reason}{"\n"}'
  Example output:
  Succeeded
  Note
  Make sure to run commands for both DRClusters on the Hub cluster.

Verify that the OpenShift DR Cluster operator installation was successful on the Primary managed cluster and the Secondary managed cluster.

$ oc get csv,pod -n openshift-dr-system

Example output:

NAME                                                                            DISPLAY                         VERSION        REPLACES   PHASE
clusterserviceversion.operators.coreos.com/odr-cluster-operator.v4.15.0         Openshift DR Cluster Operator   4.15.0                    Succeeded
clusterserviceversion.operators.coreos.com/volsync-product.v0.8.0               VolSync                         0.8.0                     Succeeded

NAME                                             READY   STATUS    RESTARTS   AGE
pod/ramen-dr-cluster-operator-6467cf5d4c-cc8kz   2/2     Running   0          3d12h


You can also verify that OpenShift DR Cluster Operator is installed successfully on the OperatorHub of each managed cluster.

Note

On the initial run, the VolSync operator is installed automatically. VolSync is used to set up volume replication between two clusters to protect CephFS-based PVCs. The replication feature is enabled by default.

Verify the status of the OpenShift Data Foundation mirroring daemon health on the Primary managed cluster and the Secondary managed cluster.
```
$ oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.status.mirroringStatus.summary}{"\n"}'
```
Example output:
```
{"daemon_health":"OK","health":"OK","image_health":"OK","states":{}}
```
Important
It could take up to 10 minutes for the daemon_health and health to go from Warning to OK. If the status does not become OK eventually, then use the RHACM console to verify that the Submariner connection between managed clusters is still in a healthy state. Do not proceed until all values are OK.
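
The "all values are OK" condition above can be checked programmatically when polling the mirroring summary. The following is a minimal sketch in Python that parses the JSON summary printed by the command above. The function name `mirroring_ok` is an illustrative assumption, and the sketch only inspects the three health fields shown in the example output.

```python
import json


def mirroring_ok(summary_json):
    """Return True only if daemon_health, health, and image_health are all "OK"."""
    summary = json.loads(summary_json)
    keys = ("daemon_health", "health", "image_health")
    return all(summary.get(key) == "OK" for key in keys)


if __name__ == "__main__":
    sample = '{"daemon_health":"OK","health":"OK","image_health":"OK","states":{}}'
    print(mirroring_ok(sample))
```

A missing field is treated as not OK, so an empty or partial summary does not pass the check.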

4.8. Create sample application for testing disaster recovery solution

OpenShift Data Foundation disaster recovery (DR) solution supports disaster recovery for Subscription-based and ApplicationSet-based applications that are managed by RHACM. For more details, see Subscriptions and ApplicationSet documentation.

The following sections detail how to create an application and apply a DRPolicy to an application.

Subscription-based applications
OpenShift users who do not have cluster-admin permissions can see the knowledge article on how to assign the necessary permissions to an application user for executing disaster recovery actions.
ApplicationSet-based applications
OpenShift users that do not have cluster-admin permissions cannot create ApplicationSet-based applications.

4.8.1. Subscription-based applications

4.8.1.1. Creating a sample Subscription-based application

To test failover from the Primary managed cluster to the Secondary managed cluster and relocate, you need a sample application.

Prerequisites

When creating an application for general consumption, ensure that the application is deployed to ONLY one cluster.
Use the sample application called busybox as an example.
Ensure all external routes of the application are configured using either Global Traffic Manager (GTM) or Global Server Load Balancing (GSLB) service for traffic redirection when the application fails over or is relocated.
As a best practice, group the Red Hat Advanced Cluster Management (RHACM) subscriptions that belong together and have them refer to a single Placement Rule, so that they are DR protected as a group. Further, create them as a single application for a logical grouping of the subscriptions for future DR actions like failover and relocate.
Note
If unrelated subscriptions refer to the same Placement Rule for placement actions, they are also DR protected as the DR workflow controls all subscriptions that reference the Placement Rule.

Procedure

On the Hub cluster, navigate to Applications and click Create application.
Select type as Subscription.
Enter your application Name (for example, busybox) and Namespace (for example, busybox-sample).
In the Repository location for resources section, select Repository type Git.
Enter the Git repository URL for the sample application, the GitHub Branch, and the Path where the resources busybox Pod and PVC will be created.
- Use the sample application repository as https://github.com/red-hat-storage/ocm-ramen-samples
- Select Branch as release-4.16.
- Choose one of the following Paths:
  - busybox-odr to use RBD Regional-DR.
  - busybox-odr-cephfs to use CephFS Regional-DR.
Scroll down in the form until you see Deploy application resources on clusters with all specified labels.
- Select the global Cluster sets or the one that includes the correct managed clusters for your environment.
- Add a label <name> with its value set to the managed cluster name.
Click Create, which is in the top right-hand corner.
On the follow-on screen, go to the Topology tab. You should see green check marks on all elements of the application topology.
Note
To get more information, click on any of the topology elements and a window will appear on the right of the topology view.

Validating the sample application deployment.

Now that the busybox application has been deployed to your preferred Cluster, the deployment can be validated.

Log in to your managed cluster where busybox was deployed by RHACM.

$ oc get pods,pvc -n busybox-sample

Example output:

NAME                          READY   STATUS    RESTARTS   AGE
pod/busybox-67bf494b9-zl5tr   1/1     Running   0          77s


NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-c732e5fe-daaf-4c4d-99dd-462e04c18412   5Gi        RWO            ocs-storagecluster-ceph-rbd   77s


4.8.1.2. Apply Data policy to sample application

Prerequisites

Ensure that both managed clusters referenced in the Data policy are reachable. If not, the application will not be protected for disaster recovery until both clusters are online.

Procedure

On the Hub cluster, navigate to All Clusters → Applications.
Click the Actions menu at the end of the application to view the list of available actions.
Click Manage data policy → Assign data policy.
Select Policy and click Next.
Select an Application resource and then use PVC label selector to select PVC label for the selected application resource.
Note
You can select more than one PVC label for the selected application resources. You can also use the Add application resource option to add multiple resources.
After adding all the application resources, click Next.
Review the Policy configuration details and click Assign. The newly assigned Data policy is displayed on the Manage data policy modal list view.
Verify that you can view the assigned policy details on the Applications page.
1. On the Applications page, navigate to the Data policy column and click the policy link to expand the view.
2. Verify that you can see the number of policies assigned along with failover and relocate status.
3. Click View more details to view the status of ongoing activities with the policy in use with the application.

Optional: Verify RADOS block device (RBD) volumereplication and volumereplicationgroup on the primary cluster.

$ oc get volumereplications.replication.storage.openshift.io -A

Example output:

NAME             AGE     VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
busybox-pvc      2d16h   rbd-volumereplicationclass-1625360775   busybox-pvc      primary        Primary


$ oc get volumereplicationgroups.ramendr.openshift.io -A

Example output:

NAME           DESIREDSTATE   CURRENTSTATE
busybox-drpc   primary        Primary


Optional: Verify CephFS volsync replication source has been set up successfully in the primary cluster and VolSync ReplicationDestination has been set up in the failover cluster.

$ oc get replicationsource -n busybox-sample

Example output:

NAME             SOURCE           LAST SYNC              DURATION          NEXT SYNC
busybox-pvc      busybox-pvc      2022-12-20T08:46:07Z   1m7.794661104s    2022-12-20T08:50:00Z

$ oc get replicationdestination -n busybox-sample

Example output:

NAME             LAST SYNC              DURATION          NEXT SYNC
busybox-pvc      2022-12-20T08:46:32Z   4m39.52261108s

4.8.2. ApplicationSet-based applications

4.8.2.1. Creating ApplicationSet-based applications

Prerequisites

Ensure that the Red Hat OpenShift GitOps operator is installed on all three clusters: Hub cluster, Primary managed cluster and Secondary managed cluster. For instructions, see Installing Red Hat OpenShift GitOps Operator in web console.
On the Hub cluster, ensure that both Primary and Secondary managed clusters are registered to GitOps. For registration instructions, see Registering managed clusters to GitOps. Then check whether the Placement used by the GitOpsCluster resource to register both managed clusters has the tolerations to deal with cluster unavailability. You can verify that the following tolerations are added to the Placement by using the command oc get placement <placement-name> -n openshift-gitops -o yaml.
```
  tolerations:
  - key: cluster.open-cluster-management.io/unreachable
    operator: Exists
  - key: cluster.open-cluster-management.io/unavailable
    operator: Exists
```
In case the tolerations are not added, see Configuring application placement tolerations for Red Hat Advanced Cluster Management and OpenShift GitOps.
Ensure that you have created the ClusterRoleBinding YAML on both the Primary and Secondary managed clusters. For instructions, see the Prerequisites chapter in RHACM documentation.
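The toleration check in the prerequisites above can be scripted. The following is a minimal sketch, assuming the Placement YAML is piped in on standard input. has_dr_tolerations is a hypothetical helper name, and it only looks for the two toleration keys, not for the operator value.

```shell
# Hypothetical helper: reads Placement YAML on stdin and reports whether
# both toleration keys named in the prerequisite are present.
# It does not validate the "operator: Exists" value of each toleration.
has_dr_tolerations() {
  yaml=$(cat)
  for key in cluster.open-cluster-management.io/unreachable \
             cluster.open-cluster-management.io/unavailable; do
    # -F treats the key as a fixed string, -q suppresses normal output
    if ! printf '%s\n' "$yaml" | grep -Fq -- "key: $key"; then
      echo "missing toleration: $key"
      return 1
    fi
  done
  echo "tolerations present"
}
```

To use it, pipe the output of oc get placement <placement-name> -n openshift-gitops -o yaml into has_dr_tolerations. A non-zero exit status means that one of the keys was not found.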

Procedure

On the Hub cluster, navigate to All Clusters → Applications and click Create application.
Choose the application type as Argo CD ApplicationSet - Pull model.
In the General step, enter your Application set name.
Select Argo server openshift-gitops and Requeue time as 180 seconds.
Click Next.
In the Repository location for resources section, select Repository type Git.
Enter the Git repository URL for the sample application, and the GitHub Branch and Path where the resources busybox Pod and PVC will be created.
1. Use the sample application repository as https://github.com/red-hat-storage/ocm-ramen-samples
2. Select Revision as release-4.16
3. Choose one of the following Path:
  - busybox-odr to use RBD Regional-DR.
  - busybox-odr-cephfs to use CephFS Regional-DR.
Enter the Remote namespace value (for example, busybox-sample) and click Next.
Choose the Sync policy settings as per your requirement or go with the default selections, and then click Next.
You can choose one or more options.
In Label expressions, add a label <name> with its value set to the managed cluster name.
Click Next.
Review the setting details and click Submit.

4.8.2.2. Apply Data policy to sample ApplicationSet-based application

Prerequisites

Ensure that both managed clusters referenced in the Data policy are reachable. If not, the application will not be protected for disaster recovery until both clusters are online.

Procedure

On the Hub cluster, navigate to All Clusters → Applications.
Click the Actions menu at the end of the application row to view the list of available actions.
Click Manage data policy → Assign data policy.
Select Policy and click Next.
Select an Application resource and then use PVC label selector to select PVC label for the selected application resource.
Note
You can select more than one PVC label for the selected application resources.
After adding all the application resources, click Next.
Review the Policy configuration details and click Assign. The newly assigned Data policy is displayed on the Manage data policy modal list view.
Verify that you can view the assigned policy details on the Applications page.
1. On the Applications page, navigate to the Data policy column and click the policy link to expand the view.
2. Verify that you can see the number of policies assigned along with failover and relocate status.

Optional: Verify RADOS block device (RBD) volumereplication and volumereplicationgroup on the primary cluster.

$ oc get volumereplications.replication.storage.openshift.io -A

Example output:

NAME             AGE     VOLUMEREPLICATIONCLASS                  PVCNAME          DESIREDSTATE   CURRENTSTATE
busybox-pvc      2d16h   rbd-volumereplicationclass-1625360775   busybox-pvc      primary        Primary

$ oc get volumereplicationgroups.ramendr.openshift.io -A

Example output:

NAME           DESIREDSTATE   CURRENTSTATE
busybox-drpc   primary        Primary

Optional: Verify that the CephFS VolSync ReplicationSource has been set up successfully in the primary cluster and that the VolSync ReplicationDestination has been set up in the failover cluster.

$ oc get replicationsource -n busybox-sample

Example output:

NAME             SOURCE           LAST SYNC              DURATION          NEXT SYNC
busybox-pvc      busybox-pvc      2022-12-20T08:46:07Z   1m7.794661104s    2022-12-20T08:50:00Z

$ oc get replicationdestination -n busybox-sample

Example output:

NAME             LAST SYNC              DURATION          NEXT SYNC
busybox-pvc      2022-12-20T08:46:32Z   4m39.52261108s

4.8.3. Deleting sample application

This section provides instructions for deleting the sample application busybox using the RHACM console.

Important

When deleting a DR protected application, access to both clusters that belong to the DRPolicy is required. This is to ensure that all protected API resources and resources in the respective S3 stores are cleaned up as part of removing the DR protection. If access to one of the clusters is not healthy, the DRPlacementControl resource for the application on the hub remains in the Deleting state after you delete it.

Prerequisites

These instructions to delete the sample application should not be executed until the failover and relocate testing is completed and the application is ready to be removed from RHACM and the managed clusters.

Procedure

On the RHACM console, navigate to Applications.
Search for the sample application to be deleted (for example, busybox).
Click the Action Menu (⋮) next to the application you want to delete.
Click Delete application.
When Delete application is selected, a new screen appears asking whether the application-related resources should also be deleted.
Select Remove application related resources checkbox to delete the Subscription and PlacementRule.
Click Delete. This will delete the busybox application on the Primary managed cluster (or whatever cluster the application was running on).
In addition to the resources deleted using the RHACM console, delete the DRPlacementControl if it is not auto-deleted after deleting the busybox application.
1. Log in to the OpenShift Web console for the Hub cluster and navigate to Installed Operators for the project busybox-sample.
  For ApplicationSet applications, select the project as openshift-gitops.
2. Click OpenShift DR Hub Operator and then click the DRPlacementControl tab.
3. Click the Action Menu (⋮) next to the busybox application DRPlacementControl that you want to delete.
4. Click Delete DRPlacementControl.
5. Click Delete.

Note

This process can be used to delete any application with a DRPlacementControl resource.

4.9. Subscription-based application failover between managed clusters

Failover is a process that transitions an application from a primary cluster to a secondary cluster in the event of a primary cluster failure. While failover provides the ability for the application to run on the secondary cluster with minimal interruption, making an uninformed failover decision can have adverse consequences, such as complete data loss in the event of unnoticed replication failure from primary to secondary cluster. If a significant amount of time has gone by since the last successful replication, it’s best to wait until the failed primary is recovered.

LastGroupSyncTime is a critical metric that reflects the time since the last successful replication occurred for all PVCs associated with an application. In essence, it measures the synchronization health between the primary and secondary clusters. So, prior to initiating a failover from one cluster to another, check for this metric and only initiate the failover if the LastGroupSyncTime is within a reasonable time in the past.

Note

During the course of failover the Ceph-RBD mirror deployment on the failover cluster is scaled down to ensure a clean failover for volumes that are backed by Ceph-RBD as the storage provisioner.

Prerequisites

If your setup has active and passive RHACM hub clusters, see Hub recovery using Red Hat Advanced Cluster Management.
When the primary cluster is in a state other than Ready, check the actual status of the cluster as it might take some time to update.
1. Navigate to the RHACM console Infrastructure → Clusters → Cluster list tab.
2. Check the status of both the managed clusters individually before performing failover operation.
  However, failover operation can still be performed when the cluster you are failing over to is in a Ready state.
Run the following command on the Hub Cluster to check if lastGroupSyncTime is within an acceptable data loss window, when compared to current time.
```
$ oc get drpc -o yaml -A | grep lastGroupSyncTime
```
Example output:
```
[...]
lastGroupSyncTime: "2023-07-10T12:40:10Z"
```
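To judge whether the reported lastGroupSyncTime falls inside an acceptable data loss window, it helps to turn the timestamp into an age. The following is a minimal shell sketch rather than a supported tool: to_epoch and sync_age_seconds are hypothetical helper names, and the -d option assumes a date implementation that accepts it, such as GNU date.

```shell
# Hypothetical helper: converts a UTC timestamp such as 2023-07-10T12:40:10Z
# to seconds since the epoch. Assumes a date command that supports -d.
to_epoch() {
  # "2023-07-10T12:40:10Z" -> "2023-07-10 12:40:10", interpreted as UTC by -u
  date -u -d "$(printf '%s' "$1" | tr 'T' ' ' | tr -d 'Z')" +%s
}

# Prints how many seconds separate a lastGroupSyncTime value ($1) from a
# reference time ($2). The reference defaults to the current UTC time.
sync_age_seconds() {
  now=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  last_epoch=$(to_epoch "$1") || return 1
  now_epoch=$(to_epoch "$now") || return 1
  echo $(( now_epoch - last_epoch ))
}
```

For example, sync_age_seconds 2023-07-10T12:40:10Z 2023-07-10T12:50:10Z prints 600, meaning that the last successful group sync happened ten minutes before the reference time. What counts as an acceptable age depends on the data loss window you are willing to accept.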

Procedure

On the Hub cluster, navigate to Applications.
Click the Actions menu at the end of application row to view the list of available actions.
Click Failover application.
After the Failover application modal is shown, select policy and target cluster to which the associated application will failover in case of a disaster.
Click the Select subscription group dropdown to verify the default selection or modify this setting.
By default, the subscription group that replicates for the application resources is selected.
Check the status of the Failover readiness.
- If the status is Ready with a green tick, it indicates that the target cluster is ready for failover to start. Proceed to the next step.
- If the status is Unknown or Not ready, then wait until the status changes to Ready.
Click Initiate. The busybox application is now failing over to the Secondary managed cluster.
Close the modal window and track the status using the Data policy column on the Applications page.
Verify that the activity status shows as FailedOver for the application.
1. Navigate to the Applications → Overview tab.
2. In the Data policy column, click the policy link for the application you applied the policy to.
3. On the Data policy popover, click the View more details link.
4. Verify that you can see one or more policy names and the ongoing activities (Last sync time and Activity status) associated with the policy in use with the application.

4.10. ApplicationSet-based application failover between managed clusters

Failover is a process that transitions an application from a primary cluster to a secondary cluster in the event of a primary cluster failure. While failover provides the ability for the application to run on the secondary cluster with minimal interruption, making an uninformed failover decision can have adverse consequences, such as complete data loss in the event of unnoticed replication failure from primary to secondary cluster. If a significant amount of time has gone by since the last successful replication, it’s best to wait until the failed primary is recovered.

LastGroupSyncTime is a critical metric that reflects the time since the last successful replication occurred for all PVCs associated with an application. In essence, it measures the synchronization health between the primary and secondary clusters. So, prior to initiating a failover from one cluster to another, check for this metric and only initiate the failover if the LastGroupSyncTime is within a reasonable time in the past.

Note

During the course of failover the Ceph-RBD mirror deployment on the failover cluster is scaled down to ensure a clean failover for volumes that are backed by Ceph-RBD as the storage provisioner.

Prerequisites

If your setup has active and passive RHACM hub clusters, see Hub recovery using Red Hat Advanced Cluster Management.
When the primary cluster is in a state other than Ready, check the actual status of the cluster as it might take some time to update.
1. Navigate to the RHACM console Infrastructure → Clusters → Cluster list tab.
2. Check the status of both the managed clusters individually before performing failover operation.
  However, failover operation can still be performed when the cluster you are failing over to is in a Ready state.
Run the following command on the Hub Cluster to check if lastGroupSyncTime is within an acceptable data loss window, when compared to current time.
```
$ oc get drpc -o yaml -A | grep lastGroupSyncTime
```
Example output:
```
[...]
lastGroupSyncTime: "2023-07-10T12:40:10Z"
```

Procedure

On the Hub cluster, navigate to Applications.
Click the Actions menu at the end of application row to view the list of available actions.
Click Failover application.
When the Failover application modal is shown, verify the details presented are correct and check the status of the Failover readiness. If the status is Ready with a green tick, it indicates that the target cluster is ready for failover to start.
Click Initiate. The busybox resources are now created on the target cluster.
Close the modal window and track the status using the Data policy column on the Applications page.
Verify that the activity status shows as FailedOver for the application.
1. Navigate to the Applications → Overview tab.
2. In the Data policy column, click the policy link for the application you applied the policy to.
3. On the Data policy popover, verify that you can see one or more policy names and the ongoing activities associated with the policy in use with the application.

4.11. Relocating Subscription-based application between managed clusters

Relocate an application to its preferred location when all managed clusters are available.

Prerequisites

If your setup has active and passive RHACM hub clusters, see Hub recovery using Red Hat Advanced Cluster Management.
When the primary cluster is in a state other than Ready, check the actual status of the cluster as it might take some time to update. Relocate can only be performed when both primary and preferred clusters are up and running.
1. Navigate to the RHACM console Infrastructure → Clusters → Cluster list tab.
2. Check the status of both the managed clusters individually before performing relocate operation.
Perform relocate when lastGroupSyncTime is within the replication interval (for example, 5 minutes) when compared to current time. This is recommended to minimize the Recovery Time Objective (RTO) for any single application.
Run this command on the Hub Cluster:
```
$ oc get drpc -o yaml -A | grep lastGroupSyncTime
```
Example output:
```
[...]
lastGroupSyncTime: "2023-07-10T12:40:10Z"
```
Compare the output time (UTC) to current time to validate that all lastGroupSyncTime values are within their application replication interval. If not, wait to Relocate until this is true for all lastGroupSyncTime values.
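The comparison described above can be automated for several applications at once. The following is a minimal sketch under stated assumptions: all_syncs_within_interval and to_epoch are hypothetical helper names, the interval is passed in seconds, and the -d option assumes a date implementation that accepts it, such as GNU date.

```shell
# Hypothetical helper: converts a UTC timestamp such as 2023-07-10T12:40:10Z
# to seconds since the epoch. Assumes a date command that supports -d.
to_epoch() {
  date -u -d "$(printf '%s' "$1" | tr 'T' ' ' | tr -d 'Z')" +%s
}

# Reads the grep output shown above on stdin and succeeds only if every
# lastGroupSyncTime is no older than the replication interval.
# $1 = replication interval in seconds, $2 = reference time (UTC, ISO 8601).
all_syncs_within_interval() {
  interval=$1
  now_epoch=$(to_epoch "$2") || return 2
  stale=0
  # Extract the quoted timestamp from each lastGroupSyncTime line.
  for ts in $(sed -n 's/.*lastGroupSyncTime: *"\([^"]*\)".*/\1/p'); do
    ts_epoch=$(to_epoch "$ts") || return 2
    if [ $(( now_epoch - ts_epoch )) -gt "$interval" ]; then
      echo "stale: $ts"
      stale=1
    fi
  done
  return "$stale"
}
```

With a 5 minute replication interval, the output of the oc get drpc command above can be piped into all_syncs_within_interval 300 "$(date -u +%Y-%m-%dT%H:%M:%SZ)". A non-zero exit status means that at least one application has not synced within the interval, so the relocate should wait.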

Procedure

On the Hub cluster, navigate to Applications.
Click the Actions menu at the end of application row to view the list of available actions.
Click Relocate application.
When the Relocate application modal is shown, select policy and target cluster to which the associated application will relocate in case of a disaster.
By default, the subscription group that will deploy the application resources is selected. Click the Select subscription group dropdown to verify the default selection or modify this setting.
Check the status of the Relocation readiness.
- If the status is Ready with a green tick, it indicates that the target cluster is ready for relocation to start. Proceed to the next step.
- If the status is Unknown or Not ready, then wait until the status changes to Ready.
Click Initiate. The busybox resources are now created on the target cluster.
Close the modal window and track the status using the Data policy column on the Applications page.
Verify that the activity status shows as Relocated for the application.
1. Navigate to the Applications → Overview tab.
2. In the Data policy column, click the policy link for the application you applied the policy to.
3. On the Data policy popover, click the View more details link.
4. Verify that you can see one or more policy names and the ongoing activities (Last sync time and Activity status) associated with the policy in use with the application.

4.12. Relocating an ApplicationSet-based application between managed clusters

Relocate an application to its preferred location when all managed clusters are available.

Prerequisites

If your setup has active and passive RHACM hub clusters, see Hub recovery using Red Hat Advanced Cluster Management.
When the primary cluster is in a state other than Ready, check the actual status of the cluster as it might take some time to update. Relocate can only be performed when both primary and preferred clusters are up and running.
1. Navigate to the RHACM console Infrastructure → Clusters → Cluster list tab.
2. Check the status of both the managed clusters individually before performing relocate operation.
Perform relocate when lastGroupSyncTime is within the replication interval (for example, 5 minutes) when compared to current time. This is recommended to minimize the Recovery Time Objective (RTO) for any single application.
Run this command on the Hub Cluster:
```
$ oc get drpc -o yaml -A | grep lastGroupSyncTime
```
Example output:
```
[...]
lastGroupSyncTime: "2023-07-10T12:40:10Z"
```
Compare the output time (UTC) to current time to validate that all lastGroupSyncTime values are within their application replication interval. If not, wait to Relocate until this is true for all lastGroupSyncTime values.

Procedure

On the Hub cluster, navigate to Applications.
Click the Actions menu at the end of application row to view the list of available actions.
Click Relocate application.
When the Relocate application modal is shown, select policy and target cluster to which the associated application will relocate in case of a disaster.
Click Initiate. The busybox resources are now created on the target cluster.
Close the modal window and track the status using the Data policy column on the Applications page.
Verify that the activity status shows as Relocated for the application.
1. Navigate to the Applications → Overview tab.
2. In the Data policy column, click the policy link for the application you applied the policy to.
3. On the Data policy popover, verify that you can see one or more policy names and the relocation status associated with the policy in use with the application.

4.13. Disaster recovery protection for discovered applications

Red Hat OpenShift Data Foundation now provides disaster recovery (DR) protection and support for workloads that are deployed in one of the managed clusters directly without using Red Hat Advanced Cluster Management (RHACM). These workloads are called discovered applications.

Workloads deployed using RHACM are termed managed applications, while those deployed directly on one of the managed clusters without using RHACM are called discovered applications. Although RHACM displays the details of both types of workloads, it does not manage the lifecycle (create, delete, edit) of discovered applications.

4.13.1. Prerequisites for disaster recovery protection of discovered applications

This section provides instructions to guide you through the prerequisites for protecting discovered applications. This includes tasks such as assigning a data policy and initiating DR actions such as failover and relocate.

Ensure that all the DR configurations have been installed on the Primary managed cluster and the Secondary managed cluster.
Install the OADP 1.4 operator.
Note
Any version before OADP 1.4 will not work for protecting discovered applications.
1. On the Primary and Secondary managed cluster, navigate to OperatorHub and use the keyword filter to search for OADP.
2. Click the OADP tile.
3. Keep all default settings and click Install. Ensure that the operator resources are installed in the openshift-adp project.
Note
If OADP 1.4 is installed after DR configuration has been completed then the ramen-dr-cluster-operator pods on the Primary managed cluster and the Secondary managed cluster in namespace openshift-dr-system must be restarted (deleted and recreated).

[Optional] Add CACertificates to ramen-hub-operator-config ConfigMap.

Configure network (SSL) access between the primary and secondary clusters so that metadata can be stored on the alternate cluster in a Multicloud Gateway (MCG) object bucket using a secure transport protocol and in the Hub cluster for verifying access to the object buckets.

Note

If all of your OpenShift clusters are deployed using a signed and valid set of certificates for your environment then this section can be skipped.

If you are using self-signed certificates, then you have already created a ConfigMap named user-ca-bundle in the openshift-config namespace and added this ConfigMap to the default Proxy cluster resource. This means you need to add the caCertificates parameter to the configmap ramen-hub-operator-config with the encoded value.

Find the encoded value for the CACertificates.

$ oc get configmap user-ca-bundle -n openshift-config -o jsonpath="{['data']['ca-bundle\.crt']}" |base64 -w 0

Add this base64 encoded value to the configmap ramen-hub-operator-config on the Hub cluster. The example below shows where to add caCertificates.

$ oc edit configmap ramen-hub-operator-config -n openshift-operators

[...]
    ramenOpsNamespace: openshift-dr-ops
    s3StoreProfiles:
    - s3Bucket: odrbucket-36bceb61c09c
      s3CompatibleEndpoint: https://s3-openshift-storage.apps.hyper3.vmw.ibmfusion.eu
      s3ProfileName: s3profile-hyper3-ocs-storagecluster
      s3Region: noobaa
      s3SecretRef:
        name: 60f2ea6069e168346d5ad0e0b5faa59bb74946f
      caCertificates: {input base64 encoded value here}
    - s3Bucket: odrbucket-36bceb61c09c
      s3CompatibleEndpoint: https://s3-openshift-storage.apps.hyper4.vmw.ibmfusion.eu
      s3ProfileName: s3profile-hyper4-ocs-storagecluster
      s3Region: noobaa
      s3SecretRef:
        name: cc237eba032ad5c422fb939684eb633822d7900
      caCertificates: {input base64 encoded value here}

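Before pasting the value into the ConfigMap, it can be worth confirming that it really is a single-line base64 string that decodes to a PEM bundle. The following is a minimal sketch of such a sanity check. is_encoded_pem_bundle is a hypothetical helper name, and the check only looks for a PEM certificate header. It does not validate the certificates themselves.

```shell
# Hypothetical check: succeeds only if the argument is single-line base64
# that decodes to text containing a PEM certificate header.
is_encoded_pem_bundle() {
  case $1 in
    # Reject empty input and anything outside the base64 alphabet
    # (including spaces and newlines).
    ''|*[!A-Za-z0-9+/=]*) echo "not single-line base64"; return 1 ;;
  esac
  if printf '%s' "$1" | base64 -d 2>/dev/null |
      grep -q -- '-----BEGIN CERTIFICATE-----'; then
    echo "looks like an encoded PEM bundle"
  else
    echo "decoded value has no PEM certificate header"
    return 1
  fi
}
```

The function takes the encoded value as its only argument, for example the output of the oc get configmap user-ca-bundle command shown earlier, captured in a shell variable.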

Verify that there are DR secrets created in the OADP operator default namespace openshift-adp on the Primary managed cluster and the Secondary managed cluster. The DR secrets that were created when the first DRPolicy was created will be similar to the secrets below. The DR secret name is prefixed with the letter v.
```
$ oc get secrets -n openshift-adp
NAME                                       TYPE     DATA   AGE
v60f2ea6069e168346d5ad0e0b5faa59bb74946f   Opaque   1      3d20h
vcc237eba032ad5c422fb939684eb633822d7900   Opaque   1      3d20h
[...]
```
Note
There will be one DR-created secret for each managed cluster in the openshift-adp namespace.
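The naming rule above lends itself to a quick scripted check. The following is a minimal sketch, assuming (as the example values in this section suggest) that the DR secret name is the letter v followed by the s3SecretRef name from ramen-hub-operator-config. has_dr_secret is a hypothetical helper name.

```shell
# Hypothetical check: $1 is an s3SecretRef name from ramen-hub-operator-config;
# stdin is the output of "oc get secrets -n openshift-adp".
has_dr_secret() {
  expected="v$1"     # assumed rule: letter v + s3SecretRef name
  # Match the expected name against the first column of the listing.
  if awk -v want="$expected" '$1 == want { found = 1 } END { exit !found }'; then
    echo "found $expected"
  else
    echo "missing $expected"
    return 1
  fi
}
```

For example, piping the secret listing above into has_dr_secret 60f2ea6069e168346d5ad0e0b5faa59bb74946f reports that the matching secret was found. Run the check on each managed cluster, once for each s3SecretRef name.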

Verify if the Data Protection Application (DPA) is already installed on each managed cluster in the OADP namespace openshift-adp. If not already created then follow the next step to create this resource.

Create the DPA by copying the following YAML definition content to dpa.yaml.

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  labels:
    app.kubernetes.io/component: velero
  name: velero
  namespace: openshift-adp
spec:
  backupImages: false
  configuration:
    nodeAgent:
      enable: false
      uploaderType: restic
    velero:
      defaultPlugins:
        - openshift
        - aws
      noDefaultBackupLocation: true

Create the DPA resource.

$ oc create -f dpa.yaml -n openshift-adp
dataprotectionapplication.oadp.openshift.io/velero created

Verify that the OADP resources are created and are in Running state.

$ oc get pods,dpa -n openshift-adp
NAME                                                    READY   STATUS    RESTARTS   AGE
pod/openshift-adp-controller-manager-7b64b74fcd-msjbs   1/1     Running   0          5m30s
pod/velero-694b5b8f5c-b4kwg                             1/1     Running   0          3m31s


NAME                                                 AGE
dataprotectionapplication.oadp.openshift.io/velero   3m31s

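The Running check above can also be done mechanically. The following is a minimal sketch that reads the command output on standard input. pods_not_running is a hypothetical helper name, and it assumes the default column layout shown in the example, with STATUS as the third column of the pod table.

```shell
# Hypothetical helper: reads "oc get pods,dpa" output on stdin and lists every
# pod whose STATUS column is not Running. Exits non-zero if any is found.
pods_not_running() {
  awk '
    $1 == "NAME" && $3 == "STATUS" { in_pods = 1; next }  # pod table header
    $1 == "NAME" { in_pods = 0; next }                     # header of another table
    NF == 0 { next }                                        # blank separator lines
    in_pods && $3 != "Running" { print $1 " is " $3; bad = 1 }
    END { exit bad }
  '
}
```

Piping the output of oc get pods,dpa -n openshift-adp into pods_not_running prints nothing when all pods are Running. Rows from the DataProtectionApplication table are ignored because that table has no STATUS column.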

4.13.2. Creating a sample discovered application

In order to test failover from the Primary managed cluster to the Secondary managed cluster and relocate for discovered applications, you need a sample application that is installed without using the RHACM create application capability.

Procedure

Log in to the Primary managed cluster and clone the sample application repository.
```
$ git clone https://github.com/red-hat-storage/ocm-ramen-samples.git
```
Verify that you are on the main branch.
```
$ cd ~/ocm-ramen-samples
$ git branch
* main
```
The correct directory should be used when creating the sample application based on your scenario, metro or regional.
Note
Only applications using CephRBD or block volumes are supported for discovered applications.
```
$ ls workloads/deployment | egrep -v 'cephfs|k8s|base'
odr-metro-rbd
odr-regional-rbd
```
Create a project named busybox-discovered on both the Primary and Secondary managed clusters.
```
$ oc new-project busybox-discovered
```
Create the busybox application on the Primary managed cluster. This sample application example is for Regional-DR using a block (Ceph RBD) volume.
```
$ oc apply -k workloads/deployment/odr-regional-rbd -n busybox-discovered
persistentvolumeclaim/busybox-pvc created
deployment.apps/busybox created
```
Note
OpenShift Data Foundation Disaster Recovery solution now extends protection to discovered applications that span across multiple namespaces.

Verify that busybox is running in the correct project on the Primary managed cluster.

$ oc get pods,pvc,deployment -n busybox-discovered

NAME                           READY   STATUS    RESTARTS   AGE
pod/busybox-796fccbb95-qmxjf   1/1     Running   0          18s


NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-b20e4129-902d-47c7-b962-040ad64130c4   1Gi        RWO            ocs-storagecluster-ceph-rbd   <unset>                 18s


NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/busybox   1/1     1            1           18s

4.13.3. Enrolling a sample discovered application for disaster recovery protection

This section guides you on how to apply an existing DR Policy to a discovered application from the Protected applications tab.

Prerequisites

Ensure that Disaster Recovery has been configured and that at least one DR Policy has been created.

Procedure

On the RHACM console, navigate to Disaster recovery → Protected applications tab.
Click Enroll application to start configuring existing applications for DR protection.
Select ACM discovered applications.
On the Namespace page, choose the DR cluster, which is the Primary managed cluster where busybox is installed.
Select the namespace where the application is installed. For example, busybox-discovered.
Note
If your workload is spread across multiple namespaces, you can select all of those namespaces for DR protection.
Choose a unique Name, for example busybox-rbd, for the discovered application and click Next.
On the Configuration page, Resource label is selected by default. Resource labels determine which resources are included in the kubernetes-object backup and which volumes' persistent data is replicated.
Provide Label expressions and PVC label selector. Choose the label appname=busybox for both the kubernetes-objects and the PVC(s).
Click Next.
In the Replication page, select an existing DR Policy and the kubernetes-objects backup interval.
Note
It is recommended to choose the same duration for the PVC data replication and kubernetes-object backup interval (for example, 5 minutes).
Click Next.
Review the configuration and click Save.
Use the Back button to go back to the screen to correct any issues.
Verify that the Application volumes (PVCs) and the Kubernetes-objects backup have a Healthy status before proceeding to DR Failover and Relocate testing. You can view the status of your Discovered applications on the Protected applications tab.
1. To see the status of the DRPC, run the following command on the Hub cluster:
  $ oc get drpc {drpc_name} -o wide -n openshift-dr-ops
  Discovered applications store resources such as DRPlacementControl (DRPC) and Placement on the Hub cluster in a new namespace called openshift-dr-ops. The DRPC name can be identified by the unique Name configured in prior steps (for example, busybox-rbd).
2. To see the status of the VolumeReplicationGroup (VRG) for discovered applications, run the following command on the managed cluster where the busybox application was manually installed:
  $ oc get vrg {vrg_name} -n openshift-dr-ops
  The VRG resource is stored in the namespace openshift-dr-ops after a DR Policy is assigned to the discovered application. The VRG name can be identified by the unique Name configured in prior steps (for example, busybox-rbd).

4.13.4. Discovered application failover and relocate

A protected Discovered application can Failover or Relocate to its peer cluster similar to managed applications. However, there are some additional steps for discovered applications since RHACM does not manage the lifecycle of the application as it does for Managed applications.

This section guides you through the Failover and Relocate process for a protected discovered application.

Important

Never initiate a Failover or Relocate of an application when either resource type (Application volumes or Kubernetes-objects backup) is in a Warning or Critical status.

4.13.4.1. Failover disaster recovery protected discovered application

This section describes how to fail over a discovered application that is protected for disaster recovery.

Prerequisites

Ensure that the application namespace is created in both managed clusters (for example, busybox-discovered).

Procedure

In the RHACM console, navigate to Disaster Recovery → Protected applications tab.
At the end of the application row, click on the Actions menu and choose to initiate Failover.
In the Failover application modal window, review the status of the application and the target cluster.
Click Initiate. Wait for the Failover process to complete.

Verify that the busybox application is running on the Secondary managed cluster.

$ oc get pods,pvc,volumereplication -n busybox-discovered
NAME                           READY   STATUS    RESTARTS   AGE
pod/busybox-796fccbb95-qmxjf   1/1     Running   0          2m46s


NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-b20e4129-902d-47c7-b962-040ad64130c4   1Gi        RWO            ocs-storagecluster-ceph-rbd   <unset>                 2m57s


NAME                                                             AGE     VOLUMEREPLICATIONCLASS                  PVCNAME       DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc   2m45s   rbd-volumereplicationclass-1625360775   busybox-pvc   primary        Primary

Check the progression status of Failover until the result is WaitOnUserToCleanUp. The DRPC name can be identified by the unique Name configured in prior steps (for example, busybox-rbd).
```
$ oc get drpc {drpc_name} -n openshift-dr-ops -o jsonpath='{.status.progression}{"\n"}'
WaitOnUserToCleanUp
```
Remove the busybox application from the Primary managed cluster to complete the Failover process.
1. Navigate to the Protected applications tab. You will see a message to remove the application.
2. Navigate to the cloned repository for busybox and run the following commands on the Primary managed cluster where you failed over from. Use the same directory that was used to create the application (for example, odr-regional-rbd).
  $ cd ~/ocm-ramen-samples/
  $ git branch
  * main
  $ oc delete -k workloads/deployment/odr-regional-rbd -n busybox-discovered
  persistentvolumeclaim "busybox-pvc" deleted
  deployment.apps "busybox" deleted
After deleting the application, navigate to the Protected applications tab and verify that the busybox resources are both in Healthy status.
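The progression check in this procedure can be repeated in a loop until the cleanup state is reached. The following is a minimal sketch rather than a documented tool: the function name wait_for_progression, its arguments, and the default retry values are assumptions made for illustration. Only the oc get drpc query itself comes from the procedure.

```shell
# Hypothetical helper: poll a DRPC in openshift-dr-ops until
# .status.progression equals the wanted value.
# Usage: wait_for_progression <drpc_name> <wanted> [attempts] [delay_seconds]
wait_for_progression() {
    local drpc_name="$1" wanted="$2" attempts="${3:-60}" delay="${4:-10}"
    local current="" i=1
    while [ "$i" -le "$attempts" ]; do
        # Same query as in the procedure; errors are discarded so that a
        # transient API failure only costs one attempt.
        current=$(oc get drpc "$drpc_name" -n openshift-dr-ops \
            -o jsonpath='{.status.progression}' 2>/dev/null)
        if [ "$current" = "$wanted" ]; then
            echo "progression: $current"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "timed out; last progression: ${current:-unknown}" >&2
    return 1
}
```

For example, wait_for_progression busybox-rbd WaitOnUserToCleanUp checks every 10 seconds for up to 60 attempts and returns a non-zero status if the state is never reached.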

4.13.4.2. Relocate disaster recovery protected discovered application

This section describes how to relocate a discovered application that is protected for disaster recovery.

Procedure

In the RHACM console, navigate to Disaster Recovery → Protected applications tab.
At the end of the application row, click on the Actions menu and choose to initiate Relocate.
In the Relocate application modal window, review the status of the application and the target cluster.
Click Initiate.
Check the progression status of Relocate until the result is WaitOnUserToCleanUp. The DRPC name can be identified by the unique Name configured in prior steps (for example, busybox-rbd).
```
$ oc get drpc {drpc_name} -n openshift-dr-ops -o jsonpath='{.status.progression}{"\n"}'
WaitOnUserToCleanUp
```
Remove the busybox application from the Secondary managed cluster so that Relocate to the Primary managed cluster can complete.
Navigate to the cloned repository for busybox and run the following commands on the Secondary managed cluster where you relocated from. Use the same directory that was used to create the application (for example, odr-regional-rbd).
```
$ cd ~/ocm-ramen-samples/
$ git branch
* main
$ oc delete -k workloads/deployment/odr-regional-rbd -n busybox-discovered
persistentvolumeclaim "busybox-pvc" deleted
deployment.apps "busybox" deleted
```
After deleting the application, navigate to the Protected applications tab and verify that the busybox resources are both in Healthy status.

Verify that the busybox application is running on the Primary managed cluster.

$ oc get pods,pvc,volumereplication -n busybox-discovered
NAME                           READY   STATUS    RESTARTS   AGE
pod/busybox-796fccbb95-qmxjf   1/1     Running   0          2m46s


NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-b20e4129-902d-47c7-b962-040ad64130c4   1Gi        RWO            ocs-storagecluster-ceph-rbd   <unset>                 2m57s


NAME                                                             AGE     VOLUMEREPLICATIONCLASS                  PVCNAME       DESIREDSTATE   CURRENTSTATE
volumereplication.replication.storage.openshift.io/busybox-pvc   2m45s   rbd-volumereplicationclass-1625360775   busybox-pvc   primary        Primary

4.13.5. Disable disaster recovery for protected applications

This section guides you to disable disaster recovery resources when you want to delete the protected applications or when the application no longer needs to be protected.

Procedure

Log in to the Hub cluster.
List the DRPlacementControl (DRPC) resources. Each DRPC resource was created when the application was assigned a DR policy.
```
$ oc get drpc -n openshift-dr-ops
```
Find the DRPC that has a name that includes the unique identifier that you chose when assigning a DR policy (for example, busybox-rbd) and delete the DRPC.
```
$ oc delete drpc {drpc_name} -n openshift-dr-ops
```
List the Placement resources. Each Placement resource was created when the application was assigned a DR policy.
```
$ oc get placements -n openshift-dr-ops
```
Find the Placement that has a name that includes the unique identifier that you chose when assigning a DR policy (for example, busybox-rbd-placement-1) and delete the Placement.
```
$ oc delete placements {placement_name} -n openshift-dr-ops
```
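Finding the DRPC and Placement that belong to one application can be done with a name filter. The sketch below is illustrative only: the function name dr_ops_matches is hypothetical, and it assumes the identifier chosen during enrollment (for example, busybox-rbd) appears in the resource name, as the steps in this procedure describe.

```shell
# Hypothetical helper: list resources of one kind in openshift-dr-ops whose
# name contains the identifier chosen during enrollment.
# Usage: dr_ops_matches <kind> <identifier>
dr_ops_matches() {
    local kind="$1" id="$2"
    # "-o name" prints "<resource>.<group>/<name>", one per line. Only the
    # part after the slash is compared, so the kind itself never matches.
    oc get "$kind" -n openshift-dr-ops -o name |
        awk -F/ -v id="$id" 'index($2, id) > 0'
}

# Review the matches before deleting anything, for example:
#   dr_ops_matches drpc busybox-rbd
#   dr_ops_matches placements busybox-rbd
# Each printed line can then be passed to:
#   oc delete <printed_name> -n openshift-dr-ops
```

Keeping the listing and the deletion as separate actions is deliberate, so that a partial name match cannot remove the wrong resource without review.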

4.14. Recovering to a replacement cluster with Regional-DR

When the primary cluster fails, you can repair it, wait for the existing cluster to recover, or replace the cluster entirely if it is irredeemable. This solution guides you through replacing a failed primary cluster with a new cluster and enabling failback (relocate) to this new cluster.

These instructions assume that a RHACM managed cluster must be replaced after the applications have been installed and protected. For the purposes of this section, the RHACM managed cluster that must be replaced is the replacement cluster, the cluster that is not replaced is the surviving cluster, and the new cluster is the recovery cluster.

Note

Replacement cluster recovery for Discovered applications is currently not supported. Only Managed applications are supported.

Prerequisite

Ensure that the Regional-DR environment has been configured with applications installed using Red Hat Advanced Cluster Management (RHACM).
Ensure that the applications are assigned a Data policy which protects them against cluster failure.

Procedure

On the Hub cluster, navigate to Applications and failover all protected applications on the failed replacement cluster to the surviving cluster.
Validate that all protected applications are running on the surviving cluster before moving to the next step.
Note
The PROGRESSION state for each application DRPlacementControl shows as Cleaning Up. This is to be expected if the replacement cluster is offline or down.
From the Hub cluster, delete the DRCluster for the replacement cluster.
```
$ oc delete drcluster <drcluster_name> --wait=false
```
Note
Use --wait=false since the DRCluster will not be deleted until a later step.

Disable disaster recovery for each protected application on the surviving cluster. Perform all the sub-steps on the hub cluster.

For each application, edit the Placement and ensure that the surviving cluster is selected.

$ oc edit placement <placement_name> -n <namespace>

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  annotations:
    cluster.open-cluster-management.io/experimental-scheduling-disable: "true"
[...]
spec:
  clusterSets:
  - submariner
  predicates:
  - requiredClusterSelector:
      claimSelector: {}
      labelSelector:
        matchExpressions:
        - key: name
          operator: In
          values:
          - cluster1  <-- Modify to be surviving cluster name
[...]

Note

For Subscription-based applications, the associated Placement can be found on the hub cluster in the same namespace that the application uses on the managed clusters. For ApplicationSet-based applications, the associated Placement can be found in the openshift-gitops namespace on the hub cluster.

Verify that the s3Profile is removed for the replacement cluster by running the following command on the surviving cluster for each protected application’s VolumeReplicationGroup.
```
$ oc get vrg -n <application_namespace> -o jsonpath='{.items[0].spec.s3Profiles}' | jq
```
Delete all DRPlacementControl (DRPC) resources from the Hub cluster after the Placement resources of all protected applications are configured to use the surviving cluster and the replacement cluster s3Profile(s) have been removed from the protected applications.
1. Before deleting the DRPC, edit the DRPC of each application and add the annotation drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc: "true".
  $ oc edit drpc {drpc_name} -n {namespace}
  apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      ## Add this annotation
      drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc: "true"
2. Verify that the annotation has been copied to the associated VolumeReplicationGroup (VRG) on the surviving cluster for each protected application.
  $ oc get vrg -n {namespace} -o jsonpath='{.items[*].metadata.annotations}' | jq
3. Delete the DRPC.
  $ oc delete drpc {drpc_name} -n {namespace}
  Note
  For Subscription-based applications, the associated DRPlacementControl can be found on the hub cluster in the same namespace that the application uses on the managed clusters. For ApplicationSet-based applications, the associated DRPlacementControl can be found in the openshift-gitops namespace on the hub cluster.
4. Verify that all DRPlacementControl resources are deleted before proceeding to the next step. This command queries all namespaces and should report that no resources are found.
  $ oc get drpc -A

Edit each application's Placement and remove the annotation cluster.open-cluster-management.io/experimental-scheduling-disable: "true".

$ oc edit placement {placement_name} -n {namespace}

apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  annotations:
    ## Remove this annotation
    cluster.open-cluster-management.io/experimental-scheduling-disable: "true"
[...]

Repeat the process detailed in the last step and the sub-steps for every protected application on the surviving cluster. Disabling DR for protected applications is now completed.
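When many applications are protected, adding the do-not-delete-pvc annotation and checking that it reached the VRG can be scripted. The following sketch is an assumption-laden alternative to editing each DRPC by hand: it swaps oc edit for oc annotate, the function names protect_pvcs and vrg_has_annotation are hypothetical, and it assumes the VRG carries the annotation under the same key as the DRPC.

```shell
# Hypothetical helper, run against the hub cluster: set the annotation with
# "oc annotate" instead of opening the DRPC in an editor.
# Usage: protect_pvcs <drpc_name> <namespace>
protect_pvcs() {
    local drpc_name="$1" namespace="$2"
    oc annotate drpc "$drpc_name" -n "$namespace" --overwrite \
        'drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc=true'
}

# Hypothetical helper, run against the surviving cluster: succeed only if a
# VRG in the namespace carries the annotation (assumed to use the same key).
# Usage: vrg_has_annotation <namespace>
vrg_has_annotation() {
    local namespace="$1"
    oc get vrg -n "$namespace" -o jsonpath='{.items[*].metadata.annotations}' |
        grep -qF '"drplacementcontrol.ramendr.openshift.io/do-not-delete-pvc":"true"'
}
```

Because the two functions target different clusters, switch the oc context between the calls, and delete the DRPC only after vrg_has_annotation succeeds.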

On the hub cluster, run the following script to remove all disaster recovery configurations from the surviving cluster and the hub cluster.

#!/bin/bash

secrets=$(oc get secrets -n openshift-operators | grep Opaque | cut -d" " -f1)
echo $secrets
for secret in $secrets
do
    oc patch -n openshift-operators secret/$secret -p '{"metadata":{"finalizers":null}}' --type=merge
done

mirrorpeers=$(oc get mirrorpeer -o name)
echo $mirrorpeers
for mp in $mirrorpeers
do
    oc patch $mp -p '{"metadata":{"finalizers":null}}' --type=merge
    oc delete $mp
done

drpolicies=$(oc get drpolicy -o name)
echo $drpolicies
for drp in $drpolicies
do
    oc patch $drp -p '{"metadata":{"finalizers":null}}' --type=merge
    oc delete $drp
done

drclusters=$(oc get drcluster -o name)
echo $drclusters
for drp in $drclusters
do
    oc patch $drp -p '{"metadata":{"finalizers":null}}' --type=merge
    oc delete $drp
done

oc delete project openshift-operators

managedclusters=$(oc get managedclusters -o name | cut -d"/" -f2)
echo $managedclusters
for mc in $managedclusters
do
    secrets=$(oc get secrets -n $mc | grep multicluster.odf.openshift.io/secret-type | cut -d" " -f1)
    echo $secrets
    for secret in $secrets
    do
        set -x
        oc patch -n $mc secret/$secret -p '{"metadata":{"finalizers":null}}' --type=merge
        oc delete -n $mc secret/$secret
    done
done

oc delete clusterrolebinding spoke-clusterrole-bindings

Note

This script uses the command oc delete project openshift-operators to remove the Disaster Recovery (DR) operators in this namespace on the hub cluster. If there are other non-DR operators in this namespace, you must install them again from OperatorHub.

After the namespace openshift-operators is automatically created again, add the monitoring label back for collecting the disaster recovery metrics.
```
$ oc label namespace openshift-operators openshift.io/cluster-monitoring='true'
```
On the surviving cluster, ensure that the object bucket created during the DR installation is deleted. Delete the object bucket if it was not removed by the script. The name of the object bucket used for DR starts with odrbucket.
```
$ oc get obc -n openshift-storage
```
Uninstall Submariner for only the replacement cluster (failed cluster) using the RHACM console.
1. Navigate to Infrastructure → Clusters → Clustersets → Submariner add-ons view and uninstall Submariner for only the replacement cluster.
  Note
  The uninstall process of Submariner for the replacement cluster (failed cluster) will stay GREEN and not complete until the replacement cluster has been detached from the RHACM console.
2. Navigate back to the Clusters view and detach the replacement cluster.
3. Create a new OpenShift cluster (recovery cluster) and import it into the Infrastructure → Clusters view.
4. Add the new recovery cluster to the Clusterset used by Submariner.
5. Install Submariner add-ons only for the new recovery cluster.
  Note
  If GlobalNet is used for the surviving cluster make sure to enable GlobalNet for the recovery cluster as well.
Install OpenShift Data Foundation on the recovery cluster. The OpenShift Data Foundation version should be 4.16 (or greater) and the same as the version on the surviving cluster. While creating the storage cluster, in the Data Protection step, you must select the Prepare cluster for disaster recovery (Regional-DR only) checkbox.
Note
Make sure to follow the optional instructions in the documentation to modify the OpenShift Data Foundation storage cluster on the recovery cluster if GlobalNet has been enabled when installing Submariner.
On the Hub cluster, install the ODF Multicluster Orchestrator operator from OperatorHub. For instructions, see chapter on Installing OpenShift Data Foundation Multicluster Orchestrator operator.
Using the RHACM console, navigate to Data Services → Disaster recovery → Policies tab.
1. Select Create DRPolicy and name your policy.
2. Select the recovery cluster and the surviving cluster.
3. Create the policy. For instructions, see the chapter on Creating Disaster Recovery Policy on Hub cluster.
Proceed to the next step only after the status of DRPolicy changes to Validated.

Verify that cephblockpool IDs remain unchanged.

Run the following command on the recovery cluster.

$ oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml

Example output:

apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"1"}]}]'
kind: ConfigMap
[...]

Run the following command on the surviving cluster.

$ oc get cm -n openshift-storage rook-ceph-csi-mapping-config -o yaml

Example output:

apiVersion: v1
data:
  csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"3":"1"}]}]'
kind: ConfigMap
[...]

Check the RBDPoolIDMapping in the yaml for both clusters. If the RBDPoolIDMapping does not match, edit the rook-ceph-csi-mapping-config config map on the recovery cluster to add the additional or missing RBDPoolIDMapping directly, as shown in the following example.
```
csi-mapping-config-json: '[{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"1"}]},{"ClusterIDMapping":{"openshift-storage":"openshift-storage"},"RBDPoolIDMapping":[{"1":"3"}]}]'
```
Note
After editing the configmap, restart rook-ceph-operator pod in the namespace openshift-storage on the surviving cluster by deleting the pod.
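Comparing the pool ID pairs by eye is easier when they are listed one per line. The sketch below is a small text filter, not a product tool: the function name rbd_pool_pairs is hypothetical, and it only extracts the pairs from the csi-mapping-config-json value so that the output of the two clusters can be compared. It does not decide which mapping is missing.

```shell
# Hypothetical helper: read the csi-mapping-config-json value on stdin and
# print every pool-ID pair found in its RBDPoolIDMapping arrays, one per line.
rbd_pool_pairs() {
    grep -o '"RBDPoolIDMapping":\[[^]]*]' |
        grep -o '"[0-9]*":"[0-9]*"' |
        sort -u
}

# Example (run once per cluster and compare the two lists):
#   oc get cm -n openshift-storage rook-ceph-csi-mapping-config \
#       -o jsonpath='{.data.csi-mapping-config-json}' | rbd_pool_pairs
```

With the sample output shown in this procedure, the recovery cluster prints "1":"1" and the surviving cluster prints "3":"1", which makes the mismatch visible at a glance.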

Apply the DRPolicy to the applications on the surviving cluster that were originally protected before the replacement cluster failed.
Relocate the newly protected applications on the surviving cluster back to the new recovery cluster. Using the RHACM console, navigate to the Applications menu to perform the relocation.

Note

If any issues are encountered while following this process, then reach out to Red Hat Customer Support.

4.15. Viewing Recovery Point Objective values for disaster recovery enabled applications

The Recovery Point Objective (RPO) value is the most recent sync time of persistent data from the cluster where the application is currently active to its peer. This sync time helps determine the duration of data loss during failover.

Note

This RPO value is applicable only for Regional-DR during failover. Relocation ensures there is no data loss during the operation, as all peer clusters are available.

You can view the Recovery Point Objective (RPO) value of all the protected volumes for their workload on the Hub cluster.

Procedure

On the Hub cluster, navigate to Applications → Overview tab.
In the Data policy column, click the policy link for the application you applied the policy to.
A Data Policies modal page appears with the number of disaster recovery policies applied to each application along with failover and relocation status.
On the Data Policies modal page, click the View more details link.
A detailed Data Policies modal page is displayed that shows the policy names and the ongoing activities (Last sync, Activity status) associated with the policy that is applied to the application.
The Last sync time reported in the modal page represents the most recent sync time of all volumes that are DR protected for the application.

4.16. Hub recovery using Red Hat Advanced Cluster Management

When your setup has active and passive Red Hat Advanced Cluster Management for Kubernetes (RHACM) hub clusters and the active hub is down, you can use the passive hub to fail over or relocate the disaster recovery protected workloads.

4.16.1. Configuring passive hub cluster

To perform hub recovery in case the active hub is down or unreachable, follow the procedure in this section to configure the passive hub cluster and then failover or relocate the disaster recovery protected workloads.

Procedure

Ensure that the RHACM operator and MultiClusterHub are installed on the passive hub cluster. See the RHACM installation guide for instructions.
After the operator is successfully installed, a popover with a message that the Web console update is available appears on the user interface. Click Refresh web console from this popover for the console changes to reflect.
Before hub recovery, configure backup and restore. See Backup and restore topics of RHACM Business continuity guide.
Install the multicluster orchestrator (MCO) operator along with Red Hat OpenShift GitOps operator on the passive RHACM hub prior to the restore. For instructions to restore your RHACM hub, see Installing OpenShift Data Foundation Multicluster Orchestrator operator.
Ensure that .spec.cleanupBeforeRestore is set to None for the Restore.cluster.open-cluster-management.io resource. For details, see Restoring passive resources while checking for backups chapter of RHACM documentation.
If SSL access across clusters was configured manually during setup, then re-configure SSL access across clusters. For instructions, see Configuring SSL access across clusters chapter.
On the passive hub, add a label to the openshift-operators namespace to enable basic monitoring of the VolumeSynchronizationDelay alert using this command. For alert details, see the Disaster recovery alerts chapter.
```
$ oc label namespace openshift-operators openshift.io/cluster-monitoring='true'
```

4.16.2. Switching to passive hub cluster

Use this procedure when the active hub is down or unreachable.

Procedure

During the restore procedure, to avoid eviction of resources when ManifestWorks are not regenerated correctly, you can enlarge the AppliedManifestWork eviction grace period. On the passive hub cluster, check for existing global KlusterletConfig.
- If global KlusterletConfig exists, edit it and set the value of the appliedManifestWorkEvictionGracePeriod parameter to a larger value, for example, 24 hours or more.
- If global KlusterletConfig does not exist, create the KlusterletConfig using the following yaml:
  apiVersion: config.open-cluster-management.io/v1alpha1
  kind: KlusterletConfig
  metadata:
    name: global
  spec:
    appliedManifestWorkEvictionGracePeriod: "24h"
  The configuration will be propagated to all the managed clusters automatically.
Restore the backups on the passive hub cluster. For information, see Restoring a hub cluster from backup.
Important
Recovering a failed hub to its passive instance will only restore applications and their DR protected state to its last scheduled backup. Any application that was DR protected after the last scheduled backup would need to be protected again on the new hub.

Verify that the restore is complete.

$ oc -n <restore-namespace> wait restore <restore-name> --for=jsonpath='{.status.phase}'=Finished --timeout=120s

Verify that the Primary and Secondary managed clusters are successfully imported into the RHACM console and that they are accessible. Managed clusters that are down or unreachable will not be imported successfully.
Wait until DRPolicy validation succeeds before performing any DR operation.
Note
Submariner is automatically installed once the managed clusters are imported on the passive hub.
Verify that the DRPolicy is created successfully. Run this command on the Hub cluster for each of the DRPolicy resources created, where <drpolicy_name> is replaced with a unique name.
```
$ oc get drpolicy <drpolicy_name> -o jsonpath='{.status.conditions[].reason}{"\n"}'
```
Example output:
```
Succeeded
```
Refresh the RHACM console to make the DR monitoring dashboard tab accessible if it was enabled on the Active hub cluster.
Verify the DRPC output using the following command on the new hub cluster:
```
$ oc get drpc -A -o wide
```
If PROGRESSION shows a status of PAUSED, administrative intervention is required to unpause it. PROGRESSION enters PAUSED state under the following conditions:
- Cluster Query Failure: None of the clusters were successfully queried during the DRPC reconciliation. This situation can occur during hub recovery.
- Action Mismatch: The DRPC action differs from the queried VRG action.
- Cluster Mismatch: The DRPC action and the VRG action are the same, but the Primary VRG is found in a different cluster than the one expected by the DRPC.
Important
If you cannot diagnose and resolve the cause of the pause, contact Red Hat Customer Support.
If PROGRESSION is either Completed or Cleaning Up, it is safe to proceed.
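With many protected applications, scanning the wide output by eye is error-prone. The sketch below lists only the DRPCs that are not yet in one of the two states described as safe to proceed. It rests on an assumption that should be verified in your environment: that the PROGRESSION column is backed by the `.status.progression` field of the DRPC. The function name `drpc_needing_attention` is hypothetical.

```shell
# Sketch: list DRPCs whose progression is neither Completed nor Cleaning Up.
# Assumption: PROGRESSION is stored in .status.progression on each DRPC.
drpc_needing_attention() {
  oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.progression}{"\n"}{end}' |
    awk -F'\t' '
      NF == 0 { next }                      # skip blank lines
      { state = tolower($3) }               # compare case-insensitively
      state != "completed" && state != "cleaning up" {
        printf "%s/%s: %s\n", $1, $2, ($3 == "" ? "<none>" : $3)
        found = 1
      }
      END { exit found }                    # exit 1 if anything was listed
    '
}
```

The function prints `<namespace>/<name>: <progression>` for each DRPC that still needs a look, and its exit status is non-zero when at least one such DRPC exists. It does not diagnose why a DRPC is paused; that still requires checking the conditions listed above.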
Edit the global KlusterletConfig on the new hub and remove the parameter appliedManifestWorkEvictionGracePeriod and its value.
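As a non-interactive alternative to editing the resource, the parameter can be removed with a JSON patch. This is a sketch under two assumptions that are not stated in this step: that the global KlusterletConfig is named `global`, and that the parameter sits directly under `.spec`. Confirm both with `oc get klusterletconfig -o yaml` before relying on it. The function name is hypothetical.

```shell
# Sketch: remove appliedManifestWorkEvictionGracePeriod from the global KlusterletConfig.
# Assumptions: the resource is named "global" and the field lives under .spec.
remove_eviction_grace_period() {
  oc patch klusterletconfig global --type=json \
    -p '[{"op": "remove", "path": "/spec/appliedManifestWorkEvictionGracePeriod"}]'
}
```

A JSON patch `remove` operation fails if the target path does not exist, so an error here may simply mean that the parameter was already removed.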
Follow the steps for your scenario, depending on whether only the active hub cluster was down or both the active hub cluster and the primary managed cluster were down:
1. If only the active hub cluster had been down, and if the managed clusters are still accessible, no further action is required.
2. If the primary managed cluster had been down, along with the active hub cluster, you need to fail over the workloads from the primary managed cluster to the secondary managed cluster.
  For failover instructions, based on your workload type, see Subscription-based applications or ApplicationSet-based applications.
Verify that the failover is successful. If the Primary managed cluster is also down, the PROGRESSION status for the workload remains in the Cleaning Up phase until the Primary managed cluster is back online and successfully imported into the RHACM console.
On the passive hub cluster, run the following command to check the PROGRESSION status.
```
$ oc get drpc -o wide -A
```
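Because a failover can take some time to settle, it may be convenient to poll a single DRPC instead of rerunning the command by hand. The following sketch assumes, as a point to verify, that the PROGRESSION column corresponds to `.status.progression`; the function name `wait_for_drpc`, the default of 30 attempts, and the 10-second interval are illustrative choices, not documented values.

```shell
# Sketch: poll one DRPC until its progression is Completed or Cleaning Up.
# Usage: wait_for_drpc <namespace> <drpc-name> [attempts] [interval-seconds]
# Assumption: PROGRESSION is stored in .status.progression on the DRPC.
wait_for_drpc() {
  local ns=$1 name=$2 attempts=${3:-30} interval=${4:-10} state="" i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$(oc get drpc "$name" -n "$ns" -o jsonpath='{.status.progression}')
    case "$state" in
      Completed | "Cleaning Up")
        echo "$ns/$name: $state"
        return 0
        ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  # Report the last observed state on stderr and signal failure.
  echo "$ns/$name: still '$state' after $attempts checks" >&2
  return 1
}
```

For example, `wait_for_drpc <namespace> <drpc-name>` returns success as soon as the workload reaches either state. When the Primary managed cluster is still down, expect it to return with Cleaning Up rather than Completed.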
