Chapter 12. Application failover between managed clusters
This section provides instructions on how to fail over the busybox sample application. The failover method for Metro-DR is application based. Each application to be protected in this manner must have a corresponding DRPlacementControl resource and a PlacementRule resource created in the application namespace, as shown in the Create Sample Application for DR testing section.
Procedure
Create the NetworkFence resource and enable fencing.
Specify the list of CIDR blocks or IP addresses on which the network fencing operation will be performed. In this case, this is the EXTERNAL-IP of every OpenShift node in the cluster that needs to be fenced from using the external RHCS cluster.
Execute this command to get the IP addresses for the Primary managed cluster.
$ oc get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}'
Example output:
10.70.56.118
10.70.56.193
10.70.56.154
10.70.56.242
10.70.56.136
10.70.56.99
Note: Collect the current IP addresses of all OpenShift nodes before there is a site outage. A best practice is to create the NetworkFence YAML file in advance and keep it available and up to date for a disaster recovery event.
The IP addresses for all nodes are added to the NetworkFence example resource as shown below. This example is for six nodes, but your cluster might have more nodes.
apiVersion: csiaddons.openshift.io/v1alpha1
kind: NetworkFence
metadata:
  name: network-fence-<cluster1>
spec:
  driver: openshift-storage.rbd.csi.ceph.com
  cidrs:
    - <IP_Address1>/32
    - <IP_Address2>/32
    - <IP_Address3>/32
    - <IP_Address4>/32
    - <IP_Address5>/32
    - <IP_Address6>/32
    [...]
  secret:
    name: rook-csi-rbd-provisioner
    namespace: openshift-storage
  parameters:
    clusterID: openshift-storage
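The following optional helper is an illustrative sketch, not part of the documented procedure. It assumes your oc context points at the Primary managed cluster and simply reformats the ExternalIP list from the earlier command into lines that can be pasted into the cidrs: section of the NetworkFence YAML file.
$ oc get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}' | sed '/^$/d; s|^|    - |; s|$|/32|'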
In the YAML file example above, modify the IP addresses and replace <cluster1> with the cluster name found in RHACM for the Primary managed cluster. Save this to the file network-fence-<cluster1>.yaml.
Important: The NetworkFence must be created, prior to failover, from the managed cluster opposite to the one where the application is currently running. In this case, that is the Secondary managed cluster.
$ oc create -f network-fence-<cluster1>.yaml
Example output:
networkfences.csiaddons.openshift.io/network-fence-ocp4perf1 created
Important: After the NetworkFence is created, all communication from applications to the OpenShift Data Foundation storage will fail and some Pods will be in an unhealthy state (for example, CreateContainerError, CrashLoopBackOff) on the cluster that is now fenced.
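Optionally, an illustrative check such as the following (not part of the documented procedure) can be run on the fenced Primary managed cluster to list Pods stuck in these states:
$ oc get pods -A | grep -E 'CreateContainerError|CrashLoopBackOff'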
On the same cluster where the NetworkFence was created, verify that its status is Succeeded. Modify <cluster1> to the correct cluster name.
$ export NETWORKFENCE=network-fence-<cluster1>
$ oc get networkfences.csiaddons.openshift.io/$NETWORKFENCE -n openshift-dr-system -o jsonpath='{.status.result}{"\n"}'
Example output:
Succeeded
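If the status is not yet Succeeded, an illustrative wait loop such as the following can be used; it is not part of the documented procedure and reuses the same NETWORKFENCE variable and query as above.
$ until [ "$(oc get networkfences.csiaddons.openshift.io/$NETWORKFENCE -n openshift-dr-system -o jsonpath='{.status.result}')" = "Succeeded" ]; do sleep 5; done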
Modify DRPolicy for the fenced cluster.
Edit the DRPolicy on the Hub cluster and change the clusterFence for <cluster1> (for example, ocp4perf1) from Unfenced to ManuallyFenced.
$ oc edit drpolicy odr-policy
Example output:
[...]
spec:
  drClusterSet:
  - clusterFence: ManuallyFenced  ## <-- Modify from Unfenced to ManuallyFenced
    name: ocp4perf1
    region: metro
    s3ProfileName: s3-primary
  - clusterFence: Unfenced
    name: ocp4perf2
    region: metro
    s3ProfileName: s3-secondary
[...]
Example output:
drpolicy.ramendr.openshift.io/odr-policy edited
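If interactive editing is not practical during an outage, a non-interactive patch can make the same change. This is an illustrative sketch, not part of the documented procedure, and it assumes the fenced cluster is the first entry (index 0) in spec.drClusterSet; adjust the index to match your DRPolicy.
$ oc patch drpolicy odr-policy --type=json -p '[{"op": "replace", "path": "/spec/drClusterSet/0/clusterFence", "value": "ManuallyFenced"}]'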
Validate that the DRPolicy status in the Hub cluster has changed to Fenced for the Primary managed cluster.
$ oc get drpolicies.ramendr.openshift.io odr-policy -o yaml | grep -A 6 drClusters
Example output:
  drClusters:
    ocp4perf1:
      status: Fenced
      string: ocp4perf1
    ocp4perf2:
      status: Unfenced
      string: ocp4perf2
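As an illustrative alternative to grep (not part of the documented procedure, and assuming the per-cluster fencing state is reported under .status.drClusters as in the output above), the same information can be read directly:
$ oc get drpolicies.ramendr.openshift.io odr-policy -o jsonpath='{.status.drClusters}{"\n"}'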
Modify DRPlacementControl to failover.
- On the Hub cluster, navigate to Installed Operators and then click Openshift DR Hub Operator.
- Click the DRPlacementControl tab.
- Click DRPC busybox-drpc and then the YAML view. Add the action and failoverCluster details as shown in the screenshot below and in the sketch that follows these steps. The failoverCluster should be the ACM cluster name for the Secondary managed cluster.
DRPlacementControl add action Failover
- Click Save.
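The following is a minimal sketch of the relevant DRPlacementControl fields after this edit, assuming the Secondary managed cluster is named ocp4perf2 as in the earlier examples; all other fields of the resource are omitted.
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: busybox-drpc
  namespace: busybox-sample
spec:
  [...]
  action: Failover
  failoverCluster: ocp4perf2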
Verify that the application busybox is now running in the Secondary managed cluster, the failover cluster ocp4perf2 specified in the YAML file.
$ oc get pods,pvc -n busybox-sample
Example output:
NAME          READY   STATUS    RESTARTS   AGE
pod/busybox   1/1     Running   0          35s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb   5Gi        RWO            ocs-storagecluster-ceph-rbd   35s
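Failover progress can also be checked from the Hub cluster. This illustrative command is not part of the documented procedure and assumes the DRPlacementControl status exposes a phase field that reports FailedOver when the action completes.
$ oc get drplacementcontrol busybox-drpc -n busybox-sample -o jsonpath='{.status.phase}{"\n"}'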
Verify that busybox is no longer running on the Primary managed cluster.
$ oc get pods,pvc -n busybox-sample
Example output:
No resources found in busybox-sample namespace.
Be aware of known Metro-DR issues as documented in the Known Issues section of the Release Notes.