OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
Chapter 13. Relocating an application between managed clusters
A relocation operation is very similar to a failover. Relocate is application based and uses the DRPlacementControl to trigger the relocation. The main difference for a failback is that the application is scaled down on the failoverCluster, so creating a NetworkFence is not required.
Procedure
Remove the NetworkFence resource and disable fencing.
Before a failback or relocate action can succeed, the NetworkFence for the Primary managed cluster must be deleted.
Run this command on the Secondary managed cluster, modifying <cluster1> to match the NetworkFence YAML filename created in the prior section.
$ oc delete -f network-fence-<cluster1>.yaml

Example output:

networkfence.csiaddons.openshift.io "network-fence-ocp4perf1" deleted

Reboot the OpenShift Container Platform nodes that were fenced.
This step is required because some application Pods on the previously fenced cluster, in this case the Primary managed cluster, are in an unhealthy state (for example, CreateContainerError or CrashLoopBackOff). This is most easily fixed by rebooting all worker OpenShift nodes one at a time.
Note: The OpenShift Web Console dashboards and Overview page can also be used to assess the health of applications and the external storage. The detailed OpenShift Data Foundation dashboard is found by navigating to Storage → Data Foundation.

After all OpenShift nodes have rebooted and are in a Ready status, verify that all Pods are in a healthy state by running this command on the Primary managed cluster. The output of this query should be zero Pods.

$ oc get pods -A | egrep -v 'Running|Completed'

Example output:

NAMESPACE   NAME   READY   STATUS   RESTARTS   AGE

Important: If any Pods are still in an unhealthy status because of severed storage communication, troubleshoot and resolve them before continuing. Because the storage cluster is external to OpenShift, it must also be properly recovered after a site outage for OpenShift applications to be healthy.
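The zero-Pods check above can be wrapped in a small helper for repeated polling. This is a minimal sketch: the helper name `count_unhealthy` and the sample lines are illustrative, not real cluster output; on a live cluster you would pipe `oc get pods -A --no-headers` into it instead.

```shell
# count_unhealthy: reads `oc get pods -A --no-headers` style lines on stdin
# and prints how many Pods are NOT in Running or Completed status.
# (Exits nonzero when the count is 0, following grep semantics.)
count_unhealthy() {
  grep -Evc 'Running|Completed'
}

# Illustrative sample input (hypothetical namespaces and Pod names):
printf '%s\n' \
  'ns1   app-1   1/1   Running            0   5m' \
  'ns2   job-1   0/1   Completed          0   9m' \
  'ns3   app-2   0/1   CrashLoopBackOff   4   7m' \
  | count_unhealthy
# prints 1 (only the CrashLoopBackOff Pod is unhealthy)
```

This is the same filter the `egrep -v` query above expresses; the helper only adds the count, so a result of 0 corresponds to the expected empty output.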
Modify the DRPolicy to the Unfenced status.
In order for the ODR Hub operator to know that the NetworkFence has been removed for the Primary managed cluster, the DRPolicy must be modified for the newly Unfenced cluster.
Edit the DRPolicy on the Hub cluster and change <cluster1> (for example, ocp4perf1) from ManuallyFenced to Unfenced.

$ oc edit drpolicy odr-policy

Example output:

drpolicy.ramendr.openshift.io/odr-policy edited
Verify that the status of the DRPolicy in the Hub cluster has changed to Unfenced for the Primary managed cluster.

$ oc get drpolicies.ramendr.openshift.io odr-policy -o yaml | grep -A 6 drClusters
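The grep above is meant to confirm the fencing value for each cluster. The fragment below is illustrative only, not captured output: the field names and cluster names are assumptions based on the examples in this guide, and the exact DRPolicy schema may differ by version.

```yaml
# Illustrative only -- not captured output; schema may vary by version.
spec:
  drClusterSet:
  - clusterFence: Unfenced    # <cluster1>, previously ManuallyFenced
    name: ocp4perf1
  - clusterFence: Unfenced
    name: ocp4perf2
```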
Modify the DRPlacementControl to failback.

- On the Hub cluster, navigate to Installed Operators and then click Openshift DR Hub Operator.
- Click the DRPlacementControl tab.
- Click the DRPC busybox-drpc and then the YAML view. Modify the action to Relocate.
- Click Save.
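The YAML edit in the steps above amounts to changing a single field in the DRPlacementControl spec. A hedged sketch follows: the `action` field is the one being edited; the other fields shown are illustrative, following the example clusters used in this guide.

```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: busybox-drpc
spec:
  action: Relocate             # changed from Failover to trigger the failback
  preferredCluster: ocp4perf1  # where the application relocates to
  failoverCluster: ocp4perf2
```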
Verify that the application busybox is now running in the Primary managed cluster. The failback is to the preferredCluster ocp4perf1, as specified in the YAML file, which is where the application was running before the failover operation.

$ oc get pods,pvc -n busybox-sample

Example output:

NAME          READY   STATUS    RESTARTS   AGE
pod/busybox   1/1     Running   0          60s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb   5Gi        RWO            ocs-storagecluster-ceph-rbd   61s

Verify that busybox is no longer running in the Secondary managed cluster.

$ oc get pods,pvc -n busybox-sample

Example output:

No resources found in busybox-sample namespace.
Be aware of known Metro-DR issues as documented in the Known Issues section of the Release Notes.