Chapter 13. Relocating an application between managed clusters
A relocation operation is very similar to a failover. Relocation is application based and uses the DRPlacementControl to trigger the relocation. The main difference for failback is that the application is scaled down on the failoverCluster, so creating a NetworkFence is not required.
Procedure
Remove the NetworkFence resource and disable Fencing.
Before a failback or relocate action can succeed, the NetworkFence for the Primary managed cluster must be deleted.
Run this command on the Secondary managed cluster, replacing <cluster1> so that the filename matches the NetworkFence YAML file created in the prior section.
$ oc delete -f network-fence-<cluster1>.yaml
Example output:
networkfence.csiaddons.openshift.io "network-fence-ocp4perf1" deleted
Reboot the OpenShift Container Platform nodes that were Fenced.
This step is required because some application Pods on the previously fenced cluster, in this case the Primary managed cluster, are in an unhealthy state (for example, CreateContainerError or CrashLoopBackOff). The simplest fix is to reboot all OpenShift worker nodes, one at a time.
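One way to script the one-at-a-time reboot is sketched below. It is an illustration, not the documented procedure: the function name reboot_node and the OC and REBOOT_GRACE variables are hypothetical, the fixed sleep is a crude stand-in for detecting that the node has actually gone down, and the drain step may need manual help if Pods are stuck because of the earlier fencing.

```shell
#!/usr/bin/env bash
# Sketch: reboot a single worker node and wait for it to return.
# OC and REBOOT_GRACE are hypothetical knobs introduced for illustration.
OC="${OC:-oc}"
REBOOT_GRACE="${REBOOT_GRACE:-120}"   # seconds to let the node go down

reboot_node() {
  local node="$1"
  # Stop new Pods from being scheduled onto the node.
  "$OC" adm cordon "$node" || return 1
  # Evict workloads; stuck Pods may require a timeout or manual cleanup.
  "$OC" adm drain "$node" --ignore-daemonsets --delete-emptydir-data --force || return 1
  # Reboot from a debug Pod; the session is usually cut off by the
  # reboot itself, so its exit status is ignored.
  "$OC" debug "node/$node" -- chroot /host systemctl reboot || true
  # Assumption: a fixed pause is enough for the node to leave Ready.
  sleep "$REBOOT_GRACE"
  "$OC" wait --for=condition=Ready "node/$node" --timeout=15m || return 1
  # Allow scheduling again only once the node reports Ready.
  "$OC" adm uncordon "$node"
}

# Usage (processes workers strictly one at a time):
#   for n in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
#     reboot_node "${n#node/}" || break
#   done
```

Stopping the loop on the first failure keeps a problem on one node from spreading to the rest of the workers.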
Note: The OpenShift Web Console dashboards and Overview page can also be used to assess the health of applications and the external storage. The detailed OpenShift Data Foundation dashboard is found by navigating to Storage → Data Foundation.
Verify that all Pods are in a healthy state by running this command on the Primary managed cluster after all OpenShift nodes have rebooted and are in a Ready status. The output for this query should be zero Pods.
$ oc get pods -A | egrep -v 'Running|Completed'
Example output:
NAMESPACE NAME READY STATUS RESTARTS AGE
Important: If Pods are still in an unhealthy status because of severed storage communication, troubleshoot and resolve the problem before continuing. Because the storage cluster is external to OpenShift, it must also be properly recovered after a site outage for OpenShift applications to be healthy.
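For scripted checks, the Pod health query can be reduced to a single number. The sketch below assumes the default column layout of oc get pods -A, where STATUS is the fourth column; count_unhealthy is a hypothetical helper name, not part of oc.

```shell
#!/usr/bin/env bash
# Sketch: count Pods whose STATUS column is neither Running nor Completed.
# Expects `oc get pods -A --no-headers` output on stdin.
# Assumption: STATUS is the fourth whitespace-separated column.
count_unhealthy() {
  awk '$4 != "Running" && $4 != "Completed" { n++ } END { print n + 0 }'
}

# Usage on the Primary managed cluster:
#   oc get pods -A --no-headers | count_unhealthy
```

Matching on the STATUS column rather than on the whole line avoids counting a Pod as healthy merely because its name contains the word Running or Completed. A result of 0 corresponds to the empty listing shown in the example output.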
Modify the DRPolicy to Unfenced status.
For the ODR HUB operator to know that the NetworkFence has been removed for the Primary managed cluster, the DRPolicy must be modified for the newly Unfenced cluster.
Edit the DRPolicy on the Hub cluster and change <cluster1> (for example, ocp4perf1) from ManuallyFenced to Unfenced.
$ oc edit drpolicy odr-policy
Example output:
[...]
spec:
  drClusterSet:
  - clusterFence: Unfenced  ## <-- Modify from ManuallyFenced to Unfenced
    name: ocp4perf1
    region: metro
    s3ProfileName: s3-primary
  - clusterFence: Unfenced
    name: ocp4perf2
    region: metro
    s3ProfileName: s3-secondary
[...]
Example output:
drpolicy.ramendr.openshift.io/odr-policy edited
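Where an interactive edit is inconvenient, the same change could be expressed as a JSON patch. This is a sketch under assumptions, not a documented alternative: unfence_patch is a hypothetical helper, and the list index of the Primary managed cluster within drClusterSet (0 in the example YAML) must be confirmed for your DRPolicy. The leading test operation makes the patch fail rather than modify the wrong entry if the index does not match the cluster name.

```shell
#!/usr/bin/env bash
# Sketch: build a JSON patch that sets clusterFence to Unfenced for one
# drClusterSet entry. unfence_patch is a hypothetical helper name.
#   $1 = index of the cluster in spec.drClusterSet (assumption: verify first)
#   $2 = expected cluster name at that index
unfence_patch() {
  local index="$1" cluster="$2"
  # "test" guards against patching the wrong list entry.
  printf '[{"op":"test","path":"/spec/drClusterSet/%s/name","value":"%s"},{"op":"replace","path":"/spec/drClusterSet/%s/clusterFence","value":"Unfenced"}]\n' \
    "$index" "$cluster" "$index"
}

# Usage on the Hub cluster:
#   oc patch drpolicy odr-policy --type=json -p "$(unfence_patch 0 ocp4perf1)"
```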
Verify that the status of the DRPolicy on the Hub cluster has changed to Unfenced for the Primary managed cluster.
$ oc get drpolicies.ramendr.openshift.io odr-policy -o yaml | grep -A 6 drClusters
Example output:
  drClusters:
    ocp4perf1:
      status: Unfenced
      string: ocp4perf1
    ocp4perf2:
      status: Unfenced
      string: ocp4perf2
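The verification can also be automated by listing any cluster that is not yet Unfenced. The sketch below parses the grep output shown above and assumes that each cluster block lists status before string, as in the example; still_fenced is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the names of clusters whose status is not Unfenced.
# Expects the drClusters block of the DRPolicy status on stdin.
# Assumption: "status:" precedes "string:" within each cluster block.
still_fenced() {
  awk '
    $1 == "status:" { status = $2 }
    $1 == "string:" && status != "Unfenced" { print $2 }
  '
}

# Usage on the Hub cluster (empty output means every cluster is Unfenced):
#   oc get drpolicies.ramendr.openshift.io odr-policy -o yaml \
#     | grep -A 6 drClusters | still_fenced
```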
Modify DRPlacementControl to failback.
- On the Hub cluster, navigate to Installed Operators and then click Openshift DR Hub Operator.
- Click the DRPlacementControl tab.
- Click DRPC busybox-drpc and then the YAML view. Modify action to Relocate.
  Figure: DRPlacementControl modify action to Relocate
- Click Save.
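The console steps above amount to setting the action field of the DRPlacementControl to Relocate. A command-line sketch of the same change follows. It assumes that busybox-drpc lives in the busybox-sample namespace and that action sits directly under spec; relocate_drpc and the OC variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: set spec.action to Relocate on a DRPlacementControl.
# OC is a hypothetical override that allows a dry run (for example OC=echo).
OC="${OC:-oc}"

relocate_drpc() {
  local name="$1" namespace="$2"
  # A merge patch touches only the action field and leaves the rest intact.
  "$OC" patch drplacementcontrols.ramendr.openshift.io "$name" -n "$namespace" \
    --type=merge -p '{"spec":{"action":"Relocate"}}'
}

# Usage on the Hub cluster (namespace is an assumption; confirm it first):
#   relocate_drpc busybox-drpc busybox-sample
```

Running it once with OC=echo prints the command that would be issued, which is a cheap way to review it before touching the Hub cluster.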
Verify that the application busybox is now running in the Primary managed cluster.
The failback is to the preferredCluster ocp4perf1 as specified in the YAML file, which is where the application was running before the failover operation.
$ oc get pods,pvc -n busybox-sample
Example output:
NAME          READY   STATUS    RESTARTS   AGE
pod/busybox   1/1     Running   0          60s

NAME                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc   Bound    pvc-79f2a74d-6e2c-48fb-9ed9-666b74cfa1bb   5Gi        RWO            ocs-storagecluster-ceph-rbd   61s
Verify whether busybox is running in the Secondary managed cluster. The busybox application should no longer be running on this managed cluster.
$ oc get pods,pvc -n busybox-sample
Example output:
No resources found in busybox-sample namespace.
Be aware of the known Metro-DR issues documented in the Known Issues section of the Release Notes.