Chapter 9. High availability for hosted control planes

9.1. About high availability for hosted control planes

You can maintain high availability (HA) of hosted control planes by implementing the following actions:

- Recover etcd members for a hosted cluster.
- Back up and restore etcd for a hosted cluster.
- Perform a disaster recovery process for a hosted cluster.

9.1.1. Impact of the failed management cluster component

If a management cluster component fails, your workloads remain unaffected. In the OpenShift Container Platform management cluster, the control plane is decoupled from the data plane to provide resiliency.

The following table covers the impact of a failed management cluster component on the control plane and the data plane. The table does not cover every failure scenario for management cluster components.


Table 9.1. Impact of the failed component on hosted control planes
Name of the failed component	Hosted control plane API status	Hosted cluster data plane status
Worker node	Available	Available
Availability zone	Available	Available
Management cluster control plane	Available	Available
Management cluster control plane and worker nodes	Not available	Available

9.2. Recovering an unhealthy etcd cluster

In a highly available control plane, three etcd pods run as a part of a stateful set in an etcd cluster. To recover an etcd cluster, identify unhealthy etcd pods by checking the etcd cluster health.

9.2.1. Checking the status of an etcd cluster

You can check the health of the etcd cluster by logging in to any etcd pod.

Procedure

Log in to an etcd pod by entering the following command:
```
$ oc rsh -n openshift-etcd -c etcd <etcd_pod_name>
```

Print the health status of an etcd cluster by entering the following command:

```
sh-4.4# etcdctl endpoint status -w table
```

Example output

```
+------------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT            |       ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.1xxx.20:2379 | 8fxxxxxxxxxx    |  3.5.12 |  123 MB |     false |      false |        10 |     180156 |             180156 |        |
| https://192.168.1xxx.21:2379 | a5xxxxxxxxxx    |  3.5.12 |  122 MB |     false |      false |        10 |     180156 |             180156 |        |
| https://192.168.1xxx.22:2379 | 7cxxxxxxxxxx    |  3.5.12 |  124 MB |      true |      false |        10 |     180156 |             180156 |        |
+------------------------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
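
To spot an unhealthy member quickly, the status table can be condensed into a one-line summary. The following is a minimal sketch and not part of the product: the function name `summarize_etcd_status` is hypothetical, and it assumes the column order of the example output (IS LEADER as the fifth column, ERRORS as the last).

```shell
# Hypothetical helper: summarize `etcdctl endpoint status -w table` output read from stdin.
# Assumes the column order shown in the example output of this section.
summarize_etcd_status() {
  awk -F'|' '
    index($2, "://") > 0 {   # data rows only: header and border rows contain no URL
      members++
      gsub(/ /, "", $6)      # IS LEADER column
      gsub(/ /, "", $11)     # ERRORS column
      if ($6 == "true") leaders++
      if ($11 != "") errors++
    }
    END { printf "members=%d leaders=%d errors=%d\n", members, leaders, errors }
  '
}
```

One possible way to use it from a workstation is `oc exec -n openshift-etcd -c etcd <etcd_pod_name> -- etcdctl endpoint status -w table | summarize_etcd_status`. A cluster like the one in the example output would be summarized with three members, one leader, and no errors.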

9.2.2. Recovering a failing etcd pod

Each etcd pod of a 3-node cluster has its own persistent volume claim (PVC) to store its data. An etcd pod might fail because of corrupted or missing data. You can recover a failing etcd pod and its PVC.

Procedure

To confirm that the etcd pod is failing, enter the following command:

```
$ oc get pods -l app=etcd -n openshift-etcd
```

Example output

```
NAME     READY   STATUS             RESTARTS     AGE
etcd-0   2/2     Running            0            64m
etcd-1   2/2     Running            0            45m
etcd-2   1/2     CrashLoopBackOff   1 (5s ago)   64m
```

The failing etcd pod might have the CrashLoopBackOff or Error status.

Delete the failing pod and its PVC by entering the following command:
```
$ oc delete pods etcd-2 -n openshift-etcd
```
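
The command above names only the pod. If the PVC has to be removed as well, the form that this chapter uses later for the second and third etcd members could be adapted. The following is a sketch under assumptions: the function name `delete_etcd_member` is hypothetical, and the PVC is assumed to be named `data-<pod_name>` (for example, `data-etcd-2`), which you can check with `oc get pvc -n openshift-etcd` before deleting anything.

```shell
# Hypothetical helper: delete one etcd member pod together with its PVC.
# Assumption: the PVC is named data-<pod_name>, for example data-etcd-2 for pod etcd-2.
delete_etcd_member() {
  namespace="$1"
  pod="$2"
  # --wait=false returns immediately instead of blocking until the objects are gone
  oc delete -n "$namespace" "pvc/data-${pod}" "pod/${pod}" --wait=false
}
```

For the example in this section, the call would be `delete_etcd_member openshift-etcd etcd-2`.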

Verification

Verify that a new etcd pod is up and running by entering the following command:

```
$ oc get pods -l app=etcd -n openshift-etcd
```

Example output

```
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   2/2     Running   0          67m
etcd-1   2/2     Running   0          48m
etcd-2   2/2     Running   0          2m2s
```

9.3. Backing up and restoring etcd in an on-premise environment

You can back up and restore etcd on a hosted cluster in an on-premise environment to fix failures.

9.3.1. Backing up and restoring etcd on a hosted cluster in an on-premise environment

By backing up and restoring etcd on a hosted cluster, you can fix failures, such as corrupted or missing data in an etcd member of a three-node cluster. If multiple members of the etcd cluster encounter data loss or have a CrashLoopBackOff status, this approach helps prevent an etcd quorum loss.

Prerequisites

The oc and jq binaries have been installed.

Procedure

First, set up your environment variables:

Set up environment variables for your hosted cluster by entering the following commands, replacing values as necessary:
```
$ CLUSTER_NAME=my-cluster
$ HOSTED_CLUSTER_NAMESPACE=clusters
$ CONTROL_PLANE_NAMESPACE="${HOSTED_CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
```

Pause reconciliation of the hosted cluster by entering the following command, replacing values as necessary:

```
$ oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} \
  -p '{"spec":{"pausedUntil":"true"}}' --type=merge
```
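
Before you take a snapshot, it can be worth confirming that the patch took effect. The following minimal sketch assumes that reading `spec.pausedUntil` back with a JSONPath query is sufficient. The function name `check_paused` is hypothetical, and the check recognizes only the literal value `true` that the previous command sets.

```shell
# Hypothetical helper: report whether reconciliation of a hosted cluster is paused.
# Only the literal value "true" (as set by the patch command) counts as paused here.
check_paused() {
  namespace="$1"
  name="$2"
  value=$(oc get -n "$namespace" "hostedclusters/${name}" -o jsonpath='{.spec.pausedUntil}')
  if [ "$value" = "true" ]; then
    echo "paused"
  else
    echo "not paused (pausedUntil='${value}')"
    return 1
  fi
}
```

With the variables from this procedure, the call would be `check_paused "${HOSTED_CLUSTER_NAMESPACE}" "${CLUSTER_NAME}"`.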

Next, take a snapshot of etcd by using one of the following methods:

Use a previously backed-up snapshot of etcd.

If you have an available etcd pod, take a snapshot from the active etcd pod by completing the following steps:

List etcd pods by entering the following command:

```
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
```

Take a snapshot of the pod database and save it locally to your machine by entering the following commands:

```
$ ETCD_POD=etcd-0
```
```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl \
  --cacert /etc/etcd/tls/etcd-ca/ca.crt \
  --cert /etc/etcd/tls/client/etcd-client.crt \
  --key /etc/etcd/tls/client/etcd-client.key \
  --endpoints=https://localhost:2379 \
  snapshot save /var/lib/snapshot.db
```

Verify that the snapshot is successful by entering the following command:

```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status \
  /var/lib/snapshot.db
```

Make a local copy of the snapshot by entering the following command:

```
$ oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db \
  /tmp/etcd.snapshot.db
```
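
Because the later steps delete etcd data, a quick sanity check of the local copy may be useful at this point. The following is a sketch only: the function name `check_local_snapshot` is hypothetical, and it assumes a workstation that provides the GNU coreutils `sha256sum` command. It verifies only that the file exists and is not empty; it does not validate the etcd database itself.

```shell
# Hypothetical helper: fail if the local snapshot copy is missing or empty,
# otherwise print its size in bytes and its SHA-256 checksum.
# Assumption: sha256sum (GNU coreutils) is available on the workstation.
check_local_snapshot() {
  file="$1"
  if [ ! -s "$file" ]; then
    echo "snapshot missing or empty: $file" >&2
    return 1
  fi
  echo "size_bytes=$(wc -c < "$file" | tr -d ' ')"
  sha256sum "$file"
}
```

For the copy made in the previous step, the call would be `check_local_snapshot /tmp/etcd.snapshot.db`. Recording the checksum next to the backup makes it possible to detect a damaged file before a later restore.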

Make a copy of the snapshot database from etcd persistent storage:
1. List etcd pods by entering the following command:
  $ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
2. Find a pod that is running and set its name as the value of ETCD_POD, for example, ETCD_POD=etcd-0. Then copy its snapshot database by entering the following command:
  $ oc cp -c etcd \
    ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db \
    /tmp/etcd.snapshot.db

Next, scale down the etcd statefulset by entering the following command:

```
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
```

Delete volumes for second and third members by entering the following command:

```
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2
```

Create a pod to access the first etcd member’s data:

Get the etcd image by entering the following command:

```
$ ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd \
  -o jsonpath='{ .spec.template.spec.containers[0].image }')
```

Create a pod that allows access to etcd data:

```
$ cat << EOF | oc apply -n ${CONTROL_PLANE_NAMESPACE} -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: etcd-data
spec:
  replicas: 1
  selector:
    matchLabels:
      app: etcd-data
  template:
    metadata:
      labels:
        app: etcd-data
    spec:
      containers:
      - name: access
        image: $ETCD_IMAGE
        volumeMounts:
        - name: data
          mountPath: /var/lib
        command:
        - /usr/bin/bash
        args:
        - -c
        - |-
          while true; do
            sleep 1000
          done
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-etcd-0
EOF
```

Check the status of the etcd-data pod and wait for it to be running by entering the following command:
```
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
```
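
Instead of polling by hand, the wait can be scripted. The following minimal sketch relies on `oc rollout status`, which blocks until a deployment finishes rolling out or a timeout expires. The function name `wait_for_etcd_data` and the default timeout of 300 seconds are assumptions chosen for illustration.

```shell
# Hypothetical helper: block until the etcd-data deployment is rolled out.
# The second argument is an optional timeout (assumed default: 300s).
wait_for_etcd_data() {
  namespace="$1"
  timeout="${2:-300s}"
  oc rollout status -n "$namespace" deployment/etcd-data --timeout="$timeout"
}
```

With the variables from this procedure, the call would be `wait_for_etcd_data "${CONTROL_PLANE_NAMESPACE}"`. A non-zero exit status indicates that the pod did not become available within the timeout.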

Get the name of the etcd-data pod by entering the following command:

```
$ DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers \
  -l app=etcd-data -o name | cut -d/ -f2)
```

Copy an etcd snapshot into the pod by entering the following command:

```
$ oc cp /tmp/etcd.snapshot.db \
  ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
```

Remove old data from the etcd-data pod by entering the following commands:

```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
```
```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
```

Restore the etcd snapshot by entering the following command:

```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- \
     etcdutl snapshot restore /var/lib/restored.snap.db \
     --data-dir=/var/lib/data --skip-hash-check \
     --name etcd-0 \
     --initial-cluster-token=etcd-cluster \
     --initial-cluster etcd-0=https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-1=https://etcd-1.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380,etcd-2=https://etcd-2.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380 \
     --initial-advertise-peer-urls https://etcd-0.etcd-discovery.${CONTROL_PLANE_NAMESPACE}.svc:2380
```
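
The long `--initial-cluster` value follows a regular pattern, so it can be generated rather than typed. The following is a small sketch: the function name `etcd_initial_cluster` is hypothetical, and it assumes the three members `etcd-0`, `etcd-1`, and `etcd-2` and the `etcd-discovery` service name that appear in the restore command.

```shell
# Hypothetical helper: print the --initial-cluster value for a three-member
# etcd cluster in the given control plane namespace.
etcd_initial_cluster() {
  namespace="$1"
  result=""
  for i in 0 1 2; do
    member="etcd-${i}=https://etcd-${i}.etcd-discovery.${namespace}.svc:2380"
    # Append a comma separator only when result already holds a member
    result="${result:+${result},}${member}"
  done
  printf '%s\n' "$result"
}
```

The restore command could then take `--initial-cluster "$(etcd_initial_cluster "${CONTROL_PLANE_NAMESPACE}")"`, which avoids a typo in one of the three peer URLs.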

Remove the temporary etcd snapshot from the pod by entering the following command:

```
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- \
  rm /var/lib/restored.snap.db
```

Delete the data access deployment by entering the following command:

```
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data
```

Scale up the etcd cluster by entering the following command:

```
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3
```

Wait for the etcd member pods to return and report as available by entering the following command:
```
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
```

Restore reconciliation of the hosted cluster by entering the following command:

```
$ oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} \
  -p '{"spec":{"pausedUntil":"null"}}' --type=merge
```

Manually roll out the hosted cluster by entering the following command:

```
$ oc annotate hostedcluster -n \
  <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/restart-date=$(date --iso-8601=seconds)
```

The Multus admission controller and network node identity pods do not start yet.

Delete the pods for the second and third members of etcd and their PVCs by entering the following commands:

```
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pod/etcd-1 --wait=false
```
```
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-2 pod/etcd-2 --wait=false
```

Manually roll out the hosted cluster again by entering the following command:

```
$ oc annotate hostedcluster -n \
  <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/restart-date=$(date --iso-8601=seconds) \
  --overwrite
```

After a few minutes, the control plane pods start running.

9.4. Backing up and restoring etcd on AWS

You can back up and restore etcd on a hosted cluster on Amazon Web Services (AWS) to fix failures.

9.4.1. Taking a snapshot of etcd for a hosted cluster

To back up etcd for a hosted cluster, you must take a snapshot of etcd. Later, you can restore etcd by using the snapshot.

Important

This procedure requires API downtime.

Procedure

Pause reconciliation of the hosted cluster by entering the following command:

```
$ oc patch -n clusters hostedclusters/<hosted_cluster_name> \
  -p '{"spec":{"pausedUntil":"true"}}' --type=merge
```

Stop all etcd-writer deployments by entering the following command:

```
$ oc scale deployment -n <hosted_cluster_namespace> --replicas=0 \
  kube-apiserver openshift-apiserver openshift-oauth-apiserver
```

To take an etcd snapshot, use the exec command in each etcd container by entering the following command:

```
$ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl \
  --cacert /etc/etcd/tls/etcd-ca/ca.crt \
  --cert /etc/etcd/tls/client/etcd-client.crt \
  --key /etc/etcd/tls/client/etcd-client.key \
  --endpoints=localhost:2379 \
  snapshot save /var/lib/data/snapshot.db
```

To check the snapshot status, use the exec command in each etcd container by running the following command:

```
$ oc exec -it <etcd_pod_name> -n <hosted_cluster_namespace> -- \
  env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status \
  /var/lib/data/snapshot.db
```

Copy the snapshot data to a location where you can retrieve it later, such as an S3 bucket. See the following example.

Note

The following example uses signature version 2. If you are in a region that supports signature version 4, such as the us-east-2 region, use signature version 4. Otherwise, when copying the snapshot to an S3 bucket, the upload fails.

Example

```
BUCKET_NAME=somebucket
CLUSTER_NAME=cluster_name
FILEPATH="/${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"
CONTENT_TYPE="application/x-compressed-tar"
DATE_VALUE=`date -R`
SIGNATURE_STRING="PUT\n\n${CONTENT_TYPE}\n${DATE_VALUE}\n${FILEPATH}"
ACCESS_KEY=accesskey
SECRET_KEY=secret
SIGNATURE_HASH=`echo -en "${SIGNATURE_STRING}" | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64`
HOSTED_CLUSTER_NAMESPACE=hosted_cluster_namespace

oc exec -it etcd-0 -n ${HOSTED_CLUSTER_NAMESPACE} -- curl -X PUT -T "/var/lib/data/snapshot.db" \
  -H "Host: ${BUCKET_NAME}.s3.amazonaws.com" \
  -H "Date: ${DATE_VALUE}" \
  -H "Content-Type: ${CONTENT_TYPE}" \
  -H "Authorization: AWS ${ACCESS_KEY}:${SIGNATURE_HASH}" \
  https://${BUCKET_NAME}.s3.amazonaws.com/${CLUSTER_NAME}-snapshot.db
```
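
For regions that require signature version 4, one alternative to hand-building the signature is to copy the snapshot to a workstation and upload it with the AWS CLI, which signs the request itself. The following is a sketch under assumptions, not the documented procedure: it presumes that the `aws` CLI is installed and configured with credentials that can write to the bucket, that the etcd container supports `oc cp`, and that `/tmp` has enough free space. The function name `upload_snapshot` is hypothetical.

```shell
# Hypothetical helper: copy the snapshot out of an etcd pod and upload it to S3.
# Assumptions: the aws CLI is configured locally; /tmp has enough free space.
upload_snapshot() {
  namespace="$1"
  pod="$2"
  bucket="$3"
  cluster="$4"
  local_file="/tmp/${cluster}-snapshot.db"
  # Copy the snapshot from the etcd container to the workstation first
  oc cp -c etcd "${namespace}/${pod}:/var/lib/data/snapshot.db" "$local_file" &&
    # Upload only if the copy succeeded; the CLI handles request signing
    aws s3 cp "$local_file" "s3://${bucket}/${cluster}-snapshot.db"
}
```

With the values from the example, the call would be `upload_snapshot "${HOSTED_CLUSTER_NAMESPACE}" etcd-0 "${BUCKET_NAME}" "${CLUSTER_NAME}"`. The object key matches the one that the restore procedure later presigns.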

To restore the snapshot on a new cluster later, save the encryption secret that the hosted cluster references.

Get the secret encryption key by entering the following command:

```
$ oc get hostedcluster <hosted_cluster_name> \
  -o=jsonpath='{.spec.secretEncryption.aescbc}'
```

Example output

```
{"activeKey":{"name":"<hosted_cluster_name>-etcd-encryption-key"}}
```

Save the secret encryption key by entering the following command:
```
$ oc get secret <hosted_cluster_name>-etcd-encryption-key \
  -o=jsonpath='{.data.key}'
```
You can decrypt this key when restoring a snapshot on a new cluster.
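
Because the command prints the key to the terminal, it may be preferable to write it straight to a file that only the current user can read. The following minimal sketch assumes that you pass the namespace of the secret explicitly. The function name `save_encryption_key` and the output path are hypothetical. The stored value stays base64-encoded, exactly as it appears in the `data.key` field of the secret.

```shell
# Hypothetical helper: save the base64-encoded etcd encryption key to a file
# that is readable by the current user only.
save_encryption_key() {
  namespace="$1"
  secret="$2"
  outfile="$3"
  (
    # The subshell keeps the restrictive umask from leaking into the caller
    umask 077
    oc get secret -n "$namespace" "$secret" -o jsonpath='{.data.key}' > "$outfile"
  )
}
```

A call might look like `save_encryption_key clusters <hosted_cluster_name>-etcd-encryption-key ./etcd-encryption-key.b64`. Store the resulting file with the same care as the snapshot itself.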

Restart all etcd-writer deployments by entering the following command:

```
$ oc scale deployment -n <control_plane_namespace> --replicas=3 \
  kube-apiserver openshift-apiserver openshift-oauth-apiserver
```

Resume the reconciliation of the hosted cluster by entering the following command:

```
$ oc patch -n <hosted_cluster_namespace> hostedclusters/<hosted_cluster_name> \
  -p '[{"op": "remove", "path": "/spec/pausedUntil"}]' --type=json
```

Next steps

Restore the etcd snapshot.

9.4.2. Restoring an etcd snapshot on a hosted cluster

If you have a snapshot of etcd from your hosted cluster, you can restore it. Currently, you can restore an etcd snapshot only during cluster creation.

To restore an etcd snapshot, you modify the output from the create cluster --render command and define a restoreSnapshotURL value in the etcd section of the HostedCluster specification.

Note

The --render flag in the hcp create command does not render the secrets. To render the secrets, you must use both the --render and the --render-sensitive flags in the hcp create command.

Prerequisites

You took an etcd snapshot on a hosted cluster.

Procedure

By using the AWS command-line interface (CLI), create a pre-signed URL so that you can download your etcd snapshot from S3 without passing credentials to the etcd deployment:

```
ETCD_SNAPSHOT=${ETCD_SNAPSHOT:-"s3://${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"}
ETCD_SNAPSHOT_URL=$(aws s3 presign ${ETCD_SNAPSHOT})
```

Modify the HostedCluster specification to refer to the URL:

```
spec:
  etcd:
    managed:
      storage:
        persistentVolume:
          size: 4Gi
        type: PersistentVolume
        restoreSnapshotURL:
        - "${ETCD_SNAPSHOT_URL}"
    managementType: Managed
```
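
The `${ETCD_SNAPSHOT_URL}` placeholder in the specification is not expanded automatically when you edit the rendered file by hand. One way to produce the stanza with the real URL filled in is a here-document, sketched below. The function name `render_etcd_restore_stanza` is hypothetical, and the storage size of 4Gi is copied from the example rather than being a recommendation.

```shell
# Hypothetical helper: print the etcd stanza of a HostedCluster specification
# with the pre-signed snapshot URL substituted in.
render_etcd_restore_stanza() {
  url="$1"
  cat << EOF
spec:
  etcd:
    managed:
      storage:
        persistentVolume:
          size: 4Gi
        type: PersistentVolume
        restoreSnapshotURL:
        - "${url}"
    managementType: Managed
EOF
}
```

Running `render_etcd_restore_stanza "${ETCD_SNAPSHOT_URL}"` prints the stanza so that you can merge it into the rendered HostedCluster manifest. Keep in mind that a pre-signed URL expires, so generate it shortly before you create the cluster.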

Ensure that the secret that you referenced from the spec.secretEncryption.aescbc value contains the same AES key that you saved in the previous steps.

9.5. Backing up and restoring a hosted cluster on OpenShift Virtualization

You can back up and restore a hosted cluster on OpenShift Virtualization to fix failures.

9.5.1. Backing up a hosted cluster on OpenShift Virtualization

When you back up a hosted cluster on OpenShift Virtualization, the hosted cluster can remain running. The backup contains the hosted control plane components and the etcd for the hosted cluster.

When the hosted cluster is not running compute nodes on external infrastructure, hosted cluster workload data that is stored in persistent volume claims (PVCs) that are provisioned by KubeVirt CSI is also backed up. The backup does not contain any KubeVirt virtual machines (VMs) that are used as compute nodes. Those VMs are automatically re-created after the restore process is completed.

Procedure

Create a Velero backup resource by creating a YAML file that is similar to the following example:

```
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: hc-clusters-hosted-backup
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  includedNamespaces:
  - clusters
  - clusters-hosted
  includedResources:
  - sa
  - role
  - rolebinding
  - deployment
  - statefulset
  - pv
  - pvc
  - bmh
  - configmap
  - infraenv
  - priorityclasses
  - pdb
  - hostedcluster
  - nodepool
  - secrets
  - hostedcontrolplane
  - cluster
  - datavolume
  - service
  - route
  excludedResources: [ ]
  labelSelector:
    matchExpressions:
    - key: 'hypershift.openshift.io/is-kubevirt-rhcos'
      operator: 'DoesNotExist'
  storageLocation: default
  preserveNodePorts: true
  ttl: 4h0m0s
  snapshotMoveData: true
  datamover: "velero"
  defaultVolumesToFsBackup: false
```

includedNamespaces: This field selects the namespaces of the objects to back up. Include namespaces from both the hosted cluster and the hosted control plane. In this example, clusters is a namespace from the hosted cluster and clusters-hosted is a namespace from the hosted control plane. By default, the HostedControlPlane namespace is clusters-<hosted_cluster_name>.
labelSelector: The boot images of the VMs that are used as the hosted cluster nodes are stored in large PVCs. To reduce backup time and storage size, you can filter those PVCs out of the backup by adding this label selector.
snapshotMoveData: This field and the datamover field enable automatically uploading the CSI VolumeSnapshots to remote cloud storage.
datamover: This field and the snapshotMoveData field enable automatically uploading the CSI VolumeSnapshots to remote cloud storage.
defaultVolumesToFsBackup: This field indicates whether pod volume file system backup is used for all volumes by default. Set this value to false to back up the PVCs that you want.

Apply the changes to the YAML file by entering the following command:
```
$ oc apply -f <backup_file_name>.yaml
```
Replace <backup_file_name> with the name of your file.
Monitor the backup process in the backup object status and in the Velero logs.
- To monitor the backup object status, enter the following command:
  $ watch "oc get backups.velero.io -n openshift-adp <backup_name> -o jsonpath='{.status}' | jq"
  Replace <backup_name> with the name of the Backup object, for example, hc-clusters-hosted-backup.
- To monitor the Velero logs, enter the following command:
  $ oc logs -n openshift-adp -ldeploy=velero -f

Verification

When the status.phase field is Completed, the backup process is considered complete.
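
The check on `status.phase` can also be scripted so that an automation step waits for the result. The following is a sketch, not a supported tool: the function name `wait_for_velero_phase` is hypothetical, the list of failure phases reflects our understanding of Velero and is an assumption, and the loop has no overall timeout.

```shell
# Hypothetical helper: poll a Velero object in openshift-adp until it reaches a final phase.
# kind is backups.velero.io or restores.velero.io; name is the object name.
# Assumption: Failed, PartiallyFailed, and FailedValidation are the failure phases.
wait_for_velero_phase() {
  kind="$1"
  name="$2"
  interval="${3:-10}"   # seconds between polls
  while true; do
    phase=$(oc get "$kind" -n openshift-adp "$name" -o jsonpath='{.status.phase}')
    case "$phase" in
      Completed)
        echo "${name}: ${phase}"
        return 0
        ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "${name}: ${phase}" >&2
        return 1
        ;;
      *)
        # Any other phase (or an empty value) is treated as still in progress
        sleep "$interval"
        ;;
    esac
  done
}
```

For the example backup, the call would be `wait_for_velero_phase backups.velero.io hc-clusters-hosted-backup`. The same function accepts `restores.velero.io` as the kind.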

9.5.2. Restoring a hosted cluster on OpenShift Virtualization

After you back up a hosted cluster on OpenShift Virtualization, you can restore the backup.

Note

The restore process can be completed only on the same management cluster where you created the backup.

Procedure

Ensure that no pods or persistent volume claims (PVCs) are running in the HostedControlPlane namespace.
Delete the following objects from the management cluster:
- HostedCluster
- NodePool
- PVCs
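
The precondition in the first step can be checked with a short script before you create the Restore object. The following minimal sketch assumes that listing pods and PVCs by name is an adequate test. The function name `hcp_namespace_is_clear` is hypothetical.

```shell
# Hypothetical helper: succeed only if no pods or PVCs remain in the namespace.
hcp_namespace_is_clear() {
  namespace="$1"
  # -o name prints one line per object and nothing on stdout when none exist
  remaining=$(oc get pods,pvc -n "$namespace" -o name)
  if [ -n "$remaining" ]; then
    echo "still present in ${namespace}:" >&2
    echo "$remaining" >&2
    return 1
  fi
}
```

For the example in this section, the call would be `hcp_namespace_is_clear clusters-hosted`. A non-zero exit status means that objects still have to be removed or are still terminating.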

Create a restoration manifest YAML file that is similar to the following example:

```
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: hc-clusters-hosted-restore
  namespace: openshift-adp
spec:
  backupName: hc-clusters-hosted-backup
  restorePVs: true
  existingResourcePolicy: update
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
```

restorePVs: This field starts the recovery of pods with the included persistent volumes.
existingResourcePolicy: Setting this field to update ensures that any existing objects are overwritten with backup content. This action can cause issues with objects that contain immutable fields, which is why you deleted the HostedCluster, node pools, and PVCs. If you do not set this policy, the Velero engine skips the restoration of objects that already exist.

Apply the changes to the YAML file by entering the following command:
```
oc apply -f <restore_resource_file_name>.yaml
```
Replace <restore_resource_file_name> with the name of your file.
Monitor the restore process by checking the restore status field and the Velero logs.
- To check the restore status field, enter the following command:
  $ watch "oc get restores.velero.io -n openshift-adp <restore_name> -o jsonpath='{.status}' | jq"
  Replace <restore_name> with the name of the Restore object, for example, hc-clusters-hosted-restore.
- To check the Velero logs, enter the following command:
  $ oc logs -n openshift-adp -ldeploy=velero -f

Verification

When the status.phase field is Completed, the restore process is considered complete.
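
The check on the status.phase field can be scripted. The following is a minimal sketch, not part of the documented procedure: the function names is_terminal_phase and wait_for_restore are hypothetical, and the list of terminal phases reflects an assumption about the Velero Restore API (Completed, PartiallyFailed, Failed, FailedValidation).

```shell
# Return 0 when a Velero phase string is terminal (no further progress expected).
# The set of phase names is an assumption about the Velero Restore API.
is_terminal_phase() {
  case "$1" in
    Completed|PartiallyFailed|Failed|FailedValidation) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll a Restore object until it reaches a terminal phase.
# $1: Restore name, $2: maximum attempts (default 60), $3: seconds between attempts (default 10).
# Returns 0 only when the final phase is Completed.
wait_for_restore() {
  _wr_name="$1"; _wr_attempts="${2:-60}"; _wr_interval="${3:-10}"
  _wr_i=0
  while [ "$_wr_i" -lt "$_wr_attempts" ]; do
    _wr_phase=$(oc get restores.velero.io -n openshift-adp "$_wr_name" \
      -o jsonpath='{.status.phase}' 2>/dev/null)
    echo "Restore ${_wr_name}: phase=${_wr_phase:-unknown}"
    if is_terminal_phase "$_wr_phase"; then
      if [ "$_wr_phase" = "Completed" ]; then return 0; else return 1; fi
    fi
    sleep "$_wr_interval"
    _wr_i=$((_wr_i + 1))
  done
  echo "Timed out waiting for restore ${_wr_name}" >&2
  return 1
}
```

For example, wait_for_restore hc-clusters-hosted-restore 60 10 polls for up to about ten minutes. A non-zero return value means that the restore failed, partially failed, or did not finish in time, so check the Velero logs in that case.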

Next steps

After some time, the KubeVirt VMs are created and join the hosted cluster as compute nodes. Make sure that the hosted cluster workloads are running again as expected.

9.6. Disaster recovery for a hosted cluster in AWS

You can recover a hosted cluster to the same region within Amazon Web Services (AWS). For example, you need disaster recovery when the upgrade of a management cluster fails and the hosted cluster is in a read-only state.

The disaster recovery process involves the following steps:

Backing up the hosted cluster on the source management cluster
Restoring the hosted cluster on a destination management cluster
Deleting the hosted cluster from the source management cluster

Your workloads remain running during the process. The hosted cluster API might be unavailable for a period, but that does not affect the services that are running on the worker nodes.

Important

Both the source management cluster and the destination management cluster must use the --external-dns flags so that the API server URL of the hosted cluster stays the same after the migration, for example, https://api-sample-hosted.sample-hosted.aws.openshift.com. See the following example:

Example: External DNS flags

--external-dns-provider=aws \
--external-dns-credentials=<path_to_aws_credentials_file> \
--external-dns-domain-filter=<basedomain>

If you do not include the --external-dns flags to maintain the API server URL, you cannot migrate the hosted cluster.

9.6.1. Overview of the backup and restore process

The backup and restore process works as follows:

On management cluster 1, which you can think of as the source management cluster, the control plane and workers interact by using the external DNS API. The external DNS API is accessible, and a load balancer sits between the management clusters.

You take a snapshot of the hosted cluster, which includes etcd, the control plane, and the worker nodes. During this process, the worker nodes continue to try to access the external DNS API even if it is not accessible, the workloads are running, the control plane is saved in a local manifest file, and etcd is backed up to an S3 bucket. The data plane is active and the control plane is paused.

On management cluster 2, which you can think of as the destination management cluster, you restore etcd from the S3 bucket and restore the control plane from the local manifest file. During this process, the external DNS API is stopped, the hosted cluster API becomes inaccessible, and any workers that use the API are unable to update their manifest files, but the workloads are still running.

The external DNS API is accessible again, and the worker nodes use it to move to management cluster 2. The external DNS API can access the load balancer that points to the control plane.

On management cluster 2, the control plane and worker nodes interact by using the external DNS API. The resources are deleted from management cluster 1, except for the S3 backup of etcd. If you try to set up the hosted cluster again on management cluster 1, it will not work.

9.6.2. Backing up a hosted cluster on AWS

To recover your hosted cluster in your target management cluster, you first need to back up all of the relevant data.

Procedure

Create a config map file to declare the source management cluster by entering the following command:

oc create configmap mgmt-parent-cluster -n default \
  --from-literal=from=${MGMT_CLUSTER_NAME}

Pause reconciliation of the hosted cluster and the node pools, and then scale down the API server and control-plane-operator deployments, by entering the following commands:

PAUSED_UNTIL="true"

oc patch -n ${HC_CLUSTER_NS} hostedclusters/${HC_CLUSTER_NAME} \
  -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge

oc patch -n ${HC_CLUSTER_NS} nodepools/${NODEPOOLS} \
  -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge

oc scale deployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 \
  kube-apiserver openshift-apiserver openshift-oauth-apiserver control-plane-operator
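
The pausedUntil patches are easier to reuse if the JSON is built in one place. The following sketch assumes two hypothetical helper names, paused_patch and resume_patch. It also assumes that pausedUntil accepts either the string "true" or an RFC 3339 timestamp, and that a JSON merge patch with a null value removes the field.

```shell
# Print a JSON merge patch that sets spec.pausedUntil to the given value.
# $1: "true" or (assumption) an RFC 3339 timestamp.
paused_patch() {
  printf '{"spec":{"pausedUntil":"%s"}}\n' "$1"
}

# Print a JSON merge patch that removes spec.pausedUntil.
# In a merge patch, a null value deletes the key.
resume_patch() {
  printf '{"spec":{"pausedUntil":null}}\n'
}
```

A call might look like oc patch -n "${HC_CLUSTER_NS}" "hostedclusters/${HC_CLUSTER_NAME}" --type=merge -p "$(paused_patch true)". Building the patch with printf avoids the nested quoting that the inline form needs.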

Back up etcd and upload the data to an S3 bucket by running the following bash script:

Tip

Wrap this script in a function and call it from the main function.

# ETCD Backup
ETCD_PODS="etcd-0"
if [ "${CONTROL_PLANE_AVAILABILITY_POLICY}" = "HighlyAvailable" ]; then
  ETCD_PODS="etcd-0 etcd-1 etcd-2"
fi

for POD in ${ETCD_PODS}; do
  # Create an etcd snapshot
  oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- env ETCDCTL_API=3 /usr/bin/etcdctl --cacert /etc/etcd/tls/client/etcd-client-ca.crt --cert /etc/etcd/tls/client/etcd-client.crt --key /etc/etcd/tls/client/etcd-client.key --endpoints=localhost:2379 snapshot save /var/lib/data/snapshot.db
  oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/data/snapshot.db

  FILEPATH="/${BUCKET_NAME}/${HC_CLUSTER_NAME}-${POD}-snapshot.db"
  CONTENT_TYPE="application/x-compressed-tar"
  DATE_VALUE=`date -R`
  SIGNATURE_STRING="PUT\n\n${CONTENT_TYPE}\n${DATE_VALUE}\n${FILEPATH}"

  set +x
  ACCESS_KEY=$(grep aws_access_key_id ${AWS_CREDS} | head -n1 | cut -d= -f2 | sed "s/ //g")
  SECRET_KEY=$(grep aws_secret_access_key ${AWS_CREDS} | head -n1 | cut -d= -f2 | sed "s/ //g")
  SIGNATURE_HASH=$(echo -en ${SIGNATURE_STRING} | openssl sha1 -hmac "${SECRET_KEY}" -binary | base64)
  set -x

  # FIXME: this is pushing to the OIDC bucket
  # Upload from the pod that created the snapshot
  oc exec -it ${POD} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -- curl -X PUT -T "/var/lib/data/snapshot.db" \
    -H "Host: ${BUCKET_NAME}.s3.amazonaws.com" \
    -H "Date: ${DATE_VALUE}" \
    -H "Content-Type: ${CONTENT_TYPE}" \
    -H "Authorization: AWS ${ACCESS_KEY}:${SIGNATURE_HASH}" \
    https://${BUCKET_NAME}.s3.amazonaws.com/${HC_CLUSTER_NAME}-${POD}-snapshot.db
done

For more information about backing up etcd, see "Backing up and restoring etcd on a hosted cluster".
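
The upload in the script authenticates with an Authorization: AWS <key>:<signature> header, which is the legacy AWS Signature Version 2 form. The following sketch isolates how the string to sign is assembled and hashed. The function names s3_v2_string_to_sign and s3_v2_signature are hypothetical, and the sketch assumes that openssl and base64 are available on the host that runs it.

```shell
# Print the Signature Version 2 string to sign for a request without
# Content-MD5 and without x-amz-* headers.
# $1: HTTP verb, $2: content type, $3: Date header value, $4: resource path (/bucket/key)
s3_v2_string_to_sign() {
  printf '%s\n\n%s\n%s\n%s' "$1" "$2" "$3" "$4"
}

# Print the base64-encoded HMAC-SHA1 signature of the string to sign.
# $1: AWS secret access key, remaining arguments: same as s3_v2_string_to_sign
s3_v2_signature() {
  _s3_secret="$1"; shift
  s3_v2_string_to_sign "$@" | openssl sha1 -hmac "$_s3_secret" -binary | base64
}
```

Using printf instead of echo -en keeps the newline handling the same across shells. The Date value passed to the function must match the Date header that curl sends, otherwise the signature does not validate.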

Back up the Kubernetes and OpenShift Container Platform objects. You need to back up the following objects:

- HostedCluster and NodePool objects from the HostedCluster namespace
- HostedCluster secrets from the HostedCluster namespace
- HostedControlPlane from the Hosted Control Plane namespace
- Cluster from the Hosted Control Plane namespace
- AWSCluster, AWSMachineTemplate, and AWSMachine from the Hosted Control Plane namespace
- MachineDeployments, MachineSets, and Machines from the Hosted Control Plane namespace
- ControlPlane secrets from the Hosted Control Plane namespace

Create the backup directories by entering the following commands:

mkdir -p ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS} \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}

chmod 700 ${BACKUP_DIR}/namespaces/

Back up the HostedCluster objects from the HostedCluster namespace by entering the following commands:

echo "Backing Up HostedCluster Objects:"

oc get hc ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml

echo "--> HostedCluster"

sed -i '' -e '/^status:$/,$d' \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml

Back up the NodePool objects from the HostedCluster namespace by entering the following commands:

oc get np ${NODEPOOLS} -n ${HC_CLUSTER_NS} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-${NODEPOOLS}.yaml

echo "--> NodePool"

sed -i '' -e '/^status:$/,$ d' \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-${NODEPOOLS}.yaml
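
The sed -i '' form is the BSD and macOS syntax for in-place editing. GNU sed on Linux does not take a separate empty argument after -i. The following sketch removes the status section without using -i, so the same command behaves identically on both platforms. The function name strip_status is hypothetical.

```shell
# Remove the top-level "status:" key and everything after it from a YAML file.
# Writes through a temporary file, so it does not depend on "sed -i".
# $1: path to the YAML file to edit
strip_status() {
  _ss_file="$1"
  _ss_tmp=$(mktemp) || return 1
  if sed '/^status:$/,$d' "$_ss_file" > "$_ss_tmp"; then
    # Copy the content back so that the original file keeps its permissions.
    cat "$_ss_tmp" > "$_ss_file"
    rm -f "$_ss_tmp"
  else
    rm -f "$_ss_tmp"
    return 1
  fi
}
```

For example, strip_status "${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-${NODEPOOLS}.yaml" has the same effect as the sed command in the procedure.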

Back up the secrets in the HostedCluster namespace by running the following shell script:

echo "--> HostedCluster Secrets:"
for s in $(oc get secret -n ${HC_CLUSTER_NS} | grep "^${HC_CLUSTER_NAME}" | awk '{print $1}'); do
    oc get secret -n ${HC_CLUSTER_NS} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/secret-${s}.yaml
done

Back up the secrets in the HostedCluster control plane namespace by running the following shell script:

echo "--> HostedCluster ControlPlane Secrets:"
for s in $(oc get secret -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} | grep -Ev "docker|service-account-token|oauth-openshift|NAME|token-${HC_CLUSTER_NAME}" | awk '{print $1}'); do
    oc get secret -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/secret-${s}.yaml
done

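The exclusion pattern in the script is matched against whole rows of the oc get secret table, so terms such as docker and service-account-token can match the TYPE column as well as the name. The following sketch makes that explicit by filtering rows that contain only a name and a type. The function name filter_control_plane_secrets is hypothetical.

```shell
# Read "<name> <type>" rows on stdin, drop the rows that the procedure skips,
# and print the remaining secret names, one per line.
# $1: hosted cluster name, used to exclude token-<name> secrets
filter_control_plane_secrets() {
  grep -Ev "docker|service-account-token|oauth-openshift|token-$1" | awk '{print $1}'
}
```

The rows can come from a command such as oc get secret -n "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" --no-headers -o custom-columns=NAME:.metadata.name,TYPE:.type. Because that output has no header, the NAME term from the original pattern is not needed.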

Back up the hosted control plane by entering the following commands:

echo "--> HostedControlPlane:"

oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/hcp-${HC_CLUSTER_NAME}.yaml

Back up the cluster by entering the following commands:

echo "--> Cluster:"

CL_NAME=$(oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} \
  -o jsonpath={.metadata.labels.\*} | grep ${HC_CLUSTER_NAME})

oc get cluster ${CL_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/cl-${HC_CLUSTER_NAME}.yaml

Back up the AWS cluster by entering the following commands:

echo "--> AWS Cluster:"

oc get awscluster ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awscl-${HC_CLUSTER_NAME}.yaml

Back up the AWSMachineTemplate objects by entering the following commands:

echo "--> AWS Machine Template:"

oc get awsmachinetemplate ${NODEPOOLS} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o yaml > \
  ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsmt-${HC_CLUSTER_NAME}.yaml

Back up the AWSMachine objects by running the following shell script:

echo "--> AWS Machine:"
CL_NAME=$(oc get hcp ${HC_CLUSTER_NAME} -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o jsonpath={.metadata.labels.\*} | grep ${HC_CLUSTER_NAME})
for s in $(oc get awsmachines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --no-headers | grep ${CL_NAME} | cut -f1 -d\ ); do
    oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} awsmachines $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsm-${s}.yaml
done

Back up the MachineDeployments objects by running the following shell script:

echo "--> HostedCluster MachineDeployments:"
for s in $(oc get machinedeployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
    mdp_name=$(echo ${s} | cut -f 2 -d /)
    oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machinedeployment-${mdp_name}.yaml
done

Back up the MachineSets objects by running the following shell script:

echo "--> HostedCluster MachineSets:"
for s in $(oc get machineset -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
    ms_name=$(echo ${s} | cut -f 2 -d /)
    oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machineset-${ms_name}.yaml
done

Back up the Machines objects from the Hosted Control Plane namespace by running the following shell script:

echo "--> HostedCluster Machine:"
for s in $(oc get machine -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
    m_name=$(echo ${s} | cut -f 2 -d /)
    oc get -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} $s -o yaml > ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machine-${m_name}.yaml
done

Clean up the ControlPlane routes by entering the following command:
```
oc delete routes -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all
```
By entering that command, you enable the ExternalDNS Operator to delete the Route53 entries.

Verify that the Route53 entries are clean by running the following script:

function clean_routes() {

    if [[ -z "${1}" ]];then
        echo "Give me the NS where to clean the routes"
        exit 1
    fi

    # Constants
    if [[ -z "${2}" ]];then
        echo "Give me the Route53 zone ID"
        exit 1
    fi

    ZONE_ID=${2}
    ROUTES=10
    timeout=40
    count=0

    # This allows us to remove the ownership in the AWS for the API route
    oc delete route -n ${1} --all

    while [ ${ROUTES} -gt 2 ]
    do
        echo "Waiting for ExternalDNS Operator to clean the DNS Records in AWS Route53 where the zone id is: ${ZONE_ID}..."
        echo "Try: (${count}/${timeout})"
        sleep 10
        if [[ $count -eq timeout ]];then
            echo "Timeout waiting for cleaning the Route53 DNS records"
            exit 1
        fi
        count=$((count+1))
        ROUTES=$(aws route53 list-resource-record-sets --hosted-zone-id ${ZONE_ID} --max-items 10000 --output json | grep -c ${EXTERNAL_DNS_DOMAIN})
    done
}

# SAMPLE: clean_routes "<HC ControlPlane Namespace>" "<AWS_ZONE_ID>"
clean_routes "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" "${AWS_ZONE_ID}"

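The clean_routes function calls exit when it times out, which also ends an interactive shell if you run the function directly. The following sketch separates the retry loop from the Route53 check and returns a status code instead. The names retry_until and route53_records_clean are hypothetical, and the threshold of 2 remaining records is taken from the loop condition in the script.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: seconds between attempts, rest: command to run
# Returns 0 on success and 1 on timeout instead of calling "exit".
retry_until() {
  _ru_max="$1"; _ru_delay="$2"; shift 2
  _ru_n=1
  while ! "$@"; do
    if [ "$_ru_n" -ge "$_ru_max" ]; then
      echo "Timed out after ${_ru_n} attempts: $*" >&2
      return 1
    fi
    echo "Try: (${_ru_n}/${_ru_max})"
    _ru_n=$((_ru_n + 1))
    sleep "$_ru_delay"
  done
  return 0
}

# Succeed when at most 2 records in the zone still mention the domain.
# $1: Route53 hosted zone ID, $2: external DNS domain
route53_records_clean() {
  _rc_count=$(aws route53 list-resource-record-sets --hosted-zone-id "$1" \
    --max-items 10000 --output json | grep -c "$2")
  [ "$_rc_count" -le 2 ]
}
```

A call such as retry_until 40 10 route53_records_clean "${AWS_ZONE_ID}" "${EXTERNAL_DNS_DOMAIN}" waits for the same condition as the script, using the same number of tries and the same delay.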

Verification

Check all of the OpenShift Container Platform objects and the S3 bucket to verify that everything looks as expected.

Next steps

Restore your hosted cluster.

9.6.3. Restoring a hosted cluster

Gather all of the objects that you backed up and restore them in your destination management cluster.

Prerequisites

You backed up the data from your source management cluster.

Tip

Ensure that the KUBECONFIG variable points to the kubeconfig file of the destination management cluster. Use export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, export KUBECONFIG=${MGMT2_KUBECONFIG}.

Procedure

Ensure that the destination management cluster does not contain any namespaces from the cluster that you are restoring by entering these commands:
```
export KUBECONFIG=${MGMT2_KUBECONFIG}
BACKUP_DIR=${HC_CLUSTER_DIR}/backup
```
Namespace deletion in the destination management cluster
```
oc delete ns ${HC_CLUSTER_NS} || true
oc delete ns ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} || true
```
Re-create the deleted namespaces by entering these commands:
Namespace creation commands
```
oc new-project ${HC_CLUSTER_NS}
oc new-project ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}
```

Restore the secrets in the HC namespace by entering this command:

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/secret-*

Restore the objects in the HostedCluster control plane namespace by entering these commands:

Restore secret command

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/secret-*

Cluster restore commands

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/hcp-*

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/cl-*

If you are recovering the nodes and the node pool to reuse AWS instances, restore the objects in the HC control plane namespace by entering these commands:

Commands for AWS

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awscl-*

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsmt-*

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/awsm-*

Commands for machines

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machinedeployment-*

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machineset-*

oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/machine-*

Restore the etcd data and the hosted cluster by running this bash script:

ETCD_PODS="etcd-0"
if [ "${CONTROL_PLANE_AVAILABILITY_POLICY}" = "HighlyAvailable" ]; then
  ETCD_PODS="etcd-0 etcd-1 etcd-2"
fi

HC_RESTORE_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}-restore.yaml
HC_BACKUP_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}.yaml
HC_NEW_FILE=${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/hc-${HC_CLUSTER_NAME}-new.yaml
cat ${HC_BACKUP_FILE} > ${HC_NEW_FILE}
cat > ${HC_RESTORE_FILE} <<EOF
    restoreSnapshotURL:
EOF

for POD in ${ETCD_PODS}; do
  # Create a pre-signed URL for the etcd snapshot
  ETCD_SNAPSHOT="s3://${BUCKET_NAME}/${HC_CLUSTER_NAME}-${POD}-snapshot.db"
  ETCD_SNAPSHOT_URL=$(AWS_DEFAULT_REGION=${MGMT2_REGION} aws s3 presign ${ETCD_SNAPSHOT})

  # FIXME no CLI support for restoreSnapshotURL yet
  cat >> ${HC_RESTORE_FILE} <<EOF
    - "${ETCD_SNAPSHOT_URL}"
EOF
done

cat ${HC_RESTORE_FILE}

if ! grep ${HC_CLUSTER_NAME}-snapshot.db ${HC_NEW_FILE}; then
  sed -i '' -e "/type: PersistentVolume/r ${HC_RESTORE_FILE}" ${HC_NEW_FILE}
  sed -i '' -e '/pausedUntil:/d' ${HC_NEW_FILE}
fi

HC=$(oc get hc -n ${HC_CLUSTER_NS} ${HC_CLUSTER_NAME} -o name || true)
if [[ ${HC} == "" ]];then
    echo "Deploying HC Cluster: ${HC_CLUSTER_NAME} in ${HC_CLUSTER_NS} namespace"
    oc apply -f ${HC_NEW_FILE}
else
    echo "HC Cluster ${HC_CLUSTER_NAME} already exists, avoiding step"
fi

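The script builds a restoreSnapshotURL fragment line by line and then uses sed to insert it after the line that matches type: PersistentVolume in the HostedCluster manifest. The following sketch shows only the fragment generation as one function, so that you can inspect the output before it is merged. The function name render_restore_snapshot_urls is hypothetical, and the 4-space indent is copied from the script.

```shell
# Print a restoreSnapshotURL YAML fragment with one list item per URL argument.
# The indentation must match the nesting level of the target manifest.
render_restore_snapshot_urls() {
  printf '    restoreSnapshotURL:\n'
  for _rs_url in "$@"; do
    printf '    - "%s"\n' "$_rs_url"
  done
}
```

For example, render_restore_snapshot_urls "${URL_0}" "${URL_1}" "${URL_2}" > "${HC_RESTORE_FILE}" writes the same fragment that the loop in the script produces, where the URL_* variables stand for the pre-signed URLs that aws s3 presign returns.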

If you are recovering the nodes and the node pool to reuse AWS instances, restore the node pool by entering this command:
```
oc apply -f ${BACKUP_DIR}/namespaces/${HC_CLUSTER_NS}/np-*
```

Verification

To verify that the nodes are fully restored, run this script:

timeout=40
count=0
NODE_STATUS=$(oc get nodes --kubeconfig=${HC_KUBECONFIG} | grep -v NotReady | grep -c "worker") || NODE_STATUS=0

while [ ${NODE_POOL_REPLICAS} != ${NODE_STATUS} ]
do
    echo "Waiting for Nodes to be Ready in the destination MGMT Cluster: ${MGMT2_CLUSTER_NAME}"
    echo "Try: (${count}/${timeout})"
    sleep 30
    if [[ $count -eq timeout ]];then
        echo "Timeout waiting for Nodes in the destination MGMT Cluster"
        exit 1
    fi
    count=$((count+1))
    NODE_STATUS=$(oc get nodes --kubeconfig=${HC_KUBECONFIG} | grep -v NotReady | grep -c "worker") || NODE_STATUS=0
done

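The node count in the script relies on two grep calls over the oc get nodes table. The following sketch does the same count by column, assuming the default column order of NAME, STATUS, ROLES. The function name count_ready_workers is hypothetical. Unlike the original filter, it counts only nodes whose status is exactly Ready, so nodes that report Ready,SchedulingDisabled are excluded.

```shell
# Count worker nodes with status "Ready" in "oc get nodes" output read from stdin.
# Assumes the default columns: NAME STATUS ROLES AGE VERSION.
count_ready_workers() {
  awk '$2 == "Ready" && $3 ~ /worker/ { n++ } END { print n + 0 }'
}
```

In the loop, the assignment might become NODE_STATUS=$(oc get nodes --kubeconfig=${HC_KUBECONFIG} | count_ready_workers). The function always prints a number, so the || NODE_STATUS=0 fallback is needed only if the oc command itself fails.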

Next steps

Shut down and delete your cluster.

9.6.4. Deleting a hosted cluster from your source management cluster

After you back up your hosted cluster and restore it to your destination management cluster, you shut down and delete the hosted cluster on your source management cluster.

Prerequisites

You backed up your data and restored it to your destination management cluster.

Tip

Ensure that the KUBECONFIG variable points to the kubeconfig file of the source management cluster. Use export KUBECONFIG=<Kubeconfig FilePath> or, if you use the script, export KUBECONFIG=${MGMT_KUBECONFIG}.

Procedure

Scale the deployment and statefulset objects by entering these commands:
Important
Do not scale the stateful set if the value of its spec.persistentVolumeClaimRetentionPolicy.whenScaled field is set to Delete, because this could lead to a loss of data.
As a workaround, update the value of the spec.persistentVolumeClaimRetentionPolicy.whenScaled field to Retain. Ensure that no controllers exist that reconcile the stateful set and would return the value back to Delete, which could lead to a loss of data.
```
export KUBECONFIG=${MGMT_KUBECONFIG}
```
```
$ export KUBECONFIG=${MGMT_KUBECONFIG}
```
Copy to Clipboard Toggle word wrap
Scale down deployment commands
```
oc scale deployment -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 --all
```
```
oc scale statefulset.apps -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --replicas=0 --all
```
```
sleep 15
```
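
Before you run the scale commands above, you can check whether any stateful set in the namespace has its whenScaled policy set to Delete. The following is a minimal sketch, not an official procedure. The function name `statefulsets_with_delete_policy` is hypothetical, and the sketch assumes that `oc` is logged in to the source management cluster.

```shell
# statefulsets_with_delete_policy NAMESPACE
# Prints the names of stateful sets whose
# spec.persistentVolumeClaimRetentionPolicy.whenScaled field is "Delete".
# Returns 1 if any are found, 0 otherwise.
statefulsets_with_delete_policy() {
    local ns=$1 risky
    risky=$(oc get statefulset.apps -n "${ns}" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.persistentVolumeClaimRetentionPolicy.whenScaled}{"\n"}{end}' \
        | awk -F'\t' '$2 == "Delete" { print $1 }')
    if [ -n "${risky}" ]; then
        echo "${risky}"
        return 1
    fi
}

# Example: scale down only when no stateful set would delete its PVCs.
# if statefulsets_with_delete_policy "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}"; then
#     oc scale statefulset.apps -n "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" --replicas=0 --all
# fi
```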

Delete the NodePool objects by entering these commands:

NODEPOOLS=$(oc get nodepools -n ${HC_CLUSTER_NS} -o=jsonpath='{.items[?(@.spec.clusterName=="'${HC_CLUSTER_NAME}'")].metadata.name}')
if [[ ! -z "${NODEPOOLS}" ]];then
    oc patch -n "${HC_CLUSTER_NS}" nodepool ${NODEPOOLS} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'
    oc delete np -n ${HC_CLUSTER_NS} ${NODEPOOLS}
fi
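
The deletion steps in this procedure repeat one pattern: remove the finalizers from an object, then delete it. A sketch of that pattern as a helper function follows. The name `remove_finalizers_and_delete` is hypothetical, and errors are ignored with `|| true` in the same way as in the commands of this procedure.

```shell
# remove_finalizers_and_delete NAMESPACE RESOURCE...
# Each RESOURCE uses the TYPE/NAME form that `oc get -o name` prints.
# Failures are ignored so that one missing object does not stop the loop.
remove_finalizers_and_delete() {
    local ns=$1 res
    shift
    for res in "$@"; do
        oc patch -n "${ns}" "${res}" --type=json \
            --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]' || true
        oc delete -n "${ns}" "${res}" || true
    done
}

# Example: clean up every machine object in the hosted control plane namespace.
# remove_finalizers_and_delete "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" \
#     $(oc get machines -n "${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}" -o name)
```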


Delete the machine and machineset objects by entering these commands:

# Machines
for m in $(oc get machines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name); do
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]' || true
    oc delete -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} || true
done


oc delete machineset -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all || true


Delete the cluster object by entering these commands:

C_NAME=$(oc get cluster -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name)


oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${C_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'


oc delete cluster.cluster.x-k8s.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all


Delete the AWS machine Kubernetes objects by entering these commands. Deleting these objects does not affect the actual AWS cloud instances.

for m in $(oc get awsmachine.infrastructure.cluster.x-k8s.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} -o name)
do
    oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]' || true
    oc delete -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} ${m} || true
done


Delete the HostedControlPlane and ControlPlane HC namespace objects by entering these commands:

Delete HCP and ControlPlane HC NS commands

oc patch -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} hostedcontrolplane.hypershift.openshift.io ${HC_CLUSTER_NAME} --type=json --patch='[ { "op":"remove", "path": "/metadata/finalizers" }]'


oc delete hostedcontrolplane.hypershift.openshift.io -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} --all


oc delete ns ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} || true


Delete the HostedCluster and HC namespace objects by entering these commands:

Delete HC and HC Namespace commands

oc -n ${HC_CLUSTER_NS} patch hostedclusters ${HC_CLUSTER_NAME} -p '{"metadata":{"finalizers":null}}' --type merge || true


oc delete hc -n ${HC_CLUSTER_NS} ${HC_CLUSTER_NAME}  || true


oc delete ns ${HC_CLUSTER_NS} || true


Verification

To verify that everything works, enter these commands:

Validation commands

export KUBECONFIG=${MGMT2_KUBECONFIG}


oc get hc -n ${HC_CLUSTER_NS}


oc get np -n ${HC_CLUSTER_NS}


oc get pod -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}


oc get machines -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}


Commands for inside the HostedCluster

export KUBECONFIG=${HC_KUBECONFIG}


oc get clusterversion


oc get nodes


Next steps

Delete the OVN pods in the hosted cluster so that you can connect to the new OVN control plane that runs in the new management cluster:

Load the KUBECONFIG environment variable with the hosted cluster’s kubeconfig path.

Enter this command:

oc delete pod -n openshift-ovn-kubernetes --all


9.7. Disaster recovery for a hosted cluster by using OADP

You can use the OpenShift API for Data Protection (OADP) Operator to perform disaster recovery on Amazon Web Services (AWS) and bare metal.

The disaster recovery process with OpenShift API for Data Protection (OADP) involves the following steps:

Preparing your platform, such as Amazon Web Services or bare metal, to use OADP
Backing up the data plane workload
Backing up the control plane workload
Restoring a hosted cluster by using OADP

9.7.1. Prerequisites

You must meet the following prerequisites on the management cluster:

You installed the OADP Operator.
You created a storage class.
You have access to the cluster with cluster-admin privileges.
You have access to the OADP subscription through a catalog source.
You have access to a cloud storage provider that is compatible with OADP, such as S3, Microsoft Azure, Google Cloud, or MinIO.
In a disconnected environment, you have access to a self-hosted storage provider, for example Red Hat OpenShift Data Foundation or MinIO, that is compatible with OADP.
Your hosted control plane pods are up and running.

9.7.2. Preparing AWS to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on Amazon Web Services (AWS) S3 compatible storage. After you create the DataProtectionApplication object, a new velero deployment and node-agent pods are created in the openshift-adp namespace.

To prepare AWS to use OADP, see "Configuring the OpenShift API for Data Protection with Multicloud Object Gateway".

Next steps

Backing up the data plane workload
Backing up the control plane workload

9.7.3. Preparing bare metal to use OADP

To perform disaster recovery for a hosted cluster, you can use OpenShift API for Data Protection (OADP) on bare metal. After you create the DataProtectionApplication object, a new velero deployment and node-agent pods are created in the openshift-adp namespace.

To prepare bare metal to use OADP, see "Configuring the OpenShift API for Data Protection with AWS S3 compatible storage".

Next steps

Backing up the data plane workload
Backing up the control plane workload

9.7.4. Backing up the data plane workload

If the data plane workload is not important, you can skip this procedure. To back up the data plane workload by using the OADP Operator, see "Backing up applications".

Next steps

Restoring a hosted cluster by using OADP

9.7.5. Backing up the control plane workload

You can back up the control plane workload by creating the Backup custom resource (CR).

To monitor and observe the backup process, see "Observing the backup and restore process".

Procedure

Pause the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'


Get the infrastructure ID of your hosted cluster by running the following command:

oc get hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> -o=jsonpath="{.spec.infraID}"


Note the infrastructure ID to use in the next step.

Pause the reconciliation of the cluster.cluster.x-k8s.io resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch cluster.cluster.x-k8s.io \
  -n <hosted_cluster_namespace>-<hosted_cluster_name> <hosted_cluster_infra_id> \
  --type json -p '[{"op": "add", "path": "/spec/paused", "value": true}]'


Pause the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "true"}]'


Pause the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused=true --all


Pause the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused=true --all


Annotate the HostedCluster resource to prevent the deletion of the hosted control plane namespace by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace=true
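
The HostedCluster and NodePool pause steps above use the same JSON patch on spec.pausedUntil. The following sketch wraps both in one function. The function names are hypothetical, and the sketch assumes that the KUBECONFIG variable already points to the management cluster instead of passing --kubeconfig on every call.

```shell
# paused_until_patch VALUE
# Prints the JSON patch that sets spec.pausedUntil to VALUE ("true" or "false").
paused_until_patch() {
    printf '[{"op": "add", "path": "/spec/pausedUntil", "value": "%s"}]\n' "$1"
}

# set_paused VALUE HC_NAMESPACE HC_NAME [NODEPOOL...]
# Applies the patch to the HostedCluster and to each listed NodePool.
set_paused() {
    local value=$1 ns=$2 hc=$3 np
    shift 3
    oc patch hostedcluster -n "${ns}" "${hc}" --type json -p "$(paused_until_patch "${value}")"
    for np in "$@"; do
        oc patch nodepool -n "${ns}" "${np}" --type json -p "$(paused_until_patch "${value}")"
    done
}

# Example: pause before the backup.
# set_paused true <hosted_cluster_namespace> <hosted_cluster_name> <node_pool_name>
```

Calling the same function with false produces the patch that the restore procedure uses to start reconciliation again.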


Create a YAML file that defines the Backup CR:

Example 9.1. Example backup-control-plane.yaml file

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup_resource_name> 
  namespace: openshift-adp
  labels:
    velero.io/storage-location: default
spec:
  hooks: {}
  includedNamespaces: 
  - <hosted_cluster_namespace> 
  - <hosted_control_plane_namespace> 
  includedResources:
  - sa
  - role
  - rolebinding
  - pod
  - pvc
  - pv
  - bmh
  - configmap
  - infraenv 
  - priorityclasses
  - pdb
  - agents
  - hostedcluster
  - nodepool
  - secrets
  - hostedcontrolplane
  - cluster
  - agentcluster
  - agentmachinetemplate
  - agentmachine
  - machinedeployment
  - machineset
  - machine
  excludedResources: []
  storageLocation: default
  ttl: 2h0m0s
  snapshotMoveData: true 
  datamover: "velero" 
  defaultVolumesToFsBackup: true


metadata.name: Replace <backup_resource_name> with the name of your Backup resource.
spec.includedNamespaces: Selects the namespaces to back up objects from. You must include your hosted cluster namespace and the hosted control plane namespace.
<hosted_cluster_namespace>: Replace with the name of the hosted cluster namespace, for example, clusters.
<hosted_control_plane_namespace>: Replace with the name of the hosted control plane namespace, for example, clusters-hosted.
infraenv: You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the backup process.
spec.snapshotMoveData and spec.datamover: Enable the CSI volume snapshots and upload the control plane workload automatically to the cloud storage.
spec.defaultVolumesToFsBackup: Sets the fs-backup method as the default for backing up persistent volumes (PVs). This setting is useful when you use a combination of Container Storage Interface (CSI) volume snapshots and the fs-backup method.

Note

If you want to use CSI volume snapshots, you must add the backup.velero.io/backup-volumes-excludes=<pv_name> annotation to your PVs.

Apply the Backup CR by running the following command:
```
oc apply -f backup-control-plane.yaml
```
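
The create-and-apply steps above can be scripted by filling in the placeholders of the example file with sed. This is a sketch under stated assumptions: the function name `render_backup_cr` is hypothetical, the template is the example backup-control-plane.yaml file with its angle-bracket placeholders left in place, and the values contain no `|` or `&` characters.

```shell
# render_backup_cr TEMPLATE NAME HC_NAMESPACE HCP_NAMESPACE
# Prints TEMPLATE with the three placeholders of the example Backup CR replaced.
render_backup_cr() {
    sed -e "s|<backup_resource_name>|$2|g" \
        -e "s|<hosted_cluster_namespace>|$3|g" \
        -e "s|<hosted_control_plane_namespace>|$4|g" \
        "$1"
}

# Example: render the file and apply the result from standard input.
# render_backup_cr backup-control-plane.yaml hc-backup clusters clusters-hosted | oc apply -f -
```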

Verification

Verify that the value of status.phase is Completed by running the following command:

oc get backups.velero.io <backup_resource_name> -n openshift-adp \
  -o jsonpath='{.status.phase}'
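
Because a backup takes time, the command above usually has to be repeated until the phase changes. The following sketch polls status.phase until it reaches Completed. The function name `wait_for_velero_phase` and the default try count and interval are assumptions, and the list of failure phases reflects the Velero phases as commonly documented, so verify it against your OADP version.

```shell
# wait_for_velero_phase KIND NAME [TRIES] [INTERVAL]
# Polls status.phase of a Velero object in the openshift-adp namespace.
# Returns 0 when the phase is Completed, 1 on a failure phase or timeout.
wait_for_velero_phase() {
    local kind=$1 name=$2 tries=${3:-60} interval=${4:-10} count=0 phase
    while [ "${count}" -lt "${tries}" ]; do
        phase=$(oc get "${kind}" "${name}" -n openshift-adp -o jsonpath='{.status.phase}') || phase=""
        case "${phase}" in
            Completed)
                return 0 ;;
            Failed|PartiallyFailed|FailedValidation)
                # Assumed terminal failure phases; stop polling immediately.
                echo "${kind}/${name} ended in phase ${phase}" >&2
                return 1 ;;
        esac
        count=$((count + 1))
        sleep "${interval}"
    done
    echo "Timed out waiting for ${kind}/${name}; last phase: ${phase:-unknown}" >&2
    return 1
}

# Example:
# wait_for_velero_phase backups.velero.io <backup_resource_name>
```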


Next steps

Restoring a hosted cluster by using OADP

9.7.6. Restoring a hosted cluster by using OADP

You can restore the hosted cluster by creating the Restore custom resource (CR).

If you are using an in-place update, InfraEnv does not need spare nodes. You need to re-provision the worker nodes from the new management cluster.
If you are using a replace update, you need some spare nodes for InfraEnv to deploy the worker nodes.

Important

After you back up your hosted cluster, you must destroy it to initiate the restoring process. To initiate node provisioning, you must back up workloads in the data plane before deleting the hosted cluster.

Prerequisites

You completed the steps in Removing a cluster by using the console to delete your hosted cluster.
You completed the steps in Removing remaining resources after removing a cluster.

To monitor and observe the backup process, see "Observing the backup and restore process".

Procedure

Verify that no pods and persistent volume claims (PVCs) are present in the hosted control plane namespace by running the following command:
```
oc get pod,pvc -n <hosted_control_plane_namespace>
```
Expected output
```
No resources found
```

Create a YAML file that defines the Restore CR:

Example restore-hosted-cluster.yaml file

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: <restore_resource_name> 
  namespace: openshift-adp
spec:
  backupName: <backup_resource_name> 
  restorePVs: true 
  existingResourcePolicy: update 
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io


metadata.name: Replace <restore_resource_name> with the name of your Restore resource.
spec.backupName: Replace <backup_resource_name> with the name of your Backup resource.
spec.restorePVs: Initiates the recovery of persistent volumes (PVs) and their pods.
spec.existingResourcePolicy: Ensures that the existing objects are overwritten with the backed up content.

Important

You must create the infraenv resource in a separate namespace. Do not delete the infraenv resource during the restore process. The infraenv resource is mandatory for the new nodes to be reprovisioned.

Apply the Restore CR by running the following command:
```
oc apply -f restore-hosted-cluster.yaml
```

Verify that the value of status.phase is Completed by running the following command:

oc get hostedcluster <hosted_cluster_name> -n <hosted_cluster_namespace> \
  -o jsonpath='{.status.phase}'


After the restore process is complete, start the reconciliation of the HostedCluster and NodePool resources that you paused when you backed up the control plane workload:

Start the reconciliation of the HostedCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'


Start the reconciliation of the NodePool resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  patch nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --type json \
  -p '[{"op": "add", "path": "/spec/pausedUntil", "value": "false"}]'


Start the reconciliation of the Agent provider resources that you paused when you backed up the control plane workload:

Start the reconciliation of the AgentCluster resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentcluster -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused- --overwrite=true --all


Start the reconciliation of the AgentMachine resource by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate agentmachine -n <hosted_control_plane_namespace>  \
  cluster.x-k8s.io/paused- --overwrite=true --all


Remove the hypershift.openshift.io/skip-delete-hosted-controlplane-namespace annotation from the HostedCluster resource, so that you do not have to delete the hosted control plane namespace manually, by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  annotate hostedcluster -n <hosted_cluster_namespace> <hosted_cluster_name> \
  hypershift.openshift.io/skip-delete-hosted-controlplane-namespace- \
  --overwrite=true --all


Scale the NodePool resource to the desired number of replicas by running the following command:

oc --kubeconfig <management_cluster_kubeconfig_file> \
  scale nodepool -n <hosted_cluster_namespace> <node_pool_name> \
  --replicas <replica_count>


Replace <replica_count> with an integer value, for example, 3.

9.7.7. Observing the backup and restore process

When you use OpenShift API for Data Protection (OADP) to back up and restore a hosted cluster, you can monitor and observe the process.

Procedure

Observe the backup process by running the following command:

watch "oc get backups.velero.io -n openshift-adp <backup_resource_name> -o jsonpath='{.status}'"


Observe the restore process by running the following command:

watch "oc get restores.velero.io -n openshift-adp <restore_resource_name> -o jsonpath='{.status}'"

Observe the Velero logs by running the following command:
```
oc logs -n openshift-adp -ldeploy=velero -f
```

Observe the progress of all of the OADP objects by running the following command:

watch "echo BackupRepositories:;echo;oc get backuprepositories.velero.io -A;echo; echo BackupStorageLocations: ;echo; oc get backupstoragelocations.velero.io -A;echo;echo DataUploads: ;echo;oc get datauploads.velero.io -A;echo;echo DataDownloads: ;echo;oc get datadownloads.velero.io -n openshift-adp; echo;echo VolumeSnapshotLocations: ;echo;oc get volumesnapshotlocations.velero.io -A;echo;echo Backups:;echo;oc get backup -A; echo;echo Restores:;echo;oc get restore -A"
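
The one-line command above is hard to read and edit. A sketch of the same overview as a function that loops over the resource kinds follows. The function name `oadp_overview` is hypothetical. As a simplification, the sketch queries every kind with -A and uses the fully qualified backups.velero.io and restores.velero.io names, which differs slightly from the original command.

```shell
# oadp_overview
# Prints each OADP resource kind as a heading, followed by its objects
# in all namespaces.
oadp_overview() {
    local kind
    for kind in backuprepositories.velero.io backupstoragelocations.velero.io \
                datauploads.velero.io datadownloads.velero.io \
                volumesnapshotlocations.velero.io backups.velero.io restores.velero.io; do
        echo "${kind}:"
        echo
        oc get "${kind}" -A
        echo
    done
}

# Example: refresh the overview every 2 seconds until interrupted.
# while true; do clear; oadp_overview; sleep 2; done
```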


9.7.8. Using the velero CLI to describe the Backup and Restore resources

When using OpenShift API for Data Protection, you can get more details of the Backup and Restore resources by using the velero command-line interface (CLI).

Procedure

Create an alias to use the velero CLI from a container by running the following command:

alias velero='oc -n openshift-adp exec deployment/velero -c velero -it -- ./velero'


Get details of your Restore custom resource (CR) by running the following command:
```
velero restore describe <restore_resource_name> --details
```
Replace <restore_resource_name> with the name of your Restore resource.

Get details of your Backup CR by running the following command:
```
velero backup describe <backup_resource_name> --details
```
Replace <backup_resource_name> with the name of your Backup resource.
