Chapter 15. Backing up and restoring a director Operator deployed overcloud
To back up a Red Hat OpenStack Platform (RHOSP) overcloud that was deployed with director Operator (OSPdO), you must back up the Red Hat OpenShift Container Platform (RHOCP) OSPdO resources, and then use the Relax-and-Recover (ReaR) tool to back up the control plane and overcloud.
15.1. Backing up and restoring director Operator resources
Red Hat OpenStack Platform (RHOSP) director Operator (OSPdO) provides custom resource definitions (CRDs) for backing up and restoring a deployment, so you do not have to manually export and import multiple configurations. OSPdO is aware of the state of all resources, so it knows which custom resources (CRs), including the ConfigMap and Secret resources, it needs to create a complete backup. Therefore, OSPdO does not back up any configuration that is in an incomplete or error state.
To back up and restore an OSPdO deployment, you create an OpenStackBackupRequest CR to initiate the creation or restoration of a backup. The OpenStackBackupRequest CR creates the OpenStackBackup CR, which stores the backup of the CRs, ConfigMap and Secret configurations for the specified namespace.
15.1.1. Backing up director Operator resources
To create a backup, you must create an OpenStackBackupRequest custom resource (CR) for the namespace. The OpenStackBackup CR is created when the OpenStackBackupRequest object is created in save mode.
Procedure
1. Create a file named openstack_backup.yaml on your workstation. Add the following configuration to your openstack_backup.yaml file to create the OpenStackBackupRequest custom resource (CR):

   apiVersion: osp-director.openstack.org/v1beta1
   kind: OpenStackBackupRequest
   metadata:
     name: openstackbackupsave
     namespace: openstack
   spec:
     mode: save
     additionalConfigMaps: []
     additionalSecrets: []

   Note: OSPdO attempts to include all ConfigMap and Secret objects associated with the OSPdO CRs in the namespace, such as OpenStackControlPlane and OpenStackBaremetalSet. You do not need to include those in the additional lists.

2. Save the openstack_backup.yaml file.

3. Create the OpenStackBackupRequest CR:

   $ oc create -f openstack_backup.yaml -n openstack

4. Monitor the creation status of the OpenStackBackupRequest CR:

   $ oc get openstackbackuprequest openstackbackupsave -n openstack

   The Quiescing state indicates that OSPdO is waiting for the CRs to reach their finished state. The number of CRs can affect how long it takes to finish creating the backup.

   NAME                  OPERATION   SOURCE   STATUS      COMPLETION TIMESTAMP
   openstackbackupsave   save                 Quiescing

   If the status remains in the Quiescing state for longer than expected, you can investigate the OSPdO logs to check progress:

   $ oc logs <operator_pod> -c manager -f
   2022-01-11T18:26:15.180Z INFO controllers.OpenStackBackupRequest Quiesce for save for OpenStackBackupRequest openstackbackupsave is waiting for: [OpenStackBaremetalSet: compute, OpenStackControlPlane: overcloud, OpenStackVMSet: controller]

   Replace <operator_pod> with the name of the Operator pod.

   The Saved state indicates that the OpenStackBackup CR is created.

   NAME                  OPERATION   SOURCE   STATUS   COMPLETION TIMESTAMP
   openstackbackupsave   save                 Saved    2022-01-11T19:12:58Z

   The Error state indicates that the backup failed to create. Review the request contents to find the error:

   $ oc get openstackbackuprequest openstackbackupsave -o yaml -n openstack

5. View the OpenStackBackup resource to confirm that it exists:

   $ oc get openstackbackup -n openstack
   NAME                             AGE
   openstackbackupsave-1641928378   6m7s
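The status checks above can be wrapped in a small polling loop. The following sketch is a hypothetical helper, not part of OSPdO: it takes a command that prints the current STATUS value of the request (a stand-in for parsing the output of oc get openstackbackuprequest) and waits until the request reaches a terminal state.

```shell
# Hypothetical polling helper; the real status source would be an oc query.
# $1: a command that prints the current request status
# $2: polling interval in seconds (defaults to 10)
poll_backup_request() {
  local get_status=$1 interval=${2:-10} state
  while true; do
    state=$($get_status)
    case "$state" in
      Saved|Restored) echo "$state"; return 0 ;;  # terminal success states
      Error)          echo "$state"; return 1 ;;  # terminal failure state
      *)              sleep "$interval" ;;        # Quiescing, Loading, Reconciling, ...
    esac
  done
}
```

A non-zero exit code on the Error state makes the helper usable in scripts that must stop when a backup or restore fails.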
15.1.2. Restoring director Operator resources from a backup
When you request to restore a backup, Red Hat OpenStack Platform (RHOSP) director Operator (OSPdO) takes the contents of the specified OpenStackBackup resource and attempts to apply them to all existing custom resources (CRs), ConfigMap, and Secret resources in the namespace. OSPdO overwrites any existing resources in the namespace, and creates new resources for any that are not found in the namespace.
Procedure
1. List the available backups:

   $ oc get osbackup

2. Inspect the details of a specific backup:

   $ oc get backup <name> -o yaml

   Replace <name> with the name of the backup you want to inspect.
3. Create a file named openstack_restore.yaml on your workstation. Add the following configuration to your openstack_restore.yaml file to create the OpenStackBackupRequest custom resource (CR):

   apiVersion: osp-director.openstack.org/v1beta1
   kind: OpenStackBackupRequest
   metadata:
     name: openstackbackuprestore
     namespace: openstack
   spec:
     mode: <mode>
     restoreSource: <restore_source>

   Replace <mode> with one of the following options:

   - restore: Requests a restore from an existing OpenStackBackup.
   - cleanRestore: Completely wipes the existing OSPdO resources within the namespace before restoring and creating new resources from the existing OpenStackBackup.

   Replace <restore_source> with the ID of the OpenStackBackup to restore, for example, openstackbackupsave-1641928378.
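For illustration, a filled-in request that performs a clean restore from the sample backup created earlier in this chapter looks like the following; the name and restoreSource ID are the example values used in this section.

```yaml
apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackBackupRequest
metadata:
  name: openstackbackuprestore
  namespace: openstack
spec:
  # cleanRestore wipes the existing OSPdO resources in the namespace
  # before restoring from the backup
  mode: cleanRestore
  # ID of the OpenStackBackup to restore (example value)
  restoreSource: openstackbackupsave-1641928378
```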
4. Save the openstack_restore.yaml file.

5. Create the OpenStackBackupRequest CR:

   $ oc create -f openstack_restore.yaml -n openstack

6. Monitor the creation status of the OpenStackBackupRequest CR:

   $ oc get openstackbackuprequest openstackbackuprestore -n openstack

   The Loading state indicates that all resources from the OpenStackBackup are being applied against the cluster.

   NAME                     OPERATION   SOURCE                           STATUS    COMPLETION TIMESTAMP
   openstackbackuprestore   restore     openstackbackupsave-1641928378   Loading

   The Reconciling state indicates that all resources are loaded and OSPdO has begun reconciling to provision all resources.

   NAME                     OPERATION   SOURCE                           STATUS        COMPLETION TIMESTAMP
   openstackbackuprestore   restore     openstackbackupsave-1641928378   Reconciling

   The Restored state indicates that the OpenStackBackup CR has been restored.

   NAME                     OPERATION   SOURCE                           STATUS     COMPLETION TIMESTAMP
   openstackbackuprestore   restore     openstackbackupsave-1641928378   Restored   2022-01-12T13:48:57Z

   The Error state indicates that the restoration failed. Review the request contents to find the error:

   $ oc get openstackbackuprequest openstackbackuprestore -o yaml -n openstack
15.2. Backing up and restoring a director Operator deployed overcloud with the Relax-and-Recover tool
To back up a director Operator deployed overcloud with the Relax-and-Recover (ReaR) tool, you configure the backup node, install the ReaR tool on the control plane, and create the backup image. You can create backups as part of your regular environment maintenance.
In addition, you must back up the control plane before performing updates or upgrades. You can use the backups to restore the control plane to its previous state if an error occurs during an update or upgrade.
15.2.1. Supported backup formats and protocols
The backup and restore process uses the open-source tool Relax-and-Recover (ReaR) to create and restore bootable backup images. ReaR is written in Bash and supports multiple image formats and multiple transport protocols.
The following list shows the backup formats and protocols that Red Hat OpenStack Platform supports when you use ReaR to back up and restore a director Operator deployed control plane.
- Bootable media formats
- ISO
- File transport protocols
- SFTP
- NFS
15.2.2. Configuring the backup storage location
You can install and configure an NFS server to store the backup file. Before you create a backup of the control plane, configure the backup storage location in the bar-vars.yaml environment file. This file stores the key-value parameters that you want to pass to the backup execution.
- If you previously installed and configured an NFS or SFTP server, you do not need to complete this procedure. You enter the server information when you set up ReaR on the node that you want to back up.
- By default, the Relax-and-Recover (ReaR) IP address parameter for the NFS server is 192.168.24.1. You must add the parameter tripleo_backup_and_restore_server to set the IP address value that matches your environment.
Procedure
1. Create an NFS backup directory on your workstation:

   $ mkdir -p /home/nfs/backup
   $ chmod 777 /home/nfs/backup
   $ cat >/etc/exports.d/backup.exports <<EOF
   /home/nfs/backup *(rw,sync,no_root_squash)
   EOF
   $ exportfs -av

2. Create the bar-vars.yaml file on your workstation:

   $ touch /home/stack/bar-vars.yaml

3. In the bar-vars.yaml file, configure the backup storage location:

   tripleo_backup_and_restore_server: <ip_address>
   tripleo_backup_and_restore_shared_storage_folder: <backup_dir>

   Replace <ip_address> with the IP address of your NFS server, for example, 172.22.0.1. The default IP address is 192.168.24.1.
   Replace <backup_dir> with the location of the backup storage folder, for example, /home/nfs/backup.
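A completed bar-vars.yaml using the example values from this procedure looks like the following; 172.22.0.1 and /home/nfs/backup are illustrative and must match your NFS server and export.

```yaml
# Backup storage location passed to the ReaR setup; values are examples.
tripleo_backup_and_restore_server: 172.22.0.1
tripleo_backup_and_restore_shared_storage_folder: /home/nfs/backup
```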
15.2.3. Performing a backup of the control plane
To create a backup of the control plane, you must install and configure Relax-and-Recover (ReaR) on each of the Controller virtual machines (VMs).
Due to a known issue, the ReaR backup of overcloud nodes continues even if a Controller node is down. Ensure that all your Controller nodes are running before you run the ReaR backup. A fix is planned for a later Red Hat OpenStack Platform (RHOSP) release. For more information, see BZ#2077335 - Back up of the overcloud ctlplane keeps going even if one controller is unreachable.
Procedure
1. Extract the static Ansible inventory file from the location in which it was saved during installation:

   $ oc rsh openstackclient
   $ cd
   $ find . -name tripleo-ansible-inventory.yaml
   $ cp ~/overcloud-deploy/<stack>/tripleo-ansible-inventory.yaml .

   Replace <stack> with the name of your stack, for example, cloud-admin. By default, the name of the stack is overcloud.
2. Install ReaR on each Controller virtual machine (VM):

   $ openstack overcloud backup --setup-rear --extra-vars /home/cloud-admin/bar-vars.yaml --inventory /home/cloud-admin/tripleo-ansible-inventory.yaml

3. Open the /etc/rear/local.conf file on each Controller VM:

   $ ssh controller-0
   [cloud-admin@controller-0 ~]$ sudo -i
   [root@controller-0 ~]# cat >>/etc/rear/local.conf <<EOF

4. In the /etc/rear/local.conf file, add the NETWORKING_PREPARATION_COMMANDS parameter to configure the Controller VM networks in the following format:

   NETWORKING_PREPARATION_COMMANDS=('<command_1>' '<command_2>' ... '<command_n>')

   Replace <command_1>, <command_2>, and all commands up to <command_n> with commands that configure the network interface names or IP addresses. For example, you can add the ip link add br-ctlplane type bridge command to configure the control plane bridge name or add the ip link set eth0 up command to set the name of the interface. You can add more commands to the parameter based on your network configuration.
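As an illustration, a populated parameter might look like the following; the bridge name, interface name, and address are assumptions that must match your environment.

```shell
# Example NETWORKING_PREPARATION_COMMANDS entry for /etc/rear/local.conf.
# The interface names and the address below are illustrative.
NETWORKING_PREPARATION_COMMANDS=('ip link add br-ctlplane type bridge' 'ip link set eth0 up' 'ip addr add 192.168.24.10/24 dev br-ctlplane')
```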
5. Repeat the following command on each Controller VM to back up its config-drive partition:

   [root@controller-0 ~]# dd if=/dev/vda1 of=/mnt/config-drive

6. Create a backup of the Controller VMs:

   $ oc rsh openstackclient
   $ openstack overcloud backup --inventory /home/cloud-admin/tripleo-ansible-inventory.yaml

   The backup process runs sequentially on each Controller VM without disrupting the service to your environment.
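The per-node dd step can be scripted rather than repeated by hand. The helper below is a sketch, not part of the product tooling: it loops over the Controller VM names and runs the dd command on each one through a runner argument, a stand-in for something like "ssh <node> sudo", which also keeps the loop testable without a cluster.

```shell
# Hypothetical wrapper around the per-Controller config-drive backup.
# $1: runner command used to execute on a node (for example, a function
#     that wraps "ssh $node sudo sh -c ...")
# remaining args: Controller VM names
backup_config_drives() {
  local run=$1; shift
  local node
  for node in "$@"; do
    # On the real node this copies the config-drive partition to the
    # ReaR-visible mount point; stop at the first failure.
    $run "$node" 'dd if=/dev/vda1 of=/mnt/config-drive' || return 1
  done
}
```

Passing the runner as an argument keeps the cluster-specific transport (ssh, oc rsh, virtctl ssh) out of the loop itself.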
Note: You cannot use cron to schedule backups because cron cannot be used on the openstackclient pod.
15.2.4. Restoring the control plane
If an error occurs during an update or upgrade, you can restore the control plane to its previous state by using the backup ISO image that you created using the Relax-and-Recover (ReaR) tool.
To restore the control plane, you must restore all Controller virtual machines (VMs) to ensure state consistency.
You can find the backup ISO images on the backup node.
Red Hat supports backups of Red Hat OpenStack Platform with native SDNs, such as Open vSwitch (OVS) and the default Open Virtual Network (OVN). For information about third-party SDNs, refer to the third-party SDN documentation.
Prerequisites
- You have created a backup of the control plane nodes.
- You have access to the backup node.
- The vncviewer package is installed on the workstation.
Procedure
1. Power off each Controller VM. Ensure that all the Controller VMs are powered off completely:

   $ oc get vm

2. Upload the backup ISO image for each Controller VM into a cluster PVC:

   $ virtctl image-upload pvc <backup_image> \
       --pvc-size=<pvc_size> \
       --image-path=<image_path> \
       --insecure

   Replace <backup_image> with the name of the PVC backup image for the Controller VM, for example, backup-controller-0-202310231141.
   Replace <pvc_size> with the size of the PVC required for the image specified with the --image-path option, for example, 4G.
   Replace <image_path> with the path to the backup ISO image for the Controller VM, for example, /home/nfs/backup/controller-0/controller-0-202310231141.iso.
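Because you upload one image per Controller VM, it can help to assemble the three invocations from the per-node values. The sketch below is a hypothetical helper that only prints the command line; the naming pattern follows the examples in this step.

```shell
# Hypothetical helper that prints the virtctl image-upload command for one
# Controller VM backup image; it does not run the upload itself.
build_upload_cmd() {
  local pvc=$1 size=$2 iso=$3
  printf 'virtctl image-upload pvc %s --pvc-size=%s --image-path=%s --insecure\n' \
    "$pvc" "$size" "$iso"
}
```

For example, build_upload_cmd backup-controller-0-202310231141 4G /home/nfs/backup/controller-0/controller-0-202310231141.iso prints the full upload command for controller-0, which you can review before running it.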
3. Disable the director Operator by changing its replicas to 0:

   $ oc patch csv -n openstack <csv> --type json -p='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/replicas", "value": 0}]'

   Replace <csv> with the CSV from the environment, for example, osp-director-operator.v1.3.1.
4. Verify that the osp-director-operator-controller-manager pod is stopped:

   $ oc get pods -n openstack

5. Create a backup of each Controller VM resource:

   $ oc get vm controller-0 -o yaml > controller-0-bk.yaml

6. Update the Controller VM resource: add a CD-ROM disk with bootOrder set to 1 and attach the uploaded PVC as its volume:

   $ oc edit vm controller-0

   @@ -96,10 +96,7 @@
          devices:
            disks:
   +        - bootOrder: 1
   +          cdrom:
   +            bus: sata
   +          name: cdromiso
            - dedicatedIOThread: false
              disk:
                bus: virtio
              name: rootdisk
   @@ -177,9 +174,6 @@
            name: tenant
          terminationGracePeriodSeconds: 0
          volumes:
   +      - name: cdromiso
   +        persistentVolumeClaim:
   +          claimName: <backup_image>
          - dataVolume:
              name: controller-0-36a1
            name: rootdisk

   Replace <backup_image> with the name of the PVC backup image uploaded for the Controller VM in step 2, for example, backup-controller-0-202310231141.
7. Start each Controller VM:

   $ virtctl start controller-0

8. Wait until the status of each Controller VM is RUNNING.

9. Connect to each Controller VM by using VNC:

   $ virtctl vnc controller-0

   Note: If you are using SSH to access the Red Hat OpenShift Container Platform (RHOCP) CLI on a remote system, ensure that SSH X11 forwarding is correctly configured. For more information, see the Red Hat Knowledgebase solution How do I configure X11 forwarding over SSH in Red Hat Enterprise Linux?.
10. ReaR starts automatic recovery after a timeout by default. If recovery does not start automatically, you can manually select the Recover option from the Relax-and-Recover boot menu and specify the name of the control plane node to recover.

11. Wait until the recovery is finished. When the control plane node restoration process completes, the console displays the following message:

    Finished recovering your system
    Exiting rear recover
    Running exit tasks

12. Enter the recovery shell as root.

13. When the command line console is available, restore the config-drive partition (which is ISO9660) of each control plane node:

    RESCUE <control_plane_node>:~ # dd if=/mnt/local/mnt/config-drive of=<config_drive_partition>

14. Power off each node:

    RESCUE <control_plane_node>:~ # poweroff
15. Update the Controller VM resource to detach the CD-ROM. Ensure that the rootdisk has bootOrder: 1.

16. Enable the director Operator by changing its replicas to 1:

    $ oc patch csv -n openstack <csv> --type json -p='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/replicas", "value": 1}]'

17. Verify that the osp-director-operator-controller-manager pod is started.

18. Start each Controller VM:

    $ virtctl start controller-0
    $ virtctl start controller-1
    $ virtctl start controller-2

19. Wait until the Controller VMs are running. SELinux is relabeled on first boot.

20. Check the cluster status:

    $ pcs status

    If the Galera cluster does not restore as part of the restoration procedure, you must restore Galera manually. For more information, see Restoring the Galera cluster manually.