Chapter 15. Backing up and restoring a director Operator deployed overcloud
To back up a Red Hat OpenStack Platform (RHOSP) overcloud that was deployed with director Operator (OSPdO), you must back up the Red Hat OpenShift Container Platform (RHOCP) OSPdO resources, and then use the Relax-and-Recover (ReaR) tool to back up the control plane and overcloud.
15.1. Backing up and restoring director Operator resources
Red Hat OpenStack Platform (RHOSP) director Operator (OSPdO) provides custom resource definitions (CRDs) for backing up and restoring a deployment. You do not have to manually export and import multiple configurations. OSPdO knows which custom resources (CRs), including the ConfigMap and Secret CRs, it needs to create a complete backup because it is aware of the state of all resources. Therefore, OSPdO does not back up any configuration that is in an incomplete or error state.
To back up and restore an OSPdO deployment, you create an OpenStackBackupRequest CR to initiate the creation or restoration of a backup. Your OpenStackBackupRequest CR creates the OpenStackBackup CR, which stores the backup of the custom resources (CRs) and the ConfigMap and Secret configurations for the specified namespace.
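If you generate backup and restore requests from a script instead of writing them by hand, a minimal Python sketch such as the following can build the OpenStackBackupRequest manifest. It assumes that the fields used in the examples in this chapter (mode, additionalConfigMaps, additionalSecrets, and restoreSource) are the ones you need; the helper name build_backup_request is hypothetical. Because JSON is also valid YAML, you can save the printed output to a file and pass it to oc create -f.

```python
import json

# Modes described in this chapter: "save" creates an OpenStackBackup,
# "restore" and "cleanRestore" apply an existing OpenStackBackup.
VALID_MODES = {"save", "restore", "cleanRestore"}


def build_backup_request(name, mode, namespace="openstack", restore_source=None,
                         additional_config_maps=None, additional_secrets=None):
    """Return a hypothetical OpenStackBackupRequest manifest as a dict.

    Field names mirror the YAML examples in this chapter.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    spec = {"mode": mode}
    if mode == "save":
        # Extra objects beyond those OSPdO already associates with its CRs.
        spec["additionalConfigMaps"] = list(additional_config_maps or [])
        spec["additionalSecrets"] = list(additional_secrets or [])
    else:
        # Restore modes need the name of an existing OpenStackBackup.
        if not restore_source:
            raise ValueError(f"mode {mode!r} requires restore_source")
        spec["restoreSource"] = restore_source
    return {
        "apiVersion": "osp-director.openstack.org/v1beta1",
        "kind": "OpenStackBackupRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so this output can be saved for oc create -f.
    print(json.dumps(build_backup_request("openstackbackupsave", "save"), indent=2))
```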
15.1.1. Backing up director Operator resources
To create a backup, you must create an OpenStackBackupRequest custom resource (CR) for the namespace. The OpenStackBackup CR is created when the OpenStackBackupRequest object is created in save mode.
Procedure
- Create a file named openstack_backup.yaml on your workstation. Add the following configuration to your openstack_backup.yaml file to create the OpenStackBackupRequest custom resource (CR):

apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackBackupRequest
metadata:
  name: openstackbackupsave
  namespace: openstack
spec:
  mode: save
  additionalConfigMaps: []
  additionalSecrets: []

  - Set mode to save to request the creation of an OpenStackBackup CR.
  - Optional: Use additionalConfigMaps and additionalSecrets to list any other ConfigMap and Secret objects that you want to include in the backup.

Note: OSPdO attempts to include all ConfigMap and Secret objects associated with the OSPdO CRs in the namespace, such as OpenStackControlPlane and OpenStackBaremetalSet. You do not need to include those in the additional lists.
- Save the openstack_backup.yaml file.
- Create the OpenStackBackupRequest CR:
$ oc create -f openstack_backup.yaml -n openstack
- Monitor the creation status of the OpenStackBackupRequest CR:
$ oc get openstackbackuprequest openstackbackupsave -n openstack
The Quiescing state indicates that OSPdO is waiting for the CRs to reach their finished state. The number of CRs can affect how long it takes to finish creating the backup.

NAME                  OPERATION   SOURCE   STATUS      COMPLETION TIMESTAMP
openstackbackupsave   save                 Quiescing
If the status remains in the Quiescing state for longer than expected, you can investigate the OSPdO logs to check progress:

$ oc logs <operator_pod> -c manager -f
2022-01-11T18:26:15.180Z INFO controllers.OpenStackBackupRequest Quiesce for save for OpenStackBackupRequest openstackbackupsave is waiting for: [OpenStackBaremetalSet: compute, OpenStackControlPlane: overcloud, OpenStackVMSet: controller]

  - Replace <operator_pod> with the name of the Operator pod.
The Saved state indicates that the OpenStackBackup CR is created.

NAME                  OPERATION   SOURCE   STATUS      COMPLETION TIMESTAMP
openstackbackupsave   save                 Saved       2022-01-11T19:12:58Z
The Error state indicates that the backup failed. Review the request contents to find the error:
$ oc get openstackbackuprequest openstackbackupsave -o yaml -n openstack
- View the OpenStackBackup resource to confirm it exists:

$ oc get openstackbackup -n openstack
NAME                             AGE
openstackbackupsave-1641928378   6m7s
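In the example output, the numeric suffix of the backup name, 1641928378, is the Unix time 2022-01-11T19:12:58Z, which matches the completion timestamp of the save request. Assuming that backup names always follow the pattern <request_name>-<unix_seconds> (an inference from this example, not a documented guarantee), the following Python sketch picks the newest backup from the oc get openstackbackup output, for example to use as a restoreSource value. The function names backup_time and newest_backup are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: backup names end in "-<Unix seconds>", as in the example
# name openstackbackupsave-1641928378. This is inferred, not documented.
_NAME = re.compile(r"^(?P<request>.+)-(?P<epoch>\d+)$")


def backup_time(name):
    """Return the UTC time encoded in a backup name, or None if absent."""
    match = _NAME.match(name)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group("epoch")), tz=timezone.utc)


def newest_backup(oc_output, request=None):
    """Return the newest backup name in `oc get openstackbackup` output.

    If request is given, only backups created by that request name are
    considered. Returns None if nothing matches.
    """
    candidates = []
    for line in oc_output.splitlines()[1:]:  # skip the NAME/AGE header
        fields = line.split()
        if not fields:
            continue
        match = _NAME.match(fields[0])
        if match is None:
            continue
        if request is not None and match.group("request") != request:
            continue
        candidates.append((int(match.group("epoch")), fields[0]))
    return max(candidates)[1] if candidates else None
```

Parsing only the NAME column keeps the sketch independent of the AGE format, which oc renders in relative units such as 6m7s.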
15.1.2. Restoring director Operator resources from a backup
When you request to restore a backup, Red Hat OpenStack Platform (RHOSP) director Operator (OSPdO) takes the contents of the specified OpenStackBackup resource and attempts to apply them to all existing custom resources (CRs), ConfigMap resources, and Secret resources in the namespace. OSPdO overwrites any existing resources in the namespace and creates new resources for those not found in the namespace.
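To make the difference between the restore and cleanRestore modes concrete, the following Python sketch models the namespace and the backup as dictionaries keyed by resource kind and name. It is a simplified illustration of the behavior described in this section, not OSPdO code: restore overwrites matching resources and creates missing ones, while cleanRestore first wipes the existing OSPdO resources. The model assumes that the namespace dictionary holds only OSPdO resources and that restore leaves resources that are not in the backup in place; apply_restore is a hypothetical name.

```python
def apply_restore(namespace, backup, mode):
    """Return the modeled namespace contents after a restore request.

    namespace and backup map (kind, name) tuples to resource bodies and,
    in this simplified model, contain only OSPdO-managed resources.
    Inputs are not modified.
    """
    if mode == "restore":
        # Keep what is already there; this model assumes resources that are
        # not in the backup are left alone.
        result = dict(namespace)
    elif mode == "cleanRestore":
        # Wipe the existing OSPdO resources first.
        result = {}
    else:
        raise ValueError(f"unsupported restore mode: {mode!r}")
    # Overwrite resources that exist and create those that are missing.
    result.update(backup)
    return result
```

With a live ConfigMap a, an extra Secret, and a backup that contains ConfigMap a and ConfigMap b, restore keeps the extra Secret and replaces a, whereas cleanRestore ends with exactly the two backed-up ConfigMaps.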
Procedure
- List the available backups:
$ oc get osbackup
- Inspect the details of a specific backup:

$ oc get openstackbackup <name> -o yaml -n openstack

  - Replace <name> with the name of the backup you want to inspect.
- Create a file named openstack_restore.yaml on your workstation. Add the following configuration to your openstack_restore.yaml file to create the OpenStackBackupRequest custom resource (CR):

apiVersion: osp-director.openstack.org/v1beta1
kind: OpenStackBackupRequest
metadata:
  name: openstackbackuprestore
  namespace: openstack
spec:
  mode: <mode>
  restoreSource: <restore_source>
  - Replace <mode> with one of the following options:
    - restore: Requests a restore from an existing OpenStackBackup.
    - cleanRestore: Completely wipes the existing OSPdO resources within the namespace before restoring and creating new resources from the existing OpenStackBackup.
  - Replace <restore_source> with the ID of the OpenStackBackup to restore, for example, openstackbackupsave-1641928378.
- Save the openstack_restore.yaml file.
- Create the OpenStackBackupRequest CR:
$ oc create -f openstack_restore.yaml -n openstack
- Monitor the creation status of the OpenStackBackupRequest CR:
$ oc get openstackbackuprequest openstackbackuprestore -n openstack
The Loading state indicates that all resources from the OpenStackBackup are being applied against the cluster.

NAME                     OPERATION   SOURCE                           STATUS        COMPLETION TIMESTAMP
openstackbackuprestore   restore     openstackbackupsave-1641928378   Loading
The Reconciling state indicates that all resources are loaded and OSPdO has begun reconciling to attempt to provision all resources.

NAME                     OPERATION   SOURCE                           STATUS        COMPLETION TIMESTAMP
openstackbackuprestore   restore     openstackbackupsave-1641928378   Reconciling
The Restored state indicates that the OpenStackBackup CR has been restored.

NAME                     OPERATION   SOURCE                           STATUS        COMPLETION TIMESTAMP
openstackbackuprestore   restore     openstackbackupsave-1641928378   Restored      2022-01-12T13:48:57Z
The Error state indicates that the restoration failed. Review the request contents to find the error:
$ oc get openstackbackuprequest openstackbackuprestore -o yaml -n openstack
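The monitoring steps for backup and restore requests follow the same pattern: you poll the request until its STATUS column shows a final state. If you want to script that wait, the following Python sketch is one way to do it, under these assumptions: the states are the ones listed in this chapter (Quiescing, Loading, and Reconciling are intermediate; Saved, Restored, and Error are final), oc prints its default aligned table, and the helper names parse_status, wait_for_request, and oc_fetch are hypothetical.

```python
import subprocess
import time

# Final states documented in this chapter. Quiescing, Loading, and
# Reconciling are intermediate; any non-final state keeps the loop polling.
SUCCEEDED = {"Saved", "Restored"}
FAILED = {"Error"}


def parse_status(table):
    """Return the STATUS value of the first data row of `oc get` output.

    Columns are sliced by their header offsets because SOURCE is empty
    for save requests, so a plain whitespace split would shift fields.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    if len(lines) < 2:
        return None
    header, row = lines[0], lines[1]
    start, end = header.index("STATUS"), header.index("COMPLETION")
    return row[start:end].strip() or None


def wait_for_request(fetch, timeout=3600, interval=30, sleep=time.sleep):
    """Poll fetch() until a final state appears and return that state.

    fetch returns the raw table text. Raises TimeoutError after roughly
    timeout seconds of sleeping without reaching a final state.
    """
    waited = 0
    while True:
        state = parse_status(fetch())
        if state in SUCCEEDED or state in FAILED:
            return state
        if waited >= timeout:
            raise TimeoutError(f"request still in state {state!r}")
        sleep(interval)
        waited += interval


def oc_fetch(name, namespace="openstack"):
    """Return a callable that runs `oc get openstackbackuprequest <name>`."""
    def fetch():
        result = subprocess.run(
            ["oc", "get", "openstackbackuprequest", name, "-n", namespace],
            check=True, capture_output=True, text=True)
        return result.stdout
    return fetch
```

For example, wait_for_request(oc_fetch("openstackbackuprestore")) blocks until the restore request reports Restored or Error. Passing fetch and sleep as parameters keeps the polling logic testable without a cluster.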
15.2. Backing up and restoring a director Operator deployed overcloud with the Relax-and-Recover tool
To back up a director Operator deployed overcloud with the Relax-and-Recover (ReaR) tool, you configure the backup node, install the ReaR tool on the control plane, and create the backup image. You can create backups as a part of your regular environment maintenance.
In addition, you must back up the control plane before performing updates or upgrades. You can use the backups to restore the control plane to its previous state if an error occurs during an update or upgrade.
15.2.1. Supported backup formats and protocols
The backup and restore process uses the open-source tool Relax-and-Recover (ReaR) to create and restore bootable backup images. ReaR is written in Bash and supports multiple image formats and multiple transport protocols.
The following list shows the backup formats and protocols that Red Hat OpenStack Platform supports when you use ReaR to back up and restore a director Operator deployed control plane.
- Bootable media formats
- ISO
- File transport protocols
- SFTP
- NFS
15.2.2. Configuring the backup storage location
You can install and configure an NFS server to store the backup file. Before you create a backup of the control plane, configure the backup storage location in the bar-vars.yaml environment file. This file stores the key-value parameters that you want to pass to the backup execution.
- If you previously installed and configured an NFS or SFTP server, you do not need to complete this procedure. You enter the server information when you set up ReaR on the node that you want to back up.
- By default, the Relax-and-Recover (ReaR) IP address parameter for the NFS server is 192.168.24.1. You must add the tripleo_backup_and_restore_server parameter to set the IP address value that matches your environment.
Procedure
- Create an NFS backup directory on your workstation:

$ mkdir -p /home/nfs/backup
$ chmod 777 /home/nfs/backup
$ cat >/etc/exports.d/backup.exports<<EOF
/home/nfs/backup *(rw,sync,no_root_squash)
EOF
$ exportfs -av
- Create the bar-vars.yaml file on your workstation:
$ touch /home/stack/bar-vars.yaml
- In the bar-vars.yaml file, configure the backup storage location:

tripleo_backup_and_restore_server: <ip_address>
tripleo_backup_and_restore_shared_storage_folder: <backup_dir>
  - Replace <ip_address> with the IP address of your NFS server, for example, 172.22.0.1. The default IP address is 192.168.24.1.
  - Replace <backup_dir> with the location of the backup storage folder, for example, /home/nfs/backup.
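If you prepare bar-vars.yaml from a script, a small Python sketch such as the following can validate the NFS server address before it writes the two parameters used in this procedure. The helper name render_bar_vars is hypothetical, and the absolute-path check on the backup folder is an extra safeguard that this sketch adds, not a documented ReaR requirement.

```python
import ipaddress
import posixpath


def render_bar_vars(server_ip, backup_dir):
    """Return bar-vars.yaml content for the ReaR backup storage location.

    Raises ValueError if server_ip is not a valid IP address or if
    backup_dir is not an absolute path (a check this sketch adds).
    """
    ip = ipaddress.ip_address(server_ip)  # ValueError for malformed addresses
    if not posixpath.isabs(backup_dir):
        raise ValueError(f"backup_dir must be an absolute path: {backup_dir!r}")
    return (
        f"tripleo_backup_and_restore_server: {ip}\n"
        f"tripleo_backup_and_restore_shared_storage_folder: {backup_dir}\n"
    )
```

For example, pathlib.Path("/home/stack/bar-vars.yaml").write_text(render_bar_vars("172.22.0.1", "/home/nfs/backup")) writes the example values from this procedure.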
15.2.3. Performing a backup of the control plane
To create a backup of the control plane, you must install and configure Relax-and-Recover (ReaR) on each of the Controller virtual machines (VMs).
Due to a known issue, the ReaR backup of overcloud nodes continues even if a Controller node is down. Ensure that all your Controller nodes are running before you run the ReaR backup. A fix is planned for a later Red Hat OpenStack Platform (RHOSP) release. For more information, see BZ#2077335 - Back up of the overcloud ctlplane keeps going even if one controller is unreachable.
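Because of this issue, you might want to confirm that every Controller VM is running before you start the backup. The following Python sketch parses the output of oc get vm and checks the STATUS of each Controller VM. It assumes the default OpenShift Virtualization columns (NAME, AGE, STATUS, READY) and that Controller VM names start with controller-, as in controller-0; both are assumptions about your environment, and controller_states and ready_for_backup are hypothetical names.

```python
def controller_states(oc_get_vm_output, prefix="controller-"):
    """Map each Controller VM name to its STATUS from `oc get vm` output.

    Assumes the default columns NAME AGE STATUS READY, all non-empty.
    """
    rows = [line.split() for line in oc_get_vm_output.splitlines() if line.strip()]
    if not rows:
        return {}
    header = rows[0]
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    return {
        fields[name_col]: fields[status_col]
        for fields in rows[1:]
        if fields[name_col].startswith(prefix)
    }


def ready_for_backup(oc_get_vm_output, expected):
    """True only if exactly `expected` Controller VMs exist and all are Running."""
    states = controller_states(oc_get_vm_output)
    return len(states) == expected and all(s == "Running" for s in states.values())
```

Checking the expected count as well as the status guards against a Controller VM that is missing from the list entirely. The same controller_states output can also confirm that all Controller VMs report Stopped before a restore.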
Procedure
- Extract the static Ansible inventory file from the location in which it was saved during installation:

$ oc rsh openstackclient
$ cd
$ find . -name tripleo-ansible-inventory.yaml
$ cp ~/overcloud-deploy/<stack>/tripleo-ansible-inventory.yaml .
  - Replace <stack> with the name of your stack, for example, cloud-admin. By default, the name of the stack is overcloud.
- Install ReaR on each Controller virtual machine (VM):
$ openstack overcloud backup --setup-rear --extra-vars /home/cloud-admin/bar-vars.yaml --inventory /home/cloud-admin/tripleo-ansible-inventory.yaml
- Open the /etc/rear/local.conf file on each Controller VM:

$ ssh controller-0
[cloud-admin@controller-0 ~]$ sudo -i
[root@controller-0 ~]# cat >>/etc/rear/local.conf<<EOF
- In the /etc/rear/local.conf file, add the NETWORKING_PREPARATION_COMMANDS parameter to configure the Controller VM networks in the following format:

NETWORKING_PREPARATION_COMMANDS=('<command_1>' '<command_2>' ...'<command_n>')
  - Replace <command_1>, <command_2>, and all commands up to <command_n> with commands that configure the network interface names or IP addresses. For example, you can add the ip link add br-ctlplane type bridge command to create the control plane bridge, or add the ip link set eth0 up command to bring up the eth0 interface. You can add more commands to the parameter based on your network configuration.
- Repeat the following command on each Controller VM to back up its config-drive partition:
[root@controller-0 ~]# dd if=/dev/vda1 of=/mnt/config-drive
- Create a backup of the Controller VMs:

$ oc rsh openstackclient
$ openstack overcloud backup --inventory /home/cloud-admin/tripleo-ansible-inventory.yaml
The backup process runs sequentially on each Controller VM without disrupting the service to your environment.
Note: You cannot use cron to schedule backups because cron cannot be used on the openstackclient pod.
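The NETWORKING_PREPARATION_COMMANDS value is a Bash array, so each command must be quoted as exactly one array element. If you generate the /etc/rear/local.conf entry from a script, the following Python sketch uses shlex.quote to quote each command safely, including commands that contain spaces or single quotes. The helper name networking_preparation_line is hypothetical, and the commands that your Controller VMs need depend on your network configuration.

```python
import shlex


def networking_preparation_line(commands):
    """Return a NETWORKING_PREPARATION_COMMANDS=(...) line for local.conf.

    Each command becomes one element of the Bash array; shlex.quote
    protects spaces and any embedded single quotes.
    """
    if not commands:
        raise ValueError("at least one command is required")
    elements = " ".join(shlex.quote(cmd) for cmd in commands)
    return f"NETWORKING_PREPARATION_COMMANDS=({elements})"
```

With the two example commands from this procedure, the function returns NETWORKING_PREPARATION_COMMANDS=('ip link add br-ctlplane type bridge' 'ip link set eth0 up'). The cat >>/etc/rear/local.conf<<EOF command in the earlier step uses an unquoted EOF delimiter, so the shell expands any $ or backtick characters in the pasted line; quoting the delimiter as <<'EOF' prevents that expansion.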
15.2.4. Restoring the control plane
If an error occurs during an update or upgrade, you can restore the control plane to its previous state by using the backup ISO image that you created using the Relax-and-Recover (ReaR) tool.
To restore the control plane, you must restore all Controller virtual machines (VMs) to ensure state consistency.
You can find the backup ISO images on the backup node.
Red Hat supports backups of Red Hat OpenStack Platform with native SDNs, such as Open vSwitch (OVS) and the default Open Virtual Network (OVN). For information about third-party SDNs, refer to the third-party SDN documentation.
Prerequisites
- You have created a backup of the control plane nodes.
- You have access to the backup node.
- A vncviewer package is installed on the workstation.
Procedure
- Power off each Controller VM. Ensure that all the Controller VMs are powered off completely:
$ oc get vm
- Upload the backup ISO images for each Controller VM into a cluster PVC:

$ virtctl image-upload pvc <backup_image> \
  --pvc-size=<pvc_size> \
  --image-path=<image_path> \
  --insecure
  - Replace <backup_image> with the name of the PVC backup image for the Controller VM. For example, backup-controller-0-202310231141.
  - Replace <pvc_size> with the size of the PVC required for the image specified with the --image-path option. For example, 4G.
  - Replace <image_path> with the path to the backup ISO image for the Controller VM. For example, /home/nfs/backup/controller-0/controller-0-202310231141.iso.
- Disable the director Operator by changing its replicas to 0:

$ oc patch csv -n openstack <csv> --type json -p='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/replicas", "value": 0}]'
  - Replace <csv> with the CSV from the environment, for example, osp-director-operator.v1.3.1.
- Verify that the osp-director-operator-controller-manager pod is stopped:

$ oc get pods -n openstack | grep osp-director-operator-controller-manager
- Create a backup of each Controller VM resource:
$ oc get vm controller-0 -o yaml > controller-0-bk.yaml
- Update the Controller VM resource with bootOrder set to 1 and attach the uploaded PVC as a CD-ROM:

$ oc edit vm controller-0
@@ -96,10 +96,7 @@
         devices:
           disks:
           - bootOrder: 1
+            cdrom:
+              bus: sata
+            name: cdromiso
+          - dedicatedIOThread: false
-            dedicatedIOThread: false
             disk:
               bus: virtio
             name: rootdisk
@@ -177,9 +174,6 @@
         name: tenant
       terminationGracePeriodSeconds: 0
       volumes:
+      - name: cdromiso
+        persistentVolumeClaim:
+          claimName: <backup_image>
       - dataVolume:
           name: controller-0-36a1
         name: rootdisk
  - Replace <backup_image> with the name of the PVC backup image that you uploaded for the Controller VM in step 2. For example, backup-controller-0-202310231141.
- Start each Controller VM:
$ virtctl start controller-0
- Wait until the status of each Controller VM is RUNNING. Connect to each Controller VM by using VNC:

$ virtctl vnc controller-0
Note: If you are using SSH to access the Red Hat OpenShift Container Platform (RHOCP) CLI on a remote system, ensure that SSH X11 forwarding is correctly configured. For more information, see the Red Hat Knowledgebase solution How do I configure X11 forwarding over SSH in Red Hat Enterprise Linux?
- ReaR starts automatic recovery after a timeout by default. If recovery does not start automatically, you can manually select the Recover option from the Relax-and-Recover boot menu and specify the name of the control plane node to recover. Wait until the recovery is finished. When the control plane node restoration process completes, the console displays the following message:

Finished recovering your system
Exiting rear recover
Running exit tasks
- Enter the recovery shell as root.
- When the command line console is available, restore the config-drive partition of each control plane node:

# once completed, restore the config-drive partition (which is ISO9660)
RESCUE <control_plane_node>:~ $ dd if=/mnt/local/mnt/config-drive of=<config_drive_partition>
- Power off each node:

RESCUE <control_plane_node>:~ # poweroff
- Update the Controller VM resource to detach the CD-ROM, and make sure that the rootdisk disk has bootOrder: 1.
- Enable the director Operator by changing its replicas to 1:

$ oc patch csv -n openstack <csv> --type json -p='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/replicas", "value": 1}]'
- Verify that the osp-director-operator-controller-manager pod is started.
- Start each Controller VM:

$ virtctl start controller-0
$ virtctl start controller-1
$ virtctl start controller-2
- Wait until the Controller VMs are running. SELinux is relabelled on first boot.
- Check the cluster status:
$ pcs status
If the Galera cluster does not restore as part of the restoration procedure, you must restore Galera manually. For more information, see Restoring the Galera cluster manually.
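The Operator scale-down and scale-up steps and the CD-ROM edits in this procedure can also be scripted. The following Python sketch builds the JSON patch for the CSV replicas with an integer value, and applies the same change as the oc edit diff to a VM object, such as the output of oc get vm controller-0 -o json: attach_backup_cdrom adds a cdromiso SATA CD-ROM disk with bootOrder: 1 that is backed by the uploaded PVC and removes the boot order from rootdisk, and detach_backup_cdrom reverses the change. The helper names are hypothetical, and the sketch assumes the standard KubeVirt layout with disks under spec.template.spec.domain.devices.disks and volumes under spec.template.spec.volumes.

```python
import copy
import json

CSV_REPLICAS_PATH = "/spec/install/spec/deployments/0/spec/replicas"


def csv_replicas_patch(replicas):
    """JSON patch string that scales the director Operator deployment."""
    return json.dumps([{"op": "replace", "path": CSV_REPLICAS_PATH,
                        "value": int(replicas)}])


def _vm_spec(vm):
    return vm["spec"]["template"]["spec"]


def attach_backup_cdrom(vm, pvc_name, cdrom_name="cdromiso", root_disk="rootdisk"):
    """Return a copy of the VM that boots from the backup ISO PVC first."""
    vm = copy.deepcopy(vm)
    spec = _vm_spec(vm)
    disks = spec["domain"]["devices"]["disks"]
    for disk in disks:
        if disk.get("name") == root_disk:
            disk.pop("bootOrder", None)  # the CD-ROM becomes the boot device
    disks.insert(0, {"bootOrder": 1, "cdrom": {"bus": "sata"}, "name": cdrom_name})
    spec.setdefault("volumes", []).insert(
        0, {"name": cdrom_name, "persistentVolumeClaim": {"claimName": pvc_name}})
    return vm


def detach_backup_cdrom(vm, cdrom_name="cdromiso", root_disk="rootdisk"):
    """Return a copy of the VM without the CD-ROM, booting from rootdisk."""
    vm = copy.deepcopy(vm)
    spec = _vm_spec(vm)
    devices = spec["domain"]["devices"]
    devices["disks"] = [d for d in devices["disks"] if d.get("name") != cdrom_name]
    for disk in devices["disks"]:
        if disk.get("name") == root_disk:
            disk["bootOrder"] = 1
    spec["volumes"] = [v for v in spec.get("volumes", []) if v.get("name") != cdrom_name]
    return vm
```

For example, you could pass csv_replicas_patch(0) as the -p argument of oc patch csv with --type json, and write the dictionary returned by attach_backup_cdrom to a JSON file that you apply with oc replace -f. Review the result with oc get vm controller-0 -o yaml before you start the VM.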