Chapter 3. Restoring the undercloud and control plane nodes


If your undercloud or control plane nodes become corrupted or if an error occurs during an update or upgrade, you can restore the undercloud or overcloud control plane nodes from a backup to their previous state. If the restore process fails to automatically restore the Galera cluster or nodes with colocated Ceph monitors, you can restore these components manually.

3.1. Preparing a control plane with colocated Ceph monitors for the restore process

Before you restore control plane nodes with colocated Ceph monitors, prepare your environment by creating a script that mounts the Ceph monitor backup file to the node file system and another script that ReaR uses to locate the backup file.

Important

If you cannot back up the /var/lib/ceph directory, you must contact the Red Hat Technical Support team to rebuild the ceph-mon index. For more information, see Red Hat Technical Support Team.

Prerequisites

Procedure

  1. On each node that you want to restore, create the script /usr/share/rear/setup/default/011_backup_ceph.sh and add the following content:

    mount -t <file_type> <device_disk> /mnt/local
    cd /mnt/local
    [ -d "var/lib/ceph" ] && tar cvfz /tmp/ceph.tar.gz var/lib/ceph --xattrs --xattrs-include='*.*' --acls
    cd /
    umount <device_disk>

    Replace <file_type> and <device_disk> with the type and location of the backup file. Normally, the file type is xfs and the location is /dev/vda2.

  2. On the same node, create the script /usr/share/rear/wrapup/default/501_restore_ceph.sh and add the following content:

    if [ -f "/tmp/ceph.tar.gz" ]; then
      rm -rf /mnt/local/var/lib/ceph/*
      tar xvC /mnt/local -f /tmp/ceph.tar.gz var/lib/ceph --xattrs --xattrs-include='*.*'
    fi
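
The restore script only acts when /tmp/ceph.tar.gz exists, so it can be worth confirming that an archive produced by the backup script actually contains the var/lib/ceph tree. The following is a minimal sketch that assumes GNU tar; the function name ceph_archive_has_data is hypothetical and is not part of ReaR.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ReaR: succeed only if the archive
# exists and lists at least one entry under var/lib/ceph.
ceph_archive_has_data() {
  local archive="$1"
  # A missing archive means there is nothing to restore.
  [ -f "$archive" ] || return 1
  # List the archive members and look for the Ceph directory tree.
  tar tzf "$archive" 2>/dev/null | grep -q '^var/lib/ceph'
}
```

For example, `ceph_archive_has_data /tmp/ceph.tar.gz || echo "Ceph backup missing or empty"` reports a problem before the restore script silently skips the Ceph data.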

3.2. Restoring the undercloud node

You can restore the undercloud node to its previous state using the backup ISO image that you created using ReaR. You can find the backup ISO images on the backup node. Burn the bootable ISO image to a DVD or download it to the undercloud node through Integrated Lights-Out (iLO) remote access.

Prerequisites

Procedure

  1. Power off the undercloud node. Ensure that the undercloud node is powered off completely before you proceed.
  2. Boot the undercloud node with the backup ISO image.
  3. When the Relax-and-Recover boot menu displays, select Recover <undercloud_node>. Replace <undercloud_node> with the name of your undercloud node.

    Note

    If your system uses UEFI, select the Relax-and-Recover (no Secure Boot) option.

  4. Log in as the root user and restore the node:

    The following message displays:

    Welcome to Relax-and-Recover. Run "rear recover" to restore your system!
    RESCUE <undercloud_node>:~ # rear recover

    When the undercloud node restoration process completes, the console displays the following message:

    Finished recovering your system
    Exiting rear recover
    Running exit tasks
  5. Power off the node:

    RESCUE <undercloud_node>:~ #  poweroff

    On boot up, the node resumes its previous state.
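
Before you power off the node, a quick check of the restored file system can catch an incomplete restore. The sketch below assumes that the restored root is still mounted at /mnt/local in the rescue shell, as the paths used elsewhere in this chapter suggest. The function name and the list of checked paths are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if a few key files exist and are
# non-empty under the restored root (default: /mnt/local).
restored_root_looks_sane() {
  local root="${1:-/mnt/local}"
  local path
  for path in etc/fstab etc/passwd; do
    if [ ! -s "$root/$path" ]; then
      echo "missing or empty: $root/$path" >&2
      return 1
    fi
  done
  return 0
}
```

In the rescue shell this could be combined with the final step, for example `restored_root_looks_sane && poweroff`.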

3.3. Restoring the control plane nodes

If an error occurs during an update or upgrade, you can restore the control plane nodes to their previous state using the backup ISO image that you have created using ReaR.

To restore the control plane, you must restore all control plane nodes to ensure state consistency.

You can find the backup ISO images on the backup node. Burn the bootable ISO image to a DVD or download it to the undercloud node through Integrated Lights-Out (iLO) remote access.

Note

Red Hat supports backups of Red Hat OpenStack Platform with native SDNs, such as Open vSwitch (OVS) and the default Open Virtual Network (OVN). For information about third-party SDNs, refer to the third-party SDN documentation.

Prerequisites

Procedure

  1. Power off each control plane node. Ensure that the control plane nodes are powered off completely before you proceed.
  2. Boot each control plane node with the corresponding backup ISO image.
  3. When the Relax-and-Recover boot menu displays, on each control plane node, select Recover <control_plane_node>. Replace <control_plane_node> with the name of the corresponding control plane node.

    Note

    If your system uses UEFI, select the Relax-and-Recover (no Secure Boot) option.

  4. On each control plane node, log in as the root user and restore the node:

    The following message displays:

    Welcome to Relax-and-Recover. Run "rear recover" to restore your system!
    RESCUE <control_plane_node>:~ # rear recover

    When the control plane node restoration process completes, the console displays the following message:

    Finished recovering your system
    Exiting rear recover
    Running exit tasks
  5. When the command line console is available, restore the config-drive partition of each control plane node:

    # Restore the config-drive partition, which uses the ISO9660 file system
    RESCUE <control_plane_node>:~ # dd if=/mnt/local/mnt/config-drive of=<config_drive_partition>
  6. Power off the node:

    RESCUE <control_plane_node>:~ #  poweroff
  7. Set the boot sequence to the normal boot device. On boot up, the node resumes its previous state.
  8. To ensure that the services are running correctly, check the status of pacemaker. Log in to a Controller node as the root user and enter the following command:

    # pcs status
  9. To view the status of the overcloud, use the OpenStack Integration Test Suite (tempest). For more information, see Validating your OpenStack cloud with the Integration Test Suite (tempest).

Troubleshooting

  • Clear resource alarms that are displayed by pcs status by running the following command:

    # pcs resource cleanup

  • Clear STONITH fencing action errors that are displayed by pcs status by running the following commands:

    # pcs resource cleanup
    # pcs stonith history cleanup
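
To decide which of the troubleshooting items applies, the output of pcs status can be scanned for failure sections. The following sketch assumes that the output uses the headings "Failed Resource Actions" and "Failed Fencing Actions", which is how recent Pacemaker releases label them; older releases may use different headings. The function name and the labels it prints are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `pcs status` text on stdin and print one
# label per kind of failure section found in it.
pcs_failure_kinds() {
  local status
  status=$(cat)
  if printf '%s\n' "$status" | grep -Eq '^[[:space:]]*Failed Resource Actions'; then
    echo "resource-failures"
  fi
  if printf '%s\n' "$status" | grep -Eq '^[[:space:]]*Failed Fencing Actions'; then
    echo "fencing-failures"
  fi
}
```

On a Controller node, this might be used as `pcs status | pcs_failure_kinds`. No output means that neither heading was found.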

3.4. Restoring the Galera cluster manually

If the Galera cluster does not restore as part of the restoration procedure, you must restore Galera manually.

Note

In this procedure, you must perform some steps on one Controller node. Ensure that you perform these steps on the same Controller node as you go through the procedure.

Procedure

  1. On Controller-0, retrieve the Galera cluster virtual IP:

    $ sudo hiera -c /etc/puppet/hiera.yaml mysql_vip
  2. Disable the database connections through the virtual IP on all Controller nodes:

    $ sudo iptables -I INPUT  -p tcp --destination-port 3306 -d $MYSQL_VIP  -j DROP
  3. On Controller-0, retrieve the MySQL root password:

    $ sudo hiera -c /etc/puppet/hiera.yaml mysql::server::root_password
  4. On Controller-0, set the Galera resource to unmanaged mode:

    $ sudo pcs resource unmanage galera-bundle
  5. Stop the MySQL containers on all Controller nodes:

    $ sudo podman container stop $(sudo podman container ls --all --format "{{.Names}}" --filter=name=galera-bundle)
  6. Move the current database directory aside on all Controller nodes:

    $ sudo mv /var/lib/mysql /var/lib/mysql-save
  7. Create the new directory /var/lib/mysql on all Controller nodes:

    $ sudo mkdir /var/lib/mysql
    $ sudo chown 42434:42434 /var/lib/mysql
    $ sudo chcon -t container_file_t /var/lib/mysql
    $ sudo chmod 0755 /var/lib/mysql
    $ sudo chcon -r object_r /var/lib/mysql
    $ sudo chcon -u system_u /var/lib/mysql
  8. Start the MySQL containers on all Controller nodes:

    $ sudo podman container start $(sudo podman container ls --all --format "{{ .Names }}" --filter=name=galera-bundle)
  9. Create the MySQL database on all Controller nodes:

    $ sudo podman exec -i $(sudo podman container ls --all --format "{{ .Names }}" \
          --filter=name=galera-bundle) bash -c "mysql_install_db --datadir=/var/lib/mysql --user=mysql --log_error=/var/log/mysql/mysql_init.log"
  10. Start the database on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
          --filter=name=galera-bundle) bash -c "mysqld_safe --skip-networking --wsrep-on=OFF --log-error=/var/log/mysql/mysql_safe.log" &
  11. Move the .my.cnf Galera configuration file on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
          --filter=name=galera-bundle) bash -c "mv /root/.my.cnf /root/.my.cnf.bck"
  12. Reset the Galera root password on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"  \
          --filter=name=galera-bundle) bash -c "mysql -uroot -e'use mysql;update user set password=PASSWORD(\"$ROOT_PASSWORD\")where User=\"root\";flush privileges;'"
  13. Restore the .my.cnf Galera configuration file inside the Galera container on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"   \
          --filter=name=galera-bundle) bash -c "mv /root/.my.cnf.bck /root/.my.cnf"
  14. On Controller-0, copy the backup database files to /var/lib/mysql:

    $ sudo cp $BACKUP_FILE /var/lib/mysql
    $ sudo cp $BACKUP_GRANT_FILE /var/lib/mysql
    Note

    The path to these files is /home/heat-admin/.

  15. On Controller-0, restore the MySQL database:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"    \
    --filter=name=galera-bundle) bash -c "mysql -u root -p$ROOT_PASSWORD < \"/var/lib/mysql/$BACKUP_FILE\"  "
    
    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"    \
    --filter=name=galera-bundle) bash -c "mysql -u root -p$ROOT_PASSWORD < \"/var/lib/mysql/$BACKUP_GRANT_FILE\"  "
  16. Shut down the databases on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"    \
          --filter=name=galera-bundle) bash -c "mysqladmin shutdown"
  17. On Controller-0, start the bootstrap node:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}"  --filter=name=galera-bundle) \
            /usr/bin/mysqld_safe --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql \
            --log-error=/var/log/mysql/mysql_cluster.log  --user=mysql --open-files-limit=16384 \
            --wsrep-cluster-address=gcomm:// &
  18. Verification: On Controller-0, check the status of the cluster:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
             --filter=name=galera-bundle) bash -c "clustercheck"

    Ensure that the message “Galera cluster node is synced” is displayed. Otherwise, you must recreate the node.

  19. On Controller-0, retrieve the cluster address from the configuration:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
    --filter=name=galera-bundle) bash -c "grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf" | awk '{print $3}'
  20. On each of the remaining Controller nodes, start the database and validate the cluster:

    1. Start the database:

      $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
            --filter=name=galera-bundle) /usr/bin/mysqld_safe --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock \
            --datadir=/var/lib/mysql --log-error=/var/log/mysql/mysql_cluster.log  --user=mysql --open-files-limit=16384 \
            --wsrep-cluster-address=$CLUSTER_ADDRESS &
    2. Check the status of the MySQL cluster:

      $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" \
               --filter=name=galera-bundle) bash -c "clustercheck"

      Ensure that the message “Galera cluster node is synced” is displayed. Otherwise, you must recreate the node.

  21. Stop the MySQL container on all Controller nodes:

    $ sudo podman exec $(sudo podman container ls --all --format "{{ .Names }}" --filter=name=galera-bundle) \
            /usr/bin/mysqladmin -u root shutdown
  22. On all Controller nodes, remove the following firewall rule to allow database connections through the virtual IP address:

    $ sudo iptables -D  INPUT  -p tcp --destination-port 3306 -d $MYSQL_VIP  -j DROP
  23. Restart the MySQL container on all Controller nodes:

    $ sudo podman container restart $(sudo podman container ls --all --format  "{{ .Names }}" --filter=name=galera-bundle)
  24. Restart the clustercheck container on all Controller nodes:

    $ sudo podman container restart $(sudo podman container ls --all --format  "{{ .Names }}" --filter=name=clustercheck)
  25. On Controller-0, set the Galera resource to managed mode:

    $ sudo pcs resource manage galera-bundle
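
Several steps in this procedure reference $MYSQL_VIP, $ROOT_PASSWORD, and the Galera container name without showing where those values are set. One way to capture them once per shell session is sketched below. The hiera keys and podman arguments are the ones used in the procedure. The helper names, the GALERA_CONTAINER variable, and the HIERA and PODMAN override variables are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: HIERA and PODMAN can be overridden (for example with stubs)
# so that the helpers can be tried without a real Controller node.
HIERA="${HIERA:-sudo hiera}"
PODMAN="${PODMAN:-sudo podman}"

# Look up one key with the same hiera invocation used in steps 1 and 3.
hiera_value() {
  $HIERA -c /etc/puppet/hiera.yaml "$1"
}

# Print the Galera container name; fail unless exactly one name matches.
galera_container() {
  local names count
  names=$($PODMAN container ls --all --format '{{ .Names }}' \
    --filter=name=galera-bundle) || return 1
  count=$(printf '%s\n' "$names" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected 1 galera-bundle container, found $count" >&2
    return 1
  fi
  printf '%s\n' "$names"
}

# Populate the variables that the later steps expect.
load_galera_vars() {
  MYSQL_VIP=$(hiera_value mysql_vip) || return 1
  ROOT_PASSWORD=$(hiera_value 'mysql::server::root_password') || return 1
  GALERA_CONTAINER=$(galera_container) || return 1
  [ -n "$MYSQL_VIP" ] && [ -n "$ROOT_PASSWORD" ]
}
```

After `load_galera_vars` succeeds, the repeated command substitution can be replaced with the variable, for example `sudo podman exec "$GALERA_CONTAINER" bash -c "clustercheck"`. Failing early when more than one container name matches avoids passing several names to a single podman exec call.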

Verification

  1. To ensure that services are running correctly, check the status of pacemaker:

    $ sudo pcs status
  2. To view the status of the overcloud, use the OpenStack Integration Test Suite (tempest). For more information, see Validating your OpenStack cloud with the Integration Test Suite (tempest).
  3. If you suspect an issue with a particular node, check the state of the cluster with clustercheck:

    $ sudo podman exec clustercheck /usr/bin/clustercheck
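
A node can take some time to report a synced state after it starts, so a single clustercheck call may be misleading. The following sketch polls until the message "Galera cluster node is synced" appears or a retry limit is reached. The default command is the clustercheck invocation shown above. The function name, the retry defaults, and the CHECK_CMD override are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Command whose output is inspected; defaults to the verification command.
CHECK_CMD="${CHECK_CMD:-sudo podman exec clustercheck /usr/bin/clustercheck}"

# Usage: wait_for_synced [attempts] [delay_seconds]
wait_for_synced() {
  local attempts="${1:-30}" delay="${2:-10}" i
  for ((i = 1; i <= attempts; i++)); do
    if $CHECK_CMD 2>/dev/null | grep -q 'Galera cluster node is synced'; then
      echo "synced after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "not synced after $attempts attempt(s)" >&2
  return 1
}
```

For example, `wait_for_synced 12 5` checks up to 12 times with 5 seconds between attempts and returns a non-zero status if the node never reports a synced state.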

3.5. Restoring the undercloud node database manually

If the undercloud database does not restore as part of the undercloud restore process, you can restore the database manually. You can only restore the database if you previously created a standalone database backup.

Prerequisites

Procedure

  1. Log in to the director undercloud node as the root user.
  2. Stop all tripleo services:

    [root@director ~]# systemctl  stop  tripleo_*
  3. Ensure that no containers are running on the server by entering the following command:

    [root@director ~]# podman ps

    If any containers are running, enter the following command to stop the containers:

    [root@director ~]# podman stop <container_name>
  4. Create a backup of the current /var/lib/mysql directory and then delete the directory:

    [root@director ~]# cp -a /var/lib/mysql /var/lib/mysql_bck
    [root@director ~]# rm -rf /var/lib/mysql
  5. Recreate the database directory and set the SELinux attributes for the new directory:

    [root@director ~]# mkdir /var/lib/mysql
    [root@director ~]# chown 42434:42434 /var/lib/mysql
    [root@director ~]# chmod 0755 /var/lib/mysql
    [root@director ~]# chcon -t container_file_t /var/lib/mysql
    [root@director ~]# chcon -r object_r /var/lib/mysql
    [root@director ~]# chcon -u system_u /var/lib/mysql
  6. Create a local tag for the mariadb image. Replace <image_id> and <undercloud.ctlplane.example.com> with the values applicable in your environment:

    [root@director ~]# podman images | grep mariadb
    <undercloud.ctlplane.example.com>:8787/rh-osbs/rhosp16-openstack-mariadb                 	16.2_20210322.1   <image_id>   3 weeks ago   718 MB
    [root@director ~]# podman tag <image_id> mariadb
    [root@director ~]# podman images | grep maria
    localhost/mariadb                                                                         	latest        	<image_id>   3 weeks ago   718 MB
    <undercloud.ctlplane.example.com>:8787/rh-osbs/rhosp16-openstack-mariadb                 	16.2_20210322.1   <image_id>   3 weeks ago   718 MB
  7. Initialize the /var/lib/mysql directory with the container:

    [root@director ~]# podman run --net=host -v /var/lib/mysql:/var/lib/mysql localhost/mariadb mysql_install_db --datadir=/var/lib/mysql --user=mysql
  8. Copy the database backup file that you want to import to the database:

    [root@director ~]# cp /root/undercloud-all-databases.sql /var/lib/mysql
  9. Start the database service to import the data:

    [root@director ~]# podman run --net=host -dt -v /var/lib/mysql:/var/lib/mysql  localhost/mariadb  /usr/libexec/mysqld
  10. Import the data and configure the max_allowed_packet parameter:

    1. Log in to the container and configure it:

      [root@director ~]# podman exec -it <container_id> /bin/bash
          ()[mysql@5a4e429c6f40 /]$ mysql -u root -e "set global max_allowed_packet = 1073741824;"
          ()[mysql@5a4e429c6f40 /]$ mysql -u root < /var/lib/mysql/undercloud-all-databases.sql
          ()[mysql@5a4e429c6f40 /]$ mysql -u root -e 'flush privileges'
          ()[mysql@5a4e429c6f40 /]$ exit
          exit
    2. Stop the container:

      [root@director ~]# podman stop <container_id>
    3. Check that no containers are running:

      [root@director ~]# podman ps
      CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
      [root@director ~]#
  11. Restart all tripleo services:

    [root@director ~]# systemctl start multi-user.target
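
Steps 9 and 10 depend on a <container_id> value. Because podman run with the -d option prints the ID of the container that it starts, the ID can be captured in a variable instead of being copied by hand. The following is a minimal sketch: the podman arguments are those from step 9, while the function names and the PODMAN override variable are assumptions for illustration.

```shell
#!/usr/bin/env bash
# PODMAN can be overridden with a stub to try the helpers safely.
PODMAN="${PODMAN:-podman}"

# Start the temporary database container (step 9) and print its ID.
start_import_db() {
  $PODMAN run --net=host -dt -v /var/lib/mysql:/var/lib/mysql \
    localhost/mariadb /usr/libexec/mysqld
}

# Succeed only if `podman ps -q` lists no running containers.
no_containers_running() {
  [ -z "$($PODMAN ps -q)" ]
}
```

A session might then run `container_id=$(start_import_db)`, perform the import with `podman exec -it "$container_id" /bin/bash`, stop the container with `podman stop "$container_id"`, and confirm the result with `no_containers_running` before restarting the tripleo services.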