Appendix B. Restoring the overcloud
B.1. Restoring the overcloud control plane services
The following procedure restores backups of the overcloud databases and configuration. In this situation, it is recommended to open three terminal windows so that you can perform certain operations simultaneously on all three Controller nodes. It is also recommended to select one Controller node on which to perform high availability operations. This procedure refers to this Controller node as the bootstrap Controller node.
This procedure only restores control plane services. It does not restore Compute node workloads or data on Ceph Storage nodes.
Red Hat supports backups of Red Hat OpenStack Platform with native SDNs, such as Open vSwitch (OVS) and the default Open Virtual Network (OVN). For information about third-party SDNs, refer to the third-party SDN documentation.
Procedure
Stop Pacemaker and remove all containerized services:
Log in to the bootstrap Controller node and stop the Pacemaker cluster:
# sudo pcs cluster stop --all
Wait until the cluster shuts down completely:
# sudo pcs status
On all Controller nodes, remove all containers for OpenStack services:
# docker stop $(docker ps -a -q)
# docker rm $(docker ps -a -q)
If you are restoring from a failed major version upgrade, you might need to reverse any yum transactions that occurred on all nodes. This involves the following steps on each node:
Enable the repositories for previous versions. For example:
# sudo subscription-manager repos --enable=rhel-7-server-openstack-10-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-11-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-12-rpms
Enable the following Ceph repositories:
# sudo subscription-manager repos --enable=rhel-7-server-rhceph-2-tools-rpms
# sudo subscription-manager repos --enable=rhel-7-server-rhceph-2-mon-rpms
Check the yum history:
# sudo yum history list all
Identify transactions that occurred during the upgrade process. Most of these operations occurred on one of the Controller nodes (the Controller node selected as the bootstrap node during the upgrade). If you need to view a particular transaction, view it with the history info subcommand:
# sudo yum history info 25
Note: To force yum history list all to display the command run for each transaction, set history_list_view=commands in your yum.conf file.
Revert any yum transactions that occurred since the upgrade. For example:
# sudo yum history undo 25
# sudo yum history undo 24
# sudo yum history undo 23
...
Make sure to start from the last transaction and continue in descending order. You can also revert multiple transactions in one execution using the rollback option. For example, the following command rolls back all transactions from the last transaction to transaction 23:
# sudo yum history rollback 23
Important: It is recommended to use undo for each transaction instead of rollback so that you can verify the reversal of each transaction.
Once the relevant yum transactions have been reversed, enable only the original OpenStack Platform repository on all nodes. For example:
# sudo subscription-manager repos --disable=rhel-7-server-openstack-*-rpms
# sudo subscription-manager repos --enable=rhel-7-server-openstack-10-rpms
Disable the following Ceph repositories:
# sudo subscription-manager repos --disable=rhel-7-server-rhceph-3-tools-rpms
# sudo subscription-manager repos --disable=rhel-7-server-rhceph-3-mon-rpms
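The descending undo sequence described above can be sketched as a loop. This is a hedged illustration: `echo` stands in for the real command so the sketch can run anywhere, and the transaction range 25 down to 23 is the hypothetical example used in this procedure. On a real node, remove the `echo` to execute the commands.

```shell
# Revert transactions from the last one (25) down to 23, in descending order.
# `echo` is a stand-in for running `sudo yum history undo` on a node.
for txid in $(seq 25 -1 23); do
  echo "sudo yum history undo $txid"
done
```

Reverting one transaction at a time in this way lets you check the result of each reversal before continuing, which is why it is preferred over a single `rollback`.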
Restore the database:
- Copy the database backups to the bootstrap Controller node.
Stop external connections to the database port on all Controller nodes:
# MYSQLIP=$(hiera -c /etc/puppet/hiera.yaml mysql_bind_host)
# sudo /sbin/iptables -I INPUT -d $MYSQLIP -p tcp --dport 3306 -j DROP
This isolates all database traffic to the nodes.
Temporarily disable database replication. Edit the /etc/my.cnf.d/galera.cnf file on all Controller nodes:
# vi /etc/my.cnf.d/galera.cnf
Make the following changes:
- Comment out the wsrep_cluster_address parameter.
- Set wsrep_provider to none.
- Save the /etc/my.cnf.d/galera.cnf file.
Make sure the MariaDB database is disabled on all Controller nodes. During the upgrade to OpenStack Platform 13, the MariaDB service moves to a containerized service, which you disabled earlier. Make sure the service is not running as a process on the host as well:
# mysqladmin -u root shutdown
Note: You might get a warning from HAProxy that the database is disabled.
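With these edits in place, the relevant lines of /etc/my.cnf.d/galera.cnf look roughly like the following. The cluster address shown is a hypothetical example; keep whatever value your file already contains, commented out:

```ini
# Replication temporarily disabled for the restore:
#wsrep_cluster_address = gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
wsrep_provider = none
```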
Move the existing MariaDB data directories and prepare new data directories on all Controller nodes.
Start the database manually on all Controller nodes:
# mysqld_safe --skip-grant-tables --skip-networking --wsrep-on=OFF &
Get the old password and reset the database password on all Controller nodes:
# OLDPASSWORD=$(sudo cat .my.cnf | grep -m1 password | cut -d'=' -f2 | tr -d "'")
# mysql -uroot -e"use mysql;update user set password=PASSWORD('$OLDPASSWORD')"
Stop the database on all Controller nodes:
# /usr/bin/mysqladmin -u root shutdown
Start the database manually on the bootstrap Controller node without the --skip-grant-tables option:
# mysqld_safe --skip-networking --wsrep-on=OFF &
On the bootstrap Controller node, restore the OpenStack database. This data is replicated to the other Controller nodes later:
# mysql -u root < openstack_database.sql
On the bootstrap Controller node, restore the users and permissions:
# mysql -u root < grants.sql
Shut down the database on the bootstrap Controller node:
# mysqladmin shutdown
Enable database replication. Edit the /etc/my.cnf.d/galera.cnf file on all Controller nodes:
# vi /etc/my.cnf.d/galera.cnf
Make the following changes:
- Uncomment the wsrep_cluster_address parameter.
- Set wsrep_provider to /usr/lib64/galera/libgalera_smm.so.
- Save the /etc/my.cnf.d/galera.cnf file.
Start the database on the bootstrap node:
# /usr/bin/mysqld_safe --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --log-error=/var/log/mysql_cluster.log --user=mysql --open-files-limit=16384 --wsrep-cluster-address=gcomm:// &
The empty --wsrep-cluster-address option forces Galera to create a new cluster and makes the bootstrap node the master node.
Check the status of the node:
# clustercheck
This command should report Galera cluster node is synced. Check the /var/log/mysql_cluster.log file for errors.
On the remaining Controller nodes, start the database:
# /usr/bin/mysqld_safe --pid-file=/var/run/mysql/mysqld.pid --socket=/var/lib/mysql/mysql.sock --datadir=/var/lib/mysql --log-error=/var/log/mysql_cluster.log --user=mysql --open-files-limit=16384 --wsrep-cluster-address=gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2 &
Including the nodes in the --wsrep-cluster-address option adds them to the new cluster and synchronizes content from the master.
Periodically check the status of each node:
# clustercheck
When all nodes have completed their synchronization operations, this command should report Galera cluster node is synced. for each node.
Stop the database on all nodes:
# mysqladmin shutdown
Remove the firewall rule from each node to restore the services' access to the database:
# sudo /sbin/iptables -D INPUT -d $MYSQLIP -p tcp --dport 3306 -j DROP
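The "periodically check" step above can be sketched as a small polling function. This is an illustrative pattern only: `clustercheck` exists only on a Galera node, so it is stubbed here with `echo` printing the expected message, and the function name is hypothetical.

```shell
# Poll a status command until it reports a synced Galera node, up to a
# fixed number of attempts. On a Controller node, pass `clustercheck`
# instead of the `echo` stub used below.
poll_until_synced() {
  attempts=$1; shift
  for i in $(seq 1 "$attempts"); do
    if "$@" | grep -q 'Galera cluster node is synced'; then
      echo "node is synced"
      return 0
    fi
    sleep 1
  done
  echo "node did not sync" >&2
  return 1
}

# Stubbed usage; a real invocation would be: poll_until_synced 30 clustercheck
poll_until_synced 3 echo "Galera cluster node is synced."
```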
Restore the Pacemaker configuration:
- Copy the Pacemaker archive to the bootstrap node.
- Log in to the bootstrap node.
Run the configuration restoration command:
# pcs config restore pacemaker_controller_backup.tar.bz2
Restore the filesystem:
Copy the backup tar file for each Controller node to a temporary directory and uncompress all the data:
# mkdir /var/tmp/filesystem_backup/
# cd /var/tmp/filesystem_backup/
# mv <backup_file>.tar.gz .
# tar --xattrs --xattrs-include='*.*' -xvzf <backup_file>.tar.gz
Note: Do not extract directly to the / directory. This overwrites your current filesystem. It is recommended to extract the file in a temporary directory.
Restore the os-*-config files and restart os-collect-config:
# cp -rf /var/tmp/filesystem_backup/var/lib/os-collect-config/* /var/lib/os-collect-config/.
# cp -rf /var/tmp/filesystem_backup/usr/libexec/os-apply-config/* /usr/libexec/os-apply-config/.
# cp -rf /var/tmp/filesystem_backup/usr/libexec/os-refresh-config/* /usr/libexec/os-refresh-config/.
# systemctl restart os-collect-config
Restore the Puppet hieradata files:
# cp -r /var/tmp/filesystem_backup/etc/puppet/hieradata /etc/puppet/hieradata
# cp -r /var/tmp/filesystem_backup/etc/puppet/hiera.yaml /etc/puppet/hiera.yaml
- Retain this directory in case you need any configuration files.
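The extraction step above can be exercised safely with a tiny stand-in tarball, as in the sketch below. The file names are hypothetical; on a real node, <backup_file>.tar.gz is the actual Controller backup, and the key point is to extract into a temporary directory, never directly into /.

```shell
# Build a stand-in backup and extract it into a temporary directory,
# mirroring the /var/tmp/filesystem_backup/ workflow from this procedure.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/etc/puppet"
echo "sample-hieradata" > "$workdir/src/etc/puppet/hiera.yaml"
tar -C "$workdir/src" -czf "$workdir/backup.tar.gz" .

mkdir "$workdir/filesystem_backup"
tar -C "$workdir/filesystem_backup" --xattrs --xattrs-include='*.*' -xzf "$workdir/backup.tar.gz"
cat "$workdir/filesystem_backup/etc/puppet/hiera.yaml"
```

The --xattrs flags matter on a real restore because service directories (for example, the Object Storage data paths) carry extended attributes that a plain extraction would drop.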
Restore the Redis resource:
- Copy the Redis dump to each Controller node.
Move the Redis dump to the original location on each Controller node:
# mv dump.rdb /var/lib/redis/dump.rdb
Restore permissions to the Redis directory:
# chown -R redis: /var/lib/redis
Remove the contents of the following directories:
# rm -rf /var/lib/config-data/puppet-generated/*
# rm /root/.ffu_workaround
Restore the permissions for the OpenStack Object Storage (swift) service:
# chown -R swift: /srv/node
# chown -R swift: /var/lib/swift
# chown -R swift: /var/cache/swift
- Log in to the undercloud and run the original openstack overcloud deploy command from your OpenStack Platform 10 deployment. Make sure to include all environment files relevant to your original deployment.
- Wait until the deployment completes.
After restoring the overcloud control plane data, check that each relevant service is enabled and running correctly:
For high availability services on Controller nodes:
# pcs resource enable [SERVICE]
# pcs resource cleanup [SERVICE]
For systemd services on Controller and Compute nodes:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
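The enable/cleanup pattern can be applied across all expected services in one loop, as in the sketch below. The service list is the set of high availability services from the reference section that follows; `echo` stands in for `pcs`, which exists only on a Controller node.

```shell
# Apply the pcs enable/cleanup pair to each expected HA service.
# `echo` is a stand-in; on a Controller node run the commands directly.
HA_SERVICES="galera haproxy openstack-cinder-volume rabbitmq redis"
for svc in $HA_SERVICES; do
  echo "pcs resource enable $svc"
  echo "pcs resource cleanup $svc"
done
```

The same loop shape works for the systemd services with `systemctl start` and `systemctl enable` substituted for the `pcs` commands.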
The following sections provide a reference of the services that should be enabled.
B.2. Restored High Availability Services
The following is a list of high availability services that should be active on OpenStack Platform 10 Controller nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# pcs resource enable [SERVICE]
# pcs resource cleanup [SERVICE]
| Controller Services |
|---|
| galera |
| haproxy |
| openstack-cinder-volume |
| rabbitmq |
| redis |
B.3. Restored Controller Services
The following is a list of core systemd services that should be active on OpenStack Platform 10 Controller nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
| Controller Services |
|---|
| httpd |
| memcached |
| neutron-dhcp-agent |
| neutron-l3-agent |
| neutron-metadata-agent |
| neutron-openvswitch-agent |
| neutron-ovs-cleanup |
| neutron-server |
| ntpd |
| openstack-aodh-evaluator |
| openstack-aodh-listener |
| openstack-aodh-notifier |
| openstack-ceilometer-central |
| openstack-ceilometer-collector |
| openstack-ceilometer-notification |
| openstack-cinder-api |
| openstack-cinder-scheduler |
| openstack-glance-api |
| openstack-glance-registry |
| openstack-gnocchi-metricd |
| openstack-gnocchi-statsd |
| openstack-heat-api-cfn |
| openstack-heat-api-cloudwatch |
| openstack-heat-api |
| openstack-heat-engine |
| openstack-nova-api |
| openstack-nova-conductor |
| openstack-nova-consoleauth |
| openstack-nova-novncproxy |
| openstack-nova-scheduler |
| openstack-swift-account-auditor |
| openstack-swift-account-reaper |
| openstack-swift-account-replicator |
| openstack-swift-account |
| openstack-swift-container-auditor |
| openstack-swift-container-replicator |
| openstack-swift-container-updater |
| openstack-swift-container |
| openstack-swift-object-auditor |
| openstack-swift-object-expirer |
| openstack-swift-object-replicator |
| openstack-swift-object-updater |
| openstack-swift-object |
| openstack-swift-proxy |
| openvswitch |
| os-collect-config |
| ovs-delete-transient-ports |
| ovs-vswitchd |
| ovsdb-server |
| pacemaker |
B.4. Restored Overcloud Compute Services
The following is a list of core systemd services that should be active on OpenStack Platform 10 Compute nodes after a restore. If any of these services are disabled, use the following commands to enable them:
# systemctl start [SERVICE]
# systemctl enable [SERVICE]
| Compute Services |
|---|
| neutron-openvswitch-agent |
| neutron-ovs-cleanup |
| ntpd |
| openstack-ceilometer-compute |
| openstack-nova-compute |
| openvswitch |
| os-collect-config |