Back Up and Restore the Overcloud Database
How to back up and restore the overcloud MariaDB database
Abstract
Chapter 1. Introduction
The Back Up and Restore procedure for the OpenStack overcloud is intended for disaster recovery scenarios, such as the failure of multiple nodes or data directory corruption affecting the data integrity of the entire cluster.
For single node failures, bootstrap a new node from a working donor node.
Part I. Backing Up an OpenStack HA Overcloud Database
This procedure describes the steps to back up the MariaDB database in a running Galera cluster of an OpenStack overcloud.
Chapter 2. About Overcloud Components
For the purposes of backing up the OpenStack overcloud, consider the following components:
- Pacemaker: Pacemaker is a high-availability cluster resource manager. Pacemaker achieves maximum availability for cluster services by detecting and recovering from node- and resource-level failures. Pacemaker uses messaging and membership capabilities provided by Corosync.
- MariaDB: MariaDB is the SQL database service for the OpenStack overcloud.
- Galera: Galera is a synchronous, multi-master clustering solution for MariaDB. In the context of OpenStack, Galera provides high availability for the database service in HA deployments of OpenStack controller nodes.
On a typical HA cluster with three controller nodes, OpenStack services such as OpenStack Compute (nova) or OpenStack Networking (neutron) access the MariaDB database through HAProxy over a virtual IP address. HAProxy directs all incoming traffic to a single active controller node chosen from the available controller nodes in the cluster. In a three-node cluster, the other two controller nodes act as hot standby nodes, which synchronize with the first node using Galera’s synchronous replication. If HAProxy detects that the first target node is unavailable, it selects one of the two remaining hot standby nodes and directs incoming traffic to it. When clients attempt to use an existing database connection from a failed node, HAProxy migrates those connections to the new node. As a side effect of this lazy migration approach, each client recreates its SQL connection.
Consequently, backing up a running cluster requires identifying one of the idle controller nodes as the backup target to avoid any impact on the controller node that is currently servicing clients.
Chapter 3. Locate an Idle Node
To back up a running cluster, identify an idle node as the backup target: connect to the cluster via the HAProxy server, then query the database to identify the host it is running on. The resulting node is the active node. Select one of the other hot standby nodes as the backup target.
For the purposes of backing up a high availability cluster, Red Hat assumes at least three Galera cluster nodes. For example, overcloud nodes overcloud-controller-0, overcloud-controller-1 and overcloud-controller-2:
overcloud-controller-0.example.com 192.168.1.1
overcloud-controller-1.example.com 192.168.1.2
overcloud-controller-2.example.com 192.168.1.3
Procedure
Retrieve the virtual IP address HAProxy is listening on to identify which node is active.
[root@overcloud-controller-0]# grep -A1 mysql /etc/haproxy/haproxy.cfg

The result will look similar to this:

listen mysql
  bind 192.0.2.18:3306

Retrieve the username and password of a user allowed to connect to the virtual IP address. For example, /etc/nova/nova.conf typically contains these credentials for the nova user.

[root@overcloud-controller-0 heat-admin]# crudini --get /etc/nova/nova.conf database connection

Or:

[root@overcloud-controller-0 heat-admin]# grep mysql /etc/nova/nova.conf

The result should look something like this:

connection=mysql+pymysql://nova:xvsZqeaJn2fYwMK8NbscAJ6xG@172.16.2.5/nova

The username and password in the foregoing example are nova and xvsZqeaJn2fYwMK8NbscAJ6xG, respectively. The next step requires these credentials.

Connect to the database over the virtual IP address to get the name of the targeted node.

[root@overcloud-controller-0]# mysql -u <username> -p -h 192.0.2.18 -nNE -e "show variables like 'hostname';"
Enter password: ****

Replace <username> with a user allowed to connect to the virtual IP address. The command line interface will prompt for a password. The result should look similar to this:

************************ 1. row ************************
hostname overcloud-controller-0.example.com

In the foregoing example, overcloud-controller-0 is the active node targeted by HAProxy.

Ensure the remaining hot standby nodes are connected and in sync. In the following example, assume overcloud-controller-1 and overcloud-controller-2 are the hot standby nodes.

[root@overcloud-controller-0]# curl overcloud-controller-1.example.com:9200 overcloud-controller-2.example.com:9200

If the remaining nodes are connected and in sync with the active node targeted by HAProxy, the result should look similar to this:

Galera cluster node is synced.
Galera cluster node is synced.

Important: Execute the backup procedure on a hot standby node that is connected to the cluster and synchronized with the active node that HAProxy is targeting.

In the foregoing example, both overcloud-controller-1 and overcloud-controller-2 are connected and synchronized. Select one of the connected and synchronized hot standby nodes to back up the overcloud.
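The credential lookup above can also be scripted with shell parameter expansion; the following is a minimal sketch that parses the user, password, and host out of a connection URL, using the documentation's example values (not real secrets):

```shell
# Example connection string as retrieved from /etc/nova/nova.conf
conn='mysql+pymysql://nova:xvsZqeaJn2fYwMK8NbscAJ6xG@172.16.2.5/nova'

# Strip the scheme, then split user:password@host/database step by step.
userpass_host=${conn#*://}     # nova:<password>@172.16.2.5/nova
DB_USER=${userpass_host%%:*}   # nova
rest=${userpass_host#*:}       # <password>@172.16.2.5/nova
DB_PASS=${rest%%@*}            # the password
host_db=${rest#*@}             # 172.16.2.5/nova
DB_HOST=${host_db%%/*}         # 172.16.2.5

echo "$DB_USER@$DB_HOST"
```

The extracted values can then be passed to the mysql command in the next step instead of typing them interactively.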
Chapter 4. Back Up the Node
The back up procedure uses mysqldump with the --single-transaction option to avoid table locks. This approach ensures that the backup procedure limits the performance impact on a running cluster.
Because the --single-transaction option provides a consistent snapshot only for InnoDB tables, and Galera replicates only InnoDB tables, this approach covers all replicated data.
Procedure
Back up all databases other than the mysql database.

mysql -u root -e "select distinct table_schema from information_schema.tables where engine='innodb' and table_schema != 'mysql';" -s -N | xargs mysqldump -u root --single-transaction --databases > openstack_database-$TIMESTAMP.sql

Back up all grants and account information from the database.

mysql -u root -s -N -e "SELECT CONCAT('\"SHOW GRANTS FOR ''',user,'''@''',host,''';\"') FROM mysql.user where (length(user) > 0 and user NOT LIKE 'root')" | xargs -n1 mysql -u root -s -N -e | sed 's/$/;/' > grants.sql

The user accounts and grants must be backed up separately because the mysql system tables use the MyISAM storage engine, which the --single-transaction option does not cover.
Finally, store a copy of all backup files at a remote location.
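The $TIMESTAMP variable in the dump file name is not set by the procedure itself; the following sketch shows one way to define it (an assumption — any unambiguous, sortable format works):

```shell
# Derive a sortable timestamp for the dump file name (YYYY-MM-DD-HHMMSS).
TIMESTAMP=$(date +%F-%H%M%S)
echo "openstack_database-$TIMESTAMP.sql"
```

Export the variable before running the mysqldump command so both the dump and any wrapper script agree on the file name.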
Part II. Restoring an OpenStack HA Overcloud Database
This procedure describes the steps to restore the MariaDB database in a running Galera cluster of an OpenStack overcloud if ALL instances in the Galera cluster fail. Before executing this procedure, contact Red Hat support to confirm that database restoration is the appropriate solution.
Database restoration implies a database outage. No OpenStack service will be able to connect to the database during the restore procedure.
Before attempting to restore the database, verify that the database in each Galera node is beyond recovery. If the database is corrupt and beyond recovery in a single node but healthy in another node in the Galera cluster, allow Galera to resynchronize or re-replicate the local instance of the database from the healthy instance on another node.
The inability to start the Galera service on at least one or even all nodes does not necessarily imply that the database is corrupt and beyond recovery. The inability to start the service may be due to a separate issue in the Galera, MariaDB or Pacemaker services. Resolve those issues before attempting to use this procedure.
If Galera is working and the database is healthy on any of the HA nodes, DO NOT follow these steps. The replication/synchronization functionality of Galera should restore the cluster.
Finally, if objects have been created or deleted since the last backup, they will need to be added to or deleted from the database manually after completing the restore procedure.
Chapter 5. Identify the Virtual IP Address
For the purposes of restoring a high availability cluster, Red Hat assumes at least three cluster nodes. For example, overcloud nodes overcloud-controller-0, overcloud-controller-1 and overcloud-controller-2:
overcloud-controller-0.example.com 192.168.1.1
overcloud-controller-1.example.com 192.168.1.2
overcloud-controller-2.example.com 192.168.1.3
Procedure
Identify the virtual IP address that OpenStack services use to access the database.
Run the following command:

[root@overcloud-controller-0]# grep -A1 mysql /etc/haproxy/haproxy.cfg

It should retrieve the virtual IP address and port. For example:

listen mysql
  bind 192.168.1.10:3306 transparent
In the foregoing example, the virtual IP address is 192.168.1.10.
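The lookup can also be done non-interactively. The following sketch extracts the address with awk, run here against an inline sample of the stanza so it is self-contained; on a controller node, point it at /etc/haproxy/haproxy.cfg instead:

```shell
# Parse the bind address of the mysql listener out of a haproxy.cfg stanza.
# The heredoc stands in for /etc/haproxy/haproxy.cfg in this sketch.
VIP=$(awk '/^listen mysql/ {f=1; next} f && /bind/ {sub(/:.*/, "", $2); print $2; exit}' <<'EOF'
listen mysql
  bind 192.168.1.10:3306 transparent
EOF
)
echo "$VIP"
```

The sub(/:.*/, "", $2) call strips the port (and any stray extra colon) so only the virtual IP address remains.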
Chapter 6. Configure iptables and Pacemaker
Configure the firewall and Pacemaker to isolate the database cluster.
Procedure
Insert an iptables rule on each controller node to drop inbound connections over the virtual IP address to the database port.

[root@overcloud-controller-0]# iptables -I INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP
[root@overcloud-controller-1]# iptables -I INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP
[root@overcloud-controller-2]# iptables -I INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP

From one of the controller nodes, remove the mariadb-galera service from Pacemaker management.

[root@overcloud-controller-0]# pcs resource unmanage galera
Chapter 7. Update the Configuration
Update the Galera configuration on ALL Galera nodes.
Procedure
Locate the Galera configuration file and open it.
grep wsrep_cluster_address /etc/my.cnf.d/galera.cnf
vi /etc/my.cnf.d/galera.cnf

Comment out the wsrep_cluster_address setting.

#wsrep_cluster_address = gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2

Comment out the wsrep_provider setting.

#wsrep_provider = /usr/lib64/galera/libgalera_smm.so

Save the configuration file and close it.
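Editing the file by hand works on a single node; across all nodes, the two settings can be commented out non-interactively. A sketch using sed, demonstrated on a sample file standing in for /etc/my.cnf.d/galera.cnf so it is self-contained:

```shell
# Create a sample galera.cnf; on a controller node, operate on
# /etc/my.cnf.d/galera.cnf directly instead.
cnf=$(mktemp)
cat > "$cnf" <<'EOF'
wsrep_cluster_address = gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
EOF

# Prefix both wsrep settings with '#' (assumption: they start at column 0).
sed -i -e 's/^wsrep_cluster_address/#&/' -e 's/^wsrep_provider/#&/' "$cnf"
cat "$cnf"
```

Running the same sed command over ssh on each controller node keeps the three configurations consistent.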
Chapter 8. Stop the MariaDB Service and Prepare the Directories
The procedure requires stopping the MariaDB service and making a copy of the data directories in case of data directory corruption. The copies are useful for root-cause analysis. Perform the following procedure on ALL nodes.
Procedure
Stop the MariaDB service on ALL controller nodes.
[root@overcloud-controller-0]# mysqladmin -u root shutdown

Move the existing mariadb data directories and prepare new data directories on ALL controller nodes.
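The commands for the move step are not shown above. The following is a minimal sketch of what it can look like, written as a function so the path is explicit (assumptions: the default /var/lib/mysql data directory and the mysql service user; verify both for your deployment):

```shell
# Set the old data directory aside for root-cause analysis and prepare an
# empty one in its place. Assumption: default /var/lib/mysql path.
prepare_datadir() {
    datadir=${1:-/var/lib/mysql}
    timestamp=$(date +%F-%H%M%S)

    # Keep the old (possibly corrupt) data for later investigation.
    mv "$datadir" "$datadir-save-$timestamp"

    # Recreate an empty directory owned by the MariaDB service user.
    mkdir "$datadir"
    chown mysql:mysql "$datadir"
}
```

Run prepare_datadir (or the equivalent commands) on every controller node before reinitializing the database.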
Chapter 9. Start MariaDB Locally and Set Up Accounts

To start MariaDB and set up base accounts, execute the following procedure.
Procedure
Start the MariaDB service and set up the base accounts on ALL controller nodes. The director creates a root password and stores it in /root/.my.cnf.

On ALL controller nodes, ensure that the mysql process is not running.

[root@overcloud-controller-0]# ps -ef | grep mysql
[root@overcloud-controller-1]# ps -ef | grep mysql
[root@overcloud-controller-2]# ps -ef | grep mysql

Uncomment the wsrep_cluster_address and wsrep_provider settings in /etc/my.cnf.d/galera.cnf on ALL controller nodes.

wsrep_cluster_address = gcomm://overcloud-controller-0,overcloud-controller-1,overcloud-controller-2
wsrep_provider = /usr/lib64/galera/libgalera_smm.so

On one of the nodes in the cluster, bring the Galera cluster up with Pacemaker.
[root@overcloud-controller-0]# pcs resource manage galera
[root@overcloud-controller-0]# pcs resource cleanup galera

Make sure that the cluster is running.
[root@overcloud-controller-0]# pcs status | grep -C3 galera

If the cluster is running, the output should look like this:

Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
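The commands behind the first step of this procedure (starting MariaDB locally and setting up the base accounts) are not shown above. The following is a hedged sketch of what that step typically involves on a freshly prepared data directory; every command and option here is an assumption to check against your environment, and <clustercheck_password> is a placeholder, not a real value:

```shell
# Initialize a fresh system database in the new, empty data directory
# (assumption: default paths and the mysql service user).
mysql_install_db --datadir=/var/lib/mysql --user=mysql

# Start MariaDB locally, without remote networking and without Galera
# replication, so the node does not try to join a cluster yet.
mysqld_safe --skip-networking --wsrep-on=OFF &

# Recreate the clustercheck user so the :9200 health check used later in
# this procedure works (placeholder password; use your deployment's value).
mysql -u root -e "GRANT PROCESS ON *.* TO 'clustercheck'@'localhost' IDENTIFIED BY '<clustercheck_password>';"

# Shut the local instance down again before re-enabling Galera.
mysqladmin -u root shutdown
```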
Chapter 10. Restore the Database
Restore the MariaDB database tables on one node. Galera will replicate the database to other nodes automatically.
Procedure
Execute the following to restore the database tables. Ensure the openstack_database.sql file name includes the appropriate timestamp.

[root@overcloud-controller-0]# mysql -u root < openstack_database.sql
[root@overcloud-controller-0]# mysql -u root < grants.sql

Execute clustercheck on the current node.

[root@overcloud-controller-0]# /bin/clustercheck

Test clustercheck via xinetd.d on each node:

curl overcloud-controller-0.example.com:9200
curl overcloud-controller-1.example.com:9200
curl overcloud-controller-2.example.com:9200

Finally, remove the iptables rule from each node to restore access to the database.

[root@overcloud-controller-0]# iptables -D INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP
[root@overcloud-controller-1]# iptables -D INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP
[root@overcloud-controller-2]# iptables -D INPUT -d 192.168.1.10 -p tcp --dport 3306 -j DROP
If any OpenStack services in pcs status have a failed status due to problems connecting to the database during the restore procedure, run pcs resource cleanup <name> to recover those services.