10.7. Disaster Recovery
Red Hat Gluster Storage provides geo-replication failover and failback capabilities for disaster recovery. If the master goes offline, you can perform a failover procedure so that a slave can replace the master. When this happens, all the I/O operations, including reads and writes, are done on the slave, which is now acting as the master. When the original master is back online, you can perform a failback procedure on the original slave so that it synchronizes the differences back to the original master.
10.7.1. Failover: Promoting a Slave to Master
If the master volume goes offline, you can promote a slave volume to be the master, and start using that volume for data access.
Run the following commands on the slave machine to promote it to be the master:
# gluster volume set VOLNAME geo-replication.indexing on
# gluster volume set VOLNAME changelog on
For example:
# gluster volume set slave-vol geo-replication.indexing on
volume set: success
# gluster volume set slave-vol changelog on
volume set: success
You can now configure applications to use the slave volume for I/O operations.
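The two promotion commands above can be wrapped in a small helper. The following is a minimal sketch, not part of the product: the function names `run` and `promote_slave` and the `DRY_RUN` variable are hypothetical. By default it only prints the commands, so it can be reviewed before anything is changed on the slave.

```shell
# Sketch: promote a geo-replication slave volume to master (failover).
# DRY_RUN is a hypothetical switch; it defaults to 1, so commands are
# printed instead of executed. Set DRY_RUN=0 to run them for real.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

promote_slave() {
    vol="${1:-}"
    if [ -z "$vol" ]; then
        echo "usage: promote_slave VOLNAME" >&2
        return 1
    fi
    # Same two options, in the same order, as the procedure above.
    run gluster volume set "$vol" geo-replication.indexing on &&
    run gluster volume set "$vol" changelog on
}
```

Calling `promote_slave slave-vol` with the default settings prints the two `gluster volume set` commands from the example above without running them.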
10.7.2. Failback: Resuming Master and Slave back to their Original State
When the original master is back online, you can perform the following procedure on the original slave so that it synchronizes the differences back to the original master:
- Stop the existing geo-replication session from the original master to the original slave using the following command:
# gluster volume geo-replication ORIGINAL_MASTER_VOL ORIGINAL_SLAVE_HOST::ORIGINAL_SLAVE_VOL stop force
For example,
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol stop force
Stopping geo-replication session between Volume1 and storage.backup.com::slave-vol has been successful
- Create a new geo-replication session with the original slave as the new master and the original master as the new slave, using the force option. Detailed information on creating a geo-replication session is available at: .
- Start the special synchronization mode to speed up the recovery of data from the slave. This mode makes geo-replication ignore the files created before the indexing option was enabled, so it synchronizes only those files that were created after the slave volume was promoted to master.
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config special-sync-mode recover
For example,
# gluster volume geo-replication slave-vol master.com::Volume1 config special-sync-mode recover
geo-replication config updated successfully
- Start the new geo-replication session using the following command:
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL start
For example,
# gluster volume geo-replication slave-vol master.com::Volume1 start
Starting geo-replication session between slave-vol and master.com::Volume1 has been successful
- Stop the I/O operations on the original slave and set the checkpoint. A checkpoint lets you verify whether the data that was on the master at that point in time has been replicated to the slaves.
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL config checkpoint now
For example,
# gluster volume geo-replication slave-vol master.com::Volume1 config checkpoint now
geo-replication config updated successfully
- Checkpoint completion ensures that the data from the original slave is restored back to the original master. Because the I/O operations were stopped on the slave before the checkpoint was set, you must touch the slave mount for the checkpoint to complete, and then check the session status:
# touch ORIGINAL_SLAVE_MOUNT
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL status detail
For example,
# touch /mnt/gluster/slavevol
# gluster volume geo-replication slave-vol master.com::Volume1 status detail
- After the checkpoint is complete, stop and delete the current geo-replication session between the original slave and the original master:
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL stop
# gluster volume geo-replication ORIGINAL_SLAVE_VOL ORIGINAL_MASTER_HOST::ORIGINAL_MASTER_VOL delete
For example,
# gluster volume geo-replication slave-vol master.com::Volume1 stop
Stopping geo-replication session between slave-vol and master.com::Volume1 has been successful
# gluster volume geo-replication slave-vol master.com::Volume1 delete
geo-replication command executed successfully
- Reset the options that were set for promoting the slave volume as the master volume by running the following commands:
# gluster volume reset ORIGINAL_SLAVE_VOL geo-replication.indexing force
# gluster volume reset ORIGINAL_SLAVE_VOL changelog
For example,
# gluster volume reset slave-vol geo-replication.indexing force
volume set: success
# gluster volume reset slave-vol changelog
volume set: success
- Resume the original roles by starting the geo-replication session from the original master using the following command:
# gluster volume geo-replication ORIGINAL_MASTER_VOL ORIGINAL_SLAVE_HOST::ORIGINAL_SLAVE_VOL start
For example,
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol start
Starting geo-replication session between Volume1 and storage.backup.com::slave-vol has been successful
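The failback steps above can be collected into a dry-run script to make the order of operations and the two manual gates (stopping I/O, waiting for the checkpoint) explicit. This is a sketch under stated assumptions, not a supported tool: the variable names (`M_VOL`, `M_HOST`, `S_VOL`, `S_HOST`, `S_MNT`, `DRY_RUN`) and function names are hypothetical, the defaults are taken from the examples in this section, and the `create force` arguments are an assumption because the procedure describes the create step without showing its command.

```shell
# Dry-run sketch of the failback sequence. All names here are hypothetical.
M_VOL="${M_VOL:-Volume1}"                 # original master volume
M_HOST="${M_HOST:-master.com}"            # original master host
S_VOL="${S_VOL:-slave-vol}"               # original slave volume
S_HOST="${S_HOST:-storage.backup.com}"    # original slave host
S_MNT="${S_MNT:-/mnt/gluster/slavevol}"   # mount point of the original slave

# Print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Shorthand for the two session directions.
fwd() { run gluster volume geo-replication "$M_VOL" "${S_HOST}::${S_VOL}" "$@"; }
rev() { run gluster volume geo-replication "$S_VOL" "${M_HOST}::${M_VOL}" "$@"; }

# Phase 1: stop the old session, reverse it, and start syncing back.
failback_start_sync() {
    fwd stop force &&
    rev create force &&   # assumption: exact create arguments are not shown in the procedure
    rev config special-sync-mode recover &&
    rev start
}

# Phase 2: run only after I/O on the original slave has been stopped.
failback_checkpoint() {
    rev config checkpoint now &&
    run touch "$S_MNT" &&
    rev status detail
}

# Phase 3: run only after the status output shows the checkpoint as complete.
failback_restore_roles() {
    rev stop &&
    rev delete &&
    run gluster volume reset "$S_VOL" geo-replication.indexing force &&
    run gluster volume reset "$S_VOL" changelog &&
    fwd start
}
```

With the defaults, calling `failback_start_sync` prints the four commands of the first phase. The phases are kept as separate functions because the procedure requires a human decision between them; the sketch deliberately does not try to detect checkpoint completion on its own.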