10.4. Starting Geo-replication
This section describes how to start geo-replication in your storage environment and verify that it is functioning correctly.
10.4.1. Starting a Geo-replication Session
Important
You must create the geo-replication session before starting geo-replication. For more information, see Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”.
To start geo-replication, use one of the following commands:
- To start the geo-replication session between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol start
Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
This command will start distributed geo-replication on all the nodes that are part of the master volume. If a node that is part of the master volume is down, the command will still be successful. In a replica pair, the geo-replication session will be active on any of the replica nodes, but remain passive on the others.
After executing the command, it may take a few minutes for the session to initialize and become stable.
Note
If you attempt to create a geo-replication session and the slave already has data, the following error message will be displayed:
slave-node::slave is not empty. Please delete existing files in slave-node::slave and retry, or use force to continue without deleting the existing files.
geo-replication command failed
- To start the geo-replication session forcefully between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol start force
Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
This command will force start geo-replication sessions on the nodes that are part of the master volume. If it is unable to successfully start the geo-replication session on any node which is online and part of the master volume, the command will still start the geo-replication sessions on as many nodes as it can. This command can also be used to re-start geo-replication sessions on the nodes where the session has died, or has not started.
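The two start variants above differ only in the trailing force keyword. The following is a minimal sketch of a wrapper that assembles either form and prints it for review instead of running it. The helper name georep_start_cmd is hypothetical and not part of the gluster CLI; only the command syntax shown in this section is assumed.

```shell
#!/bin/sh
# Sketch: build the start command from the syntax shown in this section.
# georep_start_cmd is a hypothetical helper name, not a gluster command.
georep_start_cmd() {
    master_vol=$1   # for example: Volume1
    slave=$2        # for example: storage.backup.com::slave-vol
    mode=${3:-}     # optional third argument: "force"
    if [ -n "$mode" ] && [ "$mode" != "force" ]; then
        echo "unsupported mode: $mode" >&2
        return 1
    fi
    # ${mode:+ $mode} appends " force" only when mode is non-empty.
    echo "gluster volume geo-replication $master_vol $slave start${mode:+ $mode}"
}

# Print both variants; nothing is executed against the cluster.
georep_start_cmd Volume1 storage.backup.com::slave-vol
georep_start_cmd Volume1 storage.backup.com::slave-vol force
```

Printing the command first makes it easy to confirm the volume and slave names before running it on a node of the master volume.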
10.4.2. Verifying a Successful Geo-replication Deployment
You can use the status command to verify the status of geo-replication in your environment:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol status
10.4.3. Displaying Geo-replication Status Information
The status command can be used to display information about a specific geo-replication master session, master-slave session, or all geo-replication sessions. The status output provides both node and brick level information.
- To display information about all geo-replication sessions, use the following command:
# gluster volume geo-replication status [detail]
- To display information about all geo-replication sessions from a particular master volume, use the following command:
# gluster volume geo-replication MASTER_VOL status [detail]
- To display information about a particular master-slave session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status [detail]
Important
There will be a mismatch between the outputs of the df command (including -h and -k) and the inode counts of the master and slave volumes, even when the data is in full sync. This is due to the extra inode and size consumption by the changelog journaling data, which keeps track of the changes done on the file system on the master volume. Instead of running the df command to verify the status of synchronization, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detail
- The geo-replication status command output provides the following information:
- Master Node: Master node and hostname as listed in the gluster volume info command output
- Master Vol: Master volume name
- Master Brick: The path of the brick
- Slave User: Slave user name
- Slave: Slave volume name
- Slave Node: IP address/hostname of the slave node to which the master worker is connected.
- Status: The status of the geo-replication worker can be one of the following:
- Initializing: This is the initial phase of the Geo-replication session; it remains in this state for a minute in order to make sure no abnormalities are present.
- Created: The geo-replication session is created, but not started.
- Active: The gsyncd daemon in this node is active and syncing the data.
- Passive: A replica pair of the active node. The data synchronization is handled by the active node. Hence, this node does not sync any data.
- Faulty: The geo-replication session has experienced a problem, and the issue needs to be investigated further. For more information, see Section 10.12, “Troubleshooting Geo-replication”.
- Stopped: The geo-replication session has stopped, but has not been deleted.
- Crawl Status: Crawl status can be one of the following:
- Changelog Crawl: The changelog translator has produced the changelog, and that is being consumed by the gsyncd daemon to sync data.
- Hybrid Crawl: The gsyncd daemon is crawling the glusterFS file system and generating pseudo changelog to sync data.
- History Crawl: The gsyncd daemon consumes the history changelogs produced by the changelog translator to sync data.
- Last Synced: The last synced time.
- Entry: The number of pending entry (CREATE, MKDIR, RENAME, UNLINK, etc.) operations per session.
- Data: The number of Data operations pending per session.
- Meta: The number of Meta operations pending per session.
- Failures: The number of failures. If the failure count is more than zero, view the log files for errors in the Master bricks.
- Checkpoint Time: Displays the date and time of the checkpoint, if set. Otherwise, it displays as N/A.
- Checkpoint Completed: Displays the status of the checkpoint.
- Checkpoint Completion Time: Displays the completion time if Checkpoint is completed. Otherwise, it displays as N/A.
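As a rough illustration of how the Status values listed above might be triaged in a monitoring script, the sketch below maps each value to a short hint. The helper name status_hint and the hint words (ok, wait, not-running, investigate) are our own labels, not gluster output. How the Status column is extracted from the status command output is deliberately left out, because the exact output layout is not described in this section.

```shell
#!/bin/sh
# Sketch: map a worker Status value to a suggested follow-up.
# status_hint and the hint strings are hypothetical, for illustration only.
status_hint() {
    case $1 in
        Active|Passive)  echo "ok" ;;           # syncing, or standby replica of an active worker
        Initializing)    echo "wait" ;;         # initial phase of the session
        Created|Stopped) echo "not-running" ;;  # session exists but is not started
        Faulty)          echo "investigate" ;;  # see the troubleshooting section
        *)               echo "unknown" ;;
    esac
}

for s in Active Passive Initializing Created Stopped Faulty; do
    printf '%-12s -> %s\n' "$s" "$(status_hint "$s")"
done
```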
10.4.4. Configuring a Geo-replication Session
To configure a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config [Name] [Value]
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config sync_method rsync
For example, to view the list of all option/value pairs:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config
To delete a setting for a geo-replication config option, prefix the option with ! (exclamation mark). For example, to reset log-level to the default value:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config '!log-level'
Warning
Make these configuration changes only when all the peers in the cluster are in the Connected (online) state. If you change the configuration while any of the peers is down, the geo-replication cluster will be in an inconsistent state when the node comes back online.
Configurable Options
The following table provides an overview of the configurable options for a geo-replication setting:
Option | Description |
---|---|
gluster_log_file LOGFILE | The path to the geo-replication glusterfs log file. |
gluster_log_level LOGFILELEVEL | The log level for glusterfs processes. |
log_file LOGFILE | The path to the geo-replication log file. |
log_level LOGFILELEVEL | The log level for geo-replication. |
changelog_log_level LOGFILELEVEL | The log level for the changelog. The default log level is set to INFO. |
changelog_batch_size SIZEINBYTES | The total size for the changelog in a batch. The default size is set to 727040 bytes. |
ssh_command COMMAND | The SSH command to connect to the remote machine (the default is SSH). |
sync_method NAME | The synchronization method to use for the files. The available options are rsync or tarssh. The default is rsync. The tarssh option uses tar over the Secure Shell protocol. Use the tarssh option to handle workloads of files that have not undergone edits. Note: On RHEL 8.3 or later, install the tar package before configuring sync_method as tarssh: # yum install tar |
volume_id=UID | The command to delete the existing master UID for the intermediate/slave node. |
timeout SECONDS | The timeout period in seconds. |
sync_jobs N | The maximum number of syncer threads (rsync processes or tar over ssh processes for syncing) inside each worker. The number of workers is always equal to the number of bricks in the Master volume. For example, a distributed-replicated volume of (3 x 2) with sync_jobs configured at 3 results in 9 total sync jobs (threads) across all nodes/servers, because only one worker in each of the 3 replica pairs is active. Active and Passive Workers: The number of active workers is based on the volume configuration. In a distribute volume, all bricks (workers) are active and participate in syncing. In a replicate or dispersed volume, one worker from each replicate/disperse group (subvolume) is active and participates in syncing. This avoids duplicate syncing from other bricks. The remaining workers in each replicate/disperse group (subvolume) are passive. If the active worker goes down, one of the passive workers from the same replicate/disperse group becomes an active worker. |
ignore_deletes | If this option is set to true, a file deleted on the master will not trigger a delete operation on the slave. As a result, the slave will remain as a superset of the master and can be used to recover the master in the event of a crash and/or accidental delete. If this option is set to false, which is the default config option for ignore-deletes, a file deleted on the master will trigger a delete operation on the slave. |
checkpoint [LABEL|now] | Sets a checkpoint with the given option LABEL. If the option is set as now, then the current time will be used as the label. |
sync_acls [true | false] | Syncs acls to the Slave cluster. By default, this option is enabled. Note: Geo-replication can sync acls only with rsync as the sync engine and not with tarssh as the sync engine. |
sync_xattrs [true | false] | Syncs extended attributes to the Slave cluster. By default, this option is enabled. Note: Geo-replication can sync extended attributes only with rsync as the sync engine and not with tarssh as the sync engine. |
log_rsync_performance [true | false] | If this option is set to true, geo-replication starts recording the rsync performance in log files. By default, this option is disabled. |
rsync_options | Additional options to rsync. For example, you can limit the rsync bandwidth usage with "--bwlimit=<value>". |
use_meta_volume [true | false] | Set this option to true to use the meta volume in Geo-replication. By default, this option is disabled. Note: For more information on meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”. |
meta_volume_mnt PATH | The path of the meta volume mount point. |
gfid_conflict_resolution [true | false] | Enables or disables automatic GFID conflict resolution, which detects and fixes GFID conflicts between master and slave. By default, this option is true. |
special_sync_mode | Speeds up the recovery of data from slave. Adds capability to geo-replication to ignore the files created before enabling indexing option. Tunables for failover or failback mechanism: None: gsyncd behaves as normal. blind: gsyncd works with xtime pairs to identify candidates for synchronization. wrapup: same as normal mode but does not assign xtimes to orphaned files. recover: files are only transferred if they are identified as changed on the slave. Note: Use this mode after ensuring that the number of files in the slave is equal to that of master. Geo-replication will synchronize only those files which are created after making Slave volume as Master volume. |
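The sync_jobs example in the table (a 3 x 2 distributed-replicated volume with sync_jobs set to 3) can be worked through as plain arithmetic. The sketch below is our reading of that example: it assumes that the total of 9 counts only the active workers, one per replica pair. The variable names are illustrative and are not gluster options.

```shell
#!/bin/sh
# Worked arithmetic for the sync_jobs example: a 3 x 2 distributed-replicated volume.
# Assumption: only active workers run sync jobs (one active worker per replica pair).
distribute_count=3   # number of replica pairs (subvolumes)
replica_count=2      # bricks in each replica pair
sync_jobs=3          # configured sync_jobs value

workers=$((distribute_count * replica_count))    # one worker per brick
active_workers=$distribute_count                 # one active worker per replica pair
passive_workers=$((workers - active_workers))    # the remaining workers are passive
total_sync_jobs=$((active_workers * sync_jobs))  # syncer threads across all nodes

echo "workers=$workers active=$active_workers passive=$passive_workers total_sync_jobs=$total_sync_jobs"
```

With these inputs the script reports 6 workers, 3 of them active, and 9 sync jobs in total, which matches the figure given in the table.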
10.4.4.1. Geo-replication Checkpoints
10.4.4.1.1. About Geo-replication Checkpoints
Geo-replication data synchronization is an asynchronous process, so changes made on the master may take time to be replicated to the slaves. Data replication to a slave may also be interrupted by various issues, such as network outages.
Red Hat Gluster Storage provides the ability to set geo-replication checkpoints. When a checkpoint is set, the status information shows whether the data that was on the master at that point in time has been replicated to the slaves.
10.4.4.1.2. Configuring and Viewing Geo-replication Checkpoint Information
- To set a checkpoint on a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config checkpoint [now|LABEL]
For example, to set a checkpoint between Volume1 and storage.backup.com::slave-vol:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config checkpoint now
geo-replication config updated successfully
The label for a checkpoint can be set as the current time using now, or a particular label can be specified, as shown below:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config checkpoint NEW_ACCOUNTS_CREATED
geo-replication config updated successfully.
- To display the status of a checkpoint for a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detail
- To delete checkpoints for a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config '!checkpoint'
For example, to delete the checkpoint set between Volume1 and storage.backup.com::slave-vol:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config '!checkpoint'
geo-replication config updated successfully
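The three checkpoint operations above are typically used together: set a checkpoint, check the Checkpoint Completed field in the detailed status, then delete the checkpoint. The sketch below prints that sequence for review rather than running it. The helper name checkpoint_plan is hypothetical; the commands it prints are the ones shown in this section.

```shell
#!/bin/sh
# Sketch: print (not run) the checkpoint commands from this section, in order.
# checkpoint_plan is a hypothetical helper name, not a gluster command.
checkpoint_plan() {
    master_vol=$1
    slave=$2
    label=${3:-now}   # defaults to "now", which uses the current time as the label
    base="gluster volume geo-replication $master_vol $slave"
    printf '%s config checkpoint %s\n' "$base" "$label"    # 1. set the checkpoint
    printf '%s status detail\n' "$base"                    # 2. check the checkpoint fields
    printf "%s config '%s'\n" "$base" '!checkpoint'        # 3. delete the checkpoint
}

checkpoint_plan Volume1 storage.backup.com::slave-vol NEW_ACCOUNTS_CREATED
```

The single quotes around !checkpoint are kept in the printed command so that an interactive shell does not treat the exclamation mark as history expansion.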
10.4.5. Stopping a Geo-replication Session
To stop a geo-replication session for a root user, use one of the following commands:
- To stop a geo-replication session between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol stop
Stopping geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
Note
The stop command will fail if:
- any node that is a part of the volume is offline.
- it is unable to stop the geo-replication session on any particular node.
- the geo-replication session between the master and slave is not active.
- To stop a geo-replication session forcefully between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol stop force
Stopping geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
Using force will stop the geo-replication session between the master and slave even if any node that is a part of the volume is offline. If it is unable to stop the geo-replication session on any particular node, the command will still stop the geo-replication sessions on as many nodes as it can. Using force will also stop inactive geo-replication sessions.
To stop a geo-replication session for a non-root user, use the following command:
# gluster volume geo-replication MASTER_VOL geoaccount@SLAVE_HOST::SLAVE_VOL stop
10.4.6. Deleting a Geo-replication Session
Important
You must first stop a geo-replication session before it can be deleted. For more information, see Section 10.4.5, “Stopping a Geo-replication Session”.
To delete a geo-replication session for a root user, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete reset-sync-time
reset-sync-time: The geo-replication delete command retains the information about the last synchronized time. As a result, if the same geo-replication session is recreated, synchronization continues from the point where it left off before the session was deleted. To discard all details about the deleted session, use the reset-sync-time option with the delete command. When the session is then recreated, it starts synchronization from the beginning, just like a new session.
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol delete
geo-replication command executed successfully
Note
The delete command will fail if:
- any node that is a part of the volume is offline.
- it is unable to delete the geo-replication session on any particular node.
- the geo-replication session between the master and slave is still active.
Important
The SSH keys will not be removed from the master and slave nodes when the geo-replication session is deleted. You can manually remove the pem files which contain the SSH keys from the /var/lib/glusterd/geo-replication/ directory.
To delete a geo-replication session for a non-root user, use the following command:
# gluster volume geo-replication MASTER_VOL geoaccount@SLAVE_HOST::SLAVE_VOL delete reset-sync-time
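Because a session must be stopped before it can be deleted, the two commands are usually issued as a pair. The sketch below prints them in the required order for either a root or a non-root session. The helper name teardown_plan is hypothetical, and the commands are printed rather than executed so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print (not run) the stop and delete commands in the required order.
# teardown_plan is a hypothetical helper name, not a gluster command.
teardown_plan() {
    master_vol=$1
    slave=$2       # SLAVE_HOST::SLAVE_VOL, or geoaccount@SLAVE_HOST::SLAVE_VOL for a non-root user
    reset=${3:-}   # pass "reset-sync-time" to discard the last synchronized time
    if [ -n "$reset" ] && [ "$reset" != "reset-sync-time" ]; then
        echo "unsupported option: $reset" >&2
        return 1
    fi
    base="gluster volume geo-replication $master_vol $slave"
    echo "$base stop"                      # the session must be stopped first
    echo "$base delete${reset:+ $reset}"   # then it can be deleted
}

# Root session, keeping the last synchronized time:
teardown_plan Volume1 storage.backup.com::slave-vol
# Non-root session, discarding the last synchronized time:
teardown_plan Volume1 geoaccount@storage.backup.com::slave-vol reset-sync-time
```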