10.11. Troubleshooting Geo-replication

This section describes the most common troubleshooting scenarios related to geo-replication.

10.11.1. Tuning Geo-replication performance with Change Log

There are options for the change log that can be configured to give better performance in a geo-replication environment.

The rollover-time option sets the rate at which the change log is consumed. The default rollover time is 60 seconds, but it can be configured to a faster rate. A recommended rollover-time for geo-replication is 10-15 seconds. To change the rollover-time option, use following the command:

# gluster volume set VOLNAME rollover-time 15

The fsync-interval option determines the frequency that updates to the change log are written to disk. The default interval is 0, which means that updates to the change log are written synchronously as they occur, and this may negatively impact performance in a geo-replication environment. Configuring fsync-interval to a non-zero value will write updates to disk asynchronously at the specified interval. To change the fsync-interval option, use following the command:

# gluster volume set VOLNAME fsync-interval 3

10.11.2. Triggering Explicit Sync on Entries

Geo-replication provides an option to explicitly trigger the sync operation of files and directories. A virtual extended attribute glusterfs.geo-rep.trigger-sync is provided to accomplish the same.

# setfattr -n glusterfs.geo-rep.trigger-sync -v "1" <file-path>

The support of explicit trigger of sync is supported only for directories and regular files.

10.11.3. Synchronization Is Not Complete

Situation

The geo-replication status is displayed as Stable, but the data has not been completely synchronized.

Solution

A full synchronization of the data can be performed by erasing the index and restarting geo-replication. After restarting geo-replication, it will begin a synchronization of the data using checksums. This may be a long and resource intensive process on large data sets. If the issue persists, contact Red Hat Support.

For more information about erasing the index, see Section 11.1, “Configuring Volume Options”.

10.11.4. Issues with File Synchronization

Situation

The geo-replication status is displayed as Stable, but only directories and symlinks are synchronized. Error messages similar to the following are in the logs:

[2011-05-02 13:42:13.467644] E [master:288:regjob] GMaster: failed to sync ./some_file`

Solution

Geo-replication requires rsync v3.0.0 or higher on the host and the remote machines. Verify if you have installed the required version of rsync.

10.11.5. Geo-replication Status is Often `Faulty`

Situation

The geo-replication status is often displayed as Faulty, with a backtrace similar to the following:

012-09-28 14:06:18.378859] E [syncdutils:131:log_raise_exception] <top>: FAIL: Traceback (most recent call last): File "/usr/local/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 152, in twraptf(*aa) File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 118, in listen rid, exc, res = recv(self.inf) File "/usr/local/libexec/glusterfs/python/syncdaemon/repce.py", line 42, in recv return pickle.load(inf) EOFError

Solution

This usually indicates that RPC communication between the master gsyncd module and slave gsyncd module is broken. Make sure that the following prerequisites are met:

Passwordless SSH is set up properly between the host and remote machines.
FUSE is installed on the machines. The geo-replication module mounts Red Hat Gluster Storage volumes using FUSE to sync data.

10.11.6. Intermediate Master is in a Faulty State

Situation

In a cascading environment, the intermediate master is in a faulty state, and messages similar to the following are in the log:

raise RuntimeError ("aborting on uuid change from %s to %s" % \ RuntimeError: aborting on uuid change from af07e07c-427f-4586-ab9f- 4bf7d299be81 to de6b5040-8f4e-4575-8831-c4f55bd41154

Solution

In a cascading configuration, an intermediate master is loyal to its original primary master. The above log message indicates that the geo-replication module has detected that the primary master has changed. If this change was deliberate, delete the volume-id configuration option in the session that was initiated from the intermediate master.

10.11.7. Remote gsyncd Not Found

Situation

The master is in a faulty state, and messages similar to the following are in the log:

[2012-04-04 03:41:40.324496] E [resource:169:errfail] Popen: ssh> bash: /usr/local/libexec/glusterfs/gsyncd: No such file or directory

Solution

The steps to configure a SSH connection for geo-replication have been updated. Use the steps as described in Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”

10.11. Troubleshooting Geo-replication

10.11.1. Tuning Geo-replication performance with Change Log

10.11.2. Triggering Explicit Sync on Entries

10.11.3. Synchronization Is Not Complete

10.11.4. Issues with File Synchronization

10.11.5. Geo-replication Status is Often `Faulty`

10.11.6. Intermediate Master is in a Faulty State

10.11.7. Remote gsyncd Not Found

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Red Hat legal and privacy links

Red Hat legal and privacy links

10.11. Troubleshooting Geo-replication

10.11.1. Tuning Geo-replication performance with Change Log

10.11.2. Triggering Explicit Sync on Entries

10.11.3. Synchronization Is Not Complete

10.11.4. Issues with File Synchronization

10.11.5. Geo-replication Status is Often Faulty

10.11.6. Intermediate Master is in a Faulty State

10.11.7. Remote gsyncd Not Found

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Red Hat legal and privacy links

Red Hat legal and privacy links

10.11.5. Geo-replication Status is Often `Faulty`