11.11. Managing Split-brain
- Data split-brain: The contents of the file under split-brain differ between the bricks of a replica pair, and automatic healing is not possible.
- Metadata split-brain: The metadata of the file (for example, a user-defined extended attribute) differs between the bricks, and automatic healing is not possible.
- Entry split-brain: This happens when a file has a different gfid on each brick of the replica pair.
11.11.1. Preventing Split-brain
11.11.1.1. Configuring Server-Side Quorum
Enable server-side quorum by setting the cluster.server-quorum-type volume option to server. For more information on this volume option, see Section 11.1, “Configuring Volume Options”.
Server-side quorum is enforced by the glusterd service. Whenever the glusterd service on a machine observes that the quorum is not met, it brings down the bricks to prevent data split-brain. When the network connections are brought back up and the quorum is restored, the bricks in the volume are brought back up. While the quorum is not met for a volume, commands that update the volume configuration, add peers, or detach peers are not allowed. Note that a glusterd service that is not running and a network connection between two machines that is down are treated equally.
To configure the quorum ratio for the trusted storage pool:
# gluster volume set all cluster.server-quorum-ratio PERCENTAGE
For example, to set the quorum to 51% of the trusted storage pool:
# gluster volume set all cluster.server-quorum-ratio 51%
To enable server-side quorum on a particular volume:
# gluster volume set VOLNAME cluster.server-quorum-type server
Important
11.11.1.2. Configuring Client-Side Quorum
Example 11.8. Client-Side Quorum
When the client-side quorum is not met for replica group A, only replica group A becomes read-only. Replica groups B and C continue to allow data modifications.
Important
- If cluster.quorum-type is fixed, writes continue as long as the number of bricks up and running in the replica pair is at least the count specified in the cluster.quorum-count option. It does not matter whether these are the first, second, or third bricks. All the bricks are equivalent here.
- If cluster.quorum-type is auto, then at least ceil(n/2) bricks need to be up to allow writes, where n is the replica count. In addition, for auto, if the number of bricks that are up is exactly ceil(n/2) and n is an even number, then the first brick of the replica must also be up to allow writes. For replica 6, if more than 3 bricks are up, they can be any of the bricks. But if exactly 3 bricks are up, the first brick has to be up and running.
- In a three-way replication setup, it is recommended to set cluster.quorum-type to auto to avoid split-brains. If the quorum is not met, the replica pair becomes read-only.
Configure the client-side quorum using the cluster.quorum-type and cluster.quorum-count options. For more information on these options, see Section 11.1, “Configuring Volume Options”.
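The fixed and auto rules listed above can be sketched as a small shell function. This is only an illustration of the rules as stated in this section, not Gluster's implementation; the function name quorum_allows_writes and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch of the client-side quorum rules described in this section.
# The function name and arguments are illustrative, not part of Gluster.

# quorum_allows_writes TYPE N UP FIRST_UP [COUNT]
#   TYPE      "auto" or "fixed" (value of cluster.quorum-type)
#   N         replica count
#   UP        number of bricks currently up in the replica set
#   FIRST_UP  1 if the first brick of the replica set is up, else 0
#   COUNT     value of cluster.quorum-count (used only for "fixed")
# Prints "yes" if writes are allowed, "no" otherwise.
# The body runs in a subshell so its variables stay local.
quorum_allows_writes() (
    type=$1 n=$2 up=$3 first_up=$4 count=${5:-0}
    case "$type" in
        fixed)
            if [ "$up" -ge "$count" ]; then echo yes; else echo no; fi
            ;;
        auto)
            need=$(( (n + 1) / 2 ))   # ceil(n/2) in integer arithmetic
            if [ "$up" -lt "$need" ]; then
                echo no
            elif [ $(( n % 2 )) -eq 0 ] && [ "$up" -eq "$need" ] && [ "$first_up" -eq 0 ]; then
                # Exactly half of an even replica set: the first brick must be up.
                echo no
            else
                echo yes
            fi
            ;;
        *)
            echo "unknown quorum type: $type" >&2
            exit 1
            ;;
    esac
)

quorum_allows_writes auto 6 3 1   # exactly half up, first brick up: prints yes
quorum_allows_writes auto 6 3 0   # exactly half up, first brick down: prints no
quorum_allows_writes auto 3 2 0   # odd replica count, any two bricks: prints yes
```

The replica 2 case follows from the same rule: with one brick up, writes are allowed only if that brick is the first one.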
Important
Client-side quorum is enabled when you run the gluster volume set VOLNAME group virt command. In a two-replica setup, if the first brick in the replica pair is offline, virtual machines are paused because the quorum is not met and writes are disallowed.
To reset the client-side quorum type to its default value:
# gluster volume reset VOLNAME quorum-type
This example shows how to set server-side and client-side quorum on a Distribute Replicate volume to avoid a split-brain scenario. The configuration in this example is a 2 x 2 (4 bricks) Distribute Replicate setup.
Enable server-side quorum on the volume:
# gluster volume set VOLNAME cluster.server-quorum-type server
Set the quorum to 51% of the trusted storage pool:
# gluster volume set all cluster.server-quorum-ratio 51%
Set the quorum-type option to auto to allow writes to a file only if the percentage of active replicate bricks is more than 50% of the total number of bricks that constitute that replica.
# gluster volume set VOLNAME quorum-type auto
Important
If the number of bricks (n) in a replica set is an even number, the n/2 count must include the primary brick, and it must be up and running. If n is an odd number, the n/2 count can consist of any bricks; that is, the primary brick need not be up and running to allow writes.
11.11.2. Recovering from File Split-brain
- See Section 11.11.2.1, “Recovering File Split-brain from the Mount Point” for information on how to recover from data and meta-data split-brain from the mount point.
- See Section 11.11.2.2, “Recovering File Split-brain from the gluster CLI” for information on how to recover from data and meta-data split-brain using the CLI.
For information on resolving gfid/entry split-brain, see Chapter 25, Manually Recovering File Split-brain.
11.11.2.1. Recovering File Split-brain from the Mount Point
Steps to recover from a split-brain from the mount point
- You can use a set of getfattr and setfattr commands to detect the data and meta-data split-brain status of a file and resolve split-brain from the mount point.
  Important
  This process for split-brain resolution from the mount does not work on NFS mounts, because NFS does not provide extended attribute support.
  In this example, the test-volume volume has bricks brick0, brick1, brick2 and brick3, and some of the files in the volume are in split-brain.
  To determine the data or meta-data split-brain status of a file:
  # getfattr -n replica.split-brain-status <path-to-file>
  The above command, executed from the mount, reports whether a file is in data or meta-data split-brain. This command is not applicable to gfid/entry split-brain.
  For example, file100 is in meta-data split-brain. Executing the above command for file100 gives:
  # getfattr -n replica.split-brain-status file100
  # file: file100
  replica.split-brain-status="data-split-brain:no metadata-split-brain:yes Choices:test-client-0,test-client-1"
  file1 is in data split-brain.
  # getfattr -n replica.split-brain-status file1
  # file: file1
  replica.split-brain-status="data-split-brain:yes metadata-split-brain:no Choices:test-client-2,test-client-3"
  file99 is in both data and meta-data split-brain.
  # getfattr -n replica.split-brain-status file99
  # file: file99
  replica.split-brain-status="data-split-brain:yes metadata-split-brain:yes Choices:test-client-2,test-client-3"
  dir is in gfid/entry split-brain but, as mentioned earlier, the above command does not report whether a file is in gfid/entry split-brain. Hence, the command displays The file is not under data or metadata split-brain. For information on resolving gfid/entry split-brain, see Chapter 25, Manually Recovering File Split-brain.
  # getfattr -n replica.split-brain-status dir
  # file: dir
  replica.split-brain-status="The file is not under data or metadata split-brain"
  file2 is not in any kind of split-brain.
  # getfattr -n replica.split-brain-status file2
  # file: file2
  replica.split-brain-status="The file is not under data or metadata split-brain"
- Analyze the files in data and meta-data split-brain and resolve the issue.
  Operations such as cat and getfattr performed from the mount on files in split-brain fail with an input/output error. To analyze such files further, use the setfattr command:
  # setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>
  Using this command, a particular brick can be chosen to access the file in split-brain.
  For example, file1 is in data split-brain, and an attempt to read from the file fails with an input/output error.
  # cat file1
  cat: file1: Input/output error
  The split-brain choices provided for file1 were test-client-2 and test-client-3.
  Setting test-client-2 as the split-brain choice for file1 serves reads from b2 for the file.
  # setfattr -n replica.split-brain-choice -v test-client-2 file1
  Now, you can perform operations on the file. For example, read operations on the file:
  # cat file1
  xyz
  Similarly, to inspect the file from the other choice, set replica.split-brain-choice to test-client-3.
  Trying to inspect the file from a wrong choice errors out. To undo the split-brain-choice that has been set, use the setfattr command with none as the value for the extended attribute.
  For example:
  # setfattr -n replica.split-brain-choice -v none file1
  Now a cat operation on the file again results in an input/output error, as before.
  # cat file1
  cat: file1: Input/output error
  After you decide which brick to use as the source for resolving the split-brain, set it so that the healing can be done.
  # setfattr -n replica.split-brain-heal-finalize -v <heal-choice> <path-to-file>
  For example:
  # setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1
  The above process can be used to resolve data and/or meta-data split-brain on all the files.
  Setting the split-brain-choice on the file
  After setting the split-brain-choice on the file, the file can be analyzed for only five minutes. To increase the time available for analyzing the file, use the following command and set the required time in the timeout-in-minutes argument.
  # setfattr -n replica.split-brain-choice-timeout -v <timeout-in-minutes> <mount_point/file>
  This is a global timeout and applies to all files as long as the mount exists. The timeout need not be set each time a file is inspected, but it must be set again the first time on a new mount. This option becomes invalid if operations like add-brick or remove-brick are performed.
  Note
  If the fopen-keep-cache FUSE mount option is disabled, the inode must be invalidated each time before selecting a new replica.split-brain-choice to inspect a file, using the following command:
  # setfattr -n inode-invalidate -v 0 <path-to-file>
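When checking many files, it can help to turn the replica.split-brain-status value into separate fields before choosing a brick to inspect. The following is a minimal sketch that parses only the value strings shown in the examples above; the function name parse_split_brain_status is hypothetical, and the parsing assumes the value keeps exactly that layout.

```shell
#!/bin/sh
# Sketch: split a replica.split-brain-status value into its fields.
# Assumes the value has the layout shown in the examples in this section.

# parse_split_brain_status VALUE
# Prints three lines: data=<yes|no>, metadata=<yes|no>, choices=<comma list>.
# The body runs in a subshell so its variables stay local.
parse_split_brain_status() (
    value=$1
    case "$value" in
        *"not under data or metadata split-brain"*)
            printf 'data=no\nmetadata=no\nchoices=\n'
            exit 0
            ;;
    esac
    data=no; metadata=no; choices=
    # The fields are separated by spaces, so plain word splitting is enough.
    for field in $value; do
        case "$field" in
            data-split-brain:*)     data=${field#data-split-brain:} ;;
            metadata-split-brain:*) metadata=${field#metadata-split-brain:} ;;
            Choices:*)              choices=${field#Choices:} ;;
        esac
    done
    printf 'data=%s\nmetadata=%s\nchoices=%s\n' "$data" "$metadata" "$choices"
)

# Value copied from the file1 example above.
parse_split_brain_status "data-split-brain:yes metadata-split-brain:no Choices:test-client-2,test-client-3"
```

On a FUSE mount, the value could be supplied with getfattr --only-values -n replica.split-brain-status <path-to-file>, assuming the getfattr on your system supports that option.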
11.11.2.2. Recovering File Split-brain from the gluster CLI
- Use bigger-file as source
- Use the file with latest mtime as source
- Use one replica as source for a particular file
- Use one replica as source for all files
Note
entry/gfid split-brain resolution is not supported using the CLI. For information on resolving gfid/entry split-brain, see Chapter 25, Manually Recovering File Split-brain.
This method is useful for per-file healing when you can decide that the file with the bigger size is to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
  # gluster volume heal VOLNAME info split-brain
  From the command output, identify the files that are in split-brain.
  You can find the differences in file size and md5 checksum by running stat and computing md5 checksums on the file from the bricks.
- Execute the following command with the full file name as seen from the root of the volume, or the gfid-string representation of the file, which is displayed in the heal info command's output:
  # gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
  For example:
  # gluster volume heal test-volume split-brain bigger-file /dir/file1
  Healed /dir/file1.
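Before running the heal, it can be useful to confirm which copy the bigger-file policy would select. The following is a rough sketch that compares the sizes of two copies of a file, for example the copies read directly from two brick directories. The function name bigger_copy is hypothetical, and treating equal sizes as undecidable is a choice made for this sketch, not a statement about how Gluster behaves.

```shell
#!/bin/sh
# Sketch: report which of two copies of a file is larger.
# This only illustrates the "bigger file as source" idea; it does not heal anything.

# bigger_copy PATH_A PATH_B
# Prints the path of the larger file. Exits with status 1 if the sizes are equal
# and with status 2 if either path is not a regular file.
bigger_copy() (
    [ -f "$1" ] && [ -f "$2" ] || exit 2
    # $(( ... )) normalizes any padding that wc may add around the number.
    size_a=$(( $(wc -c < "$1") ))
    size_b=$(( $(wc -c < "$2") ))
    if [ "$size_a" -gt "$size_b" ]; then
        printf '%s\n' "$1"
    elif [ "$size_b" -gt "$size_a" ]; then
        printf '%s\n' "$2"
    else
        exit 1
    fi
)

# Demonstration with two temporary files standing in for the brick copies.
tmp=$(mktemp -d)
printf 'xyz\n' > "$tmp/copy_b0"
printf 'xyz and more\n' > "$tmp/copy_b1"
bigger_copy "$tmp/copy_b0" "$tmp/copy_b1"   # prints the path ending in copy_b1
rm -rf "$tmp"
```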
This method is useful for per-file healing when you want the file with the latest mtime to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
  # gluster volume heal VOLNAME info split-brain
  From the command output, identify the files that are in split-brain.
  You can find the differences in md5 checksum and modify time by running stat and computing md5 checksums on the file from the bricks.
- Execute the following command:
  # gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
  In this command, FILE can be either the full file name as seen from the root of the volume or the gfid-string representation of the file.
  For example:
  # gluster volume heal test-volume split-brain latest-mtime /file4
  Healed /file4
  After the healing is complete, the md5 checksum, file size, and modify time on both bricks must be the same. In this example, the file is healed using the brick having the latest mtime (brick b1) as the source.
This method is useful if you know which file is to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
  # gluster volume heal VOLNAME info split-brain
  From the command output, identify the files that are in split-brain.
  You can find the differences in file size and md5 checksum by running stat and computing md5 checksums on the file from the bricks.
- Execute the following command:
  # gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
  In this command, the FILE present in <HOSTNAME:BRICKNAME> is taken as the source for healing.
  For example:
  # gluster volume heal test-volume split-brain source-brick test-host:b1 /file4
  Healed /file4
  After the healing is complete, the md5 checksum and file size on both bricks must be the same.
This method is useful if you want to use a particular brick as the source for all the split-brain files in that replica pair.
- Run the following command to obtain the list of files that are in split-brain:
  # gluster volume heal VOLNAME info split-brain
  From the command output, identify the files that are in split-brain.
- Execute the following command:
  # gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
  In this command, <HOSTNAME:BRICKNAME> is taken as the source for healing all the files that are in split-brain in this replica.
  For example:
  # gluster volume heal test-volume split-brain source-brick test-host:b1
11.11.3. Triggering Self-Healing on Replicated Volumes
The self-heal daemon can handle multiple heals in parallel and is supported on Replicate and Distribute-replicate volumes. However, increasing the number of parallel heals affects I/O performance, so the following options are provided. The cluster.shd-max-threads volume option controls the number of entries that the self-heal daemon can heal in parallel on each replica. The cluster.shd-wait-qlength volume option configures the number of entries that are kept in the queue for self-heal daemon threads to take up as soon as any of the threads are free to heal.
For more information on the cluster.shd-max-threads and cluster.shd-wait-qlength volume set options, see Section 11.1, “Configuring Volume Options”.
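As a rough illustration, the two options can be set with the same gluster volume set syntax used elsewhere in this chapter. The volume name and both values below are placeholders chosen for the sketch, not recommendations; check the allowed ranges and defaults in Section 11.1 before changing them.

```shell
#!/bin/sh
# Sketch: adjust self-heal daemon parallelism on one volume.
# VOLNAME and both values are placeholders, not recommendations.
VOLNAME=${VOLNAME:-test-volume}

if ! command -v gluster >/dev/null 2>&1; then
    echo "gluster CLI not found; nothing changed" >&2
    exit 0
fi

# Number of entries healed in parallel on each replica.
gluster volume set "$VOLNAME" cluster.shd-max-threads 4
# Number of entries kept queued for free self-heal threads.
gluster volume set "$VOLNAME" cluster.shd-wait-qlength 2048
# Read back one of the values to confirm the change.
gluster volume get "$VOLNAME" cluster.shd-max-threads
```

Raising the thread count trades client I/O performance for faster healing, so it is safer to increase it in small steps while watching client latency.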
- To view the list of files that need healing:
  # gluster volume heal VOLNAME info
  For example, to view the list of files on test-volume that need healing:
  # gluster volume heal test-volume info
- To trigger self-healing only on the files that require healing:
  # gluster volume heal VOLNAME
  For example, to trigger self-healing on the files that require healing on test-volume:
  # gluster volume heal test-volume
  Heal operation on volume test-volume has been successful
- To trigger self-healing on all the files on a volume:
  # gluster volume heal VOLNAME full
  For example, to trigger self-healing on all the files on test-volume:
  # gluster volume heal test-volume full
  Heal operation on volume test-volume has been successful
- To view the list of files on a volume that are in a split-brain state:
  # gluster volume heal VOLNAME info split-brain
  For example, to view the list of files on test-volume that are in a split-brain state:
  # gluster volume heal test-volume info split-brain