11.15. Managing Split-brain
- Data split-brain: The contents of the file under split-brain are different in different replica pairs, and automatic healing is not possible. Red Hat allows you to resolve data split-brain from the mount point and from the CLI. For information on how to recover from data split-brain from the mount point, see Section 11.15.2.1, “Recovering File Split-brain from the Mount Point”. For information on how to recover from data split-brain using the CLI, see Section 11.15.2.2, “Recovering File Split-brain from the gluster CLI”.
- Metadata split-brain: The metadata of the file, such as user-defined extended attributes, is different on the replicas, and automatic healing is not possible. Like data split-brain, metadata split-brain can be resolved from both the mount point and the CLI. For information on how to recover from metadata split-brain from the mount point, see Section 11.15.2.1, “Recovering File Split-brain from the Mount Point”. For information on how to recover from metadata split-brain using the CLI, see Section 11.15.2.2, “Recovering File Split-brain from the gluster CLI”.
- Entry split-brain: Entry split-brain can be of two types:
- GlusterFS Internal File Identifier or GFID split-brain: This happens when files or directories in different replica pairs have different GFIDs.
- Type Mismatch split-brain: This happens when the files or directories stored in replica pairs have the same name but are of different types.
Red Hat Gluster Storage 3.4 allows you to resolve GFID split-brain from the gluster CLI. For more information, see Section 11.15.3, “Recovering GFID Split-brain from the gluster CLI”. You can also resolve split-brain manually by inspecting the file contents on the backend, deciding which copy is the true copy (the source), and modifying the appropriate extended attributes so that healing can happen automatically.
11.15.1. Preventing Split-brain
11.15.1.1. Configuring Server-Side Quorum
Server-side quorum is configured by setting the cluster.server-quorum-type volume option to server. For more information on this volume option, see Section 11.1, “Configuring Volume Options”.
Server-side quorum is enforced by the glusterd service. Whenever the glusterd service on a machine observes that the quorum is not met, it brings down the bricks to prevent data split-brain. When the network connections are brought back up and the quorum is restored, the bricks in the volume are brought back up. While the quorum is not met for a volume, commands that update the volume configuration, or that add or detach peers, are not allowed. Note that the glusterd service not running on a machine and the network connection between two machines being down are treated the same way.
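Set the server-side quorum ratio for the trusted storage pool:
# gluster volume set all cluster.server-quorum-ratio PERCENTAGE
For example, to set the quorum ratio to 51%:
# gluster volume set all cluster.server-quorum-ratio 51%
Enable server-side quorum on a particular volume so that it participates in the quorum:
# gluster volume set VOLNAME cluster.server-quorum-type server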
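The effect of the quorum ratio can be illustrated with a short Python sketch. It is not glusterd code: it assumes that quorum is met when the number of active nodes is at least the ratio multiplied by the number of nodes in the pool, rounded up, and the function name server_quorum_met is hypothetical. Under that assumption, a 51% ratio on a four-node pool requires three active nodes.

```python
import math


def server_quorum_met(active_nodes: int, total_nodes: int, ratio_percent: float) -> bool:
    """Return True if enough nodes in the trusted storage pool are active.

    Assumption (not taken from the glusterd source): quorum is met when
    active_nodes >= ceil(total_nodes * ratio_percent / 100).
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= active_nodes <= total_nodes:
        raise ValueError("active_nodes must be between 0 and total_nodes")
    needed = math.ceil(total_nodes * ratio_percent / 100)
    return active_nodes >= needed


# cluster.server-quorum-ratio 51% on a four-node pool: ceil(2.04) = 3 nodes needed.
print(server_quorum_met(3, 4, 51))  # True
print(server_quorum_met(2, 4, 51))  # False: the bricks would be brought down
```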
11.15.1.2. Configuring Client-Side Quorum
Client-Side Quorum Options
- cluster.quorum-count
  The minimum number of bricks that must be available in order for writes to be allowed. This is set on a per-volume basis. Valid values are between 1 and the number of bricks in a replica set. This option is used by the cluster.quorum-type option to determine write behavior.
- cluster.quorum-type
  Determines when the client is allowed to write to a volume. Valid values are fixed and auto.
  If cluster.quorum-type is fixed, writes are allowed as long as the number of bricks available in the replica set is greater than or equal to the value of the cluster.quorum-count option.
  If cluster.quorum-type is auto, writes are allowed when at least 50% of the bricks in a replica set are available. In a replica set with an even number of bricks, if exactly 50% of the bricks are available, the first brick in the replica set must be available in order for writes to continue.
  In a three-way replication setup, it is recommended to set cluster.quorum-type to auto to avoid split-brains. If the quorum is not met, the replica pair becomes read-only.
Example 11.7. Client-Side Quorum
When the client-side quorum is not met for replica group A, only replica group A becomes read-only. Replica groups B and C continue to allow data modifications.
Client-side quorum is configured using the cluster.quorum-type and cluster.quorum-count options.
Important
Client-side quorum is enabled when you run the gluster volume set VOLNAME group virt command. In a two-way replica setup, if the first brick in the replica pair is offline, virtual machines are paused because quorum is not met and writes are disallowed.
To reset the quorum-type option to its default value, run:
# gluster volume reset VOLNAME quorum-type
The following example shows how to set server-side and client-side quorum on a Distributed-Replicate volume to avoid a split-brain scenario. The volume in this example is a 2 x 2 (4 bricks) Distributed-Replicate setup.
# gluster volume info testvol
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 0df52d58-bded-4e5d-ac37-4c82f7c89cfh
Status: Created
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/brick1
Brick2: server2:/rhgs/brick2
Brick3: server3:/rhgs/brick3
Brick4: server4:/rhgs/brick4
Enable server-side quorum on the volume:
# gluster volume set VOLNAME cluster.server-quorum-type server
Set the server-side quorum ratio to 51%:
# gluster volume set all cluster.server-quorum-ratio 51%
To configure client-side quorum, set the quorum-type option to auto, so that writes to a file are allowed only if the percentage of active replicate bricks is more than 50% of the total number of bricks that constitute that replica:
# gluster volume set VOLNAME quorum-type auto
Important
If the number of bricks (n) in a replica set is even, at least n/2 bricks must be up and running, and they must include the primary (first) brick. If n is odd, the bricks that make up the required count can be any bricks in the replica set; that is, the primary brick does not need to be up and running to allow writes.
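To know which brick is the primary brick of each replica set, the brick list from gluster volume info can be split into groups. The following Python sketch assumes that bricks form replica sets in the order in which they are listed, so that with replica 2, Brick1 and Brick2 form the first set and Brick3 and Brick4 the second; replica_sets is a hypothetical helper.

```python
from typing import List


def replica_sets(bricks: List[str], replica_count: int) -> List[List[str]]:
    """Split the ordered brick list of a volume into replica sets.

    Assumption: consecutive bricks in the `gluster volume info` listing form
    one replica set; the first brick of each set is its primary brick.
    """
    if replica_count <= 0 or len(bricks) % replica_count:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]


# Bricks of the 2 x 2 testvol example.
bricks = ["server1:/rhgs/brick1", "server2:/rhgs/brick2",
          "server3:/rhgs/brick3", "server4:/rhgs/brick4"]
for group in replica_sets(bricks, 2):
    print("replica set:", group, "primary brick:", group[0])
```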
11.15.2. Recovering from File Split-brain
- See Section 11.15.2.1, “Recovering File Split-brain from the Mount Point” for information on how to recover from data and metadata split-brain from the mount point.
- See Section 11.15.2.2, “Recovering File Split-brain from the gluster CLI” for information on how to recover from data and metadata split-brain using the CLI.
11.15.2.1. Recovering File Split-brain from the Mount Point
Steps to recover from a split-brain from the mount point
- You can use a set of getfattr and setfattr commands to detect the data and metadata split-brain status of a file and to resolve split-brain from the mount point.
Important
This process for split-brain resolution from the mount point does not work on NFS mounts, because NFS does not provide extended attribute support.
In this example, the test-volume volume has bricks brick0, brick1, brick2, and brick3.
# gluster volume info test-volume
Volume Name: test-volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: test-host:/rhgs/brick0
Brick2: test-host:/rhgs/brick1
Brick3: test-host:/rhgs/brick2
Brick4: test-host:/rhgs/brick3
The directory structure of the bricks is as follows:
# tree -R /rhgs/brick?
/rhgs/brick0
├── dir
│   └── a
└── file100
/rhgs/brick1
├── dir
│   └── a
└── file100
/rhgs/brick2
├── dir
├── file1
├── file2
└── file99
/rhgs/brick3
├── dir
├── file1
├── file2
└── file99
In the following output, some of the files in the volume are in split-brain.
# gluster volume heal test-volume info split-brain
Brick test-host:/rhgs/brick0/
/file100
/dir
Number of entries in split-brain: 2
Brick test-host:/rhgs/brick1/
/file100
/dir
Number of entries in split-brain: 2
Brick test-host:/rhgs/brick2/
/file99
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2
Brick test-host:/rhgs/brick3/
<gfid:05c4b283-af58-48ed-999e-4d706c7b97d5>
<gfid:5399a8d1-aee9-4653-bb7f-606df02b3696>
Number of entries in split-brain: 2
To check the data or metadata split-brain status of a file, run the following command from the mount point:
# getfattr -n replica.split-brain-status <path-to-file>
This command reports whether a file is in data or metadata split-brain. It is not applicable to entry/type-mismatch split-brain.
For example, file100 is in metadata split-brain. Running the command on file100 gives:
# getfattr -n replica.split-brain-status file100
# file: file100
replica.split-brain-status="data-split-brain:no metadata-split-brain:yes Choices:test-client-0,test-client-1"
file1 is in data split-brain.
# getfattr -n replica.split-brain-status file1
# file: file1
replica.split-brain-status="data-split-brain:yes metadata-split-brain:no Choices:test-client-2,test-client-3"
file99 is in both data and metadata split-brain.
# getfattr -n replica.split-brain-status file99
# file: file99
replica.split-brain-status="data-split-brain:yes metadata-split-brain:yes Choices:test-client-2,test-client-3"
dir is in entry/type-mismatch split-brain, but as mentioned earlier, the command does not report entry/type-mismatch split-brain. Hence, the command displays The file is not under data or metadata split-brain. For information on resolving entry/type-mismatch split-brain, see Chapter 25, Manually Recovering File Split-brain.
# getfattr -n replica.split-brain-status dir
# file: dir
replica.split-brain-status="The file is not under data or metadata split-brain"
file2 is not in any kind of split-brain.
# getfattr -n replica.split-brain-status file2
# file: file2
replica.split-brain-status="The file is not under data or metadata split-brain"
- Analyze the files in data and metadata split-brain and resolve the issue.
When you perform operations such as cat or getfattr from the mount point on files in split-brain, they fail with an input/output error. To analyze such files further, use the setfattr command:
# setfattr -n replica.split-brain-choice -v "choiceX" <path-to-file>
Using this command, you can choose a particular brick through which to access the file in split-brain.
For example, file1 is in data split-brain, and reading from the file fails with an input/output error:
# cat file1
cat: file1: Input/output error
The split-brain choices provided for file1 were test-client-2 and test-client-3. Setting test-client-2 as the split-brain choice for file1 serves reads for the file from brick2:
# setfattr -n replica.split-brain-choice -v test-client-2 file1
Now you can perform operations on the file. For example, read the file:
# cat file1
xyz
Similarly, to inspect the file from the other choice, set replica.split-brain-choice to test-client-3. Trying to inspect the file from a wrong choice results in an error.
To undo the split-brain choice that has been set, run the same setfattr command with none as the value of the extended attribute. For example:
# setfattr -n replica.split-brain-choice -v none file1
Now running cat on the file again results in an input/output error, as before:
# cat file1
cat: file1: Input/output error
After you decide which brick to use as the source for resolving the split-brain, set it as the heal choice so that healing can be done:
# setfattr -n replica.split-brain-heal-finalize -v <heal-choice> <path-to-file>
For example:
# setfattr -n replica.split-brain-heal-finalize -v test-client-2 file1
The above process can be used to resolve data and/or metadata split-brain on all the files.
After setting the split-brain-choice on a file, the file can be analyzed for only five minutes. To increase the time available for analyzing the file, run the following command and set the required time in the timeout-in-minutes argument:
# setfattr -n replica.split-brain-choice-timeout -v <timeout-in-minutes> <mount_point/file>
This is a global timeout and applies to all files as long as the mount exists. The timeout does not need to be set each time a file is inspected, but it must be set again on each new mount. This option becomes invalid if operations such as add-brick or remove-brick are performed.
Note
If the fopen-keep-cache FUSE mount option is disabled, the inode must be invalidated each time before selecting a new replica.split-brain-choice to inspect a file. Use the following command:
# setfattr -n inode-invalidate -v 0 <path-to-file>
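The mount-point workflow above can be sketched in Python. This is an illustration under stated assumptions, not a supported tool: it assumes a Linux FUSE mount, where os.setxattr can set the same virtual attributes that setfattr sets, and the names parse_split_brain_status, read_with_choice, and finalize_heal are hypothetical. The attribute setter is a parameter so that the logic can be tried without a Gluster mount.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Same call shape as os.setxattr(path, name, value).
SetXattr = Callable[[str, str, bytes], None]


@dataclass
class SplitBrainStatus:
    data: bool = False
    metadata: bool = False
    choices: List[str] = field(default_factory=list)


def parse_split_brain_status(value: str) -> SplitBrainStatus:
    """Parse the text value of the replica.split-brain-status attribute."""
    status = SplitBrainStatus()
    for token in value.split():
        key, sep, val = token.partition(":")
        if not sep:
            continue  # words of "The file is not under data or metadata split-brain"
        if key == "data-split-brain":
            status.data = val == "yes"
        elif key == "metadata-split-brain":
            status.metadata = val == "yes"
        elif key == "Choices":
            status.choices = [c for c in val.split(",") if c]
    return status


def read_with_choice(path: str, choice: str, setxattr: Optional[SetXattr] = None,
                     invalidate: bool = False) -> bytes:
    """Read a split-brain file through one replica, then undo the choice.

    Set invalidate=True when the mount has fopen-keep-cache disabled.
    """
    setxattr = setxattr or os.setxattr  # os.setxattr is available on Linux only
    if invalidate:
        setxattr(path, "inode-invalidate", b"0")
    setxattr(path, "replica.split-brain-choice", choice.encode())
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        # Equivalent to: setfattr -n replica.split-brain-choice -v none <path>
        setxattr(path, "replica.split-brain-choice", b"none")


def finalize_heal(path: str, choice: str, setxattr: Optional[SetXattr] = None) -> None:
    """Heal the file using <choice> as the source (replica.split-brain-heal-finalize)."""
    setxattr = setxattr or os.setxattr
    setxattr(path, "replica.split-brain-heal-finalize", choice.encode())
```

On a real mount, the status string could be obtained with os.getxattr(path, "replica.split-brain-status").decode(); that call is an assumption about the FUSE client and is not exercised here. Resetting the choice in a finally clause keeps the file in the same state as before the inspection, even if the read fails.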
11.15.2.2. Recovering File Split-brain from the gluster CLI
- Use bigger-file as source
- Use the file with latest mtime as source
- Use one replica as source for a particular file
- Use one replica as source for all files
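The selection rule behind each of these policies can be compared with a small Python sketch. It illustrates the rules only and is not how GlusterFS implements them; the Copy type and pick_source function are hypothetical, and refusing to choose when copies tie is an assumption of this sketch.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Sequence


@dataclass
class Copy:
    """One brick's copy of a file, as reported by stat on that brick."""
    brick: str
    size: int
    mtime_ns: int


def pick_source(copies: Sequence[Copy], policy: str, source_brick: str = "") -> Copy:
    """Choose the copy to heal from under the named policy (illustrative only)."""
    if policy == "source-brick":
        for copy in copies:
            if copy.brick == source_brick:
                return copy
        raise ValueError(f"no copy of the file on {source_brick}")
    keys = {"bigger-file": attrgetter("size"), "latest-mtime": attrgetter("mtime_ns")}
    if policy not in keys:
        raise ValueError(f"unknown policy: {policy}")
    key = keys[policy]
    best = max(copies, key=key)
    # Assumption: a tie leaves no unambiguous source, so refuse to choose.
    if sum(1 for copy in copies if key(copy) == key(best)) > 1:
        raise ValueError(f"{policy}: copies tie, no unique source")
    return best


# Sizes and modify times from the stat output in the examples in this section
# (mtime given as nanoseconds since midnight, for brevity).
file1 = [Copy("b1", 17, 50137206880347), Copy("b2", 13, 49942910758923)]
file4 = [Copy("b1", 4, 49999426085114), Copy("b2", 4, 49955769833142)]
print(pick_source(file1, "bigger-file").brick)   # b1
print(pick_source(file4, "latest-mtime").brick)  # b1
```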
Note
The entry/type-mismatch split-brain resolution is not supported using the CLI. For information on resolving entry/type-mismatch split-brain, see Chapter 25, Manually Recovering File Split-brain.
This method is useful for per-file healing when you have decided that the file with the bigger size is to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by running stat and md5sum on the file from the bricks. The following is the stat and md5sum output of a file:
On brick b1:
# stat b1/dir/file1
File: ‘b1/dir/file1’
Size: 17  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919362  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:55:40.149897333 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 13:55:37.206880347 +0530
Birth: -
# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1
On brick b2:
# stat b2/dir/file1
File: ‘b2/dir/file1’
Size: 13  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919365  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:54:22.974451898 +0530
Modify: 2015-03-06 13:52:22.910758923 +0530
Change: 2015-03-06 13:52:22.910758923 +0530
Birth: -
# md5sum b2/dir/file1
cb11635a45d45668a403145059c2a0d5  b2/dir/file1
You can notice the differences in the file size and md5 checksums.
- Execute the following command with the full file name as seen from the root of the volume, or with the gfid-string representation of the file, which is displayed in the heal info command's output:
# gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
For example:
# gluster volume heal test-volume split-brain bigger-file /dir/file1
Healed /dir/file1.
After the healing is complete, the md5 checksum and file size on both bricks must be the same. The following is a sample output of the stat and md5sum commands after the file is healed:
On brick b1:
# stat b1/dir/file1
File: ‘b1/dir/file1’
Size: 17  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919362  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:17:27.752429505 +0530
Modify: 2015-03-06 13:55:37.206880347 +0530
Change: 2015-03-06 14:17:12.880343950 +0530
Birth: -
# md5sum b1/dir/file1
040751929ceabf77c3c0b3b662f341a8  b1/dir/file1
On brick b2:
# stat b2/dir/file1
File: ‘b2/dir/file1’
Size: 17  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919365  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:17:23.249403600 +0530
Modify: 2015-03-06 13:55:37.206880000 +0530
Change: 2015-03-06 14:17:12.881343955 +0530
Birth: -
# md5sum b2/dir/file1
040751929ceabf77c3c0b3b662f341a8  b2/dir/file1
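When many files are listed, the heal info output can be turned into a per-brick list before choosing a policy for each file. The following Python parser is a sketch that assumes the usual line-based layout of gluster volume heal VOLNAME info split-brain (one entry per line under each Brick line); parse_heal_info is a hypothetical name, and it also accepts the "/ - Is in split-brain" and "Status:" lines shown in the GFID examples later in this section.

```python
from typing import Dict, List, Optional

SUFFIX = " - Is in split-brain"


def parse_heal_info(output: str) -> Dict[str, List[str]]:
    """Map each brick in `heal ... info split-brain` output to its entries.

    Entries are kept as printed: absolute paths or <gfid:...> strings.
    """
    entries: Dict[str, List[str]] = {}
    brick: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith(("Status:", "Number of entries")):
            continue  # blank lines and summary lines carry no entries
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
            entries[brick] = []
        elif brick is not None:
            if line.endswith(SUFFIX):
                line = line[:-len(SUFFIX)]
            entries[brick].append(line)
    return entries
```

The output text itself could be captured with subprocess.run([...], capture_output=True, text=True).stdout; entries shown only as <gfid:...> can be passed to the bigger-file and latest-mtime policies, as described above.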
This method is useful for per-file healing when you want the file with the latest mtime to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by running stat and md5sum on the file from the bricks. The following is the stat and md5sum output of a file:
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919356  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919358  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
Birth: -
# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1  b2/file4
You can notice the differences in the md5 checksums and the modify time.
- Execute the following command:
# gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
In this command, FILE can be either the full file name as seen from the root of the volume or the gfid-string representation of the file.
For example:
# gluster volume heal test-volume split-brain latest-mtime /file4
Healed /file4
After the healing is complete, the md5 checksum, file size, and modify time on both bricks must be the same. The following is a sample output of the stat and md5sum commands after the file is healed. You can notice that the file has been healed using the brick with the latest mtime (brick b1 in this example) as the source.
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919356  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919358  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
Birth: -
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183  b2/file4
This method is useful if you know which file is to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
Brick <hostname:brickpath-b1>
<gfid:aaca219f-0e25-4576-8689-3bfd93ca70c2>
<gfid:39f301ae-4038-48c2-a889-7dac143e82dd>
<gfid:c3c94de2-232d-4083-b534-5da17fc476ac>
Number of entries in split-brain: 3
Brick <hostname:brickpath-b2>
/dir/file1
/dir
/file4
Number of entries in split-brain: 3
From the command output, identify the files that are in split-brain.
You can find the differences in the file size and md5 checksums by running stat and md5sum on the file from the bricks. The following is the stat and md5sum output of a file:
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919356  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:53:19.417085062 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 13:53:19.426085114 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919358  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 13:52:35.761833096 +0530
Modify: 2015-03-06 13:52:35.769833142 +0530
Change: 2015-03-06 13:52:35.769833142 +0530
Birth: -
# md5sum b2/file4
0bee89b07a248e27c83fc3d5951213c1  b2/file4
You can notice the differences in the md5 checksums and the modify time.
- Execute the following command:
# gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME> <FILE>
In this command, the FILE present in <HOSTNAME:BRICKNAME> is taken as the source for healing.
For example:
# gluster volume heal test-volume split-brain source-brick test-host:b1 /file4
Healed /file4
After the healing is complete, the md5 checksum and file size on both bricks must be the same. The following is a sample output of the stat and md5sum commands after the file is healed:
On brick b1:
# stat b1/file4
File: ‘b1/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919356  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:23:38.944609863 +0530
Modify: 2015-03-06 13:53:19.426085114 +0530
Change: 2015-03-06 14:27:15.058927962 +0530
Birth: -
# md5sum b1/file4
b6273b589df2dfdbd8fe35b1011e3183  b1/file4
On brick b2:
# stat b2/file4
File: ‘b2/file4’
Size: 4  Blocks: 16  IO Block: 4096  regular file
Device: fd03h/64771d  Inode: 919358  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2015-03-06 14:23:38.944609000 +0530
Modify: 2015-03-06 13:53:19.426085000 +0530
Change: 2015-03-06 14:27:15.059927968 +0530
Birth: -
# md5sum b2/file4
b6273b589df2dfdbd8fe35b1011e3183  b2/file4
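The post-heal check (same size and same md5 checksum on every brick) can be automated with a short Python sketch. It assumes that the brick copies are readable from the host running the script, as on test-host in these examples where all bricks are local; file_fingerprint and copies_match are hypothetical names.

```python
import hashlib
import os
from typing import Dict, Iterable, Tuple


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> Tuple[int, str]:
    """Return (size in bytes, md5 hex digest), like stat's Size and md5sum."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def copies_match(paths: Iterable[str]) -> bool:
    """True if every brick copy has the same size and md5 checksum."""
    prints: Dict[str, Tuple[int, str]] = {p: file_fingerprint(p) for p in paths}
    return len(prints) > 0 and len(set(prints.values())) == 1


# Example, run on the brick host: copies_match(["b1/file4", "b2/file4"])
```

Reading in chunks keeps memory use flat for large files; the checksum is used only to compare copies, not for security.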
This method is useful if you want to use a particular brick as the source for all the split-brain files in that replica pair.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
From the command output, identify the files that are in split-brain.
- Execute the following command:
# gluster volume heal <VOLNAME> split-brain source-brick <HOSTNAME:BRICKNAME>
In this command, <HOSTNAME:BRICKNAME> is taken as the source for healing all the files that are in split-brain in this replica.
For example:
# gluster volume heal test-volume split-brain source-brick test-host:b1
11.15.3. Recovering GFID Split-brain from the gluster CLI
- Use bigger-file as source
- Use the file with latest mtime as source
- Use one replica as source for a particular file
This method is useful for per-file healing when you have decided that the file with the bigger size is to be considered as the source.
- Run the following command to obtain the path of the file that is in split-brain:
# gluster volume heal VOLNAME info split-brain
From the output, identify the files for which file operations performed from the client failed with an input/output error.
For example:
# gluster volume heal 12 info split-brain
Brick 10.70.47.45:/bricks/brick2/b0
/f5
/ - Is in split-brain
Status: Connected
Number of entries: 2
Brick 10.70.47.144:/bricks/brick2/b1
/f5
/ - Is in split-brain
Status: Connected
Number of entries: 2
In the above command, 12 is the volume name, and b0 and b1 are the bricks.
- Run the following command on the brick to check whether a file is in GFID split-brain. The getfattr command is used to obtain and verify the AFR changelog extended attributes of the files.
# getfattr -d -e hex -m. <path-to-file>
For example:
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f5
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xce0a9956928e40afb78e95f78defd64f
trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6635
On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f5
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-0=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x9563544118653550e888ab38c232e0c
trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6635
You can notice the difference in GFID for the file f5 on the two bricks.
You can find the differences in the file size by running the stat command on the file from the bricks. The following is the output for the file f5 on bricks b0 and b1:
On brick /b0
# stat /bricks/brick2/b0/f5
File: ‘/bricks/brick2/b0/f5’
Size: 15  Blocks: 8  IO Block: 4096  regular file
Device: fd15h/64789d  Inode: 67113350  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2018-08-29 20:46:26.353751073 +0530
Modify: 2018-08-29 20:46:26.361751203 +0530
Change: 2018-08-29 20:47:16.363751236 +0530
Birth: -
On brick /b1
# stat /bricks/brick2/b1/f5
File: ‘/bricks/brick2/b1/f5’
Size: 2  Blocks: 8  IO Block: 4096  regular file
Device: fd15h/64789d  Inode: 67111750  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2018-08-29 20:44:56.153301616 +0530
Modify: 2018-08-29 20:44:56.161301745 +0530
Change: 2018-08-29 20:44:56.162301761 +0530
Birth: -
- Execute the following command with the full file name as seen from the root of the volume, which is displayed in the heal info command's output:
# gluster volume heal VOLNAME split-brain bigger-file FILE
For example:
# gluster volume heal 12 split-brain bigger-file /f5
GFID split-brain resolved for file /f5
After the healing is complete, the file size and GFID on both bricks must be the same as those of the file which had the bigger size. The following is a sample output of the getfattr command after the file is healed:
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f5
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xce0a9956928e40afb78e95f78defd64f
trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6635
On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f5
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xce0a9956928e40afb78e95f78defd64f
trusted.gfid2path.9cde09916eabc845=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6635
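Comparing the trusted.gfid values by eye is error-prone when many files are involved. The following Python sketch parses the output of getfattr -d -m . -e hex for one file on each brick and reports whether the GFIDs differ. It assumes the line-based output format shown above; parse_getfattr and has_gfid_split_brain are hypothetical names.

```python
from typing import Dict


def parse_getfattr(output: str) -> Dict[str, str]:
    """Turn `getfattr -d -m . -e hex <file>` output into {name: hex value}."""
    attrs: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Skip "# file: ..." headers and the "getfattr: Removing leading '/'" notice.
        if not line or line.startswith(("#", "getfattr:")):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


def has_gfid_split_brain(outputs_by_brick: Dict[str, str]) -> bool:
    """True if the copies on the bricks carry different trusted.gfid values."""
    gfids = {parse_getfattr(out).get("trusted.gfid") for out in outputs_by_brick.values()}
    return len(gfids) > 1
```

Before healing f5, the two bricks give different values, so has_gfid_split_brain returns True; after healing, both report 0xce0a9956928e40afb78e95f78defd64f and it returns False.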
This method is useful for per-file healing when you want the file with the latest mtime to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
From the output, identify the files for which file operations performed from the client failed with an input/output error.
For example:
# gluster volume heal 12 info split-brain
Brick 10.70.47.45:/bricks/brick2/b0
/f4
/ - Is in split-brain
Status: Connected
Number of entries: 2
Brick 10.70.47.144:/bricks/brick2/b1
/f4
/ - Is in split-brain
Status: Connected
Number of entries: 2
In the above command, 12 is the volume name, and b0 and b1 are the bricks.
- Run the following command on the backend to check whether a file is in GFID split-brain:
# getfattr -d -e hex -m. <path-to-file>
For example:
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f4
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xb66b66d07b315f3c9cffac2fb6422a28
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f4
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-0=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x87242f808c6e56a007ef7d49d197acff
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
You can notice the difference in GFID for the file f4 on the two bricks.
You can find the difference in the modify time by running the stat command on the file from the bricks. The following is the output for the file f4 on bricks b0 and b1:
On brick /b0
# stat /bricks/brick2/b0/f4
File: ‘/bricks/brick2/b0/f4’
Size: 14  Blocks: 8  IO Block: 4096  regular file
Device: fd15h/64789d  Inode: 67113349  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2018-08-29 20:57:38.913629991 +0530
Modify: 2018-08-29 20:57:38.921630122 +0530
Change: 2018-08-29 20:57:38.923630154 +0530
Birth: -
On brick /b1
# stat /bricks/brick2/b1/f4
File: ‘/bricks/brick2/b1/f4’
Size: 2  Blocks: 8  IO Block: 4096  regular file
Device: fd15h/64789d  Inode: 67111749  Links: 2
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2018-08-24 20:54:50.953217256 +0530
Modify: 2018-08-24 20:54:50.961217385 +0530
Change: 2018-08-24 20:54:50.962217402 +0530
Birth: -
- Execute the following command:
# gluster volume heal VOLNAME split-brain latest-mtime FILE
For example:
# gluster volume heal 12 split-brain latest-mtime /f4
GFID split-brain resolved for file /f4
After the healing is complete, the GFID of the file on both bricks must be the same. The following is a sample output of the getfattr command after the file is healed. You can notice that the file has been healed using the brick with the latest mtime as the source.
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f4
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xb66b66d07b315f3c9cffac2fb6422a28
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f4
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xb66b66d07b315f3c9cffac2fb6422a28
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
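The trusted.afr.VOLNAME-client-N values in the getfattr output above are AFR changelog attributes. They are commonly described as twelve bytes holding three big-endian 32-bit counters of pending data, metadata, and entry operations that one brick records against another. Assuming that layout, the following Python sketch decodes a value; decode_afr_changelog is a hypothetical helper, and it does not by itself detect GFID split-brain, which is shown by differing trusted.gfid values.

```python
from typing import Dict


def decode_afr_changelog(value: str) -> Dict[str, int]:
    """Split a trusted.afr.<VOLNAME>-client-<N> value into its counters.

    Assumes three big-endian 32-bit counters: pending data, metadata,
    and entry operations, in that order.
    """
    hex_digits = value[2:] if value.startswith("0x") else value
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 12:
        raise ValueError(f"expected 12 bytes, got {len(raw)}")
    data, metadata, entry = (int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))
    return {"data": data, "metadata": metadata, "entry": entry}


# trusted.afr.12-client-1 on brick b0 in the f4 example above
print(decode_afr_changelog("0x000000020000000100000000"))
# {'data': 2, 'metadata': 1, 'entry': 0}
```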
This method is useful if you know which file is to be considered as the source.
- Run the following command to obtain the list of files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
From the output, identify the files for which file operations performed from the client failed with an input/output error.
For example:
# gluster volume heal 12 info split-brain
Brick 10.70.47.45:/bricks/brick2/b0
/f3
/ - Is in split-brain
Status: Connected
Number of entries: 2
Brick 10.70.47.144:/bricks/brick2/b1
/f3
/ - Is in split-brain
Status: Connected
Number of entries: 2
In the above command, 12 is the volume name, and b0 and b1 are the bricks.
Note
With the one-replica-as-source option, there is no way to resolve all the GFID split-brain files in one shot by omitting the file path in the CLI, as is done for data and metadata split-brain resolution. For each file in GFID split-brain, you have to run the heal command separately.
- Run the following command on the backend to check whether a file is in GFID split-brain:
# getfattr -d -e hex -m. <path-to-file>
For example:
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f3
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f3
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0x9d542fb1b3b15837a2f7f9dcdf5d6ee8
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f3
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f3
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.12-client-1=0x000000020000000100000000
trusted.afr.dirty=0x000000000000000000000000
trusted.gfid=0xc90d9b0f65f6530b95b9f3f8334033df
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
Notice that the GFID of file f3 differs between the two bricks.
- Run the following command:
# gluster volume heal VOLNAME split-brain source-brick HOSTNAME:export-directory-absolute-path FILE
In this command, FILE on HOSTNAME:export-directory-absolute-path is taken as the source for healing. For example:
# gluster volume heal 12 split-brain source-brick 10.70.47.144:/bricks/brick2/b1 /f3
GFID split-brain resolved for file /f3
After the healing is complete, the GFID of the file on both bricks should be the same as that of the file on the source brick (b1 in this example). The following is sample output of the getfattr command after the file is healed.
On brick /b0
# getfattr -d -m . -e hex /bricks/brick2/b0/f3
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b0/f3
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xc90d9b0f65f6530b95b9f3f8334033df
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634

On brick /b1
# getfattr -d -m . -e hex /bricks/brick2/b1/f3
getfattr: Removing leading '/' from absolute path names
# file: bricks/brick2/b1/f3
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.gfid=0xc90d9b0f65f6530b95b9f3f8334033df
trusted.gfid2path.364f55367c7bd6f4=0x30303030303030302d303030302d303030302d303030302d3030303030303030303030312f6634
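The GFID comparison done by eye in the procedure above can also be scripted. The following is a minimal sketch, assuming it runs as root (trusted.* attributes are only readable by root) on a node where both backend paths are local; in the example above the bricks sit on different hosts, so in practice you would read the value on each host. The function names extract_gfid and gfid_differs are hypothetical helpers, not part of the gluster CLI.

```shell
# Hypothetical helpers, not part of the gluster CLI.

# Print the hex trusted.gfid value from getfattr output read on stdin.
# The pattern requires "=" right after "gfid", so trusted.gfid2path.* lines are skipped.
extract_gfid() {
    sed -n 's/^trusted\.gfid=//p'
}

# Succeed (return 0) only if both backend copies have a readable GFID
# and the two GFIDs differ, which indicates GFID split-brain.
# Example: gfid_differs /bricks/brick2/b0/f3 /bricks/brick2/b1/f3
gfid_differs() {
    local a b
    a=$(getfattr -n trusted.gfid -e hex "$1" 2>/dev/null | extract_gfid)
    b=$(getfattr -n trusted.gfid -e hex "$2" 2>/dev/null | extract_gfid)
    [ -n "$a" ] && [ -n "$b" ] && [ "$a" != "$b" ]
}
```

If either copy is missing its GFID (or the path does not exist), the function reports no mismatch rather than guessing, so check such files manually.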
Note
You cannot use the GFID of the file as an argument with any of the CLI options to resolve GFID split-brain. The argument must be the absolute path of the file as seen from the mount point, and that file is considered the source.
With the source-brick option, there is no way to resolve all GFID split-brain entries in one shot by omitting the file path from the CLI, as you can when resolving data or metadata split-brain. For each file in GFID split-brain, run the CLI with the policy you want to use.
In a distributed-replicated volume, resolving directory GFID split-brain from the CLI with the source-brick option must be done explicitly on every subvolume. Because directories are created on all subvolumes, first pick one brick as the source for the directory GFID split-brain and heal the directory on that brick's subvolume. Then heal each remaining subvolume using, as its source, a brick that has the same GFID as the brick used as the source for the first subvolume.
For information on resolving entry/type-mismatch split-brain, see Chapter 25, Manually Recovering File Split-brain.
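Because the source-brick option has to be run once per file, a small loop can save typing when many files share the same source brick. This is a minimal sketch, assuming you have already verified that one brick (for example 10.70.47.144:/bricks/brick2/b1) holds the correct copy of every listed file, and that each path is given as seen from the mount point. The function name resolve_gfid_sb is hypothetical; it only wraps the gluster volume heal ... split-brain source-brick command shown in the procedure.

```shell
# Hypothetical helper: one "split-brain source-brick" call per file,
# all using the same source brick.
# Usage: resolve_gfid_sb VOLNAME HOSTNAME:BRICKPATH FILE [FILE...]
resolve_gfid_sb() {
    local vol="$1" src="$2" failed=0 f
    shift 2
    for f in "$@"; do
        # FILE must be a path as seen from the mount point (e.g. /f3), never a GFID.
        if ! gluster volume heal "$vol" split-brain source-brick "$src" "$f"; then
            echo "heal failed for $f" >&2
            failed=1
        fi
    done
    # Non-zero if any single heal failed; the loop still attempts every file.
    return "$failed"
}
```

For example, resolve_gfid_sb 12 10.70.47.144:/bricks/brick2/b1 /f3 runs the same command as the example above. For directories in a distributed-replicated volume, repeat the call for each subvolume with a matching source brick, as described in the note.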
11.15.4. Triggering Self-Healing on Replicated Volumes
The self-heal daemon can handle multiple heals in parallel and is supported on Replicate and Distribute-replicate volumes. However, increasing the number of parallel heals affects I/O performance, so the following options are provided. The cluster.shd-max-threads volume option controls the number of entries that the self-heal daemon can heal in parallel on each replica. The cluster.shd-wait-qlength volume option configures the number of entries kept in the queue so that self-heal daemon threads can take them up as soon as any thread is free to heal.
For more information on the cluster.shd-max-threads and cluster.shd-wait-qlength volume set options, see Section 11.1, “Configuring Volume Options”.
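Both options are set with the usual gluster volume set command. The sketch below is illustrative only: the helper name tune_shd and the values in the usage line are assumptions, not recommendations, so test the impact on client I/O before raising them in production.

```shell
# Hypothetical helper: set self-heal daemon parallelism for one volume,
# then print the values gluster reports back.
# Usage: tune_shd VOLNAME MAX_THREADS WAIT_QLENGTH
tune_shd() {
    local vol="$1" threads="$2" qlen="$3"
    gluster volume set "$vol" cluster.shd-max-threads "$threads" || return 1
    gluster volume set "$vol" cluster.shd-wait-qlength "$qlen" || return 1
    # Read back the effective settings.
    gluster volume get "$vol" cluster.shd-max-threads
    gluster volume get "$vol" cluster.shd-wait-qlength
}
```

For example, tune_shd test-volume 4 2048 raises the parallel heal threads to 4 and the queue length to 2048 on test-volume.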
- To view the list of files that need healing:
# gluster volume heal VOLNAME info
For example, to view the list of files on test-volume that need healing:
# gluster volume heal test-volume info
Brick server1:/gfs/test-volume_0
Number of entries: 0

Brick server2:/gfs/test-volume_1
/95.txt
/32.txt
/66.txt
/35.txt
/18.txt
/26.txt - Possibly undergoing heal
/47.txt
/55.txt
/85.txt - Possibly undergoing heal
...
Number of entries: 101
- To trigger self-healing only on the files which require healing:
# gluster volume heal VOLNAME
For example, to trigger self-healing on the files that require healing on test-volume:
# gluster volume heal test-volume
Heal operation on volume test-volume has been successful
- To trigger self-healing on all the files on a volume:
# gluster volume heal VOLNAME full
For example, to trigger self-healing on all the files on test-volume:
# gluster volume heal test-volume full
Heal operation on volume test-volume has been successful
- To view the list of files on a volume that are in a split-brain state:
# gluster volume heal VOLNAME info split-brain
For example, to view the list of files on test-volume that are in a split-brain state:
# gluster volume heal test-volume info split-brain
Brick server1:/gfs/test-volume_2
Number of entries: 12
at                   path on brick
----------------------------------
2012-06-13 04:02:05  /dir/file.83
2012-06-13 04:02:05  /dir/file.28
2012-06-13 04:02:05  /dir/file.69

Brick server2:/gfs/test-volume_2
Number of entries: 12
at                   path on brick
----------------------------------
2012-06-13 04:02:05  /dir/file.83
2012-06-13 04:02:05  /dir/file.28
2012-06-13 04:02:05  /dir/file.69
...
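When a volume has many pending entries, it can help to summarize the output of the info commands above. The following is a minimal sketch with two hypothetical filters: count_heal_entries adds up the "Number of entries" lines, and list_split_brain_paths prints each split-brain path once. They assume the two output layouts shown in this section (bare paths with an optional " - ..." suffix, and the older "date time path" layout) and paths without spaces; the exact output format can differ between releases, so treat the parsing as fragile.

```shell
# Hypothetical filters for "gluster volume heal VOLNAME info [split-brain]" output.

# Sum the per-brick "Number of entries" counts. A file pending heal on
# two bricks is counted once per brick, so this is not a count of unique files.
count_heal_entries() {
    awk -F': ' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Print each listed path once, sorted.
list_split_brain_paths() {
    awk '
        /^Brick / || /^Status:/ || /^Number of entries:/ { next }
        /^[0-9][0-9][0-9][0-9]-/ { print $NF; next }   # older layout: "date time path"
        /^\// { sub(/ - .*$/, ""); print }             # newer layout: "/path [- note]"
    ' | LC_ALL=C sort -u
}
```

For example, gluster volume heal test-volume info split-brain | list_split_brain_paths gives one path per line, which can then be checked individually before choosing a resolution policy.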