Chapter 25. Manually Recovering File Split-brain
- Run the following command to obtain the paths of the files that are in split-brain:
# gluster volume heal VOLNAME info split-brain
From the command output, identify the files for which file operations performed from the client keep failing with an Input/Output error.
- Close any applications that have the split-brain file open from the mount point. If you are using a virtual machine, you must power off the machine.
- Obtain and verify the AFR changelog extended attributes of the file using the getfattr command. Then identify the type of split-brain to determine which of the bricks contains the 'good copy' of the file.
# getfattr -d -m . -e hex <file-path-on-brick>
AFR uses the extended attributes named trusted.afr.VOLNAME-client-<subvolume-index> to maintain the changelog of the file. Their values are calculated by the glusterFS client (FUSE or NFS server) processes. When the glusterFS client modifies a file or directory, the client contacts each brick and updates the changelog extended attribute according to the response of the brick.
The subvolume-index is the brick number minus 1 in the gluster volume info VOLNAME output. For example, the first brick listed has subvolume index 0.
Each file in a brick maintains the changelog of itself and of the files present in all the other bricks in its replica set, as seen by that brick.
For example, in a volume named vol where brick1 and brick2 form a replica pair, every file in brick1 has 2 entries: one for itself and one for the file present in its replica pair. The following is the changelog for brick1:
- trusted.afr.vol-client-0=0x000000000000000000000000: changelog for itself (brick1)
- trusted.afr.vol-client-1=0x000000000000000000000000: changelog for brick2 as seen by brick1
Likewise, every file in brick2 has the following:
- trusted.afr.vol-client-0=0x000000000000000000000000: changelog for brick1 as seen by brick2
- trusted.afr.vol-client-1=0x000000000000000000000000: changelog for itself (brick2)
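The mapping from a brick's position to its changelog attribute name can be sketched as a small POSIX shell function. This is a minimal sketch assuming that bricks are numbered in the order gluster volume info lists them and that each consecutive group of replica-count bricks forms one replica set; the function name afr_brick_info is hypothetical and not part of GlusterFS.

```shell
# Hypothetical helper (not a GlusterFS command): map a brick number to its
# AFR changelog attribute name and replica set.
#   $1 = volume name, $2 = brick number (1-based, as listed by volume info),
#   $3 = replica count
afr_brick_info() {
    if [ "$2" -lt 1 ] || [ "$3" -lt 1 ]; then
        echo "afr_brick_info: brick number and replica count must be >= 1" >&2
        return 1
    fi
    afr_index=$(($2 - 1))   # subvolume index = brick number - 1
    # Consecutive groups of $3 bricks share one replica set (assumption above).
    printf 'attr=trusted.afr.%s-client-%d replica_set=%d\n' \
        "$1" "$afr_index" "$((afr_index / $3))"
}
```

For example, afr_brick_info vol 2 2 prints attr=trusted.afr.vol-client-1 replica_set=0, which matches the brick2 entry in the replica pair above.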
Note
These files do not have entries for themselves, only for the other bricks in the replica. For example, brick1 will only have trusted.afr.vol-client-1 set and brick2 will only have trusted.afr.vol-client-0 set. The changelog is interpreted in the same way as explained below.
The same scheme extends to the other replica pairs in the volume.
Interpreting the changelog (approximate pending operation count) value
Each extended attribute has a value of 24 hexadecimal digits. The first 8 digits represent the changelog of data, the second 8 digits represent the changelog of metadata, and the last 8 digits represent the changelog of directory entries.
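The 8-digit field layout can be decoded with a short POSIX shell sketch. The helper names afr_field and afr_decode are hypothetical; they only slice a value as printed by getfattr -e hex and convert each group to a decimal pending-operation count.

```shell
# Hypothetical helpers for reading an AFR changelog value (0x + 24 hex digits).

# afr_field VALUE N: print the Nth 8-digit group (1=data, 2=metadata, 3=entry).
afr_field() {
    # Characters 1-2 are "0x"; group N starts at character 3 + 8*(N-1).
    afr_start=$((3 + 8 * ($2 - 1)))
    printf '%s\n' "$1" | cut -c "${afr_start}-$((afr_start + 7))"
}

# afr_decode VALUE: print the three counters in decimal.
afr_decode() {
    if ! printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{24}$'; then
        echo "afr_decode: expected 0x followed by 24 hex digits: $1" >&2
        return 1
    fi
    # printf %d accepts 0x-prefixed hexadecimal arguments.
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x$(afr_field "$1" 1)" "0x$(afr_field "$1" 2)" "0x$(afr_field "$1" 3)"
}
```

For example, afr_decode 0x000003d70000000100000000 prints data=983 metadata=1 entry=0.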
The following diagram labels each group in an example value:
0x 000003d7 00000001 00000000
       |        |        |
       |        |        \_ changelog of directory entries
       |        \_ changelog of metadata
       \_ changelog of data
For directories, metadata and entry changelogs are valid. For regular files, data and metadata changelogs are valid. For special files such as device files, only the metadata changelog is valid. When a file split-brain happens, it can be a data split-brain, a metadata split-brain, or both.
As an example of a file that is in both data and metadata split-brain, consider file a on /rhgs/brick1 and /rhgs/brick2 of the volume vol.
Scrutinize the changelogs
The changelog extended attributes on file /rhgs/brick1/a are as follows:
- The first 8 digits of trusted.afr.vol-client-0 are all zeros (0x00000000................), and the first 8 digits of trusted.afr.vol-client-1 are not all zeros (0x000003d7................). So the changelog on /rhgs/brick1/a implies that some data operations succeeded on itself but failed on /rhgs/brick2/a.
- The second 8 digits of trusted.afr.vol-client-0 are all zeros (0x........00000000........), and the second 8 digits of trusted.afr.vol-client-1 are not all zeros (0x........00000001........). So the changelog on /rhgs/brick1/a implies that some metadata operations succeeded on itself but failed on /rhgs/brick2/a.
The changelog extended attributes on file /rhgs/brick2/a are as follows:
- The first 8 digits of trusted.afr.vol-client-0 are not all zeros (0x000003b0................), and the first 8 digits of trusted.afr.vol-client-1 are all zeros (0x00000000................). So the changelog on /rhgs/brick2/a implies that some data operations succeeded on itself but failed on /rhgs/brick1/a.
- The second 8 digits of trusted.afr.vol-client-0 are not all zeros (0x........00000001........), and the second 8 digits of trusted.afr.vol-client-1 are all zeros (0x........00000000........). So the changelog on /rhgs/brick2/a implies that some metadata operations succeeded on itself but failed on /rhgs/brick1/a.
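The group-by-group comparison above can be sketched as a POSIX shell function for a two-brick replica set. This is an illustration under stated assumptions: the function names are hypothetical, both arguments must be full 24-digit values, only the data and metadata groups (the ones valid for regular files) are checked, and a group counts as blaming the other brick whenever it is not all zeros.

```shell
# Hypothetical helpers (not GlusterFS commands).
#   $1 = changelog on brick A for brick B (e.g. trusted.afr.vol-client-1 on brick1)
#   $2 = changelog on brick B for brick A (e.g. trusted.afr.vol-client-0 on brick2)

# afr_group VALUE N: print the Nth 8-hex-digit group of a 0x-prefixed value.
afr_group() {
    afr_g_start=$((3 + 8 * ($2 - 1)))
    printf '%s\n' "$1" | cut -c "${afr_g_start}-$((afr_g_start + 7))"
}

# afr_split_brain_type A_FOR_B B_FOR_A: print the groups in which each brick
# blames the other (a split-brain in that group), or "none".
afr_split_brain_type() {
    afr_result=""
    for afr_pair in 1:data 2:metadata; do
        afr_n=${afr_pair%%:*}
        afr_name=${afr_pair#*:}
        if [ "$(afr_group "$1" "$afr_n")" != 00000000 ] &&
           [ "$(afr_group "$2" "$afr_n")" != 00000000 ]; then
            afr_result="${afr_result:+$afr_result }$afr_name"
        fi
    done
    echo "${afr_result:-none}"
}
```

With the data and metadata groups from this example, afr_split_brain_type 0x000003d70000000100000000 0x000003b00000000100000000 prints data metadata. If only one brick blames the other in a group, that brick holds the good copy for that group and there is no split-brain in it.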
Here, each copy has data and metadata changes that are not present on the other copy. Hence, the file is in both data and metadata split-brain.
Deciding on the correct copy
You must inspect the stat and getfattr output of the files to decide which metadata to retain, and the contents of the files to decide which data to retain. To continue with the example above, here we retain the data of /rhgs/brick1/a and the metadata of /rhgs/brick2/a.
Resetting the relevant changelogs to resolve the split-brain
Resolving data split-brain
You must change the changelog extended attributes on the files as if some data operations succeeded on /rhgs/brick1/a but failed on /rhgs/brick2/a. However, /rhgs/brick2/a must not have any changelog showing that data operations succeeded on /rhgs/brick2/a but failed on /rhgs/brick1/a. You must reset the data part of the changelog in trusted.afr.vol-client-0 of /rhgs/brick2/a.
Resolving metadata split-brain
You must change the changelog extended attributes on the files as if some metadata operations succeeded on /rhgs/brick2/a but failed on /rhgs/brick1/a. However, /rhgs/brick1/a must not have any changelog showing that metadata operations succeeded on /rhgs/brick1/a but failed on /rhgs/brick2/a. You must reset the metadata part of the changelog in trusted.afr.vol-client-1 of /rhgs/brick1/a.
Run the following commands to reset the extended attributes.
- On /rhgs/brick2/a, to change trusted.afr.vol-client-0 from 0x000003b00000000100000000 to 0x000000000000000100000000, execute the following command:
# setfattr -n trusted.afr.vol-client-0 -v 0x000000000000000100000000 /rhgs/brick2/a
- On /rhgs/brick1/a, to change trusted.afr.vol-client-1 from 0x000003d70000000100000000 to 0x000003d70000000000000000, execute the following command:
# setfattr -n trusted.afr.vol-client-1 -v 0x000003d70000000000000000 /rhgs/brick1/a
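The reset values are obtained by zeroing one 8-digit group of the existing value. The following POSIX shell sketch (hypothetical helper names, not GlusterFS commands) computes the new value and prints, rather than runs, the corresponding setfattr command, so that you can review it before running it on the brick.

```shell
# afr_zero_group VALUE N: print VALUE with group N (1=data, 2=metadata,
# 3=entry) replaced by 00000000.
afr_zero_group() {
    if ! printf '%s\n' "$1" | grep -Eq '^0x[0-9a-fA-F]{24}$'; then
        echo "afr_zero_group: expected 0x followed by 24 hex digits: $1" >&2
        return 1
    fi
    afr_hex=${1#0x}
    afr_d=$(printf '%s\n' "$afr_hex" | cut -c 1-8)
    afr_m=$(printf '%s\n' "$afr_hex" | cut -c 9-16)
    afr_e=$(printf '%s\n' "$afr_hex" | cut -c 17-24)
    case $2 in
        1) afr_d=00000000 ;;
        2) afr_m=00000000 ;;
        3) afr_e=00000000 ;;
        *) echo "afr_zero_group: group must be 1, 2, or 3" >&2; return 1 ;;
    esac
    printf '0x%s%s%s\n' "$afr_d" "$afr_m" "$afr_e"
}

# afr_reset_cmd ATTR VALUE N FILE: print (do not run) the setfattr command
# that zeroes group N of ATTR on FILE.
afr_reset_cmd() {
    afr_new=$(afr_zero_group "$2" "$3") || return 1
    printf 'setfattr -n %s -v %s %s\n' "$1" "$afr_new" "$4"
}
```

For example, afr_reset_cmd trusted.afr.vol-client-0 0x000003b00000000100000000 1 /rhgs/brick2/a prints the same setfattr command given for /rhgs/brick2/a above.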
After you reset the extended attributes, trusted.afr.vol-client-1 on /rhgs/brick1/a is 0x000003d70000000000000000 and trusted.afr.vol-client-0 on /rhgs/brick2/a is 0x000000000000000100000000. The two copies no longer blame each other for the same type of operation, so self-heal can use /rhgs/brick1/a as the source for data and /rhgs/brick2/a as the source for metadata.
Resolving directory entry split-brain
AFR can conservatively merge different entries in a directory when there is a split-brain on the directory. If the directory storage has entries 1 and 2 on one brick and entries 3 and 4 on the other brick, AFR merges them so that the directory contains entries 1, 2, 3, and 4. However, if the split-brain was caused by deleting files in the directory, deleted files may reappear. Split-brain resolution needs human intervention when at least one entry in the directory has the same file name but a different gfid on the bricks.
For example, on brick-a the directory has 2 entries: file1 with gfid_x, and file2. On brick-b the directory has 2 entries: file1 with gfid_y, and file3. Here, the gfids of file1 on the two bricks are different. This kind of directory split-brain needs human intervention to resolve. You must remove either file1 on brick-a or file1 on brick-b to resolve the split-brain.
In addition, the corresponding gfid-link file must be removed. The gfid-link files are present in the .glusterfs directory in the top-level directory of the brick. If the gfid of the file is 0x307a5c9efddd4e7c96e94fd4bcdcbd1b (the value of the trusted.gfid extended attribute in the getfattr output), the gfid-link file can be found at /rhgs/brick1/.glusterfs/30/7a/307a5c9e-fddd-4e7c-96e9-4fd4bcdcbd1b.
Warning
Before deleting the gfid-link, you must ensure that there are no hard links to the file present on that brick. If hard links exist, you must delete them.
- Trigger self-heal by running one of the following commands:
# ls -l <file-path-on-gluster-mount>
or
# gluster volume heal VOLNAME
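For the directory entry split-brain case, the gfid-link path can be derived from the trusted.gfid value. The following POSIX shell sketch assumes the .glusterfs layout GlusterFS uses: two directory levels named after the first and second pairs of hex digits, with the link file named after the gfid written as a hyphenated UUID (8-4-4-4-12). The function name gfid_link_path is hypothetical.

```shell
# gfid_link_path BRICK_ROOT GFID_HEX: print the gfid-link path for a gfid
# reported by "getfattr -e hex" (for example 0x307a5c9e...).
gfid_link_path() {
    gfid_brick=${1%/}   # drop one trailing slash from the brick root
    gfid_hex=$(printf '%s\n' "${2#0x}" | tr 'A-F' 'a-f')
    if ! printf '%s\n' "$gfid_hex" | grep -Eq '^[0-9a-f]{32}$'; then
        echo "gfid_link_path: expected 32 hex digits: $2" >&2
        return 1
    fi
    # Insert hyphens at the 8-4-4-4-12 UUID boundaries.
    gfid_uuid=$(printf '%s\n' "$gfid_hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    gfid_d1=$(printf '%s\n' "$gfid_hex" | cut -c 1-2)
    gfid_d2=$(printf '%s\n' "$gfid_hex" | cut -c 3-4)
    printf '%s/.glusterfs/%s/%s/%s\n' "$gfid_brick" "$gfid_d1" "$gfid_d2" "$gfid_uuid"
}
```

For example, gfid_link_path /rhgs/brick1 0x307a5c9efddd4e7c96e94fd4bcdcbd1b prints /rhgs/brick1/.glusterfs/30/7a/307a5c9e-fddd-4e7c-96e9-4fd4bcdcbd1b. For the hard-link check in the Warning, GNU find's -samefile test (for example, find /rhgs/brick1 -samefile <gfid-link-path>) lists every name on the brick that shares the file's inode.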