이 콘텐츠는 선택한 언어로 제공되지 않습니다.

3.11. The GFS2 Withdraw Function


The GFS2 withdraw function is a data integrity feature of the GFS2 file system that prevents potential file system damage due to faulty hardware or kernel software. If the GFS2 kernel module detects an inconsistency while using a GFS2 file system on any given cluster node, it withdraws from the file system, leaving it unavailable to that node until it is unmounted and remounted (or the machine detecting the problem is rebooted). All other mounted GFS2 file systems remain fully functional on that node. (The GFS2 withdraw function is less severe than a kernel panic, which causes the node to be fenced.)
The main categories of inconsistency that can cause a GFS2 withdraw are as follows:
  • Inode consistency error
  • Resource group consistency error
  • Journal consistency error
  • Magic number metadata consistency error
  • Metadata type consistency error
An example of an inconsistency that would cause a GFS2 withdraw is an incorrect block count for a file’s inode. When GFS2 deletes a file, it systematically removes all the data and metadata blocks referenced by that file. When done, it checks the inode’s block count. If the block count is not 1 (meaning all that is left is the disk inode itself), that indicates a file system inconsistency, since the inode’s block count did not match the actual blocks used for the file.
In many cases, the problem may have been caused by faulty hardware (faulty memory, motherboard, HBA, disk drives, cables, and so forth). It may also have been caused by a kernel bug (another kernel module accidentally overwriting GFS2’s memory), or actual file system damage (caused by a GFS2 bug).
In most cases, the GFS2 inconsistency is fixed by rebooting the cluster node. Before rebooting the cluster node, disable the GFS2 file system “clone” service from Pacemaker, which unmounts the file system on that node only.
# pcs resource disable --wait=100 mydata_fs_clone
# /sbin/reboot

Warning

Do not try to unmount and remount the file system manually with the umount and mount commands. You must use the pcs command, otherwise Pacemaker will detect the file system service has disappeared and fence the node.
The consistency problem that caused the withdraw may make stopping the file system service impossible as it may cause the system to hang.
If the problem persists after a remount, you should stop the file system service to unmount the file system from all nodes in the cluster, then perform a file system check with the fsck.gfs2 command before restarting the service with the following procedure.
  1. Reboot the affected node.
  2. Disable the non-clone file system service in Pacemaker to unmount the file system from every node in the cluster.
    # pcs resource disable --wait=100 mydata_fs
  3. From one node of the cluster, run the fsck.gfs2 command on the file system device to check for and repair any file system damage.
    # fsck.gfs2 -y /dev/vg_mydata/mydata > /tmp/fsck.out
  4. Remount the GFS2 file system from all nodes by re-enabling the file system service:
    # pcs resource enable --wait=100 mydata_fs
You can override the GFS2 withdraw function by mounting the file system with the -o errors=panic option specified in the file system service.
# pcs resource update mydata_fs “options=noatime,errors=panic”
When this option is specified, any errors that would normally cause the system to withdraw force a kernel panic instead. This stops the node's communications, which causes the node to be fenced. This is especially useful for clusters that are left unattended for long periods of time without monitoring or intervention.
Internally, the GFS2 withdraw function works by disconnecting the locking protocol to ensure that all further file system operations result in I/O errors. As a result, when the withdraw occurs, it is normal to see a number of I/O errors from the device mapper device reported in the system logs.
Red Hat logoGithubRedditYoutubeTwitter

자세한 정보

평가판, 구매 및 판매

커뮤니티

Red Hat 문서 정보

Red Hat을 사용하는 고객은 신뢰할 수 있는 콘텐츠가 포함된 제품과 서비스를 통해 혁신하고 목표를 달성할 수 있습니다.

보다 포괄적 수용을 위한 오픈 소스 용어 교체

Red Hat은 코드, 문서, 웹 속성에서 문제가 있는 언어를 교체하기 위해 최선을 다하고 있습니다. 자세한 내용은 다음을 참조하세요.Red Hat 블로그.

Red Hat 소개

Red Hat은 기업이 핵심 데이터 센터에서 네트워크 에지에 이르기까지 플랫폼과 환경 전반에서 더 쉽게 작업할 수 있도록 강화된 솔루션을 제공합니다.

© 2024 Red Hat, Inc.