8.6. Replacing Hosts


8.6.1. Replacing a Host Machine with a Different Hostname

You can replace a failed host machine with another host that has a different hostname.

Important

Ensure that the new peer has the same disk capacity as the peer it is replacing. For example, if the failed peer had two 100GB drives, the new peer must have the same number of drives with the same capacity.
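A quick way to compare the drive count and sizes on an existing peer and on the replacement host (assuming the standard util-linux lsblk tool is available on both hosts) is, for example:
# lsblk -d -o NAME,SIZE
Run the command on both machines and confirm that the output matches.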
In the following example, the original machine that suffered an irrecoverable failure is sys0.example.com and the replacement machine is sys5.example.com. The brick with the irrecoverable failure is sys0.example.com:/rhs/brick1/b1 and the replacement brick is sys5.example.com:/rhs/brick1/b1.
  1. Probe the new peer from one of the existing peers to bring it into the cluster.
    # gluster peer probe sys5.example.com
  2. Ensure that the new brick (sys5.example.com:/rhs/brick1/b1) that replaces the old brick (sys0.example.com:/rhs/brick1/b1) is empty.
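    For example, if the brick directory already exists on sys5.example.com (it is otherwise created in step 6), the following command should produce no output:
    # ls -A /rhs/brick1/b1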
  3. Create a FUSE mount point from any server to edit the extended attributes.
    # mount -t glusterfs server-ip:/VOLNAME mount-point
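    For example, with the volume named vol used in this procedure and any reachable server in the pool (sys1.example.com is assumed here), the mount referenced in the next step could be created as follows:
    # mount -t glusterfs sys1.example.com:/vol /mnt/r2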
  4. Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (sys1.example.com:/rhs/brick1/b1) in the replica pair to the new brick (sys5.example.com:/rhs/brick1/b1). Note that /mnt/r2 is the FUSE mount path.
    1. Create a new directory on the mount point and ensure that a directory with such a name is not already present.
      # mkdir /mnt/r2/<name-of-nonexistent-dir>
    2. Delete the directory and set and delete the extended attributes.
      # rmdir /mnt/r2/<name-of-nonexistent-dir>
      # setfattr -n trusted.non-existent-key -v abc /mnt/r2
      # setfattr -x trusted.non-existent-key /mnt/r2
    3. Ensure that the extended attributes on the other bricks in the replica (in this example, trusted.afr.vol-client-0) are not set to zero.
      # getfattr -d -m. -e hex /rhs/brick1/b1
      # file: rhs/brick1/b1
      security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 
      trusted.afr.vol-client-0=0x000000000000000300000002  
      trusted.afr.vol-client-1=0x000000000000000000000000 
      trusted.gfid=0x00000000000000000000000000000001 
      trusted.glusterfs.dht=0x0000000100000000000000007ffffffe 
      trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
  5. Retrieve the brick paths in sys0.example.com using the following command:
    # gluster volume info <VOLNAME>
    Volume Name: vol
    Type: Replicate
    Volume ID: 0xde822e25ebd049ea83bfaa3c4be2b440
    Status: Started
    Snap Volume: no
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: sys0.example.com:/rhs/brick1/b1
    Brick2: sys1.example.com:/rhs/brick1/b1
    Options Reconfigured:
    performance.readdir-ahead: on
    snap-max-hard-limit: 256
    snap-max-soft-limit: 90
    auto-delete: disable
    
    The brick path on sys0.example.com is /rhs/brick1/b1. This must be replaced with the brick on the newly added host, sys5.example.com.
  6. Create the required brick path in sys5.example.com. For example, if /rhs/brick1 is the XFS mount point in sys5.example.com, then create the brick directory in that path.
    # mkdir /rhs/brick1/b1
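    Optionally, you can confirm that the brick directory resides on the intended XFS file system before continuing, for example with the standard df utility:
    # df -hT /rhs/brick1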
  7. Execute the replace-brick command with the force option:
    # gluster volume replace-brick vol sys0.example.com:/rhs/brick1/b1 sys5.example.com:/rhs/brick1/b1 commit force
    volume replace-brick: success: replace-brick commit successful
    
  8. Verify that the new brick is online.
    # gluster volume status
    Status of volume: vol 
    Gluster process                                  Port    Online Pid 
    Brick sys5.example.com:/rhs/brick1/b1            49156    Y    5731 
    Brick sys1.example.com:/rhs/brick1/b1            49153    Y    5354
    
  9. Initiate self-heal on the volume by executing the following command:
    # gluster volume heal VOLNAME full
  10. The status of the heal process can be viewed by executing the following command:
    # gluster volume heal VOLNAME info
  11. Detach the original machine from the trusted pool.
    # gluster peer detach sys0.example.com
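    To confirm the detach, list the peers again; sys0.example.com should no longer appear in the output:
    # gluster peer status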
  12. Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.
    # getfattr -d -m. -e hex /rhs/brick1/b1
    getfattr: Removing leading '/' from absolute path names
    # file: rhs/brick1/b1
    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 
    trusted.afr.vol-client-0=0x000000000000000000000000
    trusted.afr.vol-client-1=0x000000000000000000000000 
    trusted.gfid=0x00000000000000000000000000000001 
    trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
    trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
    

    Note

    In this example, the extended attributes trusted.afr.vol-client-0 and trusted.afr.vol-client-1 are both set to zero.

8.6.2. Replacing a Host Machine with the Same Hostname

You can replace a failed host with another node having the same FQDN (Fully Qualified Domain Name). A host in a Red Hat Storage Trusted Storage Pool has its own identity, called the UUID, generated by the glusterFS Management Daemon. The UUID of the host is available in the /var/lib/glusterd/glusterd.info file.
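For example, on a running host the UUID can be displayed with the same command that is used in step 4 of the procedure below:
# grep -i uuid /var/lib/glusterd/glusterd.info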

Warning

Do not perform this procedure on Geo-replicated volumes.
In the following example, the host with the FQDN sys0.example.com suffered an irrecoverable failure and must be replaced with a host having the same FQDN. The following steps have to be performed on the new host.
  1. Stop the glusterd service on sys0.example.com.
    # service glusterd stop
  2. Retrieve the UUID of the failed host (sys0.example.com) from another peer in the Red Hat Storage Trusted Storage Pool by executing the following command:
    # gluster peer status
    Number of Peers: 2
    
    Hostname: sys1.example.com
    Uuid: 1d9677dc-6159-405e-9319-ad85ec030880
    State: Peer in Cluster (Connected)
    
    Hostname: sys0.example.com
    Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    State: Peer Rejected (Connected)
    
    Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
  3. Edit the glusterd.info file on the new host and include the UUID of the failed host that you retrieved in the previous step.
    # cat /var/lib/glusterd/glusterd.info
    UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    operating-version=30000
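    If you prefer not to edit the file by hand, the UUID line can be rewritten in place, for example with sed (assuming GNU sed is available):
    # sed -i 's/^UUID=.*/UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b/' /var/lib/glusterd/glusterd.info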
  4. Select any host (for example, sys1.example.com) in the Red Hat Storage Trusted Storage Pool and retrieve its UUID from the glusterd.info file.
    # grep -i uuid /var/lib/glusterd/glusterd.info
    UUID=8cc6377d-0153-4540-b965-a4015494461c
  5. Gather the peer information files from the host (sys1.example.com) selected in the previous step by executing the following command on that host:
    # cp -a /var/lib/glusterd/peers /tmp/
  6. Remove the peer file corresponding to the failed host (sys0.example.com) from the /tmp/peers directory.
    # rm /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    Note that the UUID corresponds to the UUID of the failed host (sys0.example.com) retrieved in Step 2.
  7. Archive all the files so that they can be copied to the failed host (sys0.example.com).
    # cd /tmp; tar -cvf peers.tar peers
  8. Copy the archive created above to the new peer.
    # scp /tmp/peers.tar root@sys0.example.com:/tmp
  9. Extract the archive and copy the peer files to the /var/lib/glusterd/peers directory. Execute the following commands on the newly added host with the same hostname (sys0.example.com) and IP address:
    # tar -xvf /tmp/peers.tar
    # cp peers/* /var/lib/glusterd/peers/
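    Note that tar extracts the peers directory relative to the current working directory, so both commands are assumed to be run from the same directory, for example from /tmp, where the archive was copied in the previous step:
    # cd /tmp && tar -xvf peers.tar && cp peers/* /var/lib/glusterd/peers/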
  10. Select any host in the cluster other than the node (sys1.example.com) selected in step 4. Copy the peer file corresponding to the UUID retrieved in step 4 to the new host (sys0.example.com) by executing the following command on that host:
    # scp /var/lib/glusterd/peers/<UUID-retrieved-from-step4> root@sys0.example.com:/var/lib/glusterd/peers/
  11. Retrieve the brick directory information by executing the following command on any host in the cluster:
    # gluster volume info
    Volume Name: vol
    Type: Replicate
    Volume ID: 0x8f16258c88a0498fbd53368706af7496
    Status: Started
    Snap Volume: no
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: sys0.example.com:/rhs/brick1/b1
    Brick2: sys1.example.com:/rhs/brick1/b1
    Options Reconfigured:
    performance.readdir-ahead: on
    snap-max-hard-limit: 256
    snap-max-soft-limit: 90
    auto-delete: disable
    In the above example, the brick path in sys0.example.com is /rhs/brick1/b1. If the brick path does not exist in sys0.example.com, perform steps a, b, and c.
    1. Create a brick path on the host sys0.example.com.
      # mkdir /rhs/brick1/b1
    2. Retrieve the volume ID from the existing brick of another host by executing the following command on any host that contains the bricks for the volume.
      # getfattr -d -m. -e hex <brick-path>
      Copy the volume-id.
      # getfattr -d -m. -e hex /rhs/brick1/b1
      getfattr: Removing leading '/' from absolute path names
      # file: rhs/brick1/b1
      trusted.afr.vol-client-0=0x000000000000000000000000
      trusted.afr.vol-client-1=0x000000000000000000000000
      trusted.gfid=0x00000000000000000000000000000001
      trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
      trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
      In the above example, the volume-id is 0x8f16258c88a0498fbd53368706af7496.
    3. Set this volume ID on the brick created on the newly added host by executing the following command on the newly added host (sys0.example.com):
      # setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brick-path>
      For example:
      # setfattr -n trusted.glusterfs.volume-id -v 0x8f16258c88a0498fbd53368706af7496 /rhs/brick1/b1
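      As an illustrative alternative, the getfattr and setfattr commands from steps b and c can be combined and run entirely on the newly added host; this sketch assumes password-less root SSH access from sys0.example.com to the host holding the healthy brick (sys1.example.com):
      # # assumes root SSH access to sys1.example.com, which holds the healthy brick
      # VOLID=$(ssh root@sys1.example.com "getfattr -n trusted.glusterfs.volume-id -e hex /rhs/brick1/b1" | awk -F= '/volume-id/ {print $2}')
      # setfattr -n trusted.glusterfs.volume-id -v "$VOLID" /rhs/brick1/b1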
    Data recovery is possible only if the volume type is replicate or distribute-replicate. If the volume type is plain distribute, you can skip steps 12 and 13.
  12. Create a FUSE mount point to mount the glusterFS volume.
    # mount -t glusterfs <server-name>:/VOLNAME <mount>
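    For example, using the volume vol from this procedure, a surviving server such as sys1.example.com, and /mnt/r2 as the mount path referenced in the next step:
    # mount -t glusterfs sys1.example.com:/vol /mnt/r2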
  13. Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (sys1.example.com:/rhs/brick1/b1) in the replica pair to the new brick (sys0.example.com:/rhs/brick1/b1). Note that /mnt/r2 is the FUSE mount path.
    1. Create a new directory on the mount point and ensure that a directory with such a name is not already present.
      # mkdir /mnt/r2/<name-of-nonexistent-dir>
    2. Delete the directory and set and delete the extended attributes.
      # rmdir /mnt/r2/<name-of-nonexistent-dir>
      # setfattr -n trusted.non-existent-key -v abc /mnt/r2
      # setfattr -x trusted.non-existent-key /mnt/r2
    3. Ensure that the extended attributes on the other bricks in the replica (in this example, trusted.afr.vol-client-0) are not set to zero.
      # getfattr -d -m. -e hex /rhs/brick1/b1
      # file: rhs/brick1/b1
      security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000 
      trusted.afr.vol-client-0=0x000000000000000300000002  
      trusted.afr.vol-client-1=0x000000000000000000000000 
      trusted.gfid=0x00000000000000000000000000000001 
      trusted.glusterfs.dht=0x0000000100000000000000007ffffffe 
      trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
  14. Start the glusterd service.
    # service glusterd start
  15. Perform the self-heal operation on the restored volume.
    # gluster volume heal VOLNAME full
  16. You can view the gluster volume self-heal status by executing the following command:
    # gluster volume heal VOLNAME info
Replacing a Host with the Same Hostname in a Two-node Red Hat Storage Trusted Storage Pool

If there are only two hosts in the Red Hat Storage Trusted Storage Pool and the host sys0.example.com must be replaced, perform the following steps:

  1. Stop the glusterd service on sys0.example.com.
    # service glusterd stop
  2. Retrieve the UUID of the failed host (sys0.example.com) from another peer in the Red Hat Storage Trusted Storage Pool by executing the following command:
    # gluster peer status
    Number of Peers: 1
    
    Hostname: sys0.example.com
    Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    State: Peer Rejected (Connected)
    
    Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
  3. Edit the glusterd.info file on the new host (sys0.example.com) and include the UUID of the failed host that you retrieved in the previous step.
    # cat /var/lib/glusterd/glusterd.info
    UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    operating-version=30000
  4. On the newly created host (sys0.example.com), create the peer file /var/lib/glusterd/peers/<uuid-of-other-peer>, named after the UUID of the other host (sys1.example.com).
    The UUID of the other host can be obtained by running the following command on that host:
    # gluster system:: uuid get

    Example 8.7. Example to obtain the UUID of a host

    # gluster system:: uuid get
    UUID: 1d9677dc-6159-405e-9319-ad85ec030880
    In this case, the UUID of the other peer is 1d9677dc-6159-405e-9319-ad85ec030880.
  5. Create a file /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880 in sys0.example.com with the following command:
    # touch /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880
    The file you create must contain the following information:
    UUID=<uuid-of-other-node>
    state=3
    hostname=<hostname>
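    For example, with the values used in this procedure (the other peer's UUID and hostname), the file can also be populated in a single command instead of editing it manually:
    # printf 'UUID=1d9677dc-6159-405e-9319-ad85ec030880\nstate=3\nhostname=sys1.example.com\n' > /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880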
  6. Continue to perform steps 11 to 16 as documented in the previous procedure.