11.10. Replacing Hosts


Before replacing a host, ensure that the new peer has exactly the same disk capacity as the peer it is replacing. For example, if the peer in the cluster has two 100GB drives, the new peer must have the same disk capacity and the same number of drives. The steps described in this section can also be performed on other volume types; refer to Section 11.9, “Migrating Volumes” when performing replace and reset operations on the volumes.
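
The capacity requirement above can be checked with a small comparison. The following is a minimal sketch, not part of the product: the function name same_drive_layout is hypothetical, and it assumes you have recorded the drive sizes of the old peer in bytes and can list the sizes on the new peer (for example with lsblk -b -d -n -o SIZE, restricted to the brick drives).

```shell
# Hypothetical helper (not a gluster command): compares two
# whitespace-separated lists of drive sizes in bytes, ignoring order.
# Succeeds only if both lists hold the same number of drives with
# the same sizes.
same_drive_layout() {
  # $1 and $2 are intentionally unquoted so each size becomes one line.
  sdl_old=$(printf '%s\n' $1 | sort -n)
  sdl_new=$(printf '%s\n' $2 | sort -n)
  [ "$sdl_old" = "$sdl_new" ]
}

# Example: two 100GB drives on the old peer versus the new peer.
if same_drive_layout "107374182400 107374182400" "107374182400 107374182400"; then
  echo "drive layout matches"
else
  echo "drive layout differs"
fi
```

The sizes are sorted before comparison so that a different device ordering on the new peer does not cause a false mismatch.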

11.10.1. Replacing a Host Machine with a Different Hostname

You can replace a failed host machine with another host that has a different hostname. In the following example, the original machine, which has had an irrecoverable failure, is server0.example.com and the replacement machine is server5.example.com. The brick with the irrecoverable failure is server0.example.com:/rhgs/brick1 and the replacement brick is server5.example.com:/rhgs/brick1.
  1. Stop the geo-replication session if configured by executing the following command.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force
  2. Probe the new peer from one of the existing peers to bring it into the cluster.
    # gluster peer probe server5.example.com
  3. Ensure that the new brick (server5.example.com:/rhgs/brick1) that is replacing the old brick (server0.example.com:/rhgs/brick1) is empty.
  4. If the geo-replication session is configured, perform the following steps:
    1. Set up the geo-replication session by generating the SSH keys:
      # gluster system:: execute gsec_create
    2. Create the geo-replication session again with the force option to distribute the keys from the new nodes to the Slave nodes.
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force
    3. After the shared storage volume has been set up, shared storage is not mounted automatically on a node that is newly replaced in the cluster, and no /etc/fstab entry for the shared storage is added on that node. To use shared storage on this node, execute the following commands:
      # mount -t glusterfs <local node's ip>:gluster_shared_storage /var/run/gluster/shared_storage
      # cp /etc/fstab /var/run/gluster/fstab.tmp
      # echo "<local node's ip>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab

      Note

      With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
      For more information on setting up shared storage volume, see Section 11.12, “Setting up Shared Storage Volume”.
    4. Configure the meta-volume for geo-replication:
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
      For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”.
  5. Retrieve the brick paths in server0.example.com using the following command:
    # gluster volume info <VOLNAME>
    Volume Name: vol
    Type: Replicate
    Volume ID: 0xde822e25ebd049ea83bfaa3c4be2b440
    Status: Started
    Snap Volume: no
    Number of Bricks: 1 x 2 = 2
    Transport-type: tcp
    Bricks:
    Brick1: server0.example.com:/rhgs/brick1
    Brick2: server1.example.com:/rhgs/brick1
    Options Reconfigured:
    cluster.granular-entry-heal: on
    performance.readdir-ahead: on
    snap-max-hard-limit: 256
    snap-max-soft-limit: 90
    auto-delete: disable
    
    The brick path in server0.example.com is /rhgs/brick1. This brick has to be replaced with the brick in the newly added host, server5.example.com.
  6. Create the required brick path in server5.example.com. For example, if /rhgs is the XFS mount point in server5.example.com, then create a brick directory in that path.
    # mkdir /rhgs/brick1
  7. Execute the replace-brick command with the force option:
    # gluster volume replace-brick vol server0.example.com:/rhgs/brick1 server5.example.com:/rhgs/brick1 commit force
    volume replace-brick: success: replace-brick commit successful
  8. Verify that the new brick is online.
    # gluster volume status
    Status of volume: vol
    Gluster process                                  Port    Online Pid
    Brick server5.example.com:/rhgs/brick1           49156   Y      5731
    Brick server1.example.com:/rhgs/brick1           49153   Y      5354
  9. Initiate self-heal on the volume by executing the following command:
    # gluster volume heal VOLNAME
  10. The status of the heal process can be seen by executing the command:
    # gluster volume heal VOLNAME info
  11. Detach the original machine from the trusted pool.
    # gluster peer detach server0.example.com
    All clients mounted through the peer which is getting detached need to be remounted, using one of the other active peers in the trusted storage pool, this ensures that the client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
    peer detach: success
  12. Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.
    # getfattr -d -m. -e hex /rhgs/brick1
    getfattr: Removing leading '/' from absolute path names
    # file: rhgs/brick1
    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
    trusted.afr.vol-client-0=0x000000000000000000000000
    trusted.afr.vol-client-1=0x000000000000000000000000
    trusted.gfid=0x00000000000000000000000000000001
    trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
    trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440
    In this example, the extended attributes trusted.afr.vol-client-0 and trusted.afr.vol-client-1 have zero values. This means that the data on the two bricks is identical. If these attributes are not zero after self-heal is completed, the data has not been synchronised correctly.
  13. Start the geo-replication session using the force option:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force
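
The extended-attribute check in step 12 can be scripted. The following is a minimal sketch under stated assumptions: the function name afr_all_zero is hypothetical, and it only inspects text in the format shown in the getfattr output above (one name=0xHEX pair per line).

```shell
# Hypothetical helper (not a gluster command): reads the output of
# `getfattr -d -m. -e hex <brick>` on stdin and succeeds only if at
# least one trusted.afr.* attribute is present and every such
# attribute has an all-zero value.
afr_all_zero() {
  awk -F '=' '
    /^trusted\.afr\./ {
      seen = 1
      value = $2
      sub(/^0x/, "", value)            # strip the hex prefix
      if (value !~ /^0+$/) bad = 1     # any non-zero digit means pending heals
    }
    END {
      if (seen && !bad) exit 0
      exit 1
    }
  '
}

# Usage on a brick host, after self-heal has completed:
#   getfattr -d -m. -e hex /rhgs/brick1 2>/dev/null | afr_all_zero \
#     && echo "bricks in sync" || echo "pending changes remain"
```

The function deliberately fails when no trusted.afr.* attribute is found, so that running it against the wrong path is not mistaken for a healthy brick.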

11.10.2. Replacing a Host Machine with the Same Hostname

You can replace a failed host with another node that has the same FQDN (Fully Qualified Domain Name). A host in a Red Hat Gluster Storage Trusted Storage Pool has its own identity, called the UUID, which is generated by the glusterFS Management Daemon. The UUID for the host is available in the /var/lib/glusterd/glusterd.info file.
In the following example, the host with the FQDN server0.example.com was irrecoverable and must be replaced with a host that has the same FQDN. The following steps have to be performed on the new host.
  1. Stop the geo-replication session if configured by executing the following command.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force
  2. Stop the glusterd service on server0.example.com.
    On RHEL 7 and RHEL 8, run
    # systemctl stop glusterd
    On RHEL 6, run
    # service glusterd stop

    Important

    Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See the Version Details table in the section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
  3. Retrieve the UUID of the failed host (server0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
    # gluster peer status
    Number of Peers: 2
    
    Hostname: server1.example.com
    Uuid: 1d9677dc-6159-405e-9319-ad85ec030880
    State: Peer in Cluster (Connected)
    
    Hostname: server0.example.com
    Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    State: Peer Rejected (Connected)
    
    Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b
  4. Edit the glusterd.info file in the new host and include the UUID of the host you retrieved in the previous step.
    # cat /var/lib/glusterd/glusterd.info
    UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    operating-version=30703

    Note

    The operating version of this node must be the same as that of the other nodes in the trusted storage pool.
  5. Select any host (for example, server1.example.com) in the Red Hat Gluster Storage Trusted Storage Pool and retrieve its UUID from the glusterd.info file.
    # grep -i uuid /var/lib/glusterd/glusterd.info
    UUID=1d9677dc-6159-405e-9319-ad85ec030880
  6. Gather the peer information files from the host selected in the previous step (server1.example.com) by executing the following command on that host.
    # cp -a /var/lib/glusterd/peers /tmp/
  7. Remove the peer file corresponding to the failed host (server0.example.com) from the /tmp/peers directory.
    # rm /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    Note that the UUID corresponds to the UUID of the failed host (server0.example.com) retrieved in Step 3.
  8. Archive all the files in the /tmp/peers directory.
    # cd /tmp; tar -cvf peers.tar peers
  9. Copy the archive created above to the new peer (server0.example.com).
    # scp /tmp/peers.tar root@server0.example.com:/tmp
  10. Extract the archive and copy the extracted content to the /var/lib/glusterd/peers directory. Execute the following commands on the newly added host, which has the same name (server0.example.com) and IP address.
    # tar -xvf /tmp/peers.tar
    # cp peers/* /var/lib/glusterd/peers/
  11. On any host in the cluster other than the node (server1.example.com) selected in Step 5, copy the peer file corresponding to the UUID retrieved in Step 5 to the new host (server0.example.com) by executing the following command:
    # scp /var/lib/glusterd/peers/<UUID-retrieved-from-step5> root@server0.example.com:/var/lib/glusterd/peers/
  12. Start the glusterd service.
    # systemctl start glusterd
  13. If the new brick has the same hostname and the same path, refer to Section 11.9.5, “Reconfiguring a Brick in a Volume”. For replicated volumes, if the new brick has a different hostname and a different brick path, refer to Section 11.9.2, “Replacing an Old Brick with a New Brick on a Replicate or Distribute-replicate Volume”.
  14. For dispersed volumes, if the new brick has a different hostname and a different brick path, refer to Section 11.9.4, “Replacing an Old Brick with a New Brick on a Dispersed or Distributed-dispersed Volume”.
  15. Perform the self-heal operation on the restored volume.
    # gluster volume heal VOLNAME
  16. You can view the gluster volume self-heal status by executing the following command:
    # gluster volume heal VOLNAME info
  17. If the geo-replication session is configured, perform the following steps:
    1. Set up the geo-replication session by generating the SSH keys:
      # gluster system:: execute gsec_create
    2. Create the geo-replication session again with the force option to distribute the keys from the new nodes to the Slave nodes.
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force
    3. After the shared storage volume has been set up, shared storage is not mounted automatically on a node that is newly replaced in the cluster, and no /etc/fstab entry for the shared storage is added on that node. To use shared storage on this node, execute the following commands:
      # mount -t glusterfs <local node's ip>:gluster_shared_storage /var/run/gluster/shared_storage
      # cp /etc/fstab /var/run/gluster/fstab.tmp
      # echo "<local node's ip>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab

      Note

      With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
      For more information on setting up shared storage volume, see Section 11.12, “Setting up Shared Storage Volume”.
    4. Configure the meta-volume for geo-replication:
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
    5. Start the geo-replication session using the force option:
      # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force
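
Looking up the UUID of the failed host in step 3 can be scripted. The following is a minimal sketch under stated assumptions: the function name peer_uuid is hypothetical, and it relies only on the Hostname: and Uuid: line layout shown in the gluster peer status output above.

```shell
# Hypothetical helper (not a gluster command): reads `gluster peer status`
# output on stdin and prints the UUID listed for the hostname given as $1.
# Prints nothing if the hostname does not appear in the output.
peer_uuid() {
  awk -v host="$1" '
    $1 == "Hostname:" { current = $2 }                    # remember the peer being described
    $1 == "Uuid:" && current == host { print $2; exit }   # first match wins
  '
}

# Usage on a healthy peer of the trusted storage pool:
#   gluster peer status | peer_uuid server0.example.com
```

The printed value is the UUID to place in /var/lib/glusterd/glusterd.info on the replacement host in step 4.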

Replacing a host with the same Hostname in a two-node Red Hat Gluster Storage Trusted Storage Pool

If there are only two hosts in the Red Hat Gluster Storage Trusted Storage Pool and the host server0.example.com must be replaced, perform the following steps:

  1. Stop the geo-replication session if configured by executing the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force
  2. Stop the glusterd service on server0.example.com.
    On RHEL 7 and RHEL 8, run
    # systemctl stop glusterd
    On RHEL 6, run
    # service glusterd stop

    Important

    Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See the Version Details table in the section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
  3. Retrieve the UUID of the failed host (server0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
    # gluster peer status
    Number of Peers: 1
    
    Hostname: server0.example.com
    Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    State: Peer Rejected (Connected)
    
    Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b
  4. Edit the glusterd.info file in the new host (server0.example.com) and include the UUID of the host you retrieved in the previous step.
    # cat /var/lib/glusterd/glusterd.info
    UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
    operating-version=30703

    Note

    The operating version of this node must be the same as that of the other nodes in the trusted storage pool.
  5. On the new host (server0.example.com), create a peer file in /var/lib/glusterd/peers/ named after the UUID of the other host (server1.example.com), that is, /var/lib/glusterd/peers/<uuid-of-other-peer>.
    The UUID of a host can be obtained by running the following command on that host:
    # gluster system:: uuid get

    Example 11.6. Example to obtain the UUID of a host

    # gluster system:: uuid get
    UUID: 1d9677dc-6159-405e-9319-ad85ec030880
    In this case, the UUID of the other peer is 1d9677dc-6159-405e-9319-ad85ec030880.
  6. Create a file /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880 in server0.example.com, with the following command:
    # touch /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880
    The file you create must contain the following information:
    UUID=<uuid-of-other-node>
    state=3
    hostname=<hostname>
  7. Continue to perform steps 12 to 17 as documented in the previous procedure.
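
Creating the peer file in steps 5 and 6 can be done in one command instead of touch followed by manual editing. The following is a minimal sketch under stated assumptions: the function name write_peer_file is hypothetical, and the three fields are written exactly as listed in step 6.

```shell
# Hypothetical helper (not a gluster command): writes a peer file named
# after the UUID of the other peer, with the three fields listed in step 6.
#   $1: peers directory (/var/lib/glusterd/peers on a real host)
#   $2: UUID of the other peer
#   $3: hostname of the other peer
write_peer_file() {
  printf 'UUID=%s\nstate=3\nhostname=%s\n' "$2" "$3" > "$1/$2"
}

# Usage on the new host (server0.example.com), with glusterd stopped:
#   write_peer_file /var/lib/glusterd/peers \
#     1d9677dc-6159-405e-9319-ad85ec030880 server1.example.com
```

Taking the directory as a parameter makes it possible to try the function against a scratch directory before touching /var/lib/glusterd/peers.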