11.8. Replacing Hosts

11.8.1. Replacing a Host Machine with a Different Hostname

You can replace a failed host machine with another host that has a different hostname.

Important

Ensure that the new peer has exactly the same disk capacity as the peer it is replacing. For example, if the peer in the cluster has two 100GB drives, then the new peer must have the same disk capacity and number of drives.

In the following example, the original machine, which has had an irrecoverable failure, is server0.example.com, and the replacement machine is server5.example.com. The brick with an unrecoverable failure is server0.example.com:/rhgs/brick1 and the replacement brick is server5.example.com:/rhgs/brick1.

Stop the geo-replication session if configured by executing the following command:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force

Probe the new peer from one of the existing peers to bring it into the cluster.
```
# gluster peer probe server5.example.com
```
Ensure that the new brick (server5.example.com:/rhgs/brick1) that is replacing the old brick (server0.example.com:/rhgs/brick1) is empty.
If the geo-replication session is configured, perform the following steps:
1. Set up the geo-replication session by generating the SSH keys:
  # gluster system:: execute gsec_create
2. Create the geo-replication session again with the force option to distribute the keys from the new nodes to the slave nodes.
  # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force
3. After the shared storage volume has been set up, the shared storage is not mounted automatically on a node that joins the cluster as a replacement, and no /etc/fstab entry for the shared storage is added on that node. To use shared storage on this node, execute the following commands:
  # mount -t glusterfs <local node's ip>:gluster_shared_storage /var/run/gluster/shared_storage
  # cp /etc/fstab /var/run/gluster/fstab.tmp
  # echo "<local node's ip>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab
  For more information on setting up shared storage volume, see Section 11.10, “Setting up Shared Storage Volume”.
4. Configure the meta-volume for geo-replication:
  # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
  For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”.
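
The /etc/fstab step in item 3 appends a line unconditionally, so running it twice leaves a duplicate entry. The following is a minimal sketch of one way to make that step safe to repeat; it is not part of the documented procedure. The function name `add_shared_storage_entry` and its optional second argument (an alternative fstab path, useful for trying it against a scratch file) are assumptions introduced for illustration.

```shell
# Append the shared-storage entry to an fstab file unless it is already there.
# $1: IP address or hostname of the local node
# $2: fstab path (assumed default: /etc/fstab; pass a scratch file to experiment)
add_shared_storage_entry() {
    node="$1"
    fstab="${2:-/etc/fstab}"
    entry="$node:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0"
    # -x matches whole lines only, -F treats the entry as a fixed string
    if grep -qxF "$entry" "$fstab" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$fstab"
}
```

Calling `add_shared_storage_entry <local node's ip>` as root would then take the place of the `echo ... >> /etc/fstab` command; the backup copy of /etc/fstab shown in item 3 is still worth taking first.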

Retrieve the brick paths in server0.example.com using the following command:

# gluster volume info <VOLNAME>
Volume Name: vol
Type: Replicate
Volume ID: 0xde822e25ebd049ea83bfaa3c4be2b440
Status: Started
Snap Volume: no
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server0.example.com:/rhgs/brick1
Brick2: server1.example.com:/rhgs/brick1
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

The brick path in server0.example.com is /rhgs/brick1. This brick has to be replaced with the brick in the newly added host, server5.example.com.
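
When a volume has many bricks, reading the paths off the `gluster volume info` output by eye is error-prone. The sketch below filters that output for the bricks of a single host. It assumes the `BrickN: host:/path` line format shown in the example output above, and `bricks_on_host` is a hypothetical helper name, not a gluster command.

```shell
# Print the brick paths that `gluster volume info` lists for one host.
# Reads the command output on stdin; $1 is the hostname to match.
bricks_on_host() {
    awk -v host="$1" '
        /^Brick[0-9]+:/ {
            # $2 looks like "server0.example.com:/rhgs/brick1"
            split($2, parts, ":")
            if (parts[1] == host) print parts[2]
        }
    '
}

# Possible usage:
#   gluster volume info vol | bricks_on_host server0.example.com
```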

Create the required brick path in server5.example.com. For example, if /rhgs is the XFS mount point in server5.example.com, create a brick directory under that path.
```
# mkdir /rhgs/brick1
```

Execute the replace-brick command with the force option:

# gluster volume replace-brick vol server0.example.com:/rhgs/brick1 server5.example.com:/rhgs/brick1 commit force
volume replace-brick: success: replace-brick commit successful

Verify that the new brick is online.

# gluster volume status
Status of volume: vol
Gluster process                                  Port    Online Pid
Brick server5.example.com:/rhgs/brick1           49156    Y    5731
Brick server1.example.com:/rhgs/brick1           49153    Y    5354

Initiate self-heal on the volume by executing the following command:
```
# gluster volume heal VOLNAME
```
The status of the heal process can be seen by executing the command:
```
# gluster volume heal VOLNAME info
```
Detach the original machine from the trusted pool.
```
# gluster peer detach server0.example.com
```

Ensure that after the self-heal completes, the extended attributes are set to zero on the other bricks in the replica.

# getfattr -d -m. -e hex /rhgs/brick1
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1
security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0xde822e25ebd049ea83bfaa3c4be2b440

In this example, the extended attributes trusted.afr.vol-client-0 and trusted.afr.vol-client-1 have zero values. This means that the data on the two bricks is identical. If these attributes are not zero after self-heal is completed, the data has not been synchronised correctly.
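
The zero check described above can be scripted rather than read off the hex strings by eye. The following sketch assumes the `name=0x...` line format shown in the getfattr output above. `afr_all_zero` is a hypothetical helper name, and treating output with no trusted.afr attributes at all as a failure is a design choice of this sketch rather than something the procedure specifies.

```shell
# Succeed only if every trusted.afr.* value in getfattr output is all zeros.
# Reads `getfattr -d -m. -e hex <brick-path>` output on stdin.
# Exit status: 0 if all values are zero, 1 otherwise (or if none were found).
afr_all_zero() {
    awk -F= '
        /^trusted\.afr\./ {
            seen = 1
            value = $2
            sub(/^0x/, "", value)          # drop the hex prefix
            if (value !~ /^0+$/) bad = 1   # any non-zero digit fails the check
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Possible usage:
#   getfattr -d -m. -e hex /rhgs/brick1 2>/dev/null | afr_all_zero && echo "attributes are zero"
```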

Start the geo-replication session using the force option:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force

11.8.2. Replacing a Host Machine with the Same Hostname

You can replace a failed host with another node that has the same FQDN (Fully Qualified Domain Name). Each host in a Red Hat Gluster Storage Trusted Storage Pool has its own identity, the UUID, which is generated by the glusterFS Management Daemon. The UUID for the host is available in the /var/lib/glusterd/glusterd.info file.

In the following example, the host with the FQDN server0.example.com was irrecoverable and must be replaced with a host that has the same FQDN. The following steps have to be performed on the new host.

Stop the geo-replication session if configured by executing the following command:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force

Stop the glusterd service on server0.example.com.
```
# service glusterd stop
```
Important
If glusterd crashes at this point, the crash has no functional impact because it occurs during shutdown. For more information, see Section 24.3, “Resolving glusterd Crash”.
Retrieve the UUID of the failed host (server0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
```
# gluster peer status
Number of Peers: 2

Hostname: server1.example.com
Uuid: 1d9677dc-6159-405e-9319-ad85ec030880
State: Peer in Cluster (Connected)

Hostname: server0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
```
Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
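
Because this UUID is reused in several later steps, it can help to capture it in a variable instead of copying it by hand. The sketch below pulls the UUID for one hostname out of the `gluster peer status` output. It assumes the `Hostname:` / `Uuid:` layout shown above, and `peer_uuid` is a hypothetical helper name.

```shell
# Print the UUID that `gluster peer status` reports for a given hostname.
# Reads the command output on stdin; prints nothing if the host is not listed.
peer_uuid() {
    awk -v host="$1" '
        $1 == "Hostname:" { current = $2 }                    # remember the peer being described
        $1 == "Uuid:" && current == host { print $2; exit }   # emit its UUID and stop
    '
}

# Possible usage:
#   failed_uuid=$(gluster peer status | peer_uuid server0.example.com)
```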
Edit the glusterd.info file in the new host and include the UUID of the host you retrieved in the previous step.
```
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
```
Note
The operating version of this node must be the same as on the other nodes of the trusted storage pool.
Select any host (for example, server1.example.com) in the Red Hat Gluster Storage Trusted Storage Pool and retrieve its UUID from the glusterd.info file.
```
# grep -i uuid /var/lib/glusterd/glusterd.info
UUID=8cc6377d-0153-4540-b965-a4015494461c
```
Gather the peer information files from the host selected in the previous step (server1.example.com) by executing the following command on that host.
```
# cp -a /var/lib/glusterd/peers /tmp/
```
Remove the peer file corresponding to the failed host (server0.example.com) from the /tmp/peers directory.
```
# rm /tmp/peers/b5ab2ec3-5411-45fa-a30f-43bd04caf96b
```
Note that the UUID corresponds to the UUID of the failed host (server0.example.com) retrieved in Step 3.
Archive all the files and copy them to the failed host (server0.example.com).
```
# cd /tmp; tar -cvf peers.tar peers
```
Copy the archive created above to the new peer.
```
# scp /tmp/peers.tar root@server0.example.com:/tmp
```
Extract the archive and copy its contents to the /var/lib/glusterd/peers directory. Execute the following commands in the newly added host, which has the same name (server0.example.com) and IP address.
```
# tar -xvf /tmp/peers.tar
# cp peers/* /var/lib/glusterd/peers/
```
Select a host in the cluster other than the node (server1.example.com) selected in step 5. From that host, copy the peer file corresponding to the UUID of the host retrieved in Step 4 to the new host (server0.example.com) by executing the following command:
```
# scp /var/lib/glusterd/peers/<UUID-retrieved-from-step4> root@server0.example.com:/var/lib/glusterd/peers/
```

Retrieve the brick directory information by executing the following command in any host in the cluster.

# gluster volume info
Volume Name: vol
Type: Replicate
Volume ID: 0x8f16258c88a0498fbd53368706af7496
Status: Started
Snap Volume: no
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: server0.example.com:/rhgs/brick1
Brick2: server1.example.com:/rhgs/brick1
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 256
snap-max-soft-limit: 90
auto-delete: disable

In the above example, the brick path in server0.example.com is /rhgs/brick1. If the brick path does not exist in server0.example.com, perform steps a, b, and c.

Create a brick path in the host, server0.example.com.
```
# mkdir /rhgs/brick1
```

Retrieve the volume ID from the existing brick of another host by executing the following command on any host that contains the bricks for the volume.

# getfattr -d -m. -ehex <brick-path>

Copy the volume-id.

# getfattr -d -m. -ehex /rhgs/brick1
getfattr: Removing leading '/' from absolute path names
# file: rhgs/brick1
trusted.afr.vol-client-0=0x000000000000000000000000
trusted.afr.vol-client-1=0x000000000000000000000000
trusted.gfid=0x00000000000000000000000000000001
trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496

In the above example, the volume ID is 0x8f16258c88a0498fbd53368706af7496.
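
Copying a 32-digit hex value by hand invites typos, so the volume ID can instead be extracted from the getfattr output directly. This is a minimal sketch assuming the `trusted.glusterfs.volume-id=0x...` line format shown above; `volume_id` is a hypothetical helper name.

```shell
# Print the value of trusted.glusterfs.volume-id from getfattr output.
# Reads `getfattr -d -m. -ehex <brick-path>` output on stdin.
volume_id() {
    # -n suppresses default printing; p prints only the line where the prefix was stripped
    sed -n 's/^trusted\.glusterfs\.volume-id=//p'
}

# Possible usage, on a host that already holds a brick of the volume:
#   getfattr -d -m. -ehex /rhgs/brick1 2>/dev/null | volume_id
```

The printed value, including its 0x prefix, is what the `-v` argument of the setfattr command expects.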

Set this volume ID on the brick created in the newly added host by executing the following command on that host (server0.example.com).

# setfattr -n trusted.glusterfs.volume-id -v <volume-id> <brick-path>

For example:

# setfattr -n trusted.glusterfs.volume-id -v 0x8f16258c88a0498fbd53368706af7496 /rhs/brick2/drv2

Data recovery is possible only if the volume type is replicate or distribute-replicate. If the volume type is plain distribute, you can skip steps 12 and 13.

Create a FUSE mount point to mount the glusterFS volume.

# mount -t glusterfs <server-name>:/VOLNAME <mount>

Perform the following operations to change the Automatic File Replication extended attributes so that the heal process happens from the other brick (server1.example.com:/rhgs/brick1) in the replica pair to the new brick (server0.example.com:/rhgs/brick1). Note that /mnt/r2 is the FUSE mount path.
1. Create a new directory on the mount point and ensure that a directory with such a name is not already present.
  # mkdir /mnt/r2/<name-of-nonexistent-dir>
2. Delete the directory and set the extended attributes.
  # rmdir /mnt/r2/<name-of-nonexistent-dir>
  # setfattr -n trusted.non-existent-key -v abc /mnt/r2
  # setfattr -x trusted.non-existent-key /mnt/r2
3. Ensure that the extended attribute on the other bricks in the replica (in this example, trusted.afr.vol-client-0) is not set to zero.
  # getfattr -d -m. -e hex /rhgs/brick1
  # file: rhgs/brick1
  security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a66696c655f743a733000
  trusted.afr.vol-client-0=0x000000000000000300000002
  trusted.afr.vol-client-1=0x000000000000000000000000
  trusted.gfid=0x00000000000000000000000000000001
  trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
  trusted.glusterfs.volume-id=0x8f16258c88a0498fbd53368706af7496
Note
Ensure that you perform steps 12, 13, and 14 for all the volumes that have bricks on server0.example.com.
Start the glusterd service.
```
# service glusterd start
```
Perform the self-heal operation on the restored volume.
```
# gluster volume heal VOLNAME
```
You can view the gluster volume self-heal status by executing the following command:
```
# gluster volume heal VOLNAME info
```

If the geo-replication session is configured, perform the following steps:

Set up the geo-replication session by generating the SSH keys:
```
# gluster system:: execute gsec_create
```

Create the geo-replication session again with the force option to distribute the keys from the new nodes to the slave nodes.

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem force

After the shared storage volume has been set up, the shared storage is not mounted automatically on a node that joins the cluster as a replacement, and no /etc/fstab entry for the shared storage is added on that node. To use shared storage on this node, execute the following commands:

# mount -t glusterfs <local node's ip>:gluster_shared_storage /var/run/gluster/shared_storage
# cp /etc/fstab /var/run/gluster/fstab.tmp
# echo "<local node's ip>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab

For more information on setting up shared storage volume, see Section 11.10, “Setting up Shared Storage Volume”.

Configure the meta-volume for geo-replication:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true

Start the geo-replication session using the force option:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force

Replacing a host with the same Hostname in a two-node Red Hat Gluster Storage Trusted Storage Pool

If there are only 2 hosts in the Red Hat Gluster Storage Trusted Storage Pool where the host server0.example.com must be replaced, perform the following steps:

Stop the geo-replication session if configured by executing the following command:

# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop force

Stop the glusterd service on server0.example.com.
```
# service glusterd stop
```
Important
If glusterd crashes at this point, the crash has no functional impact because it occurs during shutdown. For more information, see Section 24.3, “Resolving glusterd Crash”.
Retrieve the UUID of the failed host (server0.example.com) from another peer in the Red Hat Gluster Storage Trusted Storage Pool by executing the following command:
```
# gluster peer status
Number of Peers: 1

Hostname: server0.example.com
Uuid: b5ab2ec3-5411-45fa-a30f-43bd04caf96b
State: Peer Rejected (Connected)
```
Note that the UUID of the failed host is b5ab2ec3-5411-45fa-a30f-43bd04caf96b.
Edit the glusterd.info file in the new host (server0.example.com) and include the UUID of the host you retrieved in the previous step.
```
# cat /var/lib/glusterd/glusterd.info
UUID=b5ab2ec3-5411-45fa-a30f-43bd04caf96b
operating-version=30703
```
Note
The operating version of this node must be the same as on the other nodes of the trusted storage pool.
Create the peer file in the new host (server0.example.com) at /var/lib/glusterd/peers/<uuid-of-other-peer>, using the UUID of the other host (server1.example.com) as the file name.
The UUID of a host can be obtained by executing the following command on it:
```
# gluster system:: uuid get
```
Example 11.7. Example to obtain the UUID of a host
# gluster system:: uuid get
UUID: 1d9677dc-6159-405e-9319-ad85ec030880
In this case the UUID of the other peer is 1d9677dc-6159-405e-9319-ad85ec030880.
Create a file /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880 in server0.example.com, with the following command:
```
# touch /var/lib/glusterd/peers/1d9677dc-6159-405e-9319-ad85ec030880
```
The file you create must contain the following information:
```
UUID=<uuid-of-other-node>
state=3
hostname=<hostname>
```
Continue to perform steps 12 to 18 as documented in the previous procedure.
