Chapter 6. Replacing a primary host using new bricks
6.1. Host replacement prerequisites
- Determine which node to use as the Ansible controller node (the node from which all Ansible playbooks are executed). Red Hat recommends using a healthy node in the same cluster as the failed node as the Ansible controller node.
- Power off all virtual machines in the cluster.
- Stop brick processes and unmount file systems on the failed host, to avoid file system inconsistency issues.

  # pkill glusterfsd
  # umount /gluster_bricks/{engine,vmstore,data}

- Check which operating system is running on your hyperconverged hosts by running the following command:

  $ nodectl info

- Install the same operating system on a replacement host.
6.2. Preparing the cluster for host replacement
Verify host state in the Administrator Portal.
- Log in to the Red Hat Virtualization Administrator Portal.
  The host is listed as NonResponsive in the Administrator Portal. Virtual machines that previously ran on this host are in the Unknown state.
- Click Compute → Hosts and click the Action menu (⋮).
- Click Confirm host has been rebooted and confirm the operation.
- Verify that the virtual machines are now listed with a state of Down.
Update the SSH fingerprint for the failed node.
- Log in to the Ansible controller node as the root user.
- Remove the existing SSH fingerprint for the failed node.

  # sed -i '/failed-host-frontend.example.com/d' /root/.ssh/known_hosts
  # sed -i '/failed-host-backend.example.com/d' /root/.ssh/known_hosts

- Copy the public key from the Ansible controller node to the freshly installed node.
  # ssh-copy-id root@new-host-backend.example.com
  # ssh-copy-id root@new-host-frontend.example.com

- Verify that you can log in to all hosts in the cluster, including the Ansible controller node, using key-based SSH authentication without a password. Test access using all network addresses. The following example assumes that the Ansible controller node is host1.
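  One way to test this, using illustrative host names that you should replace with the frontend and backend FQDNs of your own hosts and of the replacement host:

  # ssh root@host1-frontend.example.com
  # ssh root@host1-backend.example.com
  # ssh root@host2-frontend.example.com
  # ssh root@host2-backend.example.com
  # ssh root@new-host-frontend.example.com
  # ssh root@new-host-backend.example.com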
- Use ssh-copy-id to copy the public key to any host you cannot log into without a password using this method.

  # ssh-copy-id root@host-frontend.example.com
  # ssh-copy-id root@host-backend.example.com
6.3. Creating the node_prep_inventory.yml file
Define the replacement node in the node_prep_inventory.yml file.
Procedure
- Familiarize yourself with your Gluster configuration.
  The configuration that you define in your inventory file must match the existing Gluster volume configuration. Use gluster volume info to check where your bricks should be mounted for each Gluster volume, for example:
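  Abbreviated, illustrative output for a volume named engine is shown below; the Brick lines are what your inventory must match (the host FQDNs and brick paths here are placeholders):

  # gluster volume info engine

  Volume Name: engine
  Type: Replicate
  Status: Started
  Number of Bricks: 1 x 3 = 3
  Transport-type: tcp
  Bricks:
  Brick1: host1-backend.example.com:/gluster_bricks/engine/engine
  Brick2: host2-backend.example.com:/gluster_bricks/engine/engine
  Brick3: host3-backend.example.com:/gluster_bricks/engine/engine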
- Back up the node_prep_inventory.yml file.

  # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
  # cp node_prep_inventory.yml node_prep_inventory.yml.bk

- Edit the node_prep_inventory.yml file to define your node preparation.
  See Appendix B, Understanding the node_prep_inventory.yml file, for more information about this inventory file and its parameters.
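  For orientation only, a fragment of such an inventory might look like the sketch below. The hc_nodes group name, device names, and brick paths are assumptions for illustration; the variable names come from the gluster.infra role, and the complete, authoritative parameter set is described in Appendix B.

  hc_nodes:
    hosts:
      new-host-backend.example.com:
        # Illustrative backend_setup variables: one volume group on /dev/sdb
        # and a brick mount point that matches the existing volume layout.
        gluster_infra_volume_groups:
          - vgname: gluster_vg_sdb
            pvname: /dev/sdb
        gluster_infra_mount_devices:
          - path: /gluster_bricks/engine
            lvname: gluster_lv_engine
            vgname: gluster_vg_sdb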
6.4. Creating the node_replace_inventory.yml file
Define your cluster hosts by creating a node_replace_inventory.yml file.
Procedure
- Back up the node_replace_inventory.yml file.

  # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
  # cp node_replace_inventory.yml node_replace_inventory.yml.bk

- Edit the node_replace_inventory.yml file to define your cluster.
  See Appendix C, Understanding the node_replace_inventory.yml file, for more information about this inventory file and its parameters.
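  As a rough sketch only, such an inventory identifies the healthy cluster hosts and the old and new nodes by their backend FQDNs. The group name and host names below are illustrative assumptions; see Appendix C for the authoritative structure and the full list of parameters.

  cluster_node:
    hosts:
      host1-backend.example.com:   # healthy host (Ansible controller node)
      host2-backend.example.com:   # second healthy host
    vars:
      gluster_maintenance_old_node: failed-host-backend.example.com
      gluster_maintenance_new_node: new-host-backend.example.com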
6.5. Executing the replace_node.yml playbook file
The replace_node.yml playbook reconfigures a Red Hat Hyperconverged Infrastructure for Virtualization cluster to use a new node after an existing cluster node has failed.
Procedure
- Execute the playbook.

  # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/
  # ansible-playbook -i node_prep_inventory.yml -i node_replace_inventory.yml tasks/replace_node.yml
6.6. Updating the cluster for a new primary host
When you replace a failed host using a different FQDN, you need to update the cluster configuration to use the replacement host.
Procedure
- Change into the hc-ansible-deployment directory.

  # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/

- Make a copy of the reconfigure_he_storage_inventory.yml file.

  # cp reconfigure_he_storage_inventory.yml reconfigure_he_storage_inventory.yml.bk

- Edit the reconfigure_he_storage_inventory.yml file to identify the following:
  - hosts: Two active hosts in the cluster that have been configured to host the Hosted Engine virtual machine.
  - gluster_maintenance_old_node: The backend network FQDN of the failed node.
  - gluster_maintenance_new_node: The backend network FQDN of the replacement node.
  - ovirt_engine_hostname: The FQDN of the Hosted Engine virtual machine.
For example:
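  A minimal sketch, assuming a plain YAML inventory layout and illustrative FQDNs (adapt the group structure to the file's own comments; the variable names are those described above):

  all:
    hosts:
      host1-backend.example.com:   # active host able to run the Hosted Engine VM
      host2-backend.example.com:   # second active host
    vars:
      gluster_maintenance_old_node: failed-host-backend.example.com   # backend FQDN of the failed node
      gluster_maintenance_new_node: new-host-backend.example.com      # backend FQDN of the replacement node
      ovirt_engine_hostname: engine.example.com                       # FQDN of the Hosted Engine VM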
- Execute the reconfigure_he_storage.yml playbook with your updated inventory file.

  # ansible-playbook -i reconfigure_he_storage_inventory.yml tasks/reconfigure_he_storage.yml
6.7. Removing a failed host from the cluster
When a replacement host is ready, remove the existing failed host from the cluster.
Procedure
Remove the failed host.
- Log in to the Administrator Portal.
- Click Compute → Hosts.
  The failed host is in the NonResponsive state. Virtual machines running on the failed host are in the Unknown state.
- Select the failed host.
- Click the main Action menu (⋮) for the Hosts page and select Confirm host has been rebooted.
- Click OK to confirm the operation.
  Virtual machines move to the Down state.
- Select the failed host and click Management → Maintenance.
- Click the Action menu (⋮) beside the failed host and click Remove.
Update the storage domains.
For each storage domain:
- Click Storage → Domains.
- Click the storage domain name, then click Data Center → Maintenance and confirm the operation.
- Click Manage Domain.
- Edit the Path field to match the new FQDN.
- Click OK.
  Note: A dialog box with an Operation Cancelled error appears as a result of Bug 1853995, but the path is updated as expected.
- Click the Action menu (⋮) beside the storage domain and click Activate.
- Add the replacement host to the cluster.
- Attach the gluster logical network to the replacement host.
Restart all virtual machines.
For highly available virtual machines, disable and re-enable high-availability.
- Click Compute → Virtual Machines and select a virtual machine.
- Click Edit → High Availability, uncheck the High Availability check box, and click OK.
- Click Edit → High Availability, check the High Availability check box, and click OK.
Start all the virtual machines.
- Click Compute → Virtual Machines and select a virtual machine.
- Click the Action menu (⋮) → Start.
6.8. Verifying healing in progress
After replacing a failed host with a new host, verify that your storage is healing as expected.
Procedure
Verify that healing is in progress.
Run the following command on any hyperconverged host:
  # for vol in `gluster volume list`; do gluster volume heal $vol info summary; done

The output shows a summary of healing activity on each brick in each volume, for example:
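  Illustrative output for one volume follows; host names and brick paths are placeholders, and entry counts will vary while healing runs:

  Brick host1-backend.example.com:/gluster_bricks/engine/engine
  Status: Connected
  Total Number of entries: 0
  Number of entries in heal pending: 0
  Number of entries in split-brain: 0
  Number of entries possibly healing: 0

  Brick host2-backend.example.com:/gluster_bricks/engine/engine
  Status: Connected
  Total Number of entries: 4
  Number of entries in heal pending: 4
  Number of entries in split-brain: 0
  Number of entries possibly healing: 0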
Depending on brick size, volumes can take a long time to heal. You can still run and migrate virtual machines using this node while the underlying storage heals.