Chapter 7. Replacing a non-primary host using new bricks


7.1. Host replacement prerequisites

  • Determine which node to use as the Ansible controller node (the node from which all Ansible playbooks are executed). Red Hat recommends using a healthy node in the same cluster as the failed node as the Ansible controller node.
  • Stop brick processes and unmount file systems on the failed host, to avoid file system inconsistency issues.

    # pkill glusterfsd
    # umount /gluster_bricks/{engine,vmstore,data}
  • Check which operating system is running on your hyperconverged hosts by running the following command:

    $ nodectl info
  • Install the same operating system on a replacement host.
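To confirm that the unmount in the second prerequisite succeeded, the mount table can be filtered for brick mount points. The following is a minimal sketch, not part of the documented procedure: the function name still_mounted_bricks is hypothetical, and it assumes that all bricks are mounted under /gluster_bricks, as in the umount command above.

```shell
# Hypothetical helper: print brick mount points that are still mounted.
# Reads a mount table in /proc/mounts format on standard input
# (fields: device, mount point, file system type, options, ...).
still_mounted_bricks() {
    awk '$2 ~ /^\/gluster_bricks\// { print $2 }'
}

# Usage on the failed host (no output means every brick is unmounted):
#   still_mounted_bricks < /proc/mounts
```

Reading the table from standard input keeps the check free of side effects, so it can be run as often as needed.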

7.2. Preparing the cluster for host replacement

  1. Verify host state in the Administrator Portal.

    1. Log in to the Red Hat Virtualization Administrator Portal.

      The host is listed as NonResponsive in the Administrator Portal. Virtual machines that previously ran on this host are in the Unknown state.

    2. Click Compute → Hosts, then click the Action menu (⋮).
    3. Click Confirm host has been rebooted and confirm the operation.
    4. Verify that the virtual machines are now listed with a state of Down.
  2. Update the SSH fingerprint for the failed node.

    1. Log in to the Ansible controller node as the root user.
    2. Remove the existing SSH fingerprint for the failed node.

      # sed -i '/failed-host-frontend.example.com/d' /root/.ssh/known_hosts
      # sed -i '/failed-host-backend.example.com/d' /root/.ssh/known_hosts
    3. Copy the public key from the Ansible controller node to the freshly installed node.

      # ssh-copy-id root@new-host-backend.example.com
      # ssh-copy-id root@new-host-frontend.example.com
    4. Verify that you can log in to all hosts in the cluster, including the Ansible controller node, using key-based SSH authentication without a password. Test access using all network addresses. The following example assumes that the Ansible controller node is host1.

      # ssh root@host1-backend.example.com
      # ssh root@host1-frontend.example.com
      # ssh root@host2-backend.example.com
      # ssh root@host2-frontend.example.com
      # ssh root@new-host-backend.example.com
      # ssh root@new-host-frontend.example.com

      If any host still prompts for a password, use ssh-copy-id to copy the public key to that host.

      # ssh-copy-id root@host-frontend.example.com
      # ssh-copy-id root@host-backend.example.com
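The key-based login check in step 2 can also be run in one pass instead of host by host. The following is a sketch only: the function name check_key_login is hypothetical, and the host names in the usage comment are the example names from this procedure. It relies on the OpenSSH BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper: print every host that rejects non-interactive,
# key-based root login. Prints nothing when all hosts are reachable.
check_key_login() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key
        # results in a non-zero exit status instead of a prompt.
        if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true \
                </dev/null 2>/dev/null; then
            echo "$host"
        fi
    done
}

# Usage on the Ansible controller node:
#   check_key_login host1-backend.example.com host1-frontend.example.com \
#       host2-backend.example.com host2-frontend.example.com \
#       new-host-backend.example.com new-host-frontend.example.com
```

A host is also reported if it is unreachable or if its host key has not been accepted yet, so a listed host needs a manual ssh attempt to find the actual cause.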

7.3. Creating the node_prep_inventory.yml file

Define the replacement node in the node_prep_inventory.yml file.

Procedure

  1. Familiarize yourself with your Gluster configuration.

    The configuration that you define in your inventory file must match the existing Gluster volume configuration. Use gluster volume info to check where your bricks should be mounted for each Gluster volume, for example:

    # gluster volume info engine | grep -i brick
    Number of Bricks: 1 x 3 = 3
    Bricks:
    Brick1: host1.example.com:/gluster_bricks/engine/engine
    Brick2: host2.example.com:/gluster_bricks/engine/engine
    Brick3: host3.example.com:/gluster_bricks/engine/engine
  2. Back up the node_prep_inventory.yml file.

    # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
    # cp node_prep_inventory.yml node_prep_inventory.yml.bk
  3. Edit the node_prep_inventory.yml file to define your node preparation.

    See Appendix B, Understanding the node_prep_inventory.yml file for more information about this inventory file and its parameters.
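When comparing the inventory file against the existing volume layout from step 1, it can help to reduce the gluster volume info output to the brick definitions alone. The following is a minimal sketch, assuming the output format shown in step 1; the function name list_bricks is hypothetical.

```shell
# Hypothetical helper: print the host:/path value of every brick line
# read from standard input. Matches "Brick1:", "Brick2:", and so on,
# but not the "Bricks:" or "Number of Bricks:" lines.
list_bricks() {
    awk '/^[ \t]*Brick[0-9]+:/ { print $2 }'
}

# Usage on any healthy hyperconverged host:
#   gluster volume info engine | list_bricks
```

Running this for each volume gives one host:/path pair per line, which is easier to check against the brick paths in the inventory file than the full volume report.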

7.4. Creating the node_replace_inventory.yml file

Define your cluster hosts by creating a node_replace_inventory.yml file.

Procedure

  1. Back up the node_replace_inventory.yml file.

    # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment
    # cp node_replace_inventory.yml node_replace_inventory.yml.bk
  2. Edit the node_replace_inventory.yml file to define your cluster.

    See Appendix C, Understanding the node_replace_inventory.yml file for more information about this inventory file and its parameters.

7.5. Executing the replace_node.yml playbook file

The replace_node.yml playbook reconfigures a Red Hat Hyperconverged Infrastructure for Virtualization cluster to use a new node after an existing cluster node has failed.

Procedure

  1. Execute the playbook.

    # cd /etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment/
    # ansible-playbook -i node_prep_inventory.yml -i node_replace_inventory.yml tasks/replace_node.yml
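A syntax check before the real run can catch YAML and playbook syntax errors early. The following sketch wraps the command above in a hypothetical helper named run_replace_node; the two pre-checks are an assumption of this example, not part of the documented procedure. It uses the standard --syntax-check option of ansible-playbook, which parses the playbook without running any task.

```shell
# Hypothetical wrapper: verify the inventories exist, syntax-check the
# playbook, and only then execute it. Runs in a subshell so that the
# caller's working directory is left unchanged.
run_replace_node() {
    (
        # Directory from the procedure above; pass another path as $1 to override.
        dir=${1:-/etc/ansible/roles/gluster.ansible/playbooks/hc-ansible-deployment}
        cd "$dir" || exit 1
        for f in node_prep_inventory.yml node_replace_inventory.yml; do
            [ -f "$f" ] || { echo "missing inventory: $f" >&2; exit 1; }
        done
        # Parse the playbook without executing tasks; stop on any error.
        ansible-playbook -i node_prep_inventory.yml -i node_replace_inventory.yml \
            --syntax-check tasks/replace_node.yml || exit 1
        ansible-playbook -i node_prep_inventory.yml -i node_replace_inventory.yml \
            tasks/replace_node.yml
    )
}

# Usage on the Ansible controller node:
#   run_replace_node
```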

7.6. Removing a failed host from the cluster

When a replacement host is ready, remove the existing failed host from the cluster.

Procedure

  1. Remove the failed host.

    1. Log in to the Administrator Portal.
    2. Click Compute → Hosts.

      The failed host is in the NonResponsive state. Virtual machines running on that host are in the Unknown state.

    3. Select the failed host.
    4. Click the main Action menu (⋮) for the Hosts page and select Confirm host has been rebooted.
    5. Click OK to confirm.
    6. Click the Action menu (⋮) beside the failed host and click Remove.
  2. Clean stale Hosted Engine metadata.

    1. Determine the identifier of the failed node.

      # hosted-engine --vm-status | grep failed-node.example.com
      --== Host server1-frontend.example.com (id: 1) status ==--
      Hostname                           : failed-node.example.com
    2. Remove the metadata associated with that host identifier.

      # hosted-engine --clean-metadata --host-id=1 --force
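Because the --clean-metadata step is destructive, it is worth reading the host identifier from the status output rather than typing it from memory. The following is a sketch under the assumption that each host block starts with a header line containing "(id: N)" and is followed by a "Hostname : name" line, as in the example output in step 2; the function name host_id_for is hypothetical.

```shell
# Hypothetical helper: print the Hosted Engine host ID whose Hostname
# line equals the given name. Reads status output on standard input.
host_id_for() {
    awk -v host="$1" '
        # Header line of a host block: remember the ID inside "(id: N)".
        /^[ \t]*--== Host .*\(id: [0-9]+\)/ {
            id = $0
            sub(/.*\(id: /, "", id)
            sub(/\).*/, "", id)
        }
        # Hostname line: print the remembered ID on an exact name match.
        /^[ \t]*Hostname[ \t]*:/ {
            name = $0
            sub(/^[^:]*:[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)
            if (name == host) print id
        }
    '
}

# Usage on a healthy hyperconverged host:
#   hosted-engine --vm-status | host_id_for failed-node.example.com
```

The helper only prints the identifier. Passing it to hosted-engine --clean-metadata remains a separate, deliberate step.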

7.7. Verifying healing in progress

After replacing a failed host with a new host, verify that your storage is healing as expected.

Procedure

  • Verify that healing is in progress.

    Run the following command on any hyperconverged host:

    # for vol in $(gluster volume list); do gluster volume heal "$vol" info summary; done

    The output shows a summary of healing activity on each brick in each volume, for example:

    Brick brick1
    Status: Connected
    Total Number of entries: 3
    Number of entries in heal pending: 2
    Number of entries in split-brain: 1
    Number of entries possibly healing: 0

    Depending on brick size, volumes can take a long time to heal. You can still run and migrate virtual machines using this node while the underlying storage heals.
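To follow healing progress over time, the per-brick summary can be reduced to a single number. The following is a minimal sketch, assuming the summary format shown in the example output above; the function name heal_pending_total is hypothetical.

```shell
# Hypothetical helper: sum the "heal pending" counters of all bricks in
# the summary read from standard input. Prints 0 if none are found.
heal_pending_total() {
    awk '/Number of entries in heal pending:/ { total += $NF }
         END { print total + 0 }'
}

# Usage on any hyperconverged host, one total per volume:
#   for vol in $(gluster volume list); do
#       printf '%s: ' "$vol"
#       gluster volume heal "$vol" info summary | heal_pending_total
#   done
```

A total that shrinks between runs suggests that healing is progressing. Entries reported as split-brain are not counted here and need to be checked separately.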
