Maintaining Red Hat Hyperconverged Infrastructure
Common maintenance tasks for Red Hat Hyperconverged Infrastructure
Part I. Configuration Tasks
Chapter 1. Add Compute and Storage Resources
Red Hat Hyperconverged Infrastructure (RHHI) can be scaled in multiples of three nodes to a maximum of nine nodes.
1.1. Scaling RHHI deployments
1.1.1. Before you begin
- Be aware that the only supported method of scaling Red Hat Hyperconverged Infrastructure (RHHI) is to create additional volumes that span the new nodes. Expanding the existing volumes to span across more nodes is not supported.
- Arbitrated replicated volumes are not supported for scaling.
- If your existing deployment uses certificates signed by a Certificate Authority for encryption, prepare the certificates that will be required for the new nodes.
1.1.2. Scaling RHHI by adding additional volumes on new nodes
Install the three physical machines
Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.0/html/deploying_red_hat_hyperconverged_infrastructure/install-host-physical-machines.
Note: Only one arbitrated replicated volume is supported per deployment.
Configure key-based SSH authentication
Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure to configure key-based SSH authentication from one node to all nodes: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.0/html/deploying_red_hat_hyperconverged_infrastructure/task-configure-key-based-ssh-auth
Automatically configure new nodes
- Create an add_nodes.conf file based on the template provided in Section B.3, “Example gdeploy configuration file for scaling to additional nodes”.
Run gdeploy using the add_nodes.conf file:
# gdeploy -c add_nodes.conf
(Optional) If encryption is enabled
Ensure that the following files exist in the following locations on all nodes.
- /etc/ssl/glusterfs.key
- The node’s private key.
- /etc/ssl/glusterfs.pem
- The certificate signed by the Certificate Authority, which becomes the node’s certificate.
- /etc/ssl/glusterfs.ca
- The Certificate Authority’s certificate. For self-signed configurations, this file contains the concatenated certificates of all nodes.
Enable management encryption.
Create the /var/lib/glusterd/secure-access file on each node.
# touch /var/lib/glusterd/secure-access
Restart the glusterd service
# systemctl restart glusterd
Update the auth.ssl-allow parameter for all volumes
Use the following command on any existing node to obtain the existing settings:
# gluster volume get engine auth.ssl-allow
Set auth.ssl-allow to the old value with the new IP addresses appended.
# gluster volume set <vol_name> auth.ssl-allow "<old_hosts>;<new_hosts>"
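For example, if the current value lists three gluster network addresses and you are adding three more nodes (the addresses below are placeholders, not values from your deployment), the update looks similar to the following:
# gluster volume get engine auth.ssl-allow
Option                                  Value
------                                  -----
auth.ssl-allow                          192.0.2.1;192.0.2.2;192.0.2.3
# gluster volume set engine auth.ssl-allow "192.0.2.1;192.0.2.2;192.0.2.3;192.0.2.4;192.0.2.5;192.0.2.6"
Repeat the set command for every volume in the deployment.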
Disable multipath for each node’s storage devices
Add the following lines to the beginning of the /etc/multipath.conf file.
# VDSM REVISION 1.3
# VDSM PRIVATE
Add Red Hat Gluster Storage devices to the blacklist definition in the /etc/multipath.conf file.
blacklist {
    devnode "^sd[a-z]"
}
Restart multipathd
# systemctl restart multipathd
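To confirm that the blacklisted devices are no longer managed by multipath, you can, for example, list the remaining multipath devices; the sd devices used for Red Hat Gluster Storage bricks should not appear in the output (this check is illustrative and not part of the original procedure):
# multipath -ll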
In Red Hat Virtualization Manager, add the new hosts to the existing cluster
For details on adding a host to a cluster, follow the instructions in the Red Hat Virtualization Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-host_tasks.
Ensure that you uncheck Automatically configure firewall and that you enable Power management settings.
Add the new bricks to the volume
For details, see the Red Hat Virtualization Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-using_red_hat_gluster_storage_as_a_storage_domain.
Attach the gluster network to the new hosts
- Click the Hosts tab and select the host.
- Click the Network Interfaces subtab and then click Setup Host Networks.
- Drag and drop the newly created network to the correct interface.
- Ensure that the Verify connectivity checkbox is checked.
- Ensure that the Save network configuration checkbox is checked.
- Click OK to save.
Verify the health of the network
Click the Hosts tab and select the host.
Click the Networks subtab and check the state of the host’s network.
If the network interface enters an "Out of sync" state or does not have an IPv4 Address, click the Management tab that corresponds to the host and click Refresh Capabilities.
Create a new volume
- Check the Optimize for virt-store checkbox.
Set the following volume options:
- Set cluster.granular-entry-heal to on.
- Set network.remote-dio to off.
- Set performance.strict-o-direct to on. (These options can also be set from the command line; see the sketch after this list.)
- Start the new volume.
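The volume options above can also be set from the command line on any storage node, for example (a sketch that assumes the new volume is named <new_volume>):
# gluster volume set <new_volume> cluster.granular-entry-heal on
# gluster volume set <new_volume> network.remote-dio off
# gluster volume set <new_volume> performance.strict-o-direct on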
Create a new storage domain
- Click the Storage tab and then click New Domain.
- Provide a Name for the domain.
- Set the Domain function to Data.
- Set the Storage Type to GlusterFS.
Check the Use managed gluster volume option.
A list of volumes available in the cluster appears.
- Click OK.
Chapter 2. Configure High Availability using Fencing Policies
Fencing allows a cluster to enforce performance and availability policies and react to unexpected host failures by automatically rebooting virtualization hosts.
Several policies specific to Red Hat Gluster Storage must be enabled to ensure that fencing activities do not disrupt storage services in a Red Hat Hyperconverged (RHHI) Infrastructure deployment.
2.1. Configuring Fencing Policies for Gluster Storage
- In Red Hat Virtualization Manager, click the Clusters tab.
- Click Edit. The Edit Cluster window opens.
- Click the Fencing policy tab.
- Check the Enable fencing checkbox.
Check the checkboxes for at least the following fencing policies:
- Skip fencing if gluster bricks are up
- Skip fencing if gluster quorum not met
See Appendix A, Fencing Policies for Red Hat Gluster Storage for details on the effects of these policies.
- Click OK to save settings.
Chapter 3. Configure Disaster Recovery using Geo-replication
Geo-replication is used to synchronize data from one Red Hat Gluster Storage cluster to another. Synchronizing your data volume from your discrete Red Hat Hyperconverged Infrastructure (RHHI) cluster to a central data center on a regular basis helps ensure you can restore your cluster to a working state after an outage.
3.1. Configuring geo-replication for disaster recovery
3.1.1. Before you begin
Prepare a remote backup volume to hold the geo-replicated copy of your local volume.
Ensure that the volume you want to back up has shared storage enabled.
# gluster volume set all cluster.enable-shared-storage enable
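Enabling shared storage creates a volume named gluster_shared_storage and mounts it on all nodes. As an illustrative check, verify that the volume appears in the volume list:
# gluster volume list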
Ensure that your remote backup volume has sharding enabled.
# gluster volume set VOLNAME features.shard enable
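You can confirm the setting with the following query (illustrative); the Value column should show that sharding is enabled:
# gluster volume get VOLNAME features.shard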
- If encryption is enabled on the storage that you want to back up, encryption must also be enabled on your remote backup volume.
3.1.2. Configuring a geo-replication session
Create (but do not start) the geo-replication session
Using the command line interface, create (but do not start) a geo-replication session from a local volume to the remote backup volume.
See the Red Hat Gluster Storage 3.2 Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-Preparing_to_Deploy_Geo-replication#Setting_Up_the_Environment_for_Geo-replication_Session
Configure a meta-volume for your remote backup
See the Red Hat Gluster Storage 3.2 Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-preparing_to_deploy_geo-replication#chap-Managing_Geo-replication-Meta_Volume.
3.1.3. Configuring synchronization schedule
Verify that geo-replication is configured
In Red Hat Virtualization Manager, click the Volumes tab.
Check the Info column for the geo-replication icon. If present, a geo-replication session is configured for that volume.
- In the Storage Domain tab, select the storage domain to back up.
Click the Remote Data Sync Setup sub-tab
The Setup Remote Data Synchronization window opens.
- In the Geo-replicated to field, select the backup destination.
In the Recurrence field, select a recurrence interval type.
Valid values are WEEKLY with at least one weekday checkbox selected, or DAILY.
In the Hours and Minutes field, specify the time to start synchronizing.
Note: This time is based on the Hosted Engine’s timezone.
- Click OK.
- Check the Events pane at the time you specified to verify that synchronization works correctly.
3.1.4. Deleting synchronization schedule
- In the Storage Domain tab, select the storage domain to back up.
Click the Remote Data Sync Setup sub-tab
The Setup Remote Data Synchronization window opens.
- In the Recurrence field, select a recurrence interval type of NONE.
- Click OK.
(Optional) Remove the geo-replication session
Run the following command from the geo-replication master node:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete
You can also run this command with the reset-sync-time parameter. For further information about this parameter and geo-replication in general, see the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/chap-managing_geo-replication.
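For example, to delete the session and also discard the stored synchronization time, so that a later session re-synchronizes from scratch (a sketch of the parameter's placement):
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL delete reset-sync-time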
Chapter 4. Configure Encryption with Transport Layer Security (TLS/SSL)
Transport Layer Security (TLS/SSL) can be used to encrypt management and storage layer communications between nodes. This helps ensure that your data remains private.
Encryption can be configured using either self-signed certificates or certificates signed by a Certificate Authority.
This document assumes that you want to enable encryption on an existing deployment. However, encryption can also be configured as part of the deployment process. See Deploying Red Hat Hyperconverged Infrastructure for details: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.0/html/deploying_red_hat_hyperconverged_infrastructure/.
4.1. Configuring TLS/SSL using self-signed certificates
Enabling or disabling encryption is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.
Shut down all virtual machines
See Shutting Down a Virtual Machine in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/chap-administrative_tasks.
Move all storage domains except the hosted engine storage domain into Maintenance mode
See Moving Storage Domains to Maintenance Mode in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-storage_tasks.
Move the hosted engine into global maintenance mode
Run the following command on the hypervisor that hosts the hosted engine:
# hosted-engine --set-maintenance --mode=global
Shut down the hosted engine virtual machine
Run the following command on the hypervisor that hosts the hosted engine:
# hosted-engine --vm-shutdown
Verify that the hosted engine has shut down by running the following command:
# hosted-engine --vm-status
Stop all high availability services
Run the following command on all hypervisors:
# systemctl stop ovirt-ha-agent
# systemctl stop ovirt-ha-broker
Unmount the hosted engine storage domain from all hypervisors
# hosted-engine --disconnect-storage
Verify that all volumes are unmounted
On each hypervisor, verify that all gluster volumes are no longer mounted.
# mount
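For example, the following filter (an illustrative variant of the check) should return no entries once all gluster volumes are unmounted:
# mount | grep glusterfs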
Create a gdeploy configuration file
Use the template file in Section B.1, “Example gdeploy configuration file for setting up TLS/SSL” to create a new configuration file that will set up TLS/SSL on your deployment.
Run gdeploy using your new configuration file
On the first physical machine, run gdeploy using the configuration file you created in the previous step:
# gdeploy -c set_up_encryption.conf
This may take some time to complete.
Verify that no TLS/SSL errors occurred
Check the /var/log/glusterfs/glusterd.log file on each physical machine to ensure that no TLS/SSL related errors occurred, and setup completed successfully.
Start all high availability services
Run the following commands on all hypervisors:
# systemctl start ovirt-ha-agent
# systemctl start ovirt-ha-broker
Move the hosted engine out of Global Maintenance mode
# hosted-engine --set-maintenance --mode=none
The hosted engine starts automatically after a short wait.
Wait for nodes to synchronize
Run the following command on the first hypervisor to check synchronization status. If engine status is listed as unknown stale-data, synchronization requires several more minutes to complete.
The following output indicates completed synchronization.
# hosted-engine --vm-status | grep 'Engine status'
Engine status    : {"health": "good", "vm": "up", "detail": "up"}
Engine status    : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Engine status    : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Activate all storage domains
Activate the master storage domain first, followed by all other storage domains.
For details on activating storage domains, see Activating Storage Domains from Maintenance Mode in the Red Hat Virtualization documentation: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-storage_tasks.
Start all virtual machines
See Starting a Virtual Machine in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-starting_the_virtual_machine.
4.2. Configuring TLS/SSL using Certificate Authority signed certificates
Enabling or disabling encryption is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.
Ensure that you have appropriate certificates signed by a Certificate Authority before proceeding. Obtaining certificates is outside the scope of this document, but further details are available in the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
Shut down all virtual machines
See Shutting Down a Virtual Machine in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/chap-administrative_tasks.
Move all storage domains except the hosted engine storage domain into Maintenance mode
See Moving Storage Domains to Maintenance Mode in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-storage_tasks.
Move the hosted engine into global maintenance mode
Run the following command on the hypervisor that hosts the hosted engine:
# hosted-engine --set-maintenance --mode=global
Shut down the hosted engine virtual machine
Run the following command on the hypervisor that hosts the hosted engine:
# hosted-engine --vm-shutdown
Verify that the hosted engine has shut down by running the following command:
# hosted-engine --vm-status
Stop all high availability services
Run the following command on all hypervisors:
# systemctl stop ovirt-ha-agent
# systemctl stop ovirt-ha-broker
Unmount the hosted engine storage domain from all hypervisors
# hosted-engine --disconnect-storage
Verify that all volumes are unmounted
On each hypervisor, verify that all gluster volumes are no longer mounted.
# mount
Configure Certificate Authority signed encryption
Important: Ensure that you have appropriate certificates signed by a Certificate Authority before proceeding. Obtaining certificates is outside the scope of this document.
Place certificates in the following locations on all nodes.
- /etc/ssl/glusterfs.key
- The node’s private key.
- /etc/ssl/glusterfs.pem
- The certificate signed by the Certificate Authority, which becomes the node’s certificate.
- /etc/ssl/glusterfs.ca
- The Certificate Authority’s certificate.
Stop all volumes
# gluster volume stop all
Restart glusterd on all nodes
# systemctl restart glusterd
Enable TLS/SSL encryption on all volumes
# gluster volume set <volname> client.ssl on
# gluster volume set <volname> server.ssl on
Specify access permissions on all hosts
# gluster volume set <volname> auth.ssl-allow "host1,host2,host3"
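If your deployment uses the default RHHI volumes (engine, data, and vmstore), all three options can be applied in one loop; this is a sketch under that assumption, so substitute your own volume and host names:
# for vol in engine data vmstore; do
    gluster volume set $vol client.ssl on
    gluster volume set $vol server.ssl on
    gluster volume set $vol auth.ssl-allow "host1,host2,host3"
  done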
Start all volumes
# gluster volume start all
Verify that no TLS/SSL errors occurred
Check the /var/log/glusterfs/glusterd.log file on each physical machine to ensure that no TLS/SSL related errors occurred, and setup completed successfully.
Start all high availability services
Run the following commands on all hypervisors:
# systemctl start ovirt-ha-agent
# systemctl start ovirt-ha-broker
Move the hosted engine out of Global Maintenance mode
# hosted-engine --set-maintenance --mode=none
The hosted engine starts automatically after a short wait.
Wait for nodes to synchronize
Run the following command on the first hypervisor to check synchronization status. If engine status is listed as unknown stale-data, synchronization requires several more minutes to complete.
The following output indicates completed synchronization.
# hosted-engine --vm-status | grep 'Engine status'
Engine status    : {"health": "good", "vm": "up", "detail": "up"}
Engine status    : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Engine status    : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Activate all storage domains
Activate the master storage domain first, followed by all other storage domains.
For details on activating storage domains, see Activating Storage Domains from Maintenance Mode in the Red Hat Virtualization documentation: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/administration_guide/sect-storage_tasks.
Start all virtual machines
See Starting a Virtual Machine in the Red Hat Virtualization documentation for details: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-starting_the_virtual_machine.
Part II. Maintenance Tasks
Chapter 5. Updating Red Hat Hyperconverged Infrastructure
Updating involves moving from one version of a product to another minor release of the same product version, for example, from Red Hat Virtualization 4.1 to 4.1.3.
Red Hat recommends updating your systems regularly to apply security and bug fixes and take advantage of minor enhancements that are made available between major product releases.
5.1. Update workflow
Red Hat Hyperconverged Infrastructure is a software solution comprised of several different components. Apply updates in the following order to minimize disruption.
- Hosted Engine virtual machine
- Physical hosts
5.2. Before you update
Ensure that your Hosted Engine virtual machine is subscribed to the rhel-7-server-rhv-4.1-rpms and rhel-7-server-rhv-4-tools-rpms repositories.
# subscription-manager repos --enable=rhel-7-server-rhv-4.1-rpms
# subscription-manager repos --enable=rhel-7-server-rhv-4-tools-rpms
Ensure that all physical machines are subscribed to the rhel-7-server-rhvh-4-rpms repository.
# subscription-manager repos --enable=rhel-7-server-rhvh-4-rpms
If geo-replication is configured, ensure that data is not being synchronized.
- Check the Tasks subtab and ensure that there are no ongoing tasks related to Data Synchronization. If data synchronization tasks are present, wait until they are complete before beginning the update.
Stop all geo-replication sessions so that synchronization will not occur during the update. Click the Geo-replication subtab and select the session that you want to stop, then click Stop.
Alternatively, run the following command to stop a geo-replication session.
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
5.3. Updating the Hosted Engine virtual machine
Follow the steps in the following section of the Red Hat Virtualization Upgrade Guide to update the Hosted Engine virtual machine: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/upgrade_guide/chap-updates_between_minor_releases#Upgrading_between_Minor_Releases
5.4. Updating the physical hosts
Follow the steps in the sections linked below to update the physical hosts one at a time.
Between updates, ensure that you wait for any heal operations to complete before updating the next host. You can view heal status in the Bricks subtab. Alternatively, run the following command for every volume, and ensure that Number of entries: 0 is displayed for each brick before updating the next host.
# gluster volume heal VOLNAME info
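Illustrative output for a fully healed brick looks like the following (brick paths and host names will differ in your deployment):
Brick server1:/gluster_bricks/engine/engine
Status: Connected
Number of entries: 0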
Most updates can be applied using Red Hat Virtualization Manager. Follow the steps in the following section of the Red Hat Virtualization Upgrade Guide to update the physical host machines one at a time: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/upgrade_guide/updating_virtualization_hosts.
If you need to apply a security fix, apply updates manually instead. Follow the steps in the following section of the Red Hat Virtualization Upgrade Guide: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/upgrade_guide/Manually_Updating_Virtualization_Hosts
Remember to move your hosts out of maintenance mode when their updates have been applied by running the following command:
# hosted-engine --set-maintenance --mode=none
Chapter 6. Replacing the Primary Gluster Storage Node
When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.
- (Optional) If encryption using a Certificate Authority is enabled, follow the steps at the following link before continuing: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch22s04.
Move the node to be replaced into Maintenance mode
- In Red Hat Virtualization Manager, click the Hosts tab and select the Red Hat Gluster Storage node in the results list.
- Click Maintenance to open the Maintenance Host(s) confirmation window.
- Click OK to move the host to Maintenance mode.
Install the replacement node
Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure to install the physical machine and configure storage on the new node.
Prepare the replacement node
- Create a file called replace_node_prep.conf based on the template provided in Section B.2, “Example gdeploy configuration file for preparing to replace a node”.
From a node with gdeploy installed (usually the node that hosts the Hosted Engine), run gdeploy using the new configuration file:
# gdeploy -c replace_node_prep.conf
(Optional) If encryption with self-signed certificates is enabled
- Generate the private key and self-signed certificate on the replacement node. See the Red Hat Gluster Storage Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
On a healthy node, make a backup copy of the /etc/ssl/glusterfs.ca file:
# cp /etc/ssl/glusterfs.ca /etc/ssl/glusterfs.ca.bk
- Append the new node’s certificate to the content of the /etc/ssl/glusterfs.ca file.
- Distribute the /etc/ssl/glusterfs.ca file to all nodes in the cluster, including the new node.
Run the following command on the replacement node to enable management encryption:
# touch /var/lib/glusterd/secure-access
Include the new server in the value of the auth.ssl-allow volume option by running the following command for each volume.
# gluster volume set <volname> auth.ssl-allow "<old_node1>,<old_node2>,<new_node>"
Restart the glusterd service on all nodes
# systemctl restart glusterd
- Follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
Add the replacement node to the cluster
Run the following command from any node already in the cluster.
# gluster peer probe <new_node>
Move the Hosted Engine into Maintenance mode:
# hosted-engine --set-maintenance --mode=global
Stop the ovirt-engine service
# systemctl stop ovirt-engine
Update the database
# sudo -u postgres psql
\c engine
UPDATE storage_server_connections SET connection ='<replacement_node_IP>:/engine' WHERE connection = '<old_server_IP>:/engine';
UPDATE storage_server_connections SET connection ='<replacement_node_IP>:/vmstore' WHERE connection = '<old_server_IP>:/vmstore';
UPDATE storage_server_connections SET connection ='<replacement_node_IP>:/data' WHERE connection = '<old_server_IP>:/data';
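To confirm the updates before exiting psql, you can, for example, list the stored connections (an illustrative query that is not part of the original procedure):
engine=# SELECT connection FROM storage_server_connections;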
Start the ovirt-engine service
# systemctl start ovirt-engine
- Stop all virtual machines except the Hosted Engine.
- Move all storage domains except the Hosted Engine domain into Maintenance mode
Stop the Hosted Engine virtual machine
Run the following command on the existing node that hosts the Hosted Engine.
# hosted-engine --vm-shutdown
Stop high availability services on all nodes
# systemctl stop ovirt-ha-agent
# systemctl stop ovirt-ha-broker
Disconnect Hosted Engine storage from the hypervisor
Run the following command on the existing node that hosts the Hosted Engine.
# hosted-engine --disconnect-storage
Update the Hosted Engine configuration file
Edit the storage parameter in the /etc/ovirt-hosted-engine/hosted-engine.conf file to use the replacement server.
storage=<replacement_server_IP>:/engine
Reboot the existing and replacement nodes
Wait until both nodes are available before continuing.
Take the Hosted Engine out of Maintenance mode
# hosted-engine --set-maintenance --mode=none
Verify replacement node is used
On all virtualization hosts, verify that the engine volume is mounted from the replacement node by checking the IP address in the output of the mount command.
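For example, the following illustrative filter shows only the engine mount so that the source address is easy to check:
# mount | grep ':/engine'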
Activate storage domains
Verify that storage domains mount using the IP address of the replacement node.
Remove the old node
- Using the RHV Management UI, remove the old node.
Detach the old host from the cluster.
# gluster peer detach <old_node_IP> force
Using the RHV Management UI, add the replacement node
Specify that the replacement node be used to host the Hosted Engine.
Move the replacement node into Maintenance mode.
# hosted-engine --set-maintenance --mode=global
Update the Hosted Engine configuration file
Edit the storage parameter in the /etc/ovirt-hosted-engine/hosted-engine.conf file to use the replacement node.
storage=<replacement_node_IP>:/engine
Reboot the replacement node.
Wait until the node is back online before continuing.
Activate the replacement node from the RHV Management UI.
Ensure that all volumes are mounted using the IP address of the replacement node.
Replace engine volume brick
Replace the brick on the old node that belongs to the engine volume with a new brick on the replacement node.
- Click the Volumes tab.
- Click the Bricks sub-tab.
- Select the brick to replace, and then click Replace brick.
- Select the node that hosts the brick being replaced.
- In the Replace brick window, provide the new brick’s path.
On the replacement node, run the following command to remove metadata from the previous host.
# hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean
Chapter 7. Replacing a Gluster Storage Node
If a Red Hat Gluster Storage node needs to be replaced, there are two options for the replacement node:
- Replace the node with a new node that has a different fully-qualified domain name by following the instructions in Section 7.1, “Replacing a Gluster Storage Node (Different FQDN)”.
- Replace the node with a new node that has the same fully-qualified domain name by following the instructions in Section 7.2, “Replacing a Gluster Storage Node (Same FQDN)”.
Follow the instructions in whichever section is appropriate for your deployment.
7.1. Replacing a Gluster Storage Node (Different FQDN)
When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.
Prepare the replacement node
Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure to install the physical machine.
Stop any existing geo-replication sessions
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
For further information, see the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-starting_geo-replication#Stopping_a_Geo-replication_Session.
Move the node to be replaced into Maintenance mode
Perform the following steps in Red Hat Virtualization Manager:
- Click the Hosts tab and select the Red Hat Gluster Storage node in the results list.
- Click Maintenance to open the Maintenance Host(s) confirmation window.
- Click OK to move the host to Maintenance mode.
Prepare the replacement node
Configure key-based SSH authentication
Configure key-based SSH authentication from a physical machine still in the cluster to the replacement node. For details, see https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.0/html/deploying_red_hat_hyperconverged_infrastructure/task-configure-key-based-ssh-auth.
Prepare the replacement node
Create a file called replace_node_prep.conf based on the template provided in Section B.2, “Example gdeploy configuration file for preparing to replace a node”.
From a node with gdeploy installed (usually the node that hosts the Hosted Engine), run gdeploy using the new configuration file:
# gdeploy -c replace_node_prep.conf
Create replacement brick directories
Ensure the new directories are owned by the vdsm user and the kvm group.
# mkdir /gluster_bricks/engine/engine
# chown vdsm:kvm /gluster_bricks/engine/engine
# mkdir /gluster_bricks/data/data
# chown vdsm:kvm /gluster_bricks/data/data
# mkdir /gluster_bricks/vmstore/vmstore
# chown vdsm:kvm /gluster_bricks/vmstore/vmstore
(Optional) If encryption is enabled
Generate the private key and self-signed certificate on the new server using the steps in the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
If encryption using a Certificate Authority is enabled, follow the steps at the following link before continuing: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch22s04.
Add the new node’s certificate to existing certificates.
- On one of the healthy nodes, make a backup copy of the /etc/ssl/glusterfs.ca file.
- Add the new node’s certificate to the /etc/ssl/glusterfs.ca file on the healthy node.
- Distribute the updated /etc/ssl/glusterfs.ca file to all other nodes, including the new node.
Enable management encryption
Run the following command on the new node to enable management encryption:
# touch /var/lib/glusterd/secure-access
Include the new server in the value of the auth.ssl-allow volume option by running the following command for each volume.
# gluster volume set <volname> auth.ssl-allow "<old_node1>,<old_node2>,<new_node>"
Restart the glusterd service on all nodes
# systemctl restart glusterd
- If encryption uses self-signed certificates, follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
Add the new host to the existing cluster
Run the following command from one of the healthy cluster members:
# gluster peer probe <new_node>
Add the new host to the existing cluster
- Click the Hosts tab and then click New to open the New Host dialog.
- Provide a Name, Address, and Password for the new host.
- Uncheck the Automatically configure host firewall checkbox, as firewall rules are already configured by gdeploy.
- In the Hosted Engine tab of the New Host dialog, set the value of Choose hosted engine deployment action to deploy.
- Click Deploy.
- When the host is available, click the Network Interfaces subtab and then click Setup Host Networks.
Drag and drop the network you created for gluster to the IP associated with this host, and click OK.
See the Red Hat Virtualization 4.1 Self-Hosted Engine Guide for further details: https://access.redhat.com/documentation/en/red-hat-virtualization/4.1/paged/self-hosted-engine-guide/chapter-7-installing-additional-hosts-to-a-self-hosted-environment.
Configure and mount shared storage on the new host
# cp /etc/fstab /etc/fstab.bk
# echo "<new_host>:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults 0 0" >> /etc/fstab
# mount /var/run/gluster/shared_storage
Replace the old brick with the brick on the new host
- In Red Hat Virtualization Manager, click the Volumes tab and select the volume.
- Click the Bricks sub-tab.
- Click Replace Brick beside the old brick and specify the replacement brick.
- Verify that brick heal completes successfully.
In the Hosts tab, right-click on the old host and click Remove.
Use gluster peer status to verify that the old host no longer appears. If the old host is still present in the status output, run the following command to forcibly remove it:
# gluster peer detach <old_node> force
Clean old host metadata
# hosted-engine --clean-metadata --host-id=<old_host_id> --force-clean
Set up new SSH keys for geo-replication of new brick
# gluster system:: execute gsec_create
Recreate geo-replication session and distribute new SSH keys.
# gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> create push-pem force
Start the geo-replication session.
# gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> start
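You can then confirm that the session is active, for example by checking its status:
# gluster volume geo-replication <MASTER_VOL> <SLAVE_HOST>::<SLAVE_VOL> status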
7.2. Replacing a Gluster Storage Node (Same FQDN)
When self-signed encryption is enabled, replacing a node is a disruptive process that requires virtual machines and the Hosted Engine to be shut down.
- (Optional) If encryption using a Certificate Authority is enabled, follow the steps at the following link before continuing: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/ch22s04.
Move the node to be replaced into Maintenance mode
- In Red Hat Virtualization Manager, click the Hosts tab and select the Red Hat Gluster Storage node in the results list.
- Click Maintenance to open the Maintenance Host(s) confirmation window.
- Click OK to move the host to Maintenance mode.
Prepare the replacement node
Follow the instructions in Deploying Red Hat Hyperconverged Infrastructure to install the physical machine and configure storage on the new node.
Prepare the replacement node
- Create a file called replace_node_prep.conf based on the template provided in Section B.2, “Example gdeploy configuration file for preparing to replace a node”.
From a node with gdeploy installed (usually the node that hosts the Hosted Engine), run gdeploy using the new configuration file:
# gdeploy -c replace_node_prep.conf
(Optional) If encryption with self-signed certificates is enabled
- Generate the private key and self-signed certificate on the replacement node. See the Red Hat Gluster Storage Administration Guide for details: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/chap-network_encryption#chap-Network_Encryption-Prereqs.
On a healthy node, make a backup copy of the /etc/ssl/glusterfs.ca file:
# cp /etc/ssl/glusterfs.ca /etc/ssl/glusterfs.ca.bk
- Append the new node’s certificate to the content of the /etc/ssl/glusterfs.ca file.
- Distribute the /etc/ssl/glusterfs.ca file to all nodes in the cluster, including the new node.
Run the following command on the replacement node to enable management encryption:
# touch /var/lib/glusterd/secure-access
Replace the host machine
Follow the instructions in the Red Hat Gluster Storage Administration Guide to replace the host: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html/administration_guide/sect-replacing_hosts#Replacing_a_Host_Machine_with_the_Same_Hostname.
Restart the glusterd service on all nodes
# systemctl restart glusterd
Verify that all nodes reconnect
# gluster peer status
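Illustrative output for a healthy cluster as seen from one node (host names and UUIDs will differ):
Number of Peers: 2

Hostname: server2.example.com
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3.example.com
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)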
- (Optional) If encryption uses self-signed certificates, follow the steps in Section 4.1, “Configuring TLS/SSL using self-signed certificates” to remount all gluster processes.
Verify that all nodes reconnect and that brick heal completes successfully
# gluster peer status
Refresh fingerprint
- In Red Hat Virtualization Manager, click the Hosts tab and select the new host.
- Click Edit Host.
- Click Advanced on the details screen.
- Click Fetch fingerprint.
- Click Reinstall and provide the root password when prompted.
- Click the Hosted Engine tab and click Deploy.
Attach the gluster network to the host
- Click the Hosts tab and select the host.
- Click the Network Interfaces subtab and then click Setup Host Networks.
- Drag and drop the newly created network to the correct interface.
- Ensure that the Verify connectivity checkbox is checked.
- Ensure that the Save network configuration checkbox is checked.
- Click OK to save.
Verify the health of the network
Click the Hosts tab and select the host. Click the Networks subtab and check the state of the host’s network.
If the network interface enters an "Out of sync" state or does not have an IPv4 Address, click the Management tab that corresponds to the host and click Refresh Capabilities.
Chapter 8. Restoring a volume from a geo-replicated backup
Install and configure a replacement Hyperconverged Infrastructure deployment
For instructions, refer to Deploying Red Hat Hyperconverged Infrastructure: https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure/1.0/html/deploying_red_hat_hyperconverged_infrastructure/.
Import the backup of the storage domain
From the new Hyperconverged Infrastructure deployment, in Red Hat Virtualization Manager:
- Click the Storage tab.
- Click Import Domain. The Import Pre-Configured Domain window opens.
- In the Storage Type field, specify GlusterFS.
- In the Name field, specify a name for the new volume that will be created from the backup volume.
- In the Path field, specify the path to the backup volume.
Click OK. The following warning appears, with any active data centers listed below:
This operation might be unrecoverable and destructive! Storage Domain(s) are already attached to a Data Center. Approving this operation might cause data corruption if both Data Centers are active.
- Check the Approve operation checkbox and click OK.
Determine a list of virtual machines to import
Determine the imported domain’s identifier
The following command returns the domain identifier.
# curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/
For example:
# curl -v -k -X GET -u "admin@example.com:mybadpassword" -H "Accept: application/xml" https://10.70.37.140/ovirt-engine/api/storagedomains/
Determine the list of unregistered disks by running the following command:
# curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" "https://$ENGINE_FQDN/ovirt-engine/api/storagedomains/DOMAIN_ID/vms;unregistered"
For example:
# curl -v -k -X GET -u "admin@example.com:mybadpassword" -H "Accept: application/xml" "https://10.70.37.140/ovirt-engine/api/storagedomains/5e1a37cf-933d-424c-8e3d-eb9e40b690a7/vms;unregistered"
Perform a partial import of each virtual machine to the storage domain
Determine cluster identifier
The following command returns the cluster identifier.
# curl -v -k -X GET -u "admin@internal:password" -H "Accept: application/xml" https://$ENGINE_FQDN/ovirt-engine/api/clusters/
For example:
# curl -v -k -X GET -u "admin@example:mybadpassword" -H "Accept: application/xml" https://10.70.37.140/ovirt-engine/api/clusters/
Import the virtual machines
The following command imports a virtual machine without requiring all disks to be available in the storage domain.
# curl -v -k -u 'admin@internal:password' -H "Content-type: application/xml" -d '<action> <cluster id="CLUSTER_ID"></cluster> <allow_partial_import>true</allow_partial_import> </action>' "https://ENGINE_FQDN/ovirt-engine/api/storagedomains/DOMAIN_ID/vms/VM_ID/register"
For example:
# curl -v -k -u 'admin@example.com:mybadpassword' -H "Content-type: application/xml" -d '<action> <cluster id="bf5a9e9e-5b52-4b0d-aeba-4ee4493f1072"></cluster> <allow_partial_import>true</allow_partial_import> </action>' "https://10.70.37.140/ovirt-engine/api/storagedomains/8d21980a-a50b-45e9-9f32-cd8d2424882e/vms/e164f8c6-769a-4cbd-ac2a-ef322c2c5f30/register"
For further information, see the Red Hat Virtualization REST API Guide: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/rest_api_guide/.
Migrate the partially imported disks to the new storage domain
On the Disks tab, click on the Move Disk option. Move the imported disks from the synced volume to the replacement cluster’s storage domain. For further information, see the Red Hat Virtualization Administration Guide.
Attach the restored disks to the new virtual machines
Follow the instructions in the Red Hat Virtualization Virtual Machine Management Guide to attach the replacement disks to each virtual machine.
Part III. Reference Material
Appendix A. Fencing Policies for Red Hat Gluster Storage
The following fencing policies are required for Red Hat Hyperconverged Infrastructure (RHHI) deployments. They ensure that hosts are not shut down in situations where brick processes are still running, or when shutting down the host would remove the cluster’s ability to reach a quorum.
These policies can be set in the New Cluster or Edit Cluster window in Red Hat Virtualization Manager when Red Hat Gluster Storage functionality is enabled.
- Skip fencing if gluster bricks are up
- Fencing is skipped if bricks are running and can be reached from other peers.
- Skip fencing if gluster quorum not met
- Fencing is skipped if bricks are running and shutting down the host will cause loss of quorum.
These policies are checked after all other fencing policies when determining whether a node is fenced.
Additional fencing policies may be useful for your deployment. For further details about fencing, see the Red Hat Virtualization Technical Reference: https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/technical_reference/fencing.
Appendix B. Example gdeploy configuration files
B.1. Example gdeploy configuration file for setting up TLS/SSL
set_up_encryption.conf
# IPs that correspond to the Gluster Network
[hosts]
<Gluster_Network_NodeA>
<Gluster_Network_NodeB>
<Gluster_Network_NodeC>

# STEP-1: Generate Keys, Certificates & CA files
# The following section generates the keys, certificates, creates
# ca file and distributes it to all the hosts
[volume1]
action=enable-ssl
volname=engine
ssl_clients=<Gluster_Network_NodeA>,<Gluster_Network_NodeB>,<Gluster_Network_NodeC>
ignore_volume_errors=no

# As the certificates are already generated, it's enough to stop
# the rest of the volumes, set TLS/SSL related volume options, and
# start the volumes

# STEP-2: Stop all the volumes
[volume2]
action=stop
volname=vmstore

[volume3]
action=stop
volname=data

# STEP-3: Set volume options on all the volumes to enable TLS/SSL on the volumes
[volume4]
action=set
volname=vmstore
key=client.ssl,server.ssl,auth.ssl-allow
value=on,on,"<Gluster_Network_NodeA>;<Gluster_Network_NodeB>;<Gluster_Network_NodeC>"
ignore_volume_errors=no

[volume5]
action=set
volname=data
key=client.ssl,server.ssl,auth.ssl-allow
value=on,on,"<Gluster_Network_NodeA>;<Gluster_Network_NodeB>;<Gluster_Network_NodeC>"
ignore_volume_errors=no

# STEP-4: Start all the volumes
[volume6]
action=start
volname=vmstore

[volume7]
action=start
volname=data
B.2. Example gdeploy configuration file for preparing to replace a node
If the disks must be replaced as well as the node, ensure that the [pv], [vg], and [lv] sections are not commented out of this file.
For details about how to safely replace a node, see Chapter 7, Replacing a Gluster Storage Node.
replace_node_prep.conf
# EDITME: @1: Change to IP addresses of the network intended for gluster traffic
# Values provided here are used to probe the gluster hosts.
[hosts]
10.70.X1.Y1

#EDITME : @2: Change to IP addresses of the network intended for gluster traffic
#of the node which is going to be replaced.
[script1]
action=execute
ignore_script_errors=no
file=/usr/share/ansible/gdeploy/scripts/grafton-sanity-check.sh -d sdc -h 10.70.X1.Y1

# EDITME: @3: Specify the number of data disks in RAID configuration
[disktype]
raid6

[diskcount]
4

[stripesize]
256

# EDITME : @4: UNCOMMENT SECTION (RHEL ONLY) :Provide the subscription details
# Register to RHSM only on the node which needs to be replaced
#[RH-subscription1:10.70.X1.Y1]
#action=register
#username=<username>
#password=<passwd>
#pool=<pool-id>

#[RH-subscription2]
#action=disable-repos
#repos=

#[RH-subscription3]
#action=enable-repos
#repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rhel-7-server-rhv-4-mgmt-agent-rpms

#[yum1]
#action=install
#packages=vdsm,vdsm-gluster,ovirt-hosted-engine-setup,screen,gluster-nagios-addons
#update=yes

[service1]
action=enable
service=ntpd

[service2]
action=restart
service=ntpd

[shell1]
action=execute
command=gluster pool list

[shell2]
action=execute
command=vdsm-tool configure --force

# Disable multipath
[script3]
action=execute
file=/usr/share/ansible/gdeploy/scripts/disable-multipath.sh

#EDIT ME: @5: UNCOMMENT SECTIONS ONLY: if original brick disks have to be replaced.
#[pv1]
#action=create
#devices=sdc
#ignore_pv_errors=no

#[vg1]
#action=create
#vgname=gluster_vg_sdc
#pvname=sdc
#ignore_vg_errors=no

#[lv2:10.70.X1:Y1]
#action=create
#poolname=gluster_thinpool_sdc
#ignore_lv_errors=no
#vgname=gluster_vg_sdc
#lvtype=thinpool
#poolmetadatasize=16GB
#size=14TB

#[lv3:10.70.X1:Y1]
#action=create
#lvname=gluster_lv_engine
#ignore_lv_errors=no
#vgname=gluster_vg_sdc
#mount=/gluster_bricks/engine
#size=100GB
#lvtype=thick

#[lv5:10.70.X1:Y1]
#action=create
#lvname=gluster_lv_data
#ignore_lv_errors=no
#vgname=gluster_vg_sdc
#mount=/gluster_bricks/data
#lvtype=thinlv
#poolname=gluster_thinpool_sdc
#virtualsize=12TB

#[lv7:10.70.X1:Y1]
#action=create
#lvname=gluster_lv_vmstore
#ignore_lv_errors=no
#vgname=gluster_vg_sdc
#mount=/gluster_bricks/vmstore
#lvtype=thinlv
#poolname=gluster_thinpool_sdc
#virtualsize=1TB

#[selinux]
#yes

#[lv9:10.70.X1:Y1]
#action=setup-cache
#ssd=sdb
#vgname=gluster_vg_sdc
#poolname=lvthinpool
#cache_lv=lvcache
#cache_lvsize=180GB

[service3]
action=start
service=glusterd
slice_setup=yes

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,54322/tcp
services=glusterfs

[script2]
action=execute
file=/usr/share/ansible/gdeploy/scripts/disable-gluster-hooks.sh
B.3. Example gdeploy configuration file for scaling to additional nodes
add_nodes.conf
# Add the hosts to be added
[hosts]
<Gluster_Network_NodeD>
<Gluster_Network_NodeE>
<Gluster_Network_NodeF>

# If using RHEL 7 as platform, enable required repos
# RHVH has all the packages available
#[RH-subscription]
#ignore_register_errors=no
#ignore_attach_pool_errors=no
#ignore_enable_errors=no
#action=register
#username=<username>
#password=<mypassword>
#pool=<pool-id>
#repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rhel-7-server-rhv-4-mgmt-agent-rpms
#disable-repos=yes

# If using RHEL 7 as platform, have the following section to install packages
[yum1]
action=install
packages=vdsm-gluster,ovirt-hosted-engine-setup,screen
update=yes
gpgcheck=yes
ignore_yum_errors=no

# enable NTP
[service1]
action=enable
service=ntpd

# start NTP service
[service2]
action=restart
service=ntpd

# Setup glusterfs slice
[service3]
action=restart
service=glusterd
slice_setup=yes

# Open the required ports and firewalld services
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,54322/tcp
services=glusterfs

# Disable gluster hook scripts
[script2]
action=execute
file=/usr/share/ansible/gdeploy/scripts/disable-gluster-hooks.sh