Chapter 4. Configuring a Red Hat High Availability cluster on Microsoft Azure
To create a cluster where RHEL nodes automatically redistribute their workloads if a node failure occurs, use the Red Hat High Availability Add-On. Such high availability (HA) clusters can also be hosted on public cloud platforms, including Microsoft Azure. Creating RHEL HA clusters on Azure is similar to creating HA clusters in non-cloud environments, with certain specifics.
To configure a Red Hat HA cluster on Azure using Azure virtual machine (VM) instances as cluster nodes, see the following sections. The procedures in these sections assume that you are creating a custom image for Azure. You have a number of options for obtaining the RHEL 9 images you use for your cluster. See Red Hat Enterprise Linux Image Options on Azure for information on image options for Azure.
The following sections provide:
- Prerequisite procedures for setting up your environment for Azure. After you set up your environment, you can create and configure Azure VM instances.
- Procedures specific to the creation of HA clusters, which transform individual nodes into a cluster of HA nodes on Azure. These include procedures for installing the High Availability packages and agents on each cluster node, configuring fencing, and installing Azure network resource agents.
Prerequisites
- Sign up for a Red Hat Customer Portal account.
- Sign up for a Microsoft Azure account with administrator privileges.
- You need to install Azure command line interface (CLI). For more information, see Installing the Azure CLI.
4.1. The benefits of using high-availability clusters on public cloud platforms
A high-availability (HA) cluster is a set of computers (called nodes) that are linked together to run a specific workload. The purpose of HA clusters is to provide redundancy in case of a hardware or software failure. If a node in the HA cluster fails, the Pacemaker cluster resource manager distributes the workload to other nodes and no noticeable downtime occurs in the services that are running on the cluster.
You can also run HA clusters on public cloud platforms. In this case, you would use virtual machine (VM) instances in the cloud as the individual cluster nodes. Using HA clusters on a public cloud platform has the following benefits:
- Improved availability: In case of a VM failure, the workload is quickly redistributed to other nodes, so running services are not disrupted.
- Scalability: Additional nodes can be started when demand is high and stopped when demand is low.
- Cost-effectiveness: With the pay-as-you-go pricing, you pay only for nodes that are running.
- Simplified management: Some public cloud platforms offer management interfaces to make configuring HA clusters easier.
To enable HA on your Red Hat Enterprise Linux (RHEL) systems, Red Hat offers a High Availability Add-On. The High Availability Add-On provides all necessary components for creating HA clusters on RHEL systems. The components include high availability service management and cluster administration tools.
Additional resources
4.2. Creating resources in Azure
Complete the following procedure to create a region, resource group, storage account, virtual network, and availability set. You need these resources to set up a cluster on Microsoft Azure.
Procedure
Authenticate your system with Azure and log in.
$ az login
NoteIf a browser is available in your environment, the CLI opens your browser to the Azure sign-in page.
Example:
[clouduser@localhost]$ az login To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code FDMSCMETZ to authenticate. [ { "cloudName": "AzureCloud", "id": "Subscription ID", "isDefault": true, "name": "MySubscriptionName", "state": "Enabled", "tenantId": "Tenant ID", "user": { "name": "clouduser@company.com", "type": "user" } } ]
Create a resource group in an Azure region.
$ az group create --name resource-group --location azure-region
Example:
[clouduser@localhost]$ az group create --name azrhelclirsgrp --location southcentralus { "id": "/subscriptions//resourceGroups/azrhelclirsgrp", "location": "southcentralus", "managedBy": null, "name": "azrhelclirsgrp", "properties": { "provisioningState": "Succeeded" }, "tags": null }
Create a storage account.
$ az storage account create -l azure-region -n storage-account-name -g resource-group --sku sku_type --kind StorageV2
Example:
[clouduser@localhost]$ az storage account create -l southcentralus -n azrhelclistact -g azrhelclirsgrp --sku Standard_LRS --kind StorageV2 { "accessTier": null, "creationTime": "2017-04-05T19:10:29.855470+00:00", "customDomain": null, "encryption": null, "id": "/subscriptions//resourceGroups/azrhelclirsgrp/providers/Microsoft.Storage/storageAccounts/azrhelclistact", "kind": "StorageV2", "lastGeoFailoverTime": null, "location": "southcentralus", "name": "azrhelclistact", "primaryEndpoints": { "blob": "https://azrhelclistact.blob.core.windows.net/", "file": "https://azrhelclistact.file.core.windows.net/", "queue": "https://azrhelclistact.queue.core.windows.net/", "table": "https://azrhelclistact.table.core.windows.net/" }, "primaryLocation": "southcentralus", "provisioningState": "Succeeded", "resourceGroup": "azrhelclirsgrp", "secondaryEndpoints": null, "secondaryLocation": null, "sku": { "name": "Standard_LRS", "tier": "Standard" }, "statusOfPrimary": "available", "statusOfSecondary": null, "tags": {}, "type": "Microsoft.Storage/storageAccounts" }
Get the storage account connection string.
$ az storage account show-connection-string -n storage-account-name -g resource-group
Example:
[clouduser@localhost]$ az storage account show-connection-string -n azrhelclistact -g azrhelclirsgrp { "connectionString": "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=azrhelclistact;AccountKey=NreGk...==" }
Export the connection string by copying the connection string and pasting it into the following command. This string connects your system to the storage account.
$ export AZURE_STORAGE_CONNECTION_STRING="storage-connection-string"
Example:
[clouduser@localhost]$ export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=azrhelclistact;AccountKey=NreGk...=="
Create the storage container.
$ az storage container create -n container-name
Example:
[clouduser@localhost]$ az storage container create -n azrhelclistcont { "created": true }
Create a virtual network. All cluster nodes must be in the same virtual network.
$ az network vnet create -g resource group --name vnet-name --subnet-name subnet-name
Example:
[clouduser@localhost]$ az network vnet create --resource-group azrhelclirsgrp --name azrhelclivnet1 --subnet-name azrhelclisubnet1 { "newVNet": { "addressSpace": { "addressPrefixes": [ "10.0.0.0/16" ] }, "dhcpOptions": { "dnsServers": [] }, "etag": "W/\"\"", "id": "/subscriptions//resourceGroups/azrhelclirsgrp/providers/Microsoft.Network/virtualNetworks/azrhelclivnet1", "location": "southcentralus", "name": "azrhelclivnet1", "provisioningState": "Succeeded", "resourceGroup": "azrhelclirsgrp", "resourceGuid": "0f25efee-e2a6-4abe-a4e9-817061ee1e79", "subnets": [ { "addressPrefix": "10.0.0.0/24", "etag": "W/\"\"", "id": "/subscriptions//resourceGroups/azrhelclirsgrp/providers/Microsoft.Network/virtualNetworks/azrhelclivnet1/subnets/azrhelclisubnet1", "ipConfigurations": null, "name": "azrhelclisubnet1", "networkSecurityGroup": null, "provisioningState": "Succeeded", "resourceGroup": "azrhelclirsgrp", "resourceNavigationLinks": null, "routeTable": null } ], "tags": {}, "type": "Microsoft.Network/virtualNetworks", "virtualNetworkPeerings": null } }
Create an availability set. All cluster nodes must be in the same availability set.
$ az vm availability-set create --name MyAvailabilitySet --resource-group MyResourceGroup
Example:
[clouduser@localhost]$ az vm availability-set create --name rhelha-avset1 --resource-group azrhelclirsgrp { "additionalProperties": {}, "id": "/subscriptions/.../resourceGroups/azrhelclirsgrp/providers/Microsoft.Compute/availabilitySets/rhelha-avset1", "location": "southcentralus", "name": “rhelha-avset1", "platformFaultDomainCount": 2, "platformUpdateDomainCount": 5, [omitted]
Additional resources
4.3. Required system packages for High Availability
The procedure assumes you are creating a VM image for Azure HA that uses Red Hat Enterprise Linux. To successfully complete the procedure, the following packages must be installed.
Package | Repository | Description |
---|---|---|
libvirt | rhel-9-for-x86_64-appstream-rpms | Open source API, daemon, and management tool for managing platform virtualization |
virt-install | rhel-9-for-x86_64-appstream-rpms | A command-line utility for building VMs |
libguestfs | rhel-9-for-x86_64-appstream-rpms | A library for accessing and modifying VM file systems |
guestfs-tools | rhel-9-for-x86_64-appstream-rpms |
System administration tools for VMs; includes the |
4.4. Azure VM configuration settings
Azure VMs must have the following configuration settings. Some of these settings are enabled during the initial VM creation. Other settings are set when provisioning the VM image for Azure. Keep these settings in mind as you move through the procedures. Refer to them as necessary.
Setting | Recommendation |
---|---|
ssh | ssh must be enabled to provide remote access to your Azure VMs. |
dhcp | The primary virtual adapter should be configured for dhcp (IPv4 only). |
Swap Space | Do not create a dedicated swap file or swap partition. You can configure swap space with the Windows Azure Linux Agent (WALinuxAgent). |
NIC | Choose virtio for the primary virtual network adapter. |
encryption | For custom images, use Network Bound Disk Encryption (NBDE) for full disk encryption on Azure. |
4.5. Installing Hyper-V device drivers
Microsoft provides network and storage device drivers as part of their Linux Integration Services (LIS) for Hyper-V package. You may need to install Hyper-V device drivers on the VM image prior to provisioning it as an Azure virtual machine (VM). Use the lsinitrd | grep hv
command to verify that the drivers are installed.
Procedure
Enter the following
grep
command to determine if the required Hyper-V device drivers are installed.# lsinitrd | grep hv
In the example below, all required drivers are installed.
# lsinitrd | grep hv drwxr-xr-x 2 root root 0 Aug 12 14:21 usr/lib/modules/3.10.0-932.el9.x86_64/kernel/drivers/hv -rw-r--r-- 1 root root 31272 Aug 11 08:45 usr/lib/modules/3.10.0-932.el9.x86_64/kernel/drivers/hv/hv_vmbus.ko.xz -rw-r--r-- 1 root root 25132 Aug 11 08:46 usr/lib/modules/3.10.0-932.el9.x86_64/kernel/drivers/net/hyperv/hv_netvsc.ko.xz -rw-r--r-- 1 root root 9796 Aug 11 08:45 usr/lib/modules/3.10.0-932.el9.x86_64/kernel/drivers/scsi/hv_storvsc.ko.xz
If all the drivers are not installed, complete the remaining steps.
NoteAn
hv_vmbus
driver may exist in the environment. Even if this driver is present, complete the following steps.-
Create a file named
hv.conf
in/etc/dracut.conf.d
. Add the following driver parameters to the
hv.conf
file.add_drivers+=" hv_vmbus " add_drivers+=" hv_netvsc " add_drivers+=" hv_storvsc " add_drivers+=" nvme "
NoteNote the spaces before and after the quotes, for example,
add_drivers+=" hv_vmbus "
. This ensures that unique drivers are loaded in the event that other Hyper-V drivers already exist in the environment.Regenerate the
initramfs
image.# dracut -f -v --regenerate-all
Verification
- Reboot the machine.
-
Run the
lsinitrd | grep hv
command to verify that the drivers are installed.
4.6. Making configuration changes required for a Microsoft Azure deployment
Before you deploy your custom base image to Azure, you must perform additional configuration changes to ensure that the virtual machine (VM) can properly operate in Azure.
Procedure
- Log in to the VM.
Register the VM, and enable the Red Hat Enterprise Linux 9 repository.
# subscription-manager register --auto-attach Installed Product Current Status: Product Name: Red Hat Enterprise Linux for x86_64 Status: Subscribed
Ensure that the
cloud-init
andhyperv-daemons
packages are installed.# dnf install cloud-init hyperv-daemons -y
Create
cloud-init
configuration files that are needed for integration with Azure services:To enable logging to the Hyper-V Data Exchange Service (KVP), create the
/etc/cloud/cloud.cfg.d/10-azure-kvp.cfg
configuration file and add the following lines to that file.reporting: logging: type: log telemetry: type: hyperv
To add Azure as a datasource, create the
/etc/cloud/cloud.cfg.d/91-azure_datasource.cfg
configuration file, and add the following lines to that file.datasource_list: [ Azure ] datasource: Azure: apply_network_config: False
To ensure that specific kernel modules are blocked from loading automatically, edit or create the
/etc/modprobe.d/blocklist.conf
file and add the following lines to that file.blacklist nouveau blacklist lbm-nouveau blacklist floppy blacklist amdgpu blacklist skx_edac blacklist intel_cstate
Modify
udev
network device rules:Remove the following persistent network device rules if present.
# rm -f /etc/udev/rules.d/70-persistent-net.rules # rm -f /etc/udev/rules.d/75-persistent-net-generator.rules # rm -f /etc/udev/rules.d/80-net-name-slot-rules
To ensure that Accelerated Networking on Azure works as intended, create a new network device rule
/etc/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
and add the following line to it.SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add", ENV{NM_UNMANAGED}="1"
Set the
sshd
service to start automatically.# systemctl enable sshd # systemctl is-enabled sshd
Modify kernel boot parameters:
Open the
/etc/default/grub
file, and ensure theGRUB_TIMEOUT
line has the following value.GRUB_TIMEOUT=10
Remove the following options from the end of the
GRUB_CMDLINE_LINUX
line if present.rhgb quiet
Ensure the
/etc/default/grub
file contains the following lines with all the specified options.GRUB_CMDLINE_LINUX="loglevel=3 crashkernel=auto console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300" GRUB_TIMEOUT_STYLE=countdown GRUB_TERMINAL="serial console" GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
NoteIf you do not plan to run your workloads on HDDs, add
elevator=none
to the end of theGRUB_CMDLINE_LINUX
line.This sets the I/O scheduler to
none
, which improves I/O performance when running workloads on SSDs.Regenerate the
grub.cfg
file.On a BIOS-based machine:
In RHEL 9.2 and earlier:
# grub2-mkconfig -o /boot/grub2/grub.cfg
In RHEL 9.3 and later:
# grub2-mkconfig -o /boot/grub2/grub.cfg --update-bls-cmdline
On a UEFI-based machine:
In RHEL 9.2 and earlier:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
In RHEL 9.3 and later:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg --update-bls-cmdline
If your system uses a non-default location for
grub.cfg
, adjust the command accordingly.
Configure the Windows Azure Linux Agent (
WALinuxAgent
):Install and enable the
WALinuxAgent
package.# dnf install WALinuxAgent -y # systemctl enable waagent
To ensure that a swap partition is not used in provisioned VMs, edit the following lines in the
/etc/waagent.conf
file.Provisioning.DeleteRootPassword=y ResourceDisk.Format=n ResourceDisk.EnableSwap=n
Prepare the VM for Azure provisioning:
Unregister the VM from Red Hat Subscription Manager.
# subscription-manager unregister
Clean up the existing provisioning details.
# waagent -force -deprovision
NoteThis command generates warnings, which are expected because Azure handles the provisioning of VMs automatically.
Clean the shell history and shut down the VM.
# export HISTSIZE=0 # poweroff
4.7. Creating an Azure Active Directory application
Complete the following procedure to create an Azure Active Directory (AD) application. The Azure AD application authorizes and automates access for HA operations for all nodes in the cluster.
Prerequisites
- The Azure Command Line Interface (CLI) is installed on your system.
- You are an Administrator or Owner for the Microsoft Azure subscription. You need this authorization to create an Azure AD application.
Procedure
On any node in the HA cluster, log in to your Azure account.
$ az login
Create a
json
configuration file for a custom role for the Azure fence agent. Use the following configuration, but replace <subscription-id> with your subscription IDs.{ "Name": "Linux Fence Agent Role", "description": "Allows to power-off and start virtual machines", "assignableScopes": [ "/subscriptions/<subscription-id>" ], "actions": [ "Microsoft.Compute/*/read", "Microsoft.Compute/virtualMachines/powerOff/action", "Microsoft.Compute/virtualMachines/start/action" ], "notActions": [], "dataActions": [], "notDataActions": [] }
Define the custom role for the Azure fence agent. Use the
json
file created in the previous step to do this.$ az role definition create --role-definition azure-fence-role.json { "assignableScopes": [ "/subscriptions/<my-subscription-id>" ], "description": "Allows to power-off and start virtual machines", "id": "/subscriptions/<my-subscription-id>/providers/Microsoft.Authorization/roleDefinitions/<role-id>", "name": "<role-id>", "permissions": [ { "actions": [ "Microsoft.Compute/*/read", "Microsoft.Compute/virtualMachines/powerOff/action", "Microsoft.Compute/virtualMachines/start/action" ], "dataActions": [], "notActions": [], "notDataActions": [] } ], "roleName": "Linux Fence Agent Role", "roleType": "CustomRole", "type": "Microsoft.Authorization/roleDefinitions" }
-
In the Azure web console interface, select Virtual Machine
Click Identity in the left-side menu. -
Select On
Click Save click Yes to confirm. -
Click Azure role assignments
Add role assignment. -
Select the Scope required for the role, for example
Resource Group
. - Select the required Resource Group.
- Optional: Change the Subscription if necessary.
- Select the Linux Fence Agent Role role.
- Click Save.
Verification
Display nodes visible to Azure AD.
# fence_azure_arm --msi -o list node1, node2, [...]
If this command outputs all nodes on your cluster, the AD application has been configured successfully.
4.8. Converting the image to a fixed VHD format
All Microsoft Azure VM images must be in a fixed VHD
format. The image must be aligned on a 1 MB boundary before it is converted to VHD. To convert the image from qcow2
to a fixed VHD
format and align the image, see the following procedure. Once you have converted the image, you can upload it to Azure.
Procedure
Convert the image from
qcow2
toraw
format.$ qemu-img convert -f qcow2 -O raw <image-name>.qcow2 <image-name>.raw
Create a shell script with the following content.
#!/bin/bash MB=$((1024 * 1024)) size=$(qemu-img info -f raw --output json "$1" | gawk 'match($0, /"virtual-size": ([0-9]+),/, val) {print val[1]}') rounded_size=$((($size/$MB + 1) * $MB)) if [ $(($size % $MB)) -eq 0 ] then echo "Your image is already aligned. You do not need to resize." exit 1 fi echo "rounded size = $rounded_size" export rounded_size
Run the script. This example uses the name
align.sh
.$ sh align.sh <image-xxx>.raw
- If the message "Your image is already aligned. You do not need to resize." displays, proceed to the following step.
- If a value displays, your image is not aligned.
Use the following command to convert the file to a fixed
VHD
format.The sample uses qemu-img version 2.12.0.
$ qemu-img convert -f raw -o subformat=fixed,force_size -O vpc <image-xxx>.raw <image.xxx>.vhd
Once converted, the
VHD
file is ready to upload to Azure.If the
raw
image is not aligned, complete the following steps to align it.Resize the
raw
file by using the rounded value displayed when you ran the verification script.$ qemu-img resize -f raw <image-xxx>.raw <rounded-value>
Convert the
raw
image file to aVHD
format.The sample uses qemu-img version 2.12.0.
$ qemu-img convert -f raw -o subformat=fixed,force_size -O vpc <image-xxx>.raw <image.xxx>.vhd
Once converted, the
VHD
file is ready to upload to Azure.
4.9. Uploading and creating an Azure image
Complete the following steps to upload the VHD
file to your container and create an Azure custom image.
The exported storage connection string does not persist after a system reboot. If any of the commands in the following steps fail, export the connection string again.
Procedure
Upload the
VHD
file to the storage container. It may take several minutes. To get a list of storage containers, enter theaz storage container list
command.$ az storage blob upload \ --account-name <storage-account-name> --container-name <container-name> \ --type page --file <path-to-vhd> --name <image-name>.vhd
Example:
[clouduser@localhost]$ az storage blob upload \ --account-name azrhelclistact --container-name azrhelclistcont \ --type page --file rhel-image-{ProductNumber}.vhd --name rhel-image-{ProductNumber}.vhd Percent complete: %100.0
Get the URL for the uploaded
VHD
file to use in the following step.$ az storage blob url -c <container-name> -n <image-name>.vhd
Example:
$ az storage blob url -c azrhelclistcont -n rhel-image-9.vhd "https://azrhelclistact.blob.core.windows.net/azrhelclistcont/rhel-image-9.vhd"
Create the Azure custom image.
$ az image create -n <image-name> -g <resource-group> -l <azure-region> --source <URL> --os-type linux
NoteThe default hypervisor generation of the VM is V1. You can optionally specify a V2 hypervisor generation by including the option
--hyper-v-generation V2
. Generation 2 VMs use a UEFI-based boot architecture. See Support for generation 2 VMs on Azure for information about generation 2 VMs.The command may return the error "Only blobs formatted as VHDs can be imported." This error may mean that the image was not aligned to the nearest 1 MB boundary before it was converted to
VHD
.Example:
$ az image create -n rhel9 -g azrhelclirsgrp2 -l southcentralus --source https://azrhelclistact.blob.core.windows.net/azrhelclistcont/rhel-image-9.vhd --os-type linux
4.10. Installing Red Hat HA packages and agents
Complete the following steps on all nodes.
Procedure
Launch an SSH terminal session and connect to the VM by using the administrator name and public IP address.
$ ssh administrator@PublicIP
To get the public IP address for an Azure VM, open the VM properties in the Azure Portal or enter the following Azure CLI command.
$ az vm list -g <resource-group> -d --output table
Example:
[clouduser@localhost ~] $ az vm list -g azrhelclirsgrp -d --output table Name ResourceGroup PowerState PublicIps Location ------ ---------------------- -------------- ------------- -------------- node01 azrhelclirsgrp VM running 192.98.152.251 southcentralus
Register the VM with Red Hat.
$ sudo -i # subscription-manager register --auto-attach
NoteIf the
--auto-attach
command fails, manually register the VM to your subscription.Disable all repositories.
# subscription-manager repos --disable=*
Enable the RHEL 9 Server HA repositories.
# subscription-manager repos --enable=rhel-9-for-x86_64-highavailability-rpms
Update all packages.
# dnf update -y
Install the Red Hat High Availability Add-On software packages, along with the Azure fencing agent from the High Availability channel.
# dnf install pcs pacemaker fence-agents-azure-arm
The user
hacluster
was created during the pcs and pacemaker installation in the previous step. Create a password forhacluster
on all cluster nodes. Use the same password for all nodes.# passwd hacluster
Add the
high availability
service to the RHEL Firewall iffirewalld.service
is installed.# firewall-cmd --permanent --add-service=high-availability # firewall-cmd --reload
Start the
pcs
service and enable it to start on boot.# systemctl start pcsd.service # systemctl enable pcsd.service Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Verification
Ensure the
pcs
service is running.# systemctl status pcsd.service pcsd.service - PCS GUI and remote configuration interface Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2018-02-23 11:00:58 EST; 1min 23s ago Docs: man:pcsd(8) man:pcs(8) Main PID: 46235 (pcsd) CGroup: /system.slice/pcsd.service └─46235 /usr/bin/ruby /usr/lib/pcsd/pcsd > /dev/null &
4.11. Creating a cluster
Complete the following steps to create the cluster of nodes.
Procedure
On one of the nodes, enter the following command to authenticate the pcs user
hacluster
. In the command, specify the name of each node in the cluster.# pcs host auth <hostname1> <hostname2> <hostname3>
Example:
[root@node01 clouduser]# pcs host auth node01 node02 node03 Username: hacluster Password: node01: Authorized node02: Authorized node03: Authorized
Create the cluster.
# pcs cluster setup <cluster_name> <hostname1> <hostname2> <hostname3>
Example:
[root@node01 clouduser]# pcs cluster setup new_cluster node01 node02 node03 [...] Synchronizing pcsd certificates on nodes node01, node02, node03... node02: Success node03: Success node01: Success Restarting pcsd on the nodes in order to reload the certificates... node02: Success node03: Success node01: Success
Verification
Enable the cluster.
[root@node01 clouduser]# pcs cluster enable --all node02: Cluster Enabled node03: Cluster Enabled node01: Cluster Enabled
Start the cluster.
[root@node01 clouduser]# pcs cluster start --all node02: Starting Cluster... node03: Starting Cluster... node01: Starting Cluster...
4.12. Fencing overview
If communication with a single node in the cluster fails, then other nodes in the cluster must be able to restrict or release access to resources that the failed cluster node may have access to. This cannot be accomplished by contacting the cluster node itself as the cluster node may not be responsive. Instead, you must provide an external method, which is called fencing with a fence agent.
A node that is unresponsive may still be accessing data. The only way to be certain that your data is safe is to fence the node by using STONITH. STONITH is an acronym for "Shoot The Other Node In The Head," and it protects your data from being corrupted by rogue nodes or concurrent access. Using STONITH, you can be certain that a node is truly offline before allowing the data to be accessed from another node.
Additional resources
4.13. Creating a fencing device
Complete the following steps to configure fencing. Complete these commands from any node in the cluster
Prerequisites
You need to set the cluster property stonith-enabled
to true
.
Procedure
Identify the Azure node name for each RHEL VM. You use the Azure node names to configure the fence device.
# fence_azure_arm \ -l <AD-Application-ID> -p <AD-Password> \ --resourceGroup <MyResourceGroup> --tenantId <Tenant-ID> \ --subscriptionId <Subscription-ID> -o list
Example:
[root@node01 clouduser]# fence_azure_arm \ -l e04a6a49-9f00-xxxx-xxxx-a8bdda4af447 -p z/a05AwCN0IzAjVwXXXXXXXEWIoeVp0xg7QT//JE= --resourceGroup azrhelclirsgrp --tenantId 77ecefb6-cff0-XXXX-XXXX-757XXXX9485 --subscriptionId XXXXXXXX-38b4-4527-XXXX-012d49dfc02c -o list node01, node02, node03,
View the options for the Azure ARM STONITH agent.
# pcs stonith describe fence_azure_arm
Example:
# pcs stonith describe fence_apc Stonith options: password: Authentication key password_script: Script to run to retrieve password
WarningFor fence agents that provide a method option, do not specify a value of cycle as it is not supported and can cause data corruption.
Some fence devices can fence only a single node, while other devices can fence multiple nodes. The parameters you specify when you create a fencing device depend on what your fencing device supports and requires.
You can use the
pcmk_host_list
parameter when creating a fencing device to specify all of the machines that are controlled by that fencing device.You can use
pcmk_host_map
parameter when creating a fencing device to map host names to the specifications that comprehends the fence device.Create a fence device.
# pcs stonith create clusterfence fence_azure_arm
Verification
Test the fencing agent for one of the other nodes.
# pcs stonith fence azurenodename
Example:
[root@node01 clouduser]# pcs status Cluster name: newcluster Stack: corosync Current DC: node01 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum Last updated: Fri Feb 23 11:44:35 2018 Last change: Fri Feb 23 11:21:01 2018 by root via cibadmin on node01 3 nodes configured 1 resource configured Online: [ node01 node03 ] OFFLINE: [ node02 ] Full list of resources: clusterfence (stonith:fence_azure_arm): Started node01 Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled
Start the node that was fenced in the previous step.
# pcs cluster start <hostname>
Check the status to verify the node started.
# pcs status
Example:
[root@node01 clouduser]# pcs status Cluster name: newcluster Stack: corosync Current DC: node01 (version 1.1.18-11.el7-2b07d5c5a9) - partition with quorum Last updated: Fri Feb 23 11:34:59 2018 Last change: Fri Feb 23 11:21:01 2018 by root via cibadmin on node01 3 nodes configured 1 resource configured Online: [ node01 node02 node03 ] Full list of resources: clusterfence (stonith:fence_azure_arm): Started node01 Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled
Additional resources
4.14. Creating an Azure internal load balancer
The Azure internal load balancer removes cluster nodes that do not answer health probe requests.
Perform the following procedure to create an Azure internal load balancer. Each step references a specific Microsoft procedure and includes the settings for customizing the load balancer for HA.
Prerequisites
Procedure
- Create a Basic load balancer. Select Internal load balancer, the Basic SKU, and Dynamic for the type of IP address assignment.
- Create a back-end address pool. Associate the backend pool to the availability set created while creating Azure resources in HA. Do not set any target network IP configurations.
- Create a health probe. For the health probe, select TCP and enter port 61000. You can use TCP port number that does not interfere with another service. For certain HA product applications (for example, SAP HANA and SQL Server), you may need to work with Microsoft to identify the correct port to use.
- Create a load balancer rule. To create the load balancing rule, the default values are prepopulated. Ensure to set Floating IP (direct server return) to Enabled.
4.15. Configuring the load balancer resource agent
After you have created the health probe, you must configure the load balancer
resource agent. This resource agent runs a service that answers health probe requests from the Azure load balancer and removes cluster nodes that do not answer requests.
Procedure
Install the
nmap-ncat
resource agents on all nodes.# dnf install nmap-ncat resource-agents-cloud
Perform the following steps on a single node.
Create the
pcs
resources and group. Use your load balancer FrontendIP for the IPaddr2 address.# pcs resource create resource-name IPaddr2 ip="10.0.0.7" --group cluster-resources-group
Configure the
load balancer
resource agent.# pcs resource create resource-loadbalancer-name azure-lb port=port-number --group cluster-resources-group
Verification
Run
pcs status
to see the results.[root@node01 clouduser]# pcs status
Example output:
Cluster name: clusterfence01 Stack: corosync Current DC: node02 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum Last updated: Tue Jan 30 12:42:35 2018 Last change: Tue Jan 30 12:26:42 2018 by root via cibadmin on node01 3 nodes configured 3 resources configured Online: [ node01 node02 node03 ] Full list of resources: clusterfence (stonith:fence_azure_arm): Started node01 Resource Group: g_azure vip_azure (ocf::heartbeat:IPaddr2): Started node02 lb_azure (ocf::heartbeat:azure-lb): Started node02 Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled