Chapter 4. Configuring a Red Hat High Availability cluster on AWS
You can configure a high availability (HA) cluster on Amazon Web Services (AWS) to group Red Hat Enterprise Linux (RHEL) nodes and automatically redistribute workloads if a node fails. The process for setting up HA clusters on AWS is comparable to configuring them in traditional, non-cloud environments.
You have several options for obtaining RHEL images for the cluster. For details, see Available RHEL image types for public cloud.
4.1. Benefits of using high-availability clusters on public cloud platforms
A high-availability (HA) cluster links a set of computers (called nodes) to run a specific workload. HA clusters offer redundancy to handle hardware or software failures. When a node in the HA cluster fails, the Pacemaker cluster resource manager quickly distributes the workload to other nodes, ensuring that services on the cluster continue without noticeable downtime.
You can also run HA clusters on public cloud platforms. In this case, you would use virtual machine (VM) instances in the cloud as the individual cluster nodes. Using HA clusters on a public cloud platform has the following benefits:
- Improved availability: In case of a VM failure, the workload is quickly redistributed to other nodes, so running services are not disrupted.
- Scalability: You can start additional nodes when demand is high and stop them when demand is low.
- Cost-effectiveness: With pay-as-you-go pricing, you pay only for nodes that are running.
- Simplified management: Some public cloud platforms offer management interfaces to make configuring HA clusters easier.
To enable HA on your RHEL systems, Red Hat offers the High Availability Add-On. You can configure a RHEL cluster with the Red Hat HA Add-On to manage groups of RHEL servers as HA clusters. The Red Hat HA Add-On provides integrated and streamlined tools: with the cluster resource manager, fencing agents, and resource agents, you can set up and configure the cluster for automation. The Red Hat HA Add-On offers the following components for automation:

- Pacemaker, a cluster resource manager that offers both a command line utility (pcs) and a GUI (pcsd) to support many nodes
- Corosync and Kronosnet to create and manage HA clusters
- Resource agents to configure and manage custom applications
- Fencing agents to use the cluster on platforms such as bare-metal servers and virtual machines

The Red Hat HA Add-On handles critical tasks such as node failures, load balancing, and node health checks for fault tolerance and system reliability.
4.2. Installing the High Availability packages and agents
To configure a Red Hat High Availability cluster on AWS, install the High Availability packages and agents on each node in the cluster.
Prerequisites
- You have created a Red Hat account.
- You have signed up and set up an AWS account.
- You have completed the configuration for Uploading RHEL image to AWS by using the command line.
Procedure
- Remove the AWS Red Hat Update Infrastructure (RHUI) client.

  $ sudo -i
  # dnf -y remove rh-amazon-rhui-client

- Register the VM with Red Hat.

  # subscription-manager register

- Disable all repositories.

  # subscription-manager repos --disable=*

- Enable the RHEL 10 Server HA repositories.

  # subscription-manager repos --enable=rhel-10-for-x86_64-highavailability-rpms

- Update the RHEL AWS instance.

  # dnf update -y

- Install the Red Hat High Availability Add-On software packages, along with the AWS fencing agent from the High Availability channel.

  # dnf install pcs pacemaker fence-agents-aws

- The user hacluster was created during the pcs and pacemaker installation in the previous step. Create a password for hacluster on all cluster nodes. Use the same password for all nodes.

  # passwd hacluster

- Add the high-availability service to the RHEL firewall if firewalld.service is installed.

  # firewall-cmd --permanent --add-service=high-availability
  # firewall-cmd --reload

- Start the pcsd service and enable it to start on boot.

  # systemctl start pcsd.service
  # systemctl enable pcsd.service
- Edit /etc/hosts and add Red Hat Enterprise Linux (RHEL) host names and internal IP addresses. See How should the /etc/hosts file be set up on RHEL cluster nodes? for details.
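As an illustration, the /etc/hosts entries might look like this. The private IP addresses are placeholders; the host names match the example cluster nodes used later in this chapter.

```
10.0.0.11   z1.example.com z1
10.0.0.12   z2.example.com z2
```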
Verification

- Ensure the pcsd service is running.

  # systemctl status pcsd.service
4.3. Creating a high availability cluster
You can create a Red Hat High Availability Add-On cluster. This example uses nodes z1.example.com and z2.example.com.
To display the parameters of a pcs command and a description of those parameters, use the -h option of the pcs command.
Prerequisites
- You have created a Red Hat account.
Procedure
- Authenticate the pcs user hacluster for each node in the cluster on the node from which you will be running pcs.

  The following command authenticates user hacluster on z1.example.com for both of the nodes in a two-node cluster that will consist of z1.example.com and z2.example.com.

  [root@z1 ~]# pcs host auth z1.example.com z2.example.com
  Username: hacluster
  Password:
  z1.example.com: Authorized
  z2.example.com: Authorized

- Execute the following command from z1.example.com to create the two-node cluster my_cluster that consists of nodes z1.example.com and z2.example.com. This propagates the cluster configuration files to both nodes in the cluster. This command includes the --start option, which starts the cluster services on both nodes in the cluster.

  [root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com

- Enable the cluster services to run on each node in the cluster when the node is booted.

  Note: For your particular environment, you can skip this step by keeping the cluster services disabled. Keeping the services disabled ensures that if a node goes down, any issues with your cluster or your resources can be resolved before the node rejoins the cluster. If you keep the cluster services disabled, you need to manually start the services when you reboot a node by executing the pcs cluster start command on that node.

  [root@z1 ~]# pcs cluster enable --all

- Display the status of the cluster you created with the pcs cluster status command. Because there could be a slight delay before the cluster is up and running when you start the cluster services with the --start option of the pcs cluster setup command, ensure that the cluster is up and running before performing any subsequent actions on the cluster and its configuration.

  [root@z1 ~]# pcs cluster status
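To bridge that delay in a script, you can poll until the status command succeeds. This is a minimal sketch: the retry count and sleep interval are arbitrary choices, not documented values.

```shell
# Wait for the cluster to come up after `pcs cluster setup --start`.
# 30 attempts x 2 seconds is an arbitrary safety limit.
for attempt in $(seq 1 30); do
    if pcs cluster status >/dev/null 2>&1; then
        echo "cluster is up"
        break
    fi
    sleep 2
done
```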
4.4. Configuring fencing in a Red Hat High Availability cluster
When a node becomes unresponsive, the cluster must isolate it to prevent data corruption. Since the node cannot be contacted directly, you must configure fencing. An external fence device cuts off the node’s access to shared resources or performs a hard reboot.
Without a fence device configured, you do not have a way to know that the resources previously used by the disconnected cluster node have been released, and this could prevent the services from running on any of the other cluster nodes. Conversely, the system may erroneously assume that the cluster node has released its resources, and this can lead to data corruption and data loss. Without a fence device configured, data integrity cannot be guaranteed and the cluster configuration will be unsupported.
When fencing is in progress, no other cluster operation is allowed to run. Normal operation of the cluster cannot resume until fencing has completed or the cluster node rejoins the cluster after the cluster node has been rebooted. For more information about fencing and its importance in a Red Hat High Availability cluster, see the Red Hat Knowledgebase solution Fencing in a Red Hat High Availability Cluster.
4.4.1. Displaying available fence agents and their options
You can view available fencing agents and the available options for specific fencing agents.
Your system’s hardware determines the type of fencing device to use for your cluster. For information about supported platforms and architectures and the different fencing devices, see the Cluster Platforms and Architectures section of the Red Hat Knowledgebase article Support Policies for RHEL High Availability Clusters.
Run the following command to list all available fencing agents. When you specify a filter, this command displays only the fencing agents that match the filter.
# pcs stonith list [filter]
Run the following command to display the options for the specified fencing agent.
# pcs stonith describe [stonith_agent]
For example, the following command displays the options for the fence agent for APC over telnet/SSH.

# pcs stonith describe fence_apc

For fence agents that provide a method option, a value of cycle is unsupported for any agent other than fence_sbd and should not be specified, as it may cause data corruption. Even for fence_sbd, however, you should not specify a method and instead use the default value.
4.4.2. Creating a fence device
Create a fence device using the pcs stonith create command. To view all available creation options, use the pcs stonith -h command.
Procedure
Create a fence device:
# pcs stonith create stonith_id stonith_device_type [stonith_device_options] [op operation_action operation_options]

The following command creates a single fencing device for a single node:

# pcs stonith create MyStonith fence_virt pcmk_host_list=f1 op monitor interval=30s

Some fence devices can fence only a single node, while other devices can fence multiple nodes. The parameters you specify when you create a fencing device depend on what your fencing device supports and requires.
- Some fence devices can automatically determine what nodes they can fence.
- You can use the pcmk_host_list parameter when creating a fencing device to specify all of the machines that are controlled by that fencing device.
- Some fence devices require a mapping of host names to the specifications that the fence device understands. You can map host names with the pcmk_host_map parameter when creating a fencing device.
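For example, a single power switch that fences both nodes of a two-node cluster might be created with pcmk_host_map as follows. This is a hedged sketch: the device name myapc, the fence_apc_snmp agent, the address, and the credentials are illustrative, and parameter names can vary between fence agent versions.

```shell
# Hypothetical: one APC power switch controls both cluster nodes;
# pcmk_host_map maps each host name to its outlet number on the switch.
pcs stonith create myapc fence_apc_snmp \
    ip="zapc.example.com" username="apc" password="apc" \
    pcmk_host_map="z1.example.com:1;z2.example.com:2"
```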
4.4.3. General properties of fencing devices
Configure fencing behavior using device-specific options and cluster-wide properties. Device options define agent settings, such as IP addresses, and metadata like delays. Cluster properties manage global logic, including timeouts and the stonith-enabled parameter.
Any cluster node can fence any other cluster node with any fence device, regardless of whether the fence resource is started or stopped. Whether the resource is started controls only the recurring monitor for the device, not whether it can be used, with the following exceptions:
- You can disable a fencing device by running the pcs stonith disable stonith_id command. This prevents any node from using that device.
- To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource with the pcs constraint location … avoids command.
- Configuring stonith-enabled=false disables fencing altogether. Note, however, that Red Hat does not support clusters when fencing is disabled, as it is not suitable for a production environment.
The following table describes the general properties you can set for fencing devices.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_map | string | | A mapping of host names to port numbers for devices that do not support host names. For example: node1:1;node2:2,3 tells the cluster to use port 1 for node1 and ports 2 and 3 for node2. |
| pcmk_host_list | string | | A list of machines controlled by this device (Optional unless pcmk_host_check=static-list). |
| pcmk_host_check | string | * static-list if either pcmk_host_list or pcmk_host_map is set * Otherwise, dynamic-list if the fence device supports the list action * Otherwise, status if the fence device supports the status action * Otherwise, none | How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute), none (assume every device can fence every machine) |
The following table summarizes additional properties you can set for fencing devices. Note that these properties are for advanced use only.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_argument | string | port | An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters. |
| pcmk_reboot_action | string | reboot | An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the reboot action. |
| pcmk_reboot_timeout | time | 60s | Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for reboot actions. |
| pcmk_reboot_retries | integer | 2 | The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up. |
| pcmk_off_action | string | off | An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the off action. |
| pcmk_off_timeout | time | 60s | Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for off actions. |
| pcmk_off_retries | integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up. |
| pcmk_list_action | string | list | An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the list action. |
| pcmk_list_timeout | time | 60s | Specify an alternate timeout to use for list actions. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions. |
| pcmk_list_retries | integer | 2 | The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up. |
| pcmk_monitor_action | string | monitor | An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the monitor action. |
| pcmk_monitor_timeout | time | 60s | Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for monitor actions. |
| pcmk_monitor_retries | integer | 2 | The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up. |
| pcmk_status_action | string | status | An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the status action. |
| pcmk_status_timeout | time | 60s | Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for status actions. |
| pcmk_status_retries | integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up. |
| pcmk_delay_base | string | 0s | Enables a base delay for fencing actions and specifies a base delay value. You can specify different values for different nodes with the pcmk_delay_base parameter. |
| pcmk_delay_max | time | 0s | Enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10. |
| pcmk_action_limit | integer | 1 | The maximum number of actions that can be performed in parallel on this device. The cluster property concurrent-fencing=true needs to be configured first. A value of -1 is unlimited. |
| pcmk_on_action | string | on | For advanced use only: An alternate command to run instead of on. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the on action. |
| pcmk_on_timeout | time | 60s | For advanced use only: Specify an alternate timeout to use for on actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for on actions. |
| pcmk_on_retries | integer | 2 | For advanced use only: The maximum number of times to retry the on command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries on actions before giving up. |
In addition to the properties you can set for individual fence devices, there are also cluster properties you can set that determine fencing behavior, as described in the following table.
| Option | Default | Description |
|---|---|---|
| stonith-enabled | true | Indicates that failed nodes and nodes with resources that cannot be stopped should be fenced. Protecting your data requires that you set this value to true. If true, or unset, the cluster will refuse to start resources unless one or more STONITH resources have been configured. Red Hat only supports clusters with this value set to true. |
| stonith-action | reboot | Action to send to fencing device. Allowed values: reboot, off. The value poweroff is also allowed, but is only used for legacy devices. |
| stonith-timeout | 60s | How long to wait for a STONITH action to complete. |
| stonith-max-attempts | 10 | How many times fencing can fail for a target before the cluster will no longer immediately re-attempt it. |
| stonith-watchdog-timeout | | The maximum time to wait until a node can be assumed to have been killed by the hardware watchdog. It is recommended that this value be set to twice the value of the hardware watchdog timeout. This option is needed only if watchdog-only SBD configuration is used for fencing. |
| concurrent-fencing | true | Allow fencing operations to be performed in parallel. |
| fence-reaction | stop | Determines how a cluster node should react if notified of its own fencing. A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that does not cut cluster communication. Allowed values are stop, to attempt to immediately stop the cluster and stay stopped, or panic, to attempt to immediately reboot the local node, falling back to stop on failure. Although the default value for this property is stop, the safest choice is panic, which attempts to immediately reboot the local node. If you prefer the stop behavior, as is most likely the case in conjunction with fabric fencing, it is recommended that you set this explicitly. |
| priority-fencing-delay | 0 (disabled) | Sets a fencing delay that allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced. For general information about fencing delay parameters and their interactions, see Fencing delays. |
For information about setting cluster properties, see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/configuring_and_managing_high_availability_clusters/index#setting-cluster-properties
4.4.4. Fencing delays
In two-node clusters, simultaneous communication loss can cause nodes to fence each other, shutting down the entire cluster. Configure a fencing delay to prevent this race condition. Delays are unnecessary in larger clusters, where quorum determines fencing authority.
You can set different types of fencing delays, depending on your system requirements.
static fencing delays
A static fencing delay is a fixed, predetermined delay. Setting a static delay on one node makes that node more likely to be fenced because it increases the chances that the other node will initiate fencing first after detecting lost communication. In an active/passive cluster, setting a delay on a passive node makes it more likely that the passive node will be fenced when communication breaks down. You configure a static delay by using the pcmk_delay_base cluster property. You can set this property when a separate fence device is used for each node or when a single fence device is used for all nodes.

dynamic fencing delays
A dynamic fencing delay is random. It can vary and is determined at the time fencing is needed. You configure a random delay and specify a maximum value for the combined base delay and random delay with the pcmk_delay_max cluster property. When the fencing delay for each node is random, which node is fenced is also random. You may find this feature useful if your cluster is configured with a single fence device for all nodes in an active/active design.

priority fencing delays
A priority fencing delay is based on active resource priorities. If all resources have the same priority, the node with the fewest resources running is the node that gets fenced. In most cases, you use only one delay-related parameter, but it is possible to combine them. Combining delay-related parameters adds the priority values for the resources together to create a total delay. You configure a priority fencing delay with the priority-fencing-delay cluster property. You may find this feature useful in an active/active cluster design because it can make the node running the fewest resources more likely to be fenced when communication between the nodes is lost.
The pcmk_delay_base cluster property
Setting the pcmk_delay_base cluster property enables a base delay for fencing and specifies a base delay value.
When you set the pcmk_delay_max cluster property in addition to the pcmk_delay_base property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_base but do not set pcmk_delay_max, there is no random component to the delay and it will be the value of pcmk_delay_base.
You can specify different values for different nodes with the pcmk_delay_base parameter. This allows a single fence device to be used in a two-node cluster, with a different delay for each node. You do not need to configure two separate devices to use separate delays. To specify different values for different nodes, you map the host names to the delay value for that node using a similar syntax to pcmk_host_map. For example, node1:0;node2:10s would use no delay when fencing node1 and a 10-second delay when fencing node2.
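For example, the per-node mapping could be applied to an existing device with pcs stonith update. This is a sketch: the device name MyStonith is taken from the earlier creation example, and the delay values are arbitrary.

```shell
# No delay when fencing node1, a 10-second base delay when fencing node2.
pcs stonith update MyStonith pcmk_delay_base="node1:0;node2:10s"
```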
The pcmk_delay_max cluster property
Setting the pcmk_delay_max cluster property enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10.

When you set the pcmk_delay_base cluster property in addition to the pcmk_delay_max property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_max but do not set pcmk_delay_base, there is no static component to the delay.
The priority-fencing-delay cluster property
Setting the priority-fencing-delay cluster property allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced.
The priority-fencing-delay property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the priority meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to remain operational. For example, if you set pcs resource defaults update priority=1 and pcs property set priority-fencing-delay=15s and no other priorities are set, then the node running the most resources will be more likely to remain operational because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.
The node running the promoted role of a promotable clone gets an extra 1 point if a priority has been configured for that clone.
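The priority example above corresponds to the following commands, to be run on any cluster node:

```shell
# Default every resource's priority to 1, then make the node with the
# lower combined priority wait 15 seconds before it may initiate fencing.
pcs resource defaults update priority=1
pcs property set priority-fencing-delay=15s
```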
Interaction of fencing delays
Setting more than one type of fencing delay yields the following results:
- Any delay set with the priority-fencing-delay property is added to any delay from the pcmk_delay_base and pcmk_delay_max fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss, as when on-fail=fencing is set for a resource monitor operation. When setting these delays in combination, set the priority-fencing-delay property to a value that is significantly greater than the maximum delay from pcmk_delay_base and pcmk_delay_max to be sure the prioritized node is preferred. Setting this property to twice this value is always safe.
- Only fencing scheduled by Pacemaker itself observes fencing delays. Fencing scheduled by external code such as dlm_controld and fencing implemented by the pcs stonith fence command do not provide the necessary information to the fence device.
- Some individual fence agents implement a delay parameter, with a name determined by the agent and which is independent of delays configured with a pcmk_delay_* property. If both of these delays are configured, they are added together and thus would generally not be used in conjunction.
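As a rough illustration of how the delays add up, the following is plain shell arithmetic, not Pacemaker code; the values are arbitrary examples chosen to follow the guidance above (priority delay set to twice pcmk_delay_max).

```shell
# Arbitrary example values: base delay 3s, cap 10s, priority delay 20s.
pcmk_delay_base=3
pcmk_delay_max=10
priority_fencing_delay=20

# Random component chosen so that base + random stays at or below the cap.
random_part=$(( RANDOM % (pcmk_delay_max - pcmk_delay_base + 1) ))
device_delay=$(( pcmk_delay_base + random_part ))

# For the node that loses the priority comparison, the priority delay is
# added on top of the per-device delay.
total_delay=$(( device_delay + priority_fencing_delay ))
echo "device delay: ${device_delay}s, total for deprioritized node: ${total_delay}s"
```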
4.4.5. Testing a fence device
Validate fence devices to ensure the cluster can successfully recover from node failures and prevent data corruption. A complete testing strategy involves verifying network connectivity, executing the fence agent script directly, triggering the fence action through the cluster manager, and simulating a physical node failure.
When a Pacemaker cluster node or Pacemaker remote node is fenced, a hard kill should occur, not a graceful shutdown of the operating system. If a graceful shutdown occurs when your system fences a node, disable ACPI soft-off in the /etc/systemd/logind.conf file so that your system ignores any power-button-pressed signal. For instructions on disabling ACPI soft-off in the logind.conf file, see Disabling ACPI soft-off in the logind.conf file.
Use the following procedure to test a fence device.
Procedure
Use SSH, Telnet, HTTP, or whatever remote protocol is used to connect to the device to manually log in and test the fence device or see what output is given. For example, if you will be configuring fencing for an IPMI-enabled device, then try to log in remotely with ipmitool. Take note of the options used when logging in manually because those options might be needed when using the fencing agent.

If you are unable to log in to the fence device, verify that the device is pingable, there is nothing such as a firewall configuration that is preventing access to the fence device, remote access is enabled on the fencing device, and the credentials are correct.
Run the fence agent manually, using the fence agent script. This does not require that the cluster services are running, so you can perform this step before the device is configured in the cluster. This can ensure that the fence device is responding properly before proceeding.

Note: These examples use the fence_ipmilan fence agent script for an iLO device. The actual fence agent you will use and the command that calls that agent will depend on your server hardware. You should consult the man page for the fence agent you are using to determine which options to specify. You will usually need to know the login and password for the fence device and other information related to the fence device.

The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o status parameter to check the status of the fence device interface on another node without actually fencing it. This allows you to test the device and get it working before attempting to reboot the node. When running this command, you specify the name and password of an iLO user that has power on and off permissions for the iLO device.

# fence_ipmilan -a ipaddress -l username -p password -o status

The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o reboot parameter. Running this command on one node reboots the node managed by this iLO device.

# fence_ipmilan -a ipaddress -l username -p password -o reboot

If the fence agent failed to properly do a status, off, on, or reboot action, you should check the hardware, the configuration of the fence device, and the syntax of your commands. In addition, you can run the fence agent script with the debug output enabled. The debug output is useful for some fencing agents to see where in the sequence of events the fencing agent script is failing when logging into the fence device.

# fence_ipmilan -a ipaddress -l username -p password -o status -D /tmp/$(hostname)-fence_agent.debug

When diagnosing a failure that has occurred, you should ensure that the options you specified when manually logging in to the fence device are identical to what you passed on to the fence agent with the fence agent script.
For fence agents that support an encrypted connection, you may see an error due to certificate validation failing, requiring that you trust the host or that you use the fence agent's ssl-insecure parameter. Similarly, if SSL/TLS is disabled on the target device, you may need to account for this when setting the SSL parameters for the fence agent.

Note: If the fence agent that is being tested is fence_drac, fence_ilo, or some other fencing agent for a systems management device that continues to fail, then fall back to trying fence_ipmilan. Most systems management cards support IPMI remote login and the only supported fencing agent is fence_ipmilan.

Once the fence device has been configured in the cluster with the same options that worked manually and the cluster has been started, test fencing with the pcs stonith fence command from any node (or even multiple times from different nodes), as in the following example. The pcs stonith fence command reads the cluster configuration from the CIB and calls the fence agent as configured to execute the fence action. This verifies that the cluster configuration is correct.

# pcs stonith fence node_name
If the pcs stonith fence command works properly, that means the fencing configuration for the cluster should work when a fence event occurs. If the command fails, it means that cluster management cannot invoke the fence device through the configuration it has retrieved. Check for the following issues and update your cluster configuration as needed.

- Check your fence configuration. For example, if you have used a host map you should ensure that the system can find the node using the host name you have provided.
- Check whether the password and user name for the device include any special characters that could be misinterpreted by the bash shell. Making sure that you enter passwords and user names surrounded by quotation marks could address this issue.
- Check whether you can connect to the device using the exact IP address or host name you specified in the pcs stonith command. For example, if you give the host name in the stonith command but test by using the IP address, that is not a valid test.
- If the protocol that your fence device uses is accessible to you, use that protocol to try to connect to the device. For example, many agents use ssh or telnet. You should try to connect to the device with the credentials you provided when configuring the device, to see if you get a valid prompt and can log in to the device.

If you determine that all your parameters are appropriate but you still have trouble connecting to your fence device, you can check the logging on the fence device itself, if the device provides that, which will show if the user has connected and what command the user issued. You can also search through the /var/log/messages file for instances of stonith and error, which could give some idea of what is transpiring, but some agents can provide additional information.
Once the fence device tests are working and the cluster is up and running, test an actual failure. To do this, take an action in the cluster that should initiate a token loss.
Take down a network. How you take down a network depends on your specific configuration. In many cases, you can physically pull the network or power cables out of the host. For information about simulating a network failure, see the Red Hat Knowledgebase solution What is the proper way to simulate a network failure on a RHEL Cluster?.
Note: Disabling the network interface on the local host rather than physically disconnecting the network or power cables is not recommended as a test of fencing because it does not accurately simulate a typical real-world failure.
Block corosync traffic both inbound and outbound using the local firewall.
The following example blocks corosync, assuming the default corosync port is used, firewalld is used as the local firewall, and the network interface used by corosync is in the default firewall zone:
# firewall-cmd --direct --add-rule ipv4 filter OUTPUT 2 -p udp --dport=5405 -j DROP
# firewall-cmd --add-rich-rule='rule family="ipv4" port port="5405" protocol="udp" drop'
Simulate a crash and panic your machine with sysrq-trigger. Note, however, that triggering a kernel panic can cause data loss; it is recommended that you disable your cluster resources first.
# echo c > /proc/sysrq-trigger
4.4.6. Configuring fencing levels
Pacemaker supports fencing nodes with multiple devices through a feature called fencing topologies. To implement topologies, create the individual devices as you normally would and then define one or more fencing levels in the fencing topology section in the configuration.
Pacemaker processes fencing levels as follows:
- Each level is attempted in ascending numeric order, starting at 1.
- If a device fails, processing terminates for the current level. No further devices in that level are exercised and the next level is attempted instead.
- If all devices are successfully fenced, then that level has succeeded and no other levels are tried.
- The operation is finished when a level has passed (success), or all levels have been attempted (failed).
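The level-processing rules above can be sketched as a small mock. This is illustrative shell only; fence_my_ilo and fence_my_apc are invented stand-ins for real fence agents, not pcs calls, and try_levels is a helper name introduced here.

```shell
# Mock fence agents: simulate the iLO device failing and the APC device working
fence_my_ilo() { return 1; }
fence_my_apc() { return 0; }

try_levels() {
    # $@ = one comma-separated device list per level, in ascending order.
    # A level succeeds only if every device in it succeeds; the first
    # successful level ends the operation.
    local level=0 devices dev ok
    for devices in "$@"; do
        level=$((level + 1))
        ok=1
        for dev in ${devices//,/ }; do
            "fence_$dev" || { ok=0; break; }  # a failed device aborts this level
        done
        if [ "$ok" -eq 1 ]; then
            echo "fenced at level $level"
            return 0
        fi
    done
    echo "all levels failed"
    return 1
}

try_levels my_ilo my_apc
```

With my_ilo at level 1 and my_apc at level 2, the mock reports fencing at level 2, mirroring the my_ilo/my_apc example that follows.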
Use the following command to add a fencing level to a node. The devices are given as a comma-separated list of stonith ids, which are attempted for the node at that level.
pcs stonith level add level node devices
The following example sets up fence levels so that if the device my_ilo fails and is unable to fence the node, then Pacemaker attempts to use the device my_apc.
Prerequisites
- You have configured an iLO fence device called my_ilo for node rh7-2.
- You have configured an APC fence device called my_apc for node rh7-2.
Procedure
Add a fencing level of 1 for fence device my_ilo on node rh7-2:
# pcs stonith level add 1 rh7-2 my_ilo
Add a fencing level of 2 for fence device my_apc on node rh7-2:
# pcs stonith level add 2 rh7-2 my_apc
List the currently configured fencing levels:
# pcs stonith level
 Node: rh7-2
  Level 1 - my_ilo
  Level 2 - my_apc
4.4.7. Removing a fence level
You can remove the fence level for the specified node and device. If no nodes or devices are specified, the fence level you specify is removed from all nodes.
Procedure
Remove the fence level for the specified node and device:
# pcs stonith level remove level [node_id] [stonith_id] ... [stonith_id]
4.4.8. Clearing fence levels
You can clear the fence levels on the specified node or stonith id. If you do not specify a node or stonith id, all fence levels are cleared.
Procedure
Clear the fence levels on the specified node or stonith id:
# pcs stonith level clear [node|stonith_id(s)]
If you specify more than one stonith id, they must be separated by a comma with no spaces, as in the following example:
# pcs stonith level clear dev_a,dev_b
4.4.9. Verifying nodes and devices in fence levels
You can verify that all fence devices and nodes specified in fence levels exist.
Procedure
Use the following command to verify that all fence devices and nodes specified in fence levels exist:
# pcs stonith level verify
4.4.10. Specifying nodes in fencing topology
You can specify nodes in fencing topology by a regular expression applied on a node name and by a node attribute and its value.
Procedure
The following commands configure nodes node1, node2, and node3 to use fence devices apc1 and apc2, and nodes node4, node5, and node6 to use fence devices apc3 and apc4:
# pcs stonith level add 1 "regexp%node[1-3]" apc1,apc2
# pcs stonith level add 1 "regexp%node[4-6]" apc3,apc4
The following commands yield the same results by using node attribute matching:
# pcs node attribute node1 rack=1
# pcs node attribute node2 rack=1
# pcs node attribute node3 rack=1
# pcs node attribute node4 rack=2
# pcs node attribute node5 rack=2
# pcs node attribute node6 rack=2
# pcs stonith level add 1 attrib%rack=1 apc1,apc2
# pcs stonith level add 1 attrib%rack=2 apc3,apc4
4.4.11. Configuring fencing for redundant power supplies
When configuring fencing for redundant power supplies, the cluster must ensure that when attempting to reboot a host, both power supplies are turned off before either power supply is turned back on.
If the node never completely loses power, the node may not release its resources. This opens up the possibility of nodes accessing these resources simultaneously and corrupting them.
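A minimal mock of the hazard described above; the psu variables, apply helper, and step lists are invented for illustration and are not cluster commands. Cycling each supply independently never removes all power from the node, while turning both off before either comes back on does:

```shell
psu1=1; psu2=1; lost_power=0

apply() {
    # Apply one switching step ("psuN=0" or "psuN=1") and record whether the
    # node has, at any point, lost both supplies at once.
    eval "$1"
    if [ "$psu1" -eq 0 ] && [ "$psu2" -eq 0 ]; then
        lost_power=1
    fi
}

# Each supply rebooted independently: the node never goes fully dark
for step in psu1=0 psu1=1 psu2=0 psu2=1; do apply "$step"; done
independent=$lost_power

# Both devices in one fencing level: both off before either is turned back on
psu1=1; psu2=1; lost_power=0
for step in psu1=0 psu2=0 psu1=1 psu2=1; do apply "$step"; done
both_off_first=$lost_power

echo "independent cycling lost power: $independent"
echo "both-off-first lost power: $both_off_first"
```

Only the second ordering guarantees the node releases its resources, which is why the procedure below puts both devices in the same fencing level.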
You need to define each device only once and specify that both are required to fence the node.
Procedure
Create the first fence device:
# pcs stonith create apc1 fence_apc_snmp ipaddr=apc1.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Create the second fence device:
# pcs stonith create apc2 fence_apc_snmp ipaddr=apc2.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Specify that both devices are required to fence the node:
# pcs stonith level add 1 node1.example.com apc1,apc2
# pcs stonith level add 1 node2.example.com apc1,apc2
4.4.12. Administering fence devices
The pcs command-line interface provides a variety of commands you can use to administer your fence devices after you have configured them.
4.4.12.1. Displaying configured fence devices
The following command shows all currently configured fence devices. If a stonith_id is specified, the command shows the options for that configured fencing device only. If the --full option is specified, all configured fencing options are displayed.
pcs stonith config [stonith_id] [--full]
4.4.12.2. Exporting fence devices as pcs commands
You can display the pcs commands that can be used to re-create configured fence devices on a different system using the --output-format=cmd option of the pcs stonith config command.
For example, after you create a fence_apc_snmp fence device, you can run the pcs stonith config --output-format=cmd command to display the pcs command that re-creates the device.
4.4.12.3. Exporting fence level configuration
The pcs stonith config and the pcs stonith level config commands support the --output-format= option to export the fencing level configuration in JSON format and as pcs commands.
- Specifying --output-format=cmd displays the pcs commands created from the current cluster configuration that configure fencing levels. You can use these commands to re-create configured fencing levels on a different system.
- Specifying --output-format=json displays the fencing level configuration in JSON format, which is suitable for machine parsing.
4.4.12.4. Modifying and deleting fence devices
Modify or add options to a currently configured fencing device with the following command.
pcs stonith update stonith_id [stonith_device_options]
Updating a SCSI fencing device with the pcs stonith update command causes a restart of all resources running on the same node where the fencing resource was running. You can use either version of the following command to update SCSI devices without causing a restart of other cluster resources. SCSI fencing devices can be configured as multipath devices.
pcs stonith update-scsi-devices stonith_id set device-path1 device-path2
pcs stonith update-scsi-devices stonith_id add device-path1 remove device-path2
Use the following command to remove a fencing device from the current configuration.
pcs stonith delete stonith_id
4.4.12.5. Manually fencing a cluster node
You can fence a node manually with the following command. If you specify the --off option, the command uses the off API call to stonith, which turns the node off instead of rebooting it.
pcs stonith fence node [--off]
In a situation where no fence device is able to fence a node even if it is no longer active, the cluster may not be able to recover the resources on the node. If this occurs, after manually ensuring that the node is powered down you can enter the following command to confirm to the cluster that the node is powered down and free its resources for recovery.
If the node you specify is not actually off, but is running the cluster software or services normally controlled by the cluster, data corruption and cluster failure will occur.
pcs stonith confirm node
4.4.12.6. Disabling a fence device
To disable a fencing device, run the pcs stonith disable command.
The following command disables the fence device myapc.
# pcs stonith disable myapc
4.4.12.7. Preventing a node from using a fencing device
To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource.
The following example prevents fence device node1-ipmi from running on node1.
# pcs constraint location node1-ipmi avoids node1
4.5. Configuring ACPI for use with integrated fence devices
If your cluster uses integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing.
If a cluster node is configured to be fenced by an integrated fence device, disable ACPI Soft-Off for that node. Disabling ACPI Soft-Off allows an integrated fence device to turn off a node immediately and completely rather than attempting a clean shutdown (for example, shutdown -h now). Otherwise, if ACPI Soft-Off is enabled, an integrated fence device can take four or more seconds to turn off a node (see the note that follows). In addition, if ACPI Soft-Off is enabled and a node panics or freezes during shutdown, an integrated fence device may not be able to turn off the node. Under those circumstances, fencing is delayed or unsuccessful. Consequently, when a node is fenced with an integrated fence device and ACPI Soft-Off is enabled, a cluster recovers slowly or requires administrative intervention to recover.
Note: The amount of time required to fence a node depends on the integrated fence device used. Some integrated fence devices perform the equivalent of pressing and holding the power button; therefore, the fence device turns off the node in four to five seconds. Other integrated fence devices perform the equivalent of pressing the power button momentarily, relying on the operating system to turn off the node; in that case, the fence device takes much longer than four to five seconds to turn off the node.
- The preferred way to disable ACPI Soft-Off is to change the BIOS setting to "instant-off" or an equivalent setting that turns off the node without delay, as described in Disabling ACPI Soft-Off with the BIOS.
Disabling ACPI Soft-Off with the BIOS may not be possible with some systems. If disabling ACPI Soft-Off with the BIOS is not satisfactory for your cluster, you can disable ACPI Soft-Off with one of the following alternate methods:
- Setting HandlePowerKey=ignore in the /etc/systemd/logind.conf file and verifying that the node turns off immediately when fenced, as described in Disabling ACPI Soft-Off in the logind.conf file. This is the first alternate method of disabling ACPI Soft-Off.
- Appending acpi=off to the kernel boot command line, as described in Disabling ACPI completely in the GRUB 2 file. This is the second alternate method of disabling ACPI Soft-Off, for use if the preferred method or the first alternate method is not available.
Important: This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
4.5.1. Disabling ACPI Soft-Off with the BIOS
You can disable ACPI Soft-Off by configuring the BIOS of each cluster node.
The procedure for disabling ACPI Soft-Off with the BIOS may differ among server systems. You should verify this procedure with your hardware documentation.
Procedure
- Reboot the node and start the BIOS CMOS Setup Utility program.
- Navigate to the Power menu (or equivalent power management menu).
- At the Power menu, set the Soft-Off by PWR-BTTN function (or equivalent) to Instant-Off (or the equivalent setting that turns off the node by means of the power button without delay). The BIOS CMOS Setup Utility example below shows a Power menu with ACPI Function set to Enabled and Soft-Off by PWR-BTTN set to Instant-Off.
Note: The equivalents to ACPI Function, Soft-Off by PWR-BTTN, and Instant-Off may vary among computers. However, the objective of this procedure is to configure the BIOS so that the computer is turned off by means of the power button without delay.
- Exit the BIOS CMOS Setup Utility program, saving the BIOS configuration.
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
Example 4.1. BIOS CMOS Setup Utility
`Soft-Off by PWR-BTTN` set to `Instant-Off`
This example shows ACPI Function set to Enabled, and Soft-Off by PWR-BTTN set to Instant-Off.
4.5.2. Disabling ACPI Soft-Off in the logind.conf file
You can disable power-key handling in the /etc/systemd/logind.conf file.
Procedure
Define the following configuration in the /etc/systemd/logind.conf file:
HandlePowerKey=ignore
Restart the systemd-logind service:
# systemctl restart systemd-logind.service
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.5.3. Disabling ACPI completely in the GRUB 2 file
You can disable ACPI Soft-Off by appending acpi=off to the GRUB menu entry for a kernel.
This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
Procedure
Use the --args option in combination with the --update-kernel option of the grubby tool to change the grub.cfg file of each cluster node as follows:
# grubby --args=acpi=off --update-kernel=ALL
- Reboot the node.
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.6. Setting up IP address resources on AWS
To manage network access for cluster resources during failover in a high availability (HA) cluster, you can configure IP address resources. The Red Hat High Availability Add-On offers resource agents for different Amazon Web Services (AWS) IP address types.
This includes internet-exposed addresses, single-zone addresses, and multi-zone addresses.
- Exposed to the internet: Use the awseip network resource.
- Limited to a single AWS Availability Zone (AZ): Use the awsvip and IPaddr2 network resources.
- Reassigned across multiple AWS AZs within the same AWS region: Use the aws-vpc-move-ip network resource.
Note: If the HA cluster does not manage any IP addresses, the resource agents for managing virtual IP addresses on AWS are not required. If you need further guidance for your specific deployment, consult with AWS.
4.6.1. Creating an IP address resource to manage an IP address exposed to the internet
To ensure that high-availability (HA) clients can access a Red Hat Enterprise Linux (RHEL) node that uses public-facing internet connections, configure an AWS Secondary Elastic IP Address (awseip) resource to use an elastic IP address.
Prerequisites
- You have a configured cluster.
- Your cluster nodes must have access to the RHEL HA repositories. For details, see Installing the High Availability packages and agents.
- You have set up the AWS CLI 2. For details, see Installing AWSCLI2.
Procedure
1. Add the two resources to the same group that you have already created, to enforce order and colocation constraints.
2. Install the resource-agents package:
# dnf install resource-agents
3. Create an elastic IP address:
[root@ip-10-0-0-48 ~]# aws ec2 allocate-address --domain vpc --output text
eipalloc-4c4a2c45 vpc 35.169.153.122
4. Optional: Display the description of awseip. This shows the options and default operations for this agent:
# pcs resource describe awseip
5. Create the Secondary Elastic IP address resource with the IP address allocated in step 3:
# pcs resource create <resource_id> awseip elastic_ip=<elastic_ip_address> allocation_id=<elastic_ip_association_id> --group networking-group
Example:
# pcs resource create elastic awseip elastic_ip=35.169.153.122 allocation_id=eipalloc-4c4a2c45 --group networking-group
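The allocation ID and Elastic IP needed for the pcs resource create command can be pulled out of the aws ec2 allocate-address text output shown above, for example with awk. This is a sketch; the output line is the example from this procedure, hard-coded here rather than fetched live from AWS.

```shell
# Example `aws ec2 allocate-address --domain vpc --output text` output line,
# copied from the procedure above (not a live AWS call).
line="eipalloc-4c4a2c45 vpc 35.169.153.122"
allocation_id=$(echo "$line" | awk '{print $1}')   # first field: allocation ID
elastic_ip=$(echo "$line" | awk '{print $3}')      # third field: the Elastic IP
echo "allocation_id=$allocation_id elastic_ip=$elastic_ip"
```

The two variables can then be substituted for <elastic_ip_address> and <elastic_ip_association_id> in the pcs command.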
Verification
Verify the cluster status to ensure resources are available:
# pcs status
In this example, newcluster is an active cluster where resources such as vip and elastic are part of the networking-group resource group.
Launch an SSH session from your local workstation to the elastic IP address that you have already created:
$ ssh -l ec2-user -i ~/.ssh/cluster-admin.pem 35.169.153.122
- Verify that the SSH connected host is the same as the host with the elastic resources.
To use a private IP address that is limited to a single Amazon Web Services (AWS) Availability Zone (AZ) in a Red Hat Enterprise Linux (RHEL) high-availability (HA) cluster, you can configure an AWS Secondary Private IP Address (awsvip) resource.
Prerequisites
- You have a configured cluster.
- Your cluster nodes have access to the RHEL HA repositories. For details, see Installing the High Availability packages and agents.
- You have set up the AWS CLI. For instructions, see Installing AWSCLI2.
Procedure
1. Install the resource-agents package:
# dnf install resource-agents
2. Optional: View the options and default operations for awsvip:
# pcs resource describe awsvip
3. Create a Secondary Private IP address with an unused private IP address in the VPC CIDR block:
[root@ip-10-0-0-48 ~]# pcs resource create privip awsvip secondary_private_ip=10.0.0.68 --group networking-group
Here, the Secondary Private IP address is included in the networking-group resource group.
4. Create a virtual IP resource with the vip resource ID and the networking-group group name:
[root@ip-10-0-0-48 ~]# pcs resource create vip IPaddr2 ip=10.0.0.68 --group networking-group
This is a Virtual Private Cloud (VPC) IP address that maps from the fenced node to the failover node, masking the failure of the fenced node within the subnet. Ensure that the virtual IP belongs to the same resource group as the Secondary Private IP address you created in the previous step.
Verification
Verify the cluster status to ensure resources are available:
# pcs status
In this example, newcluster is an active cluster where resources such as vip and privip are part of the networking-group resource group.
To use an elastic IP address on Amazon Web Services (AWS), you can configure a Red Hat Enterprise Linux (RHEL) Overlay IP (aws-vpc-move-ip) resource agent. With aws-vpc-move-ip, you can move an overlay IP address across multiple availability zones (AZs) within a single AWS region, so that the address remains reachable for high-availability (HA) clients.
Prerequisites
- You have an already configured cluster.
- Your cluster nodes have access to the RHEL HA repositories. For more information, see Installing the High Availability packages and agents.
- You have set up the AWS CLI. For instructions, see Installing AWSCLI2.
You have configured an Identity and Access Management (IAM) user on your cluster with the following permissions:
- Modify routing tables
- Create security groups
- Create IAM policies and roles
Procedure
1. Install the resource-agents package:
# dnf install resource-agents
2. Optional: View the options and default operations for aws-vpc-move-ip:
# pcs resource describe aws-vpc-move-ip
3. Set up an OverlayIPAgent IAM policy for the IAM user:
- In the AWS console, navigate to Services → IAM → Policies → Create OverlayIPAgent Policy.
- Input the following configuration, and change the <region>, <account_id>, and <cluster_route_table_id> values to correspond with your cluster:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:DescribeRouteTables",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateRoute",
                "ec2:ReplaceRoute"
            ],
            "Resource": "arn:aws:ec2:<region>:<account_id>:route-table/<cluster_route_table_id>"
        }
    ]
}
4. In the AWS console, disable the Source/Destination Check function on all nodes in the cluster. To do this, right-click each node, then click Networking → Change Source/Destination Checks. In the pop-up message that is displayed, click Yes, Disable.
5. Create a route table for the cluster. To do so, use the following command on one node in the cluster:
# aws ec2 create-route --route-table-id <cluster_route_table_id> --destination-cidr-block <new_cidr_block_ip/net_mask> --instance-id <cluster_node_id>
In the command, replace values as follows:
- <cluster_route_table_id>: The route table ID for the existing cluster Virtual Private Cloud (VPC) route table.
- <new_cidr_block_ip/net_mask>: A new IP address and netmask outside of the VPC classless inter-domain routing (CIDR) block. For example, if the VPC CIDR block is 172.31.0.0/16, the new IP address and netmask can be 192.168.0.15/32.
- <cluster_node_id>: The instance ID for another node in the cluster.
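The requirement that the new address fall outside the VPC CIDR block can be sanity-checked with plain shell arithmetic. This is a sketch using the example values from this step (192.168.0.15 and 172.31.0.0/16); ip_to_int and in_cidr are helper names invented here, not AWS or pcs tooling.

```shell
ip_to_int() {
    # Convert a dotted-quad IPv4 address to a 32-bit integer
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

in_cidr() {
    # Return 0 (true) if IP ($1) lies inside NETWORK/PREFIX ($2)
    local net=${2%/*} prefix=${2#*/}
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

if in_cidr 192.168.0.15 172.31.0.0/16; then
    echo "inside VPC CIDR: not usable as overlay IP"
else
    echo "outside VPC CIDR: usable as overlay IP"
fi
```

Because 192.168.0.15 is outside 172.31.0.0/16, the check confirms it as a valid choice for the overlay address.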
6. On one of the nodes in the cluster, create an aws-vpc-move-ip resource that uses a free IP address that is accessible to the client. The following example creates a resource named vpcip that uses IP 192.168.0.15:
# pcs resource create vpcip aws-vpc-move-ip ip=192.168.0.15 interface=eth0 routing_table=<cluster_route_table_id>
7. On all nodes in the cluster, edit the /etc/hosts file, and add a line with the IP address of the newly created resource. For example:
192.168.0.15 vpcip
Verification
Test the failover ability of the new aws-vpc-move-ip resource:
# pcs resource move vpcip
If the failover succeeded, remove the automatically created constraint after the move of the vpcip resource:
# pcs resource clear vpcip