Chapter 4. Fencing Controller nodes with STONITH
Fencing is the process of isolating a failed node to protect the cluster and the cluster resources. Without fencing, a failed node might result in data corruption in a cluster. Director uses Pacemaker to provide a highly available cluster of Controller nodes.
Pacemaker uses a process called STONITH to fence failed nodes. STONITH is an acronym for "Shoot the other node in the head". STONITH is disabled by default and requires manual configuration so that Pacemaker can control the power management of each node in the cluster.
Deploying a highly available overcloud without STONITH is not supported. You must configure a STONITH device for each node that is a part of the Pacemaker cluster in a highly available overcloud. For more information on STONITH and Pacemaker, see Fencing in a Red Hat High Availability Cluster and Support Policies for RHEL High Availability Clusters.
If a Controller node fails a health check, the Controller node that acts as the Pacemaker designated coordinator (DC) uses the Pacemaker stonith service to fence the impacted Controller node.
When a Pacemaker cluster node or Pacemaker remote node is fenced, a hard kill of the operating system must occur, not a graceful shutdown. For more information, see Testing a fence device in the RHEL Configuring and managing high availability clusters guide.
4.1. Supported fencing agents
When you deploy a high availability environment with fencing, you can choose the fencing agents based on your environment needs. To change the fencing agent, you must configure additional parameters in the fencing.yaml file.
Red Hat OpenStack Platform (RHOSP) supports the following fencing agents:
- Intelligent Platform Management Interface (IPMI)
  Default fencing mechanism that Red Hat OpenStack Platform (RHOSP) uses to manage fencing.
- STONITH Block Device (SBD)
  The SBD (Storage-Based Death) daemon integrates with Pacemaker and a watchdog device to arrange for nodes to reliably shut down when fencing is triggered and in cases where traditional fencing mechanisms are not available.
  Important
  - You can configure SBD fencing only on Controller nodes, because SBD fencing is not supported on virtual machines or on nodes that use the pacemaker_remote service.
  - fence_sbd and sbd poison-pill fencing with block storage devices are not supported.
  - SBD fencing is supported only with compatible watchdog devices. For more information, see Support Policies for RHEL High Availability Clusters - sbd and fence_sbd.
- fence_kdump
  Use in deployments with the kdump crash recovery service. If you choose this agent, ensure that you have enough disk space to store the dump files. You can configure this agent in addition to the IPMI, fence_rhevm, or Redfish fencing agents. If you configure multiple fencing agents, ensure that you allocate enough time for the first agent to complete the task before the second agent starts the next task. Implement the fence_kdump STONITH agent as a first-level device.
  - RHOSP director supports only the configuration of the fence_kdump STONITH agent, and not the configuration of the full kdump service that the fencing agent depends on. For information about configuring the kdump service, see the article How do I configure fence_kdump in a Red Hat Pacemaker cluster.
  - fence_kdump is not supported if the Pacemaker network traffic interface uses the ovs_bridges or ovs_bonds network device. To enable fence_kdump, you must change the network device to linux_bond or linux_bridge.
- Redfish
  Use in deployments with servers that support the DMTF Redfish APIs. To specify this agent, change the value of the agent parameter to fence_redfish in the fencing.yaml file. For more information about Redfish, see the DMTF Documentation.
- Multi-layered fencing
  You can configure multiple fencing agents to support complex fencing use cases. For example, you can configure IPMI fencing together with fence_kdump. The order of the fencing agents determines the order in which Pacemaker triggers each mechanism.
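A multi-layered fencing.yaml might look like the following minimal sketch, which follows the fence_kdump guidance in this section by placing fence_kdump at the first level and fence_ipmilan at the second level for one node. The level1 and level2 keys under FencingConfig devices are an assumption about the layout that director expects, and every MAC address, IP address, credential, and host name is a placeholder. The function name write_multilevel_fencing is hypothetical; it refuses to overwrite an existing file so that it cannot replace a fencing.yaml that you already generated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a two-level fencing.yaml sketch for one node.
# ASSUMPTIONS: the level1/level2 layout under FencingConfig.devices and all
# values (MAC, IP, credentials, host name) are placeholders to adapt.
write_multilevel_fencing() {
  local out=${1:-fencing.yaml}
  if [ -e "$out" ]; then
    echo "refusing to overwrite existing $out" >&2
    return 1
  fi
  cat > "$out" <<'EOF'
parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
      level1:
      - agent: fence_kdump          # tried first: waits for the crashed node's kdump kernel
        host_mac: aa:bb:cc:dd:ee:ff
      level2:
      - agent: fence_ipmilan        # used if the level1 device does not complete fencing
        host_mac: aa:bb:cc:dd:ee:ff
        params:
          ipaddr: 192.0.2.10
          lanplus: true
          login: admin
          passwd: <password>
          pcmk_host_list: overcloud-controller-0
          privlvl: administrator
EOF
}
```

For example, write_multilevel_fencing fencing-multilevel.yaml creates the file, which you then review and extend with one entry per Controller node at each level.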
4.2. Deploying fencing on the overcloud
To deploy fencing on the overcloud, first review the state of STONITH and Pacemaker and configure the fencing.yaml file. Then, deploy the overcloud and configure additional parameters. Finally, test that fencing is deployed correctly on the overcloud.
Prerequisites
- You have chosen the correct fencing agent for your deployment. For the list of supported fencing agents, see Section 4.1, “Supported fencing agents”.
- You have verified that you can access the nodes.json file that you created when you registered your nodes in director. This file is a required input for the fencing.yaml file that you generate during deployment.
- The nodes.json file must contain the MAC address of one of the network interfaces (NICs) on the node. For more information, see Registering Nodes for the Overcloud.
- You have verified that all of the parameters for your fencing agents are correctly specified. Director creates the fencing agents even when their parameters are incorrect, so you must ensure that all of the created fencing agents are in the "running" state and not in the "stopped" state.
Procedure
- Log in to each Controller node as the tripleo-admin user.
- Verify that the cluster is running:
    $ sudo pcs status
- Verify that STONITH is disabled:
    $ sudo pcs property show
- Depending on the fencing agent that you want to use, choose one of the following options:
  - If you use the IPMI or RHV fencing agent, generate the fencing.yaml environment file:
      (undercloud) $ openstack overcloud generate fencing --output fencing.yaml nodes.json
    Note: This command converts ilo and drac power management details to IPMI equivalents.
  - If you use a different fencing agent, such as STONITH Block Device (SBD), fence_kdump, or Redfish, or if you use pre-provisioned nodes, create the fencing.yaml file manually.
- SBD fencing only: Add the following parameter to the fencing.yaml file:
    parameter_defaults:
      ExtraConfig:
        pacemaker::corosync::enable_sbd: true
  Note: This step is applicable to initial overcloud deployments only. For more information about how to enable SBD fencing on an existing overcloud, see Enabling sbd fencing in RHEL 7 and 8.
- Multi-layered fencing only: Add the level-specific parameters to the generated fencing.yaml file. Replace <parameter> and <value> with the actual parameters and values that the fencing agent requires.
- Run the overcloud deploy command and include the fencing.yaml file and any other environment files that are relevant for your deployment:
    openstack overcloud deploy --templates \
      -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
      -e ~/templates/network-environment.yaml \
      -e ~/templates/storage-environment.yaml \
      --ntp-server pool.ntp.org \
      --neutron-network-type vxlan \
      --neutron-tunnel-types vxlan \
      -e fencing.yaml
- SBD fencing only: Set the watchdog timer device interval and check that the interval is set correctly:
    # pcs property set stonith-watchdog-timeout=<interval>
    # pcs property show
- Ensure that all of the created fencing agents are in the "running" state and not in the "stopped" state. Fencing agents in the "stopped" state are incorrectly configured.
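The STONITH checks in this procedure can be scripted. The following is a minimal sketch, not a supported tool: it reads the text output of sudo pcs property show or sudo pcs status on standard input, reports the value of the stonith-enabled property, and lists STONITH resources whose status line reports Stopped. The function names are hypothetical, and the parsing assumes line formats such as stonith-enabled: false and stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y; pcs output can differ between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking fencing state from pcs text output.

# Print the value of the stonith-enabled cluster property.
# Input: output of `sudo pcs property show` on stdin.
# Prints nothing if the property is not listed in the output.
stonith_enabled_value() {
  # Split "key: value" lines on the colon and any following spaces.
  awk -F': *' '$1 ~ /^[[:space:]]*stonith-enabled$/ { print $2 }'
}

# Print the name of every STONITH resource whose status is Stopped.
# Input: output of `sudo pcs status` on stdin.
list_stopped_stonith() {
  awk '/[(]stonith:/ && /Stopped/ {
         name = $1
         if (name == "*") name = $2   # newer pcs prefixes resource lines with "* "
         print name
       }'
}

# Return 1 (and list the offenders on stderr) if any STONITH resource is stopped.
check_stonith_running() {
  local stopped
  stopped=$(list_stopped_stonith)
  if [ -n "$stopped" ]; then
    printf 'Stopped STONITH resources:\n%s\n' "$stopped" >&2
    return 1
  fi
}
```

For example, run sudo pcs property show | stonith_enabled_value before deployment, and sudo pcs status | check_stonith_running after deployment.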
Verification
- From the undercloud, find the IP address of a Controller node, log in to it as the tripleo-admin user, and ensure that Pacemaker is configured as the resource manager:
    $ source stackrc
    $ metalsmith list | grep controller
    $ ssh tripleo-admin@<controller-x_ip>
    $ sudo pcs status | grep fence
    stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y
  In this example, Pacemaker is configured to use a STONITH resource for each of the Controller nodes that are specified in the fencing.yaml file.
  Note: You must not configure the fence-resource process on the same node that it controls.
- Check the fencing resource attributes. The STONITH attribute values must match the values in the fencing.yaml file:
    $ sudo pcs stonith show <stonith-resource-controller-x>
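To compare the attribute values with fencing.yaml without reading them by eye, you can check selected key=value pairs against the resource output. This is a minimal sketch under assumptions: the function name compare_stonith_attrs is hypothetical, you supply the expected values yourself from fencing.yaml, and the resource output is assumed to print attributes as whitespace-separated key=value tokens, which can vary between pcs versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare expected STONITH attributes with pcs output.
# Usage: sudo pcs stonith show <resource> | compare_stonith_attrs ipaddr=192.0.2.10 login=admin
# Prints one line per mismatch and returns 1 if any attribute differs.
compare_stonith_attrs() {
  local tokens rc=0 pair key want got
  # Every whitespace-separated key=value token in the pcs output on stdin.
  tokens=$(awk '{ for (i = 1; i <= NF; i++) if (index($i, "=") > 1) print $i }')
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    # Value of the first token whose key is exactly $key.
    got=$(printf '%s\n' "$tokens" |
          awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }')
    if [ "$got" != "$want" ]; then
      printf 'mismatch: %s expected=%s actual=%s\n' "$key" "$want" "${got:-<missing>}"
      rc=1
    fi
  done
  return "$rc"
}
```

Pass the values that you set in fencing.yaml for the node, such as ipaddr, login, and pcmk_host_list. Avoid passing passwd on the command line, because command-line arguments can be visible to other users in the process list.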
4.3. Testing fencing on the overcloud
To test that fencing works correctly, trigger fencing by closing all ports on a Controller node and restarting the server.
This procedure deliberately drops all connections to the Controller node, which causes the node to restart.
Prerequisites
- Fencing is deployed and running on the overcloud. For information on how to deploy fencing, see Section 4.2, “Deploying fencing on the overcloud”.
- The Controller node is available for a restart.
Procedure
- Log in to the undercloud as the stack user, source the credentials file, and log in to a Controller node as the tripleo-admin user:
    $ source stackrc
    $ metalsmith list | grep controller
    $ ssh tripleo-admin@<controller-x_ip>
- Change to the root user and close all connections to the Controller node.
- From a different Controller node, locate the fencing event in the Pacemaker log file:
    $ ssh tripleo-admin@<controller-x_ip>
    $ less /var/log/cluster/corosync.log
    (less): /fenc*
  If the STONITH service performed the fencing action on the Controller node, the log file shows a fencing event.
- Wait a few minutes and then verify that the restarted Controller node is running in the cluster again by running the pcs status command. If the Controller node that you restarted appears in the output, fencing works correctly.
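Instead of rerunning pcs status by hand, you can poll until the restarted node appears in the online node list. The following is a minimal sketch, assuming that pcs status prints an Online: [ ... ] line that lists node names separated by spaces; the function name wait_for_node_online, the STATUS_CMD override, and the default timeout and interval are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until a node is listed as online in pcs status.
# Usage: wait_for_node_online <node> [timeout_seconds] [interval_seconds]
# STATUS_CMD can override the status command (for example, in tests).
wait_for_node_online() {
  local node=$1 timeout=${2:-600} interval=${3:-15} waited=0
  local status_cmd=${STATUS_CMD:-"sudo pcs status"}
  while [ "$waited" -lt "$timeout" ]; do
    # $status_cmd is unquoted on purpose: it splits into command and arguments.
    # The node name must match a whole token on an "Online:" line.
    if $status_cmd 2>/dev/null |
         awk -v n="$node" '
           /Online:/ { for (i = 1; i <= NF; i++) if ($i == n) found = 1 }
           END { exit (found ? 0 : 1) }'; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s waiting for $node" >&2
  return 1
}
```

For example, wait_for_node_online overcloud-controller-0 900 30 polls every 30 seconds for up to 15 minutes. Matching whole tokens avoids treating overcloud-controller-1 as online when only overcloud-controller-10 is listed.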
4.4. Viewing STONITH device information
To see how STONITH configures your fencing devices, run the pcs stonith status --full command from the overcloud.
Prerequisites
- Fencing is deployed and running on the overcloud. For information on how to deploy fencing, see Section 4.2, “Deploying fencing on the overcloud”.
Procedure
Show the list of Controller nodes and the status of their STONITH devices:
    $ sudo pcs stonith status --full
The output shows the following information for each resource:
- The fencing agent that the device uses to turn the machines on and off as needed, such as fence_ipmilan.
- The IP address of the IPMI interface, such as 10.100.0.51.
- The user name to log in with, such as admin.
- The password to use to log in to the node, such as abc.
- The interval in seconds at which each host is monitored, such as 60s.
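Because this output includes the IPMI password, a reduced summary can be easier to share. The following minimal sketch prints one line per device with the agent, IP address, user name, and monitor interval, and never reads the passwd attribute. It assumes that the output contains, for each device, a Resource: line with (class=stonith type=<agent>), an Attributes: line of key=value pairs, and a monitor operation with interval=<value>; the function name summarize_stonith is hypothetical, and the layout can differ between pcs versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one summary line per STONITH device, passwords omitted.
# Usage: sudo pcs stonith status --full | summarize_stonith
summarize_stonith() {
  awk '
    # Print the device collected so far, if any.
    function flush() {
      if (name != "")
        printf "%s agent=%s ipaddr=%s login=%s interval=%s\n", name, agent, ip, user, ivl
    }
    $1 == "Resource:" {
      flush()
      name = $2; agent = ip = user = ivl = "-"
      for (i = 3; i <= NF; i++)
        if ($i ~ /^type=/) { agent = substr($i, 6); sub(/[)]$/, "", agent) }
    }
    $1 == "Attributes:" {
      # Only ipaddr and login are read, so passwd is never printed.
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ipaddr=/) ip = substr($i, 8)
        if ($i ~ /^login=/) user = substr($i, 7)
      }
    }
    /monitor/ {
      for (i = 1; i <= NF; i++) if ($i ~ /^interval=/) ivl = substr($i, 10)
    }
    END { flush() }
  '
}
```

Fields that the function cannot find are printed as a hyphen, which makes a missing attribute visible instead of silently dropping the device.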
4.5. Fencing parameters
When you deploy fencing on the overcloud, you generate the fencing.yaml file with the required parameters to configure fencing.
The fencing.yaml environment file contains the following parameters:
- EnableFencing
  Enables the fencing functionality for Pacemaker-managed nodes.
- FencingConfig
  Lists the fencing devices and the parameters for each device:
  - agent: Fencing agent name.
  - host_mac: The MAC address, in lowercase, of the provisioning interface or any other network interface on the server. You can use this as a unique identifier for the fencing device.
    Important: Do not use the MAC address of the IPMI interface.
  - params: List of fencing device parameters.
- Fencing device parameters
  Lists the fencing device parameters. The following list shows the parameters for the IPMI fencing agent:
  - auth: IPMI authentication type (md5, password, or none).
  - ipaddr: IPMI IP address.
  - ipport: IPMI port.
  - login: User name for the IPMI device.
  - passwd: Password for the IPMI device.
  - lanplus: Use lanplus to improve the security of the connection.
  - privlvl: Privilege level on the IPMI device.
  - pcmk_host_list: List of Pacemaker hosts.
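Because host_mac must be the lowercase MAC address of a network interface on the server, a quick check before deployment can catch uppercase or malformed values. The following minimal sketch scans fencing.yaml with sed rather than parsing YAML, so it assumes one host_mac: <value> entry per line; the function name check_host_macs is hypothetical. It cannot detect whether a value is the MAC address of the IPMI interface, which you must still verify yourself.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment check for host_mac values in fencing.yaml.
# Usage: check_host_macs fencing.yaml
# Prints each invalid value and returns 1 if any value is not a lowercase,
# colon-separated MAC address such as aa:bb:cc:dd:ee:ff.
check_host_macs() {
  local file=$1 rc=0 count=0 mac
  while IFS= read -r mac; do
    count=$((count + 1))
    # C locale so that [0-9a-f] matches only digits and lowercase a-f.
    if ! printf '%s\n' "$mac" | LC_ALL=C grep -Eq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
      printf 'invalid host_mac: %s\n' "$mac"
      rc=1
    fi
  done < <(
    # Text after "host_mac:", with quotes removed and anything after the
    # first whitespace-separated token (such as a comment) dropped.
    sed -nE 's/^[[:space:]-]*host_mac:[[:space:]]*//p' "$file" | tr -d "\"'" | awk '{ print $1 }'
  )
  if [ "$count" -eq 0 ]; then
    echo "no host_mac entries found in $file" >&2
    return 1
  fi
  return "$rc"
}
```

Run it as check_host_macs fencing.yaml before you include the file in the overcloud deploy command.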