Chapter 4. Fencing Controller nodes with STONITH
Fencing is the process of isolating a failed node to protect the cluster and the cluster resources. Without fencing, a failed node might cause data corruption in the cluster. Director uses Pacemaker to provide a highly available cluster of Controller nodes.
Pacemaker uses a process called STONITH to fence failed nodes. STONITH is an acronym for "Shoot the other node in the head". STONITH is disabled by default and requires manual configuration so that Pacemaker can control the power management of each node in the cluster.
If a Controller node fails a health check, the Controller node that acts as the Pacemaker designated coordinator (DC) uses the Pacemaker stonith service to fence the impacted Controller node.
Deploying a highly available overcloud without STONITH is not supported. You must configure a STONITH device for each node that is a part of the Pacemaker cluster in a highly available overcloud. For more information on STONITH and Pacemaker, see Fencing in a Red Hat High Availability Cluster and Support Policies for RHEL High Availability Clusters.
4.1. Supported fencing agents
When you deploy a high availability environment with fencing, you can choose the fencing agents based on your environment needs. To change the fencing agent, you must configure additional parameters in the fencing.yaml file.
Red Hat OpenStack Platform (RHOSP) supports the following fencing agents:
- Intelligent Platform Management Interface (IPMI)
  Default fencing mechanism that Red Hat OpenStack Platform (RHOSP) uses to manage fencing.
- STONITH Block Device (SBD)
  The SBD (Storage-Based Death) daemon integrates with Pacemaker and a watchdog device to arrange for nodes to reliably shut down when fencing is triggered and in cases where traditional fencing mechanisms are not available.
  Important
  - SBD fencing is not supported in clusters with remote bare metal or virtual machine nodes that use pacemaker_remote, so it is not supported if your deployment uses Instance HA.
  - fence_sbd and sbd poison-pill fencing with block storage devices are not supported.
  - SBD fencing is only supported with compatible watchdog devices. For more information, see Support Policies for RHEL High Availability Clusters - sbd and fence_sbd.
- fence_kdump
  Use in deployments with the kdump crash recovery service. If you choose this agent, ensure that you have enough disk space to store the dump files.
  You can configure this agent as a secondary mechanism in addition to the IPMI, fence_rhevm, or Redfish fencing agents. If you configure multiple fencing agents, make sure that you allocate enough time for the first agent to complete the task before the second agent starts the next task.
  Important
  - RHOSP director supports only the configuration of the fence_kdump STONITH agent, and not the configuration of the full kdump service that the fencing agent depends on. For information about configuring the kdump service, see the article How do I configure fence_kdump in a Red Hat Pacemaker cluster.
  - fence_kdump is not supported if the Pacemaker network traffic interface uses the ovs_bridges or ovs_bonds network device. To enable fence_kdump, you must change the network device to linux_bond or linux_bridge. For more information about network interface configuration, see Network interface reference.
- Redfish
  Use in deployments with servers that support the DMTF Redfish APIs. To specify this agent, change the value of the agent parameter to fence_redfish in the fencing.yaml file. For more information about Redfish, see the DMTF Documentation.
- fence_rhevm for Red Hat Virtualization (RHV)
  Use to configure fencing for Controller nodes that run in RHV environments. You can generate the fencing.yaml file in the same way as you do for IPMI fencing, but you must define the pm_type parameter in the nodes.json file to use RHV.
  By default, the ssl_insecure parameter is set to accept self-signed certificates. You can change the parameter value based on your security requirements.
  Important
  Ensure that you use a role that has permissions to create and launch virtual machines in RHV, such as UserVMManager.
- Multi-layered fencing
  You can configure multiple fencing agents to support complex fencing use cases. For example, you can configure IPMI fencing together with fence_kdump. The order of the fencing agents determines the order in which Pacemaker triggers each mechanism. A configuration sketch follows this list.
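For example, a minimal sketch of a layered configuration in the fencing.yaml file that tries IPMI first and falls back to fence_kdump. The MAC address, IPMI address, credentials, and timeout are placeholder values, not defaults; see Section 4.5, “Fencing parameters” for the parameter descriptions:

parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
      level1:
      - agent: fence_ipmilan
        host_mac: aa:bb:cc:dd:ee:ff    # placeholder MAC address
        params:
          ipaddr: 10.0.0.101           # placeholder IPMI address
          login: admin                 # placeholder credentials
          passwd: InsertComplexPasswordHere
          lanplus: true
      level2:
      - agent: fence_kdump
        host_mac: aa:bb:cc:dd:ee:ff    # same node, second mechanism
        params:
          timeout: 120                 # illustrative; allow level1 enough time to complete

Pacemaker attempts level1 first, so allocate enough time for each level to complete before the next level starts, as noted above.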
4.2. Deploying fencing on the overcloud
To deploy fencing on the overcloud, first review the state of STONITH and Pacemaker and configure the fencing.yaml file. Then, deploy the overcloud and configure additional parameters. Finally, test that fencing is deployed correctly on the overcloud.
Prerequisites
- Choose the correct fencing agent for your deployment. For the list of supported fencing agents, see Section 4.1, “Supported fencing agents”.
- Ensure that you can access the nodes.json file that you created when you registered your nodes in director. This file is a required input for the fencing.yaml file that you generate during deployment.
- The nodes.json file must contain the MAC address of one of the network interface cards (NICs) on the node. For more information, see Registering Nodes for the Overcloud.
- If you use the Red Hat Virtualization (RHV) fencing agent, use a role that has permissions to manage virtual machines, such as UserVMManager.
Procedure
- Log in to each Controller node as the heat-admin user.
- Verify that the cluster is running:
$ sudo pcs status
Example output:
Cluster name: openstackHA
Last updated: Wed Jun 24 12:40:27 2015
Last change: Wed Jun 24 11:36:18 2015
Stack: corosync
Current DC: lb-c1a2 (2) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
141 Resources configured
- Verify that STONITH is disabled:
$ sudo pcs property show
Example output:
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: openstackHA
dc-version: 1.1.12-a14efad
have-watchdog: false
stonith-enabled: false
- Depending on the fencing agent that you want to use, choose one of the following options:
  - If you use the IPMI or RHV fencing agent, generate the fencing.yaml environment file:

    $ openstack overcloud generate fencing --output fencing.yaml nodes.json

    Note
    This command converts ilo and drac power management details to IPMI equivalents.
  - If you use a different fencing agent, such as STONITH Block Device (SBD), fence_kdump, or Redfish, or if you use pre-provisioned nodes, create the fencing.yaml file manually.
- SBD fencing only: Add the following parameter to the fencing.yaml file:

  parameter_defaults:
    ExtraConfig:
      pacemaker::corosync::enable_sbd: true

  Note
  This step is applicable to initial overcloud deployments only. For more information about how to enable SBD fencing on an existing overcloud, see Enabling sbd fencing in RHEL 7 and 8.
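SBD fencing works only with a compatible watchdog device. As a quick sanity check (not part of the generated configuration), you can confirm that each node exposes a watchdog device to the kernel:

$ ls -l /dev/watchdog*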
- Multi-layered fencing only: Add the level-specific parameters to the generated fencing.yaml file:

  parameter_defaults:
    EnableFencing: true
    FencingConfig:
      devices:
        level1:
        - agent: [VALUE]
          host_mac: aa:bb:cc:dd:ee:ff
          params:
            <parameter>: <value>
        level2:
        - agent: fence_agent2
          host_mac: aa:bb:cc:dd:ee:ff
          params:
            <parameter>: <value>

  Replace <parameter> and <value> with the actual parameters and values that the fencing agent requires.
- Run the overcloud deploy command and include the fencing.yaml file and any other environment files that are relevant for your deployment:

  openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e ~/templates/network-environment.yaml \
  -e ~/templates/storage-environment.yaml \
  --ntp-server pool.ntp.org \
  --neutron-network-type vxlan \
  --neutron-tunnel-types vxlan \
  -e fencing.yaml
- SBD fencing only: Set the watchdog timer device interval and check that the interval is set correctly:

  # pcs property set stonith-watchdog-timeout=<interval>
  # pcs property show
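For example, to set a 10-second interval and confirm the new value. The interval shown here is illustrative; choose a value that suits your watchdog hardware:

# pcs property set stonith-watchdog-timeout=10
# pcs property show stonith-watchdog-timeout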
Verification
- Log in to the overcloud as the stack user and check that Pacemaker is configured as the resource manager:

  $ source stackrc
  $ openstack server list | grep controller
  $ ssh heat-admin@<controller-x_ip>
  $ sudo pcs status | grep fence
  stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y

  In this example, Pacemaker is configured to use a STONITH resource for each of the Controller nodes that are specified in the fencing.yaml file.
  Note
  You must not configure the fence-resource process on the same node that it controls.
- Check the fencing resource attributes. The STONITH attribute values must match the values in the fencing.yaml file:

  $ sudo pcs stonith show <stonith-resource-controller-x>
4.3. Testing fencing on the overcloud
To test that fencing works correctly, trigger fencing by closing all ports on a Controller node and restarting the server.
This procedure deliberately drops all connections to the Controller node, which causes the node to restart.
Prerequisites
- Fencing is deployed and running on the overcloud. For information on how to deploy fencing, see Section 4.2, “Deploying fencing on the overcloud”.
- A Controller node is available for a restart.
Procedure
- Log in to a Controller node as the stack user and source the credentials file:

  $ source stackrc
  $ openstack server list | grep controller
  $ ssh heat-admin@<controller-x_ip>
- Change to the root user and close all connections to the Controller node:

  $ sudo -i
  iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
  iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
  iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
  iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
  iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
  iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
  iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
  iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
  iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited
- From a different Controller node, locate the fencing event in the Pacemaker log file:

  $ ssh heat-admin@<controller-x_ip>
  $ less /var/log/cluster/corosync.log
  (less): /fenc*

  If the STONITH service performed the fencing action on the Controller node, the log file shows a fencing event.
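You can also search the log non-interactively. The exact message text varies between versions, so treat the pattern as a starting point rather than a fixed string:

$ sudo grep -i stonith /var/log/cluster/corosync.log | tail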
- Wait a few minutes and then verify that the restarted Controller node is running in the cluster again by running the pcs status command. If you can see the Controller node that you restarted in the output, fencing functions correctly.
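For example, the restarted node must appear in the Online list again. The node names below match the examples in this chapter, and the output shape is illustrative:

$ sudo pcs status | grep Online
Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]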
4.4. Viewing STONITH device information
To see how STONITH configures your fencing devices, run the pcs stonith show --full command from the overcloud.
Prerequisites
- Fencing is deployed and running on the overcloud. For information on how to deploy fencing, see Section 4.2, “Deploying fencing on the overcloud”.
Procedure
- Show the list of Controller nodes and the status of their STONITH devices:
$ sudo pcs stonith show --full
 Resource: my-ipmilan-for-controller-0 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-0 ipaddr=10.100.0.51 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-0-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-1 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-1 ipaddr=10.100.0.52 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-1-monitor-interval-60s)
 Resource: my-ipmilan-for-controller-2 (class=stonith type=fence_ipmilan)
  Attributes: pcmk_host_list=overcloud-controller-2 ipaddr=10.100.0.53 login=admin passwd=abc lanplus=1 cipher=3
  Operations: monitor interval=60s (my-ipmilan-for-controller-2-monitor-interval-60s)
This output shows the following information for each resource:
- IPMI power management service that the fencing device uses to turn the machines on and off as needed, such as fence_ipmilan.
- IP address of the IPMI interface, such as 10.100.0.51.
- User name to log in with, such as admin.
- Password to use to log in to the node, such as abc.
- Interval in seconds at which each host is monitored, such as 60s.
4.5. Fencing parameters
When you deploy fencing on the overcloud, you generate the fencing.yaml file with the required parameters to configure fencing.
The following example shows the structure of the fencing.yaml environment file:

parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: 11:11:11:11:11:11
      params:
        ipaddr: 10.0.0.101
        lanplus: true
        login: admin
        passwd: InsertComplexPasswordHere
        pcmk_host_list: host04
        privlvl: administrator
This file contains the following parameters:
- EnableFencing
  Enables the fencing functionality for Pacemaker-managed nodes.
- FencingConfig
  Lists the fencing devices and the parameters for each device:
  - agent: Fencing agent name.
  - host_mac: The MAC address, in lowercase, of the provisioning interface or any other network interface on the server. You can use this as a unique identifier for the fencing device.
  - params: List of fencing device parameters.
- Fencing device parameters
  Lists the fencing device parameters. This example shows the parameters for the IPMI fencing agent:
  - auth: IPMI authentication type (md5, password, or none).
  - ipaddr: IPMI IP address.
  - ipport: IPMI port.
  - login: Username for the IPMI device.
  - passwd: Password for the IPMI device.
  - lanplus: Use lanplus to improve the security of the connection.
  - privlvl: Privilege level on the IPMI device.
  - pcmk_host_list: List of Pacemaker hosts.
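The following sketch shows how these IPMI parameters fit together in a single device entry. The address, credentials, and MAC address are placeholders rather than defaults; 623 is the conventional IPMI port:

parameter_defaults:
  EnableFencing: true
  FencingConfig:
    devices:
    - agent: fence_ipmilan
      host_mac: aa:bb:cc:dd:ee:ff   # placeholder MAC address
      params:
        auth: password              # md5, password, or none
        ipaddr: 10.0.0.101          # placeholder IPMI address
        ipport: 623                 # conventional IPMI port; adjust for your BMC
        login: admin                # placeholder credentials
        passwd: InsertComplexPasswordHere
        lanplus: true
        privlvl: administrator
        pcmk_host_list: overcloud-controller-0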
Additional resources