Chapter 20. Fencing the Controller Nodes
Fencing is the process of isolating a failed node to protect a cluster and its resources. Without fencing, a failed node can result in data corruption in a cluster.
The director uses Pacemaker to provide a highly available cluster of Controller nodes. Pacemaker uses a process called STONITH to fence failed nodes. STONITH is disabled by default and requires manual configuration so that Pacemaker can control the power management of each node in the cluster.
20.1. Review the state of STONITH and Pacemaker
Log in to each node as the `heat-admin` user from the `stack` user on the director. The overcloud creation automatically copies the `stack` user’s SSH key to each node’s `heat-admin` user.

Verify you have a running cluster:

    $ sudo pcs status
    Cluster name: openstackHA
    Last updated: Wed Jun 24 12:40:27 2015
    Last change: Wed Jun 24 11:36:18 2015
    Stack: corosync
    Current DC: lb-c1a2 (2) - partition with quorum
    Version: 1.1.12-a14efad
    3 Nodes configured
    141 Resources configured

Verify STONITH is disabled:

    $ sudo pcs property show
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: openstackHA
     dc-version: 1.1.12-a14efad
     have-watchdog: false
     stonith-enabled: false
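When the same check has to be repeated on every controller, it can help to reduce the property listing to a single value. The following is a minimal sketch, not part of the director tooling: `stonith_state` is a hypothetical helper name, and it assumes the `key: value` layout shown in the sample output above.

```shell
# Hypothetical helper (not shipped with pcs or director).
# Reads `pcs property show` output on stdin and prints the value of
# stonith-enabled, or "unset" if the property is not listed.
# Assumes one "key: value" pair per line, as in the sample output.
stonith_state() {
  awk -F': *' '
    $1 ~ /^[[:space:]]*stonith-enabled$/ { print $2; found = 1 }
    END { if (!found) print "unset" }
  '
}

# Usage on a controller node:
#   sudo pcs property show | stonith_state
```

Before fencing is enabled, the helper should print `false` for a cluster in the state shown above.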
20.2. Enable Fencing
Generate the `fencing.yaml` environment file using the `openstack overcloud generate fencing` command:

    $ openstack overcloud generate fencing --ipmi-lanplus --ipmi-level administrator --output fencing.yaml nodes.json

This command requires the `nodes.json` file you created when registering your nodes in director. If using pre-provisioned nodes, you must create the `fencing.yaml` file manually.

The following snippet is a sample `fencing.yaml` environment file:

    parameter_defaults:
      EnableFencing: true
      FencingConfig:
        devices:
        - agent: fence_ipmilan
          host_mac: 11:11:11:11:11:11
          params:
            ipaddr: 10.0.0.101
            lanplus: true
            login: admin
            passwd: InsertComplexPasswordHere
            pcmk_host_list: host04
            privlvl: administrator

Note: The `openstack overcloud generate fencing` command only outputs fencing options for IPMI. The command accepts nodes using `ipmi` power management details and converts `ilo` and `drac` power management details to IPMI equivalents.
Pass the resulting `fencing.yaml` file to the `deploy` command you previously used to deploy the overcloud. This re-runs the deployment procedure and configures fencing on the hosts:

    $ openstack overcloud deploy --templates \
        -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
        -e ~/templates/network-environment.yaml \
        -e ~/templates/storage-environment.yaml \
        --control-scale 3 --compute-scale 3 --ceph-storage-scale 3 \
        --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph-storage \
        --ntp-server pool.ntp.org \
        --neutron-network-type vxlan --neutron-tunnel-types vxlan \
        -e fencing.yaml

The deployment command should complete without any errors or exceptions.
Log in to the overcloud and verify fencing was configured for each of the controllers:

Check the fencing resources are managed by Pacemaker:

    $ source stackrc
    $ nova list | grep controller
    $ ssh heat-admin@<controller-x_ip>
    $ sudo pcs status | grep fence
    stonith-overcloud-controller-x (stonith:fence_ipmilan): Started overcloud-controller-y

You should see Pacemaker is configured to use a STONITH resource for each of the controllers specified in `fencing.yaml`. The `fence-resource` process should not be configured on the same host it controls.

Use `pcs` to verify the fence resource attributes:

    $ sudo pcs stonith show <stonith-resource-controller-x>

The values used by STONITH should match those defined in the `fencing.yaml` file.
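The placement rule above (a fence resource should not run on the host it controls) can be checked mechanically. The sketch below is an illustration only: `check_fence_placement` is a hypothetical helper, and it assumes the resource naming seen in the sample output, where a resource called `stonith-<hostname>` fences `<hostname>` and the last field of the line is the node currently running it.

```shell
# Hypothetical helper (not part of pcs or director).
# Reads `pcs status | grep fence` output on stdin and reports any
# STONITH resource that is started on the node it is meant to fence.
# Assumption: resources are named "stonith-<hostname>" and each line
# ends with "Started <running-node>", as in the sample output.
check_fence_placement() {
  awk '
    $1 ~ /^stonith-/ && $(NF - 1) == "Started" {
      target = substr($1, length("stonith-") + 1)
      if (target == $NF) {
        print "colocated: " $1 " runs on " $NF
        bad = 1
      }
    }
    END { exit bad + 0 }   # non-zero exit if any resource is misplaced
  '
}

# Usage on a controller node:
#   sudo pcs status | grep fence | check_fence_placement
```

The function prints nothing and returns 0 when every fence resource runs on a different host from the one it controls.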
20.3. Fencing parameters
The following example shows the structure of the `fencing.yaml` environment file:

    parameter_defaults:
      EnableFencing: true
      FencingConfig:
        devices:
        - agent: fence_ipmilan
          host_mac: 11:11:11:11:11:11
          params:
            ipaddr: 10.0.0.101
            lanplus: true
            login: admin
            passwd: InsertComplexPasswordHere
            pcmk_host_list: host04
            privlvl: administrator
This file requires the following parameters:
- `EnableFencing`: Enables the fencing functionality for Pacemaker nodes.
- `FencingConfig`: The configuration for Pacemaker fencing functionality. This parameter contains a list of `devices`, each of which requires three main parameters:
  - `agent`, which is the fencing agent. Red Hat OpenStack Platform only supports `fence_ipmilan` for IPMI.
  - `host_mac`, which is a unique identifier for the device.
  - `params`, which is a YAML dictionary of fencing parameters.
| Parameter | Description |
|---|---|
| `auth` | IPMI authentication type. |
| `ipaddr` | IPMI IP address. |
| `ipport` | IPMI port. |
| `login` | Username for the IPMI device. |
| `passwd` | Password for the IPMI device. |
| `lanplus` | Use lanplus to improve security of connection. |
| `privlvl` | Privilege level on IPMI device. |
| `pcmk_host_list` | List of Pacemaker hosts. |
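Before passing `fencing.yaml` to the deployment, it can be useful to list which Pacemaker host each device entry targets. The following is a rough, line-based sketch rather than a real YAML parser: `fencing_targets` is a hypothetical helper, and it assumes the layout of the sample file, with one key per line, unquoted values, and `- agent:` as the first key of each device.

```shell
# Hypothetical helper (not part of director).
# Prints "<pcmk_host_list> <ipaddr> <agent>" for each device entry in a
# fencing.yaml file passed as the first argument.
# Assumption: each device starts with a "- agent:" line and values are
# unquoted, as in the sample file. This is not a general YAML parser.
fencing_targets() {
  awk '
    function emit() {
      if (agent != "") print host, ip, agent
      host = ""; ip = ""; agent = ""
    }
    $1 == "-" && $2 == "agent:" { emit(); agent = $3; next }
    $1 == "ipaddr:"             { ip = $2 }
    $1 == "pcmk_host_list:"     { host = $2 }
    END { emit() }
  ' "$1"
}

# Usage on the director node:
#   fencing_targets fencing.yaml
```

For the sample file shown in this section, the helper would print `host04 10.0.0.101 fence_ipmilan`.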
20.4. Test Fencing
This procedure tests whether fencing is working as expected.
Trigger a fencing action for each controller in the deployment:

Log in to a controller:

    $ source stackrc
    $ nova list | grep controller
    $ ssh heat-admin@<controller-x_ip>

As root, trigger fencing by using `iptables` to close all ports:

    $ sudo -i
    iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT &&
    iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT &&
    iptables -A INPUT -p tcp -m state --state NEW -m tcp --dport 5016 -j ACCEPT &&
    iptables -A INPUT -p udp -m state --state NEW -m udp --dport 5016 -j ACCEPT &&
    iptables -A INPUT ! -i lo -j REJECT --reject-with icmp-host-prohibited &&
    iptables -A OUTPUT -p tcp --sport 22 -j ACCEPT &&
    iptables -A OUTPUT -p tcp --sport 5016 -j ACCEPT &&
    iptables -A OUTPUT -p udp --sport 5016 -j ACCEPT &&
    iptables -A OUTPUT ! -o lo -j REJECT --reject-with icmp-host-prohibited

As a result, the connections should drop, and the server should be rebooted.

From another controller, locate the fencing event in the Pacemaker log file:

    $ ssh heat-admin@<controller-x_ip>
    $ less /var/log/cluster/corosync.log
    (less): /fenc*

You should see that STONITH has issued a fence action against the controller, and that Pacemaker has raised an event in the log.
Verify the rebooted controller has returned to the cluster:

From the second controller, wait a few minutes and run `pcs status` to see if the fenced controller has returned to the cluster. The duration can vary depending on your configuration.
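Because the time a fenced controller takes to rejoin varies, polling `pcs status` can be more convenient than checking by hand. The sketch below is an assumption-laden illustration: `node_online` is a hypothetical helper, and it assumes that `pcs status` lists active cluster members on a line of the form `Online: [ node1 node2 ... ]`. The node name in the usage comment is a placeholder.

```shell
# Hypothetical helper (not part of pcs).
# Reads `pcs status` output on stdin and returns 0 if the node named in
# the first argument appears on an "Online: [ ... ]" line, 1 otherwise.
# Assumption: online members are listed as "Online: [ node1 node2 ]".
node_online() {
  awk -v node="$1" '
    {
      start = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "Online:") start = i          # exact match, so "OFFLINE:" is skipped
        else if (start && $i == node) found = 1
      }
    }
    END { exit (found ? 0 : 1) }
  '
}

# Usage from a surviving controller, polling every 30 seconds
# (replace the placeholder with the fenced controller's hostname):
#   until sudo pcs status | node_online <fenced-controller-hostname>; do
#     sleep 30
#   done
```

The loop exits once the fenced controller is listed as online again. If it never returns, inspect the log on that controller rather than waiting indefinitely.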