5.3. Testing an ASCS instance failure
Verify that the pacemaker cluster takes the necessary action when the enqueue server of the ASCS instance, or the entire ASCS instance, fails.
Test prerequisites
- Both cluster nodes are up, with the ASCS and ERS resource groups running:
[root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
  * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node1
  * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node2
- All failures of resources and resource groups have been cleared, and the fail counts have been reset.
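To check the first prerequisite without reading the status by eye, the following bash sketch parses pcs status output and confirms that ASCS and ERS are both started, on different nodes. The function name check_ascs_ers_placement is hypothetical (it is not part of pcs), and the parsing assumes the "* <resource> (...): Started <node>" line format shown above; other pcs versions may format the status differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pcs): reads `pcs status` output on stdin and
# checks that the ASCS and ERS resources are both started, on different nodes.
# Resource names default to the ones used in this example.
check_ascs_ers_placement() {
  local ascs="${1:-S4H_ascs20}" ers="${2:-S4H_ers29}"
  local status ascs_node ers_node
  status="$(cat)"
  # On a "* <resource> (...): Started <node>" line the node is the last field.
  ascs_node="$(printf '%s\n' "$status" | awk -v r="$ascs" '$0 ~ r && /Started/ {print $NF; exit}')"
  ers_node="$(printf '%s\n' "$status" | awk -v r="$ers" '$0 ~ r && /Started/ {print $NF; exit}')"
  if [ -z "$ascs_node" ] || [ -z "$ers_node" ]; then
    echo "ASCS or ERS is not started" >&2
    return 1
  fi
  if [ "$ascs_node" = "$ers_node" ]; then
    echo "ASCS and ERS both run on $ascs_node" >&2
    return 1
  fi
  echo "ASCS on $ascs_node, ERS on $ers_node"
}
```

On a cluster node it would be used as: pcs status | check_ascs_ers_placement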
Test procedure
- Identify the PID of the enqueue server on the node where ASCS is running.
- Send a SIGKILL signal to the identified process.
Monitoring
Run the following command in a separate terminal during the test:
[root@node2]# watch -n 1 pcs status
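As an alternative to watching the screen, the bash sketch below polls pcs status once per second (like watch -n 1) and returns as soon as the resource is reported as Started on a different node, or fails after a timeout. The function name wait_for_move and its 300-second default timeout are illustrative assumptions, and it relies on the same "Started <node>" line format as the output shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `pcs status` once per second until the given
# resource is reported as Started on a node other than $from_node,
# or give up after $timeout seconds (default 300, an arbitrary choice).
wait_for_move() {
  local rsc="$1" from_node="$2" timeout="${3:-300}"
  local node elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Node name is the last field of the "* <rsc> (...): Started <node>" line.
    node="$(pcs status | awk -v r="$rsc" '$0 ~ r && /Started/ {print $NF; exit}')"
    if [ -n "$node" ] && [ "$node" != "$from_node" ]; then
      echo "$rsc started on $node after ${elapsed}s"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$rsc did not move away from $from_node within ${timeout}s" >&2
  return 1
}
```

For this test it would be started in the separate terminal before the kill, for example: wait_for_move S4H_ascs20 node1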
Expected behavior
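- The enqueue server process is killed.
- The pacemaker cluster takes the necessary action according to its configuration, in this case moving the ASCS to the other node.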
Test
Switch to the <sid>adm user on the node where ASCS is running:
[root@node1]# su - s4hadm
Identify the PID of en.sap (NetWeaver) or enq.sap (S/4HANA):
node1:s4hadm 51> pgrep -af "(en|enq).sap"
31464 enq.sapS4H_ASCS20 pf=/usr/sap/S4H/SYS/profile/S4H_ASCS20_s4ascs
Kill the identified process:
node1:s4hadm 52> kill -9 31464
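The two manual steps above (look up the PID with pgrep, then kill -9 it) could be combined in a small bash sketch like the one below. The function names enqueue_pid and kill_enqueue_server are hypothetical, and the process-name pattern assumes the <prefix>.sap<SID>_ASCS<nr> naming seen in the pgrep output above; run it as the <sid>adm user on the node where ASCS is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the enqueue server PID from `pgrep -af` output
# (lines of the form "<pid> <command line>") read on stdin. It accepts both
# en.sap<SID>_ASCS<nr> (NetWeaver) and enq.sap<SID>_ASCS<nr> (S/4HANA).
enqueue_pid() {
  awk '$2 ~ /^(en|enq)\.sap[A-Z0-9]+_ASCS[0-9]+$/ {print $1; exit}'
}

# Hypothetical wrapper: find the PID and send SIGKILL (same as `kill -9 <pid>`).
kill_enqueue_server() {
  local pid
  pid="$(pgrep -af '(en|enq).sap' | enqueue_pid)"
  if [ -z "$pid" ]; then
    echo "no enqueue server process found" >&2
    return 1
  fi
  echo "sending SIGKILL to PID $pid"
  kill -KILL "$pid"
}
```

Keeping the parsing in a separate function makes it possible to check the PID it picks before anything is killed, for example: pgrep -af "(en|enq).sap" | enqueue_pid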
Note the failed resource action in the cluster:
[root@node2]# pcs status | grep "Failed Resource Actions" -A1
Failed Resource Actions:
  * S4H_ascs20 2m-interval monitor on node1 returned 'not running' at Wed Dec 6 15:37:24 2023
ASCS and ERS each move to the other node:
[root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
  * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node2
  * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node1
  * S4H_ascs20 2m-interval monitor on node1 returned 'not running' at Wed Dec 6 15:37:24 2023
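To confirm from a script that the recorded failure is the monitor failure on the original ASCS node, a bash sketch like the following could read the full pcs status output and print the node named in the failed monitor entry for a resource. The function name failed_monitor_node is hypothetical, and it assumes the "Failed Resource Actions:" section and the "<resource> <interval> monitor on <node> ..." entry format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read full `pcs status` output on stdin and print the node
# of the first failed monitor action listed for the resource (default S4H_ascs20).
# Prints nothing if no such entry exists under "Failed Resource Actions".
failed_monitor_node() {
  local rsc="${1:-S4H_ascs20}"
  awk -v r="$rsc" '
    /Failed Resource Actions/ { in_failed = 1; next }
    in_failed && $0 ~ r && / monitor on / {
      # The node name is the field that follows the word "on".
      for (i = 1; i < NF; i++) if ($i == "on") { print $(i + 1); exit }
    }'
}
```

In this example, pcs status | failed_monitor_node S4H_ascs20 would be expected to print node1, the node where the enqueue server was killed.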
Recovery procedure
Clean up the failed action:
[root@node2]# pcs resource cleanup S4H_ascs20
…
Waiting for 1 reply from the controller
... got reply (done)
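After the cleanup, one way to check that the cluster is back to the prerequisite state is to confirm that pcs status no longer lists a "Failed Resource Actions" section. The bash sketch below does only that; the function name no_failed_actions is hypothetical, and it assumes the section header text shown earlier in this test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the `pcs status` output on stdin has no
# "Failed Resource Actions" section, i.e. the cleanup removed the entries.
no_failed_actions() {
  if grep -q "Failed Resource Actions"; then
    echo "failed resource actions are still listed" >&2
    return 1
  fi
  echo "no failed resource actions"
}
```

Usage after the cleanup: pcs status | no_failed_actions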