5.4. Testing failure of the ERS instance

This test verifies that the Pacemaker cluster takes the necessary action when the enqueue replication server (ERS) of the ASCS instance fails.

Test prerequisites

The resource groups for ASCS and ERS are running on the two cluster nodes:

[root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
    * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node2
    * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1

All failures for the resources and resource groups have been cleared, and the fail counts have been reset.
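The placement prerequisite above can also be checked from a script. The following is a minimal sketch, assuming the `pcs status` line format shown above (`* <resource> (<agent>): Started <node>`). The helper name `resource_node` is hypothetical and is not part of pcs.

```shell
#!/bin/bash
# Hypothetical helper (not part of pcs): print the node on which a resource
# is reported as "Started", reading `pcs status` text from stdin.
# Assumes lines of the form:  * <resource> (<agent>): Started <node>
resource_node() {
    local resource="$1"
    awk -v res="$resource" '
        $1 == "*" && $2 == res {
            # Look for the "Started" keyword and print the field after it.
            for (i = 3; i < NF; i++) {
                if ($i == "Started") { print $(i + 1); exit }
            }
        }'
}

# Usage on a cluster node (requires root):
#   status=$(pcs status)
#   ascs_node=$(printf '%s\n' "$status" | resource_node S4H_ascs20)
#   ers_node=$(printf '%s\n' "$status" | resource_node S4H_ers29)
#   [ -n "$ascs_node" ] && [ -n "$ers_node" ] && [ "$ascs_node" != "$ers_node" ]
```

Recording the two node names before the test makes it easier to confirm afterwards that neither instance has moved.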

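Test procedure
- Identify the PID of the enqueue replication server process on the node where the ERS instance is running.
- Send a SIGKILL signal to the identified process.
Monitoring
- Run the following command in a separate terminal during the test: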
```
[root@node2]# watch -n 1 pcs status
```
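Expected behavior
- The enqueue replication server process is killed.
- The Pacemaker cluster takes the required action according to its configuration; in this case, it restarts the ERS instance on the same node.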

Test

Switch to the <sid>adm user:
```
[root@node1]# su - s4hadm
```

Identify the PID of enqr.sap:

node1:s4hadm 56> pgrep -af enqr.sap
532273 enqr.sapS4H_ERS29 pf=/usr/sap/S4H/SYS/profile/S4H_ERS29_s4ers

Kill the identified process:
```
node1:s4hadm 58> kill -9 532273
```
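To confirm that the SIGKILL took effect before checking the cluster, the old PID can be polled until it disappears. This is a minimal sketch under stated assumptions: the helper name `wait_for_exit` is hypothetical, and it must be run as a user that is allowed to signal the process (for example the <sid>adm user), because `kill -0` also fails when permission is denied.

```shell
#!/bin/bash
# Hypothetical helper: wait until the process with the given PID is gone.
# Returns 0 once the PID no longer exists, 1 if it is still present after
# the timeout (in seconds, default 10).
wait_for_exit() {
    local pid="$1" timeout="${2:-10}" waited=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage, with the PID from the example above:
#   kill -9 532273
#   wait_for_exit 532273 10 && echo "old enqr.sap process is gone"
```

When the cluster restarts the ERS instance, the new enqueue replication server process is expected to have a different PID, so the check applies to the old process only.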

Note the "Failed Resource Actions" entry reported by the cluster:

[root@node1]# pcs status | grep "Failed Resource Actions" -A1
Failed Resource Actions:
  * S4H_ers29 2m-interval monitor on node1 returned 'not running' at Thu Dec  7 13:15:02 2023
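
The failed action shown above can be extracted per resource when the check is scripted. The following is a sketch that assumes the section layout shown above: a `Failed Resource Actions:` header followed by indented `* <resource> ...` lines, ended by a blank line or the next unindented header. The helper name `failed_actions` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the "Failed Resource Actions" entries for one
# resource, reading `pcs status` text from stdin.
failed_actions() {
    local resource="$1"
    awk -v res="$resource" '
        /^Failed Resource Actions:/ { in_section = 1; next }
        # A blank line or the next unindented header ends the section.
        in_section && (NF == 0 || /^[^[:space:]]/) { in_section = 0 }
        in_section && $1 == "*" && $2 == res { print }
    '
}

# Usage on a cluster node (requires root):
#   pcs status | failed_actions S4H_ers29
```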

The ERS instance is restarted on the same node without disturbing the ASCS instance, which is already running on the other node:

[root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
    * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node2
    * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1
  * S4H_ers29 2m-interval monitor on node1 returned 'not running' at Thu Dec  7 13:15:02 2023

Recovery procedure

Clear the failed action:

[root@node1]# pcs resource cleanup S4H_ers29
…
Waiting for 1 reply from the controller
... got reply (done)
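
After the cleanup, it can be useful to confirm that the cluster no longer reports failed actions. This is a minimal sketch, assuming the `Failed Resource Actions:` header appears in `pcs status` only while failed actions are recorded; the helper name `no_failed_actions` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) only if the `pcs status` text
# read from stdin contains no "Failed Resource Actions:" section.
no_failed_actions() {
    ! grep -q '^Failed Resource Actions:'
}

# Usage on a cluster node (requires root):
#   pcs resource cleanup S4H_ers29
#   pcs status | no_failed_actions && echo "no failed actions reported"
```

The cleanup may take a moment to be reflected in the status output, so the check may need to be repeated. If the installed pcs version provides it, `pcs resource failcount show S4H_ers29` can additionally be used to inspect the fail count.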
