5.3. Test 1: Failover of the primary node with an active third site
Subject of the test | Automatic re-registration of the third site. After the clear, the state changes to SOK.
Test preconditions | A running two-node cluster (clusternode1, clusternode2) with remotehost3 registered as the third replication site.
Test steps | Move the SAPHana clone resource to the other node using pcs resource move.
Monitoring the test | On the third site, run the sr_state monitor as rh2adm; on the secondary node, run the log check as root (*).
Starting the test | Execute the cluster command: pcs resource move SAPHana_RH2_02-clone clusternode2
Expected result | In the monitor command on site 3, the primary master changes from clusternode1 to clusternode2. After clearing the resource, the sync state changes from SFAIL to SOK.
Way to return to the initial state | Run the test twice.

(*)
remotehost3:rh2adm> watch hdbnsutil -sr_state
[root@clusternode1]# tail -1000f /var/log/messages | egrep -e 'SOK|SWAIT|SFAIL'
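If /var/log/messages is not populated on your systems, the same replication state messages can be followed from the systemd journal instead (a hedged alternative to the tail command above, assuming systemd-journald is in use):

[root@clusternode1]# journalctl -f | egrep -e 'SOK|SWAIT|SFAIL'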
Detailed description
Check the initial state of the cluster as the root user on clusternode1 or clusternode2:
[root@clusternode1]# pcs status --full
Cluster name: cluster1
Cluster Summary:
  * Stack: corosync
  * Current DC: clusternode1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
  * Last updated: Mon Sep  4 06:34:46 2023
  * Last change:  Mon Sep  4 06:33:04 2023 by root via crm_attribute on clusternode1
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ clusternode1 (1) clusternode2 (2) ]

Full List of Resources:
  * auto_rhevm_fence1   (stonith:fence_rhevm):   Started clusternode1
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
    * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):   Started clusternode2
    * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):   Started clusternode1
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
    * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):   Slave clusternode2
    * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):   Master clusternode1
  * vip_RH2_02_MASTER   (ocf::heartbeat:IPaddr2):   Started clusternode1

Node Attributes:
  * Node: clusternode1 (1):
    * hana_rh2_clone_state            : PROMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode2
    * hana_rh2_roles                  : 4:P:master1:master:worker:master
    * hana_rh2_site                   : DC1
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : PRIM
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode1
    * lpa_rh2_lpt                     : 1693809184
    * master-SAPHana_RH2_02           : 150
  * Node: clusternode2 (2):
    * hana_rh2_clone_state            : DEMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode1
    * hana_rh2_roles                  : 4:S:master1:master:worker:master
    * hana_rh2_site                   : DC2
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : SOK
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode2
    * lpa_rh2_lpt                     : 30
    * master-SAPHana_RH2_02           : 100

Migration Summary:

Tickets:

PCSD Status:
  clusternode1: Online
  clusternode2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
This output shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the name of the clone resource is SAPHana_RH2_02-clone, which is promotable.
You can run this in a separate window during the test to watch the changes:

[root@clusternode1]# watch pcs status --full
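If the full status output is too verbose during the test, you can watch only the relevant node attributes (a minimal sketch; the clone_state and sync_state attribute names are taken from the pcs status --full output above):

[root@clusternode1]# watch 'pcs status --full | grep -e clone_state -e sync_state'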
Another way to identify the SAP HANA clone resource is:
[root@clusternode2]# pcs resource
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
    * Started: [ clusternode1 clusternode2 ]
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
    * Promoted: [ clusternode2 ]
    * Unpromoted: [ clusternode1 ]
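Alternatively, if you only want to know where the clone resource is currently running, crm_resource can locate it directly (a hedged variant; crm_resource is part of pacemaker, and the resource name is taken from the output above):

[root@clusternode1]# crm_resource --resource SAPHana_RH2_02-clone --locate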
To see the change of the primary master from remotehost3, start the following monitor on remotehost3 before the test begins:

remotehost3:rh2adm> watch 'hdbnsutil -sr_state | grep "primary masters"'
The output looks similar to:
Every 2.0s: hdbnsutil -sr_state | grep "primary masters"    remotehost3: Mon Sep  4 08:47:21 2023

primary masters: clusternode1
During the test, the expected output changes to clusternode2.
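If you prefer a monitor that exits as soon as the takeover becomes visible, a simple polling loop can replace watch (a minimal sketch, run as rh2adm on remotehost3; it assumes the "primary masters: <host>" output format shown above and that clusternode1 is the current primary):

while true; do
    # Extract the host name from the "primary masters: <host>" line.
    current=$(hdbnsutil -sr_state | awk '/primary masters/ { print $3 }')
    if [ "$current" != "clusternode1" ]; then
        echo "new primary master: $current"
        break
    fi
    sleep 5
done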
Start the test by moving the clone resource identified above to clusternode2:

[root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode2
The output of the monitor on remotehost3 changes to:
Every 2.0s: hdbnsutil -sr_state | grep "primary masters"    remotehost3: Mon Sep  4 08:50:31 2023

primary masters: clusternode2
Pacemaker creates a location constraint when moving the clone resource; the constraint keeps preferring clusternode2 and therefore needs to be removed manually. You can display the constraint with:

[root@clusternode1]# pcs constraint location
Clear the clone resource to remove the location constraint:

[root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone
Removing constraint: cli-prefer-SAPHana_RH2_02-clone
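The constraint could also be removed by its id, which is printed by the clear command above (a hedged equivalent; cli-prefer-SAPHana_RH2_02-clone is the id from that output):

[root@clusternode1]# pcs constraint remove cli-prefer-SAPHana_RH2_02-clone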
Clean up the resource:
[root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on clusternode2
Cleaned up SAPHana_RH2_02:1 on clusternode1
Waiting for 1 reply from the controller
... got reply (done)
Results of the test
- The "primary masters" monitor on remotehost3 should immediately switch to the new primary node.
- If you check the cluster status, the former secondary is promoted, the former primary is re-registered, and the Clone_State changes from Promoted to Undefined to WAITINGFORLPA to DEMOTED.
- When the SAPHana monitor starts for the first time after the failover, the secondary changes its sync_state to SFAIL. Because of the existing location constraint, the resource needs to be cleared; a short time afterwards, the sync_state of the secondary changes to SOK again (see the check after this list).
- The secondary is promoted.
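To confirm the last points, the sync_state node attribute can be checked directly after the cleanup (a minimal check; it filters the hana_rh2_sync_state attribute shown in the pcs status --full output earlier):

[root@clusternode1]# pcs status --full | grep sync_state

After the clear and cleanup, the secondary should report SOK again.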
To recover the initial state, you can simply run the next test. After finishing the tests, run a cleanup.