5.4. Test 2: Failover of the primary node with a passive third site
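Subject of the test | The third site is not registered. Failover works even if the third site is down. |
Test preconditions | |
Test steps | Move the SAPHana clone resource using the pcs resource move command. |
Starting the test | Execute the cluster command: # pcs resource move <SAPHana-clone-resource> |
Monitoring the test | On the third site, as the sidadm user, run: % hdbnsutil -sr_stateConfiguration. On the cluster nodes, as root, run: # pcs status --full |
Expected result | No change on DC3. |
Way to return to the initial state | Re-register DC3 on the new primary and start SAP HANA. |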
Expert summary:
Stop SAP HANA on the third node:
% HDB stop
Check the cluster status, and the primary as seen from node 3:
# pcs status --full
% hdbnsutil -sr_stateConfiguration | grep "primary masters"
Start the failover on a cluster node:
# pcs resource move <SAPHana-clone-resource>
Check that the primary has moved, while the primary recorded on the third node has not changed:
# pcs resource | grep Promoted
% hdbnsutil -sr_stateConfiguration | grep "primary masters"
Clean up the environment so that the test can be run again:
# pcs resource clear <SAPHana-clone-resource>
Register node3 with the new primary:
% hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=${TINSTANCE} --replicationMode=syncmem --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
Start SAP HANA on node3:
% HDB start
Detailed steps with output
Stop the database on node3:
sidadm@node3% HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function Stop 400

12.07.2023 11:33:14
Stop
OK
Waiting for stopped instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function WaitforStopped 600 2

12.07.2023 11:33:30
WaitforStopped
OK
hdbdaemon is stopped.
Check the primary database as recorded on node 3:
sidadm@node3% hdbnsutil -sr_stateConfiguration | grep -i "primary masters"
node3: Wed Jul 12 11:20:51 2023
primary masters: node2
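The check above can also be scripted so that the value is captured rather than read by eye. The following is a minimal sketch, not part of the original procedure: the function name primary_masters is hypothetical, and it assumes the "primary masters: <host>" line format shown in the output above.

```shell
# Hypothetical helper (not an SAP or Red Hat tool): print the value of the
# "primary masters" line from hdbnsutil -sr_stateConfiguration output
# supplied on stdin.
primary_masters() {
    # Match case-insensitively, like the grep -i in the manual check,
    # then strip everything up to and including the first colon.
    grep -i 'primary masters' | sed 's/^[^:]*:[[:space:]]*//'
}

# Possible use on node3 as the sidadm user:
#   hdbnsutil -sr_stateConfiguration | primary_masters
```

Storing the result in a variable before the failover makes it straightforward to compare it with the value seen afterwards.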
Check the current primary in the cluster:
root@node1# pcs resource | grep Promoted
    * Promoted: [ node1 ]
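To record the promoted node for later comparison, the line shown above can be reduced to the bare node name. This is a small sketch under the assumption that the output keeps the "* Promoted: [ node ]" layout shown here; promoted_nodes is a hypothetical name.

```shell
# Hypothetical helper: print the node name(s) listed between the brackets
# of the "Promoted: [ ... ]" line in pcs resource output read from stdin.
promoted_nodes() {
    # grep is case-sensitive here, so "Unpromoted:" lines are not matched.
    grep 'Promoted:' |
        sed -e 's/.*\[[[:space:]]*//' -e 's/[[:space:]]*\].*//'
}

# Possible use on a cluster node as root:
#   before=$(pcs resource | promoted_nodes)
```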
Check sr_state to view the SAP HANA System Replication relationships:
sidadm@node1% hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 2
site name: DC1

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: true
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

node1 -> [DC3] node3
node1 -> [DC1] node1
node1 -> [DC2] node2

Site Mappings:
~~~~~~~~~~~~~~
DC1 (primary/primary)
    |---DC3 (syncmem/logreplay)
    |---DC2 (syncmem/logreplay)

Tier of DC1: 1
Tier of DC3: 2
Tier of DC2: 2

Replication mode of DC1: primary
Replication mode of DC3: syncmem
Replication mode of DC2: syncmem

Operation mode of DC1: primary
Operation mode of DC3: logreplay
Operation mode of DC2: logreplay

Mapping: DC1 -> DC3
Mapping: DC1 -> DC2
done.
- This information is needed to compare the replication state with the state after the failover has completed.
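One way to make the before and after comparison less error-prone is to keep only the "Mapping:" lines of the sr_state output and diff them. The sketch below is an illustration, not part of the documented procedure; sr_mappings and the /tmp file names are hypothetical.

```shell
# Hypothetical helper: print only the "Mapping: A -> B" entries from
# hdbnsutil -sr_state output read on stdin, sorted for stable comparison.
sr_mappings() {
    # Lines such as "Host Mappings:" or "Site Mappings:" do not start
    # with "Mapping:" and are therefore skipped.
    sed -n 's/^[[:space:]]*Mapping:[[:space:]]*//p' | sort
}

# Possible use as the sidadm user:
#   hdbnsutil -sr_state | sr_mappings > /tmp/sr_before.txt
#   ... run the failover ...
#   hdbnsutil -sr_state | sr_mappings > /tmp/sr_after.txt
#   diff /tmp/sr_before.txt /tmp/sr_after.txt
```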
Start the failover in the cluster by moving the SAPHana clone resource. Example:
root@node1# pcs resource move SAPHana_RH2_02-clone
Location constraint to move resource 'SAPHana_RH2_02-clone' has been created
Waiting for the cluster to apply configuration changes...
Location constraint created to move resource 'SAPHana_RH2_02-clone' has been removed
Waiting for the cluster to apply configuration changes...
resource 'SAPHana_RH2_02-clone' is promoted on node 'node2'; unpromoted on node 'node1'
Clear the resource:
root@node1# pcs resource clear SAPHana_RH2_02-clone
root@node1# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on node1
Cleaned up SAPHana_RH2_02:1 on node2
Check the sync_state of the cluster:
root@node1# pcs status --full | grep sync_state
    * hana_rh2_sync_state : SOK
    * hana_rh2_sync_state : PRIM
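- Depending on the size of the database, you may need to wait 5 minutes or more until the sync_state output looks as shown. See also Check Replication status.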
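Instead of rerunning the command by hand while waiting, the check can be polled. The following sketch makes two assumptions that are not stated in this section: that a secondary which is not in sync reports the value SFAIL, and that the attribute lines keep the "name : VALUE" layout shown above. The function names are hypothetical.

```shell
# Hypothetical helper: succeed when the sync_state lines on stdin contain
# at least one SOK and no SFAIL (SFAIL is an assumed out-of-sync value).
sync_state_ok() {
    states=$(sed -n 's/.*sync_state[[:space:]]*:[[:space:]]*//p')
    case "$states" in
        *SFAIL*) return 1 ;;
        *SOK*)   return 0 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical helper: rerun COMMAND until sync_state_ok accepts its
# output, giving up after ATTEMPTS tries.
# Usage: wait_for_sok ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_sok() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" | sync_state_ok; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use on a cluster node as root (30 tries, 10 seconds apart):
#   wait_for_sok 30 10 pcs status --full
```

The number of attempts and the delay are arbitrary here and should be sized to the database, since larger systems take longer to reach SOK.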
Check the primary as recorded on node3, which should remain unchanged:
sidadm@node3% hdbnsutil -sr_stateConfiguration | grep -i "primary masters"
primary masters: node1
Then check the replication state after the failover. On node1, which is now a secondary of the new primary, the output looks like this:
sidadm@node1% hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: syncmem
operation mode: logreplay
site id: 2
site name: DC1

is source system: false
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false
is timetravel enabled: false
replay mode: auto
active primary site: 1

primary masters: node2

Host Mappings:
~~~~~~~~~~~~~~

node1 -> [DC3] node3
node1 -> [DC1] node1
node1 -> [DC2] node2

Site Mappings:
~~~~~~~~~~~~~~
DC2 (primary/primary)
    |---DC1 (syncmem/logreplay)

Tier of DC2: 1
Tier of DC1: 2

Replication mode of DC2: primary
Replication mode of DC1: syncmem

Operation mode of DC2: primary
Operation mode of DC1: logreplay

Mapping: DC2 -> DC1
done.
- DC3 is missing from the "Site Mappings" section, so you need to re-register DC3 with the new primary before you can start it.
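Whether a site is still part of the replication topology can be tested mechanically. The sketch below relies on an observation from the outputs in this section, namely that every mapped site has a "Tier of <site>:" line; it is an assumption that this holds in general, and site_is_mapped is a hypothetical name.

```shell
# Hypothetical helper: succeed if SITE has a "Tier of SITE:" line in
# hdbnsutil -sr_state output read from stdin.
# Usage: site_is_mapped SITE
site_is_mapped() {
    grep -q "^[[:space:]]*Tier of $1:"
}

# Possible use as the sidadm user:
#   hdbnsutil -sr_state | site_is_mapped DC3 || echo "DC3 is not registered"
```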
DC3 can be re-registered with the following command:
sidadm@node3% hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=${TINSTANCE} --replicationMode=syncmem --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
- The missing entries are created immediately, and system replication starts as soon as the SAP HANA database is started.
You can check this by running:
sidadm@node1% hdbnsutil -sr_state
sidadm@node1% python systemReplicationStatus.py ; echo $?
- You can find more information in Check SAP HANA System Replication status.
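The number printed by echo $? is easier to interpret when translated into a status name. The mapping below is not given in this section; it reflects the return codes commonly documented for systemReplicationStatus.py and should be verified against the SAP HANA documentation for your release. The function name sr_status_name is hypothetical.

```shell
# Hypothetical helper: translate the exit code of systemReplicationStatus.py
# into a status name. The code-to-name mapping is an assumption taken from
# commonly documented values, not from this section.
sr_status_name() {
    case "$1" in
        10) echo "NoHSR" ;;
        11) echo "Error" ;;
        12) echo "Unknown" ;;
        13) echo "Initializing" ;;
        14) echo "Syncing" ;;
        15) echo "Active" ;;
        *)  echo "unexpected return code $1" ;;
    esac
}

# Possible use as the sidadm user:
#   python systemReplicationStatus.py ; sr_status_name $?
```

Under this mapping, a result of Active would indicate that all sites, including the re-registered DC3, are replicating.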