5.4. Test 2: Failover of the primary node with a passive third site
Subject of the test | The stopped third site is not re-registered. Failover works even while the third site is down.
Test preconditions |
Test steps | Stop the database on remotehost3, then move the SAPHana clone resource using pcs.
Starting the test | Execute the cluster command to move the SAPHana clone resource.
Expected results | DC3 is not changed. SAP HANA System Replication stays in sync with the old relationship.
Ways to return to the initial state | Re-register DC3 on the new primary and start SAP HANA.
Detailed description
Check the initial state of the cluster on clusternode1 or clusternode2 as the root user:
[root@clusternode1]# pcs status --full
Cluster name: cluster1
Cluster Summary:
  * Stack: corosync
  * Current DC: clusternode1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
  * Last updated: Mon Sep  4 06:34:46 2023
  * Last change:  Mon Sep  4 06:33:04 2023 by root via crm_attribute on clusternode1
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ clusternode1 (1) clusternode2 (2) ]

Full List of Resources:
  * auto_rhevm_fence1	(stonith:fence_rhevm):	 Started clusternode1
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
    * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode2
    * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode1
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
    * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Slave clusternode2
    * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Master clusternode1
  * vip_RH2_02_MASTER	(ocf::heartbeat:IPaddr2):	 Started clusternode1

Node Attributes:
  * Node: clusternode1 (1):
    * hana_rh2_clone_state            : PROMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode2
    * hana_rh2_roles                  : 4:P:master1:master:worker:master
    * hana_rh2_site                   : DC1
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : PRIM
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode1
    * lpa_rh2_lpt                     : 1693809184
    * master-SAPHana_RH2_02           : 150
  * Node: clusternode2 (2):
    * hana_rh2_clone_state            : DEMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode1
    * hana_rh2_roles                  : 4:S:master1:master:worker:master
    * hana_rh2_site                   : DC2
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : SOK
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode2
    * lpa_rh2_lpt                     : 30
    * master-SAPHana_RH2_02           : 100

Migration Summary:

Tickets:

PCSD Status:
  clusternode1: Online
  clusternode2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
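If you only want to pick out the promoted node and its clone state from this status, you can filter the same command (a minimal sketch; the patterns match the wording used in the output above):

[root@clusternode1]# pcs status --full | grep -E 'Master|PROMOTED'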
The output of this example shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the clone resource named SAPHana_RH2_02-clone is promotable. If you ran test 3 before, HANA might be promoted on clusternode2 instead.

Stop the database on remotehost3:
remotehost3:rh2adm> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function Stop 400

12.07.2023 11:33:14
Stop
OK
Waiting for stopped instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function WaitforStopped 600 2

12.07.2023 11:33:30
WaitforStopped
OK
hdbdaemon is stopped.
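Optionally, you can verify that all services of the instance are really stopped before continuing (a quick sanity check; it uses the same instance number 02 that appears in the sapcontrol call above, and all processes should be reported as stopped/GRAY):

remotehost3:rh2adm> sapcontrol -nr 02 -function GetProcessList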
Check the primary database on remotehost3:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration | grep -i "primary masters"
primary masters: clusternode2
Check the current primary in the cluster on one of the cluster nodes:
[root@clusternode1]# pcs resource | grep Masters
  * Masters: [ clusternode2 ]
Check sr_state to see the SAP HANA System Replication relationships:

clusternode2:rh2adm> hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 2
site name: DC1

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: true
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

clusternode1 -> [DC3] remotehost3
clusternode1 -> [DC1] clusternode1
clusternode1 -> [DC2] clusternode2

Site Mappings:
~~~~~~~~~~~~~~
DC1 (primary/primary)
    |---DC3 (syncmem/logreplay)
    |---DC2 (syncmem/logreplay)

Tier of DC1: 1
Tier of DC3: 2
Tier of DC2: 2

Replication mode of DC1: primary
Replication mode of DC3: syncmem
Replication mode of DC2: syncmem

Operation mode of DC1: primary
Operation mode of DC3: logreplay
Operation mode of DC2: logreplay

Mapping: DC1 -> DC3
Mapping: DC1 -> DC2
done.
The SAP HANA System Replication relationships still have one primary (DC1), which is replicated to DC2 and DC3. The replication relationship on remotehost3 (which is down) can be displayed with:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

mode: syncmem
site id: 3
site name: DC3
active primary site: 1

primary masters: clusternode1
done.
Because the database on remotehost3 is down, this output is based on the entries in the global.ini file.
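If you want to see where this information is stored, you can look at the system_replication sections of global.ini directly (a minimal sketch; the path assumes the standard SAP HANA layout for the SID RH2 used in this example):

remotehost3:rh2adm> grep -A6 '^\[system_replication' /usr/sap/RH2/SYS/global/hdb/custom/config/global.ini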
Start the test: initiate a failover in the cluster by moving the SAPHana clone resource. For example:

[root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode2
Note: If SAPHana is promoted on clusternode2, you have to move the clone resource to clusternode1 instead. The example assumes that SAPHana is promoted on clusternode1.
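If you do not want to look up the move target manually, you can derive it from the cluster status (a small sketch, assuming a single secondary node and the default wording of the pcs resource output, which lists the secondary under "Slaves" on RHEL 8 and "Unpromoted" on newer releases):

[root@clusternode1]# TARGET=$(pcs resource | awk '/Slaves:|Unpromoted:/ { print $(NF-1) }')
[root@clusternode1]# pcs resource move SAPHana_RH2_02-clone "$TARGET"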
There is no output. As in the previous test, a location constraint is created, which can be displayed with:
[root@clusternode1]# pcs constraint location
Location Constraints:
  Resource: SAPHana_RH2_02-clone
    Enabled on:
      Node: clusternode1 (score:INFINITY) (role:Started)
This constraint prevents another failover, even when the cluster looks healthy again, until the constraint is removed. One way to remove it is to clear the resource.
Clear the resource:
[root@clusternode1]# pcs constraint location
Location Constraints:
  Resource: SAPHana_RH2_02-clone
    Enabled on:
      Node: clusternode1 (score:INFINITY) (role:Started)

[root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone
Removing constraint: cli-prefer-SAPHana_RH2_02-clone
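Alternatively, the same location constraint can be deleted by its id, which is shown in the "Removing constraint" line above (a sketch; list the constraint ids first with pcs constraint --full if you are unsure):

[root@clusternode1]# pcs constraint remove cli-prefer-SAPHana_RH2_02-clone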
Clean up the resource:
[root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on clusternode2
Cleaned up SAPHana_RH2_02:1 on clusternode1
Waiting for 1 reply from the controller
... got reply (done)
Check the current status. There are two ways to display the replication status, which needs to be in sync again. Start with what remotehost3 reports as the primary:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration | grep -i primary
active primary site: 1
primary masters: clusternode1
The output shows site 1, or clusternode1, which was the primary before the test started, although the primary has since been moved to clusternode2. Next, check the system replication status on the new primary. First identify the new primary:
[root@clusternode1]# pcs resource | grep Master
  * Masters: [ clusternode2 ]
So there is an inconsistency, and remotehost3 needs to be re-registered. You might think that if you run the test again, the primary would simply switch back to the original clusternode1. In this case there is a third way to check whether system replication is working. On the primary node, in our case clusternode2, run:
clusternode2:rh2adm> cdpy
clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py
|Database |Host         |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary    |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |             |      |             |          |        |          |Host         |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |------------ |----- |------------ |--------- |------- |--------- |------------ |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |clusternode2 |30201 |nameserver   |        1 |      2 |DC2       |clusternode1 |    30201 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
|RH2      |clusternode2 |30207 |xsengine     |        2 |      2 |DC2       |clusternode1 |    30207 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
|RH2      |clusternode2 |30203 |indexserver  |        3 |      2 |DC2       |clusternode1 |    30203 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |

status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: DC2
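The script also reports the overall state through its exit code, which is convenient for scripted checks (a sketch, based on the return codes commonly documented for systemReplicationStatus.py, where 15 means ACTIVE, 14 syncing, 13 initializing, 12 unknown, 11 error, and 10 no system replication):

clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py > /dev/null; echo "return code: $?"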
If remotehost3 does not show up in this output, it has to be re-registered. Before registering it, run the following on the primary node to watch the progress of the registration:
clusternode2:rh2adm> watch python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py
Now you can re-register remotehost3 with this command:
remotehost3:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
adding site ...
collecting information ...
updating local ini files ...
done.
As long as the database on remotehost3 has not been started, you will not see the third site in the system replication status output.
The registration is completed by starting the database on remotehost3:

remotehost3:rh2adm> HDB start

StartService
Impromptu CCC initialization by 'rscpCInit'.
  See SAP note 1266393.
OK
OK
Starting instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2

04.09.2023 11:36:47
Start
OK
The monitor started above will show remotehost3 synchronizing immediately.
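If you prefer a one-off check instead of the running watch, a single filtered call on the primary also confirms that DC3 is back in the replication status (a sketch using the same script as above):

clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py | grep -e DC3 -e "overall system replication status"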
To switch back, run the test again. An optional test is to switch the primary to the node that is configured in the global.ini on remotehost3 and then start the database. The database may come up, but it will never be shown in the output of the system replication status unless it is re-registered.
For more information, see Checking SAP HANA System Replication status.