5.4. Test 2: Failover of the primary node with a passive third site


Subject of the test

No re-registration of the stopped third site.

Failover also works when the third site is down.

Test preconditions

  • SAP HANA is running on DC1 and DC2 and is stopped on DC3.
  • The cluster is started and running without errors or warnings (a verification sketch follows this list).
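
A minimal sketch for verifying these preconditions, reusing checks that appear later in this section and assuming the instance number 02 and the rh2adm administration user of this example:

    # Cluster health and currently promoted node, as root on a cluster node:
    [root@clusternode1]# pcs status --full
    [root@clusternode1]# pcs resource | grep Masters

    # SAP HANA should be running on DC1 and DC2 and stopped on DC3,
    # checked as the <sid>adm user (here rh2adm):
    clusternode1:rh2adm> sapcontrol -nr 02 -function GetProcessList
    clusternode2:rh2adm> sapcontrol -nr 02 -function GetProcessList
    remotehost3:rh2adm> sapcontrol -nr 02 -function GetProcessList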

Test steps

Move the SAPHana resource using the pcs resource move command, run as the root user on clusternode1.

Starting the test

Execute the cluster command: [root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode1

Expected result

There is no change on DC3. The SAP HANA system replication keeps its old relationship.
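
To verify this expected result after the failover, the checks from the detailed description below can be reused; a minimal sketch, assuming the hostnames of this example:

    # DC3 still points to the old primary, because it was not re-registered:
    remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration | grep -i "primary masters"

    # The cluster has promoted SAP HANA on the other cluster node:
    [root@clusternode1]# pcs resource | grep Masters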

Way to return to the initial state

Re-register DC3 on the new primary and start SAP HANA.
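
A condensed sketch of these steps, using the same commands and parameters that appear in the detailed description below (new primary clusternode2, instance ${TINSTANCE}, site names DC2 and DC3):

    # Re-register DC3 against the new primary, run on remotehost3:
    remotehost3:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC2 --operationMode=logreplay --online

    # Start SAP HANA on remotehost3 to complete the registration:
    remotehost3:rh2adm> HDB start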

Detailed description

  • Check the initial state of the cluster on clusternode1 or clusternode2 as the root user:

    [root@clusternode1]# pcs status --full
    Cluster name: cluster1
    Cluster Summary:
      * Stack: corosync
      * Current DC: clusternode1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
      * Last updated: Mon Sep  4 06:34:46 2023
      * Last change:  Mon Sep  4 06:33:04 2023 by root via crm_attribute on clusternode1
      * 2 nodes configured
      * 6 resource instances configured
    
    Node List:
      * Online: [ clusternode1 (1) clusternode2 (2) ]
    
    Full List of Resources:
      * auto_rhevm_fence1	(stonith:fence_rhevm):	 Started clusternode1
      * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
        * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode2
        * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode1
      * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
        * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Slave clusternode2
        * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Master clusternode1
      * vip_RH2_02_MASTER	(ocf::heartbeat:IPaddr2):	 Started clusternode1
    
    Node Attributes:
      * Node: clusternode1 (1):
        * hana_rh2_clone_state            	: PROMOTED
        * hana_rh2_op_mode                	: logreplay
        * hana_rh2_remoteHost             	: clusternode2
        * hana_rh2_roles                  	: 4:P:master1:master:worker:master
        * hana_rh2_site                   	: DC1
        * hana_rh2_sra                    	: -
        * hana_rh2_srah                   	: -
        * hana_rh2_srmode                 	: syncmem
        * hana_rh2_sync_state             	: PRIM
        * hana_rh2_version                	: 2.00.062.00
        * hana_rh2_vhost                  	: clusternode1
        * lpa_rh2_lpt                     	: 1693809184
        * master-SAPHana_RH2_02           	: 150
      * Node: clusternode2 (2):
        * hana_rh2_clone_state            	: DEMOTED
        * hana_rh2_op_mode                	: logreplay
        * hana_rh2_remoteHost             	: clusternode1
        * hana_rh2_roles                  	: 4:S:master1:master:worker:master
         * hana_rh2_site                   	: DC2
        * hana_rh2_sra                    	: -
        * hana_rh2_srah                   	: -
        * hana_rh2_srmode                 	: syncmem
        * hana_rh2_sync_state             	: SOK
        * hana_rh2_version                	: 2.00.062.00
        * hana_rh2_vhost                  	: clusternode2
        * lpa_rh2_lpt                     	: 30
        * master-SAPHana_RH2_02           	: 100
    
    Migration Summary:
    
    Tickets:
    
    PCSD Status:
      clusternode1: Online
      clusternode2: Online
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

    The output of this example shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the clone resource is named SAPHana_RH2_02-clone and is promotable. If you ran test 3 before, HANA might be promoted on clusternode2 instead.
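
    If you are only interested in the promoted node, the shorter check that is also used later in this section is sufficient:

    [root@clusternode1]# pcs resource | grep Masters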

  • Stop the database on remotehost3:

    remotehost3:rh2adm> HDB stop
    hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
    Stopping instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function Stop 400
    
    12.07.2023 11:33:14
    Stop
    OK
    Waiting for stopped instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function WaitforStopped 600 2
    
    
    12.07.2023 11:33:30
    WaitforStopped
    OK
    hdbdaemon is stopped.
  • Check the primary database on remotehost3:

    remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration| grep -i "primary masters"
    
    primary masters: clusternode2
  • Check the current primary in the cluster on a cluster node:

    [root@clusternode1]# pcs resource | grep Masters
        * Masters: [ clusternode2 ]
  • Check sr_state to see the SAP HANA system replication relationships:

    clusternode2:rh2adm> hdbnsutil -sr_state
    
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    
    online: true
    
    mode: primary
    operation mode: primary
    site id: 2
    site name: DC1
    
    is source system: true
    is secondary/consumer system: false
    has secondaries/consumers attached: true
    is a takeover active: false
    is primary suspended: false
    
    Host Mappings:
    ~~~~~~~~~~~~~~
    
    clusternode1 -> [DC3] remotehost3
    clusternode1 -> [DC1] clusternode1
    clusternode1 -> [DC2] clusternode2
    
    
    Site Mappings:
    ~~~~~~~~~~~~~~
    DC1 (primary/primary)
        |---DC3 (syncmem/logreplay)
        |---DC2 (syncmem/logreplay)
    
    Tier of DC1: 1
    Tier of DC3: 2
    Tier of DC2: 2
    
    Replication mode of DC1: primary
    Replication mode of DC3: syncmem
    Replication mode of DC2: syncmem
    
    Operation mode of DC1: primary
    Operation mode of DC3: logreplay
    Operation mode of DC2: logreplay
    
    Mapping: DC1 -> DC3
    Mapping: DC1 -> DC2
    done.

The SAP HANA system replication relationships still have a single primary (DC1), which is replicated to DC2 and DC3. The replication relationship on remotehost3, which is down, can be displayed as follows:

remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

mode: syncmem
site id: 3
site name: DC3
active primary site: 1

primary masters: clusternode1
done.

The database on remotehost3 is offline, so this output is based on the entries in the global.ini file.
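
These entries can also be inspected directly in the global.ini file of the instance; a minimal sketch, assuming the default location of the customer-specific global.ini for the example SID RH2 (adjust the path for your installation):

remotehost3:rh2adm> grep -e "^\[system_replication" -A 6 /usr/sap/RH2/SYS/global/hdb/custom/config/global.ini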

  • Start the test: initiate a failover in the cluster by moving the SAPHana clone resource. Example:

    [root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode2
    Note

    If SAPHana is promoted on clusternode2, you have to move the clone resource to clusternode1. The example expects SAPHana to be promoted on clusternode1.

    There is no output. As in the previous test, a location constraint is created, which can be displayed with:

    [root@clusternode1]# pcs constraint location
    Location Constraints:
      Resource: SAPHana_RH2_02-clone
        Enabled on:
          Node: clusternode1 (score:INFINITY) (role:Started)

    Even if the cluster looks fine again, this constraint prevents another failover unless it is removed. One way to remove it is to clear the resource.

  • Clear the resource:

    [root@clusternode1]# pcs constraint location
    Location Constraints:
      Resource: SAPHana_RH2_02-clone
        Enabled on:
          Node: clusternode1 (score:INFINITY) (role:Started)
    [root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone
    Removing constraint: cli-prefer-SAPHana_RH2_02-clone
  • Clean up the resource:

    [root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
    Cleaned up SAPHana_RH2_02:0 on clusternode2
    Cleaned up SAPHana_RH2_02:1 on clusternode1
    Waiting for 1 reply from the controller
    ... got reply (done)
  • Check the current status. The replication status, which needs to be in sync, can be displayed in two ways. Start with the primary as seen from remotehost3:

    remotehost3:rh2adm>  hdbnsutil -sr_stateConfiguration| grep -i primary
    active primary site: 1
    primary masters: clusternode1

    The output shows site 1, or clusternode1, which was the primary before the test moved the primary to clusternode2. Next, check the system replication status on the new primary. First, detect the new primary:

    [root@clusternode1]# pcs resource | grep  Master
        * Masters: [ clusternode2 ]

    Here we have an inconsistency, and remotehost3 needs to be re-registered. You might think that if we run the test again, the primary could be switched back to the original clusternode1. In this case, we have a third way to identify whether system replication is working. On the primary node, in our case clusternode2, run:

    clusternode2:rh2adm> cdpy
    clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py
    |Database |Host   |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |       |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |clusternode2 |30201 |nameserver   |        1 |      2 |DC2       |clusternode1    |    30201 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |clusternode2 |30207 |xsengine     |        2 |      2 |DC2       |clusternode1    |    30207 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |clusternode2 |30203 |indexserver  |        3 |      2 |DC2       |clusternode1    |    30203 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    
    status system replication site "1": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 2
    site name: DC2

    If remotehost3 does not show up in this output, you have to re-register remotehost3. Before registering, run the following on the primary node to watch the progress of the registration:

    clusternode2:rh2adm> watch python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py

    Now you can re-register remotehost3 with this command:

    remotehost3:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
    adding site ...
    collecting information ...
    updating local ini files ...
    done.

    As long as the database on remotehost3 is not started, the third site will not be shown in the system replication status output.
    The registration is completed by starting the database on remotehost3:

    remotehost3:rh2adm> HDB start
    
    
    StartService
    Impromptu CCC initialization by 'rscpCInit'.
      See SAP note 1266393.
    OK
    OK
    Starting instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2
    
    
    04.09.2023 11:36:47
    Start
    OK

The monitor started above will show the synchronization of remotehost3 immediately.

  • To switch back, run the test again. An optional test is to switch the primary to the node that is configured in global.ini on remotehost3 and then start the database. The database may come up, but it will never be shown in the output of the system replication status unless it is re-registered (a condensed sketch of this optional test follows this list).
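
    A condensed sketch of that optional test, assuming the primary configured in global.ini on remotehost3 is clusternode1, as in the initial state of this example:

    # Move the primary back to the node that remotehost3 still has configured:
    [root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode1
    [root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone

    # Start the database on remotehost3 without re-registering it:
    remotehost3:rh2adm> HDB start

    # remotehost3 will not appear in this output until it is re-registered:
    clusternode1:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py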

For more information, see Checking the SAP HANA system replication status.
