5.6. Test 4: Failback of the primary site to the first site



Subject of the test

The primary switches back to a cluster node.

Failback and enabling the cluster again.

Re-registering the third site as a secondary site.

Test preconditions

  • The SAP HANA primary is running on the third site.
  • The cluster is partially running.
  • The cluster is put into maintenance_mode.
  • The former cluster primary is detectable.

Test steps

Check the expected primary of the cluster.

Fail over from the DC3 node to the DC1 node.

Check whether the former secondary has switched to the new primary.

Re-register az3n1 as the new secondary.

Set cluster maintenance_mode=false, and the cluster will continue to work.

Monitoring the test

On the new primary, start:

az3n1:rh2adm> watch python ${DIR_EXECUTABLES}/python_support/systemReplicationStatus.py

[root@az1n1]# watch pcs status --full

On the secondary, start:

clusternode:rh2adm> watch hdbnsutil -sr_state

Starting the test

Check the expected primary of the cluster:

[root@az1n1]# pcs resource

The VIP and the promoted SAP HANA resource should run on the same node, which is the potential new primary.

Run as sidadm on this potential primary:

az1n1:rh2adm> hdbnsutil -sr_takeover

Re-register the former primary as the new secondary:

az3n1:rh2adm> hdbnsutil -sr_register \
  --remoteHost=az1n1 \
  --remoteInstance=${TINSTANCE} \
  --replicationMode=syncmem \
  --name=DC3 \
  --remoteName=DC1 \
  --operationMode=logreplay \
  --force_full_replica \
  --online

After setting maintenance_mode=false, the cluster will continue to work as usual.
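The sequence above can be sketched as a dry-run script. This is a hypothetical wrapper, not part of SAP HANA or pcs; it only echoes each step together with the node it must run on, so the order can be reviewed before executing anything:

```shell
#!/bin/sh
# Hypothetical dry-run of the failback sequence: nothing is executed,
# each line names the node and the command to run there.
failback_steps() {
    echo "az1n1: hdbnsutil -sr_takeover"
    echo "az3n1: hdbnsutil -sr_register ... --name=DC3 --remoteName=DC1"
    echo "az1n1: pcs property set maintenance-mode=false"
}

failback_steps
```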

Expected results

The new primary starts SAP HANA.

The replication status shows all 3 sites.

The second cluster site re-registers automatically to the new primary site.

The DR site becomes an additional copy of the database.

Way to return to the initial state

Run Test 3.

Detailed description

  • Check whether the cluster is put into maintenance mode

    [root@az1n1]# pcs property config maintenance-mode
    Cluster Properties:
     maintenance-mode: true

    If maintenance-mode is not true, you can set it with:

    [root@az1n1]# pcs property set maintenance-mode=true
  • Check the system replication status and discover the primary database on all nodes.

    First, discover the primary database with the following command:

    az1n1:rh2adm> hdbnsutil -sr_state | egrep -e "^mode:|primary masters"

    The output should look like the following:

    On az1n1:

    az1n1:rh2adm> hdbnsutil -sr_state | egrep -e "^mode:|primary masters"
    mode: syncmem
    primary masters: az3n1

    On az2n1:

    az2n1:rh2adm> hdbnsutil -sr_state | egrep -e "^mode:|primary masters"
    mode: syncmem
    primary masters: az3n1

    On az3n1:

    az3n1:rh2adm> hdbnsutil -sr_state | egrep -e "^mode:|primary masters"
    mode: primary

    On all three nodes, the primary database is az3n1.

    On this primary database, you must ensure that the system replication status of all three nodes is active, with a return code of 15:

    az3n1:rh2adm> python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py
    |Database |Host   |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |       |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |az3n1 |30201 |nameserver   |        1 |      3 |DC3       |az2n1    |    30201 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az3n1 |30207 |xsengine     |        2 |      3 |DC3       |az2n1    |    30207 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az3n1 |30203 |indexserver  |        3 |      3 |DC3       |az2n1    |    30203 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |SYSTEMDB |az3n1 |30201 |nameserver   |        1 |      3 |DC3       |az1n1    |    30201 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az3n1 |30207 |xsengine     |        2 |      3 |DC3       |az1n1    |    30207 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az3n1 |30203 |indexserver  |        3 |      3 |DC3       |az1n1    |    30203 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    
    status system replication site "2": ACTIVE
    status system replication site "1": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 3
    site name: DC3
    [rh2adm@az3n1: python_support]# echo $?
    15
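The return-code convention used here can be encoded in a small helper. This is a hypothetical sketch, not shipped with SAP HANA; it only distinguishes the two exit codes that appear in this section: 15 on a healthy primary, and anything else (such as the 11 reported later on a detached former primary) treated as not fully active:

```shell
#!/bin/sh
# Hypothetical helper: interpret the exit code of systemReplicationStatus.py.
# 15 = every secondary ACTIVE (healthy primary); any other code is treated
# as "replication not fully active".
repl_status_ok() {
    rc="$1"
    if [ "$rc" -eq 15 ]; then
        echo "all secondaries ACTIVE"
        return 0
    fi
    echo "replication not fully active (rc=$rc)"
    return 1
}

repl_status_ok 15
```

In practice you would call it as `repl_status_ok $?` right after running systemReplicationStatus.py.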
  • Check that all three sr_states are consistent.

    Run hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode on all three nodes:

    az1n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    az2n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    az3n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    The output should be the same on all nodes:

    siteReplicationMode/DC1=primary
    siteReplicationMode/DC3=async
    siteReplicationMode/DC2=syncmem
    siteOperationMode/DC1=primary
    siteOperationMode/DC3=logreplay
    siteOperationMode/DC2=logreplay
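The "same output on all nodes" requirement can be checked mechanically by comparing the grep results pairwise. A minimal sketch, with the per-node output simulated by shell variables instead of real hdbnsutil calls:

```shell
#!/bin/sh
# Hypothetical consistency check for the site*Mode lines gathered from each
# node with: hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode
# Sample variables stand in for the real per-node command output.
modes_az1n1='siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem'
modes_az2n1='siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem'

sr_modes_consistent() {
    # Sort both outputs so line order does not matter, then compare.
    [ "$(printf '%s\n' "$1" | sort)" = "$(printf '%s\n' "$2" | sort)" ]
}

sr_modes_consistent "$modes_az1n1" "$modes_az2n1" && echo consistent
```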
  • Start the monitoring in separate windows.

    On az1n1, start:

    az1n1:rh2adm> watch "python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py; echo \$?"

    On az3n1, start:

    az3n1:rh2adm> watch "python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py; echo \$?"

    On az2n1, start:

    az2n1:rh2adm> watch "hdbnsutil -sr_state --sapcontrol=1 |grep  siteReplicationMode"
  • Start the test.

    To fail over to az1n1, start on az1n1:

    az1n1:rh2adm> hdbnsutil -sr_takeover
    done.
  • Check the output of the monitors.

    The monitor on az1n1 changes as follows:

    Every 2.0s: python systemReplicationStatus.py; echo $?                                                                                                                                                            az1n1: Mon Sep  4 23:34:30 2023
    
    |Database |Host   |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |       |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |az1n1 |30201 |nameserver   |        1 |      1 |DC1       |az2n1    |    30201 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30207 |xsengine     |        2 |      1 |DC1       |az2n1    |    30207 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30203 |indexserver  |        3 |      1 |DC1       |az2n1    |    30203 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    
    status system replication site "2": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 1
    site name: DC1
    15

    The return code 15 is also important here.

    The monitor on az2n1 changes to:

    Every 2.0s: hdbnsutil -sr_state --sapcontrol=1 |grep  site.*Mode                                                az2n1: Mon Sep  4 23:35:18 2023
    
    siteReplicationMode/DC1=primary
    siteReplicationMode/DC2=syncmem
    siteOperationMode/DC1=primary
    siteOperationMode/DC2=logreplay

    DC3 is gone and needs to be re-registered.

    On az3n1, systemReplicationStatus reports an error, and the return code changes to 11.

  • Check whether the cluster node has been re-registered:

    az1n1:rh2adm>  hdbnsutil -sr_state
    
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    
    online: true
    
    mode: primary
    operation mode: primary
    site id: 1
    site name: DC1
    
    is source system: true
    is secondary/consumer system: false
    has secondaries/consumers attached: true
    is a takeover active: false
    is primary suspended: false
    
    Host Mappings:
    ~~~~~~~~~~~~~~
    
    az1n1 -> [DC2] az2n1
    az1n1 -> [DC1] az1n1
    
    
    Site Mappings:
    ~~~~~~~~~~~~~~
    DC1 (primary/primary)
        |---DC2 (syncmem/logreplay)
    
    Tier of DC1: 1
    Tier of DC2: 2
    
    Replication mode of DC1: primary
    Replication mode of DC2: syncmem
    
    Operation mode of DC1: primary
    Operation mode of DC2: logreplay
    
    Mapping: DC1 -> DC2
    done.

    The Site Mappings output shows that az2n1 (DC2) has been re-registered.
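A registered secondary can also be detected mechanically: it appears as an indented `|---<site>` entry under the primary in the Site Mappings block. A small sketch against sample output (the helper is hypothetical and assumes this output format):

```shell
#!/bin/sh
# Hypothetical check on sample 'hdbnsutil -sr_state' output: a re-registered
# secondary shows up as an indented |---<site> line under the primary.
sr_state_sample='Site Mappings:
~~~~~~~~~~~~~~
DC1 (primary/primary)
    |---DC2 (syncmem/logreplay)'

site_registered() {
    # $1: sr_state output, $2: site name expected as a secondary
    printf '%s\n' "$1" | grep -q -- "|---$2 "
}

site_registered "$sr_state_sample" DC2 && echo "DC2 registered"
site_registered "$sr_state_sample" DC3 || echo "DC3 missing"
```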

  • Check or enable the vip resource:

    [root@az1n1]# pcs resource
      * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02] (unmanaged):
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az2n1 (unmanaged)
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az1n1 (unmanaged)
      * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable, unmanaged):
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Master az2n1 (unmanaged)
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Slave az1n1 (unmanaged)
      * vip_RH2_02_MASTER   (ocf::heartbeat:IPaddr2):        Stopped (disabled, unmanaged)

    The vip resource vip_RH2_02_MASTER is stopped.

    To start it again, run:

    [root@az1n1]# pcs resource enable vip_RH2_02_MASTER
    Warning: 'vip_RH2_02_MASTER' is unmanaged

    The warning is correct, because the cluster will not start any resources until maintenance-mode=false is set.
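This behaviour can be guarded in scripts: only touch resources while the cluster is confirmed to be in maintenance mode. A minimal sketch that parses sample `pcs property config maintenance-mode` output (the helper itself is hypothetical):

```shell
#!/bin/sh
# Hypothetical guard: parse the output of
# 'pcs property config maintenance-mode' and only proceed with resource
# changes while the cluster is in maintenance mode. The sample output
# mirrors the one shown earlier in this section.
pcs_output='Cluster Properties:
 maintenance-mode: true'

in_maintenance_mode() {
    printf '%s\n' "$1" | grep -q 'maintenance-mode: true'
}

if in_maintenance_mode "$pcs_output"; then
    echo "safe to enable vip_RH2_02_MASTER (cluster will not act yet)"
fi
```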

  • Stop the cluster maintenance-mode

    Before stopping maintenance-mode, two monitors should be started in separate windows to watch the changes.

    On az2n1, run:

    [root@az2n1]# watch pcs status --full

    On az1n1, run:

    az1n1:rh2adm> watch "python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py; echo $?"

    Run this command on az1n1 to unset maintenance-mode:

    [root@az1n1]# pcs property set maintenance-mode=false

    The monitor on az1n1 should show that everything is running as expected:

    Every 2.0s: pcs status --full                                                                                                                                                                                     az1n1: Tue Sep  5 00:01:17 2023
    
    Cluster name: cluster1
    Cluster Summary:
      * Stack: corosync
      * Current DC: az1n1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
      * Last updated: Tue Sep  5 00:01:17 2023
      * Last change:  Tue Sep  5 00:00:30 2023 by root via crm_attribute on az1n1
      * 2 nodes configured
      * 6 resource instances configured
    
    Node List:
      * Online: [ az1n1 (1) az2n1 (2) ]
    
    Full List of Resources:
      * auto_rhevm_fence1   (stonith:fence_rhevm):   Started az1n1
      * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az2n1
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az1n1
      * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Slave az2n1
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Master az1n1
      * vip_RH2_02_MASTER   (ocf::heartbeat:IPaddr2):        Started az1n1
    
    Node Attributes:
      * Node: az1n1 (1):
        * hana_rh2_clone_state              : PROMOTED
        * hana_rh2_op_mode                  : logreplay
        * hana_rh2_remoteHost               : az2n1
        * hana_rh2_roles                    : 4:P:master1:master:worker:master
        * hana_rh2_site                     : DC1
        * hana_rh2_sra                      : -
        * hana_rh2_srah                     : -
        * hana_rh2_srmode                   : syncmem
        * hana_rh2_sync_state               : PRIM
        * hana_rh2_version                  : 2.00.062.00
        * hana_rh2_vhost                    : az1n1
        * lpa_rh2_lpt                       : 1693872030
        * master-SAPHana_RH2_02             : 150
      * Node: az2n1 (2):
        * hana_rh2_clone_state              : DEMOTED
        * hana_rh2_op_mode                  : logreplay
        * hana_rh2_remoteHost               : az1n1
        * hana_rh2_roles                    : 4:S:master1:master:worker:master
        * hana_rh2_site                     : DC2
        * hana_rh2_sra                      : -
        * hana_rh2_srah                     : -
        * hana_rh2_srmode                   : syncmem
        * hana_rh2_sync_state               : SOK
        * hana_rh2_version                  : 2.00.062.00
        * hana_rh2_vhost                    : az2n1
        * lpa_rh2_lpt                       : 30
        * master-SAPHana_RH2_02             : 100
    
    Migration Summary:
    
    Tickets:
    
    PCSD Status:
      az1n1: Online
      az2n1: Online
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

    After manual interactions, always remember to clean up the cluster, as described in Cleaning up the cluster.

  • Re-register az3n1 to the new primary on az1n1.

    az3n1 needs to be re-registered. To monitor the progress, start on az1n1:

    az1n1:rh2adm> watch -n 5 'python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status $?'

    On az3n1, start:

    az3n1:rh2adm> watch 'hdbnsutil -sr_state --sapcontrol=1 |grep  siteReplicationMode'

    Now you can re-register az3n1 with the following command:

    az3n1:rh2adm> hdbnsutil -sr_register --remoteHost=az1n1 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC1 --operationMode=logreplay --online

    The monitor on az1n1 changes as follows:

    Every 5.0s: python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status $?                                                                                         az1n1: Tue Sep  5 00:14:40 2023
    
    |Database |Host   |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |       |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |SYSTEMDB |az1n1 |30201 |nameserver   |        1 |      1 |DC1       |az3n1    |    30201 |        3 |DC3       |YES           |ASYNC     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30207 |xsengine     |        2 |      1 |DC1       |az3n1    |    30207 |        3 |DC3       |YES           |ASYNC     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30203 |indexserver  |        3 |      1 |DC1       |az3n1    |    30203 |        3 |DC3       |YES           |ASYNC     |ACTIVE      |               |        True |
    |SYSTEMDB |az1n1 |30201 |nameserver   |        1 |      1 |DC1       |az2n1    |    30201 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30207 |xsengine     |        2 |      1 |DC1       |az2n1    |    30207 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    |RH2      |az1n1 |30203 |indexserver  |        3 |      1 |DC1       |az2n1    |    30203 |        2 |DC2       |YES           |SYNCMEM     |ACTIVE      |               |        True |
    
    status system replication site "3": ACTIVE
    status system replication site "2": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 1
    site name: DC1
    Status 15

    The monitor on az3n1 changes to:

    Every 2.0s: hdbnsutil -sr_state --sapcontrol=1 |grep  site.*Mode                                                                az3n1: Tue Sep  5 02:15:28 2023
    
    siteReplicationMode/DC1=primary
    siteReplicationMode/DC3=syncmem
    siteReplicationMode/DC2=syncmem
    siteOperationMode/DC1=primary
    siteOperationMode/DC3=logreplay
    siteOperationMode/DC2=logreplay

    Now there are three entries again, and az3n1 (DC3) is once more a secondary site replicating from az1n1 (DC1).
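Because sr_register takes many coupled options, it can help to assemble the command string first and review it before running it. This dry-run builder is a hypothetical convenience, with the instance number 02 assumed from the RH2_02 resource names used in this section; nothing is executed:

```shell
#!/bin/sh
# Hypothetical dry-run builder for the re-registration command: it only
# assembles the argument string so the values can be reviewed before
# running hdbnsutil for real on the node being registered.
build_sr_register() {
    # $1 remoteHost, $2 instance, $3 replicationMode, $4 site name, $5 remote site
    printf 'hdbnsutil -sr_register --remoteHost=%s --remoteInstance=%s --replicationMode=%s --name=%s --remoteName=%s --operationMode=logreplay --online\n' \
        "$1" "$2" "$3" "$4" "$5"
}

build_sr_register az1n1 02 async DC3 DC1
```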

  • Check that all nodes are part of the system replication status on az1n1.

    Run hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode on all three nodes:

    az1n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    az2n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    az3n1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode

    You should get the same output on all nodes:

    siteReplicationMode/DC1=primary
    siteReplicationMode/DC3=syncmem
    siteReplicationMode/DC2=syncmem
    siteOperationMode/DC1=primary
    siteOperationMode/DC3=logreplay
    siteOperationMode/DC2=logreplay
  • Check pcs status --full and SOK.

    Run:

    [root@az1n1]# pcs status --full | grep sync_state

    The output should show PRIM or SOK:

     * hana_rh2_sync_state             	: PRIM
     * hana_rh2_sync_state             	: SOK

    Finally, the cluster status should look like the following, including the sync_state entries PRIM and SOK:

    [root@az1n1]# pcs status --full
    Cluster name: cluster1
    Cluster Summary:
      * Stack: corosync
      * Current DC: az1n1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
      * Last updated: Tue Sep  5 00:18:52 2023
      * Last change:  Tue Sep  5 00:16:54 2023 by root via crm_attribute on az1n1
      * 2 nodes configured
      * 6 resource instances configured
    
    Node List:
      * Online: [ az1n1 (1) az2n1 (2) ]
    
    Full List of Resources:
      * auto_rhevm_fence1   (stonith:fence_rhevm):   Started az1n1
      * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az2n1
        * SAPHanaTopology_RH2_02    (ocf::heartbeat:SAPHanaTopology):        Started az1n1
      * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Slave az2n1
        * SAPHana_RH2_02    (ocf::heartbeat:SAPHana):        Master az1n1
      * vip_RH2_02_MASTER   (ocf::heartbeat:IPaddr2):        Started az1n1
    
    Node Attributes:
      * Node: az1n1 (1):
        * hana_rh2_clone_state              : PROMOTED
        * hana_rh2_op_mode                  : logreplay
        * hana_rh2_remoteHost               : az2n1
        * hana_rh2_roles                    : 4:P:master1:master:worker:master
        * hana_rh2_site                     : DC1
        * hana_rh2_sra                      : -
        * hana_rh2_srah                     : -
        * hana_rh2_srmode                   : syncmem
        * hana_rh2_sync_state               : PRIM
        * hana_rh2_version                  : 2.00.062.00
        * hana_rh2_vhost                    : az1n1
        * lpa_rh2_lpt                       : 1693873014
        * master-SAPHana_RH2_02             : 150
      * Node: az2n1 (2):
        * hana_rh2_clone_state              : DEMOTED
        * hana_rh2_op_mode                  : logreplay
        * hana_rh2_remoteHost               : az1n1
        * hana_rh2_roles                    : 4:S:master1:master:worker:master
        * hana_rh2_site                     : DC2
        * hana_rh2_sra                      : -
        * hana_rh2_srah                     : -
        * hana_rh2_srmode                   : syncmem
        * hana_rh2_sync_state               : SOK
        * hana_rh2_version                  : 2.00.062.00
        * hana_rh2_vhost                    : az2n1
        * lpa_rh2_lpt                       : 30
        * master-SAPHana_RH2_02             : 100
    
    Migration Summary:
    
    Tickets:
    
    PCSD Status:
      az1n1: Online
      az2n1: Online
    
    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled
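The PRIM/SOK rule above can also be checked mechanically: exactly one node must report PRIM, and every remaining node must report SOK. A sketch over sample grep lines (the helper is hypothetical):

```shell
#!/bin/sh
# Hypothetical verification of 'pcs status --full | grep sync_state' output:
# exactly one PRIM line and no line that is neither PRIM nor SOK.
sync_lines=' * hana_rh2_sync_state : PRIM
 * hana_rh2_sync_state : SOK'

sync_states_healthy() {
    prim=$(printf '%s\n' "$1" | grep -c ': PRIM')
    bad=$(printf '%s\n' "$1" | grep -v -c -e ': PRIM' -e ': SOK')
    [ "$prim" -eq 1 ] && [ "$bad" -eq 0 ]
}

sync_states_healthy "$sync_lines" && echo healthy
```

A node reporting SFAIL, for example, would make the check fail.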
  • See Check cluster status and Check database to verify that everything works as expected.