5.4. Test 2: Failover of the primary node with a passive third site
Subject of the test | The stopped third site is not re-registered. Failover works even while the third site is down.
Test preconditions |
Test steps | Stop the database on remotehost3, then move the SAPHana clone resource using pcs.
Starting the test | Execute the cluster command to move the SAPHana clone resource.
Expected results | DC3 is not changed. SAP HANA System Replication stays in sync with the old relationship.
Ways to return to the initial state | Re-register DC3 on the new primary and start SAP HANA.
Detailed description
Check the initial state of the cluster on clusternode1 or clusternode2 as the root user:
[root@clusternode1]# pcs status --full
Cluster name: cluster1
Cluster Summary:
  * Stack: corosync
  * Current DC: clusternode1 (1) (version 2.1.2-4.el8_6.6-ada5c3b36e2) - partition with quorum
  * Last updated: Mon Sep  4 06:34:46 2023
  * Last change:  Mon Sep  4 06:33:04 2023 by root via crm_attribute on clusternode1
  * 2 nodes configured
  * 6 resource instances configured

Node List:
  * Online: [ clusternode1 (1) clusternode2 (2) ]

Full List of Resources:
  * auto_rhevm_fence1	(stonith:fence_rhevm):	 Started clusternode1
  * Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
    * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode2
    * SAPHanaTopology_RH2_02	(ocf::heartbeat:SAPHanaTopology):	 Started clusternode1
  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
    * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Slave clusternode2
    * SAPHana_RH2_02	(ocf::heartbeat:SAPHana):	 Master clusternode1
  * vip_RH2_02_MASTER	(ocf::heartbeat:IPaddr2):	 Started clusternode1

Node Attributes:
  * Node: clusternode1 (1):
    * hana_rh2_clone_state            : PROMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode2
    * hana_rh2_roles                  : 4:P:master1:master:worker:master
    * hana_rh2_site                   : DC1
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : PRIM
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode1
    * lpa_rh2_lpt                     : 1693809184
    * master-SAPHana_RH2_02           : 150
  * Node: clusternode2 (2):
    * hana_rh2_clone_state            : DEMOTED
    * hana_rh2_op_mode                : logreplay
    * hana_rh2_remoteHost             : clusternode1
    * hana_rh2_roles                  : 4:S:master1:master:worker:master
    * hana_rh2_site                   : DC2
    * hana_rh2_sra                    : -
    * hana_rh2_srah                   : -
    * hana_rh2_srmode                 : syncmem
    * hana_rh2_sync_state             : SOK
    * hana_rh2_version                : 2.00.062.00
    * hana_rh2_vhost                  : clusternode2
    * lpa_rh2_lpt                     : 30
    * master-SAPHana_RH2_02           : 100

Migration Summary:

Tickets:

PCSD Status:
  clusternode1: Online
  clusternode2: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
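If you only want to pick out the promoted node and its clone state from this status, you can filter the same command (a minimal sketch; the patterns match the wording used in the output above):

[root@clusternode1]# pcs status --full | grep -E 'Master|PROMOTED'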
The output of this example shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the clone resource named SAPHana_RH2_02-clone is promotable. If you ran test 3 before, HANA might be promoted on clusternode2 instead.

Stop the database on remotehost3:
remotehost3:rh2adm> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function Stop 400

12.07.2023 11:33:14
Stop
OK
Waiting for stopped instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function WaitforStopped 600 2

12.07.2023 11:33:30
WaitforStopped
OK
hdbdaemon is stopped.
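Optionally, you can verify that all services of the instance are really stopped before continuing (a quick sanity check; it uses the same instance number 02 that appears in the sapcontrol call above, and all processes should be reported as stopped/GRAY):

remotehost3:rh2adm> sapcontrol -nr 02 -function GetProcessList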
Check the primary database on remotehost3:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration | grep -i "primary masters"
primary masters: clusternode2
Check the current primary in the cluster on one of the cluster nodes:
[root@clusternode1]# pcs resource | grep Masters
  * Masters: [ clusternode2 ]
Check sr_state to see the SAP HANA System Replication relationships:

clusternode2:rh2adm> hdbnsutil -sr_state

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

online: true

mode: primary
operation mode: primary
site id: 2
site name: DC1

is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: true
is a takeover active: false
is primary suspended: false

Host Mappings:
~~~~~~~~~~~~~~

clusternode1 -> [DC3] remotehost3
clusternode1 -> [DC1] clusternode1
clusternode1 -> [DC2] clusternode2

Site Mappings:
~~~~~~~~~~~~~~
DC1 (primary/primary)
    |---DC3 (syncmem/logreplay)
    |---DC2 (syncmem/logreplay)

Tier of DC1: 1
Tier of DC3: 2
Tier of DC2: 2

Replication mode of DC1: primary
Replication mode of DC3: syncmem
Replication mode of DC2: syncmem

Operation mode of DC1: primary
Operation mode of DC3: logreplay
Operation mode of DC2: logreplay

Mapping: DC1 -> DC3
Mapping: DC1 -> DC2
done.
The SAP HANA System Replication relationships still have one primary (DC1), which is replicated to DC2 and DC3. The replication relationship on remotehost3 (which is down) can be displayed with:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration

System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~

mode: syncmem
site id: 3
site name: DC3
active primary site: 1

primary masters: clusternode1
done.
Because the database on remotehost3 is down, this output is based on the entries in the global.ini file.
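If you want to see where this information is stored, you can look at the system_replication sections of global.ini directly (a minimal sketch; the path assumes the standard SAP HANA layout for the SID RH2 used in this example):

remotehost3:rh2adm> grep -A6 '^\[system_replication' /usr/sap/RH2/SYS/global/hdb/custom/config/global.ini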
Start the test: initiate a failover in the cluster by moving the SAPHana clone resource. For example:

[root@clusternode1]# pcs resource move SAPHana_RH2_02-clone clusternode2
Note: If SAPHana is promoted on clusternode2, you have to move the clone resource to clusternode1 instead. The example assumes that SAPHana is promoted on clusternode1.
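If you do not want to look up the move target manually, you can derive it from the cluster status (a small sketch, assuming a single secondary node and the default wording of the pcs resource output, which lists the secondary under "Slaves" on RHEL 8 and "Unpromoted" on newer releases):

[root@clusternode1]# TARGET=$(pcs resource | awk '/Slaves:|Unpromoted:/ { print $(NF-1) }')
[root@clusternode1]# pcs resource move SAPHana_RH2_02-clone "$TARGET"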
There is no output. As in the previous test, a location constraint is created, which can be displayed with:
[root@clusternode1]# pcs constraint location
Location Constraints:
  Resource: SAPHana_RH2_02-clone
    Enabled on:
      Node: clusternode1 (score:INFINITY) (role:Started)
This constraint prevents another failover, even when the cluster looks healthy again, until the constraint is removed. One way to remove it is to clear the resource.
Clear the resource:
[root@clusternode1]# pcs constraint location
Location Constraints:
  Resource: SAPHana_RH2_02-clone
    Enabled on:
      Node: clusternode1 (score:INFINITY) (role:Started)

[root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone
Removing constraint: cli-prefer-SAPHana_RH2_02-clone
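Alternatively, the same location constraint can be deleted by its id, which is shown in the "Removing constraint" line above (a sketch; list the constraint ids first with pcs constraint --full if you are unsure):

[root@clusternode1]# pcs constraint remove cli-prefer-SAPHana_RH2_02-clone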
Clean up the resource:
[root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on clusternode2
Cleaned up SAPHana_RH2_02:1 on clusternode1
Waiting for 1 reply from the controller
... got reply (done)
Check the current status. There are two ways to display the replication status, which needs to be in sync again. Start with what remotehost3 reports as the primary:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration | grep -i primary
active primary site: 1
primary masters: clusternode1
The output shows site 1, or clusternode1, which was the primary before the test started, although the primary has since been moved to clusternode2. Next, check the system replication status on the new primary. First identify the new primary:
[root@clusternode1]# pcs resource | grep Master
  * Masters: [ clusternode2 ]
So there is an inconsistency, and remotehost3 needs to be re-registered. You might think that if you run the test again, the primary would simply switch back to the original clusternode1. In this case there is a third way to check whether system replication is working. On the primary node, in our case clusternode2, run:
clusternode2:rh2adm> cdpy
clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py
|Database |Host         |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary    |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |             |      |             |          |        |          |Host         |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |------------ |----- |------------ |--------- |------- |--------- |------------ |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |clusternode2 |30201 |nameserver   |        1 |      2 |DC2       |clusternode1 |    30201 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
|RH2      |clusternode2 |30207 |xsengine     |        2 |      2 |DC2       |clusternode1 |    30207 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |
|RH2      |clusternode2 |30203 |indexserver  |        3 |      2 |DC2       |clusternode1 |    30203 |        1 |DC1       |YES           |SYNCMEM     |ACTIVE      |               |        True |

status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: DC2
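The script also reports the overall state through its exit code, which is convenient for scripted checks (a sketch, based on the return codes commonly documented for systemReplicationStatus.py, where 15 means ACTIVE, 14 syncing, 13 initializing, 12 unknown, 11 error, and 10 no system replication):

clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py > /dev/null; echo "return code: $?"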
If remotehost3 does not show up in this output, it has to be re-registered. Before registering it, run the following on the primary node to watch the progress of the registration:
clusternode2:rh2adm> watch python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py
Now you can re-register remotehost3 with this command:
remotehost3:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
adding site ...
collecting information ...
updating local ini files ...
done.
As long as the database on remotehost3 has not been started, you will not see the third site in the system replication status output.
The registration is completed by starting the database on remotehost3:

remotehost3:rh2adm> HDB start

StartService
Impromptu CCC initialization by 'rscpCInit'.
  See SAP note 1266393.
OK
OK
Starting instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2

04.09.2023 11:36:47
Start
OK
The monitor started above will show remotehost3 synchronizing immediately.
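If you prefer a one-off check instead of the running watch, a single filtered call on the primary also confirms that DC3 is back in the replication status (a sketch using the same script as above):

clusternode2:rh2adm> python $DIR_EXECUTABLE/python_support/systemReplicationStatus.py | grep -e DC3 -e "overall system replication status"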
To switch back, run the test again. An optional test is to switch the primary to the node that is configured in the global.ini on remotehost3 and then start the database. The database may come up, but it will never be shown in the output of the system replication status unless it is re-registered.
For more information, see Checking SAP HANA System Replication status.