After finishing the installation, it is recommended to run some basic tests to check the installation and verify how SAP HANA Multitarget System Replication works and how it recovers from a failure. It is always good practice to run these test cases before going into production. If possible, you can also prepare a test environment to verify changes before applying them in production.
All cases will describe:
Subject of the test
Test preconditions
Test steps
Monitoring the test
Starting the test
Expected result(s)
Ways to return to an initial state
To automatically register a former primary HANA replication site as a new secondary HANA replication site on the HANA instances that are managed by the cluster, you can use the option AUTOMATED_REGISTER=true in the SAPHana resource. For more details, refer to AUTOMATED_REGISTER.
The name of the HA cluster nodes and the HANA replication sites (in brackets) used in the examples are:
clusternode1 (DC1)
clusternode2 (DC2)
remotehost3 (DC3)
The following parameters are used for configuring the HANA instances and the cluster:
SID=RH2
INSTANCENUMBER=02
CLUSTERNAME=cluster1
You can also use clusternode1, clusternode2, and remotehost3 as aliases in /etc/hosts in your test environment.
The tests are described in more detail, including examples and additional checks of preconditions. At the end, there are examples of how to clean up the environment to be prepared for further testing.
In some cases, if the distance between clusternode1, clusternode2, and remotehost3 is too long, you should use --replicationMode=async instead of --replicationMode=syncmem. Please also consult your SAP HANA administrator before choosing the right option.
An example for pcs status --full can be found in Check cluster status with pcs status. If there are warnings or previous failures in the "Migration Summary", you should clean up the cluster before you start your test.
Cluster Cleanup describes some more ways to do it. It is important that the cluster and all the resources be started.
Besides the cluster, the database should also be up and running and in sync. The easiest way to verify the proper status of the database is to check the system replication status. See also Replication Status. This should be checked on the primary database.
In this section, we are focusing on monitoring the environment during the tests. This section will only cover the necessary monitors to see the changes. It is recommended to run these monitors from dedicated terminals. To be able to detect changes during the test, it is recommended to start monitoring before starting the test.
You need to discover the primary node to monitor a failover or run certain commands that only provide information about the replication status when executed on the primary node.
To discover the primary node, you can run the following commands as the <sid>adm user:
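On a secondary site, for example, `hdbnsutil -sr_stateConfiguration` reports the primary in its `primary masters` line, as shown later in this document. The following sketch parses a captured sample of that output instead of calling hdbnsutil live:

```shell
# Sketch: extract the primary from hdbnsutil -sr_stateConfiguration output.
# A captured sample is parsed here instead of calling hdbnsutil directly.
sample='mode: syncmem
site id: 3
site name: DC3
active primary site: 1
primary masters: clusternode1'

primary=$(printf '%s\n' "$sample" | awk -F': ' '/^primary masters/ {print $2}')
echo "$primary"   # clusternode1
```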
If you want to permanently monitor changes in the system replication status, please run the following command:
clusternode1:rh2adm> watch -n 5 'python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status $?'
This example also determines the current return code. As long as the return code (status) is 15, the replication status is fine. The other return codes are:
10: NoHSR
11: Error
12: Unknown
13: Initializing
14: Syncing
15: Active
If you register a new secondary, you can run it in a separate window on the primary node and you will see the progress of the replication. If you want to monitor a failover, you can run it in parallel on the old primary as well as on the new primary database server. For more information, please read Check SAP HANA System Replication Status.
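The return-code list above can be wrapped in a small shell helper for use in scripts. This is a sketch; the function name sr_status_name is illustrative and not part of SAP HANA:

```shell
# Sketch: map the exit code of systemReplicationStatus.py to its state name.
sr_status_name() {
  case "$1" in
    10) echo "NoHSR" ;;
    11) echo "Error" ;;
    12) echo "Unknown" ;;
    13) echo "Initializing" ;;
    14) echo "Syncing" ;;
    15) echo "Active" ;;
     *) echo "Unexpected ($1)" ;;
  esac
}

sr_status_name 15   # prints: Active
```

After running systemReplicationStatus.py, you can pass `$?` to the helper to print the state name.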
Pacemaker writes a lot of information to the /var/log/messages file. During a failover, a huge number of messages are written to this file. To follow only the messages that matter for the SAP HANA resource agent, it is useful to filter the detailed activities of the pacemaker SAP resources. It is enough to check the messages file on a single cluster node.
For example, you can use this alias:
[root@clusternode1]# alias tmsl='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_$SAPSYSTEMNAME_HDB${TINSTANCE}|sr_register|WAITING4LPA|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED|LPT"'
Run tmsl in a separate window to monitor the progress of the test. Please also check the example Monitor failover and sync state.
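To illustrate what the filter catches, the following sketch applies a shortened version of the same egrep pattern to two invented log lines; only the SAPHana state-change line matches:

```shell
# Demonstration of the egrep filter on two invented /var/log/messages lines
# (the log content is illustrative, not captured from a real cluster).
sample='Jul 12 11:33:15 clusternode1 SAPHana(SAPHana_RH2_02)[5678]: INFO: DEC: PROMOTED -> DEMOTED
Jul 12 11:33:16 clusternode1 systemd[1]: unrelated log noise'
printf '%s\n' "$sample" | egrep "sr_register|PROMOTED|DEMOTED|SWAIT|FAILED"
```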
Detect: the replication status reports that log_mode normal is required. log_mode can be checked as described in Using hdbsql to check Inifile contents.
Fix: change the log_mode to normal and restart the primary database.
CIB entries:
Detect: SFAIL entries in the cluster information base.
Sometimes the status shows errors or warnings. You can clean up or clear resources; if everything is fine, nothing happens. Before running the next test, you can clean up your environment.
This output shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the name of the clone resource is SAPHana_RH2_02-clone, which is promotable. You can run this in a separate window during the test to see changes:
[root@clusternode1]# watch pcs status --full
Another way to identify the name of the SAP HANA clone resource is:
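As a sketch, the clone resource name can also be parsed from saved `pcs status` output; the sample line below mirrors the clone set used in this document, so the parsing runs without a live cluster:

```shell
# Sketch: derive the clone resource name from a saved `pcs status` line.
sample='  * Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):'
clone=$(printf '%s\n' "$sample" | sed -n 's/.*Clone Set: \([^ ]*\).*/\1/p')
echo "$clone"   # SAPHana_RH2_02-clone
```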
Cleanup the resource:
[root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on clusternode2
Cleaned up SAPHana_RH2_02:1 on clusternode1
Waiting for 1 reply from the controller
... got reply (done)
Result of the test
The “primary masters” monitor on remotehost3 should show an immediate switch to the new primary node.
If you check the cluster status, the former secondary will be promoted, the former primary gets re-registered, and the Clone_State changes from Promoted to Undefined to WAITINGFORLPA to DEMOTED.
The secondary will change its sync_state to SFAIL when the SAPHana monitor starts for the first time after the failover. Because of existing location constraints, the resource needs to be cleaned up, and after a short time the sync_state of the secondary will change to SOK again.
Secondary gets promoted.
To restore the initial state you can simply run the next test. After finishing the tests please run a cleanup.
The output of this example shows that HANA is promoted on clusternode1, which is the primary SAP HANA server, and that the name of the clone resource is SAPHana_RH2_02-clone, which is promotable. If you ran test 3 before, HANA might be promoted on clusternode2.
Stop the database on remotehost3:
remotehost3:rh2adm> HDB stop
hdbdaemon will wait maximal 300 seconds for NewDB services finishing.
Stopping instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function Stop 400
12.07.2023 11:33:14
Stop
OK
Waiting for stopped instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function WaitforStopped 600 2
12.07.2023 11:33:30
WaitforStopped
OK
hdbdaemon is stopped.
Check the sr_state to see the SAP HANA System Replication relationships:
clusternode2:rh2adm> hdbnsutil -sr_state
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: primary
operation mode: primary
site id: 2
site name: DC1
is source system: true
is secondary/consumer system: false
has secondaries/consumers attached: true
is a takeover active: false
is primary suspended: false
Host Mappings:
~~~~~~~~~~~~~~
clusternode1 -> [DC3] remotehost3
clusternode1 -> [DC1] clusternode1
clusternode1 -> [DC2] clusternode2
Site Mappings:
~~~~~~~~~~~~~~
DC1 (primary/primary)
|---DC3 (syncmem/logreplay)
|---DC2 (syncmem/logreplay)
Tier of DC1: 1
Tier of DC3: 2
Tier of DC2: 2
Replication mode of DC1: primary
Replication mode of DC3: syncmem
Replication mode of DC2: syncmem
Operation mode of DC1: primary
Operation mode of DC3: logreplay
Operation mode of DC2: logreplay
Mapping: DC1 -> DC3
Mapping: DC1 -> DC2
done.
The SAP HANA System Replication relations still have one primary (DC1), which is replicated to DC2 and DC3. The replication relationship on remotehost3, which is down, can be displayed using:
remotehost3:rh2adm> hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 3
site name: DC3
active primary site: 1
primary masters: clusternode1
done.
While the database on remotehost3 is offline, check the entries in the global.ini file.
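A sketch of such a check is shown below. A sample file stands in for global.ini; the real file usually lives under /usr/sap/<SID>/SYS/global/hdb/custom/config/, and the key values shown are illustrative:

```shell
# Sketch: inspect replication-related entries in a stand-in for global.ini.
cat > /tmp/global.ini.sample <<'EOF'
[system_replication]
mode = syncmem
site_name = DC3
operation_mode = logreplay
EOF
grep -E '^(mode|site_name|operation_mode)' /tmp/global.ini.sample
```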
Starting the test: Initiate a failover in the cluster, moving the SAPHana-clone-resource example:
Note
If SAPHana is promoted on clusternode2, you have to move the clone resource to clusternode1. The example expects that SAPHana is promoted on clusternode1.
There will be no output. Similar to the former test, a location constraint will be created, which can be displayed with:
Cleanup the resource:
[root@clusternode1]# pcs resource cleanup SAPHana_RH2_02-clone
Cleaned up SAPHana_RH2_02:0 on clusternode2
Cleaned up SAPHana_RH2_02:1 on clusternode1
Waiting for 1 reply from the controller
... got reply (done)
Check the current status. There are three ways to display the replication status, which needs to be in sync. Starting with the primary on remotehost3:
The output shows site 1, or clusternode1, which was the primary before the test moved the primary to clusternode2. Next, check the system replication status on the new primary. First, detect the new primary:
Here we have an inconsistency, which requires re-registering remotehost3. You might think that running the test again would switch the primary back to the original clusternode1. In this case, we have a third way to identify whether system replication is working. On the primary node, which in our case is clusternode2, run:
clusternode2:rh2adm> cdpy
clusternode2:rh2adm> python
$DIR_EXECUTABLE/python_support/systemReplicationStatus.py
|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary |
| | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced |
|-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |clusternode2 |30201 |nameserver | 1 | 2 |DC2 |clusternode1 | 30201 | 1 |DC1 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |clusternode2 |30207 |xsengine | 2 | 2 |DC2 |clusternode1 | 30207 | 1 |DC1 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |clusternode2 |30203 |indexserver | 3 | 2 |DC2 |clusternode1 | 30203 | 1 |DC1 |YES |SYNCMEM |ACTIVE | | True |
status system replication site "1": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 2
site name: DC2
If you don’t see remotehost3 in this output, you have to re-register remotehost3. Before registering, please run the following on the primary node to watch the progress of the registration:
Now you can re-register remotehost3 using this command:
remotehost3:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=async --name=DC3 --remoteName=DC2 --operationMode=logreplay --online
adding site ...
collecting information ...
updating local ini files ...
done.
Even if the database on remotehost3 is not started yet, you are able to see the third site in the system replication status output. The registration can be finished by starting the database on remotehost3:
remotehost3:rh2adm> HDB start
StartService
Impromptu CCC initialization by 'rscpCInit'.
See SAP note 1266393.
OK
OK
Starting instance using: /usr/sap/RH2/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2
04.09.2023 11:36:47
Start
OK
The monitor started above will immediately show the synchronization of remotehost3.
To switch back, run the test again. One optional test is to switch the primary to the node that is configured in global.ini on remotehost3 and then start the database. The database might come up, but it will never be shown in the output of the system replication status unless it is re-registered.
Now we have a proper environment and we can start monitoring the system replication status on all 3 nodes in separate windows.
The 3 monitors should be started before the test is started. The output will change when the test is executed. So keep them running as long as the test is not completed.
On the old primary node, clusternode1, run in a separate window during the test:
clusternode1:rh2adm> watch -n 5 'python /usr/sap/$SAPSYSTEMNAME/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status $?'
The output on clusternode1 will be:
Every 5.0s: python /usr/sap/$SAPSYSTEMNAME/HDB${TINSTANCE}/exe/python_support/systemReplicati... clusternode1: Tue XXX XX HH:MM:SS 2023
|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary |
| | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced |
|-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |clusternode1 |30201 |nameserver | 1 | 1 |DC1 |remotehost3 | 30201 | 3 |DC3 |YES |ASYNC |ACTIVE | | True |
|RH2 |clusternode1 |30207 |xsengine | 2 | 1 |DC1 |remotehost3 | 30207 | 3 |DC3 |YES |ASYNC |ACTIVE | | True |
|RH2 |clusternode1 |30203 |indexserver | 3 | 1 |DC1 |remotehost3 | 30203 | 3 |DC3 |YES |ASYNC |ACTIVE | | True |
|SYSTEMDB |clusternode1 |30201 |nameserver | 1 | 1 |DC1 |clusternode2 | 30201 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |clusternode1 |30207 |xsengine | 2 | 1 |DC1 |clusternode2 | 30207 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |clusternode1 |30203 |indexserver | 3 | 1 |DC1 |clusternode2 | 30203 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
status system replication site "3": ACTIVE
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC1
Status 15
On remotehost3 run the same command:
remotehost3:rh2adm> watch -n 5 'python /usr/sap/$SAPSYSTEMNAME/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status $?'
The response will be:
this system is either not running or not primary system replication site
The output will change after the test initiates the failover. The output looks similar to the example of the primary node before the test was started.
[Optional] Put the cluster into maintenance-mode:
[root@clusternode1]# pcs property set maintenance-mode=true
During the tests, you will see that the failover works with or without maintenance-mode set, so you can run the first test without it. During recovery, however, maintenance-mode should be set. It is also an option when the primary is not accessible.
Start the test: Failover to DC3. On remotehost3 please run:
remotehost3:rh2adm> hdbnsutil -sr_takeover
done.
The test has started, and now please check the output of the previously started monitors.
On clusternode1, the system replication status will lose its relationship to remotehost3 and clusternode2 (DC2):
Every 5.0s: python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py ; echo Status $? clusternode1: Mon Sep 4 11:52:16 2023
|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary |
| | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced |
|-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |---------------------------- |------------ |
|SYSTEMDB |clusternode1 |30201 |nameserver | 1 | 1 |DC1 |clusternode2 | 30201 | 2 |DC2 |YES |SYNCMEM |ERROR |Communication channel closed | False |
|RH2 |clusternode1 |30207 |xsengine | 2 | 1 |DC1 |clusternode2 | 30207 | 2 |DC2 |YES |SYNCMEM |ERROR |Communication channel closed | False |
|RH2 |clusternode1 |30203 |indexserver | 3 | 1 |DC1 |clusternode2 | 30203 | 2 |DC2 |YES |SYNCMEM |ERROR |Communication channel closed | False |
status system replication site "2": ERROR
overall system replication status: ERROR
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC1
Status 11
The cluster still does not notice this behavior. If you check the return code of the system replication status, return code 11 means error, which tells you something is wrong. If you have access, this is a good point to enter maintenance-mode.
remotehost3 becomes the new primary, and clusternode2 (DC2) is automatically registered to the new primary remotehost3.
Example output of the system replication state of remotehost3:
Every 5.0s: python /usr/sap/RH2/HDB02/exe/python_support/systemReplicationStatus.py ; echo Status $? remotehost3: Mon Sep 4 13:55:29 2023
|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary |
| | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced |
|-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |remotehost3 |30201 |nameserver | 1 | 3 |DC3 |clusternode2 | 30201 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |remotehost3 |30207 |xsengine | 2 | 3 |DC3 |clusternode2 | 30207 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
|RH2 |remotehost3 |30203 |indexserver | 3 | 3 |DC3 |clusternode2 | 30203 | 2 |DC2 |YES |SYNCMEM |ACTIVE | | True |
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 3
site name: DC3
Status 15
Return code 15 also says everything is okay, but clusternode1 is missing: the former primary is not listed, so the replication relationship is lost, and clusternode1 must be re-registered manually.
Set maintenance-mode.
If not already done, set maintenance-mode on one node of the cluster with the command:
[root@clusternode1]# pcs property set maintenance-mode=true
You can check if the maintenance-mode is active by running this command:
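As a sketch, the maintenance-mode property can also be read from `pcs property` output; the sample below stands in for live cluster output:

```shell
# Sketch: read the maintenance-mode flag from saved `pcs property` output.
sample='Cluster Properties:
 cluster-infrastructure: corosync
 maintenance-mode: true'
printf '%s\n' "$sample" | awk -F': ' '/maintenance-mode/ {print $2}'   # true
```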
The resources are displayed as unmanaged, which indicates that the cluster is in maintenance-mode=true. The virtual IP address is still started on clusternode1. If you want to use this IP on another node, please disable vip_RH2_02_MASTER before you set maintenance-mode=true.
After the registration is done, you will see all three sites replicated on remotehost3, and the status (return code) will change to 15. If this fails, you have to manually remove the replication relationships on DC1 and DC3. Please follow the instructions described in Register Secondary. For example, list the existing relations with:
hdbnsutil -sr_state
To remove the existing relations, you can, for example, use:
On all three nodes, the primary database is remotehost3. On this primary database, you have to ensure that the system replication status is active for all three nodes and the return code is 15: