Chapter 7. Useful commands
Below are three sections of useful commands. In most cases, they help to verify successful operation or configuration. Examples are listed together with their output. In some cases, the output has been adjusted for formatting reasons.
- All commands in this document that are executed by the <sid>adm user start with >.
- All commands run by the root user start with #.
- To execute the commands, omit the prefix > or #.
7.1. SAP HANA commands
The SAP HANA commands are executed by the <sid>adm user. Example:
[root@clusternode1]# su - rh2adm
clusternode1:rh2adm> cdpy
clusternode1:rh2adm> pwd
/usr/sap/RH2/HDB02/exe/python_support
clusternode1:rh2adm> python systemReplicationStatus.py -h
systemReplicationStatus.py [-h|--help] [-a|--all] [-l|--localhost] [-m|--multiTaget] [-s|--site=<site name>] [-t|--printLandscapeTree] [--omitSecondaryActiveStatus] [--sapcontrol=1]
clusternode1:rh2adm> python landscapeHostConfiguration.py -h
landscapeHostConfiguration.py [-h|--help] [--localhost] [--sapcontrol=1]
clusternode1:rh2adm> hdbnsutil # run hdbnsutil without parameters to get help
7.1.1. SAP HANA installation using hdblcm
The installation of the third site is similar to the installation of the second site. Run the installation with hdblcm as user root. To make sure that SAP HANA is not already installed on this node, run hdbuninst first.
Example output of HANA uninstallation:
[root@remotehost3]# cd /software/DATA_UNITS/HDB_SERVER_LINUX_X86_64
[root@remotehost3]# ./hdbuninst
Option 0 will remove an already existing HANA Installation
No SAP HANA Installation found is the expected answer
Example output of HANA installation:
[source,text]
----
[root@remotehost3]# ./hdblcm
1 install
2 server
/hana/shared is default directory
Enter Local Hostname [remotehost3]: use the default name
additional hosts only during Scale-Out Installation y default is n
ENTER SAP HANA System ID: RH2
Enter Instance Number [02]:
Enter Local Host Worker Group [default]:
Select System Usage / Enter Index [4]:
Choose encryption
Enter Location of Data Volumes [/hana/data/RH2]:
Enter Location of Log Volumes [/hana/log/RH2]:
Restrict maximum memory allocation? [n]:
Enter Certificate Host Name
Enter System Administrator (rh2adm) Password: <Y0urPasswd>
Confirm System Administrator (rh2adm) Password: <Y0urPasswd>
Enter System Administrator Home Directory [/usr/sap/RH2/home]:
Enter System Administrator Login Shell [/bin/sh]:
Enter System Administrator User ID [1000]:
Enter System Database User (SYSTEM) Password: <Y0urPasswd>
Confirm System Database User (SYSTEM) Password: <Y0urPasswd>
Restart system after machine reboot? [n]:
----
Before the installation starts, a summary is listed:
SAP HANA Database System Installation
Installation Parameters
Remote Execution: ssh
Database Isolation: low
Install Execution Mode: standard
Installation Path: /hana/shared
Local Host Name: dc3host
SAP HANA System ID: RH2
Instance Number: 02
Local Host Worker Group: default
System Usage: custom
Location of Data Volumes: /hana/data/RH2
Location of Log Volumes: /hana/log/RH2
SAP HANA Database secure store: ssfs
Certificate Host Names: remotehost3 -> remotehost3
System Administrator Home Directory: /usr/sap/RH2/home
System Administrator Login Shell: /bin/sh
System Administrator User ID: 1000
ID of User Group (sapsys): 1010
Software Components
SAP HANA Database
Install version 2.00.052.00.1599235305
Location: /software/DATA_UNITS/HDB_SERVER_LINUX_X86_64/server
SAP HANA Local Secure Store
Do not install
SAP HANA AFL (incl.PAL,BFL,OFL)
Do not install
SAP HANA EML AFL
Do not install
SAP HANA EPM-MDS
Do not install
SAP HANA Database Client
Do not install
SAP HANA Studio
Do not install
SAP HANA Smart Data Access
Do not install
SAP HANA XS Advanced Runtime
Do not install
Log File Locations
Log directory: /var/tmp/hdb_RH2_hdblcm_install_2021-06-09_18.48.13
Trace location: /var/tmp/hdblcm_2021-06-09_18.48.13_31307.trc
Do you want to continue? (y/n):
Enter y to start the installation.
7.1.2. Using hdbsql to check Inifile contents
clusternode1:rh2adm> hdbsql -i ${TINSTANCE} -u system -p Y0urP8ssw0rd
Welcome to the SAP HANA Database interactive terminal.
Type: \h for help with commands
\q to quit
hdbsql RH2=> select * from M_INIFILE_CONTENTS where section='system_replication'
FILE_NAME,LAYER_NAME,TENANT_NAME,HOST,SECTION,KEY,VALUE
"global.ini","DEFAULT","","","system_replication","actual_mode","primary"
"global.ini","DEFAULT","","","system_replication","mode","primary"
"global.ini","DEFAULT","","","system_replication","operation_mode","logreplay"
"global.ini","DEFAULT","","","system_replication","register_secondaries_on_takeover","true"
"global.ini","DEFAULT","","","system_replication","site_id","1"
"global.ini","DEFAULT","","","system_replication","site_name","DC2"
"global.ini","DEFAULT","","","system_replication","timetravel_logreplay_mode","auto"
"global.ini","DEFAULT","","","system_replication","alternative_sources",""
"global.ini","DEFAULT","","","system_replication","datashipping_logsize_threshold","5368709120"
"global.ini","DEFAULT","","","system_replication","datashipping_min_time_interval","600"
"global.ini","DEFAULT","","","system_replication","datashipping_parallel_channels","4"
"global.ini","DEFAULT","","","system_replication","datashipping_parallel_processing","true"
"global.ini","DEFAULT","","","system_replication","datashipping_snapshot_max_retention_time","300"
"global.ini","DEFAULT","","","system_replication","enable_data_compression","false"
"global.ini","DEFAULT","","","system_replication","enable_full_sync","false"
"global.ini","DEFAULT","","","system_replication","enable_log_compression","false"
"global.ini","DEFAULT","","","system_replication","enable_log_retention","auto"
"global.ini","DEFAULT","","","system_replication","full_replica_on_failed_delta_sync_check","false"
"global.ini","DEFAULT","","","system_replication","hint_based_routing_site_name",""
"global.ini","DEFAULT","","","system_replication","keep_old_style_alert","false"
"global.ini","DEFAULT","","","system_replication","logshipping_async_buffer_size","67108864"
"global.ini","DEFAULT","","","system_replication","logshipping_async_wait_on_buffer_full","true"
"global.ini","DEFAULT","","","system_replication","logshipping_max_retention_size","1048576"
"global.ini","DEFAULT","","","system_replication","logshipping_replay_logbuffer_cache_size","1073741824"
"global.ini","DEFAULT","","","system_replication","logshipping_replay_push_persistent_segment_count","5"
"global.ini","DEFAULT","","","system_replication","logshipping_snapshot_logsize_threshold","3221225472"
"global.ini","DEFAULT","","","system_replication","logshipping_snapshot_min_time_interval","900"
"global.ini","DEFAULT","","","system_replication","logshipping_timeout","30"
"global.ini","DEFAULT","","","system_replication","preload_column_tables","true"
"global.ini","DEFAULT","","","system_replication","propagate_log_retention","off"
"global.ini","DEFAULT","","","system_replication","reconnect_time_interval","30"
"global.ini","DEFAULT","","","system_replication","retries_before_register_to_alternative_source","20"
"global.ini","DEFAULT","","","system_replication","takeover_esserver_without_log_backup","false"
"global.ini","DEFAULT","","","system_replication","takeover_wait_until_esserver_restart","true"
"global.ini","DEFAULT","","","system_replication","timetravel_call_takeover_hooks","off"
"global.ini","DEFAULT","","","system_replication","timetravel_log_retention_policy","none"
"global.ini","DEFAULT","","","system_replication","timetravel_max_retention_time","0"
"global.ini","DEFAULT","","","system_replication","timetravel_snapshot_creation_interval","1440"
"indexserver.ini","DEFAULT","","","system_replication","logshipping_async_buffer_size","268435456"
"indexserver.ini","DEFAULT","","","system_replication","logshipping_replay_logbuffer_cache_size","4294967296"
"indexserver.ini","DEFAULT","","","system_replication","logshipping_replay_push_persistent_segment_count","20"
41 rows selected (overall time 1971.958 msec; server time 31.359 msec)
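The comma-separated rows printed by hdbsql can be post-processed with a small script. The following is a minimal sketch, assuming the query result has been captured as text with one row per line; the function name inifile_values and the abbreviated sample are illustrative and not part of SAP HANA.

```python
import csv
import io


def inifile_values(hdbsql_output, file_name="global.ini"):
    """Return {KEY: VALUE} for one ini file from hdbsql comma-separated rows."""
    reader = csv.reader(io.StringIO(hdbsql_output))
    header = next(reader, None)  # FILE_NAME,LAYER_NAME,...,KEY,VALUE
    if header is None:
        return {}
    values = {}
    for row in reader:
        if len(row) != len(header):
            continue  # skips blank lines and the "N rows selected" footer
        record = dict(zip(header, row))
        if record["FILE_NAME"] == file_name:
            values[record["KEY"]] = record["VALUE"]
    return values


# Abbreviated, illustrative sample in the same format as the query result.
sample = '''FILE_NAME,LAYER_NAME,TENANT_NAME,HOST,SECTION,KEY,VALUE
"global.ini","DEFAULT","","","system_replication","mode","primary"
"global.ini","DEFAULT","","","system_replication","site_name","DC2"
"indexserver.ini","DEFAULT","","","system_replication","logshipping_async_buffer_size","268435456"
3 rows selected
'''

print(inifile_values(sample))
```

Filtering by file name keeps the global.ini and indexserver.ini values apart, because the same key can appear in both files with different values.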
7.1.3. Check database
Check if the database is running and discover the current primary node.
List database instances
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList
23.06.2023 12:08:17
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
node1, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN
If dispstatus is GREEN, the instance is running.
List database processes
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetProcessList
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2023 09 04 14:34:01, 18:41:33, 3788067
hdbcompileserver, HDB Compileserver, GREEN, Running, 2023 09 04 22:35:40, 10:39:54, 445299
hdbindexserver, HDB Indexserver-RH2, GREEN, Running, 2023 09 04 22:35:40, 10:39:54, 445391
hdbnameserver, HDB Nameserver, GREEN, Running, 2023 09 04 22:35:34, 10:40:00, 445178
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2023 09 04 22:35:40, 10:39:54, 445306
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2023 09 04 22:35:53, 10:39:41, 445955
hdbxsengine, HDB XSEngine-RH2, GREEN, Running, 2023 09 04 22:35:40, 10:39:54, 445394
Usually, all database processes have the status GREEN.
List SAP HANA processes
clusternode1:rh2adm> HDB info
USER PID PPID %CPU VSZ RSS COMMAND
rh2adm 1560 1559 0.0 6420 3136 watch -n 5 sapcontrol -nr 02 -functi
rh2adm 1316 1315 0.0 8884 5676 -sh
rh2adm 2549 1316 0.0 7516 4072 \_ /bin/sh /usr/sap/RH2/HDB02/HDB i
rh2adm 2579 2549 0.0 10144 3576 \_ ps fx -U rh2adm -o user:8,pi
rh2adm 2388 1 0.0 679536 55520 hdbrsutil --start --port 30203 --vo
rh2adm 1921 1 0.0 679196 55312 hdbrsutil --start --port 30201 --vo
rh2adm 1469 1 0.0 8852 3260 sapstart pf=/usr/sap/RH2/SYS/profile
rh2adm 1476 1469 0.7 438316 86288 \_ /usr/sap/RH2/HDB02/remotehost3/trace/
rh2adm 1501 1476 11.7 9690172 1574796 \_ hdbnameserver
rh2adm 1845 1476 0.8 410696 122988 \_ hdbcompileserver
rh2adm 1848 1476 1.0 659464 154072 \_ hdbpreprocessor
rh2adm 1899 1476 14.7 9848276 1765208 \_ hdbindexserver -port 30203
rh2adm 1902 1476 8.4 5023288 1052768 \_ hdbxsengine -port 30207
rh2adm 2265 1476 5.2 2340284 405016 \_ hdbwebdispatcher
rh2adm 1117 1 1.1 543532 30676 /usr/sap/RH2/HDB02/exe/sapstartsrv p
rh2adm 1029 1 0.0 20324 11572 /usr/lib/systemd/systemd --user
rh2adm 1030 1029 0.0 23256 3536 \_ (sd-pam)
Display SAP HANA landscape configuration
clusternode1:rh2adm> /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/Python/bin/python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/landscapeHostConfiguration.py;echo $?
| Host | Host | Host | Failover | Remove | Storage | Storage | Failover | Failover | NameServer | NameServer | IndexServer | IndexServer | Host | Host | Worker | Worker |
| | Active | Status | Status | Status | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual |
| | | | | | Partition | Partition | Group | Group | Role | Role | Role | Role | Roles | Roles | Groups | Groups |
| ------ | ------ | ------ | -------- | ------ | --------- | --------- | -------- | -------- | ---------- | ---------- | ----------- | ----------- | ------ | ------ | ------- | ------- |
| clusternode1 | yes | ok | | | 1 | 1 | default | default | master 1 | master | worker | master | worker | worker | default | default |
overall host status: ok
4
Return codes:
- 0: Fatal
- 1: Error
- 2: Warning
- 3: Info
- 4: OK
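When the check is run from a script, the return code can be translated back into its label. This is a minimal sketch, assuming the return codes listed above; the names describe_landscape_rc and run_landscape_check are hypothetical helpers, and the interpreter and script paths must be supplied by the caller.

```python
import subprocess

# Return codes as listed above for landscapeHostConfiguration.py.
LANDSCAPE_RETURN_CODES = {0: "Fatal", 1: "Error", 2: "Warning", 3: "Info", 4: "OK"}


def describe_landscape_rc(return_code):
    """Translate a return code into its label."""
    return LANDSCAPE_RETURN_CODES.get(return_code, "Unexpected return code")


def run_landscape_check(python_bin, script_path):
    """Run the script and return (return code, label).

    Both paths are passed in by the caller; nothing is assumed here about
    the SID or the instance number.
    """
    completed = subprocess.run([python_bin, script_path], check=False)
    return completed.returncode, describe_landscape_rc(completed.returncode)


print(describe_landscape_rc(4))
```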
Discover primary database
clusternode1:rh2adm> hdbnsutil -sr_state | egrep -e "primary masters|^mode"
Example of check on a secondary:
clusternode1:rh2adm> hdbnsutil -sr_state | egrep -e "primary masters|^mode"
mode: syncmem
primary masters: clusternode1
Example of check on the current primary:
clusternode1:rh2adm> hdbnsutil -sr_state | egrep -e "primary masters|^mode"
mode: primary
clusternode1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep site.*Mode
siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem
siteOperationMode/DC1=primary
siteOperationMode/DC3=logreplay
siteOperationMode/DC2=logreplay
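The key=value lines produced with --sapcontrol=1 are easy to evaluate in a script. The sketch below assumes the output format shown above; site_replication_modes and primary_sites are illustrative names, not part of any SAP tool.

```python
def site_replication_modes(sr_state_output):
    """Map site name -> replication mode from --sapcontrol=1 style output."""
    modes = {}
    for line in sr_state_output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith("siteReplicationMode/"):
            modes[key.split("/", 1)[1]] = value
    return modes


def primary_sites(modes):
    """Return the sites whose replication mode is 'primary'."""
    return sorted(site for site, mode in modes.items() if mode == "primary")


sample = """siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem
siteOperationMode/DC1=primary
siteOperationMode/DC3=logreplay
siteOperationMode/DC2=logreplay
"""

print(primary_sites(site_replication_modes(sample)))
```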
Display the database version
Example using SQL query:
hdbsql RH2=> select * from m_database
SYSTEM_ID,DATABASE_NAME,HOST,START_TIME,VERSION,USAGE
"RH2","RH2","node1","2023-06-22 15:33:05.235000000","2.00.059.02.1647435895","CUSTOM"
1 row selected (overall time 29.107 msec; server time 927 usec)
Example using systemOverview.py:
clusternode1:rh2adm> python ./systemOverview.py
| Section | Name | Status | Value |
| ---------- | --------------- | ------- | --------------------------------------------------- |
| System | Instance ID | | RH2 |
| System | Instance Number | | 02 |
| System | Distributed | | No |
| System | Version | | 2.00.059.02.1647435895 (fa/hana2sp05) |
| System | Platform | | Red Hat Enterprise Linux 9.2 Beta (Plow) 9.2 (Plow) |
| Services | All Started | OK | Yes |
| Services | Min Start Time | | 2023-07-14 16:31:19.000 |
| Services | Max Start Time | | 2023-07-26 11:23:17.324 |
| Memory | Memory | OK | Physical 31.09 GB, Swap 10.00 GB, Used 26.38 |
| CPU | CPU | OK | Available 4, Used 1.04 |
| Disk | Data | OK | Size 89.0 GB, Used 59.3 GB, Free 33 % |
| Disk | Log | OK | Size 89.0 GB, Used 59.3 GB, Free 33 % |
| Disk | Trace | OK | Size 89.0 GB, Used 59.3 GB, Free 33 % |
| Statistics | Alerts | WARNING | cannot check statistics w/o SQL connection |
7.1.4. Start and stop SAP HANA
Option 1: HDB command
clusternode1:rh2adm> HDB help
Usage: /usr/sap/RH2/HDB02/HDB { start|stop|reconf|restart|version|info|proc|admin|kill|kill-<sig>|term }
kill or kill-9 should never be used in a productive environment!
Start the database
clusternode1:rh2adm> HDB start
Stop the database
clusternode1:rh2adm> HDB stop
Option 2 (recommended): Use sapcontrol
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
03.07.2023 14:08:30
StartSystem
OK
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function StopSystem HDB
03.07.2023 14:09:33
StopSystem
OK
Use GetProcessList to monitor the starting and stopping of HANA services:
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetProcessList
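Monitoring a start or stop can be scripted by evaluating the dispstatus column of the GetProcessList output. The following is a sketch under the assumption that the output has the comma-separated layout shown in this chapter; process_states and all_green are hypothetical helper names, and the sample is abbreviated.

```python
def process_states(output):
    """Return {process name: dispstatus} from GetProcessList text output."""
    states = {}
    in_table = False
    for line in output.splitlines():
        if line.startswith("name, description, dispstatus"):
            in_table = True  # data rows follow the column header
            continue
        if in_table:
            fields = [field.strip() for field in line.split(",")]
            if len(fields) >= 3:
                states[fields[0]] = fields[2]
    return states


def all_green(states):
    """True only if at least one process is listed and all are GREEN."""
    return bool(states) and all(status == "GREEN" for status in states.values())


sample = """GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2023 09 04 14:34:01, 18:41:33, 3788067
hdbnameserver, HDB Nameserver, GREEN, Running, 2023 09 04 22:35:34, 10:40:00, 445178
"""

print(all_green(process_states(sample)))
```

An empty process table is treated as not green, so a stopped or unreachable instance is not reported as healthy by mistake.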
7.1.5. Check SAP HANA System Replication status
There are many ways to check the SAP HANA System Replication status:
- `clusternode1:rh2adm> python systemReplicationStatus.py` on the primary node
- `clusternode1:rh2adm> echo $?` (return code of systemReplicationStatus.py)
- `clusternode1:rh2adm> hdbnsutil -sr_state`
- `clusternode1:rh2adm> hdbnsutil -sr_stateConfiguration`
Example of systemReplicationStatus.py output running as a monitor:
clusternode1:rh2adm> watch -n 5 "python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py;echo \$?"
Every 5.0s: python systemReplicationStatus.py;echo $? hana08: Fri Jul 28 17:01:05 2023
|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |
| | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |
|-------- |------ |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |
|SYSTEMDB |hana08 |30201 |nameserver | 1 | 1 |DC2 |hana09 | 30201 | 3 |DC3 |YES |SYNCMEM |ACTIVE | |
|RH2 |hana08 |30207 |xsengine | 2 | 1 |DC2 |hana09 | 30207 | 3 |DC3 |YES |SYNCMEM |ACTIVE | |
|RH2 |hana08 |30203 |indexserver | 3 | 1 |DC2 |hana09 | 30203 | 3 |DC3 |YES |SYNCMEM |ACTIVE | |
|SYSTEMDB |hana08 |30201 |nameserver | 1 | 1 |DC2 |remotehost3 | 30201 | 2 |DC1 |YES |SYNCMEM |ACTIVE | |
|RH2 |hana08 |30207 |xsengine | 2 | 1 |DC2 |remotehost3 | 30207 | 2 |DC1 |YES |SYNCMEM |ACTIVE | |
|RH2 |hana08 |30203 |indexserver | 3 | 1 |DC2 |remotehost3 | 30203 | 2 |DC1 |YES |SYNCMEM |ACTIVE | |
status system replication site "3": ACTIVE
status system replication site "2": ACTIVE
overall system replication status: ACTIVE
Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode: PRIMARY
site id: 1
site name: DC2
15
The expected results for the return codes are:
- 10: NoHSR
- 11: Error
- 12: Unknown
- 13: Initializing
- 14: Syncing
- 15: Active
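A script can poll the return code until replication is reported as active, similar to what watch does interactively. This is a minimal sketch, assuming the return codes listed above; wait_until_active is a hypothetical helper, and get_return_code stands for any caller-supplied function that runs systemReplicationStatus.py and returns its exit code.

```python
import time

# Return codes as listed above for systemReplicationStatus.py.
SR_RETURN_CODES = {
    10: "NoHSR",
    11: "Error",
    12: "Unknown",
    13: "Initializing",
    14: "Syncing",
    15: "Active",
}


def wait_until_active(get_return_code, attempts=12, delay_seconds=5.0, sleep=time.sleep):
    """Poll until the check returns 15 (Active) or the attempts run out."""
    for attempt in range(attempts):
        if get_return_code() == 15:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next poll
    return False


# Simulated sequence of return codes instead of a real system call.
codes = iter([13, 14, 15])
print(wait_until_active(lambda: next(codes), delay_seconds=0))
```

Passing the check as a function keeps the polling logic independent of how the status script is actually invoked.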
In most cases, the System Replication check returns code 15. Another display option is -t (printLandscapeTree).
Example for the output on the current primary:
clusternode1:rh2adm> python systemReplicationStatus.py -t
HANA System Replication landscape:
DC1 ( primary )
| --- DC3 ( syncmem )
| --- DC2 ( syncmem )
Example of hdbnsutil -sr_state:
[root@clusternode1]# su - rh2adm
clusternode1:rh2adm> watch -n 10 hdbnsutil -sr_state
Every 10.0s: hdbnsutil -sr_state clusternode1: Thu Jun 22 08:42:00 2023
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
online: true
mode: syncmem
operation mode: logreplay
site id: 2
site name: DC1
is source system: false
is secondary/consumer system: true
has secondaries/consumers attached: false
is a takeover active: false
is primary suspended: false
is timetravel enabled: false
replay mode: auto
active primary site: 1
primary masters: clusternode2
Host Mappings:
~~~~~~~~~~~~~~
clusternode1 -> [DC3] remotehost3
clusternode1 -> [DC1] clusternode1
clusternode1 -> [DC2] clusternode2
Site Mappings:
~~~~~~~~~~~~~~
DC2 (primary/primary)
|---DC3 (syncmem/logreplay)
|---DC1 (syncmem/logreplay)
Tier of DC2: 1
Tier of DC3: 2
Tier of DC1: 2
Replication mode of DC2: primary
Example of sr_stateConfiguration on the primary:
clusternode1:rh2adm> hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 2
site name: DC1
done.
Example of sr_stateConfiguration on the secondary:
clusternode1:rh2adm> hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: syncmem
site id: 1
site name: DC2
active primary site: 2
primary masters: clusternode1
done.
You can also check on the secondary database which node is the current primary. During a failover, two databases can temporarily both claim the primary role. This information is needed to decide which of them is not the actual primary and must be re-registered as a secondary.
For additional information, refer to Example: Checking the Status on the Primary and Secondary Systems.
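The comparison can be sketched in a script that collects the hdbnsutil -sr_stateConfiguration output of each node and lists the nodes that report mode primary. The helper names parse_sr_state and nodes_claiming_primary and the node labels nodeA and nodeB are assumptions for illustration; the sketch only detects the situation and does not decide which database must be re-registered.

```python
def parse_sr_state(output):
    """Collect 'key: value' lines from hdbnsutil -sr_stateConfiguration text."""
    state = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            state[key.strip()] = value.strip()
    return state


def nodes_claiming_primary(outputs_by_node):
    """Return the nodes whose local configuration reports mode 'primary'."""
    return sorted(
        node
        for node, output in outputs_by_node.items()
        if parse_sr_state(output).get("mode") == "primary"
    )


primary_output = """System Replication State
mode: primary
site id: 2
site name: DC1
done.
"""

secondary_output = """System Replication State
mode: syncmem
site id: 1
site name: DC2
active primary site: 2
primary masters: clusternode1
done.
"""

claims = nodes_claiming_primary({"nodeA": primary_output, "nodeB": secondary_output})
print(claims)
```

If more than one node is returned, the primary masters value reported by the remaining secondaries indicates which node they currently follow.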
7.1.6. Register secondary node
Registering a secondary database in an SAP HANA System Replication environment involves the following steps:
- Create SAP HANA backup
- Enable SAP HANA System Replication on the primary node
- Copy database keys
- Register Secondary Node
Registration example:
clusternode1:rh2adm> hdbnsutil -sr_register --remoteHost=clusternode2 --remoteInstance=${TINSTANCE} --replicationMode=syncmem --name=DC1 --online
--operationMode not set; using default from global.ini/[system_replication]/operation_mode: logreplay
adding site ...
collecting information ...
updating local ini files ...
done.
During registration, the global.ini file is updated automatically
… from:
# global.ini last modified 2023-06-15 09:55:05.665341 by /usr/sap/RH2/HDB02/exe/hdbnsutil -initTopology --workergroup=default --set_user_system_pw
[multidb]
mode = multidb
database_isolation = low
singletenant = yes
[persistence]
basepath_datavolumes = /hana/data/RH2
basepath_logvolumes = /hana/log/RH2
… to:
# global.ini last modified 2023-06-15 11:25:44.516946 by hdbnsutil -sr_register --remoteHost=node2 --remoteInstance=02 --replicationMode=syncmem --name=DC1 --online
[multidb]
mode = multidb
database_isolation = low
singletenant = yes
[persistence]
basepath_datavolumes = /hana/data/RH2
basepath_logvolumes = /hana/log/RH2
[system_replication]
timetravel_logreplay_mode = auto
site_id = 3
mode = syncmem
actual_mode = syncmem
site_name = DC1
operation_mode = logreplay
[system_replication_site_masters]
1 = clusternode2:30201
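To confirm that registration wrote the expected values, the file can be read with a standard ini parser. The following is a sketch, assuming the global.ini content has already been read into a string; read_system_replication is a hypothetical helper, and the sample is an abbreviated copy of the file shown above.

```python
import configparser


def read_system_replication(ini_text):
    """Return the [system_replication] section as a dict, or {} if absent."""
    # Only "=" is used as delimiter so values such as host:port stay intact.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    if not parser.has_section("system_replication"):
        return {}
    return dict(parser["system_replication"])


sample = """# global.ini (abbreviated)
[persistence]
basepath_datavolumes = /hana/data/RH2

[system_replication]
site_id = 3
mode = syncmem
site_name = DC1
operation_mode = logreplay

[system_replication_site_masters]
1 = clusternode2:30201
"""

settings = read_system_replication(sample)
print(settings.get("mode"), settings.get("site_name"))
```

A file without a [system_replication] section, as before registration, yields an empty dictionary.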
7.1.7. sapcontrol GetProcessList
Check the processes of an active SAP HANA database
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetProcessList
07.06.2023 08:23:03
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
hdbdaemon, HDB Daemon, GREEN, Running, 2023 06 02 16:59:42, 111:23:21, 4245
hdbcompileserver, HDB Compileserver, GREEN, Running, 2023 06 02 17:01:35, 111:21:28, 7888
hdbindexserver, HDB Indexserver-RH2, GREEN, Running, 2023 06 02 17:01:36, 111:21:27, 7941
hdbnameserver, HDB Nameserver, GREEN, Running, 2023 06 02 17:01:29, 111:21:34, 7594
hdbpreprocessor, HDB Preprocessor, GREEN, Running, 2023 06 02 17:01:35, 111:21:28, 7891
hdbwebdispatcher, HDB Web Dispatcher, GREEN, Running, 2023 06 02 17:01:42, 111:21:21, 8339
hdbxsengine, HDB XSEngine-RH2, GREEN, Running, 2023 06 02 17:01:36, 111:21:27, 7944
7.1.8. sapcontrol GetSystemInstanceList
This command lists the status of the instances of an SAP HANA database and also shows the ports. There are three different status names:
- GREEN (running)
- GRAY (stopped)
- YELLOW (status is currently changing)
Example of an active instance:
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList
07.06.2023 08:24:13
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
remotehost3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN
Example of a stopped instance:
clusternode1:rh2adm> sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList
22.06.2023 09:14:55
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
remotehost3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY
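The status column can be extracted per host with a few lines of code. This is a minimal sketch, assuming the seven-column output shown above; instance_status and STATUS_MEANING are illustrative names, and the meanings are the three listed in this section.

```python
STATUS_MEANING = {
    "GREEN": "running",
    "GRAY": "stopped",
    "YELLOW": "status is currently changing",
}


def instance_status(output):
    """Return {hostname: dispstatus} from GetSystemInstanceList text output."""
    result = {}
    in_table = False
    for line in output.splitlines():
        if line.startswith("hostname, instanceNr"):
            in_table = True  # data rows follow the column header
            continue
        if in_table:
            fields = [field.strip() for field in line.split(",")]
            if len(fields) == 7:
                result[fields[0]] = fields[-1]
    return result


sample = """22.06.2023 09:14:55
GetSystemInstanceList
OK
hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus
remotehost3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY
"""

for host, status in instance_status(sample).items():
    print(host, status, STATUS_MEANING.get(status, "not listed in this section"))
```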
7.1.9. hdbcons examples
You can also use the HDB Console to display information about the database:
- `hdbcons -e hdbindexserver 'replication info'`
- `hdbcons -e hdbindexserver help` for more options
Example of ‘replication info’:
clusternode1:rh2adm> hdbcons -e hdbindexserver 'replication info'
hdbcons -p `pgrep hdbindex` 'replication info'
SAP HANA DB Management Client Console (type '\?' to get help for client commands)
Try to open connection to server process with PID 451925
SAP HANA DB Management Server Console (type 'help' to get help for server commands)
Executable: hdbindexserver (PID: 451925)
[OK]
--
## Start command at: 2023-06-22 09:05:25.211
listing default statistics for volume 3
System Replication Primary Information
======================================
System Replication Primary Configuration
[system_replication] logshipping_timeout = 30
[system_replication] enable_full_sync = false
[system_replication] preload_column_tables = true
[system_replication] ensure_backup_history = true
[system_replication_communication] enable_ssl = off
[system_replication] keep_old_style_alert = false
[system_replication] enable_log_retention = auto
[system_replication] logshipping_max_retention_size = 1048576
[system_replication] logshipping_async_buffer_size = 268435456
- lastLogPos : 0x4ab2700
- lastLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- lastConfirmedLogPos : 0x4ab2700
- lastConfirmedLogPosTimestamp: 22.06.2023-07.05.25 (1687417525193952)
- lastSavepointVersion : 1286
- lastSavepointLogPos : 0x4ab0602
- lastSavepointTimestamp : 22.06.2023-07.02.42 (1687417362853007)
2 session registered.
Session index 0
- SiteID : 3
- RemoteHost : 192.168.5.137
Log Connection
- ptr : 0x00007ff04c0a1000
- channel : {<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=140671686293528, fd=70, refCnt=2, idx=5, local=192.168.5.134/40203_tcp, remote=192.168.5.137/40406_tcp, state=Connected, pending=[r---]}}}
- SSLActive : false
- mode : syncmem
Data Connection
- ptr : 0x00007ff08b730000
- channel : {<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=140671436247064, fd=68, refCnt=2, idx=6, local=192.168.5.134/40203_tcp, remote=192.168.5.137/40408_tcp, state=Connected, pending=[r---]}}}
- SSLActive : false
Primary Statistics
- Creation Timestamp : 20.06.2023-13.55.07 (1687269307772532)
- Last Reset Timestamp : 20.06.2023-13.55.07 (1687269307772532)
- Statistic Reset Count : 0
- ReplicationMode : syncmem
- OperationMode : logreplay
- ReplicationStatus : ReplicationStatus_Active
- ReplicationStatusDetails :
- ReplicationFullSync : DISABLED
- shippedLogPos : 0x4ab2700
- shippedLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- sentLogPos : 0x4ab2700
- sentLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- sentMaxLogWriteEndPosition : 0x4ab2700
- sentMaxLogWriteEndPositionReqCnt: 0x1f6b8
- shippedLogBuffersCount : 142439
- shippedLogBuffersSize : 805855232 bytes
- shippedLogBuffersSizeUsed : 449305792 bytes (55.76%)
- shippedLogBuffersSizeNet : 449013696 bytes (55.72%)
- shippedLogBufferDuration : 83898615 microseconds
- shippedLogBufferDurationMin : 152 microseconds
- shippedLogBufferDurationMax : 18879 microseconds
- shippedLogBufferDurationSend : 7301067 microseconds
- shippedLogBufferDurationComp : 0 microseconds
- shippedLogBufferThroughput : 9709099.18 bytes/s
- shippedLogBufferPendingDuration : 80583785 microseconds
- shippedLogBufferRealThrougput : 10073190.40 bytes/s
- replayLogPos : 0x4ab2700
- replayLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- replayBacklog : 0 microseconds
- replayBacklogSize : 0 bytes
- replayBacklogMax : 822130896 microseconds
- replayBacklogSizeMax : 49455104 bytes
- shippedSavepointVersion : 0
- shippedSavepointLogPos : 0x0
- shippedSavepointTimestamp : not set
- shippedFullBackupCount : 0
- shippedFullBackupSize : 0 bytes
- shippedFullBackupSizeNet : 0 bytes (-nan%)
- shippedFullBackupDuration : 0 microseconds
- shippedFullBackupDurationComp : 0 microseconds
- shippedFullBackupThroughput : 0.00 bytes/s
- shippedFullBackupStreamCount : 0
- shippedFullBackupResumeCount : 0
- shippedLastFullBackupSize : 0 bytes
- shippedLastFullBackupSizeNet : 0 bytes (-nan%)
- shippedLastFullBackupStart : not set
- shippedLastFullBackupEnd : not set
- shippedLastFullBackupDuration : 0 microseconds
- shippedLastFullBackupStreamCount : 0
- shippedLastFullBackupResumeCount : 0
- shippedDeltaBackupCount : 0
- shippedDeltaBackupSize : 0 bytes
- shippedDeltaBackupSizeNet : 0 bytes (-nan%)
- shippedDeltaBackupDuration : 0 microseconds
- shippedDeltaBackupDurationComp : 0 microseconds
- shippedDeltaBackupThroughput : 0.00 bytes/s
- shippedDeltaBackupStreamCount : 0
- shippedDeltaBackupResumeCount : 0
- shippedLastDeltaBackupSize : 0 bytes
- shippedLastDeltaBackupSizeNet : 0 bytes (-nan%)
- shippedLastDeltaBackupStart : not set
- shippedLastDeltaBackupEnd : not set
- shippedLastDeltaBackupDuration : 0 microseconds
- shippedLastDeltaBackupStreamCount : 0
- shippedLastDeltaBackupResumeCount : 0
- currentTransferType : None
- currentTransferSize : 0 bytes
- currentTransferPosition : 0 bytes (0%)
- currentTransferStartTime : not set
- currentTransferThroughput : 0.00 MB/s
- currentTransferStreamCount : 0
- currentTransferResumeCount : 0
- currentTransferResumeStartTime : not set
- Secondary sync'ed via Log Count : 1
- syncLogCount : 3
- syncLogSize : 62840832 bytes
- backupHistoryComplete : 1
- backupLogPosition : 0x4a99980
- backupLogPositionUpdTimestamp : 22.06.2023-06.56.27 (0x5feb26227e7af)
- shippedMissingLogCount : 0
- shippedMissingLogSize : 0 bytes
- backlogSize : 0 bytes
- backlogTime : 0 microseconds
- backlogSizeMax : 0 bytes
- backlogTimeMax : 0 microseconds
- Secondary Log Connect time : 20.06.2023-13.55.31 (1687269331361049)
- Secondary Data Connect time : 20.06.2023-13.55.33 (1687269333768341)
- Secondary Log Close time : not set
- Secondary Data Close time : 20.06.2023-13.55.31 (1687269331290050)
- Secondary Log Reconnect Count : 0
- Secondary Log Failover Count : 0
- Secondary Data Reconnect Count : 1
- Secondary Data Failover Count : 0
----------------------------------------------------------------
Session index 1
- SiteID : 2
- RemoteHost : 192.168.5.133
Log Connection
- ptr : 0x00007ff0963e4000
- channel : {<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=140671506282520, fd=74, refCnt=2, idx=0, local=192.168.5.134/40203_tcp, remote=192.168.5.133/40404_tcp, state=Connected, pending=[r---]}}}
- SSLActive : false
- mode : syncmem
Data Connection
- ptr : 0x00007ff072c04000
- channel : {<NetworkChannelSSLFilter>={<NetworkChannelBase>={this=140671463146520, fd=75, refCnt=2, idx=1, local=192.168.5.134/40203_tcp, remote=192.168.5.133/40406_tcp, state=Connected, pending=[r---]}}}
- SSLActive : false
Primary Statistics
- Creation Timestamp : 20.06.2023-13.55.49 (1687269349892111)
- Last Reset Timestamp : 20.06.2023-13.55.49 (1687269349892111)
- Statistic Reset Count : 0
- ReplicationMode : syncmem
- OperationMode : logreplay
- ReplicationStatus : ReplicationStatus_Active
- ReplicationStatusDetails :
- ReplicationFullSync : DISABLED
- shippedLogPos : 0x4ab2700
- shippedLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- sentLogPos : 0x4ab2700
- sentLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- sentMaxLogWriteEndPosition : 0x4ab2700
- sentMaxLogWriteEndPositionReqCnt: 0x1f377
- shippedLogBuffersCount : 142326
- shippedLogBuffersSize : 793939968 bytes
- shippedLogBuffersSizeUsed : 437675200 bytes (55.13%)
- shippedLogBuffersSizeNet : 437565760 bytes (55.11%)
- shippedLogBufferDuration : 76954026 microseconds
- shippedLogBufferDurationMin : 115 microseconds
- shippedLogBufferDurationMax : 19285 microseconds
- shippedLogBufferDurationSend : 2951495 microseconds
- shippedLogBufferDurationComp : 0 microseconds
- shippedLogBufferThroughput : 10446578.53 bytes/s
- shippedLogBufferPendingDuration : 73848247 microseconds
- shippedLogBufferRealThrougput : 10875889.97 bytes/s
- replayLogPos : 0x4ab2700
- replayLogPosTimestamp : 22.06.2023-07.05.25 (1687417525193952)
- replayBacklog : 0 microseconds
- replayBacklogSize : 0 bytes
- replayBacklogMax : 113119944 microseconds
- replayBacklogSizeMax : 30171136 bytes
- shippedSavepointVersion : 0
- shippedSavepointLogPos : 0x0
- shippedSavepointTimestamp : not set
- shippedFullBackupCount : 0
- shippedFullBackupSize : 0 bytes
- shippedFullBackupSizeNet : 0 bytes (-nan%)
- shippedFullBackupDuration : 0 microseconds
- shippedFullBackupDurationComp : 0 microseconds
- shippedFullBackupThroughput : 0.00 bytes/s
- shippedFullBackupStreamCount : 0
- shippedFullBackupResumeCount : 0
- shippedLastFullBackupSize : 0 bytes
- shippedLastFullBackupSizeNet : 0 bytes (-nan%)
- shippedLastFullBackupStart : not set
- shippedLastFullBackupEnd : not set
- shippedLastFullBackupDuration : 0 microseconds
- shippedLastFullBackupStreamCount : 0
- shippedLastFullBackupResumeCount : 0
- shippedDeltaBackupCount : 0
- shippedDeltaBackupSize : 0 bytes
- shippedDeltaBackupSizeNet : 0 bytes (-nan%)
- shippedDeltaBackupDuration : 0 microseconds
- shippedDeltaBackupDurationComp : 0 microseconds
- shippedDeltaBackupThroughput : 0.00 bytes/s
- shippedDeltaBackupStreamCount : 0
- shippedDeltaBackupResumeCount : 0
- shippedLastDeltaBackupSize : 0 bytes
- shippedLastDeltaBackupSizeNet : 0 bytes (-nan%)
- shippedLastDeltaBackupStart : not set
- shippedLastDeltaBackupEnd : not set
- shippedLastDeltaBackupDuration : 0 microseconds
- shippedLastDeltaBackupStreamCount : 0
- shippedLastDeltaBackupResumeCount : 0
- currentTransferType : None
- currentTransferSize : 0 bytes
- currentTransferPosition : 0 bytes (0%)
- currentTransferStartTime : not set
- currentTransferThroughput : 0.00 MB/s
- currentTransferStreamCount : 0
- currentTransferResumeCount : 0
- currentTransferResumeStartTime : not set
- Secondary sync'ed via Log Count : 1
- syncLogCount : 3
- syncLogSize : 61341696 bytes
- backupHistoryComplete : 1
- backupLogPosition : 0x4a99980
- backupLogPositionUpdTimestamp : 22.06.2023-06.56.27 (0x5feb26227e670)
- shippedMissingLogCount : 0
- shippedMissingLogSize : 0 bytes
- backlogSize : 0 bytes
- backlogTime : 0 microseconds
- backlogSizeMax : 0 bytes
- backlogTimeMax : 0 microseconds
- Secondary Log Connect time : 20.06.2023-13.56.21 (1687269381053599)
- Secondary Data Connect time : 20.06.2023-13.56.27 (1687269387399610)
- Secondary Log Close time : not set
- Secondary Data Close time : 20.06.2023-13.56.21 (1687269381017244)
- Secondary Log Reconnect Count : 0
- Secondary Log Failover Count : 0
- Secondary Data Reconnect Count : 1
- Secondary Data Failover Count : 0
----------------------------------------------------------------
[OK]
## Finish command at: 2023-06-22 09:05:25.212 command took: 572.000 usec
--
[EXIT]
--
[BYE]
Example of help:
clusternode1:rh2adm> hdbcons -e hdbindexserver help
SAP HANA DB Management Client Console (type '\?' to get help for client commands)
Try to open connection to server process with PID 451925
SAP HANA DB Management Server Console (type 'help' to get help for server commands)
Executable: hdbindexserver (PID: 451925)
[OK]
--
## Start command at: 2023-06-22 09:07:16.784
Synopsis:
help [<command name>]: Print command help
- <command name> - Command name for which to display help
Available commands:
ae_tableload - Handle loading of column store tables and columns
all - Print help and other info for all hdbcons commands
authentication - Authentication management.
binarysemaphore - BinarySemaphore management
bye - Exit console client
cd - ContainerDirectory management
cfgreg - Basis Configurator
checktopic - CheckTopic management
cnd - ContainerNameDirectory management
conditionalvariable - ConditionalVariable management
connection - Connection management
context - Execution context management (i.e., threads)
converter - Converter management
cpuresctrl - Manage cpu resources such as last-level cache allocation
crash - Crash management
crypto - Cryptography management (SSL/SAML/X509/Encryption).
csaccessor - Display diagnostics related to the CSAccessor library
ddlcontextstore - Get DdlContextStore information
deadlockdetector - Deadlock detector.
debug - Debug management
distribute - Handling distributed systems
dvol - DataVolume management
ELF - ELF symbol resolution management
encryption - Persistence encryption management
eslog - Manipulate logger on extended storage
event - Event management
exit - Exit console client
flightrecorder - Flight Recorder
hananet - HANA-Net command interface
help - Display help for a command or command list
hkt - HANA Kernal Tracer (HKT) management
indexmanager - Get IndexManager information, especially for IndexHandles
itab - Internaltable diagnostics
jexec - Information and actions for Job Executor/Scheduler
licensing - Licensing management.
log - Show information about logger and manipulate logger
machine - Information about the machine topology
mm - Memory management
monitor - Monitor view command
mproxy - Malloc proxy management
msl - Mid size LOB management
mutex - Mutex management
numa - Provides NUMA statistics for all columns of a given table, broken down by column constituents like dictionary, data vector and index.
nvmprovider - NVM Provider
output - Command for managing output from the hdbcons
page - Page management
pageaccess - PageAccess management
profiler - Profiler
quit - Exit console client
readwritelock - ReadWriteLock management
replication - Monitor data and log replication
resman - ResourceManager management
rowstore - Row Store
runtimedump - Generate a runtime dump.
savepoint - Savepoint management
semaphore - Semaphore management
servicethreads - Thread information M_SERVICE_THREADS
snapshot - Snapshot management
stat - Statistics management
statisticsservercontroller - StatisticsServer internals
statreg - Statistics registry command
syncprimi - Syncprimitive management (Mutex, CondVariable, Semaphore, BinarySemaphore,
ReadWriteLock)
table - Table Management
tablepreload - Manage and monitor table preload
trace - Trace management
tracetopic - TraceTopic management
transaction - Transaction management
ut - UnifiedTable Management
version - Version management
vf - VirtualFile management
x2 - get X2 info
[OK]
## Finish command at: 2023-06-22 09:07:16.785 command took: 209.000 usec
--
[EXIT]
--
[BYE]
7.1.10. Create SAP HANA backup
If you want to use SAP HANA System Replication, a backup must first be created on the primary system.
The following example shows how to create the backups as user <sid>adm:
clusternode1:rh2adm> hdbsql -i ${TINSTANCE} -u system -d SYSTEMDB "BACKUP DATA USING FILE ('/hana/backup/')"
clusternode1:rh2adm> hdbsql -i ${TINSTANCE} -u system -d ${SAPSYSTEMNAME} "BACKUP DATA USING FILE ('/hana/backup/')"
7.1.11. Enable SAP HANA System Replication on the primary database
SAP HANA System Replication has to be enabled on the primary node. This requires a backup to be done first.
clusternode1:rh2adm> hdbnsutil -sr_enable --name=DC1
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
7.1.12. Copy database keys to the secondary nodes
The database keys need to be copied from the primary to the secondary database before it can be registered as a secondary.
For example:
clusternode1:rh2adm> scp -rp /usr/sap/${SAPSYSTEMNAME}/SYS/global/security/rsecssfs/data/SSFS_${SAPSYSTEMNAME}.DAT remotehost3:/usr/sap/${SAPSYSTEMNAME}/SYS/global/security/rsecssfs/data/SSFS_${SAPSYSTEMNAME}.DAT
clusternode1:rh2adm> scp -rp /usr/sap/${SAPSYSTEMNAME}/SYS/global/security/rsecssfs/key/SSFS_${SAPSYSTEMNAME}.KEY remotehost3:/usr/sap/${SAPSYSTEMNAME}/SYS/global/security/rsecssfs/key/SSFS_${SAPSYSTEMNAME}.KEY
7.1.13. Register a secondary node for SAP HANA System Replication
Please ensure that the database keys have been copied to the secondary nodes first. Then run the registration command:
clusternode1:rh2adm> hdbnsutil -sr_register --remoteHost=remotehost3 --remoteInstance=${TINSTANCE} --replicationMode=syncmem --name=DC1 --remoteName=DC3 --operationMode=logreplay --online
Parameter description:
- remoteHost: hostname of the active node running the source (primary) database
- remoteInstance: the instance number of the database
- replicationMode: one of the following options
  - sync: hard disk synchronization
  - async: asynchronous replication
  - syncmem: memory synchronization
- name: this is an alias for this replication site
- remoteName: alias name of the source database
- operationMode: one of the following options
  - delta_datashipping: data is periodically transmitted. Takeovers take a little bit longer.
  - logreplay: logs are redone immediately on the remote site. Takeover is faster.
  - logreplay_readaccess: additional logreplay read-only access to the second site is possible.
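The options described above can be checked before the registration command is run. The following is a minimal sketch, not part of the SAP tooling: the function name build_sr_register is hypothetical, and it only prints the command line instead of executing it. The values in the example call are the ones used in the registration example above.

```shell
#!/bin/sh
# Hypothetical helper: validate the mode values and print the
# hdbnsutil -sr_register command line without executing it.
# Arguments: remoteHost remoteInstance replicationMode name remoteName operationMode
build_sr_register() (
    remote_host=$1; remote_instance=$2; repl_mode=$3
    site_name=$4; remote_name=$5; op_mode=$6

    # Accept only the replication modes listed in the parameter description.
    case "$repl_mode" in
        sync|async|syncmem) ;;
        *) echo "invalid replicationMode: $repl_mode" >&2; exit 1 ;;
    esac

    # Accept only the operation modes listed in the parameter description.
    case "$op_mode" in
        delta_datashipping|logreplay|logreplay_readaccess) ;;
        *) echo "invalid operationMode: $op_mode" >&2; exit 1 ;;
    esac

    printf 'hdbnsutil -sr_register --remoteHost=%s --remoteInstance=%s --replicationMode=%s --name=%s --remoteName=%s --operationMode=%s --online\n' \
        "$remote_host" "$remote_instance" "$repl_mode" "$site_name" "$remote_name" "$op_mode"
)

# Print the command with the values from the registration example.
build_sr_register remotehost3 02 syncmem DC1 DC3 logreplay
```

Printing the command first makes it possible to review it as the <sid>adm user before running it; a mistyped mode is rejected with a non-zero exit status.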
7.1.14. Check the log_mode of the SAP HANA database
There are two options for setting the log_mode:
- log_mode=overwrite
- log_mode=normal: This is the default value and is also required when the database instance is running as primary. Using SAP HANA Multitarget System Replication, you have to use log_mode=normal.
The best way to check the log_mode is by using hdbsql:
Example including a wrong overwrite entry:
clusternode1:rh2adm> hdbsql -i ${TINSTANCE} -d ${SAPSYSTEMNAME} -u system
Password:
Welcome to the SAP HANA Database interactive terminal.
Type: \h for help with commands
\q to quit
hdbsql RH2=> select * from m_inifile_contents where key='log_mode'
FILE_NAME,LAYER_NAME,TENANT_NAME,HOST,SECTION,KEY,VALUE
"global.ini","DEFAULT","","","persistence","log_mode","normal"
"global.ini","HOST","","node2","persistence","log_mode","overwrite"
2 rows selected (overall time 46.931 msec; server time 30.845 msec)
hdbsql RH2=>exit
In this case, we have two global.ini files:
- DEFAULT: /usr/sap/${SAPSYSTEMNAME}/SYS/global/hdb/custom/config/global.ini
- HOST: /hana/shared/${SAPSYSTEMNAME}/HDB${TINSTANCE}/${HOSTNAME}/global.ini
The HOST values override the DEFAULT values. You can also check both files before the database is started and then use hdbsql again to verify the settings. You can change the log_mode by editing the global.ini file.
Example:
clusternode1:rh2adm> vim /hana/shared/${SAPSYSTEMNAME}/HDB${TINSTANCE}/${HOSTNAME}/global.ini
# global.ini last modified 2023-04-06 16:15:03.521715 by hdbnameserver
[persistence]
log_mode = overwrite
# global.ini last modified 2023-04-06 16:15:03.521715 by hdbnameserver
[persistence]
log_mode = normal
After having checked or updated the global.ini file(s), verify the log_mode values:
clusternode1:rh2adm> hdbsql -d ${SAPSYSTEMNAME} -i ${TINSTANCE} -u SYSTEM;
hdbsql RH2=> select * from m_inifile_contents where section='persistence' and key='log_mode'
FILE_NAME,LAYER_NAME,TENANT_NAME,HOST,SECTION,KEY,VALUE
"global.ini","DEFAULT","","","persistence","log_mode","normal"
"global.ini","HOST","","node2","persistence","log_mode","normal"
2 rows selected (overall time 60.982 msec; server time 20.420 msec)
The output also shows that this parameter needs to be set in the [persistence] section. When you change the log mode from overwrite to normal, it is recommended that you create a full data backup to ensure that the database can be recovered.
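The check of the query result can be scripted. The following is a minimal sketch that assumes the comma-separated row format shown in the hdbsql output above; the function name check_log_mode is hypothetical. It reads the rows on standard input and reports every layer whose log_mode is not normal.

```shell
#!/bin/sh
# Hypothetical helper: read hdbsql result rows (CSV, as shown above) on stdin
# and report every layer whose log_mode is not "normal".
# Exit status: 0 if all log_mode rows are "normal", 1 otherwise.
check_log_mode() {
    awk -F',' '
        # Only data rows for the log_mode key are of interest.
        $6 == "\"log_mode\"" {
            gsub(/"/, "")    # strip the CSV quotes; awk re-splits the fields
            if ($7 != "normal") {
                printf "layer=%s host=%s log_mode=%s\n", $2, $4, $7
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Feed the rows from the example with the wrong overwrite entry.
check_log_mode <<'EOF' || echo "at least one layer does not use log_mode=normal"
FILE_NAME,LAYER_NAME,TENANT_NAME,HOST,SECTION,KEY,VALUE
"global.ini","DEFAULT","","","persistence","log_mode","normal"
"global.ini","HOST","","node2","persistence","log_mode","overwrite"
EOF
```

The header line and the row-count line do not contain "log_mode" in the sixth field, so they are ignored.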
7.1.15. Discover primary database
There are several ways to identify the primary node, for instance:
- pcs status | grep Promoted
- hdbnsutil -sr_stateConfiguration
- systemReplicationStatus.py
Option 1 - The following example of the systemReplicationStatus.py script and filter will return the primary database location on all nodes:
clusternode1:rh2adm> /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/Python/bin/python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py --sapcontrol=1 | egrep -e "3${TINSTANCE}01/HOST|PRIMARY_MASTERS" | head -1 | awk -F"=" '{ print $2 }'
Output:
clusternode2
Option 2 - The following example displays the systemReplicationStatus in a similar way for all nodes:
clusternode1:rh2adm> hdbnsutil -sr_state --sapcontrol=1 | grep 'site.*Mode'
Output:
siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem
siteOperationMode/DC1=primary
siteOperationMode/DC3=logreplay
siteOperationMode/DC2=logreplay
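The primary site can be extracted from this output with a small filter. The following is a minimal sketch that assumes the key=value format shown above; the function name primary_site is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `hdbnsutil -sr_state --sapcontrol=1` output on
# stdin and print the name of the site whose replication mode is "primary".
primary_site() {
    # Only siteReplicationMode lines are considered; the matching
    # siteOperationMode line would otherwise report the site twice.
    sed -n 's|^siteReplicationMode/\([^=]*\)=primary$|\1|p'
}

# Feed the output from the example above; this prints DC1.
primary_site <<'EOF'
siteReplicationMode/DC1=primary
siteReplicationMode/DC3=async
siteReplicationMode/DC2=syncmem
siteOperationMode/DC1=primary
siteOperationMode/DC3=logreplay
siteOperationMode/DC2=logreplay
EOF
```

On a running system, the output of hdbnsutil -sr_state --sapcontrol=1 would be piped into the function as the <sid>adm user.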
7.1.16. Takeover primary
Please refer to the Check the Replication status section to check the primary and the secondary nodes. Also:
- Put the cluster into maintenance-mode
- Initiate the takeover on the secondary node
Example for enabling maintenance-mode for the cluster:
[root@clusternode1]# pcs property set maintenance-mode=true
On the secondary that is to become the new primary, run as the <sid>adm user:
clusternode1:rh2adm> hdbnsutil -sr_takeover
This secondary becomes the primary, other active secondary databases get re-registered to the new primary, and the old primary needs to be manually re-registered as a secondary.
7.1.17. Re-register former primary as secondary
Please ensure that the cluster is stopped or put in maintenance-mode. Example:
clusternode2:rh2adm> hdbnsutil -sr_register --remoteHost=remotehost3 --remoteInstance=${TINSTANCE} --replicationMode=syncmem --name=DC2 --remoteName=DC3 --operationMode=logreplay --force_full_replica --online
In our examples, we are using full replication. Your SAP HANA system administrator should know when full replication is required.
7.1.18. Recover from failover
Please refer to Check the SAP HANA System Replication status and Discover the primary node. It is important that the information is consistent. If a node is not part of the systemReplicationStatus.py output and has a different system replication state, please check with your database administrator if this node needs to be re-registered.
One way of solving this is to re-register this site as a new secondary.
Sometimes a secondary instance will still not come up. In that case, unregister the site before you re-register it. Example of unregistering the secondary DC1:
clusternode1:rh2adm> hdbnsutil -sr_unregister --name=DC1
Example of re-registering DC1:
clusternode1:rh2adm> hdbnsutil -sr_register --name=DC1 --remoteHost=node2 --remoteInstance=02 --replicationMode=sync --operationMode=logreplay --online
Start the database and check that it is running. Finally, check the replication status.
7.2. Pacemaker commands
7.2.1. Start and stop the cluster
To start the cluster on all nodes execute the following command:
# pcs cluster start --all
After a reboot, the cluster will be started automatically only if the service is enabled. The following command shows whether the cluster has started and whether the daemons are enabled to start automatically.
# pcs cluster status
The cluster auto-start can be enabled with:
# pcs cluster enable --all
Other options are:
- Stop the cluster.
- Put a node into standby.
- Put the cluster into maintenance-mode.
For more details, please check the pcs cluster help:
# pcs cluster stop --all
# pcs cluster help
7.2.2. Put the cluster into maintenance-mode
If you want to make changes and you want to avoid interference by the pacemaker cluster, you can "freeze" the cluster by putting it into maintenance-mode:
# pcs property set maintenance-mode=true
An easy way to verify maintenance-mode is to check if the resources are unmanaged:
# pcs resource
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02] (unmanaged):
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode1 (unmanaged)
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode2 (unmanaged)
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable, unmanaged):
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Unpromoted clusternode1 (unmanaged)
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Promoted clusternode2 (unmanaged)
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode2 (unmanaged)
While the cluster is in maintenance-mode, it does not update resource status changes. Refresh the cluster resources to detect the current resource state:
# pcs resource refresh
This will indicate if anything is not yet correct and will cause remediation action by the cluster, as soon as it is taken out of maintenance-mode.
Remove the maintenance-mode by running:
# pcs property set maintenance-mode=false
Now the cluster will continue to work. If something is configured wrong, it will react now.
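The verification that all resources are unmanaged can be scripted. The following is a minimal sketch that assumes the pcs resource output format shown above, where every resource line starts with "*"; the function name all_unmanaged is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs resource` output on stdin and succeed only
# if every resource line (a line starting with "*") is marked unmanaged.
all_unmanaged() {
    awk '
        /^[[:space:]]*\*/ {
            seen = 1
            if ($0 !~ /unmanaged/) {
                print "still managed: " $0
                managed = 1
            }
        }
        # Fail on empty input as well, because nothing could be verified.
        END { exit ((seen && !managed) ? 0 : 1) }
    '
}

# Feed an excerpt of the maintenance-mode output shown above.
all_unmanaged <<'EOF' && echo "all resources are unmanaged"
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable, unmanaged):
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Unpromoted clusternode1 (unmanaged)
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Promoted clusternode2 (unmanaged)
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode2 (unmanaged)
EOF
```

On a cluster node, the output of pcs resource would be piped into the function as root.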
7.2.3. Check cluster status
Following are several ways to check the cluster status:
Check if the cluster is running:
# pcs cluster status
Check the cluster and all resources:
# pcs status
Check the cluster, all resources and all node attributes:
# pcs status --full
Check the resources only:
# pcs resource status --full
Check Stonith history:
# pcs stonith history
Check location constraints:
# pcs constraint location
Fencing must be configured and tested. To obtain a solution that is as automated as possible, the cluster must be permanently enabled so that it starts automatically after a reboot. In a production environment, disabling the automatic start allows manual intervention, for instance after a crash. Please also check the daemon status.
Example:
# pcs status --full
Cluster name: cluster1
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-06-22 17:56:01 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: clusternode2 (2) (version 2.1.5-7.el9-a3f44794f94) - partition with quorum
* Last updated: Thu Jun 22 17:56:01 2023
* Last change: Thu Jun 22 17:53:34 2023 by root via crm_attribute on clusternode1
* 2 nodes configured
* 6 resource instances configured
Node List:
* Node clusternode1 (1): online, feature set 3.16.2
* Node clusternode2 (2): online, feature set 3.16.2
Full List of Resources:
* h7fence (stonith:fence_rhevm): Started clusternode2
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode1
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode2
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Promoted clusternode1
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Unpromoted clusternode2
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode1
Node Attributes:
* Node: clusternode1 (1):
* hana_rh2_clone_state : PROMOTED
* hana_rh2_op_mode : logreplay
* hana_rh2_remoteHost : clusternode2
* hana_rh2_roles : 4:P:master1:master:worker:master
* hana_rh2_site : DC1
* hana_rh2_sra : -
* hana_rh2_srah : -
* hana_rh2_srmode : syncmem
* hana_rh2_sync_state : PRIM
* hana_rh2_version : 2.00.059.02
* hana_rh2_vhost : clusternode1
* lpa_rh2_lpt : 1687449214
* master-SAPHana_RH2_02 : 150
* Node: clusternode2 (2):
* hana_rh2_clone_state : DEMOTED
* hana_rh2_op_mode : logreplay
* hana_rh2_remoteHost : clusternode1
* hana_rh2_roles : 4:S:master1:master:worker:master
* hana_rh2_site : DC2
* hana_rh2_sra : -
* hana_rh2_srah : -
* hana_rh2_srmode : syncmem
* hana_rh2_sync_state : SOK
* hana_rh2_version : 2.00.059.02
* hana_rh2_vhost : clusternode2
* lpa_rh2_lpt : 30
* master-SAPHana_RH2_02 : 100
Migration Summary:
Tickets:
PCSD Status:
clusternode1: Online
clusternode2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
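The node attributes in this output include the system replication sync state of each node. The following is a minimal sketch for extracting it, assuming the attribute layout shown above; the function name sync_states is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs status --full` output on stdin and print
# "<node> <sync_state>" for each node with a hana_<sid>_sync_state attribute.
sync_states() {
    awk '
        # "* Node: clusternode1 (1):" starts the attribute block of a node.
        $1 == "*" && $2 == "Node:" { node = $3 }
        # "* hana_rh2_sync_state : PRIM" carries the value in the last field.
        $1 == "*" && $2 ~ /^hana_.*_sync_state$/ { print node, $NF }
    '
}

# Feed an excerpt of the "Node Attributes" block shown above.
sync_states <<'EOF'
* Node: clusternode1 (1):
* hana_rh2_srmode : syncmem
* hana_rh2_sync_state : PRIM
* Node: clusternode2 (2):
* hana_rh2_srmode : syncmem
* hana_rh2_sync_state : SOK
EOF
```

The lines of the "Node List" block start with "* Node" without a colon, so they do not reset the current node name.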
7.2.4. Check resource states
Use pcs resource to check the status of all resources. This prints the list and the current status of the resources.
Example:
# pcs resource
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
* Started: [ clusternode1 clusternode2 ]
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
* Promoted: [ clusternode1 ]
* Unpromoted: [ clusternode2 ]
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode1
7.2.5. Check resource config
The following displays the current resource configuration:
# pcs resource config
Resource: vip_RH2_02_MASTER (class=ocf provider=heartbeat type=IPaddr2)
Attributes: vip_RH2_02_MASTER-instance_attributes
ip=192.168.5.136
Operations:
monitor: vip_RH2_02_MASTER-monitor-interval-10s
interval=10s
timeout=20s
start: vip_RH2_02_MASTER-start-interval-0s
interval=0s
timeout=20s
stop: vip_RH2_02_MASTER-stop-interval-0s
interval=0s
timeout=20s
Clone: SAPHanaTopology_RH2_02-clone
Meta Attributes: SAPHanaTopology_RH2_02-clone-meta_attributes
clone-max=2
clone-node-max=1
interleave=true
Resource: SAPHanaTopology_RH2_02 (class=ocf provider=heartbeat type=SAPHanaTopology)
Attributes: SAPHanaTopology_RH2_02-instance_attributes
InstanceNumber=02
SID=RH2
Operations:
methods: SAPHanaTopology_RH2_02-methods-interval-0s
interval=0s
timeout=5
monitor: SAPHanaTopology_RH2_02-monitor-interval-10
interval=10
timeout=600
reload: SAPHanaTopology_RH2_02-reload-interval-0s
interval=0s
timeout=5
start: SAPHanaTopology_RH2_02-start-interval-0s
interval=0s
timeout=600
stop: SAPHanaTopology_RH2_02-stop-interval-0s
interval=0s
timeout=600
Clone: SAPHana_RH2_02-clone
Meta Attributes: SAPHana_RH2_02-clone-meta_attributes
clone-max=2
clone-node-max=1
interleave=true
notify=true
promotable=true
Resource: SAPHana_RH2_02 (class=ocf provider=heartbeat type=SAPHana)
Attributes: SAPHana_RH2_02-instance_attributes
AUTOMATED_REGISTER=true
DUPLICATE_PRIMARY_TIMEOUT=300
HANA_CALL_TIMEOUT=10
InstanceNumber=02
PREFER_SITE_TAKEOVER=true
SID=RH2
Operations:
demote: SAPHana_RH2_02-demote-interval-0s
interval=0s
timeout=3600
methods: SAPHana_RH2_02-methods-interval-0s
interval=0s
timeout=5
monitor: SAPHana_RH2_02-monitor-interval-251
interval=251
timeout=700
role=Unpromoted
monitor: SAPHana_RH2_02-monitor-interval-249
interval=249
timeout=700
role=Promoted
promote: SAPHana_RH2_02-promote-interval-0s
interval=0s
timeout=3600
reload: SAPHana_RH2_02-reload-interval-0s
interval=0s
timeout=5
start: SAPHana_RH2_02-start-interval-0s
interval=0s
timeout=3200
stop: SAPHana_RH2_02-stop-interval-0s
interval=0s
timeout=3100
This lists all the parameters that are used to configure the installed resource agents.
7.2.6. SAPHana resource option AUTOMATED_REGISTER=true
If this option is used in the SAPHana resource, pacemaker will automatically re-register the secondary database.
It is recommended to use this option for the first tests. When using AUTOMATED_REGISTER=false the administrator needs to re-register the secondary node manually.
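To see which value is currently configured, the option can be filtered out of the pcs resource config output. The following is a minimal sketch; the function name automated_register is hypothetical, and it relies on grep -o, which is available in GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: read `pcs resource config` output on stdin and print
# the configured AUTOMATED_REGISTER value(s).
automated_register() {
    # -o prints only the matching part, so this works whether the attribute
    # is on its own line or shares a line with other attributes.
    grep -o 'AUTOMATED_REGISTER=[^[:space:]]*' | cut -d= -f2
}

# Feed an excerpt of the resource configuration shown earlier; prints "true".
automated_register <<'EOF'
Attributes: SAPHana_RH2_02-instance_attributes
AUTOMATED_REGISTER=true
DUPLICATE_PRIMARY_TIMEOUT=300
EOF
```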
7.2.7. Resource handling
There are several options for managing resources. For more information, please check out the help available:
# pcs resource help
List the used resource agents:
# pcs resource config | grep "type=" | awk -F"type=" '{ print $2 }' | sed -e "s/)//g"
Example output:
IPaddr2
SAPHanaTopology
SAPHana
Display specific resource agent description and configuration parameters:
# pcs resource describe <resource agent>
Example (without output):
# pcs resource describe IPaddr2
Example of resource agent IPaddr2 (with output):
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
ocf:heartbeat:IPaddr2 - Manages virtual IPv4 and IPv6 addresses (Linux specific version)
This Linux-specific resource manages IP alias IP addresses. It can add an IP alias, or remove one. In
addition, it can implement Cluster Alias IP functionality if invoked as a clone resource. If used as a
clone, "shared address with a trivial, stateless (autonomous) load-balancing/mutual exclusion on
ingress" mode gets applied (as opposed to "assume resource uniqueness" mode otherwise). For that, Linux
firewall (kernel and userspace) is assumed, and since recent distributions are ambivalent in plain
"iptables" command to particular back-end resolution, "iptables-legacy" (when present) gets prioritized
so as to avoid incompatibilities (note that respective ipt_CLUSTERIP firewall extension in use here is,
at the same time, marked deprecated, yet said "legacy" layer can make it workable, literally, to this
day) with "netfilter" one (as in "iptables-nft"). In that case, you should explicitly set clone-node-max
>= 2, and/or clone-max < number of nodes. In case of node failure, clone instances need to be re-
allocated on surviving nodes. This would not be possible if there is already an instance on those nodes,
and clone-node-max=1 (which is the default). When the specified IP address gets assigned to a
respective interface, the resource agent sends unsolicited ARP (Address Resolution Protocol, IPv4) or NA
(Neighbor Advertisement, IPv6) packets to inform neighboring machines about the change. This
functionality is controlled for both IPv4 and IPv6 by shared 'arp_*' parameters.
Resource options:
ip (required) (unique): The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)
example IPv4 "192.168.1.1". example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".
nic: The base network interface on which the IP address will be brought online. If left empty, the
script will try and determine this from the routing table. Do NOT specify an alias interface in
the form eth0:1 or anything here; rather, specify the base interface only. If you want a label,
see the iflabel parameter. Prerequisite: There must be at least one static IP address, which is
not managed by the cluster, assigned to the network interface. If you can not assign any static IP
address on the interface, modify this kernel parameter: sysctl -w
net.ipv4.conf.all.promote_secondaries=1 # (or per device)
cidr_netmask: The netmask for the interface in CIDR format (e.g., 24 and not 255.255.255.0) If
unspecified, the script will also try to determine this from the routing table.
broadcast: Broadcast address associated with the IP. It is possible to use the special symbols '+' and
'-' instead of the broadcast address. In this case, the broadcast address is derived by
setting/resetting the host bits of the interface prefix.
iflabel: You can specify an additional label for your IP address here. This label is appended to your
interface name. The kernel allows alphanumeric labels up to a maximum length of 15 characters
including the interface name and colon (e.g. eth0:foobar1234) A label can be specified in nic
parameter but it is deprecated. If a label is specified in nic name, this parameter has no effect.
lvs_support: Enable support for LVS Direct Routing configurations. In case a IP address is stopped,
only move it to the loopback device to allow the local node to continue to service requests, but
no longer advertise it on the network. Notes for IPv6: It is not necessary to enable this option
on IPv6. Instead, enable 'lvs_ipv6_addrlabel' option for LVS-DR usage on IPv6.
lvs_ipv6_addrlabel: Enable adding IPv6 address label so IPv6 traffic originating from the address's
interface does not use this address as the source. This is necessary for LVS-DR health checks to
realservers to work. Without it, the most recently added IPv6 address (probably the address added
by IPaddr2) will be used as the source address for IPv6 traffic from that interface and since that
address exists on loopback on the realservers, the realserver response to pings/connections will
never leave its loopback. See RFC3484 for the detail of the source address selection. See also
'lvs_ipv6_addrlabel_value' parameter.
lvs_ipv6_addrlabel_value: Specify IPv6 address label value used when 'lvs_ipv6_addrlabel' is enabled.
The value should be an unused label in the policy table which is shown by 'ip addrlabel list'
command. You would rarely need to change this parameter.
mac: Set the interface MAC address explicitly. Currently only used in case of the Cluster IP Alias.
Leave empty to chose automatically.
clusterip_hash: Specify the hashing algorithm used for the Cluster IP functionality.
unique_clone_address: If true, add the clone ID to the supplied value of IP to create a unique address
to manage
arp_interval: Specify the interval between unsolicited ARP (IPv4) or NA (IPv6) packets in
milliseconds. This parameter is deprecated and used for the backward compatibility only. It is
effective only for the send_arp binary which is built with libnet, and send_ua for IPv6. It has no
effect for other arp_sender.
arp_count: Number of unsolicited ARP (IPv4) or NA (IPv6) packets to send at resource initialization.
arp_count_refresh: For IPv4, number of unsolicited ARP packets to send during resource monitoring.
Doing so helps mitigate issues of stuck ARP caches resulting from split-brain situations.
arp_bg: Whether or not to send the ARP (IPv4) or NA (IPv6) packets in the background. The default is
true for IPv4 and false for IPv6.
arp_sender: For IPv4, the program to send ARP packets with on start. Available options are: -
send_arp: default - ipoibarping: default for infiniband interfaces if ipoibarping is available -
iputils_arping: use arping in iputils package - libnet_arping: use another variant of arping
based on libnet
send_arp_opts: For IPv4, extra options to pass to the arp_sender program. Available options are vary
depending on which arp_sender is used. A typical use case is specifying '-A' for iputils_arping
to use ARP REPLY instead of ARP REQUEST as Gratuitous ARPs.
flush_routes: Flush the routing table on stop. This is for applications which use the cluster IP
address and which run on the same physical host that the IP address lives on. The Linux kernel may
force that application to take a shortcut to the local loopback interface, instead of the
interface the address is really bound to. Under those circumstances, an application may, somewhat
unexpectedly, continue to use connections for some time even after the IP address is deconfigured.
Set this parameter in order to immediately disable said shortcut when the IP address goes away.
run_arping: For IPv4, whether or not to run arping for collision detection check.
nodad: For IPv6, do not perform Duplicate Address Detection when adding the address.
noprefixroute: Use noprefixroute flag (see 'man ip-address').
preferred_lft: For IPv6, set the preferred lifetime of the IP address. This can be used to ensure that
the created IP address will not be used as a source address for routing. Expects a value as
specified in section 5.5.4 of RFC 4862.
network_namespace: Specifies the network namespace to operate within. The namespace must already
exist, and the interface to be used must be within the namespace.
Default operations:
start:
interval=0s
timeout=20s
stop:
interval=0s
timeout=20s
monitor:
interval=10s
timeout=20s
If the cluster is stopped, all the resources will be stopped as well; if the cluster is put into maintenance-mode, all resources remain in their current status but will not be monitored or managed.
7.2.8. Cluster property handling for maintenance-mode
List all defined properties:
[root@clusternode1]# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: cluster1
concurrent-fencing: true
dc-version: 2.1.5-7.el9-a3f44794f94
hana_rh2_site_srHook_DC1: PRIM
hana_rh2_site_srHook_DC2: SFAIL
have-watchdog: false
last-lrm-refresh: 1688548036
maintenance-mode: true
priority-fencing-delay: 10s
stonith-enabled: true
stonith-timeout: 900
To reconfigure the database, the cluster must be instructed to ignore any changes until the configuration is complete. You can put the cluster into maintenance-mode using:
# pcs property set maintenance-mode=true
Check the maintenance-mode:
# pcs resource
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02] (unmanaged):
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode1 (unmanaged)
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode2 (unmanaged)
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable, unmanaged):
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Promoted clusternode1 (unmanaged)
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Unpromoted clusternode2 (unmanaged)
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode1 (unmanaged)
Verify that all resources are "unmanaged":
[root@clusternode1]# pcs status
Cluster name: cluster1
Status of pacemakerd: 'Pacemaker is running' (last updated 2023-06-27 16:02:15 +02:00)
Cluster Summary:
* Stack: corosync
* Current DC: clusternode2 (version 2.1.5-7.el9-a3f44794f94) - partition with quorum
* Last updated: Tue Jun 27 16:02:16 2023
* Last change: Tue Jun 27 16:02:14 2023 by root via cibadmin on clusternode1
* 2 nodes configured
* 6 resource instances configured
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Node List:
* Online: [ clusternode1 clusternode2 ]
Full List of Resources:
* h7fence (stonith:fence_rhevm): Started clusternode2 (unmanaged)
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02] (unmanaged):
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode1 (unmanaged)
* SAPHanaTopology_RH2_02 (ocf:heartbeat:SAPHanaTopology): Started clusternode2 (unmanaged)
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable, unmanaged):
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Promoted clusternode1 (unmanaged)
* SAPHana_RH2_02 (ocf:heartbeat:SAPHana): Unpromoted clusternode2 (unmanaged)
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode1 (unmanaged)
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The resources will switch back to managed if you unset the maintenance-mode:
# pcs property set maintenance-mode=false
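A scripted check can complement reading the output above by eye. The following is a minimal sketch, not part of the pcs tooling: `all_unmanaged` is a hypothetical helper that reads `pcs resource` output on standard input. It assumes the line format shown in this section, where every resource line names its agent in parentheses, for example `(ocf:heartbeat:SAPHana)`.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not a pcs command).
# Succeeds only if every resource line in the input is flagged "(unmanaged)".
all_unmanaged() {
    awk '
        BEGIN { total = 0; managed = 0 }
        # Resource lines name their agent, e.g. "(ocf:heartbeat:IPaddr2)".
        /\((ocf|stonith):/ {
            total++
            if ($0 !~ /\(unmanaged\)/) managed++
        }
        # Fail when no resource line was seen or any resource is still managed.
        END { if (total > 0 && managed == 0) exit 0; exit 1 }
    '
}

# Example use as root (not executed here):
#   pcs resource | all_unmanaged && echo "all resources are unmanaged"
```

The helper only inspects text, so it can also be run against saved output when the cluster is not reachable.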
7.2.9. Failover the SAPHana resource using Move
A simple way to fail over the SAP HANA database is the pcs resource move command. Use the clone resource name and move the resource as shown below:
# pcs resource move <SAPHana-clone-resource>
In this example, the clone resource is SAPHana_RH2_02-clone:
[root@clusternode1]# pcs resource
* Clone Set: SAPHanaTopology_RH2_02-clone [SAPHanaTopology_RH2_02]:
* Started: [ clusternode1 clusternode2 ]
* Clone Set: SAPHana_RH2_02-clone [SAPHana_RH2_02] (promotable):
* Promoted: [ clusternode1 ]
* Unpromoted: [ clusternode2 ]
* vip_RH2_02_MASTER (ocf:heartbeat:IPaddr2): Started clusternode1
Move the resource:
# pcs resource move SAPHana_RH2_02-clone
Location constraint to move resource 'SAPHana_RH2_02-clone' has been created
Waiting for the cluster to apply configuration changes...
Location constraint created to move resource 'SAPHana_RH2_02-clone' has been removed
Waiting for the cluster to apply configuration changes...
resource 'SAPHana_RH2_02-clone' is promoted on node 'clusternode2'; unpromoted on node 'clusternode1'
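To confirm the result of a move in a script, the promoted node can be read from the `pcs resource` output. The sketch below is an illustration under assumptions: `promoted_node` is a hypothetical helper, and it only understands the compact listing shown earlier in this section (`* Promoted: [ <node> ]`), not the per-instance listing printed while resources are unmanaged.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not a pcs command).
# $1: clone resource name, e.g. SAPHana_RH2_02-clone
# stdin: output of `pcs resource` in its compact form
promoted_node() {
    awk -v clone="$1" '
        # Track whether the current "Clone Set:" block belongs to the requested clone.
        /Clone Set:/ { in_clone = (index($0, "Clone Set: " clone " ") > 0) }
        # Inside that block, print the node list of the "* Promoted:" line.
        in_clone && /\* Promoted:/ {
            sub(/.*\[ */, "")
            sub(/ *\].*/, "")
            print
            exit
        }
    '
}

# Example use as root (not executed here):
#   pcs resource | promoted_node SAPHana_RH2_02-clone
```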
Check if there are remaining constraints:
# pcs constraint location
You can remove those location constraints created during the failover by clearing the resource. Example:
[root@clusternode1]# pcs resource clear SAPHana_RH2_02-clone
Check if there are any remaining warnings or entries in the "Migration Summary":
# pcs status --full
Check the stonith history:
# pcs stonith history
If desired, clear the stonith history:
# pcs stonith history cleanup
If you are using a Pacemaker version earlier than 2.1.5, refer to "Is there a way to manage constraints when running pcs resource move?" and check the remaining constraints.
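Whether an installed Pacemaker is older than the 2.1.5 threshold mentioned above can be checked with a version comparison. This is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; `version_lt` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration).
# Succeeds if version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example use (not executed here); rpm prints only the version field:
#   installed=$(rpm -q --queryformat '%{VERSION}' pacemaker)
#   if version_lt "$installed" 2.1.5; then
#       echo "check for leftover constraints after pcs resource move"
#   fi
```

A plain string comparison would rank 2.1.10 before 2.1.5, which is why the sketch relies on version sort.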
7.2.10. Monitor failover and sync state
All Pacemaker activities are logged in the /var/log/messages file on the cluster nodes. Because the file contains many other messages, the ones related to the SAP resource agent can be hard to find. You can configure a command alias that shows only the messages related to the SAP resource agent.
Example alias tmsl:
# alias tmsl='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SAPSYSTEMNAME}_HDB${TINSTANCE}|sr_register|WAITING4LPA|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED|LPT"'
Example output of tmsl:
[root@clusternode1]# tmsl
Jun 22 13:59:54 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: secondary with sync status SOK ==> possible takeover node
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: saphana_monitor_secondary: scoring_crm_master(4:S:master1:master:worker:master,SOK)
Jun 22 13:59:55 clusternode1 SAPHana(SAPHana_RH2_02)[907482]: INFO: DEC: scoring_crm_master: sync(SOK) is matching syncPattern (SOK)
Jun 22 14:04:06 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:04:06 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:04:06 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: secondary with sync status SOK ==> possible takeover node
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: saphana_monitor_secondary: scoring_crm_master(4:S:master1:master:worker:master,SOK)
Jun 22 14:04:09 clusternode1 SAPHana(SAPHana_RH2_02)[914625]: INFO: DEC: scoring_crm_master: sync(SOK) is matching syncPattern (SOK)
Jun 22 14:08:21 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:08:21 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:08:21 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:08:23 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:08:23 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:08:23 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: secondary with sync status SOK ==> possible takeover node
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: saphana_monitor_secondary: scoring_crm_master(4:S:master1:master:worker:master,SOK)
Jun 22 14:08:24 clusternode1 SAPHana(SAPHana_RH2_02)[922136]: INFO: DEC: scoring_crm_master: sync(SOK) is matching syncPattern (SOK)
Jun 22 14:12:35 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:12:35 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:12:36 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:12:38 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:12:38 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:12:38 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:12:38 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: secondary with sync status SOK ==> possible takeover node
Jun 22 14:12:39 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:12:39 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:12:39 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:12:39 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: saphana_monitor_secondary: scoring_crm_master(4:S:master1:master:worker:master,SOK)
Jun 22 14:12:39 clusternode1 SAPHana(SAPHana_RH2_02)[929408]: INFO: DEC: scoring_crm_master: sync(SOK) is matching syncPattern (SOK)
Jun 22 14:14:01 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_clone_state[clusternode2]: PROMOTED -> DEMOTED
Jun 22 14:14:02 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_clone_state[clusternode2]: DEMOTED -> UNDEFINED
Jun 22 14:14:19 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_clone_state[clusternode1]: DEMOTED -> PROMOTED
Jun 22 14:14:21 clusternode1 SAPHana(SAPHana_RH2_02)[932762]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:14:21 clusternode1 SAPHana(SAPHana_RH2_02)[932762]: INFO: DEC: hana_rh2_site_srHook_DC1 is empty or SWAIT. Take polling attribute: hana_rh2_sync_state=SOK
Jun 22 14:14:21 clusternode1 SAPHana(SAPHana_RH2_02)[932762]: INFO: DEC: Finally get_SRHOOK()=SOK
Jun 22 14:15:14 clusternode1 SAPHana(SAPHana_RH2_02)[932762]: INFO: DEC: hana_rh2_site_srHook_DC1=SWAIT
Jun 22 14:15:22 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_sync_state[clusternode1]: SOK -> PRIM
Jun 22 14:15:23 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_sync_state[clusternode2]: PRIM -> SOK
Jun 22 14:15:23 clusternode1 SAPHana(SAPHana_RH2_02)[934810]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:15:25 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_clone_state[clusternode2]: UNDEFINED -> DEMOTED
Jun 22 14:15:32 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_sync_state[clusternode2]: SOK -> SFAIL
Jun 22 14:19:36 clusternode1 pacemaker-attrd[10150]: notice: Setting hana_rh2_sync_state[clusternode2]: SFAIL -> SOK
Jun 22 14:19:36 clusternode1 SAPHana(SAPHana_RH2_02)[942693]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:23:49 clusternode1 SAPHana(SAPHana_RH2_02)[950623]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:28:02 clusternode1 SAPHana(SAPHana_RH2_02)[958633]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:32:15 clusternode1 SAPHana(SAPHana_RH2_02)[966683]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:36:27 clusternode1 SAPHana(SAPHana_RH2_02)[974736]: INFO: ACT site=DC1, setting SOK for secondary (1)
Jun 22 14:40:40 clusternode1 SAPHana(SAPHana_RH2_02)[982934]: INFO: ACT site=DC1, setting SOK for secondary (1)
The filter makes it easier to understand what status changes are happening. If details are missing, you can open the whole message file to read all the information.
After a failover, you can clear the resource. Please also check that there are no remaining location constraints.
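When only the attribute changes matter, the filtered log can be reduced further to the state transitions written by pacemaker-attrd. The following sketch assumes the syslog line format shown in the example output above; `attr_transitions` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration).
# stdin: lines from /var/log/messages
# stdout: "<time> <attribute>[<node>] <old> -> <new>" for each pacemaker-attrd
#         message that sets a hana_* attribute; all other lines are dropped.
attr_transitions() {
    sed -n 's/^[A-Z][a-z][a-z] *[0-9]* \([0-9:]*\) .*pacemaker-attrd.*Setting \(hana_[^:]*\): \(.*\)$/\1 \2 \3/p'
}

# Example use as root (not executed here):
#   tail -1000 /var/log/messages | attr_transitions
```

Applied to the example output above, this keeps lines such as the change of hana_rh2_sync_state on clusternode2 from SOK to SFAIL and drops the periodic monitor messages of the resource agent.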
7.2.11. Check cluster consistency
During the installation, resources are sometimes started before the configuration is complete. This can leave entries in the Cluster Information Base (CIB) that cause incorrect behavior. You can check for such entries and correct them manually after the configuration has been completed.
If you start the SAPHana resources, the missing entries are recreated. Wrong entries cannot be addressed by pcs commands and need to be removed manually.
Check CIB entries:
# cibadmin --query
DC3 and SFAIL are entries that should not be present in the Cluster Information Base when the cluster members are DC1 and DC2 and the sync state between the nodes is reported as SOK.
Example to check for corresponding entries:
# cibadmin --query |grep '"DC3"'
# cibadmin --query |grep '"SFAIL"'
The command can be executed on any node in the cluster as user root. Usually the output of the command is empty. If there is still an error in the configuration the output could look like this:
<nvpair id="SAPHanaSR-hana_rh1_glob_sec" name="hana_rh1_glob_sec" value="DC3"/>
These entries can be removed with the following command:
# cibadmin --delete --xml-text '<...>'
To remove the entries in the example above you have to enter the following. Please note that the output contains double quotes, so the text must be embedded in single quotes:
# cibadmin --delete --xml-text ' <nvpair id="SAPHanaSR-hana_rh1_glob_sec" name="hana_rh1_glob_sec" value="DC3"/>'
Verify the absence of the removed CIB entries. The returned output should be empty.
# cibadmin --query |grep '"DC3"'
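The two grep checks above can be combined into one filter that prints the complete nvpair elements, which is the form needed for cibadmin --delete --xml-text. This is a sketch under assumptions: `stale_cib_entries` is a hypothetical helper, it relies on the `-o` option of GNU grep, and it matches only on the `value` attribute.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not a Pacemaker tool).
# Arguments: attribute values that should not appear, e.g. DC3 SFAIL
# stdin: CIB XML, for example the output of `cibadmin --query`
# stdout: every <nvpair .../> element whose value attribute is one of the arguments
# Exit status is non-zero when nothing is found, which is the expected clean case.
stale_cib_entries() {
    local patterns
    # One fixed-string pattern per line: value="DC3", value="SFAIL", ...
    patterns=$(printf 'value="%s"\n' "$@")
    grep -o '<nvpair [^>]*>' | grep -F "$patterns"
}

# Example use as root (not executed here):
#   cibadmin --query | stale_cib_entries DC3 SFAIL
```

Review each printed element before deleting it; the helper does not know which site names are valid in your landscape.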
7.2.12. Cluster cleanup
Failover tests can leave behind constraints and other remains. Clear these from the cluster before starting the next test.
Check the cluster status for failure events:
# pcs status --full
If you see cluster warnings or entries in the "Migration Summary", you should clear and clean up the resources:
# pcs resource clear SAPHana_RH2_02-clone
# pcs resource cleanup SAPHana_RH2_02-clone
Output:
Cleaned up SAPHana_RH2_02:0 on clusternode1
Cleaned up SAPHana_RH2_02:1 on clusternode2
Check if there are unwanted location constraints, for example from a previous failover:
# pcs constraint location
Check the existing constraints in more detail:
# pcs constraint --full
Example of a location constraint after a resource move:
Node: hana08 (score:-INFINITY) (role:Started) (id:cli-ban-SAPHana_RH2_02-clone-on-hana08)
Clear this location constraint:
# pcs resource clear SAPHana_RH2_02-clone
Verify the constraint is gone from the constraints list. If it persists, explicitly delete it using its constraint id:
# pcs constraint delete cli-ban-SAPHana_RH2_02-clone-on-hana08
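Constraints created by a move or ban carry ids that start with `cli-` (for example `cli-ban-` or `cli-prefer-`), so they can be picked out of the constraint listing. The sketch below assumes the `(id:...)` notation shown in the example above, which may differ between pcs versions; `cli_constraint_ids` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not a pcs command).
# stdin: output of `pcs constraint --full`
# stdout: one constraint id per line for ids that start with "cli-"
cli_constraint_ids() {
    grep -o '(id:cli-[^)]*)' | sed 's/^(id://; s/)$//'
}

# Example use as root (not executed here); review the ids before deleting:
#   pcs constraint --full | cli_constraint_ids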
If you run several tests with fencing you might also clear the stonith history:
# pcs stonith history cleanup
All pcs commands are executed as user root. Please also check Discover leftovers.
7.2.13. Other cluster commands
Various cluster command examples
# pcs status --full
# crm_mon -1Arf # Provides an overview
# pcs resource # Lists all resources and shows if they are running
# pcs constraint --full # Lists all constraints with their ids, which are needed to remove a constraint
# pcs cluster start --all # This will start the cluster on all nodes
# pcs cluster stop --all # This will stop the cluster on all nodes
# pcs node attribute # Lists node attributes
7.3. RHEL and general commands
7.3.1. Discover current status
Several steps are needed to determine the current status of the environment. Please refer to Monitor the environment. We also recommend doing the following:
- Check /var/log/messages; use Aliases for monitoring for easier log reviews.
- Sometimes a cluster must be cleaned up from previous activity to continue proper operation. Discover leftovers and clear them if necessary.
7.3.2. yum info
# yum info resource-agents-sap-hana
Last metadata expiration check: 2:47:28 ago on Tue 06 Jun 2023 03:13:57 AM CEST.
Installed Packages
Name : resource-agents-sap-hana
Epoch : 1
Version : 0.162.1
Release : 2.el9_2
Architecture : noarch
Size : 174 k
Source : resource-agents-sap-hana-0.162.1-2.el9_2.src.rpm
Repository : @System
Summary : SAP HANA cluster resource agents
URL : https://github.com/SUSE/SAPHanaSR
License : GPLv2+
Description : The SAP HANA resource agents interface with Pacemaker to allow
: SAP instances to be managed in a cluster environment.
7.3.3. RPM display version
# rpm -q resource-agents-sap-hana
resource-agents-sap-hana-0.162.1-2.el9_2.noarch
7.3.4. Aliases for monitoring
You can add these aliases to your shell profile. In the example, the root aliases call the <sid>adm aliases, so those must already be defined.
root (add to ~/.bashrc):
# export ListInstances=$(/usr/sap/hostctrl/exe/saphostctrl -function ListInstances| head -1 )
export sid=$(echo "$ListInstances" |cut -d " " -f 5| tr [A-Z] [a-z])
export SID=$(echo $sid | tr [a-z] [A-Z])
export Instance=$(echo "$ListInstances" |cut -d " " -f 7 )
alias crmm='watch -n 1 crm_mon -1Arf'
alias crmv='watch -n 1 /usr/local/bin/crmmv'
alias cglo='su - ${sid}adm -c cglo'
alias cdh='cd /usr/lib/ocf/resource.d/heartbeat'
alias gtr='su - ${sid}adm -c gtr'
alias hdb='su - ${sid}adm -c hdb'
alias hdbi='su - ${sid}adm -c hdbi'
alias hgrep='history | grep $1'
alias hri='su - ${sid}adm -c hri'
alias hris='su - ${sid}adm -c hris'
alias killnode="echo 'b' > /proc/sysrq-trigger"
alias lhc='su - ${sid}adm -c lhc'
alias pit='ssh pitunnel'
alias python='/usr/sap/${SID}/HDB${Instance}/exe/Python/bin/python'
alias srstate='su - ${sid}adm -c srstate'
alias shr='watch -n 5 "SAPHanaSR-monitor --sid=${SID}"'
alias sgsi='su - ${sid}adm -c sgsi'
alias srm='su - ${sid}adm -c srm'
alias srs='su - ${sid}adm -c srs'
alias sapstart='su - ${sid}adm -c sapstart'
alias sapstop='su - ${sid}adm -c sapstop'
alias tma='tmux attach -t `tmux ls | grep -v atta| head -1 |cut -d " " -f 1`'
alias tm='tail -100f /var/log/messages |grep -v systemd'
alias tms='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SID}_HDB${Instance}|sr_register|WAITING4LPA|EXCLUDE as possible takeover node|SAPHanaSR|failed|${HOSTNAME}|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED"'
alias tmss='tail -1000f /var/log/messages | grep -v systemd| egrep -s "secondary with sync status|Setting master-rsc_SAPHana_${SID}_HDB${Instance}|sr_register|WAITING4LPA|EXCLUDE as possible takeover node|SAPHanaSR|failed|${HOSTNAME}|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED"'
alias tmm='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SID}_HDB${Instance}|sr_register|WAITING4LPA|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED|LPT|SOK|SFAIL|SAPHanaSR-mon"| grep -v systemd'
alias tmsl='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SID}_HDB${Instance}|sr_register|WAITING4LPA|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED|LPT|SOK|SFAIL|SAPHanaSR-mon"'
alias vih='vim /usr/lib/ocf/resource.d/heartbeat/SAPHanaStart'
alias vglo='su - ${sid}adm -c vglo'
<sid>adm (add to ~/.customer.sh):
alias tm='tail -100f /var/log/messages |grep -v systemd'
alias tms='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SAPSYSTEMNAME}_HDB${TINSTANCE}|sr_register|WAITING4LPA|EXCLUDE as possible takeover node|SAPHanaSR|failed|${HOSTNAME}|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED"'
alias tmsl='tail -1000f /var/log/messages | egrep -s "Setting master-rsc_SAPHana_${SAPSYSTEMNAME}_HDB${TINSTANCE}|sr_register|WAITING4LPA|PROMOTED|DEMOTED|UNDEFINED|master_walk|SWAIT|WaitforStopped|FAILED|LPT"'
alias sapstart='sapcontrol -nr ${TINSTANCE} -function StartSystem HDB;hdbi'
alias sapstop='sapcontrol -nr ${TINSTANCE} -function StopSystem HDB;hdbi'
alias sgsi='watch sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList'
alias spl='watch sapcontrol -nr ${TINSTANCE} -function GetProcessList'
alias splh='watch "sapcontrol -nr ${TINSTANCE} -function GetProcessList| grep hdbdaemon"'
alias srm='watch "hdbnsutil -sr_state --sapcontrol=1 |grep site.*Mode"'
alias srs="watch -n 5 'python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/systemReplicationStatus.py ; echo Status \$?'"
alias srstate='watch -n 10 hdbnsutil -sr_state'
alias hdb='watch -n 5 "sapcontrol -nr ${TINSTANCE} -function GetProcessList| egrep -s hdbdaemon\|hdbnameserver\|hdbindexserver "'
alias hdbi='watch -n 5 "sapcontrol -nr ${TINSTANCE} -function GetProcessList| egrep -s hdbdaemon\|hdbnameserver\|hdbindexserver;sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList "'
alias hgrep='history | grep $1'
alias vglo="vim /usr/sap/${SAPSYSTEMNAME}/SYS/global/hdb/custom/config/global.ini"
alias vgloh="vim /hana/shared/${SAPSYSTEMNAME}/HDB${TINSTANCE}/${HOSTNAME}/global.ini"
alias hri='hdbcons -e hdbindexserver "replication info"'
alias hris='hdbcons -e hdbindexserver "replication info" | egrep -e "SiteID|ReplicationStatus_"'
alias gtr='watch -n 10 /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/Python/bin/python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/getTakeoverRecommendation.py --sapcontrol=1'
alias lhc='/usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/Python/bin/python /usr/sap/${SAPSYSTEMNAME}/HDB${TINSTANCE}/exe/python_support/landscapeHostConfiguration.py;echo $?'
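The root profile above derives `sid`, `SID` and `Instance` by cutting the fifth and seventh space-separated fields from the first line of the ListInstances output. The sketch below isolates that parsing so it can be tried on a sample line. It is an illustration under assumptions: `parse_instance_line` is a hypothetical helper, and the sample line in the comment is an assumed, shortened shape of the saphostctrl output, not captured from a real system.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration).
# $1: first line of `saphostctrl -function ListInstances` output.
# Assumed shape of that line (shortened): " Inst Info : RH2 - 02 - clusternode1"
# The leading space makes field 1 empty, so the SID is field 5 and the
# instance number is field 7 when splitting on single spaces.
parse_instance_line() {
    local line=$1
    SID=$(printf '%s\n' "$line" | cut -d " " -f 5)
    # Quoted character classes avoid accidental filename expansion.
    sid=$(printf '%s\n' "$SID" | tr '[:upper:]' '[:lower:]')
    Instance=$(printf '%s\n' "$line" | cut -d " " -f 7)
}

# Example use as root (not executed here):
#   parse_instance_line "$(/usr/sap/hostctrl/exe/saphostctrl -function ListInstances | head -1)"
#   echo "${sid}adm ${SID} ${Instance}"
```

Because the field positions depend on the exact spacing of the output, it is worth checking the three variables once after adding the profile lines.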