Automating SAP HANA Scale-Out System Replication using the RHEL HA Add-On
Abstract
Creating an HA cluster for automating scale-out HANA system replication with the classic dedicated SAP HANA scale-out resource agents
Providing feedback on Red Hat documentation
We appreciate your feedback on our documentation. Let us know how we can improve it.
Submitting feedback through Jira (account required)
- Make sure you are logged in to the Jira website.
- Click on this link to provide feedback.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Click Create at the bottom of the dialogue.
Chapter 1. Introduction to SAP HANA scale-out system replication HA
Configuring the SAP HANA system replication between two identical HANA sites enables a basic resiliency of the database. You can configure these two sites in a Pacemaker cluster for advanced high availability that automatically handles the service recovery in the case of a failure on the primary instance side.
1.1. Terminology
node
One host or system in a HA cluster setup, also called a cluster member.
cluster
Cluster is the high-availability setup using the Pacemaker cluster manager from the RHEL HA Add-On. It consists of two or more members, or nodes.
instance
One set of SAP HANA systems that belong to one HANA site. In single-host (scale-up) HANA environments, one HANA site consists of a single HANA instance. In multiple-host (scale-out) HANA configurations, each HANA site consists of two or more HANA instances.
primary
The primary HANA instance or primary site refers to the instance which is the active HANA instance or site. In single-host setups (scale-up), this is one system. In multiple-host (scale-out) setups, the primary database stretches across multiple systems of one HANA site and the systems have different roles in the HANA environment to distribute load.
secondary
The secondary HANA instance or secondary site refers to the SAP HANA instance or site which is configured to be synced with the primary HANA instance through the SAP HANA system replication mechanism. This instance preloads the in-memory data of the primary instance and is ready to take over if the primary instance fails.
1.2. Performance-optimized SAP HANA scale-out HA
Performance-optimized means that there is only a single SAP HANA instance running on each node that has control over most of the resources, such as CPU and RAM, on each node. This means that the SAP HANA instances can run with as much performance as possible.
You configure the HANA environment without HANA standby hosts and with only one coordinator name server per replication site. This coordinator name server controls the landscape of each site. The HANA host auto-failover functionality using idle standby hosts is not necessary, because the Pacemaker cluster controls the high availability of the HANA database and manages the HANA system replication.
With a performance-optimized SAP HANA system replication setup of SAP HANA 2.0 SPS1 or newer you can also configure read access to the secondary system to reduce the load on the primary instance. For more information see the SAP documentation for Active/Active (Read Enabled) configuration.
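For example, on SAP HANA 2.0 SPS1 or newer, read access on the secondary site is enabled by choosing the corresponding operation mode when you register the secondary. The following is only a minimal sketch; the host, instance, and site names are placeholders and must match your environment:
rh1adm$ hdbnsutil -sr_register --remoteHost=<primary_host> \
  --remoteInstance=<instance_number> --replicationMode=sync \
  --operationMode=logreplay_readaccess --name=<site2>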
1.3. Cluster resource agents and tools for SAP HANA HA
The high-availability (HA) cluster configuration for managing SAP HANA system replication setups works with multiple resource agents and other tools that combine their functionality for the expected behavior.
The resource agents and tools are provided in the package resource-agents-sap-hana-scaleout.
SAPHanaTopology
The SAPHanaTopology resource agent gets status information from the SAP HANA environment and saves it to cluster properties. The agent also starts and monitors the local SAP Host Agent, which is required for starting, stopping, and monitoring the HANA instances. A configuration process in SAP HANA called system replication hook adds replication health information as well to the saved properties. Based on the collected environment data, the resource agent defines a dedicated health score of the cluster node. This scoring is used by the cluster to decide if it must initiate the switch of the system replication from one site to the other.
SAPHanaController
The SAPHanaController resource agent monitors and manages the SAP HANA environment. In case of a failure of the HANA instance, the resource agent determines which recovery action to take and executes the commands for an automatic switch, or it changes the active site of the system replication.
SAPHanaSR-showAttr
The SAPHanaSR-showAttr tool shows cluster attributes for the SAP HANA system replication automation in a preformatted overview, including the HANA topology that shows whether it is a scale-up or scale-out environment. The default output includes the system replication status between the nodes and other related status information. The script retrieves the information from the Cluster Information Base (CIB), where other resource agents or hook scripts store updates during their regular checks or from HANA events, respectively. Because of this, the information can contain outdated states until it is updated again. Use HANA tools to get real-time status information from the landscape.
1.4. SAP HANA HA/DR provider hooks
Current versions of SAP HANA provide an API in the form of hooks that allow the HANA instance to send notifications for certain events, for example the loss or establishment of the system replication. For each event, the HANA instance calls the configured hooks, also called HA/DR providers. Hooks are custom Python scripts which process the events that HANA sends and the scripts can trigger different actions based on the event information.
You must add the HA/DR provider definition to the HANA global configuration to enable the required functionality of triggering additional actions for certain events.
SAPHanaSR for the srConnectionChanged() hook method
The SAPHanaSR hook is required for processing the srConnectionChanged() hook method. This method is used by the primary HANA instance for a notification of any change in the HANA system replication status. The primary HANA instance calls the SAPHanaSR HA/DR provider when a HANA system replication related event occurs. The hook script SAPHanaSR.py then parses srConnectionChanged() events for the system replication status detail and as a result it updates the srHook cluster attribute. This attribute is used by the resource agents to evaluate the landscape health and make decisions. The value of the system replication or sync state defines if the cluster recovers a failed primary instance on the same node or if it triggers a takeover to the secondary. The takeover is only triggered when the system replication is fully in sync, which means the HANA data is consistent between the HANA sites.
You must configure the SAPHanaSR hook to enable the srConnectionChanged() hook method for proper function and full support of the HA cluster setup.
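A minimal sketch of the corresponding global.ini entries on both HANA sites is shown below. The hook script path and the trace section are assumptions that depend on the installed resource-agents-sap-hana-scaleout package, so verify them against the files shipped with your package version:
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 1

[trace]
ha_dr_saphanasr = info
In addition, the <sid>adm user usually needs a sudoers entry that allows it to update the srHook cluster attribute with crm_attribute; the exact attribute name depends on your SID and site names and on the hook script version.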
ChkSrv for the srServiceStateChanged() hook method
When the HANA instance detects an issue with a HANA indexserver process it recovers from the problem by stopping and restarting the hdbindexserver service automatically through an internal mechanism.
However, especially for very large HANA instances, the hdbindexserver service can take a very long time for the stopping phase of this recovery process. Although HANA does not report this service degradation as an error in the HANA landscape, the situation poses a risk to data consistency if anything else fails in the instance during that time. To make the otherwise unpredictable service recovery time more deterministic, you can configure the ChkSrv hook to stop or kill the entire affected HANA instance instead.
In a setup with automatic failover enabled (PREFER_SITE_TAKEOVER=true), the instance stop leads to a takeover if the secondary node is in a healthy state. Otherwise, instance recovery happens locally, but the enforced local instance restart accelerates the process.
The HANA instance calls the ChkSrv hook when an event occurs. The hook script ChkSrv.py processes the srServiceStateChanged() hook method and executes actions based on the results of the filters it applies to event details. This way the ChkSrv.py hook script can distinguish a HANA hdbindexserver process that is being stopped and restarted by HANA after a failure from the same process being stopped as part of an intended instance shutdown. When the hook script determines that the event is caused by a failure it triggers the configured action.
The ChkSrv.py hook script has multiple options to define what happens when an indexserver failure event is detected:
ignore
This action just writes the parsed events and decision information to a dedicated logfile. This is useful for testing and verifying what the hook script would do when activating the stop or kill actions.
stop
This action executes a graceful StopSystem for the instance through the sapcontrol command.
kill
This action executes the HDB kill-<signal> command with a default signal 9, which can be configured. The result is the same as when using stop, but can be faster.
Any indexserver failure is treated individually by HANA. The same processes are always triggered for every single indexserver issue.
Enabling the srServiceStateChanged() hook is optional.
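If you do enable it, the hook is defined in global.ini in the same way as the SAPHanaSR provider. The following is only an illustrative sketch; the script path and the action_on_lost parameter name are assumptions to verify against the ChkSrv.py documentation in your installed package:
[ha_dr_provider_chksrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 2
action_on_lost = stop

[trace]
ha_dr_chksrv = info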
1.5. Support policies for SAP HANA High Availability
Red Hat supports the following components of the solution:
- Basic operating system configuration for running SAP HANA on RHEL, based on SAP guidelines
- RHEL HA Add-On
- Red Hat HA solutions for SAP HANA system replication
Chapter 2. Planning the HA cluster setup
Plan your setup carefully to ensure that all requirements for the HA cluster configuration for automating the HANA system replication of your HANA landscape are met.
2.1. Subscription and repositories for SAP HANA HA
The solutions for SAP HANA in a Pacemaker cluster for High Availability (HA) are provided in dedicated repositories. The RHEL for SAP Solutions subscription is required to access all relevant content. In addition to the standard RHEL repos the subscription provides access to the following repos, which are required to set up the SAP HANA HA solution:
High Availability
The RHEL HA Add-On’s content is stored in a repository named High Availability. The repository ID is represented as rhel-9-for-<arch>-highavailability-e4s-rpms.
SAP Solutions
The repository that contains the SAP HANA specific content. The repository ID is represented as rhel-9-for-<arch>-sap-solutions-e4s-rpms.
The <arch> denotes the specific hardware architecture:
- x86_64
- ppc64le
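For example, on an x86_64 system you can enable both repositories with subscription-manager, using the repository IDs listed above:
[root]# subscription-manager repos \
  --enable="rhel-9-for-x86_64-highavailability-e4s-rpms" \
  --enable="rhel-9-for-x86_64-sap-solutions-e4s-rpms"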
Example list of repositories enabled as part of the RHEL for SAP Solutions subscription:
[root]# dnf repolist
Updating Subscription Management repositories.
repo id repo name
rhel-9-for-x86_64-appstream-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-baseos-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-highavailability-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - High Availability - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-sap-netweaver-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - SAP NetWeaver - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-sap-solutions-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - SAP Solutions - Update Services for SAP Solutions (RPMs)
2.2. Operating system requirements
Deploy your host operating system as described in Installing RHEL 9 for SAP Solutions.
Follow SAP Note 3108302 - SAP HANA DB: Recommended OS Settings for RHEL 9 to configure architecture specific settings, kernel parameters and check the minimum required Linux kernel and HANA versions.
Apply the operating system post-installation configuration for SAP HANA hosts as described in SAP Note 3108316 - Red Hat Enterprise Linux 9.x: Installation and Configuration.
Root privileges
For the HANA installation and the cluster HA setup, you need the root user or a privileged user that can run any command with sudo.
2.3. Storage requirements
You can find information about Sizing SAP HANA in the SAP HANA Master Guide.
There is no communication between both scale-out environments on the storage level. As a result, you must complete the storage configuration on each scale-out environment before installing the HANA instances.
For the setup of a scale-out SAP HANA environment with HANA system replication between two HANA sites you can configure the storage as shared or non-shared.
Shared storage
Configure 3 NFS shares for the mountpoints /hana/data, /hana/log, /hana/shared. For each HANA site you must create a dedicated set of the shares.
Example NFS storage details for the HANA instance in datacenter 1:
| Method | NFS Server | NFS Path | Mount Point |
|---|---|---|---|
| NFS | <nfs_server_dc1> | <export_path>/data | /hana/data |
| NFS | <nfs_server_dc1> | <export_path>/log | /hana/log |
| NFS | <nfs_server_dc1> | <export_path>/shared | /hana/shared |
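A minimal /etc/fstab sketch for the datacenter 1 nodes is shown below. The NFS server name, export paths, and mount options are placeholders; adapt them to your storage environment and to the SAP recommendations for running HANA on NFS:
<nfs_server_dc1>:/export/RH1/data    /hana/data    nfs4  defaults,hard  0 0
<nfs_server_dc1>:/export/RH1/log     /hana/log     nfs4  defaults,hard  0 0
<nfs_server_dc1>:/export/RH1/shared  /hana/shared  nfs4  defaults,hard  0 0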
Example NFS storage details for the HANA instance in datacenter 2:
| Method | NFS Server | NFS Path | Mount Point |
|---|---|---|---|
| NFS | <nfs_server_dc2> | <export_path>/data | /hana/data |
| NFS | <nfs_server_dc2> | <export_path>/log | /hana/log |
| NFS | <nfs_server_dc2> | <export_path>/shared | /hana/shared |
Non-shared storage
A non-shared storage configuration requires the integration of the storage connector. The storage connector manages access to the LUNs or LVM devices over SCSI or LVM locking mechanisms. You can only use this method for /hana/data and /hana/log.
Use the SAP HANA Fiber Channel Storage Connector Admin Guide for the setup of non-shared storage.
For /hana/shared you must configure an NFS share for each instance, even if you use non-shared storage for /hana/data and /hana/log.
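With the storage connector, HANA mounts the data and log devices on the active hosts itself through the [storage] section of global.ini. The following lines are only an illustrative sketch with placeholder WWIDs; the authoritative parameter names and values are described in the SAP storage connector documentation:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = <wwid_of_data_lun_1>
partition_1_log__wwid = <wwid_of_log_lun_1>
partition_2_data__wwid = <wwid_of_data_lun_2>
partition_2_log__wwid = <wwid_of_log_lun_2>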
2.4. Network requirements
You can find information about SAP HANA network architecture considerations in the SAP HANA Administration Guide.
For the SAP HANA system replication setup in a HA cluster, we recommend that you configure dedicated networks and connections for the cluster communication traffic, separate from any HANA network traffic.
2.5. HA cluster requirements
Fencing
For a supported HA cluster setup using the RHEL HA Add-on you must configure a fencing or STONITH device on each cluster node. Which fencing or STONITH device you can use depends on the platform the cluster is running on. Check the Support Policies for RHEL High Availability Clusters - Fencing/STONITH for recommendations on fencing agents or consult your hardware or cloud provider to find out which fence device is supported on their platform.
fence_scsi or fence_mpath as fencing/STONITH mechanism requires shared storage between the cluster nodes that is fully managed by the HA cluster. If your SAP environment does not include such a shared disks setup, using these fencing options is not supported.
Quorum
HANA environments with HANA system replication managed by a Pacemaker cluster in a performance-optimized setup always consist of an even number of HANA nodes. Each cluster member automatically counts as one vote in the quorum calculations, which the cluster uses to decide which nodes can continue running when there is a communication disruption between the nodes. An even number of cluster nodes can lead to a 50/50 split and there is a risk of a split-brain situation, where both partitions continue running and cause conflicts or data corruption in the running services.
We highly recommend that you configure an additional quorum vote in the cluster to have an odd number of votes and improve the availability of the service even during multiple cluster interconnect failures. See Exploring Concepts of RHEL High Availability Clusters - Quorum for more details about the concept and its benefits.
You can use a qdevice or an additional cluster node to add a quorum vote to your cluster. Both methods require a separate host and have different advantages and limitations that you must consider.
See Configuring a quorum device in the cluster for the configuration steps of the following quorum device methods:
qdevice
- You must configure a dedicated host that is not a member of any cluster.
- Ideally you place the qdevice host in a different location or availability zone than any cluster members.
- You can configure no more than one qdevice per cluster.
- A single qdevice host can serve qdevices to multiple different clusters. You do not need additional qdevice hosts if your different clusters can reach the same qdevice host.
- The qdevice is visible in the quorum configuration only and does not require any change or considerations with the cluster resource settings.
- The qdevice communicates through the network. Use any production network, for example, the HANA client network.
- Ideally you configure a highly available network connection, for example, bonded interfaces on the qdevice host.
majority-maker node
- You must configure a dedicated host that becomes a member of the cluster you want to use it for.
- Ideally you place the majority-maker host in a different location or availability zone than other cluster members.
- The majority-maker host can only be a member of one cluster. You must configure a separate host for the same functionality in any other cluster.
- The communication with this host happens through corosync, like any other cluster member.
- You must adjust your cluster resource settings and add cluster constraints to prevent the cluster from running any resources on this node and leave it out of node target calculations for cloned resources.
- You can configure multiple additional cluster nodes that are restricted to only serve as quorum votes.
- Ideally you configure a highly available network connection, for example, bonded interfaces on the majority-maker host.
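For example, after you have installed the corosync-qdevice package on the cluster nodes and set up a qnetd host, you can add the quorum vote with a single command. The host name is a placeholder, and ffsplit is a common algorithm choice for clusters with an even number of nodes:
[root]# pcs quorum device add model net host=<qdevice_host> algorithm=ffsplit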
2.6. SAP HANA planning
To prepare the HANA setup you can define a list of parameters that you require for the installation and configuration of the planned environment.
Example SAP HANA configuration parameters are shown below:
| Parameter | Example value |
|---|---|
| HANA SID | RH1 |
| HANA instance number | 02 |
| HANA site 1 name | DC1 |
| HANA site 2 name | DC2 |
| HANA site 1 node 1 FQDN | dc1hana1.example.com |
| HANA site 1 node 2 FQDN | dc1hana2.example.com |
| HANA site 1 node 3 FQDN | dc1hana3.example.com |
| HANA site 1 node 4 FQDN | dc1hana4.example.com |
| HANA site 2 node 1 FQDN | dc2hana1.example.com |
| HANA site 2 node 2 FQDN | dc2hana2.example.com |
| HANA site 2 node 3 FQDN | dc2hana3.example.com |
| HANA site 2 node 4 FQDN | dc2hana4.example.com |
| Majority-maker cluster node | <majority_maker_hostname> |
| HANA DB 'SYSTEM' user password | <HANA_SYSTEM_PASSWORD> |
| SAP system group ID | 10001 |
| SAP system group name | sapsys |
| SAP local administrator user ID | 10200 |
| SAP local administrator user name | sapadm |
| HANA administrative user ID | 10210 |
| HANA administrative user name | rh1adm |
Chapter 3. Installing SAP HANA scale-out for an 8-node HA cluster setup
The examples in the following configuration steps demonstrate the setup on 4 scale-out nodes per HANA site, which results in an installation of 8 HANA nodes.
You can apply the same steps to more scale-out nodes per site. Each HANA site must consist of the same amount of identically configured nodes.
3.1. Managing the firewalld service
On RHEL the firewalld systemd service is enabled by default when installed and starts with a basic configuration.
For your planned SAP landscape you must decide if you want to manage all port and connection requirements in the firewall service on each cluster node, or if this is handled separately in the security design of your network infrastructure. If you do not need to manage a firewall on the operating system level, you must disable the firewalld service on each cluster node. If the local firewall service remains running without the necessary port configuration, it blocks the cluster communication and the connections between your SAP systems.
For your SAP landscape and HA setup to work you must implement one of the following options:
3.1.1. Disabling the firewalld service
The firewalld service is installed and enabled by default as part of the "Server" package group. You must disable it if you do not use it in your network security strategy.
Prerequisites
- You are managing firewall rules outside of the individual host operating systems as part of your security concept.
Procedure
Stop and disable the firewalld service on each cluster node. The --now parameter automatically stops the disabled service. Run this on each system of your planned landscape:
[root]# systemctl disable --now firewalld.service
Verification
Verify that the firewalld service is disabled on each node:
[root]# systemctl status firewalld.service
○ firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; preset: enabled)
     Active: inactive (dead)
       Docs: man:firewalld(1)
3.1.2. Configuring the firewalld service for the SAP landscape
Check the SAP documentation for the Ports and Connections that you have to enable in the firewall for your SAP landscape. Consider all SAP components in your setup that require incoming or outgoing communication and connections between the different hosts in your landscape.
Configure the firewalld service on each of your SAP hosts using the methods that fit your requirements best. Consult Configuring firewalls and packet filters for the details on how to use the firewalld service effectively.
3.2. Configuring the host names in /etc/hosts
For a consistent host name resolution between all systems in your HANA and HA setup we recommend adding them to the /etc/hosts file on each node.
If you configure the HANA Internal Host Name Resolution you must ensure that the /etc/hosts entries for the same host names are consistent with the HANA configuration.
Procedure
Add the host names of all hosts to /etc/hosts on all cluster nodes:
[root]# cat /etc/hosts
...
192.168.100.101 dc1hana1.example.com dc1hana1
192.168.100.102 dc1hana2.example.com dc1hana2
192.168.100.103 dc1hana3.example.com dc1hana3
192.168.100.104 dc1hana4.example.com dc1hana4
192.168.100.121 dc2hana1.example.com dc2hana1
192.168.100.122 dc2hana2.example.com dc2hana2
192.168.100.123 dc2hana3.example.com dc2hana3
192.168.100.124 dc2hana4.example.com dc2hana4
Verification
Check that you can ping the hosts. This step is optional and an example only for a basic verification. The system resolves entries in /etc/hosts when you use the ping command:
[root]# ping dc1hana2.example.com
PING dc1hana2.example.com (192.168.100.102) 56(84) bytes of data.
64 bytes from dc1hana2.example.com (192.168.100.102): icmp_seq=1 ttl=64 time=0.017 ms
…
3.4. Creating the SAP administrative user and group
In a high-availability environment where the highly available service can move between different systems using shared storage, you must configure the service’s user and groups with identical numerical values for their user ID (UID) and group ID (GID). Different IDs for the same service users or groups cause access conflicts and prevent you from switching the service between the cluster nodes.
Prepare the following operating system group:
- sapsys
Prepare the following operating system users:
- sapadm
- <sid>adm, using your target SID
Prerequisites
- You have reserved identical user and group IDs for the required groups and users, for example, in your central identity management system for service users.
Procedure
Create the sapsys group. Use the prepared group ID, for example, ID 10001:
[root]# groupadd -g 10001 sapsys
Create the sapadm user as a member of the sapsys group. The user does not need a login shell. Use the prepared user ID, for example, ID 10200:
[root]# useradd -u 10200 -g sapsys sapadm \
  -c 'SAP Local Administrator' -s /sbin/nologin
Create the <sid>adm user as a member of the sapsys group. Use the prepared user ID, for example, ID 10210 for user rh1adm:
[root]# useradd -u 10210 -g sapsys rh1adm \
  -c 'SAP HANA Administrator' -s /bin/sh
As the user shell, we recommend that you use either /bin/sh or /bin/csh. SAP installations provide user profiles and useful shell aliases for these shells.
- Repeat the steps on all nodes.
Verification
Check that the users sapadm and <sid>adm exist and have the correct groups and IDs configured, for example:
[root]# id sapadm rh1adm
uid=10200(sapadm) gid=10001(sapsys) groups=10001(sapsys)
uid=10210(rh1adm) gid=10001(sapsys) groups=10001(sapsys)
Check that the users have the correct description, home directory, and shell defined:
[root]# grep -E 'sapadm|rh1adm' /etc/passwd
sapadm:x:10200:10001:SAP Local Administrator:/home/sapadm:/sbin/nologin
rh1adm:x:10210:10001:SAP HANA Administrator:/home/rh1adm:/bin/sh
- Repeat the check on all nodes and verify that the names and IDs are identical.
3.5. Configuring SSH public-key access for root for all cluster nodes (optional)
There are steps in the configuration in which you potentially require passwordless root access to the cluster nodes. You can achieve this by setting up SSH public-key authentication between the servers. Whether you can use this depends on your specific HANA setup and the security policies of your company.
Passwordless root access might be needed in the following situations:
- Accessing all hosts of the same HANA site during the database installation. This applies when your HANA site consists of more than one node like in a scale-out setup.
- Accessing the primary site from the secondary site for the HANA system replication configuration.
Procedure
Generate an ssh key pair. When no key type is defined, ssh-keygen creates an Ed25519 key by default, like in the following example for the root user:
[root]# ssh-keygen
Option 1, if you have ssh PasswordAuthentication enabled on the remote system: Use the ssh-copy-id tool to add the ssh public key to the authorized_keys file on the remote system. This automatically creates the .ssh/ directory and authorized_keys file with correct permissions for the target user on the remote system. Run it on the host on which you created the ssh key in the previous step and enter the target user password when prompted:
[root]# ssh-copy-id <remote_system>
In the case of the root user, this only works if the ssh config allows PermitRootLogin and you can provide the root user password in the prompt. Check the ssh configuration settings on the remote system if you face access permission issues even after you have enabled PasswordAuthentication. Consult your security policies before you enable these parameters on your HANA systems.
Option 2, if password login to the target user on the remote host is prohibited or otherwise not possible: Configure the ssh key access on the remote system manually.
Create the .ssh/ directory in the target user’s home path on the remote system, if it does not exist yet. Run this on the remote system, for example, for the root user:
[root]# mkdir /root/.ssh
Change the permissions of the new .ssh/ directory. For security reasons the ssh key access does not work when the permissions are not correct. Run this on the remote system:
[root]# chmod 0700 /root/.ssh
Copy the ssh public key from the .pub file that was created by the previous ssh-keygen, for example, id_ed25519.pub in the default setting:
[root]# cat /root/.ssh/id_ed25519.pub
Add the public key to the authorized_keys file. The command creates the file if it does not exist yet, otherwise it appends the key to the existing content. Run this on the remote system, for example, on dc1hana2:
[root]# cat << EOF >> /root/.ssh/authorized_keys
ssh-ed25519 … root@<node1>
EOF
Ensure that the authorized_keys file has the correct permissions, otherwise the ssh key access is blocked for security reasons:
[root]# chmod 0600 /root/.ssh/authorized_keys
Access each system and log in from any source host to any remote host that you require for the setup. On first login you must accept each new connection once in an interactive prompt. This saves each host and key in the ssh known_hosts file by default.
Option 1: Log in from each host to each other host and accept the key fingerprint once to save it to the local known_hosts file. Subsequent logins to the same host will not require further interaction, unless the key changes. This is a security measure to prevent unsolicited changes of the ssh keys. The following example confirms the authenticity of host dc1hana2:
[root]# ssh dc1hana2
The authenticity of host 'dc1hana2 (***)' can't be established.
ED25519 key fingerprint is SHA256:*********************************.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
…
Option 2: If you configure ssh key access between multiple systems you can use ssh-keyscan to collect the public host key from multiple hosts and save it to the local known_hosts file in a single step per host. Run this on each system for which you distributed the public key and list all remote hosts that you potentially access from this node and user, for example, for the root user on host dc1hana1:
[root]# ssh-keyscan -f - >> /root/.ssh/known_hosts
dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4
<Ctrl-d>
# dc1hana1:22 SSH-2.0-OpenSSH_8.7
dc1hana1 ssh-ed25519 …
# dc1hana2:22 SSH-2.0-OpenSSH_8.7
dc1hana2 ssh-ed25519 …
# dc1hana3:22 SSH-2.0-OpenSSH_8.7
dc1hana3 ssh-ed25519 …
# dc1hana4:22 SSH-2.0-OpenSSH_8.7
dc1hana4 ssh-ed25519 …
# dc2hana1:22 SSH-2.0-OpenSSH_8.7
dc2hana1 ssh-ed25519 …
# dc2hana2:22 SSH-2.0-OpenSSH_8.7
dc2hana2 ssh-ed25519 …
# dc2hana3:22 SSH-2.0-OpenSSH_8.7
dc2hana3 ssh-ed25519 …
# dc2hana4:22 SSH-2.0-OpenSSH_8.7
dc2hana4 ssh-ed25519 …
- -f - allows you to provide a list of hosts on the standard input. Instead of the - you can use a file, which you prepare upfront with the list of hosts. You can also enter a single hostname instead of the -f parameter to collect the key of one host at a time.
In the case of the standard input list you end the input with
Ctrlandd. -
The
>>shell redirection after the scan command directly appends the collected keys to theknown_hostsfile. If the file does not exist yet it is created in the process.
-
Verification
Check the known_hosts entries, for example, on dc1hana1:
[root]# cat /root/.ssh/known_hosts
dc1hana1 ssh-ed25519 ******************************...
dc1hana2 ssh-ed25519 ******************************...
dc1hana3 ssh-ed25519 ******************************...
dc1hana4 ssh-ed25519 ******************************...
dc2hana1 ssh-ed25519 ******************************...
dc2hana2 ssh-ed25519 ******************************...
dc2hana3 ssh-ed25519 ******************************...
dc2hana4 ssh-ed25519 ******************************...
Test the access from each source system to every remote system and ensure that every connection direction that you possibly need works without interactive prompts:
[root]# ssh <remote_system>
3.6. Installing a scale-out SAP HANA instance
A HANA scale-out configuration consists of at least 2 HANA instances per system replication site.
Install the HANA instances with the same SID and instance number on all nodes. The setup of the system replication sites must be identical.
The following installation steps are an example of an interactive installation using the command-line interface. Check the SAP HANA Server Installation and Update Guide for more information about installation options and other details.
Prerequisites
- You have installed and configured RHEL 9 on all cluster nodes according to the Operating system requirements.
- You have prepared the details for your HANA instances, see SAP HANA planning.
- You have followed the SAP software download guides in Software Download, downloaded the SAP HANA installation media from the SAP Software Download Center and the media is available on each node.
- You have verified that you can resolve the host names of the additional nodes of one site from the main node of the site.
- You have verified that you can connect to the additional nodes of one site from the main node using the root user and ssh.
- You have configured a time synchronization service on all nodes. See Configuring time synchronization for details. You have configured your OS or network firewall services to enable all required communication between the HANA systems. See Configuring the firewalld service for the SAP landscape for references.
Procedure
Go to the directory which contains the installation media, for example, /sapmedia/hana:
[root]# cd /sapmedia/hana
Unpack the installation media:
[root]# unzip <sap_hana_software>.ZIP
Go into the path of the unpacked installation media:
[root]# cd /sapmedia/hana/DATA_UNITS/HDB_LCM_LINUX_<arch>
Run the SAP HANA Lifecycle Management tool (HDBLCM) for an interactive installation:
[root]# ./hdblcm
In the interactive mode the installer asks you for all the required information, including the System ID (SID), the instance number, the filesystem locations of the data and log volumes, and more.
In a scale-out installation you run the installer on the main node of one HANA site and provide any additional nodes of the same site as an installation parameter. For example, you run the installer for site 1 on dc1hana1 and add node dc1hana2 as an additional host name when the prompt asks for it.
Optionally you can use the batch mode of the command-line installation tool and provide your configuration parameters in one step. For more details see Use Batch Mode to Perform Platform LCM Tasks in the SAP HANA Server Installation and Update Guide. A minimal batch-mode sketch is shown after this procedure.
- Repeat all steps on the main node of the second site. For the HANA system replication to work you must ensure that each HANA site consists of the same amount of systems with an identical HANA configuration.
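The following batch-mode invocation is only a minimal sketch of the option mentioned in the procedure above; the exact set of hdblcm parameters, the password handling, and the host roles depend on your environment and on the options described in the SAP HANA Server Installation and Update Guide:
[root]# ./hdblcm --batch --action=install --sid=RH1 --number=02 \
  --sapmnt=/hana/shared \
  --addhosts=dc1hana2:role=worker,dc1hana3:role=worker,dc1hana4:role=worker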
Verification
Switch to the <sid>adm user:
[root]# su - rh1adm
Check the HANA instance runtime information as user
<sid>adm:rh1adm$HDB infoUSER PID PPID %CPU VSZ RSS COMMAND rh1adm 12525 12524 0.2 8836 5568 -sh rh1adm 12584 12525 0.0 7520 3968 \_ /bin/sh /usr/sap/RH1/HDB02/HDB info rh1adm 12613 12584 0.0 10104 3484 \_ ps fx -U rh1adm -o user:8,pid:8,ppid:8,pcpu:5,vsz:10,rss:10,args rh1adm 8813 1 0.0 566804 41000 hdbrsutil --start --port 30203 --volume 3 … rh1adm 8124 1 0.0 566724 40972 hdbrsutil --start --port 30201 --volume 1 … rh1adm 7947 1 0.0 9312 3352 sapstart pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 rh1adm 7955 7947 0.0 460036 89176 \_ /usr/sap/RH1/HDB02/dc1hana1/trace/hdb.sapRH1_HDB02 -d -nw -f /usr/sap/RH1/HDB02/dc1hana1/daemon.ini pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 rh1adm 7981 7955 26.1 18612328 14092076 \_ hdbnameserver rh1adm 8642 7955 0.5 1465380 212048 \_ hdbcompileserver rh1adm 8645 7955 294 6616736 6049012 \_ hdbpreprocessor rh1adm 8687 7955 33.9 18931580 14929092 \_ hdbindexserver -port 30203 rh1adm 8690 7955 2.0 5073572 1390440 \_ hdbxsengine -port 30207 rh1adm 9202 7955 0.8 2772836 482088 \_ hdbwebdispatcher rh1adm 7782 1 0.1 566772 58444 /usr/sap/RH1/HDB02/exe/sapstartsrv pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 root 11868 7782 0.1 10464 4644 \_ sapuxuserchk 0 128Verify as
<sid>admon all sites that the HANA instances are running on all nodes in the site and their status isGREENin the instance list, for example, on site 1:rh1adm$sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceListhostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus dc1hana4, 2, 50213, 50214, 0.3, HDB|HDB_STANDBY, GREEN dc1hana1, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN dc1hana3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN dc1hana2, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREENAdditionally, you can verify the
landscapeHostConfiguration.pyoutput for statusok:rh1adm$cdpy; python landscapeHostConfiguration.py| Host | Host | Host | Failover | Remove | Storage | Storage | Failover | Failover | NameServer | NameServer | IndexServer | IndexServer | Host | Host | Worker | Worker | | | Active | Status | Status | Status | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual | | | | | | | Partition | Partition | Group | Group | Role | Role | Role | Role | Roles | Roles | Groups | Groups | | -------- | ------ | ------ | -------- | ------ | --------- | --------- | -------- | -------- | ---------- | ---------- | ----------- | ----------- | ------- | ------- | ------- | ------- | | dc1hana1 | yes | ok | | | 1 | 1 | default | default | master 1 | master | worker | master | worker | worker | default | default | | dc1hana2 | yes | ok | | | 2 | 2 | default | default | master 2 | slave | worker | slave | worker | worker | default | default | | dc1hana3 | yes | ok | | | 3 | 3 | default | default | slave | slave | worker | slave | worker | worker | default | default | | dc1hana4 | yes | ignore | | | 0 | 0 | default | default | master 3 | slave | standby | standby | standby | standby | default | - | overall host status: okCheck that the systemd units are installed for the HANA instance and the SAP Host Agent:
[root]# systemctl list-unit-files --all sap* SAP*UNIT FILE STATE PRESET sapmedia.mount generated - saphostagent.service enabled disabled sapinit.service generated - SAPRH1_02.service enabled disabled SAP.slice static - 5 unit files listed.-
Repeat the steps on all nodes. Note that the HANA profiles contain the individual node name in the format
<SID>_HDB<instance>_<node>.
3.7. Disabling SAP HANA instance autostart
The cluster controls startup and shutdown of the HANA instance in a HA cluster setup. You must configure the HANA instance profile to not automatically start the instance itself.
Procedure
Go to the HANA instance profile directory:
[root]# cd /hana/shared/<SID>/profile
Edit the instance profile:
[root]# vi <SID>_HDB<instance>_<hostname>
Ensure that Autostart is set to 0.
- Repeat the previous steps for each HANA instance that will be managed as part of the HA cluster.
Verification
Check that Autostart = 0 is set in the instance profiles of all HANA instances that will be managed by the HA cluster:
[root]# grep Autostart /hana/shared/RH1/profile/*
/hana/shared/RH1/profile/RH1_HDB02_dc1hana1:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana2:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana3:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana4:Autostart = 0
Chapter 4. Configuring the SAP HANA system replication
You must configure and test the SAP HANA system replication before you can configure the HANA instance in a cluster. Follow the SAP guidelines for the HANA system replication setup: SAP HANA System Replication: Configuration.
4.1. Prerequisites for the SAP HANA system replication setup
SAP HANA configuration
SAP HANA must be installed and configured identically on the system replication sites.
Host name resolution
All hosts must be able to resolve the host names and fully qualified domain names (FQDN) of all HANA systems. To ensure that all host names can be resolved even without DNS you can place them into /etc/hosts. This is also recommended for hosts configured in HA clusters in general.
In addition you can manage host name resolution in SAP HANA internally. For more details see Internal Host Name Resolution and Host Name Resolution for System Replication. Ensure that the HANA internal host names and /etc/hosts entries are consistent.
As documented at hostname | SAP Help Portal, SAP HANA only supports hostnames with lowercase characters.
SAP HANA log_mode
For the system replication to work, you must set the SAP HANA log_mode variable to normal, which is the default value.
Verify the current log_mode as the HANA administrative user <sid>adm on both nodes:
rh1adm$ hdbsql -u system -i ${TINSTANCE} \
"select value from "SYS"."M_INIFILE_CONTENTS" where key='log_mode'"
Password: <HANA_SYSTEM_PASSWORD>
VALUE "normal"
1 row selected
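If log_mode is not set to normal, you can change it as the <sid>adm user, for example with the statement below; this is only a sketch, and after switching from overwrite to normal you must create a new full data backup:
rh1adm$ hdbsql -u system -i ${TINSTANCE} -d SYSTEMDB \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('persistence', 'log_mode') = 'normal' WITH RECONFIGURE"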
4.2. Performing an initial HANA database backup
You can only enable the HANA system replication when an initial backup of the SAP HANA database exists on the primary site for the planned SAP HANA system replication setup.
You can use SAP HANA tools to create the backup and skip the manual procedure. See SAP HANA Administration Guide - SAP HANA Database Backup and Recovery for more information.
Prerequisites
- You have a writable directory to which the backup files are saved for the SAP HANA administrative user <sid>adm.
- You have sufficient free space available in the filesystem on which the backup files are stored.
Procedure
Optional: Create a dedicated directory for the backup in a suitable path, for example:
[root]# mkdir <path>/<SID>-backup
Replace <path> with a path on your system, which has enough free space for the initial backup files.
Change the owner of the backup path to user <sid>adm if the target directory is not already owned or writable by the HANA user, for example:
[root]# chown <sid>adm:sapsys <path>/<SID>-backup
Change to the <sid>adm user for the remaining steps:
[root]# su - <sid>adm
Create a backup of the SYSTEMDB as the <sid>adm user. Specify the path to the files the backups will be stored in. Ensure that the target filesystem has enough free space left, then create the backup:
rh1adm$ hdbsql -i ${TINSTANCE} -u system -d SYSTEMDB \
  "BACKUP DATA USING FILE ('<path>/${SAPSYSTEMNAME}-backup/bkp-SYS')"
Password: <HANA_SYSTEM_PASSWORD>
$TINSTANCE and $SAPSYSTEMNAME are environment variables that are part of the <sid>adm user shell environment. $TINSTANCE is the instance number and $SAPSYSTEMNAME is the SID. Both are automatically set to the instance values related to the <sid>adm user.
Replace
<path>with the path where the<sid>admuser has write access and where there is enough free space left.
-
Create a backup of all tenant databases as the
<sid>admuser. Specify the path to the files the backups will be stored in. Ensure that the target filesystem has enough free space left. Create the tenant DB backup:rh1adm$ hdbsql -i ${TINSTANCE} -u system -d SYSTEMDB \ "BACKUP DATA FOR ${SAPSYSTEMNAME} USING FILE ('<path>/${SAPSYSTEMNAME}-backup/bkp-${SAPSYSTEMNAME}')" Password: <HANA_SYSTEM_PASSWORD>Replace
<path>with the path where the<sid>admuser has write access and where there is enough free space left.
Verification
List the resulting backup files. Example when using
/hana/log/RH1-backupas the directory to store the initial backup:rh1adm$ ls -lh /hana/log/RH1-backup/total 7.4G -rw-r-----. 1 rh1adm sapsys 156K Dec 9 16:13 bkp-RH1_databackup_0_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_2_1 -rw-r-----. 1 rh1adm sapsys 3.6G Dec 9 16:13 bkp-RH1_databackup_3_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_4_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_5_1 -rw-r-----. 1 rh1adm sapsys 172K Dec 9 16:11 bkp-SYS_databackup_0_1 -rw-r-----. 1 rh1adm sapsys 3.6G Dec 9 16:12 bkp-SYS_databackup_1_1Use the HANA command
hdbbackupcheckto confirm the sanity of each backup file you created:rh1adm$ for i in $(ls /hana/log/RH1-backup/*); do hdbbackupcheck $i; doneLoaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_0_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_2_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_3_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_4_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_5_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-SYS_databackup_0_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-SYS_databackup_1_1' successfully checked.
Troubleshooting
The backup fails when the
<sid>admuser is not able to write to the target directory:* 447: backup could not be completed: [2001003] createDirectory(path= '/tmp/RH1-backup/', access= rwxrwxr--, recursive= true): Permission denied (rc= 13, 'Permission denied') SQLSTATE: HY000Ensure that the
<sid>admuser can create files inside of the target directory you define in the backup command. Fix the permissions, for example using step 2 of the procedure.The backup fails because the target filesystem runs out of space:
* 447: backup could not be completed: [2110001] Generic stream error: $msg$ - , rc=$sysrc$: $sysmsg$. Failed to process item 0x00007fc5796e0000 - '<root>/.bkp-RH1_databackup_3_1' ((open, mode= W, file_access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE, size= 4096), factory= (root= '/tmp/RH1-backup/' (root_access= rwxr-x---, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= DATA_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64)) SQLSTATE: HY000Check the free space of the filesystem on which the target directory is located. Increase the filesystem size or choose a different path with enough free space available for the backup files.
4.3. Configuring the primary HANA replication site
Enable the HANA system replication on the system that you plan to use as the initial primary site of your system replication setup.
Prerequisites
- You have created an initial backup for the HANA database on the primary site, based on the steps described in Performing an initial HANA database backup.
Procedure
Enable the system replication on the HANA site that becomes the initial primary. Run the command as <sid>adm on the first, or primary, node:
rh1adm$ hdbnsutil -sr_enable --name=<site1>
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
- Replace <site1> with your primary HANA site name, for example, DC1.
Verification
Check the system replication configuration as <sid>adm, and verify that it shows the current node as mode: primary, and that site id and site name are populated with the primary site information:
rh1adm$ hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
done.
4.4. Configuring the secondary HANA replication site
You must register the secondary HANA site to the primary site to complete the setup of the HANA system replication.
Prerequisites
- You have installed SAP HANA on the secondary nodes using the same SID and instance number as the primary instances.
- You have configured SSH public-key access for the root user between the cluster nodes.
- You have configured the firewall rules in your network infrastructure or on each host operating system to allow the connections that your HANA landscape requires for the system replication connection.
- You have opened 2 terminals on a secondary node, for example, on dc2hana1, with one terminal for the root user and one for the <sid>adm user.
Procedure
Stop the secondary HANA instances. Run as the
<sid>admuser on one secondary instance, for example, ondc2hana1:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDBRun this and the following steps only on one node on the secondary site, for example, on
dc2hana1. Change to the directory on the shared filesystem where HANA stores the keys of the system replication encryption on one node of the secondary site:[root]# cd /hana/shared/<SID>/global/security/rsecssfsCopy the HANA system PKI file
SSFS_<SID>.KEYfrom the primary HANA site to the secondary site on one secondary node:[root]# rsync -av <node1>:$PWD/key/SSFS_<SID>.KEY key/-
Replace
<node1>with a primary node, for example,dc1hana1. -
Replace
<SID>with your HANA SID, for example,RH1.
-
Replace
Copy the PKI file
SSFS_<SID>.DATfrom the primary HANA site to the secondary site on one secondary node in the same way as the previous step:[root]# rsync -av <node1>:$PWD/data/SSFS_<SID>.DAT data/Register the secondary HANA site to the primary site. Run this in the
<sid>admuser terminal on a secondary node:rh1adm$ hdbnsutil -sr_register --remoteHost=<node1> \ --remoteInstance=${TINSTANCE} --replicationMode=sync \ --operationMode=logreplay --name=<site2>adding site ... collecting information ... updating local ini files ... done.-
Replace
<node1>with a primary node, for example,dc1hana1. -
Replace
<site2>with your secondary HANA site name, for example,DC2. -
Choose the values for
replicationModeandoperationModeaccording to your requirements for the system replication. -
$TINSTANCEis an environment variable that is set automatically for user<sid>admby reading the HANA instance profile. The variable value is the HANA instance number.
-
Replace
Start the secondary HANA instances. Run as
<sid>admon one HANA instance on the secondary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
Check that the system replication is running on the secondary site and the mode matches the value you used for the replicationMode parameter in the hdbnsutil -sr_register command. Run as <sid>adm on one node on the secondary site, for example, dc2hana1:
rh1adm$ hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: sync
site id: 2
site name: DC2
active primary site: 1
primary masters: dc1hana1 dc1hana2 dc1hana4
<sid>adm, on both sites. The easiest way to do this is to usecdpy, which is an alias built into the<sid>admuser shell that SAP HANA populates during the instance installation:rh1adm$ cdpyIn our example this command changes the current directory to
/usr/sap/RH1/HDB02/exe/python_support/.Show the current status of the established HANA system replication, on both HANA sites.
On the primary site the system replication status is always displayed with all details:
rh1adm$ python systemReplicationStatus.py|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ | |RH1 |dc1hana3 |30203 |indexserver | 5 | 1 |DC1 |dc2hana3 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana2 |30203 |indexserver | 4 | 1 |DC1 |dc2hana2 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |SYSTEMDB |dc1hana1 |30201 |nameserver | 1 | 1 |DC1 |dc2hana1 | 30201 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |30207 |xsengine | 2 | 1 |DC1 |dc2hana1 | 30207 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |30203 |indexserver | 3 | 1 |DC1 |dc2hana1 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | status system replication site "2": ACTIVE overall system replication status: ACTIVE Local System Replication State ~~~~~~~~~~ mode: PRIMARY site id: 1 site name: DC1-
ACTIVE means that the HANA system replication is in a healthy state and fully synced.
SYNCING is displayed while the system replication is being updated on the secondary site, for example after a takeover of the secondary site to become the new primary.
INITIALIZING is shown after the system replication has first been enabled or a full sync has been triggered.
-
On the secondary site, the output of systemReplicationStatus.py is less detailed. Check the status on one secondary node:
rh1adm$ python systemReplicationStatus.py
this system is either not running or not primary system replication site

Local System Replication State
~~~~~~~~~~
mode: SYNC
site id: 2
site name: DC2
active primary site: 1
primary masters: dc1hana1 dc1hana2 dc1hana4
4.5. Testing the HANA system replication
We recommend that you test the HANA system replication thoroughly before you proceed with the cluster setup. The verification of the correct system replication behavior can help to prevent unexpected results when the HA cluster manages the system replication afterwards.
Use timeout values in the cluster resource configuration that cover the measured times of the different tests to ensure that cluster resource operations do not time out prematurely.
You can also test different parameter values in the HANA configuration to optimize the performance by measuring the time that certain activities take when performed manually outside of cluster control.
Perform the tests with realistic data loads and sizes.
Full replication
- How long does the synchronization take after the newly registered secondary is started until it is in sync with the primary?
- Are there parameters which can improve the synchronization time?
Lost connection
- How long does it take after the connection was lost between the primary and the secondary site, until they are in sync again?
- Are there parameters which can improve the reconnection and sync times?
Takeover
- How long does the secondary site take to be fully available after a takeover from the primary?
- What is the time difference between a normal takeover and a "takeover with handshake"?
- Are there parameters which can improve the takeover time?
Data consistency
- Is the data you create available and correct after you perform a takeover?
Client reconnect
- Can the client reconnect to the new primary site after a takeover?
- How long does it take for the client to access the new primary after a takeover?
Primary becomes secondary
- How long does it take a former primary until it is in sync again with the new primary, after it is registered as a new secondary?
- If configured, how long does it take until a client can access the newly registered secondary for read operations?
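To measure the takeover time outside of cluster control, you can, for example, wrap the manual takeover command with time on the coordinator node of the secondary site. This is only an illustrative sketch and must not be used once the cluster manages the system replication:
rh1adm$ time hdbnsutil -sr_takeover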
Chapter 5. Configuring the Pacemaker cluster
5.1. Deploying the basic cluster configuration
The following basic cluster setup covers the minimum steps to get started with the Pacemaker cluster setup for managing SAP instances. Apply the steps to include all the nodes according to your planned cluster configuration.
For more information on settings and options for complex configurations, refer to the documentation for the RHEL HA Add-On, for example, Create a high availability cluster with multiple links.
Prerequisites
- You have set up the HANA system replication environment and verified that it is working correctly.
- You have configured the RHEL High Availability repository on all systems that are going to be nodes of this cluster.
- You have verified fencing and quorum requirements according to your planned environment. For more details see HA cluster requirements.
Procedure
Install the Red Hat High Availability Add-On software packages from the High Availability repository. Choose which fence agents you want to install and execute the installation on all cluster nodes.
[root]# dnf install pcs pacemaker fence-agents-<model>
Start and enable the pcsd service on all cluster nodes. The --now parameter automatically starts the enabled service:
[root]# systemctl enable --now pcsd.service
Optional: If you use the local firewalld service you must enable the ports that are required by the Red Hat High Availability Add-On. Run this on all cluster nodes:
[root]# firewall-cmd --add-service=high-availability
[root]# firewall-cmd --runtime-to-permanent
Set a password for the user hacluster. Repeat the command on each node using the same password:
[root]# passwd hacluster
Authenticate the user hacluster for each node in the cluster. Run this on one node and apply all cluster node names, for example, dc1hana1 to dc2hana4:
[root]# pcs host auth <node1> … <node8>
Username: hacluster
Password:
dc1hana2: Authorized
dc2hana1: Authorized
dc1hana4: Authorized
dc1hana3: Authorized
dc2hana2: Authorized
dc1hana1: Authorized
dc2hana4: Authorized
dc2hana3: Authorized
- Enter the node names with or without FQDN, as defined in the /etc/hosts file.
Enter the
haclusteruser password in the prompt.
Create the cluster with a unique name and provide the names of all cluster members with fully qualified host names. This propagates the cluster configuration on all nodes and starts the cluster with the defined cluster name. Run this command on one node and apply all cluster node names, for example,
dc1hana1todc2hana4:[root]# pcs cluster setup <cluster_name> --start <node1> … <node8>No addresses specified for host 'dc1hana1', using 'dc1hana1' No addresses specified for host 'dc1hana2', using 'dc1hana2' No addresses specified for host 'dc1hana3', using 'dc1hana3' No addresses specified for host 'dc1hana4', using 'dc1hana4' No addresses specified for host 'dc2hana1', using 'dc2hana1' No addresses specified for host 'dc2hana2', using 'dc2hana2' No addresses specified for host 'dc2hana3', using 'dc2hana3' No addresses specified for host 'dc2hana4', using 'dc2hana4' Destroying cluster on hosts: 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4'... dc1hana2: Successfully destroyed cluster dc2hana1: Successfully destroyed cluster dc2hana4: Successfully destroyed cluster dc2hana2: Successfully destroyed cluster dc1hana3: Successfully destroyed cluster dc1hana1: Successfully destroyed cluster dc2hana3: Successfully destroyed cluster dc1hana4: Successfully destroyed cluster Requesting remove 'pcsd settings' from 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful removal of the file 'pcsd settings' dc1hana2: successful removal of the file 'pcsd settings' dc1hana3: successful removal of the file 'pcsd settings' dc1hana4: successful removal of the file 'pcsd settings' dc2hana1: successful removal of the file 'pcsd settings' dc2hana2: successful removal of the file 'pcsd settings' dc2hana3: successful removal of the file 'pcsd settings' dc2hana4: successful removal of the file 'pcsd settings' Sending 'corosync authkey', 'pacemaker authkey' to 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful distribution of the file 'corosync authkey' dc1hana1: successful distribution of the file 'pacemaker authkey' dc1hana2: successful distribution of the file 'corosync authkey' dc1hana2: successful distribution of the file 'pacemaker authkey' dc1hana3: successful distribution of the file 'corosync authkey' dc1hana3: successful distribution of the file 'pacemaker authkey' dc1hana4: successful distribution of the file 'corosync authkey' dc1hana4: successful distribution of the file 'pacemaker authkey' dc2hana1: successful distribution of the file 'corosync authkey' dc2hana1: successful distribution of the file 'pacemaker authkey' dc2hana2: successful distribution of the file 'corosync authkey' dc2hana2: successful distribution of the file 'pacemaker authkey' dc2hana3: successful distribution of the file 'corosync authkey' dc2hana3: successful distribution of the file 'pacemaker authkey' dc2hana4: successful distribution of the file 'corosync authkey' dc2hana4: successful distribution of the file 'pacemaker authkey' Sending 'corosync.conf' to 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful distribution of the file 'corosync.conf' dc1hana2: successful distribution of the file 'corosync.conf' dc1hana3: successful distribution of the file 'corosync.conf' dc1hana4: successful distribution of the file 'corosync.conf' dc2hana1: successful distribution of the file 'corosync.conf' dc2hana2: successful distribution of the file 'corosync.conf' dc2hana3: successful distribution of the file 'corosync.conf' dc2hana4: successful distribution of the file 'corosync.conf' Cluster has been successfully set up. 
Starting cluster on hosts: 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4'...Enable the cluster to start automatically on system start on all cluster nodes, which enables the
corosyncandpacemakerservices. Skip this step if you prefer to manually control the start of the cluster after a node restarts. Run on one node:[root]# pcs cluster enable --alldc1hana1: Cluster Enabled dc1hana2: Cluster Enabled dc1hana3: Cluster Enabled dc1hana4: Cluster Enabled dc2hana1: Cluster Enabled dc2hana2: Cluster Enabled dc2hana3: Cluster Enabled dc2hana4: Cluster Enabled
Verification
Check the cluster status after the initial configuration. Verify that it shows all cluster nodes as
Onlineand the status of all cluster daemons isactive/enabled:[root]# pcs status --fullCluster name: hana-scaleout-cluster WARNINGS: No stonith devices and stonith-enabled is not false Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-10 13:47:29Z) Cluster Summary: * Stack: corosync * Current DC: dc1hana4 (4) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum * Last updated: Wed Dec 10 13:47:30 2025 * Last change: Wed Dec 10 13:45:23 2025 by hacluster via crmd on dc1hana4 * 8 nodes configured * 0 resource instances configured Node List: * Node dc1hana1 (1): online, feature set 3.16.2 * Node dc1hana2 (2): online, feature set 3.16.2 * Node dc1hana3 (3): online, feature set 3.16.2 * Node dc1hana4 (4): online, feature set 3.16.2 * Node dc2hana1 (5): online, feature set 3.16.2 * Node dc2hana2 (6): online, feature set 3.16.2 * Node dc2hana3 (7): online, feature set 3.16.2 * Node dc2hana4 (8): online, feature set 3.16.2 Full List of Resources: * No resources Migration Summary: Tickets: PCSD Status: dc1hana1: Online dc1hana2: Online dc1hana3: Online dc1hana4: Online dc2hana1: Online dc2hana2: Online dc2hana3: Online dc2hana4: Online Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled
Next steps
- Configure a fencing method to enable the STONITH mechanism. See Configuring fencing in a Red Hat High Availability cluster.
- Test the fencing setup before you proceed with further configuration of the cluster. For more information, see How to test fence devices and fencing configuration in a Red Hat High Availability cluster?
- Configure a quorum device, see Configuring a quorum device in the cluster.
5.2. Configuring general cluster properties
You must adjust cluster resource defaults to avoid unnecessary failovers of the resources.
Procedure
Run the following command on one cluster node to update the default values of the
resource-stickinessandmigration-thresholdparameters:[root]# pcs resource defaults update \ resource-stickiness=1000 \ migration-threshold=5000-
resource-stickiness=1000 encourages the resource to stay running where it is. This prevents the cluster from moving resources around based on small changes in internal placement scores.
migration-threshold=5000 allows the resource to be restarted on the same node for up to 5000 failures. After this limit is exceeded, the resource is banned from the node until the failure count is cleared. This allows the resource to recover in place from occasional failures while an administrator investigates the cause of the repeated failures and resets the counter; a cleanup sketch follows after this list.
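When a resource has hit its failure limit and you have resolved the underlying problem, you can clear the failure count so that the node becomes eligible for the resource again. A minimal sketch with the standard pcs commands; replace <resource_id> with the name of the affected resource:
[root]# pcs resource failcount show <resource_id>
[root]# pcs resource cleanup <resource_id>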
Verification
Check that the resource defaults are set:
[root]# pcs resource defaultsMeta Attrs: build-resource-defaults migration-threshold=5000 resource-stickiness=1000
5.3. Configuring the systemd-based SAP startup framework
Systemd integration is the default behavior of SAP HANA installations on RHEL 9 for SAP HANA 2.0 SPS07 revision 70 and newer. In HA environments you must apply additional modifications to integrate the different systemd services that are involved in the cluster setup.
Configure the pacemaker systemd service so that it is started and stopped in the correct order relative to the HANA instance systemd service on all cluster nodes that run HANA instances.
Prerequisites
You have installed the HANA instances with systemd integration and have checked the systemd integration on all HANA nodes, for example:
[root]# systemctl list-units --all SAP*UNIT LOAD ACTIVE SUB DESCRIPTION SAPRH1_02.service loaded active running SAP Instance SAPRH1_02 SAP.slice loaded active active SAP Slice ...
Procedure
Create the directory
/etc/systemd/system/pacemaker.service.d/for the pacemaker service drop-in file:[root]# mkdir /etc/systemd/system/pacemaker.service.d/Create the systemd drop-in file for the pacemaker service with the following content:
[root]# cat << EOF > /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf [Unit] Description=Pacemaker needs the SAP HANA instance service Wants=SAP<SID>_<instance>.service After=SAP<SID>_<instance>.service EOF-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Reload the
systemctldaemon to activate the drop-in file:[root]# systemctl daemon-reload- Repeat steps 1-3 on the other HANA cluster nodes.
Verification
Check the systemd service of your HANA instance and verify that it is
loaded:[root]# systemctl status SAPRH1_02.service● SAPRH1_02.service - SAP Instance SAPRH1_02 Loaded: loaded (/etc/systemd/system/SAPRH1_02.service; disabled; preset: disabled) Active: active (running) since xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Main PID: 5825 (sapstartsrv) Tasks: 841 Memory: 88.6G CPU: 4h 50min 2.033s CGroup: /SAP.slice/SAPRH1_02.service ├─ 5825 /usr/sap/RH1/HDB02/exe/sapstartsrv pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ├─ 5986 sapstart pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ├─ 5993 /usr/sap/RH1/HDB02/dc1hana1/trace/hdb.sapRH1_HDB02 -d -nw -f /usr/sap/RH1/HDB02/dc1hana1/daemon.ini pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ...Verify that the SAP HANA instance service is known to the pacemaker service now:
[root]# systemctl show pacemaker.service | grep -E 'Wants=|After=|SAP.{6}.service'Wants=SAPRH1_02.service resource-agents-deps.target dbus-broker.service After=... SAPRH1_02.service …Make sure that the
SAP<SID>_<instance>.serviceis listed in theAfter=andWants=lists.
5.4. Installing the SAP HANA HA components
The resource-agents-sap-hana-scaleout RPM package in the Red Hat Enterprise Linux 9 for <arch> - SAP Solutions (RPMs) repository provides the resource agents and other SAP HANA specific components for setting up an HA cluster that manages a HANA system replication setup.
Procedure
Install the
resource-agents-sap-hana-scaleoutpackage on all cluster nodes:[root]# dnf install resource-agents-sap-hana-scaleout
Verification
Check on all nodes that the package is installed, for example:
[root]# rpm -q resource-agents-sap-hana-scaleoutresource-agents-sap-hana-scaleout-0.185.3-0.el9_2.noarch
5.5. Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method
When you configure the HANA instance in a HA cluster setup with SAP HANA 2.0 SPS0 or later, you must enable and test the SAP HANA srConnectionChanged() hook method before proceeding with the cluster setup.
Prerequisites
-
You have installed the
resource-agents-sap-hana-scaleoutpackage. - Your HANA instance is not yet managed by the cluster. Otherwise, use the maintenance procedure Performing maintenance on the SAP HANA instances to make sure that the cluster does not interfere during the configuration of the hook scripts.
Procedure
Stop the HANA instances on all nodes. Run this as the
<sid>admuser on one HANA instance per site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDBVerify as
<sid>admon all sites that the HANA instances are stopped completely and their status isGRAYin the instance list. Run this on one host on each site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceListGetSystemInstanceList OK hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus dc1hana2, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana1, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana4, 2, 50213, 50214, 0.3, HDB|HDB_STANDBY, GRAYChange to the HANA configuration directory, as the
<sid>admuser, using the command aliascdcoc, which is built into the<sid>admuser shell. This automatically changes into the/hana/shared/<SID>/global/hdb/custom/config/path:rh1adm$ cdcocUpdate the
global.inifile of the SAP HANA site to configure theSAPHanaSRhook. Edit the configuration file on one node of each HANA site and add the following configuration:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1 [trace] ha_dr_saphanasr = infoSet
execution_orderto1to ensure that theSAPHanaSRhook is always executed with the highest priority.Due to the shared
/hana/sharedfilesystem between the nodes of each HANA site, you only adjust the configuration once per site. Do not try to edit the same file on the shared filesystem simultaneously on more than one node of the same site.Optional: When you also want to configure the optional
ChkSrvhook for taking action on anhdbindexserverfailure, you can add the changes to theglobal.iniat the same time, see step 1 in Configuring the ChkSrv HA/DR provider for the srServiceStateChanged() hook method:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1 [ha_dr_provider_chksrv] provider = ChkSrv path = /usr/share/SAPHanaSR-ScaleOut execution_order = 2 action_on_lost = stop [trace] ha_dr_saphanasr = info ha_dr_chksrv = infoCreate the file
/etc/sudoers.d/20-saphana, as therootuser, on each cluster node with the following content. These command privileges allow the<sid>admuser to update certain cluster node attributes as part of the SAPHanaSR hook execution:[root]# visudo -f /etc/sudoers.d/20-saphanaDefaults:<sid>adm !requiretty <sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_*-
Replace <sid> with the lower-case HANA SID.
For further information on why the Defaults setting is needed, refer to The srHook attribute is set to SFAIL in a Pacemaker cluster managing SAP HANA system replication, even though replication is in a healthy state.
Start the HANA instances on all cluster nodes manually without starting the HA cluster. Run as
<sid>admon one HANA instance per site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
Change to the SAP HANA directory, as the
<sid>admuser, where trace log files are stored. Use the command aliascdtrace, which is built into the<sid>admuser shell:rh1adm$ cdtraceCheck the HANA nameserver process logs for the HA/DR provider loading message:
rh1adm$ grep -he "loading HA/DR Provider.*" nameserver_*If only the
SAPHanaSRprovider is configured:loading HA/DR Provider 'SAPHanaSR' from /usr/share/SAPHanaSR-ScaleOutIf the optional
ChkSrvprovider is also implemented:loading HA/DR Provider 'ChkSrv' from /usr/share/SAPHanaSR-ScaleOut loading HA/DR Provider 'SAPHanaSR' from /usr/share/SAPHanaSR-ScaleOut
Verify as user
rootin the system secure log that thesudocommand executed successfully. If the sudoers file is not correct, an error is logged when thesudocommand is executed. Check this on the primary node on which the HANA master nameserver is running, for example,dc1hana1:[root]# grep -e 'sudo.*crm_attribute.*' /var/log/securesudo[17141]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_gsh -v 1.0 -l reboot sudo[17160]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR … sudo[17584]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_glob_srHook -v SOK -t crm_config -s SAPHanaSRAfter the HANA instance starts on both nodes, you can usually see several
srHookattribute updates. At first it is settingSFAIL, because immediately after the primary starts, it is not yet in sync with the secondary, which is still synchronizing at this time.The last update to
SOKis triggered by the HANA event after the system replication status finally changes to fully in sync.-
Repeat the verification steps 1-2 on the second site, if not already done at the same time. The
sudolog messages of step 3 are only visible on the primary instance coordinator nameserver node, on which the system replication events are logged. Check the cluster attributes on any node and verify that the value for the
hana_<sid>_glob_srHookattribute is updated as expected:[root]# cibadmin --query | grep -e 'SAPHanaSR.*srHook'<nvpair id="SAPHanaSR-hana_rh1_glob_srHook" name="hana_rh1_glob_srHook" value="SOK"/>-
SOKis set when the HANA system replication is inACTIVEstate, which means established and fully in sync. -
SFAILis set when the system replication is in any other state.
Troubleshooting
5.6. Configuring the ChkSrv HA/DR provider for the srServiceStateChanged() hook method
You can configure the hook ChkSrv if you want the HANA instance to be stopped or killed for faster recovery after an indexserver process has failed. This configuration is optional.
Prerequisites
-
You have installed the
resource-agents-sap-hana-scaleoutpackage. -
You have configured the
SAPHanaSRHA/DR provider. For more information see Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method.
Procedure
Change to the HANA configuration directory as the
<sid>admuser. Use the command aliascdcoc, which is built into the<sid>admuser shell. This automatically changes into the/hana/shared/<SID>/global/hdb/custom/config/path:rh1adm$ cdcocUpdate the
global.ini file of the HANA site to configure the hook script. Edit the configuration file on one node of each HANA site and add the following content in addition to the SAPHanaSR provider definition:[ha_dr_provider_chksrv] provider = ChkSrv path = /usr/share/SAPHanaSR-ScaleOut execution_order = 2 action_on_lost = stop [trace] ha_dr_saphanasr = info ha_dr_chksrv = infoDue to the shared
/hana/sharedfilesystem between the nodes of each HANA site, you only adjust the configuration once per site. Do not try to edit the same file on the shared filesystem simultaneously on more than one node of the same site.Optional: Activate the
ChkSrv provider while HANA is running by reloading the HA/DR providers. Skip this step when you configure the hook script while the instance is down; in that case, the HA/DR provider is loaded automatically at the next instance start.rh1adm$ hdbnsutil -reloadHADRProviders
Verification
Change to the SAP HANA directory, as the
<sid>admuser, where trace log files are stored. Use the command aliascdtrace, which is built into the<sid>admuser shell:rh1adm$ cdtraceCheck that the changes are loaded:
rh1adm$ grep -e "loading HA/DR Provider.*ChkSrv.*" nameserver_*loading HA/DR Provider 'ChkSrv' from /usr/share/SAPHanaSR-ScaleOutCheck that the dedicated trace log file is created and the provider loaded with the correct configuration parameters:
rh1adm$ cat nameserver_chksrv.trcinit called ChkSrv.init() version 0.7.8, parameter info: action_on_lost=stop stop_timeout=20 kill_signal=9 …
Troubleshooting
5.7. Creating the HANA cluster resources
You must configure the SAPHanaTopology and SAPHanaController resources so that the cluster can collect the status of the HANA landscape, monitor the instance health and take action to manage the instance when required.
Prerequisites
- You have installed the cluster and you have configured all HANA nodes in the cluster.
- You have configured the HANA system replication between your HANA sites.
- All HANA instances are running and the system replication is healthy.
Procedure
Create the
SAPHanaTopologyresource as a clone resource, which means it runs on all cluster nodes at the same time:[root]# pcs resource create rsc_SAPHanaTop_<SID>_HDB<instance> \ ocf:heartbeat:SAPHanaTopology \ SID=<SID> \ InstanceNumber=<instance> \ op start timeout=600 \ op stop timeout=300 \ op monitor interval=30 timeout=300 \ clone cln_SAPHanaTop_<SID>_HDB<instance>-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Update the meta attributes of the new
SAPHanaTopologyclone resource:[root]# pcs resource update cln_SAPHanaTop_<SID>_HDB<instance> \ meta clone-node-max=1 interleave=trueCreate the
SAPHanaControllerresource as a promotable clone resource. This means it runs on all cluster nodes at the same time, but on one node it functions as the active, or primary, resource:[root]# pcs resource create rsc_SAPHanaCon_<SID>_HDB<instance> \ ocf:heartbeat:SAPHanaController \ SID=<SID> \ InstanceNumber=<instance> \ PREFER_SITE_TAKEOVER=true \ DUPLICATE_PRIMARY_TIMEOUT=7200 \ AUTOMATED_REGISTER=false \ op stop timeout=3600 \ op monitor interval=59 role=Promoted timeout=700 \ op monitor interval=61 role=Unpromoted timeout=700 \ meta priority=100 \ promotable cln_SAPHanaCon_<SID>_HDB<instance>We recommend that you create the resource with
AUTOMATED_REGISTER=false and then verify the correct behavior and data consistency through tests to complete the setup. For more information, see Testing the setup. You can enable this already at creation time by setting the parameter to true, or switch it on later as shown in the sketch after this procedure. See SAPHanaController resource parameters for more details.
Update the meta attributes of the new
SAPHanaControllerclone resource:[root]# pcs resource update cln_SAPHanaCon_<SID>_HDB<instance> \ meta clone-node-max=1 interleave=trueYou must start the
SAPHanaTopologyresource before theSAPHanaControllerresource, because it collects HANA landscape information, which theSAPHanaControllerresource requires to start correctly. Create the cluster constraint that enforces the correct start order of the two resources:[root]# pcs constraint order cln_SAPHanaTop_<SID>_HDB<instance> \ then cln_SAPHanaCon_<SID>_HDB<instance> symmetrical=falseSetting
symmetrical=falseindicates that the constraint only influences the start order of the resources, but it does not apply to the stop order.
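After your takeover tests have confirmed correct behavior and data consistency, you can switch on automatic registration without recreating the resource. A minimal sketch, assuming the resource names used in this procedure:
[root]# pcs resource update rsc_SAPHanaCon_<SID>_HDB<instance> AUTOMATED_REGISTER=true
[root]# pcs resource config rsc_SAPHanaCon_<SID>_HDB<instance> | grep AUTOMATED_REGISTER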
Verification
Review the
SAPHanaTopologyresource clone. For operations that you do not define at resource creation,pcsautomatically applies default values. Example resource configuration:[root]# pcs resource config cln_SAPHanaTop_RH1_HDB02Clone: cln_SAPHanaTop_RH1_HDB02 Meta Attributes: cln_SAPHanaTop_RH1_HDB02-meta_attributes clone-node-max=1 interleave=true Resource: rsc_SAPHanaTop_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaTopology) Attributes: rsc_SAPHanaTop_RH1_HDB02-instance_attributes InstanceNumber=02 SID=RH1 Operations: methods: rsc_SAPHanaTop_RH1_HDB02-methods-interval-0s interval=0s timeout=5 monitor: rsc_SAPHanaTop_RH1_HDB02-monitor-interval-30 interval=30 timeout=300 reload: rsc_SAPHanaTop_RH1_HDB02-reload-interval-0s interval=0s timeout=5 start: rsc_SAPHanaTop_RH1_HDB02-start-interval-0s interval=0s timeout=600 stop: rsc_SAPHanaTop_RH1_HDB02-stop-interval-0s interval=0s timeout=300Review the
SAPHanaControllerresource clone. Example resource configuration:[root]# pcs resource config cln_SAPHanaCon_RH1_HDB02Clone: cln_SAPHanaCon_RH1_HDB02 Meta Attributes: cln_SAPHanaCon_RH1_HDB02-meta_attributes clone-node-max=1 interleave=true promotable=true Resource: rsc_SAPHanaCon_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaController) Attributes: rsc_SAPHanaCon_RH1_HDB02-instance_attributes AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH1 Meta Attributes: rsc_SAPHanaCon_RH1_HDB02-meta_attributes priority=100 Operations: demote: rsc_SAPHanaCon_RH1_HDB02-demote-interval-0s interval=0s timeout=320 methods: rsc_SAPHanaCon_RH1_HDB02-methods-interval-0s interval=0s timeout=5 monitor: rsc_SAPHanaCon_RH1_HDB02-monitor-interval-59 interval=59 timeout=700 role=Promoted monitor: rsc_SAPHanaCon_RH1_HDB02-monitor-interval-61 interval=61 timeout=700 role=Unpromoted promote: rsc_SAPHanaCon_RH1_HDB02-promote-interval-0s interval=0s timeout=3600 reload: rsc_SAPHanaCon_RH1_HDB02-reload-interval-0s interval=0s timeout=5 start: rsc_SAPHanaCon_RH1_HDB02-start-interval-0s interval=0s timeout=3600 stop: rsc_SAPHanaCon_RH1_HDB02-stop-interval-0s interval=0s timeout=3600Check that the start order constraint is in place:
[root]# pcs constraint orderOrder Constraints: start resource 'cln_SAPHanaTop_RH1_HDB02' then start resource 'cln_SAPHanaCon_RH1_HDB02' symmetrical=0Check the cluster status. Use
--fullto include node attributes, which are updated by the HANA resources:[root]# pcs status --full... Full List of Resources: * rsc_fence (stonith:<fence agent>): Started dc1hana1 * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 Node Attributes: * Node: dc1hana1 (1): * hana_rh1_clone_state : PROMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master1:master:worker:master * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 150 * Node: dc1hana2 (2): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master2:slave:worker:slave * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 140 * Node: dc1hana3 (3): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : slave:slave:worker:slave * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : -10000 * Node: dc1hana4 (4): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master3:slave:standby:standby * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 140 * Node: dc2hana1 (5): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master1:master:worker:master * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 100 * Node: dc2hana2 (6): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master2:slave:worker:slave * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 80 * Node: dc2hana3 (7): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : slave:slave:worker:slave * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : -12200 * Node: dc2hana4 (8): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master3:slave:standby:standby * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 80 ...
The timeouts shown for the resource operations are only recommended defaults and can be adjusted depending on your SAP HANA environment. For example, large SAP HANA databases can take longer to start up and therefore you might have to increase the start timeout.
Setting AUTOMATED_REGISTER to true can potentially increase the risk of data loss or corruption. If the HA cluster triggers a takeover while the data on the secondary HANA instance is not fully in sync, the automatic registration of the old primary HANA instance as the new secondary overwrites its data, and any data that was not synced before the takeover occurred is lost.
For more information, see the article on the SAP Technology Blog for Members: Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes.
5.8. Creating the virtual IP resource
You must configure a virtual IP (VIP) resource for SAP clients to access the primary HANA instance independently from the cluster node it is currently running on. Configure the VIP resource to automatically move to the node where the primary instance is running.
The resource agent you need for the VIP resource depends on the platform you use. We are using the IPaddr2 resource agent to demonstrate the setup.
Prerequisites
- You have reserved a virtual IP for the service.
Procedure
Use the appropriate resource agent for managing the virtual IP address based on the platform on which the HA cluster is running. Adjust the parameters according to the resource agent you are using. Create the cluster resource for the primary virtual IP, for example, using the
IPaddr2agent:[root]# pcs resource create rsc_vip_<SID>_HDB<instance>_primary \ ocf:heartbeat:IPaddr2 ip=<address> cidr_netmask=<netmask> nic=<device>-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Replace <address>, <netmask>, and <device> with the details of your primary virtual IP address.
Create a cluster constraint that places the VIP resource with the
SAPHanaControllerresource on the HANA primary node:[root]# pcs constraint colocation add rsc_vip_<SID>_HDB<instance>_primary \ with promoted cln_SAPHanaCon_<SID>_HDB<instance> 2000The constraint applies a score of
2000instead of the defaultINFINITY. This softens the resource dependency and allows the virtual IP resource to stay active in the case when there is no promotedSAPHanaControllerresource. This way it is still possible to use tools like the SAP Management Console (MMC) or SAP Landscape Management (LaMa) that can reach this IP address to query the status information of the HANA instance.
Verification
Check the resource configuration of the virtual IP resource, for example:
[root]# pcs resource config rsc_vip_RH1_HDB02_primaryResource: rsc_vip_RH1_HDB02_primary (class=ocf provider=heartbeat type=IPaddr2) Attributes: rsc_vip_RH1_HDB02_primary-instance_attributes cidr_netmask=32 ip=192.168.1.100 nic=eth0 Operations: monitor: rsc_vip_RH1_HDB02_primary-monitor-interval-10s interval=10s timeout=20s start: rsc_vip_RH1_HDB02_primary-start-interval-0s interval=0s timeout=20s stop: rsc_vip_RH1_HDB02_primary-stop-interval-0s interval=0s timeout=20sCheck that the constraint is defined correctly:
[root]# pcs constraint colocationColocation Constraints: rsc_vip_RH1_HDB02_primary with cln_SAPHanaCon_RH1_HDB02 (score:2000) (rsc-role:Started) (with-rsc-role:Promoted)Check that the resource is running on the promoted primary node, for example,
dc1hana1:[root]# pcs status resources rsc_vip_RH1_HDB02_primary* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
5.9. Adding a secondary (read-enabled) virtual IP address
To support the Active/Active (read-enabled) secondary setup you must add a second virtual IP to provide client access to the secondary SAP HANA site.
Configure additional rules to ensure that the second virtual IP is always associated with a healthy SAP HANA site, maximizing client access and availability.
Normal operation
When both primary and secondary SAP HANA sites are active and the system replication is in sync, the second virtual IP is assigned to the secondary instance.
Secondary unavailable or out of sync
If the secondary site is down or the system replication is not in sync, the virtual IP moves to the primary instance. It automatically returns to the secondary instance as soon as the system replication is back in sync.
Failover scenario
If the cluster triggers a takeover, the second virtual IP initially remains on the same instance, which is now the new primary. After the former primary instance takes over the secondary role and the system replication is in sync again, the virtual IP moves to the new secondary.
Prerequisites
-
You have set
operationMode=logreplay_readaccesswhen registering the secondary SAP HANA instance for system replication with the primary site.
Procedure
Use the appropriate resource agent for managing the virtual IP address based on the platform on which the HA cluster is running. Adjust the parameters according to the resource agent you are using. Create the cluster resource for the secondary virtual IP, for example, using the
IPaddr2agent:[root]# pcs resource create rsc_vip_<SID>_HDB<site>_readonly \ ocf:heartbeat:IPaddr2 ip=<address> cidr_netmask=<netmask> nic=<device>-
Replace <SID> with your HANA SID.
Replace <site> with your HANA site number.
Replace <address>, <netmask>, and <device> with the details of your read-only secondary virtual IP address.
Create a location constraint rule to ensure that the secondary virtual IP is assigned to the secondary site during normal operations:
[root]# pcs constraint location rsc_vip_<SID>_HDB<site>_readonly \ rule score=INFINITY master-rsc_SAPHanaCon_<SID>_HDB<site> eq 100 \ and hana_<sid>_clone_state eq DEMOTED-
Replace <SID> with your HANA SID.
Replace <sid> with the lower-case HANA SID.
Replace <site> with your HANA site number.
Create a location constraint rule to ensure that the secondary virtual IP runs on the primary site as an alternative whenever necessary:
[root]# pcs constraint location rsc_vip_<SID>_HDB<site>_readonly \ rule score=2000 master-rsc_SAPHanaCon_<SID>_HDB<site> eq 150 \ and hana_<sid>_clone_state eq PROMOTED
Verification
Check the resource configuration of the secondary virtual IP resource, for example:
[root]# pcs resource config rsc_vip_RH1_HDB02_readonlyResource: rsc_vip_RH1_HDB02_readonly (class=ocf provider=heartbeat type=IPaddr2) Attributes: rsc_vip_RH1_HDB02_readonly-instance_attributes cidr_netmask=32 ip=192.168.1.200 nic=eth0 Operations: monitor: rsc_vip_RH1_HDB02_readonly-monitor-interval-10s interval=10s timeout=20s start: rsc_vip_RH1_HDB02_readonly-start-interval-0s interval=0s timeout=20s stop: rsc_vip_RH1_HDB02_readonly-stop-interval-0s interval=0s timeout=20sCheck that the constraints are part of the cluster configuration:
[root]# pcs constraint locationLocation Constraints: Resource: rsc_vip_RH1_HDB02_readonly Constraint: location-rsc_vip_RH1_HDB02_readonly Rule: boolean-op=and score=INFINITY Expression: master-rsc_SAPHanaCon_RH1_HDB02 eq 100 Expression: hana_rh1_clone_state eq DEMOTED Constraint: location-rsc_vip_RH1_HDB02_readonly-1 Rule: boolean-op=and score=2000 Expression: master-rsc_SAPHanaCon_RH1_HDB02 eq 150 Expression: hana_rh1_clone_state eq PROMOTEDCheck that the resource is running on the main secondary node, for example, dc2hana1:
[root]# pcs status resources rsc_vip_RH1_HDB02_readonly* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
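Beyond checking the constraints and the resource placement, you can verify that a client can actually reach the read-enabled secondary through this address. The following is only a sketch: the user, the password, and the SQL port (commonly 3<instance>15 for the first tenant database, 30215 in this example) depend on your installation:
rh1adm$ hdbsql -n 192.168.1.200:30215 -u <user> -p <password> "SELECT * FROM DUMMY"
If the read-enabled setup works, the query returns the DUMMY row from the secondary site while the primary virtual IP continues to serve read/write traffic.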
Chapter 6. Configuring a quorum device in the cluster
We recommend that you configure a qdevice in your cluster for improved service resiliency. Alternatively you can configure a dedicated cluster node that only serves for adding a quorum vote.
Do not configure both a qdevice and a majority-maker node in the same cluster. This adds one vote per method and results in an even number of quorum votes again.
6.1. Configuring a qdevice for cluster quorum
If you prefer to use a dedicated majority-maker cluster node for this purpose, skip the qdevice setup and follow the steps in Configuring a majority-maker node for cluster quorum instead.
6.1.1. Preparing the quorum device host
- Configure the RHEL High Availability repository on the quorum device host.
-
If the
firewalldservice is installed and you are not using it on your hosts, disable the service. See Disabling the firewalld service.
6.1.2. Configuring a qdevice on a quorum device host
First, you must configure a quorum device host that provides the qdevice for your cluster quorum.
In the following steps the example name of the qdevice host is dc3qdevice.
Prerequisites
- You have installed a separate host that is ideally located in a different location or availability zone than your cluster nodes.
- You have configured the RHEL High Availability repository on the dedicated quorum device host.
- You have configured your network in a way that your cluster nodes can reach the quorum device host.
Procedure
Install
pcsandcorosync-qnetdon the quorum device host:[root]# dnf install pcs corosync-qnetdStart and enable the
pcsdservice on the quorum device host:[root]# systemctl enable --now pcsd.serviceCreate the qdevice on the quorum device host. This command configures and starts the quorum device model
netand configures the device to start on boot. Run this command on the quorum device host:[root]# pcs qdevice setup model net --enable --startQuorum device 'net' initialized quorum device enabled Starting quorum device... quorum device startedOptional: If you are running the
firewalldservice, enable the ports that are required by the Red Hat High Availability Add-On. Run this on the quorum device host:[root]# firewall-cmd --add-service=high-availability [root]# firewall-cmd --runtime-to-permanentSet a password for the user
haclusteron the quorum device host:[root]# passwd hacluster
Verification
Check the quorum device status on the quorum device host:
[root]# pcs qdevice status net --fullQNetd address: *:5403 TLS: Supported (client certificate required) Connected clients: 0 Connected clusters: 0 Maximum send/receive size: 32768/32768 bytes
6.1.3. Configuring a qdevice in the cluster
Prerequisites
-
You have configured a quorum device host that is ideally located in a different location or availability zone than your cluster nodes, for example,
dc3qdevice. - You have configured a qdevice on the quorum device host.
- You have configured your network in a way that your cluster nodes can reach the quorum device host.
Procedure
Add the qdevice host to the
/etc/hosts on all existing cluster nodes, so that the resulting /etc/hosts entries are the same on all nodes. This ensures that the cluster nodes can communicate with the host even if the DNS service fails:[root]# cat /etc/hosts... 192.168.100.101 dc1hana1.example.com dc1hana1 192.168.100.102 dc1hana2.example.com dc1hana2 192.168.100.103 dc1hana3.example.com dc1hana3 192.168.100.104 dc1hana4.example.com dc1hana4 192.168.100.121 dc2hana1.example.com dc2hana1 192.168.100.122 dc2hana2.example.com dc2hana2 192.168.100.123 dc2hana3.example.com dc2hana3 192.168.100.124 dc2hana4.example.com dc2hana4 192.168.100.120 dc3qdevice.example.com dc3qdeviceInstall
corosync-qdeviceon all nodes of your cluster:[root]# dnf install corosync-qdeviceAuthenticate the quorum device host in the cluster to enable communication. Run this command on one cluster node:
[root]# pcs host auth <qdevice_host> Username: hacluster Password:dc3qdevice: Authorized-
Replace <qdevice_host> with the name of your quorum device host, for example, dc3qdevice.
Add the qdevice from the quorum device host to the cluster. Run this command on one cluster node:
[root]# pcs quorum device add model net host=<qdevice_host> algorithm=ffsplitSetting up qdevice certificates on nodes... dc1hana1: Succeeded dc1hana2: Succeeded dc1hana3: Succeeded dc1hana4: Succeeded dc2hana3: Succeeded dc2hana2: Succeeded dc2hana1: Succeeded dc2hana4: Succeeded Enabling corosync-qdevice... dc1hana3: corosync-qdevice enabled dc1hana1: corosync-qdevice enabled dc1hana4: corosync-qdevice enabled dc2hana3: corosync-qdevice enabled dc2hana4: corosync-qdevice enabled dc2hana2: corosync-qdevice enabled dc1hana2: corosync-qdevice enabled dc2hana1: corosync-qdevice enabled Sending updated corosync.conf to nodes... dc1hana1: Succeeded dc1hana3: Succeeded dc1hana4: Succeeded dc1hana2: Succeeded dc2hana1: Succeeded dc2hana2: Succeeded dc2hana3: Succeeded dc2hana4: Succeeded dc1hana1: Corosync configuration reloaded Starting corosync-qdevice... dc1hana1: corosync-qdevice started dc1hana2: corosync-qdevice started dc2hana4: corosync-qdevice started dc2hana3: corosync-qdevice started dc1hana3: corosync-qdevice started dc1hana4: corosync-qdevice started dc2hana2: corosync-qdevice started dc2hana1: corosync-qdevice started-
Replace <qdevice_host> with the name of your quorum device host, for example, dc3qdevice.
The algorithm can be ffsplit or lms. Consult the corosync-qdevice(8) man page for more details about the different algorithms; a sketch with the lms algorithm follows after this list.
Verification
Check the quorum configuration on a cluster node:
[root]# pcs quorum configDevice: votes: 1 Model: net algorithm: ffsplit host: dc3qdeviceCheck the quorum status on a cluster node:
[root]# pcs quorum statusQuorum information ------------------ Date: Thu Dec 11 08:27:00 2025 Quorum provider: corosync_votequorum Nodes: 8 Node ID: 1 Ring ID: 1.3a Quorate: Yes Votequorum information ---------------------- Expected votes: 9 Highest expected: 9 Total votes: 9 Quorum: 5 Flags: Quorate Qdevice Membership information ---------------------- Nodeid Votes Qdevice Name 1 1 A,V,NMW dc1hana1 (local) 2 1 A,V,NMW dc1hana2 3 1 A,V,NMW dc1hana3 4 1 A,V,NMW dc1hana4 5 1 A,V,NMW dc2hana1 6 1 A,V,NMW dc2hana2 7 1 A,V,NMW dc2hana3 8 1 A,V,NMW dc2hana4 0 1 QdeviceCheck the quorum device status on a cluster node:
[root]# pcs quorum device statusQdevice information ------------------- Model: Net Node ID: 1 Configured node list: 0 Node ID = 1 1 Node ID = 2 2 Node ID = 3 3 Node ID = 4 4 Node ID = 5 5 Node ID = 6 6 Node ID = 7 7 Node ID = 8 Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Qdevice-net information ---------------------- Cluster name: hana-scaleout-cluster QNetd host: dc3qdevice:5403 Algorithm: Fifty-Fifty split Tie-breaker: Node with lowest node ID State: ConnectedCheck the quorum device status on the quorum device host. The status now shows the details of the cluster on which the qdevice is used. If the same qdevice is configured in multiple clusters, the status contains the details for each cluster:
[root]# pcs qdevice status net --fullQNetd address: *:5403 TLS: Supported (client certificate required) Connected clients: 8 Connected clusters: 1 Maximum send/receive size: 32768/32768 bytes Cluster "hana-scaleout-cluster": Algorithm: Fifty-Fifty split (KAP Tie-breaker) Tie-breaker: Node with lowest node ID Node ID 1: Client address: ::ffff:10.99.30.149:13248 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 2: Client address: ::ffff:10.99.30.30:13276 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 3: Client address: ::ffff:10.99.30.232:22226 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 4: Client address: ::ffff:10.99.30.82:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 5: Client address: ::ffff:10.99.30.163:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: No change (ACK) Node ID 6: Client address: ::ffff:10.99.30.42:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 7: Client address: ::ffff:10.99.30.191:24736 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 8: Client address: ::ffff:10.99.30.33:29004 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK)
6.2. Configuring a majority-maker node for cluster quorum
You can use an additional cluster node to serve as an extra quorum vote. In the following steps, we configure such a majority-maker node, dc3mm, in the existing cluster.
6.2.1. Preparing the majority-maker node
- Install the node with the same operating system version as your HANA nodes.
- Configure the RHEL High Availability repository on the majority-maker node.
-
If the
firewalldservice is installed and you are not using it on your hosts, disable the service. See Disabling the firewalld service.
6.2.2. Updating the host names in /etc/hosts
As for all cluster nodes, we recommend that you also add the majority-maker cluster node to the /etc/hosts file on each node. On the new dc3mm node you add all cluster nodes.
Procedure
Add the new host to the
/etc/hosts on all existing cluster nodes and add all hosts to the new dc3mm host, so that the resulting /etc/hosts entries are the same on all nodes:[root]# cat /etc/hosts... 192.168.100.101 dc1hana1.example.com dc1hana1 192.168.100.102 dc1hana2.example.com dc1hana2 192.168.100.103 dc1hana3.example.com dc1hana3 192.168.100.104 dc1hana4.example.com dc1hana4 192.168.100.121 dc2hana1.example.com dc2hana1 192.168.100.122 dc2hana2.example.com dc2hana2 192.168.100.123 dc2hana3.example.com dc2hana3 192.168.100.124 dc2hana4.example.com dc2hana4 192.168.100.110 dc3mm.example.com dc3mm
Verification
Check that you can ping the hosts between each other. This step is optional and an example only for a basic verification. The system resolves entries in
/etc/hostswhen you use thepingcommand:[root]# ping dc3mm.example.comPING dc3mm.example.com (192.168.100.110) 56(84) bytes of data. 64 bytes from dc3mm.example.com (192.168.100.110): icmp_seq=1 ttl=64 time=0.017 ms …
6.2.3. Updating the cluster clone resources
The cluster automatically creates one copy of a clone resource for each cluster node and uses this to calculate resource allocations. For example, if the cluster consists of 8 HANA nodes and you add an additional majority-maker node for a quorum vote only, the cluster automatically calculates with all 9 cluster members for clone resource assignments. However, including this non-HANA node in the calculations can lead to unexpected impact when the cluster moves resources.
To prevent this influence you must configure the clone-max limit for all cloned resources explicitly to only the number of HANA nodes. Adjust the clone configuration before you add the new node to the cluster.
Procedure
Update the
SAPHanaTopologyresource clone with a limit to the number of HANA nodes, for example,8:[root]# pcs resource update cln_SAPHanaTop_<SID>_HDB<instance> meta clone-max=8Update the
SAPHanaControllerresource clone with a limit to the number of HANA nodes, for example,8:[root]# pcs resource update cln_SAPHanaCon_<SID>_HDB<instance> meta clone-max=8
Verification
Check that the
clone-maxoption is correct for all clone resources:[root]# pcs resource config | grep -i cloneClone: cln_SAPHanaTop_RH1_HDB02 clone-max=8 clone-node-max=1 Clone: cln_SAPHanaCon_RH1_HDB02 clone-max=8 clone-node-max=1-
clone-maxmust be the number of HANA nodes, for example,8. When this option is not displayed, it defaults to the total number of cluster nodes instead.
6.2.4. Installing the cluster components on the majority-maker node
Install the same cluster packages as on the existing cluster nodes to prepare the host in the same way.
Prerequisites
- You have configured the RHEL High Availability repository on the majority-maker host.
Procedure
Install the Red Hat High Availability Add-On software packages from the High Availability repository. Choose the same fence agents as you are using on the existing cluster nodes:
[root]# dnf install pcs pacemaker fence-agents-<model>Start and enable the
pcsdservice on the new nodes. The--nowparameter automatically starts the enabled service:[root]# systemctl enable --now pcsd.serviceOptional: If you use the local
firewalldservice you must enable the ports that are required by the Red Hat High Availability Add-On. Run this on the new node:[root]# firewall-cmd --add-service=high-availability [root]# firewall-cmd --runtime-to-permanentSet a password for the user
haclusteron the new node using the same password:[root]# passwd hacluster
Verification
Check that the pcsd service is running and shows as
loadedandactiveon the new node:[root]# systemctl status pcsd.service● pcsd.service - PCS GUI and remote configuration interface Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; preset: disabled) Active: active (running) since … …
6.2.5. Adding the new node to the cluster
Add the dedicated majority-maker node as a regular cluster member.
Prerequisites
- You have configured a cluster to which you want to add this node as a member.
Procedure
Authenticate the user
haclusterfor the new node in the cluster. Run this on one cluster node:[root]# pcs host auth dc3mm Username: hacluster Password:dc3mm: Authorized-
Enter the node name with or without FQDN, as defined in the /etc/hosts file.
Enter the hacluster user password in the prompt.
Add the new node to the cluster. Run this on one cluster node:
[root]# pcs cluster node add dc3mmNo addresses specified for host 'dc3mm', using 'dc3mm' Disabling sbd... dc3mm: sbd disabled Sending 'corosync authkey', 'pacemaker authkey' to 'dc3mm' dc3mm: successful distribution of the file 'corosync authkey' dc3mm: successful distribution of the file 'pacemaker authkey' Sending updated corosync.conf to nodes... dc1hana3: Succeeded dc1hana2: Succeeded dc1hana1: Succeeded dc2hana1: Succeeded dc1hana4: Succeeded dc2hana2: Succeeded dc2hana3: Succeeded dc2hana4: Succeeded dc3mm: Succeeded dc2hana1: Corosync configuration reloadedAdd a location constraint to prevent any HANA resource from running on this node, and also prevent the cluster from trying to check the initial resource status. The following constraint definition uses a
regexpexpression to match all HANA resources. If required, adjust the pattern to match your resource names:[root]# pcs constraint location add avoid-dc3mm \ regexp%.*SAPHana.* dc3mm -- -INFINITY resource-discovery=neverEnable the cluster on the new node to be started automatically on system start. Run on any node:
[root]# pcs cluster enable dc3mmdc3mm: Cluster EnabledStart the cluster on the new node:
[root]# pcs cluster start dc3mmdc3mm: Starting Cluster...
Verification
Check the location constraint that keeps HANA resources off the new node:
[root]# pcs constraint location --fullLocation Constraints: Resource pattern: .*SAPHana.* Disabled on: Node: dc3mm (score:-INFINITY) (resource-discovery=never) (id:avoid-dc3mm)Check the cluster status. Verify that the cluster daemon services are in the desired state. Run this on the new node to also verify the local daemon status at the end:
[root]# pcs status --full… Node List: * Node dc1hana1 (1): online, feature set 3.16.2 * Node dc1hana2 (2): online, feature set 3.16.2 * Node dc1hana3 (3): online, feature set 3.16.2 * Node dc1hana4 (4): online, feature set 3.16.2 * Node dc2hana1 (5): online, feature set 3.16.2 * Node dc2hana2 (6): online, feature set 3.16.2 * Node dc2hana3 (7): online, feature set 3.16.2 * Node dc2hana4 (8): online, feature set 3.16.2 * Node dc3mm (9): online, feature set 3.16.2 Full List of Resources: * rsc_fence (stonith:<stonith agent>): Started dc1hana1 * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc1hana1 …-
The new node must be displayed in the Node List and in the PCSD Status list.
The new node must not show a status in the Full List of Resources.
Verify the quorum status. The new node adds a vote, so that there is now an odd number of votes and a 50/50 split can no longer occur:
[root]# pcs quorum statusQuorum information ------------------ Date: Thu Dec 11 10:25:40 2025 Quorum provider: corosync_votequorum Nodes: 9 Node ID: 1 Ring ID: 1.3e Quorate: Yes Votequorum information ---------------------- Expected votes: 9 Highest expected: 9 Total votes: 9 Quorum: 5 Flags: Quorate Membership information ---------------------- Nodeid Votes Qdevice Name 1 1 NR dc1hana1 (local) 2 1 NR dc1hana2 3 1 NR dc1hana3 4 1 NR dc1hana4 5 1 NR dc2hana1 6 1 NR dc2hana2 7 1 NR dc2hana3 8 1 NR dc2hana4 9 1 NR dc3mm
Next steps
- Add the new node to your individual fencing method. See Configuring fencing in a Red Hat High Availability cluster.
Chapter 7. Testing the setup
Test your new HANA HA cluster thoroughly before you enable it for production workloads.
Enhance the basic example test cases with your specific requirements.
7.1. Detecting the system replication state changes
To test that the SAPHanaSR HA/DR provider works correctly, monitor the sync state information in the logs and in the cluster attributes while you disrupt the system replication.
In this test, you use the primary site for monitoring the system replication status and for verifying the log messages. On a secondary instance you freeze the indexserver process to simulate a system replication issue while the primary remains fully intact.
Prerequisites
-
You have configured the mandatory
SAPHanaSRHA/DR provider. - Your HANA instances are in a healthy state on all cluster nodes and the system replication is in sync.
Procedure
As user
<sid>admgo to the HANA Python directory on the primary site and check the current system replication state. Verify that it isACTIVEand fully synced:rh1adm$ cdpy; python systemReplicationStatus.py… status system replication site "2": ACTIVE overall system replication status: ACTIVE …Verify that the
srHookandsync_statecluster attributes are bothSOKin the attributes summary of the secondary site. Run this command as therootuser on any node in a separate terminal to keep track of the attribute changes:[root]# watch SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Thu Dec 11 10:59:25 2025 DC1 DC2 SOK SOK ok ...You can use the
watchcommand to run the command in a loop at a default interval of 2 seconds.On an instance on the secondary site, for example,
dc2hana2, get the process ID (PID) of thehdbindexserverprocess. For example, you can get it from thePIDcolumn of theHDB infooutput as user<sid>adm:rh1adm$ HDB infoOn the same instance on the secondary site, use the
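Alternatively, you can query the PID directly. The following is only a sketch that assumes the <sid>adm user rh1adm and the standard process name hdbindexserver:
rh1adm$ pgrep -u rh1adm hdbindexserver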
PIDto simulate a hanginghdbindexserverprocess by sending theSTOPsignal to the process. This freezes the process and blocks it from communicating and syncing the instance between the nodes:rh1adm$ kill -STOP <PID>
Verification
On the primary site, watch the system replication status for the change on any primary instance. In the following example the system’s
cututility helps you limit the output to certain fields for readability. Remove it to see all columns of the table formatted text output. In the example we froze the indexserver on the secondary nodedc2hana2, which results in a replication error with that node’s counterpart on the primary site,dc1hana2:rh1adm$ cdpy; watch "python systemReplicationStatus.py | cut -d '|' -f 1-3,5,9,13-"|Database |Host |Service Name |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | |Host |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |------------ |--------- |------------- |----------- |----------- |----------------------------- |------------ | |RH1 |dc1hana3 |indexserver |dc2hana3 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana2 |indexserver |dc2hana2 |YES |SYNC |ERROR |Log shipping timeout occurred | False | |SYSTEMDB |dc1hana1 |nameserver |dc2hana1 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |xsengine |dc2hana1 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |indexserver |dc2hana1 |YES |SYNC |ACTIVE | | True | status system replication site "2": ERROR overall system replication status: ERROR ...The replication status changes to
ERROR for the indexserver service after a short time. An idle instance can take a minute or more to react. On the primary site’s master name server node, check the HANA nameserver process log for the related messages as the
<sid>admuser:rh1adm$ cdtrace; grep -he 'HanaSR.srConnectionChanged.*' nameserver_*ha_dr_SAPHanaSR SAPHanaSR.py(00057) : SAPHanaSR (0.184.1) SAPHanaSR.srConnectionChanged method called with Dict={'hostname': 'dc1hana2', 'port': '30203', 'volume': 4, 'service_name': 'indexserver', 'database': 'RH1', 'status': 11, 'database_status': 11, 'system_status': 11, 'timestamp': '2025-12-11T11:52:01.069691+00:00', 'is_in_sync': False, 'system_is_in_sync': False, 'reason': '', 'siteName': 'DC2'} ha_dr_SAPHanaSR SAPHanaSR.py(00068) : SAPHanaSR.srConnectionChanged() CALLING CRM: <sudo /usr/sbin/crm_attribute -n hana_rh1_gsh -v 1.0 -l reboot> rc=0 ha_dr_SAPHanaSR SAPHanaSR.py(00069) : SAPHanaSR.srConnectionChanged() Running old srHookGeneration 1.0, see attribute hana_rh1_gsh too ha_dr_SAPHanaSR SAPHanaSR.py(00087) : SAPHanaSR SAPHanaSR.srConnectionChanged method called with Dict={'hostname': 'dc1hana2', 'port': '30203', 'volume': 4, 'service_name': 'indexserver', 'database': 'RH1', 'status': 11, 'database_status': 11, 'system_status': 11, 'timestamp': '2025-12-11T11:52:01.069691+00:00', 'is_in_sync': False, 'system_is_in_sync': False, 'reason': '', 'siteName': 'DC2'} ###The nameserver process log contains the
SAPHanaSRhook logs.Verify that both cluster attributes for the system replication status,
srHookandsync_state, show theSFAILstatus of the secondary site. Run the following as therootuser on any HANA node or use the open terminal from the previous steps to watch the changes:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Thu Dec 11 12:24:40 2025 DC1 DC2 SFAIL SFAIL ok ...Unblock the previously frozen
hdbindexserverPID to enable it again. Run this on the secondary instance on which you blocked thehdbindexserverprocess for the test:rh1adm$ kill -CONT <PID>-
Repeat the previous checks to verify that the system replication recovers fully after a short time. The cluster does not trigger any actions during this test because the resources remain running. Ensure that the system replication status is healthy again and fully synced, and that the cluster attributes are set to
SOKagain for the secondary site.
7.2. Triggering the indexserver crash recovery
Test the functionality of the ChkSrv HA/DR provider by simulating the crash of an hdbindexserver process. You can run this on the primary or on the secondary site. The exact recovery actions depend on the overall configuration. The following steps demonstrate the activity when using action_on_lost = stop in the hook configuration.
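The hook is configured in an [ha_dr_provider_*] section of the HANA global.ini, like the SAPHanaSR hook shown later in this document. The following is only a sketch of such a section with action_on_lost = stop; the execution_order value 2 is an assumption and must match your actual configuration:
[ha_dr_provider_ChkSrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR-ScaleOut
execution_order = 2
action_on_lost = stop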
Prerequisites
- You have configured the ChkSrv HA/DR provider. Skip this test if you have not configured this optional hook.
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Use a separate terminal to monitor the HANA processes as user
<sid>admon the instance on which you run this test:rh1adm$ watch "sapcontrol -nr ${TINSTANCE} -function GetProcessList | column -s ',' -t"In another terminal on the same HANA instance, kill the
hdbindexserverprocess:rh1adm$ kill <PID>
Verification
Check the dedicated HANA nameserver trace log on the same instance and identify the event and related action, as user
<sid>adm:rh1adm$ cdtrace; less nameserver_chksrv.trc... ChkSrv version 0.7.8. Method srServiceStateChanged method called. ChkSrv srServiceStateChanged method called with Dict={'hostname': 'dc2hana2', 'service_name': 'indexserver', 'service_port': '30203', 'service_status': 'stopping', 'service_previous_status': 'yes', 'timestamp': '2025-12-11T13:01:10.872386+00:00', 'daemon_status': 'yes', 'database_id': '3', 'database_name': 'RH1', 'database_status': 'yes', 'details': ''} ChkSrv srServiceStateChanged method called with SAPSYSTEMNAME=RH1 srv:indexserver-30203-stopping-yes db:RH1-3-yes daem:yes LOST: indexserver event looks like a lost indexserver (status=stopping) LOST: stop instance. action_on_lost=stop ...Check the cluster status for resource failure information on any cluster node, as user
root:[root]# pcs status --full... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana3 'not running' (7): call=183, status='complete', last-rc-change='Thu Dec 11 13:02:31 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana4 'not running' (7): call=179, status='complete', last-rc-change='Thu Dec 11 13:02:35 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana1 'not running' (7): call=108, status='complete', last-rc-change='Thu Dec 11 13:01:49 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana2 'error' (1): call=179, status='complete', last-rc-change='Thu Dec 11 13:02:39 2025', queued=0ms, exec=0ms ...Check the system log for the related cluster actions on the test node, for example,
dc2hana2, as userroot:[root]# grep rsc_SAPHanaCon_RH1_HDB02 /var/log/messages... Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: ERROR: SAP instance service hdbdaemon status color is GRAY ! Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: WARNING: RA: HANA_CALL stderr from command 'su - rh1adm -c' is '', stderr from command 'python landscapeHostConfiguration.py --sapcontrol=1' is '' Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: DEC: local instance AND landscape are down (lss=1) Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: DEC: DEMOTED => OCF_SUCCESS Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: RA ==== end action monitor_clone with rc=0 (0.185.3) (3s)==== Dec 11 13:01:49 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana1]: (unset) -> 1765458109 Dec 11 13:01:49 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana1]: (unset) -> 1 Dec 11 13:01:58 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana1]: 100 -> 0 Dec 11 13:02:31 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana3]: (unset) -> 1765458151 Dec 11 13:02:31 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana3]: (unset) -> 1 Dec 11 13:02:35 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:02:35 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana4]: (unset) -> 1765458155 Dec 11 13:02:35 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana4]: (unset) -> 1 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA: multiTargetSupport attribute not set. May be no Hook is configured or the old-style Hook is used. 
Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: ===> master_walk: priorities for site DC2 master1= master2=dc2hana2 master3= ==> active_master= best_cold_master=dc2hana2 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: ===> master_walk: the_master=dc2hana2; Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:02:38 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:02:39 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA ==== end action monitor_clone with rc=1 (0.185.3) (4s)==== Dec 11 13:02:39 dc2hana2 pacemaker-controld[1979]: notice: Result of monitor operation for rsc_SAPHanaCon_RH1_HDB02 on dc2hana2: error Dec 11 13:02:39 dc2hana2 pacemaker-controld[1979]: notice: rsc_SAPHanaCon_RH1_HDB02_monitor_61000@dc2hana2 output [ 10\n ] Dec 11 13:02:39 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana2]: (unset) -> 1765458159 Dec 11 13:02:39 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana2]: (unset) -> 1 Dec 11 13:03:34 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana3]: -12200 -> -22200 Dec 11 13:03:38 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana4]: 80 -> -32300 Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: multiTargetSupport attribute not set. May be no Hook is configured or the old-style Hook is used. 
Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: ===> master_walk: priorities for site DC2 master1=dc2hana1 master2=dc2hana2 master3=dc2hana4 ==> active_master=dc2hana1 best_cold_master= Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: ===> master_walk: the_master=dc2hana1; Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: the_master=<<dc2hana1>> Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK0= Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK1=SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK3=SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring 2:S:master2:slave:worker:slave : SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: roles(2:S:master2:slave:worker:slave) are matching pattern ([0-9]:S:) Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: sync(SFAIL) is matching syncPattern (.*) Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: set score -32300 Dec 11 13:03:42 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana2]: 80 -> -32300 Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA ==== end action monitor_clone with rc=0 (0.185.3) (2s)==== ...The next
SAPHanaControllerresource monitor reports the unexpectedly stopped HANA instances as a failure and initiates the recovery steps according to the configuration. IfPREFER_SITE_TAKEOVERis enabled and you executed the test on a primary instance, it triggers a HANA takeover to the secondary site.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that might be there from previous testing. For more information see Cleaning up the failure history.
7.3. Triggering a HANA takeover using cluster commands
Use the cluster command to move the promoted resource to the other site and manually test the planned takeover of the primary to the secondary site.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Switch the primary site to the secondary site. Run the cluster command as user root on any node and use the coordinator instance of the secondary site as the target, for example,
dc2hana1. If you do not name the target node in a HANA setup with alternate nameserver candidates, the cluster first unsuccessfully attempts to promote the alternative nodes before it ends up promoting the correct coordinator node on the secondary HANA site:[root]# pcs resource move cln_SAPHanaCon_<SID>_HDB<instance> <secondary coordinator instance>Location constraint to move resource 'cln_SAPHanaCon_RH1_HDB02' has been created Waiting for the cluster to apply configuration changes... Location constraint created to move resource 'cln_SAPHanaCon_RH1_HDB02' has been removed Waiting for the cluster to apply configuration changes... resource 'cln_SAPHanaCon_RH1_HDB02' is promoted on node 'dc2hana1'; unpromoted on nodes 'dc1hana1', 'dc1hana3', 'dc1hana4', 'dc2hana2', 'dc2hana3', 'dc2hana4'
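Depending on the pcs version, the temporary location constraint created by the move command might not be removed automatically. A sketch of how to check for and clear such a leftover constraint for the example resource:
[root]# pcs constraint location
[root]# pcs resource clear cln_SAPHanaCon_RH1_HDB02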
Verification
Verify that the
SAPHanaControllerresource is now promoted on the other site:[root]# pcs resource status cln_SAPHanaCon_RH1_HDB02* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * Promoted: [ dc2hana1 ] * Unpromoted: [ dc1hana3 dc1hana4 dc2hana2 dc2hana3 dc2hana4 ] * Stopped: [ dc1hana1 dc1hana2 dc3mm ]The status of the previous primary site instances depends on the
AUTOMATED_REGISTER parameter of the SAPHanaController resource. When AUTOMATED_REGISTER is false, the instance remains stopped until manual intervention; otherwise, it restarts automatically and reregisters as the new secondary instance.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.4. Crashing the node with a primary instance
Simulate the crash of the cluster node on which a primary instance is running to test the behavior of your HANA cluster resources.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Trigger a crash on a HANA node on the primary site. This command immediately causes a crash of the node with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
The cluster detects the failed node and fences it. You can watch the cluster activity on any of the remaining nodes:
[root]# pcs status --full... Pending Fencing Actions: * reboot of dc1hana1 pending: client=pacemaker-controld.1685, origin=dc1hana2 ...- The secondary site takes over and becomes promoted as the new primary.
- The fenced former primary node recovers according to your fencing and SAPHanaController resource configuration.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.5. Crashing the node with a secondary instance
Simulate the crash of the cluster node on which a secondary instance is running to test the behavior of your HANA cluster resources.
Procedure
Trigger a crash of a HANA node on the secondary site. This command immediately causes a crash of the node with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
The cluster detects the failed node and fences it. You can watch the cluster activity on any of the remaining nodes:
[root]# pcs status --full... Pending Fencing Actions: * reboot of dc2hana1 pending: client=pacemaker-controld.1694, origin=dc1hana1 ...- The primary site remains running while the secondary node restarts and recovers. The fenced node recovery depends on your fencing configuration.
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.6. Stopping the primary site using SAP commands
Test the behavior of the cluster when you manage the primary HANA site outside of the cluster using HANA commands.
Since the cluster is not aware of the execution of HANA commands, it detects the change as a failure and triggers the configured recovery actions.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Stop the primary HANA site as the
<sid>admuser outside of the cluster. Run on one HANA instance on the primary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDB
Verification
The cluster notices the stopped instance as a failure and initiates the recovery of the primary site:
[root]# pcs status --full... Migration Summary: * Node: dc1hana1 (1): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_59000 on dc2hana1 'promoted (failed)' (9): ...If you configured and enabled both the
PREFER_SITE_TAKEOVERandAUTOMATED_REGISTERparameters in theSAPHanaControllerresource, the cluster triggers a HANA takeover to the secondary site and automatically registers the failed primary as the new secondary. Otherwise it recovers the failed primary according to your configuration.
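To check which of these behaviors is configured before you run this test, you can display the relevant resource parameters, for example:
[root]# pcs resource config rsc_SAPHanaCon_RH1_HDB02 | grep -e PREFER_SITE_TAKEOVER -e AUTOMATED_REGISTER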
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.7. Stopping the secondary site using SAP commands
Test the behavior of the cluster when you manage the secondary HANA site outside of the cluster using HANA commands.
Since the cluster is not aware of the execution of HANA commands, it detects the change as a failure and triggers the configured recovery actions.
Prerequisites
- You have no failures in the cluster status.
Procedure
Stop the secondary HANA site as the
<sid>admuser outside of the cluster. Run on one HANA instance on the secondary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDB
Verification
The cluster notices the stopped instance as a failure and recovers the secondary site:
[root]# pcs status --full.. Migration Summary: * Node: dc2hana3 (7): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana1 (5): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana2 (6): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana4 (8): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana3 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana1 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana2 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana4 'not running' (7): ...
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
Chapter 8. Finishing the setup
Ensure that the final setup is complete and the systems and resources are healthy before you enable the environment for production workloads.
8.1. Enabling the automatic registration of HANA after a takeover (optional)
If you want a previously failed primary site to automatically recover as a fully functional secondary site without manual verification of the data consistency, you can enable the SAPHanaController resource to re-register the site right after a takeover.
This enables the previously failed primary site to continue the HANA system replication and automatically take over again in the event of a new failure of the new primary site.
Your HANA operator must decide whether they first need to manually check the health of the previously failed instance and re-register the HANA site afterwards, or whether the priority is a faster automatic recovery of the full high availability.
Procedure
Update the
SAPHanaControllerresource and override the defaultAUTOMATED_REGISTER:[root]# pcs resource update rsc_SAPHanaCon_<SID>_HDB<instance> AUTOMATED_REGISTER=true
Verification
Check that
AUTOMATED_REGISTERis set totrue:[root]# pcs resource config rsc_SAPHanaCon_RH1_HDB02Resource: rsc_SAPHanaCon_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaController) Attributes: rsc_SAPHanaCon_RH1_HDB02-instance_attributes AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH1 ...
Setting AUTOMATED_REGISTER to true can increase the risk of data loss or corruption. If the HA cluster triggers a takeover while the data on the secondary HANA site is not fully in sync, the automatic registration of the old primary HANA site as the new secondary HANA site discards any data on that site that was not yet synced before the takeover occurred.
For more information, see the article on the SAP Technology Blog for Members: Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes.
8.2. Reviewing the final cluster state
After the configuration of an 8-node cluster for a scale-out HANA system replication setup, the status looks like the following example.
Your cluster state may deviate from the example, depending on your setup of optional or platform-dependent resources, such as the individual fencing or VIP resources.
Also, you can decide whether you want to disable the cluster service so that it does not start automatically on system boot. This requires manual intervention after every system boot, but gives you more control over and supervision of the startup.
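For example, you can disable and later re-enable the automatic start of the cluster services on all nodes with the following commands:
[root]# pcs cluster disable --all
[root]# pcs cluster enable --all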
[root]# pcs status --full
Cluster name: hana-scaleout-cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-18 16:27:04Z)
Cluster Summary:
* Stack: corosync
* Current DC: dc2hana1 (5) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum
* Last updated: Thu Dec 18 16:27:05 2025
* Last change: Thu Dec 18 16:26:37 2025 by root via crm_attribute on dc1hana1
* 9 nodes configured
* 19 resource instances configured
Node List:
* Node dc1hana1 (1): online, feature set 3.16.2
* Node dc1hana2 (2): online, feature set 3.16.2
* Node dc1hana3 (3): online, feature set 3.16.2
* Node dc1hana4 (4): online, feature set 3.16.2
* Node dc2hana1 (5): online, feature set 3.16.2
* Node dc2hana2 (6): online, feature set 3.16.2
* Node dc2hana3 (7): online, feature set 3.16.2
* Node dc2hana4 (8): online, feature set 3.16.2
* Node dc3mm (9): online, feature set 3.16.2
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana2
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]:
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
Node Attributes:
* Node: dc1hana1 (1):
* hana_rh1_clone_state : PROMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC1
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 150
* Node: dc1hana2 (2):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc1hana3 (3):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : -10000
* Node: dc1hana4 (4):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc2hana1 (5):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC2
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 100
* Node: dc2hana2 (6):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
* Node: dc2hana3 (7):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : -12200
* Node: dc2hana4 (8):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
Migration Summary:
Tickets:
PCSD Status:
dc1hana1: Online
dc1hana2: Online
dc1hana3: Online
dc1hana4: Online
dc2hana1: Online
dc2hana2: Online
dc2hana3: Online
dc2hana4: Online
dc3mm: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
In a healthy setup, the additional cluster attributes appear as in this example:
[root]# SAPHanaSR-showAttr
Global cib-time prim sec srHook sync_state upd
---------------------------------------------------------------
RH1 Thu Dec 18 16:28:49 2025 DC1 DC2 SOK SOK ok
Sites lpt lss mns srr
----------------------------------
DC1 1766075329 4 dc1hana1 P
DC2 30 4 dc2hana1 S
Hosts clone_state gra gsh node_state roles score site sra
--------------------------------------------------------------------------------------
dc1hana1 PROMOTED 2.0 1.0 online master1:master:worker:master 150 DC1 -
dc1hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 140 DC1
dc1hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -10000 DC1
dc1hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 140 DC1
dc2hana1 DEMOTED 2.0 1.0 online master1:master:worker:master 100 DC2 -
dc2hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 80 DC2
dc2hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -12200 DC2
dc2hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 80 DC2
dc3mm online
Chapter 9. Maintenance procedures
To perform maintenance on the different components of an SAP HANA system replication HA environment, you must follow specific steps to ensure that the cluster does not cause unplanned impact.
Use maintenance procedures to keep your cluster in a healthy state during planned change activities or to restore its health after unplanned incidents.
9.1. Cleaning up the failure history
Clear any failure notifications from the cluster that might remain from previous testing. This resets the failure counters that count against the migration thresholds.
Procedure
Clean up resource failures:
[root]# pcs resource cleanupClean up the STONITH failure history:
[root]# pcs stonith history cleanup
Verification
Check the overall cluster status and confirm that no failures are displayed anymore:
[root]# pcs status --fullCheck that the stonith history for fencing actions has 0 events:
[root]# pcs stonith history
9.2. Triggering a HANA takeover using cluster commands
Use the cluster control to execute a simple takeover of the primary site to the secondary site.
For detailed steps, refer to the section Testing the setup - Triggering a HANA takeover using cluster commands.
9.3. Updating the operating system and HA cluster components
For updates or offline changes on the HA cluster, the operating system or even the system hardware, you must follow the Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
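One building block of such a procedure is typically to take a single node out of service before it is updated and to bring it back afterwards. The following is only a sketch using the pcs standby commands with the example node dc1hana4; the full sequence in the referenced article remains authoritative:
[root]# pcs node standby dc1hana4
[root]# pcs node unstandby dc1hana4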
9.4. Performing maintenance on the SAP HANA instances
For any kind of maintenance of applications or other components that the HA cluster manages, you must enable the cluster maintenance mode to prevent the cluster from any interference during the maintenance.
During the update of your HANA instances, the cluster remains running, but it does not actively monitor resources or take any actions. After the change on the HANA instances is done, it is vital to refresh the cluster resource status and verify that the detected resource states are all correct. Only then can you safely disable the maintenance mode without unexpected cluster actions.
If you need to stop the cluster for the maintenance activity, ensure that you set maintenance mode first, then stop and start the cluster on the node as required for the HANA maintenance.
Prerequisites
- You have configured the Pacemaker cluster to manage the HANA system replication.
Procedure
Set
maintenancemode for the entire cluster:[root]# pcs property set maintenance-mode=trueSetting maintenance for the whole cluster ensures that no activity during the maintenance phase can trigger cluster actions and impact the HANA update process.
Verify that the cluster resource management is fully disabled:
[root]# pcs status... *** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services Node List: * Online: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 dc3mm ] Full List of Resources: * rsc_fence (stonith:<fence agent>): Started dc1hana2 (unmanaged) * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged): * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (unmanaged) * Stopped: [ dc3mm ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (unmanaged) * Stopped: [ dc3mm ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (unmanaged) * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (unmanaged) ...Update the HANA instances using the SAP procedure. If you have to perform a takeover during the HANA update, you can use the SAP HANA Takeover with Handshake option. For more information see also Is it possible to use SAP HANA "Takeover with Handshake" option with the HA Solutions for managing HANA System Replication?.
If you stop the cluster in this step, ensure that you start it again before you proceed with the next steps. Keep the maintenance mode enabled.
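A sketch of the commands to stop and later start the cluster on all nodes while the maintenance mode stays enabled:
[root]# pcs cluster stop --all
[root]# pcs cluster start --all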
After the HANA update, verify that the HANA system replication is working correctly. Use the
systemReplicationStatus.pyscript to show the status of the HANA system replication on the primary site. Below is the example after a manual takeover to the secondary site during the maintenance:[root]# su - <sid>adm -c "HDBSettings.sh systemReplicationStatus.py \ --sapcontrol=1 | grep -i replication_status="service/dc2hana2/30203/REPLICATION_STATUS=ACTIVE service/dc2hana3/30203/REPLICATION_STATUS=ACTIVE service/dc2hana1/30201/REPLICATION_STATUS=ACTIVE service/dc2hana1/30207/REPLICATION_STATUS=ACTIVE service/dc2hana1/30203/REPLICATION_STATUS=ACTIVE site/1/REPLICATION_STATUS=ACTIVE overall_replication_status=ACTIVEBefore you proceed, ensure that the system replication is healthy and reported as
ACTIVE.Refresh all cluster resources to execute one monitor operation and update their status:
[root]# pcs resource refreshWaiting for 1 reply from the controller ... got reply (done)It is crucial that the HANA resources update the cluster and node attributes to reflect the new HANA system replication status. This ensures that the cluster has the correct information and does not trigger recovery actions due to incorrect status information after the maintenance mode is removed.
Check the cluster status and verify the resource status and main HANA resource score attribute. All resources must show as
Startedand the promotable resources must show asUnpromotedon all nodes:[root]# pcs status resources* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged): * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (unmanaged) * Stopped: [ dc3mm ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (unmanaged) * Stopped: [ dc3mm ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (unmanaged) * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (unmanaged)Check the cluster attributes and verify that at least the
sync_stateattribute isSOK:[root]# SAPHanaSR-showAttrGlobal cib-time maintenance prim sec srHook sync_state upd --------------------------------------------------------------------------- RH1 Fri Dec 19 09:41:39 2025 true DC2 DC1 SOK SOK ok …Depending on the maintenance activity, the rest of the attribute information can be different or empty, for example, when you stopped and restarted the cluster on all nodes.
When the checks of the previous steps show the landscape in the expected healthy state, you can remove the maintenance mode of the cluster again:
[root]# pcs property set maintenance-mode=When you lift the maintenance mode, the cluster triggers a monitor operation for all resources again. It updates the status of the promotable resources to
PromotedandUnpromotedin correspondence to the location of the primary and secondary instances. The resources now also update thesrPollattribute again to match thesrHookattribute value.
Verification
Check that the resources are managed again and are in the expected state on all nodes:
[root]# pcs resource status* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * Started: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * Promoted: [ dc1hana1 ] * Unpromoted: [ dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1Verify that the
srHookattribute value isSOKand that the nodes have the expected score values assigned:[root]# SAPHanaSR-showAttrGlobal cib-time maintenance prim sec srHook sync_state upd --------------------------------------------------------------------------- RH1 Fri Dec 19 09:41:39 2025 true DC2 DC1 SOK SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766139865 4 dc1hana1 P DC2 30 4 dc2hana1 S Hosts clone_state gra node_state roles score site ------------------------------------------------------------------------------ dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master 100 DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave 80 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -12200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby 80 DC2 dc3mm online
Troubleshooting
If the srHook attribute value is SFAIL at the end of the maintenance, then the scores for the secondary site nodes are reduced and prevent the cluster from triggering a takeover in the case of a failure. To review and fix this, see The srHook attribute is SFAIL while the system replication is healthy.
9.5. Registering the former primary HANA site as a secondary HANA site after a takeover
When you configure AUTOMATED_REGISTER=false in the SAPHanaController resource, which is the default, you must manually register the former primary site as the new secondary site after a takeover and start it. Otherwise, the unregistered site remains stopped.
Procedure
Register the former primary site as the new secondary site. Run as user
<sid>admon one stopped former primary instance:rh1adm$ hdbnsutil -sr_register --remoteHost=<node> \ --remoteInstance=${TINSTANCE} --replicationMode=sync \ --operationMode=logreplay --name=<site>-
- Replace <node> with the new primary instance host, for example, dc2hana1 if there was a takeover from dc1hana1 to dc2hana1.
- Replace <site> with your new secondary HANA site name, for example, DC1 if dc1hana1 is to be registered as a secondary.
- Choose the values for replicationMode and operationMode according to your requirements for the system replication.
- $TINSTANCE is an environment variable that is set automatically for user <sid>adm by reading the HANA instance profile. The variable value is the HANA instance number.
Start the secondary HANA site. Run as
<sid>admon one new secondary instance node:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
On one new primary instance, show the current status of the re-established HANA system replication. Below is the example after a takeover from
dc1hana1todc2hana1and DC2 is the new primary site:rh1adm$ cdpy; python systemReplicationStatus.py|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ | |RH1 |dc2hana2 |30203 |indexserver | 4 | 2 |DC2 |dc1hana2 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana3 |30203 |indexserver | 5 | 2 |DC2 |dc1hana3 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |SYSTEMDB |dc2hana1 |30201 |nameserver | 1 | 2 |DC2 |dc1hana1 | 30201 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana1 |30207 |xsengine | 2 | 2 |DC2 |dc1hana1 | 30207 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana1 |30203 |indexserver | 3 | 2 |DC2 |dc1hana1 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | status system replication site "1": ACTIVE overall system replication status: ACTIVE Local System Replication State ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mode: PRIMARY site id: 2 site name: DC2
Chapter 10. Troubleshooting
10.1. The srHook cluster attribute value is incorrect
When the srHook attribute value does not match the actual HANA system replication status, it can lead to unexpected behavior in the cluster when a failure of a primary instance occurs.
Check and correct your sudo configuration when the srHook attribute of the secondary site and the HANA system replication status do not match:
- The srHook cluster attribute of the secondary is empty.
- The srHook cluster attribute of the secondary is set to SOK while the HANA system replication is not healthy.
- The srHook cluster attribute of the secondary is set to SFAIL while the system replication is in ACTIVE state.
The primary site receives the events of HANA system replication changes and stores the result as a cluster attribute for the secondary site.
Procedure
Check for
crm_attributeupdate errors in thesecurelog, since the command is executed usingsudo. The log shows the command that the hook script tries to execute, but potentially fails. Check on the primary instance node for an error likecommand not allowed, like in this example:[root]# grep crm_attribute /var/log/secure... rh1adm : command not allowed ; PWD=/hana/shared/RH1/HDB02/<node> ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSRCompare the logged
COMMANDto yoursudoersconfiguration. Check thoroughly and fix thesudoersfile, so that you have a sudo entry that matches the command. As a temporary measure you can ensure that the sudo entry as such works by simplifying it with a wildcard to exclude typos in the command parameters as the cause:[root]# cat /etc/sudoers.d/20-saphanaDefaults:<sid>adm !requiretty <sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute *-
Replace
<sid>with your lower-case HANA SID.
-
Replace
Verify that the command path is correct:
[root]# ls /usr/sbin/crm_attribute/usr/sbin/crm_attribute- Fix the sudo configuration. For more information, see Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method.
- Repeat any fixing steps on all nodes. The sudo configuration must be identical on all instances.
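For reference, the following is only a sketch of a more specific sudoers entry that matches the srHook command from the log example above. It assumes the SID rh1 and the site names DC1 and DC2 used in this document and must be adapted to your environment:
Cmnd_Alias SRHOOK_RH1 = /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC1 -v * -t crm_config -s SAPHanaSR, /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v * -t crm_config -s SAPHanaSR
rh1adm ALL=(ALL) NOPASSWD: SRHOOK_RH1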
10.2. The HANA instance does not start after hook changes
You recently made changes in the global.ini in a HA/DR provider section and the HANA instance does not start anymore.
Procedure
Go to the HANA trace logs directory, as the
<sid>admuser:rh1adm$ cdtraceCheck for errors related to the HA/DR providers in the HANA nameserver process alert log:
rh1adm$ grep ha_dr_provider nameserver_alert_*.trc... ha_dr_provider PythonProxyImpl.cpp(00145) : import of saphanasr failed: No module named 'saphanasr' ... ha_dr_provider HADRProviderManager.cpp(00100) : could not load HA/DR Provider 'saphanasr' from /usr/share/SAPHanaSR-ScaleOutIdentify the root cause, for example a misspelled HA/DR
providername or a wrongpath. Check the path and the hook script name. In this example the HA/DR provider namesaphanasris not matching the hook script nameSAPHanaSR:rh1adm$ ls /usr/share/SAPHanaSR-ScaleOut/ChkSrv.py SAPHanaSR.py SAPHanaSrMultiTarget.py samplesCorrect the
SAPHanaSRHA/DR provider configuration:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1-
providermust match the name of the Python hook script. It is case-sensitive without the.pyfile suffix. -
pathmust be the path in which the hook script is stored.
-
10.3. A cluster node is reported as offline during maintenance
When maintenance-mode is set for the cluster, for example, for a HANA update, the cluster can still detect issues between the nodes, but it does not trigger any recovery actions yet.
If you encounter such a situation, you must first fix the cause of the issue before you lift the maintenance mode.
Example: the corosync communication between the nodes is blocked in an 8-node cluster
If the maintenance mode is removed in this situation, the cluster tries to recover the issue by itself. This can have a severe impact on your ongoing HANA maintenance activity.
...
* Resource management is DISABLED *
The cluster will not attempt to start, stop or recover services
Node List:
* Node dc2hana3: UNCLEAN (offline)
* Online: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana4 dc3mm ]
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana1 (maintenance)
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (maintenance):
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (UNCLEAN, maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (maintenance)
* Stopped: [ dc2hana3 dc3mm ]
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, maintenance):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (UNCLEAN, maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 (maintenance)
* Stopped: [ dc2hana3 dc3mm ]
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (maintenance)
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (maintenance)
...
Identify the root cause of the issue, for example:
- Planned network maintenance on the cluster communication connection in parallel to your HANA maintenance.
- Unplanned outage of network connections due to network device failures or misconfiguration on operating system or network level.
- Firewall configuration blocking cluster communication ports.
Fix any issue to prevent the cluster from taking recovery measures when the cluster maintenance is removed.
10.4. The srHook attribute is SFAIL while the system replication is healthy
An inconsistency between the actual HANA system replication state and the srHook cluster node attribute can occur when the cluster is running on the primary instance node while the system replication fails, for example, during a maintenance. HANA triggers the hook, which updates the srHook attribute with the SFAIL value. If the cluster is then stopped on the primary instance node and the HANA system replication recovers to a healthy state, the hook is correctly executed by HANA, but the update of the cluster node attribute fails.
The primary HANA instance only triggers the srConnectionChanged() hook when there is a new change of the system replication status.
The sync_state attribute is set based on an active check and functions as a fallback when the srHook value is empty. However, when the values differ, the SAPHanaController resource uses the srHook attribute to decide whether a takeover is possible. As a result, if the srHook attribute is SFAIL despite a healthy HANA system replication state, the cluster does not trigger the takeover to the secondary site at the next failure on the primary site.
To solve this conflict, you can delete the incorrect srHook attribute. Afterwards the cluster uses the sync_state attribute for decisions, and the srHook attribute is updated and used again after the next change of the HANA system replication status.
Procedure
Use the
systemReplicationStatus.pyscript to check the status of the HANA system replication on the primary site:[root]# su - <sid>adm -c "HDBSettings.sh systemReplicationStatus.py \ --sapcontrol=1 | grep -i replication_status="service/dc1hana3/30203/REPLICATION_STATUS=ACTIVE service/dc1hana2/30203/REPLICATION_STATUS=ACTIVE service/dc1hana1/30201/REPLICATION_STATUS=ACTIVE service/dc1hana1/30207/REPLICATION_STATUS=ACTIVE service/dc1hana1/30203/REPLICATION_STATUS=ACTIVE site/2/REPLICATION_STATUS=ACTIVE overall_replication_status=ACTIVEBefore you proceed, ensure that the system replication is healthy and reported as
ACTIVE.Review the
sync_stateandsrHookattributes and the node score values during the conflict:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Fri Dec 19 11:12:42 2025 DC1 DC1 SFAIL SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766142750 4 dc1hana1 P DC2 10 4 dc2hana1 S Hosts clone_state gra node_state roles score site --------------------------------------------------------------------------------- dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master -INFINITY DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave -32300 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -22200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby -32300 DC2 dc3mm onlineIn this state, the
sync_stateattribute is correct, but thesrHookattribute takes precedence. Therefore, the secondary site is excluded from taking over if the primary site fails.Delete the
srHookattribute to solve the conflict:[root]# crm_attribute --type crm_config -n hana_<sid>_glob_srHook --deleteDeleted crm_config option: id=SAPHanaSR-hana_rh1_glob_srHook name=hana_rh1_glob_srHook
Verification
Check the attributes summary and note that the
srHookattribute is missing and that the node scores are updated to enable an automatic takeover again using thesync_stateattribute status:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec sync_state upd -------------------------------------------------------- RH1 Fri Dec 19 11:17:59 2025 DC1 DC1 SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766143077 4 dc1hana1 P DC2 30 4 dc2hana1 S Hosts clone_state gra node_state roles score site ------------------------------------------------------------------------------ dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master 100 DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave 80 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -12200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby 80 DC2 dc3mm online
Appendix A. Component options
A.1. HA/DR provider options for SAPHanaSR
Parameters that are available for the configuration of the SAPHanaSR HA/DR provider are shown below:
| Provider options | Required | Default | Description |
|---|---|---|---|
| provider | yes | | The provider parameter must be set to the hook script name without the .py file extension. |
| path | yes | | The full path to the location of the hook script. |
| execution_order | yes | | Set to the position of this hook in the execution order of the configured HA/DR provider hooks. |
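For orientation, the corresponding HA/DR provider section in the global.ini of the HANA system could look like the following sketch. The path and execution order shown here are illustrative and must match the actual location of the hook script and the hook order in your environment:

# Illustrative global.ini section for the SAPHanaSR hook (values are examples)
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR/srHook
execution_order = 1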
A.2. HA/DR provider options for ChkSrv
Parameters that are available for the configuration of the ChkSrv HA/DR provider are shown below:
| Provider options | Required | Default | Description |
|---|---|---|---|
| provider | yes | | The provider parameter must be set to the hook script name without the .py file extension. |
| path | yes | | The full path to the location of the hook script. |
| execution_order | yes | | Set to the position of this hook in the execution order of the configured HA/DR provider hooks. |
| action_on_lost | no | ignore | Action to be triggered when a lost indexserver is identified. |
| kill_signal | no | 9 | The signal that is used with the kill action. |
| stop_timeout | no | 20s | How many seconds to wait for the stop action to complete. |
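A similar, illustrative global.ini section for the ChkSrv hook is sketched below. The path, execution order, and action_on_lost value are examples only and depend on your environment and operational requirements:

# Illustrative global.ini section for the ChkSrv hook (values are examples)
[ha_dr_provider_ChkSrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR/srHook
execution_order = 2
# Example: stop the local instance when a lost indexserver is detected (the default is ignore)
action_on_lost = stop
stop_timeout = 20
kill_signal = 9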
A.3. SAPHanaTopology resource parameters
Parameters that are available for the configuration of SAPHanaTopology resources are shown below:
| Resource options | Required | Default | Description |
|---|---|---|---|
| SID | yes | | SAP system identifier. |
| InstanceNumber | yes | | Number of the SAP HANA instance. |
| HANA_CALL_TIMEOUT | no | 120 | Defines how long a call to HANA to receive information can take, for example, when the resource agent executes landscapeHostConfiguration.py. If you increase the timeout for HANA calls of this resource, you must also consider increasing the operation timeout values of the same resource. |
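As an illustration of how these parameters are used, a SAPHanaTopology clone resource could be created with a pcs command similar to the following sketch. The resource name matches the examples in this document; the operation timeouts and clone meta attributes that a complete setup requires are omitted here and are described in the configuration chapters:

# Sketch only: create the SAPHanaTopology resource as a clone across all HANA nodes
[root]# pcs resource create rsc_SAPHanaTop_RH1_HDB02 ocf:heartbeat:SAPHanaTopology \
    SID=RH1 InstanceNumber=02 clone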
A.4. SAPHanaController resource parameters
Parameters that are available for the configuration of SAPHanaController resources are shown below:
| Resource options | Required | Default | Description |
|---|---|---|---|
| SID | yes | | SAP system identifier. |
| InstanceNumber | yes | | Number of the SAP HANA instance. |
| DIR_EXECUTABLE | no | | The fully qualified path to binaries such as sapstartsrv and sapcontrol. SAP standard paths are searched by default. |
| DIR_PROFILE | no | | The fully qualified path to the SAP START profile. Specify this parameter if you have changed the SAP profile directory location after the default SAP installation. SAP standard paths are searched by default. |
| HANA_CALL_TIMEOUT | no | 120 | Defines how long a call to HANA to receive information can take, for example, when the resource agent runs landscapeHostConfiguration.py. If you increase the timeout for HANA calls of this resource, you must also consider increasing the operation timeout values of the same resource. |
| INSTANCE_PROFILE | no | | The name of the SAP HANA instance profile. Specify this parameter if you have changed the name of the SAP HANA instance profile after the default SAP installation. SAP standard paths are searched by default. |
| PREFER_SITE_TAKEOVER | no | false | Defines whether the resource agent should prefer to trigger a takeover to the secondary site instead of restarting the primary site locally. However, a takeover is only triggered if the SAP HANA landscape status reports an error for the primary site. |
| AUTOMATED_REGISTER | no | false | Defines whether the resource agent automatically registers a former primary instance as a secondary during cluster resource start and after the DUPLICATE_PRIMARY_TIMEOUT has expired. |
| DUPLICATE_PRIMARY_TIMEOUT | no | 7200 | The time difference required between two last primary time stamps (LPTs) in case a dual-primary situation occurs. If the difference between both nodes' last primary time stamps is less than DUPLICATE_PRIMARY_TIMEOUT, the cluster does not immediately recover the failed former primary instance. How the recovery proceeds after the DUPLICATE_PRIMARY_TIMEOUT has passed depends on the AUTOMATED_REGISTER setting. |
We recommend that you set PREFER_SITE_TAKEOVER to true. This allows the HA cluster to trigger a takeover when a failure of the primary HANA instance is detected. In most cases, it takes less time for the new HANA primary instance to become fully active after a takeover than it takes for the original primary instance to restart and reload all data from disk back into memory.
Leave AUTOMATED_REGISTER set to false to give the operator the option to first verify the health and data consistency of the previously failed primary instance. Afterwards, you can manually register it as the new secondary instance to re-establish the HANA system replication between both instances, and manually start the instance.
Set AUTOMATED_REGISTER to true to enable the automatic registration of the former primary instance as the new secondary after a takeover occurs. This increases the availability of the HANA system replication setup and prevents so-called dual-primary situations in the SAP HANA system replication environment.
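To show how these parameters and recommendations fit together, a promotable SAPHanaController resource could be created roughly as follows. This is a sketch only: the resource name follows the examples in this document, while the operation timeouts, clone meta attributes, and the ordering and colocation constraints that a working cluster requires are omitted here and are covered in the configuration chapters:

# Sketch only: promotable SAPHanaController resource with the recommended takeover behavior
[root]# pcs resource create rsc_SAPHanaCon_RH1_HDB02 ocf:heartbeat:SAPHanaController \
    SID=RH1 InstanceNumber=02 \
    PREFER_SITE_TAKEOVER=true AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 \
    promotable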
Appendix B. Useful information
B.1. Explaining cluster information
When the cluster resources start, they execute their first monitor operation and gather the initial resource information. The HANA resources add node attributes and cluster properties for the collected landscape information, which describes the current state of the SAP HANA databases on the cluster nodes.
Scale-out cluster status with node attributes
[root]# pcs status --full
Cluster name: hana-scaleout-cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-18 16:27:04Z)
Cluster Summary:
* Stack: corosync
* Current DC: dc2hana1 (5) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum
* Last updated: Thu Dec 18 16:27:05 2025
* Last change: Thu Dec 18 16:26:37 2025 by root via crm_attribute on dc1hana1
* 9 nodes configured
* 19 resource instances configured
Node List:
* Node dc1hana1 (1): online, feature set 3.16.2
* Node dc1hana2 (2): online, feature set 3.16.2
* Node dc1hana3 (3): online, feature set 3.16.2
* Node dc1hana4 (4): online, feature set 3.16.2
* Node dc2hana1 (5): online, feature set 3.16.2
* Node dc2hana2 (6): online, feature set 3.16.2
* Node dc2hana3 (7): online, feature set 3.16.2
* Node dc2hana4 (8): online, feature set 3.16.2
* Node dc3mm (9): online, feature set 3.16.2
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana2
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]:
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
Node Attributes:
* Node: dc1hana1 (1):
* hana_rh1_clone_state : PROMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC1
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 150
* Node: dc1hana2 (2):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc1hana3 (3):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : -10000
* Node: dc1hana4 (4):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc2hana1 (5):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC2
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 100
* Node: dc2hana2 (6):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
* Node: dc2hana3 (7):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : -12200
* Node: dc2hana4 (8):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
Migration Summary:
Tickets:
PCSD Status:
dc1hana1: Online
dc1hana2: Online
dc1hana3: Online
dc1hana4: Online
dc2hana1: Online
dc2hana2: Online
dc2hana3: Online
dc2hana4: Online
dc3mm: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
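If you are only interested in the HANA node attributes and not in the full resource status, you can also list or query them directly. The following commands are a sketch; the exact output format depends on the pcs and Pacemaker versions:

# List all node attributes stored in the cluster
[root]# pcs node attribute
# Query a single HANA attribute on one node
[root]# crm_attribute --node dc1hana1 --name hana_rh1_roles --query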
Cluster properties
The new generation of SAP HANA resource agents stores information about the HANA instance as cluster properties in the property set named SAPHanaSR. You can query the CIB (cluster information base) as the root user to check the content of the cluster attributes. The HANA resources and the SAPHanaSR hook update these attributes.
[root]# cibadmin --query --xpath "//crm_config//cluster_property_set[@id='SAPHanaSR']"
<cluster_property_set id="SAPHanaSR">
  <nvpair id="SAPHanaSR-hana_rh1_glob_srHook" name="hana_rh1_glob_srHook" value="SOK"/>
</cluster_property_set>
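To check a single property from this set, for example the global srHook status, you can also query it directly with crm_attribute, using the same hana_<sid>_glob_srHook attribute name as in the troubleshooting procedure above:

# Query the global srHook property for the RH1 system
[root]# crm_attribute --type crm_config --name hana_rh1_glob_srHook --query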
SAPHanaSR-showAttr
You can use the tool SAPHanaSR-showAttr to display all of the HANA cluster attribute information in a preformatted overview.
Check this status in addition to pcs status [--full] to see the overall landscape health.
[root]# SAPHanaSR-showAttr
Global cib-time prim sec srHook sync_state upd
---------------------------------------------------------------
RH1 Thu Dec 18 16:28:49 2025 DC1 DC2 SOK SOK ok
Sites lpt lss mns srr
----------------------------------
DC1 1766075329 4 dc1hana1 P
DC2 30 4 dc2hana1 S
Hosts clone_state gra gsh node_state roles score site sra
--------------------------------------------------------------------------------------
dc1hana1 PROMOTED 2.0 1.0 online master1:master:worker:master 150 DC1 -
dc1hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 140 DC1
dc1hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -10000 DC1
dc1hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 140 DC1
dc2hana1 DEMOTED 2.0 1.0 online master1:master:worker:master 100 DC2 -
dc2hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 80 DC2
dc2hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -12200 DC2
dc2hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 80 DC2
dc3mm online