Automating SAP HANA Scale-Out System Replication using the RHEL HA Add-On
Abstract
Creating an HA cluster for automating scale-out HANA system replication with the classic dedicated SAP HANA scale-out resource agents
Providing feedback on Red Hat documentation
We appreciate your feedback on our documentation. Let us know how we can improve it.
Submitting feedback through Jira (account required)
- Make sure you are logged in to the Jira website.
- Click on this link to provide feedback.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Click Create at the bottom of the dialogue.
Chapter 1. Introduction to SAP HANA scale-out system replication HA
Configuring the SAP HANA system replication between two identical HANA sites enables a basic resiliency of the database. You can configure these two sites in a Pacemaker cluster for advanced high availability that automatically handles the service recovery in the case of a failure on the primary instance side.
1.1. Terminology
node
One host or system in a HA cluster setup, also called a cluster member.
cluster
Cluster is the high-availability setup using the Pacemaker cluster manager from the RHEL HA Add-On. It consists of two or more members, or nodes.
instance
One set of SAP HANA systems that belong to one HANA site. In single-host (scale-up) HANA environments, one HANA site consists of a single HANA instance. In multiple-host (scale-out) HANA configurations, each HANA site consists of two or more HANA instances.
primary
The primary HANA instance or primary site refers to the instance which is the active HANA instance or site. In single-host setups (scale-up), this is one system. In multiple-host (scale-out) setups, the primary database stretches across multiple systems of one HANA site and the systems have different roles in the HANA environment to distribute load.
secondary
The secondary HANA instance or secondary site refers to the SAP HANA instance or site which is configured to be synced with the primary HANA instance through the SAP HANA system replication mechanism. This instance preloads the in-memory data of the primary instance and is ready to take over if the primary instance fails.
1.2. Performance-optimized SAP HANA scale-out HA
Performance-optimized means that there is only a single SAP HANA instance running on each node that has control over most of the resources, such as CPU and RAM, on each node. This means that the SAP HANA instances can run with as much performance as possible.
You configure the HANA environment without HANA standby hosts and with only one coordinator name server per replication site. This coordinator name server controls the landscape of each site. The HANA host auto-failover functionality using idle standby hosts is not necessary, because the Pacemaker cluster controls the high availability of the HANA database and manages the HANA system replication.
With a performance-optimized SAP HANA system replication setup of SAP HANA 2.0 SPS1 or newer you can also configure read access to the secondary system to reduce the load on the primary instance. For more information see the SAP documentation for Active/Active (Read Enabled) configuration.
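For example, on SAP HANA 2.0 SPS1 or newer, read access on the secondary site is enabled by choosing the corresponding operation mode when you register the secondary. The following is only a minimal sketch; the host, instance, and site names are placeholders and must match your environment:
rh1adm$ hdbnsutil -sr_register --remoteHost=<primary_host> \
  --remoteInstance=<instance_number> --replicationMode=sync \
  --operationMode=logreplay_readaccess --name=<site2>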
1.3. Cluster resource agents and tools for SAP HANA HA
The high-availability (HA) cluster configuration for managing SAP HANA system replication setups works with multiple resource agents and other tools that combine their functionality for the expected behavior.
The resource agents and tools are provided in the package resource-agents-sap-hana-scaleout.
SAPHanaTopology
The SAPHanaTopology resource agent gets status information from the SAP HANA environment and saves it to cluster properties. The agent also starts and monitors the local SAP Host Agent, which is required for starting, stopping, and monitoring the HANA instances. A configuration process in SAP HANA called system replication hook adds replication health information as well to the saved properties. Based on the collected environment data, the resource agent defines a dedicated health score of the cluster node. This scoring is used by the cluster to decide if it must initiate the switch of the system replication from one site to the other.
SAPHanaController
The SAPHanaController resource agent monitors and manages the SAP HANA environment. In case of a failure of the HANA instance, the resource agent determines which recovery action to take and executes the commands for an automatic switch, or it changes the active site of the system replication.
SAPHanaSR-showAttr
The SAPHanaSR-showAttr tool shows cluster attributes for the SAP HANA system replication automation in a preformatted overview, including the HANA topology that shows whether it is a scale-up or scale-out environment. The default output includes the system replication status between the nodes and other related status information. The script retrieves the information from the Cluster Information Base (CIB), where other resource agents or hook scripts store updates during their regular checks or from HANA events, respectively. Because of this, the information can contain outdated states until it is updated again. Use HANA tools to get real-time status information from the landscape.
1.4. SAP HANA HA/DR provider hooks
Current versions of SAP HANA provide an API in the form of hooks that allow the HANA instance to send notifications for certain events, for example the loss or establishment of the system replication. For each event, the HANA instance calls the configured hooks, also called HA/DR providers. Hooks are custom Python scripts which process the events that HANA sends and the scripts can trigger different actions based on the event information.
You must add the HA/DR provider definition to the HANA global configuration to enable the required functionality of triggering additional actions for certain events.
SAPHanaSR for the srConnectionChanged() hook method
The SAPHanaSR hook is required for processing the srConnectionChanged() hook method. This method is used by the primary HANA instance for a notification of any change in the HANA system replication status. The primary HANA instance calls the SAPHanaSR HA/DR provider when a HANA system replication related event occurs. The hook script SAPHanaSR.py then parses srConnectionChanged() events for the system replication status detail and as a result it updates the srHook cluster attribute. This attribute is used by the resource agents to evaluate the landscape health and make decisions. The value of the system replication or sync state defines if the cluster recovers a failed primary instance on the same node or if it triggers a takeover to the secondary. The takeover is only triggered when the system replication is fully in sync, which means the HANA data is consistent between the HANA sites.
You must configure the SAPHanaSR hook to enable the srConnectionChanged() hook method for proper function and full support of the HA cluster setup.
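A minimal sketch of the corresponding global.ini entries on both HANA sites is shown below. The hook script path and the trace section are assumptions that depend on the installed resource-agents-sap-hana-scaleout package, so verify them against the files shipped with your package version:
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 1

[trace]
ha_dr_saphanasr = info
In addition, the <sid>adm user usually needs a sudoers entry that allows it to update the srHook cluster attribute with crm_attribute; the exact attribute name depends on your SID and site names and on the hook script version.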
ChkSrv for the srServiceStateChanged() hook method
When the HANA instance detects an issue with a HANA indexserver process it recovers from the problem by stopping and restarting the hdbindexserver service automatically through an internal mechanism.
However, especially for very large HANA instances, the hdbindexserver service can take a very long time for the stopping phase of this recovery process. Although HANA does not report this service degradation as an error in the HANA landscape, the situation poses a risk to data consistency if anything else fails in the instance during that time. To make the otherwise unpredictable service recovery time more deterministic, you can configure the ChkSrv hook to stop or kill the entire affected HANA instance instead.
In a setup with automatic failover enabled (PREFER_SITE_TAKEOVER=true), the instance stop leads to a takeover if the secondary node is in a healthy state. Otherwise, instance recovery happens locally, but the enforced local instance restart accelerates the process.
The HANA instance calls the ChkSrv hook when an event occurs. The hook script ChkSrv.py processes the srServiceStateChanged() hook method and executes actions based on the results of the filters it applies to event details. This way the ChkSrv.py hook script can distinguish a HANA hdbindexserver process that is being stopped and restarted by HANA after a failure from the same process being stopped as part of an intended instance shutdown. When the hook script determines that the event is caused by a failure it triggers the configured action.
The ChkSrv.py hook script has multiple options to define what happens when an indexserver failure event is detected:
ignore
This action just writes the parsed events and decision information to a dedicated logfile. This is useful for testing and verifying what the hook script would do when activating the stop or kill actions.
stop
This action executes a graceful StopSystem for the instance through the sapcontrol command.
kill
This action executes the HDB kill-<signal> command with a default signal 9, which can be configured. The result is the same as when using stop, but can be faster.
Any indexserver failure is treated individually by HANA. The same processes are always triggered for every single indexserver issue.
Enabling the srServiceStateChanged() hook is optional.
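If you do enable it, the hook is defined in global.ini in the same way as the SAPHanaSR provider. The following is only an illustrative sketch; the script path and the action_on_lost parameter name are assumptions to verify against the ChkSrv.py documentation in your installed package:
[ha_dr_provider_chksrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR-ScaleOut/
execution_order = 2
action_on_lost = stop

[trace]
ha_dr_chksrv = info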
1.5. Support policies for SAP HANA High Availability
Red Hat supports the following components of the solution:
- Basic operating system configuration for running SAP HANA on RHEL, based on SAP guidelines
- RHEL HA Add-On
- Red Hat HA solutions for SAP HANA system replication
Chapter 2. Planning the HA cluster setup
Plan your setup carefully to ensure that all requirements for the HA cluster configuration for automating the HANA system replication of your HANA landscape are met.
2.1. Subscription and repositories for SAP HANA HA
The solutions for SAP HANA in a Pacemaker cluster for High Availability (HA) are provided in dedicated repositories. The RHEL for SAP Solutions subscription is required to access all relevant content. In addition to the standard RHEL repos the subscription provides access to the following repos, which are required to set up the SAP HANA HA solution:
High Availability
The RHEL HA Add-On’s content is stored in a repository named High Availability. The repository ID is represented as rhel-9-for-<arch>-highavailability-e4s-rpms.
SAP Solutions
The repository that contains the SAP HANA specific content. The repository ID is represented as rhel-9-for-<arch>-sap-solutions-e4s-rpms.
The <arch> denotes the specific hardware architecture:
- x86_64
- ppc64le
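For example, on an x86_64 system you can enable both repositories with subscription-manager, using the repository IDs listed above:
[root]# subscription-manager repos \
  --enable="rhel-9-for-x86_64-highavailability-e4s-rpms" \
  --enable="rhel-9-for-x86_64-sap-solutions-e4s-rpms"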
Example list of repositories enabled as part of the RHEL for SAP Solutions subscription:
[root]# dnf repolist
Updating Subscription Management repositories.
repo id repo name
rhel-9-for-x86_64-appstream-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-baseos-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-highavailability-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - High Availability - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-sap-netweaver-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - SAP NetWeaver - Update Services for SAP Solutions (RPMs)
rhel-9-for-x86_64-sap-solutions-e4s-rpms Red Hat Enterprise Linux 9 for x86_64 - SAP Solutions - Update Services for SAP Solutions (RPMs)
2.2. Operating system requirements
Deploy your host operating system as described in Installing RHEL 9 for SAP Solutions.
Follow SAP Note 3108302 - SAP HANA DB: Recommended OS Settings for RHEL 9 to configure architecture specific settings, kernel parameters and check the minimum required Linux kernel and HANA versions.
Apply the operating system post-installation configuration for SAP HANA hosts as described in SAP Note 3108316 - Red Hat Enterprise Linux 9.x: Installation and Configuration.
Root privileges
For the HANA installation and the cluster HA setup, you need the root user or a privileged user that can run any command with sudo.
2.3. Storage requirements
You can find information about Sizing SAP HANA in the SAP HANA Master Guide.
There is no communication between both scale-out environments on the storage level. As a result, you must complete the storage configuration on each scale-out environment before installing the HANA instances.
For the setup of a scale-out SAP HANA environment with HANA system replication between two HANA sites you can configure the storage as shared or non-shared.
Shared storage
Configure 3 NFS shares for the mountpoints /hana/data, /hana/log, /hana/shared. For each HANA site you must create a dedicated set of the shares.
Example NFS storage details for the HANA instance in datacenter 1:
| Method | NFS Server | NFS Path | Mount Point |
|---|---|---|---|
| NFS | <nfs_server_dc1> | <export_path>/data | /hana/data |
| NFS | <nfs_server_dc1> | <export_path>/log | /hana/log |
| NFS | <nfs_server_dc1> | <export_path>/shared | /hana/shared |
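A minimal /etc/fstab sketch for the datacenter 1 nodes is shown below. The NFS server name, export paths, and mount options are placeholders; adapt them to your storage environment and to the SAP recommendations for running HANA on NFS:
<nfs_server_dc1>:/export/RH1/data    /hana/data    nfs4  defaults,hard  0 0
<nfs_server_dc1>:/export/RH1/log     /hana/log     nfs4  defaults,hard  0 0
<nfs_server_dc1>:/export/RH1/shared  /hana/shared  nfs4  defaults,hard  0 0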
Example NFS storage details for the HANA instance in datacenter 2:
| Method | NFS Server | NFS Path | Mount Point |
|---|---|---|---|
| NFS | <nfs_server_dc2> | <export_path>/data | /hana/data |
| NFS | <nfs_server_dc2> | <export_path>/log | /hana/log |
| NFS | <nfs_server_dc2> | <export_path>/shared | /hana/shared |
Non-shared storage
A non-shared storage configuration requires the integration of the storage connector. The storage connector manages access to the LUNs or LVM devices over SCSI or LVM locking mechanisms. You can only use this method for /hana/data and /hana/log.
Use the SAP HANA Fiber Channel Storage Connector Admin Guide for the setup of non-shared storage.
For /hana/shared you must configure an NFS share for each instance, even if you use non-shared storage for /hana/data and /hana/log.
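With the storage connector, HANA mounts the data and log devices on the active hosts itself through the [storage] section of global.ini. The following lines are only an illustrative sketch with placeholder WWIDs; the authoritative parameter names and values are described in the SAP storage connector documentation:
[storage]
ha_provider = hdb_ha.fcClient
partition_*_*__prtype = 5
partition_1_data__wwid = <wwid_of_data_lun_1>
partition_1_log__wwid = <wwid_of_log_lun_1>
partition_2_data__wwid = <wwid_of_data_lun_2>
partition_2_log__wwid = <wwid_of_log_lun_2>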
2.4. Network requirements
You can find information about SAP HANA network architecture considerations in the SAP HANA Administration Guide.
For the SAP HANA system replication setup in a HA cluster, we recommend that you configure dedicated networks and connections for the cluster communication traffic, separate from any HANA network traffic.
2.5. HA cluster requirements
Fencing
For a supported HA cluster setup using the RHEL HA Add-on you must configure a fencing or STONITH device on each cluster node. Which fencing or STONITH device you can use depends on the platform the cluster is running on. Check the Support Policies for RHEL High Availability Clusters - Fencing/STONITH for recommendations on fencing agents or consult your hardware or cloud provider to find out which fence device is supported on their platform.
fence_scsi or fence_mpath as fencing/STONITH mechanism requires shared storage between the cluster nodes that is fully managed by the HA cluster. If your SAP environment does not include such a shared disks setup, using these fencing options is not supported.
Quorum
HANA environments with HANA system replication managed by a Pacemaker cluster in a performance-optimized setup always consist of an even number of HANA nodes. Each cluster member automatically counts as one vote in the quorum calculations, which the cluster uses to decide which nodes can continue running when there is a communication disruption between the nodes. An even number of cluster nodes can lead to a 50/50 split and there is a risk of a split-brain situation, where both partitions continue running and cause conflicts or data corruption in the running services.
We highly recommend that you configure an additional quorum vote in the cluster to have an odd number of votes and improve the availability of the service even during multiple cluster interconnect failures. See Exploring Concepts of RHEL High Availability Clusters - Quorum for more details about the concept and its benefits.
You can use a qdevice or an additional cluster node to add a quorum vote to your cluster. Both methods require a separate host and have different advantages and limitations that you must consider.
See Configuring a quorum device in the cluster for the configuration steps of the following quorum device methods:
qdevice
- You must configure a dedicated host that is not a member of any cluster.
- Ideally you place the qdevice host in a different location or availability zone than any cluster members.
- You can configure no more than one qdevice per cluster.
- A single qdevice host can serve qdevices to multiple different clusters. You do not need additional qdevice hosts if your different clusters can reach the same qdevice host.
- The qdevice is visible in the quorum configuration only and does not require any change or considerations with the cluster resource settings.
- The qdevice communicates through the network. Use any production network, for example, the HANA client network.
- Ideally you configure a highly available network connection, for example, bonded interfaces on the qdevice host.
majority-maker node
- You must configure a dedicated host that becomes a member of the cluster you want to use it for.
- Ideally you place the majority-maker host in a different location or availability zone than other cluster members.
- The majority-maker host can only be a member of one cluster. You must configure a separate host for the same functionality in any other cluster.
- The communication with this host happens through corosync, like any other cluster member.
- You must adjust your cluster resource settings and add cluster constraints to prevent the cluster from running any resources on this node and leave it out of node target calculations for cloned resources.
- You can configure multiple additional cluster nodes that are restricted to only serve as quorum votes.
- Ideally you configure a highly available network connection, for example, bonded interfaces on the majority-maker host.
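For example, after you have installed the corosync-qdevice package on the cluster nodes and set up a qnetd host, you can add the quorum vote with a single command. The host name is a placeholder, and ffsplit is a common algorithm choice for clusters with an even number of nodes:
[root]# pcs quorum device add model net host=<qdevice_host> algorithm=ffsplit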
2.6. SAP HANA planning
To prepare the HANA setup you can define a list of parameters that you require for the installation and configuration of the planned environment.
Example SAP HANA configuration parameters are shown below:
| Parameter | Example value |
|---|---|
| HANA SID | RH1 |
| HANA instance number | 02 |
| HANA site 1 name | DC1 |
| HANA site 2 name | DC2 |
| HANA site 1 node 1 FQDN | dc1hana1.example.com |
| HANA site 1 node 2 FQDN | dc1hana2.example.com |
| HANA site 1 node 3 FQDN | dc1hana3.example.com |
| HANA site 1 node 4 FQDN | dc1hana4.example.com |
| HANA site 2 node 1 FQDN | dc2hana1.example.com |
| HANA site 2 node 2 FQDN | dc2hana2.example.com |
| HANA site 2 node 3 FQDN | dc2hana3.example.com |
| HANA site 2 node 4 FQDN | dc2hana4.example.com |
| Majority-maker cluster node | <majority_maker_hostname> |
| HANA DB 'SYSTEM' user password | <HANA_SYSTEM_PASSWORD> |
| SAP system group ID | 10001 |
| SAP system group name | sapsys |
| SAP local administrator user ID | 10200 |
| SAP local administrator user name | sapadm |
| HANA administrative user ID | 10210 |
| HANA administrative user name | rh1adm |
Chapter 3. Installing SAP HANA scale-out for an 8-node HA cluster setup
The examples in the following configuration steps demonstrate the setup on 4 scale-out nodes per HANA site, which results in an installation of 8 HANA nodes.
You can apply the same steps to more scale-out nodes per site. Each HANA site must consist of the same amount of identically configured nodes.
3.1. Managing the firewalld service
On RHEL the firewalld systemd service is enabled by default when installed and starts with a basic configuration.
For your planned SAP landscape you must decide if you want to manage all port and connection requirements in the firewall service on each cluster node, or if this is handled separately in the security design of your network infrastructure. If you do not need to manage a firewall on the operating system level, you must disable the firewalld service on each cluster node. If the local firewall service remains running without the necessary port configuration, it blocks the cluster communication and the connections between your SAP systems.
For your SAP landscape and HA setup to work you must implement one of the following options:
3.1.1. Disabling the firewalld service
The firewalld service is installed and enabled by default as part of the "Server" package group. You must disable it if you do not use it in your network security strategy.
Prerequisites
- You are managing firewall rules outside of the individual host operating systems as part of your security concept.
Procedure
Stop and disable the firewalld service on each cluster node. The --now parameter automatically stops the disabled service. Run this on each system of your planned landscape:
[root]# systemctl disable --now firewalld.service
Verification
Verify that the firewalld service is disabled on each node:
[root]# systemctl status firewalld.service
○ firewalld.service - firewalld - dynamic firewall daemon
     Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; preset: enabled)
     Active: inactive (dead)
       Docs: man:firewalld(1)
3.1.2. Configuring the firewalld service for the SAP landscape
Check the SAP documentation for the Ports and Connections that you have to enable in the firewall for your SAP landscape. Consider all SAP components in your setup that require incoming or outgoing communication and connections between the different hosts in your landscape.
Configure the firewalld service on each of your SAP hosts using the methods that fit your requirements best. Consult Configuring firewalls and packet filters for the details on how to use the firewalld service effectively.
3.2. Configuring the host names in /etc/hosts
For a consistent host name resolution between all systems in your HANA and HA setup we recommend adding them to the /etc/hosts file on each node.
If you configure the HANA Internal Host Name Resolution you must ensure that the /etc/hosts entries for the same host names are consistent with the HANA configuration.
Procedure
Add the host names of all hosts to /etc/hosts on all cluster nodes:
[root]# cat /etc/hosts
...
192.168.100.101 dc1hana1.example.com dc1hana1
192.168.100.102 dc1hana2.example.com dc1hana2
192.168.100.103 dc1hana3.example.com dc1hana3
192.168.100.104 dc1hana4.example.com dc1hana4
192.168.100.121 dc2hana1.example.com dc2hana1
192.168.100.122 dc2hana2.example.com dc2hana2
192.168.100.123 dc2hana3.example.com dc2hana3
192.168.100.124 dc2hana4.example.com dc2hana4
Verification
Check that you can ping the hosts. This step is optional and an example only for a basic verification. The system resolves entries in /etc/hosts when you use the ping command:
[root]# ping dc1hana2.example.com
PING dc1hana2.example.com (192.168.100.102) 56(84) bytes of data.
64 bytes from dc1hana2.example.com (192.168.100.102): icmp_seq=1 ttl=64 time=0.017 ms
…
3.4. Creating the SAP administrative user and group
In a high-availability environment where the highly available service can move between different systems using shared storage, you must configure the service’s user and groups with identical numerical values for their user ID (UID) and group ID (GID). Different IDs for the same service users or groups cause access conflicts and prevent you from switching the service between the cluster nodes.
Prepare the following operating system group:
- sapsys
Prepare the following operating system users:
- sapadm
- <sid>adm, using your target SID
Prerequisites
- You have reserved identical user and group IDs for the required groups and users, for example, in your central identity management system for service users.
Procedure
Create the sapsys group. Use the prepared group ID, for example, ID 10001:
[root]# groupadd -g 10001 sapsys
Create the sapadm user as a member of the sapsys group. The user does not need a login shell. Use the prepared user ID, for example, ID 10200:
[root]# useradd -u 10200 -g sapsys sapadm \
  -c 'SAP Local Administrator' -s /sbin/nologin
Create the <sid>adm user as a member of the sapsys group. Use the prepared user ID, for example, ID 10210 for user rh1adm:
[root]# useradd -u 10210 -g sapsys rh1adm \
  -c 'SAP HANA Administrator' -s /bin/sh
As the user shell, we recommend that you use either /bin/sh or /bin/csh. SAP installations provide user profiles and useful shell aliases for these shells.
- Repeat the steps on all nodes.
Verification
Check that the users sapadm and <sid>adm exist and have the correct groups and IDs configured, for example:
[root]# id sapadm rh1adm
uid=10200(sapadm) gid=10001(sapsys) groups=10001(sapsys)
uid=10210(rh1adm) gid=10001(sapsys) groups=10001(sapsys)
Check that the users have the correct description, home directory, and shell defined:
[root]# grep -E 'sapadm|rh1adm' /etc/passwd
sapadm:x:10200:10001:SAP Local Administrator:/home/sapadm:/sbin/nologin
rh1adm:x:10210:10001:SAP HANA Administrator:/home/rh1adm:/bin/sh
- Repeat the check on all nodes and verify that the names and IDs are identical.
3.5. Configuring SSH public-key access for root for all cluster nodes (optional)
There are steps in the configuration in which you potentially require passwordless root access to the cluster nodes. You can achieve this by setting up SSH public-key authentication between the servers. Whether you can use this depends on your specific HANA setup and the security policies of your company.
Passwordless root access might be needed in the following situations:
- Accessing all hosts of the same HANA site during the database installation. This applies when your HANA site consists of more than one node like in a scale-out setup.
- Accessing the primary site from the secondary site for the HANA system replication configuration.
Procedure
Generate an ssh key pair. When no key type is defined, ssh-keygen creates an Ed25519 key by default, like in the following example for the root user:
[root]# ssh-keygen
Option 1, if you have ssh PasswordAuthentication enabled on the remote system: Use the ssh-copy-id tool to add the ssh public key to the authorized_keys file on the remote system. This automatically creates the .ssh/ directory and authorized_keys file with correct permissions for the target user on the remote system. Run it on the host on which you created the ssh key in the previous step and enter the target user password when prompted:
[root]# ssh-copy-id <remote_system>
In the case of the root user, this only works if the ssh config allows PermitRootLogin and you can provide the root user password in the prompt. Check the ssh configuration settings on the remote system if you face access permission issues even after you have enabled PasswordAuthentication. Consult your security policies before you enable these parameters on your HANA systems.
Option 2, if password login to the target user on the remote host is prohibited or otherwise not possible: Configure the ssh key access on the remote system manually.
Create the .ssh/ directory in the target user’s home path on the remote system, if it does not exist yet. Run this on the remote system, for example, for the root user:
[root]# mkdir /root/.ssh
Change the permissions of the new .ssh/ directory. For security reasons the ssh key access does not work when the permissions are not correct. Run this on the remote system:
[root]# chmod 0700 /root/.ssh
Copy the ssh public key from the .pub file that was created by the previous ssh-keygen, for example, id_ed25519.pub in the default setting:
[root]# cat /root/.ssh/id_ed25519.pub
Add the public key to the authorized_keys file. The command creates the file if it does not exist yet, otherwise it appends the key to the existing content. Run this on the remote system, for example, on dc1hana2:
[root]# cat << EOF >> /root/.ssh/authorized_keys
ssh-ed25519 … root@<node1>
EOF
Ensure that the authorized_keys file has the correct permissions, otherwise the ssh key access is blocked for security reasons:
[root]# chmod 0600 /root/.ssh/authorized_keys
Access each system and log in from any source host to any remote host that you require for the setup. On first login you must accept each new connection once in an interactive prompt. This saves each host and key in the ssh known_hosts file by default.
Option 1: Log in from each host to each other host and accept the key fingerprint once to save it to the local known_hosts file. Subsequent logins to the same host will not require further interaction, unless the key changes. This is a security measure to prevent unsolicited changes of the ssh keys. The following example confirms the authenticity of host dc1hana2:
[root]# ssh dc1hana2
The authenticity of host 'dc1hana2 (***)' can't be established.
ED25519 key fingerprint is SHA256:*********************************.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
…
Option 2: If you configure ssh key access between multiple systems you can use ssh-keyscan to collect the public host key from multiple hosts and save it to the local known_hosts file in a single step per host. Run this on each system for which you distributed the public key and list all remote hosts that you potentially access from this node and user, for example, for the root user on host dc1hana1:
[root]# ssh-keyscan -f - >> /root/.ssh/known_hosts
dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4
<Ctrl-d>
# dc1hana1:22 SSH-2.0-OpenSSH_8.7
dc1hana1 ssh-ed25519 …
# dc1hana2:22 SSH-2.0-OpenSSH_8.7
dc1hana2 ssh-ed25519 …
# dc1hana3:22 SSH-2.0-OpenSSH_8.7
dc1hana3 ssh-ed25519 …
# dc1hana4:22 SSH-2.0-OpenSSH_8.7
dc1hana4 ssh-ed25519 …
# dc2hana1:22 SSH-2.0-OpenSSH_8.7
dc2hana1 ssh-ed25519 …
# dc2hana2:22 SSH-2.0-OpenSSH_8.7
dc2hana2 ssh-ed25519 …
# dc2hana3:22 SSH-2.0-OpenSSH_8.7
dc2hana3 ssh-ed25519 …
# dc2hana4:22 SSH-2.0-OpenSSH_8.7
dc2hana4 ssh-ed25519 …
- -f - allows you to provide a list of hosts on the standard input. Instead of the - you can use a file, which you prepare upfront with the list of hosts. You can also enter a single hostname instead of the -f parameter to collect the key of one host at a time.
In the case of the standard input list you end the input with
Ctrlandd. -
The
>>shell redirection after the scan command directly appends the collected keys to theknown_hostsfile. If the file does not exist yet it is created in the process.
-
Verification
Check the known_hosts entries, for example, on dc1hana1:
[root]# cat /root/.ssh/known_hosts
dc1hana1 ssh-ed25519 ******************************...
dc1hana2 ssh-ed25519 ******************************...
dc1hana3 ssh-ed25519 ******************************...
dc1hana4 ssh-ed25519 ******************************...
dc2hana1 ssh-ed25519 ******************************...
dc2hana2 ssh-ed25519 ******************************...
dc2hana3 ssh-ed25519 ******************************...
dc2hana4 ssh-ed25519 ******************************...
Test the access from each source system to every remote system and ensure that every connection direction that you possibly need works without interactive prompts:
[root]# ssh <remote_system>
3.6. Installing a scale-out SAP HANA instance
A HANA scale-out configuration consists of at least 2 HANA instances per system replication site.
Install the HANA instances with the same SID and instance number on all nodes. The setup of the system replication sites must be identical.
The following installation steps are an example of an interactive installation using the command-line interface. Check the SAP HANA Server Installation and Update Guide for more information about installation options and other details.
Prerequisites
- You have installed and configured RHEL 9 on all cluster nodes according to the Operating system requirements.
- You have prepared the details for your HANA instances, see SAP HANA planning.
- You have followed the SAP software download guides in Software Download, downloaded the SAP HANA installation media from the SAP Software Download Center and the media is available on each node.
- You have verified that you can resolve the host names of the additional nodes of one site from the main node of the site.
- You have verified that you can connect to the additional nodes of one site from the main node using the root user and ssh.
- You have configured a time synchronization service on all nodes. See Configuring time synchronization for details. You have configured your OS or network firewall services to enable all required communication between the HANA systems. See Configuring the firewalld service for the SAP landscape for references.
Procedure
Go to the directory which contains the installation media, for example, /sapmedia/hana:
[root]# cd /sapmedia/hana
Unpack the installation media:
[root]# unzip <sap_hana_software>.ZIP
Go into the path of the unpacked installation media:
[root]# cd /sapmedia/hana/DATA_UNITS/HDB_LCM_LINUX_<arch>
Run the SAP HANA Lifecycle Management tool (HDBLCM) for an interactive installation:
[root]# ./hdblcm
In the interactive mode the installer asks you for all the required information, including the System ID (SID), the instance number, the filesystem locations of the data and log volumes, and more.
In a scale-out installation you run the installer on the main node of one HANA site and provide any additional nodes of the same site as an installation parameter. For example, you run the installer for site 1 on dc1hana1 and add node dc1hana2 as an additional host name when the prompt asks for it.
Optionally you can use the batch mode of the command-line installation tool and provide your configuration parameters in one step. For more details see Use Batch Mode to Perform Platform LCM Tasks in the SAP HANA Server Installation and Update Guide. A minimal batch-mode sketch is shown after this procedure.
- Repeat all steps on the main node of the second site. For the HANA system replication to work you must ensure that each HANA site consists of the same amount of systems with an identical HANA configuration.
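The following batch-mode invocation is only a minimal sketch of the option mentioned in the procedure above; the exact set of hdblcm parameters, the password handling, and the host roles depend on your environment and on the options described in the SAP HANA Server Installation and Update Guide:
[root]# ./hdblcm --batch --action=install --sid=RH1 --number=02 \
  --sapmnt=/hana/shared \
  --addhosts=dc1hana2:role=worker,dc1hana3:role=worker,dc1hana4:role=worker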
Verification
Switch to the <sid>adm user:
[root]# su - rh1adm
Check the HANA instance runtime information as user
<sid>adm:rh1adm$HDB infoUSER PID PPID %CPU VSZ RSS COMMAND rh1adm 12525 12524 0.2 8836 5568 -sh rh1adm 12584 12525 0.0 7520 3968 \_ /bin/sh /usr/sap/RH1/HDB02/HDB info rh1adm 12613 12584 0.0 10104 3484 \_ ps fx -U rh1adm -o user:8,pid:8,ppid:8,pcpu:5,vsz:10,rss:10,args rh1adm 8813 1 0.0 566804 41000 hdbrsutil --start --port 30203 --volume 3 … rh1adm 8124 1 0.0 566724 40972 hdbrsutil --start --port 30201 --volume 1 … rh1adm 7947 1 0.0 9312 3352 sapstart pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 rh1adm 7955 7947 0.0 460036 89176 \_ /usr/sap/RH1/HDB02/dc1hana1/trace/hdb.sapRH1_HDB02 -d -nw -f /usr/sap/RH1/HDB02/dc1hana1/daemon.ini pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 rh1adm 7981 7955 26.1 18612328 14092076 \_ hdbnameserver rh1adm 8642 7955 0.5 1465380 212048 \_ hdbcompileserver rh1adm 8645 7955 294 6616736 6049012 \_ hdbpreprocessor rh1adm 8687 7955 33.9 18931580 14929092 \_ hdbindexserver -port 30203 rh1adm 8690 7955 2.0 5073572 1390440 \_ hdbxsengine -port 30207 rh1adm 9202 7955 0.8 2772836 482088 \_ hdbwebdispatcher rh1adm 7782 1 0.1 566772 58444 /usr/sap/RH1/HDB02/exe/sapstartsrv pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 root 11868 7782 0.1 10464 4644 \_ sapuxuserchk 0 128Verify as
<sid>admon all sites that the HANA instances are running on all nodes in the site and their status isGREENin the instance list, for example, on site 1:rh1adm$sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceListhostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus dc1hana4, 2, 50213, 50214, 0.3, HDB|HDB_STANDBY, GREEN dc1hana1, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN dc1hana3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREEN dc1hana2, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GREENAdditionally, you can verify the
landscapeHostConfiguration.pyoutput for statusok:rh1adm$cdpy; python landscapeHostConfiguration.py| Host | Host | Host | Failover | Remove | Storage | Storage | Failover | Failover | NameServer | NameServer | IndexServer | IndexServer | Host | Host | Worker | Worker | | | Active | Status | Status | Status | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual | Config | Actual | | | | | | | Partition | Partition | Group | Group | Role | Role | Role | Role | Roles | Roles | Groups | Groups | | -------- | ------ | ------ | -------- | ------ | --------- | --------- | -------- | -------- | ---------- | ---------- | ----------- | ----------- | ------- | ------- | ------- | ------- | | dc1hana1 | yes | ok | | | 1 | 1 | default | default | master 1 | master | worker | master | worker | worker | default | default | | dc1hana2 | yes | ok | | | 2 | 2 | default | default | master 2 | slave | worker | slave | worker | worker | default | default | | dc1hana3 | yes | ok | | | 3 | 3 | default | default | slave | slave | worker | slave | worker | worker | default | default | | dc1hana4 | yes | ignore | | | 0 | 0 | default | default | master 3 | slave | standby | standby | standby | standby | default | - | overall host status: okCheck that the systemd units are installed for the HANA instance and the SAP Host Agent:
[root]# systemctl list-unit-files --all sap* SAP*UNIT FILE STATE PRESET sapmedia.mount generated - saphostagent.service enabled disabled sapinit.service generated - SAPRH1_02.service enabled disabled SAP.slice static - 5 unit files listed.-
Repeat the steps on all nodes. Note that the HANA profiles contain the individual node name in the format
<SID>_HDB<instance>_<node>.
3.7. Disabling SAP HANA instance autostart
The cluster controls startup and shutdown of the HANA instance in a HA cluster setup. You must configure the HANA instance profile to not automatically start the instance itself.
Procedure
Go to the HANA instance profile directory:
[root]# cd /hana/shared/<SID>/profile
Edit the instance profile:
[root]# vi <SID>_HDB<instance>_<hostname>
Ensure that Autostart is set to 0.
- Repeat the previous steps for each HANA instance that will be managed as part of the HA cluster.
Verification
Check that Autostart = 0 is set in the instance profiles of all HANA instances that will be managed by the HA cluster:
[root]# grep Autostart /hana/shared/RH1/profile/*
/hana/shared/RH1/profile/RH1_HDB02_dc1hana1:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana2:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana3:Autostart = 0
/hana/shared/RH1/profile/RH1_HDB02_dc1hana4:Autostart = 0
Chapter 4. Configuring the SAP HANA system replication
You must configure and test the SAP HANA system replication before you can configure the HANA instance in a cluster. Follow the SAP guidelines for the HANA system replication setup: SAP HANA System Replication: Configuration.
4.1. Prerequisites for the SAP HANA system replication setup
SAP HANA configuration
SAP HANA must be installed and configured identically on the system replication sites.
Host name resolution
All hosts must be able to resolve the host names and fully qualified domain names (FQDN) of all HANA systems. To ensure that all host names can be resolved even without DNS you can place them into /etc/hosts. This is also recommended for hosts configured in HA clusters in general.
In addition you can manage host name resolution in SAP HANA internally. For more details see Internal Host Name Resolution and Host Name Resolution for System Replication. Ensure that the HANA internal host names and /etc/hosts entries are consistent.
As documented at hostname | SAP Help Portal, SAP HANA only supports hostnames with lowercase characters.
SAP HANA log_mode
For the system replication to work, you must set the SAP HANA log_mode variable to normal, which is the default value.
Verify the current log_mode as the HANA administrative user <sid>adm on both nodes:
rh1adm$ hdbsql -u system -i ${TINSTANCE} \
"select value from "SYS"."M_INIFILE_CONTENTS" where key='log_mode'"
Password: <HANA_SYSTEM_PASSWORD>
VALUE "normal"
1 row selected
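If log_mode is not set to normal, you can change it as the <sid>adm user, for example with the statement below; this is only a sketch, and after switching from overwrite to normal you must create a new full data backup:
rh1adm$ hdbsql -u system -i ${TINSTANCE} -d SYSTEMDB \
  "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('persistence', 'log_mode') = 'normal' WITH RECONFIGURE"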
4.2. Performing an initial HANA database backup
You can only enable the HANA system replication when an initial backup of the SAP HANA database exists on the primary site for the planned SAP HANA system replication setup.
You can use SAP HANA tools to create the backup and skip the manual procedure. See SAP HANA Administration Guide - SAP HANA Database Backup and Recovery for more information.
Prerequisites
- You have a writable directory to which the backup files are saved for the SAP HANA administrative user <sid>adm.
- You have sufficient free space available in the filesystem on which the backup files are stored.
Procedure
Optional: Create a dedicated directory for the backup in a suitable path, for example:
[root]# mkdir <path>/<SID>-backup
Replace <path> with a path on your system, which has enough free space for the initial backup files.
Change the owner of the backup path to user <sid>adm if the target directory is not already owned or writable by the HANA user, for example:
[root]# chown <sid>adm:sapsys <path>/<SID>-backup
Change to the <sid>adm user for the remaining steps:
[root]# su - <sid>adm
Create a backup of the SYSTEMDB as the <sid>adm user. Specify the path to the files the backups will be stored in. Ensure that the target filesystem has enough free space left, then create the backup:
rh1adm$ hdbsql -i ${TINSTANCE} -u system -d SYSTEMDB \
  "BACKUP DATA USING FILE ('<path>/${SAPSYSTEMNAME}-backup/bkp-SYS')"
Password: <HANA_SYSTEM_PASSWORD>
$TINSTANCE and $SAPSYSTEMNAME are environment variables that are part of the <sid>adm user shell environment. $TINSTANCE is the instance number and $SAPSYSTEMNAME is the SID. Both are automatically set to the instance values related to the <sid>adm user.
Replace
<path>with the path where the<sid>admuser has write access and where there is enough free space left.
-
Create a backup of all tenant databases as the
<sid>admuser. Specify the path to the files the backups will be stored in. Ensure that the target filesystem has enough free space left. Create the tenant DB backup:rh1adm$ hdbsql -i ${TINSTANCE} -u system -d SYSTEMDB \ "BACKUP DATA FOR ${SAPSYSTEMNAME} USING FILE ('<path>/${SAPSYSTEMNAME}-backup/bkp-${SAPSYSTEMNAME}')" Password: <HANA_SYSTEM_PASSWORD>Replace
<path>with the path where the<sid>admuser has write access and where there is enough free space left.
Verification
List the resulting backup files. Example when using
/hana/log/RH1-backupas the directory to store the initial backup:rh1adm$ ls -lh /hana/log/RH1-backup/total 7.4G -rw-r-----. 1 rh1adm sapsys 156K Dec 9 16:13 bkp-RH1_databackup_0_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_2_1 -rw-r-----. 1 rh1adm sapsys 3.6G Dec 9 16:13 bkp-RH1_databackup_3_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_4_1 -rw-r-----. 1 rh1adm sapsys 81M Dec 9 16:13 bkp-RH1_databackup_5_1 -rw-r-----. 1 rh1adm sapsys 172K Dec 9 16:11 bkp-SYS_databackup_0_1 -rw-r-----. 1 rh1adm sapsys 3.6G Dec 9 16:12 bkp-SYS_databackup_1_1Use the HANA command
hdbbackupcheckto confirm the sanity of each backup file you created:rh1adm$ for i in $(ls /hana/log/RH1-backup/*); do hdbbackupcheck $i; doneLoaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_0_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_2_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_3_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_4_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-RH1_databackup_5_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-SYS_databackup_0_1' successfully checked. Loaded library 'libhdbcsaccessor' Loaded library 'libhdblivecache' Backup '/hana/log/RH1-backup/bkp-SYS_databackup_1_1' successfully checked.
Troubleshooting
The backup fails when the
<sid>admuser is not able to write to the target directory:* 447: backup could not be completed: [2001003] createDirectory(path= '/tmp/RH1-backup/', access= rwxrwxr--, recursive= true): Permission denied (rc= 13, 'Permission denied') SQLSTATE: HY000Ensure that the
<sid>admuser can create files inside of the target directory you define in the backup command. Fix the permissions, for example using step 2 of the procedure.The backup fails because the target filesystem runs out of space:
* 447: backup could not be completed: [2110001] Generic stream error: $msg$ - , rc=$sysrc$: $sysmsg$. Failed to process item 0x00007fc5796e0000 - '<root>/.bkp-RH1_databackup_3_1' ((open, mode= W, file_access= rw-r-----, flags= ASYNC|DIRECT|TRUNCATE|UNALIGNED_SIZE, size= 4096), factory= (root= '/tmp/RH1-backup/' (root_access= rwxr-x---, flags= AUTOCREATE_PATH|DISKFULL_ERROR, usage= DATA_BACKUP, fs= xfs, config= (async_write_submit_active=on,async_write_submit_blocks=all,async_read_submit=on,num_submit_queues=1,num_completion_queues=1,size_kernel_io_queue=512,max_parallel_io_requests=64,min_submit_batch_size=16,max_submit_batch_size=64)) SQLSTATE: HY000Check the free space of the filesystem on which the target directory is located. Increase the filesystem size or choose a different path with enough free space available for the backup files.
4.3. Configuring the primary HANA replication site
Enable the HANA system replication on the system that you plan to use as the initial primary site of your system replication setup.
Prerequisites
- You have created an initial backup for the HANA database on the primary site, based on the steps described in Performing an initial HANA database backup.
Procedure
Enable the system replication on the HANA site that becomes the initial primary. Run the command as <sid>adm on the first, or primary, node:
rh1adm$ hdbnsutil -sr_enable --name=<site1>
nameserver is active, proceeding ...
successfully enabled system as system replication source site
done.
- Replace <site1> with your primary HANA site name, for example, DC1.
Verification
Check the system replication configuration as <sid>adm, and verify that it shows the current node as mode: primary, and that site id and site name are populated with the primary site information:
rh1adm$ hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: primary
site id: 1
site name: DC1
done.
4.4. Configuring the secondary HANA replication site
You must register the secondary HANA site to the primary site to complete the setup of the HANA system replication.
Prerequisites
- You have installed SAP HANA on the secondary nodes using the same SID and instance number as the primary instances.
- You have configured SSH public-key access for the root user between the cluster nodes.
- You have configured the firewall rules in your network infrastructure or on each host operating system to allow the connections that your HANA landscape requires for the system replication connection.
- You have opened 2 terminals on a secondary node, for example, on dc2hana1, with one terminal for the root user and one for the <sid>adm user.
Procedure
Stop the secondary HANA instances. Run as the
<sid>admuser on one secondary instance, for example, ondc2hana1:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDBRun this and the following steps only on one node on the secondary site, for example, on
dc2hana1. Change to the directory on the shared filesystem where HANA stores the keys of the system replication encryption on one node of the secondary site:[root]# cd /hana/shared/<SID>/global/security/rsecssfsCopy the HANA system PKI file
SSFS_<SID>.KEYfrom the primary HANA site to the secondary site on one secondary node:[root]# rsync -av <node1>:$PWD/key/SSFS_<SID>.KEY key/-
Replace
<node1>with a primary node, for example,dc1hana1. -
Replace
<SID>with your HANA SID, for example,RH1.
-
Replace
Copy the PKI file
SSFS_<SID>.DATfrom the primary HANA site to the secondary site on one secondary node in the same way as the previous step:[root]# rsync -av <node1>:$PWD/data/SSFS_<SID>.DAT data/Register the secondary HANA site to the primary site. Run this in the
<sid>admuser terminal on a secondary node:rh1adm$ hdbnsutil -sr_register --remoteHost=<node1> \ --remoteInstance=${TINSTANCE} --replicationMode=sync \ --operationMode=logreplay --name=<site2>adding site ... collecting information ... updating local ini files ... done.-
Replace
<node1>with a primary node, for example,dc1hana1. -
Replace
<site2>with your secondary HANA site name, for example,DC2. -
Choose the values for
replicationModeandoperationModeaccording to your requirements for the system replication. -
$TINSTANCEis an environment variable that is set automatically for user<sid>admby reading the HANA instance profile. The variable value is the HANA instance number.
-
Replace
Start the secondary HANA instances. Run as
<sid>admon one HANA instance on the secondary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
Check that the system replication is running on the secondary site and the mode matches the value you used for the replicationMode parameter in the hdbnsutil -sr_register command. Run as <sid>adm on one node on the secondary site, for example, dc2hana1:
rh1adm$ hdbnsutil -sr_stateConfiguration
System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~
mode: sync
site id: 2
site name: DC2
active primary site: 1
primary masters: dc1hana1 dc1hana2 dc1hana4
<sid>adm, on both sites. The easiest way to do this is to usecdpy, which is an alias built into the<sid>admuser shell that SAP HANA populates during the instance installation:rh1adm$ cdpyIn our example this command changes the current directory to
/usr/sap/RH1/HDB02/exe/python_support/.Show the current status of the established HANA system replication, on both HANA sites.
On the primary site the system replication status is always displayed with all details:
rh1adm$ python systemReplicationStatus.py|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ | |RH1 |dc1hana3 |30203 |indexserver | 5 | 1 |DC1 |dc2hana3 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana2 |30203 |indexserver | 4 | 1 |DC1 |dc2hana2 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |SYSTEMDB |dc1hana1 |30201 |nameserver | 1 | 1 |DC1 |dc2hana1 | 30201 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |30207 |xsengine | 2 | 1 |DC1 |dc2hana1 | 30207 | 2 |DC2 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |30203 |indexserver | 3 | 1 |DC1 |dc2hana1 | 30203 | 2 |DC2 |YES |SYNC |ACTIVE | | True | status system replication site "2": ACTIVE overall system replication status: ACTIVE Local System Replication State ~~~~~~~~~~ mode: PRIMARY site id: 1 site name: DC1-
ACTIVE means that the HANA system replication is in a healthy state and fully synced.
SYNCING is displayed while the system replication is being updated on the secondary site, for example after a takeover of the secondary site to become the new primary.
INITIALIZING is shown after the system replication has first been enabled or a full sync has been triggered.
-
On the secondary site, the output of systemReplicationStatus.py is less detailed. Check the status on one secondary node:
rh1adm$ python systemReplicationStatus.py
this system is either not running or not primary system replication site

Local System Replication State
~~~~~~~~~~
mode: SYNC
site id: 2
site name: DC2
active primary site: 1
primary masters: dc1hana1 dc1hana2 dc1hana4
4.5. Testing the HANA system replication
We recommend that you test the HANA system replication thoroughly before you proceed with the cluster setup. The verification of the correct system replication behavior can help to prevent unexpected results when the HA cluster manages the system replication afterwards.
Use timeout values in the cluster resource configuration that cover the measured times of the different tests to ensure that cluster resource operations do not time out prematurely.
You can also test different parameter values in the HANA configuration to optimize the performance by measuring the time that certain activities take when performed manually outside of cluster control.
Perform the tests with realistic data loads and sizes.
Full replication
- How long does the synchronization take after the newly registered secondary is started until it is in sync with the primary?
- Are there parameters which can improve the synchronization time?
Lost connection
- How long does it take after the connection was lost between the primary and the secondary site, until they are in sync again?
- Are there parameters which can improve the reconnection and sync times?
Takeover
- How long does the secondary site take to be fully available after a takeover from the primary?
- What is the time difference between a normal takeover and a "takeover with handshake"?
- Are there parameters which can improve the takeover time?
Data consistency
- Is the data you create available and correct after you perform a takeover?
Client reconnect
- Can the client reconnect to the new primary site after a takeover?
- How long does it take for the client to access the new primary after a takeover?
Primary becomes secondary
- How long does it take a former primary until it is in sync again with the new primary, after it is registered as a new secondary?
- If configured, how long does it take until a client can access the newly registered secondary for read operations?
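To measure the takeover time outside of cluster control, you can, for example, wrap the manual takeover command with time on the coordinator node of the secondary site. This is only an illustrative sketch and must not be used once the cluster manages the system replication:
rh1adm$ time hdbnsutil -sr_takeover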
Chapter 5. Configuring the Pacemaker cluster
5.1. Deploying the basic cluster configuration
The following basic cluster setup covers the minimum steps to get started with the Pacemaker cluster setup for managing SAP instances. Apply the steps to include all the nodes according to your planned cluster configuration.
For more information on settings and options for complex configurations, refer to the documentation for the RHEL HA Add-On, for example, Create a high availability cluster with multiple links.
Prerequisites
- You have set up the HANA system replication environment and verified that it is working correctly.
- You have configured the RHEL High Availability repository on all systems that are going to be nodes of this cluster.
- You have verified fencing and quorum requirements according to your planned environment. For more details see HA cluster requirements.
Procedure
Install the Red Hat High Availability Add-On software packages from the High Availability repository. Choose which fence agents you want to install and execute the installation on all cluster nodes.
[root]# dnf install pcs pacemaker fence-agents-<model>
Start and enable the pcsd service on all cluster nodes. The --now parameter automatically starts the enabled service:
[root]# systemctl enable --now pcsd.service
Optional: If you use the local firewalld service you must enable the ports that are required by the Red Hat High Availability Add-On. Run this on all cluster nodes:
[root]# firewall-cmd --add-service=high-availability
[root]# firewall-cmd --runtime-to-permanent
Set a password for the user hacluster. Repeat the command on each node using the same password:
[root]# passwd hacluster
Authenticate the user hacluster for each node in the cluster. Run this on one node and apply all cluster node names, for example, dc1hana1 to dc2hana4:
[root]# pcs host auth <node1> … <node8>
Username: hacluster
Password:
dc1hana2: Authorized
dc2hana1: Authorized
dc1hana4: Authorized
dc1hana3: Authorized
dc2hana2: Authorized
dc1hana1: Authorized
dc2hana4: Authorized
dc2hana3: Authorized
- Enter the node names with or without FQDN, as defined in the /etc/hosts file.
Enter the
haclusteruser password in the prompt.
Create the cluster with a unique name and provide the names of all cluster members with fully qualified host names. This propagates the cluster configuration on all nodes and starts the cluster with the defined cluster name. Run this command on one node and apply all cluster node names, for example,
dc1hana1todc2hana4:[root]# pcs cluster setup <cluster_name> --start <node1> … <node8>No addresses specified for host 'dc1hana1', using 'dc1hana1' No addresses specified for host 'dc1hana2', using 'dc1hana2' No addresses specified for host 'dc1hana3', using 'dc1hana3' No addresses specified for host 'dc1hana4', using 'dc1hana4' No addresses specified for host 'dc2hana1', using 'dc2hana1' No addresses specified for host 'dc2hana2', using 'dc2hana2' No addresses specified for host 'dc2hana3', using 'dc2hana3' No addresses specified for host 'dc2hana4', using 'dc2hana4' Destroying cluster on hosts: 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4'... dc1hana2: Successfully destroyed cluster dc2hana1: Successfully destroyed cluster dc2hana4: Successfully destroyed cluster dc2hana2: Successfully destroyed cluster dc1hana3: Successfully destroyed cluster dc1hana1: Successfully destroyed cluster dc2hana3: Successfully destroyed cluster dc1hana4: Successfully destroyed cluster Requesting remove 'pcsd settings' from 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful removal of the file 'pcsd settings' dc1hana2: successful removal of the file 'pcsd settings' dc1hana3: successful removal of the file 'pcsd settings' dc1hana4: successful removal of the file 'pcsd settings' dc2hana1: successful removal of the file 'pcsd settings' dc2hana2: successful removal of the file 'pcsd settings' dc2hana3: successful removal of the file 'pcsd settings' dc2hana4: successful removal of the file 'pcsd settings' Sending 'corosync authkey', 'pacemaker authkey' to 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful distribution of the file 'corosync authkey' dc1hana1: successful distribution of the file 'pacemaker authkey' dc1hana2: successful distribution of the file 'corosync authkey' dc1hana2: successful distribution of the file 'pacemaker authkey' dc1hana3: successful distribution of the file 'corosync authkey' dc1hana3: successful distribution of the file 'pacemaker authkey' dc1hana4: successful distribution of the file 'corosync authkey' dc1hana4: successful distribution of the file 'pacemaker authkey' dc2hana1: successful distribution of the file 'corosync authkey' dc2hana1: successful distribution of the file 'pacemaker authkey' dc2hana2: successful distribution of the file 'corosync authkey' dc2hana2: successful distribution of the file 'pacemaker authkey' dc2hana3: successful distribution of the file 'corosync authkey' dc2hana3: successful distribution of the file 'pacemaker authkey' dc2hana4: successful distribution of the file 'corosync authkey' dc2hana4: successful distribution of the file 'pacemaker authkey' Sending 'corosync.conf' to 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4' dc1hana1: successful distribution of the file 'corosync.conf' dc1hana2: successful distribution of the file 'corosync.conf' dc1hana3: successful distribution of the file 'corosync.conf' dc1hana4: successful distribution of the file 'corosync.conf' dc2hana1: successful distribution of the file 'corosync.conf' dc2hana2: successful distribution of the file 'corosync.conf' dc2hana3: successful distribution of the file 'corosync.conf' dc2hana4: successful distribution of the file 'corosync.conf' Cluster has been successfully set up. 
Starting cluster on hosts: 'dc1hana1', 'dc1hana2', 'dc1hana3', 'dc1hana4', 'dc2hana1', 'dc2hana2', 'dc2hana3', 'dc2hana4'...Enable the cluster to start automatically on system start on all cluster nodes, which enables the
corosyncandpacemakerservices. Skip this step if you prefer to manually control the start of the cluster after a node restarts. Run on one node:[root]# pcs cluster enable --alldc1hana1: Cluster Enabled dc1hana2: Cluster Enabled dc1hana3: Cluster Enabled dc1hana4: Cluster Enabled dc2hana1: Cluster Enabled dc2hana2: Cluster Enabled dc2hana3: Cluster Enabled dc2hana4: Cluster Enabled
Verification
Check the cluster status after the initial configuration. Verify that it shows all cluster nodes as
Onlineand the status of all cluster daemons isactive/enabled:[root]# pcs status --fullCluster name: hana-scaleout-cluster WARNINGS: No stonith devices and stonith-enabled is not false Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-10 13:47:29Z) Cluster Summary: * Stack: corosync * Current DC: dc1hana4 (4) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum * Last updated: Wed Dec 10 13:47:30 2025 * Last change: Wed Dec 10 13:45:23 2025 by hacluster via crmd on dc1hana4 * 8 nodes configured * 0 resource instances configured Node List: * Node dc1hana1 (1): online, feature set 3.16.2 * Node dc1hana2 (2): online, feature set 3.16.2 * Node dc1hana3 (3): online, feature set 3.16.2 * Node dc1hana4 (4): online, feature set 3.16.2 * Node dc2hana1 (5): online, feature set 3.16.2 * Node dc2hana2 (6): online, feature set 3.16.2 * Node dc2hana3 (7): online, feature set 3.16.2 * Node dc2hana4 (8): online, feature set 3.16.2 Full List of Resources: * No resources Migration Summary: Tickets: PCSD Status: dc1hana1: Online dc1hana2: Online dc1hana3: Online dc1hana4: Online dc2hana1: Online dc2hana2: Online dc2hana3: Online dc2hana4: Online Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled
Next steps
- Configure a fencing method to enable the STONITH mechanism. See Configuring fencing in a Red Hat High Availability cluster.
- Test the fencing setup before you proceed with further configuration of the cluster. For more information, see How to test fence devices and fencing configuration in a Red Hat High Availability cluster?
- Configure a quorum device, see Configuring a quorum device in the cluster.
5.2. Configuring general cluster properties
You must adjust cluster resource defaults to avoid unnecessary failovers of the resources.
Procedure
Run the following command on one cluster node to update the default values of the
resource-stickinessandmigration-thresholdparameters:[root]# pcs resource defaults update \ resource-stickiness=1000 \ migration-threshold=5000-
resource-stickiness=1000 encourages the resource to stay running where it is. This prevents the cluster from moving resources around based on small changes in internal placement scores.
migration-threshold=5000 allows the resource to be restarted on the same node for up to 5000 failures. After this limit is exceeded, the resource is banned from the node until the failure count is cleared. This allows the resource to recover in place from occasional failures while an administrator investigates the cause of the repeated failures and resets the counter; a cleanup sketch follows after this list.
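When a resource has hit its failure limit and you have resolved the underlying problem, you can clear the failure count so that the node becomes eligible for the resource again. A minimal sketch with the standard pcs commands; replace <resource_id> with the name of the affected resource:
[root]# pcs resource failcount show <resource_id>
[root]# pcs resource cleanup <resource_id>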
Verification
Check that the resource defaults are set:
[root]# pcs resource defaultsMeta Attrs: build-resource-defaults migration-threshold=5000 resource-stickiness=1000
5.3. Configuring the systemd-based SAP startup framework
Systemd integration is the default behavior of SAP HANA installations on RHEL 9 for SAP HANA 2.0 SPS07 revision 70 and newer. In HA environments you must apply additional modifications to integrate the different systemd services that are involved in the cluster setup.
Configure the pacemaker systemd service so that it is started and stopped in the correct order relative to the HANA instance systemd service on all cluster nodes that run HANA instances.
Prerequisites
You have installed the HANA instances with systemd integration and have checked the systemd integration on all HANA nodes, for example:
[root]# systemctl list-units --all SAP*UNIT LOAD ACTIVE SUB DESCRIPTION SAPRH1_02.service loaded active running SAP Instance SAPRH1_02 SAP.slice loaded active active SAP Slice ...
Procedure
Create the directory
/etc/systemd/system/pacemaker.service.d/for the pacemaker service drop-in file:[root]# mkdir /etc/systemd/system/pacemaker.service.d/Create the systemd drop-in file for the pacemaker service with the following content:
[root]# cat << EOF > /etc/systemd/system/pacemaker.service.d/00-pacemaker.conf [Unit] Description=Pacemaker needs the SAP HANA instance service Wants=SAP<SID>_<instance>.service After=SAP<SID>_<instance>.service EOF-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Reload the
systemctldaemon to activate the drop-in file:[root]# systemctl daemon-reload- Repeat steps 1-3 on the other HANA cluster nodes.
Verification
Check the systemd service of your HANA instance and verify that it is
loaded:[root]# systemctl status SAPRH1_02.service● SAPRH1_02.service - SAP Instance SAPRH1_02 Loaded: loaded (/etc/systemd/system/SAPRH1_02.service; disabled; preset: disabled) Active: active (running) since xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx Main PID: 5825 (sapstartsrv) Tasks: 841 Memory: 88.6G CPU: 4h 50min 2.033s CGroup: /SAP.slice/SAPRH1_02.service ├─ 5825 /usr/sap/RH1/HDB02/exe/sapstartsrv pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ├─ 5986 sapstart pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ├─ 5993 /usr/sap/RH1/HDB02/dc1hana1/trace/hdb.sapRH1_HDB02 -d -nw -f /usr/sap/RH1/HDB02/dc1hana1/daemon.ini pf=/usr/sap/RH1/SYS/profile/RH1_HDB02_dc1hana1 ...Verify that the SAP HANA instance service is known to the pacemaker service now:
[root]# systemctl show pacemaker.service | grep -E 'Wants=|After=|SAP.{6}.service'Wants=SAPRH1_02.service resource-agents-deps.target dbus-broker.service After=... SAPRH1_02.service …Make sure that the
SAP<SID>_<instance>.serviceis listed in theAfter=andWants=lists.
5.4. Installing the SAP HANA HA components
The resource-agents-sap-hana-scaleout RPM package in the Red Hat Enterprise Linux 9 for <arch> - SAP Solutions (RPMs) repository provides the resource agents and other SAP HANA specific components for setting up an HA cluster that manages a HANA system replication setup.
Procedure
Install the
resource-agents-sap-hana-scaleoutpackage on all cluster nodes:[root]# dnf install resource-agents-sap-hana-scaleout
Verification
Check on all nodes that the package is installed, for example:
[root]# rpm -q resource-agents-sap-hana-scaleoutresource-agents-sap-hana-scaleout-0.185.3-0.el9_2.noarch
5.5. Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method
When you configure the HANA instance in a HA cluster setup with SAP HANA 2.0 SPS0 or later, you must enable and test the SAP HANA srConnectionChanged() hook method before proceeding with the cluster setup.
Prerequisites
-
You have installed the
resource-agents-sap-hana-scaleoutpackage. - Your HANA instance is not yet managed by the cluster. Otherwise, use the maintenance procedure Performing maintenance on the SAP HANA instances to make sure that the cluster does not interfere during the configuration of the hook scripts.
Procedure
Stop the HANA instances on all nodes. Run this as the
<sid>admuser on one HANA instance per site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDBVerify as
<sid>admon all sites that the HANA instances are stopped completely and their status isGRAYin the instance list. Run this on one host on each site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceListGetSystemInstanceList OK hostname, instanceNr, httpPort, httpsPort, startPriority, features, dispstatus dc1hana2, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana3, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana1, 2, 50213, 50214, 0.3, HDB|HDB_WORKER, GRAY dc1hana4, 2, 50213, 50214, 0.3, HDB|HDB_STANDBY, GRAYChange to the HANA configuration directory, as the
<sid>admuser, using the command aliascdcoc, which is built into the<sid>admuser shell. This automatically changes into the/hana/shared/<SID>/global/hdb/custom/config/path:rh1adm$ cdcocUpdate the
global.inifile of the SAP HANA site to configure theSAPHanaSRhook. Edit the configuration file on one node of each HANA site and add the following configuration:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1 [trace] ha_dr_saphanasr = infoSet
execution_orderto1to ensure that theSAPHanaSRhook is always executed with the highest priority.Due to the shared
/hana/sharedfilesystem between the nodes of each HANA site, you only adjust the configuration once per site. Do not try to edit the same file on the shared filesystem simultaneously on more than one node of the same site.Optional: When you also want to configure the optional
ChkSrvhook for taking action on anhdbindexserverfailure, you can add the changes to theglobal.iniat the same time, see step 1 in Configuring the ChkSrv HA/DR provider for the srServiceStateChanged() hook method:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1 [ha_dr_provider_chksrv] provider = ChkSrv path = /usr/share/SAPHanaSR-ScaleOut execution_order = 2 action_on_lost = stop [trace] ha_dr_saphanasr = info ha_dr_chksrv = infoCreate the file
/etc/sudoers.d/20-saphana, as therootuser, on each cluster node with the following content. These command privileges allow the<sid>admuser to update certain cluster node attributes as part of the SAPHanaSR hook execution:[root]# visudo -f /etc/sudoers.d/20-saphanaDefaults:<sid>adm !requiretty <sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_*-
Replace <sid> with the lower-case HANA SID.
For further information on why the Defaults setting is needed, refer to The srHook attribute is set to SFAIL in a Pacemaker cluster managing SAP HANA system replication, even though replication is in a healthy state.
Start the HANA instances on all cluster nodes manually without starting the HA cluster. Run as
<sid>admon one HANA instance per site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
Change to the SAP HANA directory, as the
<sid>admuser, where trace log files are stored. Use the command aliascdtrace, which is built into the<sid>admuser shell:rh1adm$ cdtraceCheck the HANA nameserver process logs for the HA/DR provider loading message:
rh1adm$ grep -he "loading HA/DR Provider.*" nameserver_*If only the
SAPHanaSRprovider is configured:loading HA/DR Provider 'SAPHanaSR' from /usr/share/SAPHanaSR-ScaleOutIf the optional
ChkSrvprovider is also implemented:loading HA/DR Provider 'ChkSrv' from /usr/share/SAPHanaSR-ScaleOut loading HA/DR Provider 'SAPHanaSR' from /usr/share/SAPHanaSR-ScaleOut
Verify as user
rootin the system secure log that thesudocommand executed successfully. If the sudoers file is not correct, an error is logged when thesudocommand is executed. Check this on the primary node on which the HANA master nameserver is running, for example,dc1hana1:[root]# grep -e 'sudo.*crm_attribute.*' /var/log/securesudo[17141]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_gsh -v 1.0 -l reboot sudo[17160]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_glob_srHook -v SFAIL -t crm_config -s SAPHanaSR … sudo[17584]: rh1adm : PWD=/hana/shared/RH1/HDB02/dc1hana1 ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_glob_srHook -v SOK -t crm_config -s SAPHanaSRAfter the HANA instance starts on both nodes, you can usually see several
srHookattribute updates. At first it is settingSFAIL, because immediately after the primary starts, it is not yet in sync with the secondary, which is still synchronizing at this time.The last update to
SOKis triggered by the HANA event after the system replication status finally changes to fully in sync.-
Repeat the verification steps 1-2 on the second site, if not already done at the same time. The
sudolog messages of step 3 are only visible on the primary instance coordinator nameserver node, on which the system replication events are logged. Check the cluster attributes on any node and verify that the value for the
hana_<sid>_glob_srHookattribute is updated as expected:[root]# cibadmin --query | grep -e 'SAPHanaSR.*srHook'<nvpair id="SAPHanaSR-hana_rh1_glob_srHook" name="hana_rh1_glob_srHook" value="SOK"/>-
SOKis set when the HANA system replication is inACTIVEstate, which means established and fully in sync. -
SFAILis set when the system replication is in any other state.
Troubleshooting
5.6. Configuring the ChkSrv HA/DR provider for the srServiceStateChanged() hook method
You can configure the hook ChkSrv if you want the HANA instance to be stopped or killed for faster recovery after an indexserver process has failed. This configuration is optional.
Prerequisites
-
You have installed the
resource-agents-sap-hana-scaleoutpackage. -
You have configured the
SAPHanaSRHA/DR provider. For more information see Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method.
Procedure
Change to the HANA configuration directory as the
<sid>admuser. Use the command aliascdcoc, which is built into the<sid>admuser shell. This automatically changes into the/hana/shared/<SID>/global/hdb/custom/config/path:rh1adm$ cdcocUpdate the
global.ini file of the HANA site to configure the hook script. Edit the configuration file on one node of each HANA site and add the following content in addition to the SAPHanaSR provider definition:[ha_dr_provider_chksrv] provider = ChkSrv path = /usr/share/SAPHanaSR-ScaleOut execution_order = 2 action_on_lost = stop [trace] ha_dr_saphanasr = info ha_dr_chksrv = infoDue to the shared
/hana/sharedfilesystem between the nodes of each HANA site, you only adjust the configuration once per site. Do not try to edit the same file on the shared filesystem simultaneously on more than one node of the same site.Optional: Activate the
ChkSrv provider while HANA is running by reloading the HA/DR providers. Skip this step when you configure the hook script while the instance is down; in that case, the HA/DR provider is loaded automatically at the next instance start.rh1adm$ hdbnsutil -reloadHADRProviders
Verification
Change to the SAP HANA directory, as the
<sid>admuser, where trace log files are stored. Use the command aliascdtrace, which is built into the<sid>admuser shell:rh1adm$ cdtraceCheck that the changes are loaded:
rh1adm$ grep -e "loading HA/DR Provider.*ChkSrv.*" nameserver_*loading HA/DR Provider 'ChkSrv' from /usr/share/SAPHanaSR-ScaleOutCheck that the dedicated trace log file is created and the provider loaded with the correct configuration parameters:
rh1adm$ cat nameserver_chksrv.trcinit called ChkSrv.init() version 0.7.8, parameter info: action_on_lost=stop stop_timeout=20 kill_signal=9 …
Troubleshooting
5.7. Creating the HANA cluster resources
You must configure the SAPHanaTopology and SAPHanaController resources so that the cluster can collect the status of the HANA landscape, monitor the instance health and take action to manage the instance when required.
Prerequisites
- You have installed the cluster and you have configured all HANA nodes in the cluster.
- You have configured the HANA system replication between your HANA sites.
- All HANA instances are running and the system replication is healthy.
Procedure
Create the
SAPHanaTopologyresource as a clone resource, which means it runs on all cluster nodes at the same time:[root]# pcs resource create rsc_SAPHanaTop_<SID>_HDB<instance> \ ocf:heartbeat:SAPHanaTopology \ SID=<SID> \ InstanceNumber=<instance> \ op start timeout=600 \ op stop timeout=300 \ op monitor interval=30 timeout=300 \ clone cln_SAPHanaTop_<SID>_HDB<instance>-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Update the meta attributes of the new
SAPHanaTopologyclone resource:[root]# pcs resource update cln_SAPHanaTop_<SID>_HDB<instance> \ meta clone-node-max=1 interleave=trueCreate the
SAPHanaControllerresource as a promotable clone resource. This means it runs on all cluster nodes at the same time, but on one node it functions as the active, or primary, resource:[root]# pcs resource create rsc_SAPHanaCon_<SID>_HDB<instance> \ ocf:heartbeat:SAPHanaController \ SID=<SID> \ InstanceNumber=<instance> \ PREFER_SITE_TAKEOVER=true \ DUPLICATE_PRIMARY_TIMEOUT=7200 \ AUTOMATED_REGISTER=false \ op stop timeout=3600 \ op monitor interval=59 role=Promoted timeout=700 \ op monitor interval=61 role=Unpromoted timeout=700 \ meta priority=100 \ promotable cln_SAPHanaCon_<SID>_HDB<instance>We recommend that you create the resource with
AUTOMATED_REGISTER=false and then verify the correct behavior and data consistency through tests to complete the setup. For more information, see Testing the setup. You can enable this already at creation time by setting the parameter to true, or switch it on later as shown in the sketch after this procedure. See SAPHanaController resource parameters for more details.
Update the meta attributes of the new
SAPHanaControllerclone resource:[root]# pcs resource update cln_SAPHanaCon_<SID>_HDB<instance> \ meta clone-node-max=1 interleave=trueYou must start the
SAPHanaTopologyresource before theSAPHanaControllerresource, because it collects HANA landscape information, which theSAPHanaControllerresource requires to start correctly. Create the cluster constraint that enforces the correct start order of the two resources:[root]# pcs constraint order cln_SAPHanaTop_<SID>_HDB<instance> \ then cln_SAPHanaCon_<SID>_HDB<instance> symmetrical=falseSetting
symmetrical=falseindicates that the constraint only influences the start order of the resources, but it does not apply to the stop order.
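After your takeover tests have confirmed correct behavior and data consistency, you can switch on automatic registration without recreating the resource. A minimal sketch, assuming the resource names used in this procedure:
[root]# pcs resource update rsc_SAPHanaCon_<SID>_HDB<instance> AUTOMATED_REGISTER=true
[root]# pcs resource config rsc_SAPHanaCon_<SID>_HDB<instance> | grep AUTOMATED_REGISTER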
Verification
Review the
SAPHanaTopologyresource clone. For operations that you do not define at resource creation,pcsautomatically applies default values. Example resource configuration:[root]# pcs resource config cln_SAPHanaTop_RH1_HDB02Clone: cln_SAPHanaTop_RH1_HDB02 Meta Attributes: cln_SAPHanaTop_RH1_HDB02-meta_attributes clone-node-max=1 interleave=true Resource: rsc_SAPHanaTop_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaTopology) Attributes: rsc_SAPHanaTop_RH1_HDB02-instance_attributes InstanceNumber=02 SID=RH1 Operations: methods: rsc_SAPHanaTop_RH1_HDB02-methods-interval-0s interval=0s timeout=5 monitor: rsc_SAPHanaTop_RH1_HDB02-monitor-interval-30 interval=30 timeout=300 reload: rsc_SAPHanaTop_RH1_HDB02-reload-interval-0s interval=0s timeout=5 start: rsc_SAPHanaTop_RH1_HDB02-start-interval-0s interval=0s timeout=600 stop: rsc_SAPHanaTop_RH1_HDB02-stop-interval-0s interval=0s timeout=300Review the
SAPHanaControllerresource clone. Example resource configuration:[root]# pcs resource config cln_SAPHanaCon_RH1_HDB02Clone: cln_SAPHanaCon_RH1_HDB02 Meta Attributes: cln_SAPHanaCon_RH1_HDB02-meta_attributes clone-node-max=1 interleave=true promotable=true Resource: rsc_SAPHanaCon_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaController) Attributes: rsc_SAPHanaCon_RH1_HDB02-instance_attributes AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH1 Meta Attributes: rsc_SAPHanaCon_RH1_HDB02-meta_attributes priority=100 Operations: demote: rsc_SAPHanaCon_RH1_HDB02-demote-interval-0s interval=0s timeout=320 methods: rsc_SAPHanaCon_RH1_HDB02-methods-interval-0s interval=0s timeout=5 monitor: rsc_SAPHanaCon_RH1_HDB02-monitor-interval-59 interval=59 timeout=700 role=Promoted monitor: rsc_SAPHanaCon_RH1_HDB02-monitor-interval-61 interval=61 timeout=700 role=Unpromoted promote: rsc_SAPHanaCon_RH1_HDB02-promote-interval-0s interval=0s timeout=3600 reload: rsc_SAPHanaCon_RH1_HDB02-reload-interval-0s interval=0s timeout=5 start: rsc_SAPHanaCon_RH1_HDB02-start-interval-0s interval=0s timeout=3600 stop: rsc_SAPHanaCon_RH1_HDB02-stop-interval-0s interval=0s timeout=3600Check that the start order constraint is in place:
[root]# pcs constraint orderOrder Constraints: start resource 'cln_SAPHanaTop_RH1_HDB02' then start resource 'cln_SAPHanaCon_RH1_HDB02' symmetrical=0Check the cluster status. Use
--fullto include node attributes, which are updated by the HANA resources:[root]# pcs status --full... Full List of Resources: * rsc_fence (stonith:<fence agent>): Started dc1hana1 * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 Node Attributes: * Node: dc1hana1 (1): * hana_rh1_clone_state : PROMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master1:master:worker:master * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 150 * Node: dc1hana2 (2): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master2:slave:worker:slave * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 140 * Node: dc1hana3 (3): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : slave:slave:worker:slave * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : -10000 * Node: dc1hana4 (4): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master3:slave:standby:standby * hana_rh1_site : DC1 * master-rsc_SAPHanaCon_RH1_HDB02 : 140 * Node: dc2hana1 (5): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master1:master:worker:master * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 100 * Node: dc2hana2 (6): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master2:slave:worker:slave * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 80 * Node: dc2hana3 (7): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : slave:slave:worker:slave * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : -12200 * Node: dc2hana4 (8): * hana_rh1_clone_state : DEMOTED * hana_rh1_gra : 2.0 * hana_rh1_gsh : 1.0 * hana_rh1_roles : master3:slave:standby:standby * hana_rh1_site : DC2 * master-rsc_SAPHanaCon_RH1_HDB02 : 80 ...
The timeouts shown for the resource operations are only recommended defaults and can be adjusted depending on your SAP HANA environment. For example, large SAP HANA databases can take longer to start up and therefore you might have to increase the start timeout.
Setting AUTOMATED_REGISTER to true can potentially increase the risk of data loss or corruption. If the HA cluster triggers a takeover while the data on the secondary HANA instance is not fully in sync, the automatic registration of the old primary HANA instance as the new secondary overwrites its data, and any data that was not synced before the takeover occurred is lost.
For more information, see the article on the SAP Technology Blog for Members: Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes.
5.8. Creating the virtual IP resource
You must configure a virtual IP (VIP) resource for SAP clients to access the primary HANA instance independently from the cluster node it is currently running on. Configure the VIP resource to automatically move to the node where the primary instance is running.
The resource agent you need for the VIP resource depends on the platform you use. We are using the IPaddr2 resource agent to demonstrate the setup.
Prerequisites
- You have reserved a virtual IP for the service.
Procedure
Use the appropriate resource agent for managing the virtual IP address based on the platform on which the HA cluster is running. Adjust the parameters according to the resource agent you are using. Create the cluster resource for the primary virtual IP, for example, using the
IPaddr2agent:[root]# pcs resource create rsc_vip_<SID>_HDB<instance>_primary \ ocf:heartbeat:IPaddr2 ip=<address> cidr_netmask=<netmask> nic=<device>-
Replace <SID> with your HANA SID.
Replace <instance> with your HANA instance number.
Replace <address>, <netmask>, and <device> with the details of your primary virtual IP address.
Create a cluster constraint that places the VIP resource with the
SAPHanaControllerresource on the HANA primary node:[root]# pcs constraint colocation add rsc_vip_<SID>_HDB<instance>_primary \ with promoted cln_SAPHanaCon_<SID>_HDB<instance> 2000The constraint applies a score of
2000instead of the defaultINFINITY. This softens the resource dependency and allows the virtual IP resource to stay active in the case when there is no promotedSAPHanaControllerresource. This way it is still possible to use tools like the SAP Management Console (MMC) or SAP Landscape Management (LaMa) that can reach this IP address to query the status information of the HANA instance.
Verification
Check the resource configuration of the virtual IP resource, for example:
[root]# pcs resource config rsc_vip_RH1_HDB02_primaryResource: rsc_vip_RH1_HDB02_primary (class=ocf provider=heartbeat type=IPaddr2) Attributes: rsc_vip_RH1_HDB02_primary-instance_attributes cidr_netmask=32 ip=192.168.1.100 nic=eth0 Operations: monitor: rsc_vip_RH1_HDB02_primary-monitor-interval-10s interval=10s timeout=20s start: rsc_vip_RH1_HDB02_primary-start-interval-0s interval=0s timeout=20s stop: rsc_vip_RH1_HDB02_primary-stop-interval-0s interval=0s timeout=20sCheck that the constraint is defined correctly:
[root]# pcs constraint colocationColocation Constraints: rsc_vip_RH1_HDB02_primary with cln_SAPHanaCon_RH1_HDB02 (score:2000) (rsc-role:Started) (with-rsc-role:Promoted)Check that the resource is running on the promoted primary node, for example,
dc1hana1:[root]# pcs status resources rsc_vip_RH1_HDB02_primary* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
5.9. Adding a secondary (read-enabled) virtual IP address
To support the Active/Active (read-enabled) secondary setup you must add a second virtual IP to provide client access to the secondary SAP HANA site.
Configure additional rules to ensure that the second virtual IP is always associated with a healthy SAP HANA site, maximizing client access and availability.
Normal operation
When both primary and secondary SAP HANA sites are active and the system replication is in sync, the second virtual IP is assigned to the secondary instance.
Secondary unavailable or out of sync
If the secondary site is down or the system replication is not in sync, the virtual IP moves to the primary instance. It automatically returns to the secondary instance as soon as the system replication is back in sync.
Failover scenario
If the cluster triggers a takeover, the second virtual IP initially remains on the same instance, which is now the new primary. After the former primary instance takes over the secondary role and the system replication is in sync again, the virtual IP moves to the new secondary.
Prerequisites
-
You have set
operationMode=logreplay_readaccesswhen registering the secondary SAP HANA instance for system replication with the primary site.
Procedure
Use the appropriate resource agent for managing the virtual IP address based on the platform on which the HA cluster is running. Adjust the parameters according to the resource agent you are using. Create the cluster resource for the secondary virtual IP, for example, using the
IPaddr2agent:[root]# pcs resource create rsc_vip_<SID>_HDB<site>_readonly \ ocf:heartbeat:IPaddr2 ip=<address> cidr_netmask=<netmask> nic=<device>-
Replace <SID> with your HANA SID.
Replace <site> with your HANA site number.
Replace <address>, <netmask>, and <device> with the details of your read-only secondary virtual IP address.
Create a location constraint rule to ensure that the secondary virtual IP is assigned to the secondary site during normal operations:
[root]# pcs constraint location rsc_vip_<SID>_HDB<site>_readonly \ rule score=INFINITY master-rsc_SAPHanaCon_<SID>_HDB<site> eq 100 \ and hana_<sid>_clone_state eq DEMOTED-
Replace <SID> with your HANA SID.
Replace <sid> with the lower-case HANA SID.
Replace <site> with your HANA site number.
Create a location constraint rule to ensure that the secondary virtual IP runs on the primary site as an alternative whenever necessary:
[root]# pcs constraint location rsc_vip_<SID>_HDB<site>_readonly \ rule score=2000 master-rsc_SAPHanaCon_<SID>_HDB<site> eq 150 \ and hana_<sid>_clone_state eq PROMOTED
Verification
Check the resource configuration of the secondary virtual IP resource, for example:
[root]# pcs resource config rsc_vip_RH1_HDB02_readonlyResource: rsc_vip_RH1_HDB02_readonly (class=ocf provider=heartbeat type=IPaddr2) Attributes: rsc_vip_RH1_HDB02_readonly-instance_attributes cidr_netmask=32 ip=192.168.1.200 nic=eth0 Operations: monitor: rsc_vip_RH1_HDB02_readonly-monitor-interval-10s interval=10s timeout=20s start: rsc_vip_RH1_HDB02_readonly-start-interval-0s interval=0s timeout=20s stop: rsc_vip_RH1_HDB02_readonly-stop-interval-0s interval=0s timeout=20sCheck that the constraints are part of the cluster configuration:
[root]# pcs constraint locationLocation Constraints: Resource: rsc_vip_RH1_HDB02_readonly Constraint: location-rsc_vip_RH1_HDB02_readonly Rule: boolean-op=and score=INFINITY Expression: master-rsc_SAPHanaCon_RH1_HDB02 eq 100 Expression: hana_rh1_clone_state eq DEMOTED Constraint: location-rsc_vip_RH1_HDB02_readonly-1 Rule: boolean-op=and score=2000 Expression: master-rsc_SAPHanaCon_RH1_HDB02 eq 150 Expression: hana_rh1_clone_state eq PROMOTEDCheck that the resource is running on the main secondary node, for example, dc2hana1:
[root]# pcs status resources rsc_vip_RH1_HDB02_readonly* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
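Beyond checking the constraints and the resource placement, you can verify that a client can actually reach the read-enabled secondary through this address. The following is only a sketch: the user, the password, and the SQL port (commonly 3<instance>15 for the first tenant database, 30215 in this example) depend on your installation:
rh1adm$ hdbsql -n 192.168.1.200:30215 -u <user> -p <password> "SELECT * FROM DUMMY"
If the read-enabled setup works, the query returns the DUMMY row from the secondary site while the primary virtual IP continues to serve read/write traffic.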
Chapter 6. Configuring a quorum device in the cluster
We recommend that you configure a qdevice in your cluster for improved service resiliency. Alternatively you can configure a dedicated cluster node that only serves for adding a quorum vote.
Do not configure both a qdevice and a majority-maker node in the same cluster. This adds one vote per method and results in an even number of quorum votes again.
6.1. Configuring a qdevice for cluster quorum
If you prefer to use a dedicated majority-maker cluster node for this purpose, skip the qdevice setup and follow the steps in Configuring a majority-maker node for cluster quorum instead.
6.1.1. Preparing the quorum device host
- Configure the RHEL High Availability repository on the quorum device host.
-
If the
firewalldservice is installed and you are not using it on your hosts, disable the service. See Disabling the firewalld service.
6.1.2. Configuring a qdevice on a quorum device host
First, you must configure a quorum device host that provides the qdevice for your cluster quorum.
In the following steps the example name of the qdevice host is dc3qdevice.
Prerequisites
- You have installed a separate host that is ideally located in a different location or availability zone than your cluster nodes.
- You have configured the RHEL High Availability repository on the dedicated quorum device host.
- You have configured your network in a way that your cluster nodes can reach the quorum device host.
Procedure
Install
pcsandcorosync-qnetdon the quorum device host:[root]# dnf install pcs corosync-qnetdStart and enable the
pcsdservice on the quorum device host:[root]# systemctl enable --now pcsd.serviceCreate the qdevice on the quorum device host. This command configures and starts the quorum device model
netand configures the device to start on boot. Run this command on the quorum device host:[root]# pcs qdevice setup model net --enable --startQuorum device 'net' initialized quorum device enabled Starting quorum device... quorum device startedOptional: If you are running the
firewalldservice, enable the ports that are required by the Red Hat High Availability Add-On. Run this on the quorum device host:[root]# firewall-cmd --add-service=high-availability [root]# firewall-cmd --runtime-to-permanentSet a password for the user
haclusteron the quorum device host:[root]# passwd hacluster
Verification
Check the quorum device status on the quorum device host:
[root]# pcs qdevice status net --fullQNetd address: *:5403 TLS: Supported (client certificate required) Connected clients: 0 Connected clusters: 0 Maximum send/receive size: 32768/32768 bytes
6.1.3. Configuring a qdevice in the cluster
Prerequisites
-
You have configured a quorum device host that is ideally located in a different location or availability zone than your cluster nodes, for example,
dc3qdevice. - You have configured a qdevice on the quorum device host.
- You have configured your network in a way that your cluster nodes can reach the quorum device host.
Procedure
Add the qdevice host to the
/etc/hosts on all existing cluster nodes, so that the resulting /etc/hosts entries are the same on all nodes. This ensures that the cluster nodes can communicate with the host even if the DNS service fails:[root]# cat /etc/hosts... 192.168.100.101 dc1hana1.example.com dc1hana1 192.168.100.102 dc1hana2.example.com dc1hana2 192.168.100.103 dc1hana3.example.com dc1hana3 192.168.100.104 dc1hana4.example.com dc1hana4 192.168.100.121 dc2hana1.example.com dc2hana1 192.168.100.122 dc2hana2.example.com dc2hana2 192.168.100.123 dc2hana3.example.com dc2hana3 192.168.100.124 dc2hana4.example.com dc2hana4 192.168.100.120 dc3qdevice.example.com dc3qdeviceInstall
corosync-qdeviceon all nodes of your cluster:[root]# dnf install corosync-qdeviceAuthenticate the quorum device host in the cluster to enable communication. Run this command on one cluster node:
[root]# pcs host auth <qdevice_host> Username: hacluster Password:dc3qdevice: Authorized-
Replace <qdevice_host> with the name of your quorum device host, for example, dc3qdevice.
Add the qdevice from the quorum device host to the cluster. Run this command on one cluster node:
[root]# pcs quorum device add model net host=<qdevice_host> algorithm=ffsplitSetting up qdevice certificates on nodes... dc1hana1: Succeeded dc1hana2: Succeeded dc1hana3: Succeeded dc1hana4: Succeeded dc2hana3: Succeeded dc2hana2: Succeeded dc2hana1: Succeeded dc2hana4: Succeeded Enabling corosync-qdevice... dc1hana3: corosync-qdevice enabled dc1hana1: corosync-qdevice enabled dc1hana4: corosync-qdevice enabled dc2hana3: corosync-qdevice enabled dc2hana4: corosync-qdevice enabled dc2hana2: corosync-qdevice enabled dc1hana2: corosync-qdevice enabled dc2hana1: corosync-qdevice enabled Sending updated corosync.conf to nodes... dc1hana1: Succeeded dc1hana3: Succeeded dc1hana4: Succeeded dc1hana2: Succeeded dc2hana1: Succeeded dc2hana2: Succeeded dc2hana3: Succeeded dc2hana4: Succeeded dc1hana1: Corosync configuration reloaded Starting corosync-qdevice... dc1hana1: corosync-qdevice started dc1hana2: corosync-qdevice started dc2hana4: corosync-qdevice started dc2hana3: corosync-qdevice started dc1hana3: corosync-qdevice started dc1hana4: corosync-qdevice started dc2hana2: corosync-qdevice started dc2hana1: corosync-qdevice started-
Replace <qdevice_host> with the name of your quorum device host, for example, dc3qdevice.
The algorithm can be ffsplit or lms. Consult the corosync-qdevice(8) man page for more details about the different algorithms; a sketch with the lms algorithm follows after this list.
Verification
Check the quorum configuration on a cluster node:
[root]# pcs quorum configDevice: votes: 1 Model: net algorithm: ffsplit host: dc3qdeviceCheck the quorum status on a cluster node:
[root]# pcs quorum statusQuorum information ------------------ Date: Thu Dec 11 08:27:00 2025 Quorum provider: corosync_votequorum Nodes: 8 Node ID: 1 Ring ID: 1.3a Quorate: Yes Votequorum information ---------------------- Expected votes: 9 Highest expected: 9 Total votes: 9 Quorum: 5 Flags: Quorate Qdevice Membership information ---------------------- Nodeid Votes Qdevice Name 1 1 A,V,NMW dc1hana1 (local) 2 1 A,V,NMW dc1hana2 3 1 A,V,NMW dc1hana3 4 1 A,V,NMW dc1hana4 5 1 A,V,NMW dc2hana1 6 1 A,V,NMW dc2hana2 7 1 A,V,NMW dc2hana3 8 1 A,V,NMW dc2hana4 0 1 QdeviceCheck the quorum device status on a cluster node:
[root]# pcs quorum device statusQdevice information ------------------- Model: Net Node ID: 1 Configured node list: 0 Node ID = 1 1 Node ID = 2 2 Node ID = 3 3 Node ID = 4 4 Node ID = 5 5 Node ID = 6 6 Node ID = 7 7 Node ID = 8 Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Qdevice-net information ---------------------- Cluster name: hana-scaleout-cluster QNetd host: dc3qdevice:5403 Algorithm: Fifty-Fifty split Tie-breaker: Node with lowest node ID State: ConnectedCheck the quorum device status on the quorum device host. The status now shows the details of the cluster on which the qdevice is used. If the same qdevice is configured in multiple clusters, the status contains the details for each cluster:
[root]# pcs qdevice status net --fullQNetd address: *:5403 TLS: Supported (client certificate required) Connected clients: 8 Connected clusters: 1 Maximum send/receive size: 32768/32768 bytes Cluster "hana-scaleout-cluster": Algorithm: Fifty-Fifty split (KAP Tie-breaker) Tie-breaker: Node with lowest node ID Node ID 1: Client address: ::ffff:10.99.30.149:13248 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 2: Client address: ::ffff:10.99.30.30:13276 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 3: Client address: ::ffff:10.99.30.232:22226 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 4: Client address: ::ffff:10.99.30.82:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 5: Client address: ::ffff:10.99.30.163:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: No change (ACK) Node ID 6: Client address: ::ffff:10.99.30.42:40404 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 7: Client address: ::ffff:10.99.30.191:24736 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK) Node ID 8: Client address: ::ffff:10.99.30.33:29004 HB interval: 8000ms Configured node list: 1, 2, 3, 4, 5, 6, 7, 8 Ring ID: 1.3a Membership node list: 1, 2, 3, 4, 5, 6, 7, 8 Heuristics: Undefined (membership: Undefined, regular: Undefined) TLS active: Yes (client certificate verified) Vote: ACK (ACK)
6.2. Configuring a majority-maker node for cluster quorum
You can use an additional cluster node to serve as an extra quorum vote. In the following steps, we configure such a majority-maker node, dc3mm, in the existing cluster.
6.2.1. Preparing the majority-maker node
- Install the node with the same operating system version as your HANA nodes.
- Configure the RHEL High Availability repository on the majority-maker node.
-
If the
firewalldservice is installed and you are not using it on your hosts, disable the service. See Disabling the firewalld service.
6.2.2. Updating the host names in /etc/hosts
As for all cluster nodes, we recommend that you also add the majority-maker cluster node to the /etc/hosts file on each node. On the new dc3mm node you add all cluster nodes.
Procedure
Add the new host to the
/etc/hosts on all existing cluster nodes and add all hosts to the new dc3mm host, so that the resulting /etc/hosts entries are the same on all nodes:[root]# cat /etc/hosts... 192.168.100.101 dc1hana1.example.com dc1hana1 192.168.100.102 dc1hana2.example.com dc1hana2 192.168.100.103 dc1hana3.example.com dc1hana3 192.168.100.104 dc1hana4.example.com dc1hana4 192.168.100.121 dc2hana1.example.com dc2hana1 192.168.100.122 dc2hana2.example.com dc2hana2 192.168.100.123 dc2hana3.example.com dc2hana3 192.168.100.124 dc2hana4.example.com dc2hana4 192.168.100.110 dc3mm.example.com dc3mm
Verification
Check that you can ping the hosts between each other. This step is optional and an example only for a basic verification. The system resolves entries in
/etc/hostswhen you use thepingcommand:[root]# ping dc3mm.example.comPING dc3mm.example.com (192.168.100.110) 56(84) bytes of data. 64 bytes from dc3mm.example.com (192.168.100.110): icmp_seq=1 ttl=64 time=0.017 ms …
6.2.3. Updating the cluster clone resources
The cluster automatically creates one copy of a clone resource for each cluster node and uses this to calculate resource allocations. For example, if the cluster consists of 8 HANA nodes and you add an additional majority-maker node for a quorum vote only, the cluster automatically calculates with all 9 cluster members for clone resource assignments. However, including this non-HANA node in the calculations can lead to unexpected impact when the cluster moves resources.
To prevent this influence you must configure the clone-max limit for all cloned resources explicitly to only the number of HANA nodes. Adjust the clone configuration before you add the new node to the cluster.
Procedure
Update the
SAPHanaTopologyresource clone with a limit to the number of HANA nodes, for example,8:[root]# pcs resource update cln_SAPHanaTop_<SID>_HDB<instance> meta clone-max=8Update the
SAPHanaControllerresource clone with a limit to the number of HANA nodes, for example,8:[root]# pcs resource update cln_SAPHanaCon_<SID>_HDB<instance> meta clone-max=8
Verification
Check that the
clone-maxoption is correct for all clone resources:[root]# pcs resource config | grep -i cloneClone: cln_SAPHanaTop_RH1_HDB02 clone-max=8 clone-node-max=1 Clone: cln_SAPHanaCon_RH1_HDB02 clone-max=8 clone-node-max=1-
clone-maxmust be the number of HANA nodes, for example,8. When this option is not displayed, it defaults to the total number of cluster nodes instead.
6.2.4. Installing the cluster components on the majority-maker node
Install the same cluster packages as on the existing cluster nodes to prepare the host in the same way.
Prerequisites
- You have configured the RHEL High Availability repository on the majority-maker host.
Procedure
Install the Red Hat High Availability Add-On software packages from the High Availability repository. Choose the same fence agents as you are using on the existing cluster nodes:
[root]# dnf install pcs pacemaker fence-agents-<model>Start and enable the
pcsdservice on the new nodes. The--nowparameter automatically starts the enabled service:[root]# systemctl enable --now pcsd.serviceOptional: If you use the local
firewalldservice you must enable the ports that are required by the Red Hat High Availability Add-On. Run this on the new node:[root]# firewall-cmd --add-service=high-availability [root]# firewall-cmd --runtime-to-permanentSet a password for the user
haclusteron the new node using the same password:[root]# passwd hacluster
Verification
Check that the pcsd service is running and shows as
loadedandactiveon the new node:[root]# systemctl status pcsd.service● pcsd.service - PCS GUI and remote configuration interface Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; preset: disabled) Active: active (running) since … …
6.2.5. Adding the new node to the cluster
Add the dedicated majority-maker node as a regular cluster member.
Prerequisites
- You have configured a cluster to which you want to add this node as a member.
Procedure
Authenticate the user
haclusterfor the new node in the cluster. Run this on one cluster node:[root]# pcs host auth dc3mm Username: hacluster Password:dc3mm: Authorized-
Enter the node name with or without FQDN, as defined in the /etc/hosts file.
Enter the hacluster user password in the prompt.
Add the new node to the cluster. Run this on one cluster node:
[root]# pcs cluster node add dc3mmNo addresses specified for host 'dc3mm', using 'dc3mm' Disabling sbd... dc3mm: sbd disabled Sending 'corosync authkey', 'pacemaker authkey' to 'dc3mm' dc3mm: successful distribution of the file 'corosync authkey' dc3mm: successful distribution of the file 'pacemaker authkey' Sending updated corosync.conf to nodes... dc1hana3: Succeeded dc1hana2: Succeeded dc1hana1: Succeeded dc2hana1: Succeeded dc1hana4: Succeeded dc2hana2: Succeeded dc2hana3: Succeeded dc2hana4: Succeeded dc3mm: Succeeded dc2hana1: Corosync configuration reloadedAdd a location constraint to prevent any HANA resource from running on this node, and also prevent the cluster from trying to check the initial resource status. The following constraint definition uses a
regexpexpression to match all HANA resources. If required, adjust the pattern to match your resource names:[root]# pcs constraint location add avoid-dc3mm \ regexp%.*SAPHana.* dc3mm -- -INFINITY resource-discovery=neverEnable the cluster on the new node to be started automatically on system start. Run on any node:
[root]# pcs cluster enable dc3mmdc3mm: Cluster EnabledStart the cluster on the new node:
[root]# pcs cluster start dc3mmdc3mm: Starting Cluster...
Verification
Check the location constraint that keeps HANA resources off the new node:
[root]# pcs constraint location --fullLocation Constraints: Resource pattern: .*SAPHana.* Disabled on: Node: dc3mm (score:-INFINITY) (resource-discovery=never) (id:avoid-dc3mm)Check the cluster status. Verify that the cluster daemon services are in the desired state. Run this on the new node to also verify the local daemon status at the end:
[root]# pcs status --full… Node List: * Node dc1hana1 (1): online, feature set 3.16.2 * Node dc1hana2 (2): online, feature set 3.16.2 * Node dc1hana3 (3): online, feature set 3.16.2 * Node dc1hana4 (4): online, feature set 3.16.2 * Node dc2hana1 (5): online, feature set 3.16.2 * Node dc2hana2 (6): online, feature set 3.16.2 * Node dc2hana3 (7): online, feature set 3.16.2 * Node dc2hana4 (8): online, feature set 3.16.2 * Node dc3mm (9): online, feature set 3.16.2 Full List of Resources: * rsc_fence (stonith:<stonith agent>): Started dc1hana1 * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc1hana1 …-
The new node must be displayed in the Node List and in the PCSD Status list.
The new node must not show a status in the Full List of Resources.
Verify the quorum status. The new node adds a vote, so that there is now an odd number of votes and a 50/50 split can no longer occur:
[root]# pcs quorum statusQuorum information ------------------ Date: Thu Dec 11 10:25:40 2025 Quorum provider: corosync_votequorum Nodes: 9 Node ID: 1 Ring ID: 1.3e Quorate: Yes Votequorum information ---------------------- Expected votes: 9 Highest expected: 9 Total votes: 9 Quorum: 5 Flags: Quorate Membership information ---------------------- Nodeid Votes Qdevice Name 1 1 NR dc1hana1 (local) 2 1 NR dc1hana2 3 1 NR dc1hana3 4 1 NR dc1hana4 5 1 NR dc2hana1 6 1 NR dc2hana2 7 1 NR dc2hana3 8 1 NR dc2hana4 9 1 NR dc3mm
Next steps
- Add the new node to your individual fencing method. See Configuring fencing in a Red Hat High Availability cluster.
Chapter 7. Testing the setup
Test your new HANA HA cluster thoroughly before you enable it for production workloads.
Enhance the basic example test cases with your specific requirements.
7.1. Detecting the system replication state changes
To test that the SAPHanaSR HA/DR provider works correctly, monitor the sync state information in the logs and in the cluster attributes while you disrupt the system replication.
In this test, you use the primary site for monitoring the system replication status and for verifying the log messages. On a secondary instance you freeze the indexserver process to simulate a system replication issue while the primary remains fully intact.
Prerequisites
-
You have configured the mandatory
SAPHanaSRHA/DR provider. - Your HANA instances are in a healthy state on all cluster nodes and the system replication is in sync.
Procedure
As user
<sid>admgo to the HANA Python directory on the primary site and check the current system replication state. Verify that it isACTIVEand fully synced:rh1adm$ cdpy; python systemReplicationStatus.py… status system replication site "2": ACTIVE overall system replication status: ACTIVE …Verify that the
srHookandsync_statecluster attributes are bothSOKin the attributes summary of the secondary site. Run this command as therootuser on any node in a separate terminal to keep track of the attribute changes:[root]# watch SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Thu Dec 11 10:59:25 2025 DC1 DC2 SOK SOK ok ...You can use the
watchcommand to run the command in a loop at a default interval of 2 seconds.On an instance on the secondary site, for example,
dc2hana2, get the process ID (PID) of thehdbindexserverprocess. For example, you can get it from thePIDcolumn of theHDB infooutput as user<sid>adm:rh1adm$ HDB infoOn the same instance on the secondary site, use the
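Alternatively, you can query the PID directly. The following is only a sketch that assumes the <sid>adm user rh1adm and the standard process name hdbindexserver:
rh1adm$ pgrep -u rh1adm hdbindexserver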
PIDto simulate a hanginghdbindexserverprocess by sending theSTOPsignal to the process. This freezes the process and blocks it from communicating and syncing the instance between the nodes:rh1adm$ kill -STOP <PID>
Verification
On the primary site, watch the system replication status for the change on any primary instance. In the following example the system’s
cututility helps you limit the output to certain fields for readability. Remove it to see all columns of the table formatted text output. In the example we froze the indexserver on the secondary nodedc2hana2, which results in a replication error with that node’s counterpart on the primary site,dc1hana2:rh1adm$ cdpy; watch "python systemReplicationStatus.py | cut -d '|' -f 1-3,5,9,13-"|Database |Host |Service Name |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | |Host |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |------------ |--------- |------------- |----------- |----------- |----------------------------- |------------ | |RH1 |dc1hana3 |indexserver |dc2hana3 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana2 |indexserver |dc2hana2 |YES |SYNC |ERROR |Log shipping timeout occurred | False | |SYSTEMDB |dc1hana1 |nameserver |dc2hana1 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |xsengine |dc2hana1 |YES |SYNC |ACTIVE | | True | |RH1 |dc1hana1 |indexserver |dc2hana1 |YES |SYNC |ACTIVE | | True | status system replication site "2": ERROR overall system replication status: ERROR ...The replication status changes to
ERROR for the indexserver service after a short time. An idle instance can take a minute or more to react. On the primary site’s master name server node, check the HANA nameserver process log for the related messages as the
<sid>admuser:rh1adm$ cdtrace; grep -he 'HanaSR.srConnectionChanged.*' nameserver_*ha_dr_SAPHanaSR SAPHanaSR.py(00057) : SAPHanaSR (0.184.1) SAPHanaSR.srConnectionChanged method called with Dict={'hostname': 'dc1hana2', 'port': '30203', 'volume': 4, 'service_name': 'indexserver', 'database': 'RH1', 'status': 11, 'database_status': 11, 'system_status': 11, 'timestamp': '2025-12-11T11:52:01.069691+00:00', 'is_in_sync': False, 'system_is_in_sync': False, 'reason': '', 'siteName': 'DC2'} ha_dr_SAPHanaSR SAPHanaSR.py(00068) : SAPHanaSR.srConnectionChanged() CALLING CRM: <sudo /usr/sbin/crm_attribute -n hana_rh1_gsh -v 1.0 -l reboot> rc=0 ha_dr_SAPHanaSR SAPHanaSR.py(00069) : SAPHanaSR.srConnectionChanged() Running old srHookGeneration 1.0, see attribute hana_rh1_gsh too ha_dr_SAPHanaSR SAPHanaSR.py(00087) : SAPHanaSR SAPHanaSR.srConnectionChanged method called with Dict={'hostname': 'dc1hana2', 'port': '30203', 'volume': 4, 'service_name': 'indexserver', 'database': 'RH1', 'status': 11, 'database_status': 11, 'system_status': 11, 'timestamp': '2025-12-11T11:52:01.069691+00:00', 'is_in_sync': False, 'system_is_in_sync': False, 'reason': '', 'siteName': 'DC2'} ###The nameserver process log contains the
SAPHanaSRhook logs.Verify that both cluster attributes for the system replication status,
srHookandsync_state, show theSFAILstatus of the secondary site. Run the following as therootuser on any HANA node or use the open terminal from the previous steps to watch the changes:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Thu Dec 11 12:24:40 2025 DC1 DC2 SFAIL SFAIL ok ...Unblock the previously frozen
hdbindexserverPID to enable it again. Run this on the secondary instance on which you blocked thehdbindexserverprocess for the test:rh1adm$ kill -CONT <PID>-
Repeat the previous checks to verify that the system replication recovers fully after a short time. The cluster does not trigger any actions during this test because the resources remain running. Ensure that the system replication status is healthy again and fully synced, and that the cluster attributes are set to
SOKagain for the secondary site.
7.2. Triggering the indexserver crash recovery
Test the functionality of the ChkSrv HA/DR provider by simulating the crash of an hdbindexserver process. You can run this on the primary or on the secondary site. The exact recovery actions depend on the overall configuration. The following steps demonstrate the activity when using action_on_lost = stop in the hook configuration.
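The hook is configured in an [ha_dr_provider_*] section of the HANA global.ini, like the SAPHanaSR hook shown later in this document. The following is only a sketch of such a section with action_on_lost = stop; the execution_order value 2 is an assumption and must match your actual configuration:
[ha_dr_provider_ChkSrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR-ScaleOut
execution_order = 2
action_on_lost = stop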
Prerequisites
- You have configured the ChkSrv HA/DR provider. Skip this test if you have not configured this optional hook.
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Use a separate terminal to monitor the HANA processes as user
<sid>admon the instance on which you run this test:rh1adm$ watch "sapcontrol -nr ${TINSTANCE} -function GetProcessList | column -s ',' -t"In another terminal on the same HANA instance, kill the
hdbindexserverprocess:rh1adm$ kill <PID>
Verification
Check the dedicated HANA nameserver trace log on the same instance and identify the event and related action, as user
<sid>adm:rh1adm$ cdtrace; less nameserver_chksrv.trc... ChkSrv version 0.7.8. Method srServiceStateChanged method called. ChkSrv srServiceStateChanged method called with Dict={'hostname': 'dc2hana2', 'service_name': 'indexserver', 'service_port': '30203', 'service_status': 'stopping', 'service_previous_status': 'yes', 'timestamp': '2025-12-11T13:01:10.872386+00:00', 'daemon_status': 'yes', 'database_id': '3', 'database_name': 'RH1', 'database_status': 'yes', 'details': ''} ChkSrv srServiceStateChanged method called with SAPSYSTEMNAME=RH1 srv:indexserver-30203-stopping-yes db:RH1-3-yes daem:yes LOST: indexserver event looks like a lost indexserver (status=stopping) LOST: stop instance. action_on_lost=stop ...Check the cluster status for resource failure information on any cluster node, as user
root:[root]# pcs status --full... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana3 'not running' (7): call=183, status='complete', last-rc-change='Thu Dec 11 13:02:31 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana4 'not running' (7): call=179, status='complete', last-rc-change='Thu Dec 11 13:02:35 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana1 'not running' (7): call=108, status='complete', last-rc-change='Thu Dec 11 13:01:49 2025', queued=0ms, exec=0ms * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana2 'error' (1): call=179, status='complete', last-rc-change='Thu Dec 11 13:02:39 2025', queued=0ms, exec=0ms ...Check the system log for the related cluster actions on the test node, for example,
dc2hana2, as userroot:[root]# grep rsc_SAPHanaCon_RH1_HDB02 /var/log/messages... Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:01:33 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: ERROR: SAP instance service hdbdaemon status color is GRAY ! Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: WARNING: RA: HANA_CALL stderr from command 'su - rh1adm -c' is '', stderr from command 'python landscapeHostConfiguration.py --sapcontrol=1' is '' Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: DEC: local instance AND landscape are down (lss=1) Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: DEC: DEMOTED => OCF_SUCCESS Dec 11 13:01:34 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[311166]: INFO: RA ==== end action monitor_clone with rc=0 (0.185.3) (3s)==== Dec 11 13:01:49 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana1]: (unset) -> 1765458109 Dec 11 13:01:49 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana1]: (unset) -> 1 Dec 11 13:01:58 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana1]: 100 -> 0 Dec 11 13:02:31 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana3]: (unset) -> 1765458151 Dec 11 13:02:31 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana3]: (unset) -> 1 Dec 11 13:02:35 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:02:35 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana4]: (unset) -> 1765458155 Dec 11 13:02:35 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana4]: (unset) -> 1 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA: multiTargetSupport attribute not set. May be no Hook is configured or the old-style Hook is used. 
Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: ===> master_walk: priorities for site DC2 master1= master2=dc2hana2 master3= ==> active_master= best_cold_master=dc2hana2 Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: DEC: ===> master_walk: the_master=dc2hana2; Dec 11 13:02:36 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:02:38 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:02:39 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[312314]: INFO: RA ==== end action monitor_clone with rc=1 (0.185.3) (4s)==== Dec 11 13:02:39 dc2hana2 pacemaker-controld[1979]: notice: Result of monitor operation for rsc_SAPHanaCon_RH1_HDB02 on dc2hana2: error Dec 11 13:02:39 dc2hana2 pacemaker-controld[1979]: notice: rsc_SAPHanaCon_RH1_HDB02_monitor_61000@dc2hana2 output [ 10\n ] Dec 11 13:02:39 dc2hana2 pacemaker-attrd[1977]: notice: Setting last-failure-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana2]: (unset) -> 1765458159 Dec 11 13:02:39 dc2hana2 pacemaker-attrd[1977]: notice: Setting fail-count-rsc_SAPHanaCon_RH1_HDB02#monitor_61000[dc2hana2]: (unset) -> 1 Dec 11 13:03:34 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana3]: -12200 -> -22200 Dec 11 13:03:38 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana4]: 80 -> -32300 Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: multiTargetSupport attribute not set. May be no Hook is configured or the old-style Hook is used. 
Dec 11 13:03:41 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: is_master_nameserver rc=0 Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: ===> master_walk: priorities for site DC2 master1=dc2hana1 master2=dc2hana2 master3=dc2hana4 ==> active_master=dc2hana1 best_cold_master= Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: ===> master_walk: the_master=dc2hana1; Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA ==== begin action monitor_clone (0.185.3) ==== Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: ACT: systemd service SAPRH1_02.service is active Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: the_master=<<dc2hana1>> Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK0= Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK1=SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA: SRHOOK3=SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring 2:S:master2:slave:worker:slave : SFAIL Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: roles(2:S:master2:slave:worker:slave) are matching pattern ([0-9]:S:) Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: sync(SFAIL) is matching syncPattern (.*) Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: DEC: scoring_crm_master: set score -32300 Dec 11 13:03:42 dc2hana2 pacemaker-attrd[1977]: notice: Setting master-rsc_SAPHanaCon_RH1_HDB02[dc2hana2]: 80 -> -32300 Dec 11 13:03:42 dc2hana2 SAPHanaController(rsc_SAPHanaCon_RH1_HDB02)[313705]: INFO: RA ==== end action monitor_clone with rc=0 (0.185.3) (2s)==== ...The next
SAPHanaControllerresource monitor reports the unexpectedly stopped HANA instances as a failure and initiates the recovery steps according to the configuration. IfPREFER_SITE_TAKEOVERis enabled and you executed the test on a primary instance, it triggers a HANA takeover to the secondary site.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that might be there from previous testing. For more information see Cleaning up the failure history.
7.3. Triggering a HANA takeover using cluster commands
Use the cluster command to move the promoted resource to the other site and manually test the planned takeover of the primary to the secondary site.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Switch the primary site to the secondary site. Run the cluster command as user root on any node and use the coordinator instance of the secondary site as the target, for example,
dc2hana1. If you do not name the target node in a HANA setup with alternate nameserver candidates, the cluster first unsuccessfully attempts to promote the alternative nodes before it ends up promoting the correct coordinator node on the secondary HANA site:[root]# pcs resource move cln_SAPHanaCon_<SID>_HDB<instance> <secondary coordinator instance>Location constraint to move resource 'cln_SAPHanaCon_RH1_HDB02' has been created Waiting for the cluster to apply configuration changes... Location constraint created to move resource 'cln_SAPHanaCon_RH1_HDB02' has been removed Waiting for the cluster to apply configuration changes... resource 'cln_SAPHanaCon_RH1_HDB02' is promoted on node 'dc2hana1'; unpromoted on nodes 'dc1hana1', 'dc1hana3', 'dc1hana4', 'dc2hana2', 'dc2hana3', 'dc2hana4'
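Depending on the pcs version, the temporary location constraint created by the move command might not be removed automatically. A sketch of how to check for and clear such a leftover constraint for the example resource:
[root]# pcs constraint location
[root]# pcs resource clear cln_SAPHanaCon_RH1_HDB02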
Verification
Verify that the
SAPHanaControllerresource is now promoted on the other site:[root]# pcs resource status cln_SAPHanaCon_RH1_HDB02* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * Promoted: [ dc2hana1 ] * Unpromoted: [ dc1hana3 dc1hana4 dc2hana2 dc2hana3 dc2hana4 ] * Stopped: [ dc1hana1 dc1hana2 dc3mm ]The status of the previous primary site instances depends on the
AUTOMATED_REGISTER parameter of the SAPHanaController resource. When AUTOMATED_REGISTER is false, the instance remains stopped until manual intervention; otherwise, it restarts automatically and reregisters as the new secondary instance.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.4. Crashing the node with a primary instance
Simulate the crash of the cluster node on which a primary instance is running to test the behavior of your HANA cluster resources.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Trigger a crash on a HANA node on the primary site. This command immediately causes a crash of the node with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
The cluster detects the failed node and fences it. You can watch the cluster activity on any of the remaining nodes:
[root]# pcs status --full... Pending Fencing Actions: * reboot of dc1hana1 pending: client=pacemaker-controld.1685, origin=dc1hana2 ...- The secondary site takes over and becomes promoted as the new primary.
- The fenced former primary node recovers according to your fencing and SAPHanaController resource configuration.
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.5. Crashing the node with a secondary instance
Simulate the crash of the cluster node on which a secondary instance is running to test the behavior of your HANA cluster resources.
Procedure
Trigger a crash of a HANA node on the secondary site. This command immediately causes a crash of the node with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
The cluster detects the failed node and fences it. You can watch the cluster activity on any of the remaining nodes:
[root]# pcs status --full... Pending Fencing Actions: * reboot of dc2hana1 pending: client=pacemaker-controld.1694, origin=dc1hana1 ...- The primary site remains running while the secondary node restarts and recovers. The fenced node recovery depends on your fencing configuration.
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.6. Stopping the primary site using SAP commands
Test the behavior of the cluster when you manage the primary HANA site outside of the cluster using HANA commands.
Since the cluster is not aware of the execution of HANA commands, it detects the change as a failure and triggers the configured recovery actions.
Prerequisites
- Your HANA instances have a healthy HANA system replication.
- You have no failures in the cluster status.
Procedure
Stop the primary HANA site as the
<sid>admuser outside of the cluster. Run on one HANA instance on the primary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDB
Verification
The cluster notices the stopped instance as a failure and initiates the recovery of the primary site:
[root]# pcs status --full... Migration Summary: * Node: dc1hana1 (1): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_59000 on dc2hana1 'promoted (failed)' (9): ...If you configured and enabled both the
PREFER_SITE_TAKEOVERandAUTOMATED_REGISTERparameters in theSAPHanaControllerresource, the cluster triggers a HANA takeover to the secondary site and automatically registers the failed primary as the new secondary. Otherwise it recovers the failed primary according to your configuration.
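To check which of these behaviors is configured before you run this test, you can display the relevant resource parameters, for example:
[root]# pcs resource config rsc_SAPHanaCon_RH1_HDB02 | grep -e PREFER_SITE_TAKEOVER -e AUTOMATED_REGISTER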
Next steps
- When necessary, depending on the configuration, manually reregister the stopped former primary HANA site and start it using HANA tools. For more information, refer to Registering the former primary HANA site as a secondary HANA site after a takeover.
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
7.7. Stopping the secondary site using SAP commands
Test the behavior of the cluster when you manage the secondary HANA site outside of the cluster using HANA commands.
Since the cluster is not aware of the execution of HANA commands, it detects the change as a failure and triggers the configured recovery actions.
Prerequisites
- You have no failures in the cluster status.
Procedure
Stop the secondary HANA site as the
<sid>admuser outside of the cluster. Run on one HANA instance on the secondary site:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StopSystem HDB
Verification
The cluster notices the stopped instance as a failure and recovers the secondary site:
[root]# pcs status --full.. Migration Summary: * Node: dc2hana3 (7): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana1 (5): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana2 (6): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... * Node: dc2hana4 (8): * rsc_SAPHanaCon_RH1_HDB02: migration-threshold=5000 fail-count=1 last-failure=... Failed Resource Actions: * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana3 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana1 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana2 'not running' (7): ... * rsc_SAPHanaCon_RH1_HDB02_monitor_61000 on dc2hana4 'not running' (7): ...
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information see Cleaning up the failure history.
Chapter 8. Finishing the setup
Ensure that the final setup is complete and the systems and resources are healthy before you enable the environment for production workloads.
8.1. Enabling the automatic registration of HANA after a takeover (optional)
If you want a previously failed primary site to automatically recover as a fully functional secondary site without manual verification of the data consistency, you can enable the SAPHanaController resource to re-register the site right after a takeover.
This enables the previously failed primary site to continue the HANA system replication and automatically take over again in the event of a new failure of the new primary site.
Your HANA operator must decide whether they first need to manually check the health of the previously failed instance and re-register the HANA site afterwards, or whether the priority is a faster automatic recovery of the full high availability.
Procedure
Update the
SAPHanaControllerresource and override the defaultAUTOMATED_REGISTER:[root]# pcs resource update rsc_SAPHanaCon_<SID>_HDB<instance> AUTOMATED_REGISTER=true
Verification
Check that
AUTOMATED_REGISTERis set totrue:[root]# pcs resource config rsc_SAPHanaCon_RH1_HDB02Resource: rsc_SAPHanaCon_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaController) Attributes: rsc_SAPHanaCon_RH1_HDB02-instance_attributes AUTOMATED_REGISTER=true DUPLICATE_PRIMARY_TIMEOUT=7200 InstanceNumber=02 PREFER_SITE_TAKEOVER=true SID=RH1 ...
Setting AUTOMATED_REGISTER to true can increase the risk of data loss or corruption. If the HA cluster triggers a takeover while the data on the secondary HANA site is not fully in sync, the automatic registration of the old primary HANA site as the new secondary HANA site discards any data on that site that was not yet synced before the takeover occurred.
For more information, see the article on the SAP Technology Blog for Members: Be Prepared for Using Pacemaker Cluster for SAP HANA – Part 2: Failure of Both Nodes.
8.2. Reviewing the final cluster state
After the configuration of an 8-node cluster for a scale-out HANA system replication setup, the status looks like the following example.
Your cluster state may deviate from the example, depending on your setup of optional or platform-dependent resources, such as the individual fencing or VIP resources.
Also, you can decide whether you want to disable the cluster service so that it does not start automatically on system boot. This requires manual intervention after every system boot, but gives you more control over and supervision of the startup.
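For example, you can disable and later re-enable the automatic start of the cluster services on all nodes with the following commands:
[root]# pcs cluster disable --all
[root]# pcs cluster enable --all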
[root]# pcs status --full
Cluster name: hana-scaleout-cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-18 16:27:04Z)
Cluster Summary:
* Stack: corosync
* Current DC: dc2hana1 (5) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum
* Last updated: Thu Dec 18 16:27:05 2025
* Last change: Thu Dec 18 16:26:37 2025 by root via crm_attribute on dc1hana1
* 9 nodes configured
* 19 resource instances configured
Node List:
* Node dc1hana1 (1): online, feature set 3.16.2
* Node dc1hana2 (2): online, feature set 3.16.2
* Node dc1hana3 (3): online, feature set 3.16.2
* Node dc1hana4 (4): online, feature set 3.16.2
* Node dc2hana1 (5): online, feature set 3.16.2
* Node dc2hana2 (6): online, feature set 3.16.2
* Node dc2hana3 (7): online, feature set 3.16.2
* Node dc2hana4 (8): online, feature set 3.16.2
* Node dc3mm (9): online, feature set 3.16.2
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana2
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]:
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
Node Attributes:
* Node: dc1hana1 (1):
* hana_rh1_clone_state : PROMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC1
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 150
* Node: dc1hana2 (2):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc1hana3 (3):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : -10000
* Node: dc1hana4 (4):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc2hana1 (5):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC2
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 100
* Node: dc2hana2 (6):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
* Node: dc2hana3 (7):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : -12200
* Node: dc2hana4 (8):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
Migration Summary:
Tickets:
PCSD Status:
dc1hana1: Online
dc1hana2: Online
dc1hana3: Online
dc1hana4: Online
dc2hana1: Online
dc2hana2: Online
dc2hana3: Online
dc2hana4: Online
dc3mm: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
In a healthy setup, the additional cluster attributes appear as in this example:
[root]# SAPHanaSR-showAttr
Global cib-time prim sec srHook sync_state upd
---------------------------------------------------------------
RH1 Thu Dec 18 16:28:49 2025 DC1 DC2 SOK SOK ok
Sites lpt lss mns srr
----------------------------------
DC1 1766075329 4 dc1hana1 P
DC2 30 4 dc2hana1 S
Hosts clone_state gra gsh node_state roles score site sra
--------------------------------------------------------------------------------------
dc1hana1 PROMOTED 2.0 1.0 online master1:master:worker:master 150 DC1 -
dc1hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 140 DC1
dc1hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -10000 DC1
dc1hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 140 DC1
dc2hana1 DEMOTED 2.0 1.0 online master1:master:worker:master 100 DC2 -
dc2hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 80 DC2
dc2hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -12200 DC2
dc2hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 80 DC2
dc3mm online
Chapter 9. Maintenance procedures
To perform maintenance on the different components of an SAP HANA system replication HA environment, you must follow specific steps to ensure that the cluster does not cause unplanned impact.
Use maintenance procedures to keep your cluster in a healthy state during planned change activities or to restore its health after unplanned incidents.
9.1. Cleaning up the failure history
Clear any failure notifications from the cluster that might remain from previous testing. This resets the failure counters that count against the migration thresholds.
Procedure
Clean up resource failures:
[root]# pcs resource cleanupClean up the STONITH failure history:
[root]# pcs stonith history cleanup
Verification
Check the overall cluster status and confirm that no failures are displayed anymore:
[root]# pcs status --fullCheck that the stonith history for fencing actions has 0 events:
[root]# pcs stonith history
9.2. Triggering a HANA takeover using cluster commands
Use the cluster control to execute a simple takeover of the primary site to the secondary site.
For detailed steps, refer to the section Testing the setup - Triggering a HANA takeover using cluster commands.
9.3. Updating the operating system and HA cluster components
For updates or offline changes on the HA cluster, the operating system or even the system hardware, you must follow the Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
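One building block of such a procedure is typically to take a single node out of service before it is updated and to bring it back afterwards. The following is only a sketch using the pcs standby commands with the example node dc1hana4; the full sequence in the referenced article remains authoritative:
[root]# pcs node standby dc1hana4
[root]# pcs node unstandby dc1hana4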
9.4. Performing maintenance on the SAP HANA instances
For any kind of maintenance of applications or other components that the HA cluster manages, you must enable the cluster maintenance mode to prevent the cluster from any interference during the maintenance.
During the update of your HANA instances, the cluster remains running, but it does not actively monitor resources or take any actions. After the change on the HANA instances is done, it is vital to refresh the cluster resource status and verify that the detected resource states are all correct. Only then can you safely disable the maintenance mode without unexpected cluster actions.
If you need to stop the cluster for the maintenance activity, ensure that you set maintenance mode first, then stop and start the cluster on the node as required for the HANA maintenance.
Prerequisites
- You have configured the Pacemaker cluster to manage the HANA system replication.
Procedure
Set
maintenancemode for the entire cluster:[root]# pcs property set maintenance-mode=trueSetting maintenance for the whole cluster ensures that no activity during the maintenance phase can trigger cluster actions and impact the HANA update process.
Verify that the cluster resource management is fully disabled:
[root]# pcs status... *** Resource management is DISABLED *** The cluster will not attempt to start, stop or recover services Node List: * Online: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 dc3mm ] Full List of Resources: * rsc_fence (stonith:<fence agent>): Started dc1hana2 (unmanaged) * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged): * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (unmanaged) * Stopped: [ dc3mm ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (unmanaged) * Stopped: [ dc3mm ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (unmanaged) * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (unmanaged) ...Update the HANA instances using the SAP procedure. If you have to perform a takeover during the HANA update, you can use the SAP HANA Takeover with Handshake option. For more information see also Is it possible to use SAP HANA "Takeover with Handshake" option with the HA Solutions for managing HANA System Replication?.
If you stop the cluster in this step, ensure that you start it again before you proceed with the next steps. Keep the maintenance mode enabled.
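A sketch of the commands to stop and later start the cluster on all nodes while the maintenance mode stays enabled:
[root]# pcs cluster stop --all
[root]# pcs cluster start --all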
After the HANA update, verify that the HANA system replication is working correctly. Use the
systemReplicationStatus.pyscript to show the status of the HANA system replication on the primary site. Below is the example after a manual takeover to the secondary site during the maintenance:[root]# su - <sid>adm -c "HDBSettings.sh systemReplicationStatus.py \ --sapcontrol=1 | grep -i replication_status="service/dc2hana2/30203/REPLICATION_STATUS=ACTIVE service/dc2hana3/30203/REPLICATION_STATUS=ACTIVE service/dc2hana1/30201/REPLICATION_STATUS=ACTIVE service/dc2hana1/30207/REPLICATION_STATUS=ACTIVE service/dc2hana1/30203/REPLICATION_STATUS=ACTIVE site/1/REPLICATION_STATUS=ACTIVE overall_replication_status=ACTIVEBefore you proceed, ensure that the system replication is healthy and reported as
ACTIVE.Refresh all cluster resources to execute one monitor operation and update their status:
[root]# pcs resource refreshWaiting for 1 reply from the controller ... got reply (done)It is crucial that the HANA resources update the cluster and node attributes to reflect the new HANA system replication status. This ensures that the cluster has the correct information and does not trigger recovery actions due to incorrect status information after the maintenance mode is removed.
Check the cluster status and verify the resource status and main HANA resource score attribute. All resources must show as
Startedand the promotable resources must show asUnpromotedon all nodes:[root]# pcs status resources* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged): * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (unmanaged) * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (unmanaged) * Stopped: [ dc3mm ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged): * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (unmanaged) * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (unmanaged) * Stopped: [ dc3mm ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (unmanaged) * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (unmanaged)Check the cluster attributes and verify that at least the
sync_stateattribute isSOK:[root]# SAPHanaSR-showAttrGlobal cib-time maintenance prim sec srHook sync_state upd --------------------------------------------------------------------------- RH1 Fri Dec 19 09:41:39 2025 true DC2 DC1 SOK SOK ok …Depending on the maintenance activity, the rest of the attribute information can be different or empty, for example, when you stopped and restarted the cluster on all nodes.
When the checks of the previous steps show the landscape in the expected healthy state, you can remove the maintenance mode of the cluster again:
[root]# pcs property set maintenance-mode=When you lift the maintenance mode, the cluster triggers a monitor operation for all resources again. It updates the status of the promotable resources to
PromotedandUnpromotedin correspondence to the location of the primary and secondary instances. The resources now also update thesrPollattribute again to match thesrHookattribute value.
Verification
Check that the resources are managed again and are in the expected state on all nodes:
[root]# pcs resource status* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]: * Started: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ] * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable): * Promoted: [ dc1hana1 ] * Unpromoted: [ dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ] * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1Verify that the
srHookattribute value isSOKand that the nodes have the expected score values assigned:[root]# SAPHanaSR-showAttrGlobal cib-time maintenance prim sec srHook sync_state upd --------------------------------------------------------------------------- RH1 Fri Dec 19 09:41:39 2025 true DC2 DC1 SOK SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766139865 4 dc1hana1 P DC2 30 4 dc2hana1 S Hosts clone_state gra node_state roles score site ------------------------------------------------------------------------------ dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master 100 DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave 80 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -12200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby 80 DC2 dc3mm online
Troubleshooting
If the srHook attribute value is SFAIL at the end of the maintenance, then the scores for the secondary site nodes are reduced and prevent the cluster from triggering a takeover in the case of a failure. To review and fix this, see The srHook attribute is SFAIL while the system replication is healthy.
9.5. Registering the former primary HANA site as a secondary HANA site after a takeover
When you configure AUTOMATED_REGISTER=false in the SAPHanaController resource, which is the default, you must manually register the former primary site as the new secondary site after a takeover and start it. Otherwise, the unregistered site remains stopped.
Procedure
Register the former primary site as the new secondary site. Run as user
<sid>admon one stopped former primary instance:rh1adm$ hdbnsutil -sr_register --remoteHost=<node> \ --remoteInstance=${TINSTANCE} --replicationMode=sync \ --operationMode=logreplay --name=<site>-
- Replace <node> with the new primary instance host, for example, dc2hana1 if there was a takeover from dc1hana1 to dc2hana1.
- Replace <site> with your new secondary HANA site name, for example, DC1 if dc1hana1 is to be registered as a secondary.
- Choose the values for replicationMode and operationMode according to your requirements for the system replication.
- $TINSTANCE is an environment variable that is set automatically for user <sid>adm by reading the HANA instance profile. The variable value is the HANA instance number.
Start the secondary HANA site. Run as
<sid>admon one new secondary instance node:rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
Verification
On one new primary instance, show the current status of the re-established HANA system replication. Below is the example after a takeover from
dc1hana1todc2hana1and DC2 is the new primary site:rh1adm$ cdpy; python systemReplicationStatus.py|Database |Host |Port |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary |Replication |Replication |Replication |Secondary | | | | | | | | |Host |Port |Site ID |Site Name |Active Status |Mode |Status |Status Details |Fully Synced | |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ | |RH1 |dc2hana2 |30203 |indexserver | 4 | 2 |DC2 |dc1hana2 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana3 |30203 |indexserver | 5 | 2 |DC2 |dc1hana3 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |SYSTEMDB |dc2hana1 |30201 |nameserver | 1 | 2 |DC2 |dc1hana1 | 30201 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana1 |30207 |xsengine | 2 | 2 |DC2 |dc1hana1 | 30207 | 1 |DC1 |YES |SYNC |ACTIVE | | True | |RH1 |dc2hana1 |30203 |indexserver | 3 | 2 |DC2 |dc1hana1 | 30203 | 1 |DC1 |YES |SYNC |ACTIVE | | True | status system replication site "1": ACTIVE overall system replication status: ACTIVE Local System Replication State ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mode: PRIMARY site id: 2 site name: DC2
Chapter 10. Troubleshooting
10.1. The srHook cluster attribute value is incorrect
When the srHook attribute value does not match the actual HANA system replication status, it can lead to unexpected behavior in the cluster when a failure of a primary instance occurs.
Check and correct your sudo configuration when the srHook attribute of the secondary site and the HANA system replication status do not match:
- The srHook cluster attribute of the secondary is empty.
- The srHook cluster attribute of the secondary is set to SOK while the HANA system replication is not healthy.
- The srHook cluster attribute of the secondary is set to SFAIL while the system replication is in ACTIVE state.
The primary site receives the events of HANA system replication changes and stores the result as a cluster attribute for the secondary site.
Procedure
Check for
crm_attributeupdate errors in thesecurelog, since the command is executed usingsudo. The log shows the command that the hook script tries to execute, but potentially fails. Check on the primary instance node for an error likecommand not allowed, like in this example:[root]# grep crm_attribute /var/log/secure... rh1adm : command not allowed ; PWD=/hana/shared/RH1/HDB02/<node> ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSRCompare the logged
COMMANDto yoursudoersconfiguration. Check thoroughly and fix thesudoersfile, so that you have a sudo entry that matches the command. As a temporary measure you can ensure that the sudo entry as such works by simplifying it with a wildcard to exclude typos in the command parameters as the cause:[root]# cat /etc/sudoers.d/20-saphanaDefaults:<sid>adm !requiretty <sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute *-
Replace
<sid>with your lower-case HANA SID.
-
Replace
Verify that the command path is correct:
[root]# ls /usr/sbin/crm_attribute/usr/sbin/crm_attribute- Fix the sudo configuration. For more information, see Configuring the SAPHanaSR HA/DR provider for the srConnectionChanged() hook method.
- Repeat any fixing steps on all nodes. The sudo configuration must be identical on all instances.
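For reference, the following is only a sketch of a more specific sudoers entry that matches the srHook command from the log example above. It assumes the SID rh1 and the site names DC1 and DC2 used in this document and must be adapted to your environment:
Cmnd_Alias SRHOOK_RH1 = /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC1 -v * -t crm_config -s SAPHanaSR, /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v * -t crm_config -s SAPHanaSR
rh1adm ALL=(ALL) NOPASSWD: SRHOOK_RH1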
10.2. The HANA instance does not start after hook changes
You recently made changes in the global.ini in a HA/DR provider section and the HANA instance does not start anymore.
Procedure
Go to the HANA trace logs directory, as the
<sid>admuser:rh1adm$ cdtraceCheck for errors related to the HA/DR providers in the HANA nameserver process alert log:
rh1adm$ grep ha_dr_provider nameserver_alert_*.trc... ha_dr_provider PythonProxyImpl.cpp(00145) : import of saphanasr failed: No module named 'saphanasr' ... ha_dr_provider HADRProviderManager.cpp(00100) : could not load HA/DR Provider 'saphanasr' from /usr/share/SAPHanaSR-ScaleOutIdentify the root cause, for example a misspelled HA/DR
providername or a wrongpath. Check the path and the hook script name. In this example the HA/DR provider namesaphanasris not matching the hook script nameSAPHanaSR:rh1adm$ ls /usr/share/SAPHanaSR-ScaleOut/ChkSrv.py SAPHanaSR.py SAPHanaSrMultiTarget.py samplesCorrect the
SAPHanaSRHA/DR provider configuration:[ha_dr_provider_SAPHanaSR] provider = SAPHanaSR path = /usr/share/SAPHanaSR-ScaleOut execution_order = 1-
providermust match the name of the Python hook script. It is case-sensitive without the.pyfile suffix. -
pathmust be the path in which the hook script is stored.
-
10.3. A cluster node is reported as offline during maintenance
When maintenance-mode is set for the cluster, for example, for a HANA update, the cluster can still detect issues between the nodes, but it does not trigger any recovery actions yet.
If you encounter such a situation, you must first fix the cause of the issue before you lift the maintenance mode.
Example: the corosync communication between the nodes is blocked in an 8-node cluster
If the maintenance mode is removed in this situation, the cluster tries to recover the issue by itself. This can have a severe impact on your ongoing HANA maintenance activity.
...
* Resource management is DISABLED *
The cluster will not attempt to start, stop or recover services
Node List:
* Node dc2hana3: UNCLEAN (offline)
* Online: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana4 dc3mm ]
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana1 (maintenance)
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (maintenance):
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3 (UNCLEAN, maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4 (maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1 (maintenance)
* Stopped: [ dc2hana3 dc3mm ]
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, maintenance):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3 (UNCLEAN, maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4 (maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1 (maintenance)
* Stopped: [ dc2hana3 dc3mm ]
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1 (maintenance)
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1 (maintenance)
...
Identify the root cause of the issue, for example:
- Planned network maintenance on the cluster communication connection in parallel to your HANA maintenance.
- Unplanned outage of network connections due to network device failures or misconfiguration on operating system or network level.
- Firewall configuration blocking cluster communication ports.
Fix any issue to prevent the cluster from taking recovery measures when the cluster maintenance is removed.
10.4. The srHook attribute is SFAIL while the system replication is healthy
An inconsistency between the actual HANA system replication state and the srHook cluster node attribute can occur when the cluster is running on the primary instance node while the system replication fails, for example, during a maintenance. HANA triggers the hook, which updates the srHook attribute with the SFAIL value. If the cluster is then stopped on the primary instance node and the HANA system replication recovers to a healthy state, the hook is correctly executed by HANA, but the update of the cluster node attribute fails.
The primary HANA instance only triggers the srConnectionChanged() hook when there is a new change of the system replication status.
The sync_state attribute is set based on an active check and functions as a fallback when the srHook value is empty. However, when the values differ, the SAPHanaController resource uses the srHook attribute to decide whether a takeover is possible. As a result, if the srHook attribute is SFAIL despite a healthy HANA system replication state, the cluster does not trigger the takeover to the secondary site at the next failure on the primary site.
To solve this conflict, you can delete the incorrect srHook attribute. Afterwards the cluster uses the sync_state attribute for decisions, and the srHook attribute is updated and used again after the next change of the HANA system replication status.
Procedure
Use the
systemReplicationStatus.pyscript to check the status of the HANA system replication on the primary site:[root]# su - <sid>adm -c "HDBSettings.sh systemReplicationStatus.py \ --sapcontrol=1 | grep -i replication_status="service/dc1hana3/30203/REPLICATION_STATUS=ACTIVE service/dc1hana2/30203/REPLICATION_STATUS=ACTIVE service/dc1hana1/30201/REPLICATION_STATUS=ACTIVE service/dc1hana1/30207/REPLICATION_STATUS=ACTIVE service/dc1hana1/30203/REPLICATION_STATUS=ACTIVE site/2/REPLICATION_STATUS=ACTIVE overall_replication_status=ACTIVEBefore you proceed, ensure that the system replication is healthy and reported as
ACTIVE.Review the
sync_stateandsrHookattributes and the node score values during the conflict:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec srHook sync_state upd --------------------------------------------------------------- RH1 Fri Dec 19 11:12:42 2025 DC1 DC1 SFAIL SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766142750 4 dc1hana1 P DC2 10 4 dc2hana1 S Hosts clone_state gra node_state roles score site --------------------------------------------------------------------------------- dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master -INFINITY DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave -32300 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -22200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby -32300 DC2 dc3mm onlineIn this state, the
sync_stateattribute is correct, but thesrHookattribute takes precedence. Therefore, the secondary site is excluded from taking over if the primary site fails.Delete the
srHookattribute to solve the conflict:[root]# crm_attribute --type crm_config -n hana_<sid>_glob_srHook --deleteDeleted crm_config option: id=SAPHanaSR-hana_rh1_glob_srHook name=hana_rh1_glob_srHook
Verification
Check the attributes summary and note that the
srHookattribute is missing and that the node scores are updated to enable an automatic takeover again using thesync_stateattribute status:[root]# SAPHanaSR-showAttrGlobal cib-time prim sec sync_state upd -------------------------------------------------------- RH1 Fri Dec 19 11:17:59 2025 DC1 DC1 SOK ok Sites lpt lss mns srr ---------------------------------- DC1 1766143077 4 dc1hana1 P DC2 30 4 dc2hana1 S Hosts clone_state gra node_state roles score site ------------------------------------------------------------------------------ dc1hana1 PROMOTED 2.0 online master1:master:worker:master 150 DC1 dc1hana2 DEMOTED 2.0 online master2:slave:worker:slave 140 DC1 dc1hana3 DEMOTED 2.0 online slave:slave:worker:slave -10000 DC1 dc1hana4 DEMOTED 2.0 online master3:slave:standby:standby 140 DC1 dc2hana1 DEMOTED 2.0 online master1:master:worker:master 100 DC2 dc2hana2 DEMOTED 2.0 online master2:slave:worker:slave 80 DC2 dc2hana3 DEMOTED 2.0 online slave:slave:worker:slave -12200 DC2 dc2hana4 DEMOTED 2.0 online master3:slave:standby:standby 80 DC2 dc3mm online
Appendix A. Component options
A.1. HA/DR provider options for SAPHanaSR
Parameters that are available for the configuration of the SAPHanaSR HA/DR provider are shown below:
| Provider options | Required | Default | Description |
|---|---|---|---|
| provider | yes | | The provider parameter must be set to the hook script name without the .py file extension. |
| path | yes | | The full path to the location of the hook script. |
| execution_order | yes | | Set to the position of this hook in the execution order of the configured HA/DR provider hooks. |
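For orientation, the corresponding HA/DR provider section in the global.ini of the HANA system could look like the following sketch. The path and execution order shown here are illustrative and must match the actual location of the hook script and the hook order in your environment:

# Illustrative global.ini section for the SAPHanaSR hook (values are examples)
[ha_dr_provider_SAPHanaSR]
provider = SAPHanaSR
path = /usr/share/SAPHanaSR/srHook
execution_order = 1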
A.2. HA/DR provider options for ChkSrv
Parameters that are available for the configuration of the ChkSrv HA/DR provider are shown below:
| Provider options | Required | Default | Description |
|---|---|---|---|
| provider | yes | | The provider parameter must be set to the hook script name without the .py file extension. |
| path | yes | | The full path to the location of the hook script. |
| execution_order | yes | | Set to the position of this hook in the execution order of the configured HA/DR provider hooks. |
| action_on_lost | no | ignore | Action to be triggered when a lost indexserver is identified. |
| kill_signal | no | 9 | The signal that is used with the kill action. |
| stop_timeout | no | 20s | How many seconds to wait for the stop action to complete. |
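A similar, illustrative global.ini section for the ChkSrv hook is sketched below. The path, execution order, and action_on_lost value are examples only and depend on your environment and operational requirements:

# Illustrative global.ini section for the ChkSrv hook (values are examples)
[ha_dr_provider_ChkSrv]
provider = ChkSrv
path = /usr/share/SAPHanaSR/srHook
execution_order = 2
# Example: stop the local instance when a lost indexserver is detected (the default is ignore)
action_on_lost = stop
stop_timeout = 20
kill_signal = 9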
A.3. SAPHanaTopology resource parameters
Parameters that are available for the configuration of SAPHanaTopology resources are shown below:
| Resource options | Required | Default | Description |
|---|---|---|---|
| SID | yes | | SAP system identifier. |
| InstanceNumber | yes | | Number of the SAP HANA instance. |
| HANA_CALL_TIMEOUT | no | 120 | Defines how long a call to HANA to receive information can take, for example, when the resource agent executes landscapeHostConfiguration.py. If you increase the timeout for HANA calls of this resource, you must also consider increasing the operation timeout values of the same resource. |
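As an illustration of how these parameters are used, a SAPHanaTopology clone resource could be created with a pcs command similar to the following sketch. The resource name matches the examples in this document; the operation timeouts and clone meta attributes that a complete setup requires are omitted here and are described in the configuration chapters:

# Sketch only: create the SAPHanaTopology resource as a clone across all HANA nodes
[root]# pcs resource create rsc_SAPHanaTop_RH1_HDB02 ocf:heartbeat:SAPHanaTopology \
    SID=RH1 InstanceNumber=02 clone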
A.4. SAPHanaController resource parameters
Parameters that are available for the configuration of SAPHanaController resources are shown below:
| Resource options | Required | Default | Description |
|---|---|---|---|
| SID | yes | | SAP system identifier. |
| InstanceNumber | yes | | Number of the SAP HANA instance. |
| DIR_EXECUTABLE | no | | The fully qualified path to binaries such as sapstartsrv and sapcontrol. SAP standard paths are searched by default. |
| DIR_PROFILE | no | | The fully qualified path to the SAP START profile. Specify this parameter if you have changed the SAP profile directory location after the default SAP installation. SAP standard paths are searched by default. |
| HANA_CALL_TIMEOUT | no | 120 | Defines how long a call to HANA to receive information can take, for example, when the resource agent runs landscapeHostConfiguration.py. If you increase the timeout for HANA calls of this resource, you must also consider increasing the operation timeout values of the same resource. |
| INSTANCE_PROFILE | no | | The name of the SAP HANA instance profile. Specify this parameter if you have changed the name of the SAP HANA instance profile after the default SAP installation. SAP standard paths are searched by default. |
| PREFER_SITE_TAKEOVER | no | false | Defines whether the resource agent should prefer to trigger a takeover to the secondary site instead of restarting the primary site locally. However, a takeover is only triggered if the SAP HANA landscape status reports an error for the primary site. |
| AUTOMATED_REGISTER | no | false | Defines whether the resource agent automatically registers a former primary instance as a secondary during cluster resource start and after the DUPLICATE_PRIMARY_TIMEOUT has expired. |
| DUPLICATE_PRIMARY_TIMEOUT | no | 7200 | The time difference required between two last primary time stamps (LPTs) in case a dual-primary situation occurs. If the difference between both nodes' last primary time stamps is less than DUPLICATE_PRIMARY_TIMEOUT, the cluster does not immediately recover the failed former primary instance. How the recovery proceeds after the DUPLICATE_PRIMARY_TIMEOUT has passed depends on the AUTOMATED_REGISTER setting. |
We recommend that you set PREFER_SITE_TAKEOVER to true. This allows the HA cluster to trigger a takeover when a failure of the primary HANA instance is detected. In most cases, it takes less time for the new HANA primary instance to become fully active after a takeover than it takes for the original primary instance to restart and reload all data from disk back into memory.
Leave AUTOMATED_REGISTER set to false to give the operator the option to first verify the health and data consistency of the previously failed primary instance. Afterwards, you can manually register it as the new secondary instance to re-establish the HANA system replication between both instances, and manually start the instance.
Set AUTOMATED_REGISTER to true to enable the automatic registration of the former primary instance as the new secondary after a takeover occurs. This increases the availability of the HANA system replication setup and prevents so-called dual-primary situations in the SAP HANA system replication environment.
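To show how these parameters and recommendations fit together, a promotable SAPHanaController resource could be created roughly as follows. This is a sketch only: the resource name follows the examples in this document, while the operation timeouts, clone meta attributes, and the ordering and colocation constraints that a working cluster requires are omitted here and are covered in the configuration chapters:

# Sketch only: promotable SAPHanaController resource with the recommended takeover behavior
[root]# pcs resource create rsc_SAPHanaCon_RH1_HDB02 ocf:heartbeat:SAPHanaController \
    SID=RH1 InstanceNumber=02 \
    PREFER_SITE_TAKEOVER=true AUTOMATED_REGISTER=false DUPLICATE_PRIMARY_TIMEOUT=7200 \
    promotable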
Appendix B. Useful information
B.1. Explaining cluster information
When the cluster resources start, they execute their first monitor operation and gather the initial resource information. The HANA resources add node attributes and cluster properties for the collected landscape information, which describes the current state of the SAP HANA databases on the cluster nodes.
Scale-out cluster status with node attributes
[root]# pcs status --full
Cluster name: hana-scaleout-cluster
Status of pacemakerd: 'Pacemaker is running' (last updated 2025-12-18 16:27:04Z)
Cluster Summary:
* Stack: corosync
* Current DC: dc2hana1 (5) (version 2.1.5-9.el9_2.5-a3f44794f94) - partition with quorum
* Last updated: Thu Dec 18 16:27:05 2025
* Last change: Thu Dec 18 16:26:37 2025 by root via crm_attribute on dc1hana1
* 9 nodes configured
* 19 resource instances configured
Node List:
* Node dc1hana1 (1): online, feature set 3.16.2
* Node dc1hana2 (2): online, feature set 3.16.2
* Node dc1hana3 (3): online, feature set 3.16.2
* Node dc1hana4 (4): online, feature set 3.16.2
* Node dc2hana1 (5): online, feature set 3.16.2
* Node dc2hana2 (6): online, feature set 3.16.2
* Node dc2hana3 (7): online, feature set 3.16.2
* Node dc2hana4 (8): online, feature set 3.16.2
* Node dc3mm (9): online, feature set 3.16.2
Full List of Resources:
* rsc_fence (stonith:<fence agent>): Started dc1hana2
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]:
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana3
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana1
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana2
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc1hana4
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started dc2hana4
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted dc1hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana3
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana1
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana2
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc1hana4
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted dc2hana4
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started dc1hana1
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started dc2hana1
Node Attributes:
* Node: dc1hana1 (1):
* hana_rh1_clone_state : PROMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC1
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 150
* Node: dc1hana2 (2):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc1hana3 (3):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : -10000
* Node: dc1hana4 (4):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC1
* master-rsc_SAPHanaCon_RH1_HDB02 : 140
* Node: dc2hana1 (5):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master1:master:worker:master
* hana_rh1_site : DC2
* hana_rh1_sra : -
* master-rsc_SAPHanaCon_RH1_HDB02 : 100
* Node: dc2hana2 (6):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master2:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
* Node: dc2hana3 (7):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : slave:slave:worker:slave
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : -12200
* Node: dc2hana4 (8):
* hana_rh1_clone_state : DEMOTED
* hana_rh1_gra : 2.0
* hana_rh1_gsh : 1.0
* hana_rh1_roles : master3:slave:standby:standby
* hana_rh1_site : DC2
* master-rsc_SAPHanaCon_RH1_HDB02 : 80
Migration Summary:
Tickets:
PCSD Status:
dc1hana1: Online
dc1hana2: Online
dc1hana3: Online
dc1hana4: Online
dc2hana1: Online
dc2hana2: Online
dc2hana3: Online
dc2hana4: Online
dc3mm: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
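If you are only interested in the HANA node attributes and not in the full resource status, you can also list or query them directly. The following commands are a sketch; the exact output format depends on the pcs and Pacemaker versions:

# List all node attributes stored in the cluster
[root]# pcs node attribute
# Query a single HANA attribute on one node
[root]# crm_attribute --node dc1hana1 --name hana_rh1_roles --query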
Cluster properties
The new generation of SAP HANA resource agents stores information about the HANA instance as cluster properties in the property set named SAPHanaSR. You can query the CIB (cluster information base) as the root user to check the content of the cluster attributes. The HANA resources and the SAPHanaSR hook update these attributes.
[root]# cibadmin --query --xpath "//crm_config//cluster_property_set[@id='SAPHanaSR']"
<cluster_property_set id="SAPHanaSR">
  <nvpair id="SAPHanaSR-hana_rh1_glob_srHook" name="hana_rh1_glob_srHook" value="SOK"/>
</cluster_property_set>
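To check a single property from this set, for example the global srHook status, you can also query it directly with crm_attribute, using the same hana_<sid>_glob_srHook attribute name as in the troubleshooting procedure above:

# Query the global srHook property for the RH1 system
[root]# crm_attribute --type crm_config --name hana_rh1_glob_srHook --query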
SAPHanaSR-showAttr
You can use the tool SAPHanaSR-showAttr to display all of the HANA cluster attribute information in a preformatted overview.
Check this status in addition to pcs status [--full] to see the overall landscape health.
[root]# SAPHanaSR-showAttr
Global cib-time prim sec srHook sync_state upd
---------------------------------------------------------------
RH1 Thu Dec 18 16:28:49 2025 DC1 DC2 SOK SOK ok
Sites lpt lss mns srr
----------------------------------
DC1 1766075329 4 dc1hana1 P
DC2 30 4 dc2hana1 S
Hosts clone_state gra gsh node_state roles score site sra
--------------------------------------------------------------------------------------
dc1hana1 PROMOTED 2.0 1.0 online master1:master:worker:master 150 DC1 -
dc1hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 140 DC1
dc1hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -10000 DC1
dc1hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 140 DC1
dc2hana1 DEMOTED 2.0 1.0 online master1:master:worker:master 100 DC2 -
dc2hana2 DEMOTED 2.0 1.0 online master2:slave:worker:slave 80 DC2
dc2hana3 DEMOTED 2.0 1.0 online slave:slave:worker:slave -12200 DC2
dc2hana4 DEMOTED 2.0 1.0 online master3:slave:standby:standby 80 DC2
dc3mm online