Chapter 1. Introduction to SAP HANA scale-out system replication HA


Configuring SAP HANA system replication between two identical HANA sites provides basic resiliency for the database. You can configure these two sites in a Pacemaker cluster for advanced high availability, in which the cluster automatically handles service recovery if the primary instance fails.

1.1. Terminology

  • node

    One host or system in an HA cluster setup, also called a cluster member.

  • cluster

    The high-availability setup that uses the Pacemaker cluster manager from the RHEL HA Add-On. It consists of two or more members, or nodes.

  • instance

    One set of SAP HANA systems that belong to one HANA site. In single-host (scale-up) HANA environments, one HANA site consists of a single HANA instance. In multiple-host (scale-out) HANA configurations, each HANA site consists of two or more HANA instances.

  • primary

    The primary HANA instance or primary site is the currently active HANA instance or site. In single-host (scale-up) setups, this is one system. In multiple-host (scale-out) setups, the primary database stretches across multiple systems of one HANA site, and the systems have different roles in the HANA environment to distribute load.

  • secondary

    The secondary HANA instance or secondary site is the SAP HANA instance or site that is configured to be synced with the primary HANA instance through the SAP HANA system replication mechanism. This instance preloads the in-memory data of the primary instance and is ready to take over if the primary instance fails.

1.2. Performance-optimized SAP HANA scale-out HA

Performance-optimized means that only a single SAP HANA instance runs on each node and that this instance has control over most of the node's resources, such as CPU and RAM. This means that the SAP HANA instances can run with as much performance as possible.

You configure the HANA environment without HANA standby hosts and with only one coordinator name server per replication site. This coordinator name server controls the landscape of its site. The HANA host auto-failover functionality using idle standby hosts is not necessary, because the Pacemaker cluster controls the high availability of the HANA database and manages the HANA system replication.

With a performance-optimized SAP HANA system replication setup of SAP HANA 2.0 SPS1 or newer, you can also configure read access to the secondary system to reduce the load on the primary instance. For more information, see the SAP documentation for Active/Active (Read Enabled) configuration.

1.3. Cluster resource agents and tools for SAP HANA HA

The high-availability (HA) cluster configuration for managing SAP HANA system replication setups works with multiple resource agents and other tools that combine their functionality to achieve the expected behavior. The Advanced Next Generation Interface (“angi”) resource agents are identical for scale-up and scale-out environments. Upstream, they are also called SAPHanaSR-angi.

On RHEL, this generation of combined resource agents and tools is provided in the sap-hana-ha package.

  • SAPHanaTopology

    The SAPHanaTopology resource agent gets status information from the SAP HANA environment and saves it to cluster properties. The agent also starts and monitors the local SAP HostAgent, which is required for starting, stopping, and monitoring the HANA instances. A system replication hook, which is configured in SAP HANA, also adds replication health information to the saved properties. Based on the collected environment data, the resource agent defines a dedicated health score for the cluster node. The cluster uses this score to decide whether it must initiate the switch of the system replication from one site to the other.

  • SAPHanaController

    The SAPHanaController resource agent monitors and manages the SAP HANA environment. If the HANA instance fails, the resource agent determines which recovery action to take and runs the required commands: it either recovers the instance automatically or changes the active site of the system replication.

  • SAPHanaFilesystem

    The SAPHanaFilesystem resource agent monitors mounted SAP HANA filesystems for read/write access. It does not mount or unmount filesystems. If a monitor fails, the resulting action depends on the HANA system replication status, which allows for faster takeover actions. On primary HANA sites, if a monitor fails, the cluster first tries to stop and restart the resource. If that fails and HANA system replication is in sync, the cluster fences the node. If HANA system replication is not in sync, the cluster repeats the restart until it succeeds or a migration threshold is reached. On secondary HANA sites, the cluster is not aware of monitor failures. This resource agent is particularly useful for HANA scale-out systems that rely on NFS shares for /hana/shared/<SID>/, because a failure of such a share can stop HANA without timely cluster action. However, you can also use the resource agent with local filesystems on scale-up systems.

  • SAPHanaSR-showAttr

    The SAPHanaSR-showAttr tool shows cluster attributes for the SAP HANA system replication automation in a preformatted overview. The overview includes the HANA topology, which shows whether it is a scale-up or scale-out environment. The default output includes the system replication status between the nodes and other related status information. The script retrieves the information from the Cluster Information Base (CIB), where resource agents store updates during their regular checks and hook scripts store updates when HANA events occur. Because of this, the information can contain outdated states until it is updated again. Use HANA tools to get real-time status information from the landscape.

  • SAPHanaSR-hookHelper

    The SAPHanaSR-hookHelper tool is a helper script that is used by other SAP HANA HA components for specific shared functionality, such as certain fencing features.

  • SAPHanaSR-alert-fencing

    The SAPHanaSR-alert-fencing script is a cluster alert agent. In scale-out setups with HANA HA, you can configure this alert agent to trigger the cluster to fence all nodes of the same HANA site after one node of that site is fenced due to a failure. This functionality only applies to scale-out configurations and has no effect in a scale-up setup.
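
The reaction to a failed SAPHanaFilesystem monitor, as described in the list above, can be summarized as a small decision function. The following Python code is a minimal sketch for illustration only and is not the implementation of the resource agent. The function name, its parameters, and the returned labels are hypothetical and only mirror the description.

```python
def filesystem_monitor_action(site_role, restart_succeeded, sr_in_sync,
                              threshold_reached=False):
    """Return the cluster reaction to a failed SAPHanaFilesystem monitor.

    All names and labels here are hypothetical; they only restate the prose.
    """
    if site_role == "secondary":
        # On secondary sites the cluster is not aware of monitor failures.
        return "none"
    if restart_succeeded:
        # The cluster first tries to stop and restart the resource.
        return "recovered"
    if sr_in_sync:
        # Restart failed and replication is in sync: the node is fenced.
        return "fence"
    if threshold_reached:
        # Restarts are only repeated until a migration threshold is reached.
        return "stop-retrying"
    # Replication is not in sync: repeat the restart.
    return "retry-restart"
```

For example, a failed restart on a primary site with replication in sync yields "fence", while the same failure with replication out of sync yields "retry-restart".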

Note

Verify that the new generation of resource agents is available for your RHEL version. Check Minimum supported package versions for SAP HANA Scale-Up and Scale-Out System Replication HA solutions.

1.4. SAP HANA HA/DR provider hooks

Current versions of SAP HANA provide an API in the form of hooks that allow the HANA instance to send notifications for certain events, for example the loss or establishment of the system replication. For each event, the HANA instance calls the configured hooks, also called HA/DR providers. Hooks are custom Python scripts that process the events that HANA sends and can trigger different actions based on the event information.

You must add the HA/DR provider definition to the HANA global configuration to enable the required functionality of triggering additional actions for certain events.
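
To give an idea of what such a definition can look like, the following Python sketch parses a hypothetical excerpt of the HANA global configuration with the standard configparser module. The section name, the keys, and especially the path value are assumptions for illustration; use the values from the configuration procedure for your installed package version.

```python
import configparser

# Hypothetical excerpt of a HANA global.ini. The section name, keys, and
# path are assumptions; the real path depends on where the hook scripts
# are installed on your system.
GLOBAL_INI_EXCERPT = """
[ha_dr_provider_hanasr]
provider = HanaSR
path = /usr/share/sap-hana-ha/
execution_order = 1
"""


def read_providers(ini_text):
    """Return {provider name: settings} for all ha_dr_provider_* sections."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    providers = {}
    for section in parser.sections():
        if section.startswith("ha_dr_provider_"):
            settings = dict(parser[section])
            providers[settings["provider"]] = settings
    return providers


providers = read_providers(GLOBAL_INI_EXCERPT)
```

In this sketch, `providers` contains a single entry for the HanaSR provider with its path and execution order as strings.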

HanaSR for the srConnectionChanged() hook method

The HanaSR hook is required for processing the srConnectionChanged() hook method. This method is used by the primary HANA instance for a notification of any change in the HANA system replication status. The primary HANA instance calls the HanaSR HA/DR provider when a HANA system replication related event occurs. The hook script HanaSR.py then parses srConnectionChanged() events for the system replication status detail and as a result it updates the srHook cluster attribute. This attribute is used by the resource agents to evaluate the landscape health and make decisions. The value of the system replication or sync state defines if the cluster recovers a failed primary instance on the same node or if it triggers a takeover to the secondary. The takeover is only triggered when the system replication is fully in sync, which means the HANA data is consistent between the HANA sites.
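
The relationship between the sync state, the srHook attribute, and the recovery decision can be sketched as follows. This is a simplified Python illustration, not the code of HanaSR.py or of the resource agents. The attribute values "SOK" and "SFAIL" and both function names are assumptions used for the example; check the installed hook script for the values it actually writes.

```python
def sr_hook_value(in_sync):
    """Map the replication sync state to a value for the srHook attribute.

    "SOK" and "SFAIL" are illustrative labels (assumption).
    """
    return "SOK" if in_sync else "SFAIL"


def recovery_decision(sr_hook):
    """Choose how a failed primary is recovered, based on srHook."""
    # A takeover is only allowed when replication is fully in sync,
    # which means the data is consistent between the HANA sites.
    if sr_hook == "SOK":
        return "takeover"
    # Otherwise the failed primary is recovered on the same node.
    return "local-restart"
```

With these definitions, `recovery_decision(sr_hook_value(True))` returns "takeover", and any out-of-sync state results in "local-restart".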

Important

You must configure the HanaSR hook to enable the srConnectionChanged() hook method for proper function and full support of the HA cluster setup.

ChkSrv for the srServiceStateChanged() hook method

When the HANA instance detects an issue with a HANA indexserver process, it recovers from the problem by automatically stopping and restarting the hdbindexserver service through an internal mechanism.

However, especially for very large HANA instances, the stopping phase of this recovery process can take a very long time for the hdbindexserver service. Although HANA does not report this service degradation as an error in the HANA landscape, the situation poses a risk to data consistency if anything else fails in the instance during that time. To make the service recovery time more predictable, you can configure the ChkSrv hook to stop or kill the entire affected HANA instance instead.

In a setup with automatic failover enabled (PREFER_SITE_TAKEOVER=true), the instance stop leads to a takeover if the secondary node is in a healthy state. Otherwise, instance recovery happens locally, but the enforced local instance restart accelerates the process.

The HANA instance calls the ChkSrv hook when an event occurs. The hook script ChkSrv.py processes the srServiceStateChanged() hook method and executes actions based on the results of the filters it applies to event details. This way the ChkSrv.py hook script can distinguish a HANA hdbindexserver process that is being stopped and restarted by HANA after a failure from the same process being stopped as part of an intended instance shutdown. When the hook script determines that the event is caused by a failure it triggers the configured action.
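
The filtering idea can be sketched in Python as follows. This is an illustration under assumptions, not the logic of ChkSrv.py: the dictionary keys and values are hypothetical stand-ins for the event details that HANA passes to srServiceStateChanged().

```python
def is_indexserver_failure(event):
    """Return True if the event looks like an indexserver failure.

    The keys "service_name", "service_status", and
    "instance_shutting_down" are hypothetical (assumption).
    """
    if event.get("service_name") != "indexserver":
        # Only indexserver events are of interest.
        return False
    if event.get("service_status") != "stopping":
        # Only the stopping phase indicates the start of a recovery.
        return False
    # A stop during an intended instance shutdown is not a failure.
    return not event.get("instance_shutting_down", False)
```

An event for a stopping indexserver while the instance keeps running is classified as a failure; the same event during an intended shutdown is not.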

The ChkSrv.py hook script has multiple options to define what happens when an indexserver failure event is detected:

  • ignore

    This action just writes the parsed events and decision information to a dedicated logfile. This is useful for testing and verifying what the hook script would do when activating stop or kill actions.

  • stop

    This action executes a graceful StopSystem for the instance through the sapcontrol command.

  • kill

    This action executes the HDB kill-<signal> command with a default signal 9, which can be configured. The result is the same as when using stop, but can be faster.

  • fence

    This action triggers the cluster to fence the node on which the indexserver failed. This option uses the SAPHanaSR-hookHelper tool and requires a sudo entry for the tool for the <sid>adm user.
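
The four options above can be pictured as a mapping from the configured action to the command that is run. The following Python sketch only builds argument lists and does not run anything. The function name, the default instance number "00", and the exact command layout are assumptions for illustration; the arguments of SAPHanaSR-hookHelper are intentionally left out because they are not described here.

```python
def build_action_command(action, instance_number="00", kill_signal=9):
    """Return the command, as an argument list, for a ChkSrv action.

    Hypothetical helper; it mirrors the option list, not ChkSrv.py itself.
    """
    if action == "ignore":
        # Only logs the parsed event and the decision; runs no command.
        return []
    if action == "stop":
        # Graceful stop of the instance through sapcontrol.
        return ["sapcontrol", "-nr", instance_number,
                "-function", "StopSystem"]
    if action == "kill":
        # HDB kill-<signal>, with signal 9 as the default.
        return ["HDB", f"kill-{kill_signal}"]
    if action == "fence":
        # Requires a sudo entry for the <sid>adm user; tool arguments
        # are omitted on purpose.
        return ["sudo", "SAPHanaSR-hookHelper"]
    raise ValueError(f"unknown action: {action}")
```

Testing with the ignore action first, as the list suggests, corresponds to the empty command list in this sketch.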

Note

HANA treats every indexserver failure individually. The same processes are triggered for each indexserver issue.

Enabling the srServiceStateChanged() hook is optional.

1.5. Support policies for SAP HANA High Availability

Red Hat supports the following components of the solution:

  • Basic operating system configuration for running SAP HANA on RHEL, based on SAP guidelines
  • RHEL HA Add-On
  • Red Hat HA solutions for SAP HANA system replication