Chapter 1. Introduction

This document provides information on planning and implementing automated takeover for SAP HANA Scale-Out System Replication deployments. SAP HANA System Replication in this solution provides continuous synchronization between two SAP HANA databases to support high availability and disaster recovery. The challenges of real implementations are typically more complex than can be covered in upfront testing. Please ensure that your environment is tested extensively.

Red Hat recommends contracting a certified consultant familiar with both SAP HANA and the Pacemaker-based RHEL High Availability Add-On to implement the setup and subsequent operation.

As SAP HANA takes on a central function as the primary database platform for SAP landscapes, requirements for stability and reliability increase dramatically. Red Hat Enterprise Linux (RHEL) for SAP Solutions meets those requirements by enhancing native SAP HANA replication and failover technology to automate the takeover process. Without this automation, if there is an issue in the primary environment of an SAP HANA Scale-Out System Replication deployment, a system administrator must manually instruct SAP HANA to perform a takeover to the secondary environment.

To automate this process, Red Hat provides a complete solution for managing SAP HANA Scale-Out System Replication based on the RHEL HA Add-On that is part of the RHEL for SAP Solutions subscription. This documentation provides the concepts, planning, and high-level instructions on how to set up an automated SAP HANA Scale-Out System Replication solution using RHEL for SAP Solutions. This solution has been extensively tested and is proven to work, but the challenges of a real implementation are typically more complex than what this solution can cover. Red Hat therefore recommends that a certified consultant familiar with both SAP HANA and the Pacemaker-based RHEL High Availability Add-On sets up and subsequently services such a solution.

For more information about RHEL for SAP Solutions, see Overview of Red Hat Enterprise Linux for SAP Solutions Subscription.

This solution is for experienced Linux administrators and SAP Certified Technology Associates. The solution contains planning and deployment information for SAP HANA Scale-Out with System Replication, as well as information on Pacemaker integration with RHEL 8 or later.

Building an SAP HANA scale-out environment with HANA System Replication and Pacemaker connectivity combines several complex technologies. This document contains references to SAP Notes and SAP documentation that explain the SAP HANA configuration.

Running SAP HANA as a scale-out cluster primarily makes it easy to extend a growing SAP HANA landscape with new hardware. For this feature, essential infrastructure components, such as storage and network, must be shared resources. Based on this configuration, the availability of the environment can be extended by using standby nodes, which provide an additional level of high availability before a site takeover is initiated.

The SAP HANA scale-out solution can be extended to include two or more completely independent scale-out systems that act as additional mirrors. System replication mirrors the databases using an active/passive method designed for maximum performance. Communication takes place entirely over the network, so no additional infrastructure components are needed.

Pacemaker automates the system replication takeover when critical components fail. For this purpose, it evaluates data from both the scale-out environment and the system replication process to ensure continued operation. The cluster manages the primary IP address that clients use to connect to the database. This ensures that clients can still connect to the active instance after the cluster triggers a database takeover.

1.1. Supporting responsibilities

For SAP HANA appliance setups, SAP and the hardware partners or cloud providers support the following:

Supported hardware and environments
SAP HANA
Storage configuration
SAP HANA Scale-Out configuration (SAP cluster setup)
SAP HANA System Replication (SAP cluster setup)

Red Hat supports the following:

Basic OS configuration for running SAP HANA on RHEL, based on SAP guidelines
RHEL HA Add-On
Red Hat HA solutions for SAP HANA Scale-Out System Replication

For more information, see SAP HANA Master Guide - Operating SAP HANA - SAP HANA Appliance - Roles and Responsibilities. For TDI setups, see SAP HANA Master Guide - Operating SAP HANA - SAP HANA Tailored Data Center Integration.

1.2. SAP HANA Scale-Out

The process of scaling SAP HANA is very dynamic. A scale-up SAP HANA database that runs on a single server can first be extended with additional CPUs and memory. If this expansion level is no longer sufficient, the environment can be extended to a scale-out environment. With a properly prepared infrastructure, additional server instances can be added to the database.

Note

To scale out, add 1 to n SAP HANA database servers to an existing single-node database. Currently, all nodes must be the same size in terms of CPU and RAM. The configuration of all replicated database sites must also be the same, so you must first increase the number of HANA nodes on all sites before you resynchronize the database.
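The sizing rules in the note above can be expressed as a simple pre-check before you resynchronize. The following Python sketch is illustrative only: the `Node` and `Site` classes, the host and site names, and the comparison of CPU count and RAM as plain integers are assumptions made for this example and are not part of SAP HANA or the RHEL HA Add-On.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one SAP HANA scale-out host."""
    name: str
    cpus: int
    ram_gib: int


@dataclass
class Site:
    """Hypothetical description of one replicated scale-out site."""
    name: str
    nodes: list


def check_scale_out_symmetry(sites):
    """Return a list of rule violations; an empty list means the sites are symmetric."""
    problems = []
    all_nodes = [node for site in sites for node in site.nodes]
    # Rule 1: all nodes must have the same CPU and RAM size.
    sizes = {(node.cpus, node.ram_gib) for node in all_nodes}
    if len(sizes) > 1:
        problems.append(f"nodes differ in size: {sorted(sizes)}")
    # Rule 2: all replicated sites must have the same number of nodes.
    counts = {site.name: len(site.nodes) for site in sites}
    if len(set(counts.values())) > 1:
        problems.append(f"sites differ in node count: {counts}")
    return problems


if __name__ == "__main__":
    # Example: the second site has not been scaled out yet.
    dc1 = Site("DC1", [Node("dc1hana01", 64, 1024), Node("dc1hana02", 64, 1024)])
    dc2 = Site("DC2", [Node("dc2hana01", 64, 1024)])
    for problem in check_scale_out_symmetry([dc1, dc2]):
        print(problem)
```

In this example, the check reports that DC2 must be extended to the same number of nodes as DC1 before the database is resynchronized.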

The prerequisite is shared storage and a corresponding network connection for all nodes. The shared storage is used to exchange data and to use standby nodes, which can take over the functionality of existing nodes in the event of a failure.

Figure 1: Overview scale-up and scale-out systems

Master nameserver

A HANA Scale-Out environment has a master configuration that defines a running master instance on one of the nodes. This master instance is the primary contact for the application server. Up to three master roles can be defined for a scale-out high-availability configuration, and the master role is switched automatically if a failure occurs. This master configuration is compatible with the standby host configuration, in which a standby host can take over the tasks of a failed master node.

Figure 2: Scale-out functionality of the used storage

1.3. Scale-Out storage configuration

Scale-out storage configuration allows SAP HANA to stay flexible in the scale-out environment and to dynamically move the functionality of nodes in the event of a failure. Because the data is available to all nodes, the SAP HANA instances only have to be ready to take over the processes of the failed components.

There are two different shared storage scenarios for SAP HANA scale-out environments:

The first scenario is shared file systems, which provide all directories on a file system that is shared over NFS or IBM’s GPFS. In this scenario, the data is available on all nodes at all times.
The second scenario is non-shared storage, in which the required data is attached exclusively to a node when it is needed. All data is managed through the SAP HANA storage connector API, which removes access from other nodes by using the appropriate mechanisms, for example, SCSI-3 reservations.

In both scenarios, the /hana/shared directory must be made available as a shared file system.

Note

To monitor these shared file systems, you can optionally create file system resources in the cluster. In that case, remove the corresponding entries from /etc/fstab so that the mounts are managed only by the file system resources.
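If file system resources manage the shared mounts, a quick check for leftover /etc/fstab entries can help. The following Python sketch parses fstab-formatted text and reports entries whose mount point is /hana or lies below it. The /hana prefix and the function name are assumptions for illustration; the sketch neither edits /etc/fstab nor creates any cluster resources.

```python
def hana_fstab_entries(fstab_text, prefix="/hana"):
    """Return (device, mount_point) pairs whose mount point is at or below prefix."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed line: ignore it
        device, mount_point = fields[0], fields[1]
        # Match "/hana" itself or paths below it, but not e.g. "/hanabackup".
        if mount_point == prefix or mount_point.startswith(prefix + "/"):
            entries.append((device, mount_point))
    return entries


if __name__ == "__main__":
    # Sample content; on a real host, pass the contents of /etc/fstab instead.
    sample = """
# /etc/fstab
/dev/mapper/rhel-root  /             xfs  defaults  0 0
nfs01:/export/shared   /hana/shared  nfs  defaults  0 0
"""
    for device, mount_point in hana_fstab_entries(sample):
        print(f"managed by the cluster, remove from fstab: {device} {mount_point}")
```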

1.3.1. Shared storage

Shared file systems deliver the required data on every host. Because the shared directories are mounted on all nodes, the data can be shared easily. After the file systems are deployed, the installation proceeds as normal, and SAP HANA has access to all directories: /hana/data, /hana/log, and /hana/shared.

Figure 3: Functionality and working paths of the scale-out process with shared storage

1.3.2. Non-shared storage

A non-shared storage configuration is more complex than a shared storage configuration. It requires a supported storage component and an individual configuration of the storage connector during the SAP HANA installation process. SAP HANA makes several internal changes to the RHEL systems, for example, to sudo access, LVM, or multipath. Whenever the node definition changes, SAP HANA changes access to the storage directly by using SCSI-3 reservations. The non-shared storage configuration is more optimized than the shared storage configuration because each node has direct access to the storage system.

Figure 4: Functionality and working paths of the scale-out process with the storage connector

1.4. SAP HANA System Replication

SAP HANA System Replication replicates the database of an SAP HANA environment across multiple sites. The data is replicated over the network and preloaded into the second SAP HANA installation. SAP HANA System Replication significantly reduces recovery time if the primary HANA Scale-Out site fails. You must ensure that all replicated environments are built with identical specifications across hardware, software, and configuration settings.

1.5. Network configuration

An SAP HANA Scale-Out System Replication setup that is managed by the RHEL HA Add-On requires at least three networks. However, use an SAP-recommended network configuration to build a high-performing production environment.

The three networks are:

Public network: Required for the connection of the application server and clients (minimum requirement).
Communication network: Required for system replication communication, internode communication, and storage configuration.
Heartbeat network: Required for HA cluster communication.

The recommended configuration uses the following networks:

Application server network
Client network
Replication network
Storage network
Two internode networks
Backup network
Admin network
Pacemaker network

This solution requires changes to the SAP HANA configuration: the system replication hostname resolution must be adjusted so that it uses the network dedicated to system replication. This is described in the SAP HANA Network Requirements documentation.
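As an illustration of that adjustment, the following Python sketch renders a `[system_replication_hostname_resolution]` section for global.ini that maps the replication-network IP addresses of the other site's hosts to their internal host names. The section name follows SAP's documented global.ini parameter, but the host names and IP addresses are invented, only IPv4 addresses are assumed, and you should verify the exact format against the SAP HANA Network Requirements documentation before using it.

```python
import configparser
import io


def render_sr_hostname_resolution(peer_hosts):
    """Render an ini fragment mapping replication-network IPs to internal host names.

    peer_hosts: dict of {ipv4_address: internal_hostname}. The values are
    hypothetical here; take the real ones from your network plan.
    """
    section = "system_replication_hostname_resolution"
    config = configparser.ConfigParser()
    config[section] = {}
    # Sort for a stable, reviewable output.
    for ip_address, hostname in sorted(peer_hosts.items()):
        config[section][ip_address] = hostname
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical replication-network addresses of the hosts on site DC2.
    print(render_sr_hostname_resolution({
        "10.0.2.11": "dc2hana01",
        "10.0.2.12": "dc2hana02",
    }))
```

Generating the section from one mapping keeps the entries on both sites consistent with the network plan instead of editing each file by hand.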

Figure 5: Example Network configuration of two scale-out systems connected over SAP HANA system replication

1.6. RHEL HA Add-On

In the solution described in this document, the RHEL HA Add-On ensures the operation of SAP HANA Scale-Out System Replication across two sites. For this purpose, resource agents developed specifically for SAP HANA scale-out environments manage the SAP HANA Scale-Out System Replication environment. Based on the current status of this environment, the cluster can decide either to switch the active master node to another available standby node or to switch the entire active side of the scale-out system replication environment to the second site. For this solution, a fencing mechanism is configured to avoid split-brain situations.

Figure 6: Overview of Pacemaker integration based on a system replication environment

For more information about using the RHEL HA Add-On to set up HA clusters on RHEL 8, see the following documentation:

It is important to understand the scale-out and system replication methods of the SAP HANA database, because the SAP HANA scale-out resource agents use data from every environment.

First, the resource agent checks for a stable scale-out environment on every site. It checks whether enough SAP HANA scale-out master nameserver nodes are configured and in a valid state. Next, the resource agent checks the system replication state. If everything works correctly, it attaches the virtual IP address to the active master node on the primary site of the system replication. In a failure state, the cluster is configured to switch the system replication configuration automatically.
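The check sequence described above can be summarized in a short Python sketch. It is not the resource agent's implementation: the `SiteStatus` fields, the action names, and the reduction of "stable" to "at least one healthy master nameserver" are hypothetical simplifications that only mirror the order of checks in this section.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical summary of what the resource agent observes on one site."""
    active_master: str    # host running the active master nameserver
    masters_healthy: int  # master nameserver candidates currently in a valid state


def decide(primary, secondary, sr_in_sync):
    """Return (action, host) following the order of checks described in the text.

    sr_in_sync is the last known system replication state (hypothetical input).
    """
    # Check the scale-out environment on each site.
    primary_ok = primary.masters_healthy > 0
    secondary_ok = secondary.masters_healthy > 0
    if primary_ok:
        # Healthy primary: the virtual IP address stays on its active master node.
        return ("place_vip", primary.active_master)
    # Failure state: take over only if the secondary is stable and was in sync.
    if secondary_ok and sr_in_sync:
        return ("takeover", secondary.active_master)
    # No safe target: do nothing rather than promote possibly stale data.
    return ("wait", None)


if __name__ == "__main__":
    dc1 = SiteStatus(active_master="dc1hana01", masters_healthy=0)
    dc2 = SiteStatus(active_master="dc2hana01", masters_healthy=1)
    print(decide(dc1, dc2, sr_in_sync=True))  # ('takeover', 'dc2hana01')
```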

The definition of a failure state depends on the configuration of the master nameserver. For example, when only one master nameserver is configured, the cluster switches directly to the other data center if the master node fails. If two or three master nameservers are configured, the SAP HANA environment first heals itself before the cluster switches to the other data center. Pacemaker uses scores to decide which action to take. When running SAP HANA in a cluster setup, it is very important not to change these score parameters.
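The following Python sketch illustrates how the number of configured master nameserver candidates changes what counts as a failure state. The function and its inputs are hypothetical; in the real setup, SAP HANA and the resource agents make this decision through cluster scores, which the sketch does not model.

```python
def react_to_master_failure(configured_candidates, healthy_candidates):
    """Choose between local self-healing and a site takeover after the active master fails.

    configured_candidates: master nameserver candidates configured on the site (1 to 3).
    healthy_candidates: candidates that are still running after the failure.
    """
    if not 1 <= configured_candidates <= 3:
        raise ValueError("between one and three master nameservers can be configured")
    if configured_candidates > 1 and healthy_candidates > 0:
        # Another candidate can become master: SAP HANA heals the site itself.
        return "local_master_failover"
    # Only one master configured, or no candidate left: switch to the other data center.
    return "site_takeover"
```

With one configured master, any master failure leads directly to `site_takeover`; with two or three, a site takeover happens only after all candidates on the site are gone.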

The Pacemaker configuration also relies on fencing with STONITH (Shoot The Other Node In The Head). An unresponsive node is not necessarily unable to access data. Use STONITH to fence the node and ensure that the data is safe. STONITH protects data from being corrupted by rogue nodes or concurrent access. If the communication between the two sites is lost, both sites may believe they are able to continue working, which can cause data corruption. This is called a split-brain scenario. To prevent it, a quorum can be added, which helps to decide which site is able to continue. The quorum can be provided either by an additional node or by a qdevice. In our example, we use the additional node majoritymaker.
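The benefit of the majoritymaker node follows from simple vote arithmetic. The Python sketch below applies the standard majority rule (a partition has quorum only if it holds more than half of all votes, with one vote per node), which is the rule corosync's votequorum uses by default. The node counts are examples, and options such as two-node mode or wait_for_all are not modeled.

```python
def quorum_threshold(total_votes):
    """Smallest number of votes that forms a strict majority of total_votes."""
    return total_votes // 2 + 1


def partition_has_quorum(partition_votes, total_votes):
    """True if a partition with partition_votes may continue running resources."""
    return partition_votes >= quorum_threshold(total_votes)


if __name__ == "__main__":
    nodes_per_site = 3
    # Without a tie-breaker: 6 votes, and after a split each site holds only 3.
    print(partition_has_quorum(nodes_per_site, 2 * nodes_per_site))  # False
    # With majoritymaker: 7 votes; the site that still reaches it holds 4.
    total = 2 * nodes_per_site + 1
    print(partition_has_quorum(nodes_per_site + 1, total))  # True
    print(partition_has_quorum(nodes_per_site, total))      # False
```

Without the extra vote, neither site can reach a majority after a network split; with it, exactly one site can continue, and the other side is fenced.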

Figure 7: Example of system replication with scale out

1.7. Resource agents

The cluster configuration uses two resource agents.

1.7.1. SAPHanaTopology resource agent

The SAPHanaTopology resource agent is a cloned resource that receives all of its data from the SAP HANA environment. This data is generated by a configuration in SAP HANA called the “system replication hook”. Based on this data, the resource agent calculates the scores that Pacemaker uses. The cluster uses these scores to decide whether to switch the system replication from one site to the other. If the score is higher than a predefined value, the cluster switches the system replication.
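As a rough illustration of the threshold decision described above, the following Python sketch derives a score from the information reported by the system replication hook and compares it with a predefined value. All score values, the threshold, and the function names are invented for this example; in a real cluster, the resource agents set Pacemaker promotion scores, and those values must not be changed.

```python
# All numbers are hypothetical and chosen only for illustration.
SWITCH_THRESHOLD = 100


def secondary_score(hook_reports_in_sync, primary_healthy):
    """Hypothetical score for promoting the secondary site."""
    if not hook_reports_in_sync:
        return -1000  # never promote a secondary that may hold stale data
    if primary_healthy:
        return 50     # valid candidate, but the primary is still working
    return 150        # in-sync secondary and failed primary: promotion preferred


def cluster_switches(hook_reports_in_sync, primary_healthy):
    """The cluster switches only if the score exceeds the predefined value."""
    return secondary_score(hook_reports_in_sync, primary_healthy) > SWITCH_THRESHOLD
```

Expressing the decision as a score rather than a set of flags matches how Pacemaker compares candidates: the highest score above the limit wins, and a strongly negative score rules a candidate out.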

1.7.2. SAPHanaController resource agent

The SAPHanaController resource agent controls the SAP HANA environment and executes all commands required for an automatic switch, that is, it changes the active site of the system replication.