Chapter 6. Testing the setup


Test your new SAP HA cluster thoroughly before you enable it for production workloads.

Enhance the basic example test cases with your specific requirements.

Note

The following test case examples demonstrate testing on a 2-node cluster with the ASCS and ERS resource groups of an S/4HANA setup.

6.1. Moving the ASCS instance using cluster commands

Test how the cluster moves an application server instance and its related resources from one node to another on demand. You can use this procedure to distribute instances to specific nodes.

Prerequisites

  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  • Move the ASCS resource to any other HA cluster node. You can use either the SAPInstance resource or the resource group in the command:

    [root]# pcs resource move rsc_SAPInstance_<SID>_ASCS<instance> [<node>]
    Location constraint to move resource 'rsc_SAPInstance_S4H_ASCS20' has been created
    Waiting for the cluster to apply configuration changes...
    Location constraint created to move resource 'rsc_SAPInstance_S4H_ASCS20' has been removed
    Waiting for the cluster to apply configuration changes...
    resource 'rsc_SAPInstance_S4H_ASCS20' is running on node 'node2'
    • Replace <SID> with the ASCS SID, for example, S4H.
    • Replace <instance> with the ASCS instance number, for example, 20.
    • Optionally, you can define a target node to which the instance is moved. If you do not define a node, the cluster chooses a healthy target node that satisfies the configured constraints.
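After issuing the move, you can wait for the cluster to settle before running further checks. The following is a minimal sketch; the pcs shell function below is a mock standing in for the real pcs command so that the example is self-contained, and on a real cluster you would remove it:

```shell
# Mock of pcs so the sketch runs standalone; delete this function on a
# real cluster to call the actual pcs command instead.
pcs() {
  echo "    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):     Started node2"
}

# Poll until the moved instance reports as started on the target node
until pcs resource status | grep -q "rsc_SAPInstance_S4H_ASCS20.*Started node2"; do
  sleep 5
done
echo "move complete"
```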

Verification

  1. Check that the resource group starts fully on the other node, for example, after moving the ASCS group from node1 to node2:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node2
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1
  2. Optional: In a 2-node cluster, verify that after the ASCS resource group has fully started, the ERS resource group automatically stops on this node and then moves to the node where the ASCS resource group was running before. This behavior is triggered by the colocation constraint. Check for the related chain of actions by pacemaker-controld in the system log file:

    [root]# less /var/log/messages
    …
    … notice: Result of start operation for rsc_SAPInstance_S4H_ASCS20 on node2: ok
    …
    … notice: Requesting local execution of stop operation for rsc_SAPInstance_S4H_ERS29 on node2

6.2. Moving the ASCS instance using sapcontrol

This test verifies that the sapcontrol command can move the instances to the other HA cluster node when the SAP HA interface is enabled for the instance.

Prerequisites

  • You have enabled the SAP HA interface for the ASCS instance.
  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  1. Run the HAFailoverToNode function of sapcontrol as user <sid>adm to move the ASCS instance to the other node:

    <sid>adm $ sapcontrol -nr <instance> -function HAFailoverToNode ""
    • Replace <instance> with the ASCS instance number, for example, 20.
    • The empty string ("") lets the cluster select the target node.
  2. Check that the instance stops on the current node and starts on the other node. The ERS instance automatically stops and starts as well after ASCS is fully up. For example, cluster resource status after you have moved the ASCS instance from node1 to node2:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node2
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1
  3. Check the new location constraint that has been created by the manual move due to the HA integration:

    [root]# pcs constraint
    Location Constraints:
      Started resource 'grp_S4H_ASCS20'
        Rules:
          Rule: boolean-op=and score=-INFINITY
            Expression: #uname eq string node1
            Expression: date lt YYYY-MM-DDT13:40:45Z

    The constraint defines that the ASCS resource group is banned for 5 minutes from the original node, which enforces the move to the other node. The date string in the rule defines the time at which the cluster deletes the constraint automatically.

  4. Optional: Remove the temporary constraint to enable the ASCS resource group on the previous node immediately and end this test:

    [root]# pcs resource clear grp_S4H_ASCS20
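The removal time of the temporary constraint can also be extracted by script, for example when automating this test. The constraint text below is a captured example with a hypothetical date; on a live cluster, parse the pcs constraint output instead:

```shell
# Example pcs constraint output (hypothetical date for illustration);
# on a live cluster, capture this with: constraints=$(pcs constraint)
constraints="Location Constraints:
  Started resource 'grp_S4H_ASCS20'
    Rules:
      Rule: boolean-op=and score=-INFINITY
        Expression: #uname eq string node1
        Expression: date lt 2024-06-01T13:40:45Z"

# The expiry time is the last field of the 'date lt' expression
expiry=$(printf '%s\n' "$constraints" | awk '/date lt/ {print $NF}')
echo "constraint expires at $expiry"
```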

6.3. Testing failures of the ASCS instance

Test that the pacemaker cluster executes the recovery action when the enqueue server of the ASCS instance or the whole ASCS instance fails.

Prerequisites

  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  1. Identify the process ID (PID) of the enqueue server on the node where the ASCS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm; the PID is the last field in the output:

    <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
    name, description, dispstatus, textstatus, starttime, elapsedtime, pid
    msg_server, MessageServer, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142607
    enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142608

    In the example, the enqueue server PID is 142608.

  2. Send a SIGKILL signal to the identified process to kill it instantly:

    <sid>adm $ kill -9 <pid>
    • Replace <pid> with the PID of the enqueue server, for example, 142608.
  3. Check that the ASCS instance recovers. With the default migration-threshold of 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.
  4. Repeat steps 1 and 2 until you have killed the process 3 times. After 3 consecutive failures of the same resource, the cluster recovers it on another node.
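Steps 1 and 2 of the procedure can be scripted. The following sketch extracts the enqueue server PID from the GetProcessList output; the sample output is reused from step 1, and on a live system you would pipe the real sapcontrol output instead and uncomment the kill line:

```shell
# Sample GetProcessList output from step 1; on a live system use:
# out=$(sapcontrol -nr <instance> -function GetProcessList)
out='msg_server, MessageServer, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142607
enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142608'

# The PID is the last comma-separated field of the enq_server line
pid=$(printf '%s\n' "$out" | awk -F', ' '/^enq_server/ {print $NF}')
echo "enqueue server PID: $pid"
# kill -9 "$pid"
```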

Verification

  1. Check that the ASCS instance is running on the other node:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node2
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1

    In our example, the ASCS instance failed on node1 and has been moved to node2. As configured, the ERS instance moved to node1 after ASCS was up on node2.

  2. Check the fail count and notice that it has failed 3 times to trigger the recovery on the other node:

    [root]# pcs resource failcount show
    Failcounts for resource 'rsc_SAPInstance_S4H_ASCS20'
      node1: 3
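The failcount check can also be automated to confirm that the migration-threshold was reached, which is what forces the recovery onto another node. The sample text below mirrors the output above; on a live cluster, parse the output of pcs resource failcount show instead:

```shell
# Sample failcount output as shown above; on a live cluster use:
# failcounts=$(pcs resource failcount show rsc_SAPInstance_S4H_ASCS20)
failcounts="Failcounts for resource 'rsc_SAPInstance_S4H_ASCS20'
  node1: 3"

# Extract the fail count for node1 and compare it to the threshold
count=$(printf '%s\n' "$failcounts" | awk '/node1:/ {print $2}')
if [ "$count" -ge 3 ]; then
  echo "migration-threshold reached on node1"
fi
```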

6.4. Testing failures of the ERS instance

Test that the pacemaker cluster executes the recovery action when the enqueue server of the ERS instance or the whole ERS instance fails.

Prerequisites

  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  1. Identify the process ID (PID) of the enqueue replicator on the node where the ERS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm; the PID is the last field in the output:

    <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
    name, description, dispstatus, textstatus, starttime, elapsedtime, pid
    enq_replicator, Enqueue Replicator 2, GREEN, Running, YYYY MM DD 15:42:03, 0:00:04, 19124

    In the example, the enqueue replicator PID is 19124.

  2. Send a SIGKILL signal to the identified process to kill it instantly:

    <sid>adm $ kill -9 <pid>
    • Replace <pid> with the PID of the enqueue replicator, for example, 19124.
  3. Check that the ERS instance recovers. With the default migration-threshold of 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.
  4. Repeat steps 1 and 2 until you have killed the process 3 times. After 3 consecutive failures of the same resource, the cluster recovers it on another node.

Verification

  1. Check that the ERS instance is running on the other node:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node1
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1

    In this example, the ERS instance failed on node2 and has been moved to node1. Because the failures prevent the ERS instance from running on a different node than the ASCS instance, it is restarted on the node where the ASCS instance is running.

    When you clear the failure, for example, with the pcs resource cleanup command, the cluster automatically moves the ERS instance back to the other node.

  2. Check the fail count and notice that it has failed 3 times to trigger the recovery on the other node:

    [root]# pcs resource failcount show
    Failcounts for resource 'rsc_SAPInstance_S4H_ERS29'
      node2: 3

6.5. Crashing the node with the ASCS instance

Simulate the crash of the cluster node on which the ASCS instance runs to test the behavior of your cluster resources.

Prerequisites

  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  • Trigger a crash on the node that runs the ASCS instance, for example, node1. This immediately causes a kernel panic, effectively simulating a system crash; the node becomes unresponsive.

    The cluster’s fencing mechanism (STONITH) detects the failure and initiates recovery actions. Typically it fences the node and restarts any failed resources on a surviving cluster node.

    The following command immediately causes a crash of the node on which you run the command, with no further warning:

    [root]# echo c > /proc/sysrq-trigger

Verification

  1. Check that the cluster on the other node fences the crashed node:

    [root]# pcs stonith history
    reboot of node1 successful: delegate=node2, client=pacemaker-controld.1468, origin=node2, completed=...
    1 event found
  2. Check that the cluster starts the ASCS resources on the remaining node. In a 2-node cluster this leads to ASCS and ERS running on the same node:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node2
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node2
  3. Check that the cluster automatically moves the ERS instance to the previously failed node after the fenced node is running again:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node2
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1

    The higher resource stickiness of the ASCS group and the constraints that you configured earlier ensure that the ASCS instance stays in place, which avoids another unnecessary disruption of the service.

6.6. Crashing the node with the ERS instance

Simulate the crash of the cluster node on which the ERS instance runs to test the behavior of your cluster resources.

Prerequisites

  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  • Trigger a crash on the node that runs the ERS instance, for example, node2. This immediately causes a kernel panic, effectively simulating a system crash; the node becomes unresponsive.

    The cluster’s fencing mechanism (STONITH) detects the failure and initiates recovery actions. Typically it fences the node and restarts any failed resources on a surviving cluster node.

    The following command immediately causes a crash of the node on which you run the command, with no further warning:

    [root]# echo c > /proc/sysrq-trigger

Verification

  1. Check that the cluster on the other node fences the crashed node:

    [root]# pcs stonith history
    reboot of node2 successful: delegate=node1, client=pacemaker-controld.1426, origin=node1, completed=...
    1 event found
  2. Check that the cluster starts the ERS resources on the remaining node. In a 2-node cluster this leads to ASCS and ERS running on the same node:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node1
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node1
  3. Check that the cluster automatically moves the ERS instance to the previously failed node after the fenced node is running again:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node1
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node1
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node1
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node2

6.7. Testing failures of the ASCS or ERS instance in a 3-node cluster

Test that the pacemaker cluster recovers the ASCS or the ERS instance on a different node after consecutive failures.

The constraints you have configured in the ENSA2 setup try to keep the ASCS and ERS instances on separate nodes and the cluster uses any extra node for the recovery.

In a cluster with more than 2 nodes, the recovery of a failed ASCS or ERS instance is similar. The following test demonstrates an example of a failing ASCS instance.

Prerequisites

  • You have configured your ASCS and ERS instance in an ENSA2 setup.
  • You have configured 3 or more cluster nodes in this cluster.
  • You have configured the additional cluster node(s) to be able to run the instances.
  • You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
  • You have no failures in the cluster status.

Procedure

  1. Identify the process ID (PID) of the enqueue server on the node where the ASCS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm; the PID is the last field in the output:

    <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
    name, description, dispstatus, textstatus, starttime, elapsedtime, pid
    enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 13:20:07, 0:00:08, 161323

    In the example, the enqueue server PID is 161323.

  2. Send a SIGKILL signal to the identified process to kill it instantly, in this example, the enqueue server of the ASCS instance on node1:

    <sid>adm $ kill -9 <pid>
    • Replace <pid> with the PID of the enqueue server, for example, 161323.
  3. Check that the ASCS instance recovers on the same node. With the default migration-threshold of 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.
  4. Repeat steps 1 and 2 until you have killed the process 3 times. After 3 consecutive failures of the same resource, the cluster recovers it on another node.

Verification

  1. Check that the ASCS instance is now running on the additional node:

    [root]# pcs resource
      * Resource Group: grp_S4H_ASCS20:
        * rsc_vip_S4H_ASCS20        (ocf:heartbeat:IPAddr2):         Started node3
        * rsc_SAPStartSrv_S4H_ASCS20        (ocf:heartbeat:SAPStartSrv):     Started node3
        * rsc_SAPInstance_S4H_ASCS20        (ocf:heartbeat:SAPInstance):     Started node3
      * Resource Group: grp_S4H_ERS29:
        * rsc_vip_S4H_ERS29 (ocf:heartbeat:IPAddr2):         Started node2
        * rsc_SAPStartSrv_S4H_ERS29 (ocf:heartbeat:SAPStartSrv):     Started node2
        * rsc_SAPInstance_S4H_ERS29 (ocf:heartbeat:SAPInstance):     Started node2

    In our example, the ASCS instance failed on node1 and has been moved to node3. The ERS instance stays in place on node2.

  2. Check the fail count and notice that it has failed 3 times to trigger the recovery on the other node:

    [root]# pcs resource failcount show
    Failcounts for resource 'rsc_SAPInstance_S4H_ASCS20'
      node1: 3
