Chapter 6. Testing the setup
Test your new HA cluster thoroughly before you enable it for production workloads.
Extend the basic example test cases with your own specific requirements.
The following test cases show the testing on a 2-node cluster with the ASCS and ERS resource groups of an S/4HANA setup.
6.1. Moving the ASCS instance using cluster commands
Test how the cluster moves an application server instance and its related resources from one node to another on demand. You can use this procedure to distribute instances to specific nodes.
Prerequisites
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
Move the ASCS resource to any other HA cluster node. You can use either the SAPInstance resource or the resource group in the command:

[root]# pcs resource move rsc_SAPInstance_<SID>_ASCS<instance> [<node>]
Location constraint to move resource 'rsc_SAPInstance_S4H_ASCS20' has been created
Waiting for the cluster to apply configuration changes...
Location constraint created to move resource 'rsc_SAPInstance_S4H_ASCS20' has been removed
Waiting for the cluster to apply configuration changes...
resource 'rsc_SAPInstance_S4H_ASCS20' is running on node 'node2'

- Replace <SID> with the ASCS SID, for example, S4H.
- Replace <instance> with the ASCS instance number, for example, 20.
- Optional: Define a target node to which the instance is moved. When you do not define a node, the cluster chooses a healthy target node that meets the configuration.
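When you repeat this test for different SIDs or instance numbers, it can help to build the resource name from the two values. The following sketch is an illustration only; the helper name and the final echo are assumptions, and on a real cluster node you would execute the printed pcs command as root instead of echoing it:

```shell
# Sketch: compose the SAPInstance resource name from the SID and
# instance number, then print the pcs move command that would be run.
build_move_cmd() {
    sid="$1"      # SAP system ID, for example S4H
    instno="$2"   # ASCS instance number, for example 20
    node="$3"     # target node; may be empty to let the cluster choose
    echo "pcs resource move rsc_SAPInstance_${sid}_ASCS${instno} ${node}"
}

build_move_cmd S4H 20 node2
```

Note that when you omit the target node, the printed command ends with a trailing space, which the shell ignores when the command is executed.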
Verification
Check that the resource group starts fully on the other node, for example, after moving the ASCS group from node1 to node2:
[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

Optional: When the cluster consists of 2 nodes, verify that after the ASCS resource group fully starts, the ERS resource group automatically stops on this node and moves to the node where the ASCS resource group was running before. The colocation constraint triggers this behavior. Check for the related chain of actions by pacemaker-controld in the system log file:

[root]# less /var/log/messages
…
… notice: Result of start operation for rsc_SAPInstance_S4H_ASCS20 on node2: ok
…
… notice: Requesting local execution of stop operation for rsc_SAPInstance_S4H_ERS29 on node2
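A check that every member of a resource group runs on the expected node can be scripted by parsing the `pcs resource` output. The following sketch assumes the output format shown above and runs against a saved sample file instead of a live cluster; on a cluster node you would pipe `pcs resource` directly:

```shell
# Sketch: verify that all resources of grp_S4H_ASCS20 report
# "Started node2" in a saved copy of the `pcs resource` output.
cat > /tmp/pcs_resource_sample.txt <<'EOF'
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20 (ocf:heartbeat:IPAddr2): Started node2
    * rsc_SAPStartSrv_S4H_ASCS20 (ocf:heartbeat:SAPStartSrv): Started node2
    * rsc_SAPInstance_S4H_ASCS20 (ocf:heartbeat:SAPInstance): Started node2
EOF

# Select the group's resource lines, then count the lines that are
# NOT started on node2. Zero means the group is fully on node2.
not_on_node2=$(grep 'S4H_ASCS20 ' /tmp/pcs_resource_sample.txt \
    | grep -cv 'Started node2')
if [ "$not_on_node2" -eq 0 ]; then
    echo "grp_S4H_ASCS20 fully started on node2"
fi
```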
6.2. Manually moving the ASCS instance using sapcontrol and the SAP HA interface
This test verifies that the sapcontrol command can move an instance to the other HA cluster node when the SAP HA interface is enabled for the instance.
Prerequisites
- You have enabled the SAP HA interface for the ASCS instance.
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
Run the HAFailoverToNode function of sapcontrol as user <sid>adm to move the ASCS instance to the other node:

<sid>adm $ sapcontrol -nr <instance> -function HAFailoverToNode ""

Check that the instance stops on the current node and starts on the other node. The ERS instance automatically stops and starts as well after the ASCS instance is fully up. For example, this is the cluster resource status after you have moved the ASCS instance from node1 to node2:

[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

Check the new location constraint that the manual move created through the HA integration:

[root]# pcs constraint
Location Constraints:
  Started resource 'grp_S4H_ASCS20'
    Rules:
      Rule: boolean-op=and score=-INFINITY
        Expression: #uname eq string node1
        Expression: date lt YYYY-MM-DDT13:40:45Z

The constraint bans the ASCS resource group from the original node for 5 minutes, which enforces the move to the other node. The date string in the rule defines the time at which the cluster deletes the constraint automatically.

Optional: Remove the temporary constraint to enable the ASCS resource group on the previous node immediately and end this test:

[root]# pcs resource clear grp_S4H_ASCS20
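The expiry timestamp in the rule uses the UTC format shown in the constraint output. As an illustration of how the 5-minute lifetime maps to such a timestamp, the following sketch computes "now plus 5 minutes" in the same format. It assumes GNU date is available (the -d option is a GNU extension):

```shell
# Sketch: compute the UTC expiry time 5 minutes from now, in the same
# YYYY-MM-DDTHH:MM:SSZ format that appears in the location constraint.
expiry=$(date -u -d '+5 minutes' '+%Y-%m-%dT%H:%M:%SZ')
echo "$expiry"
```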
6.3. Testing failures of the ASCS instance
Test that the pacemaker cluster executes the recovery action when the enqueue server of the ASCS instance or the whole ASCS instance fails.
Prerequisites
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
1. Identify the process ID (PID) of the enqueue server on the node where the ASCS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm and find the PID in the last column:

   <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
   name, description, dispstatus, textstatus, starttime, elapsedtime, pid
   msg_server, MessageServer, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142607
   enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142608

   In the example, the enqueue server PID is 142608.

2. Send a SIGKILL signal to the identified process to kill it instantly:

   <sid>adm $ kill -9 <pid>

   - Replace <pid> with the PID of the enqueue server, for example, 142608.

3. Check that the ASCS instance recovers. With the default migration-threshold of all resources set to 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.

4. Repeat steps 1 and 2 until you have killed the process 3 times. The cluster recovers the resource on another node after 3 consecutive failures of the same resource.
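Finding the enqueue server PID can also be scripted. The following sketch parses sample GetProcessList output saved to a file; on a real system you would pipe the sapcontrol output instead, and the sample values mirror the example above:

```shell
# Sketch: extract the enq_server PID from saved GetProcessList output.
cat > /tmp/processlist_sample.txt <<'EOF'
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
msg_server, MessageServer, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142607
enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 14:10:29, 0:01:00, 142608
EOF

# The PID is the last comma-separated field of the enq_server line.
pid=$(awk -F', ' '/^enq_server/ {print $NF}' /tmp/processlist_sample.txt)
echo "enqueue server PID: $pid"
# On the cluster node, the failure would then be triggered with:
#   kill -9 "$pid"
```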
Verification
Check that the ASCS instance is running on the other node:
[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

In our example, the ASCS instance failed on node1 and has been moved to node2. As configured, the ERS instance moved to node1 after the ASCS instance was up on node2.

Check the fail count and notice that the resource failed 3 times, which triggered the recovery on the other node:

[root]# pcs resource failcount
Failcounts for resource 'rsc_SAPInstance_S4H_ASCS20'
  node1: 3
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information, see Cleaning up the failure history.
6.4. Testing failures of the ERS instance
Test that the pacemaker cluster executes the recovery action when the enqueue replicator of the ERS instance or the whole ERS instance fails.
Prerequisites
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
1. Identify the process ID (PID) of the enqueue replicator on the node where the ERS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm and find the PID in the last column:

   <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
   name, description, dispstatus, textstatus, starttime, elapsedtime, pid
   enq_replicator, Enqueue Replicator 2, GREEN, Running, YYYY MM DD 15:42:03, 0:00:04, 19124

   In the example, the enqueue replicator PID is 19124.

2. Send a SIGKILL signal to the identified process to kill it instantly:

   <sid>adm $ kill -9 <pid>

   - Replace <pid> with the PID of the enqueue replicator, for example, 19124.

3. Check that the ERS instance recovers. With the default migration-threshold of all resources set to 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.

4. Repeat steps 1 and 2 until you have killed the process 3 times. The cluster recovers the resource on another node after 3 consecutive failures of the same resource.
Verification
Check that the ERS instance is running on the other node:
[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

In our example, the ERS instance failed on node2 and has been moved to node1. When the ERS instance can no longer run on a different node than the ASCS instance because of the failures, it restarts on the same node as the ASCS instance.

When you clear the failure, the cluster automatically moves the ERS instance back to the other node.

Check the fail count and notice that the resource failed 3 times, which triggered the recovery on the other node:

[root]# pcs resource failcount
Failcounts for resource 'rsc_SAPInstance_S4H_ERS29'
  node2: 3
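The fail count that drives this behavior can also be checked in a script. The following sketch parses saved `pcs resource failcount` output (the format is assumed to match the example above) and compares the count against the default migration-threshold of 3:

```shell
# Sketch: decide whether the migration-threshold has been reached,
# based on saved `pcs resource failcount` output.
cat > /tmp/failcount_sample.txt <<'EOF'
Failcounts for resource 'rsc_SAPInstance_S4H_ERS29'
  node2: 3
EOF

threshold=3
# The count is the value after "node2: " on the per-node line.
count=$(awk -F': ' '/node2/ {print $2}' /tmp/failcount_sample.txt)
if [ "$count" -ge "$threshold" ]; then
    echo "migration-threshold reached: cluster moves the resource"
fi
```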
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information, see Cleaning up the failure history.
6.5. Crashing the node with the ASCS instance
Simulate the crash of the cluster node on which the ASCS instance runs to test the behavior of your cluster resources.
Prerequisites
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
Trigger a crash on the node that runs the ASCS instance, for example, node1. The crash immediately causes a kernel panic, effectively simulating a system crash, and the node becomes unresponsive. The cluster's fencing mechanism (STONITH) detects the failure and initiates recovery actions: typically, it fences the node and restarts the failed resources on a surviving cluster node.

The following command immediately crashes the node on which you run it, with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
Check that the cluster on the other node fences the crashed node:
[root]# pcs stonith history
reboot of node1 successful: delegate=node2, client=pacemaker-controld.1468, origin=node2, completed=...
1 event found

Check that the cluster starts the ASCS resources on the remaining node. In a 2-node cluster, this leads to ASCS and ERS running on the same node:

[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node2

Check that the cluster automatically moves the ERS instance to the previously failed node after the fenced node is running again:

[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node2
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

The higher ASCS resource stickiness and the constraints that you configured earlier ensure that the ASCS instance stays in place, which avoids another unnecessary disruption of the service.
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information, see Cleaning up the failure history.
6.6. Crashing the node with the ERS instance
Simulate the crash of the cluster node on which the ERS instance runs to test the behavior of your cluster resources.
Prerequisites
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
Trigger a crash on the node that runs the ERS instance, for example, node2. The crash immediately causes a kernel panic, effectively simulating a system crash, and the node becomes unresponsive. The cluster's fencing mechanism (STONITH) detects the failure and initiates recovery actions: typically, it fences the node and restarts the failed resources on a surviving cluster node.

The following command immediately crashes the node on which you run it, with no further warning:
[root]# echo c > /proc/sysrq-trigger
Verification
Check that the cluster on the other node fences the crashed node:
[root]# pcs stonith history
reboot of node2 successful: delegate=node1, client=pacemaker-controld.1426, origin=node1, completed=...
1 event found

Check that the cluster starts the ERS resources on the remaining node. In a 2-node cluster, this leads to ASCS and ERS running on the same node:

[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node1
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node1

Check that the cluster automatically moves the ERS instance to the previously failed node after the fenced node is running again:

[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node1
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node1
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node1
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node2
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information, see Cleaning up the failure history.
6.7. Testing failures of the ASCS or ERS instance in a 3-node cluster
Test that the pacemaker cluster recovers the ASCS or the ERS instance on a different node after consecutive failures.
The constraints that you configured in the ENSA2 setup keep the ASCS and ERS instances on separate nodes whenever possible, and the cluster uses any extra node for the recovery.
In a cluster with more than 2 nodes, the recovery of a failed ASCS or ERS instance works the same way. The following test demonstrates the example of a failing ASCS instance.
Prerequisites
- You have configured your ASCS and ERS instance in an ENSA2 setup.
- You have configured 3 or more cluster nodes in this cluster.
- You have configured the additional cluster node(s) to be able to run the instances.
- You have ensured that all cluster nodes are up and the resource groups for the ASCS and ERS are running on different nodes.
- You have no failures in the cluster status.
Procedure
1. Identify the process ID (PID) of the enqueue server on the node where the ASCS instance is running. Run sapcontrol with the GetProcessList function as user <sid>adm and find the PID in the last column:

   <sid>adm $ sapcontrol -nr <instance> -function GetProcessList
   name, description, dispstatus, textstatus, starttime, elapsedtime, pid
   enq_server, Enqueue Server 2, GREEN, Running, YYYY MM DD 13:20:07, 0:00:08, 161323

   In the example, the enqueue server PID is 161323.

2. Send a SIGKILL signal to the identified process to kill it instantly, in this example the enqueue server of the ASCS instance on node1:

   <sid>adm $ kill -9 <pid>

   - Replace <pid> with the PID of the enqueue server, for example, 161323.

3. Check that the ASCS instance recovers on the same node. With the default migration-threshold of all resources set to 3, the cluster restarts the resource on the same node twice before it recovers the resource on a different node.

4. Repeat steps 1 and 2 until you have killed the process 3 times. The cluster recovers the resource on another node after 3 consecutive failures of the same resource.
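The recovery logic described above — restart on the same node until the fail count reaches migration-threshold, then relocate — can be sketched as a simple counter. This is only a model of the cluster's decision for illustration, not cluster code:

```shell
# Sketch: model the migration-threshold decision. Failures 1 and 2
# lead to a local restart; failure 3 reaches the threshold of 3 and
# the resource is recovered on another node.
threshold=3
failcount=0
for kill_attempt in 1 2 3; do
    failcount=$((failcount + 1))
    if [ "$failcount" -lt "$threshold" ]; then
        echo "failure $failcount: restart on the same node"
    else
        echo "failure $failcount: recover on another node"
    fi
done
```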
Verification
Check that the ASCS instance is now running on the additional node:
[root]# pcs resource
  * Resource Group: grp_S4H_ASCS20:
    * rsc_vip_S4H_ASCS20    (ocf:heartbeat:IPAddr2):    Started node3
    * rsc_SAPStartSrv_S4H_ASCS20    (ocf:heartbeat:SAPStartSrv):    Started node3
    * rsc_SAPInstance_S4H_ASCS20    (ocf:heartbeat:SAPInstance):    Started node3
  * Resource Group: grp_S4H_ERS29:
    * rsc_vip_S4H_ERS29    (ocf:heartbeat:IPAddr2):    Started node2
    * rsc_SAPStartSrv_S4H_ERS29    (ocf:heartbeat:SAPStartSrv):    Started node2
    * rsc_SAPInstance_S4H_ERS29    (ocf:heartbeat:SAPInstance):    Started node2

In our example, the ASCS instance failed on node1 and has been moved to node3. The ERS instance stays in place on node2.

Check the fail count and notice that the resource failed 3 times, which triggered the recovery on the other node:

[root]# pcs resource failcount
Failcounts for resource 'rsc_SAPInstance_S4H_ASCS20'
  node1: 3
Next steps
- Clear any failure notifications from the cluster that may be there from previous testing. For more information, see Cleaning up the failure history.