
Chapter 5. Testing the cluster configuration


Before the HA cluster setup is put into production, it is recommended to perform the following tests to ensure that it works as expected.

These tests should also be repeated later on as part of regular HA/DR drills. This ensures that the cluster still works as expected, and that admins stay familiar with the procedures required to bring the setup back to a healthy state if an issue occurs during normal operation or if manual maintenance of the setup is required.

5.1. Manually moving the ASCS instance using the pcs command

To verify that the pacemaker cluster is able to move the ASCS instance to the other HA cluster node on demand.

  • Test Preconditions

    • Both cluster nodes are up, with the resource groups for the ASCS and ERS running on different HA cluster nodes:

        * Resource Group: S4H_ASCS20_group:
          * S4H_lvm_ascs20    (ocf:heartbeat:LVM-activate):    Started node1
          * S4H_fs_ascs20     (ocf:heartbeat:Filesystem):      Started node1
          * S4H_vip_ascs20    (ocf:heartbeat:IPaddr2):         Started node1
          * S4H_ascs20        (ocf:heartbeat:SAPInstance):     Started node1
        * Resource Group: S4H_ERS29_group:
          * S4H_lvm_ers29     (ocf:heartbeat:LVM-activate):    Started node2
          * S4H_fs_ers29      (ocf:heartbeat:Filesystem):      Started node2
          * S4H_vip_ers29     (ocf:heartbeat:IPaddr2):         Started node2
          * S4H_ers29         (ocf:heartbeat:SAPInstance):     Started node2
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
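    • The failcount precondition can be checked with pcs resource failcount show and restored with pcs resource cleanup. As a minimal sketch (assumption: it runs against a saved copy of pcs status output rather than a live cluster), the following shell function flags recorded failures:

```shell
# Minimal sketch: flag recorded failures in `pcs status` output.
# "sample_status" is a saved sample; on a live node, pipe `pcs status` in instead.
check_failures() {
  if printf '%s\n' "$1" | grep -q 'Failed Resource Actions'; then
    echo "failures present - run: pcs resource cleanup"
  else
    echo "no failures recorded"
  fi
}

sample_status='Failed Resource Actions:
  * S4H_ascs20 monitor on node1 returned not-running'
check_failures "$sample_status"
```

      On a live cluster the equivalent check is simply pcs status | grep -A1 "Failed Resource Actions", followed by pcs resource cleanup if anything is listed.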
  • Test Procedure

    • Run the following command from any node to initiate the move of the ASCS instance to the other HA cluster node:

      [root@node1]# pcs resource move S4H_ascs20
  • Monitoring

    • Run the following command in a separate terminal during the test:

      [root@node2]# watch -n 1 pcs status
  • Expected behavior

    • The ASCS resource group is moved to the other node.
    • The ERS resource group then stops and moves to the node where the ASCS resource group was previously running.
  • Test Result

    • The ASCS resource group moves to the other node (node2 in this scenario), and the ERS resource group moves to node1:

        * Resource Group: S4H_ASCS20_group:
          * S4H_lvm_ascs20 (ocf:heartbeat:LVM-activate): Started node2
          * S4H_fs_ascs20 (ocf:heartbeat:Filesystem): Started node2
          * S4H_vip_ascs20 (ocf:heartbeat:IPaddr2): Started node2
          * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node2
        * Resource Group: S4H_ERS29_group:
          * S4H_lvm_ers29 (ocf:heartbeat:LVM-activate): Started node1
          * S4H_fs_ers29 (ocf:heartbeat:Filesystem): Started node1
          * S4H_vip_ers29 (ocf:heartbeat:IPaddr2): Started node1
          * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node1
  • Recovery Procedure

    • Remove the location constraints, if any:

      [root@node1]# pcs resource clear S4H_ascs20

5.2. Manually moving the ASCS instance using sapcontrol (with SAP HA interface enabled)

To verify that the sapcontrol command is able to move the instances to the other HA cluster node when the SAP HA interface is enabled for the instance.

  • Test Preconditions

    • The SAP HA interface is enabled for the SAP instance.
    • Both cluster nodes are up with the resource groups for the ASCS and ERS running.

      [root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
           * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node2
           * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node1
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
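    • Whether the SAP HA interface is enabled can be checked with sapcontrol's HAGetFailoverConfig function, run as the <sid>adm user. The sketch below parses a saved sample of its output for the HAActive field; the sample values are illustrative assumptions, not output captured from a real system:

```shell
# Minimal sketch: decide whether the SAP HA interface is active based on the
# HAActive field of `sapcontrol -nr 20 -function HAGetFailoverConfig` output.
# "sample_config" is an illustrative sample, not captured from a real system.
ha_interface_enabled() {
  printf '%s\n' "$1" | grep -q '^HAActive: TRUE'
}

sample_config='HAActive: TRUE
HASAPInterfaceVersion: sap_cluster_connector'
if ha_interface_enabled "$sample_config"; then
  echo "SAP HA interface enabled"
fi
```

      On a live system, run sapcontrol -nr 20 -function HAGetFailoverConfig as the <sid>adm user and inspect the HAActive line directly.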
  • Test Procedure

    • As the <sid>adm user, run the HAFailoverToNode function of sapcontrol to move the ASCS instance to the other node.
  • Monitoring

    • Run the following command in a separate terminal during the test:

      [root@node2]# watch -n 1 pcs status
  • Expected behavior

    • The ASCS instance should move to the other HA cluster node. A temporary location constraint is created so that the move can complete.
  • Test

    [root@node2]# su - s4hadm
    node2:s4hadm 52> sapcontrol -nr 20 -function HAFailoverToNode ""
    
    06.12.2023 12:57:04
    HAFailoverToNode
    OK
  • Test Result

    • ASCS and ERS both move to the other node:

      [root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node1
          * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node2
    • Constraints are created as shown below:

      [root@node1]# pcs constraint
      Location Constraints:
        Resource: S4H_ASCS20_group
          Constraint: cli-ban-S4H_ASCS20_group-on-node2
            Rule: boolean-op=and score=-INFINITY
              Expression: #uname eq string node1
              Expression: date lt xxxx-xx-xx xx:xx:xx +xx:xx
  • Recovery Procedure

    • The constraint shown above is cleared automatically when the expiration time given in the date lt expression is reached.
    • Alternatively, the constraint can be removed with the following command:

      [root@node1]# pcs resource clear S4H_ascs20

5.3. Testing failure of the ASCS instance

To verify that the pacemaker cluster takes the necessary action when the enqueue server of the ASCS instance, or the whole ASCS instance, fails.

  • Test Preconditions

    • Both cluster nodes are up with the resource groups for the ASCS and ERS running:

      [root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node1
          * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node2
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
  • Test Procedure

    • Identify the PID of the enqueue server on the node where ASCS is running.
    • Send a SIGKILL signal to the identified process.
  • Monitoring

    • Run the following command in a separate terminal during the test:

      [root@node2]# watch -n 1 pcs status
  • Expected behavior

    • The enqueue server process is killed.
    • The pacemaker cluster takes the required action as per its configuration; in this case, it moves the ASCS to the other node.
  • Test

    • Switch to the <sid>adm user on the node where ASCS is running:

      [root@node1]# su - s4hadm
    • Identify the PID of the enqueue server process, which is named en.sap (NetWeaver) or enq.sap (S/4HANA):

      node1:s4hadm 51> pgrep -af "(en|enq).sap"
      31464 enq.sapS4H_ASCS20 pf=/usr/sap/S4H/SYS/profile/S4H_ASCS20_s4ascs
    • Kill the identified process:

      node1:s4hadm 52> kill -9 31464
    • Notice the cluster “Failed Resource Actions”:

      [root@node2]# pcs status | grep "Failed Resource Actions" -A1
      Failed Resource Actions:
        * S4H_ascs20 2m-interval monitor on node1 returned 'not running' at Wed Dec  6 15:37:24 2023
    • ASCS and ERS move to the other node:

      [root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20 (ocf:heartbeat:SAPInstance): Started node2
          * S4H_ers29 (ocf:heartbeat:SAPInstance): Started node1
        * S4H_ascs20 2m-interval monitor on node1 returned 'not running' at Wed Dec  6 15:37:24 2023
  • Recovery Procedure

    • Clear the failed action:

      [root@node2]# pcs resource cleanup S4H_ascs20
      …
      Waiting for 1 reply from the controller
      ... got reply (done)

5.4. Testing failure of the ERS instance

To verify that the pacemaker cluster takes the necessary action when the enqueue replication server (ERS) instance fails.

  • Test Preconditions

    • Both cluster nodes are up with the resource groups for the ASCS and ERS running:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node2
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
  • Test Procedure

    • Identify the PID of the enqueue replication server process on the node where the ERS instance is running.
    • Send a SIGKILL signal to the identified process.
  • Monitoring

    • Run the following command in a separate terminal during the test:

      [root@node2]# watch -n 1 pcs status
  • Expected behavior

    • The enqueue replication server process is killed.
    • The pacemaker cluster takes the required action as per its configuration; in this case, it restarts the ERS instance on the same node.
  • Test

    • Switch to the <sid>adm user:

      [root@node1]# su - s4hadm
    • Identify the PID of enqr.sap:

      node1:s4hadm 56> pgrep -af enqr.sap
      532273 enqr.sapS4H_ERS29 pf=/usr/sap/S4H/SYS/profile/S4H_ERS29_s4ers
    • Kill the identified process:

      node1:s4hadm 58> kill -9 532273
    • Notice the cluster “Failed Resource Actions”:

      [root@node1]# pcs status | grep "Failed Resource Actions" -A1
      Failed Resource Actions:
        * S4H_ers29 2m-interval monitor on node1 returned 'not running' at Thu Dec  7 13:15:02 2023
    • ERS restarts on the same node without disturbing the ASCS already running on the other node:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node2
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1
        * S4H_ers29 2m-interval monitor on node1 returned 'not running' at Thu Dec  7 13:15:02 2023
  • Recovery Procedure

    • Clear the failed action:

      [root@node1]# pcs resource cleanup S4H_ers29
      …
      Waiting for 1 reply from the controller
      ... got reply (done)

5.5. Failover of ASCS instance due to node crash

To verify that the ASCS instance moves correctly in case of a node crash.

  • Test Preconditions

    • Both cluster nodes are up with the resource groups for the ASCS and ERS running:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node2
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
  • Test Procedure

    • Crash the node where ASCS is running.
  • Monitoring

    • Run the following command in a separate terminal on the other node during the test:

      [root@node1]# watch -n 1 pcs status
  • Expected behavior

    • The node where ASCS is running crashes and, as per the configuration, either shuts down or restarts.
    • Meanwhile, ASCS moves to the other node.
    • ERS starts on the previously crashed node after it comes back online.
  • Test

    • Run the following command as the root user on the node where ASCS is running:

      [root@node2]# echo c > /proc/sysrq-trigger
    • ASCS moves to the other node:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node1
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node1
    • ERS stops and moves to the previously crashed node once it comes back online:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node1
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Stopped
      
      
      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node1
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node2
  • Recovery Procedure

    • Clean up failed actions, if any:

      [root@node1]# pcs resource cleanup

5.6. Failure of ERS instance due to node crash

To verify that the ERS instance restarts on the same node.

  • Test Preconditions

    • Both cluster nodes are up with the resource groups for the ASCS and ERS running:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node1
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node2
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
  • Test Procedure

    • Crash the node where ERS is running.
  • Monitoring

    • Run the following command in a separate terminal on the other node during the test:

      [root@node1]# watch -n 1 pcs status
  • Expected behavior

    • The node where ERS is running crashes and, as per the configuration, either shuts down or restarts.
    • Meanwhile, ASCS continues to run on the other node. ERS restarts on the crashed node after it comes back online.
  • Test

    • Run the following command as the root user on the node where ERS is running:

      [root@node2]# echo c > /proc/sysrq-trigger
    • ERS restarts on the crashed node after it comes back online, without disturbing the ASCS instance throughout the test:

      [root@node1]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node1
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node2
  • Recovery Procedure

    • Clean up failed actions if any:

      [root@node2]# pcs resource cleanup

5.7. Failure of ASCS instance due to node crash (ENSA2)

In a three-node ENSA2 cluster environment, the third node is also considered during failover events of any instance.

  • Test Preconditions

    • A three-node SAP S/4HANA cluster is running, with the resource groups for the ASCS and ERS started.
    • The 3rd node has access to all the file systems and can provision the required instance-specific IP addresses in the same way as the first 2 nodes.
    • In the example setup, the shared NFS filesystems are managed as cloned resources, and the cluster status is as follows:

      Node List:
        * Online: [ node1 node2 node3 ]

      Active Resources:
        * s4r9g2_fence        (stonith:fence_rhevm):   Started node1
        * Clone Set: s4h_fs_sapmnt-clone [fs_sapmnt]:
          * Started: [ node1 node2 node3 ]
        * Clone Set: s4h_fs_sap_trans-clone [fs_sap_trans]:
          * Started: [ node1 node2 node3 ]
        * Clone Set: s4h_fs_sap_SYS-clone [fs_sap_SYS]:
          * Started: [ node1 node2 node3 ]
        * Resource Group: S4H_ASCS20_group:
          * S4H_lvm_ascs20    (ocf:heartbeat:LVM-activate):    Started node1
          * S4H_fs_ascs20     (ocf:heartbeat:Filesystem):      Started node1
          * S4H_vip_ascs20    (ocf:heartbeat:IPaddr2):         Started node1
          * S4H_ascs20        (ocf:heartbeat:SAPInstance):     Started node1
        * Resource Group: S4H_ERS29_group:
          * S4H_lvm_ers29     (ocf:heartbeat:LVM-activate):    Started node2
          * S4H_fs_ers29      (ocf:heartbeat:Filesystem):      Started node2
          * S4H_vip_ers29     (ocf:heartbeat:IPaddr2):         Started node2
          * S4H_ers29         (ocf:heartbeat:SAPInstance):     Started node2
    • All failures for the resources and resource groups have been cleared and the failcounts have been reset.
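    • The shared-filesystem precondition can be verified on the 3rd node before the test. The sketch below checks mount entries against expected SAP mount points (assumptions: /sapmnt and /usr/sap/trans as mount points, and a saved sample standing in for /proc/mounts; the NFS server name and export paths are illustrative):

```shell
# Minimal sketch: verify that the expected shared SAP mount points are present.
# "sample_mounts" stands in for /proc/mounts on a live node; the server name
# and export paths are illustrative assumptions.
fs_mounted() {  # usage: fs_mounted <mountpoint> <mount-table>
  printf '%s\n' "$2" | awk -v m="$1" '$2 == m { found = 1 } END { exit !found }'
}

sample_mounts='nfsserver:/export/sapmnt /sapmnt nfs4 rw,relatime 0 0
nfsserver:/export/sap_trans /usr/sap/trans nfs4 rw,relatime 0 0'
for fs in /sapmnt /usr/sap/trans; do
  fs_mounted "$fs" "$sample_mounts" && echo "$fs mounted" || echo "$fs MISSING"
done
```

      On a live node, read /proc/mounts directly; in this example setup the cloned Filesystem resources in pcs status also confirm that the mounts are active on all three nodes.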
  • Test Procedure

    • Crash the node where ASCS is running.
  • Monitoring

    • During the test, run the following command in a separate terminal on one of the nodes where the ASCS group is not currently running:

      [root@node2]# watch -n 1 pcs status
  • Expected behavior

    • ASCS moves to the 3rd node.
    • ERS continues to run on the same node where it is already running.
  • Test

    • Crash the node where the ASCS group is currently running:

      [root@node1]# echo c > /proc/sysrq-trigger
    • ASCS moves to the 3rd node, without disturbing the ERS instance already running on the 2nd node:

      [root@node2]# pcs status | egrep -e "S4H_ascs20|S4H_ers29"
          * S4H_ascs20	(ocf:heartbeat:SAPInstance):	 Started node3
          * S4H_ers29	(ocf:heartbeat:SAPInstance):	 Started node2
  • Recovery Procedure

    • Clean up failed actions if any:

      [root@node2]# pcs resource cleanup