Chapter 9. Maintenance procedures


To perform maintenance on the different components of an SAP HANA system replication HA environment, you must follow specific steps to ensure that the cluster does not cause unplanned impact.

Use maintenance procedures to keep your cluster in a healthy state during planned change activities or to restore its health after unplanned incidents.

9.1. Cleaning up the failure history

Clear any failure notifications from the cluster that might remain from previous testing. This resets the failure counters and the migration thresholds.
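
If you want to review the recorded failures before you clear them, you can optionally list the failure counters first. The resource name below is only an example taken from this environment:

    [root]# pcs resource failcount show rsc_SAPHanaCon_RH1_HDB02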

Procedure

  1. Clean up resource failures:

    [root]# pcs resource cleanup
  2. Clean up the STONITH failure history:

    [root]# pcs stonith history cleanup

Verification

  • Check the overall cluster status and confirm that no failures are displayed anymore:

    [root]# pcs status --full
  • Check that the STONITH history for fencing actions shows no events:

    [root]# pcs stonith history

9.2. Executing a manual HANA takeover

Use cluster commands to execute a simple takeover of the primary site to the secondary site.

For detailed steps, refer to the section Testing the setup - Triggering a HANA takeover using cluster commands.
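
The referenced procedure is authoritative; the following is only a rough sketch of the approach, using the resource names from this document. Depending on your pcs version, the flag for promotable clones is either --master or --promoted:

    [root]# pcs resource move cln_SAPHanaCon_RH1_HDB02 --master

After the takeover completes, clear the temporary location constraint that the move command created:

    [root]# pcs resource clear cln_SAPHanaCon_RH1_HDB02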

9.3. Updating the HA cluster, the operating system, or the hardware

For updates or offline changes to the HA cluster, the operating system, or the system hardware, you must follow the Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
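
One common pattern from that article is a rolling update, where you take one node out of service at a time while the cluster keeps the resources running on the remaining nodes. A minimal sketch, using dc1hana2 as an example node:

    [root]# pcs node standby dc1hana2

Update the node and reboot it if required, then bring it back into the cluster:

    [root]# pcs node unstandby dc1hana2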

9.4. Updating the SAP HANA instances

For any kind of maintenance of applications or other components that the HA cluster manages, you must enable the cluster maintenance mode to prevent the cluster from interfering during the maintenance.

During the update of your HANA instances, the cluster remains running but does not actively monitor resources or take any actions. After the change on the HANA instances is done, it is vital to refresh the cluster resource status and verify that the detected resource states are all correct. Only then can you safely disable the maintenance mode without triggering unexpected cluster actions.

If you need to stop the cluster for the maintenance activity, ensure that you set the maintenance mode first, and then stop and start the cluster on the node as required for the HANA maintenance.
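
A sketch of this sequence for a single node, using dc1hana1 as an example:

    [root]# pcs property set maintenance-mode=true
    [root]# pcs cluster stop dc1hana1

Perform the HANA maintenance on the node, and then start the cluster again while the maintenance mode stays enabled:

    [root]# pcs cluster start dc1hana1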

Prerequisites

  • You have configured the Pacemaker cluster to manage the HANA system replication.

Procedure

  1. Set maintenance mode for the entire cluster:

    [root]# pcs property set maintenance-mode=true

    Setting the maintenance mode for the whole cluster ensures that no activity during the maintenance phase can trigger cluster actions and impact the HANA update process.

  2. Verify that the cluster resource management is fully disabled:

    [root]# pcs status
    ...
                  *** Resource management is DISABLED ***
      The cluster will not attempt to start, stop or recover services
    
    Node List:
      * Online: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 dc3mm ]
    
    Full List of Resources:
      * rsc_fence       (stonith:<fence agent>):     Started dc1hana2 (unmanaged)
      * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged):
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana3 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana1 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana2 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana3 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana1 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana2 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana4 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana4 (unmanaged)
        * Stopped: [ dc3mm ]
      * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged):
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana3 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana1 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana2 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana3 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana1 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana2 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana4 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana4 (unmanaged)
        * Stopped: [ dc3mm ]
      * rsc_vip_RH1_HDB02_primary   (ocf:heartbeat:IPaddr2):         Started dc1hana1 (unmanaged)
      * rsc_vip_RH1_HDB02_readonly  (ocf:heartbeat:IPaddr2):         Started dc2hana1 (unmanaged)
    
    ...
  3. Update the HANA instances by using the SAP procedure. If you have to perform a takeover during the HANA update, you can use the SAP HANA Takeover with Handshake option. For more information, see Is it possible to use SAP HANA "Takeover with Handshake" option with the HA Solutions for managing HANA System Replication?.
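
    As a sketch, assuming SAP HANA 2.0 SPS04 or later, you invoke the Takeover with Handshake option as the <sid>adm user on the master node of the secondary site:

    rh1adm$ hdbnsutil -sr_takeover --suspendPrimary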

    If you stop the cluster in this step, ensure that you start it again before you proceed with the next steps. Keep the maintenance mode enabled.

  4. After the HANA update, verify that the HANA system replication is working correctly. Use the systemReplicationStatus.py script to show the status of the HANA system replication on the primary site. The following example shows the output after a manual takeover to the secondary site during the maintenance:

    [root]# su - <sid>adm -c "HDBSettings.sh systemReplicationStatus.py \
    --sapcontrol=1 | grep -i replication_status="
    service/dc2hana2/30203/REPLICATION_STATUS=ACTIVE
    service/dc2hana3/30203/REPLICATION_STATUS=ACTIVE
    service/dc2hana1/30201/REPLICATION_STATUS=ACTIVE
    service/dc2hana1/30207/REPLICATION_STATUS=ACTIVE
    service/dc2hana1/30203/REPLICATION_STATUS=ACTIVE
    site/1/REPLICATION_STATUS=ACTIVE
    overall_replication_status=ACTIVE

    Before you proceed, ensure that the system replication is healthy and reported as ACTIVE.
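
    Optionally, you can cross-check the replication state with hdbnsutil. Run as the <sid>adm user on a node of each site:

    rh1adm$ hdbnsutil -sr_state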

  5. Refresh all cluster resources to execute one monitor operation and update their status:

    [root]# pcs resource refresh
    Waiting for 1 reply from the controller
    ... got reply (done)

    It is crucial that the HANA resources update the cluster and node attributes to reflect the new HANA system replication status. This ensures that the cluster has the correct information and does not trigger recovery actions due to stale status information after you end the maintenance mode.
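
    To inspect the node attributes that the resource agents maintain, for example the hana_rh1_* attributes, you can list them as an additional cross-check:

    [root]# pcs node attribute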

  6. Check the cluster status and verify the resource status and the main HANA resource score attribute. All resources must show as Started, and the promotable resources must show as Promoted on the node that runs the HANA primary and as Unpromoted on all other nodes:

    [root]# pcs status resources
     * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (unmanaged):
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana2 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana3 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana4 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana2 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana3 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc1hana1 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana4 (unmanaged)
        * rsc_SAPHanaTop_RH1_HDB02  (ocf:heartbeat:SAPHanaTopology):         Started dc2hana1 (unmanaged)
        * Stopped: [ dc3mm ]
      * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, unmanaged):
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana2 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana3 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc1hana4 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana2 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana3 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Promoted dc1hana1 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana4 (unmanaged)
        * rsc_SAPHanaCon_RH1_HDB02  (ocf:heartbeat:SAPHanaController):       Unpromoted dc2hana1 (unmanaged)
        * Stopped: [ dc3mm ]
      * rsc_vip_RH1_HDB02_primary   (ocf:heartbeat:IPaddr2):         Started dc1hana1 (unmanaged)
      * rsc_vip_RH1_HDB02_readonly  (ocf:heartbeat:IPaddr2):         Started dc2hana1 (unmanaged)
  7. Check the cluster attributes and verify that at least the sync_state attribute is SOK:

    [root]# SAPHanaSR-showAttr
    Global cib-time                 maintenance prim sec srHook sync_state upd
    ---------------------------------------------------------------------------
    RH1    Fri Dec 19 09:41:39 2025 true        DC2  DC1 SOK    SOK        ok
    
    …

    Depending on the maintenance activity, the rest of the attribute information can differ or be empty, for example, if you stopped and restarted the cluster on all nodes.

  8. When the checks in the previous steps show the landscape in the expected healthy state, you can remove the cluster maintenance mode again. Setting the property to an empty value removes it from the cluster configuration:

    [root]# pcs property set maintenance-mode=

    Lifting the maintenance mode triggers a monitor operation on all resources again. The cluster updates the status of the promotable resources to Promoted and Unpromoted according to the location of the primary and secondary instances. The resources now also update the srPoll attribute again to match the srHook attribute value.

Verification

  • Check that the resources are managed again and are in the expected state on all nodes:

    [root]# pcs resource status
      * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02]:
        * Started: [ dc1hana1 dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ]
      * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable):
        * Promoted: [ dc1hana1 ]
        * Unpromoted: [ dc1hana2 dc1hana3 dc1hana4 dc2hana1 dc2hana2 dc2hana3 dc2hana4 ]
      * rsc_vip_RH1_HDB02_primary   (ocf:heartbeat:IPaddr2):         Started dc1hana1
      * rsc_vip_RH1_HDB02_readonly  (ocf:heartbeat:IPaddr2):         Started dc2hana1
  • Verify that the srHook attribute value is SOK and that the nodes have the expected score values assigned:

    [root]# SAPHanaSR-showAttr
    Global cib-time                 maintenance prim sec srHook sync_state upd
    ---------------------------------------------------------------------------
    RH1    Fri Dec 19 09:41:39 2025 true        DC2  DC1 SOK    SOK        ok
    
    Sites lpt        lss mns      srr
    ----------------------------------
    DC1   1766139865 4   dc1hana1 P
    DC2   30         4   dc2hana1 S
    
    Hosts    clone_state gra node_state roles                         score  site
    ------------------------------------------------------------------------------
    dc1hana1 PROMOTED    2.0 online     master1:master:worker:master  150    DC1
    dc1hana2 DEMOTED     2.0 online     master2:slave:worker:slave    140    DC1
    dc1hana3 DEMOTED     2.0 online     slave:slave:worker:slave      -10000 DC1
    dc1hana4 DEMOTED     2.0 online     master3:slave:standby:standby 140    DC1
    dc2hana1 DEMOTED     2.0 online     master1:master:worker:master  100    DC2
    dc2hana2 DEMOTED     2.0 online     master2:slave:worker:slave    80     DC2
    dc2hana3 DEMOTED     2.0 online     slave:slave:worker:slave      -12200 DC2
    dc2hana4 DEMOTED     2.0 online     master3:slave:standby:standby 80     DC2
    dc3mm                    online

Troubleshooting

If the srHook attribute value is SFAIL at the end of the maintenance, the scores for the secondary site nodes are reduced, which prevents the cluster from triggering a takeover in case of a failure. To review and fix this, see The srHook attribute is SFAIL while the system replication is healthy.
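
To check the current srHook value directly, you can query the cluster properties. This assumes that the hook stores the status as a cluster property named hana_rh1_glob_srHook, as in this example environment; on older pcs versions, use pcs property list instead of pcs property config:

    [root]# pcs property config | grep -i srhook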

9.5. Registering a former HANA primary site as the new secondary site

When you configure AUTOMATED_REGISTER=false in the SAPHanaController resource, which is the default, you must manually register the former primary site as the new secondary site after a takeover and then start it. Otherwise, the unregistered site remains stopped.

Procedure

  1. Register the former primary site as the new secondary site. Run the following command as the <sid>adm user on one stopped former primary instance:

    rh1adm$ hdbnsutil -sr_register --remoteHost=<node> \
    --remoteInstance=${TINSTANCE} --replicationMode=sync \
    --operationMode=logreplay --name=<site>
    • Replace <node> with the new primary instance host, for example, dc2hana1 if there was a takeover from dc1hana1 to dc2hana1.
    • Replace <site> with your new secondary HANA site name, for example, DC1 if dc1hana1 is to be registered as a secondary.
    • Choose the values for replicationMode and operationMode according to your requirements for the system replication.
    • $TINSTANCE is an environment variable that is set automatically for the <sid>adm user by reading the HANA instance profile. The variable value is the HANA instance number; you can verify it as shown below.
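
    To confirm that the variable is set in your environment, you can print it. In this example landscape, the instance number of HDB02 is 02:

    rh1adm$ echo ${TINSTANCE}
    02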
  2. Start the secondary HANA site. Run the following command as the <sid>adm user on one node of the new secondary site:

    rh1adm$ sapcontrol -nr ${TINSTANCE} -function StartSystem HDB
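
    To monitor the startup, you can query the instance list with the standard sapcontrol function GetSystemInstanceList:

    rh1adm$ sapcontrol -nr ${TINSTANCE} -function GetSystemInstanceList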

Verification

  • On one node of the new primary site, show the current status of the re-established HANA system replication. The following example shows the output after a takeover from dc1hana1 to dc2hana1, where DC2 is the new primary site:

    rh1adm$ cdpy; python systemReplicationStatus.py
    |Database |Host     |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
    |         |         |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
    |-------- |-------- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
    |RH1      |dc2hana2 |30203 |indexserver  |        4 |      2 |DC2       |dc1hana2  |    30203 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
    |RH1      |dc2hana3 |30203 |indexserver  |        5 |      2 |DC2       |dc1hana3  |    30203 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
    |SYSTEMDB |dc2hana1 |30201 |nameserver   |        1 |      2 |DC2       |dc1hana1  |    30201 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
    |RH1      |dc2hana1 |30207 |xsengine     |        2 |      2 |DC2       |dc1hana1  |    30207 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
    |RH1      |dc2hana1 |30203 |indexserver  |        3 |      2 |DC2       |dc1hana1  |    30203 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
    
    status system replication site "1": ACTIVE
    overall system replication status: ACTIVE
    
    Local System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    mode: PRIMARY
    site id: 2
    site name: DC2