Chapter 8. Maintenance procedures
You must follow specific steps to ensure that the cluster does not cause unplanned impact while you perform maintenance on the different components of SAP HANA system replication HA environments.
Use maintenance procedures to keep your cluster in a healthy state during planned change activity or to restore the health after unplanned incidents.
8.1. Cleaning up the failure history
Clear any failure notifications that remain in the cluster from previous testing. This resets the failure counts that the cluster compares against the migration thresholds.
Procedure
Clean up resource failures:
[root]# pcs resource cleanup
Clean up the STONITH failure history:
[root]# pcs stonith history cleanup
Verification
Check the overall cluster status and confirm that no failures are displayed anymore:
[root]# pcs status --full
Check that the STONITH history for fencing actions has 0 events:
[root]# pcs stonith history
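The verification step can also be scripted. The sketch below is a hypothetical helper (the function name and the failure patterns are assumptions, not product tooling); it scans captured status text for failure entries, shown here on sample strings standing in for real `pcs status --full` output:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the captured status text contains
# no failure indicators such as "Failed Resource Actions".
status_is_clean() {
  ! printf '%s\n' "$1" | grep -qiE 'failed|error'
}

# Sample texts standing in for real `pcs status --full` output:
clean='Full List of Resources:
  * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started node1'
dirty='Failed Resource Actions:
  * rsc_vip_RH1_HDB02_primary start on node1 could not be executed (Timed Out)'

status_is_clean "$clean" && echo "no failures recorded"
status_is_clean "$dirty" || echo "failures present - run pcs resource cleanup"
```

In a live cluster you would pipe the real command output into the helper, for example `status_is_clean "$(pcs status --full)"`.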
8.2. Triggering a HANA takeover using cluster commands
Use the cluster control to execute a simple takeover of the primary instance to the other node.
For detailed steps, refer to the section Testing the setup - Triggering a HANA takeover using cluster commands.
8.3. Updating the operating system and HA cluster components
For updates or offline changes to the HA cluster, the operating system, or the system hardware, follow the Recommended Practices for Applying Software Updates to a RHEL High Availability or Resilient Storage Cluster.
8.4. Performing maintenance on the SAP HANA instances
For any kind of maintenance of applications or other components that the HA cluster manages, you must enable the cluster maintenance mode to prevent the cluster from interfering during the maintenance.
During the update of your HANA instances, the cluster remains running, but is not actively monitoring resources or taking any actions. After the change on the HANA instance is done, it is vital to refresh the cluster resource status and verify that the detected resource states are all correct. Only then can the maintenance mode be disabled again without unexpected cluster actions.
If you need to stop the cluster for the maintenance activity, ensure that you set maintenance mode first, then stop and start the cluster on the node as required for the HANA maintenance.
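The ordering requirement can be expressed as a small guard. The sketch below uses a hypothetical helper function and assumes the `maintenance-mode: true` line format in captured `pcs property` output; it only reports a cluster stop as safe when maintenance mode is already set:

```shell
#!/bin/sh
# Hypothetical guard: only report that it is safe to stop the cluster when
# the captured `pcs property` output shows maintenance-mode set to true.
safe_to_stop() {
  printf '%s\n' "$1" | grep -q 'maintenance-mode: true'
}

# Sample texts standing in for real `pcs property` output:
props_with_maint='Cluster Properties:
 maintenance-mode: true
 stonith-enabled: true'
props_without='Cluster Properties:
 stonith-enabled: true'

safe_to_stop "$props_with_maint" && echo "maintenance mode set - cluster stop allowed"
safe_to_stop "$props_without" || echo "set maintenance-mode=true before stopping"
```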
Prerequisites
- You have configured the Pacemaker cluster to manage the HANA system replication.
Procedure
Set maintenance mode for the entire cluster:
[root]# pcs property set maintenance-mode=true
Setting maintenance mode for the whole cluster ensures that no activity during the maintenance phase can trigger cluster actions and impact the HANA update process.
Verify that the cluster resource management is fully disabled:
[root]# pcs status
...
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services

Node List:
  * Online: [ node1 node2 ]

Full List of Resources:
  * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (maintenance):
    * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started node2 (maintenance)
    * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started node1 (maintenance)
  * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, maintenance):
    * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted node2 (maintenance)
    * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted node1 (maintenance)
  * Clone Set: cln_SAPHanaFil_RH1_HDB02 [rsc_SAPHanaFil_RH1_HDB02] (maintenance):
    * rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started node2 (maintenance)
    * rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started node1 (maintenance)
  * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started node1 (maintenance)
  * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started node2 (maintenance)
...
Update the HANA instances using the SAP procedure. If you have to perform a takeover during the HANA update, you can use the SAP HANA Takeover with Handshake option. For more information, see also Is it possible to use SAP HANA "Takeover with Handshake" option with the HA Solutions for managing HANA System Replication?.
If you stop the cluster in this step, ensure that you start it again before you proceed with the next steps. Keep the maintenance mode enabled.
After the HANA update, verify that the HANA system replication is working correctly. Use the systemReplicationStatus.py script to show the status of the HANA system replication on the primary instance. Below is an example after a manual takeover to node2, during the maintenance:
[root]# su - <sid>adm
rh1adm $ cdpy; python systemReplicationStatus.py --sapcontrol=1 | grep -i replication_status=
service/node2/30201/REPLICATION_STATUS=ACTIVE
service/node2/30207/REPLICATION_STATUS=ACTIVE
service/node2/30203/REPLICATION_STATUS=ACTIVE
site/1/REPLICATION_STATUS=ACTIVE
overall_replication_status=ACTIVE
Before you proceed, ensure that the system replication is healthy and reported as ACTIVE.
Refresh all cluster resources to execute one monitor operation and update their status:
[root]# pcs resource refresh
Waiting for 1 reply from the controller
... got reply (done)
It is crucial that the HANA resources update the cluster and node attributes to reflect the new HANA system replication status. This ensures that the cluster has correct information and does not trigger recovery actions due to stale status information after the maintenance mode ends.
Check the cluster status and verify the resource status and the main HANA resource score attribute. All resources must show as Started, and the promotable resources must show as Unpromoted on all nodes:
[root]# pcs status
...
Full List of Resources:
  * Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (maintenance):
    * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started node2 (maintenance)
    * rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started node1 (maintenance)
  * Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, maintenance):
    * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted node2 (maintenance)
    * rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted node1 (maintenance)
  * Clone Set: cln_SAPHanaFil_RH1_HDB02 [rsc_SAPHanaFil_RH1_HDB02] (maintenance):
    * rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started node2 (maintenance)
    * rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started node1 (maintenance)
  * rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started node1 (maintenance)
  * rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started node2 (maintenance)
...
Check the cluster attributes and verify that the srHook, roles, and score attributes are in the correct new state:
[root]# SAPHanaSR-showAttr
...
Site lpt        lss mns   opMode    srHook srMode srPoll srr
-------------------------------------------------------------
DC2  1746793335 4   node2 logreplay PRIM   sync   SOK    S
DC1  30         4   node1 logreplay SOK    sync   PRIM   P

Host  clone_state roles                        score site sra srah version     vhost
-------------------------------------------------------------------------------------
node1 DEMOTED     master1:master:worker:master 100   DC1  -   -    2.00.078.00 node1
node2 DEMOTED     master1:master:worker:master 150   DC2  -   -    2.00.078.00 node2
The
srHookisPRIMon the node that is running the primary instance and it showsSOKon the correct secondary. -
The
scoreis150for the node where the primary is running and is100on the other.
-
The
When the checks of steps 6 and 7 show the landscape in the expected healthy state, you can remove the maintenance mode of the cluster again:
[root]# pcs property set maintenance-mode=false
When you lift the maintenance mode, the cluster triggers a monitor run of all resources again. The cluster updates the status of the promotable resources to Promoted and Unpromoted according to the location of the primary and secondary instances. The resources now also update the srPoll attribute again to match the srHook attribute value.
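The replication health gate before lifting the maintenance mode can be partially automated. The sketch below uses a hypothetical helper that parses the key=value output of `systemReplicationStatus.py --sapcontrol=1` (shown earlier), captured into a variable; every REPLICATION_STATUS key and the overall status must read ACTIVE:

```shell
#!/bin/sh
# Hypothetical check: every REPLICATION_STATUS key, and the overall status,
# must be ACTIVE in the captured --sapcontrol=1 output.
replication_healthy() {
  # Fail if any replication_status line carries a value other than ACTIVE.
  printf '%s\n' "$1" | grep -i 'replication_status=' \
    | grep -vqi '=ACTIVE' && return 1
  # The overall status must also be ACTIVE.
  printf '%s\n' "$1" | grep -qi 'overall_replication_status=ACTIVE'
}

# Sample text standing in for real systemReplicationStatus.py output:
sample='service/node2/30201/REPLICATION_STATUS=ACTIVE
site/1/REPLICATION_STATUS=ACTIVE
overall_replication_status=ACTIVE'

replication_healthy "$sample" && echo "replication ACTIVE - safe to lift maintenance mode"
```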
8.5. Registering the former primary after a takeover
When you configure AUTOMATED_REGISTER=false in the SAPHanaController resource, which is the default, you must manually register the former primary instance as the new secondary after takeover and start it. Otherwise, the unregistered instance remains stopped.
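To confirm which behavior is configured before a takeover, you can inspect the resource configuration. The sketch below is a hypothetical extractor; it assumes the `AUTOMATED_REGISTER=<value>` attribute form that `pcs resource config` prints, shown here on a sample string:

```shell
#!/bin/sh
# Hypothetical extractor: pull the AUTOMATED_REGISTER value out of captured
# `pcs resource config` text; default to "false" when the attribute is absent.
automated_register() {
  val=$(printf '%s\n' "$1" | grep -o 'AUTOMATED_REGISTER=[a-z]*' | cut -d= -f2)
  echo "${val:-false}"
}

# Sample text standing in for real `pcs resource config` output:
config='Resource: rsc_SAPHanaCon_RH1_HDB02 (class=ocf provider=heartbeat type=SAPHanaController)
  Attributes: SID=RH1 InstanceNumber=02 AUTOMATED_REGISTER=false'

automated_register "$config"   # prints: false
```

When the helper prints false, plan for the manual registration procedure below after any takeover.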
Procedure
Register the former primary as the new secondary. Run as user <sid>adm on the stopped former primary instance:
rh1adm $ hdbnsutil -sr_register --remoteHost=<node> \
   --remoteInstance=${TINSTANCE} --replicationMode=sync \
   --operationMode=logreplay --name=<DC>
- Replace <node> with the host of the new primary instance, for example node2 if there was a takeover from node1 to node2.
- Replace <DC> with your new secondary HANA site name, for example DC1 if node1 is to be registered as the secondary.
- Choose the values for replicationMode and operationMode according to your requirements for the system replication.
- TINSTANCE is an environment variable that is set automatically for user <sid>adm by reading the HANA instance profile. Its value is the HANA instance number.

Start the secondary HANA instance. Run as user <sid>adm on the node of the new secondary instance:
rh1adm $ HDB start
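To avoid typos when substituting the placeholders, the registration command can be assembled from variables and reviewed before running it. The values below are hypothetical examples (node2 as the new primary, instance number 02, DC1 as the site to register):

```shell
#!/bin/sh
# Assemble the hdbnsutil registration command from variables so that the
# substituted values can be reviewed before execution (values are examples).
REMOTE_HOST=node2   # host of the new primary instance
INSTANCE_NR=02      # normally taken from ${TINSTANCE} in the <sid>adm shell
SITE_NAME=DC1       # site name for the re-registered secondary

cmd="hdbnsutil -sr_register --remoteHost=${REMOTE_HOST} \
--remoteInstance=${INSTANCE_NR} --replicationMode=sync \
--operationMode=logreplay --name=${SITE_NAME}"

echo "$cmd"   # review, then run as <sid>adm on the stopped former primary
```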
Verification
On the current primary instance, show the current status of the re-established HANA system replication. Below is an example after a takeover, with the secondary instance re-registered on node1:
rh1adm $ cdpy; python systemReplicationStatus.py

|Database |Host  |Port  |Service Name |Volume ID |Site ID |Site Name |Secondary |Secondary |Secondary |Secondary |Secondary     |Replication |Replication |Replication    |Secondary    |
|         |      |      |             |          |        |          |Host      |Port      |Site ID   |Site Name |Active Status |Mode        |Status      |Status Details |Fully Synced |
|-------- |----- |----- |------------ |--------- |------- |--------- |--------- |--------- |--------- |--------- |------------- |----------- |----------- |-------------- |------------ |
|SYSTEMDB |node2 |30001 |nameserver   |        1 |      2 |DC2       |node1     |    30001 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
|RH1      |node2 |30007 |xsengine     |        2 |      2 |DC2       |node1     |    30007 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |
|RH1      |node2 |30003 |indexserver  |        3 |      2 |DC2       |node1     |    30003 |        1 |DC1       |YES           |SYNC        |ACTIVE      |               |        True |

status system replication site "1": ACTIVE
overall system replication status: ACTIVE

Local System Replication State
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

mode: PRIMARY
site id: 2
site name: DC2