Chapter 10. Troubleshooting
10.1. The srHook cluster attribute value is incorrect
When the srHook attribute value does not match the actual HANA system replication status, the cluster can behave unexpectedly if the primary instance fails.
Check and correct your sudo configuration when the srHook attribute of the secondary site and the HANA system replication status do not match, for example:
- The srHook cluster attribute of the secondary site is empty.
- The srHook cluster attribute of the secondary site is set to SOK while the HANA system replication is not healthy.
- The srHook cluster attribute of the secondary site is set to SFAIL while the system replication is in ACTIVE state.
The primary site receives the HANA system replication change events and stores the result as a cluster attribute for the secondary site.
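To check whether you are affected, compare the stored attribute with the actual replication state. The following is only a sketch that assumes the SID rh1 and the secondary site name DC2 used in the examples of this chapter; adjust the attribute name to your own SID and site names:
[root]# crm_attribute -G -t crm_config -n hana_rh1_site_srHook_DC2
rh1adm $ hdbnsutil -sr_state
The first command prints the value currently stored in the cluster, the second shows the replication state as reported by HANA itself.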
Procedure
Check for crm_attribute update errors in the secure log, since the command is executed using sudo. The log shows the command that the hook script tries to execute, but potentially fails. Check on the primary instance node for an error like command not allowed, like in this example:
[root]# grep crm_attribute /var/log/secure
...
rh1adm : command not allowed ; PWD=/hana/shared/RH1/HDB02/<node> ; USER=root ; COMMAND=/usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR
Compare the logged COMMAND to your sudoers configuration. Check thoroughly and fix the sudoers file, so that you have a sudo entry that matches the command. As a temporary measure, you can ensure that the sudo entry as such works by simplifying it with a wildcard, to exclude typos in the command parameters as the cause (a stricter example entry is sketched after this step):
[root]# cat /etc/sudoers.d/20-saphana
Defaults:<sid>adm !requiretty
<sid>adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute *
- Replace <sid> with your lower-case HANA SID.
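Once the wildcard entry works, narrow it down again. The following is only a sketch of stricter entries, assuming the SID rh1 and the site names DC1 and DC2 from the logged command above; the authoritative entries are described in the configuration document referenced below:
rh1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC1 -v SOK -t crm_config -s SAPHanaSR
rh1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC1 -v SFAIL -t crm_config -s SAPHanaSR
rh1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v SOK -t crm_config -s SAPHanaSR
rh1adm ALL=(ALL) NOPASSWD: /usr/sbin/crm_attribute -n hana_rh1_site_srHook_DC2 -v SFAIL -t crm_config -s SAPHanaSR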
Verify that the command path is correct:
[root]# ls /usr/sbin/crm_attribute
/usr/sbin/crm_attribute
Fix the sudo configuration. For more information, see Configuring the HanaSR HA/DR provider for the srConnectionChanged() hook method.
Repeat any fixing steps on all nodes. The sudo configuration must be identical on all instances.
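To confirm on each node that a matching entry is in place, you can list the sudo privileges of the <sid>adm user. This sketch assumes the SID rh1 from the examples above:
[root]# sudo -l -U rh1adm
The output should include the crm_attribute entry from your sudoers file.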
10.2. The HANA instance does not start after hook changes
You recently made changes to an HA/DR provider section in global.ini, and now the HANA instance no longer starts.
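You can confirm the state of the instance before and after any fix, for example with sapcontrol. This is only a quick check; the instance number 02 is taken from the HDB02 examples in this chapter:
rh1adm $ sapcontrol -nr 02 -function GetProcessList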
Procedure
Go to the HANA trace logs directory, as the <sid>adm user:
rh1adm $ cdtrace
Check for errors related to the HA/DR providers in the HANA nameserver process alert log:
rh1adm $ grep ha_dr_provider nameserver_alert_*.trc
...
ha_dr_provider PythonProxyImpl.cpp(00145) : import of hanasr failed: No module named 'hanasr'
...
ha_dr_provider HADRProviderManager.cpp(00100) : could not load HA/DR Provider 'hanasr' from /usr/share/sap-hana-ha/
Identify the root cause, for example a misspelled HA/DR provider name or a wrong path. Check the path and the hook script name. In this example, the HA/DR provider name hanasr does not match the hook script name HanaSR:
rh1adm $ ls /usr/share/sap-hana-ha/
ChkSrv.py  HanaSR.py  samples
Correct the HanaSR HA/DR provider configuration:
[ha_dr_provider_hanasr]
provider = HanaSR
path = /usr/share/sap-hana-ha/
execution_order = 1
- provider must match the name of the Python hook script. It is case-sensitive and is specified without the .py file suffix.
- path must be the path in which the hook script is stored.
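After correcting the configuration, a minimal check is to start the instance again and repeat the trace search from the beginning of this procedure; the import error should no longer appear. This assumes the same instance and paths as in the examples above:
rh1adm $ HDB start
rh1adm $ cdtrace
rh1adm $ grep ha_dr_provider nameserver_alert_*.trc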
10.3. A cluster node is reported as offline during maintenance
When maintenance-mode is set for the cluster, for example during a HANA update, the cluster still notices issues between the nodes, but does not trigger recovery actions yet.
If you encounter such a situation, you must first fix the cause of the issue before you lift the maintenance mode.
Example: the corosync communication between the nodes is blocked in a 2-node cluster
Both nodes report the other node as offline. If the maintenance mode is removed in this situation, the cluster tries to recover by fencing one node. This can have a severe impact on your ongoing HANA maintenance activity.
...
*** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services
Node List:
* Node hana1 (1): online, feature set 3.19.0
* Node hana2 (2): UNCLEAN (offline)
Full List of Resources:
* Clone Set: cln_SAPHanaTop_RH1_HDB02 [rsc_SAPHanaTop_RH1_HDB02] (maintenance):
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started hana2 (UNCLEAN, maintenance)
* rsc_SAPHanaTop_RH1_HDB02 (ocf:heartbeat:SAPHanaTopology): Started hana1 (maintenance)
* Clone Set: cln_SAPHanaCon_RH1_HDB02 [rsc_SAPHanaCon_RH1_HDB02] (promotable, maintenance):
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Unpromoted hana2 (UNCLEAN, maintenance)
* rsc_SAPHanaCon_RH1_HDB02 (ocf:heartbeat:SAPHanaController): Promoted hana1 (maintenance)
* Clone Set: cln_SAPHanaFil_RH1_HDB02 [rsc_SAPHanaFil_RH1_HDB02] (maintenance):
* rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started hana2 (UNCLEAN, maintenance)
* rsc_SAPHanaFil_RH1_HDB02 (ocf:heartbeat:SAPHanaFilesystem): Started hana1 (maintenance)
* rsc_vip_RH1_HDB02_primary (ocf:heartbeat:IPaddr2): Started hana1 (maintenance)
* rsc_vip_RH1_HDB02_readonly (ocf:heartbeat:IPaddr2): Started hana2 (UNCLEAN, maintenance)
...
Identify the root cause of the issue, for example:
- Planned network maintenance on the cluster communication connection in parallel to your HANA maintenance.
- Unplanned outage of network connections due to network device failures or misconfiguration at the operating system or network level.
- Firewall configuration blocking cluster communication ports (see the checks sketched after this list).
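To narrow down the last two causes, you can check the corosync link status and the local firewall configuration on both nodes. This is only a sketch using standard RHEL tooling; the high-availability firewalld service is relevant only if firewalld is in use:
[root]# corosync-cfgtool -s
[root]# firewall-cmd --list-services
If corosync reports its links as faulty or not connected, or if the high-availability service is missing from the firewall configuration, cluster communication between the nodes is blocked.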
Fix any issue to prevent the cluster from taking recovery measures when the cluster maintenance is removed.
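Once both nodes report each other as online again, for example in the pcs status output, you can lift the maintenance mode with the standard pcs property command:
[root]# pcs status
[root]# pcs property set maintenance-mode=false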