Chapter 24. Clustering
Pacemaker correctly interprets systemd responses and systemd services are stopped in proper order at cluster shutdown
Previously, when a Pacemaker cluster was configured with systemd resources and the cluster was stopped, Pacemaker could mistakenly assume that a systemd service had stopped before it actually had. As a consequence, services could be stopped out of order, potentially leading to stop failures. With this update, Pacemaker now correctly interprets systemd responses, and systemd services are stopped in the proper order at cluster shutdown. (BZ#1286316)
Pacemaker now distinguishes transient failures from fatal failures when loading systemd units
Previously, Pacemaker treated all errors loading a systemd unit as fatal. As a consequence, Pacemaker would not start a systemd resource on a node where it could not load the systemd unit, even if the load failed due to transient conditions such as high CPU load. With this update, Pacemaker distinguishes transient failures from fatal failures when loading systemd units. Logs and cluster status now show more appropriate messages, and the resource can start on the node once the transient error clears. (BZ#1346726)
Pacemaker now removes node attributes from its memory when purging a node that has been removed from the cluster
Previously, Pacemaker's node attribute manager removed attribute values from its memory but not the attributes themselves when purging a node that had been removed from the cluster. As a result, if a new node was later added to the cluster with the same node ID, attributes that existed on the original node could not be set for the new node. With this update, Pacemaker purges the attributes themselves when removing a node, so a new node with the same ID no longer encounters problems setting attributes. (BZ#1338623)
Pacemaker now correctly determines expected results for resources that are in a group or depend on a clone
Previously, when restarting a service, Pacemaker's crm_resource tool (and thus the pcs resource restart command) could fail to properly determine when affected resources had successfully started. As a result, the command could fail to restart a resource that is a member of a group, or it could hang indefinitely if the restarted resource depended on a cloned resource that moved to another node. With this update, the command properly determines the expected results for resources that are in a group or depend on a clone. The desired service is restarted, and the command returns. (BZ#1337688)
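As a sketch of the affected operations, either of the following now completes and returns as expected even when the resource is a group member or depends on a clone; the resource name my-ip is illustrative, not from this chapter:

```shell
# Restart one member of a resource group; crm_resource now correctly
# detects when the restarted resource (and resources affected by it)
# have started again. "my-ip" is an example resource name.
pcs resource restart my-ip

# Equivalent low-level invocation using crm_resource directly:
crm_resource --restart --resource my-ip
```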
Fencing now occurs when DLM requires it, even when the cluster itself does not
Previously, DLM could require fencing due to quorum issues even when the cluster itself did not require fencing, but DLM was unable to initiate the fencing. As a consequence, DLM and DLM-based services could hang waiting for fencing that never happened. With this fix, the ocf:pacemaker:controld resource agent now checks whether DLM is in this state and requests fencing if so. Fencing now occurs in this situation, allowing DLM to recover. (BZ#1268313)
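For context, DLM is typically managed as a cloned ocf:pacemaker:controld resource running on every node, along the lines of the following sketch; the resource name dlm is an example:

```shell
# Run the DLM control daemon as a clone on all cluster nodes.
# With this fix, the agent also notices when DLM itself is waiting
# for fencing and requests it, even if the cluster did not require
# fencing on its own. "dlm" is an illustrative resource name.
pcs resource create dlm ocf:pacemaker:controld \
    op monitor interval=30s on-fail=fence \
    clone interleave=true ordered=true
```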
The DLM now detects and reports connection problems
Previously, the Distributed Lock Manager (DLM) used for cluster communications expected TCP/IP packet delivery and waited for responses indefinitely. As a consequence, if a DLM connection was lost, there was no notification of the problem. With this update, the DLM detects and reports when cluster communications are lost. As a result, DLM communication problems can be identified, and cluster nodes that become unresponsive can be restarted once the problems are resolved. (BZ#1267339)
High Availability instances created by non-admin users are now evacuated when a compute node is turned off
Previously, the fence_compute agent searched only for compute instances created by admin users. As a consequence, instances created by non-admin users were not evacuated when a compute node was turned off. This update makes sure that fence_compute searches for instances run as any user, and compute instances are now evacuated to new compute nodes as expected. (BZ#1313561)
Starting the nfsserver resource no longer fails
Previously, the nfs-idmapd service failed to start when the var-lib-nfs-rpc_pipefs.mount process was active, which it is by default. Consequently, starting the nfsserver resource failed. With this update, var-lib-nfs-rpc_pipefs.mount is stopped in this situation and no longer prevents nfs-idmapd from starting. As a result, nfsserver starts as expected. (BZ#1325453)
lrmd logs errors as expected and no longer crashes
Previously, Pacemaker's Local Resource Management Daemon (lrmd) used an invalid format string when logging certain rare systemd errors. As a consequence, lrmd could terminate unexpectedly with a segmentation fault. A patch has been applied to fix the format string. As a result, lrmd no longer crashes and logs these rare error messages as intended. (BZ#1284069)
stonithd now properly distinguishes attribute removals from device removals
Prior to this update, if a user deleted an attribute from a fence device, Pacemaker's stonithd service sometimes mistakenly removed the entire device. Consequently, the cluster would no longer use the fence device. The underlying source code has been modified to fix this bug, and stonithd now properly distinguishes attribute removals from device removals. As a result, deleting a fence device attribute no longer removes the device itself. (BZ#1287315)
HealthCPU now correctly measures CPU usage
Previously, the ocf:pacemaker:HealthCPU resource parsed the output of the top command incorrectly on Red Hat Enterprise Linux 7. As a consequence, the HealthCPU resource did not work. With this update, the resource agent correctly parses the output of later versions of top. As a result, HealthCPU now correctly measures CPU usage. (BZ#1287868)
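For illustration, this agent is usually cloned across all nodes so each node periodically records its own CPU health; the resource name cpu-health is an example, not from this chapter:

```shell
# Run the HealthCPU agent on every node; it parses the idle figure
# from top output and records the node's CPU health as a transient
# node attribute. "cpu-health" is an illustrative resource name.
pcs resource create cpu-health ocf:pacemaker:HealthCPU clone
```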
Pacemaker now checks all collected files when stripping sensitive information
Pacemaker has the ability to strip sensitive information that matches a given pattern when submitting system information with bug reports, whether directly by Pacemaker's crm_report tool or indirectly via sosreport. However, Pacemaker previously checked only certain collected files, not log file extracts. Because of this, sensitive information could remain in log file extracts. With this fix, Pacemaker checks all collected files when stripping sensitive information, so no sensitive information is included in the collected data. (BZ#1219188)
The corosync memory footprint no longer increases on every node rejoin
Previously, when a node rejoined the cluster, some buffers in corosync were not freed, so memory consumption grew. With this fix, no memory is leaked, and the memory footprint no longer increases on every node rejoin. (BZ#1306349)
Corosync starts correctly when configured to use IPv4 and DNS is set to return both IPv4 and IPv6 addresses
Previously, when a pcs-generated corosync.conf file used hostnames instead of IP addresses with Internet Protocol version 4 (IPv4), and the DNS server was set to return both IPv4 and IPv6 addresses, the corosync utility failed to start. With this fix, if Corosync is configured to use IPv4, IPv4 is actually used. As a result, corosync starts as expected in the described circumstances. (BZ#1289169)
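A configuration of the kind described would look roughly like the following corosync.conf fragment; the cluster and host names are illustrative. With this fix, name resolution honors the configured ip_version even when DNS also returns IPv6 (AAAA) records for the listed hostnames:

```
totem {
    version: 2
    cluster_name: mycluster
    # Force IPv4; the hostnames below may also resolve to IPv6 addresses
    ip_version: ipv4
    transport: udpu
}

nodelist {
    node {
        ring0_addr: node1.example.com
        nodeid: 1
    }
    node {
        ring0_addr: node2.example.com
        nodeid: 2
    }
}
```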
The corosync-cmapctl utility correctly handles errors in the print_key() function
Previously, the corosync-cmapctl utility did not handle corosync errors in the print_key() function correctly. Consequently, corosync-cmapctl could enter an infinite loop if the corosync utility was killed. The provided fix makes sure all errors returned when corosync exits are handled correctly. As a result, corosync-cmapctl leaves the loop and displays a relevant error message in this scenario. (BZ#1336462)