Chapter 21. Resource monitoring operations
To ensure that resources remain healthy, you can add a monitoring operation to a resource’s definition. If you do not specify a monitoring operation for a resource, by default the pcs command will create a monitoring operation, with an interval that is determined by the resource agent. If the resource agent does not provide a default monitoring interval, the pcs command will create a monitoring operation with an interval of 60 seconds.
The following table summarizes the properties of a resource monitoring operation.
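Field | Description |
---|---|
id | Unique name for the action. The system assigns this when you configure an operation. |
name | The action to perform. Common values: monitor, start, stop. |
interval | If set to a nonzero value, a recurring operation is created that repeats at this frequency, in seconds. A nonzero value makes sense only when the action name is set to monitor.
If set to zero, which is the default value, this parameter allows you to provide values to be used for operations created by the cluster. For example, if the interval is set to zero, the name of the operation is set to start, and the timeout value is set to 40, then Pacemaker will use a timeout of 40 seconds when starting this resource. |
timeout | If the operation does not complete in the amount of time set by this parameter, abort the operation and consider it failed. The default value is the value of timeout if set with the pcs resource op defaults command, or 20 seconds if it is not set.
The timeout value is not a delay of any kind, nor does the cluster wait the entire timeout period if the operation returns before the timeout period has completed. |
on-fail | The action to take if this action ever fails. Allowed values:
* ignore - Pretend the resource did not fail
* block - Do not perform any further operations on the resource
* stop - Stop the resource and do not start it elsewhere
* restart - Stop the resource and start it again (possibly on a different node)
* fence - STONITH the node on which the resource failed
* standby - Move all resources away from the node on which the resource failed
* demote - Demote the resource without a full restart (valid only for promote actions and for recurring monitor actions on the promoted role)
The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to restart. |
enabled | If false, the operation is treated as if it does not exist. Allowed values: true, false. |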
21.1. Configuring resource monitoring operations
You can configure monitoring operations when you create a resource with the following command.
pcs resource create resource_id standard:provider:type|type [resource_options] [op operation_action operation_options [operation_type operation_options]...]
For example, the following command creates an IPaddr2 resource with a monitoring operation. The new resource is called VirtualIP, with an IP address of 192.168.0.99 and a netmask of 24 on eth2. A monitoring operation will be performed every 30 seconds.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2 op monitor interval=30s
Alternatively, you can add a monitoring operation to an existing resource with the following command.
pcs resource op add resource_id operation_action [operation_properties]
Use the following command to delete a configured resource operation.
pcs resource op remove resource_id operation_name operation_properties
You must specify the exact operation properties to properly remove an existing operation.
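Because the properties must match exactly, one way to avoid typing them by hand is to derive the removal command from the operation line that pcs resource config prints. The following is a minimal sketch under that assumption; build_op_remove is a hypothetical helper, not part of pcs, and it only prints the command so that you can review it before running it.

```shell
# Hypothetical helper: build a removal command from one operation line
# in the format shown in this chapter's `pcs resource config` examples.
build_op_remove() {
    resource_id=$1
    op_line=$2
    # Drop the trailing "(operation-id)" so only the action and its
    # properties remain.
    props=$(printf '%s\n' "$op_line" | sed 's/ *([^)]*) *$//')
    printf 'pcs resource op remove %s %s\n' "$resource_id" "$props"
}

# Prints the command; it does not execute it.
build_op_remove VirtualIP "monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)"
```

The helper assumes that each operation is passed as a single line with the operation ID in trailing parentheses. If your version of pcs prints operations in a different layout, adjust the sed expression accordingly.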
To change the value of an operation option, you can update the resource. For example, you can create a VirtualIP resource with the following command.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2
By default, this command creates these operations.
Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
stop interval=0s timeout=20s (VirtualIP-stop-timeout-20s)
monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
To change the timeout value of the stop operation, execute the following command. The second command displays the updated resource configuration.
# pcs resource update VirtualIP op stop interval=0s timeout=40s
# pcs resource config VirtualIP
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
monitor interval=10s timeout=20s (VirtualIP-monitor-interval-10s)
stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)
21.2. Configuring global resource operation defaults
As of Red Hat Enterprise Linux 8.3, you can change the default value of a resource operation for all resources with the pcs resource op defaults update command.
The following command sets a global default timeout value of 240 seconds for all monitoring operations.
# pcs resource op defaults update timeout=240s
The original pcs resource op defaults name=value command, which set resource operation defaults for all resources in previous releases, remains supported unless there is more than one set of defaults configured. However, pcs resource op defaults update is now the preferred version of the command.
21.2.1. Overriding resource-specific operation values
Note that a cluster resource will use the global default only when the option is not specified in the cluster resource definition. By default, resource agents define the timeout option for all operations. For the global operation timeout value to be honored, you must either create the cluster resource without explicitly specifying the timeout option or remove the timeout option by updating the cluster resource, as in the following command.
# pcs resource update VirtualIP op monitor interval=10s
For example, after setting a global default timeout value of 240 seconds for all monitoring operations and updating the cluster resource VirtualIP to remove the timeout value for the monitor operation, the resource VirtualIP will have timeout values for the start, stop, and monitor operations of 20s, 40s, and 240s, respectively. The global default timeout value is applied here only to the monitor operation, where the default timeout option was removed by the previous command.
# pcs resource config VirtualIP
Resource: VirtualIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.0.99 cidr_netmask=24 nic=eth2
Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
monitor interval=10s (VirtualIP-monitor-interval-10s)
stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)
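The fallback rule described above (an explicit timeout on the operation wins, otherwise the global default applies) can be sketched as a small shell function. This is an illustration only: effective_timeout is a hypothetical helper, it models just the simple case shown here, and it ignores rule-based sets of defaults. It assumes the operation lines have the layout shown in the output above.

```shell
# Hypothetical helper: report the timeout that applies to one operation.
# Reads operation lines on stdin; $1 is the action, $2 the global default.
effective_timeout() {
    awk -v action="$1" -v fallback="$2" '
        {
            # Skip the "Operations:" label when it shares a line
            # with the first operation.
            first = ($1 == "Operations:") ? 2 : 1
            if ($first != action) next
            found = fallback
            # An explicit timeout=... property overrides the default.
            for (i = first + 1; i <= NF; i++)
                if ($i ~ /^timeout=/) found = substr($i, 9)
            print found
            exit
        }'
}

# Operation lines copied from the output above.
config='Operations: start interval=0s timeout=20s (VirtualIP-start-timeout-20s)
monitor interval=10s (VirtualIP-monitor-interval-10s)
stop interval=0s timeout=40s (VirtualIP-name-stop-interval-0s-timeout-40s)'

for action in start stop monitor; do
    printf '%s: %s\n' "$action" \
        "$(printf '%s\n' "$config" | effective_timeout "$action" 240s)"
done
```

With a global default of 240s, the loop reports 20s for start, 40s for stop, and 240s for monitor, which matches the values stated above.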
21.2.2. Changing the default value of a resource operation for sets of resources
As of Red Hat Enterprise Linux 8.3, you can create multiple sets of resource operation defaults with the pcs resource op defaults set create command, which allows you to specify a rule that contains resource and operation expressions. In RHEL 8.3, only resource and operation expressions, including and, or, and parentheses, are allowed in rules that you specify with this command. In RHEL 8.4 and later, all of the other rule expressions supported by Pacemaker are allowed as well.
With this command, you can configure a default resource operation value for all resources of a particular type. For example, it is now possible to configure implicit podman resources created by Pacemaker when bundles are in use.
The following command sets a default timeout value of 90s for all operations for all podman resources. In this example, ::podman means a resource of any class, any provider, of type podman.
The id option, which names the set of resource operation defaults, is not mandatory. If you do not set this option, pcs will generate an ID automatically. Setting this value allows you to provide a more descriptive name.
# pcs resource op defaults set create id=podman-timeout meta timeout=90s rule resource ::podman
The following command sets a default timeout value of 120s for the stop operation for all resources.
# pcs resource op defaults set create id=stop-timeout meta timeout=120s rule op stop
It is possible to set the default timeout value for a specific operation for all resources of a particular type. The following example sets a default timeout value of 120s for the stop operation for all podman resources.
# pcs resource op defaults set create id=podman-stop-timeout meta timeout=120s rule resource ::podman and op stop
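To make the class:provider:type notation concrete, the following sketch models how a resource expression such as ::podman can be read: an empty field matches anything, and a non-empty field must match exactly. The function resource_expr_matches is a hypothetical illustration of that reading, not the way pcs or Pacemaker evaluates rules.

```shell
# Hypothetical helper: does a class:provider:type pattern match a resource?
# Empty fields in the pattern act as wildcards, as in "::podman".
resource_expr_matches() {
    pattern=$1 class=$2 provider=$3 rtype=$4
    # Split the pattern on ":" into its three fields.
    IFS=: read -r p_class p_provider p_type <<EOF
$pattern
EOF
    [ -z "$p_class" ] || [ "$p_class" = "$class" ] || return 1
    [ -z "$p_provider" ] || [ "$p_provider" = "$provider" ] || return 1
    [ -z "$p_type" ] || [ "$p_type" = "$rtype" ] || return 1
    return 0
}

# Any class, any provider, type podman.
if resource_expr_matches "::podman" ocf heartbeat podman; then
    echo "ocf:heartbeat:podman matches ::podman"
fi
if ! resource_expr_matches "::podman" ocf heartbeat IPaddr2; then
    echo "ocf:heartbeat:IPaddr2 does not match ::podman"
fi
```

Combining a resource expression with an operation expression, as in the last command above, narrows the set of defaults further: both conditions must hold for the default to apply.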
21.2.3. Displaying currently configured resource operation default values
The pcs resource op defaults command displays a list of currently configured default values for resource operations, including any rules you specified.
The following command displays the default operation values for a cluster which has been configured with a default timeout value of 90s for all operations for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-timeout.
# pcs resource op defaults
Meta Attrs: podman-timeout
timeout=90s
Rule: boolean-op=and score=INFINITY
Expression: resource ::podman
The following command displays the default operation values for a cluster which has been configured with a default timeout value of 120s for the stop operation for all podman resources, and for which an ID for the set of resource operation defaults has been set as podman-stop-timeout.
# pcs resource op defaults
Meta Attrs: podman-stop-timeout
timeout=120s
Rule: boolean-op=and score=INFINITY
Expression: resource ::podman
Expression: op stop
21.3. Configuring multiple monitoring operations
You can configure a single resource with as many monitor operations as a resource agent supports. In this way you can do a superficial health check every minute and progressively more intense ones at longer intervals.
When configuring multiple monitor operations, you must ensure that no two operations are performed at the same interval.
To configure additional monitoring operations for a resource that supports more in-depth checks at different levels, you add an OCF_CHECK_LEVEL=n option.
For example, if you configure the following IPaddr2 resource, by default this creates a monitoring operation with an interval of 10 seconds and a timeout value of 20 seconds.
# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.0.99 cidr_netmask=24 nic=eth2
If the Virtual IP supports a different check with a depth of 10, the following command causes Pacemaker to perform the more advanced monitoring check every 60 seconds in addition to the normal Virtual IP check every 10 seconds. (As noted, you should not configure the additional monitoring operation with a 10-second interval as well.)
# pcs resource op add VirtualIP monitor interval=60s OCF_CHECK_LEVEL=10
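Before adding another monitor operation, you can check a planned set of operations for the interval clash described above. The following is a minimal sketch; duplicate_monitor_intervals is a hypothetical helper that compares interval values as literal strings, so it would not detect that 60s and 1min are the same length of time.

```shell
# Hypothetical check: print any interval used by more than one
# monitor operation. Reads one operation per line on stdin.
duplicate_monitor_intervals() {
    awk '$1 == "monitor" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^interval=/) count[$i]++
         }
         END { for (k in count) if (count[k] > 1) print k }'
}

# The two monitor operations from the example above use different
# intervals, so this prints nothing.
printf '%s\n' \
    'monitor interval=10s timeout=20s' \
    'monitor interval=60s OCF_CHECK_LEVEL=10' |
    duplicate_monitor_intervals
```

If the second operation had been written with interval=10s, the function would print interval=10s to flag the conflict.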