Chapter 3. Using node disruption policies to minimize disruption from machine config changes
By default, when you make certain changes to the fields in a MachineConfig
object, the Machine Config Operator (MCO) drains and reboots the nodes associated with that machine config. However, you can create a node disruption policy that defines a set of changes to some Ignition config objects that would require little or no disruption to your workloads.
A node disruption policy allows you to define the configuration changes that cause a disruption to your cluster, and which changes do not. This allows you to reduce node downtime when making small machine configuration changes in your cluster. To configure the policy, you modify the MachineConfiguration
object, which is in the openshift-machine-config-operator
namespace. See the example node disruption policies in the MachineConfiguration
objects that follow.
There are machine configuration changes that always require a reboot, regardless of any node disruption policies. For more information, see About the Machine Config Operator.
After you create the node disruption policy, the MCO validates the policy to search for potential issues in the file, such as problems with formatting. The MCO then merges the policy with the cluster defaults and populates the status.nodeDisruptionPolicyStatus
fields in the machine config with the actions to be performed upon future changes to the machine config. The configurations in your policy always overwrite the cluster defaults.
The MCO does not validate whether a change can be successfully applied by your node disruption policy. Therefore, you are responsible to ensure the accuracy of your node disruption policies.
For example, you can configure a node disruption policy so that sudo configurations do not require a node drain and reboot. Or, you can configure your cluster so that updates to sshd
are applied with only a reload of that one service.
The node disruption policy feature is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can control the behavior of the MCO when making the changes to the following Ignition configuration objects:
-
configuration files: You add to or update the files in the
/var
or/etc
directory. - systemd units: You create and set the status of a systemd service or modify an existing systemd service.
-
users and groups: You change SSH keys in the
passwd
section postinstallation. -
ICSP, ITMS, IDMS objects: You can remove mirroring rules from an
ImageContentSourcePolicy
(ICSP),ImageTagMirrorSet
(ITMS), andImageDigestMirrorSet
(IDMS) object.
When you make any of these changes, the node disruption policy determines which of the following actions are required when the MCO implements the changes:
- Reboot: The MCO drains and reboots the nodes. This is the default behavior.
- None: The MCO does not drain or reboot the nodes. The MCO applies the changes with no further action.
- Drain: The MCO cordons and drains the nodes of their workloads. The workloads restart with the new configurations.
- Reload: For services, the MCO reloads the specified services without restarting the service.
- Restart: For services, the MCO fully restarts the specified services.
- DaemonReload: The MCO reloads the systemd manager configuration.
- Special: This is an internal MCO-only action and cannot be set by the user.
-
The
Reboot
andNone
actions cannot be used with any other actions, as theReboot
andNone
actions override the others. - Actions are applied in the order that they are set in the node disruption policy list.
- If you make other machine config changes that do require a reboot or other disruption to the nodes, that reboot supercedes the node disruption policy actions.
3.1. Example node disruption policies
The following example MachineConfiguration
objects contain a node disruption policy.
A MachineConfiguration
object and a MachineConfig
object are different objects. A MachineConfiguration
object is a singleton object in the MCO namespace that contains configuration parameters for the MCO operator. A MachineConfig
object defines changes that are applied to a machine config pool.
The following example MachineConfiguration
object shows no user defined policies. The default node disruption policy values are shown in the status
stanza.
Default node disruption policy
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration name: cluster spec: logLevel: Normal managementState: Managed operatorLogLevel: Normal status: nodeDisruptionPolicyStatus: clusterPolicies: files: - actions: - type: None path: /etc/mco/internal-registry-pull-secret.json - actions: - type: None path: /var/lib/kubelet/config.json - actions: - reload: serviceName: crio.service type: Reload path: /etc/machine-config-daemon/no-reboot/containers-gpg.pub - actions: - reload: serviceName: crio.service type: Reload path: /etc/containers/policy.json - actions: - type: Special path: /etc/containers/registries.conf sshkey: actions: - type: None readyReplicas: 0
In the following example, when changes are made to the SSH keys, the MCO drains the cluster nodes, reloads the crio.service
, reloads the systemd configuration, and restarts the crio-service
.
Example node disruption policy for an SSH key change
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: name: cluster namespace: openshift-machine-config-operator # ... spec: nodeDisruptionPolicy: sshkey: actions: - type: Drain - reload: serviceName: crio.service type: Reload - type: DaemonReload - restart: serviceName: crio.service type: Restart # ...
In the following example, when changes are made to the /etc/chrony.conf
file, the MCO reloads the chronyd.service
on the cluster nodes.
Example node disruption policy for a configuration file change
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: name: cluster namespace: openshift-machine-config-operator # ... spec: nodeDisruptionPolicy: files: - actions: - reload: serviceName: chronyd.service type: Reload path: /etc/chrony.conf
In the following example, when changes are made to the auditd.service
systemd unit, the MCO drains the cluster nodes, reloads the crio.service
, reloads the systemd manager configuration, and restarts the crio.service
.
Example node disruption policy for a systemd unit change
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: name: cluster namespace: openshift-machine-config-operator # ... spec: nodeDisruptionPolicy: units: - name: auditd.service actions: - type: Drain - type: Reload reload: serviceName: crio.service - type: DaemonReload - type: Restart restart: serviceName: crio.service
In the following example, when changes are made to the registries.conf
file, such as by editing an ImageContentSourcePolicy
(ICSP) object, the MCO does not drain or reboot the nodes and applies the changes with no further action.
Example node disruption policy for a registries.conf file change
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: name: cluster namespace: openshift-machine-config-operator # ... spec: nodeDisruptionPolicy: files: - actions: - type: None path: /etc/containers/registries.conf
3.2. Configuring node restart behaviors upon machine config changes
You can create a node disruption policy to define the machine configuration changes that cause a disruption to your cluster, and which changes do not.
You can control how your nodes respond to changes in the files in the /var
or /etc
directory, the systemd units, the SSH keys, and the registries.conf
file.
When you make any of these changes, the node disruption policy determines which of the following actions are required when the MCO implements the changes:
- Reboot: The MCO drains and reboots the nodes. This is the default behavior.
- None: The MCO does not drain or reboot the nodes. The MCO applies the changes with no further action.
- Drain: The MCO cordons and drains the nodes of their workloads. The workloads restart with the new configurations.
- Reload: For services, the MCO reloads the specified services without restarting the service.
- Restart: For services, the MCO fully restarts the specified services.
- DaemonReload: The MCO reloads the systemd manager configuration.
- Special: This is an internal MCO-only action and cannot be set by the user.
-
The
Reboot
andNone
actions cannot be used with any other actions, as theReboot
andNone
actions override the others. - Actions are applied in the order that they are set in the node disruption policy list.
- If you make other machine config changes that do require a reboot or other disruption to the nodes, that reboot supercedes the node disruption policy actions.
Prerequisites
You have enabled the
TechPreviewNoUpgrade
feature set by using the feature gates. For more information, see "Enabling features using feature gates".WarningEnabling the
TechPreviewNoUpgrade
feature set on your cluster prevents minor version updates. TheTechPreviewNoUpgrade
feature set cannot be disabled. Do not enable this feature set on production clusters.
Procedure
Edit the
machineconfigurations.operator.openshift.io
object to define the node disruption policy:$ oc edit MachineConfiguration cluster -n openshift-machine-config-operator
Add a node disruption policy similar to the following:
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: name: cluster # ... spec: nodeDisruptionPolicy: 1 files: 2 - actions: 3 - reload: 4 serviceName: chronyd.service 5 type: Reload path: /etc/chrony.conf 6 sshkey: 7 actions: - type: Drain - reload: serviceName: crio.service type: Reload - type: DaemonReload - restart: serviceName: crio.service type: Restart units: 8 - actions: - type: Drain - reload: serviceName: crio.service type: Reload - type: DaemonReload - restart: serviceName: crio.service type: Restart name: test.service
- 1
- Specifies the node disruption policy.
- 2
- Specifies a list of machine config file definitions and actions to take to changes on those paths. This list supports a maximum of 50 entries.
- 3
- Specifies the series of actions to be executed upon changes to the specified files. Actions are applied in the order that they are set in this list. This list supports a maximum of 10 entries.
- 4
- Specifies that the listed service is to be reloaded upon changes to the specified files.
- 5
- Specifies the full name of the service to be acted upon.
- 6
- Specifies the location of a file that is managed by a machine config. The actions in the policy apply when changes are made to the file in
path
. - 7
- Specifies a list of service names and actions to take upon changes to the SSH keys in the cluster.
- 8
- Specifies a list of systemd unit names and actions to take upon changes to those units.
Verification
View the
MachineConfiguration
object file that you created:$ oc get MachineConfiguration/cluster -o yaml
Example output
apiVersion: operator.openshift.io/v1 kind: MachineConfiguration metadata: labels: machineconfiguration.openshift.io/role: worker name: cluster # ... status: nodeDisruptionPolicyStatus: 1 clusterPolicies: files: # ... - actions: - reload: serviceName: chronyd.service type: Reload path: /etc/chrony.conf sshkey: actions: - type: Drain - reload: serviceName: crio.service type: Reload - type: DaemonReload - restart: serviceName: crio.service type: Restart units: - actions: - type: Drain - reload: serviceName: crio.service type: Reload - type: DaemonReload - restart: serviceName: crio.service type: Restart name: test.se # ...
- 1
- Specifies the current cluster-validated policies.