1.4.2.2. Machine Config Operator node updates
In the second phase of the update, the Machine Config Operator (MCO) applies a new machine configuration to each control plane and compute node.
During this process, the MCO performs the following sequential actions on each node of the cluster:
- Cordon and drain all the nodes
- Update the operating system (OS)
- Reboot the nodes
- Uncordon all nodes and schedule workloads on the node
When a node is cordoned, workloads cannot be scheduled to it.
The time to complete this process depends on several factors including the node and infrastructure configuration. This process might take 5 or more minutes to complete per node.
In addition to MCO, you should consider the impact of the following parameters:
- The control plane node update duration is predictable and oftentimes shorter than compute nodes, because the control plane workloads are tuned for graceful updates and quick drains.
-
You can update the compute nodes in parallel by setting the
maxUnavailablefield to greater than1in the Machine Config Pool (MCP). The MCO cordons the number of nodes specified inmaxUnavailableand marks them unavailable for update. -
When you increase
maxUnavailableon the MCP, it can help the pool to update more quickly. However, ifmaxUnavailableis set too high, and several nodes are cordoned simultaneously, the pod disruption budget (PDB) guarded workloads could fail to drain because a schedulable node cannot be found to run the replicas. If you increasemaxUnavailablefor the MCP, ensure that you still have sufficient schedulable nodes to allow PDB guarded workloads to drain. Before you begin the update, you must ensure that all the nodes are available. Any unavailable nodes can significantly impact the update duration because the node unavailability affects the
maxUnavailableand pod disruption budgets.To check the status of nodes from the terminal, run the following command:
$ oc get nodeExample Output
NAME STATUS ROLES AGE VERSION ip-10-0-137-31.us-east-2.compute.internal Ready,SchedulingDisabled worker 12d v1.23.5+3afdacb ip-10-0-151-208.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb ip-10-0-176-138.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb ip-10-0-183-194.us-east-2.compute.internal Ready worker 12d v1.23.5+3afdacb ip-10-0-204-102.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb ip-10-0-207-224.us-east-2.compute.internal Ready worker 12d v1.23.5+3afdacbIf the status of the node is
NotReadyorSchedulingDisabled, then the node is not available and this impacts the update duration.You can also check the status of nodes from the Administrator perspective in the web console by expanding Compute
Nodes.