Chapter 19. Improving cluster stability in high latency environments using worker latency profiles
To improve cluster stability in high latency environments, apply worker latency profiles. These profiles adjust Kubelet timing parameters to ensure that nodes remain healthy and responsive despite network delays.
If the cluster administrator has performed latency tests for platform verification, those tests can reveal the need to adjust how the cluster operates to ensure stability in cases of high latency.
The cluster administrator needs to change only one parameter, recorded in a file, and that parameter in turn controls four other parameters that affect how supervisory processes read status and interpret the health of the cluster. Changing only the one parameter keeps cluster tuning easy and supportable.
The Kubelet process provides the starting point for monitoring cluster health. The Kubelet sets status values for all nodes in the OpenShift Container Platform cluster. The Kubernetes Controller Manager (kube controller) reads the status values every 10 seconds, by default. If the kube controller cannot read a node status value, it loses contact with that node after a configured period. The default behavior is:
- The node controller on the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition `Unknown`.
- In response, the scheduler stops scheduling pods to that node.
- The Node Lifecycle Controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules any pods on the node for eviction after five minutes, by default.
This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager might not receive an update from a healthy node due to network latency. Pods would then be evicted from the node even though the node is healthy.
To avoid this problem, you can use worker latency profiles to adjust how often the Kubelet reports status and how long the Kubernetes Controller Manager waits for those status updates before taking action. These adjustments help to ensure that your cluster runs properly if network latency between the control plane and the worker nodes is not optimal.
The worker latency profiles are three predefined sets of parameters with carefully tuned values that control the reaction of the cluster to increased latency. You do not need to find the best values experimentally.
You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.
19.1. Understanding worker latency profiles
Review the following information to learn about worker latency profiles, which allow you to control the reaction of the cluster to latency issues without having to determine the best values manually.
Worker latency profiles are predefined sets of carefully tuned parameters. The four parameters that implement these values are `node-status-update-frequency`, `node-monitor-grace-period`, `default-not-ready-toleration-seconds`, and `default-unreachable-toleration-seconds`.
Setting these parameters manually is not supported. Incorrect parameter settings adversely affect cluster stability.
All worker latency profiles configure the following parameters:
- `node-status-update-frequency`: Specifies how often the kubelet posts node status to the API server.
- `node-monitor-grace-period`: Specifies the amount of time in seconds that the Kubernetes Controller Manager waits for an update from a kubelet before marking the node unhealthy and adding the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint to the node.
- `default-not-ready-toleration-seconds`: Specifies the amount of time in seconds after marking a node unhealthy that the Kube API Server Operator waits before evicting pods from that node.
- `default-unreachable-toleration-seconds`: Specifies the amount of time in seconds after marking a node unreachable that the Kube API Server Operator waits before evicting pods from that node.
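On the pod side, the two toleration parameters surface as default `NoExecute` tolerations that Kubernetes adds to pods that do not define their own. As a sketch (the 300-second values assume the `Default` profile described later in this section), such tolerations look similar to the following:

```yaml
# Sketch: default pod tolerations corresponding to the two
# toleration-seconds parameters, assuming the Default profile (300s each).
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
```

A pod that sets its own `tolerationSeconds` for these keys overrides the profile-supplied defaults.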
The following Operators monitor the changes to the worker latency profiles and respond accordingly:
- The Machine Config Operator (MCO) updates the `node-status-update-frequency` parameter on the compute nodes.
- The Kubernetes Controller Manager updates the `node-monitor-grace-period` parameter on the control plane nodes.
- The Kubernetes API Server Operator updates the `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` parameters on the control plane nodes.
Although the default configuration works in most cases, OpenShift Container Platform offers two other worker latency profiles for situations where the network is experiencing higher latency than usual. The three worker latency profiles are described in the following sections:
- Default worker latency profile

  With the `Default` profile, each kubelet updates its status every 10 seconds (`node-status-update-frequency`). The Kube Controller Manager checks the status of the kubelet every 5 seconds.

  The Kubernetes Controller Manager waits 40 seconds (`node-monitor-grace-period`) for a status update from the kubelet before considering the kubelet unhealthy. If no status is made available to the Kubernetes Controller Manager, it marks the node with the `node.kubernetes.io/not-ready` or `node.kubernetes.io/unreachable` taint and evicts the pods on that node.

  If a pod is on a node that has the `NoExecute` taint, the pod runs according to `tolerationSeconds`. If the node has no taint, the pod is evicted in 300 seconds (the `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` settings of the Kube API Server).

  | Profile | Component | Parameter | Value |
  | --- | --- | --- | --- |
  | `Default` | kubelet | `node-status-update-frequency` | 10s |
  | | Kubernetes Controller Manager | `node-monitor-grace-period` | 40s |
  | | Kubernetes API Server Operator | `default-not-ready-toleration-seconds` | 300s |
  | | Kubernetes API Server Operator | `default-unreachable-toleration-seconds` | 300s |
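  These values compose into a worst-case reaction time. As a back-of-the-envelope sketch (plain shell arithmetic, not an OpenShift command), the time from a node going silent to pod eviction under the `Default` profile is roughly the grace period plus the toleration period:

  ```shell
  # Back-of-the-envelope: seconds from a node going silent to pod eviction
  # under the Default profile (values taken from the Default profile table).
  node_monitor_grace_period=40        # controller manager waits this long
  default_toleration_seconds=300      # then pods tolerate the taint this long
  worst_case=$((node_monitor_grace_period + default_toleration_seconds))
  echo "worst case: ${worst_case}s"
  ```

  This prints `worst case: 340s`, which is why a brief latency spike longer than the grace period can trigger evictions under the `Default` profile.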
- Medium worker latency profile

  Use the `MediumUpdateAverageReaction` profile if the network latency is slightly higher than usual.

  The `MediumUpdateAverageReaction` profile reduces the frequency of kubelet updates to 20 seconds and changes the period that the Kubernetes Controller Manager waits for those updates to 2 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter, the eviction waits for the period specified by that parameter.

  The Kubernetes Controller Manager waits for 2 minutes before it considers a node unhealthy. One minute later, the eviction process starts.

  | Profile | Component | Parameter | Value |
  | --- | --- | --- | --- |
  | `MediumUpdateAverageReaction` | kubelet | `node-status-update-frequency` | 20s |
  | | Kubernetes Controller Manager | `node-monitor-grace-period` | 2m |
  | | Kubernetes API Server Operator | `default-not-ready-toleration-seconds` | 60s |
  | | Kubernetes API Server Operator | `default-unreachable-toleration-seconds` | 60s |
- Low worker latency profile

  Use the `LowUpdateSlowReaction` profile if the network latency is extremely high.

  The `LowUpdateSlowReaction` profile reduces the frequency of kubelet updates to 1 minute and changes the period that the Kubernetes Controller Manager waits for those updates to 5 minutes. The pod eviction period for a pod on that node is reduced to 60 seconds. If the pod has the `tolerationSeconds` parameter, the eviction waits for the period specified by that parameter.

  The Kubernetes Controller Manager waits for 5 minutes before it considers a node unhealthy. One minute later, the eviction process starts.

  | Profile | Component | Parameter | Value |
  | --- | --- | --- | --- |
  | `LowUpdateSlowReaction` | kubelet | `node-status-update-frequency` | 1m |
  | | Kubernetes Controller Manager | `node-monitor-grace-period` | 5m |
  | | Kubernetes API Server Operator | `default-not-ready-toleration-seconds` | 60s |
  | | Kubernetes API Server Operator | `default-unreachable-toleration-seconds` | 60s |
The latency profiles do not support custom machine config pools, only the default worker machine config pools.
19.2. Implementing worker latency profiles at cluster creation
To ensure cluster stability in high latency environments, implement worker latency profiles during cluster creation.
To edit the configuration of the installation program, first use the `openshift-install create manifests` command to create the default node manifest and other manifest YAML files. This file structure must exist before you can add `workerLatencyProfile`. The platform on which you are installing might have varying requirements. Refer to the Installing section of the documentation for your specific platform.
Procedure
- Create the manifests that are needed to build the cluster by using a folder name appropriate for your installation.
- Create a YAML file to define `config.node`. The file must be in the `manifests` directory.
- When defining `workerLatencyProfile` in the manifest for the first time, specify any of the profiles at cluster creation time: `Default`, `MediumUpdateAverageReaction`, or `LowUpdateSlowReaction`.
Verification
- View the manifest files by running the following command. The output of the command should show the creation of the manifest file containing the `spec.workerLatencyProfile` value:

  ```shell
  $ openshift-install create manifests --dir=<cluster_install_dir>
  ```

  `<cluster_install_dir>`: Specifies the directory where you installed your cluster.

- Edit the manifest and add the value. The following example command uses the `vi` editor to open an example manifest file with the `Default` `workerLatencyProfile` value added:

  ```shell
  $ vi <cluster_install_dir>/manifests/config-node-default-profile.yaml
  ```

  `<cluster_install_dir>`: Specifies the directory where you installed your cluster.
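The edited `config.node` manifest is a standard `Node` config object. As a sketch (the file name is from the example above; `Default` is one of the three supported profile names), it might look like the following:

```yaml
# Sketch of a config.node manifest with the Default worker latency profile.
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  workerLatencyProfile: "Default"
```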
19.3. Using and changing worker latency profiles
You can change a worker latency profile to deal with network latency at any time by editing the node.config object. With this configuration, you can ensure that your cluster runs properly if network latency between the control plane and the compute nodes fluctuates.
You must move one worker latency profile at a time. For example, you cannot move directly from the Default profile to the LowUpdateSlowReaction worker latency profile. You must move from the Default worker latency profile to the MediumUpdateAverageReaction profile and then to the LowUpdateSlowReaction profile. Similarly, when returning to the Default profile, you must move from the low profile to the medium profile first, then to Default.
You can also configure worker latency profiles upon installing an OpenShift Container Platform cluster.
Procedure
- Move to the medium worker latency profile:

  1. Edit the `node.config` object:

     ```shell
     $ oc edit nodes.config/cluster
     ```

  2. Add `spec.workerLatencyProfile: MediumUpdateAverageReaction`, which specifies that the medium worker latency policy should be used.

  Scheduling on each compute node is disabled as the change is being applied.
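The edited `node.config` object might look like the following sketch (the fields shown are the standard `Node` config object fields; only `spec.workerLatencyProfile` is the value you add):

```yaml
# Sketch: node.config object after adding the medium worker latency profile.
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  workerLatencyProfile: MediumUpdateAverageReaction
```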
- Optional: Move to the low worker latency profile:

  1. Edit the `node.config` object:

     ```shell
     $ oc edit nodes.config/cluster
     ```

  2. Change the `spec.workerLatencyProfile` value to `LowUpdateSlowReaction`, which specifies that the low worker latency policy should be used.

  Scheduling on each compute node is disabled as the change is being applied.
Verification
When all nodes return to the `Ready` condition, you can use the following command to look in the Kubernetes Controller Manager to ensure that the profile was applied:

```shell
$ oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
```

A `status.message` entry of `all static pod revision(s) have updated latency profile` in the output indicates that the profile is applied and active.
To change the medium profile to default or change the default to medium, edit the `node.config` object and set the `spec.workerLatencyProfile` parameter to the appropriate value.
19.4. Displaying resulting values of worker latency profile
To verify the configuration of your compute nodes, display the resulting values of the worker latency profile configured for those nodes. This ensures that the Kubelet parameters are correctly adjusted for high latency environments and helps you confirm system stability.
The following procedure uses example commands to display the values in the worker latency profile configured for your node.
Procedure
- Check the `default-not-ready-toleration-seconds` and `default-unreachable-toleration-seconds` fields output by the Kube API Server:

  ```shell
  $ oc get KubeAPIServer -o yaml | grep -A 1 default-
  ```

  Example output:

  ```yaml
  default-not-ready-toleration-seconds:
  - "300"
  default-unreachable-toleration-seconds:
  - "300"
  ```

- Check the values of the `node-monitor-grace-period` field from the Kube Controller Manager:

  ```shell
  $ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
  ```

  Example output:

  ```yaml
  node-monitor-grace-period:
  - 40s
  ```

- Check the `nodeStatusUpdateFrequency` value from the kubelet by entering the following commands. Set the `/host` directory as the root directory within the debug shell. By changing the root directory to `/host`, you can run binaries contained in the executable paths of the host:

  ```shell
  $ oc debug node/<compute_node_name>
  ```

  ```shell
  $ chroot /host
  ```

  ```shell
  # cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency
  ```

  Example output:

  ```json
  "nodeStatusUpdateFrequency": "10s"
  ```

These outputs validate the set of timing values for the worker latency profile.
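If you want to see what the final `grep` returns without a cluster at hand, the following self-contained sketch reproduces it against a hypothetical sample of the kubelet configuration (the file content and `/tmp` path here are assumptions standing in for the real `/etc/kubernetes/kubelet.conf` on the node):

```shell
# Hypothetical sample standing in for /etc/kubernetes/kubelet.conf on a node.
cat <<'EOF' > /tmp/kubelet-sample.conf
{
  "kind": "KubeletConfiguration",
  "nodeStatusUpdateFrequency": "10s"
}
EOF
# Same grep as in the procedure, run against the sample file.
grep nodeStatusUpdateFrequency /tmp/kubelet-sample.conf
```

The grep prints the single matching line, which is the form of output you should expect from the real file on a node running the `Default` profile.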