Chapter 9. Remote worker nodes on the network edge
9.1. Using remote worker nodes at the network edge
You can configure OpenShift Container Platform clusters with nodes located at your network edge. In this topic, they are called remote worker nodes. A typical cluster with remote worker nodes combines on-premise master and worker nodes with worker nodes in other locations that connect to the cluster. This topic is intended to provide guidance on best practices for using remote worker nodes and does not contain specific configuration details.
There are multiple use cases across different industries, such as telecommunications, retail, manufacturing, and government, for using a deployment pattern with remote worker nodes. For example, you can separate and isolate your projects and workloads by combining the remote worker nodes into Kubernetes zones.
However, having remote worker nodes can introduce higher latency, intermittent loss of network connectivity, and other issues. Among the challenges in a cluster with remote worker nodes are:
- Network separation: The OpenShift Container Platform control plane and the remote worker nodes must be able to communicate with each other. Because of the distance between the control plane and the remote worker nodes, network issues could prevent this communication. See Network separation with remote worker nodes for information on how OpenShift Container Platform responds to network separation and for methods to diminish the impact to your cluster.
- Power outage: Because the control plane and remote worker nodes are in separate locations, a power outage at the remote location or at any point between the two can negatively impact your cluster. See Power loss on remote worker nodes for information on how OpenShift Container Platform responds to a node losing power and for methods to diminish the impact to your cluster.
- Latency spikes or temporary reduction in throughput: As with any network, any changes in network conditions between your cluster and the remote worker nodes can negatively impact your cluster. OpenShift Container Platform offers multiple worker latency profiles that let you control the reaction of the cluster to latency issues.
Note the following limitations when planning a cluster with remote worker nodes:
- OpenShift Container Platform does not support remote worker nodes that use a different cloud provider than the on-premise cluster uses.
- Moving workloads from one Kubernetes zone to a different Kubernetes zone can be problematic due to system and environment issues, such as a specific type of memory not being available in a different zone.
- Proxies and firewalls can present additional limitations that are beyond the scope of this document. See the relevant OpenShift Container Platform documentation for how to address such limitations, such as Configuring your firewall.
- You are responsible for configuring and maintaining L2/L3-level network connectivity between the control plane and the network-edge nodes.
9.1.1. Adding remote worker nodes
Adding remote worker nodes to a cluster involves some additional considerations.
- You must ensure that a route or a default gateway is in place to route traffic between the control plane and every remote worker node.
- You must place the Ingress VIP on the control plane.
- Adding remote worker nodes with user-provisioned infrastructure is identical to adding other worker nodes.
- To add remote worker nodes to an installer-provisioned cluster at install time, specify the subnet for each worker node in the `install-config.yaml` file before installation. There are no additional settings required for the DHCP server. You must use virtual media, because the remote worker nodes will not have access to the local provisioning network.
- To add remote worker nodes to an installer-provisioned cluster deployed with a provisioning network, ensure that the `virtualMediaViaExternalNetwork` flag is set to `true` in the `install-config.yaml` file so that it adds the nodes using virtual media. Remote worker nodes will not have access to the local provisioning network; they must be deployed with virtual media rather than PXE. Additionally, specify each subnet for each group of remote worker nodes and the control plane nodes in the DHCP server.
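For example, an `install-config.yaml` fragment for an installer-provisioned bare-metal cluster with a provisioning network might set the flag as follows. This is a sketch; the surrounding required fields are omitted:

```yaml
platform:
  baremetal:
    # Deploy nodes over the external network with virtual media,
    # because remote worker nodes cannot reach the provisioning network.
    virtualMediaViaExternalNetwork: true
```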
9.1.2. Network separation with remote worker nodes
All nodes send heartbeats to the Kubernetes Controller Manager Operator (kube controller) in the OpenShift Container Platform cluster every 10 seconds. If the cluster does not receive heartbeats from a node, OpenShift Container Platform responds using several default mechanisms.
OpenShift Container Platform is designed to be resilient to network partitions and other disruptions. You can mitigate some of the more common disruptions, such as interruptions from software upgrades, network splits, and routing issues. Mitigation strategies include ensuring that pods on remote worker nodes request the correct amount of CPU and memory resources, configuring an appropriate replication policy, using redundancy across zones, and using Pod Disruption Budgets on workloads.
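As a sketch of the last mitigation, a Pod Disruption Budget keeps a minimum number of replicas available during voluntary disruptions, such as node drains during upgrades. The workload name and label here are hypothetical examples:

```yaml
# Illustrative PodDisruptionBudget; names and labels are hypothetical.
# Voluntary disruptions are blocked while fewer than 2 matching pods
# are available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rest-api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: rest-api
```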
If the kube controller loses contact with a node after a configured period, the node controller on the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition as `Unknown`. In response, the scheduler stops scheduling pods to that node. The on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules any pods on the node for eviction after five minutes, by default.
If a workload controller, such as a `Deployment` or `StatefulSet`, is managing pods on the node, the controller responds according to its type: `Deployment` pods are re-created on other nodes after eviction, while `StatefulSet` pods remain in the terminating state until the node recovers or is removed.
You can mitigate the effects of connection loss by:
- using daemon sets to create pods that tolerate the taints
- using static pods that automatically restart if a node goes down
- using Kubernetes zones to control pod eviction
- configuring pod tolerations to delay or avoid pod eviction
- configuring the kubelet to control the timing of when it marks nodes as unhealthy.
For more information on using these objects in a cluster with remote worker nodes, see About remote worker node strategies.
9.1.3. Power loss on remote worker nodes
If a remote worker node loses power or restarts ungracefully, OpenShift Container Platform responds using several default mechanisms.
If the Kubernetes Controller Manager Operator (kube controller) loses contact with a node after a configured period, the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition as `Unknown`. The scheduler stops scheduling pods to that node. The on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules any pods on the node for eviction after five minutes, by default.
On the node, the pods must be restarted when the node recovers power and reconnects with the control plane.
If you want the pods to restart immediately when the node restarts, use static pods.
After the node restarts, the kubelet also restarts and attempts to restart the pods that were scheduled on the node. If the connection to the control plane takes longer than the default five minutes, the control plane cannot update the node health or remove the `node.kubernetes.io/unreachable` taint, and any pods on the node that do not tolerate the taint are scheduled for eviction.
You can mitigate the effects of power loss by:
- using daemon sets to create pods that tolerate the taints
- using static pods that automatically restart with a node
- configuring pod tolerations to delay or avoid pod eviction
- configuring the kubelet to control the timing of when the node controller marks nodes as unhealthy.
For more information on using these objects in a cluster with remote worker nodes, see About remote worker node strategies.
9.1.4. Latency spikes or temporary reduction in throughput to remote workers
If the cluster administrator has performed latency tests for platform verification, they might discover the need to adjust the operation of the cluster to ensure stability in cases of high latency. The cluster administrator needs to change only one parameter, recorded in a file, which controls four parameters affecting how supervisory processes read status and interpret the health of the cluster. Changing only the one parameter provides cluster tuning in an easy, supportable manner.
The kubelet on each node sends its status to the kube controller every 10 seconds, by default. If the kube controller does not receive a kubelet status update within the configured grace period, 40 seconds by default:
- The node controller on the control plane updates the node health to `Unhealthy` and marks the node `Ready` condition `Unknown`.
- In response, the scheduler stops scheduling pods to that node.
- The Node Lifecycle Controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to the node and schedules any pods on the node for eviction after five minutes, by default.
This behavior can cause problems if your network is prone to latency issues, especially if you have nodes at the network edge. In some cases, the Kubernetes Controller Manager might not receive an update from a healthy node due to network latency, and the cluster would then evict pods from the node even though the node is healthy.
To avoid this problem, you can use worker latency profiles to adjust the frequency at which the kubelet and the kube controller check node status and act on it.
These worker latency profiles contain three sets of parameters that are predefined with carefully tuned values to control the reaction of the cluster to increased latency. You do not need to experimentally find the best values manually.
You can configure worker latency profiles when installing a cluster or at any time you notice increased latency in your cluster network.
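For example, you can select a profile by editing the cluster-wide `nodes.config.openshift.io` object. This sketch assumes the medium-latency profile suits your network conditions:

```yaml
apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  # One of: Default, MediumUpdateAverageReaction, LowUpdateSlowReaction
  workerLatencyProfile: MediumUpdateAverageReaction
```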
9.1.5. Remote worker node strategies
If you use remote worker nodes, consider which objects to use to run your applications.
It is recommended to use daemon sets or static pods based on the behavior you want in the event of network issues or power loss. In addition, you can use Kubernetes zones and tolerations to control or avoid pod evictions if the control plane cannot reach remote worker nodes.
- Daemon sets
- Daemon sets are the best approach to managing pods on remote worker nodes for the following reasons:
- Daemon sets do not typically need rescheduling behavior. If a node disconnects from the cluster, pods on the node can continue to run. OpenShift Container Platform does not change the state of daemon set pods, and leaves the pods in the state they last reported. For example, if a daemon set pod is in the `Running` state, when a node stops communicating, the pod keeps running and is assumed to be running by OpenShift Container Platform.
- Daemon set pods, by default, are created with `NoExecute` tolerations for the `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` taints with no `tolerationSeconds` value. These default values ensure that daemon set pods are never evicted if the control plane cannot reach a node. For example:

  Tolerations added to daemon set pods by default

  ```yaml
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/memory-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/pid-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/unschedulable
      operator: Exists
      effect: NoSchedule
  ```

- Daemon sets can use labels to ensure that a workload runs on a matching worker node.
- You can use an OpenShift Container Platform service endpoint to load balance daemon set pods.
Daemon sets do not schedule pods after a reboot of the node if OpenShift Container Platform cannot reach the node.
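As a sketch of load balancing daemon set pods with a service endpoint, the following hypothetical `Service` selects the same label that the daemon set applies to its pods:

```yaml
# Hypothetical names and ports. The selector must match the labels in
# the daemon set's pod template so that the service endpoints include
# one pod per matching node.
apiVersion: v1
kind: Service
metadata:
  name: edge-monitor
spec:
  selector:
    app: edge-monitor
  ports:
    - port: 8080
      targetPort: 8080
```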
- Static pods
- If you want pods to restart when a node reboots, after a power loss for example, consider static pods. The kubelet on a node automatically restarts static pods as the node restarts.
Static pods cannot use secrets or config maps.
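A minimal static pod sketch: the kubelet reads manifests from its static pod path (commonly `/etc/kubernetes/manifests`) and restarts the pod whenever the node restarts, without waiting for the control plane. The pod name and image below are hypothetical, and the spec cannot reference secrets or config maps:

```yaml
# Place this file in the kubelet's static pod path on the node.
apiVersion: v1
kind: Pod
metadata:
  name: edge-agent
spec:
  containers:
    - name: edge-agent
      image: registry.example.com/edge-agent:latest
```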
- Kubernetes zones
- Kubernetes zones can slow down the rate or, in some cases, completely stop pod evictions.
When the control plane cannot reach a node, the node controller, by default, applies `node.kubernetes.io/unreachable` taints and evicts pods at a rate of 0.1 nodes per second. However, in a cluster that uses Kubernetes zones, pod eviction behavior is altered.
If a zone is fully disrupted, where all nodes in the zone have a `Ready` condition that is `False` or `Unknown`, the control plane does not apply the `node.kubernetes.io/unreachable` taint to the nodes in that zone.
For partially disrupted zones, where more than 55% of the nodes have a `False` or `Unknown` condition, the pod eviction rate is reduced to 0.01 nodes per second.
You assign a node to a specific zone by applying the `topology.kubernetes.io/region` label to the node.
Sample node labels for Kubernetes zones

```yaml
kind: Node
apiVersion: v1
metadata:
  labels:
    topology.kubernetes.io/region: east
```
- `KubeletConfig` objects
- You can adjust the amount of time that the kubelet checks the state of each node.
To set the interval that affects the timing of when the on-premise node controller marks nodes with the `Unhealthy` or `Unreachable` condition, create a `KubeletConfig` object that contains the `node-status-update-frequency` and `node-status-report-frequency` parameters.
The kubelet on each node determines the node status as defined by the `node-status-update-frequency` setting and reports that status to the cluster based on the `node-status-report-frequency` setting. If you change the `node-status-update-frequency` default, the `node-status-report-frequency` value is changed to the same value.
Example kubelet config

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: disable-cpu-units
spec:
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker # 1
  kubeletConfig:
    node-status-update-frequency: # 2
      - "10s"
    node-status-report-frequency: # 3
      - "1m"
```

- 1: Specify the type of node to which this `KubeletConfig` object applies using the label from the `MachineConfig` object.
- 2: Specify the frequency that the kubelet checks the status of a node associated with this `MachineConfig` object. The default value is `10s`. If you change this default, the `node-status-report-frequency` value is changed to the same value.
- 3: Specify the frequency that the kubelet reports the status of a node associated with this `MachineConfig` object. The default value is `1m`.
The `node-status-update-frequency` parameter works with the `node-monitor-grace-period` and `pod-eviction-timeout` parameters.
- The `node-monitor-grace-period` parameter specifies how long OpenShift Container Platform waits after a node associated with a `MachineConfig` object is marked `Unhealthy` if the controller manager does not receive the node heartbeat. Workloads on the node continue to run after this time. If the remote worker node rejoins the cluster after `node-monitor-grace-period` expires, pods continue to run. New pods can be scheduled to that node. The `node-monitor-grace-period` interval is `40s`. The `node-status-update-frequency` value must be lower than the `node-monitor-grace-period` value.
- The `pod-eviction-timeout` parameter specifies the amount of time OpenShift Container Platform waits after marking a node that is associated with a `MachineConfig` object as `Unreachable` to start marking pods for eviction. Evicted pods are rescheduled on other nodes. If the remote worker node rejoins the cluster after `pod-eviction-timeout` expires, the pods running on the remote worker node are terminated because the node controller has evicted the pods on-premise. Pods can then be rescheduled to that node. The `pod-eviction-timeout` interval is `5m0s`.
Modifying the `node-monitor-grace-period` and `pod-eviction-timeout` parameters is not supported.
- Tolerations
- You can use pod tolerations to mitigate the effects if the on-premise node controller adds a `node.kubernetes.io/unreachable` taint with a `NoExecute` effect to a node it cannot reach.
A taint with the `NoExecute` effect affects pods that are running on the node in the following ways:
- Pods that do not tolerate the taint are queued for eviction.
- Pods that tolerate the taint without specifying a `tolerationSeconds` value in their toleration specification remain bound forever.
- Pods that tolerate the taint with a specified `tolerationSeconds` value remain bound for the specified amount of time. After the time elapses, the pods are queued for eviction.
You can delay or avoid pod eviction by configuring pod tolerations with the `NoExecute` effect for the `node.kubernetes.io/unreachable` and `node.kubernetes.io/not-ready` taints.
Example toleration in a pod spec

```yaml
...
tolerations:
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 600
...
```
OpenShift Container Platform uses the `tolerationSeconds` value after the `pod-eviction-timeout` value elapses.
- Other types of OpenShift Container Platform objects
- You can use replica sets, deployments, and replication controllers. The scheduler can reschedule these pods onto other nodes after the node is disconnected for five minutes. Rescheduling onto other nodes can be beneficial for some workloads, such as REST APIs, where an administrator can guarantee a specific number of pods are running and accessible.
When working with remote worker nodes, rescheduling pods on different nodes might not be acceptable if remote worker nodes are intended to be reserved for specific functions.
Stateful sets do not get restarted when there is an outage. The pods remain in the `terminating` state until the control plane can acknowledge that the pods are terminated.
To avoid scheduling pods to a node that does not have access to the same type of persistent storage, OpenShift Container Platform cannot migrate pods that require persistent volumes to other zones in the case of network separation.