16.11.5. Pod Eviction for Node Problems

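OpenShift Container Platform can be configured to represent the node unreachable and node not ready conditions as taints. This lets you set, for each pod, how long it stays bound to a node that becomes unreachable or not ready, rather than using the default of five minutes.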

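When the Taint Based Evictions feature is enabled, the node controller adds the taints automatically, and the normal logic for evicting pods from Ready nodes is disabled.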

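If a node enters a not ready state, the node.kubernetes.io/not-ready:NoExecute taint is added and pods cannot be scheduled on the node. Existing pods remain for the period set by their toleration seconds.
If a node enters an unreachable state, the node.kubernetes.io/unreachable:NoExecute taint is added and pods cannot be scheduled on the node. Existing pods remain for the period set by their toleration seconds.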
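
For illustration, the fragment below sketches how such a taint might appear in the spec.taints field of a Node object (for example, in the output of oc get node <node> -o yaml). The field names follow the Kubernetes Node API; the timeAdded value is a made-up placeholder, not output from a real cluster.

```yaml
# Sketch of a Node object fragment after the node controller
# has tainted a node that is not ready.
spec:
  taints:
  - key: node.kubernetes.io/not-ready
    effect: NoExecute
    # Placeholder timestamp: the moment the taint was added.
    # A pod's tolerationSeconds is counted from this time.
    timeAdded: "2019-01-01T00:00:00Z"
```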

To enable Taint Based Evictions:

Check that the taint has been added to the node:

$ oc describe node $node | grep -i taint

Taints: node.kubernetes.io/not-ready:NoExecute

Restart OpenShift for the changes to take effect:

# master-restart api
# master-restart controllers

Add a toleration to the pod:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

or

tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
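
To show where the tolerations stanza sits, here is a minimal sketch of a complete pod definition that carries both tolerations. The pod name, container name, and image are hypothetical placeholders; only the tolerations entries come from this section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-example            # hypothetical pod name
spec:
  containers:
  - name: app                         # hypothetical container name
    image: registry.example.com/example/app:latest   # placeholder image
  tolerations:
  # Stay bound for up to 6000 seconds after the node becomes unreachable.
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 6000
  # Stay bound for up to 6000 seconds after the node becomes not ready.
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 6000
```

With operator set to Exists, no value is needed: the toleration matches any taint with that key and effect. In Kubernetes, a NoExecute toleration that omits tolerationSeconds tolerates the taint indefinitely, so the pod would never be evicted for that condition.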

Note

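To preserve the existing rate-limiting behavior for pod evictions caused by node problems, the system adds the taints in a rate-limited way. This prevents mass pod evictions in scenarios such as the master becoming partitioned from the nodes.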