5.3. Node host tasks


5.3.1. Deprecating a node host

The steps are the same whether you deprecate an infrastructure node or an application node.

Prerequisites

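Ensure that enough capacity is available to migrate the existing pods from the node set to be removed. Removing an infrastructure node is advised only when at least two more nodes will stay online after the infrastructure node is removed.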

Procedure
  1. List all available nodes to find the node to deprecate:

    $ oc get nodes
    NAME                  STATUS                     AGE       VERSION
    ocp-infra-node-b7pl   Ready                      23h       v1.6.1+5115d708d7
    ocp-infra-node-p5zj   Ready                      23h       v1.6.1+5115d708d7
    ocp-infra-node-rghb   Ready                      23h       v1.6.1+5115d708d7
    ocp-master-dgf8       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
    ocp-master-q1v2       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
    ocp-master-vq70       Ready,SchedulingDisabled   23h       v1.6.1+5115d708d7
    ocp-node-020m         Ready                      23h       v1.6.1+5115d708d7
    ocp-node-7t5p         Ready                      23h       v1.6.1+5115d708d7
    ocp-node-n0dd         Ready                      23h       v1.6.1+5115d708d7

    As an example, this section deprecates the ocp-infra-node-b7pl infrastructure node.
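
    When a cluster has many nodes, it can help to narrow the list. The sketch below assumes that infrastructure nodes carry the role=infra label, as ocp-infra-node-b7pl does in the oc describe node output of the next step; the helper names list_infra_nodes and schedulable_nodes are hypothetical. The second helper reads oc get nodes output on standard input and prints only the nodes whose STATUS is exactly Ready, that is, the nodes that can still accept the evacuated pods.

    ```shell
    # Hypothetical helper: list only nodes that carry the role=infra label.
    # The label is an assumption taken from this example cluster.
    list_infra_nodes() {
        oc get nodes -l role=infra
    }

    # Hypothetical helper: read "oc get nodes" output on stdin and print the
    # names of nodes whose STATUS column is exactly "Ready" (schedulable).
    # The header line (NR == 1) is skipped.
    schedulable_nodes() {
        awk 'NR > 1 && $2 == "Ready" { print $1 }'
    }
    ```

    For example, oc get nodes | schedulable_nodes prints the candidate nodes; the masters are excluded because their status includes SchedulingDisabled.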

  2. Describe the node and the services running on it:

    $ oc describe node ocp-infra-node-b7pl
    Name:			ocp-infra-node-b7pl
    Role:
    Labels:			beta.kubernetes.io/arch=amd64
    			beta.kubernetes.io/instance-type=n1-standard-2
    			beta.kubernetes.io/os=linux
    			failure-domain.beta.kubernetes.io/region=europe-west3
    			failure-domain.beta.kubernetes.io/zone=europe-west3-c
    			kubernetes.io/hostname=ocp-infra-node-b7pl
    			role=infra
    Annotations:		volumes.kubernetes.io/controller-managed-attach-detach=true
    Taints:			<none>
    CreationTimestamp:	Wed, 22 Nov 2017 09:36:36 -0500
    Phase:
    Conditions:
      ...
    Addresses:		10.156.0.11,ocp-infra-node-b7pl
    Capacity:
     cpu:		2
     memory:	7494480Ki
     pods:		20
    Allocatable:
     cpu:		2
     memory:	7392080Ki
     pods:		20
    System Info:
     Machine ID:			bc95ccf67d047f2ae42c67862c202e44
     System UUID:			9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
     Boot ID:			ca8bf088-905d-4ec0-beec-8f89f4527ce4
     Kernel Version:		3.10.0-693.5.2.el7.x86_64
     OS Image:			Employee SKU
     Operating System:		linux
     Architecture:			amd64
     Container Runtime Version:	docker://1.12.6
     Kubelet Version:		v1.6.1+5115d708d7
     Kube-Proxy Version:		v1.6.1+5115d708d7
    ExternalID:			437740049672994824
    Non-terminated Pods:		(2 in total)
      Namespace			Name				CPU Requests	CPU Limits	Memory Requests	Memory Limits
      ---------			----				------------	----------	---------------	-------------
      default			docker-registry-1-5szjs		100m (5%)	0 (0%)		256Mi (3%)	0 (0%)
      default			router-1-vzlzq			100m (5%)	0 (0%)		256Mi (3%)	0 (0%)
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests	CPU Limits	Memory Requests	Memory Limits
      ------------	----------	---------------	-------------
      200m (10%)	0 (0%)		512Mi (7%)	0 (0%)
    Events:		<none>

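    The output above shows that the node is running two pods: router-1-vzlzq and docker-registry-1-5szjs. Two more infrastructure nodes are available to take over these two pods.

    Note

    The cluster above is a highly available cluster, which means that both the router and docker-registry services run on all infrastructure nodes.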
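
    To capture the same pod list in a script, the Non-terminated Pods table of oc describe node can be parsed. The sketch below is an assumption-based helper (pods_from_describe is a hypothetical name) that reads oc describe node output on standard input and prints namespace/name for each row of that table. OpenShift 3.x clients also provide oc adm manage-node <node> --list-pods for the same purpose, wrapped here as the hypothetical list_node_pods; check that your client version supports it.

    ```shell
    # Hypothetical helper: list the pods scheduled on one node
    # (OpenShift 3.x "oc adm manage-node --list-pods").
    list_node_pods() {
        oc adm manage-node "$1" --list-pods
    }

    # Hypothetical helper: read "oc describe node" output on stdin and print
    # "namespace/name" for every row of the "Non-terminated Pods" table.
    pods_from_describe() {
        awk '
            /Non-terminated Pods:/ { in_table = 1; next }
            /Allocated resources:/ { in_table = 0 }
            # Skip the header row, the dashed separator row, and blank lines.
            in_table && NF >= 2 && $1 != "Namespace" && $1 !~ /^-+$/ {
                print $1 "/" $2
            }
        '
    }
    ```

    Applied to the output above, oc describe node ocp-infra-node-b7pl | pods_from_describe prints default/docker-registry-1-5szjs and default/router-1-vzlzq.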

  3. Mark the node as unschedulable and evacuate all of its pods:

    $ oc adm drain ocp-infra-node-b7pl --delete-local-data
    node "ocp-infra-node-b7pl" cordoned
    WARNING: Deleting pods with local storage: docker-registry-1-5szjs
    pod "docker-registry-1-5szjs" evicted
    pod "router-1-vzlzq" evicted
    node "ocp-infra-node-b7pl" drained

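    If a pod has local storage attached (for example, EmptyDir), the --delete-local-data option must be provided. In general, pods running in production should use local storage only for temporary or cache files, never for anything important or persistent. For regular storage, applications should use object storage or persistent volumes. In this case, the local storage of the docker-registry pod is empty, because object storage is used to store the container images.

    Note

    The above operation deletes the existing pods running on the node. New pods are then created according to the replication controller.

    In general, every application should be deployed with a deployment configuration, which creates pods by using a replication controller.

    oc adm drain does not delete any bare pods (pods that are neither mirror pods nor managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet, or job). To delete them, the --force option is required. Be aware that bare pods are not recreated on other nodes, and data may be lost during this operation.

    The following example shows the output for the replication controller of the registry: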

    $ oc describe rc/docker-registry-1
    Name:		docker-registry-1
    Namespace:	default
    Selector:	deployment=docker-registry-1,deploymentconfig=docker-registry,docker-registry=default
    Labels:		docker-registry=default
    		openshift.io/deployment-config.name=docker-registry
    Annotations: ...
    Replicas:	3 current / 3 desired
    Pods Status:	3 Running / 0 Waiting / 0 Succeeded / 0 Failed
    Pod Template:
      Labels:		deployment=docker-registry-1
    			deploymentconfig=docker-registry
    			docker-registry=default
      Annotations:		openshift.io/deployment-config.latest-version=1
    			openshift.io/deployment-config.name=docker-registry
    			openshift.io/deployment.name=docker-registry-1
      Service Account:	registry
      Containers:
       registry:
        Image:	openshift3/ose-docker-registry:v3.6.173.0.49
        Port:	5000/TCP
        Requests:
          cpu:	100m
          memory:	256Mi
        Liveness:	http-get https://:5000/healthz delay=10s timeout=5s period=10s #success=1 #failure=3
        Readiness:	http-get https://:5000/healthz delay=0s timeout=5s period=10s #success=1 #failure=3
        Environment:
          REGISTRY_HTTP_ADDR:					:5000
          REGISTRY_HTTP_NET:					tcp
          REGISTRY_HTTP_SECRET:					tyGEnDZmc8dQfioP3WkNd5z+Xbdfy/JVXf/NLo3s/zE=
          REGISTRY_MIDDLEWARE_REPOSITORY_OPENSHIFT_ENFORCEQUOTA:	false
          REGISTRY_HTTP_TLS_KEY:					/etc/secrets/registry.key
          OPENSHIFT_DEFAULT_REGISTRY:				docker-registry.default.svc:5000
          REGISTRY_CONFIGURATION_PATH:				/etc/registry/config.yml
          REGISTRY_HTTP_TLS_CERTIFICATE:				/etc/secrets/registry.crt
        Mounts:
          /etc/registry from docker-config (rw)
          /etc/secrets from registry-certificates (rw)
          /registry from registry-storage (rw)
      Volumes:
       registry-storage:
        Type:	EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
       registry-certificates:
        Type:	Secret (a volume populated by a Secret)
        SecretName:	registry-certificates
        Optional:	false
       docker-config:
        Type:	Secret (a volume populated by a Secret)
        SecretName:	registry-config
        Optional:	false
    Events:
      FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
      ---------	--------	-----	----			-------------	--------	------		-------
      49m		49m		1	replication-controller			Normal		SuccessfulCreate	Created pod: docker-registry-1-dprp5

    The event at the bottom of the output shows that a new pod, docker-registry-1-dprp5, was created. Listing all pods shows the result:

    $ oc get pods
    NAME                       READY     STATUS    RESTARTS   AGE
    docker-registry-1-dprp5    1/1       Running   0          52m
    docker-registry-1-kr8jq    1/1       Running   0          1d
    docker-registry-1-ncpl2    1/1       Running   0          1d
    registry-console-1-g4nqg   1/1       Running   0          1d
    router-1-2gshr             0/1       Pending   0          52m
    router-1-85qm4             1/1       Running   0          1d
    router-1-q5sr8             1/1       Running   0          1d
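
    The drain command can be wrapped so that --force is passed only when explicitly requested, because, as noted above, bare pods deleted with --force are not recreated. The sketch below assumes a POSIX shell; drain_node is a hypothetical helper name, and it uses only the two oc adm drain options described in this step.

    ```shell
    # Hypothetical helper: cordon and drain a node.
    # --delete-local-data is always passed because pods such as docker-registry
    # use EmptyDir volumes; --force (which also deletes bare pods that will NOT
    # be recreated elsewhere) is added only when the second argument is "--force".
    drain_node() {
        local node="$1"
        if [ "${2:-}" = "--force" ]; then
            oc adm drain "$node" --delete-local-data --force
        else
            oc adm drain "$node" --delete-local-data
        fi
    }
    ```

    For example, drain_node ocp-infra-node-b7pl runs the same command as shown at the start of this step.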
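
  4. The docker-registry-1-5szjs and router-1-vzlzq pods that were running on the deprecated node are no longer available. Instead, two new pods have been created: docker-registry-1-dprp5 and router-1-2gshr. As shown above, the new router pod router-1-2gshr is in the Pending state. This is because each node can run only one router, which binds to ports 80 and 443 of the host, and the two remaining infrastructure nodes already run one router each.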
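
    Pods that remain stuck after an evacuation, such as the Pending router pod, can be spotted by filtering the oc get pods output. The sketch below is a hypothetical helper, not_running_pods, that reads that output on standard input and prints the name and status of every pod whose STATUS column is not Running; it assumes the default column layout shown above (NAME, READY, STATUS, ...).

    ```shell
    # Hypothetical helper: read "oc get pods" output on stdin and print
    # "NAME STATUS" for every pod that is not in the Running state.
    # Assumes STATUS is the third column, as in the default output format.
    not_running_pods() {
        awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
    }
    ```

    Run against the pod list above, oc get pods | not_running_pods reports only router-1-2gshr Pending.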

  5. Describing the newly created registry pod shows that it was created on the ocp-infra-node-rghb node, which is different from the deprecated node:

    $ oc describe pod docker-registry-1-dprp5
    Name:			docker-registry-1-dprp5
    Namespace:		default
    Security Policy:	hostnetwork
    Node:			ocp-infra-node-rghb/10.156.0.10
    ...

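    The only difference between deprecating an infrastructure node and an application node is that, once an infrastructure node has been evacuated and there is no plan to replace it, the services running on the infrastructure nodes can be scaled down: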

    $ oc scale dc/router --replicas 2
    deploymentconfig "router" scaled
    
    $ oc scale dc/docker-registry --replicas 2
    deploymentconfig "docker-registry" scaled
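
    Both deployment configurations can be scaled with a single helper. The sketch below assumes the dc/router and dc/docker-registry names used in this example; scale_infra_services and pod_node are hypothetical helper names. pod_node reads oc describe pod output on standard input and prints the node name from the Node: line without the IP address, which is one way to check the ocp-infra-node-rghb placement above in a script.

    ```shell
    # Hypothetical helper: scale the router and registry deployment
    # configurations of this example to the same replica count.
    scale_infra_services() {
        oc scale dc/router --replicas "$1"
        oc scale dc/docker-registry --replicas "$1"
    }

    # Hypothetical helper: read "oc describe pod" output on stdin and print the
    # node name from the "Node:" line (the part before the "/IP" suffix).
    # The exact field match ignores lines such as "Node-Selectors:".
    pod_node() {
        awk '$1 == "Node:" { split($2, parts, "/"); print parts[1] }'
    }
    ```

    For example, scale_infra_services 2 issues the two oc scale commands shown above, and oc describe pod docker-registry-1-dprp5 | pod_node prints ocp-infra-node-rghb.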

  6. Now each infrastructure node runs only one pod of each kind:

    $ oc get pods
    NAME                       READY     STATUS    RESTARTS   AGE
    docker-registry-1-kr8jq    1/1       Running   0          1d
    docker-registry-1-ncpl2    1/1       Running   0          1d
    registry-console-1-g4nqg   1/1       Running   0          1d
    router-1-85qm4             1/1       Running   0          1d
    router-1-q5sr8             1/1       Running   0          1d
    
    $ oc describe po/docker-registry-1-kr8jq | grep Node:
    Node:			ocp-infra-node-p5zj/10.156.0.9
    
    $ oc describe po/docker-registry-1-ncpl2 | grep Node:
    Node:			ocp-infra-node-rghb/10.156.0.10

    Note

    For a fully highly available cluster, at least three infrastructure nodes should always be available.
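
    Instead of running oc describe for each registry pod, a jsonpath query can print the node of every registry pod at once; any node listed twice means the registry replicas are not spread one per infrastructure node. This is a sketch: it assumes the default namespace and the deploymentconfig=docker-registry pod label shown in the replication controller output of step 3, and duplicate_registry_nodes is a hypothetical helper name.

    ```shell
    # Hypothetical helper: print every node that runs more than one
    # docker-registry pod (empty output means at most one per node).
    # Relies on the deploymentconfig=docker-registry label from this example.
    duplicate_registry_nodes() {
        oc get pods -n default -l deploymentconfig=docker-registry \
            -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' |
            sort | uniq -d
    }
    ```

    With the two registry pods above on ocp-infra-node-p5zj and ocp-infra-node-rghb, the helper would print nothing.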

  7. Verify that scheduling is disabled on the node:

    $ oc get nodes
    NAME                  STATUS                     AGE       VERSION
    ocp-infra-node-b7pl   Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-infra-node-p5zj   Ready                      1d        v1.6.1+5115d708d7
    ocp-infra-node-rghb   Ready                      1d        v1.6.1+5115d708d7
    ocp-master-dgf8       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-master-q1v2       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-master-vq70       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-node-020m         Ready                      1d        v1.6.1+5115d708d7
    ocp-node-7t5p         Ready                      1d        v1.6.1+5115d708d7
    ocp-node-n0dd         Ready                      1d        v1.6.1+5115d708d7

    The node no longer contains any pods:

    $ oc describe node ocp-infra-node-b7pl
    Name:			ocp-infra-node-b7pl
    Role:
    Labels:			beta.kubernetes.io/arch=amd64
    			beta.kubernetes.io/instance-type=n1-standard-2
    			beta.kubernetes.io/os=linux
    			failure-domain.beta.kubernetes.io/region=europe-west3
    			failure-domain.beta.kubernetes.io/zone=europe-west3-c
    			kubernetes.io/hostname=ocp-infra-node-b7pl
    			role=infra
    Annotations:		volumes.kubernetes.io/controller-managed-attach-detach=true
    Taints:			<none>
    CreationTimestamp:	Wed, 22 Nov 2017 09:36:36 -0500
    Phase:
    Conditions:
      ...
    Addresses:		10.156.0.11,ocp-infra-node-b7pl
    Capacity:
     cpu:		2
     memory:	7494480Ki
     pods:		20
    Allocatable:
     cpu:		2
     memory:	7392080Ki
     pods:		20
    System Info:
     Machine ID:			bc95ccf67d047f2ae42c67862c202e44
     System UUID:			9762CC3D-E23C-AB13-B8C5-FA16F0BCCE4C
     Boot ID:			ca8bf088-905d-4ec0-beec-8f89f4527ce4
     Kernel Version:		3.10.0-693.5.2.el7.x86_64
     OS Image:			Employee SKU
     Operating System:		linux
     Architecture:			amd64
     Container Runtime Version:	docker://1.12.6
     Kubelet Version:		v1.6.1+5115d708d7
     Kube-Proxy Version:		v1.6.1+5115d708d7
    ExternalID:			437740049672994824
    Non-terminated Pods:		(0 in total)
      Namespace			Name		CPU Requests	CPU Limits	Memory Requests	Memory Limits
      ---------			----		------------	----------	---------------	-------------
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests	CPU Limits	Memory Requests	Memory Limits
      ------------	----------	---------------	-------------
      0 (0%)	0 (0%)		0 (0%)		0 (0%)
    Events:		<none>
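
    Both checks in this step can be scripted. The sketch below uses two hypothetical helpers: node_status reads oc get nodes output on standard input and prints the STATUS of the named node, and pod_count reads oc describe node output and prints N from the Non-terminated Pods: (N in total) line. A drained node is expected to report Ready,SchedulingDisabled and 0; the parsing assumes the output formats shown above.

    ```shell
    # Hypothetical helper: print the STATUS column for node "$1" from
    # "oc get nodes" output read on stdin.
    node_status() {
        awk -v name="$1" '$1 == name { print $2 }'
    }

    # Hypothetical helper: print N from the "Non-terminated Pods: (N in total)"
    # line of "oc describe node" output read on stdin.
    pod_count() {
        awk '/Non-terminated Pods:/ { gsub(/[()]/, "", $3); print $3 }'
    }
    ```

    For example, oc get nodes | node_status ocp-infra-node-b7pl prints Ready,SchedulingDisabled, and oc describe node ocp-infra-node-b7pl | pod_count prints 0 for the output above.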

  8. Remove the infrastructure instance from the backend sections of the /etc/haproxy/haproxy.cfg configuration file:

    backend router80
        balance source
        mode tcp
        server infra-1.example.com 192.168.55.12:80 check
        server infra-2.example.com 192.168.55.13:80 check
    
    backend router443
        balance source
        mode tcp
        server infra-1.example.com 192.168.55.12:443 check
        server infra-2.example.com 192.168.55.13:443 check
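
    If the load balancer configuration is edited by a script rather than by hand, matching the exact server name is safer than a loose text pattern. The sketch below is hypothetical: remove_haproxy_server keeps a .bak copy of the file and drops every line whose first two fields are server and the given name, in all backend sections at once. The host name infra-3.example.com in the usage example is an assumption; the configuration above lists only the remaining instances.

    ```shell
    # Hypothetical helper: remove every "server <name> ..." line from an
    # HAProxy configuration file, keeping a backup copy as <file>.bak.
    # Usage: remove_haproxy_server <server-name> <config-file>
    remove_haproxy_server() {
        local name="$1"
        local cfg="$2"
        cp "$cfg" "$cfg.bak" || return 1
        # Keep every line except those whose first two fields are "server" <name>.
        awk -v name="$name" '!($1 == "server" && $2 == name)' "$cfg.bak" > "$cfg"
    }
    ```

    For example, remove_haproxy_server infra-3.example.com /etc/haproxy/haproxy.cfg removes that host from both router80 and router443. Running haproxy -c -f /etc/haproxy/haproxy.cfg afterwards checks the file syntax before the service is restarted.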

  9. Then restart the haproxy service:

    $ sudo systemctl restart haproxy
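
    Restarting HAProxy with a broken configuration would take the load balancer down, so the sketch below (restart_haproxy_checked is a hypothetical name) validates the file with haproxy -c -f first and restarts the service only if the check succeeds. It assumes that the haproxy binary is on the PATH of the load balancer host.

    ```shell
    # Hypothetical helper: validate an HAProxy configuration and restart the
    # service only if the check passes. Defaults to the path used in step 8.
    restart_haproxy_checked() {
        local cfg="${1:-/etc/haproxy/haproxy.cfg}"
        if haproxy -c -f "$cfg"; then
            sudo systemctl restart haproxy
        else
            echo "haproxy configuration check failed for $cfg; not restarting" >&2
            return 1
        fi
    }
    ```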

  10. After all pods have been evicted, remove the node from the cluster:

    $ oc delete node ocp-infra-node-b7pl
    node "ocp-infra-node-b7pl" deleted
    $ oc get nodes
    NAME                  STATUS                     AGE       VERSION
    ocp-infra-node-p5zj   Ready                      1d        v1.6.1+5115d708d7
    ocp-infra-node-rghb   Ready                      1d        v1.6.1+5115d708d7
    ocp-master-dgf8       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-master-q1v2       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-master-vq70       Ready,SchedulingDisabled   1d        v1.6.1+5115d708d7
    ocp-node-020m         Ready                      1d        v1.6.1+5115d708d7
    ocp-node-7t5p         Ready                      1d        v1.6.1+5115d708d7
    ocp-node-n0dd         Ready                      1d        v1.6.1+5115d708d7
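
    Because oc delete node removes the node object regardless of what is still scheduled on it, a script can refuse to delete a node that still reports pods. The sketch below is a hypothetical helper, delete_node_if_empty, that uses the Non-terminated Pods: (N in total) line of oc describe node as its check. DaemonSet-managed pods are not evicted by oc adm drain, so on clusters that run DaemonSets the count may stay above zero and the check would need adjusting.

    ```shell
    # Hypothetical helper: delete a node object only when "oc describe node"
    # reports no non-terminated pods on it.
    delete_node_if_empty() {
        local node="$1"
        local count
        count=$(oc describe node "$node" |
            awk '/Non-terminated Pods:/ { gsub(/[()]/, "", $3); print $3 }')
        if [ "$count" = "0" ]; then
            oc delete node "$node"
        else
            echo "not deleting $node: ${count:-unknown} pod(s) still reported" >&2
            return 1
        fi
    }
    ```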
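
Note

For more information on evacuating and draining pods or nodes, see the Node maintenance section.

5.3.1.1. Replacing a node host

If a node needs to be added in place of the deprecated node, follow the Adding hosts to an existing cluster section.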

© 2024 Red Hat, Inc.