4.2. Restarting the cluster


You can restart your cluster after it has been shut down gracefully.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have gracefully shut down the cluster.

Procedure

  1. Power on any cluster dependencies, such as external storage or an LDAP server.
  2. Start all cluster machines.

    Use the appropriate method for your cloud environment to start the machines, for example, from your cloud provider’s web console.

    Wait approximately 10 minutes before you check the status of the master nodes.

  3. Verify that all master nodes are ready.

    $ oc get nodes -l node-role.kubernetes.io/master

    The master nodes are ready if the status is Ready, as shown in the following output:

    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-168-251.ec2.internal   Ready    master   75m   v1.18.3
    ip-10-0-170-223.ec2.internal   Ready    master   75m   v1.18.3
    ip-10-0-211-16.ec2.internal    Ready    master   75m   v1.18.3
  4. If the master nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.

    1. Get the list of current CSRs:

      $ oc get csr
    2. Review the details of a CSR to verify that it is valid:

      $ oc describe csr <csr_name>

      Replace <csr_name> with the name of a CSR from the list of current CSRs.
    3. Approve each valid CSR:

      $ oc adm certificate approve <csr_name>
  5. After the master nodes are ready, verify that all worker nodes are ready.

    $ oc get nodes -l node-role.kubernetes.io/worker

    The worker nodes are ready if the status is Ready, as shown in the following output:

    NAME                           STATUS   ROLES    AGE   VERSION
    ip-10-0-179-95.ec2.internal    Ready    worker   64m   v1.18.3
    ip-10-0-182-134.ec2.internal   Ready    worker   64m   v1.18.3
    ip-10-0-250-100.ec2.internal   Ready    worker   64m   v1.18.3
  6. If the worker nodes are not ready, then check whether there are any pending certificate signing requests (CSRs) that must be approved.

    1. Get the list of current CSRs:

      $ oc get csr
    2. Review the details of a CSR to verify that it is valid:

      $ oc describe csr <csr_name>

      Replace <csr_name> with the name of a CSR from the list of current CSRs.
    3. Approve each valid CSR:

      $ oc adm certificate approve <csr_name>
  7. Verify that the cluster started properly.

    1. Check that there are no degraded cluster Operators:

      $ oc get clusteroperators

      Check that there are no cluster Operators with the DEGRADED condition set to True.

      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
      authentication                             4.5.0     True        False         False      59m
      cloud-credential                           4.5.0     True        False         False      85m
      cluster-autoscaler                         4.5.0     True        False         False      73m
      config-operator                            4.5.0     True        False         False      73m
      console                                    4.5.0     True        False         False      62m
      csi-snapshot-controller                    4.5.0     True        False         False      66m
      dns                                        4.5.0     True        False         False      76m
      etcd                                       4.5.0     True        False         False      76m
      ...
    2. Check that all nodes are in the Ready state:

      $ oc get nodes

      Check that the status for all nodes is Ready.

      NAME                           STATUS   ROLES    AGE   VERSION
      ip-10-0-168-251.ec2.internal   Ready    master   82m   v1.18.3
      ip-10-0-170-223.ec2.internal   Ready    master   82m   v1.18.3
      ip-10-0-179-95.ec2.internal    Ready    worker   70m   v1.18.3
      ip-10-0-182-134.ec2.internal   Ready    worker   70m   v1.18.3
      ip-10-0-211-16.ec2.internal    Ready    master   82m   v1.18.3
      ip-10-0-250-100.ec2.internal   Ready    worker   69m   v1.18.3
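
The checks in step 7 can be scripted by filtering the tabular output shown above. The following is a minimal sketch, not part of the documented procedure. The function names not_ready_nodes and degraded_operators are hypothetical, and the filters assume the column layout shown in the sample output (STATUS in the second column of oc get nodes, DEGRADED in the fifth column of oc get clusteroperators, with every row reporting a VERSION value).

```shell
#!/bin/sh

# Print the names of nodes whose STATUS column is not exactly "Ready".
# Reads the tabular output of `oc get nodes` on stdin.
# Assumption: STATUS is the second column, as in the sample output.
# A compound status such as "Ready,SchedulingDisabled" is also reported.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Print the names of cluster Operators whose DEGRADED column is "True".
# Reads the tabular output of `oc get clusteroperators` on stdin.
# Assumption: DEGRADED is the fifth column and no row has an empty VERSION.
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Example usage against a live cluster (requires a logged-in oc session):
#   oc get nodes | not_ready_nodes
#   oc get clusteroperators | degraded_operators
```

If both commands print nothing, every node reported Ready and no cluster Operator reported DEGRADED as True at the time of the check. Because the filters depend on column positions, treat an unexpected result as a prompt to read the full command output rather than as a definitive answer.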

If the cluster did not start properly, you might need to restore your cluster using an etcd backup.
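
Before you turn to a restore, it may be worth rechecking for pending CSRs as described in steps 4 and 6. The following is a minimal sketch, not part of the documented procedure. The function name pending_csrs is hypothetical, and the filter assumes that CONDITION is the last column of the oc get csr output. It only lists names and deliberately does not approve anything, so that each CSR can still be reviewed first.

```shell
#!/bin/sh

# Print the names of CSRs whose CONDITION is exactly "Pending".
# Reads the tabular output of `oc get csr` on stdin.
# Assumption: CONDITION is the last column of each row.
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Example usage against a live cluster (requires a logged-in oc session):
#   oc get csr | pending_csrs
# Review each listed name with `oc describe csr <csr_name>` and approve
# only the valid ones with `oc adm certificate approve <csr_name>`.
```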
