OpenShift Container Platform 4.5: Backup and restore

2.3. Determining the state of the unhealthy etcd member

The steps to replace an unhealthy etcd member depend on which of the following states your etcd member is in:

- The machine is not running or the node is not ready
- The etcd pod is crashlooping

This procedure determines which state your etcd member is in, so that you know which procedure to follow to replace the unhealthy etcd member.
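
The decision that this procedure leads to can be sketched as a small shell function. This is only an illustration: `choose_procedure` and its three arguments are hypothetical names, not part of `oc`, and the inputs are assumed to have been collected by hand from the checks in this section.

```shell
# Hypothetical helper (not part of oc). Arguments are assumptions for illustration:
#   $1 machine state, for example "running" or "stopped"
#   $2 node readiness, "ready" or "notready"
#   $3 etcd pod status, for example "Running", "Error", or "CrashLoopBackOff"
choose_procedure() {
  if [ "$1" != "running" ] || [ "$2" != "ready" ]; then
    # Machine or node problems are checked first and take priority.
    echo "Replacing an unhealthy etcd member whose machine is not running or whose node is not ready"
  elif [ "$3" = "Error" ] || [ "$3" = "CrashLoopBackOff" ]; then
    echo "Replacing an unhealthy etcd member whose etcd pod is crashlooping"
  else
    echo "No replacement procedure indicated by these checks"
  fi
}
```

For example, `choose_procedure running ready Error` prints the name of the crashlooping procedure.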

Note

If you are aware that the machine is not running or the node is not ready, but you expect it to return to a healthy state soon, then you do not need to perform a procedure to replace the etcd member. The etcd cluster Operator will automatically sync when the machine or node returns to a healthy state.

Prerequisites

- You have access to the cluster as a user with the cluster-admin role.
- You have identified an unhealthy etcd member.

Procedure

Determine if the machine is not running:
```
$ oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
```
Example output
```
ip-10-0-131-183.ec2.internal  stopped
```
This output lists the node and the status of the node’s machine. If the status is anything other than running, then the machine is not running.
If the machine is not running, then follow the Replacing an unhealthy etcd member whose machine is not running or whose node is not ready procedure.
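
The `grep -v running` filter matches the word anywhere on the line. As a minimal sketch, assuming the tab-separated `node<TAB>state` lines that the jsonpath expression in this step emits, a helper can compare only the state column. The name `list_not_running` is hypothetical and is not part of `oc`.

```shell
# Hypothetical helper (not part of oc): reads "node<TAB>state" lines on stdin
# and prints every non-empty line whose state column is not exactly "running".
list_not_running() {
  awk -F '\t' 'NF > 0 && $2 != "running"'
}

# Intended use, assuming a logged-in oc session:
#   oc get machines -A -ojsonpath='<expression from this step>' | list_not_running
```

A line with an empty state column is also printed, because an empty state is not `running`.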
Determine if the node is not ready.
If either of the following scenarios is true, then the node is not ready.
- If the machine is running, then check whether the node is unreachable:
  $ oc get nodes -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}' | grep unreachable
  Example output
  ip-10-0-131-183.ec2.internal node-role.kubernetes.io/master node.kubernetes.io/unreachable node.kubernetes.io/unreachable
  If the node is listed with an unreachable taint, then the node is not ready.
- If the node is still reachable, then check whether the node is listed as NotReady:
  $ oc get nodes -l node-role.kubernetes.io/master | grep "NotReady"
  Example output
  ip-10-0-131-183.ec2.internal NotReady master 122m v1.18.3
  If the node is listed as NotReady, then the node is not ready.
If the node is not ready, then follow the Replacing an unhealthy etcd member whose machine is not running or whose node is not ready procedure.
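
As a rough sketch of the `NotReady` check, the following helper reads the default tabular output of `oc get nodes` and prints only the names of nodes whose STATUS column includes `NotReady`. The name `list_not_ready_nodes` is hypothetical, and the sketch assumes the default column order (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical helper (not part of oc): reads `oc get nodes` table output on
# stdin and prints the name of each node whose STATUS includes "NotReady".
# STATUS can be a comma-separated list such as "NotReady,SchedulingDisabled",
# so each component is compared separately.
list_not_ready_nodes() {
  awk '$1 != "NAME" {
    n = split($2, parts, ",")
    for (i = 1; i <= n; i++) {
      if (parts[i] == "NotReady") { print $1; break }
    }
  }'
}

# Intended use, assuming a logged-in oc session:
#   oc get nodes -l node-role.kubernetes.io/master | list_not_ready_nodes
```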

Determine if the etcd pod is crashlooping.

If the machine is running and the node is ready, then check whether the etcd pod is crashlooping.

Verify that all master nodes are listed as Ready:

$ oc get nodes -l node-role.kubernetes.io/master

Example output

NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-131-183.ec2.internal   Ready    master   6h13m   v1.18.3
ip-10-0-164-97.ec2.internal    Ready    master   6h13m   v1.18.3
ip-10-0-154-204.ec2.internal   Ready    master   6h13m   v1.18.3

Check whether the status of an etcd pod is either Error or CrashLoopBackOff:

$ oc get pods -n openshift-etcd | grep etcd

Example output

etcd-ip-10-0-131-183.ec2.internal                2/3     Error       7          6h9m 
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          6h6m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          6h6m

Because the status of the etcd-ip-10-0-131-183.ec2.internal pod is Error, the etcd pod is crashlooping.

If the etcd pod is crashlooping, then follow the Replacing an unhealthy etcd member whose etcd pod is crashlooping procedure.
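
The status check in this step can be sketched as a filter over the pod listing. The helper name `list_failing_etcd_pods` is hypothetical, and the sketch assumes the default column order of `oc get pods` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper (not part of oc): reads `oc get pods` table output on
# stdin and prints the name and status of each pod whose name starts with
# "etcd-" and whose STATUS is Error or CrashLoopBackOff.
list_failing_etcd_pods() {
  awk '$1 ~ /^etcd-/ && ($3 == "Error" || $3 == "CrashLoopBackOff") {
    print $1 "\t" $3
  }'
}

# Intended use, assuming a logged-in oc session:
#   oc get pods -n openshift-etcd | list_failing_etcd_pods
```

If the filter prints nothing, none of the listed etcd pods matched either status at the time of the query.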
