OpenShift Container Platform 3.11: Cluster Administration

Chapter 37. Replacing a failed etcd member

If some etcd members fail, but you still have a quorum of etcd members, you can use the remaining etcd members and the data that they contain to add more etcd members without etcd or cluster downtime.
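
As a rough illustration of the quorum arithmetic behind this statement, the following sketch computes the majority size and the number of failed members that a cluster of a given size can tolerate. The function names are hypothetical helpers for this example and are not part of etcd or OpenShift Container Platform.

```shell
#!/bin/sh
# Hypothetical helpers: etcd needs a majority (quorum) of members to accept writes.

# quorum_size N: smallest majority of an N-member cluster.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# failures_tolerated N: members that can fail while a quorum remains.
failures_tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for common cluster sizes.
for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated=$(failures_tolerated "$n")"
done
```

For a three-member cluster the quorum is two, so one failed member still leaves a working cluster, which is the situation this chapter assumes.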

37.1. Removing a failed etcd node

Before you add a new etcd node, remove the failed one.

Procedure

From an active etcd host, check the cluster health to identify the failed member, and then remove it:

etcdctl -C https://<surviving host IP>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key cluster-health

etcdctl -C https://<surviving host IP>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member remove <failed member identifier>
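
To find the identifier to pass to member remove, the cluster-health output can be filtered for members that are not reported as healthy. The following is a minimal sketch, assuming that member lines have the form "member ID is STATUS: DETAIL", as in the healthy output shown later in this chapter. The function name is hypothetical, and the non-healthy sample line is an assumed example whose exact wording can differ between etcd versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `etcdctl cluster-health` output on stdin and
# prints the ID of every member whose reported status is not "healthy".
# Assumption: member lines look like "member <id> is <status>: <detail>".
unhealthy_member_ids() {
  awk '$1 == "member" && $3 == "is" && $4 != "healthy:" { print $2 }'
}

# Demo with sample input; the second line is an assumed non-healthy report.
printf '%s\n' \
  'member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379' \
  'member 2a529ba1840722c0 is unreachable: no available published client urls' \
  'cluster is degraded' | unhealthy_member_ids
```

On a real host, the cluster-health command above would be piped into the function instead of the sample text. Review the result before passing an ID to member remove.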

Stop the etcd service on the failed etcd member by removing the etcd pod definition:

mkdir -p /etc/origin/node/pods-stopped
mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/

Remove the contents of the etcd directory:
Important
It is recommended to back up this directory to an off-cluster location before removing the contents. You can remove this backup after a successful restore.
```
rm -rf /var/lib/etcd/*
```

37.2. Adding an etcd member

You can add an etcd host either by using an Ansible playbook or by manual steps.

37.2.1. Adding a new etcd host using Ansible

Procedure

In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group:
```
[OSEv3:children]
masters
nodes
etcd
new_etcd 

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com

[new_etcd] 
etcd0.example.com 
```
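In this example, the lines to add are the new_etcd entry under [OSEv3:children], the [new_etcd] group header, and the etcd0.example.com host entry.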
Note
Replace the old etcd host entry with the new etcd host entry in the inventory file. When you replace the old etcd host, you must create a copy of the /etc/etcd/ca/ directory. Alternatively, you can redeploy the etcd CA and certificates before scaling up the etcd hosts.
From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, change to the playbook directory and run the etcd scaleup playbook:
```
cd /usr/share/ansible/openshift-ansible
ansible-playbook  playbooks/openshift-etcd/scaleup.yml
```

After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group:

[OSEv3:children]
masters
nodes
etcd
new_etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
etcd0.example.com

If you use Flannel, modify the flanneld service configuration on every OpenShift Container Platform host, located at /etc/sysconfig/flanneld, to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
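
The endpoint list above follows a fixed pattern, so it can be generated from the etcd host names instead of being typed by hand. The following is a minimal sketch, assuming that every etcd member serves clients over HTTPS on port 2379. The function name is a hypothetical helper, not part of Flannel.

```shell
#!/bin/sh
# Hypothetical helper: prints a FLANNEL_ETCD_ENDPOINTS line for the given
# etcd host names, assuming every member serves clients on HTTPS port 2379.
flannel_endpoints() {
  endpoints=""
  for host in "$@"; do
    # Append a comma only between entries, never before the first one.
    endpoints="${endpoints:+$endpoints,}https://${host}:2379"
  done
  echo "FLANNEL_ETCD_ENDPOINTS=${endpoints}"
}

flannel_endpoints master-0.example.com master-1.example.com \
  master-2.example.com etcd0.example.com
```

With the four example hosts, the printed line matches the value shown above.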

Restart the flanneld service:
```
systemctl restart flanneld.service
```

37.2.2. Manually adding a new etcd host

If you do not run etcd as static pods on master nodes, you might need to add another etcd host.

Procedure

Modify the current etcd cluster

To create the etcd certificates, run the openssl command, replacing the values with those from your environment.

Create some environment variables:

export NEW_ETCD_HOSTNAME="etcd0.example.com"
export NEW_ETCD_IP="192.168.55.21"

export CN=$NEW_ETCD_HOSTNAME
export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"

Note

The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.

Create the directory to store the configuration and certificates:
```
mkdir -p ${PREFIX}
```

Create the server certificate request (server.csr) and sign it (server.crt):

openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}server.key  \
    -out ${PREFIX}server.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}server.crt \
    -in ${PREFIX}server.csr \
    -extensions etcd_v3_ca_server -batch

Create the peer certificate request (peer.csr) and sign it (peer.crt):

openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}peer.key \
    -out ${PREFIX}peer.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN

openssl ca -name etcd_ca -config ${OPENSSLCFG} \
  -out ${PREFIX}peer.crt \
  -in ${PREFIX}peer.csr \
  -extensions etcd_v3_ca_peer -batch

Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:
```
cp /etc/etcd/etcd.conf ${PREFIX}
cp /etc/etcd/ca.crt ${PREFIX}
```

While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:

Get the member ID for the first member using the member list command:

etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member list

Ensure that you specify the URLs of only active etcd members in the --peers parameter value.

Obtain the IP address where etcd listens for cluster peers:
```
ss -l4n | grep 2380
```

Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:

etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://172.18.1.18:2379,https://172.18.9.202:2379,https://172.18.0.75:2379" \
    member update 511b7fb6cc0001 https://172.18.1.18:2380

Re-run the member list command and ensure the peer URLs no longer include localhost.

Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.

Warning

You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.

etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} https://${NEW_ETCD_IP}:2380

Added member named 10.3.9.222 with ID 4e1db163a21d7651 to cluster

ETCD_NAME="<NEW_ETCD_HOSTNAME>"
ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

In the first line of the output, 10.3.9.222 is a label for the etcd member. You can specify the host name, IP address, or a simple name.
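
The ETCD_INITIAL_CLUSTER value printed by etcdctl member add is the one to use. As a cross-check of its shape, the following sketch rebuilds such a value from name and IP address pairs, assuming that every peer listens on HTTPS port 2380. The function name is hypothetical, and the names and addresses in the demo call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: builds an ETCD_INITIAL_CLUSTER value from
# "name=ip" arguments, assuming every peer listens on HTTPS port 2380.
initial_cluster() {
  cluster=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    cluster="${cluster:+$cluster,}${name}=https://${ip}:2380"
  done
  echo "$cluster"
}

# Demo with placeholder names and addresses.
initial_cluster etcd0.example.com=192.168.55.21 master-0.example.com=192.168.55.8
```

Comparing the generated string with the command output can catch a missing member or a malformed URL before the value is copied into etcd.conf.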

Update the sample ${PREFIX}/etcd.conf file.
1. Replace the following values with the values generated in the previous step:
  - ETCD_NAME
  - ETCD_INITIAL_CLUSTER
  - ETCD_INITIAL_CLUSTER_STATE
2. Modify the following variables with the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value.
  - ETCD_LISTEN_PEER_URLS
  - ETCD_LISTEN_CLIENT_URLS
  - ETCD_INITIAL_ADVERTISE_PEER_URLS
  - ETCD_ADVERTISE_CLIENT_URLS
3. If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
4. Check the file for syntax errors or missing IP addresses; otherwise, the etcd service might fail:
  vi ${PREFIX}/etcd.conf
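
Part of this manual check can be automated. The following is a minimal sketch, assuming that the file uses plain KEY=value lines, that reports any of the variables listed in the steps above that are missing or empty. The function name is a hypothetical helper. It does not validate the values themselves, so it complements the visual check rather than replacing it.

```shell
#!/bin/sh
# Hypothetical helper: reports variables that are missing or empty in an
# etcd.conf-style file of KEY=value lines. It does not validate the values.
check_etcd_conf() {
  conf=$1
  missing=0
  for key in ETCD_NAME ETCD_INITIAL_CLUSTER ETCD_INITIAL_CLUSTER_STATE \
             ETCD_LISTEN_PEER_URLS ETCD_LISTEN_CLIENT_URLS \
             ETCD_INITIAL_ADVERTISE_PEER_URLS ETCD_ADVERTISE_CLIENT_URLS; do
    # Accept KEY=value or KEY="value" with a non-empty value.
    if ! grep -Eq "^${key}=\"?[^\"[:space:]]" "$conf"; then
      echo "missing or empty: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Demo against a throwaway file with most variables absent and one left empty.
tmp=$(mktemp)
printf '%s\n' 'ETCD_NAME=etcd0.example.com' 'ETCD_INITIAL_CLUSTER_STATE=""' > "$tmp"
check_etcd_conf "$tmp" || echo "fix the file before starting etcd"
rm -f "$tmp"
```

Against the real file, the call would be check_etcd_conf "${PREFIX}/etcd.conf".
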
On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.

Create a tgz file that contains the certificates, the sample configuration file, and the CA certificate, and copy it to the new host:

tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/

Modify the new etcd host

Install iptables-services to provide iptables utilities to open the required ports for etcd:
```
yum install -y iptables-services
```

Create the OS_FIREWALL_ALLOW firewall rules to allow etcd to communicate:

- Port 2379/tcp for clients
- Port 2380/tcp for peer communication

systemctl enable iptables.service --now
iptables -N OS_FIREWALL_ALLOW
iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
iptables-save | tee /etc/sysconfig/iptables

Note

In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming the OpenShift Container Platform installer uses for firewall rules.

Warning

If the cluster is hosted in an IaaS environment, also modify the security groups for the instance to allow incoming traffic to those ports.

Install etcd:
```
yum install -y etcd
```
Ensure that version etcd-2.3.7-4.el7.x86_64 or later is installed.
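
The version requirement can be checked with a small comparison helper. The following is a sketch, assuming GNU sort with the -V (version sort) option is available. The function name is hypothetical, and the installed value in the demo is a sample, not output from a real host. Version sort approximates RPM version ordering but is not identical to it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to
# version $2. Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Demo with a sample value; on a real host the installed version could come
# from: rpm -q --queryformat '%{VERSION}-%{RELEASE}' etcd
installed="3.2.22-1.el7"
if version_ge "$installed" "2.3.7-4.el7"; then
  echo "etcd ${installed} meets the minimum"
else
  echo "etcd ${installed} is older than 2.3.7-4.el7"
fi
```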

Ensure the etcd service is not running by removing the etcd pod definition:

mkdir -p /etc/origin/node/pods-stopped
mv /etc/origin/node/pods/* /etc/origin/node/pods-stopped/

Remove any etcd configuration and data:
```
rm -Rf /etc/etcd/*
rm -Rf /var/lib/etcd/*
```
Extract the certificates and configuration files:
```
tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/
```
Start etcd on the new host:
```
systemctl enable etcd --now
```

Verify that the host is part of the cluster and the current cluster health:

If you use the etcd v2 API, run the following command:

etcdctl --cert-file=/etc/etcd/peer.crt \
          --key-file=/etc/etcd/peer.key \
          --ca-file=/etc/etcd/ca.crt \
          --peers="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
          cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member 8b8904727bf526a5 is healthy: got healthy result from https://192.168.55.21:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

If you use the etcd v3 API, run the following command:

ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
          --key=/etc/etcd/peer.key \
          --cacert="/etc/etcd/ca.crt" \
          --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
          endpoint health
https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms
https://etcd0.example.com:2379 is healthy: successfully committed proposal: took = 1.498829ms

Modify each OpenShift Container Platform master

Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers that OpenShift Container Platform uses to store data, and remove any failed etcd hosts:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379

Restart the master API and controllers services:
- On every master:
  master-restart api
  master-restart controllers
  Warning
  The number of etcd nodes must be odd, so you must add at least two hosts.

If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every OpenShift Container Platform host to include the new etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379

Restart the flanneld service:
```
systemctl restart flanneld.service
```
