Chapter 37. Replacing a failed etcd member
If some etcd members fail, but you still have a quorum of etcd members, you can use the remaining etcd members and the data that they contain to add more etcd members without etcd or cluster downtime.
37.1. Removing a failed etcd node
Before you add a new etcd node, remove the failed one.
From an active etcd host, remove the failed etcd node:
# etcdctl -C https://<surviving host IP>:2379 \
    --ca-file=/etc/etcd/ca.crt \
    --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key cluster-health
# etcdctl -C https://<surviving host IP>:2379 \
    --ca-file=/etc/etcd/ca.crt \
    --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key member remove <failed member identifier>
Stop the etcd service on the failed etcd member:
# systemctl stop etcd
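To find the identifier to pass to member remove, the cluster-health output can be filtered for members that are not reported as healthy. The following is a minimal sketch, not part of the official procedure. The function name unhealthy_member_ids is hypothetical, and the filter assumes that every member line has the form "member <ID> is <status>: ...", as in the healthy lines shown later in this chapter.

```shell
#!/bin/sh
# Print the ID of every member that `etcdctl cluster-health` (v2 API) does not
# report as healthy. Reads the command output on stdin.
# Assumption: each member line has the form "member <ID> is <status>: ...".
unhealthy_member_ids() {
    awk '$1 == "member" && $3 == "is" && $4 != "healthy:" { print $2 }'
}

# Hypothetical usage on a surviving etcd host:
#   etcdctl -C https://<surviving host IP>:2379 \
#       --ca-file=/etc/etcd/ca.crt \
#       --cert-file=/etc/etcd/peer.crt \
#       --key-file=/etc/etcd/peer.key cluster-health | unhealthy_member_ids
```

Review the printed identifiers before you remove anything, because a member that is only temporarily unreachable is listed as well.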
37.2. Adding an etcd member
You can add an etcd host either by using an Ansible playbook or by manual steps.
37.2.1. Adding a new etcd host using Ansible
In the Ansible inventory file, create a new group named [new_etcd] and add the new host. Then, add the new_etcd group as a child of the [OSEv3] group:
[OSEv3:children]
masters
nodes
etcd
new_etcd
... [OUTPUT ABBREVIATED] ...
[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
[new_etcd]
etcd0.example.com
From the host that installed OpenShift Container Platform and hosts the Ansible inventory file, run the etcd scaleup playbook:
$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-etcd/scaleup.yml
After the playbook runs, modify the inventory file to reflect the current status by moving the new etcd host from the [new_etcd] group to the [etcd] group:
[OSEv3:children]
masters
nodes
etcd
new_etcd
... [OUTPUT ABBREVIATED] ...
[etcd]
master-0.example.com
master-1.example.com
master-2.example.com
etcd0.example.com
If you use Flannel, modify the flanneld service configuration, located at /etc/sysconfig/flanneld, on every OpenShift Container Platform host to include the new etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
Restart the flanneld service:
# systemctl restart flanneld.service
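Because the endpoint list must name every etcd host, it can be less error-prone to generate it from the host names than to edit it by hand. The following is a minimal sketch under that assumption. The function name etcd_endpoints is hypothetical, and port 2379 is the etcd client port used throughout this chapter.

```shell
#!/bin/sh
# Build a comma-separated list of etcd client URLs from host names.
# Each host name given as an argument becomes https://<host>:2379.
etcd_endpoints() {
    list=""
    for host in "$@"; do
        # Add a comma separator only when the list is not empty.
        list="${list:+$list,}https://${host}:2379"
    done
    printf '%s\n' "$list"
}

# Hypothetical usage: print the line to place in /etc/sysconfig/flanneld.
#   echo "FLANNEL_ETCD_ENDPOINTS=$(etcd_endpoints master-0.example.com etcd0.example.com)"
```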
37.2.2. Manually adding a new etcd host
Modify the current etcd cluster
To create the etcd certificates, run the openssl command, replacing the values with those from your environment.
Create some environment variables:
export NEW_ETCD_HOSTNAME="etcd0.example.com"
export NEW_ETCD_IP=""
export CN=$NEW_ETCD_HOSTNAME
export SAN="IP:${NEW_ETCD_IP}, DNS:${NEW_ETCD_HOSTNAME}"
export PREFIX="/etc/etcd/generated_certs/etcd-$CN/"
export OPENSSLCFG="/etc/etcd/ca/openssl.cnf"
Note: The custom openssl extensions used as etcd_v3_ca_* include the $SAN environment variable as subjectAltName. See /etc/etcd/ca/openssl.cnf for more information.
Create the directory to store the configuration and certificates:
# mkdir -p ${PREFIX}
Create the server certificate request and sign it: (server.csr and server.crt)
# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}server.key \
    -out ${PREFIX}server.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN
# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}server.crt \
    -in ${PREFIX}server.csr \
    -extensions etcd_v3_ca_server -batch
Create the peer certificate request and sign it: (peer.csr and peer.crt)
# openssl req -new -config ${OPENSSLCFG} \
    -keyout ${PREFIX}peer.key \
    -out ${PREFIX}peer.csr \
    -reqexts etcd_v3_req -batch -nodes \
    -subj /CN=$CN
# openssl ca -name etcd_ca -config ${OPENSSLCFG} \
    -out ${PREFIX}peer.crt \
    -in ${PREFIX}peer.csr \
    -extensions etcd_v3_ca_peer -batch
Copy the current etcd configuration and ca.crt files from the current node as examples to modify later:
# cp /etc/etcd/etcd.conf ${PREFIX}
# cp /etc/etcd/ca.crt ${PREFIX}
While still on the surviving etcd host, add the new host to the cluster. To add additional etcd members to the cluster, you must first adjust the default localhost peer in the peerURLs value for the first member:
Get the member ID for the first member using the member list command:
# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers=",," \
    member list
Ensure that you specify the URLs of only active etcd members in the --peers parameter value.
Obtain the IP address where etcd listens for cluster peers:
$ ss -l4n | grep 2380
Update the value of peerURLs using the etcdctl member update command by passing the member ID and IP address obtained from the previous steps:
# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers=",," \
    member update 511b7fb6cc0001
Re-run the member list command and ensure the peer URLs no longer include localhost.
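The localhost check can also be done mechanically by filtering the member list output. The following is a minimal sketch, not part of the official procedure. The function name localhost_peer_ids is hypothetical, and the filter assumes the v2 output format "<ID>: name=<name> peerURLs=<urls> clientURLs=<urls> isLeader=<bool>".

```shell
#!/bin/sh
# Print the IDs of members whose peerURLs still point at localhost.
# Reads `etcdctl member list` (v2 API) output on stdin.
# Assumption: the first field is "<ID>:" and one later field starts
# with "peerURLs=".
localhost_peer_ids() {
    awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^peerURLs=/ && $i ~ /localhost/) {
                    id = $1
                    sub(/:$/, "", id)   # strip the trailing colon from the ID
                    print id
                }
            }
        }'
}

# Hypothetical usage: pipe the member list command shown above into
# localhost_peer_ids. No output means that no peer URL uses localhost.
```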
Add the new host to the etcd cluster. Note that the new host is not yet configured, so the status stays as unstarted until you configure the new host.
Warning: You must add each member and bring it online one at a time. When you add each additional member to the cluster, you must adjust the peerURLs list for the current peers. The peerURLs list grows by one for each member added. The etcdctl member add command outputs the values that you must set in the etcd.conf file as you add each member, as described in the following instructions.
# etcdctl -C https://${CURRENT_ETCD_HOST}:2379 \
    --ca-file=/etc/etcd/ca.crt \
    --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key member add ${NEW_ETCD_HOSTNAME} https://${NEW_ETCD_IP}:2380
Added member named with ID 4e1db163a21d7651 to cluster

ETCD_NAME="<NEW_ETCD_HOSTNAME>"
ETCD_INITIAL_CLUSTER="<NEW_ETCD_HOSTNAME>=https://<NEW_HOST_IP>:2380,<CLUSTERMEMBER1_NAME>=https://<CLUSTERMEMBER1_IP>:2380,<CLUSTERMEMBER2_NAME>=https://<CLUSTERMEMBER2_IP>:2380,<CLUSTERMEMBER3_NAME>=https://<CLUSTERMEMBER3_IP>:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
In the member add command, ${NEW_ETCD_HOSTNAME} is a label for the etcd member. You can specify the host name, IP address, or a simple name.
Update the sample ${PREFIX}/etcd.conf file.
Replace the following values with the values generated in the previous step:
- ETCD_NAME
- ETCD_INITIAL_CLUSTER
- ETCD_INITIAL_CLUSTER_STATE
Modify the variables that contain a host IP address to use the new host IP from the output of the previous step. You can use ${NEW_ETCD_IP} as the value.
If you previously used the member system as an etcd node, you must overwrite the current values in the /etc/etcd/etcd.conf file.
Check the file for syntax errors or missing IP addresses, otherwise the etcd service might fail:
# vi ${PREFIX}/etcd.conf
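As an alternative to editing by hand, the values printed by etcdctl member add can be written into the sample file with a small helper. The following is a minimal sketch under the assumption that etcd.conf consists of KEY=VALUE lines, some of which may be commented out with a leading #. The function name set_conf_value is hypothetical.

```shell
#!/bin/sh
# Set KEY="VALUE" in a KEY=VALUE style configuration file.
# An existing assignment of KEY (commented out or not) is replaced, any
# further duplicates are dropped, and the line is appended if KEY is absent.
# Usage: set_conf_value FILE KEY VALUE
set_conf_value() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    awk -v key="$key" -v line="${key}=\"${value}\"" '
        $0 ~ ("^#?" key "=") { if (!done) print line; done = 1; next }
        { print }
        END { if (!done) print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}

# Hypothetical usage with the variables defined earlier in this procedure:
#   set_conf_value "${PREFIX}/etcd.conf" ETCD_NAME "${NEW_ETCD_HOSTNAME}"
#   set_conf_value "${PREFIX}/etcd.conf" ETCD_INITIAL_CLUSTER_STATE existing
```

The match is anchored on the full key followed by =, so setting ETCD_INITIAL_CLUSTER does not touch ETCD_INITIAL_CLUSTER_STATE. Still review the file afterward, as the preceding step describes.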
On the node that hosts the installation files, update the [etcd] hosts group in the /etc/ansible/hosts inventory file. Remove the old etcd hosts and add the new ones.
Create a tgz file that contains the certificates, the sample configuration file, and the CA certificate, and copy it to the new host:
# tar -czvf /etc/etcd/generated_certs/${CN}.tgz -C ${PREFIX} .
# scp /etc/etcd/generated_certs/${CN}.tgz ${CN}:/tmp/
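Before creating the archive, it can help to confirm that every file produced by the earlier steps is present. The following is a minimal sketch. The function name missing_etcd_files is hypothetical, and the file list is taken from the certificate and copy steps above.

```shell
#!/bin/sh
# List the files that the archive is expected to contain but that are
# missing or empty in the given directory. Prints nothing when all are present.
# Usage: missing_etcd_files DIRECTORY
missing_etcd_files() {
    dir=$1
    for f in server.key server.crt peer.key peer.crt ca.crt etcd.conf; do
        # ${dir%/} drops one trailing slash, so both "dir" and "dir/" work.
        [ -s "${dir%/}/$f" ] || printf '%s\n' "$f"
    done
}

# Hypothetical usage before running tar:
#   missing_etcd_files "${PREFIX}"
```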
Modify the new etcd host
Install iptables-services to provide iptables utilities to open the required ports for etcd:
# yum install -y iptables-services
Create the OS_FIREWALL_ALLOW firewall rules to allow etcd to communicate:
- Port 2379/tcp for clients
- Port 2380/tcp for peer communication
# systemctl enable iptables.service --now
# iptables -N OS_FIREWALL_ALLOW
# iptables -t filter -I INPUT -j OS_FIREWALL_ALLOW
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2379 -j ACCEPT
# iptables -A OS_FIREWALL_ALLOW -p tcp -m state --state NEW -m tcp --dport 2380 -j ACCEPT
# iptables-save | tee /etc/sysconfig/iptables
Note: In this example, a new chain OS_FIREWALL_ALLOW is created, which is the standard naming the OpenShift Container Platform installer uses for firewall rules.
Warning: If the environment is hosted in an IaaS environment, modify the security groups for the instance to allow incoming traffic to those ports as well.
Install etcd:
# yum install -y etcd
Ensure that the required version or greater is installed.
Ensure the etcd service is not running:
# systemctl disable etcd --now
Remove any etcd configuration and data:
# rm -Rf /etc/etcd/*
# rm -Rf /var/lib/etcd/*
Extract the certificates and configuration files:
# tar xzvf /tmp/etcd0.example.com.tgz -C /etc/etcd/
Modify the file ownership permissions:
# chown -R etcd:etcd /etc/etcd/*
# chown -R etcd:etcd /var/lib/etcd/
Start etcd on the new host:
# systemctl enable etcd --now
Verify that the host is part of the cluster and the current cluster health:
If you use the etcd v2 API, run the following command:
# etcdctl --cert-file=/etc/etcd/peer.crt \
    --key-file=/etc/etcd/peer.key \
    --ca-file=/etc/etcd/ca.crt \
    --peers="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
    cluster-health
member 5ee217d19001 is healthy: got healthy result from
member 2a529ba1840722c0 is healthy: got healthy result from
member 8b8904727bf526a5 is healthy: got healthy result from
member ed4f0efd277d7599 is healthy: got healthy result from
cluster is healthy
If you use the etcd v3 API, run the following command:
# ETCDCTL_API=3 etcdctl --cert="/etc/etcd/peer.crt" \
    --key=/etc/etcd/peer.key \
    --cacert="/etc/etcd/ca.crt" \
    --endpoints="https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379" \
    endpoint health
https://master-0.example.com:2379 is healthy: successfully committed proposal: took = 5.011358ms
https://master-1.example.com:2379 is healthy: successfully committed proposal: took = 1.305173ms
https://master-2.example.com:2379 is healthy: successfully committed proposal: took = 1.388772ms
https://etcd0.example.com:2379 is healthy: successfully committed proposal: took = 1.498829ms
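When this verification is scripted, the endpoint health output can be reduced to a single pass or fail result. The following is a minimal sketch, not part of the official procedure. The function name all_endpoints_healthy is hypothetical, and it assumes that each endpoint is reported on one line containing "is healthy" or "is unhealthy", as in the output above.

```shell
#!/bin/sh
# Succeed (exit status 0) only if at least one line was read on stdin and
# every non-blank line reports a healthy endpoint.
# Reads `etcdctl endpoint health` (v3 API) output on stdin.
all_endpoints_healthy() {
    awk '
        NF == 0 { next }              # ignore blank lines
        { total++ }
        / is healthy/ { healthy++ }   # does not match " is unhealthy"
        END { exit !(total > 0 && healthy == total) }
    '
}

# Hypothetical usage: redirect stderr as well, on the assumption that some
# etcdctl versions print failures there.
#   ETCDCTL_API=3 etcdctl ... endpoint health 2>&1 | all_endpoints_healthy \
#       && echo "all etcd endpoints are healthy"
```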
Modify each OpenShift Container Platform master
Modify the master configuration in the etcdClientInfo section of the /etc/origin/master/master-config.yaml file on every master. Add the new etcd host to the list of the etcd servers OpenShift Container Platform uses to store the data, and remove any failed etcd hosts:
etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379
    - https://master-2.example.com:2379
    - https://etcd0.example.com:2379
Restart the master API service:
On every master:
# systemctl restart atomic-openshift-master-api
Or, on a single master cluster installation:
# systemctl restart atomic-openshift-master
Warning: The number of etcd nodes must be odd, so you must add at least two hosts.
If you use Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every OpenShift Container Platform host to include the new etcd host:
FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379,https://etcd0.example.com:2379
Restart the flanneld service:
# systemctl restart flanneld.service
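The warning above about keeping an odd number of etcd nodes follows from majority quorum: a cluster of N members needs floor(N/2) + 1 of them available, so growing from an odd to an even member count does not increase the number of failures the cluster can survive. The following minimal sketch illustrates the arithmetic. The function names quorum and failure_tolerance are hypothetical.

```shell
#!/bin/sh
# Quorum size for an etcd cluster of N members: a strict majority.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while the cluster keeps quorum.
failure_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for cluster sizes 1 through 5.
for n in 1 2 3 4 5; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(failure_tolerance "$n")"
done
```

In the table, three and four members both tolerate one failure, while five members tolerate two.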