This documentation is for a release that is no longer maintained. See the documentation for the latest supported version 3 or the latest supported version 4.
Chapter 2. Replacing a failed master host
This document describes how to replace a single etcd member. This procedure assumes that there is still an etcd quorum in the cluster.
If you have lost the majority of your master hosts, leading to etcd quorum loss, then you must follow the disaster recovery procedure to recover from lost master hosts instead of this procedure.
If the control plane certificates are not valid on the member being replaced, then you must follow the procedure to recover from expired control plane certificates instead of this procedure.
To replace a single master host:
- Remove the member from the etcd cluster.
- If the etcd certificates for the master host are valid, then add the member back to the etcd cluster.
- If there are no etcd certificates for the master host or they are no longer valid, then generate etcd certificates and add the member to the etcd cluster.
2.1. Removing a failed master host from the etcd cluster
Follow these steps to remove a failed master host from the etcd cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to an active master host.
Procedure
View the list of Pods associated with etcd.
In a terminal that has access to the cluster, run the following command:
oc get pods -n openshift-etcd
$ oc get pods -n openshift-etcd
NAME                                                      READY   STATUS    RESTARTS   AGE
etcd-member-ip-10-0-128-73.us-east-2.compute.internal     2/2     Running   0          15h
etcd-member-ip-10-0-147-172.us-east-2.compute.internal    2/2     Running   7          122m
etcd-member-ip-10-0-171-108.us-east-2.compute.internal    2/2     Running   0          15h
Access an active master host.
Run the etcd-member-remove.sh script and pass in the name of the etcd member to remove:
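For example, assuming the script is installed under /usr/local/bin (like the tokenize-signer.sh script used later in this chapter) and the failed member is etcd-member-ip-10-0-147-172.us-east-2.compute.internal, the invocation from an active master host might look like the following sketch:
# illustrative path and member name; adjust to your environment
[core@ip-10-0-128-73 ~]$ sudo -E /usr/local/bin/etcd-member-remove.sh etcd-member-ip-10-0-147-172.us-east-2.compute.internal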
Verify that the etcd member has been successfully removed from the cluster:
Connect to the running etcd container:
[core@ip-10-0-128-73 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export the variables needed for connecting to etcd:
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the removed member is no longer listed:
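The output should resemble the following sketch, in which the removed member (etcd-member-ip-10-0-147-172) no longer appears; the member IDs shown are illustrative:
sh-4.2# etcdctl member list
6fc1e7c9db35841d, started, etcd-member-ip-10-0-128-73.us-east-2.compute.internal, https://10.0.128.73:2380, https://10.0.128.73:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-171-108.us-east-2.compute.internal, https://10.0.171.108:2380, https://10.0.171.108:2379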
2.2. Adding the member back to the cluster
After you have removed the member from the etcd cluster, use one of the following procedures to add the member to the cluster:
- If the etcd certificates for the master host are valid, then add the member back to the etcd cluster.
- If there are no etcd certificates for the master host or they are no longer valid, then generate etcd certificates and add the member to the etcd cluster.
2.2.1. Adding a master host back to the etcd cluster
Follow these steps to add a master host back to the etcd cluster. This procedure assumes that you previously removed the master host from the cluster and that its etcd dependencies, such as TLS certificates and DNS, are valid.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to the master host to add to the etcd cluster.
- You have the IP address of an existing active etcd member.
Procedure
Access the master host to add to the etcd cluster.
Important: You must run this procedure on the master host that is being added to the etcd cluster.
Run the etcd-member-add.sh script and pass in two parameters, as shown in the sketch after this list:
- the IP address of an existing etcd member
- the name of the etcd member to add
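For example, assuming the script is installed under /usr/local/bin, the existing active member is at 10.0.128.73, and the member being added back is etcd-member-ip-10-0-147-172.us-east-2.compute.internal, the invocation might look like the following sketch:
# illustrative path, IP address, and member name; adjust to your environment
[core@ip-10-0-147-172 ~]$ sudo -E /usr/local/bin/etcd-member-add.sh 10.0.128.73 etcd-member-ip-10-0-147-172.us-east-2.compute.internal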
Verify that the etcd member has been successfully added to the etcd cluster:
Connect to the running etcd container:
[core@ip-10-0-147-172 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export the variables needed for connecting to etcd:
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the new member is listed. It may take up to 10 minutes for the new member to start.
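The output should resemble the following sketch, with the re-added member present; the member IDs shown are illustrative:
sh-4.2# etcdctl member list
5f2cd21e0e3a2334, started, etcd-member-ip-10-0-147-172.us-east-2.compute.internal, https://10.0.147.172:2380, https://10.0.147.172:2379
6fc1e7c9db35841d, started, etcd-member-ip-10-0-128-73.us-east-2.compute.internal, https://10.0.128.73:2380, https://10.0.128.73:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-171-108.us-east-2.compute.internal, https://10.0.171.108:2380, https://10.0.171.108:2379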
In the etcd container, execute etcdctl endpoint health and verify that the new member is healthy:
etcdctl endpoint health --cluster
sh-4.2# etcdctl endpoint health --cluster
https://10.0.128.73:2379 is healthy: successfully committed proposal: took = 4.5576ms
https://10.0.147.172:2379 is healthy: successfully committed proposal: took = 5.1521ms
https://10.0.171.108:2379 is healthy: successfully committed proposal: took = 4.2631ms
Verify that the new member is in the list of Pods associated with etcd and that its status is Running.
In a terminal that has access to the cluster, run the following command:
oc get pods -n openshift-etcd
$ oc get pods -n openshift-etcd
NAME                                                      READY   STATUS    RESTARTS   AGE
etcd-member-ip-10-0-128-73.us-east-2.compute.internal     2/2     Running   0          15h
etcd-member-ip-10-0-147-172.us-east-2.compute.internal    2/2     Running   7          122m
etcd-member-ip-10-0-171-108.us-east-2.compute.internal    2/2     Running   0          15h
2.2.2. Generating etcd certificates and adding the member to the cluster
If the node is new or the etcd certificates on the node are no longer valid, you must generate the etcd certificates before you can add the member to the etcd cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to the new master host to add to the etcd cluster.
- You have SSH access to one of the healthy master hosts.
- You have the IP address of one of the healthy master hosts.
Procedure
Set up a temporary etcd certificate signer service on one of the healthy master nodes.
Access one of the healthy master nodes and log in to your cluster as a cluster-admin user using the following command.
sudo oc login https://localhost:6443
[core@ip-10-0-143-125 ~]$ sudo oc login https://localhost:6443
Authentication required for https://localhost:6443 (openshift)
Username: kubeadmin
Password:
Login successful.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Obtain the pull specification for the
kube-etcd-signer-server
image.export KUBE_ETCD_SIGNER_SERVER=$(sudo oc adm release info --image-for kube-etcd-signer-server --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-143-125 ~]$ export KUBE_ETCD_SIGNER_SERVER=$(sudo oc adm release info --image-for kube-etcd-signer-server --registry-config=/var/lib/kubelet/config.json)
Run the tokenize-signer.sh script.
Be sure to pass in the -E flag to sudo so that environment variables are properly passed to the script.
sudo -E /usr/local/bin/tokenize-signer.sh ip-10-0-143-125
[core@ip-10-0-143-125 ~]$ sudo -E /usr/local/bin/tokenize-signer.sh ip-10-0-143-125 1
Populating template /usr/local/share/openshift-recovery/template/kube-etcd-cert-signer.yaml.template
Populating template ./assets/tmp/kube-etcd-cert-signer.yaml.stage1
Tokenized template now ready: ./assets/manifests/kube-etcd-cert-signer.yaml
1 The host name of the healthy master, where the signer should be deployed.
Create the signer Pod using the file that was generated.
sudo oc create -f assets/manifests/kube-etcd-cert-signer.yaml
[core@ip-10-0-143-125 ~]$ sudo oc create -f assets/manifests/kube-etcd-cert-signer.yaml
pod/etcd-signer created
Verify that the signer is listening on this master node.
ss -ltn | grep 9943
[core@ip-10-0-143-125 ~]$ ss -ltn | grep 9943
LISTEN   0   128   *:9943   *:*
Add the new master host to the etcd cluster.
Access the new master host to be added to the cluster, and log in to your cluster as a cluster-admin user using the following command.
sudo oc login https://localhost:6443
[core@ip-10-0-156-255 ~]$ sudo oc login https://localhost:6443
Authentication required for https://localhost:6443 (openshift)
Username: kubeadmin
Password:
Login successful.
Export two environment variables that are required by the etcd-member-recover.sh script.
export SETUP_ETCD_ENVIRONMENT=$(sudo oc adm release info --image-for machine-config-operator --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-156-255 ~]$ export SETUP_ETCD_ENVIRONMENT=$(sudo oc adm release info --image-for machine-config-operator --registry-config=/var/lib/kubelet/config.json)
export KUBE_CLIENT_AGENT=$(sudo oc adm release info --image-for kube-client-agent --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-156-255 ~]$ export KUBE_CLIENT_AGENT=$(sudo oc adm release info --image-for kube-client-agent --registry-config=/var/lib/kubelet/config.json)
Run the etcd-member-recover.sh script, specifying both the IP address of the healthy master where the signer server is running and the etcd name of the new member.
Be sure to pass in the -E flag to sudo so that environment variables are properly passed to the script.
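For example, assuming the script is installed under /usr/local/bin, the signer is running on the healthy master at 10.0.143.125, and the new member is etcd-member-ip-10-0-156-255.us-east-2.compute.internal, the invocation might look like the following sketch:
# illustrative path, IP address, and member name; adjust to your environment
[core@ip-10-0-156-255 ~]$ sudo -E /usr/local/bin/etcd-member-recover.sh 10.0.143.125 etcd-member-ip-10-0-156-255.us-east-2.compute.internal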
Verify that the new master host has been added to the etcd member list.
Access the healthy master and connect to the running etcd container.
[core@ip-10-0-143-125 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export variables needed for connecting to etcd.
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.3# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the new member is listed. It may take up to 20 minutes for the new member to start.
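The output should resemble the following sketch, with the new member present and reported as started; the member IDs and the third host name are illustrative:
sh-4.3# etcdctl member list
2a6d1b3f9c0e4d21, started, etcd-member-ip-10-0-156-255.us-east-2.compute.internal, https://10.0.156.255:2380, https://10.0.156.255:2379
6fc1e7c9db35841d, started, etcd-member-ip-10-0-143-125.us-east-2.compute.internal, https://10.0.143.125:2380, https://10.0.143.125:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-154-194.us-east-2.compute.internal, https://10.0.154.194:2380, https://10.0.154.194:2379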
After the new member is added, remove the signer Pod because it is no longer needed.
In a terminal that has access to the cluster, run the following command:
oc delete pod -n openshift-config etcd-signer
$ oc delete pod -n openshift-config etcd-signer