This documentation is for a release that is no longer maintained. See the documentation for the latest supported version 3 or the latest supported version 4.
Chapter 2. Replacing a failed master host
This document describes how to replace a single etcd member. This procedure assumes that there is still an etcd quorum in the cluster.
If you have lost the majority of your master hosts, leading to etcd quorum loss, then you must follow the disaster recovery procedure to recover from lost master hosts instead of this procedure.
If the control plane certificates are not valid on the member being replaced, then you must follow the procedure to recover from expired control plane certificates instead of this procedure.
To replace a single master host:
- Remove the member from the etcd cluster.
- If the etcd certificates for the master host are valid, then add the member back to the etcd cluster.
- If there are no etcd certificates for the master host or they are no longer valid, then generate etcd certificates and add the member to the etcd cluster.
2.1. Removing a failed master host from the etcd cluster
Follow these steps to remove a failed master host from the etcd cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to an active master host.
Procedure
View the list of Pods associated with etcd.
In a terminal that has access to the cluster, run the following command:
oc get pods -n openshift-etcd
$ oc get pods -n openshift-etcd
NAME                                                      READY   STATUS    RESTARTS   AGE
etcd-member-ip-10-0-128-73.us-east-2.compute.internal     2/2     Running   0          15h
etcd-member-ip-10-0-147-172.us-east-2.compute.internal    2/2     Running   7          122m
etcd-member-ip-10-0-171-108.us-east-2.compute.internal    2/2     Running   0          15h
Access an active master host.
Run the etcd-member-remove.sh script and pass in the name of the etcd member to remove:
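For example, assuming the script is installed under /usr/local/bin (like the tokenize-signer.sh script used later in this chapter) and the failed member is etcd-member-ip-10-0-147-172.us-east-2.compute.internal, the invocation from an active master host might look like the following sketch:
# illustrative path and member name; adjust to your environment
[core@ip-10-0-128-73 ~]$ sudo -E /usr/local/bin/etcd-member-remove.sh etcd-member-ip-10-0-147-172.us-east-2.compute.internal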
Verify that the etcd member has been successfully removed from the cluster:
Connect to the running etcd container:
[core@ip-10-0-128-73 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export the variables needed for connecting to etcd:
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the removed member is no longer listed:
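The output should resemble the following sketch, in which the removed member (etcd-member-ip-10-0-147-172) no longer appears; the member IDs shown are illustrative:
sh-4.2# etcdctl member list
6fc1e7c9db35841d, started, etcd-member-ip-10-0-128-73.us-east-2.compute.internal, https://10.0.128.73:2380, https://10.0.128.73:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-171-108.us-east-2.compute.internal, https://10.0.171.108:2380, https://10.0.171.108:2379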
2.2. Adding the member back to the cluster
After you have removed the member from the etcd cluster, use one of the following procedures to add the member to the cluster:
- If the etcd certificates for the master host are valid, then add the member back to the etcd cluster.
- If there are no etcd certificates for the master host or they are no longer valid, then generate etcd certificates and add the member to the etcd cluster.
2.2.1. Adding a master host back to the etcd cluster
Follow these steps to add a master host back to the etcd cluster. This procedure assumes that you previously removed the master host from the cluster and that its etcd dependencies, such as TLS certificates and DNS, are valid.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to the master host to add to the etcd cluster.
- You have the IP address of an existing active etcd member.
Procedure
Access the master host to add to the etcd cluster.
Important: You must run this procedure on the master host that is being added to the etcd cluster.
Run the etcd-member-add.sh script and pass in two parameters, as shown in the sketch after this list:
- the IP address of an existing etcd member
- the name of the etcd member to add
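For example, assuming the script is installed under /usr/local/bin, the existing active member is at 10.0.128.73, and the member being added back is etcd-member-ip-10-0-147-172.us-east-2.compute.internal, the invocation might look like the following sketch:
# illustrative path, IP address, and member name; adjust to your environment
[core@ip-10-0-147-172 ~]$ sudo -E /usr/local/bin/etcd-member-add.sh 10.0.128.73 etcd-member-ip-10-0-147-172.us-east-2.compute.internal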
Verify that the etcd member has been successfully added to the etcd cluster:
Connect to the running etcd container:
[core@ip-10-0-147-172 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export the variables needed for connecting to etcd:
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.2# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the new member is listed. It may take up to 10 minutes for the new member to start.
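The output should resemble the following sketch, with the re-added member present; the member IDs shown are illustrative:
sh-4.2# etcdctl member list
5f2cd21e0e3a2334, started, etcd-member-ip-10-0-147-172.us-east-2.compute.internal, https://10.0.147.172:2380, https://10.0.147.172:2379
6fc1e7c9db35841d, started, etcd-member-ip-10-0-128-73.us-east-2.compute.internal, https://10.0.128.73:2380, https://10.0.128.73:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-171-108.us-east-2.compute.internal, https://10.0.171.108:2380, https://10.0.171.108:2379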
In the etcd container, execute etcdctl endpoint health and verify that the new member is healthy:
etcdctl endpoint health --cluster
sh-4.2# etcdctl endpoint health --cluster
https://10.0.128.73:2379 is healthy: successfully committed proposal: took = 4.5576ms
https://10.0.147.172:2379 is healthy: successfully committed proposal: took = 5.1521ms
https://10.0.171.108:2379 is healthy: successfully committed proposal: took = 4.2631ms
Verify that the new member is in the list of Pods associated with etcd and that its status is Running.
In a terminal that has access to the cluster, run the following command:
oc get pods -n openshift-etcd
$ oc get pods -n openshift-etcd
NAME                                                      READY   STATUS    RESTARTS   AGE
etcd-member-ip-10-0-128-73.us-east-2.compute.internal     2/2     Running   0          15h
etcd-member-ip-10-0-147-172.us-east-2.compute.internal    2/2     Running   7          122m
etcd-member-ip-10-0-171-108.us-east-2.compute.internal    2/2     Running   0          15h
2.2.2. Generating etcd certificates and adding the member to the cluster
If the node is new or the etcd certificates on the node are no longer valid, you must generate the etcd certificates before you can add the member to the etcd cluster.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have SSH access to the new master host to add to the etcd cluster.
- You have SSH access to one of the healthy master hosts.
- You have the IP address of one of the healthy master hosts.
Procedure
Set up a temporary etcd certificate signer service on one of the healthy master nodes.
Access one of the healthy master nodes and log in to your cluster as a cluster-admin user using the following command.
sudo oc login https://localhost:6443
[core@ip-10-0-143-125 ~]$ sudo oc login https://localhost:6443
Authentication required for https://localhost:6443 (openshift)
Username: kubeadmin
Password:
Login successful.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Obtain the pull specification for the
kube-etcd-signer-server
image.export KUBE_ETCD_SIGNER_SERVER=$(sudo oc adm release info --image-for kube-etcd-signer-server --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-143-125 ~]$ export KUBE_ETCD_SIGNER_SERVER=$(sudo oc adm release info --image-for kube-etcd-signer-server --registry-config=/var/lib/kubelet/config.json)
Run the tokenize-signer.sh script.
Be sure to pass in the -E flag to sudo so that environment variables are properly passed to the script.
sudo -E /usr/local/bin/tokenize-signer.sh ip-10-0-143-125
[core@ip-10-0-143-125 ~]$ sudo -E /usr/local/bin/tokenize-signer.sh ip-10-0-143-125 1
Populating template /usr/local/share/openshift-recovery/template/kube-etcd-cert-signer.yaml.template
Populating template ./assets/tmp/kube-etcd-cert-signer.yaml.stage1
Tokenized template now ready: ./assets/manifests/kube-etcd-cert-signer.yaml
1 The host name of the healthy master, where the signer should be deployed.
Create the signer Pod using the file that was generated.
sudo oc create -f assets/manifests/kube-etcd-cert-signer.yaml
[core@ip-10-0-143-125 ~]$ sudo oc create -f assets/manifests/kube-etcd-cert-signer.yaml
pod/etcd-signer created
Verify that the signer is listening on this master node.
ss -ltn | grep 9943
[core@ip-10-0-143-125 ~]$ ss -ltn | grep 9943
LISTEN   0   128   *:9943   *:*
Add the new master host to the etcd cluster.
Access the new master host to be added to the cluster, and log in to your cluster as a cluster-admin user using the following command.
sudo oc login https://localhost:6443
[core@ip-10-0-156-255 ~]$ sudo oc login https://localhost:6443
Authentication required for https://localhost:6443 (openshift)
Username: kubeadmin
Password:
Login successful.
Export two environment variables that are required by the etcd-member-recover.sh script.
export SETUP_ETCD_ENVIRONMENT=$(sudo oc adm release info --image-for machine-config-operator --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-156-255 ~]$ export SETUP_ETCD_ENVIRONMENT=$(sudo oc adm release info --image-for machine-config-operator --registry-config=/var/lib/kubelet/config.json)
export KUBE_CLIENT_AGENT=$(sudo oc adm release info --image-for kube-client-agent --registry-config=/var/lib/kubelet/config.json)
[core@ip-10-0-156-255 ~]$ export KUBE_CLIENT_AGENT=$(sudo oc adm release info --image-for kube-client-agent --registry-config=/var/lib/kubelet/config.json)
Run the etcd-member-recover.sh script, specifying both the IP address of the healthy master where the signer server is running and the etcd name of the new member.
Be sure to pass in the -E flag to sudo so that environment variables are properly passed to the script.
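For example, assuming the script is installed under /usr/local/bin, the signer is running on the healthy master at 10.0.143.125, and the new member is etcd-member-ip-10-0-156-255.us-east-2.compute.internal, the invocation might look like the following sketch:
# illustrative path, IP address, and member name; adjust to your environment
[core@ip-10-0-156-255 ~]$ sudo -E /usr/local/bin/etcd-member-recover.sh 10.0.143.125 etcd-member-ip-10-0-156-255.us-east-2.compute.internal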
Verify that the new master host has been added to the etcd member list.
Access the healthy master and connect to the running etcd container.
[core@ip-10-0-143-125 ~]$ id=$(sudo crictl ps --name etcd-member | awk 'FNR==2{ print $1}') && sudo crictl exec -it $id /bin/sh
In the etcd container, export variables needed for connecting to etcd.
export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
sh-4.3# export ETCDCTL_API=3 ETCDCTL_CACERT=/etc/ssl/etcd/ca.crt ETCDCTL_CERT=$(find /etc/ssl/ -name *peer*crt) ETCDCTL_KEY=$(find /etc/ssl/ -name *peer*key)
In the etcd container, execute etcdctl member list and verify that the new member is listed. It may take up to 20 minutes for the new member to start.
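The output should resemble the following sketch, with the new member present and reported as started; the member IDs and the third host name are illustrative:
sh-4.3# etcdctl member list
2a6d1b3f9c0e4d21, started, etcd-member-ip-10-0-156-255.us-east-2.compute.internal, https://10.0.156.255:2380, https://10.0.156.255:2379
6fc1e7c9db35841d, started, etcd-member-ip-10-0-143-125.us-east-2.compute.internal, https://10.0.143.125:2380, https://10.0.143.125:2379
8d5cf958880a5a6f, started, etcd-member-ip-10-0-154-194.us-east-2.compute.internal, https://10.0.154.194:2380, https://10.0.154.194:2379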
After the new member is added, remove the signer Pod because it is no longer needed.
In a terminal that has access to the cluster, run the following command:
oc delete pod -n openshift-config etcd-signer
$ oc delete pod -n openshift-config etcd-signer