이 콘텐츠는 선택한 언어로 제공되지 않습니다.

Chapter 38. Restoring etcd quorum

If you lose etcd quorum, you must back up etcd, take down your etcd cluster, and form a new one. You can use one healthy etcd node to form a new cluster, but you must remove all other healthy nodes.

Note

During etcd quorum loss, applications that run on OpenShift Container Platform are unaffected. However, the platform functionality is limited to read-only operations. You cannot take action such as scaling an application up or down, changing deployments, or running or modifying builds.

To confirm the loss of etcd quorum, run the following command and confirm that the cluster is unhealthy:

# ETCDCTL_API=2 etcdctl  --cert-file=/etc/origin/master/master.etcd-client.crt  \
          --key-file /etc/origin/master/master.etcd-client.key \
          --ca-file /etc/origin/master/master.etcd-ca.crt \
          --endpoints="https://*master-0.example.com*:2379,\
          https://*master-1.example.com*:2379,\
          https://*master-2.example.com*:2379"\
          cluster-health

member 165201190bf7f217 is unhealthy: got unhealthy result from https://master-0.example.com:2379
member b50b8a0acab2fa71 is unreachable: [https://master-1.example.com:2379] are all unreachable
member d40307cbca7bc2df is unreachable: [https://master-2.example.com:2379] are all unreachable
cluster is unhealthy

# ETCDCTL_API=2 etcdctl  --cert-file=/etc/origin/master/master.etcd-client.crt  \
          --key-file /etc/origin/master/master.etcd-client.key \
          --ca-file /etc/origin/master/master.etcd-ca.crt \
          --endpoints="https://*master-0.example.com*:2379,\
          https://*master-1.example.com*:2379,\
          https://*master-2.example.com*:2379"\
          cluster-health

member 165201190bf7f217 is unhealthy: got unhealthy result from https://master-0.example.com:2379
member b50b8a0acab2fa71 is unreachable: [https://master-1.example.com:2379] are all unreachable
member d40307cbca7bc2df is unreachable: [https://master-2.example.com:2379] are all unreachable
cluster is unhealthy

Copy to Clipboard

Toggle word wrap

Note the member IDs and host names of the hosts. You use one of the nodes that can be reached to form a new cluster.

38.1. Backing up etcd
링크 복사

When you back up etcd, you must back up both the etcd configuration files and the etcd data.

38.1.1. Backing up etcd configuration files
링크 복사

The etcd configuration files to be preserved are all stored in the /etc/etcd directory of the instances where etcd is running. This includes the etcd configuration file (/etc/etcd/etcd.conf) and the required certificates for cluster communication. All those files are generated at installation time by the Ansible installer.

Procedure

For each etcd member of the cluster, back up the etcd configuration.

ssh master-0
mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/

$ ssh master-0
# mkdir -p /backup/etcd-config-$(date +%Y%m%d)/
# cp -R /etc/etcd/ /backup/etcd-config-$(date +%Y%m%d)/

Copy to Clipboard

Toggle word wrap

Note

The certificates and configuration files on each etcd cluster member are unique.

38.1.2. Backing up etcd data
링크 복사

Prerequisites

Note

The OpenShift Container Platform installer creates aliases to avoid typing all the flags named etcdctl2 for etcd v2 tasks and etcdctl3 for etcd v3 tasks.

However, the etcdctl3 alias does not provide the full endpoint list to the etcdctl command, so the --endpoints option with all the endpoints must be provided.

Before backing up etcd:

etcdctl binaries should be available or, in containerized installations, the rhel7/etcd container should be available
Ensure connectivity with the etcd cluster (port 2379/tcp)
Ensure the proper certificates to connect to the etcd cluster

Procedure

Note

While the etcdctl backup command is used to perform the backup, etcd v3 has no concept of a backup. Instead, you either take a snapshot from a live member with the etcdctl snapshot save command or copy the member/snap/db file from an etcd data directory.

The etcdctl backup command rewrites some of the metadata contained in the backup, specifically, the node ID and cluster ID, which means that in the backup, the node loses its former identity. To recreate a cluster from the backup, you create a new, single-node cluster, then add the rest of the nodes to the cluster. The metadata is rewritten to prevent the new node from joining an existing cluster.

Back up the etcd data:

If you use the v2 API, take the following actions:

Stop all etcd services:
```
systemctl stop etcd.service
```
```
# systemctl stop etcd.service
```
Copy to Clipboard Toggle word wrap

Create the etcd data backup and copy the etcd db file:

mkdir -p /backup/etcd-$(date +%Y%m%d)
etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)

# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)

Copy to Clipboard

Toggle word wrap

Start all etcd services:
```
systemctl start etcd.service
```
```
# systemctl start etcd.service
```
Copy to Clipboard Toggle word wrap

If you use the v3 API, run the following commands:

Important

Because clusters upgraded from previous versions of OpenShift Container Platform might contain v2 data stores, back up both v2 and v3 datastores.

Back up etcd v3 data:

systemctl show etcd --property=ActiveState,SubState
mkdir -p /backup/etcd-$(date +%Y%m%d)
etcdctl3 snapshot save */backup/etcd-$(date +%Y%m%d)*/db
Snapshot saved at /backup/etcd-<date>/db

# systemctl show etcd --property=ActiveState,SubState
# mkdir -p /backup/etcd-$(date +%Y%m%d)
# etcdctl3 snapshot save */backup/etcd-$(date +%Y%m%d)*/db
Snapshot saved at /backup/etcd-<date>/db

Copy to Clipboard

Toggle word wrap

Back up etcd v2 data:
```
systemctl stop etcd.service
etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)
systemctl start etcd.service
```
```
# systemctl stop etcd.service
# etcdctl2 backup \
    --data-dir /var/lib/etcd \
    --backup-dir /backup/etcd-$(date +%Y%m%d)
# cp /var/lib/etcd/member/snap/db /backup/etcd-$(date +%Y%m%d)
# systemctl start etcd.service
```
Copy to Clipboard Toggle word wrap
Note
The etcdctl snapshot save command requires the etcd service to be running.
In these commands, a /backup/etcd-<date>/ directory is created, where <date> represents the current date, which must be an external NFS share, S3 bucket, or any external storage location.
In the case of an all-in-one cluster, the etcd data directory is located in the /var/lib/origin/openshift.local.etcd directory.

38.2. Removing an etcd host
링크 복사

If an etcd host fails beyond restoration, remove it from the cluster. To recover from an etcd quorum loss, you must also remove all healthy etcd nodes but one from your cluster.

Steps to be performed on all masters hosts

Procedure

Remove each other etcd host from the etcd cluster. Run the following command for each etcd node:

etcdctl -C https://<surviving host IP address>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member remove <failed member ID>

# etcdctl -C https://<surviving host IP address>:2379 \
  --ca-file=/etc/etcd/ca.crt     \
  --cert-file=/etc/etcd/peer.crt     \
  --key-file=/etc/etcd/peer.key member remove <failed member ID>

Copy to Clipboard

Toggle word wrap

Remove the other etcd hosts from the /etc/origin/master/master-config.yaml +master configuration file on every master:

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379 
    - https://master-2.example.com:2379

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://master-0.example.com:2379
    - https://master-1.example.com:2379


    - https://master-2.example.com:2379

Copy to Clipboard

Toggle word wrap

1 2: The host to remove.

Restart the master API service on every master:
```
systemctl restart atomic-openshift-master-api
```
```
# systemctl restart atomic-openshift-master-api
```
Copy to Clipboard Toggle word wrap
Or, if using a single master cluster installation:
```
systemctl restart atomic-openshift-master
```
```
# systemctl restart atomic-openshift-master
```
Copy to Clipboard Toggle word wrap

Steps to be performed in the current etcd cluster

Procedure

Remove the failed host from the cluster:

etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

etcdctl2 member remove 8372784203e11288
Removed member 8372784203e11288 from cluster

etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
failed to check the health of member 8372784203e11288 on https://192.168.55.21:2379: Get https://192.168.55.21:2379/health: dial tcp 192.168.55.21:2379: getsockopt: connection refused
member 8372784203e11288 is unreachable: [https://192.168.55.21:2379] are all unreachable
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

# etcdctl2 member remove 8372784203e11288


Removed member 8372784203e11288 from cluster

# etcdctl2 cluster-health
member 5ee217d19001 is healthy: got healthy result from https://192.168.55.12:2379
member 2a529ba1840722c0 is healthy: got healthy result from https://192.168.55.8:2379
member ed4f0efd277d7599 is healthy: got healthy result from https://192.168.55.13:2379
cluster is healthy

Copy to Clipboard

Toggle word wrap

1: The remove command requires the etcd ID, not the hostname.

To ensure the etcd configuration does not use the failed host when the etcd service is restarted, modify the /etc/etcd/etcd.conf file on all remaining etcd hosts and remove the failed host in the value for the ETCD_INITIAL_CLUSTER variable:

vi /etc/etcd/etcd.conf

# vi /etc/etcd/etcd.conf

Copy to Clipboard

Toggle word wrap

For example:

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380,master-2.example.com=https://192.168.55.13:2380

Copy to Clipboard

Toggle word wrap

becomes:

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380

ETCD_INITIAL_CLUSTER=master-0.example.com=https://192.168.55.8:2380,master-1.example.com=https://192.168.55.12:2380

Copy to Clipboard

Toggle word wrap

Note

Restarting the etcd services is not required, because the failed host is removed using etcdctl.

Modify the Ansible inventory file to reflect the current status of the cluster and to avoid issues when re-running a playbook:

[OSEv3:children]
masters
nodes
etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com

[OSEv3:children]
masters
nodes
etcd

... [OUTPUT ABBREVIATED] ...

[etcd]
master-0.example.com
master-1.example.com

Copy to Clipboard

Toggle word wrap

If you are using Flannel, modify the flanneld service configuration located at /etc/sysconfig/flanneld on every host and remove the etcd host:

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379

FLANNEL_ETCD_ENDPOINTS=https://master-0.example.com:2379,https://master-1.example.com:2379,https://master-2.example.com:2379

Copy to Clipboard

Toggle word wrap

Restart the flanneld service:
```
systemctl restart flanneld.service
```
```
# systemctl restart flanneld.service
```
Copy to Clipboard Toggle word wrap

38.3. Creating a single-node etcd cluster
링크 복사

To restore the full functionality of your OpenShift Container Platform instance, make a remaining etcd node a standalone etcd cluster.

Procedure

On the etcd node that you did not remove from the cluster, stop all etcd services:
```
systemctl stop etcd.service
```
```
# systemctl stop etcd.service
```
Copy to Clipboard Toggle word wrap

Run the etcd service on the host, forcing a new cluster.

These commands create a custom file for the etcd service, which adds the --force-new-cluster option to the etcd start command:

mkdir -p /etc/systemd/system/etcd.service.d/
echo "[Service]" > /etc/systemd/system/etcd.service.d/temp.conf
echo "ExecStart=" >> /etc/systemd/system/etcd.service.d/temp.conf
sed -n '/ExecStart/s/"$/ --force-new-cluster"/p' \
    /usr/lib/systemd/system/etcd.service \
    >> /etc/systemd/system/etcd.service.d/temp.conf

systemctl daemon-reload
systemctl restart etcd

# mkdir -p /etc/systemd/system/etcd.service.d/
# echo "[Service]" > /etc/systemd/system/etcd.service.d/temp.conf
# echo "ExecStart=" >> /etc/systemd/system/etcd.service.d/temp.conf
# sed -n '/ExecStart/s/"$/ --force-new-cluster"/p' \
    /usr/lib/systemd/system/etcd.service \
    >> /etc/systemd/system/etcd.service.d/temp.conf

# systemctl daemon-reload
# systemctl restart etcd

Copy to Clipboard

Toggle word wrap

List the etcd member and confirm that the member list contains only your single etcd host:

etcdctl member list
165201190bf7f217: name=192.168.34.20 peerURLs=http://localhost:2380 clientURLs=https://master-0.example.com:2379 isLeader=true

# etcdctl member list
165201190bf7f217: name=192.168.34.20 peerURLs=http://localhost:2380 clientURLs=https://master-0.example.com:2379 isLeader=true

Copy to Clipboard

Toggle word wrap

After restoring the data and creating a new cluster, you must update the peerURLs parameter value to use the IP address where etcd listens for peer communication:
```
etcdctl member update 165201190bf7f217 https://192.168.34.20:2380
```
```
# etcdctl member update 165201190bf7f217 https://192.168.34.20:2380 
```
1
Copy to Clipboard Toggle word wrap
1
165201190bf7f217 is the member ID shown in the output of the previous command, and https://192.168.34.20:2380 is its IP address.

To verify, check that the IP is in the member list:

etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=https://*192.168.55.8*:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

$ etcdctl2 member list
5ee217d17301: name=master-0.example.com peerURLs=https://*192.168.55.8*:2380 clientURLs=https://192.168.55.8:2379 isLeader=true

Copy to Clipboard

Toggle word wrap

맨 위로 이동

이 콘텐츠는 선택한 언어로 제공되지 않습니다.

Chapter 38. Restoring etcd quorum

38.1. Backing up etcd
링크 복사

38.1.1. Backing up etcd configuration files
링크 복사

Procedure

38.1.2. Backing up etcd data
링크 복사

Prerequisites

Procedure

38.2. Removing an etcd host
링크 복사

Procedure

Procedure

38.3. Creating a single-node etcd cluster
링크 복사

Procedure

자세한 정보

평가판, 구매 및 판매

커뮤니티

Red Hat 문서 정보

보다 포괄적 수용을 위한 오픈 소스 용어 교체

Red Hat 소개

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

이 콘텐츠는 선택한 언어로 제공되지 않습니다.

Chapter 38. Restoring etcd quorum

38.1. Backing up etcd링크 복사링크가 클립보드에 복사되었습니다!

38.1.1. Backing up etcd configuration files링크 복사링크가 클립보드에 복사되었습니다!

Procedure

38.1.2. Backing up etcd data링크 복사링크가 클립보드에 복사되었습니다!

Prerequisites

Procedure

38.2. Removing an etcd host링크 복사링크가 클립보드에 복사되었습니다!

Procedure

Procedure

38.3. Creating a single-node etcd cluster링크 복사링크가 클립보드에 복사되었습니다!

Procedure

자세한 정보

평가판, 구매 및 판매

커뮤니티

Red Hat 문서 정보

보다 포괄적 수용을 위한 오픈 소스 용어 교체

Red Hat 소개

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

38.1. Backing up etcd
링크 복사

38.1.1. Backing up etcd configuration files
링크 복사

38.1.2. Backing up etcd data
링크 복사

38.2. Removing an etcd host
링크 복사

38.3. Creating a single-node etcd cluster
링크 복사