ホーム
製品
OpenShift Container Platform
4.7
バックアップおよび復元
5.2. 正常でない etcd メンバーの置き換え

5.2. 正常でない etcd メンバーの置き換え

本書では、単一の正常でない etcd メンバーを置き換えるプロセスについて説明します。

このプロセスは、マシンが実行されていないか、またはノードが準備状態にないことによって etcd メンバーが正常な状態にないか、または etcd Pod がクラッシュループしているためにこれが正常な状態にないかによって異なります。

注記

大多数のコントロールプレーンホスト (別名マスターホスト) が失われ、etcd のクォーラム (定足数) の損失が発生した場合は、この手順ではなく、直前のクラスター状態への復元に向けた障害復旧手順を実行する必要があります。

コントロールプレーンの証明書が置き換えているメンバーで有効でない場合は、この手順ではなく、期限切れのコントロールプレーン証明書からの回復手順を実行する必要があります。

コントロールプレーンノードが失われ、新規ノードが作成される場合、etcd クラスター Operator は新規 TLS 証明書の生成と、ノードの etcd メンバーとしての追加を処理します。

5.2.1. 前提条件
リンクのコピー

正常でない etcd メンバーを置き換える前に、etcd バックアップを作成します。

5.2.2. 正常でない etcd メンバーの特定
リンクのコピー

クラスターに正常でない etcd メンバーがあるかどうかを特定することができます。

前提条件

cluster-admin ロールを持つユーザーとしてのクラスターへのアクセスがあること。

手順

以下のコマンドを使用して EtcdMembersAvailable ステータス条件のステータスを確認します。

oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'

$ oc get etcd -o=jsonpath='{range .items[0].status.conditions[?(@.type=="EtcdMembersAvailable")]}{.message}{"\n"}'

Copy to Clipboard

Toggle word wrap

出力を確認します。
```
2 of 3 members are available, ip-10-0-131-183.ec2.internal is unhealthy
```
```
2 of 3 members are available, ip-10-0-131-183.ec2.internal is unhealthy
```
Copy to Clipboard Toggle word wrap
この出力例は、ip-10-0-131-183.ec2.internal etcd メンバーが正常ではないことを示しています。

5.2.3. 正常でない etcd メンバーの状態の判別
リンクのコピー

正常でない etcd メンバーを置き換える手順は、etcd メンバーが以下のどの状態にあるかによって異なります。

マシンが実行されていないか、ノードが準備状態にない
etcd Pod がクラッシュループしている。

以下の手順では、etcd メンバーがどの状態にあるかを判別します。これにより、正常でない etcd メンバーを置き換えるために実行する必要のある手順を確認できます。

注記

マシンが実行されていないか、またはノードが準備状態にないものの、すぐに正常な状態に戻ることが予想される場合は、etcd メンバーを置き換える手順を実行する必要はありません。etcd クラスター Operator はマシンまたはノードが正常な状態に戻ると自動的に同期します。

前提条件

cluster-admin ロールを持つユーザーとしてクラスターにアクセスできる。
正常でない etcd メンバーを特定している。

手順

マシンが実行されていないかどうかを判別します。
```
oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
```
```
$ oc get machines -A -ojsonpath='{range .items[*]}{@.status.nodeRef.name}{"\t"}{@.status.providerStatus.instanceState}{"\n"}' | grep -v running
```
Copy to Clipboard Toggle word wrap
出力例
```
ip-10-0-131-183.ec2.internal  stopped 
```
```
ip-10-0-131-183.ec2.internal  stopped 
```
1
Copy to Clipboard Toggle word wrap
1
この出力には、ノードおよびノードのマシンのステータスを一覧表示されます。ステータスが running 以外の場合は、マシンは実行されていません。
マシンが実行されていない 場合は、マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換えの手順を実行します。
ノードが準備状態にないかどうかを判別します。
以下のシナリオのいずれかが true の場合、ノードは準備状態にありません。
- マシンが実行されている場合は、ノードに到達できないかどうかを確認します。
  $ oc get nodes -o jsonpath='{range .items[*]}{"\n"}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}' | grep unreachable
  Copy to Clipboard Toggle word wrap
  出力例
  ip-10-0-131-183.ec2.internal node-role.kubernetes.io/master node.kubernetes.io/unreachable node.kubernetes.io/unreachable
  1
  
  Copy to Clipboard Toggle word wrap
  1
  ノードが unreachable テイントと共に一覧表示される場合、ノードの準備はできていません。
- ノードが以前として到達可能である場合は、そのノードが NotReady として一覧表示されているかどうかを確認します。
  $ oc get nodes -l node-role.kubernetes.io/master | grep "NotReady"
  Copy to Clipboard Toggle word wrap
  出力例
  ip-10-0-131-183.ec2.internal NotReady master 122m v1.20.0
  1
  
  Copy to Clipboard Toggle word wrap
  1
  ノードが NotReady として一覧表示されている場合、ノードの準備はできていません。
ノードの準備ができていない 場合は、マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換えの手順を実行します。

etcd Pod がクラッシュループしているかどうかを判別します。

マシンが実行され、ノードが準備できている場合は、etcd Pod がクラッシュループしているかどうかを確認します。

すべてのコントロールプレーンノード (別名マスターノード) が Ready と記載されていることを確認します。

oc get nodes -l node-role.kubernetes.io/master

$ oc get nodes -l node-role.kubernetes.io/master

Copy to Clipboard

Toggle word wrap

出力例

NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-131-183.ec2.internal   Ready    master   6h13m   v1.20.0
ip-10-0-164-97.ec2.internal    Ready    master   6h13m   v1.20.0
ip-10-0-154-204.ec2.internal   Ready    master   6h13m   v1.20.0

NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-131-183.ec2.internal   Ready    master   6h13m   v1.20.0
ip-10-0-164-97.ec2.internal    Ready    master   6h13m   v1.20.0
ip-10-0-154-204.ec2.internal   Ready    master   6h13m   v1.20.0

Copy to Clipboard

Toggle word wrap

etcd Pod のステータスが Error または CrashloopBackoff のいずれかであるかどうかを確認します。

oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

Copy to Clipboard

Toggle word wrap

出力例

etcd-ip-10-0-131-183.ec2.internal                2/3     Error       7          6h9m 
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          6h6m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          6h6m

etcd-ip-10-0-131-183.ec2.internal                2/3     Error       7          6h9m


etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          6h6m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          6h6m

Copy to Clipboard

Toggle word wrap

1: この Pod のこのステータスは Error であるため、etcd Pod はクラッシュループしています。

etcd Pod がクラッシュループしている場合、etcd Pod がクラッシュループしている場合の正常でない etcd メンバーの置き換えについての手順を実行します。

5.2.4. 正常でない etcd メンバーの置き換え
リンクのコピー

正常でない etcd メンバーの状態に応じて、以下のいずれかの手順を使用します。

5.2.4.1. マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換え
リンクのコピー

以下の手順では、マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーを置き換える手順を説明します。

前提条件

正常でない etcd メンバーを特定している。
マシンが実行されていないか、またはノードが準備状態にないことを確認している。
cluster-admin ロールを持つユーザーとしてクラスターにアクセスできる。
etcd のバックアップを取得している。
重要
問題が発生した場合にクラスターを復元できるように、この手順を実行する前に etcd バックアップを作成しておくことは重要です。

手順

正常でないメンバーを削除します。

影響を受けるノード上にない Pod を選択します。

クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。

oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

Copy to Clipboard

Toggle word wrap

出力例

etcd-ip-10-0-131-183.ec2.internal                3/3     Running     0          123m
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          123m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          124m

etcd-ip-10-0-131-183.ec2.internal                3/3     Running     0          123m
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          123m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          124m

Copy to Clipboard

Toggle word wrap

実行中の etcd コンテナーに接続し、影響を受けるノードにない Pod の名前を渡します。
クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
```
oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
```
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
Copy to Clipboard Toggle word wrap

メンバーの一覧を確認します。

etcdctl member list -w table

sh-4.2# etcdctl member list -w table

Copy to Clipboard

Toggle word wrap

出力例

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 6fc1e7c9db35841d | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 6fc1e7c9db35841d | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

Copy to Clipboard

Toggle word wrap

これらの値はこの手順で後ほど必要となるため、ID および正常でない etcd メンバーの名前を書き留めておきます。$ etcdctl endpoint health コマンドは、補充手順が完了し、新しいメンバーが追加されるまで、削除されたメンバーを一覧表示します。

ID を etcdctl member remove コマンドに指定して、正常でない etcd メンバーを削除します。
```
etcdctl member remove 6fc1e7c9db35841d
```
```
sh-4.2# etcdctl member remove 6fc1e7c9db35841d
```
Copy to Clipboard Toggle word wrap
出力例
```
Member 6fc1e7c9db35841d removed from cluster ead669ce1fbfb346
```
```
Member 6fc1e7c9db35841d removed from cluster ead669ce1fbfb346
```
Copy to Clipboard Toggle word wrap

メンバーの一覧を再度表示し、メンバーが削除されたことを確認します。

etcdctl member list -w table

sh-4.2# etcdctl member list -w table

Copy to Clipboard

Toggle word wrap

出力例

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

Copy to Clipboard

Toggle word wrap

これでノードシェルを終了できます。

重要

メンバーを削除した後、残りの etcd インスタンスが再起動している間、クラスターに短時間アクセスできない場合があります。

削除された正常でない etcd メンバーの古いシークレットを削除します。

削除された正常でない etcd メンバーのシークレットを一覧表示します。

oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal

$ oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

1: この手順で先ほど書き留めた正常でない etcd メンバーの名前を渡します。

以下の出力に示されるように、ピア、サービング、およびメトリクスシークレットがあります。

出力例

etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls                     2      47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls                     2      47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls                     2      47m

etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls                     2      47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls                     2      47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls                     2      47m

Copy to Clipboard

Toggle word wrap

削除された正常でない etcd メンバーのシークレットを削除します。

ピアシークレットを削除します。

oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

提供シークレットを削除します。

oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

メトリクスシークレットを削除します。

oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

コントロールプレーンマシン (別名マスターマシン) を削除し、再作成します。このマシンが再作成されると、新規リビジョンが強制的に実行され、etcd は自動的にスケールアップします。

インストーラーでプロビジョニングされるインフラストラクチャーを実行している場合、またはマシン API を使用してマシンを作成している場合は、以下の手順を実行します。それ以外の場合は、最初に作成する際に使用した方法と同じ方法を使用して新規マスターを作成する必要があります。

正常でないメンバーのマシンを取得します。

クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。

oc get machines -n openshift-machine-api -o wide

$ oc get machines -n openshift-machine-api -o wide

Copy to Clipboard

Toggle word wrap

出力例

NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped 
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-0                  Running   m4.xlarge   us-east-1   us-east-1a   3h37m   ip-10-0-131-183.ec2.internal   aws:///us-east-1a/i-0ec2782f8287dfb7e   stopped


clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

Copy to Clipboard

Toggle word wrap

1: これは正常でないノードのコントロールプレーンマシンです (ip-10-0-131-183.ec2.internal)。

マシン設定をファイルシステムのファイルに保存します。

oc get machine clustername-8qw5l-master-0 \
    -n openshift-machine-api \
    -o yaml \
    > new-master-machine.yaml

$ oc get machine clustername-8qw5l-master-0 \


    -n openshift-machine-api \
    -o yaml \
    > new-master-machine.yaml

Copy to Clipboard

Toggle word wrap

1: 正常でないノードのコントロールプレーンマシンの名前を指定します。

直前の手順で作成された new-master-machine.yaml ファイルを編集し、新しい名前を割り当て、不要なフィールドを削除します。

status セクション全体を削除します。

status:
  addresses:
  - address: 10.0.131.183
    type: InternalIP
  - address: ip-10-0-131-183.ec2.internal
    type: InternalDNS
  - address: ip-10-0-131-183.ec2.internal
    type: Hostname
  lastUpdated: "2020-04-20T17:44:29Z"
  nodeRef:
    kind: Node
    name: ip-10-0-131-183.ec2.internal
    uid: acca4411-af0d-4387-b73e-52b2484295ad
  phase: Running
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
    - lastProbeTime: "2020-04-20T16:53:50Z"
      lastTransitionTime: "2020-04-20T16:53:50Z"
      message: machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0fdb85790d76d0c3f
    instanceState: stopped
    kind: AWSMachineProviderStatus

status:
  addresses:
  - address: 10.0.131.183
    type: InternalIP
  - address: ip-10-0-131-183.ec2.internal
    type: InternalDNS
  - address: ip-10-0-131-183.ec2.internal
    type: Hostname
  lastUpdated: "2020-04-20T17:44:29Z"
  nodeRef:
    kind: Node
    name: ip-10-0-131-183.ec2.internal
    uid: acca4411-af0d-4387-b73e-52b2484295ad
  phase: Running
  providerStatus:
    apiVersion: awsproviderconfig.openshift.io/v1beta1
    conditions:
    - lastProbeTime: "2020-04-20T16:53:50Z"
      lastTransitionTime: "2020-04-20T16:53:50Z"
      message: machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0fdb85790d76d0c3f
    instanceState: stopped
    kind: AWSMachineProviderStatus

Copy to Clipboard

Toggle word wrap

metadata.name フィールドを新規の名前に変更します。
古いマシンと同じベース名を維持し、最後の番号を次に利用可能な番号に変更することが推奨されます。この例では、clustername-8qw5l-master-0 は clustername-8qw5l-master-3 に変更されています。
以下に例を示します。
```
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  ...
  name: clustername-8qw5l-master-3
  ...
```
```
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  ...
  name: clustername-8qw5l-master-3
  ...
```
Copy to Clipboard Toggle word wrap

spec.providerID フィールドを削除します。

  providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f

  providerID: aws:///us-east-1a/i-0fdb85790d76d0c3f

Copy to Clipboard

Toggle word wrap

metadata.annotations および metadata.generation フィールドを削除します。

  annotations:
    machine.openshift.io/instance-state: running
  ...
  generation: 2

  annotations:
    machine.openshift.io/instance-state: running
  ...
  generation: 2

Copy to Clipboard

Toggle word wrap

metadata.resourceVersion および metadata.uid フィールドを削除します。

  resourceVersion: "13291"
  uid: a282eb70-40a2-4e89-8009-d05dd420d31a

  resourceVersion: "13291"
  uid: a282eb70-40a2-4e89-8009-d05dd420d31a

Copy to Clipboard

Toggle word wrap

正常でないメンバーのマシンを削除します。
```
oc delete machine -n openshift-machine-api clustername-8qw5l-master-0
```
```
$ oc delete machine -n openshift-machine-api clustername-8qw5l-master-0 
```
1
Copy to Clipboard Toggle word wrap
1
正常でないノードのコントロールプレーンマシンの名前を指定します。

マシンが削除されたことを確認します。

oc get machines -n openshift-machine-api -o wide

$ oc get machines -n openshift-machine-api -o wide

Copy to Clipboard

Toggle word wrap

出力例

NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

NAME                                        PHASE     TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running   m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running   m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-worker-us-east-1a-wbtgd   Running   m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running   m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running   m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

Copy to Clipboard

Toggle word wrap

new-master-machine.yaml ファイルを使用して新規マシンを作成します。
```
oc apply -f new-master-machine.yaml
```
```
$ oc apply -f new-master-machine.yaml
```
Copy to Clipboard Toggle word wrap

新規マシンが作成されたことを確認します。

oc get machines -n openshift-machine-api -o wide

$ oc get machines -n openshift-machine-api -o wide

Copy to Clipboard

Toggle word wrap

出力例

NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-133-53.ec2.internal    aws:///us-east-1a/i-015b0888fe17bc2c8   running 
clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

NAME                                        PHASE          TYPE        REGION      ZONE         AGE     NODE                           PROVIDERID                              STATE
clustername-8qw5l-master-1                  Running        m4.xlarge   us-east-1   us-east-1b   3h37m   ip-10-0-154-204.ec2.internal   aws:///us-east-1b/i-096c349b700a19631   running
clustername-8qw5l-master-2                  Running        m4.xlarge   us-east-1   us-east-1c   3h37m   ip-10-0-164-97.ec2.internal    aws:///us-east-1c/i-02626f1dba9ed5bba   running
clustername-8qw5l-master-3                  Provisioning   m4.xlarge   us-east-1   us-east-1a   85s     ip-10-0-133-53.ec2.internal    aws:///us-east-1a/i-015b0888fe17bc2c8   running


clustername-8qw5l-worker-us-east-1a-wbtgd   Running        m4.large    us-east-1   us-east-1a   3h28m   ip-10-0-129-226.ec2.internal   aws:///us-east-1a/i-010ef6279b4662ced   running
clustername-8qw5l-worker-us-east-1b-lrdxb   Running        m4.large    us-east-1   us-east-1b   3h28m   ip-10-0-144-248.ec2.internal   aws:///us-east-1b/i-0cb45ac45a166173b   running
clustername-8qw5l-worker-us-east-1c-pkg26   Running        m4.large    us-east-1   us-east-1c   3h28m   ip-10-0-170-181.ec2.internal   aws:///us-east-1c/i-06861c00007751b0a   running

Copy to Clipboard

Toggle word wrap

1: 新規マシン clustername-8qw5l-master-3 が作成され、Provisioning から Running にフェーズが変更されると準備状態になります。

新規マシンが作成されるまでに数分の時間がかかる場合があります。etcd クラスター Operator はマシンまたはノードが正常な状態に戻ると自動的に同期します。

検証

すべての etcd Pod が適切に実行されていることを確認します。

クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。

oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

Copy to Clipboard

Toggle word wrap

出力例

etcd-ip-10-0-133-53.ec2.internal                 3/3     Running     0          7m49s
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          123m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          124m

etcd-ip-10-0-133-53.ec2.internal                 3/3     Running     0          7m49s
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          123m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          124m

Copy to Clipboard

Toggle word wrap

直前のコマンドの出力に 2 つの Pod のみが一覧表示される場合、etcd の再デプロイメントを手動で強制できます。クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。

oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge

$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge

Copy to Clipboard

Toggle word wrap

1: forceRedeploymentReason 値は一意である必要があります。そのため、タイムスタンプが付加されます。

3 つの etcd メンバーがあることを確認します。

実行中の etcd コンテナーに接続し、影響を受けるノードになかった Pod の名前を渡します。
クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
```
oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
```
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
Copy to Clipboard Toggle word wrap

メンバーの一覧を確認します。

etcdctl member list -w table

sh-4.2# etcdctl member list -w table

Copy to Clipboard

Toggle word wrap

出力例

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 5eb0d6b8ca24730c | started |  ip-10-0-133-53.ec2.internal |  https://10.0.133.53:2380 |  https://10.0.133.53:2379 |
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 5eb0d6b8ca24730c | started |  ip-10-0-133-53.ec2.internal |  https://10.0.133.53:2380 |  https://10.0.133.53:2379 |
| 757b6793e2408b6c | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| ca8c2990a0aa29d1 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

Copy to Clipboard

Toggle word wrap

直前のコマンドの出力に 4 つ以上の etcd メンバーが表示される場合、不要なメンバーを慎重に削除する必要があります。

警告

必ず適切な etcd メンバーを削除します。適切な etcd メンバーを削除すると、クォーラム (定足数) が失われる可能性があります。

5.2.4.2. etcd Pod がクラッシュループしている場合の正常でない etcd メンバーの置き換え
リンクのコピー

この手順では、etcd Pod がクラッシュループしている場合の正常でない etcd メンバーを置き換える手順を説明します。

前提条件

正常でない etcd メンバーを特定している。
etcd Pod がクラッシュループしていることを確認している。
cluster-admin ロールを持つユーザーとしてクラスターにアクセスできる。
etcd のバックアップを取得している。
重要
問題が発生した場合にクラスターを復元できるように、この手順を実行する前に etcd バックアップを作成しておくことは重要です。

手順

クラッシュループしている etcd Pod を停止します。
1. クラッシュループしているノードをデバッグします。
  クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
  $ oc debug node/ip-10-0-131-183.ec2.internal
  1
  Copy to Clipboard Toggle word wrap
  1
  これを正常でないノードの名前に置き換えます。
2. ルートディレクトリーをホストに切り替えます。
  sh-4.2# chroot /host
  Copy to Clipboard Toggle word wrap
3. 既存の etcd Pod ファイルを kubelet マニフェストディレクトリーから移動します。
  sh-4.2# mkdir /var/lib/etcd-backup
  Copy to Clipboard Toggle word wrap
  sh-4.2# mv /etc/kubernetes/manifests/etcd-pod.yaml /var/lib/etcd-backup/
  Copy to Clipboard Toggle word wrap
4. etcd データディレクトリーを別の場所に移動します。
  sh-4.2# mv /var/lib/etcd/ /tmp
  Copy to Clipboard Toggle word wrap
  これでノードシェルを終了できます。

正常でないメンバーを削除します。

影響を受けるノード上にない Pod を選択します。

クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。

oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

$ oc get pods -n openshift-etcd | grep -v etcd-quorum-guard | grep etcd

Copy to Clipboard

Toggle word wrap

出力例

etcd-ip-10-0-131-183.ec2.internal                2/3     Error       7          6h9m
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          6h6m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          6h6m

etcd-ip-10-0-131-183.ec2.internal                2/3     Error       7          6h9m
etcd-ip-10-0-164-97.ec2.internal                 3/3     Running     0          6h6m
etcd-ip-10-0-154-204.ec2.internal                3/3     Running     0          6h6m

Copy to Clipboard

Toggle word wrap

実行中の etcd コンテナーに接続し、影響を受けるノードにない Pod の名前を渡します。
クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
```
oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
```
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
Copy to Clipboard Toggle word wrap

メンバーの一覧を確認します。

etcdctl member list -w table

sh-4.2# etcdctl member list -w table

Copy to Clipboard

Toggle word wrap

出力例

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 62bcf33650a7170a | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| b78e2856655bc2eb | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| 62bcf33650a7170a | started | ip-10-0-131-183.ec2.internal | https://10.0.131.183:2380 | https://10.0.131.183:2379 |
| b78e2856655bc2eb | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

Copy to Clipboard

Toggle word wrap

これらの値はこの手順で後ほど必要となるため、ID および正常でない etcd メンバーの名前を書き留めておきます。

ID を etcdctl member remove コマンドに指定して、正常でない etcd メンバーを削除します。
```
etcdctl member remove 62bcf33650a7170a
```
```
sh-4.2# etcdctl member remove 62bcf33650a7170a
```
Copy to Clipboard Toggle word wrap
出力例
```
Member 62bcf33650a7170a removed from cluster ead669ce1fbfb346
```
```
Member 62bcf33650a7170a removed from cluster ead669ce1fbfb346
```
Copy to Clipboard Toggle word wrap

メンバーの一覧を再度表示し、メンバーが削除されたことを確認します。

etcdctl member list -w table

sh-4.2# etcdctl member list -w table

Copy to Clipboard

Toggle word wrap

出力例

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| b78e2856655bc2eb | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

+------------------+---------+------------------------------+---------------------------+---------------------------+
|        ID        | STATUS  |             NAME             |        PEER ADDRS         |       CLIENT ADDRS        |
+------------------+---------+------------------------------+---------------------------+---------------------------+
| b78e2856655bc2eb | started |  ip-10-0-164-97.ec2.internal |  https://10.0.164.97:2380 |  https://10.0.164.97:2379 |
| d022e10b498760d5 | started | ip-10-0-154-204.ec2.internal | https://10.0.154.204:2380 | https://10.0.154.204:2379 |
+------------------+---------+------------------------------+---------------------------+---------------------------+

Copy to Clipboard

Toggle word wrap

これでノードシェルを終了できます。

削除された正常でない etcd メンバーの古いシークレットを削除します。

削除された正常でない etcd メンバーのシークレットを一覧表示します。

oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal

$ oc get secrets -n openshift-etcd | grep ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

1: この手順で先ほど書き留めた正常でない etcd メンバーの名前を渡します。

以下の出力に示されるように、ピア、サービング、およびメトリクスシークレットがあります。

出力例

etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls                     2      47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls                     2      47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls                     2      47m

etcd-peer-ip-10-0-131-183.ec2.internal              kubernetes.io/tls                     2      47m
etcd-serving-ip-10-0-131-183.ec2.internal           kubernetes.io/tls                     2      47m
etcd-serving-metrics-ip-10-0-131-183.ec2.internal   kubernetes.io/tls                     2      47m

Copy to Clipboard

Toggle word wrap

削除された正常でない etcd メンバーのシークレットを削除します。

ピアシークレットを削除します。

oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-peer-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

提供シークレットを削除します。

oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-serving-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

メトリクスシークレットを削除します。

oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal

$ oc delete secret -n openshift-etcd etcd-serving-metrics-ip-10-0-131-183.ec2.internal

Copy to Clipboard

Toggle word wrap

etcd の再デプロイメントを強制的に実行します。
クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
```
oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
```
```
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge 
```
1
Copy to Clipboard Toggle word wrap
1
forceRedeploymentReason 値は一意である必要があります。そのため、タイムスタンプが付加されます。
etcd クラスター Operator が再デプロイを実行する場合、すべてのコントロールプレーンノード (別名マスターノード) に機能する etcd Pod があることを確認します。

検証

新しいメンバーが利用可能で、正常な状態にあることを確認します。

再度実行中の etcd コンテナーに接続します。
クラスターにアクセスできるターミナルで、cluster-admin ユーザーとして以下のコマンドを実行します。
```
oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
```
$ oc rsh -n openshift-etcd etcd-ip-10-0-154-204.ec2.internal
```
Copy to Clipboard Toggle word wrap

すべてのメンバーが正常であることを確認します。

etcdctl endpoint health --cluster

sh-4.2# etcdctl endpoint health --cluster

Copy to Clipboard

Toggle word wrap

出力例

https://10.0.131.183:2379 is healthy: successfully committed proposal: took = 16.671434ms
https://10.0.154.204:2379 is healthy: successfully committed proposal: took = 16.698331ms
https://10.0.164.97:2379 is healthy: successfully committed proposal: took = 16.621645ms

https://10.0.131.183:2379 is healthy: successfully committed proposal: took = 16.671434ms
https://10.0.154.204:2379 is healthy: successfully committed proposal: took = 16.698331ms
https://10.0.164.97:2379 is healthy: successfully committed proposal: took = 16.621645ms

Copy to Clipboard

Toggle word wrap

5.2. 正常でない etcd メンバーの置き換え

5.2.1. 前提条件
リンクのコピー

5.2.2. 正常でない etcd メンバーの特定
リンクのコピー

5.2.3. 正常でない etcd メンバーの状態の判別
リンクのコピー

5.2.4. 正常でない etcd メンバーの置き換え
リンクのコピー

5.2.4.1. マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換え
リンクのコピー

5.2.4.2. etcd Pod がクラッシュループしている場合の正常でない etcd メンバーの置き換え
リンクのコピー

詳細情報

試用、購入および販売

コミュニティー

Red Hat ドキュメントについて

多様性を受け入れるオープンソースの強化

会社概要

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

5.2. 正常でない etcd メンバーの置き換え

5.2.1. 前提条件リンクのコピーリンクがクリップボードにコピーされました!

5.2.2. 正常でない etcd メンバーの特定リンクのコピーリンクがクリップボードにコピーされました!

5.2.3. 正常でない etcd メンバーの状態の判別リンクのコピーリンクがクリップボードにコピーされました!

5.2.4. 正常でない etcd メンバーの置き換えリンクのコピーリンクがクリップボードにコピーされました!

5.2.4.1. マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換えリンクのコピーリンクがクリップボードにコピーされました!

5.2.4.2. etcd Pod がクラッシュループしている場合の正常でない etcd メンバーの置き換えリンクのコピーリンクがクリップボードにコピーされました!

詳細情報

試用、購入および販売

コミュニティー

Red Hat ドキュメントについて

多様性を受け入れるオープンソースの強化

会社概要

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

5.2.1. 前提条件
リンクのコピー

5.2.2. 正常でない etcd メンバーの特定
リンクのコピー

5.2.3. 正常でない etcd メンバーの状態の判別
リンクのコピー

5.2.4. 正常でない etcd メンバーの置き換え
リンクのコピー

5.2.4.1. マシンが実行されていないか、またはノードが準備状態にない場合の正常でない etcd メンバーの置き換え
リンクのコピー

5.2.4.2. etcd Pod がクラッシュループしている場合の正常でない etcd メンバーの置き換え
リンクのコピー