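6.6. Replacing a failed bare-metal control plane node without BMC credentials
If a control plane node on your bare-metal cluster has failed and cannot be recovered, but you installed your cluster without providing baseboard management controller (BMC) credentials, you must perform extra steps to replace the failed node with a new one.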
6.6.1. Prerequisites
- You have identified the unhealthy bare-metal etcd member.
- You have verified that either the machine is not running or the node is not ready.
- You have access to the cluster as a user with the cluster-admin role.
- You have taken an etcd backup in case you encounter any issues.
- You have downloaded and installed the coreos-installer CLI.
- Your cluster does not have a control plane machine set. You can check for machine sets by running the following command:
  $ oc get machinesets,controlplanemachinesets -n openshift-machine-api
  Important
  Only workers should have machine sets (one or more). If controlplanemachinesets exist, do not use this procedure.
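The machine set check in the last prerequisite lends itself to a small guard in a script. The following is a minimal sketch, not part of the documented procedure. The function name cpms_absent is hypothetical, and the sketch assumes that `oc get controlplanemachinesets -n openshift-machine-api -o name` prints one resource name per line and prints nothing to standard output when no control plane machine set exists.

```shell
# Hypothetical helper: succeeds only when the listing passed as $1 is empty,
# that is, when no ControlPlaneMachineSet resource was reported.
cpms_absent() {
  listing="$1"
  [ -z "$listing" ]
}

# Intended use on a live cluster (assumption, not executed here):
#   listing=$(oc get controlplanemachinesets -n openshift-machine-api -o name)
#   cpms_absent "$listing" || echo "control plane machine set found; do not use this procedure"
```

Keeping the decision in a function that takes the listing as an argument means the logic can be checked without cluster access.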
6.6.2. Removing the unhealthy etcd member
Begin removing the failed control plane node by first removing the unhealthy etcd member.
Procedure
List the etcd pods by running the following command, and make note of a pod that is not on the affected node:
$ oc -n openshift-etcd get pods -l k8s-app=etcd -o wide
Example output
etcd-openshift-control-plane-0   5/5   Running   11   3h56m   192.168.10.9    openshift-control-plane-0   <none>   <none>
etcd-openshift-control-plane-1   5/5   Running   0    3h54m   192.168.10.10   openshift-control-plane-1   <none>   <none>
etcd-openshift-control-plane-2   5/5   Running   0    3h58m   192.168.10.11   openshift-control-plane-2   <none>   <none>
Connect to a running etcd container by running the following command:
$ oc rsh -n openshift-etcd <etcd_pod>
Replace <etcd_pod> with the name of an etcd pod that is associated with one of the healthy nodes.
Example command
$ oc rsh -n openshift-etcd etcd-openshift-control-plane-0
View the etcd member list by running the following command. Make note of the ID and the name of the unhealthy etcd member, because these values are required later.
sh-4.2# etcdctl member list -w table
Important
The etcdctl endpoint health command lists the removed member until the replacement is complete and the new member is added.
Remove the unhealthy etcd member by running the following command:
sh-4.2# etcdctl member remove <unhealthy_member_id>
Replace <unhealthy_member_id> with the ID of the etcd member on the unhealthy node.
Example command
sh-4.2# etcdctl member remove 6fc1e7c9db35841d
Example output
Member 6fc1e7c9db35841d removed from cluster b23536c33f2cdd1b
View the member list again by running the following command, and verify that the member was removed:
sh-4.2# etcdctl member list -w table
Important
After you remove the member, the cluster might be unreachable while the remaining etcd instances restart.
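Looking up the member ID for the removal step can also be scripted. The following is a minimal sketch under stated assumptions: the function name etcd_member_id is hypothetical, and the sketch parses the default comma-separated output of `etcdctl member list` rather than the `-w table` form used in this procedure. The assumed field order is ID, status, name, peer URLs, client URLs, and learner flag.

```shell
# Hypothetical helper: reads `etcdctl member list` output (default,
# comma-separated format) on stdin and prints the ID of the member named $1.
# Assumed field order: ID, status, name, peer URLs, client URLs, learner flag.
etcd_member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Intended use on a saved copy of the member list (assumption, not executed here):
#   etcd_member_id openshift-control-plane-2 < member_list.txt
```

The function prints nothing when no member has the given name, so an empty result is a signal to recheck the node name before running `etcdctl member remove`.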
Exit the rsh session into the etcd pod by running the following command:
sh-4.2# exit
Turn off the etcd quorum guard by running the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}'
This command ensures that you can successfully re-create secrets and roll out the static pods.
List the secrets for the unhealthy etcd member that you removed by running the following command:
$ oc get secrets -n openshift-etcd | grep <node_name>
Replace <node_name> with the name of the failed node whose etcd member you removed.
Example command
$ oc get secrets -n openshift-etcd | grep openshift-control-plane-2
Example output
etcd-peer-openshift-control-plane-2              kubernetes.io/tls   2   134m
etcd-serving-metrics-openshift-control-plane-2   kubernetes.io/tls   2   134m
etcd-serving-openshift-control-plane-2           kubernetes.io/tls   2   134m
Delete the secrets that are associated with the affected node that was removed:
Delete the peer secret by running the following command:
$ oc delete secret -n openshift-etcd etcd-peer-<node_name>
Replace <node_name> with the name of the affected node.
Delete the serving secret by running the following command:
$ oc delete secret -n openshift-etcd etcd-serving-<node_name>
Replace <node_name> with the name of the affected node.
Delete the metrics secret by running the following command:
$ oc delete secret -n openshift-etcd etcd-serving-metrics-<node_name>
Replace <node_name> with the name of the affected node.
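The three secret names follow one pattern, so they can be generated from the node name. The following is a minimal sketch, assuming only the naming shown in the commands above. The function name etcd_secret_names is hypothetical.

```shell
# Hypothetical helper: prints the peer, serving, and metrics secret names
# for the node passed as $1, one per line.
etcd_secret_names() {
  node_name="$1"
  for prefix in etcd-peer etcd-serving etcd-serving-metrics; do
    printf '%s-%s\n' "$prefix" "$node_name"
  done
}

# Intended use on a live cluster (assumption, not executed here):
#   for secret in $(etcd_secret_names openshift-control-plane-2); do
#     oc delete secret -n openshift-etcd "$secret"
#   done
```

Printing the names first gives a chance to compare them against the `oc get secrets` listing before anything is deleted.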
6.6.3. Deleting the machine of the unhealthy etcd member
Finish removing the failed control plane node by deleting the machine of the unhealthy etcd member.
Procedure
Ensure that the Bare Metal Operator is available by running the following command:
$ oc get clusteroperator baremetal
Example output
NAME        VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
baremetal   4.20.0    True        False         False      3d15h
Save the BareMetalHost object of the affected node to a file for later use by running the following command:
$ oc get -n openshift-machine-api bmh <node_name> -o yaml > bmh_affected.yaml
Replace <node_name> with the name of the affected node, which usually matches the associated BareMetalHost name.
View the YAML file of the saved BareMetalHost object by running the following command, and ensure that the content is correct:
$ cat bmh_affected.yaml
Delete the affected BareMetalHost object by running the following command:
$ oc delete -n openshift-machine-api bmh <node_name>
Replace <node_name> with the name of the affected node.
List all machines by running the following command, and identify the machine that is associated with the affected node:
$ oc get machines -n openshift-machine-api -o wide
Delete the machine of the unhealthy member by running the following command:
$ oc delete machine -n openshift-machine-api <machine_name>
Replace <machine_name> with the name of the machine that is associated with the affected node.
Example command
$ oc delete machine -n openshift-machine-api examplecluster-control-plane-2
Note
After you delete the BareMetalHost and Machine objects, the machine controller automatically deletes the Node object.
If deletion of the machine is delayed for any reason, or the command is obstructed and delayed, force deletion by removing the finalizer field of the machine object.
Warning
Do not interrupt machine deletion by pressing Ctrl+c. You must allow the command to proceed to completion. Open a new terminal window to edit and delete the finalizer fields.
In the new terminal window, edit the machine configuration by running the following command:
$ oc edit machine -n openshift-machine-api examplecluster-control-plane-2
Delete the following fields in the Machine custom resource, and then save the updated file:
finalizers:
- machine.machine.openshift.io
Example output
machine.machine.openshift.io/examplecluster-control-plane-2 edited
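Identifying the machine that belongs to the affected node, as required earlier in this procedure, can be done with a short filter instead of reading the wide listing by eye. The following is a minimal sketch. The function name machine_for_node is hypothetical, and the sketch assumes its input is a list of "machine node" pairs, one pair per line. The jsonpath expression in the comment is also an assumption: it relies on each Machine object exposing its node name in the status.nodeRef.name field.

```shell
# Hypothetical helper: reads "<machine_name> <node_name>" pairs on stdin and
# prints the machine whose node name equals $1.
machine_for_node() {
  awk -v node="$1" '$2 == node { print $1 }'
}

# Intended use on a live cluster (assumption, not executed here):
#   oc get machines -n openshift-machine-api \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeRef.name}{"\n"}{end}' \
#     | machine_for_node openshift-control-plane-2
```

An empty result means no machine references that node, which is worth investigating before deleting anything.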
6.6.4. Verifying that the failed node was deleted
Before you create a replacement control plane node, verify that the failed node was successfully deleted.
Procedure
Verify that the machine is deleted by running the following command:
$ oc get machines -n openshift-machine-api -o wide
Example output
NAME                             PHASE     TYPE   REGION   ZONE   AGE     NODE                        PROVIDERID                                                                                              STATE
examplecluster-control-plane-0   Running                          3h11m   openshift-control-plane-0   baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e   externally provisioned
examplecluster-control-plane-1   Running                          3h11m   openshift-control-plane-1   baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1   externally provisioned
examplecluster-compute-0         Running                          165m    openshift-compute-0         baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f         provisioned
examplecluster-compute-1         Running                          165m    openshift-compute-1         baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9         provisioned
Verify that the node is deleted by running the following command:
$ oc get nodes
Example output
NAME                        STATUS   ROLES    AGE     VERSION
openshift-control-plane-0   Ready    master   3h24m   v1.33.4
openshift-control-plane-1   Ready    master   3h24m   v1.33.4
openshift-compute-0         Ready    worker   176m    v1.33.4
openshift-compute-1         Ready    worker   176m    v1.33.4
Wait for all of the cluster Operators to finish rolling out changes. Monitor the progress by running the following command:
$ watch oc get co
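The node check in this section can be expressed as a yes-or-no test for use in a script. The following is a minimal sketch. The function name node_absent is hypothetical, and the sketch assumes the listing comes from `oc get nodes -o name`, which prints entries in the form node/<name>.

```shell
# Hypothetical helper: succeeds when node $1 does not appear in the listing $2.
# $2 is expected to hold the output of `oc get nodes -o name`.
node_absent() {
  # -x matches the whole line and -F treats dots in the name literally.
  if printf '%s\n' "$2" | grep -qxF "node/$1"; then
    return 1
  fi
  return 0
}

# Intended use on a live cluster (assumption, not executed here):
#   node_absent openshift-control-plane-2 "$(oc get nodes -o name)" \
#     && echo "node removed" || echo "node still present"
```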
6.6.5. Creating the new control plane node
Begin creating the new control plane node by creating a BareMetalHost object and node.
Procedure
Edit the bmh_affected.yaml file that you previously saved:
Remove the following metadata items from the file:
- creationTimestamp
- generation
- resourceVersion
- uid
Remove the status section of the file.
The resulting file keeps the apiVersion, kind, remaining metadata, and spec sections.
Create the BareMetalHost object from the bmh_affected.yaml file by running the following command:
$ oc create -f bmh_affected.yaml
The following warning is expected when you create the BareMetalHost object:
Warning: metadata.finalizers: "baremetalhost.metal3.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers
Extract the control plane ignition secret by running the following command:
$ oc extract secret/master-user-data-managed \
    -n openshift-machine-api \
    --keys=userData \
    --to=- \
    | sed '/^userData/d' > new_controlplane.ign
This command also removes the starting userData line of the ignition secret.
Create an Nmstate YAML file named new_controlplane_nmstate.yaml for the network configuration of the new node.
Note
If you installed your cluster by using the Agent-based Installer, you can use the networkConfig section of the failed node in the agent-config.yaml file from the original cluster deployment as a starting point for the Nmstate file of the new control plane node. For example, the following command extracts the networkConfig section for the first control plane node:
$ cat agent-config-iso.yaml | yq '.hosts[0].networkConfig' > new_controlplane_nmstate.yaml
Create the customized Red Hat Enterprise Linux CoreOS (RHCOS) live ISO by running the following command:
$ coreos-installer iso customize rhcos-live.x86_64.iso \
    --dest-ignition new_controlplane.ign \
    --network-nmstate new_controlplane_nmstate.yaml \
    --dest-device /dev/disk/by-path/<device_path> \
    -f
Replace <device_path> with the path to the destination device on which the customized ISO installs RHCOS.
- Boot the new control plane node with the customized RHCOS live ISO.
- Approve the certificate signing requests (CSRs) to join the new node to the cluster.
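The CSR approval in the last step is not spelled out in this procedure. The following is a minimal sketch of one way to find the requests that still need approval. The function name pending_csrs is hypothetical, and the sketch assumes that the tabular output of `oc get csr` starts with a header row and ends each row with a CONDITION column. The `oc adm certificate approve` command named in the comment is the standard approval command.

```shell
# Hypothetical helper: reads `oc get csr` tabular output on stdin and prints
# the names of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Intended use on a live cluster (assumption, not executed here):
#   for csr in $(oc get csr | pending_csrs); do
#     oc adm certificate approve "$csr"
#   done
```

A joining node typically raises more than one request over time, so the check might need to be repeated until no pending requests remain and the node reports Ready.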
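6.6.6. Linking the node, bare-metal host, and machine together
Continue creating the new control plane node by creating a machine and then linking it with the new BareMetalHost object and node.
Procedure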
Get the providerID for control plane nodes by running the following command:
$ oc get -n openshift-machine-api baremetalhost -l installer.openshift.io/role=control-plane -ojson | jq -r '.items[] | "baremetalhost:///openshift-machine-api/" + .metadata.name + "/" + .metadata.uid'
Example output
baremetalhost:///openshift-machine-api/master-00/6214c5cf-c798-4168-8c78-1ff1a3cd2cb4
baremetalhost:///openshift-machine-api/master-01/58fb60bd-b2a6-4ff3-a88d-208c33abf954
baremetalhost:///openshift-machine-api/master-02/dc5a94f3-625b-43f6-ab5a-7cc4fc79f105
Get cluster information for labels by running the following command:
$ oc get machine -n openshift-machine-api \
    -l machine.openshift.io/cluster-api-machine-role=master \
    -L machine.openshift.io/cluster-api-cluster
Example output
NAME                            PHASE     TYPE   REGION   ZONE   AGE   CLUSTER-API-CLUSTER
ci-op-jcp3s7wx-ng5sd-master-0   Running                          10h   ci-op-jcp3s7wx-ng5sd
ci-op-jcp3s7wx-ng5sd-master-1   Running                          10h   ci-op-jcp3s7wx-ng5sd
ci-op-jcp3s7wx-ng5sd-master-2   Running                          10h   ci-op-jcp3s7wx-ng5sd
Create a Machine object for the new control plane node by creating a YAML file that specifies the following values:
<new_control_plane_machine>
    Specifies the name of the new machine, which can be the same as the name of the machine that you previously deleted.
<cluster_api_cluster>
    Specifies the CLUSTER-API-CLUSTER value of the other control plane machines, as shown in the output of the previous step.
<provider_id>
    Specifies the providerID value of the new bare-metal host, as shown in the output of an earlier step.
The following warning is expected:
Warning: metadata.finalizers: "machine.machine.openshift.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers
Link the new control plane node and Machine object to the BareMetalHost object by performing the following steps in a single bash shell session:
Define the NEW_NODE_NAME variable by running the following command:
$ NEW_NODE_NAME=<new_node_name>
Replace <new_node_name> with the name of the new control plane node.
Define the NEW_MACHINE_NAME variable by running the following command:
$ NEW_MACHINE_NAME=<new_machine_name>
Replace <new_machine_name> with the name of the new machine.
Define the BMH_UID variable by running the following commands, which extract the UID from the BareMetalHost object of the new node and display it:
$ BMH_UID=$(oc get -n openshift-machine-api bmh $NEW_NODE_NAME -ojson | jq -r .metadata.uid)
$ echo $BMH_UID
Patch the consumerRef object into the bare-metal host by running the following command:
$ oc patch -n openshift-machine-api bmh $NEW_NODE_NAME --type merge --patch '{"spec":{"consumerRef":{"apiVersion":"machine.openshift.io/v1beta1","kind":"Machine","name":"'$NEW_MACHINE_NAME'","namespace":"openshift-machine-api"}}}'
Patch the providerID value into the new node by running the following command:
$ oc patch node $NEW_NODE_NAME --type merge --patch '{"spec":{"providerID":"baremetalhost:///openshift-machine-api/'$NEW_NODE_NAME'/'$BMH_UID'"}}'
Review the providerID values by running the following command:
$ oc get node -l node-role.kubernetes.io/control-plane -ojson | jq -r '.items[] | .metadata.name + " " + .spec.providerID'
Set the poweredOn status of the BareMetalHost object to true by running the following command:
$ oc patch -n openshift-machine-api bmh $NEW_NODE_NAME --subresource status --type json -p '[{"op":"replace","path":"/status/poweredOn","value":true}]'
Review the poweredOn status of the BareMetalHost object by running the following command:
$ oc get bmh -n openshift-machine-api -ojson | jq -r '.items[] | .metadata.name + " PoweredOn:" + (.status.poweredOn | tostring)'
Review the provisioning state of the BareMetalHost object by running the following command:
$ oc get bmh -n openshift-machine-api -ojson | jq -r '.items[] | .metadata.name + " ProvisioningState:" + .status.provisioning.state'
Important
If the provisioning state is not unmanaged, change the provisioning state by running the following command:
$ oc patch -n openshift-machine-api bmh $NEW_NODE_NAME --subresource status --type json -p '[{"op":"replace","path":"/status/provisioning/state","value":"unmanaged"}]'
Set the state of the machine to Provisioned by running the following command:
$ oc patch -n openshift-machine-api machines $NEW_MACHINE_NAME --subresource status --type json -p '[{"op":"replace","path":"/status/phase","value":"Provisioned"}]'
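The providerID string that this section patches into the node has a fixed shape, so it can be assembled and sanity-checked before it is used. The following is a minimal sketch. The function name provider_id is hypothetical. The format comes from the commands above, and the rejection of an empty or "null" UID is an added safeguard, because `jq -r` prints null when a field is missing.

```shell
# Hypothetical helper: prints the providerID for bare-metal host $1 with UID $2,
# and fails when either value is missing or the UID is the literal "null".
provider_id() {
  host_name="$1"
  host_uid="$2"
  if [ -z "$host_name" ] || [ -z "$host_uid" ] || [ "$host_uid" = "null" ]; then
    echo "provider_id: host name and UID are required" >&2
    return 1
  fi
  printf 'baremetalhost:///openshift-machine-api/%s/%s\n' "$host_name" "$host_uid"
}

# Intended use in the same shell session as the steps above (assumption):
#   provider_id "$NEW_NODE_NAME" "$BMH_UID"
```

Comparing the printed value with the output of the providerID review command is a quick way to confirm that the node and the BareMetalHost object point at each other.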
6.6.7. Adding the new etcd member
Finish adding the new control plane node by adding the new etcd member to the cluster.
Procedure
Add the new etcd member to the cluster by performing the following steps in a single bash shell session:
Find the IP address of the new control plane node by running the following command:
$ oc get nodes -owide -l node-role.kubernetes.io/control-plane
Make note of the IP address of the node for later use.
List the etcd pods by running the following command:
$ oc get -n openshift-etcd pods -l k8s-app=etcd -o wide
Connect to one of the running etcd pods by running the following command. The etcd pod on the new node is expected to be in the CrashLoopBackOff state.
$ oc rsh -n openshift-etcd <running_pod>
Replace <running_pod> with the name of a running pod shown in the previous step.
View the etcd member list by running the following command:
sh-4.2# etcdctl member list -w table
Add the new control plane etcd member by running the following command:
sh-4.2# etcdctl member add <new_node> --peer-urls="https://<ip_address>:2380"
where:
<new_node>
    Specifies the name of the new control plane node.
<ip_address>
    Specifies the IP address of the new node.
Exit the rsh shell by running the following command:
sh-4.2# exit
Force an etcd redeployment by running the following command:
$ oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "single-master-recovery-'"$( date --rfc-3339=ns )"'"}}' --type=merge
Turn the etcd quorum guard back on by running the following command:
$ oc patch etcd/cluster --type=merge -p '{"spec": {"unsupportedConfigOverrides": null}}'
Monitor the cluster Operator rollout by running the following command:
$ watch oc get co
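The peer URL passed to `etcdctl member add` earlier in this section is built from the node IP address and port 2380. The following is a minimal sketch of that construction. The function name etcd_peer_url is hypothetical. The IPv6 branch is an addition that is not covered by this procedure: it applies the general URL rule that an IPv6 literal must be enclosed in square brackets.

```shell
# Hypothetical helper: prints the etcd peer URL for the IP address in $1.
# An address that contains a colon is treated as IPv6 and wrapped in brackets.
etcd_peer_url() {
  case "$1" in
    *:*) printf 'https://[%s]:2380\n' "$1" ;;
    *)   printf 'https://%s:2380\n' "$1" ;;
  esac
}

# Intended use (assumption, not executed here):
#   etcdctl member add openshift-control-plane-2 \
#     --peer-urls="$(etcd_peer_url 192.168.10.11)"
```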