8.2. 在内部环境中备份和恢复 etcd
您可以在内部环境中的托管集群中备份和恢复 etcd,以修复失败。
8.2.1. 在内部环境中的托管集群中备份和恢复 etcd 复制链接链接已复制到粘贴板!
通过在托管集群中备份和恢复 etcd,您可以修复故障,如在三个节点集群的 etcd 成员中损坏或缺少数据。如果 etcd 集群的多个成员遇到数据丢失或具有 CrashLoopBackOff
状态,则这种方法有助于防止 etcd 仲裁丢失。
此流程需要 API 停机时间。
先决条件
-
已安装
oc
和jq
二进制文件。
流程
首先,设置环境变量并缩减 API 服务器:
输入以下命令为您的托管集群设置环境变量,根据需要替换值:
CLUSTER_NAME=my-cluster
$ CLUSTER_NAME=my-cluster
Copy to Clipboard Copied! Toggle word wrap Toggle overflow HOSTED_CLUSTER_NAMESPACE=clusters
$ HOSTED_CLUSTER_NAMESPACE=clusters
Copy to Clipboard Copied! Toggle word wrap Toggle overflow CONTROL_PLANE_NAMESPACE="${HOSTED_CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
$ CONTROL_PLANE_NAMESPACE="${HOSTED_CLUSTER_NAMESPACE}-${CLUSTER_NAME}"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令暂停托管集群的协调,根据需要替换值:
oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"true"}}' --type=merge
$ oc patch -n ${HOSTED_CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"true"}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令缩减 API 服务器:
缩减
kube-apiserver
:oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=0
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/kube-apiserver --replicas=0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 缩减
openshift-apiserver
:oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=0
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-apiserver --replicas=0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 缩减
openshift-oauth-apiserver
:oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=0
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} deployment/openshift-oauth-apiserver --replicas=0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
接下来,使用以下方法之一生成 etcd 快照:
- 使用之前备份的 etcd 快照。
如果您有可用的 etcd pod,通过完成以下步骤从活跃 etcd pod 创建快照:
输入以下命令列出 etcd pod:
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令为 pod 数据库生成快照并将其保存在您的机器中:
ETCD_POD=etcd-0
$ ETCD_POD=etcd-0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令验证快照是否成功:
oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/snapshot.db
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} -c etcd -t ${ETCD_POD} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/snapshot.db
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
输入以下命令制作快照的本地副本:
oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db /tmp/etcd.snapshot.db
$ oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/snapshot.db /tmp/etcd.snapshot.db
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 从 etcd 持久性存储生成快照数据库副本:
输入以下命令列出 etcd pod:
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令查找正在运行的 pod,并将其名称设置为
ETCD_POD: ETCD_POD=etcd-0
,然后复制其快照数据库:oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db /tmp/etcd.snapshot.db
$ oc cp -c etcd ${CONTROL_PLANE_NAMESPACE}/${ETCD_POD}:/var/lib/data/member/snap/db /tmp/etcd.snapshot.db
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
接下来,输入以下命令缩减 etcd statefulset:
oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令删除第二个和第三个成员的卷:
oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} pvc/data-etcd-1 pvc/data-etcd-2
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 创建 pod 以访问第一个 etcd 成员的数据:
输入以下命令来获取 etcd 镜像:
ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd -o jsonpath='{ .spec.template.spec.containers[0].image }')
$ ETCD_IMAGE=$(oc get -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd -o jsonpath='{ .spec.template.spec.containers[0].image }')
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 创建允许访问 etcd 数据的 pod:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令检查
etcd-data
pod 的状态并等待它正在运行:oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd-data
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令来获取
etcd-data
pod 的名称:DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers -l app=etcd-data -o name | cut -d/ -f2)
$ DATA_POD=$(oc get -n ${CONTROL_PLANE_NAMESPACE} pods --no-headers -l app=etcd-data -o name | cut -d/ -f2)
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
输入以下命令将 etcd 快照复制到 pod 中:
oc cp /tmp/etcd.snapshot.db ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
$ oc cp /tmp/etcd.snapshot.db ${CONTROL_PLANE_NAMESPACE}/${DATA_POD}:/var/lib/restored.snap.db
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令从
etcd-data
pod 中删除旧数据:oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm -rf /var/lib/data
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- mkdir -p /var/lib/data
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令恢复 etcd 快照:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令从 pod 中删除临时 etcd 快照:
oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm /var/lib/restored.snap.db
$ oc exec -n ${CONTROL_PLANE_NAMESPACE} ${DATA_POD} -- rm /var/lib/restored.snap.db
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令删除数据访问部署:
oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data
$ oc delete -n ${CONTROL_PLANE_NAMESPACE} deployment/etcd-data
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令扩展 etcd 集群:
oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3
$ oc scale -n ${CONTROL_PLANE_NAMESPACE} statefulset/etcd --replicas=3
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令等待 etcd 成员 pod 返回并报告 available:
oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
$ oc get -n ${CONTROL_PLANE_NAMESPACE} pods -l app=etcd -w
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入以下命令扩展所有 etcd-writer 部署:
oc scale deployment -n ${CONTROL_PLANE_NAMESPACE} --replicas=3 kube-apiserver openshift-apiserver openshift-oauth-apiserver
$ oc scale deployment -n ${CONTROL_PLANE_NAMESPACE} --replicas=3 kube-apiserver openshift-apiserver openshift-oauth-apiserver
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
输入以下命令恢复托管集群的协调:
oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":""}}' --type=merge
$ oc patch -n ${CLUSTER_NAMESPACE} hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":""}}' --type=merge
Copy to Clipboard Copied! Toggle word wrap Toggle overflow