OpenShift Container Storage is now OpenShift Data Foundation starting with version 4.9.
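Chapter 3. Dynamically provisioned OpenShift Container Storage deployed on Red Hat Virtualization
3.1. Replacing operational or failed storage devices on Red Hat Virtualization installer-provisioned infrastructure
Use this procedure when one or more virtual machine disks (VMDK) need to be replaced in OpenShift Container Storage deployed on Red Hat Virtualization infrastructure. The procedure creates a new persistent volume claim (PVC) on a new volume and removes the old object storage device (OSD).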
Prerequisites
Ensure that the data is resilient.
- In the OpenShift Web Console, navigate to Storage → Overview.
- Under Persistent Storage in the Status card, confirm that Data Resiliency has a green tick mark.
Procedure
Identify the OSD that needs to be replaced and the OpenShift Container Platform node on which it is scheduled.
$ oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide
Example output:
rook-ceph-osd-0-6d77d6c7c6-m8xj6   0/1   CrashLoopBackOff   0   24h   10.129.0.16   compute-2   <none>   <none>
rook-ceph-osd-1-85d99fb95f-2svc7   1/1   Running            0   24h   10.128.2.24   compute-0   <none>   <none>
rook-ceph-osd-2-6c66cdb977-jp542   1/1   Running            0   24h   10.130.0.18   compute-1   <none>   <none>
In this example, rook-ceph-osd-0-6d77d6c7c6-m8xj6 needs to be replaced, and compute-2 is the OpenShift Container Platform node on which the OSD is scheduled.
Note: If the OSD to be replaced is healthy, the status of the pod is Running.
Scale down the OSD deployment for the OSD to be replaced.
Each time you replace an OSD, repeat this step, updating the osd_id_to_remove parameter with the OSD ID.
$ osd_id_to_remove=0
$ oc scale -n openshift-storage deployment rook-ceph-osd-${osd_id_to_remove} --replicas=0
where osd_id_to_remove is the integer in the pod name immediately after the rook-ceph-osd- prefix. In this example, the deployment name is rook-ceph-osd-0.
Example output:
deployment.extensions/rook-ceph-osd-0 scaled
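The identification of the failed OSD and its ID can be partly scripted. The sketch below is not part of the documented procedure: failed_osds is a hypothetical helper that reads the pod listing from the first step and, for each OSD pod that is not Running, prints the pod name, its node, and the integer that would be assigned to osd_id_to_remove. It assumes the column layout shown in the example output above (STATUS in the third column, NODE in the seventh).

```shell
# Hypothetical helper, not part of the product tooling.
# Reads `oc get pods -l app=rook-ceph-osd -o wide --no-headers` output on stdin
# and prints "<pod> <node> <osd_id>" for every pod whose STATUS is not Running.
failed_osds() {
  awk '$3 != "Running" {
    id = $1
    sub(/^rook-ceph-osd-/, "", id)   # drop the deployment prefix
    sub(/-.*/, "", id)               # keep only the integer before the next hyphen
    print $1, $7, id
  }'
}

# Usage against a live cluster (not run here):
#   oc get -n openshift-storage pods -l app=rook-ceph-osd -o wide --no-headers | failed_osds
```

A healthy OSD that is being replaced proactively shows Running and is therefore not listed; in that case the ID still has to be read from the pod name by hand.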
Verify that the rook-ceph-osd pod is terminated.
$ oc get -n openshift-storage pods -l ceph-osd-id=${osd_id_to_remove}
Example output:
No resources found.
Note: If the rook-ceph-osd pod is stuck in the terminating state, use the force option to delete the pod.
$ oc delete -n openshift-storage pod rook-ceph-osd-0-6d77d6c7c6-m8xj6 --force --grace-period=0
Example output:
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "rook-ceph-osd-0-6d77d6c7c6-m8xj6" force deleted
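The termination check above may need to be repeated until the pod disappears. A minimal polling sketch follows; wait_for_osd_pod_gone is a hypothetical helper, and the default of 30 attempts at 10-second intervals is an arbitrary assumption. It relies on oc printing the "No resources found" message to standard error, so an empty standard output is taken to mean that no pod matches.

```shell
# Hypothetical helper: poll until no pod carries the ceph-osd-id label.
# Arguments: OSD ID, max attempts (default 30), seconds between attempts (default 10).
wait_for_osd_pod_gone() {
  osd_id=$1; attempts=${2:-30}; delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # A failing `oc get` (for example, an expired login) must not look like "no pods".
    pods=$(oc get -n openshift-storage pods -l "ceph-osd-id=${osd_id}" --no-headers 2>/dev/null) || {
      echo "oc get failed" >&2
      return 2
    }
    if [ -z "$pods" ]; then
      echo "OSD ${osd_id} pod is gone"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "OSD ${osd_id} pod still present after ${attempts} checks" >&2
  return 1
}

# Usage (not run here):
#   wait_for_osd_pod_gone "${osd_id_to_remove}"
```

If the function returns 1, the pod is probably stuck terminating and the forced deletion described in the note applies.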
Remove the old OSD from the cluster so that a new OSD can be added.
Delete any old ocs-osd-removal jobs.
$ oc delete -n openshift-storage job ocs-osd-removal-job
Example output:
job.batch "ocs-osd-removal-job" deleted
Change to the openshift-storage project.
$ oc project openshift-storage
Remove the old OSD from the cluster.
$ oc process -n openshift-storage ocs-osd-removal \
    -p FAILED_OSD_IDS=<failed_osd_id> FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
<failed_osd_id> is the integer in the pod name immediately after the rook-ceph-osd- prefix. To remove more than one OSD, pass comma-separated OSD IDs in the command, for example FAILED_OSD_IDS=0,1,2.
Set FORCE_OSD_REMOVAL to true in clusters that have only three OSDs, or in clusters that lack the space to restore all three replicas of the data after the OSD is removed.
Warning: This step removes the OSD from the cluster completely. Ensure that the correct OSD ID (osd_id_to_remove) is passed as FAILED_OSD_IDS.
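Because a wrong ID removes a healthy OSD, it can help to validate the IDs before they reach the template. The sketch below is an illustration only: join_osd_ids is a hypothetical helper that builds the comma-separated value for FAILED_OSD_IDS and rejects anything that is not a non-negative integer.

```shell
# Hypothetical helper: join OSD IDs into the comma-separated form used by
# FAILED_OSD_IDS, refusing empty or non-numeric arguments.
join_osd_ids() {
  joined=""
  for id in "$@"; do
    case $id in
      ''|*[!0-9]*) echo "invalid OSD ID: '$id'" >&2; return 1 ;;
    esac
    joined="${joined:+${joined},}${id}"
  done
  [ -n "$joined" ] || { echo "no OSD IDs given" >&2; return 1; }
  printf '%s\n' "$joined"
}

# Usage (not run here):
#   ids=$(join_osd_ids 0 1 2) &&
#   oc process -n openshift-storage ocs-osd-removal \
#     -p FAILED_OSD_IDS="$ids" FORCE_OSD_REMOVAL=false | oc create -n openshift-storage -f -
```

The check only guards against malformed input; it cannot tell whether the IDs name the OSDs that actually failed.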
Verify that the OSD was removed successfully by checking the status of the ocs-osd-removal pod. A status of Completed confirms that the OSD removal job succeeded.
$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
Note: If ocs-osd-removal fails and the pod is not in the expected Completed state, check the pod logs for further debugging. For example:
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1
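The status check and the log lookup above can be combined. The following sketch assumes that waiting on the job's complete condition with oc wait is an acceptable substitute for reading the pod status by eye; check_removal_job is a hypothetical name and the 300-second default timeout is an arbitrary choice.

```shell
# Hypothetical helper: wait for the removal job to complete and, if it does
# not, print the pod logs for debugging.
# Argument: timeout understood by `oc wait` (default 300s).
check_removal_job() {
  if oc wait -n openshift-storage --for=condition=complete \
       job/ocs-osd-removal-job --timeout="${1:-300s}" >/dev/null; then
    echo "ocs-osd-removal-job completed"
  else
    echo "ocs-osd-removal-job did not complete; pod logs follow" >&2
    oc logs -n openshift-storage -l job-name=ocs-osd-removal-job --tail=-1 >&2
    return 1
  fi
}

# Usage (not run here):
#   check_removal_job 600s
```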
If encryption was enabled at the time of installation, remove the dm-crypt managed device-mapper mapping from the OSD devices that were removed from the respective OpenShift Container Storage nodes.
Get the PVC name of the replaced OSD from the logs of the ocs-osd-removal-job pod:
$ oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | egrep -i 'pvc|deviceset'
For example:
2021-05-12 14:31:34.666000 I | cephosd: removing the OSD PVC "ocs-deviceset-xxxx-xxx-xxx-xxx"
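To avoid copying the PVC name by hand, the quoted name can be cut out of the log line. This is a small sketch that assumes the log format shown in the example above; removed_pvc_names is a hypothetical helper.

```shell
# Hypothetical helper: print each PVC name quoted in
# 'removing the OSD PVC "<name>"' log lines read from stdin.
removed_pvc_names() {
  sed -n 's/.*removing the OSD PVC "\([^"]*\)".*/\1/p'
}

# Usage (not run here):
#   oc logs -l job-name=ocs-osd-removal-job -n openshift-storage --tail=-1 | removed_pvc_names
```

Lines that do not match the pattern produce no output, so an empty result means the log wording differs from the example and the name has to be read manually.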
For each of the nodes identified in step 1 (the node on which the replaced OSD was scheduled), do the following:
Create a debug pod and chroot to the host on the storage node.
$ oc debug node/<node name>
$ chroot /host
Find the relevant device name based on the PVC name identified in the previous step.
sh-4.4# dmsetup ls | grep <pvc name>
ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt (253:0)
Remove the mapped device.
$ cryptsetup luksClose --debug --verbose ocs-deviceset-xxx-xxx-xxx-xxx-block-dmcrypt
Note: If the above command gets stuck due to insufficient privileges, run the following commands:
- Press CTRL+Z to exit the above command.
- Find the PID of the process that was stuck.
  $ ps -ef | grep crypt
- Terminate the process using the kill command.
  $ kill -9 <PID>
- Verify that the device name is removed.
  $ dmsetup ls
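The lookup before luksClose and the final dmsetup ls check both come down to matching the PVC name against the mapping names. A minimal sketch, assuming awk is available in the host's chroot environment; dmcrypt_mappings_for_pvc is a hypothetical helper.

```shell
# Hypothetical helper: read `dmsetup ls` output on stdin and print the mapping
# names that begin with the given PVC name.
dmcrypt_mappings_for_pvc() {
  awk -v pvc="$1" 'index($1, pvc) == 1 { print $1 }'
}

# Usage inside the debug shell (not run here):
#   dmsetup ls | dmcrypt_mappings_for_pvc <pvc name>
```

Before the mapping is closed, the output is the name to pass to cryptsetup luksClose; afterwards, empty output indicates that the mapping is gone.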
Delete the ocs-osd-removal job.
$ oc delete -n openshift-storage job ocs-osd-removal-job
Example output:
job.batch "ocs-osd-removal-job" deleted
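Note: When using an external key management system (KMS) with data encryption, the old OSD encryption key can be removed from the Vault server because it is now an orphan key.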
Verification steps
Verify that there is a new OSD running.
$ oc get -n openshift-storage pods -l app=rook-ceph-osd
Example output:
rook-ceph-osd-0-5f7f4747d4-snshw   1/1   Running   0   4m47s
rook-ceph-osd-1-85d99fb95f-2svc7   1/1   Running   0   1d20h
rook-ceph-osd-2-6c66cdb977-jp542   1/1   Running   0   1d20h
Verify that a new PVC is created and is in the Bound state.
$ oc get -n openshift-storage pvc
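With many PVCs in the namespace, a PVC that is not Bound is easy to miss. The sketch below filters the listing; unbound_pvcs is a hypothetical helper, and it assumes STATUS is the second column of the oc get pvc output.

```shell
# Hypothetical helper: read `oc get pvc --no-headers` output on stdin, print
# "<name> <status>" for every PVC that is not Bound, and return 1 if any exist.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2; bad = 1 } END { exit bad + 0 }'
}

# Usage (not run here):
#   oc get -n openshift-storage pvc --no-headers | unbound_pvcs && echo "all PVCs are Bound"
```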
(Optional) If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.
Identify the node on which the new OSD pod is running.
$ oc get -o=custom-columns=NODE:.spec.nodeName pod/<OSD pod name>
For example:
$ oc get -o=custom-columns=NODE:.spec.nodeName pod/rook-ceph-osd-0-544db49d7f-qrgqm
For each of the nodes identified in the previous step, do the following:
Create a debug pod and open a chroot environment for the selected host.
$ oc debug node/<node name>
$ chroot /host
Run lsblk and check for the crypt keyword beside the ocs-deviceset name.
$ lsblk
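The visual check of the lsblk output can also be done with a filter. This sketch assumes lsblk is invoked with -l (list format), -n (no headings), and -o NAME,TYPE so that each line holds only a device name and its type; encrypted_osd_devices is a hypothetical helper.

```shell
# Hypothetical helper: read `lsblk -ln -o NAME,TYPE` output on stdin and print
# the ocs-deviceset devices whose type is crypt. Returns 1 when none is found.
encrypted_osd_devices() {
  awk '$1 ~ /ocs-deviceset/ && $2 == "crypt" { print $1; found = 1 } END { exit !found }'
}

# Usage inside the debug shell (not run here):
#   lsblk -ln -o NAME,TYPE | encrypted_osd_devices
```

A non-zero exit status means that no ocs-deviceset device of type crypt was found on that node, which would warrant a closer look at the full lsblk output.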
Log in to the OpenShift Web Console and view the storage dashboard.
Figure 3.1. OSD status in the OpenShift Container Platform storage dashboard after device replacement