Chapter 8. Replacing DistributedComputeHCI nodes
During hardware maintenance, you might need to scale down, scale up, or replace a DistributedComputeHCI node at an edge site. To replace a DistributedComputeHCI node, remove services from the node you want to replace, scale down the node count, and then follow the procedures for scaling those nodes back up.
8.1. Removing Red Hat Ceph Storage services
Before you remove an HCI (hyperconverged) node from a cluster, you must remove the Red Hat Ceph Storage services. To remove the Ceph services, you must disable and remove the ceph-osd service from the cluster services on the node you are removing, and then stop and disable the mon, mgr, and osd services.
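Before you disable anything, it is prudent to confirm that the cluster is healthy enough to lose this node's daemons. This precheck is not part of the documented steps; it reuses the same cephadm shell invocation that the procedure uses below, with two standard status commands:
$ sudo cephadm shell --config /etc/ceph/dcn2.conf \
  --keyring /etc/ceph/dcn2.client.admin.keyring
[ceph: root@dcn2-computehci2-1 /]# ceph -s             # overall health and service counts
[ceph: root@dcn2-computehci2-1 /]# ceph health detail  # details behind any warnings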
Procedure
On the undercloud, use SSH to connect to the DistributedComputeHCI node that you want to remove:
$ ssh tripleo-admin@<dcn-computehci-node>
Start a cephadm shell. Use the configuration file and keyring file of the host that you want to remove:
$ sudo cephadm shell --config /etc/ceph/dcn2.conf \
  --keyring /etc/ceph/dcn2.client.admin.keyring
Record the OSDs (object storage devices) associated with the DistributedComputeHCI node that you want to remove, for reference in a later step:
[ceph: root@dcn2-computehci2-1 ~]# ceph osd tree -c /etc/ceph/dcn2.conf
…
-3       0.24399     host dcn2-computehci2-1
 1  hdd  0.04880         osd.1      up   1.00000  1.00000
 7  hdd  0.04880         osd.7      up   1.00000  1.00000
11  hdd  0.04880         osd.11     up   1.00000  1.00000
15  hdd  0.04880         osd.15     up   1.00000  1.00000
18  hdd  0.04880         osd.18     up   1.00000  1.00000
…
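If you prefer to capture these OSD IDs programmatically rather than reading them out of the tree, Ceph can list the OSDs under a CRUSH bucket by name. A sketch using the host bucket shown above; the output order may differ:
[ceph: root@dcn2-computehci2-1 ~]# ceph osd ls-tree dcn2-computehci2-1
1
7
11
15
18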
Use SSH to connect to another node in the same cluster and remove the monitor from the cluster:
$ sudo cephadm shell --config /etc/ceph/dcn2.conf \
  --keyring /etc/ceph/dcn2.client.admin.keyring
[ceph: root@dcn-computehci2-0]# ceph mon remove dcn2-computehci2-1 -c /etc/ceph/dcn2.conf
removing mon.dcn2-computehci2-1 at [v2:172.23.3.153:3300/0,v1:172.23.3.153:6789/0], there will be 2 monitors
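To confirm that the monitor map no longer contains the removed node, you can print the current monitors and quorum. ceph mon stat is a standard status command; no output is reproduced here because it was not captured from this cluster:
[ceph: root@dcn-computehci2-0]# ceph mon stat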
Use SSH to log in again to the node that you are removing from the cluster.
Stop and disable the mgr service:
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl --type=service | grep ceph
ceph-crash@dcn2-computehci2-1.service loaded active running Ceph crash dump collector
ceph-mgr@dcn2-computehci2-1.service   loaded active running Ceph Manager
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl stop ceph-mgr@dcn2-computehci2-1
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl --type=service | grep ceph
ceph-crash@dcn2-computehci2-1.service loaded active running Ceph crash dump collector
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl disable ceph-mgr@dcn2-computehci2-1
Removed /etc/systemd/system/multi-user.target.wants/ceph-mgr@dcn2-computehci2-1.service.
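The overview at the start of this section also calls for stopping and disabling the node's mon service. Assuming the mon unit follows the same ceph-mon@<hostname> naming as the mgr unit shown above (unit names can differ on cephadm-managed hosts), the equivalent commands would be:
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl stop ceph-mon@dcn2-computehci2-1
[tripleo-admin@dcn2-computehci2-1 ~]$ sudo systemctl disable ceph-mon@dcn2-computehci2-1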
Start the cephadm shell:
$ sudo cephadm shell --config /etc/ceph/dcn2.conf \
  --keyring /etc/ceph/dcn2.client.admin.keyring
Verify that the mgr service for the node is removed from the cluster:
[ceph: root@dcn2-computehci2-1 ~]# ceph -s
  cluster:
    id:     b9b53581-d590-41ac-8463-2f50aa985001
    health: HEALTH_WARN
            3 pools have too many placement groups
            mons are allowing insecure global_id reclaim

  services:
    mon: 2 daemons, quorum dcn2-computehci2-2,dcn2-computehci2-0 (age 2h)
    mgr: dcn2-computehci2-2(active, since 20h), standbys: dcn2-computehci2-0 1
    osd: 15 osds: 15 up (since 3h), 15 in (since 3h)

  data:
    pools:   3 pools, 384 pgs
    objects: 32 objects, 88 MiB
    usage:   16 GiB used, 734 GiB / 750 GiB avail
    pgs:     384 active+clean
1  When the mgr service is successfully removed, the node is no longer listed in the mgr line.
Export the Red Hat Ceph Storage specification:
[ceph: root@dcn2-computehci2-1 ~]# ceph orch ls --export > spec.yml
Edit the specification in the spec.yml file:
- Remove all instances of the host <dcn-computehci-node> from spec.yml.
- Remove all instances of the <dcn-computehci-node> entry from the following (an illustrative spec excerpt follows this list):
- service_type: osd
- service_type: mon
- service_type: host
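For orientation, a specification exported with ceph orch ls --export is YAML made up of blocks like the sketch below. The service_id, address, and device settings are illustrative placeholders, not values taken from this cluster; only the pattern of where the departing host appears matters:
service_type: host
hostname: dcn2-computehci2-1      # remove this whole block
addr: 172.23.3.153
---
service_type: mon
placement:
  hosts:
  - dcn2-computehci2-0
  - dcn2-computehci2-1            # remove this entry
  - dcn2-computehci2-2
---
service_type: osd
service_id: default_drive_group   # illustrative service_id
placement:
  hosts:
  - dcn2-computehci2-0
  - dcn2-computehci2-1            # remove this entry
  - dcn2-computehci2-2
data_devices:
  all: true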
Reapply the Red Hat Ceph Storage specification:
[ceph: root@dcn2-computehci2-1 /]# ceph orch apply -i spec.yml
Remove the OSDs that you identified earlier using ceph osd tree:
[ceph: root@dcn2-computehci2-1 /]# ceph orch osd rm --zap 1 7 11 15 18
Scheduled OSD(s) for removal
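If you want extra assurance that draining these OSDs cannot leave any placement group without enough replicas, Ceph provides a standard safety check that takes the same OSD IDs. This check is an addition to the documented procedure, not part of it:
[ceph: root@dcn2-computehci2-1 /]# ceph osd ok-to-stop 1 7 11 15 18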
Verify the status of the OSDs that are being removed. Do not proceed to the next step until the following command returns no output:
[ceph: root@dcn2-computehci2-1 /]# ceph orch osd rm status
OSD_ID  HOST                STATE     PG_COUNT  REPLACE  FORCE  DRAIN_STARTED_AT
1       dcn2-computehci2-1  draining  27        False    False  2021-04-23 21:35:51.215361
7       dcn2-computehci2-1  draining  8         False    False  2021-04-23 21:35:49.111500
11      dcn2-computehci2-1  draining  14        False    False  2021-04-23 21:35:50.243762
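Draining can take some time on a busy cluster. Instead of re-running the status command by hand, you can poll it in a loop from the cephadm shell. A minimal sketch, assuming your Ceph release prints nothing once all removals are complete (some releases print a placeholder message instead, in which case adjust the test):
[ceph: root@dcn2-computehci2-1 /]# until [ -z "$(ceph orch osd rm status)" ]; do sleep 60; done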
Verify that no daemons remain on the host that you are removing:
[ceph: root@dcn2-computehci2-1 /]# ceph orch ps dcn2-computehci2-1
If any daemons remain, you can remove them with the following command:
[ceph: root@dcn2-computehci2-1 /]# ceph orch host drain dcn2-computehci2-1
Remove the <dcn-computehci-node> host from the Red Hat Ceph Storage cluster:
[ceph: root@dcn2-computehci2-1 /]# ceph orch host rm dcn2-computehci2-1
Removed host 'dcn2-computehci2-1'
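As a final verification, you can list the hosts that the orchestrator still manages and confirm that the removed node no longer appears; ceph orch host ls is the standard listing command:
[ceph: root@dcn2-computehci2-1 /]# ceph orch host ls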