Chapter 6. Migrating Red Hat Ceph Storage RBD to external RHEL nodes
For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running Red Hat Ceph Storage version 6 or later, you must migrate the daemons that are included in the Red Hat OpenStack Platform control plane into the existing external Red Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include the Compute nodes for an HCI environment or dedicated storage nodes.
To migrate Red Hat Ceph Storage RADOS Block Device (RBD), your environment must meet the following requirements:
- Red Hat Ceph Storage is running version 6 or later and is managed by cephadm/orchestrator (a quick verification is sketched after this list).
- NFS (ganesha) is migrated from a director-based deployment to cephadm. For more information, see Creating an NFS Ganesha cluster.
- Both the Red Hat Ceph Storage public and cluster networks are propagated, by using director, to the target nodes.
- The Ceph Monitors must keep their IP addresses to avoid a cold migration.
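You can quickly confirm the first and last requirements from a node that has admin access to the cluster. The following is a minimal sketch, assuming the cephadm shell is available on a Controller node:

sudo cephadm shell -- ceph orch status   # confirms the cluster is managed by the cephadm orchestrator
sudo cephadm shell -- ceph versions      # confirms the daemons run Red Hat Ceph Storage 6 or later
sudo cephadm shell -- ceph mon dump      # lists the current mon IP addresses that must be preserved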
6.1. Migrating Ceph Monitor and Ceph Manager daemons to Red Hat Ceph Storage nodes
Migrate your Ceph Monitor daemons, Ceph Manager daemons, and object storage daemons (OSDs) from your Red Hat OpenStack Platform Controller nodes to existing Red Hat Ceph Storage nodes. During the migration, ensure that you can do the following actions:
- Keep the mon IP addresses by moving them to the Red Hat Ceph Storage nodes.
- Drain the existing Controller nodes and shut them down.
- Deploy additional monitors to the existing nodes, and promote them as _admin nodes that administrators can use to manage the Red Hat Ceph Storage cluster and perform day 2 operations against it.
- Keep the Red Hat Ceph Storage cluster operational during the migration.
The following procedure shows an example migration from a Controller node (oc0-controller-1) to a Red Hat Ceph Storage node (oc0-ceph-0). Use the names of the nodes in your environment.
Prerequisites
Configure the Storage nodes to have both the storage and storage_mgmt networks to ensure that you can use both the Red Hat Ceph Storage public and cluster networks. This step requires you to interact with director. From Red Hat OpenStack Platform 17.1 and later you do not have to run a stack update. However, you must run commands to execute os-net-config on the bare metal node and configure the additional networks.

Ensure that the network is defined in the metalsmith.yaml file for the CephStorageNodes:

- name: CephStorage
  count: 2
  instances:
  - hostname: oc0-ceph-0
    name: oc0-ceph-0
  - hostname: oc0-ceph-1
    name: oc0-ceph-1
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: storage_cloud_0
      subnet: storage_cloud_0_subnet
    - network: storage_mgmt_cloud_0
      subnet: storage_mgmt_cloud_0_subnet
    network_config:
      template: templates/single_nic_vlans/single_nic_vlans_storage.j2
Run the following command:
openstack overcloud node provision \
  -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \
  --network-config -y --concurrency 2 /home/stack/metalsmith-0.yaml
Verify that the storage network is running on the node:
(undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a
Warning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
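Optionally, confirm that the new Storage node can also reach the existing Red Hat Ceph Storage public and cluster networks. The following is a minimal reachability sketch, using addresses from this example environment (172.16.11.121 and 172.16.12.154 are the oc0-controller-1 storage and storage_mgmt addresses shown later in this procedure):

ssh heat-admin@192.168.24.14 ping -c 3 172.16.11.121   # Ceph public network (storage)
ssh heat-admin@192.168.24.14 ping -c 3 172.16.12.154   # Ceph cluster network (storage_mgmt)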
Procedure
To migrate the mon(s) and mgr(s) to the two existing Red Hat Ceph Storage nodes, create a Red Hat Ceph Storage spec based on the default roles, with the mon/mgr daemons placed on the Controller nodes:

openstack overcloud ceph spec -o ceph_spec.yaml -y \
  --stack overcloud-0 overcloud-baremetal-deployed-0.yaml
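For reference, the generated ceph_spec.yaml is a multi-document YAML file with one entry per host and one per service. The following is a trimmed sketch of the kind of content to expect, reconstructed from the spec diff shown later in this procedure:

---
addr: 192.168.24.12
hostname: oc0-controller-1
labels:
- _admin
- mon
- mgr
service_type: host
---
placement:
  hosts:
  - oc0-controller-0
  - oc0-controller-1
  - oc0-controller-2
service_id: mon
service_name: mon
service_type: mon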
Deploy the Red Hat Ceph Storage cluster:
openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \
  --stack overcloud-0 -o deployed_ceph.yaml \
  --network-data ~/oc0-network-data.yaml \
  --ceph-spec ~/ceph_spec.yaml
Note: The ceph_spec.yaml file, which is the OSP-generated description of the Red Hat Ceph Storage cluster, is used later in the process as the basic template that cephadm requires to update the status and information of the daemons.

Check the status of the Red Hat Ceph Storage cluster:
[ceph: root@oc0-controller-0 /]# ceph -s
  cluster:
    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)
    mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk
    osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   43 MiB used, 400 GiB / 400 GiB avail
    pgs:     1 active+clean
[ceph: root@oc0-controller-0 /]# ceph orch host ls
HOST              ADDR           LABELS          STATUS
oc0-ceph-0        192.168.24.14  osd
oc0-ceph-1        192.168.24.7   osd
oc0-controller-0  192.168.24.15  _admin mgr mon
oc0-controller-1  192.168.24.23  _admin mgr mon
oc0-controller-2  192.168.24.13  _admin mgr mon
Log in to the controller-0 node, then enter the cephadm shell:

cephadm shell -v /home/ceph-admin/specs:/specs
Log in to the ceph-0 node, then run:

sudo watch podman ps  # watch the new mon/mgr being deployed here
Optional: If the mgr is active on the source node, fail it over:
ceph mgr fail <mgr instance>
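If you are not sure which mgr instance is currently active, you can check it first. The following is a minimal sketch from the cephadm shell:

[ceph: root@oc0-controller-0 /]# ceph mgr stat   # reports the active mgr instance and whether it is available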
From the cephadm shell, remove the labels on oc0-controller-1:

for label in mon mgr _admin; do ceph orch host label rm oc0-controller-1 $label; done
Add the missing labels to oc0-ceph-0:

[ceph: root@oc0-controller-0 /]# for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done
Added label mon to host oc0-ceph-0
Added label mgr to host oc0-ceph-0
Added label _admin to host oc0-ceph-0
Drain and force-remove the oc0-controller-1 node:

[ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1
Scheduled to remove the following daemons from host 'oc0-controller-1'
type                 id
-------------------- ---------------
mon                  oc0-controller-1
mgr                  oc0-controller-1.mtxohd
crash                oc0-controller-1
[ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force
Removed host 'oc0-controller-1'

[ceph: root@oc0-controller-0 /]# ceph orch host ls
HOST              ADDR           LABELS          STATUS
oc0-ceph-0        192.168.24.14  osd
oc0-ceph-1        192.168.24.7   osd
oc0-controller-0  192.168.24.15  mgr mon _admin
oc0-controller-2  192.168.24.13  _admin mgr mon
If you have only 3 mon nodes and the drain of the node does not work as expected (the containers are still there), log in to controller-1 and force-purge the containers on the node:
[root@oc0-controller-1 ~]# sudo podman ps
CONTAINER ID  IMAGE                                                                                                   COMMAND               CREATED         STATUS             PORTS  NAMES
5c1ad36472bc  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
3b14cc7bf4dd  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd

[root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force

[root@oc0-controller-1 ~]# sudo podman ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
Note: Running cephadm rm-cluster on a node that is no longer part of the cluster removes all of the containers and performs some cleanup on the file system.
Before shutting down oc0-controller-1, move the IP address (on the same network) to the oc0-ceph-0 node:
mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]

[root@oc0-controller-1 ~]# ip -o -4 a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
5: br-ex    inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\       valid_lft forever preferred_lft forever
6: vlan100    inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\       valid_lft forever preferred_lft forever
7: vlan12    inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
8: vlan11    inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
9: vlan13    inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\       valid_lft forever preferred_lft forever
10: vlan70    inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\       valid_lft forever preferred_lft forever
11: vlan1    inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
12: vlan14    inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\       valid_lft forever preferred_lft forever
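If you need to confirm which of the mon_host addresses belongs to the node you are draining, the monitor map lists each mon with its addresses. The following is a minimal sketch from the cephadm shell:

[ceph: root@oc0-controller-0 /]# ceph mon dump   # shows each mon name with its v2/v1 addresses, for example mon.oc0-controller-1 on 172.16.11.121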
On oc0-ceph-0, add the IP address of the mon that was removed from oc0-controller-1, and verify that the IP address has been assigned and can be reached:

$ sudo ip a add 172.16.11.121 dev vlan11
$ ip -o -4 a
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
[heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11
[heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
7: vlan11    inet 172.16.11.121/32 scope global vlan11\       valid_lft forever preferred_lft forever
8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
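Note that an address added with ip a add does not persist across a reboot, so re-add it (or make it persistent in your network configuration) if the node restarts before the new mon is deployed. As a minimal reachability check from another node on the storage network, a sketch using this example's addresses:

ping -c 3 172.16.11.121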
- Optional: Power off oc0-controller-1.
Add the new mon on oc0-ceph-0 using the old IP address:
[ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121
Deployed mon.oc0-ceph-0 on host 'oc0-ceph-0'
Check the new container in the oc0-ceph-0 node:
b581dc8bbb78 registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4 -n mon.oc0-ceph-0... 24 seconds ago Up 24 seconds ago ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0
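You can also verify that the relocated mon has rejoined the quorum. The following is a minimal sketch from the cephadm shell:

[ceph: root@oc0-controller-0 /]# ceph mon stat   # expect 3 mons in quorum, including oc0-ceph-0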
In the cephadm shell, back up the existing ceph_spec.yaml, then edit the spec to remove any oc0-controller-1 entry and replace it with oc0-ceph-0:
cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file

[ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml
--- ceph_spec.yaml.bkp  2022-07-29 15:41:34.516329643 +0000
+++ ceph_spec.yaml      2022-07-29 15:28:26.455329643 +0000
@@ -7,14 +7,6 @@
 - mgr
 service_type: host
 ---
-addr: 192.168.24.12
-hostname: oc0-controller-1
-labels:
-- _admin
-- mon
-- mgr
-service_type: host
----
 addr: 192.168.24.19
 hostname: oc0-controller-2
 labels:
@@ -38,7 +30,7 @@
 placement:
   hosts:
   - oc0-controller-0
-  - oc0-controller-1
+  - oc0-ceph-0
   - oc0-controller-2
 service_id: mon
 service_name: mon
@@ -47,8 +39,8 @@
 placement:
   hosts:
   - oc0-controller-0
-  - oc0-controller-1
   - oc0-controller-2
+  - oc0-ceph-0
 service_id: mgr
 service_name: mgr
 service_type: mgr
Apply the resulting spec:
ceph orch apply -i ceph_spec.yaml

As a result, a new mgr is deployed on the oc0-ceph-0 node, and the spec is reconciled within cephadm:

[ceph: root@oc0-controller-0 specs]# ceph orch ls
NAME                     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
crash                           4/4      5m ago     61m  *
mgr                             3/3      5m ago     69s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
mon                             3/3      5m ago     70s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
osd.default_drive_group         8        2m ago     69s  oc0-ceph-0;oc0-ceph-1

[ceph: root@oc0-controller-0 specs]# ceph -s
  cluster:
    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
    health: HEALTH_WARN
            1 stray host(s) with 1 daemon(s) not managed by cephadm

  services:
    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)
    mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb
    osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   43 MiB used, 400 GiB / 400 GiB avail
    pgs:     1 active+clean
Fix the warning by refreshing the mgr:
ceph mgr fail oc0-controller-0.xzgtvo
At this point the Red Hat Ceph Storage cluster is clean:
[ceph: root@oc0-controller-0 specs]# ceph -s
  cluster:
    id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)
    mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb
    osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   43 MiB used, 400 GiB / 400 GiB avail
    pgs:     1 active+clean
The oc0-controller-1 node is removed and powered off without leaving traces on the Red Hat Ceph Storage cluster.

- Repeat this procedure for additional Controller nodes in your environment until you have migrated all the Ceph Mon and Ceph Manager daemons to the target nodes.
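As a final check after each node is migrated, you can confirm that the daemons run on the intended hosts and that the cluster remains healthy. The following is a minimal sketch, assuming the cephadm shell on a remaining _admin node:

[ceph: root@oc0-controller-0 /]# ceph orch ps oc0-ceph-0   # lists the mon, mgr, and osd daemons on the new node
[ceph: root@oc0-controller-0 /]# ceph -s                   # overall cluster health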