Chapter 6. Migrating Red Hat Ceph Storage RBD to external RHEL nodes

For hyperconverged infrastructure (HCI) or dedicated Storage nodes that run Red Hat Ceph Storage version 6 or later, you must migrate the daemons that are included in the Red Hat OpenStack Platform control plane to the existing external Red Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes are typically the Compute nodes in an HCI environment or dedicated storage nodes.

To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must meet the following requirements:

  • Red Hat Ceph Storage is running version 6 or later and is managed by cephadm/orchestrator.
  • NFS (ganesha) is migrated from a director-based deployment to cephadm. For more information, see Creating an NFS Ganesha cluster.
  • Both the Red Hat Ceph Storage public and cluster networks are propagated by director to the target nodes.
  • Ceph Monitors must keep their IP addresses to avoid a cold migration.
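
You can verify these requirements before you start. The following is a minimal check; it assumes that you run the commands from a node that holds the Ceph admin keyring, for example inside a cephadm shell:

  # Confirm that the cluster is managed by cephadm/orchestrator
  ceph orch status
  # Record the current Ceph Monitor addresses that must be preserved
  ceph mon dump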

6.1. Migrating Ceph Monitor and Ceph Manager daemons to Red Hat Ceph Storage nodes

Migrate your Ceph Monitor daemons, Ceph Manager daemons, and object storage daemons (OSDs) from your Red Hat OpenStack Platform Controller nodes to existing Red Hat Ceph Storage nodes. During the migration, ensure that you can do the following actions:

  • Keep the mon IP addresses by moving them to the Red Hat Ceph Storage nodes.
  • Drain the existing Controller nodes and shut them down.
  • Deploy additional monitors to the existing nodes, and promote them as _admin nodes that administrators can use to manage the Red Hat Ceph Storage cluster and perform day 2 operations against it.
  • Keep the Red Hat Ceph Storage cluster operational during the migration.

The following procedure shows an example migration from a Controller node (oc0-controller-1) to a Red Hat Ceph Storage node (oc0-ceph-0). Use the names of the nodes in your environment.
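
Before you begin, you can list the hosts and daemons that the orchestrator currently manages to identify the source Controller nodes and the target Red Hat Ceph Storage nodes. This is only a suggested check; the host names in this procedure are examples:

  ceph orch host ls
  ceph orch ps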

Prerequisites

  • Configure the Storage nodes to have both the storage and storage_mgmt networks to ensure that you can use both the Red Hat Ceph Storage public and cluster networks. This step requires you to interact with director. From Red Hat OpenStack Platform 17.1 and later, you do not have to run a stack update. However, you must run commands to execute os-net-config on the bare metal nodes and configure the additional networks.

    1. Ensure that the networks are defined in the metalsmith.yaml file for the CephStorage nodes:

        - name: CephStorage
          count: 2
          instances:
            - hostname: oc0-ceph-0
              name: oc0-ceph-0
            - hostname: oc0-ceph-1
              name: oc0-ceph-1
          defaults:
            networks:
              - network: ctlplane
                vif: true
              - network: storage_cloud_0
                subnet: storage_cloud_0_subnet
              - network: storage_mgmt_cloud_0
                subnet: storage_mgmt_cloud_0_subnet
            network_config:
              template: templates/single_nic_vlans/single_nic_vlans_storage.j2
    2. Run the following command:

      openstack overcloud node provision \
        -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \
        --network-config -y --concurrency 2 /home/stack/metalsmith-0.yaml
    3. Verify that the storage network is running on the node:

      (undercloud) [CentOS-9 - stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a
      Warning: Permanently added '192.168.24.14' (ED25519) to the list of known hosts.
      1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
      5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
      6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
      7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
      8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
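
      Optionally, confirm that the Red Hat Ceph Storage public and cluster networks are reachable from the target node. The VLAN interfaces and addresses below are the examples used in this procedure; substitute the values from your environment:

      # From oc0-ceph-0, ping a peer on the storage (public) network
      ping -c 3 172.16.11.121
      # From oc0-ceph-0, ping a peer on the storage_mgmt (cluster) network
      ping -c 3 172.16.12.154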

Procedure

  1. To migrate the mon and mgr daemons to the two existing Red Hat Ceph Storage nodes, create a Red Hat Ceph Storage spec that is based on the default roles and places the mon/mgr daemons on the Controller nodes:

    openstack overcloud ceph spec -o ceph_spec.yaml -y  \
       --stack overcloud-0     overcloud-baremetal-deployed-0.yaml
  2. Deploy the Red Hat Ceph Storage cluster:

     openstack overcloud ceph deploy overcloud-baremetal-deployed-0.yaml \
        --stack overcloud-0 -o deployed_ceph.yaml \
        --network-data ~/oc0-network-data.yaml \
        --ceph-spec ~/ceph_spec.yaml
    Note

    The ceph_spec.yaml file, which is the OSP-generated description of the Red Hat Ceph Storage cluster, is used later in the process as the base template that cephadm requires to update the status and information of the daemons.

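    The spec file must also be reachable from inside the cephadm shell that you open later in this procedure. The following is a minimal sketch, assuming the stack user on the undercloud and the /home/ceph-admin/specs directory that is mounted into the cephadm shell in a later step:

      # Copy the generated spec to the directory that is later mounted into
      # the cephadm shell (user, host, and paths are assumptions)
      scp ~/ceph_spec.yaml ceph-admin@oc0-controller-0:/home/ceph-admin/specs/
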
  3. Check the status of the Red Hat Ceph Storage cluster:

    [ceph: root@oc0-controller-0 /]# ceph -s
      cluster:
        id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
        health: HEALTH_OK
    
      services:
        mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)
        mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk
        osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs
    
      data:
        pools:   1 pools, 1 pgs
        objects: 0 objects, 0 B
        usage:   43 MiB used, 400 GiB / 400 GiB avail
        pgs:     1 active+clean
    [ceph: root@oc0-controller-0 /]# ceph orch host ls
    HOST              ADDR           LABELS          STATUS
    oc0-ceph-0        192.168.24.14  osd
    oc0-ceph-1        192.168.24.7   osd
    oc0-controller-0  192.168.24.15  _admin mgr mon
    oc0-controller-1  192.168.24.23  _admin mgr mon
    oc0-controller-2  192.168.24.13  _admin mgr mon
  4. Log in to the controller-0 node and enter the cephadm shell, mounting the local specs directory into it:

    cephadm shell -v /home/ceph-admin/specs:/specs
  5. Log in to the ceph-0 node and watch the new mon/mgr daemons being deployed on it:

    sudo watch podman ps
  6. Optional: If the mgr daemon is active on the source node, fail it:

    ceph mgr fail <mgr instance>
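
    If you are not sure which mgr instance is currently active, you can check first. This is a brief sketch; the instance name below is an example from this procedure:

      # Show the currently active mgr and the available standbys
      ceph mgr stat
      # Fail the mgr only if the active instance runs on the source node
      ceph mgr fail oc0-controller-1.mtxohd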
  7. From the cephadm shell, remove the labels on oc0-controller-1:

    for label in mon mgr _admin; do
        ceph orch host label rm oc0-controller-1 $label;
    done
  8. Add the missing labels to oc0-ceph-0:

    [ceph: root@oc0-controller-0 /]#
    > for label in mon mgr _admin; do ceph orch host label add oc0-ceph-0 $label; done
    Added label mon to host oc0-ceph-0
    Added label mgr to host oc0-ceph-0
    Added label _admin to host oc0-ceph-0
  9. Drain and force-remove the oc0-controller-1 node:

    [ceph: root@oc0-controller-0 /]# ceph orch host drain oc0-controller-1
    Scheduled to remove the following daemons from host 'oc0-controller-1'
    type                 id
    -------------------- ---------------
    mon                  oc0-controller-1
    mgr                  oc0-controller-1.mtxohd
    crash                oc0-controller-1
    [ceph: root@oc0-controller-0 /]# ceph orch host rm oc0-controller-1 --force
    Removed  host 'oc0-controller-1'
    
    [ceph: root@oc0-controller-0 /]# ceph orch host ls
    HOST              ADDR           LABELS          STATUS
    oc0-ceph-0        192.168.24.14  osd
    oc0-ceph-1        192.168.24.7   osd
    oc0-controller-0  192.168.24.15  mgr mon _admin
    oc0-controller-2  192.168.24.13  _admin mgr mon
  10. If you have only three mon nodes and the drain of the node does not work as expected (the containers are still running), log in to controller-1 and force-purge the containers on the node:

    [root@oc0-controller-1 ~]# sudo podman ps
    CONTAINER ID  IMAGE                                                                                        COMMAND               CREATED         STATUS             PORTS       NAMES
    5c1ad36472bc  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
    3b14cc7bf4dd  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd
    
    [root@oc0-controller-1 ~]# cephadm rm-cluster --fsid f6ec3ebe-26f7-56c8-985d-eb974e8e08e3 --force
    
    [root@oc0-controller-1 ~]# sudo podman ps
    CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
    Note

    Running cephadm rm-cluster on a node that is no longer part of the cluster removes all of its containers and performs some cleanup of the file system.

  11. Before shutting down oc0-controller-1, move its mon IP address (on the same network) to the oc0-ceph-0 node:

    mon_host = [v2:172.16.11.54:3300/0,v1:172.16.11.54:6789/0] [v2:172.16.11.121:3300/0,v1:172.16.11.121:6789/0] [v2:172.16.11.205:3300/0,v1:172.16.11.205:6789/0]
    
    [root@oc0-controller-1 ~]# ip -o -4 a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    5: br-ex    inet 192.168.24.23/24 brd 192.168.24.255 scope global br-ex\       valid_lft forever preferred_lft forever
    6: vlan100    inet 192.168.100.96/24 brd 192.168.100.255 scope global vlan100\       valid_lft forever preferred_lft forever
    7: vlan12    inet 172.16.12.154/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
    8: vlan11    inet 172.16.11.121/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
    9: vlan13    inet 172.16.13.178/24 brd 172.16.13.255 scope global vlan13\       valid_lft forever preferred_lft forever
    10: vlan70    inet 172.17.0.23/20 brd 172.17.15.255 scope global vlan70\       valid_lft forever preferred_lft forever
    11: vlan1    inet 192.168.24.23/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
    12: vlan14    inet 172.16.14.223/24 brd 172.16.14.255 scope global vlan14\       valid_lft forever preferred_lft forever
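
    The mon_host list shown above typically comes from the Ceph configuration file. You can display it on a node that still has the Ceph configuration, for example the remaining _admin node (the path is the default and might differ in your environment):

      sudo grep mon_host /etc/ceph/ceph.conf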
  12. On the oc0-ceph-0 node, add the IP address of the mon that was removed from oc0-controller-1, and verify that the IP address has been assigned and can be reached:

    $ sudo ip a add 172.16.11.121 dev vlan11
    $ ip -o -4 a
    [heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
    6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
    7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
    8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
    [heat-admin@oc0-ceph-0 ~]$ sudo ip a add 172.16.11.121 dev vlan11
    [heat-admin@oc0-ceph-0 ~]$ ip -o -4 a
    1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
    5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
    6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
    7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
    7: vlan11    inet 172.16.11.121/32 scope global vlan11\       valid_lft forever preferred_lft forever
    8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
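
    To confirm that the moved address can be reached, you can ping it from another node on the storage network. The address is the example used in this procedure:

      [heat-admin@oc0-controller-0 ~]$ ping -c 3 172.16.11.121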
  13. Optional: Power off oc0-controller-1.
  14. Add the new mon on oc0-ceph-0 using the old IP address:

    [ceph: root@oc0-controller-0 /]# ceph orch daemon add mon oc0-ceph-0:172.16.11.121
    Deployed mon.oc0-ceph-0 on host 'oc0-ceph-0'
  15. Check the new mon container on the oc0-ceph-0 node, for example by running sudo podman ps:

    b581dc8bbb78  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-ceph-0...  24 seconds ago  Up 24 seconds ago              ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-ceph-0
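
    Alternatively, you can confirm the daemon placement from the orchestrator on an _admin node. The host name is the example used in this procedure:

      [ceph: root@oc0-controller-0 /]# ceph orch ps oc0-ceph-0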
  16. In the cephadm shell, back up the existing ceph_spec.yaml file, then edit the spec to remove every oc0-controller-1 entry and replace it with oc0-ceph-0:

    cp ceph_spec.yaml ceph_spec.yaml.bkp # backup the ceph_spec.yaml file
    
    [ceph: root@oc0-controller-0 specs]# diff -u ceph_spec.yaml.bkp ceph_spec.yaml
    
    --- ceph_spec.yaml.bkp  2022-07-29 15:41:34.516329643 +0000
    +++ ceph_spec.yaml      2022-07-29 15:28:26.455329643 +0000
    @@ -7,14 +7,6 @@
     - mgr
     service_type: host
     ---
    -addr: 192.168.24.12
    -hostname: oc0-controller-1
    -labels:
    -- _admin
    -- mon
    -- mgr
    -service_type: host
    ----
     addr: 192.168.24.19
     hostname: oc0-controller-2
     labels:
    @@ -38,7 +30,7 @@
     placement:
       hosts:
       - oc0-controller-0
    -  - oc0-controller-1
    +  - oc0-ceph-0
       - oc0-controller-2
     service_id: mon
     service_name: mon
    @@ -47,8 +39,8 @@
     placement:
       hosts:
       - oc0-controller-0
    -  - oc0-controller-1
       - oc0-controller-2
    +  - oc0-ceph-0
     service_id: mgr
     service_name: mgr
     service_type: mgr
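
    Optionally, before you apply the edited spec in the next step, you can preview the changes that cephadm would make. This assumes that you are still inside the cephadm shell with the specs directory mounted:

      ceph orch apply -i ceph_spec.yaml --dry-run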
  17. Apply the resulting spec:

    ceph orch apply -i ceph_spec.yaml
    
     Applying the spec results in a new mgr deployed on the oc0-ceph-0 node, and the spec is reconciled within cephadm:
    
    [ceph: root@oc0-controller-0 specs]# ceph orch ls
    NAME                     PORTS  RUNNING  REFRESHED  AGE  PLACEMENT
    crash                               4/4  5m ago     61m  *
    mgr                                 3/3  5m ago     69s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
    mon                                 3/3  5m ago     70s  oc0-controller-0;oc0-ceph-0;oc0-controller-2
    osd.default_drive_group               8  2m ago     69s  oc0-ceph-0;oc0-ceph-1
    
    [ceph: root@oc0-controller-0 specs]# ceph -s
      cluster:
        id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
        health: HEALTH_WARN
                1 stray host(s) with 1 daemon(s) not managed by cephadm
    
      services:
        mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 5m)
        mgr: oc0-controller-0.xzgtvo(active, since 62m), standbys: oc0-controller-2.ahrgsk, oc0-ceph-0.hccsbb
        osd: 8 osds: 8 up (since 42m), 8 in (since 49m); 1 remapped pgs
    
      data:
        pools:   1 pools, 1 pgs
        objects: 0 objects, 0 B
        usage:   43 MiB used, 400 GiB / 400 GiB avail
        pgs:     1 active+clean
  18. Fix the warning by failing the active mgr so that cephadm refreshes its information:

    ceph mgr fail oc0-controller-0.xzgtvo

    At this point the Red Hat Ceph Storage cluster is clean:

    [ceph: root@oc0-controller-0 specs]# ceph -s
      cluster:
        id:     f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
        health: HEALTH_OK
    
      services:
        mon: 3 daemons, quorum oc0-controller-0,oc0-controller-2,oc0-ceph-0 (age 7m)
        mgr: oc0-controller-2.ahrgsk(active, since 25s), standbys: oc0-controller-0.xzgtvo, oc0-ceph-0.hccsbb
        osd: 8 osds: 8 up (since 44m), 8 in (since 50m); 1 remapped pgs
    
      data:
        pools:   1 pools, 1 pgs
        objects: 0 objects, 0 B
        usage:   43 MiB used, 400 GiB / 400 GiB avail
        pgs:     1 active+clean

    The oc0-controller-1 node is removed and powered off without leaving traces on the Red Hat Ceph Storage cluster.

  19. Repeat this procedure for the remaining Controller nodes in your environment until you have migrated all of the Ceph Monitor and Ceph Manager daemons to the target nodes.
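
    When you have processed all of the Controller nodes, you can confirm that no mon or mgr daemons remain on them and that the cluster is healthy. This is a final suggested check; daemon names in the output follow the mon.<host> and mgr.<host>.<id> pattern:

      # List the remaining mon and mgr daemons and the hosts they run on
      ceph orch ps | grep -E '^(mon|mgr)\.'
      # Confirm overall cluster health
      ceph -s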