Chapter 9. Upgrading an overcloud with director-deployed Ceph deployments


If your environment includes director-deployed Red Hat Ceph Storage deployments, with or without hyperconverged infrastructure (HCI) nodes, you must upgrade them to Red Hat Ceph Storage 5. After the upgrade to version 5, cephadm manages Red Hat Ceph Storage instead of ceph-ansible.

Note

If you are using the Red Hat Ceph Storage Object Gateway (RGW), ensure that all RGW pools have the application label rgw as described in Why are the RGW services crashing after running the cephadm adoption playbook?.

Implementing this configuration change addresses a common issue encountered when upgrading from Red Hat Ceph Storage Release 4 to 5.
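For example, you can check the application label on each RGW pool and, if a pool is missing the label, set it from the Ceph Monitor container. This is a sketch only; replace <rgw_pool> with each RGW pool in your environment that is missing the label:

    $ sudo podman exec ceph-mon-$(hostname -f) ceph osd pool ls detail
    $ sudo podman exec ceph-mon-$(hostname -f) ceph osd pool application enable <rgw_pool> rgw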

9.1. Installing ceph-ansible

If you deployed Red Hat Ceph Storage using director, you must complete this procedure. The ceph-ansible package is required to upgrade Red Hat Ceph Storage with Red Hat OpenStack Platform.

Procedure

  1. Enable the Ceph 5 Tools repository:

    [stack@director ~]$ sudo subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms
  2. Install the ceph-ansible package:

    [stack@director ~]$ sudo dnf install -y ceph-ansible
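
You can optionally confirm that the package is installed before you continue. This check is not part of the official procedure:

    [stack@director ~]$ rpm -q ceph-ansible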

9.2. Downloading Red Hat Ceph Storage containers to the undercloud from Satellite

If the Red Hat Ceph Storage container image is hosted on a Red Hat Satellite Server, you must download a copy of the image from the Satellite Server to the undercloud before starting the Red Hat Ceph Storage upgrade.

Prerequisite

  • The required Red Hat Ceph Storage container image is hosted on the Satellite Server.

Procedure

  1. Log in to the undercloud node as the stack user.
  2. Download the Red Hat Ceph Storage container image from the Satellite Server:

    $ sudo podman pull <ceph_image_file>
    • Replace <ceph_image_file> with the Red Hat Ceph Storage container image file hosted on the Satellite Server. The following is an example of this command:

      $ sudo podman pull satellite.example.com/container-images-osp-17_1-rhceph-5-rhel8:latest
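
      To confirm that the image is now available locally on the undercloud, you can list the local container images. This is an optional check; the image name depends on your Satellite Server:

      $ sudo podman images | grep rhceph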

9.3. Upgrading to Red Hat Ceph Storage 5

Upgrade the following nodes from Red Hat Ceph Storage version 4 to version 5:

  • Red Hat Ceph Storage nodes
  • Hyperconverged infrastructure (HCI) nodes, which contain combined Compute and Ceph OSD services

For information about the duration and impact of this upgrade procedure, see Upgrade duration and impact.

Note

Red Hat Ceph Storage 5 uses Prometheus v4.10, which has the following known issue: If you enable Red Hat Ceph Storage dashboard, two data sources are configured on the dashboard. For more information about this known issue, see BZ#2054852.

Red Hat Ceph Storage 6 uses Prometheus v4.12, which does not include this known issue. Red Hat recommends upgrading from Red Hat Ceph Storage 5 to Red Hat Ceph Storage 6 after the upgrade from Red Hat OpenStack Platform (RHOSP) 16.2 to 17.1 is complete. To upgrade from Red Hat Ceph Storage version 5 to version 6, begin with the procedure for your environment in Upgrading Red Hat Ceph Storage 5 to 6.

Procedure

  1. Log in to the undercloud host as the stack user.
  2. Source the stackrc undercloud credentials file:

    $ source ~/stackrc
  3. Run the Red Hat Ceph Storage external upgrade process with the ceph tag:

    $ openstack overcloud external-upgrade run \
       --skip-tags "ceph_ansible_remote_tmp" \
       --stack <stack> \
       --tags ceph,facts 2>&1
    • Replace <stack> with the name of your stack.
    • If you are running this command at a DCN deployed site, add cleanup_cephansible to the comma-separated list of values for the --skip-tags parameter, as shown in the example after this step.
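
    The following example shows how the command might look at a DCN deployed site, with cleanup_cephansible added to the skip tags. Replace <stack> with the name of your stack:

    $ openstack overcloud external-upgrade run \
       --skip-tags "ceph_ansible_remote_tmp,cleanup_cephansible" \
       --stack <stack> \
       --tags ceph,facts 2>&1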
  4. Run the ceph versions command to confirm all Red Hat Ceph Storage daemons have been upgraded to version 5. This command is available in the ceph monitor container that is hosted by default on the Controller node.

    Important

    The command in the previous step runs the ceph-ansible rolling_update.yaml playbook to update the cluster from version 4 to 5. It is important to confirm all daemons have been updated before proceeding with this procedure.

    The following example demonstrates the use and output of this command. As demonstrated in the example, all daemons in your deployment should show a package version of 16.2.* and the keyword pacific.

    $ sudo podman exec ceph-mon-$(hostname -f) ceph versions
    {
        "mon": {
            "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 3
        },
        "mgr": {
            "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 3
        },
        "osd": {
            "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 180
        },
        "mds": {},
        "rgw": {
            "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 3
        },
        "overall": {
            "ceph version 16.2.10-248.el8cp (0edb63afd9bd3edb333364f2e0031b77e62f4896) pacific (stable)": 189
        }
    }
    Note

    The output of the command sudo podman ps | grep ceph on any server hosting Red Hat Ceph Storage should return a version 5 container.

  5. Create the ceph-admin user and distribute the appropriate keyrings:

    ANSIBLE_LOG_PATH=/home/stack/cephadm_enable_user_key.log \
    ANSIBLE_HOST_KEY_CHECKING=false \
    ansible-playbook -i /home/stack/overcloud-deploy/<stack>/config-download/<stack>/tripleo-ansible-inventory.yaml \
      -b -e ansible_python_interpreter=/usr/libexec/platform-python /usr/share/ansible/tripleo-playbooks/ceph-admin-user-playbook.yml \
      -e tripleo_admin_user=ceph-admin \
      -e distribute_private_key=true \
      --limit Undercloud,ceph_mon,ceph_mgr,ceph_rgw,ceph_mds,ceph_nfs,ceph_grafana,ceph_osd
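
    If you want to verify that the playbook completed without failures before you continue, you can search the log file that the ANSIBLE_LOG_PATH variable points to. This is an optional check; no output means that no host reported failed or unreachable tasks:

    $ grep -E 'failed=[1-9]|unreachable=[1-9]' /home/stack/cephadm_enable_user_key.log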
  6. Update the packages on the Red Hat Ceph Storage nodes:

    $ openstack overcloud upgrade run \
        --stack <stack> \
        --skip-tags ceph_ansible_remote_tmp \
        --tags setup_packages --limit Undercloud,ceph_mon,ceph_mgr,ceph_rgw,ceph_mds,ceph_nfs,ceph_grafana,ceph_osd \
        --playbook /home/stack/overcloud-deploy/<stack>/config-download/<stack>/upgrade_steps_playbook.yaml 2>&1
    • If you are running this command at a DCN deployed site, add cleanup_cephansible to the comma-separated list of values for the --skip-tags parameter.

      Note

      By default, the Ceph Monitor service (CephMon) runs on the Controller nodes unless you have used the composable roles feature to host them elsewhere. This command includes the ceph_mon tag, which also updates the packages on the nodes hosting the Ceph Monitor service (the Controller nodes by default).

  7. Configure the Red Hat Ceph Storage nodes to use cephadm:

    $ openstack overcloud external-upgrade run \
        --skip-tags ceph_ansible_remote_tmp \
        --stack <stack> \
        --tags cephadm_adopt  2>&1
    • If you are running this command at a DCN deployed site, add cleanup_cephansible to the comma-separated list of values for the --skip-tags parameter.

      Note

      The adoption of cephadm can cause downtime in the RGW and Alertmanager services. For more information about these issues, see Restarting Red Hat Ceph Storage 5 services.

  8. Run the ceph -s command to confirm all processes are now managed by Red Hat Ceph Storage orchestrator. This command is available in the ceph monitor container that is hosted by default on the Controller node.

    Important

    The command in the previous step runs the ceph-ansible cephadm-adopt.yaml playbook to move future management of the cluster from ceph-ansible to cephadm and the Red Hat Ceph Storage orchestrator. It is important to confirm that all processes are managed by the orchestrator before proceeding with this procedure.

    The following example demonstrates the use and output of this command. In this example, 63 daemons are not managed by cephadm, which indicates a problem running the ceph-ansible cephadm-adopt.yaml playbook. Contact Red Hat Ceph Storage support to troubleshoot these errors before proceeding with the upgrade. When the adoption process completes successfully, there is no warning about stray daemons not managed by cephadm.

    $ sudo cephadm shell -- ceph -s
      cluster:
        id:     f5a40da5-6d88-4315-9bb3-6b16df51d765
        health: HEALTH_WARN
                63 stray daemon(s) not managed by cephadm
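    For comparison, a successfully adopted cluster reports a healthy status with no stray daemons. The following output is illustrative only; the cluster ID and details depend on your deployment:

    $ sudo cephadm shell -- ceph -s
      cluster:
        id:     f5a40da5-6d88-4315-9bb3-6b16df51d765
        health: HEALTH_OK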
  9. Modify the overcloud_upgrade_prepare.sh file to replace the ceph-ansible file with a cephadm heat environment file:

    Important

    Do not include ceph-ansible environment or deployment files, for example, environments/ceph-ansible/ceph-ansible.yaml or deployment/ceph-ansible/ceph-grafana.yaml, in openstack deployment commands such as openstack overcloud upgrade prepare and openstack overcloud deploy. For more information about replacing ceph-ansible environment and deployment files with cephadm files, see Implications of upgrading to Red Hat Ceph Storage 5.

    #!/bin/bash
    openstack overcloud upgrade prepare --yes \
      --timeout 460 \
      --templates /usr/share/openstack-tripleo-heat-templates \
      --ntp-server 192.168.24.1 \
      --stack <stack> \
      -r /home/stack/roles_data.yaml \
      -e /home/stack/templates/internal.yaml \
      …
      -e <cephadm-file> \
      -e ~/containers-prepare-parameter.yaml

    where:

    <cephadm-file>
    • If you deployed RGW in a previous RHOSP version, or if you plan to deploy RGW, use environments/cephadm/cephadm.yaml.
    • If you plan to deploy RBD, use environments/cephadm/cephadm-rbd-only.yaml.
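
    For example, if your deployment includes RGW, the cephadm environment file line in overcloud_upgrade_prepare.sh might look like the following:

      -e /usr/share/openstack-tripleo-heat-templates/environments/cephadm/cephadm.yaml \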
  10. Modify the overcloud_upgrade_prepare.sh file to remove the following environment file if you added it earlier when you ran the overcloud upgrade preparation:

    -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/manila-cephfsganesha-config.yaml
  11. Save the file.
  12. Run the upgrade preparation command:

    $ source stackrc
    $ chmod 755 /home/stack/overcloud_upgrade_prepare.sh
    $ sh /home/stack/overcloud_upgrade_prepare.sh
  13. If your deployment includes HCI nodes, create a temporary hci.conf file in a cephadm container of a Controller node:

    1. Log in to a Controller node:

      $ ssh cloud-admin@<controller_ip>
      • Replace <controller_ip> with the IP address of the Controller node.
    2. Retrieve a cephadm shell from the Controller node:

      Example

      [cloud-admin@controller-0 ~]$ sudo cephadm shell

    3. In the cephadm shell, create a temporary hci.conf file:

      Example

      [ceph: root@edpm-controller-0 /]# cat <<EOF > hci.conf
      [osd]
      osd_memory_target_autotune = true
      osd_numa_auto_affinity = true
      [mgr]
      mgr/cephadm/autotune_memory_target_ratio = 0.2
      EOF

    4. Apply the configuration:

      Example

      [ceph: root@edpm-controller-0 /]# ceph config assimilate-conf -i hci.conf

      For more information about adjusting the configuration of your HCI deployment, see Ceph configuration overrides for HCI in Deploying a hyperconverged infrastructure.
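
      Optionally, you can confirm that the settings from hci.conf were absorbed into the cluster configuration by dumping the central configuration. This verification is not part of the official procedure:

      [ceph: root@edpm-controller-0 /]# ceph config dump | grep autotune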

Important

You must upgrade the operating system on all HCI nodes to RHEL 9. For more information on upgrading Compute and HCI nodes, see Upgrading Compute nodes to RHEL 9.2.

Important

If Red Hat Ceph Storage RADOS Gateway (RGW) is used for object storage, complete the steps in Ceph config overrides set for the RGWs on the RHCS 4.x does not get reflected after the Upgrade to RHCS 5.x to ensure your Red Hat Ceph Storage 4 configuration is reflected completely in Red Hat Ceph Storage 5.

Important

If the Red Hat Ceph Storage Dashboard is installed, complete the steps in After FFU 16.2 to 17.1, Ceph Grafana dashboard failed to start due to incorrect dashboard configuration to ensure it is properly configured.

9.4. Restarting Red Hat Ceph Storage 5 services

After the adoption from ceph-ansible to cephadm, the Alertmanager service (a component of the dashboard) or RGW might go offline. This is due to known issues related to cephadm adoption in Red Hat Ceph Storage 5.

You can restart these services immediately by following the procedures in this section, but doing so requires restarting the HAProxy service. Restarting the HAProxy service causes a brief service interruption of the Red Hat OpenStack Platform (RHOSP) control plane.

Do not perform the procedures in this section if any of the following statements are true:

  • Your deployment does not include the Red Hat Ceph Storage Dashboard or RGW.
  • You do not immediately require the Alertmanager and RGW services.
  • You do not want the control plane downtime caused by restarting HAProxy.
  • You plan to upgrade to Red Hat Ceph Storage 6 before the end of the maintenance window for the upgrade.

If any of these statements are true, proceed with the upgrade process as described in subsequent chapters and upgrade to Red Hat Ceph Storage 6 when you reach the section Upgrading Red Hat Ceph Storage 5 to 6. Complete all intervening steps in the upgrade process before attempting to upgrade to Release 6.

9.4.1. Restarting the Red Hat Ceph Storage 5 Object Gateway

After migrating from ceph-ansible to cephadm, you might have to restart the Red Hat Ceph Storage Object Gateway (RGW) before continuing with the process if it has not already restarted. RGW starts automatically when HAProxy is offline. After RGW is online, you can start HAProxy again. This behavior occurs because RGW currently checks whether ports are open for all IPs instead of only the IPs in use. When BZ#2356354 is resolved, RGW will only check the ports for the IPs in use.

Warning

Restarting the HAProxy service introduces downtime to the Red Hat OpenStack Platform control plane. The downtime lasts as long as is required for the HAProxy service to restart.

Procedure

  1. Log in to the OpenStack Controller node.

    Note

    Confirm that you are logged in to a Controller node that is running the Ceph Manager service. In default deployments, the Ceph Manager service runs on the Controller nodes.

  2. Determine the current health of the Red Hat Ceph Storage 5 deployment:

    # sudo cephadm shell -- ceph health detail
  3. Observe the command output.

    If the output displays no issues with the deployment health, you do not have to complete this procedure. If the following error is displayed in the command output, you must proceed with restarting HAProxy to restart RGW:

    HEALTH_WARN Failed to place 1 daemon(s); 3 failed cephadm daemon(s)
    [WRN] CEPHADM_DAEMON_PLACE_FAIL: Failed to place 1 daemon(s)
        Failed while placing rgw.host42.host42.foo on host42: cephadm exited with an error code: 1, stderr:Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw-host42-host42-foo
    /bin/podman: stderr Error: error inspecting object: no such container ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw-host42-host42-foo
    Non-zero exit code 125 from /bin/podman container inspect --format {{.State.Status}} ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw.host42.host42.foo
    /bin/podman: stderr Error: error inspecting object: no such container ceph-5ffc7906-2722-4602-9478-e2fe6ad3ff49-rgw.host42.host42.foo
    Deploy daemon rgw.host42.host42.foo ...
    Verifying port 8080 ...
    Cannot bind to IP 0.0.0.0 port 8080: [Errno 98] Address already in use
    ERROR: TCP Port(s) '8080' required for rgw already in use
  4. Stop the HAProxy service:

    # pcs resource disable haproxy-bundle
    Note

    RGW should now restart automatically.

  5. Confirm that RGW restarted:

    # sudo cephadm shell -- ceph orch ps
  6. Observe the command output.

    The following is an example of the command output confirming that all services are running:

    rgw.host42.host42.qfeedh  host42  10.0.42.20:8080  running (62s)    58s ago  62s    60.1M        -  16.2.10-275.el8cp  d7a74ab527fa  b60d550cdc91
    rgw.host43.host43.ykpwef  host43  10.0.42.21:8080  running (65s)    58s ago  64s    58.9M        -  16.2.10-275.el8cp  d7a74ab527fa  ddea7b33bfc9
    rgw.host44.host44.tsepgo  host44  10.0.42.22:8080  running (56s)    51s ago  55s    62.2M        -  16.2.10-275.el8cp  d7a74ab527fa  c1e87e8744ce
  7. Start the HAProxy service:

    # pcs resource enable haproxy-bundle
    Note

    When BZ#2356354 is resolved, this procedure will no longer be necessary. Upgrading to Red Hat Ceph Storage 6 using the procedures in Upgrading Red Hat Ceph Storage 5 to 6 will also correct this issue.

9.4.2. Restarting the Red Hat Ceph Storage 5 Alertmanager service

After migrating from ceph-ansible to cephadm, you can restart the Alertmanager service before continuing with the process. Restarting the Alertmanager service requires restarting the HAProxy service as well.

Warning

Restarting the HAProxy service introduces downtime to the Red Hat OpenStack Platform control plane. The downtime lasts as long as is required for the HAProxy service to restart.

Procedure

  1. Log in to the OpenStack Controller node.

    Note

    Confirm that you are logged in to a Controller node that is running the Ceph Manager service. In default deployments, the Ceph Manager service runs on the Controller nodes.

  2. View the current Alertmanager specification file:

    $ sudo cephadm shell -- ceph orch ls --export alertmanager
  3. Create a specification file for the Alertmanager service based on the output from the previous step.

    The following is an example of a specification file:

    service_type: alertmanager
    service_name: alertmanager
    placement:
      count: 3
      label: monitoring
    networks:
    - 10.10.10.0/24
    - 10.10.11.0/24
    Note

    The subnets in the networks list should correspond to the Storage/Ceph public networks in your environment.

  4. Save the specification file as /root/alertmanager.spec.
  5. Stop the HAProxy service:

    # pcs resource disable haproxy-bundle
  6. Stop the Alertmanager service:

    # cephadm shell -k /etc/ceph/<stack>.client.admin.keyring -- ceph orch rm alertmanager
    • Replace <stack> with the name of your stack.
  7. Start the Alertmanager service:

    # cephadm shell -k /etc/ceph/<stack>.client.admin.keyring -m /root/alertmanager.spec -- ceph orch apply -i /mnt/alertmanager.spec
    • Replace <stack> with the name of your stack.
  8. Start the HAProxy service:

    # pcs resource enable haproxy-bundle
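    You can optionally confirm that the Alertmanager daemons are running again by filtering the orchestrator process list. Replace <stack> with the name of your stack:

    # cephadm shell -k /etc/ceph/<stack>.client.admin.keyring -- ceph orch ps | grep alertmanager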
Note

If the Alertmanager service does not restart, perform this procedure again and add a port definition to the specification file. The following is the previous specification file example with a port definition added:

service_type: alertmanager
service_name: alertmanager
placement:
  count: 3
  label: monitoring
networks:
- 10.10.10.0/24
- 10.10.11.0/24
spec:
  port: 4200 1

1 Custom port definition. Use a port that corresponds to your deployment environment.

9.5. Implications of upgrading to Red Hat Ceph Storage 5

The Red Hat Ceph Storage cluster is now upgraded to version 5. This has the following implications:

  • You no longer use ceph-ansible to manage Red Hat Ceph Storage. Instead, the Ceph Orchestrator manages the Red Hat Ceph Storage cluster. For more information about the Ceph Orchestrator, see The Ceph Operations Guide.
  • You no longer need to perform stack updates to make changes to the Red Hat Ceph Storage cluster in most cases. Instead, you can run day two Red Hat Ceph Storage operations directly on the cluster as described in The Ceph Operations Guide. You can also scale Red Hat Ceph Storage cluster nodes up or down as described in Scaling the Ceph Storage cluster in Deploying Red Hat Ceph Storage and Red Hat OpenStack Platform together with director.
  • You can inspect the Red Hat Ceph Storage cluster’s health. For more information about monitoring your cluster’s health, see Monitoring Red Hat Ceph Storage nodes in Deploying Red Hat Ceph Storage and Red Hat OpenStack Platform together with director.
  • Do not include environment files or deployment files, for example, environments/ceph-ansible/ceph-ansible.yaml or deployment/ceph-ansible/ceph-grafana.yaml, in openstack deployment commands such as openstack overcloud upgrade prepare and openstack overcloud deploy. If your deployment includes ceph-ansible environment or deployment files, replace them with one of the following options:

    Red Hat Ceph Storage deployment          Original ceph-ansible file                      Cephadm file replacement
    Ceph RADOS Block Device (RBD) only       environments/ceph-ansible/ceph-ansible.yaml     environments/cephadm/cephadm-rbd-only.yaml
    RBD and the Ceph Object Gateway (RGW)    environments/ceph-ansible/ceph-rgw.yaml         environments/cephadm/cephadm.yaml
    Ceph Dashboard                           environments/ceph-ansible/ceph-dashboard.yaml   Respective file in environments/cephadm/
    Ceph MDS                                 environments/ceph-ansible/ceph-mds.yaml         Respective file in environments/cephadm/
