Chapter 9. Migrating the monitoring stack component to new nodes within an existing Red Hat Ceph Storage cluster

In the context of data plane adoption, where the Red Hat OpenStack Platform (RHOSP) services are redeployed in Red Hat OpenShift Container Platform, a director-deployed Red Hat Ceph Storage cluster undergoes a migration in a process called “externalizing” the Red Hat Ceph Storage cluster. Broadly speaking, there are two deployment topologies today that include an “internal” Red Hat Ceph Storage cluster: one where RHOSP includes dedicated Red Hat Ceph Storage nodes to host object storage daemons (OSDs), and the other, Hyperconverged Infrastructure (HCI), where Compute nodes double as Red Hat Ceph Storage nodes. In either scenario, some Red Hat Ceph Storage processes are deployed on RHOSP Controller nodes: Red Hat Ceph Storage monitors, Ceph Object Gateway (RGW), Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha. The Ceph Dashboard module adds web-based monitoring and administration to the Ceph Manager. With director-deployed Red Hat Ceph Storage, this component is enabled as part of the overcloud deployment and is composed of:

  • Ceph Manager module
  • Grafana
  • Prometheus
  • Alertmanager
  • Node exporter

The Ceph Dashboard containers are included through the tripleo-container-image-prepare parameters, and high availability relies on HAProxy and Pacemaker deployed on the RHOSP front end. For an external Red Hat Ceph Storage cluster, high availability is not supported. The goal of this procedure is to migrate and relocate the Ceph monitoring components so that the Controller nodes can eventually be freed.
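
Before making any changes, it can help to see which monitoring services are currently deployed and where their daemons run. The following is a minimal sketch; it assumes you run it from a node that holds the Ceph admin keyring, such as controller-0:

    # List all services and their placement, then show where each
    # monitoring daemon currently runs
    sudo cephadm shell -- ceph orch ls
    sudo cephadm shell -- ceph orch ps --daemon_type grafana
    sudo cephadm shell -- ceph orch ps --daemon_type prometheus
    sudo cephadm shell -- ceph orch ps --daemon_type alertmanager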

For this procedure, we assume that the starting point is an RHOSP 17.1 deployment with a Red Hat Ceph Storage 7 cluster managed by director. We assume that:

  • Red Hat Ceph Storage has been upgraded to 7 and is managed by cephadm/orchestrator
  • Both the Red Hat Ceph Storage public and cluster networks are propagated, through director, to the target nodes (see the check after this list)
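
A quick way to sanity-check the network assumption is to confirm, on a target node, that an address is configured on the Ceph public (storage) network. This is a sketch: the 172.17.3.0/24 CIDR matches the storage network used in the examples later in this chapter, so adjust it to your environment:

    # On a target node, confirm an address on the Ceph public (storage)
    # network; repeat for the cluster network if it is a separate subnet
    ip -o -4 addr show | grep '172.17.3.'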

9.1. Completing prerequisites for a Red Hat Ceph Storage cluster with monitoring stack components

You must complete the following prerequisites before you migrate a Red Hat Ceph Storage cluster with monitoring stack components.

Procedure

  1. Gather the current status of the monitoring stack. Verify that the hosts have no monitoring label (or grafana, prometheus, or alertmanager, in the case of a per-daemon placement evaluation) associated with them:

    Note

    The entire relocation process is driven by cephadm and relies on labels assigned to the target nodes, where the daemons are scheduled. Review the cardinality matrix before assigning labels, and carefully choose the nodes where the monitoring stack components should be scheduled.

    [tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls
    
    HOST                        ADDR           LABELS          STATUS
    cephstorage-0.redhat.local  192.168.24.11  osd mds
    cephstorage-1.redhat.local  192.168.24.12  osd mds
    cephstorage-2.redhat.local  192.168.24.47  osd mds
    controller-0.redhat.local   192.168.24.35  _admin mon mgr
    controller-1.redhat.local   192.168.24.53  mon _admin mgr
    controller-2.redhat.local   192.168.24.10  mon _admin mgr
    6 hosts in cluster

    Confirm that the cluster is healthy and both ceph orch ls and ceph orch ps return the expected number of deployed daemons.
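
    For example, a minimal health check (run from a node with the admin keyring):

    # Check the overall cluster health, then confirm that the expected
    # services and daemon counts are reported
    sudo cephadm shell -- ceph -s
    sudo cephadm shell -- ceph orch ls
    sudo cephadm shell -- ceph orch ps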

  2. Review and update the container image registry. If the Red Hat Ceph Storage externalization procedure is executed after the Red Hat OpenStack Platform control plane has been migrated, update the container images referenced in the Red Hat Ceph Storage cluster configuration. The current container images point to the undercloud registry, which might no longer be available. As the undercloud won’t be available in the future, replace the undercloud-provided images with images from an alternative registry.

    $ ceph config dump
    ...
    ...
    mgr   advanced  mgr/cephadm/container_image_alertmanager    undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-alertmanager:v4.10
    mgr   advanced  mgr/cephadm/container_image_base            undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph
    mgr   advanced  mgr/cephadm/container_image_grafana         undercloud-0.ctlplane.redhat.local:8787/rh-osbs/grafana:latest
    mgr   advanced  mgr/cephadm/container_image_node_exporter   undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-node-exporter:v4.10
    mgr   advanced  mgr/cephadm/container_image_prometheus      undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus:v4.10
  3. Remove the undercloud container images:

    # remove the base image
    cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_base
    # remove the undercloud images associated to the monitoring
    # stack components
    for i in prometheus grafana alertmanager node_exporter; do
        cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_$i
    done
Note

In the example above, in addition to the monitoring stack related container images, we also remove the config entry for container_image_base. This has an impact on all the Red Hat Ceph Storage daemons that rely on the undercloud images. New daemons are deployed using the new/default Red Hat Ceph Storage image.
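
If you prefer to point the cluster at a specific alternative registry instead of falling back to the default images, you can set the same keys explicitly. The following is a sketch: the registry host and image paths are placeholders, not values taken from this environment:

    # Hypothetical alternative registry; substitute your own host and paths
    REGISTRY=registry.example.com:5000
    cephadm shell -- ceph config set mgr mgr/cephadm/container_image_prometheus "$REGISTRY/openshift-ose-prometheus:v4.10"
    cephadm shell -- ceph config set mgr mgr/cephadm/container_image_alertmanager "$REGISTRY/openshift-ose-prometheus-alertmanager:v4.10"
    cephadm shell -- ceph config set mgr mgr/cephadm/container_image_grafana "$REGISTRY/grafana:latest"
    cephadm shell -- ceph config set mgr mgr/cephadm/container_image_node_exporter "$REGISTRY/openshift-ose-prometheus-node-exporter:v4.10"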

9.2. Migrating the monitoring stack to the target nodes

The migration procedure relies on relabeling the nodes: this action, combined with an update of the existing spec, results in the relocation of the daemons to the target nodes.

Before starting this process, consider the following:

  • There’s no need to migrate node exporters: these daemons are deployed across all the nodes that are part of the Red Hat Ceph Storage cluster (the placement is ‘*’), and metrics from the Controller nodes are lost once those nodes are no longer part of the Red Hat Ceph Storage cluster
  • Each monitoring stack component is bound to specific ports that director is supposed to open beforehand; double-check that the firewall rules are in place and the ports are open for a given monitoring stack service (see the check after this list)
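
For example, after the daemons are relocated, you can confirm on a target node that each component is listening on its expected port. This is a loose sketch of the check; the ports are the ones used in this chapter (9092 for Prometheus, 9093/9094 for Alertmanager, 3100 for Grafana):

    # On a target node, list listeners on the monitoring stack ports;
    # each relocated daemon should eventually show up here
    sudo ss -tlnp | grep -E ':(9092|9093|9094|3100)'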

Depending on the target nodes and the number of deployed/active daemons, you can either relocate the existing containers to the target nodes, or select a subset of nodes that are supposed to host the monitoring stack daemons. As mentioned in the previous section, HA is not supported, hence reducing the placement with count: 1 is a reasonable solution and allows you to successfully migrate the existing daemons in an HCI (or hardware-limited) scenario without impacting other services (see the sketch after this paragraph). However, it is still possible to put in place a dedicated HA solution and realize a component that is consistent with the director model to reach HA. Building and deploying such an HA model is out of scope for this procedure.
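
For reference, a count: 1 placement for Grafana could look like the following spec. This is a sketch, assuming the same monitoring label and storage network used elsewhere in this chapter:

    service_type: grafana
    service_name: grafana
    placement:
      label: monitoring
      count: 1
    networks:
    - 172.17.3.0/24
    spec:
      port: 3100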

9.2.1. Scenario 1: Migrating the existing daemons to the target nodes

Assuming we have 3 Red Hat Ceph Storage (or ComputeHCI) nodes, this scenario extends the monitoring labels to all the Red Hat Ceph Storage (or ComputeHCI) nodes that are part of the cluster. This means that we keep the count: 3 placement for the target nodes.

Procedure

  1. Add the monitoring label to all the Red Hat Ceph Storage (or ComputeHCI) nodes in the cluster:

    for item in $(sudo cephadm shell --  ceph orch host ls --format json | jq -r '.[].hostname'); do
        sudo cephadm shell -- ceph orch host label add  $item monitoring;
    done
  2. Verify that all the hosts have the monitoring label:

    [tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls
    
    HOST                        ADDR           LABELS
    cephstorage-0.redhat.local  192.168.24.11  osd monitoring
    cephstorage-1.redhat.local  192.168.24.12  osd monitoring
    cephstorage-2.redhat.local  192.168.24.47  osd monitoring
    controller-0.redhat.local   192.168.24.35  _admin mon mgr monitoring
    controller-1.redhat.local   192.168.24.53  mon _admin mgr monitoring
    controller-2.redhat.local   192.168.24.10  mon _admin mgr monitoring
  3. Remove the labels from the Controller nodes:

    $ for i in 0 1 2; do ceph orch host label rm "controller-$i.redhat.local" monitoring; done
    
    Removed label monitoring from host controller-0.redhat.local
    Removed label monitoring from host controller-1.redhat.local
    Removed label monitoring from host controller-2.redhat.local
  4. Dump the current monitoring stack spec:

    function export_spec {
        local component="$1"
        local target_dir="$2"
        sudo cephadm shell -- ceph orch ls --export "$component" > "$target_dir/$component"
    }
    
    SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
    # Ensure the target directory exists before redirecting output into it
    mkdir -p "$SPEC_DIR"
    for m in grafana prometheus alertmanager; do
        export_spec "$m" "$SPEC_DIR"
    done
  5. For each daemon, edit the current spec and replace the placement/hosts section with the placement/label section, for example:

    service_type: grafana
    service_name: grafana
    placement:
      label: monitoring
    networks:
    - 172.17.3.0/24
    spec:
      port: 3100

    The same procedure applies to Prometheus and Alertmanager specs.
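
    For example, the edited Prometheus spec might look like the following (a sketch; the port matches the 9092 endpoint shown in the verification output later in this procedure):

    service_type: prometheus
    service_name: prometheus
    placement:
      label: monitoring
    networks:
    - 172.17.3.0/24
    spec:
      port: 9092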

  6. Apply the new monitoring spec to relocate the monitoring stack daemons:

    SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
    function migrate_daemon {
        local component="$1"
        local target_dir="$2"
        # cephadm shell -m mounts the host directory inside the container
        # under /mnt/<dirname>, so the exported specs are available at
        # /mnt/ceph_specs within the shell
        sudo cephadm shell -m "$target_dir" -- ceph orch apply -i "/mnt/ceph_specs/$component"
    }
    for m in grafana prometheus alertmanager; do
        migrate_daemon "$m" "$SPEC_DIR"
    done
  7. Verify that the daemons are deployed on the expected nodes:

    [ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
    alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
    grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
    prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
    Note

    After you migrate the monitoring stack, you lose high availability: the monitoring stack daemons no longer have a VIP and HAProxy. Node exporters are still running on all the nodes: instead of using labels, we keep the current (‘*’) placement because we do not want to reduce the monitoring coverage.

  8. You must review the Red Hat Ceph Storage configuration to ensure that it is aligned with the relocation you just made. In particular, focus on the following configuration entries:

    [ceph: root@controller-0 /]# ceph config dump
    ...
    mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST  http://172.17.3.83:9093
    mgr  advanced  mgr/dashboard/GRAFANA_API_URL        https://172.17.3.144:3100
    mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST    http://172.17.3.83:9092
    mgr  advanced  mgr/dashboard/controller-0.ycokob/server_addr  172.17.3.33
    mgr  advanced  mgr/dashboard/controller-1.lmzpuc/server_addr  172.17.3.147
    mgr  advanced  mgr/dashboard/controller-2.xpdgfl/server_addr  172.17.3.138
  9. Verify that grafana, alertmanager and prometheus API_HOST/URL point to the IP addresses (on the storage network) of the node where each daemon has been relocated. This should be automatically addressed by cephadm and it shouldn’t require any manual action.

    [ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
    alertmanager.cephstorage-0  cephstorage-0.redhat.local  172.17.3.83:9093,9094
    alertmanager.cephstorage-1  cephstorage-1.redhat.local  172.17.3.53:9093,9094
    alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
    grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
    grafana.cephstorage-1       cephstorage-1.redhat.local  172.17.3.53:3100
    grafana.cephstorage-2       cephstorage-2.redhat.local  172.17.3.144:3100
    prometheus.cephstorage-0    cephstorage-0.redhat.local  172.17.3.83:9092
    prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
    prometheus.cephstorage-2    cephstorage-2.redhat.local  172.17.3.144:9092
    [ceph: root@controller-0 /]# ceph config dump
    ...
    ...
    mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST   http://172.17.3.83:9093
    mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST     http://172.17.3.83:9092
    mgr  advanced  mgr/dashboard/GRAFANA_API_URL         https://172.17.3.144:3100
  10. The Ceph Dashboard (mgr module plugin) is not impacted by this relocation. The service is provided by the Ceph Manager daemon, so you might experience an impact when the active mgr is migrated or is force-failed. However, having three mgr replicas allows requests to be redirected to a different instance (it is still an active/passive model), so the impact should be limited.

    1. When the RBD migration is over, the following Red Hat Ceph Storage config keys must be regenerated to point to the right mgr container:

      mgr    advanced  mgr/dashboard/controller-0.ycokob/server_addr  172.17.3.33
      mgr    advanced  mgr/dashboard/controller-1.lmzpuc/server_addr  172.17.3.147
      mgr    advanced  mgr/dashboard/controller-2.xpdgfl/server_addr  172.17.3.138

      Retrieve the list of running mgr daemons:

      $ sudo cephadm shell
      $ ceph orch ps | awk '/^mgr\./ {print $1}'
    2. For each retrieved mgr, update the entry in the Red Hat Ceph Storage configuration:

      $ ceph config set mgr mgr/dashboard/<daemon_name>/server_addr <ip_address>
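
      For example, assuming ceph orch ps returned a daemon named mgr.cephstorage-0.abcdef (a hypothetical name; the suffix is generated at deploy time) running on a host whose storage network address is 172.17.3.83, the update would be:

      # Drop the leading "mgr." from the daemon name when building the key
      $ ceph config set mgr mgr/dashboard/cephstorage-0.abcdef/server_addr 172.17.3.83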