Chapter 9. Migrating the monitoring stack component to new nodes within an existing Red Hat Ceph Storage cluster
In the context of data plane adoption, where the Red Hat OpenStack Platform (RHOSP) services are redeployed in Red Hat OpenShift Container Platform, a director-deployed Red Hat Ceph Storage cluster undergoes a migration in a process called “externalizing” the Red Hat Ceph Storage cluster. Broadly, two deployment topologies include an “internal” Red Hat Ceph Storage cluster today: one where RHOSP includes dedicated Red Hat Ceph Storage nodes to host object storage daemons (OSDs), and Hyperconverged Infrastructure (HCI), where Compute nodes double as Red Hat Ceph Storage nodes. In either scenario, some Red Hat Ceph Storage processes are deployed on RHOSP Controller nodes: Red Hat Ceph Storage monitors, Ceph Object Gateway (RGW), Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha.
The Ceph Dashboard module adds web-based monitoring and administration to the Ceph Manager. With director-deployed Red Hat Ceph Storage, this component is enabled as part of the overcloud deployment and is composed of:
- Ceph Manager module
- Grafana
- Prometheus
- Alertmanager
- Node exporter
The Ceph Dashboard containers are included through tripleo-container-image-prepare parameters, and high availability relies on HAProxy and Pacemaker deployed on the RHOSP front. For an external Red Hat Ceph Storage cluster, high availability is not supported. The goal of this procedure is to migrate and relocate the Ceph Monitoring components so that the Controller nodes can be freed.
For this procedure, we assume a starting point of RHOSP 17.1 with a Red Hat Ceph Storage 7 deployment managed by director. We also assume that:
- Red Hat Ceph Storage has been upgraded to 7 and is managed by cephadm/orchestrator
- Both the Red Hat Ceph Storage public and cluster networks are propagated, through director, to the target nodes
9.1. Completing prerequisites for a Red Hat Ceph Storage cluster with monitoring stack components
You must complete the following prerequisites before you migrate a Red Hat Ceph Storage cluster with monitoring stack components.
Procedure
Gather the current status of the monitoring stack. Verify that the hosts have no monitoring label (or grafana, prometheus, alertmanager in case of a per-daemon placement evaluation) associated:
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls
HOST                        ADDR           LABELS          STATUS
cephstorage-0.redhat.local  192.168.24.11  osd mds
cephstorage-1.redhat.local  192.168.24.12  osd mds
cephstorage-2.redhat.local  192.168.24.47  osd mds
controller-0.redhat.local   192.168.24.35  _admin mon mgr
controller-1.redhat.local   192.168.24.53  mon _admin mgr
controller-2.redhat.local   192.168.24.10  mon _admin mgr
6 hosts in cluster
Note
The entire relocation process is driven by cephadm and relies on labels being assigned to the target nodes, where the daemons are scheduled. Review the cardinality matrix before assigning labels, and choose carefully the nodes on which the monitoring stack components should be scheduled.
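The label check in this step can also be scripted. The following is a minimal sketch, not part of the documented procedure: the helper name hosts_with_label is hypothetical, and it assumes the plain-text column layout of ceph orch host ls shown in this step (host name in the first column, labels from the third column onward).

```shell
# Hypothetical helper (not a cephadm command): reads the text output of
# `ceph orch host ls` on stdin and prints every host that carries the label
# passed as the first argument.
hosts_with_label() {
    awk -v label="$1" '
        $1 == "HOST" { next }                  # skip the header row
        $2 == "hosts" && $3 == "in" { next }   # skip the "N hosts in cluster" footer
        { for (i = 3; i <= NF; i++) if ($i == label) { print $1; break } }
    '
}
```

For example, piping sudo cephadm shell -- ceph orch host ls into hosts_with_label monitoring is expected to print nothing before the migration starts; any host name in the output would indicate a label that is already assigned.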
Confirm that the cluster is healthy and that both ceph orch ls and ceph orch ps return the expected number of deployed daemons.
Review and update the container image registry. If the Red Hat Ceph Storage externalization procedure is executed after the Red Hat OpenStack Platform control plane has been migrated, update the container images referenced in the Red Hat Ceph Storage cluster configuration. The current container images point to the undercloud registry, which might no longer be available. Because the undercloud will not be available in the future, replace the undercloud-provided images with an alternative registry:
$ ceph config dump
...
...
mgr   advanced  mgr/cephadm/container_image_alertmanager    undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-alertmanager:v4.10
mgr   advanced  mgr/cephadm/container_image_base            undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph
mgr   advanced  mgr/cephadm/container_image_grafana         undercloud-0.ctlplane.redhat.local:8787/rh-osbs/grafana:latest
mgr   advanced  mgr/cephadm/container_image_node_exporter   undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-node-exporter:v4.10
mgr   advanced  mgr/cephadm/container_image_prometheus      undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus:v4.10
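To see at a glance which image entries still reference the undercloud registry, the dump can be filtered. This is a rough sketch under stated assumptions: the helper name images_from_registry is hypothetical, and it relies only on the option name being immediately followed by its value, as in the dump shown here.

```shell
# Hypothetical helper: reads `ceph config dump` text on stdin and prints the
# mgr/cephadm/container_image_* option names whose value starts with the
# registry passed as the first argument.
images_from_registry() {
    awk -v reg="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i ~ /^mgr\/cephadm\/container_image_/ && index($(i + 1), reg) == 1)
                    print $i
        }
    '
}
```

Running ceph config dump | images_from_registry undercloud-0.ctlplane.redhat.local:8787 before and after the cleanup gives a quick indication of whether any entry still points at the undercloud.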
Remove the undercloud container images:
# remove the base image
cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_base
# remove the undercloud images associated to the monitoring
# stack components
for i in prometheus grafana alertmanager node_exporter; do
    cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_$i
done
In the example above, in addition to the monitoring stack related container images, we also remove the config entry for container_image_base. This affects all the Red Hat Ceph Storage daemons that rely on the undercloud images: new daemons are deployed using the new/default Red Hat Ceph Storage image.
9.2. Migrating the monitoring stack to the target nodes
The migration procedure relies on nodes re-labeling: this kind of action, combined with an update in the existing spec, results in the daemons' relocation on the target nodes.
Before you start this process, consider the following:
- There is no need to migrate node exporters: these daemons are deployed across all the nodes that are part of the Red Hat Ceph Storage cluster (placement is ‘*’), and metrics from the Controller nodes are lost once those nodes are no longer part of the Red Hat Ceph Storage cluster
- Each monitoring stack component is bound to specific ports that director is supposed to open beforehand; verify that the firewall rules are in place and that the ports are open for each monitoring stack service
Depending on the target nodes and the number of deployed/active daemons, it is possible to either relocate the existing containers to the target nodes, or select a subset of nodes to host the monitoring stack daemons. As mentioned in the previous section, HA is not supported, so reducing the placement to count: 1 is a reasonable solution that makes it possible to migrate the existing daemons in an HCI (or hardware-limited) scenario without impacting other services. However, it is still possible to put in place a dedicated HA solution and build a component that is consistent with the director model to reach HA. Building and deploying such an HA model is out of scope for this procedure.
9.2.1. Scenario 1: Migrating the existing daemons to the target nodes
Assuming we have 3 Red Hat Ceph Storage nodes or ComputeHCI, this scenario extends the “monitoring” labels to all the Red Hat Ceph Storage (or ComputeHCI) nodes that are part of the cluster. This means that we keep the count: 3 placements for the target nodes.
Procedure
Add the monitoring label to all the Red Hat Ceph Storage (or ComputeHCI) nodes in the cluster:
for item in $(sudo cephadm shell -- ceph orch host ls --format json | jq -r '.[].hostname'); do sudo cephadm shell -- ceph orch host label add $item monitoring; done
Verify that all the (three) hosts have the monitoring label:
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls
HOST                        ADDR           LABELS
cephstorage-0.redhat.local  192.168.24.11  osd monitoring
cephstorage-1.redhat.local  192.168.24.12  osd monitoring
cephstorage-2.redhat.local  192.168.24.47  osd monitoring
controller-0.redhat.local   192.168.24.35  _admin mon mgr monitoring
controller-1.redhat.local   192.168.24.53  mon _admin mgr monitoring
controller-2.redhat.local   192.168.24.10  mon _admin mgr monitoring
Remove the labels from the Controller nodes:
$ for i in 0 1 2; do ceph orch host label rm "controller-$i.redhat.local" monitoring; done
Removed label monitoring from host controller-0.redhat.local
Removed label monitoring from host controller-1.redhat.local
Removed label monitoring from host controller-2.redhat.local
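As an alternative sketch, the add-then-remove sequence could be avoided by labelling only the hosts that are not Controller nodes. The helper name non_controller_hosts is hypothetical, and the filter assumes that Controller host names start with controller-, as they do in the example output; adjust the pattern for other naming schemes.

```shell
# Hypothetical helper: reads host names on stdin (one per line) and prints
# only those that do not start with "controller-".
non_controller_hosts() {
    # "|| true" keeps the exit status at zero when every host is filtered out
    grep -v '^controller-' || true
}

# Possible usage, left commented out because it changes cluster state:
# for host in $(sudo cephadm shell -- ceph orch host ls --format json \
#         | jq -r '.[].hostname' | non_controller_hosts); do
#     sudo cephadm shell -- ceph orch host label add "$host" monitoring
# done
```

With this variant the Controller nodes never receive the monitoring label, so the removal step has nothing to undo.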
Dump the current monitoring stack spec:
function export_spec {
    local component="$1"
    local target_dir="$2"
    sudo cephadm shell -- ceph orch ls --export "$component" > "$target_dir/$component"
}
SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
# create the target directory if it does not exist yet
mkdir -p "$SPEC_DIR"
for m in grafana prometheus alertmanager; do
    export_spec "$m" "$SPEC_DIR"
done
For each daemon, edit the current spec and replace the placement/hosts section with the placement/label section, for example:
service_type: grafana
service_name: grafana
placement:
  label: monitoring
networks:
- 172.17.3.0/24
spec:
  port: 3100
The same procedure applies to Prometheus and Alertmanager specs.
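The manual edit can be approximated with a small filter. This is only a sketch: the helper name use_label_placement is hypothetical, and it assumes that the exported spec lists hosts as an indented block under placement (a hosts: key followed by indented "- name" items). Review the result before applying it.

```shell
# Hypothetical helper: reads a service spec on stdin and replaces the
# indented "hosts:" list with "label: <label>", leaving other keys untouched.
use_label_placement() {
    awk -v label="$1" '
        skipping && /^ +- / { next }      # drop the host entries of the old list
        { skipping = 0 }                  # any other line ends the hosts block
        /^ +hosts:[ ]*$/ { sub(/hosts:.*/, "label: " label); skipping = 1 }
        { print }
    '
}
```

For example, use_label_placement monitoring < "$SPEC_DIR/grafana" prints the rewritten spec to standard output, which can be compared with the original before it is written back.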
Apply the new monitoring spec to relocate the monitoring stack daemons:
SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
function migrate_daemon {
    local component="$1"
    local target_dir="$2"
    sudo cephadm shell -m "$target_dir" -- ceph orch apply -i /mnt/ceph_specs/$component
}
for m in grafana prometheus alertmanager; do
    migrate_daemon "$m" "$SPEC_DIR"
done
Verify that the daemons are deployed on the expected nodes:
[ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
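The same verification can be turned into a check that flags leftovers on the Controller nodes. The sketch below uses a hypothetical helper name, monitoring_daemons_on_controllers, and assumes that the daemon name and host are the first two columns of ceph orch ps and that Controller host names start with controller-.

```shell
# Hypothetical helper: reads `ceph orch ps` text on stdin and prints the
# grafana, prometheus, and alertmanager daemons still placed on a host
# whose name starts with "controller-".
monitoring_daemons_on_controllers() {
    awk '$1 ~ /^(grafana|prometheus|alertmanager)\./ && $2 ~ /^controller-/ { print $1 }'
}
```

An empty result from ceph orch ps | monitoring_daemons_on_controllers suggests that the relocation has completed; daemons can take a short time to be rescheduled, so a non-empty result immediately after applying the specs is not necessarily an error.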
Note
After you migrate the monitoring stack, you lose high availability: the monitoring stack daemons no longer have a VIP or HAProxy. Node exporters are still running on all the nodes: instead of using labels, we keep the current placement so that the monitoring coverage is not reduced.
You must review the Red Hat Ceph Storage configuration to ensure that it is aligned with the relocation you just made. In particular, focus on the following configuration entries:
[ceph: root@controller-0 /]# ceph config dump
...
mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST                http://172.17.3.83:9093
mgr  advanced  mgr/dashboard/GRAFANA_API_URL                      https://172.17.3.144:3100
mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST                  http://172.17.3.83:9092
mgr  advanced  mgr/dashboard/controller-0.ycokob/server_addr      172.17.3.33
mgr  advanced  mgr/dashboard/controller-1.lmzpuc/server_addr      172.17.3.147
mgr  advanced  mgr/dashboard/controller-2.xpdgfl/server_addr      172.17.3.138
Verify that the grafana, alertmanager, and prometheus API_HOST/URL entries point to the IP addresses (on the storage network) of the nodes where each daemon has been relocated. This should be handled automatically by cephadm and should not require any manual action.
[ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
alertmanager.cephstorage-0  cephstorage-0.redhat.local  172.17.3.83:9093,9094
alertmanager.cephstorage-1  cephstorage-1.redhat.local  172.17.3.53:9093,9094
alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
grafana.cephstorage-1       cephstorage-1.redhat.local  172.17.3.53:3100
grafana.cephstorage-2       cephstorage-2.redhat.local  172.17.3.144:3100
prometheus.cephstorage-0    cephstorage-0.redhat.local  172.17.3.83:9092
prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
prometheus.cephstorage-2    cephstorage-2.redhat.local  172.17.3.144:9092
[ceph: root@controller-0 /]# ceph config dump
...
...
mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST  http://172.17.3.83:9093
mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST    http://172.17.3.83:9092
mgr  advanced  mgr/dashboard/GRAFANA_API_URL        https://172.17.3.144:3100
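To compare the two outputs more easily, the address behind a dashboard entry can be extracted from the dump. This is a minimal sketch with a hypothetical helper name, dashboard_endpoint; it assumes only that the option name is immediately followed by its URL value, as in the dump shown here.

```shell
# Hypothetical helper: reads `ceph config dump` text on stdin and prints the
# host:port part of the mgr/dashboard/<KEY> entry named by the first argument.
dashboard_endpoint() {
    awk -v key="mgr/dashboard/$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == key) {
                    value = $(i + 1)
                    sub(/^https?:\/\//, "", value)   # strip the URL scheme
                    print value
                }
        }
    '
}
```

For example, ceph config dump | dashboard_endpoint PROMETHEUS_API_HOST prints an address that can then be looked up in the ceph orch ps output to confirm that a prometheus daemon is listening there.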
The Ceph Dashboard (mgr module plugin) is not impacted by this relocation. The service is provided by the Ceph Manager daemon, so you might experience an impact when the active mgr is migrated or force-failed. However, defining three replicas allows requests to be redirected to a different instance (it is still an active/passive model), so the impact should be limited.
When the RBD migration is over, the following Red Hat Ceph Storage config keys must be regenerated to point to the right mgr container:
mgr  advanced  mgr/dashboard/controller-0.ycokob/server_addr  172.17.3.33
mgr  advanced  mgr/dashboard/controller-1.lmzpuc/server_addr  172.17.3.147
mgr  advanced  mgr/dashboard/controller-2.xpdgfl/server_addr  172.17.3.138
$ sudo cephadm shell
$ ceph orch ps | awk '/mgr./ {print $1}'
For each retrieved mgr, update the entry in the Red Hat Ceph Storage configuration:
$ ceph config set mgr mgr/dashboard/<mgr name>/server_addr <ip addr>
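When several mgr daemons need updating, the commands can be generated and reviewed before they are run. The sketch below is a dry-run helper with a hypothetical name, server_addr_cmd. It assumes that the dashboard key uses the daemon name without the mgr. prefix, as in the keys listed above, and that the value is passed to ceph config set as a separate argument.

```shell
# Hypothetical dry-run helper: prints (does not execute) the command that sets
# the dashboard server_addr for one mgr daemon.
#   $1: daemon name as shown by `ceph orch ps`, for example mgr.controller-0.ycokob
#   $2: IP address the dashboard should bind to
server_addr_cmd() {
    name="${1#mgr.}"    # the config key omits the "mgr." prefix
    printf 'ceph config set mgr mgr/dashboard/%s/server_addr %s\n' "$name" "$2"
}
```

Printing the commands first makes it easier to check each daemon and address pair before pasting the lines into the cephadm shell.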