Chapter 9. Migrating the monitoring stack component to new nodes within an existing Red Hat Ceph Storage cluster

In the context of data plane adoption, where the Red Hat OpenStack Platform (RHOSP) services are redeployed in Red Hat OpenShift Container Platform, a director-deployed Red Hat Ceph Storage cluster undergoes a migration that “externalizes” the Red Hat Ceph Storage cluster. Broadly, two deployment topologies include an “internal” Red Hat Ceph Storage cluster today: one where RHOSP includes dedicated Red Hat Ceph Storage nodes to host object storage daemons (OSDs), and Hyperconverged Infrastructure (HCI), where Compute nodes double as Red Hat Ceph Storage nodes. In either scenario, some Red Hat Ceph Storage processes are deployed on RHOSP Controller nodes: Red Hat Ceph Storage monitors, Ceph Object Gateway (RGW), RADOS Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha. The Ceph Dashboard module adds web-based monitoring and administration to the Ceph Manager. With director-deployed Red Hat Ceph Storage, this component is enabled as part of the overcloud deployment and is composed of:

Ceph Manager module
Grafana
Prometheus
Alertmanager
Node exporter

The Ceph Dashboard containers are included through tripleo-container-image-prepare parameters, and high availability relies on HAProxy and Pacemaker deployed on the RHOSP side. For an external Red Hat Ceph Storage cluster, high availability is not supported. The goal of this procedure is to relocate the Ceph monitoring components to the target nodes so that the Controller nodes can be freed.

This procedure assumes a RHOSP 17.1 environment with a Red Hat Ceph Storage 7 deployment managed by director. It also assumes that:

Red Hat Ceph Storage has been upgraded to 7 and is managed by cephadm/orchestrator
Both the Red Hat Ceph Storage public and cluster networks are propagated, through director, to the target nodes

9.1. Completing prerequisites for a Red Hat Ceph Storage cluster with monitoring stack components

You must complete the following prerequisites before you migrate a Red Hat Ceph Storage cluster with monitoring stack components.

Procedure

Gather the current status of the monitoring stack. Verify that the hosts have no monitoring label associated (or no grafana, prometheus, or alertmanager label, if you evaluate placement per daemon):
Note
The entire relocation process is driven by cephadm and relies on labels assigned to the target nodes, where the daemons are scheduled. Review the cardinality matrix before you assign labels, and choose carefully the nodes where the monitoring stack components are scheduled.
```
[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls

HOST                    	ADDR       	LABELS                 	STATUS
cephstorage-0.redhat.local  192.168.24.11  osd mds
cephstorage-1.redhat.local  192.168.24.12  osd mds
cephstorage-2.redhat.local  192.168.24.47  osd mds
controller-0.redhat.local   192.168.24.35  _admin mon mgr
controller-1.redhat.local   192.168.24.53  mon _admin mgr
controller-2.redhat.local   192.168.24.10  mon _admin mgr
6 hosts in cluster
```
Confirm that the cluster is healthy and both ceph orch ls and ceph orch ps return the expected number of deployed daemons.
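
The label check can be scripted. The following is a minimal sketch, assuming the plain table layout of `ceph orch host ls` shown above (host, address, then labels); `hosts_with_label` is a hypothetical helper name, not a cephadm command. It prints every host that carries at least one of the given labels, so empty output means no host is labeled yet.

```shell
# Print every host that carries at least one of the given labels.
# Reads the table printed by `ceph orch host ls` on stdin.
# hosts_with_label is a hypothetical helper, not part of cephadm.
hosts_with_label() {
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, want, " ") }
        # Skip the header line and the "N hosts in cluster" summary.
        $1 == "HOST" || $2 == "hosts" { next }
        {
            # Fields 3 and later are treated as labels (assumption:
            # the STATUS column is empty, as in the listing above).
            for (i = 3; i <= NF; i++)
                for (j = 1; j <= n; j++)
                    if ($i == want[j]) { print $1; next }
        }'
}

# Example, on a node with cephadm access:
#   sudo cephadm shell -- ceph orch host ls \
#       | hosts_with_label monitoring grafana prometheus alertmanager
```

If a host reports a non-empty STATUS value, that value is also compared against the labels, so review the result manually in that case.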

Review and update the container image registry. If you run the Red Hat Ceph Storage externalization procedure after the Red Hat OpenStack Platform control plane has been migrated, consider updating the container images referenced in the Red Hat Ceph Storage cluster configuration. The current container images point to the undercloud registry, which might no longer be available. Because the undercloud will not be available in the future, replace the undercloud-provided images with images from an alternative registry.

$ ceph config dump
...
...
mgr   advanced  mgr/cephadm/container_image_alertmanager    undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-alertmanager:v4.10
mgr   advanced  mgr/cephadm/container_image_base            undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph
mgr   advanced  mgr/cephadm/container_image_grafana         undercloud-0.ctlplane.redhat.local:8787/rh-osbs/grafana:latest
mgr   advanced  mgr/cephadm/container_image_node_exporter   undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus-node-exporter:v4.10
mgr   advanced  mgr/cephadm/container_image_prometheus      undercloud-0.ctlplane.redhat.local:8787/rh-osbs/openshift-ose-prometheus:v4.10
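
To spot which image options still point at the undercloud, the dump can be filtered. This is a sketch only: `image_keys_from_registry` is a hypothetical helper, and it assumes that each option name in `ceph config dump` is immediately followed by its value, as in the listing above.

```shell
# List the cephadm image options whose value starts with the given registry.
# Reads `ceph config dump` output on stdin.
# image_keys_from_registry is a hypothetical helper name.
image_keys_from_registry() {
    awk -v reg="$1" '
        {
            # Look for an option name followed by a value on the same line.
            for (i = 1; i < NF; i++)
                if (index($i, "mgr/cephadm/container_image_") == 1 &&
                    index($(i + 1), reg) == 1)
                    print $i
        }'
}

# Example, on a node with cephadm access:
#   sudo cephadm shell -- ceph config dump \
#       | image_keys_from_registry undercloud-0.ctlplane.redhat.local:8787
```

Empty output suggests that no cephadm image option references the given registry anymore.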

Remove the undercloud container images:

# remove the base image
cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_base
# remove the undercloud images associated to the monitoring
# stack components
for i in prometheus grafana alertmanager node_exporter; do
    cephadm shell -- ceph config rm mgr mgr/cephadm/container_image_$i
done

Note

In the example above, in addition to the container images related to the monitoring stack, the container_image_base config entry is also removed. This affects all the Red Hat Ceph Storage daemons that rely on the undercloud images: new daemons are deployed with the new or default Red Hat Ceph Storage image.

9.2. Migrating the monitoring stack to the target nodes

The migration procedure relies on relabeling nodes: relabeling, combined with an update to the existing spec, relocates the daemons to the target nodes.

Before you start this process, consider the following:

You do not need to migrate node exporters: these daemons are deployed on every node that is part of the Red Hat Ceph Storage cluster (placement is ‘*’). Metrics from the Controller nodes are lost once those nodes are no longer part of the Red Hat Ceph Storage cluster
Each monitoring stack component is bound to specific ports that director is expected to open beforehand; verify that the firewall rules are in place and that the ports are open for each monitoring stack service

Depending on the target nodes and the number of deployed or active daemons, you can either relocate the existing containers to the target nodes, or select a subset of nodes to host the monitoring stack daemons. As mentioned earlier, HA is not supported, so reducing the placement to count: 1 is a reasonable solution: it lets you migrate the existing daemons in an HCI (or hardware-limited) scenario without impacting other services. It is still possible to build a dedicated HA solution that is consistent with the director model. Building and deploying such an HA model is out of scope for this procedure.

9.2.1. Scenario 1: Migrating the existing daemons to the target nodes

Assuming a cluster with three Red Hat Ceph Storage (or ComputeHCI) nodes, this scenario extends the “monitoring” label to all the Red Hat Ceph Storage (or ComputeHCI) nodes that are part of the cluster. This means that the count: 3 placement is kept for the target nodes.

Procedure

Add the monitoring label to all the Red Hat Ceph Storage (or ComputeHCI) nodes in the cluster:

for item in $(sudo cephadm shell -- ceph orch host ls --format json | jq -r '.[].hostname'); do
    sudo cephadm shell -- ceph orch host label add "$item" monitoring
done

Verify that all the hosts in the cluster, including the three target nodes, have the monitoring label:

[tripleo-admin@controller-0 ~]$ sudo cephadm shell -- ceph orch host ls

HOST                        ADDR           LABELS
cephstorage-0.redhat.local  192.168.24.11  osd monitoring
cephstorage-1.redhat.local  192.168.24.12  osd monitoring
cephstorage-2.redhat.local  192.168.24.47  osd monitoring
controller-0.redhat.local   192.168.24.35  _admin mon mgr monitoring
controller-1.redhat.local   192.168.24.53  mon _admin mgr monitoring
controller-2.redhat.local   192.168.24.10  mon _admin mgr monitoring

Remove the monitoring label from the Controller nodes:

$ for i in 0 1 2; do ceph orch host label rm "controller-$i.redhat.local" monitoring; done

Removed label monitoring from host controller-0.redhat.local
Removed label monitoring from host controller-1.redhat.local
Removed label monitoring from host controller-2.redhat.local

Dump the current monitoring stack spec:

function export_spec {
    local component="$1"
    local target_dir="$2"
    # Create the target directory if needed; the redirection below fails otherwise
    mkdir -p "$target_dir"
    sudo cephadm shell -- ceph orch ls --export "$component" > "$target_dir/$component"
}

SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
for m in grafana prometheus alertmanager; do
    export_spec "$m" "$SPEC_DIR"
done
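
Before editing the specs, it can help to confirm that every export produced a usable file, because a failed `cephadm shell` call leaves an empty file behind the redirection. The following sketch uses a hypothetical `check_specs` helper and assumes that each exported spec contains a `service_type: <component>` line, as in the grafana spec in this section.

```shell
# Report any monitoring spec that is missing, empty, or of the wrong type.
# Returns non-zero if at least one spec is unusable.
# check_specs is a hypothetical helper name.
check_specs() {
    spec_dir="$1"
    rc=0
    for m in grafana prometheus alertmanager; do
        if [ ! -s "$spec_dir/$m" ]; then
            echo "missing or empty spec: $spec_dir/$m" >&2
            rc=1
        elif ! grep -q "^service_type: $m\$" "$spec_dir/$m"; then
            echo "unexpected service_type in: $spec_dir/$m" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   check_specs "${SPEC_DIR:-$PWD/ceph_specs}" && echo "specs look usable"
```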

For each daemon, edit the current spec and replace the placement/hosts section with the placement/label section, for example:
```
service_type: grafana
service_name: grafana
placement:
  label: monitoring
networks:
- 172.17.3.0/24
spec:
  port: 3100
```
The same procedure applies to Prometheus and Alertmanager specs.
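
The placement edit can also be done with a small filter instead of a manual edit. This is a sketch under stated assumptions: `relabel_placement` is a hypothetical helper, and it expects the exported spec to have a top-level `placement:` key whose body is indented, as in the example above. The optional second argument adds a `count:` entry, which is one way to express the reduced count: 1 placement.

```shell
# Replace the body of the top-level "placement:" section with a label
# (and, optionally, a count). Reads a service spec on stdin and writes
# the rewritten spec to stdout.
# relabel_placement is a hypothetical helper name.
relabel_placement() {
    awk -v label="${1:-monitoring}" -v count="${2:-}" '
        /^placement:/ {
            print "placement:"
            print "  label: " label
            if (count != "") print "  count: " count
            skip = 1
            next
        }
        skip && /^[ \t]/ { next }   # drop the old, indented placement body
        { skip = 0; print }
    '
}

# Example (keeps the original file and writes a new one next to it):
#   relabel_placement monitoring < "$SPEC_DIR/grafana" > "$SPEC_DIR/grafana.new"
#   relabel_placement monitoring 1 < "$SPEC_DIR/grafana"   # with count: 1
```

Review the rewritten spec before you apply it, because the filter does not validate the YAML.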

Apply the new monitoring spec to relocate the monitoring stack daemons:

SPEC_DIR=${SPEC_DIR:-"$PWD/ceph_specs"}
function migrate_daemon {
    local component="$1"
    local target_dir="$2"
    # cephadm mounts the directory under /mnt/<directory name> inside the container
    sudo cephadm shell -m "$target_dir" -- ceph orch apply -i "/mnt/$(basename "$target_dir")/$component"
}
for m in grafana prometheus alertmanager; do
    migrate_daemon "$m" "$SPEC_DIR"
done

Verify that the daemons are deployed on the expected nodes:
```
[ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
```
Note
After you migrate the monitoring stack, you lose high availability: the monitoring stack daemons no longer have a VIP or HAProxy in front of them. Node exporters keep running on all the nodes: their placement is left unchanged, rather than switched to labels, so that the monitoring coverage is not reduced.

You must review the Red Hat Ceph Storage configuration to ensure that it is aligned with the relocation you just made. In particular, focus on the following configuration entries:

[ceph: root@controller-0 /]# ceph config dump
...
mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST  http://172.17.3.83:9093
mgr  advanced  mgr/dashboard/GRAFANA_API_URL        https://172.17.3.144:3100
mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST    http://172.17.3.83:9092
mgr  advanced  mgr/dashboard/controller-0.ycokob/server_addr  172.17.3.33
mgr  advanced  mgr/dashboard/controller-1.lmzpuc/server_addr  172.17.3.147
mgr  advanced  mgr/dashboard/controller-2.xpdgfl/server_addr  172.17.3.138

Verify that the grafana, alertmanager, and prometheus API_HOST/URL entries point to the IP addresses (on the storage network) of the nodes where each daemon has been relocated. cephadm should handle this automatically, so no manual action should be required.

[ceph: root@controller-0 /]# ceph orch ps | grep -iE "(prome|alert|grafa)"
alertmanager.cephstorage-0  cephstorage-0.redhat.local  172.17.3.83:9093,9094
alertmanager.cephstorage-1  cephstorage-1.redhat.local  172.17.3.53:9093,9094
alertmanager.cephstorage-2  cephstorage-2.redhat.local  172.17.3.144:9093,9094
grafana.cephstorage-0       cephstorage-0.redhat.local  172.17.3.83:3100
grafana.cephstorage-1       cephstorage-1.redhat.local  172.17.3.53:3100
grafana.cephstorage-2       cephstorage-2.redhat.local  172.17.3.144:3100
prometheus.cephstorage-0    cephstorage-0.redhat.local  172.17.3.83:9092
prometheus.cephstorage-1    cephstorage-1.redhat.local  172.17.3.53:9092
prometheus.cephstorage-2    cephstorage-2.redhat.local  172.17.3.144:9092

[ceph: root@controller-0 /]# ceph config dump
...
...
mgr  advanced  mgr/dashboard/ALERTMANAGER_API_HOST   http://172.17.3.83:9093
mgr  advanced  mgr/dashboard/PROMETHEUS_API_HOST     http://172.17.3.83:9092
mgr  advanced  mgr/dashboard/GRAFANA_API_URL         https://172.17.3.144:3100
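
The comparison between the two listings can be sketched as follows. Both helper names are hypothetical, and the parsing assumes the trimmed `ceph orch ps` layout used in this section (daemon name, host, then `ip:ports`) and IPv4 addresses in the dashboard URLs; the full `ceph orch ps` output has more columns and may need a different field.

```shell
# Print the host part of a dashboard API URL from `ceph config dump` (stdin).
# $1 is the key suffix, for example ALERTMANAGER_API_HOST.
dashboard_api_ip() {
    awk -v key="mgr/dashboard/$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == key) {
                    url = $(i + 1)
                    sub("^[a-z]+://", "", url)   # drop the scheme
                    sub("[:/].*$", "", url)      # drop port and path
                    print url
                    exit
                }
        }'
}

# Succeed if a daemon of the given type is listed with the given IP.
# Reads the trimmed `ceph orch ps` listing (name, host, ip:ports) on stdin.
daemon_on_ip() {
    awk -v prefix="$1." -v ip="$2" '
        index($1, prefix) == 1 && index($3, ip ":") == 1 { found = 1 }
        END { exit !found }'
}

# Example, inside `cephadm shell`:
#   ip=$(ceph config dump | dashboard_api_ip ALERTMANAGER_API_HOST)
#   ceph orch ps | daemon_on_ip alertmanager "$ip" && echo "alertmanager OK"
```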

The Ceph Dashboard (mgr module plugin) is not impacted by this relocation. The service is provided by the Ceph Manager daemon, so you might experience an impact when the active mgr is migrated or force-failed. However, because three mgr replicas are defined, requests can be redirected to a different instance (it is still an active/passive model), so the impact should be limited.
1. When the RBD migration is over, the following Red Hat Ceph Storage config keys must be regenerated to point to the right mgr container:
```
mgr    advanced  mgr/dashboard/controller-0.ycokob/server_addr  172.17.3.33
mgr    advanced  mgr/dashboard/controller-1.lmzpuc/server_addr  172.17.3.147
mgr    advanced  mgr/dashboard/controller-2.xpdgfl/server_addr  172.17.3.138
```
```
$ sudo cephadm shell
$ ceph orch ps | awk '/^mgr\./ {print $1}'
```
2. For each retrieved mgr, update the entry in the Red Hat Ceph Storage configuration:
```
$ ceph config set mgr mgr/dashboard/<mgr name>/server_addr <ip addr>
```
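
As an illustration of the per-mgr update, the following sketch prints one `ceph config set` command for each mgr instead of running it, so that the commands can be reviewed first. `server_addr_cmds` is a hypothetical helper; it expects `mgr.<id> <ip>` pairs on stdin, and the address for each mgr must be supplied by you.

```shell
# Turn "mgr.<id> <ip>" pairs (stdin) into the matching config commands.
# The commands are only printed, not executed.
# server_addr_cmds is a hypothetical helper name.
server_addr_cmds() {
    while read -r daemon ip; do
        # Skip blank or incomplete lines.
        [ -n "$daemon" ] && [ -n "$ip" ] || continue
        id=${daemon#mgr.}   # the config key uses the name without "mgr."
        printf 'ceph config set mgr mgr/dashboard/%s/server_addr %s\n' "$id" "$ip"
    done
}

# Example (names and addresses are placeholders):
#   printf '%s\n' 'mgr.controller-0.ycokob 172.17.3.33' | server_addr_cmds
```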
