1.2. Configuring the monitoring stack

1.2.1. Prerequisites

The monitoring stack imposes additional resource requirements. Consult the computing resources recommendations in Scaling the Cluster Monitoring Operator and verify that you have sufficient resources.

1.2.2. Maintenance and support

OpenShift Container Platform Monitoring is supported only when you configure it with the options described in this document. Do not use other configurations, as they are unsupported. Configuration paradigms might change across Prometheus releases, and such changes can be handled gracefully only if all configuration possibilities are controlled. If you use configurations other than those described in this section, your changes will disappear because the cluster-monitoring-operator reconciles any differences. By default and by design, the operator resets everything to the defined state.

Explicitly unsupported cases include:

Creating additional ServiceMonitor objects in the openshift-* namespaces. This extends the targets the cluster monitoring Prometheus instance scrapes, which can cause collisions and load differences that cannot be accounted for. These factors might make the Prometheus setup unstable.
Creating unexpected ConfigMap objects or PrometheusRule objects. This causes the cluster monitoring Prometheus instance to include additional alerting and recording rules.
Modifying resources of the stack. The Prometheus Monitoring Stack ensures its resources are always in the state it expects them to be. If they are modified, the stack will reset them.
Using resources of the stack for your purposes. The resources created by the Prometheus Cluster Monitoring stack are not meant to be used by any other resources, as there are no guarantees about their backward compatibility.
Stopping the Cluster Monitoring Operator from reconciling the monitoring stack.
Adding new alerting rules.
Modifying the monitoring stack Grafana instance.

1.2.3. Creating a cluster monitoring config map

To configure the OpenShift Container Platform monitoring stack, you must create the cluster monitoring ConfigMap object.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).

Procedure

Check whether the cluster-monitoring-config ConfigMap object exists:

oc -n openshift-monitoring get configmap cluster-monitoring-config

If the ConfigMap object does not exist:

Create the following YAML manifest. In this example, the file is called cluster-monitoring-config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |

Apply the configuration to create the ConfigMap object:
```
oc apply -f cluster-monitoring-config.yaml
```

1.2.4. Configuring the cluster monitoring stack

You can configure the Prometheus Cluster Monitoring stack using config maps. Config maps configure the Cluster Monitoring Operator, which in turn configures components of the stack.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Start editing the cluster-monitoring-config ConfigMap object:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Add your configuration under data/config.yaml as a key-value pair of the form <component>: <configuration_for_the_component>:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      <configuration_for_the_component>

Substitute <component> and <configuration_for_the_component> accordingly.

For example, create this ConfigMap object to configure a Persistent Volume Claim (PVC) for Prometheus:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: fast
          volumeMode: Filesystem
          resources:
            requests:
              storage: 40Gi

Here, prometheusK8s defines the Prometheus component and the following lines define its configuration.

Save the file to apply the changes. The pods affected by the new configuration are restarted automatically.
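
The nesting used in this procedure (a component key under data/config.yaml, with its settings indented beneath it) can be sketched in Python. This is a minimal illustration, not part of the product: the helper names to_yaml and render_config_map are hypothetical, and the emitter handles only nested mappings of scalar values, which is enough for the PVC example in this section.

```python
def to_yaml(mapping, indent=0):
    """Render a nested dict of scalars as a list of indented YAML lines."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


def render_config_map(components):
    """Build the cluster-monitoring-config manifest as text (hypothetical helper)."""
    header = [
        "apiVersion: v1",
        "kind: ConfigMap",
        "metadata:",
        "  name: cluster-monitoring-config",
        "  namespace: openshift-monitoring",
        "data:",
        "  config.yaml: |",
    ]
    # Component settings live inside the block scalar, indented four spaces.
    return "\n".join(header + to_yaml(components, indent=4)) + "\n"


manifest = render_config_map({
    "prometheusK8s": {
        "volumeClaimTemplate": {
            "spec": {
                "storageClassName": "fast",
                "volumeMode": "Filesystem",
                "resources": {"requests": {"storage": "40Gi"}},
            }
        }
    }
})
print(manifest)
```

The printed text has the same shape as the PVC example in this procedure, so it could be saved to a file and applied with oc apply -f.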

Additional resources

See Creating a cluster monitoring config map to learn how to create the cluster-monitoring-config ConfigMap object.

1.2.5. Configurable monitoring components

This table shows the monitoring components you can configure and the keys used to specify the components in the config map:

Table 1.2. Configurable monitoring components
Component	Key
Prometheus Operator	`prometheusOperator`
Prometheus	`prometheusK8s`
Alertmanager	`alertmanagerMain`
kube-state-metrics	`kubeStateMetrics`
openshift-state-metrics	`openshiftStateMetrics`
Grafana	`grafana`
Telemeter Client	`telemeterClient`
Prometheus Adapter	`k8sPrometheusAdapter`
Thanos Querier	`thanosQuerier`

From this list, only Prometheus and Alertmanager have extensive configuration options. All other components usually provide only the nodeSelector field for being deployed on a specified node.
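
Because a misspelled key is easy to miss, it can help to check the top-level keys of your configuration against the table before saving. The following Python sketch copies the keys from the table; the function name unknown_keys is hypothetical, and the check reflects only this table, so the operator itself might accept keys that are not listed here.

```python
# Component keys copied from the table of configurable monitoring components.
COMPONENT_KEYS = {
    "Prometheus Operator": "prometheusOperator",
    "Prometheus": "prometheusK8s",
    "Alertmanager": "alertmanagerMain",
    "kube-state-metrics": "kubeStateMetrics",
    "openshift-state-metrics": "openshiftStateMetrics",
    "Grafana": "grafana",
    "Telemeter Client": "telemeterClient",
    "Prometheus Adapter": "k8sPrometheusAdapter",
    "Thanos Querier": "thanosQuerier",
}


def unknown_keys(config):
    """Return the top-level keys of config that do not appear in the table."""
    known = set(COMPONENT_KEYS.values())
    return sorted(key for key in config if key not in known)


# "alertManagerMain" has a capital M and is therefore not a listed key.
print(unknown_keys({"prometheusK8s": {}, "alertManagerMain": {}}))
```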

1.2.6. Moving monitoring components to different nodes

You can move any of the monitoring stack components to specific nodes.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Start editing the cluster-monitoring-config ConfigMap object:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Specify the nodeSelector constraint for the component under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      nodeSelector:
        <node_key>: <node_value>
        <node_key>: <node_value>
        <...>

Substitute <component> accordingly and substitute <node_key>: <node_value> with the map of key-value pairs that specifies the destination node. Often, only a single key-value pair is used.

The component can only run on a node that has each of the specified key-value pairs as labels. The node can have additional labels as well.
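
The matching rule stated here can be written as a short predicate. This Python sketch models only the rule as described (every selector pair must be present among the node labels, extra labels are ignored); the function name node_matches and the sample labels are illustrative assumptions, not scheduler code.

```python
def node_matches(node_selector, node_labels):
    """Return True if the node carries every key-value pair in the selector."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


selector = {"foo": "bar"}
# A node with the required label plus an extra one is eligible.
print(node_matches(selector, {"foo": "bar", "zone": "a"}))
# A node with a different value for the key is not.
print(node_matches(selector, {"foo": "baz"}))
```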

For example, to move components to the node that is labeled foo: bar, use:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusOperator:
      nodeSelector:
        foo: bar
    prometheusK8s:
      nodeSelector:
        foo: bar
    alertmanagerMain:
      nodeSelector:
        foo: bar
    kubeStateMetrics:
      nodeSelector:
        foo: bar
    grafana:
      nodeSelector:
        foo: bar
    telemeterClient:
      nodeSelector:
        foo: bar
    k8sPrometheusAdapter:
      nodeSelector:
        foo: bar
    openshiftStateMetrics:
      nodeSelector:
        foo: bar
    thanosQuerier:
      nodeSelector:
        foo: bar

Save the file to apply the changes. The components affected by the new configuration are moved to new nodes automatically.

Additional resources

See Creating a cluster monitoring config map to learn how to create the cluster-monitoring-config ConfigMap object.
See Placing pods on specific nodes using node selectors for more information about using node selectors.
See the Kubernetes documentation for details on the nodeSelector constraint.

1.2.7. Assigning tolerations to monitoring components

You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Start editing the cluster-monitoring-config ConfigMap object:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Specify tolerations for the component:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      tolerations:
        <toleration_specification>

Substitute <component> and <toleration_specification> accordingly.

For example, the oc adm taint nodes node1 key1=value1:NoSchedule command adds a taint to node1 that prevents the scheduler from placing pods on that node unless they tolerate the taint. Assuming node1 is the node labeled foo: bar, use this toleration to make the alertmanagerMain component ignore the taint and run on that node:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      nodeSelector:
        foo: bar
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"

Save the file to apply the changes. The new component placement configuration is applied automatically.
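
To see why the toleration in this example matches the key1=value1:NoSchedule taint, the following Python sketch models the matching rules from the Kubernetes taints and tolerations documentation (Equal compares key and value, Exists compares only the key, and an empty effect tolerates any effect). The function name tolerates is hypothetical, and this is a simplified model rather than the scheduler's implementation.

```python
def tolerates(toleration, taint):
    """Return True if the toleration matches the taint (simplified model)."""
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect tolerates every effect; otherwise effects must agree.
    if effect and effect != taint["effect"]:
        return False
    if operator == "Exists":
        # An empty key with Exists tolerates every taint key.
        return key == "" or key == taint["key"]
    if operator == "Equal":
        return (key == taint["key"]
                and toleration.get("value", "") == taint.get("value", ""))
    raise ValueError(f"unsupported operator: {operator}")


taint = {"key": "key1", "value": "value1", "effect": "NoSchedule"}
toleration = {"key": "key1", "operator": "Equal",
              "value": "value1", "effect": "NoSchedule"}
print(tolerates(toleration, taint))
```

A toleration only permits scheduling on the tainted node; the nodeSelector in the example is what directs the component to that node.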

Additional resources

See Creating a cluster monitoring config map to learn how to create the cluster-monitoring-config ConfigMap object.
See the OpenShift Container Platform documentation on taints and tolerations.
See the Kubernetes documentation on taints and tolerations.

1.2.8. Configuring persistent storage

Running cluster monitoring with persistent storage means that your metrics are stored on a persistent volume (PV) and can survive a pod being restarted or recreated. This is ideal if you need to guard your metrics or alerting data against data loss. For production environments, configuring persistent storage is highly recommended. Because of the high I/O demands, local storage is advantageous.

Important

If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block in the LocalVolume object. Prometheus cannot use raw block volumes.

1.2.9. Prerequisites

Dedicate sufficient local persistent storage to ensure that the disk does not become full. How much storage you need depends on the number of pods. For information on system requirements for persistent storage, see Prometheus database storage requirements.
Make sure you have a persistent volume (PV) ready to be claimed by the persistent volume claim (PVC), one PV for each replica. Because Prometheus has two replicas and Alertmanager has three replicas, you need five PVs to support the entire monitoring stack. The PVs should be available from the Local Storage Operator. This does not apply if you enable dynamically provisioned storage.
Use the block type of storage.
Configure local persistent storage.
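
The PV count in these prerequisites is simple arithmetic: one PV for each replica of each component that stores data. A minimal Python sketch, using the replica counts stated above (the names REPLICAS and required_pvs are illustrative):

```python
# Replica counts as stated in the prerequisites.
REPLICAS = {"prometheusK8s": 2, "alertmanagerMain": 3}


def required_pvs(replicas):
    """One persistent volume is needed for each replica."""
    return sum(replicas.values())


print(required_pvs(REPLICAS))  # prints 5
```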

1.2.9.1. Configuring a local persistent volume claim

For Prometheus or Alertmanager to use a persistent volume (PV), you must first configure a persistent volume claim (PVC).

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Edit the cluster-monitoring-config ConfigMap object:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Put your PVC configuration for the component under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class>
          resources:
            requests:
              storage: <amount_of_storage>

See the Kubernetes documentation on PersistentVolumeClaims for information on how to specify volumeClaimTemplate.

For example, to configure a PVC that claims local persistent storage for Prometheus, use:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 40Gi

In the above example, the storage class created by the Local Storage Operator is called local-storage.

To configure a PVC that claims local persistent storage for Alertmanager, use:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 40Gi

Save the file to apply the changes. The pods affected by the new configuration are restarted automatically and the new storage configuration is applied.

1.2.9.2. Modifying retention time for Prometheus metrics data

By default, the Prometheus Cluster Monitoring stack configures the retention time for Prometheus data to be 15 days. You can modify the retention time to change how soon the data is deleted.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Start editing the cluster-monitoring-config ConfigMap object:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Put your retention time configuration under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: <time_specification>

Substitute <time_specification> with a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years).
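
The format described here (a number directly followed by one unit) can be checked before you save the config map. The following Python sketch validates a value and converts it to seconds. The function name retention_seconds is hypothetical, and the conversions for w (7 days) and y (365 days) are assumptions based on the Prometheus duration convention.

```python
import re

# Seconds per unit. The w and y conversions are assumptions (7 and 365 days).
UNIT_SECONDS = {
    "ms": 0.001,
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "y": 365 * 86400,
}
# A number directly followed by a single unit, with nothing else around it.
_SPEC = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def retention_seconds(spec):
    """Validate a time specification and return its length in seconds."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"invalid time specification: {spec!r}")
    number, unit = match.groups()
    return int(number) * UNIT_SECONDS[unit]


print(retention_seconds("24h"))  # prints 86400
print(retention_seconds("15d"))  # prints 1296000
```

A value such as 24 h (with a space) or 1.5d is rejected with a ValueError in this sketch.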

For example, to configure retention time to be 24 hours, use:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h

Save the file to apply the changes. The pods affected by the new configuration are restarted automatically.

Additional resources

See Creating a cluster monitoring config map to learn how to create the cluster-monitoring-config ConfigMap object.
Understanding persistent storage
Optimizing storage

1.2.10. Configuring Alertmanager

The Prometheus Alertmanager is a component that manages incoming alerts, including:

Alert silencing
Alert inhibition
Alert aggregation
Reliable deduplication of alerts
Grouping alerts
Sending grouped alerts as notifications through receivers such as email, PagerDuty, and HipChat

1.2.10.1. Alertmanager default configuration

The default configuration of the OpenShift Container Platform Monitoring Alertmanager cluster is this:

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
receivers:
- name: default
- name: watchdog

OpenShift Container Platform monitoring ships with the Watchdog alert, which fires continuously. Alertmanager repeatedly sends notifications for the Watchdog alert to the notification provider, for example, to PagerDuty. The provider is usually configured to notify the administrator when it stops receiving the Watchdog alert. This mechanism helps ensure continuous operation of Prometheus as well as continuous communication between Alertmanager and the notification provider.
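
The provider-side check described here works like a dead man's switch: silence, not a message, is the trigger. The following Python sketch shows the idea only. The function name watchdog_is_silent and the 15-minute threshold are assumptions chosen for illustration (three times the 5m repeat_interval in the default configuration); real notification providers implement this check in their own way.

```python
from datetime import datetime, timedelta


def watchdog_is_silent(last_received, now, max_silence=timedelta(minutes=15)):
    """Return True if no Watchdog notification arrived within max_silence."""
    return now - last_received > max_silence


now = datetime(2020, 1, 1, 12, 0)
# The last notification arrived 5 minutes ago: the pipeline looks healthy.
print(watchdog_is_silent(now - timedelta(minutes=5), now))
# The last notification arrived 40 minutes ago: notify the administrator.
print(watchdog_is_silent(now - timedelta(minutes=40), now))
```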

1.2.10.2. Applying custom Alertmanager configuration

You can overwrite the default Alertmanager configuration by editing the alertmanager-main secret inside the openshift-monitoring namespace.

Prerequisites

An installed jq tool for processing JSON data

Procedure

Save the currently active Alertmanager configuration to the file alertmanager.yaml:

oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml

Edit the file alertmanager.yaml so that it contains your new configuration:

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
  - match:
      service: <your_service> 
    routes:
    - match:
        <your_matching_rules> 
      receiver: <receiver> 
receivers:
- name: default
- name: watchdog
- name: <receiver>
  <receiver_configuration>

service specifies the service that fires the alerts.
<your_matching_rules> specifies the target alerts.
receiver specifies the receiver to use for the alert.

For example, this listing configures PagerDuty for notifications:

global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m
    receiver: watchdog
  - match:
      service: example-app
    routes:
    - match:
        severity: critical
      receiver: team-frontend-page
receivers:
- name: default
- name: watchdog
- name: team-frontend-page
  pagerduty_configs:
  - service_key: "your-key"

With this configuration, alerts of critical severity fired by the example-app service are sent using the team-frontend-page receiver, which means that these alerts are paged to a chosen person.
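
To make the routing in this example easier to follow, the following Python sketch walks the same route tree and returns the receiver for a set of alert labels. It is a simplified model, not Alertmanager code: it assumes that the first matching child route wins and that a route without a receiver inherits its parent's receiver, and it ignores continue, match_re, grouping, and timing. The function name pick_receiver is hypothetical.

```python
# The route tree from the PagerDuty example, written as nested dicts.
ROUTE = {
    "receiver": "default",
    "routes": [
        {"match": {"alertname": "Watchdog"}, "receiver": "watchdog"},
        {
            "match": {"service": "example-app"},
            "routes": [
                {"match": {"severity": "critical"},
                 "receiver": "team-frontend-page"},
            ],
        },
    ],
}


def pick_receiver(route, labels, inherited=None):
    """Return the receiver of the deepest route that matches the labels."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(labels.get(k) == v for k, v in matchers.items()):
            # First matching child wins in this simplified model.
            return pick_receiver(child, labels, receiver)
    return receiver


print(pick_receiver(ROUTE, {"service": "example-app", "severity": "critical"}))
print(pick_receiver(ROUTE, {"service": "example-app", "severity": "warning"}))
```

In this model, a warning from example-app matches the service route but none of its child routes, so it falls back to the inherited default receiver.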

Apply the new configuration in the file:

oc -n openshift-monitoring create secret generic alertmanager-main --from-file=alertmanager.yaml --dry-run -o=yaml |  oc -n openshift-monitoring replace secret --filename=-

Additional resources

See the PagerDuty official site for more information on PagerDuty.
See the PagerDuty Prometheus Integration Guide to learn how to retrieve the service_key.
See Alertmanager configuration for configuring alerting through different alert receivers.

1.2.10.3. Alerting rules

OpenShift Container Platform Cluster Monitoring by default ships with a set of pre-defined alerting rules.

Note that:

The default alerting rules are used specifically for the OpenShift Container Platform cluster and nothing else. For example, you get alerts for a persistent volume in the cluster, but you do not get them for a persistent volume in your custom namespace.
Currently you cannot add custom alerting rules.
Some alerting rules have identical names. This is intentional. They are sending alerts about the same event with different thresholds, with different severity, or both.
With the inhibition rules, the lower severity is inhibited when the higher severity is firing.

1.2.10.4. Listing acting alerting rules

You can list the alerting rules that currently apply to the cluster.

Procedure

Configure the necessary port forwarding:

oc -n openshift-monitoring port-forward svc/prometheus-operated 9090

Fetch the JSON object containing acting alerting rules and their properties:

curl -s http://localhost:9090/api/v1/rules | jq '[.data.groups[].rules[] | select(.type=="alerting")]'

Example output

[
  {
    "name": "ClusterOperatorDown",
    "query": "cluster_operator_up{job=\"cluster-version-operator\"} == 0",
    "duration": 600,
    "labels": {
      "severity": "critical"
    },
    "annotations": {
      "message": "Cluster operator {{ $labels.name }} has not been available for 10 mins. Operator may be down or disabled, cluster will not be kept up to date and upgrades will not be possible."
    },
    "alerts": [],
    "health": "ok",
    "type": "alerting"
  },
  {
    "name": "ClusterOperatorDegraded",
    ...
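
If jq is not available, the same filter can be applied in Python once the response has been parsed as JSON. This sketch mirrors the jq expression used in this procedure; the function name alerting_rules is hypothetical, and the sample response is a trimmed, made-up structure that contains only the fields the filter touches.

```python
def alerting_rules(response):
    """Keep only the rules whose type is "alerting", as the jq filter does."""
    return [
        rule
        for group in response["data"]["groups"]
        for rule in group["rules"]
        if rule.get("type") == "alerting"
    ]


# Hypothetical, trimmed response with one alerting and one recording rule.
sample = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "example-group",
                "rules": [
                    {"name": "ClusterOperatorDown", "type": "alerting"},
                    {"name": "example:recording_rule", "type": "recording"},
                ],
            }
        ]
    },
}

print([rule["name"] for rule in alerting_rules(sample)])
```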

Additional resources

See also the Alertmanager documentation.

1.2.11. Attaching additional labels to your time series and alerts

Using the external labels feature of Prometheus, you can attach additional custom labels to all time series and alerts leaving the Prometheus cluster.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.
You have installed the OpenShift CLI (oc).
You have created the cluster-monitoring-config ConfigMap object.

Procedure

Start editing the cluster-monitoring-config ConfigMap:

oc -n openshift-monitoring edit configmap cluster-monitoring-config

Define a map of labels you want to add for every metric under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        <key>: <value>

Substitute <key>: <value> with a map of key-value pairs, where <key> is a unique name for the new label and <value> is its value.

Warning

Do not use prometheus or prometheus_replica as key names, because they are reserved and would be overwritten.
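
A quick check for the reserved names can catch this mistake before you save the config map. A minimal Python sketch; the function name reserved_keys_used is hypothetical, and the reserved set contains only the two names given in the warning.

```python
# Label names that the warning identifies as reserved.
RESERVED_LABELS = {"prometheus", "prometheus_replica"}


def reserved_keys_used(external_labels):
    """Return the reserved label names that appear in external_labels."""
    return sorted(RESERVED_LABELS & set(external_labels))


print(reserved_keys_used({"region": "eu", "environment": "prod"}))
print(reserved_keys_used({"region": "eu", "prometheus": "main"}))
```

An empty list means that none of the labels collide with the reserved names.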

For example, to add metadata about the region and environment to all time series and alerts, use:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        region: eu
        environment: prod

Save the file to apply the changes. The new configuration is applied automatically.

Additional resources

See Creating a cluster monitoring config map to learn how to create the cluster-monitoring-config ConfigMap object.

1.2.12. Next steps

Manage cluster alerts.
Learn about remote health reporting and, if necessary, opt out of it.
