Chapter 2. Getting started

2.1. Maintenance and support for monitoring

Not all configuration options for the monitoring stack are exposed. The only supported way of configuring OpenShift Container Platform monitoring is by configuring the Cluster Monitoring Operator (CMO) using the options described in the Config map reference for the Cluster Monitoring Operator. Do not use other configurations, as they are unsupported.

Configuration paradigms might change across Prometheus releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in the Config map reference for the Cluster Monitoring Operator, your changes will be lost: by design, the CMO automatically reconciles any differences and resets unsupported changes to the originally defined state.
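
To illustrate the reconciliation behavior described above, the following minimal Python sketch models a single reconcile pass. It is a toy model under stated assumptions, not the CMO implementation: the reconcile function and the replicas and retention keys are hypothetical names chosen only for illustration.

```python
_MISSING = object()  # sentinel for keys absent from the observed state


def reconcile(desired: dict, observed: dict) -> tuple[dict, list[str]]:
    """Return the reconciled state and the managed keys that had drifted."""
    drifted = sorted(
        key for key, value in desired.items()
        if observed.get(key, _MISSING) != value
    )
    # Managed keys are overwritten with their defined values.
    reconciled = {**observed, **desired}
    return reconciled, drifted


# Hypothetical states: "replicas" was changed outside the supported config.
desired_state = {"replicas": 2, "retention": "15d"}
observed_state = {"replicas": 5, "retention": "15d"}

state, reset_keys = reconcile(desired_state, observed_state)
print(state, reset_keys)
```

In this sketch, the manual change to replicas is discarded on the next pass, which mirrors why unsupported changes do not persist.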

2.1.1. Support considerations for monitoring

Note

Backward compatibility for metrics, recording rules, or alerting rules is not guaranteed.

The following modifications are explicitly not supported:

- Creating additional ServiceMonitor, PodMonitor, and PrometheusRule objects in the openshift-* and kube-* projects.
- Modifying any resources or objects deployed in the openshift-monitoring or openshift-user-workload-monitoring projects. The resources created by the OpenShift Container Platform monitoring stack are not meant to be used by any other resources, as there are no guarantees about their backward compatibility.
  Note
  The Alertmanager configuration is deployed as the alertmanager-main secret resource in the openshift-monitoring namespace. If you have enabled a separate Alertmanager instance for user-defined alert routing, an Alertmanager configuration is also deployed as the alertmanager-user-workload secret resource in the openshift-user-workload-monitoring namespace. To configure additional routes for any instance of Alertmanager, you need to decode, modify, and then encode that secret. This procedure is a supported exception to the preceding statement.
- Modifying resources of the stack. The OpenShift Container Platform monitoring stack ensures its resources are always in the state it expects them to be. If they are modified, the stack will reset them.
- Deploying user-defined workloads to openshift-* and kube-* projects. These projects are reserved for Red Hat-provided components and should not be used for user-defined workloads.
- Enabling symptom-based monitoring by using the Probe custom resource definition (CRD) in Prometheus Operator.
- Manually deploying monitoring resources into namespaces that have the openshift.io/cluster-monitoring: "true" label.
- Adding the openshift.io/cluster-monitoring: "true" label to namespaces. This label is reserved for namespaces with core OpenShift Container Platform components and Red Hat certified components.
- Installing custom Prometheus instances on OpenShift Container Platform. A custom instance is a Prometheus custom resource (CR) managed by the Prometheus Operator.
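
The decode, modify, and encode procedure named in the note in the preceding list can be sketched in Python as follows. This is a minimal illustration, assuming that the secret stores the configuration as a base64-encoded value, as Kubernetes Secret data values are. The sample configuration text, the helper function names, and the appended group_by setting are hypothetical; in practice you would edit the decoded YAML with an editor or a YAML library rather than by appending text.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value from a Secret's data map."""
    return base64.b64decode(encoded).decode("utf-8")


def encode_secret_value(plain: str) -> str:
    """Encode text back into the base64 form that a Secret's data map expects."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


# Hypothetical stand-in for the stored Alertmanager configuration;
# this is not a real cluster configuration.
stored_value = encode_secret_value("route:\n  receiver: default\n")

# Step 1: decode the stored value into readable YAML text.
config_text = decode_secret_value(stored_value)

# Step 2: modify the text (illustrative edit only).
edited_text = config_text + "  group_by:\n  - namespace\n"

# Step 3: encode the result so that it can be written back to the secret.
updated_value = encode_secret_value(edited_text)
print(updated_value)
```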

2.1.2. Support policy for monitoring Operators

Monitoring Operators ensure that OpenShift Container Platform monitoring resources function as designed and tested. If Cluster Version Operator (CVO) control of an Operator is overridden, the Operator does not respond to configuration changes, reconcile the intended state of cluster objects, or receive updates.

While overriding CVO control for an Operator can be helpful during debugging, this is unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.

Overriding the Cluster Version Operator

The spec.overrides parameter can be added to the configuration for the CVO to allow administrators to provide a list of overrides to the behavior of the CVO for a component. Setting the spec.overrides[].unmanaged parameter to true for a component blocks cluster upgrades and alerts the administrator after a CVO override has been set:

Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.

Warning

Setting a CVO override puts the entire cluster in an unsupported state and prevents the monitoring stack from being reconciled to its intended state. This impacts the reliability features built into Operators and prevents updates from being received. Reported issues must be reproduced after removing any overrides for support to proceed.
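
Before you report an issue, it can help to confirm that no overrides remain. The following Python sketch lists the entries of spec.overrides that are marked unmanaged in a ClusterVersion object represented as a dictionary, for example as parsed from the JSON output of oc get clusterversion version -o json. The sample object is hypothetical, and the group, kind, namespace, and name fields shown alongside unmanaged are assumptions about the shape of an override entry.

```python
from typing import Any


def unmanaged_overrides(cluster_version: dict[str, Any]) -> list[str]:
    """Return "kind/namespace/name" for every override marked unmanaged."""
    overrides = cluster_version.get("spec", {}).get("overrides") or []
    return [
        f'{o.get("kind")}/{o.get("namespace")}/{o.get("name")}'
        for o in overrides
        if o.get("unmanaged") is True
    ]


# Hypothetical ClusterVersion excerpt, shaped like the JSON form of the object.
sample = {
    "spec": {
        "overrides": [
            {
                "group": "apps",
                "kind": "Deployment",
                "namespace": "openshift-monitoring",
                "name": "cluster-monitoring-operator",
                "unmanaged": True,
            }
        ]
    }
}

print(unmanaged_overrides(sample))
```

An empty result suggests that no component is currently excluded from CVO control.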

2.1.3. Support version matrix for monitoring components

The following matrix contains information about versions of monitoring components for OpenShift Container Platform 4.12 and later releases:

Table 2.1. OpenShift Container Platform and component versions
OpenShift Container Platform	Prometheus Operator	Prometheus	Metrics Server	Alertmanager	kube-state-metrics agent	monitoring-plugin	node-exporter agent	Thanos
4.18	0.78.1	2.55.1	0.7.2	0.27.0	2.13.0	1.0.0	1.8.2	0.36.1
4.17	0.75.2	2.53.1	0.7.1	0.27.0	2.13.0	1.0.0	1.8.2	0.35.1
4.16	0.73.2	2.52.0	0.7.1	0.26.0	2.12.0	1.0.0	1.8.0	0.35.0
4.15	0.70.0	2.48.0	0.6.4	0.26.0	2.10.1	1.0.0	1.7.0	0.32.5
4.14	0.67.1	2.46.0	N/A	0.25.0	2.9.2	1.0.0	1.6.1	0.30.2
4.13	0.63.0	2.42.0	N/A	0.25.0	2.8.1	N/A	1.5.0	0.30.2
4.12	0.60.1	2.39.1	N/A	0.24.0	2.6.0	N/A	1.4.0	0.28.1

Note

The openshift-state-metrics agent and Telemeter Client are OpenShift-specific components. Therefore, their versions correspond with the versions of OpenShift Container Platform.

2.2. Core platform monitoring first steps

After OpenShift Container Platform is installed, core platform monitoring components immediately begin collecting metrics, which you can query and view. The default in-cluster monitoring stack includes the core platform Prometheus instance that collects metrics from your cluster and the core Alertmanager instance that routes alerts, among other components. As a cluster administrator, you can further configure these monitoring components to suit the needs of different users, depending on who will use the monitoring stack and for what purposes.

2.2.1. Configuring core platform monitoring: Postinstallation steps

After OpenShift Container Platform is installed, cluster administrators typically configure core platform monitoring to suit their needs. These activities include setting up storage and configuring options for Prometheus, Alertmanager, and other monitoring components.

Note

By default, in a newly installed OpenShift Container Platform system, users can query and view collected metrics. The only configuration you need is an alert receiver, and only if you want users to receive alert notifications. All other configuration options listed here are optional.

- Create the cluster-monitoring-config ConfigMap object if it does not exist.
- Configure notifications for default platform alerts so that Alertmanager can send alerts to an external notification system such as email, Slack, or PagerDuty.
- For shorter-term data retention, configure persistent storage for Prometheus and Alertmanager to store metrics and alert data. Specify the metrics data retention parameters for Prometheus and Thanos Ruler.
  Important
  - In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
  - By default, in a newly installed OpenShift Container Platform system, the monitoring ClusterOperator resource reports a PrometheusDataPersistenceNotConfigured status message to remind you that storage is not configured.
- For longer-term data retention, configure the remote write feature to enable Prometheus to send ingested metrics to remote systems for storage.
  Important
  Be sure to add cluster ID labels to metrics for use with your remote write storage configuration.
- Grant monitoring cluster roles to any non-administrator users that need to access certain monitoring features.
- Assign tolerations to monitoring stack components so that administrators can move them to tainted nodes.
- Set the body size limit for metrics collection to help avoid situations in which Prometheus consumes excessive amounts of memory when scraped targets return a response that contains a large amount of data.
- Modify or create alerting rules for your cluster. These rules specify the conditions that trigger alerts, such as high CPU or memory usage or high network latency.
- Specify resource limits and requests for monitoring components to ensure that the containers that run monitoring components have enough CPU and memory resources.
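
As an illustration of the first activity in the preceding list, the following Python sketch builds a cluster-monitoring-config ConfigMap manifest and prints it as JSON, a format that oc apply -f accepts. It is a minimal sketch under stated assumptions: the openshift-monitoring namespace, the config.yaml data key, and the prometheusK8s retention setting are not defined in this section, and the retention value is illustrative only. Check the Config map reference for the Cluster Monitoring Operator for the supported fields.

```python
import json

# Illustrative settings only (assumed field names and value).
CONFIG_YAML = """\
prometheusK8s:
  retention: 24h
"""


def cluster_monitoring_config(config_yaml: str) -> dict:
    """Build the cluster-monitoring-config ConfigMap as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",  # assumed namespace
        },
        # The CMO settings are carried as one YAML document in a single key.
        "data": {"config.yaml": config_yaml},
    }


manifest = cluster_monitoring_config(CONFIG_YAML)
print(json.dumps(manifest, indent=2))
```

Keeping all settings in one YAML document under a single key means that later options, such as storage or tolerations, are added to the same object rather than to new ones.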

With the monitoring stack configured to suit your needs, Prometheus collects metrics from the specified services and stores these metrics according to your settings. You can go to the Observe pages in the OpenShift Container Platform web console to view and query collected metrics, manage alerts, identify performance bottlenecks, and scale resources as needed:

- View dashboards to visualize collected metrics, troubleshoot alerts, and monitor other information about your cluster.
- Query collected metrics by creating PromQL queries or using predefined queries.
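
Outside the web console, PromQL queries can also be sent to a Prometheus-compatible HTTP API. The following Python sketch only constructs the request URL for the instant-query endpoint, /api/v1/query, of the Prometheus HTTP API. The host name is hypothetical, the example query is an arbitrary illustration, and the authentication that a cluster endpoint typically requires is not shown.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Hypothetical host; substitute the query endpoint that your cluster exposes.
BASE_URL = "https://metrics.example.com"
QUERY = "count by (job) (up == 1)"

url = instant_query_url(BASE_URL, QUERY)
print(url)
```

URL-encoding the expression matters because PromQL commonly contains spaces, parentheses, and braces that are not valid in a raw query string.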

2.3. User workload monitoring first steps

As a cluster administrator, you can optionally enable monitoring for user-defined projects in addition to core platform monitoring. Non-administrator users such as developers can then monitor their own projects outside of core platform monitoring.

Cluster administrators typically complete the following activities to configure user-defined projects so that users can view collected metrics, query these metrics, and receive alerts for their own projects:

- Enable user workload monitoring.
- Grant non-administrator users permissions to monitor user-defined projects by assigning the monitoring-rules-view, monitoring-rules-edit, or monitoring-edit cluster roles.
- Assign the user-workload-monitoring-config-edit role to grant non-administrator users permission to configure user-defined projects.
- Enable alert routing for user-defined projects so that developers and other users can configure custom alerts and alert routing for their projects.
- If needed, configure alert routing for user-defined projects to use an optional Alertmanager instance dedicated for use only by user-defined projects.
- Configure notifications for user-defined alerts.
- If you use the platform Alertmanager instance for user-defined alert routing, configure different alert receivers for default platform alerts and user-defined alerts.
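
One way to assign one of the cluster roles named in the preceding list to a user for a single project is a RoleBinding object. The following Python sketch builds such a manifest with the standard Kubernetes RBAC fields and prints it as JSON. The user name alice, the project name ns1, the generated binding name, and the helper function are hypothetical.

```python
import json

# Cluster roles named in this section for monitoring user-defined projects.
MONITORING_ROLES = {"monitoring-rules-view", "monitoring-rules-edit", "monitoring-edit"}


def monitoring_role_binding(user: str, namespace: str, cluster_role: str) -> dict:
    """Build a RoleBinding that grants a monitoring cluster role in one project."""
    if cluster_role not in MONITORING_ROLES:
        raise ValueError(f"not a monitoring cluster role: {cluster_role}")
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{cluster_role}-{user}", "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"apiGroup": "rbac.authorization.k8s.io", "kind": "User", "name": user}
        ],
    }


# Hypothetical user and project names.
binding = monitoring_role_binding("alice", "ns1", "monitoring-edit")
print(json.dumps(binding, indent=2))
```

Binding the cluster role through a namespaced RoleBinding limits the permission to that one project instead of granting it across the cluster.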

2.4. Developer and non-administrator steps

After monitoring for user-defined projects is enabled and configured, developers and other non-administrator users can then perform the following activities to set up and use monitoring for their own projects:

- Deploy and monitor services.
- Create and manage alerting rules.
- Receive and manage alerts for your projects.
- If granted the alert-routing-edit cluster role, configure alert routing.
- View dashboards by using the OpenShift Container Platform web console.
- Query the collected metrics by creating PromQL queries or using predefined queries.
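
As an illustration of creating an alerting rule for a user-defined project, the following Python sketch builds a PrometheusRule manifest and prints it as JSON. It is a minimal sketch, not a recommended rule: the project name ns1, the rule and alert names, the example-app job label, and the threshold are all hypothetical.

```python
import json


def alerting_rule(namespace: str, name: str, alert: str, expr: str,
                  duration: str, severity: str) -> dict:
    """Build a PrometheusRule object that contains a single alerting rule."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "groups": [
                {
                    "name": name,
                    "rules": [
                        {
                            "alert": alert,
                            "expr": expr,  # PromQL condition that triggers the alert
                            "for": duration,  # how long the condition must hold
                            "labels": {"severity": severity},
                        }
                    ],
                }
            ]
        },
    }


# Hypothetical rule: fire when the example-app target has been down for 5 minutes.
rule = alerting_rule(
    namespace="ns1",
    name="example-alert",
    alert="ExampleTargetDown",
    expr='up{job="example-app"} == 0',
    duration="5m",
    severity="warning",
)
print(json.dumps(rule, indent=2))
```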
