Chapter 2. Configuring performance and scalability for core platform monitoring
You can configure the monitoring stack to optimize the performance and scale of your clusters. The following documentation provides information about how to distribute the monitoring components and control the impact of the monitoring stack on CPU and memory resources.
2.1. Controlling the placement and distribution of monitoring components
Control the placement and distribution of monitoring components across cluster nodes to optimize system resource use, improve performance, and separate workloads based on specific requirements or policies.
You can move the monitoring stack components to specific nodes with the following methods:
- Use the `nodeSelector` constraint with labeled nodes to move any of the monitoring stack components to specific nodes.
- Assign tolerations to enable moving components to tainted nodes.
2.1.1. Moving monitoring components to different nodes
Move monitoring stack components to specific nodes to optimize performance or meet hardware requirements by configuring `nodeSelector` constraints in the `cluster-monitoring-config` config map to match labels assigned to the nodes.
You cannot add a node selector constraint directly to an existing scheduled pod.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role.
- You have created the `cluster-monitoring-config` `ConfigMap` object.
- You have installed the OpenShift CLI (`oc`).
Procedure
- If you have not done so yet, add a label to the nodes on which you want to run the monitoring components:

  ```terminal
  $ oc label nodes <node_name> <node_label>
  ```

  where:

  - `<node_name>`: Specifies the name of the node where you want to add the label.
  - `<node_label>`: Specifies the label that you want to add.
- Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` project:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Specify the node labels for the `nodeSelector` constraint for the component under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      # ...
      <component>:
        nodeSelector:
          <node_label_1>
          <node_label_2>
      # ...
  ```

  where:

  - `<component>`: Specifies the monitoring stack component.
  - `<node_label_1>`: Specifies the label you added to the node.
  - `<node_label_2>`: Optional: Specifies additional labels. If you specify additional labels, the pods for the component are only scheduled on the nodes that contain all of the specified labels.

  Note: If monitoring components remain in a `Pending` state after configuring the `nodeSelector` constraint, check the pod events for errors relating to taints and tolerations.

- Save the file to apply the changes. The components specified in the new configuration are automatically moved to the new nodes, and the pods affected by the new configuration are redeployed.
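As a worked sketch of this procedure, assume you labeled two worker nodes with hypothetical `monitoring=prometheus` and `region=east` labels, for example with `oc label nodes worker-1 monitoring=prometheus region=east`. The following config map fragment would then schedule the core Prometheus pods only on nodes that carry both labels:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      nodeSelector:
        # Hypothetical labels for illustration; both must be
        # present on a node for the pods to be scheduled there.
        monitoring: prometheus
        region: east
```

Because two labels are listed, this also illustrates the behavior described above: the constraint is the intersection of all specified labels, not a choice between them.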
2.1.2. Assigning tolerations to monitoring components
You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role.
- You have created the `cluster-monitoring-config` `ConfigMap` object.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `cluster-monitoring-config` config map in the `openshift-monitoring` project:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Specify `tolerations` for the component:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      <component>:
        tolerations:
          <toleration_specification>
  ```

  Substitute `<component>` and `<toleration_specification>` accordingly.

  For example, `oc adm taint nodes node1 key1=value1:NoSchedule` adds a taint to `node1` with the key `key1` and the value `value1`. This prevents monitoring components from deploying pods on `node1` unless a toleration is configured for that taint. The following example configures the `alertmanagerMain` component to tolerate the example taint:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      alertmanagerMain:
        tolerations:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"
  ```

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
2.2. Setting the body size limit for metrics scraping
By default, no limit exists for the uncompressed body size for data returned from scraped metrics targets. You can set a body size limit to help avoid situations in which Prometheus consumes excessive amounts of memory when scraped targets return a response that contains a large amount of data.
In addition, by setting a body size limit, you can reduce the impact that a malicious target might have on Prometheus and on the cluster as a whole.
After you set a value for `enforcedBodySizeLimit`, the alert `PrometheusScrapeBodySizeLimitHit` fires when at least one Prometheus scrape target replies with a response body larger than the configured value.
If metrics data scraped from a target has an uncompressed body size exceeding the configured size limit, the scrape fails. Prometheus then considers this target to be down and sets its `up` metric value to `0`, which can trigger the `TargetDown` alert.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` namespace:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Add a value for `enforcedBodySizeLimit` to `data/config.yaml/prometheusK8s` to limit the body size that can be accepted per target scrape:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |-
      prometheusK8s:
        enforcedBodySizeLimit: 40MB
  ```

  where:

  - `enforcedBodySizeLimit`: Defines the maximum body size for scraped metrics targets. This example limits the uncompressed size per target scrape to 40 megabytes.

    Valid numeric values use the Prometheus data size format: B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), and EB (exabytes).

    The default value is `0`, which specifies no limit. You can also set the value to `automatic` to calculate the limit automatically based on cluster capacity.

- Save the file to apply the changes. The new configuration is applied automatically.
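As a variation on the fixed 40MB limit shown in the procedure, the following sketch uses the `automatic` value described above, so that the limit is calculated from cluster capacity rather than pinned to a fixed size:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      # Let the cluster derive the limit from its capacity
      # instead of enforcing a fixed size such as 40MB.
      enforcedBodySizeLimit: automatic
```

This trades a predictable, explicitly chosen ceiling for one that scales with the cluster, which can be preferable when cluster sizes vary widely across environments.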
2.3. Managing CPU and memory resources for monitoring components
Ensure that the containers that run monitoring components have enough CPU and memory resources by specifying values for resource limits and requests for those components.
Configure these limits and requests for core platform monitoring components in the openshift-monitoring namespace.
2.3.1. Specifying limits and requests
Prevent resource exhaustion and ensure stable monitoring operations by setting appropriate CPU and memory limits for each monitoring component in the cluster-monitoring-config config map in the openshift-monitoring namespace.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role.
- You have created the `ConfigMap` object named `cluster-monitoring-config`.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `cluster-monitoring-config` config map in the `openshift-monitoring` project:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Add values to define resource limits and requests for each component you want to configure.

  Important: Ensure that the value set for a limit is always higher than the value set for a request. Otherwise, an error will occur, and the container will not run.

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      alertmanagerMain:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      prometheusK8s:
        resources:
          limits:
            cpu: 500m
            memory: 3Gi
          requests:
            cpu: 200m
            memory: 500Mi
      thanosQuerier:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      prometheusOperator:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      metricsServer:
        resources:
          requests:
            cpu: 10m
            memory: 50Mi
          limits:
            cpu: 50m
            memory: 500Mi
      kubeStateMetrics:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      telemeterClient:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      openshiftStateMetrics:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      nodeExporter:
        resources:
          limits:
            cpu: 50m
            memory: 150Mi
          requests:
            cpu: 20m
            memory: 50Mi
      monitoringPlugin:
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 500Mi
      prometheusOperatorAdmissionWebhook:
        resources:
          limits:
            cpu: 50m
            memory: 100Mi
          requests:
            cpu: 20m
            memory: 50Mi
  ```

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
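You do not need to configure all components at once; only the components you list are changed. A minimal sketch, with illustrative values chosen to keep each limit above its request, that adjusts only the core Prometheus instance could look like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      resources:
        # Illustrative values only; size these based on the
        # observed usage of Prometheus in your cluster.
        limits:
          cpu: "1"
          memory: 4Gi
        requests:
          cpu: 500m
          memory: 1Gi
```

Components that are not listed keep their current resource configuration.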
2.4. Choosing a metrics collection profile
Metrics collection profile is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Choose a metrics collection profile for core OpenShift Container Platform monitoring components to balance monitoring coverage with resource consumption by editing the cluster-monitoring-config config map in the openshift-monitoring project.
Prerequisites
- You have installed the OpenShift CLI (`oc`).
- You have enabled Technology Preview features by using the `FeatureGate` custom resource (CR).
- You have created the `cluster-monitoring-config` `ConfigMap` object.
- You have access to the cluster as a user with the `cluster-admin` cluster role.
Procedure
- Edit the `cluster-monitoring-config` `ConfigMap` object in the `openshift-monitoring` project:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Add the metrics collection profile setting under `data/config.yaml/prometheusK8s`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      prometheusK8s:
        collectionProfile: <metrics_collection_profile_name>
  ```

  where:

  - `<metrics_collection_profile_name>`: Specifies the name of the metrics collection profile. The available values are `full` or `minimal`. If you do not specify a value or if the `collectionProfile` key name does not exist in the config map, the default setting of `full` is used.

  The following example sets the metrics collection profile to `minimal` for the core platform instance of Prometheus:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      prometheusK8s:
        collectionProfile: minimal
  ```

- Save the file to apply the changes. The new configuration is applied automatically.
2.5. Configuring pod topology spread constraints
You can configure pod topology spread constraints for all the pods deployed by the Cluster Monitoring Operator to control how pod replicas are scheduled to nodes across zones.
This ensures that the pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.
You can configure pod topology spread constraints for monitoring pods by using the cluster-monitoring-config config map.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role.
- You have created the `cluster-monitoring-config` `ConfigMap` object.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `cluster-monitoring-config` config map in the `openshift-monitoring` project:

  ```terminal
  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  ```

- Add the following settings under the `data/config.yaml` field to configure pod topology spread constraints:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      <component>:
        topologySpreadConstraints:
        - maxSkew: <n>
          topologyKey: <key>
          whenUnsatisfiable: <value>
          labelSelector: <match_option>
  ```

  where:

  - `<component>`: Specifies the name of the component for which you want to set up pod topology spread constraints.
  - `<n>`: Specifies a numeric value for `maxSkew`, which defines the degree to which pods are allowed to be unevenly distributed.
  - `<key>`: Specifies a key of node labels for `topologyKey`. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to put a balanced number of pods into each domain.
  - `<value>`: Specifies a value for `whenUnsatisfiable`. Available options are `DoNotSchedule` and `ScheduleAnyway`. Specify `DoNotSchedule` if you want the `maxSkew` value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify `ScheduleAnyway` if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
  - `<match_option>`: Specifies `labelSelector` to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain.

  Example configuration for Prometheus:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      prometheusK8s:
        topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: monitoring
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
  ```

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
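To spread pods across availability zones, as described at the start of this section, you can use the well-known Kubernetes node label `topology.kubernetes.io/zone` as the topology key. The following sketch spreads the Alertmanager replicas across zones, assuming the Alertmanager pods carry the `app.kubernetes.io/name: alertmanager` label:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      topologySpreadConstraints:
      - maxSkew: 1
        # Standard zone label set by the cloud provider or installer.
        topologyKey: topology.kubernetes.io/zone
        # Prefer an even spread, but still schedule if it
        # cannot be satisfied (for example, in a single zone).
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: alertmanager
```

Using `ScheduleAnyway` here keeps Alertmanager schedulable in single-zone clusters, whereas `DoNotSchedule` would leave pods pending when the skew cannot be kept within `maxSkew`.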