Chapter 2. Configuring the monitoring stack
The OpenShift Container Platform 4 installation program provides only a limited number of configuration options before installation. Configuring most OpenShift Container Platform framework components, including the cluster monitoring stack, happens after installation.
This section explains what configuration is supported, shows how to configure the monitoring stack, and demonstrates several common configuration scenarios.
2.1. Prerequisites
- The monitoring stack imposes additional resource requirements. Consult the computing resources recommendations in Scaling the Cluster Monitoring Operator and verify that you have sufficient resources.
2.2. Maintenance and support for monitoring
The supported way of configuring OpenShift Container Platform Monitoring is by using the options described in this document. Do not use other configurations, as they are unsupported. Configuration paradigms might change across Prometheus releases, and such cases can only be handled gracefully if all configuration possibilities are controlled. If you use configurations other than those described in this section, your changes will disappear, because the cluster-monitoring-operator reconciles any differences. The Operator resets everything to the defined state by default and by design.
2.2.1. Support considerations for monitoring
The following modifications are explicitly not supported:
- Creating additional ServiceMonitor, PodMonitor, and PrometheusRule objects in the openshift-* and kube-* projects.
- Modifying any resources or objects deployed in the openshift-monitoring or openshift-user-workload-monitoring projects. The resources created by the OpenShift Container Platform monitoring stack are not meant to be used by any other resources, as there are no guarantees about their backward compatibility.

  Note: The Alertmanager configuration is deployed as a secret resource in the openshift-monitoring namespace. If you have enabled a separate Alertmanager instance for user-defined alert routing, an Alertmanager configuration is also deployed as a secret resource in the openshift-user-workload-monitoring namespace. To configure additional routes for any instance of Alertmanager, you need to decode, modify, and then encode that secret. This procedure is a supported exception to the preceding statement.
- Modifying resources of the stack. The OpenShift Container Platform monitoring stack ensures its resources are always in the state it expects them to be. If they are modified, the stack resets them.
- Deploying user-defined workloads to the openshift-* and kube-* projects. These projects are reserved for Red Hat provided components and should not be used for user-defined workloads.
- Installing custom Prometheus instances on OpenShift Container Platform. A custom instance is a Prometheus custom resource (CR) managed by the Prometheus Operator.
- Enabling symptom-based monitoring by using the Probe custom resource definition (CRD) in Prometheus Operator.

Backward compatibility for metrics, recording rules, or alerting rules is not guaranteed.
2.2.2. Support policy for monitoring Operators
Monitoring Operators ensure that OpenShift Container Platform monitoring resources function as designed and tested. If Cluster Version Operator (CVO) control of an Operator is overridden, the Operator does not respond to configuration changes, reconcile the intended state of cluster objects, or receive updates.
While overriding CVO control for an Operator can be helpful during debugging, this is unsupported and the cluster administrator assumes full control of the individual component configurations and upgrades.
Overriding the Cluster Version Operator

The spec.overrides parameter can be added to the configuration for the CVO to allow administrators to provide a list of overrides to the behavior of the CVO for a component. Setting the spec.overrides[].unmanaged parameter to true for a component blocks cluster upgrades and alerts the administrator after a CVO override has been set:

Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.

Setting a CVO override puts the entire cluster in an unsupported state and prevents the monitoring stack from being reconciled to its intended state. This impacts the reliability features built into Operators and prevents updates from being received. Reported issues must be reproduced after removing any overrides for support to proceed.
2.3. Preparing to configure the monitoring stack
You can configure the monitoring stack by creating and updating monitoring config maps.
2.3.1. Creating a cluster monitoring config map
To configure core OpenShift Container Platform monitoring components, you must create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.

Note: When you save your changes to the cluster-monitoring-config ConfigMap object, some or all of the pods in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also be restarted.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the OpenShift CLI (oc).
Procedure
1. Check whether the cluster-monitoring-config ConfigMap object exists:

   $ oc -n openshift-monitoring get configmap cluster-monitoring-config

2. If the ConfigMap object does not exist:

   a. Create the following YAML manifest. In this example the file is called cluster-monitoring-config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |

   b. Apply the configuration to create the ConfigMap object:

      $ oc apply -f cluster-monitoring-config.yaml
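The manifest above can also be generated programmatically. The following sketch is not part of the product; it is an illustrative Python example that builds the same ConfigMap as a JSON document, which oc apply accepts just like YAML, so no YAML library is required:

```python
import json

def cluster_monitoring_config(config_yaml: str = "") -> str:
    """Build the cluster-monitoring-config ConfigMap as a JSON manifest.

    oc apply accepts JSON manifests as well as YAML, so the Python
    standard library json module is enough here.
    """
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": "cluster-monitoring-config",
            "namespace": "openshift-monitoring",
        },
        "data": {"config.yaml": config_yaml},
    }
    return json.dumps(manifest, indent=2)

if __name__ == "__main__":
    # Write the manifest to a file that oc apply -f can consume.
    print(cluster_monitoring_config())
```

You could pipe the output to oc apply -f - to create the object in one step.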
2.3.2. Creating a user-defined workload monitoring config map
To configure the components that monitor user-defined projects, you must create the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project.

Note: When you save your changes to the user-workload-monitoring-config ConfigMap object, some or all of the pods in the openshift-user-workload-monitoring project might be redeployed. The running monitoring processes in that project might also be restarted.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the OpenShift CLI (oc).
Procedure
1. Check whether the user-workload-monitoring-config ConfigMap object exists:

   $ oc -n openshift-user-workload-monitoring get configmap user-workload-monitoring-config

2. If the user-workload-monitoring-config ConfigMap object does not exist:

   a. Create the following YAML manifest. In this example the file is called user-workload-monitoring-config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |

   b. Apply the configuration to create the ConfigMap object:

      $ oc apply -f user-workload-monitoring-config.yaml

Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.
2.4. Configuring the monitoring stack
In OpenShift Container Platform 4.11, you can configure the monitoring stack by using the cluster-monitoring-config and user-workload-monitoring-config ConfigMap objects.
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:

- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.

If you are configuring components that monitor user-defined projects:

- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.

You have installed the OpenShift CLI (oc).
Procedure
1. Edit the ConfigMap object.

   To configure core OpenShift Container Platform monitoring components:

   a. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

   b. Add your configuration under data/config.yaml as a key-value pair <component_name>: <component_configuration>:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          <component>:
            <configuration_for_the_component>

      Substitute <component> and <configuration_for_the_component> accordingly.

      The following example ConfigMap object configures a persistent volume claim (PVC) for Prometheus. This relates to the Prometheus instance that monitors core OpenShift Container Platform components only:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s: # [1]
            volumeClaimTemplate:
              spec:
                storageClassName: fast
                volumeMode: Filesystem
                resources:
                  requests:
                    storage: 40Gi

      [1] Defines the Prometheus component; the subsequent lines define its configuration.

   To configure components that monitor user-defined projects:

   a. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

   b. Add your configuration under data/config.yaml as a key-value pair <component_name>: <component_configuration>:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          <component>:
            <configuration_for_the_component>

      Substitute <component> and <configuration_for_the_component> accordingly.

      The following example ConfigMap object configures a data retention period and minimum container resource requests for Prometheus. This relates to the Prometheus instance that monitors user-defined projects only:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          prometheus: # [1]
            retention: 24h # [2]
            resources:
              requests:
                cpu: 200m # [3]
                memory: 2Gi # [4]

      [1] Defines the Prometheus component; the subsequent lines define its configuration.
      [2] Configures a twenty-four hour data retention period for the Prometheus instance that monitors user-defined projects.
      [3] Defines a minimum resource request of 200 millicores for the Prometheus container.
      [4] Defines a minimum resource request of 2 GiB of memory for the Prometheus container.

   Note: The Prometheus config map component is called prometheusK8s in the cluster-monitoring-config ConfigMap object and prometheus in the user-workload-monitoring-config ConfigMap object.

2. Save the file to apply the changes to the ConfigMap object. The pods affected by the new configuration are restarted automatically.

   Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

   Warning: When changes are saved to a monitoring config map, the pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.
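Values such as retention: 24h use the Prometheus duration format. The following sketch is not part of the product; it is an illustrative Python helper that converts a simplified subset of that format (a single number plus one unit suffix, unlike Prometheus itself, which also accepts compound values such as 1h30m) to seconds:

```python
def parse_prometheus_duration(value: str) -> int:
    """Convert a Prometheus-style duration string such as '24h' to seconds.

    Simplified sketch: accepts one number followed by one unit suffix.
    """
    units = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
             "d": 86400, "w": 604800, "y": 31536000}
    # Check longer suffixes first so 'ms' is not mistaken for 's'.
    for suffix in sorted(units, key=len, reverse=True):
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * units[suffix])
    raise ValueError(f"unrecognized duration: {value!r}")

print(parse_prometheus_duration("24h"))  # 86400
```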
2.5. Configurable monitoring components
This table shows the monitoring components you can configure and the keys used to specify the components in the cluster-monitoring-config and user-workload-monitoring-config ConfigMap objects:
| Component | cluster-monitoring-config config map key | user-workload-monitoring-config config map key |
|---|---|---|
| Prometheus Operator | prometheusOperator | prometheusOperator |
| Prometheus | prometheusK8s | prometheus |
| Alertmanager | alertmanagerMain | alertmanager |
| kube-state-metrics | kubeStateMetrics | |
| openshift-state-metrics | openshiftStateMetrics | |
| Telemeter Client | telemeterClient | |
| Prometheus Adapter | k8sPrometheusAdapter | |
| Thanos Querier | thanosQuerier | |
| Thanos Ruler | | thanosRuler |
The Prometheus key is called prometheusK8s in the cluster-monitoring-config ConfigMap object and prometheus in the user-workload-monitoring-config ConfigMap object.
2.6. Using node selectors to move monitoring components
By using the nodeSelector constraint with labeled nodes, you can move any of the monitoring stack components to specific nodes.
By controlling placement and distribution of monitoring components, you can optimize system resource use, improve performance, and segregate workloads based on specific requirements or policies.
2.6.1. How node selectors work with other constraints
If you move monitoring components by using node selector constraints, be aware that other constraints to control pod scheduling might exist for a cluster:
- Topology spread constraints might be in place to control pod placement.
- Hard anti-affinity rules are in place for Prometheus, Thanos Querier, Alertmanager, and other monitoring components to ensure that multiple pods for these components are always spread across different nodes and are therefore always highly available.
When scheduling pods onto nodes, the pod scheduler tries to satisfy all existing constraints when determining pod placement. That is, all constraints compound when the pod scheduler determines which pods will be placed on which nodes.
Therefore, if you configure a node selector constraint but existing constraints cannot all be satisfied, the pod scheduler cannot match all constraints and will not schedule a pod for placement onto a node.
To maintain resilience and high availability for monitoring components, ensure that enough nodes are available and match all constraints when you configure a node selector constraint to move a component.
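The compounding rule above, that a node must carry every label in the selector, can be sketched as follows. This is not product code; it is an illustrative Python model of the nodeSelector matching rule, and the node names and labels are hypothetical:

```python
def matches_node_selector(node_labels: dict, node_selector: dict) -> bool:
    """A node is eligible only if it carries every label in the selector.

    Mirrors the Kubernetes rule that nodeSelector entries compound:
    all key/value pairs must match, not just one of them.
    """
    return all(node_labels.get(k) == v for k, v in node_selector.items())

def eligible_nodes(nodes: dict, node_selector: dict) -> list:
    """Return the names of the nodes that satisfy the selector."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]

# Hypothetical cluster: only node1 carries the monitoring label.
nodes = {
    "node1": {"monitoring": "true", "zone": "a"},
    "node2": {"zone": "a"},
}
print(eligible_nodes(nodes, {"monitoring": "true"}))  # ['node1']
```

Adding a second label to the selector narrows the result further, which is why a selector that no node fully satisfies leaves the pod unscheduled.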
2.6.2. Moving monitoring components to different nodes
To specify the nodes in your cluster on which monitoring stack components will run, configure the nodeSelector constraint for the components in the ConfigMap object to match labels assigned to the nodes.

Note: You cannot add a node selector constraint directly to an existing scheduled pod.
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:

- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.

If you are configuring components that monitor user-defined projects:

- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.

You have installed the OpenShift CLI (oc).
Procedure
1. If you have not done so yet, add a label to the nodes on which you want to run the monitoring components:

   $ oc label nodes <node-name> <node-label>

2. Edit the ConfigMap object.

   To move a component that monitors core OpenShift Container Platform projects:

   a. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

   b. Specify the node labels for the nodeSelector constraint for the component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          <component>: # [1]
            nodeSelector:
              <node-label-1> # [2]
              <node-label-2> # [3]
              <...>

      [1] Substitute <component> with the appropriate monitoring stack component name.
      [2] Substitute <node-label-1> with the label you added to the node.
      [3] Optional: Specify additional labels. If you specify additional labels, the pods for the component are only scheduled on the nodes that contain all of the specified labels.

      Note: If monitoring components remain in a Pending state after configuring the nodeSelector constraint, check the pod events for errors relating to taints and tolerations.

   To move a component that monitors user-defined projects:

   a. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

   b. Specify the node labels for the nodeSelector constraint for the component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          <component>: # [1]
            nodeSelector:
              <node-label-1> # [2]
              <node-label-2> # [3]
              <...>

      [1] Substitute <component> with the appropriate monitoring stack component name.
      [2] Substitute <node-label-1> with the label you added to the node.
      [3] Optional: Specify additional labels. If you specify additional labels, the pods for the component are only scheduled on the nodes that contain all of the specified labels.

      Note: If monitoring components remain in a Pending state after configuring the nodeSelector constraint, check the pod events for errors relating to taints and tolerations.

3. Save the file to apply the changes. The components specified in the new configuration are moved to the new nodes automatically.

   Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

   Warning: When you save changes to a monitoring config map, the pods and other resources in the project might be redeployed. The running monitoring processes in that project might also restart.
2.7. Assigning tolerations to monitoring components
You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:

- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.

If you are configuring components that monitor user-defined projects:

- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.

You have installed the OpenShift CLI (oc).
Procedure
1. Edit the ConfigMap object.

   To assign tolerations to a component that monitors core OpenShift Container Platform projects:

   a. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

   b. Specify tolerations for the component:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          <component>:
            tolerations:
              <toleration_specification>

      Substitute <component> and <toleration_specification> accordingly.

      For example, oc adm taint nodes node1 key1=value1:NoSchedule adds a taint to node1 with the key key1 and the value value1. This prevents monitoring components from deploying pods on node1 unless a toleration is configured for that taint. The following example configures the alertmanagerMain component to tolerate the example taint:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          alertmanagerMain:
            tolerations:
            - key: "key1"
              operator: "Equal"
              value: "value1"
              effect: "NoSchedule"

   To assign tolerations to a component that monitors user-defined projects:

   a. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

   b. Specify tolerations for the component:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          <component>:
            tolerations:
              <toleration_specification>

      Substitute <component> and <toleration_specification> accordingly.

      For example, oc adm taint nodes node1 key1=value1:NoSchedule adds a taint to node1 with the key key1 and the value value1. This prevents monitoring components from deploying pods on node1 unless a toleration is configured for that taint. The following example configures the thanosRuler component to tolerate the example taint:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          thanosRuler:
            tolerations:
            - key: "key1"
              operator: "Equal"
              value: "value1"
              effect: "NoSchedule"

2. Save the file to apply the changes. The new component placement configuration is applied automatically.

   Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

   Warning: When changes are saved to a monitoring config map, the pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.
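The matching between the taint set on a node and the toleration in the config map can be sketched as follows. This is not product code; it is an illustrative Python model of a simplified subset of the Kubernetes taint-toleration rules (the key1/value1 names mirror the hypothetical example above):

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Check whether a single toleration matches a taint.

    Simplified sketch: the effect must match (an empty toleration
    effect matches any effect), and the key/value must match under
    the Equal or Exists operator.
    """
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An Exists toleration with no key tolerates every taint.
        return toleration.get("key") in (None, "", taint["key"])
    return (toleration.get("key") == taint["key"]
            and toleration.get("value") == taint.get("value"))

# Taint applied by: oc adm taint nodes node1 key1=value1:NoSchedule
taint = {"key": "key1", "value": "value1", "effect": "NoSchedule"}
toleration = {"key": "key1", "operator": "Equal",
              "value": "value1", "effect": "NoSchedule"}
print(tolerates(toleration, taint))  # True
```

A pod is only scheduled onto the tainted node if at least one of its tolerations matches every NoSchedule taint on that node.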
2.8. Setting the body size limit for metrics scraping
By default, no limit exists for the uncompressed body size for data returned from scraped metrics targets. You can set a body size limit to help avoid situations in which Prometheus consumes excessive amounts of memory when scraped targets return a response that contains a large amount of data. In addition, by setting a body size limit, you can reduce the impact that a malicious target might have on Prometheus and on the cluster as a whole.
After you set a value for enforcedBodySizeLimit, the PrometheusScrapeBodySizeLimitHit alert fires when at least one Prometheus scrape target replies with a response body larger than the configured value.

If metrics data scraped from a target has an uncompressed body size exceeding the configured size limit, the scrape fails. Prometheus then considers this target to be down and sets its up metric value to 0, which can trigger the TargetDown alert.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the OpenShift CLI (oc).
Procedure
1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

   $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

2. Add a value for enforcedBodySizeLimit to data/config.yaml/prometheusK8s to limit the body size that can be accepted per target scrape:

   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cluster-monitoring-config
     namespace: openshift-monitoring
   data:
     config.yaml: |-
       prometheusK8s:
         enforcedBodySizeLimit: 40MB # [1]

   [1] Specify the maximum body size for scraped metrics targets. This enforcedBodySizeLimit example limits the uncompressed size per target scrape to 40 megabytes. Valid numeric values use the Prometheus data size format: B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), and EB (exabytes). The default value is 0, which specifies no limit. You can also set the value to automatic to calculate the limit automatically based on cluster capacity.

3. Save the file to apply the changes automatically.

   Warning: When you save changes to a cluster-monitoring-config config map, the pods and other resources in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
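The size suffixes listed above can be converted to a byte count with a small helper. This sketch is not part of the product; it is an illustrative Python parser assuming the base-1000 interpretation of the B/KB/MB suffixes and treating a bare 0 as "no limit":

```python
def parse_prometheus_size(value: str) -> int:
    """Convert a Prometheus-style data size string such as '40MB' to bytes.

    Assumes the suffixes listed above are powers of 1000; a bare '0'
    means no limit.
    """
    units = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
             "TB": 10**12, "PB": 10**15, "EB": 10**18}
    if value == "0":
        return 0
    # Check two-character suffixes before the bare 'B'.
    for suffix in sorted(units, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * units[suffix]
    raise ValueError(f"unrecognized size: {value!r}")

print(parse_prometheus_size("40MB"))  # 40000000
```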
2.9. Configuring a dedicated service monitor
You can configure OpenShift Container Platform core platform monitoring to use dedicated service monitors to collect metrics for the resource metrics pipeline.
When enabled, a dedicated service monitor exposes two additional metrics from the kubelet endpoint and sets the value of the honorTimestamps field to true.

By enabling a dedicated service monitor, you can improve the consistency of Prometheus Adapter-based CPU usage measurements used by, for example, the oc adm top pod command or the Horizontal Pod Autoscaler.
2.9.1. Enabling a dedicated service monitor
You can configure core platform monitoring to use a dedicated service monitor by configuring the dedicatedServiceMonitors setting in the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
Procedure
1. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

   $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

2. Add an enabled: true key-value pair as shown in the following sample:

   apiVersion: v1
   kind: ConfigMap
   metadata:
     name: cluster-monitoring-config
     namespace: openshift-monitoring
   data:
     config.yaml: |
       k8sPrometheusAdapter:
         dedicatedServiceMonitors:
           enabled: true # [1]

   [1] Set the value of the enabled field to true to deploy a dedicated service monitor that exposes the kubelet /metrics/resource endpoint.

3. Save the file to apply the changes automatically.

   Warning: When you save changes to a cluster-monitoring-config config map, the pods and other resources in the openshift-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
2.10. Configuring persistent storage
Running cluster monitoring with persistent storage means that your metrics are stored in a persistent volume (PV) and can survive a pod being restarted or re-created. This is ideal if you require your metrics or alerting data to be guarded from data loss. For production environments, it is highly recommended to configure persistent storage. Because of the high I/O demands, it is advantageous to use local storage.
2.10.1. Persistent storage prerequisites
- Dedicate sufficient local persistent storage to ensure that the disk does not become full. How much storage you need depends on the number of pods.
- Verify that you have a persistent volume (PV) ready to be claimed by the persistent volume claim (PVC), one PV for each replica. Because Prometheus and Alertmanager both have two replicas, you need four PVs to support the entire monitoring stack. The PVs are available from the Local Storage Operator, but not if you have enabled dynamically provisioned storage.
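The PV arithmetic above (one PV per replica, two replicas each for Prometheus and Alertmanager) can be made explicit. This sketch is not part of the product; it is an illustrative Python calculation using the default replica counts stated above:

```python
def required_pvs(replicas_per_component: dict) -> int:
    """One persistent volume is needed per replica of each
    component that requests persistent storage."""
    return sum(replicas_per_component.values())

# Default replica counts stated above: two for Prometheus, two for Alertmanager.
print(required_pvs({"prometheusK8s": 2, "alertmanagerMain": 2}))  # 4
```

Adding persistent storage for Thanos Ruler in user-workload monitoring would add one more entry per replica to the same sum.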
Use Filesystem as the storage type value for the volumeMode parameter when you configure the persistent volume.

Note: If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: Block in the LocalVolume object. Prometheus cannot use raw block volumes.

Important: Prometheus does not support file systems that are not POSIX compliant. For example, some NFS file system implementations are not POSIX compliant. If you want to use an NFS file system for storage, verify with the vendor that their NFS implementation is fully POSIX compliant.
2.10.2. Configuring a local persistent volume claim
For monitoring components to use a persistent volume (PV), you must configure a persistent volume claim (PVC).
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:

- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.

If you are configuring components that monitor user-defined projects:

- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.

You have installed the OpenShift CLI (oc).
Procedure
1. Edit the ConfigMap object.

   To configure a PVC for a component that monitors core OpenShift Container Platform projects:

   a. Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config

   b. Add your PVC configuration for the component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          <component>:
            volumeClaimTemplate:
              spec:
                storageClassName: <storage_class>
                resources:
                  requests:
                    storage: <amount_of_storage>

      See the Kubernetes documentation on PersistentVolumeClaims for information on how to specify volumeClaimTemplate.

      The following example configures a PVC that claims local persistent storage for the Prometheus instance that monitors core OpenShift Container Platform components:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          prometheusK8s:
            volumeClaimTemplate:
              spec:
                storageClassName: local-storage
                resources:
                  requests:
                    storage: 40Gi

      In the above example, the storage class created by the Local Storage Operator is called local-storage.

      The following example configures a PVC that claims local persistent storage for Alertmanager:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          alertmanagerMain:
            volumeClaimTemplate:
              spec:
                storageClassName: local-storage
                resources:
                  requests:
                    storage: 10Gi

   To configure a PVC for a component that monitors user-defined projects:

   a. Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

   b. Add your PVC configuration for the component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          <component>:
            volumeClaimTemplate:
              spec:
                storageClassName: <storage_class>
                resources:
                  requests:
                    storage: <amount_of_storage>

      See the Kubernetes documentation on PersistentVolumeClaims for information on how to specify volumeClaimTemplate.

      The following example configures a PVC that claims local persistent storage for the Prometheus instance that monitors user-defined projects:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          prometheus:
            volumeClaimTemplate:
              spec:
                storageClassName: local-storage
                resources:
                  requests:
                    storage: 40Gi

      In the above example, the storage class created by the Local Storage Operator is called local-storage.

      The following example configures a PVC that claims local persistent storage for Thanos Ruler:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          thanosRuler:
            volumeClaimTemplate:
              spec:
                storageClassName: local-storage
                resources:
                  requests:
                    storage: 10Gi

      Note: Storage requirements for the thanosRuler component depend on the number of rules that are evaluated and how many samples each rule generates.

2. Save the file to apply the changes. The pods affected by the new configuration are restarted automatically and the new storage configuration is applied.

   Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

   Warning: When changes are saved to a monitoring config map, the pods and other resources in the related project might be redeployed. The running monitoring processes in that project might also be restarted.
2.10.3. Resizing a persistent storage volume
OpenShift Container Platform does not support resizing an existing persistent storage volume used by a StatefulSet resource, even if the underlying StorageClass resource used supports persistent volume sizing. Therefore, updating the storage field for an existing persistent volume claim (PVC) has no effect.

However, resizing a PV is still possible by using a manual process. If you want to resize a PV for a monitoring component such as Prometheus, Thanos Ruler, or Alertmanager, you can update the appropriate config map in which the component is configured. Then, patch the PVC, and delete and orphan the pods. Orphaning the pods recreates the StatefulSet resource immediately and automatically updates the size of the volumes mounted in the pods with the new PVC settings.
Prerequisites
- You have installed the OpenShift CLI (oc).

If you are configuring core OpenShift Container Platform monitoring components:
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
- You have configured at least one PVC for core OpenShift Container Platform monitoring components.
If you are configuring components that monitor user-defined projects:
- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.
- You have configured at least one PVC for components that monitor user-defined projects.
Procedure
Edit the ConfigMap object:

To resize a PVC for a component that monitors core OpenShift Container Platform projects:
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

$ oc -n openshift-monitoring edit configmap cluster-monitoring-config

Add a new storage size for the PVC configuration for the component under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: 1
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class> 2
          resources:
            requests:
              storage: <amount_of_storage> 3

The following example configures a PVC that sets the local persistent storage to 100 gigabytes for the Prometheus instance that monitors core OpenShift Container Platform components:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 100Gi

The following example configures a PVC that sets the local persistent storage for Alertmanager to 40 gigabytes:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 40Gi
To resize a PVC for a component that monitors user-defined projects:
Note: You can resize the volumes for the Thanos Ruler and Prometheus instances that monitor user-defined projects.
Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

Update the PVC configuration for the monitoring component under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    <component>: 1
      volumeClaimTemplate:
        spec:
          storageClassName: <storage_class> 2
          resources:
            requests:
              storage: <amount_of_storage> 3

The following example configures the PVC size to 100 gigabytes for the Prometheus instance that monitors user-defined projects:
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 100Gi

The following example sets the PVC size to 20 gigabytes for Thanos Ruler:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          resources:
            requests:
              storage: 20Gi

Note: Storage requirements for the thanosRuler component depend on the number of rules that are evaluated and how many samples each rule generates.
Save the file to apply the changes. The pods affected by the new configuration restart automatically.
Warning: When you save changes to a monitoring config map, the pods and other resources in the related project might be redeployed. The monitoring processes running in that project might also be restarted.
Manually patch every PVC with the updated storage request. The following example resizes the storage size for the Prometheus component in the openshift-monitoring namespace to 100Gi:

$ for p in $(oc -n openshift-monitoring get pvc -l app.kubernetes.io/name=prometheus -o jsonpath='{range .items[*]}{.metadata.name} {end}'); do \
  oc -n openshift-monitoring patch pvc/${p} --patch '{"spec": {"resources": {"requests": {"storage":"100Gi"}}}}'; \
  done

Delete the underlying StatefulSet with the --cascade=orphan parameter:

$ oc delete statefulset -l app.kubernetes.io/name=prometheus --cascade=orphan
2.10.4. Modifying the retention time and size for Prometheus metrics data
By default, Prometheus automatically retains metrics data for 15 days. You can modify the retention time to change how soon data is deleted by specifying a time value in the retention field. You can also configure the maximum amount of disk space that retained metrics data uses by specifying a size value in the retentionSize field.
Note the following behaviors of these data retention settings:
- The size-based retention policy applies to all data block directories in the /prometheus directory, including persistent blocks, write-ahead log (WAL) data, and memory-mapped chunks.
- Data in the /wal and /head_chunks directories counts toward the retention size limit, but Prometheus never purges data from these directories based on size- or time-based retention policies. Thus, if you set a retention size limit lower than the maximum size set for the /wal and /head_chunks directories, you have configured the system not to retain any data blocks in the /prometheus data directories.
- The size-based retention policy is applied only when Prometheus cuts a new data block, which occurs every two hours after the WAL contains at least three hours of data.
- If you do not explicitly define values for either retention or retentionSize, retention time defaults to 15 days, and retention size is not set.
- If you define values for both retention and retentionSize, both values apply. If any data blocks exceed the defined retention time or the defined size limit, Prometheus purges these data blocks.
- If you define a value for retentionSize and do not define retention, only the retentionSize value applies.
- If you do not define a value for retentionSize and only define a value for retention, only the retention value applies.
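The interplay of the retention and retentionSize settings described above can be sketched as a small decision function. This is a simplified model for illustration only, not the actual Prometheus implementation:

```python
# Sketch of how Prometheus-style retention settings interact.
# This models the documented behavior only; it is not Prometheus code.

DEFAULT_RETENTION_DAYS = 15  # time-based retention default when nothing is set

def should_purge_block(block_age_days, total_size_gb,
                       retention_days=None, retention_size_gb=None):
    """Return True if a data block would be purged under the given settings."""
    # If neither value is set, time-based retention defaults to 15 days
    # and size-based retention is not applied.
    if retention_days is None and retention_size_gb is None:
        return block_age_days > DEFAULT_RETENTION_DAYS
    purge = False
    # If retention is set, the time-based limit applies.
    if retention_days is not None and block_age_days > retention_days:
        purge = True
    # If retentionSize is set, the size-based limit applies as well;
    # exceeding either defined limit causes the block to be purged.
    if retention_size_gb is not None and total_size_gb > retention_size_gb:
        purge = True
    return purge

# Both limits defined: exceeding either one triggers a purge.
print(should_purge_block(2, 12, retention_days=1, retention_size_gb=10))  # True
# No limits defined: only the 15-day default applies.
print(should_purge_block(10, 500))  # False
```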
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
If you are configuring components that monitor user-defined projects:
- A cluster administrator has enabled monitoring for user-defined projects.
- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
Saving changes to a monitoring config map might redeploy the pods and other resources in the related project and restart the monitoring processes running in that project.
Procedure
Edit the ConfigMap object:

To modify the retention time and size for the Prometheus instance that monitors core OpenShift Container Platform projects:
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

$ oc -n openshift-monitoring edit configmap cluster-monitoring-config

Add the retention time and size configuration under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: <time_specification> 1
      retentionSize: <size_specification> 2

1 The retention time: a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). You can also combine time values for specific times, such as 1h30m15s.
2 The retention size: a number directly followed by B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), or EB (exabytes).
The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance that monitors core OpenShift Container Platform components:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h
      retentionSize: 10GB
To modify the retention time and size for the Prometheus instance that monitors user-defined projects:
Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

Add the retention time and size configuration under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: <time_specification> 1
      retentionSize: <size_specification> 2

1 The retention time: a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). You can also combine time values for specific times, such as 1h30m15s.
2 The retention size: a number directly followed by B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), or EB (exabytes).
The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance that monitors user-defined projects:
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 24h
      retentionSize: 10GB
- Save the file to apply the changes. The pods affected by the new configuration restart automatically.
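The time and size specification formats used above can be checked with a short validator. The regular expressions below mirror the unit lists documented in the callouts and are illustrative only, not the parser Prometheus itself uses:

```python
import re

# A Prometheus-style duration: one or more number+unit pairs, e.g. 24h or 1h30m15s.
DURATION_RE = re.compile(r'((\d+)(ms|s|m|h|d|w|y))+')
# A retention size: a number (optionally fractional) followed by a size unit.
SIZE_RE = re.compile(r'(\d+(\.\d+)?)(B|KB|MB|GB|TB|PB|EB)')

def is_valid_retention(value: str) -> bool:
    """Check a retention time value such as 24h or 1h30m15s."""
    return bool(DURATION_RE.fullmatch(value))

def is_valid_retention_size(value: str) -> bool:
    """Check a retentionSize value such as 10GB."""
    return bool(SIZE_RE.fullmatch(value))

print(is_valid_retention("24h"))        # True
print(is_valid_retention("1h30m15s"))   # True
print(is_valid_retention("10GB"))       # False: GB is a size unit, not a time unit
print(is_valid_retention_size("10GB"))  # True
```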
2.10.5. Modifying the retention time for Thanos Ruler metrics data
By default, for user-defined projects, Thanos Ruler automatically retains metrics data for 24 hours. You can modify the retention time to change how long this data is retained by specifying a time value in the user-workload-monitoring-config config map in the openshift-user-workload-monitoring project.
Prerequisites
- You have installed the OpenShift CLI (oc).
- A cluster administrator has enabled monitoring for user-defined projects.
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.
Saving changes to a monitoring config map might redeploy the pods and other resources in the related project and restart the monitoring processes running in that project.
Procedure
Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

Add the retention time configuration under data/config.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      retention: <time_specification> 1

1 Specify the retention time in the following format: a number directly followed by ms (milliseconds), s (seconds), m (minutes), h (hours), d (days), w (weeks), or y (years). You can also combine time values for specific times, such as 1h30m15s. The default is 24h.
The following example sets the retention time to 10 days for Thanos Ruler data:
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      retention: 10d

- Save the file to apply the changes. The pods affected by the new configuration automatically restart.
2.11. Configuring remote write storage
You can configure remote write storage to enable Prometheus to send ingested metrics to remote systems for long-term storage. Doing so has no impact on how or for how long Prometheus stores metrics.
Prerequisites
If you are configuring core OpenShift Container Platform monitoring components:
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
If you are configuring components that monitor user-defined projects:
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
- You have set up a remote write compatible endpoint (such as Thanos) and know the endpoint URL. See the Prometheus remote endpoints and storage documentation for information about endpoints that are compatible with the remote write feature.
- You have set up authentication credentials in a Secret object for the remote write endpoint. You must create the secret in the same namespace as the Prometheus object for which you configure remote write: the openshift-monitoring namespace for default platform monitoring or the openshift-user-workload-monitoring namespace for user workload monitoring.

Important: To reduce security risks, use HTTPS and authentication to send metrics to an endpoint.
Procedure
Follow these steps to configure remote write for default platform monitoring in the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.

Note: If you configure remote write for the Prometheus instance that monitors user-defined projects, make similar edits to the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project. Note that the Prometheus config map component is called prometheus in the user-workload-monitoring-config ConfigMap object and prometheusK8s in the cluster-monitoring-config ConfigMap object.
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

$ oc -n openshift-monitoring edit configmap cluster-monitoring-config

- Add a remoteWrite: section under data/config.yaml/prometheusK8s.
- Add an endpoint URL and authentication credentials in this section:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com" 1
        <endpoint_authentication_credentials> 2

1 The URL of the remote write endpoint.
2 The authentication method and credentials for the endpoint. Currently supported authentication methods are AWS Signature Version 4, authentication using HTTP in an Authorization request header, Basic authentication, OAuth 2.0, and TLS client. See Supported remote write authentication settings for sample configurations of supported authentication methods.
Add write relabel configuration values after the authentication credentials:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com"
        <endpoint_authentication_credentials>
        <write_relabel_configs> 1

1 The write relabel configuration settings.

For <write_relabel_configs>, substitute a list of write relabel configurations for metrics that you want to send to the remote endpoint.

The following sample shows how to forward a single metric called my_metric:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com"
        writeRelabelConfigs:
        - sourceLabels: [__name__]
          regex: 'my_metric'
          action: keep

See the Prometheus relabel_config documentation for information about write relabel configuration options.
Save the file to apply the changes to the ConfigMap object. The pods affected by the new configuration restart automatically.

Note: Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.

Warning: Saving changes to a monitoring ConfigMap object might redeploy the pods and other resources in the related project. Saving changes might also restart the running monitoring processes in that project.
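The keep relabel action used in the sample above drops every series whose source label values do not match the regex. A minimal simulation of this filtering logic, illustrative only and not the Prometheus implementation:

```python
import re

def apply_keep(series_list, source_labels, regex):
    """Keep only series whose joined source label values fully match the regex,
    mirroring a writeRelabelConfigs entry with action: keep."""
    pattern = re.compile(regex)
    kept = []
    for series in series_list:
        # Prometheus joins source label values with ';' before matching.
        value = ';'.join(series.get(label, '') for label in source_labels)
        if pattern.fullmatch(value):
            kept.append(series)
    return kept

series = [
    {'__name__': 'my_metric', 'job': 'app'},
    {'__name__': 'other_metric', 'job': 'app'},
]
# Only my_metric survives and is forwarded to the remote endpoint.
print(apply_keep(series, ['__name__'], 'my_metric'))
```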
2.11.1. Supported remote write authentication settings
You can use different methods to authenticate with a remote write endpoint. Currently supported authentication methods are AWS Signature Version 4, Basic authentication, authentication using HTTP in an Authorization request header, OAuth 2.0, and TLS client.
| Authentication method | Config map field | Description |
|---|---|---|
| AWS Signature Version 4 | sigv4 | This method uses AWS Signature Version 4 authentication to sign requests. You cannot use this method simultaneously with authorization, OAuth 2.0, or Basic authentication. |
| Basic authentication | basicAuth | Basic authentication sets the authorization header on every remote write request with the configured username and password. |
| authorization | authorization | Authorization sets the Authorization header on every remote write request with the configured token. |
| OAuth 2.0 | oauth2 | An OAuth 2.0 configuration uses the client credentials grant type. Prometheus fetches an access token from tokenUrl with the specified clientId and clientSecret to access the remote write endpoint. |
| TLS client | tlsConfig | A TLS client configuration specifies the CA certificate, the client certificate, and the client key file information used to authenticate with the remote write endpoint server using TLS. The sample configuration assumes that you have already created a CA certificate file, a client certificate file, and a client key file. |
2.11.1.1. Config map location for authentication settings
The following shows the location of the authentication configuration in the ConfigMap object:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com" 1
        <endpoint_authentication_details> 2
1 The URL of the remote write endpoint.
2 The required configuration details for the authentication method for the endpoint. Currently supported authentication methods are Amazon Web Services (AWS) Signature Version 4, authentication using HTTP in an Authorization request header, Basic authentication, OAuth 2.0, and TLS client.
If you configure remote write for the Prometheus instance that monitors user-defined projects, edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project. Note that the Prometheus config map component is called prometheus in the user-workload-monitoring-config ConfigMap object and prometheusK8s in the cluster-monitoring-config ConfigMap object.
2.11.1.2. Example remote write authentication settings
The following samples show different authentication settings you can use to connect to a remote write endpoint. Each sample also shows how to configure a corresponding Secret object that contains authentication credentials and other relevant settings. Each sample configures authentication for use with default platform monitoring in the openshift-monitoring namespace.
Sample YAML for AWS Signature Version 4 authentication
The following shows the settings for a sigv4 secret named sigv4-credentials in the openshift-monitoring namespace:
apiVersion: v1
kind: Secret
metadata:
  name: sigv4-credentials
  namespace: openshift-monitoring
stringData:
  accessKey: <AWS_access_key>
  secretKey: <AWS_secret_key>
type: Opaque
The following shows sample AWS Signature Version 4 remote write authentication settings that use a Secret object named sigv4-credentials in the openshift-monitoring namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://authorization.example.com/api/write"
        sigv4:
          region: <AWS_region> 1
          accessKey:
            name: sigv4-credentials 2
            key: accessKey 3
          secretKey:
            name: sigv4-credentials 4
            key: secretKey 5
          profile: <AWS_profile_name> 6
          roleArn: <AWS_role_arn> 7
1 The AWS region.
2 4 The name of the Secret object containing the AWS API access credentials.
3 The key that contains the AWS API access key in the specified Secret object.
5 The key that contains the AWS API secret key in the specified Secret object.
6 The name of the AWS profile that is being used to authenticate.
7 The unique identifier for the Amazon Resource Name (ARN) assigned to your role.
Sample YAML for Basic authentication
The following shows sample Basic authentication settings for a Secret object named rw-basic-auth in the openshift-monitoring namespace:
apiVersion: v1
kind: Secret
metadata:
  name: rw-basic-auth
  namespace: openshift-monitoring
stringData:
  user: <basic_username>
  password: <basic_password>
type: Opaque
The following sample shows a basicAuth remote write configuration that uses a Secret object named rw-basic-auth in the openshift-monitoring namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://basicauth.example.com/api/write"
        basicAuth:
          username:
            name: rw-basic-auth
            key: user
          password:
            name: rw-basic-auth
            key: password
Sample YAML for authentication with a bearer token using a Secret object
The following shows bearer token settings for a Secret object named rw-bearer-auth in the openshift-monitoring namespace:
apiVersion: v1
kind: Secret
metadata:
  name: rw-bearer-auth
  namespace: openshift-monitoring
stringData:
  token: <authentication_token> 1
type: Opaque
1 The authentication token.
The following shows sample bearer token config map settings that use a Secret object named rw-bearer-auth in the openshift-monitoring namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
      remoteWrite:
      - url: "https://authorization.example.com/api/write"
        authorization:
          type: Bearer
          credentials:
            name: rw-bearer-auth
            key: token
Sample YAML for OAuth 2.0 authentication
The following shows sample OAuth 2.0 settings for a Secret object named oauth2-credentials in the openshift-monitoring namespace:
apiVersion: v1
kind: Secret
metadata:
  name: oauth2-credentials
  namespace: openshift-monitoring
stringData:
  id: <oauth2_id>
  secret: <oauth2_secret>
  token: <oauth2_authentication_token>
type: Opaque
The following shows an oauth2 remote write authentication sample configuration that uses a Secret object named oauth2-credentials in the openshift-monitoring namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://test.example.com/api/write"
        oauth2:
          clientId:
            secret:
              name: oauth2-credentials 1
              key: id 2
          clientSecret:
            name: oauth2-credentials 3
            key: secret 4
          tokenUrl: https://example.com/oauth2/token 5
          scopes: 6
          - <scope_1>
          - <scope_2>
          endpointParams: 7
            param1: <parameter_1>
            param2: <parameter_2>
1 3 The name of the corresponding Secret object. Note that ClientId can alternatively refer to a ConfigMap object, although clientSecret must refer to a Secret object.
2 4 The key that contains the OAuth 2.0 credentials in the specified Secret object.
5 The URL used to fetch a token with the specified clientId and clientSecret.
6 The OAuth 2.0 scopes for the authorization request. These scopes limit what data the tokens can access.
7 The OAuth 2.0 authorization request parameters required for the authorization server.
Sample YAML for TLS client authentication
The following shows sample TLS client settings for a tls Secret object named mtls-bundle in the openshift-monitoring namespace:
apiVersion: v1
kind: Secret
metadata:
  name: mtls-bundle
  namespace: openshift-monitoring
data:
  ca.crt: <ca_cert>
  client.crt: <client_cert>
  client.key: <client_key>
type: tls
The following sample shows a tlsConfig remote write configuration that uses a TLS Secret object named mtls-bundle:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "https://remote-write-endpoint.example.com"
        tlsConfig:
          ca:
            secret:
              name: mtls-bundle 1
              key: ca.crt 2
          cert:
            secret:
              name: mtls-bundle 3
              key: client.crt 4
          keySecret:
            name: mtls-bundle 5
            key: client.key 6
1 3 5 The name of the corresponding Secret object that contains the TLS authentication credentials. Note that ca and cert can alternatively refer to a ConfigMap object, though keySecret must refer to a Secret object.
2 The key in the specified Secret object that contains the CA certificate for the endpoint.
4 The key in the specified Secret object that contains the client certificate for the endpoint.
6 The key in the specified Secret object that contains the client key secret.
2.12. Adding cluster ID labels to metrics
If you manage multiple OpenShift Container Platform clusters and use the remote write feature to send metrics data from these clusters to an external storage location, you can add cluster ID labels to identify the metrics data coming from different clusters. You can then query these labels to identify the source cluster for a metric and distinguish that data from similar metrics data sent by other clusters.
This way, if you manage many clusters for multiple customers and send metrics data to a single centralized storage system, you can use cluster ID labels to query metrics for a particular cluster or customer.
Creating and using cluster ID labels involves three general steps:
- Configuring the write relabel settings for remote write storage.
- Adding cluster ID labels to the metrics.
- Querying these labels to identify the source cluster or customer for a metric.
2.12.1. Creating cluster ID labels for metrics
You can create cluster ID labels for metrics for default platform monitoring and for user workload monitoring.
For default platform monitoring, you add cluster ID labels for metrics in the write_relabel settings for remote write storage in the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace.
For user workload monitoring, you edit the settings in the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring namespace.
When Prometheus scrapes user workload targets that expose a namespace label, the system stores this label as exported_namespace. This behavior ensures that the final namespace label value is equal to the namespace of the target pod. You cannot override this default configuration by setting the value of the honorLabels field to true for PodMonitor or ServiceMonitor objects.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have configured remote write storage.
If you are configuring default platform monitoring components:
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
If you are configuring components that monitor user-defined projects:
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have created the user-workload-monitoring-config ConfigMap object.
Procedure
Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

$ oc -n openshift-monitoring edit configmap cluster-monitoring-config

Note: If you configure cluster ID labels for metrics for the Prometheus instance that monitors user-defined projects, edit the user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace. Note that the Prometheus component is called prometheus in this config map and not prometheusK8s, which is the name used in the cluster-monitoring-config config map.
section underwriteRelabelConfigs:, add cluster ID relabel configuration values:data/config.yaml/prometheusK8s/remoteWriteapiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: | prometheusK8s: remoteWrite: - url: "https://remote-write-endpoint.example.com" <endpoint_authentication_credentials> writeRelabelConfigs:1 - <relabel_config>2 The following sample shows how to forward a metric with the cluster ID label
in default platform monitoring:cluster_idapiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: | prometheusK8s: remoteWrite: - url: "https://remote-write-endpoint.example.com" writeRelabelConfigs: - sourceLabels: - __tmp_openshift_cluster_id__1 targetLabel: cluster_id2 action: replace3 - 1
- The system initially applies a temporary cluster ID source label named
__tmp_openshift_cluster_id__. This temporary label gets replaced by the cluster ID label name that you specify. - 2
- Specify the name of the cluster ID label for metrics sent to remote write storage. If you use a label name that already exists for a metric, that value is overwritten with the name of this cluster ID label. For the label name, do not use
__tmp_openshift_cluster_id__. The final relabeling step removes labels that use this name. - 3
- The
replacewrite relabel action replaces the temporary label with the target label for outgoing metrics. This action is the default and is applied if no action is specified.
Save the file to apply the changes to the ConfigMap object. The pods affected by the updated configuration automatically restart.

Warning: Saving changes to a monitoring ConfigMap object might redeploy the pods and other resources in the related project. Saving changes might also restart the running monitoring processes in that project.
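The replace action described in the callouts above can be sketched as follows. This is a simplified model for illustration; real Prometheus relabeling supports additional options:

```python
import re

def apply_replace(labels, source_labels, target_label,
                  regex='(.*)', replacement='$1'):
    """Simplified model of a relabel entry with action: replace.

    Joins the source label values with ';', and if the regex fully matches,
    writes the expanded replacement into the target label.
    """
    value = ';'.join(labels.get(label, '') for label in source_labels)
    match = re.fullmatch(regex, value)
    if match:
        labels = dict(labels)  # do not mutate the caller's dict
        # Prometheus uses $1-style group references; translate to Python's \1.
        labels[target_label] = match.expand(replacement.replace('$1', r'\1'))
    return labels

# The temporary source label is copied into the cluster_id target label.
metric = {'__name__': 'up', '__tmp_openshift_cluster_id__': 'abc-123'}
relabeled = apply_replace(metric, ['__tmp_openshift_cluster_id__'], 'cluster_id')
print(relabeled['cluster_id'])  # abc-123
```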
2.13. Controlling the impact of unbound metrics attributes in user-defined projects
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute. An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a customer_id attribute is unbound because it has an infinite number of possible values.
Every assigned key-value pair has a unique time series. Using many unbound attributes in labels can create exponentially more time series, which can impact Prometheus performance and available disk space.
Cluster administrators can use the following measures to control the impact of unbound metrics attributes in user-defined projects:
- Limit the number of samples that can be accepted per target scrape in user-defined projects.
- Limit the number of scraped labels, the length of label names, and the length of label values.
- Create alerts that fire when a scrape sample threshold is reached or when the target cannot be scraped.
To prevent issues caused by adding many unbound attributes, limit the number of scrape samples, label names, and unbound attributes you define for metrics. Also reduce the number of potential key-value pair combinations by using attributes that are bound to a limited set of possible values.
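The effect of unbound label values on series count can be illustrated with a quick calculation: each unique combination of label values produces its own time series, so the total is the product of the per-label cardinalities. The label names and counts below are hypothetical examples:

```python
# Each unique combination of label values produces its own time series,
# so the total series count for a metric is the product of the
# per-label cardinalities.

def series_count(label_cardinalities):
    total = 1
    for n in label_cardinalities.values():
        total *= n
    return total

# Bounded labels keep the series count manageable.
bounded = {'method': 5, 'status_code': 10}
print(series_count(bounded))  # prints 50

# Adding an unbound label such as customer_id multiplies the count
# by every distinct value ever observed.
unbound = {'method': 5, 'status_code': 10, 'customer_id': 100000}
print(series_count(unbound))  # prints 5000000
```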
2.13.1. Setting scrape sample and label limits for user-defined projects
You can limit the number of samples that can be accepted per target scrape in user-defined projects. You can also limit the number of scraped labels, the length of label names, and the length of label values.
If you set sample or label limits, no further sample data is ingested for that target scrape after the limit is reached.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (oc).
Procedure
Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

$ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config

Add the enforcedSampleLimit configuration to data/config.yaml to limit the number of samples that can be accepted per target scrape in user-defined projects:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          enforcedSampleLimit: 50000 # 1

1. A value is required if this parameter is specified. This enforcedSampleLimit example limits the number of samples that can be accepted per target scrape in user-defined projects to 50,000.
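As a toy model of the behavior described above (purely illustrative; this is not how Prometheus implements the limit internally), once a target scrape reaches the enforced sample limit, no further sample data from that scrape is ingested:

```python
def ingest_scrape(samples, enforced_sample_limit):
    # Toy model of the documented behavior: samples from a single
    # target scrape are accepted only up to the enforced limit;
    # everything past the limit is discarded.
    return samples[:enforced_sample_limit]

# A scrape that produces 60,000 samples against a 50,000 limit.
scrape = [("http_requests_total", i) for i in range(60_000)]
kept = ingest_scrape(scrape, 50_000)
print(len(kept))  # 50000
```

The 10,000 samples beyond the limit are simply lost, so set the limit above the legitimate sample volume of your targets and alert when scrapes approach it, as shown later in this section.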
Add the enforcedLabelLimit, enforcedLabelNameLengthLimit, and enforcedLabelValueLengthLimit configurations to data/config.yaml to limit the number of scraped labels, the length of label names, and the length of label values in user-defined projects:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: user-workload-monitoring-config
      namespace: openshift-user-workload-monitoring
    data:
      config.yaml: |
        prometheus:
          enforcedLabelLimit: 500 # 1
          enforcedLabelNameLengthLimit: 50 # 2
          enforcedLabelValueLengthLimit: 600 # 3

1. Specifies the maximum number of labels per scrape. The default value is 0, which specifies no limit.
2. Specifies the maximum length in characters of a label name. The default value is 0, which specifies no limit.
3. Specifies the maximum length in characters of a label value. The default value is 0, which specifies no limit.
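The three limits can be pictured with a small sanity check. The helper below is hypothetical (not part of any OpenShift or Prometheus API); it applies the same three rules to a label set, treating 0 as "no limit" just as the documented defaults do:

```python
def labels_within_limits(labels, label_limit=500,
                         name_len_limit=50, value_len_limit=600):
    # Mirrors the three enforced limits: number of labels,
    # label name length, and label value length.
    # A limit of 0 means unlimited, matching the documented defaults.
    if label_limit and len(labels) > label_limit:
        return False
    return all(
        (not name_len_limit or len(name) <= name_len_limit) and
        (not value_len_limit or len(value) <= value_len_limit)
        for name, value in labels.items()
    )

labels = {"job": "app", "instance": "10.0.0.1:8080"}
print(labels_within_limits(labels))                    # True
print(labels_within_limits(labels, name_len_limit=5))  # False ("instance" is 8 chars)
```

A scrape whose labels violate any of the three limits is rejected, so size the limits to comfortably cover your legitimate label usage.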
Save the file to apply the changes. The limits are applied automatically.
Note
Configurations applied to the user-workload-monitoring-config ConfigMap object are not activated unless a cluster administrator has enabled monitoring for user-defined projects.
Warning
When changes are saved to the user-workload-monitoring-config ConfigMap object, the pods and other resources in the openshift-user-workload-monitoring project might be redeployed. The running monitoring processes in that project might also be restarted.
2.13.2. Creating scrape sample alerts
You can create alerts that notify you when:
- The target cannot be scraped or is not available for the specified for duration
- A scrape sample threshold is reached or is exceeded for the specified for duration
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have enabled monitoring for user-defined projects.
- You have created the user-workload-monitoring-config ConfigMap object.
- You have limited the number of samples that can be accepted per target scrape in user-defined projects, by using enforcedSampleLimit.
- You have installed the OpenShift CLI (oc).
Procedure
Create a YAML file with alerts that inform you when the targets are down and when the enforced sample limit is approaching. The file in this example is called monitoring-stack-alerts.yaml:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: monitoring-stack-alerts # 1
      namespace: ns1 # 2
    spec:
      groups:
      - name: general.rules
        rules:
        - alert: TargetDown # 3
          annotations:
            message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service }} targets in {{ $labels.namespace }} namespace are down.' # 4
          expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
          for: 10m # 5
          labels:
            severity: warning # 6
        - alert: ApproachingEnforcedSamplesLimit # 7
          annotations:
            message: '{{ $labels.container }} container of the {{ $labels.pod }} pod in the {{ $labels.namespace }} namespace consumes {{ $value | humanizePercentage }} of the samples limit budget.' # 8
          expr: scrape_samples_scraped/50000 > 0.8 # 9
          for: 10m # 10
          labels:
            severity: warning # 11

1. Defines the name of the alerting rule.
2. Specifies the user-defined project where the alerting rule will be deployed.
3. The TargetDown alert will fire if the target cannot be scraped or is not available for the for duration.
4. The message that will be output when the TargetDown alert fires.
5. The conditions for the TargetDown alert must be true for this duration before the alert is fired.
6. Defines the severity for the TargetDown alert.
7. The ApproachingEnforcedSamplesLimit alert will fire when the defined scrape sample threshold is reached or exceeded for the specified for duration.
8. The message that will be output when the ApproachingEnforcedSamplesLimit alert fires.
9. The threshold for the ApproachingEnforcedSamplesLimit alert. In this example, the alert will fire when the number of samples per target scrape has exceeded 80% of the enforced sample limit of 50000. The for duration must also have passed before the alert will fire. The <number> in the expression scrape_samples_scraped/<number> > <threshold> must match the enforcedSampleLimit value defined in the user-workload-monitoring-config ConfigMap object.
10. The conditions for the ApproachingEnforcedSamplesLimit alert must be true for this duration before the alert is fired.
11. Defines the severity for the ApproachingEnforcedSamplesLimit alert.
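Both alert expressions reduce to simple arithmetic. The following sketch uses hypothetical helper functions that mirror the PromQL comparisons above (it ignores the additional for duration requirement, which Prometheus evaluates separately) to show when each condition is met:

```python
def target_down_fires(down, total, threshold_pct=10):
    # TargetDown: 100 * (down targets / all targets) > 10
    return 100 * (down / total) > threshold_pct

def samples_limit_fires(scrape_samples_scraped, limit=50_000, ratio=0.8):
    # ApproachingEnforcedSamplesLimit: scrape_samples_scraped / limit > 0.8
    return scrape_samples_scraped / limit > ratio

print(target_down_fires(1, 20))     # False: 5% of targets down
print(target_down_fires(3, 20))     # True: 15% of targets down
print(samples_limit_fires(42_000))  # True: 84% of the sample budget
print(samples_limit_fires(39_000))  # False: 78% of the sample budget
```

In each case the condition must also hold continuously for the 10m for duration before the alert actually fires.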
Apply the configuration to the user-defined project:
$ oc apply -f monitoring-stack-alerts.yaml