Chapter 3. Storing and recording data for user workload monitoring
Store and record your metrics and alerting data, configure logs to specify which activities are recorded, control how long Prometheus retains stored data, and set the maximum amount of disk space for the data. These actions help you protect your data and use it for troubleshooting.
3.1. Configuring persistent storage
Learn about persistent storage configuration for monitoring components to properly plan and deploy production-ready monitoring infrastructure.
Run cluster monitoring with persistent storage to gain the following benefits:
- Protect your metrics and alerting data from data loss by storing them in a persistent volume (PV). As a result, they can survive pods being restarted or recreated.
- Avoid getting duplicate notifications and losing silences for alerts when the Alertmanager pods are restarted.
For production environments, configuring persistent storage is strongly recommended.
In multi-node clusters, you must configure persistent storage for Prometheus, Alertmanager, and Thanos Ruler to ensure high availability.
3.1.1. Persistent storage prerequisites
- Dedicate sufficient persistent storage to ensure that the disk does not become full.
- Use `Filesystem` as the storage type value for the `volumeMode` parameter when you configure the persistent volume.

  Important: Do not use a raw block volume, which is described with `volumeMode: Block` in the `PersistentVolume` resource. Prometheus cannot use raw block volumes.

- Prometheus does not support file systems that are not POSIX compliant. For example, some NFS file system implementations are not POSIX compliant. If you want to use an NFS file system for storage, verify with the vendor that their NFS implementation is fully POSIX compliant.
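As an illustration, a persistent volume that satisfies this requirement might look like the following sketch. The volume name, capacity, storage class, and backing path are placeholder assumptions; the point of the example is the `volumeMode: Filesystem` setting.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: monitoring-pv            # placeholder name
spec:
  capacity:
    storage: 40Gi                # placeholder size
  volumeMode: Filesystem         # required: Prometheus cannot use volumeMode: Block
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: my-storage-class   # placeholder storage class
  hostPath:
    path: /mnt/monitoring        # placeholder backing path, for illustration only
```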
3.1.2. Configuring a persistent volume claim
To use a persistent volume (PV) for monitoring components, you must configure a persistent volume claim (PVC).
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add your PVC configuration for the component under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      <component>:
        volumeClaimTemplate:
          spec:
            storageClassName: <storage_class>
            resources:
              requests:
                storage: <amount_of_storage>
  ```

  where:

  - `<component>`: Specifies the monitoring component for which you want to configure the PVC.
  - `<storage_class>`: Specifies an existing storage class. If a storage class is not specified, the default storage class is used.
  - `<amount_of_storage>`: Specifies the amount of required storage.

  The following example configures a PVC that claims persistent storage for Thanos Ruler:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      thanosRuler:
        volumeClaimTemplate:
          spec:
            storageClassName: my-storage-class
            resources:
              requests:
                storage: 10Gi
  ```

  Note: Storage requirements for the `thanosRuler` component depend on the number of rules that are evaluated and how many samples each rule generates.

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed and the new storage configuration is applied.

  Warning: When you update the config map with a PVC configuration, the affected `StatefulSet` object is recreated, resulting in a temporary service outage.
3.1.3. Resizing a persistent volume
You can resize a persistent volume (PV) for the instances of Prometheus, Thanos Ruler, and Alertmanager to meet your capacity requirements. You need to manually expand a persistent volume claim (PVC), and then update the config map in which the component is configured.
You can only expand the size of the PVC. Shrinking the storage size is not possible.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have configured at least one PVC for components that monitor user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Manually expand a PVC with the updated storage request. For more information, see "Expanding persistent volume claims (PVCs) with a file system" in Expanding persistent volumes.
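As a sketch, one way to submit the updated storage request is to patch the PVC directly. The PVC name below is a placeholder that you would replace with the name reported by `oc get pvc`, and the size is an example value; the full expansion procedure is described in the linked documentation.

```terminal
$ oc -n openshift-user-workload-monitoring patch pvc <pvc_name> \
    --type merge --patch '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```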
- Edit the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add a new storage size for the PVC configuration for the component under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      <component>:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: <amount_of_storage>
  ```

  where:

  - `<component>`: Specifies the component for which you want to change the storage size.
  - `<amount_of_storage>`: Specifies the new size for the storage volume. It must be greater than the previous value.

  The following example sets the new PVC request to 20 gigabytes for Thanos Ruler:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      thanosRuler:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 20Gi
  ```

  Note: Storage requirements for the `thanosRuler` component depend on the number of rules that are evaluated and how many samples each rule generates.

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

  Warning: When you update the config map with a new storage size, the affected `StatefulSet` object is recreated, resulting in a temporary service outage.
3.2. Modifying retention time and size for Prometheus metrics data
Modify the retention time for the Prometheus instance to change when the data is deleted. You can also set the maximum amount of disk space the retained metrics data uses. This ensures you maintain necessary metrics while preventing excessive disk space usage.
By default, Prometheus retains metrics data for 24 hours for monitoring for user-defined projects.
Data compaction occurs every two hours. Therefore, a persistent volume (PV) might fill up before compaction, potentially exceeding the retentionSize limit. In such cases, the KubePersistentVolumeFillingUp alert fires until the used space on the PV falls below the retentionSize limit.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add the retention time and size configuration under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      prometheus:
        retention: <time_specification>
        retentionSize: <size_specification>
  ```

  where:

  - `<time_specification>`: Specifies the retention time. The number is directly followed by `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), or `y` (years). You can also combine time values for specific times, such as `1h30m15s`.
  - `<size_specification>`: Specifies the retention size. The number is directly followed by `B` (bytes), `KB` (kilobytes), `MB` (megabytes), `GB` (gigabytes), `TB` (terabytes), `PB` (petabytes), or `EB` (exabytes).

  The following example sets the retention time to 24 hours and the retention size to 10 gigabytes for the Prometheus instance:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      prometheus:
        retention: 24h
        retentionSize: 10GB
  ```

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
3.2.1. Modifying the retention time for Thanos Ruler metrics data
By default, Thanos Ruler retains metrics data for 24 hours for user-defined projects. Modify the retention time to change how long this data is retained by editing the user-workload-monitoring-config config map in the openshift-user-workload-monitoring project.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `user-workload-monitoring-config` `ConfigMap` object in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add the retention time configuration under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      thanosRuler:
        retention: <time_specification>
  ```

  where:

  - `<time_specification>`: Specifies the retention time. The number is directly followed by `ms` (milliseconds), `s` (seconds), `m` (minutes), `h` (hours), `d` (days), `w` (weeks), or `y` (years). You can also combine time values for specific times, such as `1h30m15s`. The default is `24h`.

  The following example sets the retention time to 10 days for Thanos Ruler data:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      thanosRuler:
        retention: 10d
  ```

- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
3.3. Setting log levels for monitoring components
You can configure the log level for Alertmanager, Prometheus Operator, Prometheus, and Thanos Ruler. You can use these settings for troubleshooting and to gain better insight into how the components are functioning.
The following log levels can be applied to the relevant component in the user-workload-monitoring-config ConfigMap object:
- `debug`. Log debug, informational, warning, and error messages.
- `info` (default). Log informational, warning, and error messages.
- `warn`. Log warning and error messages only.
- `error`. Log error messages only.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add log configuration for a component under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      <component>:
        logLevel: <log_level>
      # ...
  ```

  where:

  - `<component>`: Specifies the monitoring stack component for which you are setting a log level. Available component values are `prometheus`, `alertmanager`, `prometheusOperator`, and `thanosRuler`.
  - `<log_level>`: Specifies the log level for the component. The available values are `error`, `warn`, `info`, and `debug`. The default value is `info`.
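For example, the following sketch raises the log level to `debug` for Thanos Ruler; the choice of component and level here is illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    thanosRuler:
      logLevel: debug   # illustrative choice of component and level
```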
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verification
Verify that the log configuration is applied by reviewing the deployment or pod configuration in the related project.
The following example checks the log level for the `prometheus-operator` deployment:

```terminal
$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -o yaml | grep "log-level"
```

Example output:

```terminal
- --log-level=debug
```
Verify that the pods for the component are running:
```terminal
$ oc -n openshift-user-workload-monitoring get pods
```

Note: If an unrecognized `logLevel` value is included in the `ConfigMap` object, the pods for the component might not restart successfully.
3.4. Enabling the query log file for Prometheus
Configure Prometheus to write all queries that have been run by the engine to a log file for detailed analysis.
Because log rotation is not supported, enable this feature only temporarily, when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the `ConfigMap` object to enable the feature.
Prerequisites
- You have access to the cluster as a user with the `cluster-admin` cluster role, or as a user with the `user-workload-monitoring-config-edit` role in the `openshift-user-workload-monitoring` project.
- A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (`oc`).
Procedure
- Edit the `user-workload-monitoring-config` config map in the `openshift-user-workload-monitoring` project:

  ```terminal
  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
  ```

- Add the `queryLogFile` parameter for Prometheus under `data/config.yaml`:

  ```yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      prometheus:
        queryLogFile: <path>
  ```

  where:

  - `<path>`: Specifies the full path to the file in which queries will be logged.
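For example, the following sketch writes the query log to a file under `/tmp`; the path shown is an arbitrary illustration, not a required location, and it must be writable from inside the Prometheus container:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      queryLogFile: /tmp/promql-queries.log   # assumed example path
```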
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verification
- Verify that the pods for the component are running. The following sample command lists the status of pods:

  ```terminal
  $ oc -n openshift-user-workload-monitoring get pods
  ```

  Example output:

  ```terminal
  ...
  prometheus-operator-776fcbbd56-2nbfm   2/2   Running   0   132m
  prometheus-user-workload-0             5/5   Running   1   132m
  prometheus-user-workload-1             5/5   Running   1   132m
  thanos-ruler-user-workload-0           3/3   Running   0   132m
  thanos-ruler-user-workload-1           3/3   Running   0   132m
  ...
  ```

- Read the query log:

  ```terminal
  $ oc -n openshift-user-workload-monitoring exec prometheus-user-workload-0 -- cat <path>
  ```

  Important: Revert the setting in the config map after you have examined the logged query information.