Chapter 5. Configuring pod topology spread constraints for monitoring
You can use pod topology spread constraints to control how Prometheus, Thanos Ruler, and Alertmanager pods are spread across a network topology when OpenShift Container Platform pods are deployed in multiple availability zones.
Pod topology spread constraints are suitable for controlling pod scheduling within hierarchical topologies in which nodes are spread across different infrastructure levels, such as regions and zones within those regions. Additionally, scheduling pods across different zones can reduce network latency in certain scenarios.
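For example, a topology spread constraint typically uses a node label such as topology.kubernetes.io/zone, a standard Kubernetes node label, as the topology key so that replicas are balanced across availability zones. The following generic sketch is not part of the monitoring procedures later in this chapter; it only shows the shape of such a constraint in a pod specification:

  # Generic example; topology.kubernetes.io/zone is a standard node label
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: prometheus

The procedures that follow set the same fields through the cluster-monitoring-config and user-workload-monitoring-config config maps rather than directly in a pod specification.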
5.1. Setting up pod topology spread constraints for Prometheus
For core platform monitoring in OpenShift Container Platform, you can set up pod topology spread constraints for Prometheus to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Prometheus pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.
You configure pod topology spread constraints for Prometheus in the cluster-monitoring-config config map.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add values for the following settings under data/config.yaml/prometheusK8s to configure pod topology spread constraints:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      prometheusK8s:
        topologySpreadConstraints:
        - maxSkew: 1 # 1
          topologyKey: monitoring # 2
          whenUnsatisfiable: DoNotSchedule # 3
          labelSelector:
            matchLabels: # 4
              app.kubernetes.io/name: prometheus
  - 1: Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
  - 2: Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to place a balanced number of pods into each domain.
  - 3: Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
  - 4: Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
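Optionally, you can check that the constraint is present in the redeployed Prometheus pods. This step is not part of the documented procedure; the following sketch assumes the app.kubernetes.io/name=prometheus label used in the matchLabels example above:

  # Optional check: print the topologySpreadConstraints section of the Prometheus pod specs
  $ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus -o yaml | grep -A 7 topologySpreadConstraints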
5.2. Setting up pod topology spread constraints for Alertmanager
For core platform monitoring in OpenShift Container Platform, you can set up pod topology spread constraints for Alertmanager to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Alertmanager pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.
You configure pod topology spread constraints for Alertmanager in the cluster-monitoring-config config map.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add values for the following settings under data/config.yaml/alertmanagerMain to configure pod topology spread constraints:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      alertmanagerMain:
        topologySpreadConstraints:
        - maxSkew: 1 # 1
          topologyKey: monitoring # 2
          whenUnsatisfiable: DoNotSchedule # 3
          labelSelector:
            matchLabels: # 4
              app.kubernetes.io/name: alertmanager
  - 1: Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
  - 2: Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to place a balanced number of pods into each domain.
  - 3: Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
  - 4: Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
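As with Prometheus, you can optionally confirm that the constraint was applied to the redeployed Alertmanager pods. The following sketch assumes the app.kubernetes.io/name=alertmanager label used in the matchLabels example above:

  # Optional check: print the topologySpreadConstraints section of the Alertmanager pod specs
  $ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=alertmanager -o yaml | grep -A 7 topologySpreadConstraints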
5.3. Setting up pod topology spread constraints for Thanos Ruler
For user-defined monitoring, you can set up pod topology spread constraints for Thanos Ruler to fine-tune how pod replicas are scheduled to nodes across zones. Doing so helps ensure that Thanos Ruler pods are highly available and run more efficiently, because workloads are spread across nodes in different data centers or hierarchical infrastructure zones.
You configure pod topology spread constraints for Thanos Ruler in the user-workload-monitoring-config config map.
Prerequisites
- A cluster administrator has enabled monitoring for user-defined projects.
- You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the user-workload-monitoring-config config map in the openshift-user-workload-monitoring namespace:

  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
- Add values for the following settings under data/config.yaml/thanosRuler to configure pod topology spread constraints:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      thanosRuler:
        topologySpreadConstraints:
        - maxSkew: 1 # 1
          topologyKey: monitoring # 2
          whenUnsatisfiable: ScheduleAnyway # 3
          labelSelector:
            matchLabels: # 4
              app.kubernetes.io/name: thanos-ruler
  - 1: Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed. This field is required, and the value must be greater than zero. The value specified has a different effect depending on what value you specify for whenUnsatisfiable.
  - 2: Specify a key of node labels for topologyKey. This field is required. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to place a balanced number of pods into each domain.
  - 3: Specify a value for whenUnsatisfiable. This field is required. Available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
  - 4: Specify a value for matchLabels. This value is used to identify the set of matching pods to which to apply the constraints.
- Save the file to apply the changes. The new configuration is applied automatically.
  Warning: When you save changes to the user-workload-monitoring-config config map, the pods and other resources in the openshift-user-workload-monitoring project might be redeployed. The running monitoring processes in that project might also restart.
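Optionally, after the pods are redeployed, you can confirm that the constraint was applied to the Thanos Ruler pods. The following sketch assumes the app.kubernetes.io/name=thanos-ruler label used in the matchLabels example above:

  # Optional check: print the topologySpreadConstraints section of the Thanos Ruler pod specs
  $ oc -n openshift-user-workload-monitoring get pod -l app.kubernetes.io/name=thanos-ruler -o yaml | grep -A 7 topologySpreadConstraints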
5.4. Setting log levels for monitoring components
You can configure the log level for Alertmanager, Prometheus Operator, Prometheus, Thanos Querier, and Thanos Ruler.
The following log levels can be applied to the relevant component in the cluster-monitoring-config and user-workload-monitoring-config ConfigMap objects:
- debug: Log debug, informational, warning, and error messages.
- info: Log informational, warning, and error messages.
- warn: Log warning and error messages only.
- error: Log error messages only.
The default log level is info.
Prerequisites
- If you are setting a log level for Alertmanager, Prometheus Operator, Prometheus, or Thanos Querier in the openshift-monitoring project:
  - You have access to the cluster as a user with the cluster-admin cluster role.
  - You have created the cluster-monitoring-config ConfigMap object.
- If you are setting a log level for Prometheus Operator, Prometheus, or Thanos Ruler in the openshift-user-workload-monitoring project:
  - You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
  - A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the ConfigMap object:
  - To set a log level for a component in the openshift-monitoring project:
    - Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

      $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
    - Add logLevel: <log_level> for a component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          <component>: # 1
            logLevel: <log_level> # 2
      - 1: The monitoring stack component for which you are setting a log level. For default platform monitoring, available component values are prometheusK8s, alertmanagerMain, prometheusOperator, and thanosQuerier.
      - 2: The log level to set for the component. The available values are error, warn, info, and debug. The default value is info.
  - To set a log level for a component in the openshift-user-workload-monitoring project:
    - Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

      $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
    - Add logLevel: <log_level> for a component under data/config.yaml:

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          <component>: # 1
            logLevel: <log_level> # 2
      - 1: The monitoring stack component for which you are setting a log level. For user workload monitoring, available component values are alertmanager, prometheus, prometheusOperator, and thanosRuler.
      - 2: The log level to apply to the component. The available values are error, warn, info, and debug. The default value is info.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
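For example, a user-workload-monitoring-config config map that raises the Prometheus Operator log level to debug, the same component and level checked in the verification steps below, might look like the following sketch:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      # Example values: prometheusOperator is one of the user workload component names
      prometheusOperator:
        logLevel: debug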
- Confirm that the log level has been applied by reviewing the deployment or pod configuration in the related project. The following example checks the log level in the prometheus-operator deployment in the openshift-user-workload-monitoring project:

  $ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -o yaml | grep "log-level"
Example output
- --log-level=debug
- Check that the pods for the component are running. The following example lists the status of pods in the openshift-user-workload-monitoring project:

  $ oc -n openshift-user-workload-monitoring get pods
  Note: If an unrecognized logLevel value is included in the ConfigMap object, the pods for the component might not restart successfully.
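If a pod does not restart successfully after such a change, one optional way to investigate, beyond the documented steps, is to describe the pod and review its recent events. The <pod_name> placeholder is illustrative; substitute the name of the affected pod:

  # Optional troubleshooting step
  $ oc -n openshift-user-workload-monitoring describe pod <pod_name>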
5.5. Enabling the query log file for Prometheus
You can configure Prometheus to write all queries that have been run by the engine to a log file. You can do so for default platform monitoring and for user-defined workload monitoring.
Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.
Prerequisites
- If you are enabling the query log file feature for Prometheus in the openshift-monitoring project:
  - You have access to the cluster as a user with the cluster-admin cluster role.
  - You have created the cluster-monitoring-config ConfigMap object.
- If you are enabling the query log file feature for Prometheus in the openshift-user-workload-monitoring project:
  - You have access to the cluster as a user with the cluster-admin cluster role, or as a user with the user-workload-monitoring-config-edit role in the openshift-user-workload-monitoring project.
  - A cluster administrator has enabled monitoring for user-defined projects.
- You have installed the OpenShift CLI (oc).
Procedure
To set the query log file for Prometheus in the openshift-monitoring project:

- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add queryLogFile: <path> for prometheusK8s under data/config.yaml:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      prometheusK8s:
        queryLogFile: <path> # 1
  - 1: The full path to the file in which queries will be logged.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
- Verify that the pods for the component are running. The following sample command lists the status of pods in the openshift-monitoring project:

  $ oc -n openshift-monitoring get pods
Read the query log:
$ oc -n openshift-monitoring exec prometheus-k8s-0 -- cat <path>
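For illustration only, if you chose a hypothetical path such as /tmp/promql.log, the configuration fragment and the matching read command would look like the following sketch; substitute the path that you actually configured:

  config.yaml: |
    prometheusK8s:
      queryLogFile: /tmp/promql.log  # arbitrary example path

  $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- cat /tmp/promql.log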
  Important: Revert the setting in the config map after you have examined the logged query information.
To set the query log file for Prometheus in the openshift-user-workload-monitoring project:

- Edit the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project:

  $ oc -n openshift-user-workload-monitoring edit configmap user-workload-monitoring-config
- Add queryLogFile: <path> for prometheus under data/config.yaml:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: user-workload-monitoring-config
    namespace: openshift-user-workload-monitoring
  data:
    config.yaml: |
      prometheus:
        queryLogFile: <path> # 1
  - 1: The full path to the file in which queries will be logged.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
- Verify that the pods for the component are running. The following example command lists the status of pods in the openshift-user-workload-monitoring project:

  $ oc -n openshift-user-workload-monitoring get pods
Read the query log:
$ oc -n openshift-user-workload-monitoring exec prometheus-user-workload-0 -- cat <path>
  Important: Revert the setting in the config map after you have examined the logged query information.
Additional resources
- See Preparing to configure the monitoring stack for steps to create monitoring config maps.
- See Enabling monitoring for user-defined projects for steps to enable user-defined monitoring.
5.6. Enabling query logging for Thanos Querier
For default platform monitoring in the openshift-monitoring project, you can enable the Cluster Monitoring Operator (CMO) to log all queries run by Thanos Querier.
Because log rotation is not supported, only enable this feature temporarily when you need to troubleshoot an issue. After you finish troubleshooting, disable query logging by reverting the changes you made to the ConfigMap object to enable the feature.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
Procedure
You can enable query logging for Thanos Querier in the openshift-monitoring project:
- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add a thanosQuerier section under data/config.yaml and add values as shown in the following example:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      thanosQuerier:
        enableRequestLogging: <value> # 1
        logLevel: <value> # 2

  - 1: Set the value to true to enable request logging for Thanos Querier, or false to disable it.
  - 2: The log level for Thanos Querier. The available values are debug, info, warn, and error, as described in "Setting log levels for monitoring components".
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verification
- Verify that the Thanos Querier pods are running. The following sample command lists the status of pods in the openshift-monitoring project:

  $ oc -n openshift-monitoring get pods
- Run a test query using the following sample commands as a model:

  $ token=`oc create token prometheus-k8s -n openshift-monitoring`
  $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?query=cluster_version'
Run the following command to read the query log:
$ oc -n openshift-monitoring logs <thanos_querier_pod_name> -c thanos-query
  Note: Because the thanos-querier pods are highly available (HA) pods, you might be able to see logs in only one pod.

- After you examine the logged query information, disable query logging by changing the enableRequestLogging value to false in the config map.
Additional resources
- See Preparing to configure the monitoring stack for steps to create monitoring config maps.
5.7. Setting audit log levels for the Prometheus Adapter
In default platform monitoring, you can configure the audit log level for the Prometheus Adapter.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config ConfigMap object.
Procedure
You can set an audit log level for the Prometheus Adapter in the default openshift-monitoring project:
- Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add profile: in the k8sPrometheusAdapter/audit section under data/config.yaml:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      k8sPrometheusAdapter:
        audit:
          profile: <audit_log_level> # 1
  - 1: The audit log level to apply to the Prometheus Adapter.
- Set the audit log level by using one of the following values for the profile: parameter:
  - None: Do not log events.
  - Metadata: Log only the metadata for the request, such as user, timestamp, and so forth. Do not log the request text and the response text. Metadata is the default audit log level.
  - Request: Log only the metadata and the request text but not the response text. This option does not apply for non-resource requests.
  - RequestResponse: Log event metadata, request text, and response text. This option does not apply for non-resource requests.
- Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
Verification
- In the config map, under k8sPrometheusAdapter/audit/profile, set the log level to Request and save the file.
- Confirm that the pods for the Prometheus Adapter are running. The following example lists the status of pods in the openshift-monitoring project:

  $ oc -n openshift-monitoring get pods
Confirm that the audit log level and audit log file path are correctly configured:
$ oc -n openshift-monitoring get deploy prometheus-adapter -o yaml
Example output
...
  - --audit-policy-file=/etc/audit/request-profile.yaml
  - --audit-log-path=/var/log/adapter/audit.log
- Confirm that the correct log level has been applied in the prometheus-adapter deployment in the openshift-monitoring project:

  $ oc -n openshift-monitoring exec deploy/prometheus-adapter -c prometheus-adapter -- cat /etc/audit/request-profile.yaml
Example output
"apiVersion": "audit.k8s.io/v1" "kind": "Policy" "metadata": "name": "Request" "omitStages": - "RequestReceived" "rules": - "level": "Request"
  Note: If you enter an unrecognized profile value for the Prometheus Adapter in the ConfigMap object, no changes are made to the Prometheus Adapter, and an error is logged by the Cluster Monitoring Operator.

- Review the audit log for the Prometheus Adapter:
  $ oc -n openshift-monitoring exec -c prometheus-adapter <prometheus_adapter_pod_name> -- cat /var/log/adapter/audit.log
Additional resources
- See Preparing to configure the monitoring stack for steps to create monitoring config maps.
5.8. Disabling the local Alertmanager
A local Alertmanager that routes alerts from Prometheus instances is enabled by default in the openshift-monitoring project of the OpenShift Container Platform monitoring stack.
If you do not need the local Alertmanager, you can disable it by configuring the cluster-monitoring-config config map in the openshift-monitoring project.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have created the cluster-monitoring-config config map.
- You have installed the OpenShift CLI (oc).
Procedure
- Edit the cluster-monitoring-config config map in the openshift-monitoring project:

  $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
- Add enabled: false for the alertmanagerMain component under data/config.yaml:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: cluster-monitoring-config
    namespace: openshift-monitoring
  data:
    config.yaml: |
      alertmanagerMain:
        enabled: false
- Save the file to apply the changes. The Alertmanager instance is disabled automatically when you apply the change.
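Optionally, you can confirm that the local Alertmanager pods are no longer running after the change is applied. The following sketch assumes the app.kubernetes.io/name=alertmanager label used in the topology spread constraint example earlier in this chapter; the command should return no pods once the instance is disabled:

  # Optional check: list Alertmanager pods in the openshift-monitoring project
  $ oc -n openshift-monitoring get pods -l app.kubernetes.io/name=alertmanager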
Additional resources
- Prometheus Alertmanager documentation
- Managing alerts