3.2. Configuring performance and scalability for core platform monitoring

3.2.1. Controlling the placement and distribution of monitoring components

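You can move the monitoring stack components to specific nodes:

Use the nodeSelector constraint with labeled nodes to move any of the monitoring stack components to specific nodes.
Assign tolerations to enable moving components to tainted nodes.

Doing so lets you control the placement and distribution of monitoring components across the cluster, so that you can optimize system resource use, improve performance, and isolate workloads according to specific requirements or policies.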

3.2.1.1. Moving monitoring components to different nodes

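To specify the nodes in your cluster on which monitoring stack components run, configure the nodeSelector constraint for the components in the cluster-monitoring-config config map to match the labels assigned to the nodes.

Note

You cannot add a node selector constraint directly to an existing scheduled pod.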

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).

Procedure

If you have not done so yet, add a label to the nodes on which you want to run the monitoring components:
```
$ oc label nodes <node_name> <node_label>
```
Replace <node_name> with the name of the node where you want to add the label. Replace <node_label> with the name of the label you want to add.

Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```

Specify the node labels for the nodeSelector constraint for the component under data/config.yaml:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    # ...
    <component>: # 1
      nodeSelector:
        <node_label_1> # 2
        <node_label_2> # 3
    # ...
```
1: Replace <component> with the appropriate monitoring stack component name.
2: Replace <node_label_1> with the label you added to the node.
3: Optional: Specify additional labels. If you specify additional labels, the pods for the component are scheduled only on nodes that contain all of the specified labels.

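Note

If monitoring components remain in a Pending state after you configure the nodeSelector constraint, check the pod events for errors relating to taints and tolerations.

Save the file to apply the changes. The components specified in the new configuration are automatically moved to the new nodes, and the pods affected by the new configuration are redeployed.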
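
To make the placeholders in this procedure concrete, the following sketch pins the core platform Prometheus pods to nodes that carry a hypothetical label, monitoring=true. The label key and value are assumptions chosen for illustration and do not come from the procedure. Entries under nodeSelector use the key: value form, so a label added with oc label nodes <node_name> monitoring=true would be written like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      nodeSelector:
        # Hypothetical label; replace it with the label you added to your nodes.
        # The value is quoted because node label values are strings.
        monitoring: "true"
```

If more than one label is listed, a node must carry all of them to be eligible, as described in the optional callout above.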

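3.2.1.2. Assigning tolerations to monitoring components

You can assign tolerations to any of the monitoring stack components to enable moving them to tainted nodes.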

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).

Procedure

Edit the cluster-monitoring-config config map in the openshift-monitoring project:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```

Specify tolerations for the component:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>:
      tolerations:
        <toleration_specification>
```
Substitute <component> and <toleration_specification> accordingly.

For example, oc adm taint nodes node1 key1=value1:NoSchedule adds a taint to node1 with the key key1 and the value value1. This prevents monitoring components from deploying pods on node1 unless a toleration is configured for that taint. The following example configures the alertmanagerMain component to tolerate the example taint:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
```

Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
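
A toleration only permits a pod to be scheduled on a tainted node; it does not steer the pod there. To place a component on nodes reserved for monitoring, a toleration is therefore usually combined with a nodeSelector constraint. The following is a minimal sketch under stated assumptions: the label monitoring=true and the taint dedicated=monitoring:NoSchedule are hypothetical names chosen for this example, not values from this document.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      # Hypothetical label: restricts scheduling to the reserved nodes.
      nodeSelector:
        monitoring: "true"
      # Hypothetical taint: allows scheduling despite the taint on those nodes.
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "monitoring"
        effect: "NoSchedule"
```

With only the toleration, the Alertmanager pods could still land on any untainted node; with only the nodeSelector, they would stay Pending because the taint repels them.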

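3.2.2. Setting the body size limit for metrics scraping

By default, there is no limit on the uncompressed body size of the data returned by scraped metrics targets. You can set a body size limit to help avoid situations in which Prometheus consumes excessive amounts of memory when a scraped target returns a response that contains a large amount of data. Setting a body size limit also reduces the impact that a malicious target might have on Prometheus and on the cluster as a whole.

After you set a value for enforcedBodySizeLimit, the PrometheusScrapeBodySizeLimitHit alert fires when at least one Prometheus scrape target replies with a response body larger than the configured value.

Note

If the metrics data scraped from a target has an uncompressed body size that exceeds the configured limit, the scrape fails. Prometheus then considers the target to be down and sets its up metric value to 0, which can trigger the TargetDown alert.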

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have installed the OpenShift CLI (oc).

Procedure

Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```
Add a value for enforcedBodySizeLimit to data/config.yaml/prometheusK8s to limit the body size that can be accepted per target scrape:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      enforcedBodySizeLimit: 40MB # 1
```
1: Specify the maximum body size for scraped metrics targets. This enforcedBodySizeLimit example limits the uncompressed size per target scrape to 40 megabytes. Valid numeric values use the Prometheus data size format: B (bytes), KB (kilobytes), MB (megabytes), GB (gigabytes), TB (terabytes), PB (petabytes), and EB (exabytes). The default value is 0, which specifies no limit. You can also set the value to automatic to calculate the limit automatically based on cluster capacity.
Save the file to apply the changes. The new configuration is applied automatically.
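
As a variation on the fixed 40MB limit, the callout above notes that the value can also be set to automatic. A minimal sketch of that setting, reusing the same config map layout as the preceding example, might look like this:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |-
    prometheusK8s:
      # Derive the limit from cluster capacity instead of a fixed size.
      enforcedBodySizeLimit: automatic
```

A fixed value gives a predictable threshold for the PrometheusScrapeBodySizeLimitHit alert, while automatic avoids having to choose and maintain a number yourself.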

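3.2.3. Managing CPU and memory resources for monitoring components

You can ensure that the containers that run monitoring components have enough CPU and memory resources by specifying values for resource limits and requests for those components.

You can configure these limits and requests for core platform monitoring components in the openshift-monitoring namespace.

3.2.3.1. Specifying limits and requests

To configure CPU and memory resources, specify values for resource limits and requests in the cluster-monitoring-config ConfigMap object in the openshift-monitoring namespace.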

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have created the ConfigMap object named cluster-monitoring-config.
You have installed the OpenShift CLI (oc).

Procedure

Edit the cluster-monitoring-config config map in the openshift-monitoring project:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```

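Add values to define resource limits and requests for each component that you want to configure.

Important

Ensure that the value set for a limit is always higher than the value set for a request. Otherwise, an error occurs and the container does not run.

Example of setting resource limits and requests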

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusK8s:
      resources:
        limits:
          cpu: 500m
          memory: 3Gi
        requests:
          cpu: 200m
          memory: 500Mi
    thanosQuerier:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusOperator:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    metricsServer:
      resources:
        requests:
          cpu: 10m
          memory: 50Mi
        limits:
          cpu: 50m
          memory: 500Mi
    kubeStateMetrics:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    telemeterClient:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    openshiftStateMetrics:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    nodeExporter:
      resources:
        limits:
          cpu: 50m
          memory: 150Mi
        requests:
          cpu: 20m
          memory: 50Mi
    monitoringPlugin:
      resources:
        limits:
          cpu: 500m
          memory: 1Gi
        requests:
          cpu: 200m
          memory: 500Mi
    prometheusOperatorAdmissionWebhook:
      resources:
        limits:
          cpu: 50m
          memory: 100Mi
        requests:
          cpu: 20m
          memory: 50Mi
```

Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.

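3.2.4. Choosing a metrics collection profile

Important

Metrics collection profiles are a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see the following link:

Technology Preview Features Support Scope

To choose a metrics collection profile for core OpenShift Container Platform monitoring components, edit the cluster-monitoring-config ConfigMap object.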

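Prerequisites

You have installed the OpenShift CLI (oc).
You have enabled Technology Preview features by using the FeatureGate custom resource (CR).
You have created the cluster-monitoring-config ConfigMap object.
You have access to the cluster as a user with the cluster-admin cluster role.

Procedure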

Edit the cluster-monitoring-config ConfigMap object in the openshift-monitoring project:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```

Add the metrics collection profile setting under data/config.yaml/prometheusK8s:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      collectionProfile: <metrics_collection_profile_name> # 1
```
1: The name of the metrics collection profile. The available values are full or minimal. If you do not specify a value, or if the collectionProfile key does not exist in the config map, the default setting of full is used.

The following example sets the metrics collection profile to minimal for the core platform instance of Prometheus:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      collectionProfile: minimal
```

Save the file to apply the changes. The new configuration is applied automatically.

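3.2.5. Configuring pod topology spread constraints

You can configure pod topology spread constraints for all the pods deployed by the Cluster Monitoring Operator to control how pod replicas are scheduled to nodes across zones. This helps keep the pods highly available and running efficiently, because the workloads are spread across nodes in different data centers or hierarchical infrastructure zones.

You can configure pod topology spread constraints for monitoring pods by using the cluster-monitoring-config config map.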

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role.
You have created the cluster-monitoring-config ConfigMap object.
You have installed the OpenShift CLI (oc).

Procedure

Edit the cluster-monitoring-config config map in the openshift-monitoring project:
```
$ oc -n openshift-monitoring edit configmap cluster-monitoring-config
```

Add the following settings under the data/config.yaml field to configure pod topology spread constraints:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    <component>: # 1
      topologySpreadConstraints:
      - maxSkew: <n> # 2
        topologyKey: <key> # 3
        whenUnsatisfiable: <value> # 4
        labelSelector: # 5
          <match_option>
```
1: Specify the name of the component for which you want to set up pod topology spread constraints.
2: Specify a numeric value for maxSkew, which defines the degree to which pods are allowed to be unevenly distributed.
3: Specify a node label key for topologyKey. Nodes that have a label with this key and identical values are considered to be in the same topology. The scheduler tries to put a balanced number of pods into each domain.
4: Specify a value for whenUnsatisfiable. The available options are DoNotSchedule and ScheduleAnyway. Specify DoNotSchedule if you want the maxSkew value to define the maximum difference allowed between the number of matching pods in the target topology and the global minimum. Specify ScheduleAnyway if you want the scheduler to still schedule the pod but to give higher priority to nodes that might reduce the skew.
5: Specify labelSelector to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain.

Example configuration for Prometheus
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: monitoring
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus
```

Save the file to apply the changes. The pods affected by the new configuration are automatically redeployed.
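
To illustrate the softer ScheduleAnyway option described in the callouts, the following sketch spreads the Prometheus pods across availability zones. It assumes that your nodes carry the well-known topology.kubernetes.io/zone label, which depends on your platform and is not stated in this document; the label selector is the one used in the Prometheus example above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      topologySpreadConstraints:
      - maxSkew: 1
        # Assumption: nodes are labeled with their zone under this key.
        topologyKey: topology.kubernetes.io/zone
        # Prefer balanced zones, but still schedule if balance is impossible.
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus
```

As a worked reading of maxSkew: with matching pods counted per zone as 1, 1, and 0, the skew is 1 - 0 = 1, which satisfies maxSkew: 1. Placing another pod in either of the first two zones would raise the skew to 2, so with DoNotSchedule the scheduler would only accept the empty zone, whereas with ScheduleAnyway it merely prefers it.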
