1.5. Managing alerts

1.5.1. Prerequisites

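You must have the following credentials and complete the following actions to manage alerts:

Enable observability on your hub cluster.
Have create permission for Secret resources in the open-cluster-management-observability namespace.
Have edit permission for the MultiClusterObservability resource.
Configure persistent storage for the Prometheus instances on your managed clusters to help prevent the loss of metrics during restarts. For help, see Configuring persistent storage in the OpenShift Container Platform documentation.
Enable user workload monitoring and alerting on your managed clusters. For help, see Configuring user workload monitoring and Managing alerts in the OpenShift Container Platform documentation.
For Prometheus user workloads, add a label to your alerting rules. The label forces the rules to be evaluated only by the user workload Prometheus instance deployed in the openshift-user-workload-monitoring namespace, and it prevents the Thanos Ruler instance from processing them. For help, see Creating alerting rules for user-defined projects in the OpenShift Container Platform documentation.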

1.5.2. Configuring Alertmanager

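Integrate external messaging tools such as email, Slack, and PagerDuty to receive notifications from Alertmanager. To add integrations and configure routes for Alertmanager, you must override the alertmanager-config secret in the open-cluster-management-observability namespace. Complete the following steps to update your custom receiver rules: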

Extract the data from the alertmanager-config secret. Run the following command:

```
oc -n open-cluster-management-observability get secret alertmanager-config --template='{{ index .data "alertmanager.yaml" }}' | base64 -d > alertmanager.yaml
```

Edit the alertmanager.yaml file. Then save the configuration back to the alertmanager-config secret by running the following command:

```
oc -n open-cluster-management-observability create secret generic alertmanager-config --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n open-cluster-management-observability replace secret --filename=-
```
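The extract, edit, and save cycle above is essentially a base64 round trip on the alertmanager.yaml key of the secret. The following Python sketch uses only the standard library to model that round trip on a Secret serialized as JSON. The helper names build_alertmanager_secret and extract_alertmanager_yaml are hypothetical and are not part of oc or Red Hat Advanced Cluster Management. The generated manifest only approximates what oc create secret generic --dry-run=client produces.

```python
import base64
import json


def build_alertmanager_secret(config_text: str,
                              namespace: str = "open-cluster-management-observability") -> str:
    """Hypothetical helper: a Secret manifest (JSON) storing config_text under alertmanager.yaml."""
    encoded = base64.b64encode(config_text.encode("utf-8")).decode("ascii")
    manifest = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "alertmanager-config", "namespace": namespace},
        "type": "Opaque",
        # Secret .data values are base64-encoded, which is why the extract
        # command pipes its output through `base64 -d`.
        "data": {"alertmanager.yaml": encoded},
    }
    return json.dumps(manifest, indent=2)


def extract_alertmanager_yaml(secret_json: str) -> str:
    """Hypothetical helper: decode alertmanager.yaml, like `--template=... | base64 -d`."""
    secret = json.loads(secret_json)
    return base64.b64decode(secret["data"]["alertmanager.yaml"]).decode("utf-8")


if __name__ == "__main__":
    edited = "route:\n  receiver: team-X-mails\n"
    secret_json = build_alertmanager_secret(edited)
    # Decoding the generated Secret returns exactly the edited text.
    print(extract_alertmanager_yaml(secret_json) == edited)  # True
```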

Your updated secret might resemble the following content:

```
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'
templates:
- '/etc/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-X-mails
  routes:
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails
receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'team-X+alerts@example.org'
```
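The following Python sketch models how an alert might be matched against the route section above. Each match_re pattern must fully match the label value, because Alertmanager anchors these regular expressions. The first matching child route supplies the receiver. Alerts that share the same values for the group_by labels fall into one notification group. This is a simplified, one-level model. It ignores nested routes, the continue option, and the timing settings (group_wait, group_interval, repeat_interval). The ROUTE dictionary and the helper names are hypothetical.

```python
import re

# Hypothetical in-memory copy of the route section in the example above.
ROUTE = {
    "receiver": "team-X-mails",
    "group_by": ["alertname", "cluster", "service"],
    "routes": [
        {"match_re": {"service": "^(foo1|foo2|baz)$"}, "receiver": "team-X-mails"},
    ],
}


def matches(route: dict, labels: dict) -> bool:
    """True when every match_re pattern fully matches the alert's label value."""
    for name, pattern in route.get("match_re", {}).items():
        # A missing label is treated as an empty string.
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def select_receiver(route: dict, labels: dict) -> str:
    """Receiver of the first matching child route, else the parent's receiver."""
    for child in route.get("routes", []):
        if matches(child, labels):
            # A child route without its own receiver inherits the parent's.
            return child.get("receiver", route["receiver"])
    return route["receiver"]


def group_key(route: dict, labels: dict) -> tuple:
    """Alerts with equal values for all group_by labels share one notification."""
    return tuple(labels.get(name, "") for name in route["group_by"])


if __name__ == "__main__":
    a = {"alertname": "HighLatency", "cluster": "c1", "service": "foo1", "pod": "p1"}
    b = {"alertname": "HighLatency", "cluster": "c1", "service": "foo1", "pod": "p2"}
    print(select_receiver(ROUTE, a))                   # team-X-mails
    print(group_key(ROUTE, a) == group_key(ROUTE, b))  # True: pod is not in group_by
```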

After you modify the secret, your changes are applied immediately. For an example of an Alertmanager configuration, see prometheus/alertmanager.

1.5.3. Securing communication from Alertmanager to third-party endpoints

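Enable secure external communication from Alertmanager to third-party endpoints, such as Slack, email, and PagerDuty, by using Kubernetes Secret resources to keep your credentials secure and easy to manage. You can create Secret resources with arbitrary content and mount them in your alertmanager pod so that Alertmanager can access the authorization credentials.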

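To reference a secret in the Alertmanager configuration, add the Secret resource in the open-cluster-management-observability namespace and mount its contents in the alertmanager pod. For example, complete the following steps to create and mount a tls secret: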

To create a tls secret from your TLS certificate, run the following command:

```
oc create secret tls tls --cert=</path/to/cert.crt> --key=</path/to/cert.key> -n open-cluster-management-observability
```

To mount the tls secret through the MultiClusterObservability resource, add it to the advanced section. Your resource might resemble the following example:
```
...
advanced:
 alertmanager:
   secrets: ['tls']
```

To reference the tls secret in the Alertmanager configuration, add the paths of the secret files to the configuration. Your configuration might resemble the following example:

```
tls_config:
 cert_file: '/etc/alertmanager/secrets/tls/tls.crt'
 key_file: '/etc/alertmanager/secrets/tls/tls.key'
```

To verify that the Alertmanager configuration references the secret that is mounted in the alertmanager pod, extract the configuration by running the following command:

```
oc -n open-cluster-management-observability get secret alertmanager-config --template='{{ index .data "alertmanager.yaml" }}' | base64 -d > alertmanager.yaml
```

Your YAML might resemble the following content:

```
"global":
  "http_config":
    "tls_config":
      "cert_file": "/etc/alertmanager/secrets/storyverify/tls.crt"
      "key_file": "/etc/alertmanager/secrets/storyverify/tls.key"
```
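These examples follow a path layout in which a secret listed in advanced.alertmanager.secrets is mounted under /etc/alertmanager/secrets/<secret-name>/. Under that assumption, a quick consistency check is to compare the secret names in the file paths of the extracted alertmanager.yaml with the secrets you mounted. The Python sketch below does this with a regular expression. The helper names are hypothetical, and the check only inspects paths with that prefix.

```python
import re

# Assumed mount layout, taken from the examples above:
# /etc/alertmanager/secrets/<secret-name>/<key>
SECRET_PATH = re.compile(r"/etc/alertmanager/secrets/([^/\s'\"]+)/")


def referenced_secrets(config_text: str) -> set:
    """Secret names that appear in mounted-file paths of an alertmanager.yaml."""
    return set(SECRET_PATH.findall(config_text))


def unmounted_secret_refs(config_text: str, mounted: list) -> set:
    """Referenced secret names that are missing from advanced.alertmanager.secrets."""
    return referenced_secrets(config_text) - set(mounted)


if __name__ == "__main__":
    extracted = (
        '"global":\n'
        '  "http_config":\n'
        '    "tls_config":\n'
        '      "cert_file": "/etc/alertmanager/secrets/storyverify/tls.crt"\n'
        '      "key_file": "/etc/alertmanager/secrets/storyverify/tls.key"\n'
    )
    print(unmounted_secret_refs(extracted, ["tls"]))  # {'storyverify'}
```

For example, if only the tls secret is mounted, the sketch flags storyverify in the extracted YAML above. That result suggests a secret with that name would also need to be listed under advanced.alertmanager.secrets.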

To save the alertmanager.yaml configuration in the alertmanager-config secret and replace the previous secret with the new one, run the following command. The first oc command generates the new secret without creating it. The second oc command reads that output from standard input (--filename=-) and replaces the existing secret:

```
oc -n open-cluster-management-observability create secret generic alertmanager-config --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n open-cluster-management-observability replace secret --filename=-
```

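1.5.4. Forwarding alerts

After you enable observability, alerts from your OpenShift Container Platform managed clusters are automatically sent to the Alertmanager on the hub cluster. By default, all platform alerts are sent to the hub cluster Alertmanager. When you enable user workload alerting on your OpenShift Container Platform managed clusters, user workload alerts are also sent to your hub cluster. You can use the alertmanager-config YAML file to configure an external notification system for alerts.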

View the following example alertmanager-config YAML file:

```
global:
  slack_api_url: '<slack_webhook_url>'

route:
  receiver: 'slack-notifications'
  group_by: [alertname, datacenter, app]

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
```
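The text field in this example uses Go template syntax, and .GroupLabels exposes only the labels that are listed in group_by. The Python sketch below imitates that substitution for {{ .GroupLabels.<name> }} placeholders only, to show which URL a notification would contain. It is not a Go text/template implementation. The alert labels are hypothetical.

```python
import re

TEXT_TEMPLATE = ("https://internal.myorg.net/wiki/alerts/"
                 "{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}")
GROUP_BY = ["alertname", "datacenter", "app"]

# Handles only {{ .GroupLabels.<name> }}; this is not Go's text/template.
PLACEHOLDER = re.compile(r"\{\{\s*\.GroupLabels\.(\w+)\s*\}\}")


def group_labels(alert_labels: dict, group_by: list = GROUP_BY) -> dict:
    """Only labels listed in group_by are exposed as .GroupLabels."""
    return {name: alert_labels[name] for name in group_by if name in alert_labels}


def render(template: str, labels: dict) -> str:
    # Assumption of this sketch: a placeholder whose label is absent raises
    # KeyError; Go templates handle missing keys differently.
    return PLACEHOLDER.sub(lambda m: labels[m.group(1)], template)


if __name__ == "__main__":
    alert = {"alertname": "DiskFull", "datacenter": "dc1", "app": "billing", "pod": "p1"}
    print(render(TEXT_TEMPLATE, group_labels(alert)))
    # https://internal.myorg.net/wiki/alerts/billing/DiskFull
```

Because pod is not in group_by, it never reaches .GroupLabels. A template that references it would have nothing to render.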

If you want to configure a proxy for alert forwarding, add the following global entry to the alertmanager-config YAML file:

```
global:
  slack_api_url: '<slack_webhook_url>'
  http_config:
    proxy_url: http://****
```

To forward user workload alerts, the alerts must be evaluated by the user workload Prometheus instance, not by Thanos Ruler. See the following PrometheusRule resource example, in which the openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus label restricts evaluation to the user workload Prometheus instance:

```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus
```
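The example above shows only the metadata of a PrometheusRule. The following Python sketch builds a complete manifest that carries the same openshift.io/prometheus-rule-evaluation-scope: leaf-prometheus label and prints it as JSON, which oc apply -f accepts as well as YAML. The rule name, namespace, alert name, and expression are hypothetical placeholders. Replace them with the values for your own user-defined project.

```python
import json

LEAF_SCOPE_LABEL = {"openshift.io/prometheus-rule-evaluation-scope": "leaf-prometheus"}


def leaf_prometheus_rule(name: str, namespace: str, alert: str, expr: str,
                         severity: str = "warning") -> dict:
    """PrometheusRule labeled for evaluation by the user workload Prometheus only."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(LEAF_SCOPE_LABEL)},
        "spec": {
            "groups": [{
                "name": f"{name}.rules",
                "rules": [{"alert": alert, "expr": expr, "labels": {"severity": severity}}],
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical values; use your own project namespace, alert, and PromQL.
    rule = leaf_prometheus_rule("example-alert", "ns1", "VersionAlert",
                                'version{job="prometheus-example-app"} == 0')
    print(json.dumps(rule, indent=2))
```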

1.5.5. Disabling alert forwarding for managed clusters

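To disable alert forwarding for managed clusters, add the mco-disable-alerting: "true" annotation to the MultiClusterObservability custom resource. When you set this annotation, neither platform alerts nor user workload alerts are forwarded to the Alertmanager on the hub cluster, and the forwarding configuration on the managed clusters is reverted.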

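Updates to the cluster-monitoring-config config map in the openshift-monitoring namespace are also reverted. Setting the annotation ensures that the cluster-monitoring-config config map is not managed or updated by the observability operator endpoint. After you update the configuration, both the platform and the user workload Prometheus instances on your managed cluster restart.

After the changes are reverted, a config map named cluster-monitoring-reverted is created in the open-cluster-management-addon-observability namespace. Any new alert forwarding configurations that you added manually are not reverted from the config map.

Complete the following steps:

Run the following command to set the mco-disable-alerting annotation to "true":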
```
oc annotate MultiClusterObservability observability mco-disable-alerting=true
```
Important: If Prometheus on your managed cluster is not configured with a persistent volume, you lose metrics when the Prometheus instances restart.
Verify that the hub cluster Alertmanager no longer propagates managed cluster alerts to third-party messaging tools.
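oc annotate is one way to set the annotation. As an alternative, you can express the same change as a JSON merge patch and apply it with oc patch multiclusterobservability observability --type=merge -p '<patch>'. The following Python sketch builds only the patch body, and its helper name is hypothetical. Annotation values are strings, so the value is written as "true" rather than as a JSON boolean.

```python
import json


def annotation_merge_patch(key: str, value: str = "true") -> str:
    """Hypothetical helper: JSON merge patch body that sets one annotation."""
    # Annotation values must be strings, so the boolean is quoted.
    return json.dumps({"metadata": {"annotations": {key: value}}})


if __name__ == "__main__":
    print(annotation_merge_patch("mco-disable-alerting"))
    # {"metadata": {"annotations": {"mco-disable-alerting": "true"}}}
```

The same helper also produces the body for the mco-disable-uwl-alerting annotation.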

1.5.6. Disabling user workload alert forwarding for managed clusters

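To disable user workload alert forwarding for managed clusters, add the mco-disable-uwl-alerting: "true" annotation to the MultiClusterObservability custom resource. When you set the annotation, user workload alerts stop being forwarded to the Alertmanager on the hub cluster, while platform alerts continue to be forwarded.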

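Setting the annotation ensures that the user workload monitoring config map is not managed or updated by the observability operator endpoint. After you update the configuration, the user workload Prometheus instance on your managed cluster restarts.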

Run the following command to set the mco-disable-uwl-alerting annotation to "true":

```
oc annotate MultiClusterObservability observability mco-disable-uwl-alerting=true
```

1.5.7. Silencing alerts

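Add the alerts that you do not want to receive. You can silence alerts by alert name, by match labels, or for a time duration. When you add an alert that you want to silence, a silence ID is created. The ID for your silenced alert might resemble the following string: d839aca9-ed46-40be-84c4-dca8773671da.

Continue reading for ways to silence alerts: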

To silence a Red Hat Advanced Cluster Management alert, you must have access to an Alertmanager pod in the open-cluster-management-observability namespace, such as observability-alertmanager-0. For example, enter the following command in the observability-alertmanager-0 pod terminal to silence SampleAlert:
```
amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" alertname="SampleAlert"
```

You can silence an alert by using multiple match labels. The following command uses match-label-1 and match-label-2:

```
amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" <match-label-1>=<match-value-1> <match-label-2>=<match-value-2>
```

To silence an alert for a specific period of time, use the --duration flag. Run the following command to silence SampleAlert for one hour:

```
amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" --duration="1h" alertname="SampleAlert"
```

You can also specify a start or end time for the silence. Enter the following command to silence SampleAlert starting at a specific time:

```
amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" --start="2023-04-14T15:04:05-07:00" alertname="SampleAlert"
```
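Each amtool silence add command above creates the same kind of object: a set of label matchers plus an active time window. The following Python sketch models when such a silence applies to an alert. It assumes equality matchers only; amtool also supports regular expression and negative matchers, which are not modeled here. It also assumes the window is active from the start time up to, but not including, the end time. The 1h fallback mirrors what is assumed to be the amtool default for --duration. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def make_silence(matchers: dict, start: datetime,
                 duration: Optional[timedelta] = None,
                 end: Optional[datetime] = None) -> dict:
    """Hypothetical model: equality matchers plus an active window [start, end)."""
    if end is None:
        # Assumed 1h fallback, mirroring the amtool default for --duration.
        end = start + (duration if duration is not None else timedelta(hours=1))
    return {"matchers": dict(matchers), "start": start, "end": end}


def is_silenced(silence: dict, labels: dict, at: datetime) -> bool:
    """True when `at` is inside the window and every matcher equals the label value."""
    if not (silence["start"] <= at < silence["end"]):
        return False
    # A missing label is treated as an empty string.
    return all(labels.get(name, "") == value for name, value in silence["matchers"].items())


if __name__ == "__main__":
    start = datetime(2023, 4, 14, 15, 4, 5, tzinfo=timezone(timedelta(hours=-7)))
    silence = make_silence({"alertname": "SampleAlert"}, start, duration=timedelta(hours=1))
    alert = {"alertname": "SampleAlert", "severity": "warning"}
    print(is_silenced(silence, alert, start + timedelta(minutes=30)))  # True
    print(is_silenced(silence, alert, start + timedelta(hours=2)))     # False
```

With multiple match labels, as in the second command, every matcher must match for the alert to be silenced.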

To view all the silences that you created, run the following command:
```
amtool silence --alertmanager.url="http://localhost:9093"
```

If you no longer want an alert to be silenced, run the following command to expire its silence by ID:

```
amtool silence expire --alertmanager.url="http://localhost:9093" "d839aca9-ed46-40be-84c4-dca8773671da"
```

To end the silencing of all alerts, run the following command:

```
amtool silence expire --alertmanager.url="http://localhost:9093" $(amtool silence query --alertmanager.url="http://localhost:9093" -q)
```

1.5.8. Migrating observability storage

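If you use alert silencers, you can migrate observability storage while keeping the silencers from their earlier state. To do this, migrate your Red Hat Advanced Cluster Management observability storage by creating new StatefulSets and PersistentVolume (PV) resources that use the StorageClass resource of your choice.

Note: The storage for PVs is different from the object storage that is used to store the metrics collected from your clusters.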

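When you use StatefulSets and PVs to migrate your observability data to new storage, the following data components are stored:

Observatorium or Thanos: Receives data, then uploads it to object storage. Some of its components store data in PVs. Observatorium or Thanos automatically regenerates this data from the object storage on startup, so losing it has no consequence.
Alertmanager: Stores only silenced alerts. If you want to keep these silenced alerts, you must migrate this data to the new PV.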

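To migrate your observability storage, complete the following steps:

In the MultiClusterObservability resource, set the .spec.storageConfig.storageClass field to the new storage class.
To ensure that the data of your earlier PersistentVolumes is retained even when you delete the PersistentVolumeClaims, go to all of your existing PersistentVolumes.
Change the reclaimPolicy to "Retain" by running the following command: oc patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Optional: To avoid losing data, see Migrate persistent data to another Storage Class in DG 8 Operator in OCP 4.
Delete both the StatefulSet and the PersistentVolumeClaim in the following StatefulSet cases:
1. alertmanager-db-observability-alertmanager-<REPLICA_NUMBER>
2. data-observability-thanos-<COMPONENT_NAME>
3. data-observability-thanos-receive-default
4. data-observability-thanos-store-shard
Important: You might need to delete and then re-create the MultiClusterObservability operator pod so that the new StatefulSets can be created.
Re-create new PersistentVolumeClaims with the same names, but with the correct StorageClass.
Create a new PersistentVolumeClaim that refers to the old PersistentVolume.
Verify that the new StatefulSets and PersistentVolumes use the new StorageClass that you chose.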
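When many claims exist, it can help to list which ones match the names in the steps above before you delete anything. The following Python sketch turns those names into regular expressions. It treats <REPLICA_NUMBER> as digits and <COMPONENT_NAME> as any lowercase component suffix. It also builds the reclaim-policy patch body that the oc patch pv step uses. The patterns and helper names are assumptions for illustration. Review the result against the output of oc get pvc -n open-cluster-management-observability before you act on it.

```python
import json
import re

# Assumed patterns derived from the claim names in the steps above:
# <REPLICA_NUMBER> -> digits, <COMPONENT_NAME> -> a lowercase suffix.
PVC_PATTERNS = [
    re.compile(r"alertmanager-db-observability-alertmanager-\d+"),
    re.compile(r"data-observability-thanos-[a-z0-9-]+"),
]


def pvcs_to_migrate(pvc_names: list) -> list:
    """Keep only claim names that fully match one of the assumed patterns."""
    return [name for name in pvc_names if any(p.fullmatch(name) for p in PVC_PATTERNS)]


def retain_patch() -> str:
    """Patch body for: oc patch pv <your-pv-name> -p '<body>'."""
    return json.dumps({"spec": {"persistentVolumeReclaimPolicy": "Retain"}})


if __name__ == "__main__":
    names = [
        "alertmanager-db-observability-alertmanager-0",
        "data-observability-thanos-receive-default-0",
        "data-observability-thanos-store-shard-0-0",
        "my-app-data",
    ]
    print(pvcs_to_migrate(names))  # the first three names; my-app-data is skipped
    print(retain_patch())
```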

1.5.9. Inhibiting alerts

Globally limit Red Hat Advanced Cluster Management alerts that are less severe. To limit these alerts, define inhibition rules in the alertmanager-config secret in the open-cluster-management-observability namespace.

An inhibition rule mutes a target alert while a source alert is firing. A source alert matches one set of matchers, and a target alert matches another set. For the rule to take effect, the target and source alerts must have the same label values for the label names in the equal list. Your inhibit_rules might resemble the following example:

```
global:
  resolve_timeout: 1h
inhibit_rules: # (1)
  - equal:
      - namespace
    source_match: # (2)
      severity: critical
    target_match_re:
      severity: warning|info
```

(1) The inhibit_rules section is defined to look for alerts in the same namespace. When a critical alert fires in a namespace that also has other alerts with the severity level warning or info, only the critical alert is routed to the Alertmanager receiver. The following alerts might be displayed when there are matches:

```
ALERTS{alertname="foo", namespace="ns-1", severity="critical"}
ALERTS{alertname="foo", namespace="ns-1", severity="warning"}
```

(2) If the alerts do not satisfy the rule, both alerts are routed to the receiver. In the following example, the source alert matched by source_match and the target alert matched by target_match_re have different values for the namespace label in the equal list:

```
ALERTS{alertname="foo", namespace="ns-1", severity="critical"}
ALERTS{alertname="foo", namespace="ns-2", severity="warning"}
```
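The two example pairs above can be reproduced with a small model of the inhibition rule. In the following Python sketch, a target alert is inhibited when it matches target_match_re, which is anchored as in Alertmanager. In addition, another firing alert must match source_match and have the same values for every label in equal. This is a simplified sketch that assumes a single rule and ignores details such as how Alertmanager tracks resolved source alerts. The helper names are hypothetical.

```python
import re

# The inhibit rule from the example above.
RULE = {
    "source_match": {"severity": "critical"},
    "target_match_re": {"severity": "warning|info"},
    "equal": ["namespace"],
}


def _equal_match(matchers: dict, labels: dict) -> bool:
    return all(labels.get(name, "") == value for name, value in matchers.items())


def _regex_match(matchers: dict, labels: dict) -> bool:
    # Alertmanager anchors regex matchers, so a full match is required.
    return all(re.fullmatch(p, labels.get(name, "")) is not None for name, p in matchers.items())


def is_inhibited(target: dict, firing: list, rule: dict = RULE) -> bool:
    """True if another firing alert matches source_match and shares the equal labels."""
    if not _regex_match(rule["target_match_re"], target):
        return False
    for source in firing:
        if source is target:
            continue  # in this sketch, an alert never inhibits itself
        if _equal_match(rule["source_match"], source) and all(
            source.get(name, "") == target.get(name, "") for name in rule["equal"]
        ):
            return True
    return False


if __name__ == "__main__":
    critical = {"alertname": "foo", "namespace": "ns-1", "severity": "critical"}
    warning_same_ns = {"alertname": "foo", "namespace": "ns-1", "severity": "warning"}
    warning_other_ns = {"alertname": "foo", "namespace": "ns-2", "severity": "warning"}
    print(is_inhibited(warning_same_ns, [critical, warning_same_ns]))    # True
    print(is_inhibited(warning_other_ns, [critical, warning_other_ns]))  # False
```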

To view inhibited alerts in Red Hat Advanced Cluster Management, enter the following command:

```
amtool alert --alertmanager.url="http://localhost:9093" --inhibited
```

1.5.10. Additional resources

For more details, see Observability advanced configuration.
For more observability topics, see Observability service.


