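1.6. Managing alerts

Receive and define alerts for the observability service to be notified of hub cluster and managed cluster changes.

1.6.1. Prerequisites

You must enable observability on your hub cluster.
You must have create permissions for secret resources in the open-cluster-management-observability namespace.
You must have edit permissions for the MultiClusterObservability resource.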

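1.6.2. Configuring Alertmanager

Integrate external messaging tools such as email, Slack, and PagerDuty to receive notifications from Alertmanager. To add integrations and configure routes for Alertmanager, you must override the alertmanager-config secret in the open-cluster-management-observability namespace. Complete the following steps to update the custom receiver rules:

Extract the data from the alertmanager-config secret. Run the following command: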

oc -n open-cluster-management-observability get secret alertmanager-config --template='{{ index .data "alertmanager.yaml" }}' |base64 -d > alertmanager.yaml

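Edit the alertmanager.yaml file and save your changes. Then replace the alertmanager-config secret with the updated file by running the following command:

oc -n open-cluster-management-observability create secret generic alertmanager-config --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n open-cluster-management-observability replace secret --filename=-

The updated alertmanager.yaml in your secret might resemble the following content:

global: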
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
  smtp_auth_username: 'alertmanager'
  smtp_auth_password: 'password'
templates:
- '/etc/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-X-mails
  routes:
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails

Your changes are applied immediately after the secret is modified. For an example of an Alertmanager configuration, see prometheus/alertmanager.
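The sample configuration above routes to a receiver named team-X-mails but does not show its definition. The following is a minimal sketch of a complete file, assuming a hypothetical recipient address (team-x@example.org). The email_configs field comes from the upstream Alertmanager configuration format and is not described in this section, so treat it as an assumption to check against your Alertmanager version.

```shell
# Write a complete alertmanager.yaml to a scratch directory.
# The recipient address is a placeholder (assumption); replace it with your own.
workdir="$(mktemp -d)"
cat > "$workdir/alertmanager.yaml" <<'EOF'
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
route:
  group_by: ['alertname', 'cluster', 'service']
  receiver: team-X-mails
receivers:
- name: team-X-mails
  email_configs:
  - to: 'team-x@example.org'
EOF
echo "wrote $workdir/alertmanager.yaml"
```

If amtool is installed on your workstation, running amtool check-config on the file before you replace the secret can catch syntax errors early.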

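1.6.2.1. Mounting secrets within the Alertmanager pods

You can create Secret resources with arbitrary content and mount them within the alertmanager pods to give Alertmanager access to authorization credentials.

To reference secrets within your Alertmanager configuration, add the Secret resource content within the open-cluster-management-observability namespace and mount the content within the alertmanager pods. For example, to create and mount a tls secret, complete the following steps:

To create a tls secret with TLS certificates, run the following command: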

oc create secret tls tls --cert=</path/to/cert.crt> --key=</path/to/cert.key> -n open-cluster-management-observability

To mount the tls secret in the alertmanager pods, add it to the advanced section of the MultiClusterObservability resource. Your resource might resemble the following content:
```
...
advanced:
 alertmanager:
   secrets: ['tls']
```
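To add the reference of the tls secret in the Alertmanager configuration, add the path of the secret to the configuration. Your configuration might resemble the following: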
```
tls_config:
 cert_file: '/etc/alertmanager/secrets/tls/tls.crt'
 key_file: '/etc/alertmanager/secrets/tls/tls.key'
```

To verify that the Alertmanager configuration references the mounted secret, extract the configuration by running the following command:

oc -n open-cluster-management-observability get secret alertmanager-config --template='{{ index .data "alertmanager.yaml" }}' |base64 -d > alertmanager.yaml

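Your YAML might resemble the following content. In this sample the mounted secret is named storyverify, so its files are under /etc/alertmanager/secrets/storyverify: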

"global":
  "http_config":
    "tls_config":
      "cert_file": "/etc/alertmanager/secrets/storyverify/tls.crt"
      "key_file": "/etc/alertmanager/secrets/storyverify/tls.key"

To save the alertmanager.yaml configuration in the alertmanager-config secret and replace the previous secret with the new one, pipe the generated secret into the replace command:

oc -n open-cluster-management-observability create secret generic alertmanager-config --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n open-cluster-management-observability replace secret --filename=-
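The mount and verification steps in this section can be sketched as small shell helpers. This is an illustration under stated assumptions, not the documented procedure: the function names are hypothetical, the advanced section is assumed to sit under .spec of the MultiClusterObservability resource, and the pod name observability-alertmanager-0 is taken from the silencing section of this document.

```shell
NS=open-cluster-management-observability

# Build a JSON merge patch that sets spec.advanced.alertmanager.secrets.
# Assumption: the advanced section sits under .spec of the resource.
mount_secrets_patch() {
  list=""
  for name in "$@"; do
    list="${list:+$list,}\"$name\""
  done
  printf '{"spec":{"advanced":{"alertmanager":{"secrets":[%s]}}}}\n' "$list"
}

# Apply the patch. $1 is the MultiClusterObservability resource name,
# the remaining arguments are the names of the secrets to mount.
mount_secrets() {
  mco_name="$1"
  shift
  oc patch multiclusterobservability "$mco_name" --type=merge \
    -p "$(mount_secrets_patch "$@")"
}

# List the files of a mounted secret inside an Alertmanager pod.
# $1 is the secret name, $2 is an optional pod name.
list_mounted_secret() {
  oc -n "$NS" exec "${2:-observability-alertmanager-0}" -- \
    ls "/etc/alertmanager/secrets/$1"
}
```

For example, mount_secrets <your-mco-name> tls followed by list_mounted_secret tls should list tls.crt and tls.key once the pods have restarted with the new mount. A merge patch replaces the whole secrets list, so pass every secret that you want to stay mounted.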

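1.6.3. Forwarding alerts

After you enable observability, alerts from your OpenShift Container Platform managed clusters are automatically sent to the hub cluster. You can use the alertmanager-config YAML file to configure alerts with an external notification system.

View the following example of the alertmanager-config YAML file: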

global:
  slack_api_url: '<slack_webhook_url>'

route:
  receiver: 'slack-notifications'
  group_by: [alertname, datacenter, app]

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'

If you want to configure a proxy for alert forwarding, add the following global entry to the alertmanager-config YAML file:

global:
  slack_api_url: '<slack_webhook_url>'
  http_config:
    proxy_url: http://****
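The two fragments above can be combined into one file. The following sketch renders the Slack example with the proxy entry from shell arguments, so that the webhook URL does not have to be stored in a template. The function name render_slack_config and the URLs in the usage note are hypothetical.

```shell
# Print an alertmanager.yaml that forwards alerts to Slack through a proxy.
# $1: Slack webhook URL, $2: proxy URL (both supplied by the caller).
render_slack_config() {
  cat <<EOF
global:
  slack_api_url: '$1'
  http_config:
    proxy_url: $2

route:
  receiver: 'slack-notifications'
  group_by: [alertname, datacenter, app]

receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    text: 'https://internal.myorg.net/wiki/alerts/{{ .GroupLabels.app }}/{{ .GroupLabels.alertname }}'
EOF
}
```

A call such as render_slack_config "$SLACK_WEBHOOK_URL" "http://proxy.example.com:3128" > alertmanager.yaml writes the file, which you can then save in the alertmanager-config secret as described in Configuring Alertmanager.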

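1.6.3.1. Disabling alert forwarding for managed clusters

To disable alert forwarding for managed clusters, add the following annotation to the MultiClusterObservability custom resource:

metadata:
  annotations:
    mco-disable-alerting: "true"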

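When you set the annotation, the alert forwarding configuration on the managed clusters is reverted. Any changes made to the ocp-monitoring-config ConfigMap in the openshift-monitoring namespace are also reverted. Setting the annotation ensures that the ocp-monitoring-config ConfigMap is no longer managed or updated by the observability operator endpoint. After the configuration is updated, the Prometheus instance on your managed cluster restarts.

Important: If you have a Prometheus instance that holds metric data and that instance restarts, the metrics on your managed cluster are lost. Metrics on the hub cluster are not affected.

When the changes are reverted, a ConfigMap named cluster-monitoring-reverted is created in the open-cluster-management-addon-observability namespace. Any new, manually added alert forwarding configurations are not reverted from the ConfigMap.

Verify that the hub cluster Alertmanager no longer propagates managed cluster alerts to third-party messaging tools. See the previous section, Configuring Alertmanager.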
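As a sketch, the annotation can also be set from the command line, and the reverted ConfigMap can be looked up afterward. The function names are hypothetical, and the second helper assumes that you are logged in to the managed cluster, where the open-cluster-management-addon-observability namespace is expected to exist.

```shell
# Set the mco-disable-alerting annotation on the hub cluster.
# $1 is the name of your MultiClusterObservability resource.
disable_alert_forwarding() {
  oc annotate multiclusterobservability "$1" mco-disable-alerting=true --overwrite
}

# Look up the ConfigMap that is created after the changes are reverted.
# Assumption: run this while logged in to the managed cluster.
check_reverted_configmap() {
  oc -n open-cluster-management-addon-observability get configmap cluster-monitoring-reverted
}
```

Run disable_alert_forwarding <your-mco-name> on the hub cluster first, then check_reverted_configmap on a managed cluster after the Prometheus instance has restarted.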

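1.6.4. Silencing alerts

Add alerts that you do not want to receive. You can silence alerts by the alert name, match label, or time duration. After you add the alert that you want to silence, an ID is created. The ID of your silenced alert might resemble the following string: d839aca9-ed46-40be-84c4-dca8773671da.

Continue reading for ways to silence alerts:

To silence a Red Hat Advanced Cluster Management alert, you must have access to the alertmanager pods in the open-cluster-management-observability namespace. For example, enter the following command in the observability-alertmanager-0 pod terminal to silence SampleAlert: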
```
amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" alertname="SampleAlert"
```

Silence an alert by using multiple match labels. The following command uses match-label-1 and match-label-2:

amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" <match-label-1>=<match-value-1> <match-label-2>=<match-value-2>

If you want to silence an alert for a specific period of time, use the --duration flag. Run the following command to silence SampleAlert for an hour:

amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" --duration="1h" alertname="SampleAlert"

You can also specify a start or end time for a silenced alert. Enter the following command to silence SampleAlert starting at a specific time:

amtool silence add --alertmanager.url="http://localhost:9093" --author="user" --comment="Silencing sample alert" --start="2023-04-14T15:04:05-07:00" alertname="SampleAlert"

To view all silenced alerts that are created, run the following command:
```
amtool silence --alertmanager.url="http://localhost:9093"
```

If you no longer want an alert to be silenced, end the silencing of the alert by running the following command:

amtool silence expire --alertmanager.url="http://localhost:9093" "d839aca9-ed46-40be-84c4-dca8773671da"

To end the silencing of all alerts, run the following command:

amtool silence expire --alertmanager.url="http://localhost:9093" $(amtool silence query --alertmanager.url="http://localhost:9093" -q)
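The commands above are entered in the pod terminal. If you prefer to run them from your workstation, one option is to wrap amtool in oc exec. The following sketch assumes the observability-alertmanager-0 pod named earlier in this section. The helper names am and silence_window are hypothetical, and the --end flag is the amtool counterpart of --start.

```shell
# Run amtool inside the Alertmanager pod from outside the cluster.
# All arguments are passed through to amtool unchanged.
am() {
  oc -n open-cluster-management-observability exec observability-alertmanager-0 -- \
    amtool --alertmanager.url="http://localhost:9093" "$@"
}

# Silence one alert for a fixed window.
# $1: alert name, $2: start time, $3: end time (both RFC 3339).
silence_window() {
  am silence add --author="user" --comment="Silencing sample alert" \
    --start="$2" --end="$3" alertname="$1"
}
```

For example, silence_window SampleAlert 2023-04-14T15:04:05-07:00 2023-04-14T16:04:05-07:00 creates a one-hour silence, and am silence query lists the silences that exist.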

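1.6.4.1. Migrating observability storage

If you use alert silencers, you can migrate observability storage while retaining silencers from the earlier state. To do this, migrate your Red Hat Advanced Cluster Management observability storage by creating new StatefulSets and PersistentVolume (PV) resources that use your chosen StorageClass resource.

Note: The storage for PVs is different from the object storage that is used to store the metrics collected from your clusters.

When you use StatefulSets and PVs to migrate your observability data to new storage, the following data components are stored:

Observatorium or Thanos: Receives data, then uploads it to object storage. Some of its components store data in PVs. For this data, Observatorium or Thanos automatically regenerates the object storage at startup, so there is no consequence when you lose this data.
Alertmanager: Only stores silenced alerts. If you want to keep these silenced alerts, you must migrate that data to the new PV.

To migrate your observability storage, complete the following steps:

In the MultiClusterObservability resource, set the .spec.storageConfig.storageClass field to the new storage class.
To ensure that the data of the earlier PersistentVolumes is retained even when you delete the PersistentVolumeClaim, go to all your existing PersistentVolumes.
Change the reclaimPolicy to "Retain": oc patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
Optional: To avoid losing data, see Migrate persistent data to another Storage Class in DG 8 Operator in OCP 4.
Delete both the StatefulSet and the PersistentVolumeClaim in the following StatefulSet cases:
1. alertmanager-db-observability-alertmanager-<REPLICA_NUMBER>
2. data-observability-thanos-<COMPONENT_NAME>
3. data-observability-thanos-receive-default
4. data-observability-thanos-store-shard
Important: You might need to delete, then re-create, the MultiClusterObservability operator pod so that you can create the new StatefulSet.
Re-create a new PersistentVolumeClaim with the same name but the correct StorageClass.
Create a new PersistentVolumeClaim that refers to the old PersistentVolume.
Verify that the new StatefulSet and PersistentVolumes use the new StorageClass that you chose.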
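The reclaim policy step can be tedious when many PVs are involved. As a sketch, the following helper patches every PV that is bound to a PersistentVolumeClaim in the open-cluster-management-observability namespace. The function name is hypothetical, and the approach assumes that all of the observability claims live in that namespace. Review the list of claims before you run it.

```shell
NS=open-cluster-management-observability

# Set the reclaim policy to Retain on every PV bound to a claim in $NS.
retain_bound_pvs() {
  oc -n "$NS" get pvc \
    -o jsonpath='{range .items[*]}{.spec.volumeName}{"\n"}{end}' |
  while read -r pv; do
    # Skip claims that are not bound to a volume yet.
    [ -n "$pv" ] || continue
    oc patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
  done
}
```

After you call retain_bound_pvs, the RECLAIM POLICY column in the output of oc get pv should show Retain for those volumes before you delete any StatefulSet or PersistentVolumeClaim.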

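1.6.5. Suppressing alerts

Suppress less severe Red Hat Advanced Cluster Management alerts globally across your clusters. Suppress alerts by defining an inhibition rule in the alertmanager-config secret in the open-cluster-management-observability namespace.

An inhibition rule mutes an alert when a set of matchers matches it while another alert that matches a second set of matchers exists. For the rule to take effect, both the target and source alerts must have the same label values for the label names in the equal list. Your inhibit_rules might resemble the following: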

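global:
  resolve_timeout: 1h
inhibit_rules:
  - equal:
      - namespace
    source_match:
      severity: critical
    target_match_re:
      severity: warning|info

The inhibit_rules parameter section is defined to look for alerts in the same namespace. When a critical alert is initiated within a namespace and other alerts in that namespace have the severity level warning or info, only the critical alerts are routed to the Alertmanager receiver. For example, the following alerts match, so the warning alert is suppressed:

ALERTS{alertname="foo", namespace="ns-1", severity="critical"}
ALERTS{alertname="foo", namespace="ns-1", severity="warning"}

The source_match and target_match_re parameters select the source and target alerts. If the alerts that they select do not share the same value for the labels in the equal list, the alerts are routed to the receiver. In the following example the namespaces differ, so both alerts are routed:

ALERTS{alertname="foo", namespace="ns-1", severity="critical"}
ALERTS{alertname="foo", namespace="ns-2", severity="warning"}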

To view suppressed alerts in Red Hat Advanced Cluster Management, enter the following command:

amtool alert --alertmanager.url="http://localhost:9093" --inhibited
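The decision that the sample rule makes can be mirrored in a few lines of shell. This is a simplified model for illustration only, not how Alertmanager is implemented: it looks at a single source and target pair and hard-codes the severities and the namespace label from the sample. The function name is_inhibited is hypothetical.

```shell
# Return 0 when the target alert is suppressed by the source alert.
# $1: source namespace, $2: source severity,
# $3: target namespace, $4: target severity.
is_inhibited() {
  source_ns="$1"; source_sev="$2"; target_ns="$3"; target_sev="$4"
  # source_match: the source alert must be critical.
  [ "$source_sev" = "critical" ] || return 1
  # target_match_re: the target alert must be warning or info.
  case "$target_sev" in
    warning|info) ;;
    *) return 1 ;;
  esac
  # equal: both alerts must carry the same namespace value.
  [ "$source_ns" = "$target_ns" ]
}

is_inhibited ns-1 critical ns-1 warning && echo "ns-1 warning: inhibited"
is_inhibited ns-1 critical ns-2 warning || echo "ns-2 warning: routed"
```

The two calls correspond to the two ALERTS pairs in this section: the first prints "ns-1 warning: inhibited" and the second prints "ns-2 warning: routed".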

1.6.6. Additional resources

See Customizing observability for more details.
For more observability topics, see Observability service.
