7.5. Disaster recovery alerts
This section lists all supported alerts associated with Red Hat OpenShift Data Foundation in a disaster recovery environment.
Recording rules
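Record: ramen_sync_duration_seconds
- Expression:
  sum by (obj_name, obj_namespace, obj_type, job, policyname)(time() - (ramen_last_sync_timestamp_seconds > 0))
- Purpose:
  The time interval, in seconds, between the last sync time of the volume group and the current time. The "> 0" filter excludes series whose last sync timestamp is 0.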
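Record: ramen_rpo_difference
- Expression:
  ramen_sync_duration_seconds{job="ramen-hub-operator-metrics-service"} / on(policyname, job) group_left() (ramen_policy_schedule_interval_seconds{job="ramen-hub-operator-metrics-service"})
- Purpose:
  Compares the actual sync delay of the volume replication group with the expected sync delay (the schedule interval of its policy). Despite its name, the value is a ratio, not a difference: a value of 2 means the last sync happened two schedule intervals ago.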
Record: count_persistentvolumeclaim_total
- Expression:
  count(kube_persistentvolumeclaim_info)
- Purpose:
  The total number of PVCs reported from the managed clusters.
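The three recording rules above can be expressed in the standard Prometheus rule-file format, as sketched below. This is an illustration only, not the rule file that OpenShift Data Foundation ships and manages: the group name ramen-dr-recording-rules-example is hypothetical, and the expressions are copied unchanged from the list above.

```yaml
# Illustrative sketch: the group name is hypothetical; ODF manages its own rules.
groups:
  - name: ramen-dr-recording-rules-example   # hypothetical group name
    rules:
      # Seconds since each volume group last synced; "> 0" drops series with a zero timestamp.
      - record: ramen_sync_duration_seconds
        expr: sum by (obj_name, obj_namespace, obj_type, job, policyname)(time() - (ramen_last_sync_timestamp_seconds > 0))
      # Actual sync delay divided by the scheduled interval of the matching policy.
      - record: ramen_rpo_difference
        expr: ramen_sync_duration_seconds{job="ramen-hub-operator-metrics-service"} / on(policyname, job) group_left() (ramen_policy_schedule_interval_seconds{job="ramen-hub-operator-metrics-service"})
      # Number of PVC info series, that is, the number of PVCs.
      - record: count_persistentvolumeclaim_total
        expr: count(kube_persistentvolumeclaim_info)
```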
Alerts
Alert: VolumeSynchronizationDelay
- Impact: Critical
- Purpose: The actual sync delay of the volume replication group is three or more times the expected sync delay.
- YAML:

  alert: VolumeSynchronizationDelay
  expr: ramen_rpo_difference >= 3
  for: 5s
  labels:
    severity: critical
  annotations:
    description: "The syncing of volumes is exceeding three times the scheduled snapshot interval, or the volumes have been recently protected. (DRPC: {{ $labels.obj_name }}, Namespace: {{ $labels.obj_namespace }})"
    alert_type: "DisasterRecovery"
Alert: VolumeSynchronizationDelay
- Impact: Warning
- Purpose: The actual sync delay of the volume replication group is more than twice, but less than three times, the expected sync delay.
- YAML:

  alert: VolumeSynchronizationDelay
  expr: ramen_rpo_difference > 2 and ramen_rpo_difference < 3
  for: 5s
  labels:
    severity: warning
  annotations:
    description: "The syncing of volumes is exceeding two times the scheduled snapshot interval, or the volumes have been recently protected. (DRPC: {{ $labels.obj_name }}, Namespace: {{ $labels.obj_namespace }})"
    alert_type: "DisasterRecovery"
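The two VolumeSynchronizationDelay alerts compare the recorded ratio against fixed bands. A worked example follows; the 300-second (5-minute) schedule interval and the 750 s and 960 s sync delays are hypothetical values chosen only for illustration.

$$
\mathrm{ramen\_rpo\_difference} = \frac{\mathrm{ramen\_sync\_duration\_seconds}}{\mathrm{ramen\_policy\_schedule\_interval\_seconds}}
$$

$$
\frac{750\,\mathrm{s}}{300\,\mathrm{s}} = 2.5 \;(2 < 2.5 < 3,\ \text{Warning}), \qquad \frac{960\,\mathrm{s}}{300\,\mathrm{s}} = 3.2 \;(3.2 \ge 3,\ \text{Critical})
$$

Because the warning rule uses a strict "> 2", a ratio of exactly 2 matches neither alert, and in both cases the condition must hold for 5 seconds before the alert fires.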
Alert: WorkloadUnprotected
- Impact: Warning
- Purpose: The application protection status has been degraded for more than 10 minutes.
- YAML:

  alert: WorkloadUnprotected
  expr: ramen_workload_protection_status == 0
  for: 10m
  labels:
    severity: warning
  annotations:
    description: "Workload is not protected for disaster recovery (DRPC: {{ $labels.obj_name }}, Namespace: {{ $labels.obj_namespace }})."
    alert_type: "DisasterRecovery"