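3.3. Camel K operator alerts
You can create a PrometheusRule resource so that the AlertManager instance in the OpenShift monitoring stack can trigger alerts based on the metrics that the Camel K operator exposes.
Example
You can create a PrometheusRule resource with alerting rules based on the exposed metrics, as follows: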
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-operator
spec:
  groups:
  - name: camel-k-operator
    rules:
    - alert: CamelKReconciliationDuration
      expr: |
        (
        1 - sum(rate(camel_k_reconciliation_duration_seconds_bucket{le="0.5"}[5m])) by (job)
        /
        sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
        )
        * 100
        > 10
      for: 1m
      labels:
        severity: warning
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the reconciliation requests
          for {{ $labels.job }} have their duration above 0.5s.
    - alert: CamelKReconciliationFailure
      expr: |
        sum(rate(camel_k_reconciliation_duration_seconds_count{result="Errored"}[5m])) by (job)
        /
        sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
        * 100
        > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the reconciliation requests
          for {{ $labels.job }} have failed.
    - alert: CamelKSuccessBuildDuration2m
      expr: |
        (
        1 - sum(rate(camel_k_build_duration_seconds_bucket{le="120",result="Succeeded"}[5m])) by (job)
        /
        sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
        )
        * 100
        > 10
      for: 1m
      labels:
        severity: warning
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the successful builds
          for {{ $labels.job }} have their duration above 2m.
    - alert: CamelKSuccessBuildDuration5m
      expr: |
        (
        1 - sum(rate(camel_k_build_duration_seconds_bucket{le="300",result="Succeeded"}[5m])) by (job)
        /
        sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
        )
        * 100
        > 1
      for: 1m
      labels:
        severity: critical
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the successful builds
          for {{ $labels.job }} have their duration above 5m.
    - alert: CamelKBuildFailure
      expr: |
        sum(rate(camel_k_build_duration_seconds_count{result="Failed"}[5m])) by (job)
        /
        sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
        * 100
        > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have failed.
    - alert: CamelKBuildError
      expr: |
        sum(rate(camel_k_build_duration_seconds_count{result="Error"}[5m])) by (job)
        /
        sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
        * 100
        > 1
      for: 10m
      labels:
        severity: critical
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have errored.
    - alert: CamelKBuildQueueDuration1m
      expr: |
        (
        1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="60"}[5m])) by (job)
        /
        sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
        )
        * 100
        > 1
      for: 1m
      labels:
        severity: warning
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
          have been queued for more than 1m.
    - alert: CamelKBuildQueueDuration5m
      expr: |
        (
        1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="300"}[5m])) by (job)
        /
        sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
        )
        * 100
        > 1
      for: 1m
      labels:
        severity: critical
      annotations:
        message: |
          {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }}
          have been queued for more than 5m.
```
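The duration rules rely on Prometheus histogram semantics: a `_bucket{le="X"}` series counts observations less than or equal to X seconds (so `le="60"`, `le="120"` and `le="300"` correspond to 1, 2 and 5 minutes), and the `_count` series counts all observations. Dividing the two gives the share of observations at or below the threshold, and subtracting that share from 1 gives the share above it. The failure and error rules instead divide the rate for one `result` value by the total rate. The following Python sketch reproduces only that arithmetic. It takes already-computed per-job rates as hypothetical inputs instead of evaluating `rate()`, and the function names are illustrative, not part of Camel K or Prometheus.

```python
from typing import Dict

# Each value stands for the result of one `sum(rate(<series>[5m])) by (job)`
# term, keyed by the `job` label. The inputs are hypothetical per-second rates.
Rates = Dict[str, float]


def percent_above(bucket_le: Rates, count: Rates) -> Dict[str, float]:
    """Compute (1 - bucket{le="X"} / count) * 100 per job.

    Histogram buckets are cumulative, so bucket{le="X"} / count is the share of
    observations at or below X; one minus that share is the share above X.
    Only jobs present on both sides are kept, like PromQL one-to-one matching.
    """
    result = {}
    for job, total in count.items():
        # PromQL would produce NaN for 0/0, which never passes `>`; skip it.
        if job in bucket_le and total > 0:
            result[job] = (1 - bucket_le[job] / total) * 100
    return result


def percent_of(subset: Rates, count: Rates) -> Dict[str, float]:
    """Compute subset / count * 100 per job, as in the *Failure and *Error rules."""
    result = {}
    for job, total in count.items():
        if job in subset and total > 0:
            result[job] = subset[job] / total * 100
    return result


def exceeding(percentages: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Mimic the PromQL filter `> threshold`: keep jobs whose value is above it."""
    return {job: value for job, value in percentages.items() if value > threshold}


if __name__ == "__main__":
    total = {"job-a": 10.0, "job-b": 16.0}
    fast = {"job-a": 7.5, "job-b": 15.0}  # requests that finished within 0.5s
    errored = {"job-b": 0.5}              # job-a has no errored series at all

    # CamelKReconciliationDuration: job-a at 25%, job-b at 6.25%, threshold 10
    print(exceeding(percent_above(fast, total), 10))    # {'job-a': 25.0}
    # CamelKReconciliationFailure: job-b at 3.125%, threshold 1
    print(exceeding(percent_of(errored, total), 1))     # {'job-b': 3.125}
```

The `> threshold` comparison in PromQL acts as a filter, so the value that survives it is the percentage itself; that is why the `message` annotations can print `$value` directly as a percentage.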
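Camel K operator alerts
The following table describes the alerting rules defined in the PrometheusRule resource. Each percentage is computed from the metric rates over a 5-minute window.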
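| Name | Severity | Description |
|---|---|---|
| CamelKReconciliationDuration | warning | More than 10% of the reconciliation requests have a duration above 0.5 seconds for at least 1 minute. |
| CamelKReconciliationFailure | warning | More than 1% of the reconciliation requests have failed for at least 10 minutes. |
| CamelKSuccessBuildDuration2m | warning | More than 10% of the successful builds have a duration above 2 minutes for at least 1 minute. |
| CamelKSuccessBuildDuration5m | critical | More than 1% of the successful builds have a duration above 5 minutes for at least 1 minute. |
| CamelKBuildFailure | warning | More than 1% of the builds have failed for at least 10 minutes. |
| CamelKBuildError | critical | More than 1% of the builds have errored for at least 10 minutes. |
| CamelKBuildQueueDuration1m | warning | More than 1% of the builds have been queued for more than 1 minute for at least 1 minute. |
| CamelKBuildQueueDuration5m | critical | More than 1% of the builds have been queued for more than 5 minutes for at least 1 minute. |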
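The "for at least 1 minute" and "for at least 10 minutes" parts of the descriptions come from each rule's `for` field. Prometheus first marks a matching alert as pending, and only moves it to firing (and sends it to AlertManager) after the expression has kept returning a result for that long; a single evaluation without a result resets it. The sketch below is a simplified model of that pending-to-firing transition, assuming a fixed 30-second evaluation interval (the real interval depends on the monitoring stack configuration). `alert_states` is a hypothetical helper, and it ignores details such as Prometheus restarts.

```python
from typing import List, Optional


def alert_states(condition: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Return the alert state after each rule evaluation.

    condition[i] is True when the rule's `expr` returned a result at the i-th
    evaluation; evaluations are assumed to run every interval_s seconds.
    for_s is the rule's `for` duration in seconds.
    """
    states: List[str] = []
    active_since: Optional[int] = None  # when the condition became continuously true
    for i, cond in enumerate(condition):
        now = i * interval_s
        if not cond:
            # One evaluation without a result resets the alert.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert fires once it has been active for at least `for_s` seconds.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


if __name__ == "__main__":
    # CamelKReconciliationDuration uses `for: 1m`; a 30s interval is assumed.
    print(alert_states([True, True, True, False, True], interval_s=30, for_s=60))
    # ['pending', 'pending', 'firing', 'inactive', 'pending']
```

This is why a short spike that crosses a threshold for less than the `for` duration never reaches AlertManager: the rules with `for: 10m` tolerate brief bursts of failures, while the `for: 1m` rules react faster.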
For more information about alerts, see Creating alerting rules in the OpenShift documentation.