Chapter 3. Monitoring Camel K operator
Red Hat Integration - Camel K monitoring is based on the OpenShift monitoring system. This chapter explains how to use the available options for monitoring the Red Hat Integration - Camel K Operator at runtime. You can use the Prometheus Operator that is already deployed as part of OpenShift Monitoring to monitor your own applications.
3.1. Camel K Operator metrics
The Camel K operator monitoring endpoint exposes the following metrics:
Name | Type | Description | Buckets | Labels |
---|---|---|---|---|
camel_k_reconciliation_duration_seconds | Histogram | Reconciliation request duration | 0.25s, 0.5s, 1s, 5s | namespace, group, version, kind, result, tag |
camel_k_build_duration_seconds | Histogram | Build duration | 30s, 1m, 1.5m, 2m, 5m, 10m | result |
camel_k_build_recovery_attempts | Histogram | Build recovery attempts | 0, 1, 2, 3, 4, 5 | result |
camel_k_build_queue_duration_seconds | Histogram | Build queue duration | 5s, 15s, 30s, 1m, 5m | N/A |
camel_k_integration_first_readiness_seconds | Histogram | Time to first integration readiness | 5s, 10s, 30s, 1m, 2m | N/A |
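Because these metrics are exposed as Prometheus histograms, you can derive latency percentiles from their `_bucket` series. The following PromQL query is a minimal sketch rather than part of the product documentation: it estimates the 90th percentile of build duration over the last five minutes, where the metric name comes from the table above and the quantile and time window are arbitrary choices.

```
# Approximate 90th percentile of Camel K build duration over the last 5 minutes
histogram_quantile(0.9, sum(rate(camel_k_build_duration_seconds_bucket[5m])) by (le))
```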
3.2. Enabling Camel K Operator monitoring
OpenShift 4.3 or higher includes an embedded Prometheus Operator already deployed as part of OpenShift Monitoring. This section explains how to enable monitoring of your own application services in OpenShift Monitoring.
Prerequisites
- You must have cluster administrator access to an OpenShift cluster on which the Camel K Operator is installed. See Installing Camel K.
- You must have already enabled monitoring of your own services in OpenShift. See Enabling user workload monitoring in OpenShift; a minimal configuration sketch follows this list.
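If you have not yet enabled user workload monitoring, the linked procedure describes it in full. As a rough sketch, on recent OpenShift 4.x releases this amounts to setting `enableUserWorkload: true` in the `cluster-monitoring-config` ConfigMap in the `openshift-monitoring` namespace (older 4.x releases use a different, technology-preview key):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
```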
Procedure
1. Create a PodMonitor resource targeting the operator metrics endpoint, so that the Prometheus server can scrape the metrics exposed by the operator.

   operator-pod-monitor.yaml

   ```yaml
   apiVersion: monitoring.coreos.com/v1
   kind: PodMonitor
   metadata:
     name: camel-k-operator
     labels:
       app: "camel-k"
       camel.apache.org/component: operator
   spec:
     selector:
       matchLabels:
         app: "camel-k"
         camel.apache.org/component: operator
     podMetricsEndpoints:
       - port: metrics
   ```

2. Create the PodMonitor resource:

   ```
   oc apply -f operator-pod-monitor.yaml
   ```
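To verify the result, you can check that the PodMonitor resource exists in the namespace where the Camel K Operator is installed; the namespace below is an assumption, so adjust it to your installation.

```
# List the PodMonitor created in the previous step (namespace is an assumption)
oc get podmonitor camel-k-operator -n openshift-operators
```

Once Prometheus reloads its configuration, the operator scrape target should appear on the Targets page of the OpenShift monitoring console.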
Additional Resources
- For more information about the discovery mechanism and the relationship between the operator resources, see the Prometheus Operator getting started guide.
- If your operator metrics are not discovered, see Troubleshooting ServiceMonitor changes, which also applies to troubleshooting PodMonitor resources.
3.3. Camel K operator alerts
You can create a PrometheusRule resource so that the AlertManager instance from the OpenShift monitoring stack can trigger alerts based on the metrics exposed by the Camel K operator.
Example
You can create a PrometheusRule resource with alerting rules based on the exposed metrics, as shown below.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: camel-k-operator
spec:
  groups:
    - name: camel-k-operator
      rules:
        - alert: CamelKReconciliationDuration
          expr: |
            (
              1 - sum(rate(camel_k_reconciliation_duration_seconds_bucket{le="0.5"}[5m])) by (job)
              /
              sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            ) * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests for {{ $labels.job }} have their duration above 0.5s.
        - alert: CamelKReconciliationFailure
          expr: |
            sum(rate(camel_k_reconciliation_duration_seconds_count{result="Errored"}[5m])) by (job)
            /
            sum(rate(camel_k_reconciliation_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the reconciliation requests for {{ $labels.job }} have failed.
        - alert: CamelKSuccessBuildDuration2m
          expr: |
            (
              1 - sum(rate(camel_k_build_duration_seconds_bucket{le="120",result="Succeeded"}[5m])) by (job)
              /
              sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            ) * 100
            > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds for {{ $labels.job }} have their duration above 2m.
        - alert: CamelKSuccessBuildDuration5m
          expr: |
            (
              1 - sum(rate(camel_k_build_duration_seconds_bucket{le="300",result="Succeeded"}[5m])) by (job)
              /
              sum(rate(camel_k_build_duration_seconds_count{result="Succeeded"}[5m])) by (job)
            ) * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the successful builds for {{ $labels.job }} have their duration above 5m.
        - alert: CamelKBuildFailure
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Failed"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have failed.
        - alert: CamelKBuildError
          expr: |
            sum(rate(camel_k_build_duration_seconds_count{result="Error"}[5m])) by (job)
            /
            sum(rate(camel_k_build_duration_seconds_count[5m])) by (job)
            * 100
            > 1
          for: 10m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have errored.
        - alert: CamelKBuildQueueDuration1m
          expr: |
            (
              1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="60"}[5m])) by (job)
              /
              sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            ) * 100
            > 1
          for: 1m
          labels:
            severity: warning
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have been queued for more than 1m.
        - alert: CamelKBuildQueueDuration5m
          expr: |
            (
              1 - sum(rate(camel_k_build_queue_duration_seconds_bucket{le="300"}[5m])) by (job)
              /
              sum(rate(camel_k_build_queue_duration_seconds_count[5m])) by (job)
            ) * 100
            > 1
          for: 1m
          labels:
            severity: critical
          annotations:
            message: |
              {{ printf "%0.0f" $value }}% of the builds for {{ $labels.job }} have been queued for more than 5m.
```
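As with the PodMonitor resource, you can save the example to a file and apply it with oc; the filename below is only illustrative.

```
oc apply -f camel-k-operator-prometheus-rule.yaml
```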
Camel K operator alerts
The following table shows the alerting rules that are defined in the PrometheusRule resource.
Name | Severity | Description |
---|---|---|
CamelKReconciliationDuration | warning | More than 10% of the reconciliation requests have their duration above 0.5s over at least 1 min. |
CamelKReconciliationFailure | warning | More than 1% of the reconciliation requests have failed over at least 10 min. |
CamelKSuccessBuildDuration2m | warning | More than 10% of the successful builds have their duration above 2 min over at least 1 min. |
CamelKSuccessBuildDuration5m | critical | More than 1% of the successful builds have their duration above 5 min over at least 1 min. |
CamelKBuildFailure | warning | More than 1% of the builds have failed over at least 10 min. |
CamelKBuildError | critical | More than 1% of the builds have errored over at least 10 min. |
CamelKBuildQueueDuration1m | warning | More than 1% of the builds have been queued for more than 1 min over at least 1 min. |
CamelKBuildQueueDuration5m | critical | More than 1% of the builds have been queued for more than 5 min over at least 1 min. |
You can find more information about alerts in Creating alerting rules from the OpenShift documentation.