Installing Red Hat OpenShift Cluster Observability Operator
Installing and configuring the Cluster Observability Operator.
Abstract
Chapter 1. Installing the Cluster Observability Operator
As a cluster administrator, you can install or remove the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console. OperatorHub is a user interface that works in conjunction with Operator Lifecycle Manager (OLM), which installs and manages Operators on a cluster.
For OpenShift Container Platform v4.20+, you can install an Operator using the software catalog.
1.1. Installing the Cluster Observability Operator in the web console
Install the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
Procedure
In the OpenShift Container Platform web console:
- Click Operators → OperatorHub, if your version of OpenShift Container Platform is 4.19 or earlier.
- Click Ecosystem → Software Catalog, if your version of OpenShift Container Platform is 4.20+.
- Type cluster observability operator in the Filter by keyword box.
- Click Cluster Observability Operator in the list of results.
Read the information about the Operator, and configure the following installation settings:
- Update channel → stable
- Version → 1.0.0 or later
- Installation mode → All namespaces on the cluster (default)
- Installed Namespace → Operator recommended Namespace: openshift-cluster-observability-operator
- Select Enable Operator recommended cluster monitoring on this Namespace
- Update approval → Automatic
- Optional: You can change the installation settings to suit your requirements. For example, you can select to subscribe to a different update channel, to install an older released version of the Operator, or to require manual approval for updates to new versions of the Operator.
- Click Install.
Verification
In the OpenShift Container Platform web console:
- Go to Operators → Installed Operators, if your version of OpenShift Container Platform is 4.19 or earlier.
- Go to Ecosystem → Installed Operators, if your version of OpenShift Container Platform is 4.20+.
- Verify that the Cluster Observability Operator entry appears in the list.
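The same check can be approximated from a terminal. The following is a minimal sketch, assuming the oc CLI is logged in with sufficient permissions and that the Operator was installed into the recommended openshift-cluster-observability-operator namespace. The function names coo_csv_phases and coo_installed are illustrative and not part of any product tooling.

```shell
# List each ClusterServiceVersion (CSV) in the Operator namespace with its phase.
coo_csv_phases() {
  oc get csv -n openshift-cluster-observability-operator \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase --no-headers
}

# Return success when at least one CSV in that namespace reports Succeeded.
# Assumption: the COO is the only Operator installed in this namespace.
coo_installed() {
  coo_csv_phases | grep -q 'Succeeded'
}
```

After sourcing these functions, running coo_installed && echo "COO is installed" prints the message only when a CSV in the namespace has reached the Succeeded phase.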
1.2. Uninstalling the Cluster Observability Operator using the web console
If you have installed the Cluster Observability Operator (COO) by using OperatorHub or the software catalog, you can uninstall it in the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
Procedure
In the OpenShift Container Platform web console:
- Go to Operators → Installed Operators, if your version of OpenShift Container Platform is 4.19 or earlier.
- Go to Ecosystem → Installed Operators, if your version of OpenShift Container Platform is 4.20+.
- Locate the Cluster Observability Operator entry in the list.
- Click the Options menu for this entry and select Uninstall Operator.
Verification
In the OpenShift Container Platform web console:
- Go to Operators → Installed Operators, if your version of OpenShift Container Platform is 4.19 or earlier.
- Go to Ecosystem → Installed Operators, if your version of OpenShift Container Platform is 4.20+.
- Verify that the Cluster Observability Operator entry no longer appears in the list.
Chapter 2. Installing end-to-end observability
Installation guide for the end-to-end observability stack in OpenShift Container Platform using the ObservabilityInstaller custom resource definition. Integrated distributed tracing capabilities are powered by OpenTelemetry, Tempo, and supporting infrastructure.
The end-to-end observability stack is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The ObservabilityInstaller custom resource definition (CRD) can be used to install the end-to-end observability stack.
2.1. Distributed tracing
The distributed tracing capability installs the OpenTelemetry Collector, Tempo and the distributed tracing UI plugin.
Procedure
Install distributed tracing by using the ObservabilityInstaller resource.

apiVersion: observability.openshift.io/v1alpha1
kind: ObservabilityInstaller
metadata:
  name: tracing
  namespace: observability
spec:
  capabilities:
    tracing:
      enabled: true
      operators:
        install: true
      storage:
        objectStorage:
          ...

namespace: observability
  Namespace where the ObservabilityInstaller CR is created. The namespace should contain the secrets referenced in the object storage configuration.
enabled: true
  Enables the tracing capability. This installs the OpenTelemetry Collector, Tempo and all necessary configuration.
install: true
  Installs the operators required by the tracing capability. The default is true.
objectStorage:
  Required object storage configuration for the tracing capability. This is used to store traces in an object storage service.
Configure S3 object storage
storage:
  objectStorage:
    s3:
      bucket: <bucket_name>
      endpoint: <s3_endpoint>
      accessKeyID: <access_key_id>
      accessKeySecret:
        name: s3-access-secret
        key: access_key_secret
      region: us-east-1

<bucket_name>
  Name of the bucket to store data in.
<s3_endpoint>
  Endpoint of the S3-compatible object storage service.
<access_key_id>
  Access key ID for the S3-compatible object storage service.
accessKeySecret
  Name of the secret, and the key within it, that contains the access key secret for the S3-compatible object storage service.
region
  AWS region for the S3 bucket.
Use the following command to create the secret containing the access key:

$ oc create secret generic s3-access-secret \
  --from-literal=access_key_secret="secret"
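To see how the two fragments in this section fit together, the following sketch writes one complete manifest to a file. It assumes the field nesting shown here (storage under tracing, s3 under objectStorage); the file name observability-installer-s3.yaml is hypothetical, and the placeholders in angle brackets are left unfilled on purpose.

```shell
# Write a complete ObservabilityInstaller manifest that combines the base
# resource with the S3 storage fragment. The nesting is an assumption based
# on the examples in this section; replace the <...> placeholders by hand.
cat > observability-installer-s3.yaml <<'EOF'
apiVersion: observability.openshift.io/v1alpha1
kind: ObservabilityInstaller
metadata:
  name: tracing
  namespace: observability
spec:
  capabilities:
    tracing:
      enabled: true
      operators:
        install: true
      storage:
        objectStorage:
          s3:
            bucket: <bucket_name>
            endpoint: <s3_endpoint>
            accessKeyID: <access_key_id>
            accessKeySecret:
              name: s3-access-secret
              key: access_key_secret
            region: us-east-1
EOF

# After replacing the placeholders, the file can be applied with:
#   oc apply -f observability-installer-s3.yaml
```

Because the namespace of the ObservabilityInstaller CR should contain the referenced secrets, the s3-access-secret secret would need to exist in the observability namespace, for example by adding -n observability to the oc create secret command.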
2.2. Object storage types
Configure supported object storage types for the Cluster Observability Operator, including Amazon S3, MinIO, Microsoft Azure Blob Storage, Google Cloud Storage, and their respective authentication methods. Each object storage type has its own set of required fields which are configured directly in the ObservabilityInstaller CR.
2.2.1. Amazon S3 with Secure Token Service (STS) - Short lived
Configure Amazon S3 with Secure Token Service (STS) for short-lived credentials in the Cluster Observability Operator using temporary security tokens.
S3 with STS is supported by Tempo and Loki.
spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          s3STS:
            bucket: bucket-name
            roleARN:
            region: us-east-1

Use the following command to create the secret:

$ oc create secret generic storage-secret \
  --from-literal=bucket="<BUCKET_NAME>" \
  --from-literal=role_arn="<AWS_ROLE_ARN>" \
  --from-literal=region="<AWS_REGION_YOUR_BUCKET_LIVES_IN>"
2.2.2. Amazon S3 / MinIO
Configure Amazon S3 or MinIO object storage for the Cluster Observability Operator to store tracing and logging data with support for both Tempo and Loki Operators.
spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          s3:
            bucket: bucket-name
            endpoint: http://minio.minio.svc:9000
            accessKeyID: tempo
            accessKeySecret:
              name: minio-secret
              key: access_key_secret
            region: us-east-1

Use the following command to create the secret containing the access key:

$ oc create secret generic minio-secret \
  --from-literal=access_key_secret="supersecret"
2.2.2.1. Secret supported by Tempo and Loki operators

$ oc create secret generic storage-secret \
  --from-literal=bucket="<BUCKET_NAME>" \
  --from-literal=endpoint="<AWS_BUCKET_ENDPOINT>" \
  --from-literal=access_key_id="<AWS_ACCESS_KEY_ID>" \
  --from-literal=access_key_secret="<AWS_ACCESS_KEY_SECRET>" \
  --from-literal=region="<AWS_REGION_YOUR_BUCKET_LIVES_IN>"

- region is optional in Tempo and required by Loki.
2.2.3. Microsoft Azure Blob Storage
Configure Microsoft Azure Blob Storage for the Cluster Observability Operator to store tracing and logging data with support for both Tempo and Loki operators.
spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          azure:
            container:
            accountName:
            accountKeySecret:
              name: azure-secret
              key: account_key

Use the following command to create the secret containing the access key:

$ oc create secret generic azure-secret \
  --from-literal=account_key="<ACCOUNT_KEY>"
2.2.3.1. Secret supported by Tempo and Loki operators

$ oc create secret generic storage-secret \
  --from-literal=container="<BLOB_STORAGE_CONTAINER_NAME>" \
  --from-literal=account_name="<BLOB_STORAGE_ACCOUNT_NAME>" \
  --from-literal=account_key="<BLOB_STORAGE_ACCOUNT_KEY>"

The Loki operator also supports the following fields:
- environment
- endpoint_suffix - optional
2.2.4. Azure WIF - Short lived
Configure Azure Workload Identity Federation (WIF) with short-lived credentials for the Cluster Observability Operator to securely access Azure Blob Storage without long-term access keys. Azure WIF short lived is supported by Tempo.
spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          azureWIF:
            container:
            accountName:
            audience:
            clientID:
            tenantID:
2.2.4.1. Secret supported by Tempo and Loki operators

$ oc create secret generic storage-secret \
  --from-literal=container="<BLOB_STORAGE_CONTAINER_NAME>" \
  --from-literal=account_name="<BLOB_STORAGE_ACCOUNT_NAME>" \
  --from-literal=audience="<AUDIENCE>" \
  --from-literal=client_id="<CLIENT_ID>" \
  --from-literal=tenant_id="<TENANT_ID>"

- audience is optional and defaults to api://AzureADTokenExchange
2.2.5. Google Cloud Storage
Configure Google Cloud Storage for the Cluster Observability Operator to store tracing and logging data with support for both standard and Workload Identity Federation authentication methods. Google Cloud Storage is supported by Tempo and Loki.
spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          gcs:
            bucket: bucket-name
            keyJSONSecret:
              name: gcs-secret
              key: key.json

Use the following command to create the secret:

$ oc -n $NAMESPACE create secret generic gcs-secret \
  --from-file=key.json="$GCS_KEY_FILE_PATH"
2.2.5.1. Secret supported by Tempo and Loki operators

$ oc create secret generic storage-secret \
  --from-literal=bucketname="<BUCKET_NAME>" \
  --from-file=key.json="<PATH_TO_JSON_KEY_FILE>"
2.2.5.2. Google Cloud Storage WIF - Short lived
Google Cloud Storage WIF short lived is supported by Tempo.

spec:
  capabilities:
    tracing:
      enabled: true
      storage:
        objectStorage:
          gcsWIF:
            bucket: bucket-name
            keyJSONSecret:
              name: gcs-secret
              key: key.json
            audience: # optional

Use the following command to create the secret:

$ oc -n $NAMESPACE create secret generic gcs-secret \
  --from-file=key.json="$GCS_KEY_FILE_PATH"
Chapter 3. Configuring the Cluster Observability Operator to monitor a service
Monitor metrics for a service in OpenShift Container Platform by configuring monitoring stacks managed by the Cluster Observability Operator (COO).
To test monitoring a service, follow these steps:
- Deploy a sample service that defines a service endpoint.
- Create a ServiceMonitor object that specifies how the service is to be monitored by the COO.
- Create a MonitoringStack object to discover the ServiceMonitor object.
3.1. Deploying a sample service for Cluster Observability Operator
This configuration deploys a sample service named prometheus-coo-example-app in the user-defined ns1-coo project. The service exposes the custom version metric.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
Procedure
- Create a YAML file named prometheus-coo-example-app.yaml that contains the following configuration details for a namespace, deployment, and service:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns1-coo
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: prometheus-coo-example-app
    template:
      metadata:
        labels:
          app: prometheus-coo-example-app
      spec:
        containers:
        - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
          imagePullPolicy: IfNotPresent
          name: prometheus-coo-example-app
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      name: web
    selector:
      app: prometheus-coo-example-app
    type: ClusterIP

- Save the file.
- Apply the configuration to the cluster by running the following command:

  $ oc apply -f prometheus-coo-example-app.yaml

- Verify that the pod is running by running the following command and observing the output:

  $ oc -n ns1-coo get pod

  Example output

  NAME                                          READY   STATUS    RESTARTS   AGE
  prometheus-coo-example-app-0927545cb7-anskj   1/1     Running   0          81m
3.2. Specifying how a service is monitored by Cluster Observability Operator
To use the metrics exposed by the sample service you created in the "Deploying a sample service for Cluster Observability Operator" section, you must configure monitoring components to scrape metrics from the /metrics endpoint. You can create this configuration by using a ServiceMonitor object that specifies how the service is to be monitored, or a PodMonitor object that specifies how a pod is to be monitored.
The ServiceMonitor object requires a Service object. The PodMonitor object does not, which enables the MonitoringStack object to scrape metrics directly from the metrics endpoint exposed by a pod.
This procedure shows how to create a ServiceMonitor object for a sample service named prometheus-coo-example-app in the ns1-coo namespace.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.

  Note
  The prometheus-coo-example-app sample service does not support TLS authentication.
Procedure
- Create a YAML file named example-coo-app-service-monitor.yaml that contains the following ServiceMonitor object configuration details:

  apiVersion: monitoring.rhobs/v1
  kind: ServiceMonitor
  metadata:
    labels:
      k8s-app: prometheus-coo-example-monitor
    name: prometheus-coo-example-monitor
    namespace: ns1-coo
  spec:
    endpoints:
    - interval: 30s
      port: web
      scheme: http
    selector:
      matchLabels:
        app: prometheus-coo-example-app

  This configuration defines a ServiceMonitor object that the MonitoringStack object will reference to scrape the metrics data exposed by the prometheus-coo-example-app sample service.

- Apply the configuration to the cluster by running the following command:

  $ oc apply -f example-coo-app-service-monitor.yaml

- Verify that the ServiceMonitor resource is created by running the following command and observing the output:

  $ oc -n ns1-coo get servicemonitors.monitoring.rhobs

  Example output

  NAME                             AGE
  prometheus-coo-example-monitor   81m
3.3. Creating a MonitoringStack object for the Cluster Observability Operator
To scrape the metrics data exposed by the target prometheus-coo-example-app service, create a MonitoringStack object that references the ServiceMonitor object you created in the "Specifying how a service is monitored by Cluster Observability Operator" section. This MonitoringStack object can then discover the service and scrape the exposed metrics data from it.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
- You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.
Procedure
- Create a YAML file for the MonitoringStack object configuration. For this example, name the file example-coo-monitoring-stack.yaml.
- Add the following MonitoringStack object configuration details:

  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: example-coo-monitoring-stack
    namespace: ns1-coo
  spec:
    logLevel: debug
    retention: 1d
    resourceSelector:
      matchLabels:
        k8s-app: prometheus-coo-example-monitor

- Apply the MonitoringStack object by running the following command:

  $ oc apply -f example-coo-monitoring-stack.yaml

- Verify that the MonitoringStack object is available by running the following command and inspecting the output:

  $ oc -n ns1-coo get monitoringstack

  Example output

  NAME                           AGE
  example-coo-monitoring-stack   81m

- Run the following command to retrieve information about the active targets from Prometheus and filter the output to list only targets labeled with app=prometheus-coo-example-app. This verifies which targets are discovered and actively monitored by Prometheus with this specific label.

  $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[].discoveredLabels | select(.__meta_kubernetes_endpoints_label_app=="prometheus-coo-example-app")'

  Example output

  {
    "__address__": "10.129.2.25:8080",
    "__meta_kubernetes_endpoint_address_target_kind": "Pod",
    "__meta_kubernetes_endpoint_address_target_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
    "__meta_kubernetes_endpoint_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
    "__meta_kubernetes_endpoint_port_name": "web",
    "__meta_kubernetes_endpoint_port_protocol": "TCP",
    "__meta_kubernetes_endpoint_ready": "true",
    "__meta_kubernetes_endpoints_annotation_endpoints_kubernetes_io_last_change_trigger_time": "2024-11-05T11:24:09Z",
    "__meta_kubernetes_endpoints_annotationpresent_endpoints_kubernetes_io_last_change_trigger_time": "true",
    "__meta_kubernetes_endpoints_label_app": "prometheus-coo-example-app",
    "__meta_kubernetes_endpoints_labelpresent_app": "true",
    "__meta_kubernetes_endpoints_name": "prometheus-coo-example-app",
    "__meta_kubernetes_namespace": "ns1-coo",
    "__meta_kubernetes_pod_annotation_k8s_ovn_org_pod_networks": "{\"default\":{\"ip_addresses\":[\"10.129.2.25/23\"],\"mac_address\":\"0a:58:0a:81:02:19\",\"gateway_ips\":[\"10.129.2.1\"],\"routes\":[{\"dest\":\"10.128.0.0/14\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"172.30.0.0/16\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"100.64.0.0/16\",\"nextHop\":\"10.129.2.1\"}],\"ip_address\":\"10.129.2.25/23\",\"gateway_ip\":\"10.129.2.1\",\"role\":\"primary\"}}",
    "__meta_kubernetes_pod_annotation_k8s_v1_cni_cncf_io_network_status": "[{\n \"name\": \"ovn-kubernetes\",\n \"interface\": \"eth0\",\n \"ips\": [\n \"10.129.2.25\"\n ],\n \"mac\": \"0a:58:0a:81:02:19\",\n \"default\": true,\n \"dns\": {}\n}]",
    "__meta_kubernetes_pod_annotation_openshift_io_scc": "restricted-v2",
    "__meta_kubernetes_pod_annotation_seccomp_security_alpha_kubernetes_io_pod": "runtime/default",
    "__meta_kubernetes_pod_annotationpresent_k8s_ovn_org_pod_networks": "true",
    "__meta_kubernetes_pod_annotationpresent_k8s_v1_cni_cncf_io_network_status": "true",
    "__meta_kubernetes_pod_annotationpresent_openshift_io_scc": "true",
    "__meta_kubernetes_pod_annotationpresent_seccomp_security_alpha_kubernetes_io_pod": "true",
    "__meta_kubernetes_pod_controller_kind": "ReplicaSet",
    "__meta_kubernetes_pod_controller_name": "prometheus-coo-example-app-5d8cd498c7",
    "__meta_kubernetes_pod_host_ip": "10.0.128.2",
    "__meta_kubernetes_pod_ip": "10.129.2.25",
    "__meta_kubernetes_pod_label_app": "prometheus-coo-example-app",
    "__meta_kubernetes_pod_label_pod_template_hash": "5d8cd498c7",
    "__meta_kubernetes_pod_labelpresent_app": "true",
    "__meta_kubernetes_pod_labelpresent_pod_template_hash": "true",
    "__meta_kubernetes_pod_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
    "__meta_kubernetes_pod_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
    "__meta_kubernetes_pod_phase": "Running",
    "__meta_kubernetes_pod_ready": "true",
    "__meta_kubernetes_pod_uid": "054c11b6-9a76-4827-a860-47f3a4596871",
    "__meta_kubernetes_service_label_app": "prometheus-coo-example-app",
    "__meta_kubernetes_service_labelpresent_app": "true",
    "__meta_kubernetes_service_name": "prometheus-coo-example-app",
    "__metrics_path__": "/metrics",
    "__scheme__": "http",
    "__scrape_interval__": "30s",
    "__scrape_timeout__": "10s",
    "job": "serviceMonitor/ns1-coo/prometheus-coo-example-monitor/0"
  }

  Note
  The above example uses the jq command-line JSON processor to format the output for convenience.
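The discovered labels confirm that a target was found, but not whether scrapes succeed. As a hedged sketch, the same targets endpoint can be filtered for the scrapeUrl, health, and lastError fields that the Prometheus HTTP API reports for each active target. The function name target_health is illustrative, and the pod name pattern prometheus-<stack_name>-0 is an assumption inferred from the example above.

```shell
# Print scrape URL, health, and last error (tab-separated) for every active
# target whose discovered Service label "app" matches the given value.
# Arguments: namespace, MonitoringStack name, value of the app label.
# Assumption: the Prometheus pod is named prometheus-<stack_name>-0.
target_health() {
  oc -n "$1" exec -c prometheus "prometheus-$2-0" -- \
    curl -s 'http://localhost:9090/api/v1/targets' |
    jq -r --arg app "$3" '
      .data.activeTargets[]
      | select(.discoveredLabels.__meta_kubernetes_endpoints_label_app == $app)
      | [.scrapeUrl, .health, .lastError] | @tsv'
}
```

For the example in this section, the call would be target_health ns1-coo example-coo-monitoring-stack prometheus-coo-example-app. A health value of up with an empty last error suggests that the target is being scraped successfully.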
3.4. Validating the monitoring stack
To validate that the monitoring stack is working correctly, access the example service and then view the gathered metrics.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
- You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.
- You have created a MonitoringStack object named example-coo-monitoring-stack in the ns1-coo namespace.
Procedure
- Create a route to expose the example prometheus-coo-example-app service. From your terminal, run the command:

  $ oc expose svc prometheus-coo-example-app -n ns1-coo

- Access the route from your browser, or command line, to generate metrics.
- Execute a query on the Prometheus pod to return the total HTTP requests metric:

  $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/query?query=http_requests_total'

  Example output (formatted using jq for convenience)

  {
    "status": "success",
    "data": {
      "resultType": "vector",
      "result": [
        {
          "metric": {
            "__name__": "http_requests_total",
            "code": "200",
            "endpoint": "web",
            "instance": "10.129.2.25:8080",
            "job": "prometheus-coo-example-app",
            "method": "get",
            "namespace": "ns1-coo",
            "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
            "service": "prometheus-coo-example-app"
          },
          "value": [
            1730807483.632,
            "3"
          ]
        },
        {
          "metric": {
            "__name__": "http_requests_total",
            "code": "404",
            "endpoint": "web",
            "instance": "10.129.2.25:8080",
            "job": "prometheus-coo-example-app",
            "method": "get",
            "namespace": "ns1-coo",
            "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
            "service": "prometheus-coo-example-app"
          },
          "value": [
            1730807483.632,
            "0"
          ]
        }
      ]
    }
  }
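When only the counters matter, the query response can be reduced to one line per series. The following is a minimal sketch, assuming jq is installed locally; the function name requests_by_code is illustrative.

```shell
# Read a Prometheus instant-query response on stdin and print one
# tab-separated "code<TAB>value" line per series in the result vector.
requests_by_code() {
  jq -r '.data.result[] | [.metric.code, .value[1]] | @tsv'
}
```

Piping the output of the oc exec query command from this procedure into requests_by_code would, for the example response shown above, print 200 followed by 3 on the first line and 404 followed by 0 on the second.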
3.5. Scrape targets in multiple namespaces
To scrape targets in multiple namespaces, set the namespace and resource selector in the MonitoringStack object.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
Procedure
- Deploy the following namespace object and MonitoringStack YAML file:

  Example MonitoringStack

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns1-coo
    labels:
      monitoring.rhobs/stack: multi-ns
  ---
  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: example-coo-monitoring-stack
    namespace: ns1-coo
  spec:
    logLevel: debug
    retention: 1d
    resourceSelector:
      matchLabels:
        k8s-app: prometheus-coo-example-monitor
    namespaceSelector:
      matchLabels:
        monitoring.rhobs/stack: multi-ns

- Deploy a sample application in the namespace ns1-coo, with an alert that is always firing:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: prometheus-coo-example-app
    template:
      metadata:
        labels:
          app: prometheus-coo-example-app
      spec:
        containers:
        - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
          imagePullPolicy: IfNotPresent
          name: prometheus-coo-example-app
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      name: web
    selector:
      app: prometheus-coo-example-app
    type: ClusterIP
  ---
  apiVersion: monitoring.rhobs/v1
  kind: ServiceMonitor
  metadata:
    labels:
      k8s-app: prometheus-coo-example-monitor
    name: prometheus-coo-example-monitor
    namespace: ns1-coo
  spec:
    endpoints:
    - interval: 30s
      port: web
      scheme: http
    selector:
      matchLabels:
        app: prometheus-coo-example-app
  ---
  apiVersion: monitoring.rhobs/v1
  kind: PrometheusRule
  metadata:
    name: example-alert
    namespace: ns1-coo
    labels:
      k8s-app: prometheus-coo-example-monitor
  spec:
    groups:
    - name: example
      rules:
      - alert: VersionAlert
        for: 1m
        expr: version{job="prometheus-coo-example-app"} > 0
        labels:
          severity: warning

- Deploy the same example application in another namespace labeled with monitoring.rhobs/stack: multi-ns:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns2-coo
    labels:
      monitoring.rhobs/stack: multi-ns
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns2-coo
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: prometheus-coo-example-app
    template:
      metadata:
        labels:
          app: prometheus-coo-example-app
      spec:
        containers:
        - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
          imagePullPolicy: IfNotPresent
          name: prometheus-coo-example-app
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns2-coo
  spec:
    ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      name: web
    selector:
      app: prometheus-coo-example-app
    type: ClusterIP
  ---
  apiVersion: monitoring.rhobs/v1
  kind: ServiceMonitor
  metadata:
    labels:
      k8s-app: prometheus-coo-example-monitor
    name: prometheus-coo-example-monitor
    namespace: ns2-coo
  spec:
    endpoints:
    - interval: 30s
      port: web
      scheme: http
    selector:
      matchLabels:
        app: prometheus-coo-example-app
Verification
- Verify that the Prometheus instance adds new targets and that the alerts are firing. Use a port-forward command to expose the Prometheus or the Alertmanager user interface that has been deployed by the MonitoringStack instance.

  Prometheus

  $ oc port-forward -n ns1-coo pod/prometheus-example-coo-monitoring-stack-0 9090

  Alertmanager

  $ oc port-forward -n ns1-coo pod/alertmanager-example-coo-monitoring-stack-0 9093

- Verify that the targets are being scraped and that the alerts are firing by browsing to http://localhost:9090/targets or http://localhost:9093/#/alerts.
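The same verification can be sketched from a second terminal while the Prometheus port-forward is running. This example assumes jq and curl are installed locally and that each target carries a namespace label, as the metric labels in the earlier query output suggest. The function names scraped_namespaces and alert_states are illustrative.

```shell
# Print the distinct namespaces of all active scrape targets, one per line.
# Assumption: Prometheus is reachable on localhost:9090 via oc port-forward.
scraped_namespaces() {
  curl -s 'http://localhost:9090/api/v1/targets' |
    jq -r '[.data.activeTargets[].labels.namespace] | unique | .[]'
}

# Print a tab-separated "alertname<TAB>state" line for every alert that
# Prometheus currently reports as pending or firing.
alert_states() {
  curl -s 'http://localhost:9090/api/v1/alerts' |
    jq -r '.data.alerts[] | [.labels.alertname, .state] | @tsv'
}
```

With the example applications deployed, scraped_namespaces would be expected to list both ns1-coo and ns2-coo, and alert_states would be expected to include VersionAlert once the rule's 1m waiting period has elapsed.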