Chapter 2. Customizing observability
Review the following sections to learn more about customizing, managing, and viewing data that is collected by the observability service.
To collect logs that capture new information created for observability resources, use the must-gather command. For more information, see the Must-gather section in the Troubleshooting documentation.
- Creating custom rules
- Adding custom metrics
- Exporting metrics to external endpoints
- Adding advanced configuration
- Updating the MultiClusterObservability custom resource replicas from the console
- Customizing route certificate
- Customizing certificates for accessing the object store
- Viewing and exploring data
- Disabling observability
2.1. Creating custom rules
Create custom rules for the observability installation by adding Prometheus recording rules and alerting rules to the observability resource.
- Recording rules provide you the ability to precalculate, or compute, expensive expressions as needed. The results are saved as a new set of time series.
- Alerting rules provide you the ability to specify the alert conditions based on how an alert should be sent to an external service.
Define custom rules with Prometheus to create alert conditions, and send notifications to an external messaging service.
Note: When you update your custom rules, observability-thanos-rule pods are restarted automatically.

Create a ConfigMap named thanos-ruler-custom-rules in the open-cluster-management-observability namespace. The key must be named custom_rules.yaml, as shown in the following example. You can create multiple rules in the configuration.

By default, the out-of-the-box alert rules are defined in the thanos-ruler-default-rules ConfigMap in the open-cluster-management-observability namespace.

For example, you can create a custom alert rule that notifies you when your CPU usage passes your defined value. Your YAML might resemble the following content:
data:
  custom_rules.yaml: |
    groups:
      - name: cluster-health
        rules:
        - alert: ClusterCPUHealth-jb
          annotations:
            summary: Notify when CPU utilization on a cluster is greater than the defined utilization limit
            description: "The cluster has a high CPU usage: {{ $value }} core for {{ $labels.cluster }} {{ $labels.clusterID }}."
          expr: |
            max(cluster:cpu_usage_cores:sum) by (clusterID, cluster, prometheus) > 0
          for: 5s
          labels:
            cluster: "{{ $labels.cluster }}"
            prometheus: "{{ $labels.prometheus }}"
            severity: critical
You can also create a custom recording rule within the thanos-ruler-custom-rules ConfigMap. For example, you can create a recording rule that provides you the ability to get the sum of the container memory cache of a pod. Your YAML might resemble the following content:
data:
  custom_rules.yaml: |
    groups:
      - name: container-memory
        rules:
        - record: pod:container_memory_cache:sum
          expr: sum(container_memory_cache{pod!=""}) BY (pod, container)
Note: If this is the first new custom rule, it is created immediately. For changes to the ConfigMap, the configuration is automatically reloaded by the config-reload sidecar within observability-thanos-ruler.
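For reference, a complete ConfigMap that wraps the earlier alert rule example might resemble the following content; the file name in the oc apply command is arbitrary:

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-custom-rules
  namespace: open-cluster-management-observability
data:
  custom_rules.yaml: |
    groups:
      - name: cluster-health
        rules:
        # ... alert and recording rules from the previous examples ...

Apply it with: oc apply -f thanos-ruler-custom-rules.yaml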
To verify that the alert rules are functioning correctly, launch the Grafana dashboard, navigate to the Explore page, and query ALERTS. The alert is only available in Grafana if the alert is initiated.
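For example, to check only the sample rule from this section, you can filter the ALERTS series by alert name, which matches the alert field of the rule:

ALERTS{alertname="ClusterCPUHealth-jb"}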
2.2. Adding custom metrics
Add metrics to the metrics_list.yaml file so that they are collected from managed clusters.
Before you add a custom metric, verify that mco observability is enabled with the following command: oc get mco observability -o yaml. Check that the status.conditions.message value reads: Observability components are deployed and running.
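For reference, the relevant portion of the status might resemble the following snippet; the exact condition type and reason can vary by version:

status:
  conditions:
    - message: Observability components are deployed and running
      reason: Ready
      type: Ready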
Create a file named observability-metrics-custom-allowlist.yaml and add the name of the custom metric to the metrics_list.yaml parameter. Your YAML for the ConfigMap might resemble the following content:
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
data:
  metrics_list.yaml: |
    names:
      - node_memory_MemTotal_bytes
    rules:
    - record: apiserver_request_duration_seconds:histogram_quantile_90
      expr: histogram_quantile(0.90,sum(rate(apiserver_request_duration_seconds_bucket{job=\"apiserver\", verb!=\"WATCH\"}[5m])) by (verb,le))
For user workload metrics, see the Adding user workload metrics section.
- In the names section, add the name of the custom metric that you want to collect from the managed cluster.
- In the rules section, enter only one value for the expr and record parameter pair to define the query expression. The metrics are collected as the name that is defined in the record parameter from your managed cluster. The metric values that are returned are the results after you run the query expression.
- The names and rules sections are optional. You can use either one or both of the sections.
Create the observability-metrics-custom-allowlist ConfigMap in the open-cluster-management-observability namespace with the following command: oc apply -n open-cluster-management-observability -f observability-metrics-custom-allowlist.yaml.
Verify that data from your custom metric is being collected by querying the metric from the Explore page of the Grafana dashboard. You can also use the custom metrics in your own dashboard. For more information about viewing the dashboard, see Using Grafana dashboards.
2.2.1. Adding user workload metrics
You can collect metrics from user-defined workloads in OpenShift Container Platform. Before you begin, you must enable monitoring; see Enabling monitoring for user-defined projects.
If you have a managed cluster with monitoring for user-defined workloads enabled, the user workloads are located in the test namespace and generate metrics. These metrics are collected by Prometheus from the OpenShift Container Platform user workload.

Collect the metrics from the user workloads by creating a ConfigMap named observability-metrics-custom-allowlist in the test namespace. View the following example:
kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: test
data:
  uwl_metrics_list.yaml: |
    names:
      - sample_metrics
- The uwl_metrics_list.yaml value is the key for the ConfigMap data.
- The value of the ConfigMap data is in YAML format. The names section includes the list of metric names that you want to collect from the test namespace. After you create the ConfigMap, the specified metrics from the target namespace are collected by the observability collector and pushed to the hub cluster.
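Assuming you saved the example as observability-metrics-custom-allowlist.yaml, you can create the ConfigMap with the following command:

oc apply -n test -f observability-metrics-custom-allowlist.yaml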
2.2.2. Removing default metrics
If you do not want data to be collected in your managed cluster for a specific metric, remove the metric from the observability-metrics-custom-allowlist.yaml file. When you remove a metric, the metric data is not collected in your managed clusters. As mentioned previously, first verify that mco observability is enabled.
Add the name of the default metric to the metrics_list.yaml parameter with a hyphen (-) at the start of the metric name. For example, -cluster_infrastructure_provider.
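For example, a ConfigMap that stops the collection of the cluster_infrastructure_provider default metric might resemble the following content:

kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
data:
  metrics_list.yaml: |
    names:
      - -cluster_infrastructure_provider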
Create the observability-metrics-custom-allowlist ConfigMap in the open-cluster-management-observability namespace with the following command: oc apply -n open-cluster-management-observability -f observability-metrics-custom-allowlist.yaml.
Verify that the specific metric is not being collected from your managed clusters. When you query the metric from the Grafana dashboard, the metric is not displayed.
2.3. Exporting metrics to external endpoints
You can customize observability to export the metrics to external endpoints that support the Prometheus Remote-Write specification in real time. For more information, see the Prometheus Remote-Write specification.
2.3.1. Creating the Kubernetes secret for an external endpoint
You must create a Kubernetes secret with the access information for the external endpoint in the open-cluster-management-observability namespace. View the following example secret:
apiVersion: v1
kind: Secret
metadata:
  name: victoriametrics
  namespace: open-cluster-management-observability
type: Opaque
stringData:
  ep.yaml: |
    url: http://victoriametrics:8428/api/v1/write
    http_client_config:
      basic_auth:
        username: test
        password: test
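Assuming you saved the example secret as victoriametrics-secret.yaml (a hypothetical file name), you can create it with the following command:

oc apply -n open-cluster-management-observability -f victoriametrics-secret.yaml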
The ep.yaml value is the key of the content and is used in the MultiClusterObservability custom resource in the next step. Currently, observability supports exporting metrics to endpoints without any security checks, with basic authentication, or with tls enabled. View the following tables for the full list of supported parameters:
Name | Description | Schema |
---|---|---|
url | URL for the external endpoint. | string |
http_client_config | Advanced configuration for the HTTP client. | HttpClientConfig |

HttpClientConfig

Name | Description | Schema |
---|---|---|
basic_auth | HTTP client configuration for basic authentication. | BasicAuth |
tls_config | HTTP client configuration for TLS. | TLSConfig |

BasicAuth

Name | Description | Schema |
---|---|---|
username | User name for basic authorization. | string |
password | Password for basic authorization. | string |

TLSConfig

Name | Description | Schema |
---|---|---|
secret_name | Name of the secret that contains certificates. | string |
ca_file_key | Key of the CA certificate in the secret (only optional if insecure_skip_verify is set to true). | string |
cert_file_key | Key of the client certificate in the secret. | string |
key_file_key | Key of the client key in the secret. | string |
insecure_skip_verify | Parameter to skip the verification for the target certificate. | bool |
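For example, an ep.yaml that authenticates with TLS instead of basic authentication might resemble the following sketch; the mtls-certs-secret name and the key names in the secret are hypothetical:

url: https://victoriametrics:8428/api/v1/write
http_client_config:
  tls_config:
    secret_name: mtls-certs-secret
    ca_file_key: ca.crt
    cert_file_key: tls.crt
    key_file_key: tls.key
    insecure_skip_verify: false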
2.3.2. Updating the MultiClusterObservability custom resource
After you create the Kubernetes secret, you must update the MultiClusterObservability custom resource to add writeStorage in the spec.storageConfig parameter. View the following example:
spec:
  storageConfig:
    writeStorage:
    - key: ep.yaml
      name: victoriametrics
The value for writeStorage is a list. You can add an item to the list when you want to export metrics to one external endpoint. If you add more than one item to the list, the metrics are exported to multiple external endpoints. Each item contains two attributes: name and key. Name is the name of the Kubernetes secret that contains the endpoint access information, and key is the key of the content in the secret.
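For example, a writeStorage list that exports metrics to two external endpoints might resemble the following content, where my-second-endpoint is a hypothetical second secret:

spec:
  storageConfig:
    writeStorage:
    - key: ep.yaml
      name: victoriametrics
    - key: ep.yaml
      name: my-second-endpoint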
2.3.3. Viewing the status of metric export
After the metrics export is enabled, you can view the status of the metrics export by checking the acm_remote_write_requests_total metric. From the OpenShift console of your hub cluster, navigate to the Metrics page by clicking Metrics in the Observe section.
Then query the acm_remote_write_requests_total metric. The value of that metric is the total number of requests with a specific response for one external endpoint, on one observatorium API instance. The name label is the name for the external endpoint. The code label is the return code of the HTTP request for the metrics export.
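For example, to view only failed export requests grouped by endpoint and return code, you might run a query such as the following, which assumes that 2xx codes indicate success:

sum(acm_remote_write_requests_total{code!~"2.."}) by (name, code)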
2.4. Adding advanced configuration
Add the advanced configuration section to update the retention for each observability component, according to your needs.
Edit the MultiClusterObservability custom resource and add the advanced section with the following command: oc edit mco observability -o yaml. Your YAML file might resemble the following contents:
spec:
  advanced:
    retentionConfig:
      blockDuration: 2h
      deleteDelay: 48h
      retentionInLocal: 24h
      retentionResolutionRaw: 30d
      retentionResolution5m: 180d
      retentionResolution1h: 0d
    receive:
      resources:
        limits:
          memory: 4096Gi
      replicas: 3
For descriptions of all the parameters that can be added to the advanced configuration, see the Observability API.
2.5. Updating the MultiClusterObservability custom resource replicas from the console
If your workload increases, increase the number of replicas of your observability pods. Navigate to the Red Hat OpenShift Container Platform console from your hub cluster. Locate the MultiClusterObservability custom resource, and update the replicas parameter value for the component where you want to change the replicas. Your updated YAML might resemble the following content:
spec:
  advanced:
    receive:
      replicas: 6
For more information about the parameters within the mco observability custom resource, see the Observability API.
2.6. Customizing route certificate
If you want to customize the OpenShift Container Platform route certificate, you must add the routes in the alt_names section. To ensure that your OpenShift Container Platform routes are accessible, add the following information: alertmanager.apps.<domainname>, observatorium-api.apps.<domainname>, rbac-query-proxy.apps.<domainname>.
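For example, if you generate the certificate with OpenSSL, the alt_names section of a hypothetical openssl.cnf configuration might resemble the following sketch; replace <domainname> with your cluster domain:

[ alt_names ]
DNS.1 = alertmanager.apps.<domainname>
DNS.2 = observatorium-api.apps.<domainname>
DNS.3 = rbac-query-proxy.apps.<domainname>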
Note: Users are responsible for certificate rotations and updates.
2.7. Customizing certificates for accessing the object store
Complete the following steps to customize certificates for accessing the object store:
- Edit the http_config section by adding the certificate in the object store secret. View the following example:

thanos.yaml: |
  type: s3
  config:
    bucket: "thanos"
    endpoint: "minio:9000"
    insecure: false
    access_key: "minio"
    secret_key: "minio123"
    http_config:
      tls_config:
        ca_file: /etc/minio/certs/ca.crt
        insecure_skip_verify: false
- Add the object store secret in the open-cluster-management-observability namespace. The secret must contain the ca.crt that you defined in the previous secret example. If you want to enable Mutual TLS, you must also provide the public.crt and private.key in the same secret. View the following example:

thanos.yaml: |
  type: s3
  config:
    ...
    http_config:
      tls_config:
        ca_file: /etc/minio/certs/ca.crt 1
        cert_file: /etc/minio/certs/public.crt
        key_file: /etc/minio/certs/private.key
        insecure_skip_verify: false

1: The path to the certificates and key values for the thanos-object-storage secret.
- Configure the secret name by updating the tlsSecretName parameter in the MultiClusterObservability custom resource. View the following example, where the secret name is tls-certs-secret:

metricObjectStorage:
  key: thanos.yaml
  name: thanos-object-storage
  tlsSecretName: tls-certs-secret
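A secret such as tls-certs-secret can be created from your certificate files with a command like the following; the local file paths are hypothetical:

oc create secret generic tls-certs-secret -n open-cluster-management-observability --from-file=ca.crt=./ca.crt --from-file=public.crt=./public.crt --from-file=private.key=./private.key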
- Mount the secret in the tlsSecretMountPath resource of all components that need to access the object store, which includes the receiver, store, ruler, and compact components. Use the name of your existing TLS secret for the tlsSecretName value. See the following example:

metricObjectStorage:
  key: thanos.yaml
  name: thanos-object-storage
  tlsSecretName: <existing-tls-certs-secret>
  tlsSecretMountPath: /etc/minio/certs
- To verify that you can access the object store, check that the pods are displayed.
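For example, you can list the pods with the following command:

oc get pods -n open-cluster-management-observability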
2.8. Viewing and exploring data
View the data from your managed clusters by accessing Grafana from the hub cluster. You can query specific alerts and add filters for the query.
For example, to query cluster_infrastructure_provider from a single node cluster, use the following query expression: cluster_infrastructure_provider{clusterType="SNO"}
Note: Do not set the ObservabilitySpec.resources.CPU.limits parameter if observability is enabled on single node managed clusters. When you set the CPU limits, it causes the observability pod to be counted against the capacity for your managed cluster. See the reference for Management Workload Partitioning in the Additional resources section.
2.8.1. Viewing historical data
When you query historical data, manually set your query parameter options to control how much data is displayed from the dashboard. Complete the following steps:
- From your hub cluster, select the Grafana link that is in the console header.
- Edit your cluster dashboard by selecting Edit Panel.
- From the Query front-end data source in Grafana, click the Query tab.
- Select $datasource.
- If you want to see more data, increase the value of the Step parameter section. If the Step parameter section is empty, it is automatically calculated.
- Find the Custom query parameters field and select max_source_resolution=auto.
- To verify that the data is displayed, refresh your Grafana page.
Your query data appears in the Grafana dashboard.
2.8.2. Viewing the etcd table
View the etcd table from the hub cluster dashboard in Grafana to learn about the stability of etcd as a data store.
Select the Grafana link from your hub cluster to view the etcd table data, which is collected from your hub cluster. The Leader election changes across managed clusters are displayed.
2.8.3. Viewing the cluster fleet service-level overview for the Kubernetes API server dashboard
View the cluster fleet Kubernetes API service-level overview from the hub cluster dashboard in Grafana.
After you navigate to the Grafana dashboard, access the managed dashboard menu by selecting Kubernetes > Service-Level Overview > API Server. The Fleet Overview and Top Cluster details are displayed.
View the total number of clusters that are exceeding or meeting the targeted service-level objective (SLO) value for the past seven or 30-day period, offending and non-offending clusters, and API Server Request Duration.
2.8.4. Viewing the cluster service-level overview for the Kubernetes API server dashboard
View the Kubernetes API service-level overview table from the hub cluster dashboard in Grafana.
After you navigate to the Grafana dashboard, access the managed dashboard menu by selecting Kubernetes > Service-Level Overview > API Server. The Fleet Overview and Top Cluster details are displayed.
View the error budget for the past seven or 30-day period, the remaining downtime, and trend.
2.9. Disabling observability
You can disable observability, which stops data collection on the Red Hat Advanced Cluster Management hub cluster.
2.9.1. Disabling observability on all clusters
Disable observability by removing observability components on all managed clusters.
Update the multicluster-observability-operator resource by setting enableMetrics to false. Your updated resource might resemble the following change:
spec:
  imagePullPolicy: Always
  imagePullSecret: multiclusterhub-operator-pull-secret
  observabilityAddonSpec: # The ObservabilityAddonSpec defines the global settings for all managed clusters which have observability add-on enabled
    enableMetrics: false # indicates whether the observability add-on pushes metrics to the hub server
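As an alternative to editing the YAML by hand, a merge patch such as the following sketch makes the same change; it assumes the mco resource short name that is used earlier in this chapter:

oc patch mco observability --type=merge -p '{"spec":{"observabilityAddonSpec":{"enableMetrics":false}}}'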
2.9.2. Disabling observability on a single cluster
Disable observability by removing observability components on specific managed clusters. Add the observability: disabled label to the managedclusters.cluster.open-cluster-management.io custom resource.
From the Red Hat Advanced Cluster Management console Clusters page, add the observability=disabled label to the specified cluster.
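Alternatively, you can add the label from the command line, where <cluster-name> is the name of your managed cluster:

oc label managedcluster <cluster-name> observability=disabled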
Note: When a managed cluster with the observability component is detached, the metrics-collector deployments are removed.
2.10. Additional resources
- For more details about observability alerts, see Observability alerts.
- To learn more about alert forwarding, see the Prometheus Alertmanager documentation.
- For more topics about the observability service, see Observability service introduction.
- Refer to Prometheus configuration for more information.
- See Management Workload Partitioning for more information.
- Return to the beginning of this topic, Customizing observability.