Cluster Observability Operator


OpenShift Container Platform 4.17

Configuring and using the Cluster Observability Operator in OpenShift Container Platform

Red Hat OpenShift Documentation Team

Abstract

Use the Cluster Observability Operator to deploy and configure observability components in OpenShift Container Platform.

The Cluster Observability Operator (COO) is an optional OpenShift Container Platform Operator that enables administrators to create standalone monitoring stacks that are independently configurable for use by different services and users.

The COO complements the built-in monitoring capabilities of OpenShift Container Platform. You can deploy it in parallel with the default platform and user workload monitoring stacks managed by the Cluster Monitoring Operator (CMO).

These release notes track the development of the Cluster Observability Operator in OpenShift Container Platform.

The following table shows which OpenShift Container Platform (OCP) versions are supported by each Cluster Observability Operator version. The availability of individual features, such as distributed tracing, logging, the troubleshooting panel, ACM alerts, and incident detection, depends on both the COO version and the OCP version:

COO Version    OCP Versions
1.1+           4.12 - 4.14
1.1+           4.15
1.1+           4.16 - 4.18
1.2+           4.19+

1.1. Cluster Observability Operator 1.2.2

The following advisory is available for Cluster Observability Operator 1.2.2:

1.1.1. Bug fixes

  • Before this update, the installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer ServiceMonitor resource is in a failed state, with the error message tls: failed to verify certificate: x509. With this release, the incident detection feature installs correctly. (COO-1062)

    If you are upgrading from 1.2.1 where the bug was occurring, you must recreate the monitoring UI plugin to resolve the issue.

1.1.2. Known issues

These are the known issues in Cluster Observability Operator 1.2.2:

  • When installing version 1.2.2 or when upgrading from version 1.2.0, the monitoring plugin’s UIPlugin resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advanced Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (COO-1051)

    If you have previously resolved the issue in 1.2.1 and then upgrade to 1.2.2, this issue will not reoccur.

1.2. Cluster Observability Operator 1.2.1

The following advisory is available for Cluster Observability Operator 1.2.1:

1.2.1. Bug fixes

  • Before this update, an old version label matcher was retained during the Operator version 1.2 upgrade. This caused Perses dashboards to become unavailable. With this release, the version label is removed and Perses dashboards are correctly reconciled.

1.2.2. Known issues

These are the known issues in Cluster Observability Operator 1.2.1:

  • The installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer ServiceMonitor resource is in a failed state, with the error message tls: failed to verify certificate: x509. You can resolve this issue by upgrading to 1.2.2 and recreating the monitoring UI plugin. (COO-1062)
  • When installing version 1.2.1 or when upgrading from version 1.2.0, the monitoring plugin’s UIPlugin resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advanced Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (COO-1051)

1.3. Cluster Observability Operator 1.2

The following advisory is available for Cluster Observability Operator 1.2:

1.3.1. New features and enhancements

  • The logging UI plugin now supports the OTEL format, in addition to the previously supported ViaQ scheme. (COO-816)
  • Accelerators Perses dashboards are deployed by default when you install the monitoring UI plugin. (COO-942)
  • Multiple results per graph node are now displayed for Korrel8r. (COO-785)
  • Direct navigation to individual incident detail is now supported in the incident detection panel, and this enables the incidents overview functionality in Red Hat Advanced Cluster Management (RHACM) 2.14. (COO-977, ACM-18751)
  • Advanced filters have been added to the tracing view. (COO-979)
  • The status of the distributed tracing UI plugin is now General Availability (GA), supporting PatternFly 4, 5, and 6. (COO-873)

1.3.2. Bug fixes

  • Previously, LokiStack was a prerequisite for installing the logging UI plugin. With this release, you can install the logging UI plugin without LokiStack. (COO-760)
  • Previously, the Silence Alert button in the Incidents → Component section did not pre-populate the fields and was not usable. This release resolves the issue. (COO-970)

1.3.3. Known issues

These are the known issues in Cluster Observability Operator 1.2.0:

  • When upgrading from COO 1.1.1 to COO 1.2, the Perses dashboard is not correctly reconciled, and this requires the monitoring UI plugin to be reinstalled. (COO-978)

1.4. Cluster Observability Operator 1.1.1

1.4.1. Bug fixes

  • Previously, observability-operator and perses-operator pods on many clusters entered a CrashLoopBackOff state due to OutOfMemory errors, after upgrading from Cluster Observability Operator 1.0. This release resolves the issue. (COO-784)

1.5. Cluster Observability Operator 1.1

The following advisory is available for Cluster Observability Operator 1.1:

1.5.1. New features and enhancements

  • You can now install the monitoring UI plugin using COO. (COO-262)
  • TLS support for the Thanos web endpoint has been added. (COO-222)

1.5.2. Known issues

These are the known issues in Cluster Observability Operator 1.1.0:

1.5.3. Bug fixes

  • Previously, the logging UI plugin did not support setting a custom LokiStack name or namespace. This release resolves the issue. (COO-332)

1.6. Cluster Observability Operator 1.0

1.6.1. New features and enhancements

  • COO is now enabled for OpenShift Container Platform platform monitoring. (COO-476)

    • Implements HTTPS support for COO web server. (COO-480)
    • Implements authn/authz for COO web server. (COO-481)
    • Configures ServiceMonitor resource to collect metrics from COO. (COO-482)
    • Adds operatorframework.io/cluster-monitoring=true annotation to the OLM bundle. (COO-483)
    • Defines the alerting strategy for COO. (COO-484)
    • Configures PrometheusRule for alerting. (COO-485)
  • Support level annotations have been added to the UIPlugin CR when created. The support level is based on the plugin type, with values of DevPreview, TechPreview, or GeneralAvailability. (COO-318)
  • You can now configure the Alertmanager scheme and tlsConfig fields in the Prometheus CR. (COO-219)
  • The extended Technical Preview for the troubleshooting panel adds support for correlating traces with Kubernetes resources and directly with other observable signals including logs, alerts, metrics, and network events. (COO-450)

    • You can select a Tempo instance and tenant when you navigate to the tracing page by clicking Observe → Tracing in the web console. The preview troubleshooting panel only works with the openshift-tracing / platform instance and the platform tenant.
    • The troubleshooting panel works best in the Administrator perspective. It has limited functionality in the Developer perspective due to authorization issues with some back ends, most notably Prometheus for metrics and alerts. This will be addressed in a future release.

The following table shows which OpenShift Container Platform (OCP) versions are supported by Cluster Observability Operator 1.0. The availability of individual features, such as distributed tracing, logging, and the troubleshooting panel, depends on the OCP version:

COO Version    OCP Versions
1.0            4.12 - 4.15
1.0            4.16+

1.6.2. CVEs

1.6.3. Bug fixes

  • Previously, the default namespace for the COO installation was openshift-operators. With this release, the default namespace changes to openshift-cluster-observability-operator. (COO-32)
  • Previously, korrel8r was only able to parse time series selector expressions. With this release, korrel8r can parse any valid PromQL expression to extract the time series selectors that it uses for correlation. (COO-558)
  • Previously, when viewing a Tempo instance from the Distributed Tracing UI plugin, the scatter plot graph showing the traces duration was not rendered correctly. The bubble size was too large and overlapped the x and y axis. With this release, the graph is rendered correctly. (COO-319)

The following table shows which OpenShift Container Platform (OCP) versions are supported by older versions of the Cluster Observability Operator. The availability of individual features, such as dashboards, distributed tracing, logging, and the troubleshooting panel, depends on both the COO version and the OCP version:

COO Version       OCP Versions
0.2.0             4.11
0.3.0+, 0.4.0+    4.11 - 4.15
0.3.0+, 0.4.0+    4.16+

1.8. Cluster Observability Operator 0.4.1

The following advisory is available for Cluster Observability Operator 0.4.1:

1.8.1. New features and enhancements

  • You can now configure WebTLS for Prometheus and Alertmanager.

1.8.2. CVEs

1.8.3. Bug fixes

  • Previously, when you deleted the dashboard UI plugin, the consoles.operator.openshift.io resource still contained console-dashboards-plugin. This release resolves the issue. (COO-152)
  • Previously, the web console did not display the correct icon for Red Hat COO. This release resolves the issue. (COO-353)
  • Previously, when you installed the COO from the web console, the support section contained an invalid link. This release resolves the issue. (COO-354)
  • Previously, the cluster service version (CSV) for COO linked to an unofficial version of the documentation. This release resolves the issue. (COO-356)

1.9. Cluster Observability Operator 0.4.0

The following advisory is available for Cluster Observability Operator 0.4.0:

1.9.1. New features and enhancements

1.9.1.1. Troubleshooting UI plugin
  • The troubleshooting UI panel has been improved so you can now select and focus on a specific starting signal.
  • There is more visibility into Korrel8r queries, with the option of selecting the depth.
  • Users of OpenShift Container Platform version 4.17+ can access the troubleshooting UI panel from the Application Launcher. Alternatively, on versions 4.16+, you can access it in the web console by clicking Observe → Alerting.

For more information, see troubleshooting UI plugin.

1.9.1.2. Distributed tracing UI plugin
  • The distributed tracing UI plugin has been enhanced, with a Gantt chart now available for exploring traces.

For more information, see distributed tracing UI plugin.

1.9.2. Bug fixes

  • Previously, metrics were not available to normal users when accessed in the Developer perspective of the web console, by clicking Observe → Logs. This release resolves the issue. (COO-288)
  • Previously, the troubleshooting UI plugin used the wrong filter for network observability. This release resolves the issue. (COO-299)
  • Previously, the troubleshooting UI plugin generated an incorrect URL for pod label searches. This release resolves the issue. (COO-298)
  • Previously, there was an authorization vulnerability in the Distributed tracing UI plugin. This release resolves the issue and the Distributed tracing UI plugin has been hardened by using only multi-tenant TempoStack and TempoMonolithic instances going forward.

1.10. Cluster Observability Operator 0.3.2

The following advisory is available for Cluster Observability Operator 0.3.2:

1.10.1. New features and enhancements

  • With this release, you can now use tolerations and node selectors with MonitoringStack components.

1.10.2. Bug fixes

  • Previously, the logging UIPlugin was not in the Available state and the logging pod was not created, when installed on a specific version of OpenShift Container Platform. This release resolves the issue. (COO-260)

1.11. Cluster Observability Operator 0.3.0

The following advisory is available for Cluster Observability Operator 0.3.0:

1.11.1. New features and enhancements

  • With this release, the Cluster Observability Operator adds backend support for future OpenShift Container Platform observability web console UI plugins and observability components.

1.12. Cluster Observability Operator 0.2.0

The following advisory is available for Cluster Observability Operator 0.2.0:

1.12.1. New features and enhancements

  • With this release, the Cluster Observability Operator supports installing and managing observability-related plugins for the OpenShift Container Platform web console user interface (UI). (COO-58)

1.13. Cluster Observability Operator 0.1.3

The following advisory is available for Cluster Observability Operator 0.1.3:

1.13.1. Bug fixes

  • Previously, if you tried to access the Prometheus web user interface (UI) at http://<prometheus_url>:9090/graph, the following error message would display: Error opening React index.html: open web/ui/static/react/index.html: no such file or directory. This release resolves the issue, and the Prometheus web UI now displays correctly. (COO-34)

1.14. Cluster Observability Operator 0.1.2

The following advisory is available for Cluster Observability Operator 0.1.2:

1.14.1. CVEs

1.14.2. Bug fixes

  • Previously, certain cluster service version (CSV) annotations were not included in the metadata for COO. Because of these missing annotations, certain COO features and capabilities did not appear in the package manifest or in the OperatorHub user interface. This release adds the missing annotations, thereby resolving this issue. (COO-11)
  • Previously, automatic updates of the COO did not work, and a newer version of the Operator did not automatically replace the older version, even though the newer version was available in OperatorHub. This release resolves the issue. (COO-12)
  • Previously, Thanos Querier only listened for network traffic on port 9090 of 127.0.0.1 (localhost), which resulted in a 502 Bad Gateway error if you tried to reach the Thanos Querier service. With this release, the Thanos Querier configuration has been updated so that the component now listens on the default port (10902), thereby resolving the issue. As a result of this change, you can also now modify the port by using server-side apply (SSA) and add a proxy chain, if required. (COO-14)

1.15. Cluster Observability Operator 0.1.1

The following advisory is available for Cluster Observability Operator 0.1.1:

1.15.1. New features and enhancements

This release updates the Cluster Observability Operator to support installing the Operator in restricted networks or disconnected environments.

1.16. Cluster Observability Operator 0.1

This release makes a Technology Preview version of the Cluster Observability Operator available on OperatorHub.

The Cluster Observability Operator (COO) is an optional component of the OpenShift Container Platform designed for creating and managing highly customizable monitoring stacks. It enables cluster administrators to automate configuration and management of monitoring needs extensively, offering a more tailored and detailed view of each namespace compared to the default OpenShift Container Platform monitoring system.

The COO deploys the following monitoring components:

  • Prometheus - A highly available Prometheus instance capable of sending metrics to an external endpoint by using remote write.
  • Thanos Querier (optional) - Enables querying of Prometheus instances from a central location.
  • Alertmanager (optional) - Provides alert configuration capabilities for different services.
  • UI plugins (optional) - Enhances the observability capabilities with plugins for monitoring, logging, distributed tracing and troubleshooting.
  • Korrel8r (optional) - Provides observability signal correlation, powered by the open source Korrel8r project.
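
For reference, the Prometheus and Alertmanager components are configured through a single MonitoringStack resource, while the UI plugins and Korrel8r are enabled separately through UIPlugin resources described later in this document. The following minimal sketch combines fields that appear later in this document with optional alertmanagerConfig and prometheusConfig stanzas; treat those stanzas, the resource names, and the remote write URL as illustrative assumptions rather than a definitive configuration:

    apiVersion: monitoring.rhobs/v1alpha1
    kind: MonitoringStack
    metadata:
      name: example-stack
      namespace: coo-demo
    spec:
      logLevel: info
      retention: 7d
      resourceSelector: # select the ServiceMonitor and PodMonitor objects to scrape
        matchLabels:
          app: demo
      alertmanagerConfig:
        disabled: false # keep the optional Alertmanager instance (assumption)
      prometheusConfig:
        replicas: 2 # highly available Prometheus pair (assumption)
        remoteWrite: # optionally forward metrics to an external endpoint (assumption)
        - url: https://remote-write.example.com/api/v1/write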

2.1. COO compared to default monitoring stack

The COO components function independently of the default in-cluster monitoring stack, which is deployed and managed by the Cluster Monitoring Operator (CMO). Monitoring stacks deployed by the two Operators do not conflict. You can use a COO monitoring stack in addition to the default platform monitoring components deployed by the CMO.

The key differences between COO and the default in-cluster monitoring stack are shown in the following table:

Scope and integration

  • COO: Offers comprehensive monitoring and analytics for enterprise-level needs, covering cluster and workload performance. However, it lacks direct integration with OpenShift Container Platform and typically requires an external Grafana instance for dashboards.
  • Default monitoring stack: Limited to core components within the cluster, for example, API server and etcd, and to OpenShift-specific namespaces. There is deep integration into OpenShift Container Platform, including console dashboards and alert management in the console.

Configuration and customization

  • COO: Broader configuration options, including data retention periods, storage methods, and collected data types. The COO can delegate ownership of single configurable fields in custom resources to users by using Server-Side Apply (SSA), which enhances customization.
  • Default monitoring stack: Built-in configurations with limited customization options.

Data retention and storage

  • COO: Long-term data retention, supporting historical analysis and capacity planning.
  • Default monitoring stack: Shorter data retention times, focusing on short-term monitoring and real-time detection.

2.2. Key advantages of using COO

Deploying COO helps you address monitoring requirements that are hard to achieve using the default monitoring stack.

2.2.1. Extensibility

  • You can add more metrics to a COO-deployed monitoring stack, which is not possible with core platform monitoring without losing support.
  • You can receive cluster-specific metrics from core platform monitoring through federation.
  • COO supports advanced monitoring scenarios like trend forecasting and anomaly detection.

2.2.2. Multi-tenancy support

  • You can create monitoring stacks per user namespace.
  • You can deploy multiple stacks per namespace or a single stack for multiple namespaces.
  • COO enables independent configuration of alerts and receivers for different teams.

2.2.3. Scalability

  • Supports multiple monitoring stacks on a single cluster.
  • Enables monitoring of large clusters through manual sharding.
  • Addresses cases where metrics exceed the capabilities of a single Prometheus instance.

2.2.4. Flexibility

  • Decoupled from OpenShift Container Platform release cycles.
  • Faster release iterations and rapid response to changing requirements.
  • Independent management of alerting rules.

2.3. Target users for COO

COO is ideal for users who need high customizability, scalability, and long-term data retention, especially in complex, multi-tenant enterprise environments.

2.3.1. Enterprise-level users and administrators

Enterprise users require in-depth monitoring capabilities for OpenShift Container Platform clusters, including advanced performance analysis, long-term data retention, trend forecasting, and historical analysis. These features help enterprises better understand resource usage, prevent performance issues, and optimize resource allocation.

With multi-tenancy support, COO allows different teams to configure monitoring views for their projects and applications, making it suitable for teams with flexible monitoring needs.

2.3.3. Development and operations teams

COO provides fine-grained monitoring and customizable observability views for in-depth troubleshooting, anomaly detection, and performance tuning during development and operations.

Server-Side Apply is a feature that enables collaborative management of Kubernetes resources. The control plane tracks how different users and controllers manage fields within a Kubernetes object. It introduces the concept of field managers and tracks ownership of fields. This centralized control provides conflict detection and resolution, and reduces the risk of unintended overwrites.

Compared to Client-Side Apply, it is more declarative, and tracks field management instead of last applied state.

Server-Side Apply
Declarative configuration management by updating a resource’s state without needing to delete and recreate it.
Field management
Users can specify which fields of a resource they want to update, without affecting the other fields.
Managed fields
Kubernetes stores metadata about who manages each field of an object in the managedFields field within metadata.
Conflicts
If multiple managers try to modify the same field, a conflict occurs. The applier can choose to overwrite, relinquish control, or share management.
Merge strategy
Server-Side Apply merges fields based on the actor who manages them.
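
For example, the following commands sketch a typical Server-Side Apply workflow. The manifest name example.yaml, the field manager name platform-team, and the resource being inspected are hypothetical placeholders:

    # Apply a manifest with Server-Side Apply; the API server records
    # "platform-team" as the field manager for every field set in example.yaml.
    $ oc apply --server-side --field-manager=platform-team -f example.yaml

    # Inspect the managedFields metadata to see which manager owns which fields.
    $ oc -n coo-demo get MonitoringStack.monitoring.rhobs example-stack -oyaml --show-managed-fields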

Procedure

  1. Add a MonitoringStack resource using the following configuration:

    Example MonitoringStack object

    apiVersion: monitoring.rhobs/v1alpha1
    kind: MonitoringStack
    metadata:
      labels:
        coo: example
      name: sample-monitoring-stack
      namespace: coo-demo
    spec:
      logLevel: debug
      retention: 1d
      resourceSelector:
        matchLabels:
          app: demo

  2. A Prometheus resource named sample-monitoring-stack is generated in the coo-demo namespace. Retrieve the managed fields of the generated Prometheus resource by running the following command:

    $ oc -n coo-demo get Prometheus.monitoring.rhobs -oyaml --show-managed-fields

    Example output

    managedFields:
    - apiVersion: monitoring.rhobs/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:app.kubernetes.io/managed-by: {}
            f:app.kubernetes.io/name: {}
            f:app.kubernetes.io/part-of: {}
          f:ownerReferences:
            k:{"uid":"81da0d9a-61aa-4df3-affc-71015bcbde5a"}: {}
        f:spec:
          f:additionalScrapeConfigs: {}
          f:affinity:
            f:podAntiAffinity:
              f:requiredDuringSchedulingIgnoredDuringExecution: {}
          f:alerting:
            f:alertmanagers: {}
          f:arbitraryFSAccessThroughSMs: {}
          f:logLevel: {}
          f:podMetadata:
            f:labels:
              f:app.kubernetes.io/component: {}
              f:app.kubernetes.io/part-of: {}
          f:podMonitorSelector: {}
          f:replicas: {}
          f:resources:
            f:limits:
              f:cpu: {}
              f:memory: {}
            f:requests:
              f:cpu: {}
              f:memory: {}
          f:retention: {}
          f:ruleSelector: {}
          f:rules:
            f:alert: {}
          f:securityContext:
            f:fsGroup: {}
            f:runAsNonRoot: {}
            f:runAsUser: {}
          f:serviceAccountName: {}
          f:serviceMonitorSelector: {}
          f:thanos:
            f:baseImage: {}
            f:resources: {}
            f:version: {}
          f:tsdb: {}
      manager: observability-operator
      operation: Apply
    - apiVersion: monitoring.rhobs/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:availableReplicas: {}
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:lastTransitionTime: {}
              f:observedGeneration: {}
              f:status: {}
              f:type: {}
            k:{"type":"Reconciled"}:
              .: {}
              f:lastTransitionTime: {}
              f:observedGeneration: {}
              f:status: {}
              f:type: {}
          f:paused: {}
          f:replicas: {}
          f:shardStatuses:
            .: {}
            k:{"shardID":"0"}:
              .: {}
              f:availableReplicas: {}
              f:replicas: {}
              f:shardID: {}
              f:unavailableReplicas: {}
              f:updatedReplicas: {}
          f:unavailableReplicas: {}
          f:updatedReplicas: {}
      manager: PrometheusOperator
      operation: Update
      subresource: status

  3. Check the metadata.managedFields values, and observe that some fields in metadata and spec are managed by the MonitoringStack resource.
  4. Modify a field that is not controlled by the MonitoringStack resource:

    1. Change spec.enforcedSampleLimit, which is a field not set by the MonitoringStack resource. Create the file prom-spec-edited.yaml:

      prom-spec-edited.yaml

      apiVersion: monitoring.rhobs/v1
      kind: Prometheus
      metadata:
        name: sample-monitoring-stack
        namespace: coo-demo
      spec:
        enforcedSampleLimit: 1000

    2. Apply the YAML by running the following command:

      $ oc apply -f ./prom-spec-edited.yaml --server-side
      Note

      You must use the --server-side flag.

    3. Get the changed Prometheus object and note that there is one more section in managedFields which has spec.enforcedSampleLimit:

      $ oc -n coo-demo get Prometheus.monitoring.rhobs -oyaml --show-managed-fields

      Example output

      managedFields:
      - apiVersion: monitoring.rhobs/v1
        fieldsType: FieldsV1
        fieldsV1:
          f:metadata:
            f:labels:
              f:app.kubernetes.io/managed-by: {}
              f:app.kubernetes.io/name: {}
              f:app.kubernetes.io/part-of: {}
          f:spec:
            f:enforcedSampleLimit: {}
        manager: kubectl
        operation: Apply

      The new entry in managedFields shows that the kubectl field manager now owns spec.enforcedSampleLimit.
  5. Modify a field that is managed by the MonitoringStack resource:

    1. Change spec.logLevel, which is a field managed by the MonitoringStack resource, by updating prom-spec-edited.yaml with the following configuration:

      # changing the logLevel from debug to info
      apiVersion: monitoring.rhobs/v1
      kind: Prometheus
      metadata:
        name: sample-monitoring-stack
        namespace: coo-demo
      spec:
        logLevel: info

      The spec.logLevel field has been added.
    2. Apply the YAML by running the following command:

      $ oc apply -f ./prom-spec-edited.yaml --server-side

      Example output

      error: Apply failed with 1 conflict: conflict with "observability-operator": .spec.logLevel
      Please review the fields above--they currently have other managers. Here
      are the ways you can resolve this warning:
      * If you intend to manage all of these fields, please re-run the apply
        command with the `--force-conflicts` flag.
      * If you do not intend to manage all of the fields, please edit your
        manifest to remove references to the fields that should keep their
        current managers.
      * You may co-own fields by updating your manifest to match the existing
        value; in this case, you'll become the manager if the other manager(s)
        stop managing the field (remove it from their configuration).
      See https://kubernetes.io/docs/reference/using-api/server-side-apply/#conflicts

    3. Notice that the field spec.logLevel cannot be changed using Server-Side Apply, because it is already managed by observability-operator.
    4. Use the --force-conflicts flag to force the change.

      $ oc apply -f ./prom-spec-edited.yaml --server-side --force-conflicts

      Example output

      prometheus.monitoring.rhobs/sample-monitoring-stack serverside-applied

      With the --force-conflicts flag, the field can be forced to change. However, because the same field is also managed by the MonitoringStack resource, the Observability Operator detects the change and reverts it to the value set by the MonitoringStack resource.

      Note

      Some Prometheus fields generated by the MonitoringStack resource are influenced by the fields in the MonitoringStack spec stanza, for example, logLevel. These can be changed by changing the MonitoringStack spec.

    5. To change the logLevel in the Prometheus object, apply the following YAML to change the MonitoringStack resource:

      apiVersion: monitoring.rhobs/v1alpha1
      kind: MonitoringStack
      metadata:
        name: sample-monitoring-stack
        labels:
          coo: example
      spec:
        logLevel: info
    6. To confirm that the change has taken place, query for the log level by running the following command:

      $ oc -n coo-demo get Prometheus.monitoring.rhobs -o=jsonpath='{.items[0].spec.logLevel}'

      Example output

      info

Note
  1. If a new version of an Operator generates a field that was previously generated and controlled by an actor, the value set by the actor will be overridden.

    For example, you are managing a field enforcedSampleLimit which is not generated by the MonitoringStack resource. If the Observability Operator is upgraded, and the new version of the Operator generates a value for enforcedSampleLimit, this will override the value you have previously set.

  2. The Prometheus object generated by the MonitoringStack resource may contain some fields which are not explicitly set by the monitoring stack. These fields appear because they have default values.

As a cluster administrator, you can install or remove the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console. OperatorHub is a user interface that works in conjunction with Operator Lifecycle Manager (OLM), which installs and manages Operators on a cluster.

Install the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  2. Type cluster observability operator in the Filter by keyword box.
  3. Click Cluster Observability Operator in the list of results.
  4. Read the information about the Operator, and configure the following installation settings:

    • Update channel: stable
    • Version: 1.0.0 or later
    • Installation mode: All namespaces on the cluster (default)
    • Installed Namespace: Operator recommended Namespace: openshift-cluster-observability-operator
    • Select Enable Operator recommended cluster monitoring on this Namespace
    • Update approval: Automatic
  5. Optional: You can change the installation settings to suit your requirements. For example, you can select to subscribe to a different update channel, to install an older released version of the Operator, or to require manual approval for updates to new versions of the Operator.
  6. Click Install.

Verification

  • Go to Operators → Installed Operators, and verify that the Cluster Observability Operator entry appears in the list.

Additional resources

Adding Operators to a cluster
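
Alternatively, if you prefer to install the Operator from the command line instead of the web console, you can create the equivalent OLM resources directly. The following manifest is a minimal sketch that mirrors the settings from the procedure above; the package name cluster-observability-operator, the redhat-operators catalog source, and the cluster-monitoring namespace label are assumptions that you should verify against your cluster's Operator catalog:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-cluster-observability-operator
      labels:
        openshift.io/cluster-monitoring: "true" # assumption: equivalent of "Enable Operator recommended cluster monitoring on this Namespace"
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-cluster-observability-operator
      namespace: openshift-cluster-observability-operator
    spec: {} # an empty spec targets all namespaces, matching the default installation mode
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: cluster-observability-operator
      namespace: openshift-cluster-observability-operator
    spec:
      channel: stable
      name: cluster-observability-operator # assumption: package name in the catalog
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      installPlanApproval: Automatic

Apply the manifest with oc apply -f <filename>, and then verify the installation from Operators → Installed Operators as described in the Verification section.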

If you have installed the Cluster Observability Operator (COO) by using OperatorHub, you can uninstall it in the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.

Procedure

  1. Go to Operators → Installed Operators.
  2. Locate the Cluster Observability Operator entry in the list.
  3. Click the Options menu for this entry and select Uninstall Operator.

Verification

  • Go to Operators → Installed Operators, and verify that the Cluster Observability Operator entry no longer appears in the list.

You can monitor metrics for a service by configuring monitoring stacks managed by the Cluster Observability Operator (COO).

To test monitoring a service, follow these steps:

  • Deploy a sample service that defines a service endpoint.
  • Create a ServiceMonitor object that specifies how the service is to be monitored by the COO.
  • Create a MonitoringStack object to discover the ServiceMonitor object.

This configuration deploys a sample service named prometheus-coo-example-app in the user-defined ns1-coo project. The service exposes the custom version metric.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.

Procedure

  1. Create a YAML file named prometheus-coo-example-app.yaml that contains the following configuration details for a namespace, deployment, and service:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ns1-coo
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns1-coo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-coo-example-app
      template:
        metadata:
          labels:
            app: prometheus-coo-example-app
        spec:
          containers:
          - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
            imagePullPolicy: IfNotPresent
            name: prometheus-coo-example-app
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns1-coo
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
        name: web
      selector:
        app: prometheus-coo-example-app
      type: ClusterIP
  2. Save the file.
  3. Apply the configuration to the cluster by running the following command:

    $ oc apply -f prometheus-coo-example-app.yaml
  4. Verify that the pod is running by running the following command and observing the output:

    $ oc -n ns1-coo get pod

    Example output

    NAME                                      READY     STATUS    RESTARTS   AGE
    prometheus-coo-example-app-0927545cb7-anskj   1/1       Running   0          81m
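
Optionally, confirm that the example service exposes the version metric before you configure monitoring. This is a quick sanity check only; it assumes that the application serves metrics on port 8080 at the /metrics path, as configured above:

    $ oc -n ns1-coo port-forward deployment/prometheus-coo-example-app 8080:8080
    # In a second terminal:
    $ curl -s http://localhost:8080/metrics | grep '^version'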

To use the metrics exposed by the sample service you created in the "Deploying a sample service for Cluster Observability Operator" section, you must configure monitoring components to scrape metrics from the /metrics endpoint.

You can create this configuration by using a ServiceMonitor object that specifies how the service is to be monitored, or a PodMonitor object that specifies how a pod is to be monitored. The ServiceMonitor object requires a Service object. The PodMonitor object does not, which enables the MonitoringStack object to scrape metrics directly from the metrics endpoint exposed by a pod.
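
For reference, the following PodMonitor sketch targets the same sample application. It is illustrative only: it assumes that the pod template declares a container port named web, which the sample Deployment in this document does not define, so the ServiceMonitor approach in the next procedure is the one to follow here:

    apiVersion: monitoring.rhobs/v1
    kind: PodMonitor
    metadata:
      labels:
        k8s-app: prometheus-coo-example-monitor
      name: prometheus-coo-example-podmonitor
      namespace: ns1-coo
    spec:
      podMetricsEndpoints:
      - interval: 30s
        port: web # must match a named containerPort on the pod
        scheme: http
      selector:
        matchLabels:
          app: prometheus-coo-example-app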

This procedure shows how to create a ServiceMonitor object for a sample service named prometheus-coo-example-app in the ns1-coo namespace.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
  • You have installed the Cluster Observability Operator.
  • You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.

    Note

    The prometheus-coo-example-app sample service does not support TLS authentication.

Procedure

  1. Create a YAML file named example-coo-app-service-monitor.yaml that contains the following ServiceMonitor object configuration details:

    apiVersion: monitoring.rhobs/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-coo-example-monitor
      name: prometheus-coo-example-monitor
      namespace: ns1-coo
    spec:
      endpoints:
      - interval: 30s
        port: web
        scheme: http
      selector:
        matchLabels:
          app: prometheus-coo-example-app

    This configuration defines a ServiceMonitor object that the MonitoringStack object will reference to scrape the metrics data exposed by the prometheus-coo-example-app sample service.

  2. Apply the configuration to the cluster by running the following command:

    $ oc apply -f example-coo-app-service-monitor.yaml
  3. Verify that the ServiceMonitor resource is created by running the following command and observing the output:

    $ oc -n ns1-coo get servicemonitors.monitoring.rhobs

    Example output

    NAME                         AGE
    prometheus-coo-example-monitor   81m

To scrape the metrics data exposed by the target prometheus-coo-example-app service, create a MonitoringStack object that references the ServiceMonitor object you created in the "Specifying how a service is monitored for Cluster Observability Operator" section. This MonitoringStack object can then discover the service and scrape the exposed metrics data from it.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
  • You have installed the Cluster Observability Operator.
  • You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
  • You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.

Procedure

  1. Create a YAML file for the MonitoringStack object configuration. For this example, name the file example-coo-monitoring-stack.yaml.
  2. Add the following MonitoringStack object configuration details:

    Example MonitoringStack object

    apiVersion: monitoring.rhobs/v1alpha1
    kind: MonitoringStack
    metadata:
      name: example-coo-monitoring-stack
      namespace: ns1-coo
    spec:
      logLevel: debug
      retention: 1d
      resourceSelector:
        matchLabels:
          k8s-app: prometheus-coo-example-monitor

  3. Apply the MonitoringStack object by running the following command:

    $ oc apply -f example-coo-monitoring-stack.yaml
  4. Verify that the MonitoringStack object is available by running the following command and inspecting the output:

    $ oc -n ns1-coo get monitoringstack

    Example output

    NAME                         AGE
    example-coo-monitoring-stack   81m

  5. Run the following command to retrieve information about the active targets from Prometheus and filter the output to list only targets labeled with app=prometheus-coo-example-app. This verifies which targets are discovered and actively monitored by Prometheus with this specific label.

    $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[].discoveredLabels | select(.__meta_kubernetes_endpoints_label_app=="prometheus-coo-example-app")'

    Example output

    {
      "__address__": "10.129.2.25:8080",
      "__meta_kubernetes_endpoint_address_target_kind": "Pod",
      "__meta_kubernetes_endpoint_address_target_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
      "__meta_kubernetes_endpoint_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
      "__meta_kubernetes_endpoint_port_name": "web",
      "__meta_kubernetes_endpoint_port_protocol": "TCP",
      "__meta_kubernetes_endpoint_ready": "true",
      "__meta_kubernetes_endpoints_annotation_endpoints_kubernetes_io_last_change_trigger_time": "2024-11-05T11:24:09Z",
      "__meta_kubernetes_endpoints_annotationpresent_endpoints_kubernetes_io_last_change_trigger_time": "true",
      "__meta_kubernetes_endpoints_label_app": "prometheus-coo-example-app",
      "__meta_kubernetes_endpoints_labelpresent_app": "true",
      "__meta_kubernetes_endpoints_name": "prometheus-coo-example-app",
      "__meta_kubernetes_namespace": "ns1-coo",
      "__meta_kubernetes_pod_annotation_k8s_ovn_org_pod_networks": "{\"default\":{\"ip_addresses\":[\"10.129.2.25/23\"],\"mac_address\":\"0a:58:0a:81:02:19\",\"gateway_ips\":[\"10.129.2.1\"],\"routes\":[{\"dest\":\"10.128.0.0/14\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"172.30.0.0/16\",\"nextHop\":\"10.129.2.1\"},{\"dest\":\"100.64.0.0/16\",\"nextHop\":\"10.129.2.1\"}],\"ip_address\":\"10.129.2.25/23\",\"gateway_ip\":\"10.129.2.1\",\"role\":\"primary\"}}",
      "__meta_kubernetes_pod_annotation_k8s_v1_cni_cncf_io_network_status": "[{\n    \"name\": \"ovn-kubernetes\",\n    \"interface\": \"eth0\",\n    \"ips\": [\n        \"10.129.2.25\"\n    ],\n    \"mac\": \"0a:58:0a:81:02:19\",\n    \"default\": true,\n    \"dns\": {}\n}]",
      "__meta_kubernetes_pod_annotation_openshift_io_scc": "restricted-v2",
      "__meta_kubernetes_pod_annotation_seccomp_security_alpha_kubernetes_io_pod": "runtime/default",
      "__meta_kubernetes_pod_annotationpresent_k8s_ovn_org_pod_networks": "true",
      "__meta_kubernetes_pod_annotationpresent_k8s_v1_cni_cncf_io_network_status": "true",
      "__meta_kubernetes_pod_annotationpresent_openshift_io_scc": "true",
      "__meta_kubernetes_pod_annotationpresent_seccomp_security_alpha_kubernetes_io_pod": "true",
      "__meta_kubernetes_pod_controller_kind": "ReplicaSet",
      "__meta_kubernetes_pod_controller_name": "prometheus-coo-example-app-5d8cd498c7",
      "__meta_kubernetes_pod_host_ip": "10.0.128.2",
      "__meta_kubernetes_pod_ip": "10.129.2.25",
      "__meta_kubernetes_pod_label_app": "prometheus-coo-example-app",
      "__meta_kubernetes_pod_label_pod_template_hash": "5d8cd498c7",
      "__meta_kubernetes_pod_labelpresent_app": "true",
      "__meta_kubernetes_pod_labelpresent_pod_template_hash": "true",
      "__meta_kubernetes_pod_name": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
      "__meta_kubernetes_pod_node_name": "ci-ln-8tt8vxb-72292-6cxjr-worker-a-wdfnz",
      "__meta_kubernetes_pod_phase": "Running",
      "__meta_kubernetes_pod_ready": "true",
      "__meta_kubernetes_pod_uid": "054c11b6-9a76-4827-a860-47f3a4596871",
      "__meta_kubernetes_service_label_app": "prometheus-coo-example-app",
      "__meta_kubernetes_service_labelpresent_app": "true",
      "__meta_kubernetes_service_name": "prometheus-coo-example-app",
      "__metrics_path__": "/metrics",
      "__scheme__": "http",
      "__scrape_interval__": "30s",
      "__scrape_timeout__": "10s",
      "job": "serviceMonitor/ns1-coo/prometheus-coo-example-monitor/0"
    }

    Note

    The above example uses the jq command-line JSON processor to format the output for convenience.

4.4. Validating the monitoring stack

To validate that the monitoring stack is working correctly, access the example service and then view the gathered metrics.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
  • You have installed the Cluster Observability Operator.
  • You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
  • You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.
  • You have created a MonitoringStack object named example-coo-monitoring-stack in the ns1-coo namespace.

Procedure

  1. Create a route to expose the example prometheus-coo-example-app service. From your terminal, run the command:

    $ oc expose svc prometheus-coo-example-app -n ns1-coo
  2. Access the route from your browser, or from the command line, to generate metrics. A command-line example is shown after this procedure.
  3. Execute a query on the Prometheus pod to return the total HTTP requests metric:

    $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/query?query=http_requests_total'

    Example output (formatted using jq for convenience)

    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [
          {
            "metric": {
              "__name__": "http_requests_total",
              "code": "200",
              "endpoint": "web",
              "instance": "10.129.2.25:8080",
              "job": "prometheus-coo-example-app",
              "method": "get",
              "namespace": "ns1-coo",
              "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
              "service": "prometheus-coo-example-app"
            },
            "value": [
              1730807483.632,
              "3"
            ]
          },
          {
            "metric": {
              "__name__": "http_requests_total",
              "code": "404",
              "endpoint": "web",
              "instance": "10.129.2.25:8080",
              "job": "prometheus-coo-example-app",
              "method": "get",
              "namespace": "ns1-coo",
              "pod": "prometheus-coo-example-app-5d8cd498c7-9j2gj",
              "service": "prometheus-coo-example-app"
            },
            "value": [
              1730807483.632,
              "0"
            ]
          }
        ]
      }
    }
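
The following command-line sketch, referenced from step 2, generates a few requests against the route created in step 1. It assumes the default route host assigned by oc expose and plain HTTP access:

    $ ROUTE_HOST=$(oc get route prometheus-coo-example-app -n ns1-coo -o jsonpath='{.spec.host}')
    $ for i in 1 2 3; do curl -s -o /dev/null "http://${ROUTE_HOST}/"; done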

4.5. Scrape targets in multiple namespaces

To scrape targets in multiple namespaces, set the namespace and resource selector in the MonitoringStack object.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
  • You have installed the Cluster Observability Operator.

Procedure

  1. Deploy the following namespace object and MonitoringStack YAML file:

    Example MonitoringStack

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ns1-coo
      labels:
        monitoring.rhobs/stack: multi-ns
    ---
    apiVersion: monitoring.rhobs/v1alpha1
    kind: MonitoringStack
    metadata:
      name: example-coo-monitoring-stack
      namespace: ns1-coo
    spec:
      logLevel: debug
      retention: 1d
      resourceSelector:
          matchLabels:
            k8s-app: prometheus-coo-example-monitor
      namespaceSelector:
          matchLabels:
            monitoring.rhobs/stack: multi-ns

  2. Deploy a sample application in the namespace ns1-coo, with an alert that is always firing:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns1-coo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-coo-example-app
      template:
        metadata:
          labels:
            app: prometheus-coo-example-app
        spec:
          containers:
          - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
            imagePullPolicy: IfNotPresent
            name: prometheus-coo-example-app
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns1-coo
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
        name: web
      selector:
        app: prometheus-coo-example-app
      type: ClusterIP
    ---
    apiVersion: monitoring.rhobs/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-coo-example-monitor
      name: prometheus-coo-example-monitor
      namespace: ns1-coo
    spec:
      endpoints:
      - interval: 30s
        port: web
        scheme: http
      selector:
        matchLabels:
          app: prometheus-coo-example-app
    ---
    apiVersion: monitoring.rhobs/v1
    kind: PrometheusRule
    metadata:
      name: example-alert
      namespace: ns1-coo
      labels:
        k8s-app: prometheus-coo-example-monitor
    spec:
      groups:
      - name: example
        rules:
        - alert: VersionAlert
          for: 1m
          expr: version{job="prometheus-coo-example-app"} > 0
          labels:
            severity: warning
  3. Deploy the same example application in another namespace labeled with monitoring.rhobs/stack: multi-ns:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: ns2-coo
      labels:
        monitoring.rhobs/stack: multi-ns
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns2-coo
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: prometheus-coo-example-app
      template:
        metadata:
          labels:
            app: prometheus-coo-example-app
        spec:
          containers:
          - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
            imagePullPolicy: IfNotPresent
            name: prometheus-coo-example-app
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: prometheus-coo-example-app
      name: prometheus-coo-example-app
      namespace: ns2-coo
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
        name: web
      selector:
        app: prometheus-coo-example-app
      type: ClusterIP
    ---
    apiVersion: monitoring.rhobs/v1
    kind: ServiceMonitor
    metadata:
      labels:
        k8s-app: prometheus-coo-example-monitor
      name: prometheus-coo-example-monitor
      namespace: ns2-coo
    spec:
      endpoints:
      - interval: 30s
        port: web
        scheme: http
      selector:
        matchLabels:
          app: prometheus-coo-example-app

Verification

  1. Verify that the Prometheus instance adds new targets and that the alerts are firing. Use a port-forward command to expose the Prometheus or the Alertmanager user interface that has been deployed by the MonitoringStack instance.

    Prometheus

    $ oc port-forward -n ns1-coo pod/prometheus-example-coo-monitoring-stack-0 9090

    Alertmanager

    $ oc port-forward -n ns1-coo pod/alertmanager-example-coo-monitoring-stack-0 9093

  2. Verify that the targets are being scraped and that the alerts are firing by browsing to http://localhost:9090/targets or http://localhost:9093/#/alerts.
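
Alternatively, while the port-forward commands are running, you can query the Prometheus and Alertmanager APIs directly from the command line. This is a sketch only and assumes that jq is installed locally:

    # List the namespaces of the active scrape targets.
    $ curl -s 'http://localhost:9090/api/v1/targets?state=active' | jq -r '.data.activeTargets[].labels.namespace'

    # List the names of the currently firing alerts.
    $ curl -s 'http://localhost:9093/api/v2/alerts' | jq -r '.[].labels.alertname'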

Chapter 5. Observability UI plugins

5.1. Observability UI plugins overview

You can use the Cluster Observability Operator (COO) to install and manage UI plugins to enhance the observability capabilities of the OpenShift Container Platform web console. The plugins extend the default functionality, providing new UI features for troubleshooting, distributed tracing, and cluster logging.

5.1.1. Monitoring

The monitoring UI plugin adds monitoring-related UI features to the OpenShift web console, for the Advanced Cluster Management (ACM) perspective.

  • ACM: The monitoring plugin in Cluster Observability Operator (COO) allows it to function in Red Hat Advanced Cluster Management (RHACM) environments, providing ACM with the same monitoring capabilities as OpenShift Container Platform.

For more information, see the monitoring UI plugin page.

5.1.2. Cluster logging

The logging UI plugin surfaces logging data in the web console on the Observe → Logs page. You can specify filters, queries, time ranges, and refresh rates. The results are displayed as a list of collapsed logs, which you can then expand to show more detailed information for each log.

For more information, see the logging UI plugin page.

5.1.3. Troubleshooting

Important

The Cluster Observability Operator troubleshooting panel UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The troubleshooting panel UI plugin for OpenShift Container Platform version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project. You can use the troubleshooting panel available from the Observe → Alerting page to easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores. Users of OpenShift Container Platform version 4.17+ can also access the troubleshooting UI panel from the Application Launcher.

The output of Korrel8r is displayed as an interactive node graph. When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, or pod.

For more information, see the troubleshooting UI plugin page.
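
Enabling the panel follows the same UIPlugin pattern used for the other plugins in this chapter. The following is a minimal sketch, assuming the plugin type name TroubleshootingPanel:

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: troubleshooting-panel
    spec:
      type: TroubleshootingPanel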

5.1.4. Distributed tracing

The distributed tracing UI plugin adds tracing-related features to the web console on the Observe → Traces page. You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems. You can select a supported TempoStack or TempoMonolithic multi-tenant instance running in the cluster and set a time range and query to view the trace data.

For more information, see the distributed tracing UI plugin page.

5.2. Monitoring UI plugin

Important

The Cluster Observability Operator monitoring UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The monitoring UI plugin adds monitoring features to the Administrator perspective of the OpenShift web console.

  • RHACM: The monitoring plugin in Cluster Observability Operator (COO) allows it to function in Red Hat Advanced Cluster Management (RHACM) environments, providing RHACM with the same alerting capabilities as OpenShift Container Platform. You can configure the plugin to fetch alerts from the RHACM Alertmanager backend. This enables seamless integration and user experience by aligning RHACM and OpenShift Container Platform monitoring workflows.

The monitoring UI plugin adds monitoring-related UI features to the OpenShift web console, for the Advanced Cluster Management (ACM) perspective.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed the Cluster Observability Operator.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
  2. Choose the UI Plugin tab (at the far right of the tab list) and click Create UIPlugin.
  3. Select YAML view, enter the following content, and then click Create:

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: monitoring
    spec:
      type: Monitoring
      monitoring:
        acm:
          enabled: true
          alertmanager:
            url: 'https://alertmanager.open-cluster-management-observability.svc:9095'
          thanosQuerier:
            url: 'https://rbac-query-proxy.open-cluster-management-observability.svc:8443'

    The acm stanza enables RHACM features. You must configure the Alertmanager and ThanosQuerier service endpoints.

5.3. Logging UI plugin

The logging UI plugin surfaces logging data in the OpenShift Container Platform web console on the Observe → Logs page. You can specify filters, queries, time ranges, and refresh rates, with the results displayed as a list of collapsed logs, which can then be expanded to show more detailed information for each log.

When you have also deployed the Troubleshooting UI plugin on OpenShift Container Platform version 4.16+, it connects to the Korrel8r service and adds direct links from the Administrator perspective, from the Observe → Logs page, to the Observe → Metrics page with a correlated PromQL query. It also adds a See Related Logs link from the Administrator perspective alerting detail page, at Observe → Alerting, to the Observe → Logs page with a correlated filter set selected.

The features of the plugin are categorized as:

dev-console
Adds the logging view to the Developer perspective.
alerts
Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view.
dev-alerts
Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view for the Developer perspective.

The following table shows which of these features each Cluster Observability Operator (COO) version supports on each OpenShift Container Platform version:

COO version | OCP versions | Features
0.3.0+ | 4.12 | dev-console
0.3.0+ | 4.13 | dev-console, alerts
0.3.0+ | 4.14+ | dev-console, alerts, dev-alerts

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed the Cluster Observability Operator.
  • You have a LokiStack instance in your cluster.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
  2. Choose the UI Plugin tab (at the far right of the tab list) and click Create UIPlugin.
  3. Select YAML view, enter the following content, and then click Create:

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: logging
    spec:
      type: Logging
      logging:
        lokiStack:
          name: logging-loki
        logsLimit: 50
        timeout: 30s
        schema: otel # 1

    1. schema is one of otel, viaq, or select. The default is viaq if no value is specified. When you choose select, you can select the mode in the UI when you run a query. See the variant sketch after the following note.
    Note

    These are the known issues for the logging UI plugin. For more information, see OU-587.

    • The schema feature is only supported in OpenShift Container Platform 4.15 and later. In earlier versions of OpenShift Container Platform, the logging UI plugin will only use the viaq attribute, ignoring any other values that might be set.
    • Non-administrator users cannot query logs using the otel attribute with logging for Red Hat OpenShift versions 5.8 to 6.2. This issue will be fixed in a future logging release. (LOG-6589)
    • In logging for Red Hat OpenShift version 5.9, the severity_text Otel attribute is not set.
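
As an illustration of the schema option, the following variant sketch configures the plugin so that users pick the schema in the UI for each query. The LokiStack name and the limit and timeout values are placeholders; adjust them to match your environment.

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: logging
    spec:
      type: Logging
      logging:
        lokiStack:
          name: logging-loki # placeholder: the LokiStack instance in your cluster
        logsLimit: 100 # placeholder limit
        timeout: 60s # placeholder timeout
        schema: select # lets users choose between otel and viaq when running a query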

5.4. Distributed tracing UI plugin

The distributed tracing UI plugin adds tracing-related features to the Administrator perspective of the OpenShift web console at Observe → Traces. You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed the Cluster Observability Operator.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
  2. Choose the UI Plugin tab (at the far right of the tab list) and click Create UIPlugin.
  3. Select YAML view, enter the following content, and then click Create:

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: distributed-tracing
    spec:
      type: DistributedTracing

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed the Cluster Observability Operator.
  • You have installed the Cluster Observability Operator distributed tracing UI plugin.
  • You have a TempoStack or TempoMonolithic multi-tenant instance in the cluster.

Procedure

  1. In the Administrator perspective of the OpenShift Container Platform web console, click Observe → Traces.
  2. Select a TempoStack or TempoMonolithic multi-tenant instance and set a time range and query for the traces to be loaded.

    The traces are displayed on a scatter-plot showing the trace start time, duration, and number of spans. Underneath the scatter plot, there is a list of traces showing information such as the Trace Name, number of Spans, and Duration.

  3. Click on a trace name link.

    The trace detail page for the selected trace contains a Gantt Chart of all of the spans within the trace. Select a span to show a breakdown of the configured attributes.

5.5. Troubleshooting UI plugin

Important

The Cluster Observability Operator troubleshooting panel UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The troubleshooting UI plugin for OpenShift Container Platform version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project. With the troubleshooting panel that is available under Observe → Alerting, you can easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources across different data stores. Users of OpenShift Container Platform version 4.17+ can also access the troubleshooting UI panel from the Application Launcher.

When you install the troubleshooting UI plugin, a Korrel8r service named korrel8r is deployed in the same namespace, and it is able to locate related observability signals and Kubernetes resources from its correlation engine.

The output of Korrel8r is displayed in the form of an interactive node graph in the OpenShift Container Platform web console. Nodes in the graph represent a type of resource or signal, while edges represent relationships. When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, a metric, log, or pod.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin cluster role.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed the Cluster Observability Operator.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
  2. Choose the UI Plugin tab (at the far right of the tab list) and click Create UIPlugin.
  3. Select YAML view, enter the following content, and then click Create:

    apiVersion: observability.openshift.io/v1alpha1
    kind: UIPlugin
    metadata:
      name: troubleshooting-panel
    spec:
      type: TroubleshootingPanel

Prerequisites

  • You have access to the OpenShift Container Platform cluster as a user with the cluster-admin cluster role. If your cluster version is 4.17+, you can access the troubleshooting UI panel from the Application Launcher.
  • You have logged in to the OpenShift Container Platform web console.
  • You have installed OpenShift Container Platform Logging, if you want to visualize correlated logs.
  • You have installed OpenShift Container Platform Network Observability, if you want to visualize correlated netflows.
  • You have installed the Cluster Observability Operator.
  • You have installed the Cluster Observability Operator troubleshooting UI plugin.

    Note

    The troubleshooting panel relies on the observability signal stores installed in your cluster. Kubernetes resources, alerts, and metrics are always available by default in an OpenShift Container Platform cluster. Other signal types require optional components to be installed:

    • Logs: Red Hat OpenShift Logging (collection) and Loki Operator provided by Red Hat (store)
    • Network events: Network observability provided by Red Hat (collection) and Loki Operator provided by Red Hat (store)

    Procedure

    1. In the Administrator perspective of the web console, navigate to Observe → Alerting and then select an alert. If the alert has correlated items, a Troubleshooting Panel link will appear above the chart on the alert detail page.

      Click on the Troubleshooting Panel link to display the panel.

    2. The panel consists of query details and a topology graph of the query results. The selected alert is converted into a Korrel8r query string and sent to the korrel8r service. The results are displayed as a graph network connecting the returned signals and resources. This is a neighbourhood graph, starting at the current resource and including related objects up to 3 steps away from the starting point. Clicking on nodes in the graph takes you to the corresponding web console pages for those resources.
    3. You can use the troubleshooting panel to find resources relating to the chosen alert.

      Note

      Clicking on a node may sometimes show fewer results than indicated on the graph. This is a known issue that will be addressed in a future release.

      1. Alert (1): This node is the starting point in the graph and represents the KubeContainerWaiting alert displayed in the web console.
      2. Pod (1): This node indicates that there is a single Pod resource associated with this alert. Clicking on this node will open a console search showing the related pod directly.
      3. Event (2): There are two Kubernetes events associated with the pod. Click this node to see the events.
      4. Logs (74): This pod has 74 lines of logs, which you can access by clicking on this node.
      5. Metrics (105): There are many metrics associated with the pod.
      6. Network (6): There are network events, meaning the pod has communicated over the network. The remaining nodes in the graph represent the Service, Deployment and DaemonSet resources that the pod has communicated with.
      7. Focus: Clicking this button updates the graph. By default, the graph itself does not change when you click on nodes in the graph. Instead, the main web console page changes, and you can then navigate to other resources using links on the page, while the troubleshooting panel itself stays open and unchanged. To force an update to the graph in the troubleshooting panel, click Focus. This draws a new graph, using the current resource in the web console as the starting point.
      8. Show Query: Clicking this button enables some experimental features:

        1. Hide Query hides the experimental features.
        2. The query that identifies the starting point for the graph. The query language, part of the Korrel8r correlation engine used to create the graphs, is experimental and may change in the future. The query is updated by the Focus button to correspond to the resources in the main web console window.
        3. Neighbourhood depth is used to display a smaller or larger neighbourhood.

          Note

          Setting a large value in a large cluster might cause the query to fail if the number of results is too large.

        4. Goal class results in a goal-directed search instead of a neighbourhood search. A goal-directed search shows all paths from the starting point to the goal class, which indicates a type of resource or signal. The format of the goal class is experimental and may change. Currently, the following goals are valid:

          • k8s:RESOURCE[VERSION.[GROUP]] identifying a kind of Kubernetes resource. For example, k8s:Pod or k8s:Deployment.apps.v1.
          • alert:alert representing any alert.
          • metric:metric representing any metric.
          • netflow:network representing any network observability network event.
          • log:LOG_TYPE representing stored logs, where LOG_TYPE must be one of application, infrastructure or audit.

5.5.3. Creating the example alert

To trigger an alert as a starting point to use in the troubleshooting UI panel, you can deploy a container that is deliberately misconfigured.

Procedure

  1. Use the following YAML, either from the command line or in the web console, to create a broken deployment in a system namespace:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: bad-deployment
      namespace: default # 1
    spec:
      selector:
        matchLabels:
          app: bad-deployment
      template:
        metadata:
          labels:
            app: bad-deployment
        spec:
          containers: # 2
          - name: bad-deployment
            image: quay.io/openshift-logging/vector:5.8

    1. The deployment must be in a system namespace (such as default) to cause the desired alerts.
    2. This container deliberately tries to start a vector server with no configuration file. The server logs a few messages, and then exits with an error. Alternatively, you can deploy any container you like that is badly configured, causing it to trigger an alert.
  2. View the alerts:

    1. Go to Observe → Alerting and click clear all filters. View the Pending alerts.

      Important

      Alerts first appear in the Pending state. They do not start Firing until the container has been crashing for some time. By viewing Pending alerts, you do not have to wait as long to see them occur.

    2. Choose one of the KubeContainerWaiting, KubePodCrashLooping, or KubePodNotReady alerts and open the troubleshooting panel by clicking on the link. Alternatively, if the panel is already open, click Focus to update the graph.

Chapter 6. Monitoring API reference

The resource types are MonitoringStack and ThanosQuerier.

6.1. MonitoringStack

MonitoringStack is the Schema for the monitoringstacks API.

Name | Type | Description | Required

apiVersion

string

monitoring.rhobs/v1alpha1

true

kind

string

MonitoringStack

true

metadata

object

Refer to the Kubernetes API documentation for the fields of the metadata field.

true

spec

object

MonitoringStackSpec is the specification for the desired MonitoringStack object.

true

status

object

MonitoringStackStatus defines the observed state of the MonitoringStack object. It should always be reconstructable from the state of the cluster and/or outside world.

false

6.2. MonitoringStack.spec

MonitoringStackSpec is the specification for the desired MonitoringStack object.

Name | Type | Description | Required

alertmanagerConfig

object

Define Alertmanager config.

Default: map[disabled:false]

false

logLevel

enum

Set log levels of configured components.

Enum: debug, info, warn, error

Default: info

false

namespaceSelector

object

Namespace selector for MonitoringStack resources.

  • To monitor everything, set to empty map selector. For example, namespaceSelector: {}.
  • To monitor resources in the namespace where MonitoringStack instance was created, set to null. For example, namespaceSelector:.

false

nodeSelector

map[string]string

Define node selector for MonitoringStack pods.

false

prometheusConfig

object

Define prometheus config.

Default: map[replicas:2]

false

resourceSelector

object

Label selector for MonitoringStack resources.

  • To monitor everything, set to empty map selector. For example, resourceSelector: {}.
  • To disable service discovery, set to null. For example, resourceSelector:.

false

resources

object

Define resources requests and limits for MonitoringStack pods.

Default: map[limits:map[cpu:500m memory:512Mi] requests:map[cpu:100m memory:256Mi]]

false

retention

string

Time duration to retain data. The string must match the regular expression [0-9]+(ms|s|m|h|d|w|y) (milliseconds, seconds, minutes, hours, days, weeks, years).

Default: 120h

false

tolerations

[]object

Define tolerations for MonitoringStack pods.

false
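
To make the field reference easier to follow, the following is a minimal sketch of a MonitoringStack manifest that uses several of the fields listed above. The stack name, namespace, labels, and retention period are illustrative placeholders, not recommended values.

    apiVersion: monitoring.rhobs/v1alpha1
    kind: MonitoringStack
    metadata:
      name: example-stack # placeholder name
      namespace: example-namespace # placeholder namespace
    spec:
      logLevel: info # one of debug, info, warn, error
      retention: 7d # must match [0-9]+(ms|s|m|h|d|w|y)
      resourceSelector: # label selector for the resources this stack monitors
        matchLabels:
          app.kubernetes.io/part-of: example # placeholder label
      namespaceSelector: {} # empty selector: monitor all namespaces
      alertmanagerConfig:
        disabled: false
      prometheusConfig:
        replicas: 2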

6.3. MonitoringStack.spec.alertmanagerConfig

Define Alertmanager config.

Name | Type | Description | Required

disabled

boolean

Disables the deployment of Alertmanager.

Default: false

false

webTLSConfig

object

Configure TLS options for the Alertmanager web server.

false

Configure TLS options for the Alertmanager web server.

Name | Type | Description | Required

certificate

object

Reference to the TLS public certificate for the web server.

true

certificateAuthority

object

Reference to the root Certificate Authority used to verify the web server’s certificate.

true

privateKey

object

Reference to the TLS private key for the web server.

true

Reference to the TLS public certificate for the web server.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

The name of the secret in the object’s namespace to select from.

true

Reference to the root Certificate Authority used to verify the web server’s certificate.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

The name of the secret in the object’s namespace to select from.

true

Reference to the TLS private key for the web server.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

The name of the secret in the object’s namespace to select from.

true
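
A sketch of the Alertmanager webTLSConfig block described above, assuming the certificate, private key, and CA are stored in hypothetical secrets named alertmanager-web-tls and alertmanager-web-ca in the same namespace:

    alertmanagerConfig:
      webTLSConfig:
        certificate:
          name: alertmanager-web-tls # placeholder secret name
          key: tls.crt
        privateKey:
          name: alertmanager-web-tls
          key: tls.key
        certificateAuthority:
          name: alertmanager-web-ca # placeholder secret name
          key: ca.crt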

6.8. MonitoringStack.spec.namespaceSelector

Namespace selector for MonitoringStack resources.

  • To monitor everything, set to empty map selector. For example, namespaceSelector: {}.
  • To monitor resources in the namespace where the MonitoringStack instance was created, set to null. For example, namespaceSelector:.
Name | Type | Description | Required

matchExpressions

[]object

matchExpressions is a list of label selector requirements. The requirements are ANDed.

false

matchLabels

map[string]string

matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.

false

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

Name | Type | Description | Required

key

string

key is the label key that the selector applies to.

true

operator

string

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

true

values

[]string

values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

false
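
For example, a sketch of a namespaceSelector that combines matchLabels and matchExpressions; the label keys and values are placeholders:

    namespaceSelector:
      matchLabels:
        monitoring: enabled # placeholder label
      matchExpressions:
      - key: environment # placeholder key
        operator: In
        values:
        - staging
        - production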

6.10. MonitoringStack.spec.prometheusConfig

Define Prometheus configuration.

Name | Type | Description | Required

enableOtlpHttpReceiver

boolean

Enable Prometheus to accept OpenTelemetry Metrics via the otlp/http protocol. Defaults to the value of false. The resulting endpoint is /api/v1/otlp/v1/metrics.

false

enableRemoteWriteReceiver

boolean

Enable Prometheus to be used as a receiver for the Prometheus remote write protocol. Defaults to the value of false.

false

externalLabels

map[string]string

Define ExternalLabels for Prometheus.

false

persistentVolumeClaim

object

Define persistent volume claim for Prometheus.

false

remoteWrite

[]object

Define remote write for Prometheus.

false

replicas

integer

Number of replicas/pods to deploy for a Prometheus deployment.

Format: int32

Default: 2

Minimum: 0

false

scrapeInterval

string

Default interval between scrapes.

false

webTLSConfig

object

Configure TLS options for the Prometheus web server.

false
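
The following sketch combines several of the prometheusConfig fields above; the external label, scrape interval, and remote write endpoint are placeholders:

    prometheusConfig:
      replicas: 2
      scrapeInterval: 30s
      enableOtlpHttpReceiver: true # exposes /api/v1/otlp/v1/metrics
      externalLabels:
        cluster: example-cluster # placeholder label value
      remoteWrite:
      - url: https://remote-write.example.com/api/v1/write # placeholder endpoint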

Define persistent volume claim for Prometheus.

Name | Type | Description | Required

accessModes

[]string

accessModes contains the desired access modes the volume should have. For more information, see Kubernetes Docs.

false

dataSource

object

dataSource field can be used to specify either:

  • An existing VolumeSnapshot object (snapshot.storage.k8s.io/VolumeSnapshot)
  • An existing PVC (PersistentVolumeClaim)

If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source.

When the AnyVolumeDataSource feature gate is enabled, dataSource contents will be copied to dataSourceRef, and dataSourceRef contents will be copied to dataSource when dataSourceRef.namespace is not specified. If the namespace is specified, then dataSourceRef will not be copied to dataSource.

false

dataSourceRef

object

dataSourceRef specifies the object from which to populate the volume with data, if a non-empty volume is desired. This may be any object from a non-empty API group (non core object) or a PersistentVolumeClaim object.

When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner. This field will replace the functionality of the dataSource field and as such if both fields are non-empty, they must have the same value.

For backwards compatibility, when namespace is not specified in dataSourceRef, both fields (dataSource and dataSourceRef) will be set to the same value automatically if one of them is empty and the other is non-empty. When namespace is specified in dataSourceRef, dataSource isn’t set to the same value and must be empty.

There are three important differences between dataSource and dataSourceRef:

  • While dataSource only allows two specific types of objects, dataSourceRef allows any non-core object, as well as PersistentVolumeClaim objects.
  • While dataSource ignores disallowed values (dropping them), dataSourceRef preserves all values, and generates an error if a disallowed value is specified.
  • While dataSource only allows local objects, dataSourceRef allows objects in any namespaces.

Beta: Using this field requires the AnyVolumeDataSource feature gate to be enabled.

Alpha: Using the namespace field of dataSourceRef requires the CrossNamespaceVolumeDataSource feature gate to be enabled.

false

resources

object

resources represents the minimum resources the volume should have.

If RecoverVolumeExpansionFailure feature is enabled, users are allowed to specify resource requirements that are lower than the previous value but must still be higher than the capacity recorded in the status field of the claim.

For more information, see Kubernetes Docs

false

selector

object

selector is a label query over volumes to consider for binding.

false

storageClassName

string

storageClassName is the name of the StorageClass required by the claim. For more information, see Kubernetes Docs

false

volumeAttributesClassName

string

volumeAttributesClassName may be used to set the VolumeAttributesClass used by this claim.

If specified, the CSI driver will create or update the volume with the attributes defined in the corresponding VolumeAttributesClass. This has a different purpose than storageClassName, as it can be changed after the claim is created. An empty string value means that no VolumeAttributesClass will be applied to the claim but it is not allowed to reset this field to the empty string once it is set.

If unspecified and the PersistentVolumeClaim is unbound, the default VolumeAttributesClass will be set by the persistentvolume controller if it exists. If the resource referred to by volumeAttributesClass does not exist, this PersistentVolumeClaim will be set to a Pending state, as reflected by the modifyVolumeStatus field, until such a resource exists.

For more information, see Kubernetes Docs

Beta: Using this field requires the VolumeAttributesClass feature gate to be enabled (off by default).

false

volumeMode

string

volumeMode defines what type of volume is required by the claim. Value of Filesystem is implied when not included in claim spec.

false

volumeName

string

volumeName is the binding reference to the PersistentVolume backing this claim.

false
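
A sketch of a persistentVolumeClaim for Prometheus storage, assuming a hypothetical storage class named example-ssd exists in the cluster:

    prometheusConfig:
      persistentVolumeClaim:
        storageClassName: example-ssd # placeholder storage class
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi # illustrative size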

dataSource field can be used to specify either:

  • An existing VolumeSnapshot object (snapshot.storage.k8s.io/VolumeSnapshot)
  • An existing PVC (PersistentVolumeClaim)

If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source. When the AnyVolumeDataSource feature gate is enabled, dataSource contents will be copied to dataSourceRef, and dataSourceRef contents will be copied to dataSource when dataSourceRef.namespace is not specified. If the namespace is specified, then dataSourceRef will not be copied to dataSource.

Name | Type | Description | Required

kind

string

kind is the type of resource being referenced

true

name

string

name is the name of resource being referenced

true

apiGroup

string

apiGroup is the group for the resource being referenced. If apiGroup is not specified, the specified kind must be in the core API group. For any other third-party types, apiGroup is required.

false

dataSourceRef specifies the object from which to populate the volume with data, if a non-empty volume is desired. This may be any object from a non-empty API group (non core object) or a PersistentVolumeClaim object. When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner.

This field will replace the functionality of the dataSource field and as such if both fields are non-empty, they must have the same value. For backwards compatibility, when namespace isn’t specified in dataSourceRef, both fields (dataSource and dataSourceRef) will be set to the same value automatically if one of them is empty and the other is non-empty. When namespace is specified in dataSourceRef, dataSource isn’t set to the same value and must be empty.

There are three important differences between dataSource and dataSourceRef:

  • While dataSource only allows two specific types of objects, dataSourceRef allows any non-core object, as well as PersistentVolumeClaim objects.
  • While dataSource ignores disallowed values (dropping them), dataSourceRef preserves all values, and generates an error if a disallowed value is specified.
  • While dataSource only allows local objects, dataSourceRef allows objects in any namespaces.

Beta: Using this field requires the AnyVolumeDataSource feature gate to be enabled.

Alpha: Using the namespace field of dataSourceRef requires the CrossNamespaceVolumeDataSource feature gate to be enabled.

Name | Type | Description | Required

kind

string

kind is the type of resource being referenced

true

name

string

name is the name of resource being referenced

true

apiGroup

string

apiGroup is the group for the resource being referenced.

If apiGroup is not specified, the specified kind must be in the core API group. For any other third-party types, apiGroup is required.

false

namespace

string

namespace is the namespace of resource being referenced.

Note that when a namespace is specified, a gateway.networking.k8s.io/ReferenceGrant object is required in the referent namespace to allow that namespace’s owner to accept the reference. See the ReferenceGrant documentation for details.

Alpha: This field requires the CrossNamespaceVolumeDataSource feature gate to be enabled.

false

resources represents the minimum resources the volume should have.

If RecoverVolumeExpansionFailure feature is enabled, users are allowed to specify resource requirements that are lower than the previous value but must still be higher than the capacity recorded in the status field of the claim.

For more information, see Kubernetes Docs

Name | Type | Description | Required

limits

map[string]int or string

Limits describes the maximum amount of compute resources allowed.

For more information, see Kubernetes Docs

false

requests

map[string]int or string

Requests describes the minimum amount of compute resources required.

If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits.

For more information, see Kubernetes Docs

false

selector is a label query over volumes to consider for binding.

Name | Type | Description | Required

matchExpressions

[]object

matchExpressions is a list of label selector requirements. The requirements are ANDed.

false

matchLabels

map[string]string

matchLabels is a map of {key,value} pairs.

A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed.

false

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

Name | Type | Description | Required

key

string

key is the label key that the selector applies to.

true

operator

string

operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist.

true

values

[]string

values is an array of string values.

If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch.

false

RemoteWriteSpec defines the configuration to write samples from Prometheus to a remote endpoint.

Name | Type | Description | Required

url

string

The URL of the endpoint to send samples to.

true

authorization

object

Authorization section for the URL.

  • It requires Prometheus >= v2.26.0.
  • Cannot be set at the same time as sigv4, basicAuth, oauth2, or azureAd.

false

azureAd

object

AzureAD for the URL.

  • It requires Prometheus >= v2.45.0.
  • Cannot be set at the same time as authorization, basicAuth, oauth2, or sigv4.

false

basicAuth

object

BasicAuth configuration for the URL.

  • Cannot be set at the same time as sigv4, authorization, oauth2, or azureAd.

false

bearerToken

string

Warning: this field should not be used because the token value appears in clear-text. Prefer using authorization.

Deprecated: this will be removed in a future release.

false

bearerTokenFile

string

File from which to read bearer token for the URL.

Deprecated: this will be removed in a future release. Prefer using authorization.

false

enableHTTP2

boolean

Whether to enable HTTP2.

false

followRedirects

boolean

Configure whether HTTP requests follow HTTP 3xx redirects.

It requires Prometheus >= v2.26.0.

false

headers

map[string]string

Custom HTTP headers to be sent along with each remote write request. Be aware that headers that are set by Prometheus itself can’t be overwritten.

It requires Prometheus >= v2.25.0.

false

messageVersion

enum

The Remote Write message’s version to use when writing to the endpoint.

  • Version1.0 corresponds to the prometheus.WriteRequest protobuf message introduced in Remote Write 1.0.
  • Version2.0 corresponds to the io.prometheus.write.v2.Request protobuf message introduced in Remote Write 2.0.
  • When Version2.0 is selected, Prometheus will automatically be configured to append the metadata of scraped metrics to the WAL.
  • Before setting this field, consult your remote storage provider about which message version it supports.
  • It requires Prometheus >= v2.54.0.

Enum: V1.0, V2.0

false

metadataConfig

object

MetadataConfig configures the sending of series metadata to the remote storage.

false

name

string

The name of the remote write queue. It must be unique if specified. The name is used in metrics and logging in order to differentiate queues.

It requires Prometheus >= v2.15.0.

false

noProxy

string

noProxy is a comma-separated string that can contain IPs, CIDR notation, or domain names that should be excluded from proxying. IP and domain names can contain port numbers.

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

oauth2

object

OAuth2 configuration for the URL.

  • It requires Prometheus >= v2.27.0.
  • Cannot be set at the same time as sigv4, authorization, basicAuth, or azureAd.

false

proxyConnectHeader

map[string][]object

ProxyConnectHeader optionally specifies headers to send to proxies during CONNECT requests.

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

proxyFromEnvironment

boolean

Whether to use the proxy configuration defined by environment variables (HTTP_PROXY, HTTPS_PROXY, and NO_PROXY).

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

proxyUrl

string

proxyURL defines the HTTP proxy server to use.

false

queueConfig

object

QueueConfig allows tuning of the remote write queue parameters.

false

remoteTimeout

string

Timeout for requests to the remote write endpoint.

false

sendExemplars

boolean

Enables sending of exemplars over remote write. Note that exemplar-storage itself must be enabled using the spec.enableFeatures option for exemplars to be scraped in the first place.

It requires Prometheus >= v2.27.0.

false

sendNativeHistograms

boolean

Enables sending of native histograms, also known as sparse histograms over remote write.

It requires Prometheus >= v2.40.0.

false

sigv4

object

Sigv4 allows you to configure AWS Signature Version 4 authentication for the URL.

  • It requires Prometheus >= v2.26.0.
  • Cannot be set at the same time as authorization, basicAuth, oauth2, or azureAd.

false

tlsConfig

object

TLS Config to use for the URL.

false

writeRelabelConfigs

[]object

The list of remote write relabel configurations.

false
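
Pulling a few of these fields together, here is a hedged sketch of one remote write entry; the endpoint URL, queue name, and header value are placeholders, and the authorization, tlsConfig, queueConfig, and writeRelabelConfigs blocks are detailed in the sections that follow:

    remoteWrite:
    - url: https://metrics.example.com/api/v1/write # placeholder endpoint
      name: example-queue # optional queue name, requires Prometheus >= v2.15.0
      remoteTimeout: 30s
      headers:
        X-Scope-OrgID: example-tenant # placeholder header
      sendExemplars: true # exemplar-storage must also be enabled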

Authorization section for the URL.

  • It requires Prometheus >= v2.26.0.
  • Cannot be set at the same time as sigv4, basicAuth, oauth2, or azureAd.
Name | Type | Description | Required

credentials

object

Selects a key of a Secret in the namespace that contains the credentials for authentication.

false

credentialsFile

string

File to read a secret from, mutually exclusive with credentials.

false

type

string

Defines the authentication type. The value is case-insensitive.

"Basic" is not a supported value.

Default: "Bearer"

false

Selects a key of a Secret in the namespace that contains the credentials for authentication.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false
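
For example, a sketch of the authorization block that reads a bearer token from a hypothetical secret named remote-write-token:

    remoteWrite:
    - url: https://metrics.example.com/api/v1/write # placeholder endpoint
      authorization:
        type: Bearer # the default; "Basic" is not a supported value here
        credentials:
          name: remote-write-token # placeholder secret name
          key: token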

AzureAD for the URL.

  • It requires Prometheus >= v2.45.0.
  • Cannot be set at the same time as authorization, basicAuth, oauth2, or sigv4.
Name | Type | Description | Required

cloud

enum

The Azure Cloud. Options are 'AzurePublic', 'AzureChina', or 'AzureGovernment'.

Enum: AzureChina, AzureGovernment, AzurePublic

false

managedIdentity

object

ManagedIdentity defines the Azure User-assigned Managed identity. Cannot be set at the same time as oauth or sdk.

false

oauth

object

OAuth defines the oauth config that is being used to authenticate.

  • Cannot be set at the same time as managedIdentity or sdk.
  • It requires Prometheus >= v2.48.0.

false

sdk

object

SDK defines the Azure SDK config that is being used to authenticate. See Azure SDK Authentication.

  • Cannot be set at the same time as oauth or managedIdentity.
  • It requires Prometheus >= 2.52.0.

false

ManagedIdentity defines the Azure User-assigned Managed identity.

  • Cannot be set at the same time as oauth or sdk.
Name | Type | Description | Required

clientId

string

The client id

true

OAuth defines the oauth config that is being used to authenticate.

  • Cannot be set at the same time as managedIdentity or sdk.
  • It requires Prometheus >= v2.48.0.
Name | Type | Description | Required

clientId

string

clientID is the clientId of the Azure Active Directory application that is being used to authenticate.

true

clientSecret

object

clientSecret specifies a key of a Secret containing the client secret of the Azure Active Directory application that is being used to authenticate.

true

tenantId

string

tenantId is the tenant ID of the Azure Active Directory application that is being used to authenticate.

true

clientSecret specifies a key of a Secret containing the client secret of the Azure Active Directory application that is being used to authenticate.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

SDK defines the Azure SDK config that is being used to authenticate.

For more information, see Azure SDK Authentication.

  • Cannot be set at the same time as oauth or managedIdentity.
  • It requires Prometheus >= 2.52.0.
Name | Type | Description | Required

tenantId

string

tenantId is the tenant ID of the Azure Active Directory application that is being used to authenticate.

false

BasicAuth configuration for the URL.

  • Cannot be set at the same time as sigv4, authorization, oauth2, or azureAd.
Name | Type | Description | Required

password

object

password specifies a key of a Secret containing the password for authentication.

false

username

object

username specifies a key of a Secret containing the username for authentication.

false

password specifies a key of a Secret containing the password for authentication.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

username specifies a key of a Secret containing the username for authentication.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

MetadataConfig configures the sending of series metadata to the remote storage.

Name | Type | Description | Required

send

boolean

Defines whether metric metadata is sent to the remote storage or not.

false

sendInterval

string

Defines how frequently metric metadata is sent to the remote storage.

false

OAuth2 configuration for the URL.

  • It requires Prometheus >= v2.27.0.
  • Cannot be set at the same time as sigv4, authorization, basicAuth, or azureAd.
Name | Type | Description | Required

clientId

object

clientId specifies a key of a Secret or ConfigMap object containing the OAuth2 client’s ID.

true

clientSecret

object

clientSecret specifies a key of a Secret containing the OAuth2 client’s secret.

true

tokenUrl

string

tokenURL configures the URL to fetch the token from.

true

endpointParams

map[string]string

endpointParams configures the HTTP parameters to append to the token URL.

false

noProxy

string

noProxy is a comma-separated string that can contain IPs, CIDR notation, or domain names that should be excluded from proxying. IP and domain names can contain port numbers.

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

proxyConnectHeader

map[string][]object

ProxyConnectHeader optionally specifies headers to send to proxies during CONNECT requests.

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

proxyFromEnvironment

boolean

Whether to use the proxy configuration defined by environment variables (HTTP_PROXY, HTTPS_PROXY, and NO_PROXY).

It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0.

false

proxyUrl

string

proxyURL defines the HTTP proxy server to use.

false

scopes

[]string

scopes defines the OAuth2 scopes used for the token request.

false

tlsConfig

object

TLS configuration to use when connecting to the OAuth2 server.

It requires Prometheus >= v2.43.0.

false

clientId specifies a key of a Secret or ConfigMap containing the OAuth2 client’s ID.

Name | Type | Description | Required

configMap

object

ConfigMap containing data to use for the targets.

false

secret

object

Secret containing data to use for the targets.

false

ConfigMap containing data to use for the targets.

Name | Type | Description | Required

key

string

The key to select.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the ConfigMap or its key must be defined

false

Secret containing data to use for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

clientSecret specifies a key of a Secret containing the OAuth2 client’s secret.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

SecretKeySelector selects a key of a Secret.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

TLS configuration to use when connecting to the OAuth2 server.

  • It requires Prometheus >= v2.43.0.
Name | Type | Description | Required

ca

object

Certificate authority used when verifying server certificates.

false

cert

object

Client certificate to present when doing client-authentication.

false

insecureSkipVerify

boolean

Disable target certificate validation.

false

keySecret

object

Secret containing the client key file for the targets.

false

maxVersion

enum

Maximum acceptable TLS version.

It requires Prometheus >= v2.41.0.

Enum: TLS10, TLS11, TLS12, TLS13

false

minVersion

enum

Minimum acceptable TLS version.

It requires Prometheus >= v2.35.0.

Enum: TLS10, TLS11, TLS12, TLS13

false

serverName

string

Used to verify the hostname for the targets.

false

Certificate authority used when verifying server certificates.

Name | Type | Description | Required

configMap

object

ConfigMap containing data to use for the targets.

false

secret

object

Secret containing data to use for the targets.

false

ConfigMap containing data to use for the targets.

Name | Type | Description | Required

key

string

The key to select.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the ConfigMap or its key must be defined

false

Secret containing data to use for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

Client certificate to present when doing client-authentication.

Name | Type | Description | Required

configMap

object

ConfigMap containing data to use for the targets.

false

secret

object

Secret containing data to use for the targets.

false

ConfigMap containing data to use for the targets.

Name | Type | Description | Required

key

string

The key to select.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the ConfigMap or its key must be defined

false

Secret containing data to use for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

Secret containing the client key file for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

SecretKeySelector selects a key of a Secret.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

QueueConfig allows tuning of the remote write queue parameters.

Name | Type | Description | Required

batchSendDeadline

string

BatchSendDeadline is the maximum time a sample will wait in buffer.

false

capacity

integer

Capacity is the number of samples to buffer per shard before we start dropping them.

false

maxBackoff

string

MaxBackoff is the maximum retry delay.

false

maxRetries

integer

MaxRetries is the maximum number of times to retry a batch on recoverable errors.

false

maxSamplesPerSend

integer

MaxSamplesPerSend is the maximum number of samples per send.

false

maxShards

integer

MaxShards is the maximum number of shards, that is, the amount of concurrency.

false

minBackoff

string

MinBackoff is the initial retry delay. Gets doubled for every retry.

false

minShards

integer

MinShards is the minimum number of shards, that is, the amount of concurrency.

false

retryOnRateLimit

boolean

Retry upon receiving a 429 status code from the remote-write storage.

This is an experimental feature, it may change in any upcoming release in a breaking way.

false

sampleAgeLimit

string

SampleAgeLimit drops samples older than the limit.

It requires Prometheus >= v2.50.0.

false
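
A sketch of queueConfig tuning with illustrative values only; appropriate settings depend on your remote storage and sample volume:

    queueConfig:
      capacity: 10000
      minShards: 1
      maxShards: 50
      maxSamplesPerSend: 2000
      batchSendDeadline: 5s
      minBackoff: 30ms
      maxBackoff: 5s
      retryOnRateLimit: true # experimental: retry on HTTP 429 responses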

Sigv4 allows you to configure AWS Signature Version 4 authentication for the URL.

  • It requires Prometheus >= v2.26.0.
  • Cannot be set at the same time as authorization, basicAuth, oauth2, or azureAd.
Name | Type | Description | Required

accessKey

object

AccessKey is the AWS API key. If not specified, the environment variable AWS_ACCESS_KEY_ID is used.

false

profile

string

Profile is the named AWS profile used to authenticate.

false

region

string

Region is the AWS region. If blank, the region from the default credentials chain is used.

false

roleArn

string

RoleArn is the AWS role ARN to use for authentication.

false

secretKey

object

SecretKey is the AWS API secret. If not specified, the environment variable AWS_SECRET_ACCESS_KEY is used.

false

AccessKey is the AWS API key. If not specified, the environment variable AWS_ACCESS_KEY_ID is used.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

SecretKey is the AWS API secret. If not specified, the environment variable AWS_SECRET_ACCESS_KEY is used.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

TLS Config to use for the URL.

Name | Type | Description | Required

ca

object

Certificate authority used when verifying server certificates.

false

caFile

string

Path to the CA cert in the Prometheus container to use for the targets.

false

cert

object

Client certificate to present when doing client-authentication.

false

certFile

string

Path to the client cert file in the Prometheus container for the targets.

false

insecureSkipVerify

boolean

Disable target certificate validation.

false

keyFile

string

Path to the client key file in the Prometheus container for the targets.

false

keySecret

object

Secret containing the client key file for the targets.

false

maxVersion

enum

Maximum acceptable TLS version.

It requires Prometheus >= v2.41.0.

Enum: TLS10, TLS11, TLS12, TLS13

false

minVersion

enum

Minimum acceptable TLS version.

It requires Prometheus >= v2.35.0.

Enum: TLS10, TLS11, TLS12, TLS13

false

serverName

string

Used to verify the hostname for the targets.

false
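
A sketch of a tlsConfig block for the remote write URL, assuming the CA, client certificate, and client key are stored in a hypothetical secret named remote-write-tls:

    tlsConfig:
      serverName: metrics.example.com # placeholder hostname to verify
      ca:
        secret:
          name: remote-write-tls # placeholder secret name
          key: ca.crt
      cert:
        secret:
          name: remote-write-tls
          key: tls.crt
      keySecret:
        name: remote-write-tls
        key: tls.key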

Certificate authority used when verifying server certificates.

Name | Type | Description | Required

configMap

object

ConfigMap containing data to use for the targets.

false

secret

object

Secret containing data to use for the targets.

false

ConfigMap containing data to use for the targets.

Name | Type | Description | Required

key

string

The key to select.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the ConfigMap or its key must be defined

false

Secret containing data to use for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

Client certificate to present when doing client-authentication.

Name | Type | Description | Required

configMap

object

ConfigMap containing data to use for the targets.

false

secret

object

Secret containing data to use for the targets.

false

ConfigMap containing data to use for the targets.

Name | Type | Description | Required

key

string

The key to select.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the ConfigMap or its key must be defined

false

Secret containing data to use for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

Secret containing the client key file for the targets.

Name | Type | Description | Required

key

string

The key of the secret to select from. Must be a valid secret key.

true

name

string

Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong.

For more information, see Kubernetes Docs

false

optional

boolean

Specify whether the Secret or its key must be defined

false

RelabelConfig allows dynamic rewriting of the label set for targets, alerts, scraped samples and remote write samples.

For more information, see Prometheus Docs

Name | Type | Description | Required

action

enum

Action to perform based on the regex matching.

  • Uppercase and Lowercase actions require Prometheus >= v2.36.0.
  • DropEqual and KeepEqual actions require Prometheus >= v2.41.0.

Enum: replace, Replace, keep, Keep, drop, Drop, hashmod, HashMod, labelmap, LabelMap, labeldrop, LabelDrop, labelkeep, LabelKeep, lowercase, Lowercase, uppercase, Uppercase, keepequal, KeepEqual, dropequal, DropEqual

Default: replace

false

modulus

integer

Modulus to take of the hash of the source label values.

Only applicable when the action is HashMod.

Format: int64

false

regex

string

Regular expression against which the extracted value is matched.

false

replacement

string

Replacement value against which a Replace action is performed if the regular expression matches.

Regex capture groups are available.

false

separator

string

Separator is the string between concatenated SourceLabels.

false

sourceLabels

[]string

The source labels select values from existing labels. Their content is concatenated using the configured Separator and matched against the configured regular expression.

false

targetLabel

string

Label to which the resulting string is written in a replacement.

It is mandatory for Replace, HashMod, Lowercase, Uppercase, KeepEqual and DropEqual actions.

Regex capture groups are available.

false
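
As an illustration of these fields, the following relabeling entry copies the value of an existing label into a new target label. This is a minimal sketch: the label names are hypothetical, and where the relabel configuration is attached (for example, under a remote-write or scrape configuration) depends on the parent field, which is not shown here.

# Minimal relabeling sketch; label names are hypothetical.
- action: replace                 # default action
  sourceLabels: [__meta_kubernetes_pod_label_app]
  separator: ";"                  # joins sourceLabels before matching
  regex: "(.+)"                   # capture the whole concatenated value
  targetLabel: app                # mandatory for the Replace action
  replacement: "$1"               # regex capture groups are available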

Configure TLS options for the Prometheus web server.

Name | Type | Description | Required
certificate | object | Reference to the TLS public certificate for the web server. | true
certificateAuthority | object | Reference to the root Certificate Authority used to verify the web server’s certificate. | true
privateKey | object | Reference to the TLS private key for the web server. | true

Reference to the TLS public certificate for the web server.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true

Reference to the root Certificate Authority used to verify the web server’s certificate.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true

Reference to the TLS private key for the web server.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true
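
Putting the three references together, a web TLS configuration might look like the following sketch. The field is assumed to be named webTLSConfig, matching the ThanosQuerier spec later in this section, and the Secret names and keys are hypothetical; each reference must point to a Secret in the object’s namespace, as described in the tables above.

# Sketch of a web TLS configuration; Secret names and keys are hypothetical.
webTLSConfig:
  certificate:
    name: prometheus-web-tls     # Secret in the object's namespace
    key: tls.crt
  privateKey:
    name: prometheus-web-tls
    key: tls.key
  certificateAuthority:
    name: prometheus-web-ca
    key: ca.crt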

6.61. MonitoringStack.spec.resourceSelector

Label selector for MonitoringStack resources.

  • To monitor everything, set the selector to the empty map, for example resourceSelector: {}.
  • To disable service discovery, set the selector to null, for example resourceSelector:. A sketch of a label-based selector follows the table below.
Name | Type | Description | Required
matchExpressions | []object | matchExpressions is a list of label selector requirements. The requirements are ANDed. | false
matchLabels | map[string]string | matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. | false
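
As a concrete sketch, the selector below restricts discovery to resources that carry a hypothetical app: demo label; the empty map and null forms from the list above behave as described there.

# Sketch: discover only resources labeled app: demo (hypothetical label).
resourceSelector:
  matchLabels:
    app: demo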

6.62. MonitoringStack.spec.resourceSelector.matchExpressions[index]

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

Name | Type | Description | Required
key | string | key is the label key that the selector applies to. | true
operator | string | operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. | true
values | []string | values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. | false
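
The same selection can also be expressed with matchExpressions, for example to match several label values at once. The label key and values below are hypothetical.

# Sketch: select resources whose app label is either demo or demo-canary.
resourceSelector:
  matchExpressions:
    - key: app
      operator: In
      values:
        - demo
        - demo-canary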

6.63. MonitoringStack.spec.resources

Define resources requests and limits for MonitoringStack pods.

Name | Type | Description | Required
claims | []object | Claims lists the names of resources, defined in spec.resourceClaims, that are used by this container. This is an alpha field and requires enabling the DynamicResourceAllocation feature gate. This field is immutable. It can only be set for containers. | false
limits | map[string]int or string | Limits describes the maximum amount of compute resources allowed. For more information, see Kubernetes Docs. | false
requests | map[string]int or string | Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. For more information, see Kubernetes Docs. | false
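
For example, requests and limits for the MonitoringStack pods can be set as in the following sketch; the values are illustrative only.

# Illustrative resource requests and limits for MonitoringStack pods.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi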

6.64. MonitoringStack.spec.resources.claims[index]

ResourceClaim references one entry in PodSpec.ResourceClaims.

Name | Type | Description | Required
name | string | Name must match the name of one entry in pod.spec.resourceClaims of the Pod where this field is used. It makes that resource available inside a container. | true
request | string | Request is the name chosen for a request in the referenced claim. If empty, everything from the claim is made available, otherwise only the result of this request. | false

6.65. MonitoringStack.spec.tolerations[index]

The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.

Name | Type | Description | Required
effect | string | Effect indicates the taint effect to match. Empty means match all taint effects. When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute. | false
key | string | Key is the taint key that the toleration applies to. Empty means match all taint keys. If the key is empty, operator must be Exists; this combination means to match all values and all keys. | false
operator | string | Operator represents a key’s relationship to the value. Valid operators are Exists and Equal. Defaults to Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category. | false
tolerationSeconds | integer | TolerationSeconds represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values will be treated as 0 (evict immediately) by the system. Format: int64 | false
value | string | Value is the taint value the toleration matches to. If the operator is Exists, the value should be empty, otherwise just a regular string. | false
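
For example, the following sketch lets MonitoringStack pods tolerate a hypothetical infra taint so that they can be scheduled onto nodes carrying it.

# Sketch: tolerate a hypothetical node-role.kubernetes.io/infra taint.
tolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists            # matches any value for this key
    effect: NoSchedule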

6.66. MonitoringStack.status

MonitoringStackStatus defines the observed state of the MonitoringStack instance. It should always be reconstructable from the state of the cluster and/or outside world.

Name | Type | Description | Required
conditions | []object | Conditions provide status information about the MonitoringStack instance. | true

6.67. MonitoringStack.status.conditions[index]

Name | Type | Description | Required
lastTransitionTime | string | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. Format: date-time | true
message | string | message is a human readable message indicating details about the transition. This may be an empty string. | true
reason | string | reason contains a programmatic identifier indicating the reason for the condition’s last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. | true
status | enum | status of the condition. Enum: True, False, Unknown, Degraded | true
type | string | type of condition in CamelCase or in foo.example.com/CamelCase. The regex it matches is (dns1123SubdomainFmt/)?(qualifiedNameFmt) | true
observedGeneration | integer | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. Format: int64. Minimum: 0 | false

6.68. ThanosQuerier

ThanosQuerier outlines the Thanos querier components managed by this stack.

Name | Type | Description | Required
apiVersion | string | monitoring.rhobs/v1alpha1 | true
kind | string | ThanosQuerier | true
metadata | object | Refer to the Kubernetes API documentation for the fields of the metadata field. | true
spec | object | ThanosQuerierSpec defines a single Thanos Querier instance. It specifies a label selector that selects the MonitoringStack instances to query, an optional namespace selector, and a list of replica labels by which to deduplicate. | false
status | object | ThanosQuerierStatus defines the observed state of ThanosQuerier. It should always be reconstructable from the state of the cluster and/or outside world. | false

6.69. ThanosQuerier.spec

ThanosQuerierSpec defines a single Thanos Querier instance. It specifies a label selector that selects the MonitoringStack instances to query, an optional namespace selector, and a list of replica labels by which to deduplicate.

Name | Type | Description | Required
selector | object | Selector to select monitoring stacks to unify. | true
namespaceSelector | object | Selector to select which namespaces the MonitoringStack objects are discovered from. | false
replicaLabels | []string | Labels to treat as replica identifiers when deduplicating results from the selected monitoring stacks. | false
webTLSConfig | object | Configure TLS options for the Thanos web server. | false
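
Combining these fields, a minimal ThanosQuerier manifest might look like the following sketch. The metadata, namespace, label values, and replica label are hypothetical; only the field names come from the tables in this section.

apiVersion: monitoring.rhobs/v1alpha1
kind: ThanosQuerier
metadata:
  name: example-querier          # hypothetical name
  namespace: coo-demo            # hypothetical namespace
spec:
  selector:                      # unify MonitoringStacks that carry this label
    matchLabels:
      coo: example
  namespaceSelector:
    matchNames:
      - coo-demo                 # discover MonitoringStacks only in this namespace
  replicaLabels:
    - prometheus_replica         # hypothetical replica label to deduplicate on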

6.70. ThanosQuerier.spec.selector

Selector to select monitoring stacks to unify.

Name | Type | Description | Required
matchExpressions | []object | matchExpressions is a list of label selector requirements. The requirements are ANDed. | false
matchLabels | map[string]string | matchLabels is a map of {key,value} pairs. A single {key,value} in the matchLabels map is equivalent to an element of matchExpressions, whose key field is "key", the operator is "In", and the values array contains only "value". The requirements are ANDed. | false

6.71. ThanosQuerier.spec.selector.matchExpressions[index]

A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

Name | Type | Description | Required
key | string | key is the label key that the selector applies to. | true
operator | string | operator represents a key’s relationship to a set of values. Valid operators are In, NotIn, Exists and DoesNotExist. | true
values | []string | values is an array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. This array is replaced during a strategic merge patch. | false

6.72. ThanosQuerier.spec.namespaceSelector

Selector to select which namespaces the MonitoringStack objects are discovered from.

Name | Type | Description | Required
any | boolean | Boolean describing whether all namespaces are selected in contrast to a list restricting them. | false
matchNames | []string | List of namespace names. | false
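
The two fields are alternatives: list specific namespaces with matchNames, as in the manifest sketch above, or search every namespace as in this minimal sketch.

# Sketch: discover MonitoringStacks in all namespaces.
namespaceSelector:
  any: true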

6.73. ThanosQuerier.spec.webTLSConfig

Configure TLS options for the Thanos web server.

Name | Type | Description | Required
certificate | object | Reference to the TLS public certificate for the web server. | true
certificateAuthority | object | Reference to the root Certificate Authority used to verify the web server’s certificate. | true
privateKey | object | Reference to the TLS private key for the web server. | true

6.74. ThanosQuerier.spec.webTLSConfig.certificate

Reference to the TLS public certificate for the web server.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true

6.75. ThanosQuerier.spec.webTLSConfig.certificateAuthority

Reference to the root Certificate Authority used to verify the web server’s certificate.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true

6.76. ThanosQuerier.spec.webTLSConfig.privateKey

Reference to the TLS private key for the web server.

Name | Type | Description | Required
key | string | The key of the secret to select from. Must be a valid secret key. | true
name | string | The name of the secret in the object’s namespace to select from. | true

Legal Notice

Copyright © 2025 Red Hat

OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

Modified versions must remove all Red Hat trademarks.

Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.

Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.
