Cluster Observability Operator
Configuring and using the Cluster Observability Operator in OpenShift Container Platform
Abstract
Chapter 1. Cluster Observability Operator release notes
The Cluster Observability Operator (COO) is an optional OpenShift Container Platform Operator that enables administrators to create standalone monitoring stacks that are independently configurable for use by different services and users.
The COO complements the built-in monitoring capabilities of OpenShift Container Platform. You can deploy it in parallel with the default platform and user workload monitoring stacks managed by the Cluster Monitoring Operator (CMO).
These release notes track the development of the Cluster Observability Operator in OpenShift Container Platform.
The following table provides information about which features are available depending on the version of Cluster Observability Operator and OpenShift Container Platform:
COO Version | OCP Versions | Distributed tracing | Logging | Troubleshooting panel | ACM alerts | Incident detection |
---|---|---|---|---|---|---|
1.1+ | 4.12 - 4.14 | ✔ | ✔ | ✘ | ✘ | ✘ |
1.1+ | 4.15 | ✔ | ✔ | ✘ | ✔ | ✘ |
1.1+ | 4.16 - 4.18 | ✔ | ✔ | ✔ | ✔ | ✘ |
1.2+ | 4.19+ | ✔ | ✔ | ✔ | ✔ | ✔ |
1.1. Cluster Observability Operator 1.2.2
The following advisory is available for Cluster Observability Operator 1.2.2:
1.1.1. Bug fixes
- Before this update, the installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer ServiceMonitor resource is in a failed state, with the error message tls: failed to verify certificate: x509. With this release, the incident detection feature installs correctly. (COO-1062)

  If you are upgrading from 1.2.1 where the bug was occurring, you must recreate the monitoring UI plugin to resolve the issue.
1.1.2. Known issues
These are the known issues in Cluster Observability Operator 1.2.2:
- When installing version 1.2.2 or when upgrading from version 1.2.0, the monitoring plugin’s UIPlugin resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advanced Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (COO-1051)

  If you have previously resolved the issue in 1.2.1 and then upgrade to 1.2.2, this issue will not reoccur.
1.2. Cluster Observability Operator 1.2.1
The following advisory is available for Cluster Observability Operator 1.2.1:
1.2.1. Bug fixes
- Before this update, an old version label matcher was retained during the Operator version 1.2 upgrade. This caused Perses dashboards to become unavailable. With this release, the version label is removed and Perses dashboards are correctly reconciled.
1.2.2. Known issues
These are the known issues in Cluster Observability Operator 1.2.1:
- The installation of the incident detection feature could fail intermittently. The symptoms include the incident detection UI being visible but not including any data. In addition, the health-analyzer ServiceMonitor resource is in a failed state, with the error message tls: failed to verify certificate: x509. You can resolve this issue by upgrading to 1.2.2 and recreating the monitoring UI plugin. (COO-1062)
- When installing version 1.2.1 or when upgrading from version 1.2.0, the monitoring plugin’s UIPlugin resource can be corrupted. This occurs when you have also deployed distributed tracing, the troubleshooting panel, and Advanced Cluster Management (ACM), together with the monitoring UI plugin. You can resolve this issue by recreating the UI plugin. (COO-1051)
1.3. Cluster Observability Operator 1.2
The following advisory is available for Cluster Observability Operator 1.2:
1.3.1. New features and enhancements
- The logging UI plugin now supports the OTEL format, in addition to the previously supported ViaQ scheme. (COO-816)
- Accelerators Perses dashboards are deployed by default when you install the monitoring UI plugin. (COO-942)
- Multiple results per graph node are now displayed for Korrel8r. (COO-785)
- Direct navigation to individual incident detail is now supported in the incident detection panel, and this enables the incidents overview functionality in Red Hat Advanced Cluster Management (RHACM) 2.14. (COO-977, ACM-18751)
- Advanced filters have been added to the tracing view. (COO-979)
- The status of the distributed tracing UI plugin is now General Availability (GA), supporting PatternFly 4, 5, and 6. (COO-873)
1.3.2. Bug fixes
- Previously, LokiStack was a prerequisite for installing the logging UI plugin. With this release, you can install the logging UI plugin without LokiStack. (COO-760)
- Previously, the Silence Alert button in the Incidents → Component section did not pre-populate the fields and was not usable. This release resolves the issue. (COO-970)
1.3.3. Known issues
These are the known issues in Cluster Observability Operator 1.2.0:
- When upgrading from COO 1.1.1 to COO 1.2, the Perses dashboard is not correctly reconciled, and this requires the monitoring UI plugin to be reinstalled. (COO-978)
1.4. Cluster Observability Operator 1.1.1
1.4.1. Bug fixes
- Previously, observability-operator and perses-operator pods on many clusters entered a CrashLoopBackOff state due to OutOfMemory errors, after upgrading from Cluster Observability Operator 1.0. This release resolves the issue. (COO-784)
1.5. Cluster Observability Operator 1.1
The following advisory is available for Cluster Observability Operator 1.1:
1.5.1. New features and enhancements
1.5.2. Known issues
These are the known issues in Cluster Observability Operator 1.1.0:
- observability-operator and perses-operator pods enter a CrashLoopBackOff state due to OutOfMemory errors, after upgrading from Cluster Observability Operator 1.0.

  A workaround is provided in the knowledge base article ClusterObservability and perses operator pod in CrashLoopBackOff due to OOMKilled in RHOCP4.

  This issue is being tracked in COO-784.
1.5.3. Bug fixes
- Previously, the logging UI plugin did not support setting a custom LokiStack name or namespace. This release resolves the issue. (COO-332)
1.6. Cluster Observability Operator 1.0
1.6.1. New features and enhancements
- COO is now enabled for OpenShift Container Platform platform monitoring. (COO-476)
- Implements HTTPS support for COO web server. (COO-480)
- Implements authn/authz for COO web server. (COO-481)
- Configures ServiceMonitor resource to collect metrics from COO. (COO-482)
- Adds operatorframework.io/cluster-monitoring=true annotation to the OLM bundle. (COO-483)
- Defines the alerting strategy for COO. (COO-484)
- Configures PrometheusRule for alerting. (COO-485)
- Support level annotations have been added to the UIPlugin CR when created. The support level is based on the plugin type, with values of DevPreview, TechPreview, or GeneralAvailability. (COO-318)
- You can now configure the Alertmanager scheme and tlsConfig fields in the Prometheus CR. (COO-219)
- The extended Technical Preview for the troubleshooting panel adds support for correlating traces with Kubernetes resources and directly with other observable signals including logs, alerts, metrics, and network events. (COO-450)
- You can select a Tempo instance and tenant when you navigate to the tracing page by clicking Observe → Tracing in the web console. The preview troubleshooting panel only works with the openshift-tracing / platform instance and the platform tenant.
- The troubleshooting panel works best in the Administrator perspective. It has limited functionality in the Developer perspective due to authorization issues with some back ends, most notably Prometheus for metrics and alerts. This will be addressed in a future release.
The following table provides information about which features are available depending on the version of Cluster Observability Operator and OpenShift Container Platform:
COO Version | OCP Versions | Distributed Tracing | Logging | Troubleshooting Panel |
---|---|---|---|---|
1.0 | 4.12 - 4.15 | ✔ | ✔ | ✘ |
1.0 | 4.16+ | ✔ | ✔ | ✔ |
1.6.2. CVEs
1.6.3. Bug fixes
- Previously, the default namespace for the COO installation was openshift-operators. With this release, the default namespace changes to openshift-cluster-observability-operator. (COO-32)
- Previously, korrel8r was only able to parse time series selector expressions. With this release, korrel8r can parse any valid PromQL expression to extract the time series selectors that it uses for correlation. (COO-558)
- Previously, when viewing a Tempo instance from the Distributed Tracing UI plugin, the scatter plot graph showing the traces duration was not rendered correctly. The bubble size was too large and overlapped the x and y axis. With this release, the graph is rendered correctly. (COO-319)
1.7. Features available on older, Technology Preview releases
The following table provides information about which features are available in older versions of Cluster Observability Operator and OpenShift Container Platform:
COO Version | OCP Versions | Dashboards | Distributed Tracing | Logging | Troubleshooting Panel |
---|---|---|---|---|---|
0.2.0 | 4.11 | ✔ | ✘ | ✘ | ✘ |
0.3.0+, 0.4.0+ | 4.11 - 4.15 | ✔ | ✔ | ✔ | ✘ |
0.3.0+, 0.4.0+ | 4.16+ | ✔ | ✔ | ✔ | ✔ |
1.8. Cluster Observability Operator 0.4.1
The following advisory is available for Cluster Observability Operator 0.4.1:
1.8.1. New features and enhancements
- You can now configure WebTLS for Prometheus and Alertmanager.
1.8.2. CVEs
1.8.3. Bug fixes
- Previously, when you deleted the dashboard UI plugin, the consoles.operator.openshift.io resource still contained console-dashboards-plugin. This release resolves the issue. (COO-152)
- Previously, the web console did not display the correct icon for Red Hat COO. This release resolves the issue. (COO-353)
- Previously, when you installed the COO from the web console, the support section contained an invalid link. This release resolves the issue. (COO-354)
- Previously, the cluster service version (CSV) for COO linked to an unofficial version of the documentation. This release resolves the issue. (COO-356)
1.9. Cluster Observability Operator 0.4.0
The following advisory is available for Cluster Observability Operator 0.4.0:
1.9.1. New features and enhancements
1.9.1.1. Troubleshooting UI plugin
- The troubleshooting UI panel has been improved so you can now select and focus on a specific starting signal.
- There is more visibility into Korrel8r queries, with the option of selecting the depth.
- Users of OpenShift Container Platform version 4.17+ can access the troubleshooting UI panel from the Application Launcher. Alternatively, on versions 4.16+, you can access it in the web console by clicking on Observe → Alerting.
For more information, see troubleshooting UI plugin.
1.9.1.2. Distributed tracing UI plugin
- The distributed tracing UI plugin has been enhanced, with a Gantt chart now available for exploring traces.
For more information, see distributed tracing UI plugin.
1.9.2. Bug fixes
- Previously, metrics were not available to normal users when accessed in the Developer perspective of the web console, by clicking on Observe → Logs. This release resolves the issue. (COO-288)
- Previously, the troubleshooting UI plugin used the wrong filter for network observability. This release resolves the issue. (COO-299)
- Previously, the troubleshooting UI plugin generated an incorrect URL for pod label searches. This release resolves the issue. (COO-298)
- Previously, there was an authorization vulnerability in the Distributed tracing UI plugin. This release resolves the issue and the Distributed tracing UI plugin has been hardened by using only multi-tenant TempoStack and TempoMonolithic instances going forward.
1.10. Cluster Observability Operator 0.3.2
The following advisory is available for Cluster Observability Operator 0.3.2:
1.10.1. New features and enhancements
- With this release, you can now use tolerations and node selectors with MonitoringStack components.
1.10.2. Bug fixes
- Previously, the logging UIPlugin was not in the Available state and the logging pod was not created, when installed on a specific version of OpenShift Container Platform. This release resolves the issue. (COO-260)
1.11. Cluster Observability Operator 0.3.0
The following advisory is available for Cluster Observability Operator 0.3.0:
1.11.1. New features and enhancements
- With this release, the Cluster Observability Operator adds backend support for future OpenShift Container Platform observability web console UI plugins and observability components.
1.12. Cluster Observability Operator 0.2.0
The following advisory is available for Cluster Observability Operator 0.2.0:
1.12.1. New features and enhancements
- With this release, the Cluster Observability Operator supports installing and managing observability-related plugins for the OpenShift Container Platform web console user interface (UI). (COO-58)
1.13. Cluster Observability Operator 0.1.3
The following advisory is available for Cluster Observability Operator 0.1.3:
1.13.1. Bug fixes
- Previously, if you tried to access the Prometheus web user interface (UI) at http://<prometheus_url>:9090/graph, the following error message would display: Error opening React index.html: open web/ui/static/react/index.html: no such file or directory. This release resolves the issue, and the Prometheus web UI now displays correctly. (COO-34)
1.14. Cluster Observability Operator 0.1.2
The following advisory is available for Cluster Observability Operator 0.1.2:
1.14.1. CVEs
1.14.2. Bug fixes
- Previously, certain cluster service version (CSV) annotations were not included in the metadata for COO. Because of these missing annotations, certain COO features and capabilities did not appear in the package manifest or in the OperatorHub user interface. This release adds the missing annotations, thereby resolving this issue. (COO-11)
- Previously, automatic updates of the COO did not work, and a newer version of the Operator did not automatically replace the older version, even though the newer version was available in OperatorHub. This release resolves the issue. (COO-12)
- Previously, Thanos Querier only listened for network traffic on port 9090 of 127.0.0.1 (localhost), which resulted in a 502 Bad Gateway error if you tried to reach the Thanos Querier service. With this release, the Thanos Querier configuration has been updated so that the component now listens on the default port (10902), thereby resolving the issue. As a result of this change, you can also now modify the port by using server-side apply (SSA) and add a proxy chain, if required. (COO-14)
1.15. Cluster Observability Operator 0.1.1
The following advisory is available for Cluster Observability Operator 0.1.1:
1.15.1. New features and enhancements
This release updates the Cluster Observability Operator to support installing the Operator in restricted networks or disconnected environments.
1.16. Cluster Observability Operator 0.1
This release makes a Technology Preview version of the Cluster Observability Operator available on OperatorHub.
Chapter 2. Cluster Observability Operator overview
The Cluster Observability Operator (COO) is an optional component of the OpenShift Container Platform designed for creating and managing highly customizable monitoring stacks. It enables cluster administrators to automate configuration and management of monitoring needs extensively, offering a more tailored and detailed view of each namespace compared to the default OpenShift Container Platform monitoring system.
The COO deploys the following monitoring components:
- Prometheus - A highly available Prometheus instance capable of sending metrics to an external endpoint by using remote write.
- Thanos Querier (optional) - Enables querying of Prometheus instances from a central location.
- Alertmanager (optional) - Provides alert configuration capabilities for different services.
- UI plugins (optional) - Enhances the observability capabilities with plugins for monitoring, logging, distributed tracing and troubleshooting.
- Korrel8r (optional) - Provides observability signal correlation, powered by the open source Korrel8r project.
2.1. COO compared to default monitoring stack
The COO components function independently of the default in-cluster monitoring stack, which is deployed and managed by the Cluster Monitoring Operator (CMO). Monitoring stacks deployed by the two Operators do not conflict. You can use a COO monitoring stack in addition to the default platform monitoring components deployed by the CMO.
The key differences between COO and the default in-cluster monitoring stack are shown in the following table:
Feature | COO | Default monitoring stack |
---|---|---|
Scope and integration | Offers comprehensive monitoring and analytics for enterprise-level needs, covering cluster and workload performance. However, it lacks direct integration with OpenShift Container Platform and typically requires an external Grafana instance for dashboards. | Limited to core components within the cluster, for example, API server and etcd, and to OpenShift-specific namespaces. There is deep integration into OpenShift Container Platform including console dashboards and alert management in the console. |
Configuration and customization | Broader configuration options including data retention periods, storage methods, and collected data types. The COO can delegate ownership of single configurable fields in custom resources to users by using Server-Side Apply (SSA), which enhances customization. | Built-in configurations with limited customization options. |
Data retention and storage | Long-term data retention, supporting historical analysis and capacity planning | Shorter data retention times, focusing on short-term monitoring and real-time detection. |
2.2. Key advantages of using COO
Deploying COO helps you address monitoring requirements that are hard to achieve using the default monitoring stack.
2.2.1. Extensibility
- You can add more metrics to a COO-deployed monitoring stack, which is not possible with core platform monitoring without losing support.
- You can receive cluster-specific metrics from core platform monitoring through federation.
- COO supports advanced monitoring scenarios like trend forecasting and anomaly detection.
2.2.2. Multi-tenancy support
- You can create monitoring stacks per user namespace.
- You can deploy multiple stacks per namespace or a single stack for multiple namespaces.
- COO enables independent configuration of alerts and receivers for different teams.
2.2.3. Scalability
- Supports multiple monitoring stacks on a single cluster.
- Enables monitoring of large clusters through manual sharding.
- Addresses cases where metrics exceed the capabilities of a single Prometheus instance.
2.2.4. Flexibility
- Decoupled from OpenShift Container Platform release cycles.
- Faster release iterations and rapid response to changing requirements.
- Independent management of alerting rules.
2.3. Target users for COO
COO is ideal for users who need high customizability, scalability, and long-term data retention, especially in complex, multi-tenant enterprise environments.
2.3.1. Enterprise-level users and administrators
Enterprise users require in-depth monitoring capabilities for OpenShift Container Platform clusters, including advanced performance analysis, long-term data retention, trend forecasting, and historical analysis. These features help enterprises better understand resource usage, prevent performance issues, and optimize resource allocation.
2.3.2. Operations teams in multi-tenant environments
With multi-tenancy support, COO allows different teams to configure monitoring views for their projects and applications, making it suitable for teams with flexible monitoring needs.
2.3.3. Development and operations teams
COO provides fine-grained monitoring and customizable observability views for in-depth troubleshooting, anomaly detection, and performance tuning during development and operations.
2.4. Using Server-Side Apply to customize Prometheus resources
Server-Side Apply is a feature that enables collaborative management of Kubernetes resources. The control plane tracks how different users and controllers manage fields within a Kubernetes object. It introduces the concept of field managers and tracks ownership of fields. This centralized control provides conflict detection and resolution, and reduces the risk of unintended overwrites.
Compared to Client-Side Apply, it is more declarative, and tracks field management instead of last applied state.
- Server-Side Apply
- Declarative configuration management by updating a resource’s state without needing to delete and recreate it.
- Field management
- Users can specify which fields of a resource they want to update, without affecting the other fields.
- Managed fields
- Kubernetes stores metadata about who manages each field of an object in the managedFields field within metadata.
- Conflicts
- If multiple managers try to modify the same field, a conflict occurs. The applier can choose to overwrite, relinquish control, or share management.
- Merge strategy
- Server-Side Apply merges fields based on the actor who manages them.
Procedure
- Add a MonitoringStack resource using the following configuration. A Prometheus resource named sample-monitoring-stack is generated in the coo-demo namespace.

  Example MonitoringStack object
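  The manifest is not reproduced in full here; a minimal sketch of a MonitoringStack object with this name and namespace might look like the following. The retention period and resource selector labels are illustrative assumptions:

  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: sample-monitoring-stack
    namespace: coo-demo
    labels:
      coo: example
  spec:
    logLevel: debug        # field managed by the MonitoringStack resource
    retention: 1d          # illustrative retention period
    resourceSelector:      # illustrative label selector for ServiceMonitor/PodMonitor objects
      matchLabels:
        app: demo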
- Retrieve the managed fields of the generated Prometheus resource by running the following command:

  $ oc -n coo-demo get Prometheus.monitoring.rhobs -oyaml --show-managed-fields

  Example output
- Check the metadata.managedFields values, and observe that some fields in metadata and spec are managed by the MonitoringStack resource.
- Modify a field that is not controlled by the MonitoringStack resource. Change spec.enforcedSampleLimit, which is a field not set by the MonitoringStack resource. Create the file prom-spec-edited.yaml:

  prom-spec-edited.yaml
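  A sketch of this file, assuming an illustrative limit of 1000 samples; only the name, namespace, and the field being taken over need to be present:

  apiVersion: monitoring.rhobs/v1
  kind: Prometheus
  metadata:
    name: sample-monitoring-stack
    namespace: coo-demo
  spec:
    enforcedSampleLimit: 1000   # field not set by the MonitoringStack resource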
- Apply the YAML by running the following command:

  $ oc apply -f ./prom-spec-edited.yaml --server-side

  Note: You must use the --server-side flag.

- Get the changed Prometheus object, and note that there is one more section in managedFields which has spec.enforcedSampleLimit:
$ oc get prometheus -n coo-demo
  Example output
- Modify a field that is managed by the MonitoringStack resource. Change spec.logLevel, which is a field managed by the MonitoringStack resource, using the following YAML configuration, in which spec.logLevel has been added:
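  For example, the file might now look like the following sketch (values are illustrative):

  apiVersion: monitoring.rhobs/v1
  kind: Prometheus
  metadata:
    name: sample-monitoring-stack
    namespace: coo-demo
  spec:
    enforcedSampleLimit: 1000   # previously applied field
    logLevel: info              # spec.logLevel has been added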
- Apply the YAML by running the following command:

  $ oc apply -f ./prom-spec-edited.yaml --server-side

  Example output
- Notice that the field spec.logLevel cannot be changed using Server-Side Apply, because it is already managed by observability-operator. Use the --force-conflicts flag to force the change:

  $ oc apply -f ./prom-spec-edited.yaml --server-side --force-conflicts

  Example output

  prometheus.monitoring.rhobs/sample-monitoring-stack serverside-applied

  With the --force-conflicts flag, the field can be forced to change, but since the same field is also managed by the MonitoringStack resource, the Observability Operator detects the change, and reverts it back to the value set by the MonitoringStack resource.

  Note: Some Prometheus fields generated by the MonitoringStack resource are influenced by the fields in the MonitoringStack spec stanza, for example, logLevel. These can be changed by changing the MonitoringStack spec.

- To change the logLevel in the Prometheus object, apply the following YAML to change the MonitoringStack resource:
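  A sketch of such a change, assuming the sample-monitoring-stack object created earlier:

  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: sample-monitoring-stack
    namespace: coo-demo
  spec:
    logLevel: info   # propagated by the Operator to the generated Prometheus object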
  To confirm that the change has taken place, query for the log level by running the following command:

  $ oc -n coo-demo get Prometheus.monitoring.rhobs -o=jsonpath='{.items[0].spec.logLevel}'

  Example output

  info
If a new version of an Operator generates a field that was previously generated and controlled by an actor, the value set by the actor will be overridden.
For example, suppose you are managing a field enforcedSampleLimit which is not generated by the MonitoringStack resource. If the Observability Operator is upgraded, and the new version of the Operator generates a value for enforcedSampleLimit, this will override the value you have previously set.
- The Prometheus object generated by the MonitoringStack resource may contain some fields which are not explicitly set by the monitoring stack. These fields appear because they have default values.
Chapter 3. Installing the Cluster Observability Operator
As a cluster administrator, you can install or remove the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console. OperatorHub is a user interface that works in conjunction with Operator Lifecycle Manager (OLM), which installs and manages Operators on a cluster.
3.1. Installing the Cluster Observability Operator in the web console
Install the Cluster Observability Operator (COO) from OperatorHub by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Type cluster observability operator in the Filter by keyword box.
- Click Cluster Observability Operator in the list of results.
Read the information about the Operator, and configure the following installation settings:
- Update channel → stable
- Version → 1.0.0 or later
- Installation mode → All namespaces on the cluster (default)
- Installed Namespace → Operator recommended Namespace: openshift-cluster-observability-operator
- Select Enable Operator recommended cluster monitoring on this Namespace
- Update approval → Automatic
- Optional: You can change the installation settings to suit your requirements. For example, you can select to subscribe to a different update channel, to install an older released version of the Operator, or to require manual approval for updates to new versions of the Operator.
- Click Install.
Verification
- Go to Operators → Installed Operators, and verify that the Cluster Observability Operator entry appears in the list.
Additional resources
3.2. Uninstalling the Cluster Observability Operator using the web console
If you have installed the Cluster Observability Operator (COO) by using OperatorHub, you can uninstall it in the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
Procedure
- Go to Operators → Installed Operators.
- Locate the Cluster Observability Operator entry in the list.
- Click the Options menu for this entry and select Uninstall Operator.
Verification
- Go to Operators → Installed Operators, and verify that the Cluster Observability Operator entry no longer appears in the list.
Chapter 4. Configuring the Cluster Observability Operator to monitor a service
You can monitor metrics for a service by configuring monitoring stacks managed by the Cluster Observability Operator (COO).
To test monitoring a service, follow these steps:
- Deploy a sample service that defines a service endpoint.
- Create a ServiceMonitor object that specifies how the service is to be monitored by the COO.
- Create a MonitoringStack object to discover the ServiceMonitor object.
4.1. Deploying a sample service for Cluster Observability Operator
This configuration deploys a sample service named prometheus-coo-example-app
in the user-defined ns1-coo
project. The service exposes the custom version
metric.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
Procedure
- Create a YAML file named prometheus-coo-example-app.yaml that contains the following configuration details for a namespace, deployment, and service:
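  The following sketch shows one possible layout for this file. The container image is an assumption; any image that exposes Prometheus metrics, including the custom version metric, on a port named web can be used:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns1-coo
  ---
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: prometheus-coo-example-app
    template:
      metadata:
        labels:
          app: prometheus-coo-example-app
      spec:
        containers:
        - name: prometheus-coo-example-app
          image: ghcr.io/rhobs/prometheus-example-app:0.4.2   # assumed example image
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 8080
            name: web
  ---
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: prometheus-coo-example-app
    name: prometheus-coo-example-app
    namespace: ns1-coo
  spec:
    ports:
    - port: 8080
      protocol: TCP
      targetPort: 8080
      name: web
    selector:
      app: prometheus-coo-example-app
    type: ClusterIP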
- Save the file.
- Apply the configuration to the cluster by running the following command:

  $ oc apply -f prometheus-coo-example-app.yaml

- Verify that the pod is running by running the following command and observing the output:

  $ oc -n ns1-coo get pod

  Example output

  NAME                                          READY   STATUS    RESTARTS   AGE
  prometheus-coo-example-app-0927545cb7-anskj   1/1     Running   0          81m
4.2. Specifying how a service is monitored by Cluster Observability Operator
To use the metrics exposed by the sample service you created in the "Deploying a sample service for Cluster Observability Operator" section, you must configure monitoring components to scrape metrics from the /metrics
endpoint.
You can create this configuration by using a ServiceMonitor
object that specifies how the service is to be monitored, or a PodMonitor
object that specifies how a pod is to be monitored. The ServiceMonitor
object requires a Service
object. The PodMonitor
object does not, which enables the MonitoringStack
object to scrape metrics directly from the metrics endpoint exposed by a pod.
This procedure shows how to create a ServiceMonitor
object for a sample service named prometheus-coo-example-app
in the ns1-coo
namespace.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.

  Note: The prometheus-coo-example-app sample service does not support TLS authentication.
Procedure
- Create a YAML file named example-coo-app-service-monitor.yaml that contains the following ServiceMonitor object configuration details. This configuration defines a ServiceMonitor object that the MonitoringStack object will reference to scrape the metrics data exposed by the prometheus-coo-example-app sample service.
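  The following sketch shows the general shape of such a ServiceMonitor object in the monitoring.rhobs API group; the scrape interval is an illustrative value:

  apiVersion: monitoring.rhobs/v1
  kind: ServiceMonitor
  metadata:
    labels:
      k8s-app: prometheus-coo-example-monitor
    name: prometheus-coo-example-monitor
    namespace: ns1-coo
  spec:
    endpoints:
    - interval: 30s          # illustrative scrape interval
      port: web              # the named port exposed by the sample service
      scheme: http
    selector:
      matchLabels:
        app: prometheus-coo-example-app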
- Apply the configuration to the cluster by running the following command:

  $ oc apply -f example-coo-app-service-monitor.yaml

- Verify that the ServiceMonitor resource is created by running the following command and observing the output:

  $ oc -n ns1-coo get servicemonitors.monitoring.rhobs

  Example output

  NAME                             AGE
  prometheus-coo-example-monitor   81m
4.3. Creating a MonitoringStack object for the Cluster Observability Operator
To scrape the metrics data exposed by the target prometheus-coo-example-app
service, create a MonitoringStack
object that references the ServiceMonitor
object you created in the "Specifying how a service is monitored by Cluster Observability Operator" section. This MonitoringStack
object can then discover the service and scrape the exposed metrics data from it.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
- You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.
Procedure
- Create a YAML file for the MonitoringStack object configuration. For this example, name the file example-coo-monitoring-stack.yaml.
- Add the following MonitoringStack object configuration details:

  Example MonitoringStack object
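  The following sketch shows the general shape of the object. The retention period and labels are illustrative; the resourceSelector must match the labels set on the ServiceMonitor object:

  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: example-coo-monitoring-stack
    namespace: ns1-coo
    labels:
      coo: example
  spec:
    logLevel: debug
    retention: 1d            # illustrative retention period
    resourceSelector:
      matchLabels:
        k8s-app: prometheus-coo-example-monitor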
- Apply the MonitoringStack object by running the following command:

  $ oc apply -f example-coo-monitoring-stack.yaml

- Verify that the MonitoringStack object is available by running the following command and inspecting the output:

  $ oc -n ns1-coo get monitoringstack

  Example output

  NAME                           AGE
  example-coo-monitoring-stack   81m

- Run the following command to retrieve information about the active targets from Prometheus and filter the output to list only targets labeled with app=prometheus-coo-example-app. This verifies which targets are discovered and actively monitored by Prometheus with this specific label.

  $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[].discoveredLabels | select(.__meta_kubernetes_endpoints_label_app=="prometheus-coo-example-app")'

  Example output

  Note: The above example uses the jq command-line JSON processor to format the output for convenience.
4.4. Validating the monitoring stack
To validate that the monitoring stack is working correctly, access the example service and then view the gathered metrics.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
- You have deployed the prometheus-coo-example-app sample service in the ns1-coo namespace.
- You have created a ServiceMonitor object named prometheus-coo-example-monitor in the ns1-coo namespace.
- You have created a MonitoringStack object named example-coo-monitoring-stack in the ns1-coo namespace.
Procedure
- Create a route to expose the example prometheus-coo-example-app service. From your terminal, run the command:

  $ oc expose svc prometheus-coo-example-app -n ns1-coo

- Access the route from your browser, or command line, to generate metrics.
- Execute a query on the Prometheus pod to return the total HTTP requests metric:

  $ oc -n ns1-coo exec -c prometheus prometheus-example-coo-monitoring-stack-0 -- curl -s 'http://localhost:9090/api/v1/query?query=http_requests_total'

  Example output (formatted using jq for convenience)
4.5. Scrape targets in multiple namespaces
To scrape targets in multiple namespaces, set the namespace and resource selector in the MonitoringStack
object.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.
- You have installed the Cluster Observability Operator.
Procedure
- Deploy the following namespace object and MonitoringStack YAML file:

  Example MonitoringStack
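  The following sketch shows the general shape of such a configuration. The labels are illustrative; the stack selects monitoring resources in every namespace that carries the monitoring.rhobs/stack: multi-ns label:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns1-coo
    labels:
      monitoring.rhobs/stack: multi-ns
  ---
  apiVersion: monitoring.rhobs/v1alpha1
  kind: MonitoringStack
  metadata:
    name: example-coo-monitoring-stack
    namespace: ns1-coo
  spec:
    logLevel: debug
    retention: 1d
    resourceSelector:
      matchLabels:
        k8s-app: prometheus-coo-example-monitor
    namespaceSelector:                      # scrape targets in every namespace with this label
      matchLabels:
        monitoring.rhobs/stack: multi-ns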
- Deploy a sample application in the namespace ns1-coo, with an alert that is always firing:
- Deploy the same example application in another namespace labeled with monitoring.rhobs/stack: multi-ns:
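  A compact sketch of the second step: the application namespace is labeled so that the namespaceSelector above matches it, and a PrometheusRule in the monitoring.rhobs API group defines an alert whose expression is always true. The namespace name, rule name, and labels are illustrative:

  apiVersion: v1
  kind: Namespace
  metadata:
    name: ns2-coo
    labels:
      monitoring.rhobs/stack: multi-ns
  ---
  apiVersion: monitoring.rhobs/v1
  kind: PrometheusRule
  metadata:
    name: example-alert
    namespace: ns2-coo
    labels:
      k8s-app: prometheus-coo-example-monitor
  spec:
    groups:
    - name: example
      rules:
      - alert: AlwaysFiring
        expr: vector(1)       # the expression is always true, so the alert fires continuously
        labels:
          severity: warning
        annotations:
          message: This alert fires unconditionally and is for testing only.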
Verification
- Verify that the Prometheus instance adds new targets and that the alerts are firing. Use a port-forward command to expose the Prometheus or the Alertmanager user interface that has been deployed by the MonitoringStack instance.

  Prometheus

  $ oc port-forward -n ns1-coo pod/prometheus-example-coo-monitoring-stack-0 9090

  Alertmanager

  $ oc port-forward -n ns1-coo pod/alertmanager-example-coo-monitoring-stack-0 9093
- Verify that the targets are being scraped and that the alerts are firing by browsing to http://localhost:9090/targets or http://localhost:9093/#/alerts.
Chapter 5. Observability UI plugins
5.1. Observability UI plugins overview
You can use the Cluster Observability Operator (COO) to install and manage UI plugins to enhance the observability capabilities of the OpenShift Container Platform web console. The plugins extend the default functionality, providing new UI features for troubleshooting, distributed tracing, and cluster logging.
5.1.1. Monitoring
The monitoring UI plugin adds monitoring-related UI features to the OpenShift web console, for the Advanced Cluster Management (ACM) perspective.
- ACM: The monitoring plugin in Cluster Observability Operator (COO) allows it to function in Red Hat Advanced Cluster Management (RHACM) environments, providing ACM with the same monitoring capabilities as OpenShift Container Platform.
For more information, see the monitoring UI plugin page.
5.1.2. Cluster logging
The logging UI plugin surfaces logging data in the web console on the Observe → Logs page. You can specify filters, queries, time ranges and refresh rates. The results are displayed as a list of collapsed logs, which can then be expanded to show more detailed information for each log.
For more information, see the logging UI plugin page.
5.1.3. Troubleshooting
The Cluster Observability Operator troubleshooting panel UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The troubleshooting panel UI plugin for OpenShift Container Platform version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project. You can use the troubleshooting panel available from the Observe → Alerting page to easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores. Users of OpenShift Container Platform version 4.17+ can also access the troubleshooting UI panel from the Application Launcher.
The output of Korrel8r is displayed as an interactive node graph. When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, or pod.
For more information, see the troubleshooting UI plugin page.
5.1.4. Distributed tracing
The distributed tracing UI plugin adds tracing-related features to the web console on the Observe → Traces page. You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems. You can select a supported TempoStack
or TempoMonolithic
multi-tenant instance running in the cluster and set a time range and query to view the trace data.
For more information, see the distributed tracing UI plugin page.
5.2. Monitoring UI plugin
The Cluster Observability Operator monitoring UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The monitoring UI plugin adds monitoring features to the Administrator perspective of the OpenShift web console.
- RHACM: The monitoring plugin in Cluster Observability Operator (COO) allows it to function in Red Hat Advanced Cluster Management (RHACM) environments, providing RHACM with the same alerting capabilities as OpenShift Container Platform. You can configure the plugin to fetch alerts from the RHACM Alertmanager backend. This enables seamless integration and user experience by aligning RHACM and OpenShift Container Platform monitoring workflows.
5.2.1. Installing the Cluster Observability Operator monitoring UI plugin
The monitoring UI plugin adds monitoring-related UI features to the OpenShift web console, for the Advanced Cluster Management (ACM) perspective.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
- You have installed the Cluster Observability Operator.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
- Choose the UI Plugin tab (at the far right of the tab list) and press Create UIPlugin.
- Select YAML view, enter the following content, and then press Create:
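  A sketch of the UIPlugin object. The field layout follows the UIPlugin API at the time of writing, and the RHACM service URLs are illustrative placeholders that must point to your RHACM Alertmanager and ThanosQuerier endpoints:

  apiVersion: observability.openshift.io/v1alpha1
  kind: UIPlugin
  metadata:
    name: monitoring
  spec:
    type: Monitoring
    monitoring:
      acm:                   # <1> enable RHACM features
        enabled: true
        alertmanager:
          url: 'https://alertmanager.open-cluster-management-observability.svc:9095'
        thanosQuerier:
          url: 'https://rbac-query-proxy.open-cluster-management-observability.svc:8443'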
  1. Enable RHACM features. You must configure the Alertmanager and ThanosQuerier Service endpoints.
5.3. Logging UI plugin
The logging UI plugin surfaces logging data in the OpenShift Container Platform web console on the Observe → Logs page. You can specify filters, queries, time ranges and refresh rates, with the results displayed as a list of collapsed logs, which can then be expanded to show more detailed information for each log.
When you have also deployed the Troubleshooting UI plugin on OpenShift Container Platform version 4.16+, it connects to the Korrel8r service and adds direct links from the Administration perspective, from the Observe → Logs page, to the Observe → Metrics page with a correlated PromQL query. It also adds a See Related Logs link from the Administration perspective alerting detail page, at Observe → Alerting, to the Observe → Logs page with a correlated filter set selected.
The features of the plugin are categorized as:
- dev-console
- Adds the logging view to the Developer perspective.
- alerts
- Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view.
- dev-alerts
- Merges the web console alerts with log-based alerts defined in the Loki ruler. Adds a log-based metrics chart in the alert detail view for the Developer perspective.
For Cluster Observability Operator (COO) versions, the support for these features in OpenShift Container Platform versions is shown in the following table:
COO version | OCP versions | Features |
---|---|---|
0.3.0+ | 4.12 | |
0.3.0+ | 4.13 | |
0.3.0+ | 4.14+ | |
5.3.1. Installing the Cluster Observability Operator logging UI plugin
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have logged in to the OpenShift Container Platform web console.
- You have installed the Cluster Observability Operator.
- You have a LokiStack instance in your cluster.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
- Choose the UI Plugin tab (at the far right of the tab list) and click Create UIPlugin.
- Select YAML view, enter the following content, and then click Create:
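  A sketch of the UIPlugin object; the LokiStack name is an illustrative placeholder for the instance in your cluster:

  apiVersion: observability.openshift.io/v1alpha1
  kind: UIPlugin
  metadata:
    name: logging
  spec:
    type: Logging
    logging:
      lokiStack:
        name: logging-loki   # name of the LokiStack instance in your cluster
      schema: otel           # <1> one of otel, viaq, or select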
  1. schema is one of otel, viaq, or select. The default is viaq if no value is specified. When you choose select, you can select the mode in the UI when you run a query.
  Note: These are the known issues for the logging UI plugin. For more information, see OU-587.

  - The schema feature is only supported in OpenShift Container Platform 4.15 and later. In earlier versions of OpenShift Container Platform, the logging UI plugin will only use the viaq attribute, ignoring any other values that might be set.
  - Non-administrator users cannot query logs using the otel attribute with logging for Red Hat OpenShift versions 5.8 to 6.2. This issue will be fixed in a future logging release. (LOG-6589)
  - In logging for Red Hat OpenShift version 5.9, the severity_text Otel attribute is not set.
5.4. Distributed tracing UI plugin
The distributed tracing UI plugin adds tracing-related features to the Administrator perspective of the OpenShift web console at Observe → Traces. You can follow requests through the front end and into the backend of microservices, helping you identify code errors and performance bottlenecks in distributed systems.
5.4.1. Installing the Cluster Observability Operator distributed tracing UI plugin
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
- You have installed the Cluster Observability Operator.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
- Choose the UI Plugin tab (at the far right of the tab list) and press Create UIPlugin.
- Select YAML view, enter the following content, and then press Create:
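  A sketch of the UIPlugin object for the distributed tracing plugin, assuming the observability.openshift.io/v1alpha1 API:

  apiVersion: observability.openshift.io/v1alpha1
  kind: UIPlugin
  metadata:
    name: distributed-tracing
  spec:
    type: DistributedTracing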
5.4.2. Using the Cluster Observability Operator distributed tracing UI plugin
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
- You have installed the Cluster Observability Operator.
- You have installed the Cluster Observability Operator distributed tracing UI plugin.
- You have a TempoStack or TempoMonolithic multi-tenant instance in the cluster.
Procedure
- In the Administrator perspective of the OpenShift Container Platform web console, click Observe → Traces.
- Select a TempoStack or TempoMonolithic multi-tenant instance and set a time range and query for the traces to be loaded.

  The traces are displayed on a scatter-plot showing the trace start time, duration, and number of spans. Underneath the scatter plot, there is a list of traces showing information such as the Trace Name, number of Spans, and Duration.

- Click on a trace name link.
The trace detail page for the selected trace contains a Gantt Chart of all of the spans within the trace. Select a span to show a breakdown of the configured attributes.
5.5. Troubleshooting UI plugin
The Cluster Observability Operator troubleshooting panel UI plugin is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The troubleshooting UI plugin for OpenShift Container Platform version 4.16+ provides observability signal correlation, powered by the open source Korrel8r project. With the troubleshooting panel that is available under Observe → Alerting, you can easily correlate metrics, logs, alerts, netflows, and additional observability signals and resources, across different data stores. Users of OpenShift Container Platform version 4.17+ can also access the troubleshooting UI panel from the Application Launcher.
When you install the troubleshooting UI plugin, a Korrel8r service named korrel8r
is deployed in the same namespace, and it is able to locate related observability signals and Kubernetes resources from its correlation engine.
The output of Korrel8r is displayed in the form of an interactive node graph in the OpenShift Container Platform web console. Nodes in the graph represent a type of resource or signal, while edges represent relationships. When you click on a node, you are automatically redirected to the corresponding web console page with the specific information for that node, for example, metric, log, pod.
5.5.1. Installing the Cluster Observability Operator Troubleshooting UI plugin
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role.
- You have logged in to the OpenShift Container Platform web console.
- You have installed the Cluster Observability Operator.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators and select Cluster Observability Operator.
- Choose the UI Plugin tab (at the far right of the tab list) and press Create UIPlugin.
- Select YAML view, enter the following content, and then press Create:
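  A sketch of the UIPlugin object for the troubleshooting panel, assuming the observability.openshift.io/v1alpha1 API:

  apiVersion: observability.openshift.io/v1alpha1
  kind: UIPlugin
  metadata:
    name: troubleshooting-panel
  spec:
    type: TroubleshootingPanel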
5.5.2. Using the Cluster Observability Operator troubleshooting UI plugin
Prerequisites
- You have access to the OpenShift Container Platform cluster as a user with the cluster-admin cluster role. If your cluster version is 4.17+, you can access the troubleshooting UI panel from the Application Launcher.
- You have logged in to the OpenShift Container Platform web console.
- You have installed OpenShift Container Platform Logging, if you want to visualize correlated logs.
- You have installed OpenShift Container Platform Network Observability, if you want to visualize correlated netflows.
- You have installed the Cluster Observability Operator.
- You have installed the Cluster Observability Operator troubleshooting UI plugin.

  Note: The troubleshooting panel relies on the observability signal stores installed in your cluster. Kubernetes resources, alerts, and metrics are always available by default in an OpenShift Container Platform cluster. Other signal types require optional components to be installed:

  - Logs: Red Hat OpenShift Logging (collection) and Loki Operator provided by Red Hat (store)
- Network events: Network observability provided by Red Hat (collection) and Loki Operator provided by Red Hat (store)
Procedure
In the admin perspective of the web console, navigate to Observe → Alerting and then select an alert. If the alert has correlated items, a Troubleshooting Panel link will appear above the chart on the alert detail page.
Click on the Troubleshooting Panel link to display the panel.
- The panel consists of query details and a topology graph of the query results. The selected alert is converted into a Korrel8r query string and sent to the korrel8r service. The results are displayed as a graph network connecting the returned signals and resources. This is a neighbourhood graph, starting at the current resource and including related objects up to 3 steps away from the starting point. Clicking on nodes in the graph takes you to the corresponding web console pages for those resources. You can use the troubleshooting panel to find resources relating to the chosen alert.
  Note: Clicking on a node may sometimes show fewer results than indicated on the graph. This is a known issue that will be addressed in a future release.
- Alert (1): This node is the starting point in the graph and represents the KubeContainerWaiting alert displayed in the web console.
- Pod (1): This node indicates that there is a single Pod resource associated with this alert. Clicking on this node will open a console search showing the related pod directly.
- Event (2): There are two Kubernetes events associated with the pod. Click this node to see the events.
- Logs (74): This pod has 74 lines of logs, which you can access by clicking on this node.
- Metrics (105): There are many metrics associated with the pod.
-
Network (6): There are network events, meaning the pod has communicated over the network. The remaining nodes in the graph represent the
Service
,Deployment
andDaemonSet
resources that the pod has communicated with. - Focus: Clicking this button updates the graph. By default, the graph itself does not change when you click on nodes in the graph. Instead, the main web console page changes, and you can then navigate to other resources using links on the page, while the troubleshooting panel itself stays open and unchanged. To force an update to the graph in the troubleshooting panel, click Focus. This draws a new graph, using the current resource in the web console as the starting point.
- Show Query: Clicking this button enables some experimental features:
  - Hide Query hides the experimental features.
  - The query that identifies the starting point for the graph. The query language, part of the Korrel8r correlation engine used to create the graphs, is experimental and may change in the future. The query is updated by the Focus button to correspond to the resources in the main web console window.
  - Neighbourhood depth is used to display a smaller or larger neighbourhood.
    Note: Setting a large value in a large cluster might cause the query to fail if the number of results is too big.
  - Goal class results in a goal-directed search instead of a neighbourhood search. A goal-directed search shows all paths from the starting point to the goal class, which indicates a type of resource or signal. The format of the goal class is experimental and may change. Currently, the following goals are valid:
    - k8s:RESOURCE[VERSION.[GROUP]] identifying a kind of Kubernetes resource. For example, k8s:Pod or k8s:Deployment.apps.v1.
    - alert:alert representing any alert.
    - metric:metric representing any metric.
    - netflow:network representing any network observability network event.
    - log:LOG_TYPE representing stored logs, where LOG_TYPE must be one of application, infrastructure, or audit.
5.5.3. Creating the example alert Copy linkLink copied to clipboard!
To trigger an alert as a starting point to use in the troubleshooting UI panel, you can deploy a container that is deliberately misconfigured.
Procedure
Use the following YAML, either from the command line or in the web console, to create a broken deployment in a system namespace:
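The following sketch illustrates such a deployment; the name, namespace value, and image reference are assumptions, and any container image that exits with an error shortly after starting will produce similar alerts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bad-deployment            # hypothetical name
  namespace: default              # 1: a system namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bad-deployment
  template:
    metadata:
      labels:
        app: bad-deployment
    spec:
      containers:
      - name: bad-deployment
        image: quay.io/openshift-logging/vector:latest   # 2: image reference is an assumption; vector starts with no configuration file and exits with an error

From the command line, you can create the deployment with oc apply -f bad-deployment.yaml (the file name is an example).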
1. The deployment must be in a system namespace (such as default) to cause the desired alerts.
2. This container deliberately tries to start a vector server with no configuration file. The server logs a few messages, and then exits with an error. Alternatively, you can deploy any container you like that is badly configured, causing it to trigger an alert.
View the alerts:
- Go to Observe → Alerting and click clear all filters. View the Pending alerts.
  Important: Alerts first appear in the Pending state. They do not start Firing until the container has been crashing for some time. By viewing Pending alerts, you do not have to wait as long to see them occur.
- Choose one of the KubeContainerWaiting, KubePodCrashLooping, or KubePodNotReady alerts and open the troubleshooting panel by clicking on the link. Alternatively, if the panel is already open, click the Focus button to update the graph.
Chapter 6. Monitoring API reference Copy linkLink copied to clipboard!
The resource types are MonitoringStack
and ThanosQuerier
.
6.1. MonitoringStack Copy linkLink copied to clipboard!
MonitoringStack is the Schema for the monitoringstacks API.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
object |
Refer to the Kubernetes API documentation for the fields of the | true | |
object |
| true | |
object |
| false |
6.2. MonitoringStack.spec Copy linkLink copied to clipboard!
MonitoringStackSpec is the specification for the desired MonitoringStack object.
Name | Type | Description | Required |
---|---|---|---|
object |
Define
Default: | false | |
| enum | Set log levels of configured components.
Enum:
Default: | false |
object |
Namespace selector for
| false | |
|
|
Define node selector for | false |
object | Define prometheus config.
Default: | false | |
object |
Label selector for
| false | |
object |
Define resources requests and limits for
Default: | false | |
| string |
Time duration to retain data. The string must match the regular expression
Default: | false |
|
Define tolerations for | false |
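To illustrate how these fields fit together, the following is a minimal sketch of a MonitoringStack manifest; the API version, metadata values, and the label used by the resourceSelector are assumptions and should be adapted to your cluster:

apiVersion: monitoring.rhobs/v1alpha1   # assumed API group/version for the COO-managed stack
kind: MonitoringStack
metadata:
  name: example-stack                   # hypothetical name
  namespace: coo-demo                   # hypothetical namespace
spec:
  logLevel: info                        # one of the supported log levels
  retention: 1d                         # must match the duration regular expression, for example 12h or 1d
  resourceSelector:
    matchLabels:
      system: example                   # hypothetical label; monitored resources carrying it are discovered
  prometheusConfig:
    replicas: 2                         # default value shown explicitly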
6.3. MonitoringStack.spec.alertmanagerConfig Copy linkLink copied to clipboard!
Define Alertmanager config.
Name | Type | Description | Required |
---|---|---|---|
| boolean |
Disables the deployment of Default: false | false |
object |
Configure TLS options for the | false |
6.4. MonitoringStack.spec.alertmanagerConfig.webTLSConfig Copy linkLink copied to clipboard!
Configure TLS options for the Alertmanager web server.
Name | Type | Description | Required |
---|---|---|---|
object | Reference to the TLS public certificate for the web server. | true | |
object | Reference to the root Certificate Authority used to verify the web server’s certificate. | true | |
object | Reference to the TLS private key for the web server. | true |
6.5. MonitoringStack.spec.alertmanagerConfig.webTLSConfig.certificate Copy linkLink copied to clipboard!
Reference to the TLS public certificate for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.6. MonitoringStack.spec.alertmanagerConfig.webTLSConfig.certificateAuthority Copy linkLink copied to clipboard!
Reference to the root Certificate Authority used to verify the web server’s certificate.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.7. MonitoringStack.spec.alertmanagerConfig.webTLSConfig.privateKey Copy linkLink copied to clipboard!
Reference to the TLS private key for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.8. MonitoringStack.spec.namespaceSelector Copy linkLink copied to clipboard!
Namespace selector for MonitoringStack resources.
- To monitor everything, set to the empty map selector. For example, namespaceSelector: {}.
- To monitor resources in the namespace where the MonitoringStack instance was created, set to null. For example, namespaceSelector:.
Name | Type | Description | Required |
---|---|---|---|
|
| false | |
|
|
| false |
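As a sketch, the two settings described above look like this inside a MonitoringStack spec:

# Monitor resources in all namespaces:
namespaceSelector: {}

# Monitor only the namespace where the MonitoringStack instance was created:
namespaceSelector: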
6.9. MonitoringStack.spec.namespaceSelector.matchExpressions[index] Copy linkLink copied to clipboard!
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
|
|
| false |
6.10. MonitoringStack.spec.prometheusConfig Copy linkLink copied to clipboard!
Define Prometheus configuration.
Name | Type | Description | Required |
---|---|---|---|
| boolean |
Enable Prometheus to accept OpenTelemetry Metrics via the | false |
| boolean |
Enable Prometheus to be used as a receiver for the Prometheus remote write protocol. Defaults to the value of | false |
|
|
Define | false |
object | Define persistent volume claim for Prometheus. | false | |
| Define remote write for Prometheus. | false | |
| integer | Number of replicas/pods to deploy for a Prometheus deployment. Format: int32 Default: 2 Minimum: 0 | false |
| string | Default interval between scrapes. | false |
object | Configure TLS options for the Prometheus web server. | false |
6.11. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim Copy linkLink copied to clipboard!
Define persistent volume claim for Prometheus.
Name | Type | Description | Required |
---|---|---|---|
|
|
| false |
object |
If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source.
When the | false | |
object |
When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner. This field will replace the functionality of the
For backwards compatibility, when
There are three important differences between
Beta: Using this field requires the
Alpha: Using the namespace field of | false | |
object |
If For more information, see Kubernetes Docs | false | |
object |
| false | |
| string |
| false |
| string |
If specified, the CSI driver will create or update the volume with the attributes defined in the corresponding
If unspecified and the For more information, see Kubernetes Docs
Beta: Using this field requires the | false |
| string |
| false |
| string |
| false |
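A minimal sketch of this block inside prometheusConfig; the storage class name and requested size are assumptions:

prometheusConfig:
  persistentVolumeClaim:
    storageClassName: gp3-csi           # hypothetical storage class; omit to use the cluster default
    resources:
      requests:
        storage: 10Gi                   # assumed size; adjust to the retention period you configure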
6.12. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim.dataSource Copy linkLink copied to clipboard!
The dataSource field can be used to specify either:
- An existing VolumeSnapshot object (snapshot.storage.k8s.io/VolumeSnapshot)
- An existing PVC (PersistentVolumeClaim)
If the provisioner or an external controller can support the specified data source, it will create a new volume based on the contents of the specified data source. When the AnyVolumeDataSource feature gate is enabled, dataSource contents will be copied to dataSourceRef, and dataSourceRef contents will be copied to dataSource when dataSourceRef.namespace is not specified. If the namespace is specified, then dataSourceRef will not be copied to dataSource.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
| string |
| false |
6.13. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim.dataSourceRef Copy linkLink copied to clipboard!
dataSourceRef specifies the object from which to populate the volume with data, if a non-empty volume is desired. This may be any object from a non-empty API group (non core object) or a PersistentVolumeClaim object. When this field is specified, volume binding will only succeed if the type of the specified object matches some installed volume populator or dynamic provisioner.
This field will replace the functionality of the dataSource field and as such if both fields are non-empty, they must have the same value. For backwards compatibility, when namespace isn’t specified in dataSourceRef, both fields (dataSource and dataSourceRef) will be set to the same value automatically if one of them is empty and the other is non-empty. When namespace is specified in dataSourceRef, dataSource isn’t set to the same value and must be empty.
There are three important differences between dataSource and dataSourceRef:
- While dataSource only allows two specific types of objects, dataSourceRef allows any non-core object, as well as PersistentVolumeClaim objects.
- While dataSource ignores disallowed values (dropping them), dataSourceRef preserves all values, and generates an error if a disallowed value is specified.
- While dataSource only allows local objects, dataSourceRef allows objects in any namespaces.
Beta: Using this field requires the AnyVolumeDataSource feature gate to be enabled.
Alpha: Using the namespace field of dataSourceRef requires the CrossNamespaceVolumeDataSource feature gate to be enabled.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
| string |
If | false |
| string |
Note that when a namespace is specified, a
Alpha: This field requires the | false |
6.14. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim.resources Copy linkLink copied to clipboard!
resources represents the minimum resources the volume should have.
If the RecoverVolumeExpansionFailure feature is enabled, users are allowed to specify resource requirements that are lower than the previous value but must still be higher than the capacity recorded in the status field of the claim.
For more information, see Kubernetes Docs
Name | Type | Description | Required |
---|---|---|---|
|
|
For more information, see Kubernetes Docs | false |
|
|
If For more information, see Kubernetes Docs | false |
6.15. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim.selector Copy linkLink copied to clipboard!
selector is a label query over volumes to consider for binding.
Name | Type | Description | Required |
---|---|---|---|
|
| false | |
|
|
A single {key,value} in the | false |
6.16. MonitoringStack.spec.prometheusConfig.persistentVolumeClaim.selector.matchExpressions[index] Copy linkLink copied to clipboard!
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
|
|
If the operator is | false |
6.17. MonitoringStack.spec.prometheusConfig.remoteWrite[index] Copy linkLink copied to clipboard!
RemoteWriteSpec defines the configuration to write samples from Prometheus to a remote endpoint.
Name | Type | Description | Required |
---|---|---|---|
| string | The URL of the endpoint to send samples to. | true |
object | Authorization section for the URL.
| false | |
object | AzureAD for the URL.
| false | |
object | BasicAuth configuration for the URL.
| false | |
| string |
Warning: this field should not be used because the token value appears in clear-text. Prefer using Deprecated: this will be removed in a future release. | false |
| string | File from which to read bearer token for the URL.
Deprecated: this will be removed in a future release. Prefer using | false |
| boolean | Whether to enable HTTP2. | false |
| boolean | Configure whether HTTP requests follow HTTP 3xx redirects. It requires Prometheus >= v2.26.0. | false |
|
| Custom HTTP headers to be sent along with each remote write request. Be aware that headers that are set by Prometheus itself can’t be overwritten. It requires Prometheus >= v2.25.0. | false |
| enum | The Remote Write message’s version to use when writing to the endpoint.
Enum: | false |
object |
| false | |
| string | The name of the remote write queue, it must be unique if specified. The name is used in metrics and logging in order to differentiate queues. It requires Prometheus >= v2.15.0. | false |
| string |
It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false |
object | OAuth2 configuration for the URL.
| false | |
|
It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false | |
| boolean |
Whether to use the proxy configuration defined by environment variables ( It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false |
| string |
| false |
object |
| false | |
| string | Timeout for requests to the remote write endpoint. | false |
| boolean |
Enables sending of exemplars over remote write. Note that exemplar-storage itself must be enabled using the It requires Prometheus >= v2.27.0. | false |
| boolean | Enables sending of native histograms, also known as sparse histograms over remote write. It requires Prometheus >= v2.40.0. | false |
object |
| false | |
object | TLS Config to use for the URL. | false | |
| The list of remote write relabel configurations. | false |
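The following sketch shows one remoteWrite entry using basic authentication; the endpoint URL, the Secret name, its keys, and the metric name pattern are assumptions:

prometheusConfig:
  remoteWrite:
  - url: https://remote-write.example.com/api/v1/write   # hypothetical endpoint
    basicAuth:
      username:
        name: remote-write-creds                         # hypothetical Secret in the MonitoringStack namespace
        key: username
      password:
        name: remote-write-creds
        key: password
    writeRelabelConfigs:
    - sourceLabels: [__name__]
      regex: my_app_.*                                    # hypothetical metric name filter
      action: keep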
6.18. MonitoringStack.spec.prometheusConfig.remoteWrite[index].authorization Copy linkLink copied to clipboard!
Authorization section for the URL.
- It requires Prometheus >= v2.26.0.
- Cannot be set at the same time as sigv4, basicAuth, oauth2, or azureAd.
Name | Type | Description | Required |
---|---|---|---|
object | Selects a key of a Secret in the namespace that contains the credentials for authentication. | false | |
| string |
File to read a secret from, mutually exclusive with | false |
| string | Defines the authentication type. The value is case-insensitive. "Basic" is not a supported value. Default: "Bearer" | false |
6.19. MonitoringStack.spec.prometheusConfig.remoteWrite[index].authorization.credentials Copy linkLink copied to clipboard!
Selects a key of a Secret in the namespace that contains the credentials for authentication.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.20. MonitoringStack.spec.prometheusConfig.remoteWrite[index].azureAd Copy linkLink copied to clipboard!
AzureAD for the URL.
- It requires Prometheus >= v2.45.0.
- Cannot be set at the same time as authorization, basicAuth, oauth2, or sigv4.
Name | Type | Description | Required |
---|---|---|---|
| enum | The Azure Cloud. Options are 'AzurePublic', 'AzureChina', or 'AzureGovernment'.
Enum: | false |
object |
| false | |
object |
| false | |
object |
| false |
6.21. MonitoringStack.spec.prometheusConfig.remoteWrite[index].azureAd.managedIdentity Copy linkLink copied to clipboard!
ManagedIdentity defines the Azure User-assigned Managed identity.
- Cannot be set at the same time as oauth or sdk.
Name | Type | Description | Required |
---|---|---|---|
| string | The client id | true |
6.22. MonitoringStack.spec.prometheusConfig.remoteWrite[index].azureAd.oauth Copy linkLink copied to clipboard!
OAuth defines the oauth config that is being used to authenticate.
- Cannot be set at the same time as managedIdentity or sdk.
- It requires Prometheus >= v2.48.0.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
object |
| true | |
| string |
| true |
6.23. MonitoringStack.spec.prometheusConfig.remoteWrite[index].azureAd.oauth.clientSecret Copy linkLink copied to clipboard!
clientSecret
specifies a key of a Secret containing the client secret of the Azure Active Directory application that is being used to authenticate.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.24. MonitoringStack.spec.prometheusConfig.remoteWrite[index].azureAd.sdk Copy linkLink copied to clipboard!
SDK defines the Azure SDK config that is being used to authenticate.
For more information, see Azure SDK Authentication.
- Cannot be set at the same time as oauth or managedIdentity.
- It requires Prometheus >= v2.52.0.
Name | Type | Description | Required |
---|---|---|---|
| string |
| false |
6.25. MonitoringStack.spec.prometheusConfig.remoteWrite[index].basicAuth Copy linkLink copied to clipboard!
BasicAuth configuration for the URL.
- Cannot be set at the same time as sigv4, authorization, oauth2, or azureAd.
6.26. MonitoringStack.spec.prometheusConfig.remoteWrite[index].basicAuth.password Copy linkLink copied to clipboard!
password
specifies a key of a Secret containing the password for authentication.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.27. MonitoringStack.spec.prometheusConfig.remoteWrite[index].basicAuth.username Copy linkLink copied to clipboard!
username
specifies a key of a Secret containing the username for authentication.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.28. MonitoringStack.spec.prometheusConfig.remoteWrite[index].metadataConfig Copy linkLink copied to clipboard!
MetadataConfig
configures the sending of series metadata to the remote storage.
Name | Type | Description | Required |
---|---|---|---|
| boolean | Defines whether metric metadata is sent to the remote storage or not. | false |
| string | Defines how frequently metric metadata is sent to the remote storage. | false |
6.29. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2 Copy linkLink copied to clipboard!
OAuth2 configuration for the URL.
- It requires Prometheus >= v2.27.0.
- Cannot be set at the same time as sigv4, authorization, basicAuth, or azureAd.
Name | Type | Description | Required |
---|---|---|---|
object |
| true | |
object |
| true | |
| string |
| true |
|
|
| false |
| string |
It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false |
|
ProxyConnectHeader optionally specifies headers to send to proxies during It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false | |
| boolean |
Whether to use the proxy configuration defined by environment variables ( It requires Prometheus >= v2.43.0 or Alertmanager >= 0.25.0. | false |
| string |
| false |
|
|
| false |
object | TLS configuration to use when connecting to the OAuth2 server. It requires Prometheus >= v2.43.0. | false |
6.30. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.clientId Copy linkLink copied to clipboard!
clientId
specifies a key of a Secret or ConfigMap containing the OAuth2 client’s ID.
6.31. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.clientId.configMap Copy linkLink copied to clipboard!
ConfigMap containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key to select. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the ConfigMap or its key must be defined | false |
6.32. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.clientId.secret Copy linkLink copied to clipboard!
Secret containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.33. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.clientSecret Copy linkLink copied to clipboard!
clientSecret
specifies a key of a Secret containing the OAuth2 client’s secret.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.34. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.proxyConnectHeader[key][index] Copy linkLink copied to clipboard!
SecretKeySelector selects a key of a Secret.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.35. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig Copy linkLink copied to clipboard!
TLS configuration to use when connecting to the OAuth2 server.
- It requires Prometheus >= v2.43.0.
Name | Type | Description | Required |
---|---|---|---|
object | Certificate authority used when verifying server certificates. | false | |
object | Client certificate to present when doing client-authentication. | false | |
| boolean | Disable target certificate validation. | false |
object | Secret containing the client key file for the targets. | false | |
| enum | Maximum acceptable TLS version. It requires Prometheus >= v2.41.0.
Enum: | false |
| enum | Minimum acceptable TLS version. It requires Prometheus >= v2.35.0.
Enum: | false |
| string | Used to verify the hostname for the targets. | false |
6.36. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.ca Copy linkLink copied to clipboard!
Certificate authority used when verifying server certificates.
6.37. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.ca.configMap Copy linkLink copied to clipboard!
ConfigMap containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key to select. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the ConfigMap or its key must be defined | false |
6.38. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.ca.secret Copy linkLink copied to clipboard!
Secret containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.39. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.cert Copy linkLink copied to clipboard!
Client certificate to present when doing client-authentication.
6.40. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.cert.configMap Copy linkLink copied to clipboard!
ConfigMap containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key to select. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the ConfigMap or its key must be defined | false |
6.41. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.cert.secret Copy linkLink copied to clipboard!
Secret containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.42. MonitoringStack.spec.prometheusConfig.remoteWrite[index].oauth2.tlsConfig.keySecret Copy linkLink copied to clipboard!
Secret containing the client key file for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.43. MonitoringStack.spec.prometheusConfig.remoteWrite[index].proxyConnectHeader[key][index] Copy linkLink copied to clipboard!
SecretKeySelector selects a key of a Secret.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.44. MonitoringStack.spec.prometheusConfig.remoteWrite[index].queueConfig Copy linkLink copied to clipboard!
QueueConfig allows tuning of the remote write queue parameters.
Name | Type | Description | Required |
---|---|---|---|
| string |
| false |
| integer |
| false |
| string |
| false |
| integer |
| false |
| integer |
| false |
| integer |
| false |
| string |
| false |
| integer |
| false |
| boolean | Retry upon receiving a 429 status code from the remote-write storage. This is an experimental feature, it may change in any upcoming release in a breaking way. | false |
| string |
It requires Prometheus >= v2.50.0. | false |
6.45. MonitoringStack.spec.prometheusConfig.remoteWrite[index].sigv4 Copy linkLink copied to clipboard!
Sigv4 allows you to configure AWS Signature Verification 4 authentication for the URL.
- It requires Prometheus >= v2.26.0.
- Cannot be set at the same time as authorization, basicAuth, oauth2, or azureAd.
Name | Type | Description | Required |
---|---|---|---|
object |
| false | |
| string |
| false |
| string |
| false |
| string |
| false |
object |
| false |
6.46. MonitoringStack.spec.prometheusConfig.remoteWrite[index].sigv4.accessKey Copy linkLink copied to clipboard!
AccessKey
is the AWS API key. If not specified, the environment variable AWS_ACCESS_KEY_ID
is used.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.47. MonitoringStack.spec.prometheusConfig.remoteWrite[index].sigv4.secretKey Copy linkLink copied to clipboard!
SecretKey
is the AWS API secret. If not specified, the environment variable AWS_SECRET_ACCESS_KEY
is used.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.48. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig Copy linkLink copied to clipboard!
TLS Config to use for the URL.
Name | Type | Description | Required |
---|---|---|---|
object | Certificate authority used when verifying server certificates. | false | |
| string | Path to the CA cert in the Prometheus container to use for the targets. | false |
object | Client certificate to present when doing client-authentication. | false | |
| string | Path to the client cert file in the Prometheus container for the targets. | false |
| boolean | Disable target certificate validation. | false |
| string | Path to the client key file in the Prometheus container for the targets. | false |
object | Secret containing the client key file for the targets. | false | |
| enum | Maximum acceptable TLS version. It requires Prometheus >= v2.41.0.
Enum: | false |
| enum | Minimum acceptable TLS version. It requires Prometheus >= v2.35.0.
Enum: | false |
| string | Used to verify the hostname for the targets. | false |
6.49. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.ca Copy linkLink copied to clipboard!
Certificate authority used when verifying server certificates.
6.50. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.ca.configMap Copy linkLink copied to clipboard!
ConfigMap containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key to select. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the ConfigMap or its key must be defined | false |
6.51. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.ca.secret Copy linkLink copied to clipboard!
Secret containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.52. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.cert Copy linkLink copied to clipboard!
Client certificate to present when doing client-authentication.
6.53. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.cert.configMap Copy linkLink copied to clipboard!
ConfigMap containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key to select. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the ConfigMap or its key must be defined | false |
6.54. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.cert.secret Copy linkLink copied to clipboard!
Secret containing data to use for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.55. MonitoringStack.spec.prometheusConfig.remoteWrite[index].tlsConfig.keySecret Copy linkLink copied to clipboard!
Secret containing the client key file for the targets.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | Name of the referent. This field is effectively required, but due to backwards compatibility is allowed to be empty. Instances of this type with an empty value here are almost certainly wrong. For more information, see Kubernetes Docs | false |
| boolean | Specify whether the Secret or its key must be defined | false |
6.56. MonitoringStack.spec.prometheusConfig.remoteWrite[index].writeRelabelConfigs[index] Copy linkLink copied to clipboard!
RelabelConfig allows dynamic rewriting of the label set for targets, alerts, scraped samples and remote write samples.
For more information, see Prometheus Docs
Name | Type | Description | Required |
---|---|---|---|
| enum | Action to perform based on the regex matching.
Enum:
Default: | false |
| integer | Modulus to take of the hash of the source label values.
Only applicable when the action is Format: int64 | false |
| string | Regular expression against which the extracted value is matched. | false |
| string | Replacement value against which a Replace action is performed if the regular expression matches. Regex capture groups are available. | false |
| string |
| false |
|
| The source labels select values from existing labels. Their content is concatenated using the configured Separator and matched against the configured regular expression. | false |
| string | Label to which the resulting string is written in a replacement.
It is mandatory for Regex capture groups are available. | false |
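For example, a sketch of a relabel configuration that drops a hypothetical high-cardinality metric before it is sent to the remote endpoint; the metric name is an assumption:

writeRelabelConfigs:
- sourceLabels: [__name__]
  regex: apiserver_request_duration_seconds_bucket   # hypothetical metric to drop
  action: drop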
6.57. MonitoringStack.spec.prometheusConfig.webTLSConfig Copy linkLink copied to clipboard!
Configure TLS options for the Prometheus web server.
Name | Type | Description | Required |
---|---|---|---|
object | Reference to the TLS public certificate for the web server. | true | |
object | Reference to the root Certificate Authority used to verify the web server’s certificate. | true | |
object | Reference to the TLS private key for the web server. | true |
6.58. MonitoringStack.spec.prometheusConfig.webTLSConfig.certificate Copy linkLink copied to clipboard!
Reference to the TLS public certificate for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.59. MonitoringStack.spec.prometheusConfig.webTLSConfig.certificateAuthority Copy linkLink copied to clipboard!
Reference to the root Certificate Authority used to verify the web server’s certificate.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.60. MonitoringStack.spec.prometheusConfig.webTLSConfig.privateKey Copy linkLink copied to clipboard!
Reference to the TLS private key for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.61. MonitoringStack.spec.resourceSelector Copy linkLink copied to clipboard!
Label selector for MonitoringStack resources.
- To monitor everything, set to the empty map selector. For example, resourceSelector: {}.
- To disable service discovery, set to null. For example, resourceSelector:.
Name | Type | Description | Required |
---|---|---|---|
|
| false | |
|
|
| false |
6.62. MonitoringStack.spec.resourceSelector.matchExpressions[index] Copy linkLink copied to clipboard!
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
|
|
| false |
6.63. MonitoringStack.spec.resources Copy linkLink copied to clipboard!
Define resource requests and limits for MonitoringStack pods.
Name | Type | Description | Required |
---|---|---|---|
|
This is an alpha field and requires enabling the This field is immutable. It can only be set for containers. | false | |
|
|
For more information, see Kubernetes Docs | false |
|
|
For more information, see Kubernetes Docs | false |
6.64. MonitoringStack.spec.resources.claims[index] Copy linkLink copied to clipboard!
ResourceClaim
references one entry in PodSpec.ResourceClaims
.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| false |
6.65. MonitoringStack.spec.tolerations[index] Copy linkLink copied to clipboard!
The pod this Toleration is attached to tolerates any taint that matches the triple <key,value,effect> using the matching operator <operator>.
Name | Type | Description | Required |
---|---|---|---|
| string |
| false |
| string |
| false |
| string |
| false |
| integer |
Format: int64 | false |
| string |
| false |
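A sketch of a toleration allowing MonitoringStack pods to be scheduled on nodes carrying a hypothetical dedicated taint; the key and value are assumptions:

tolerations:
- key: monitoring          # hypothetical taint key
  operator: Equal
  value: dedicated         # hypothetical taint value
  effect: NoSchedule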
6.66. MonitoringStack.status Copy linkLink copied to clipboard!
MonitoringStackStatus defines the observed state of the MonitoringStack instance. It should always be reconstructable from the state of the cluster and/or outside world.
Name | Type | Description | Required |
---|---|---|---|
|
| true |
6.67. MonitoringStack.status.conditions[index] Copy linkLink copied to clipboard!
Name | Type | Description | Required |
---|---|---|---|
| string |
Format: date-time | true |
| string |
| true |
| string |
| true |
| enum | status of the condition
Enum: | true |
| string |
| true |
| integer |
Format: int64 Minimum: 0 | false |
6.68. ThanosQuerier Copy linkLink copied to clipboard!
ThanosQuerier outlines the Thanos querier components managed by this stack.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
object |
Refer to the Kubernetes API documentation for the fields of the | true | |
object |
| false | |
| object |
| false |
6.69. ThanosQuerier.spec Copy linkLink copied to clipboard!
ThanosQuerierSpec defines a single Thanos Querier instance. It specifies a label selector for choosing which MonitoringStack instances to query, an optional namespace selector, and a list of replica labels by which to deduplicate.
Name | Type | Description | Required |
---|---|---|---|
object | Selector to select monitoring stacks to unify. | true | |
object |
Selector to select which namespaces the | false | |
|
| false | |
object | Configure TLS options for the Thanos web server. | false |
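A minimal sketch of a ThanosQuerier that unifies MonitoringStack instances carrying a hypothetical label; the API version, names, label values, and replica label are assumptions:

apiVersion: monitoring.rhobs/v1alpha1   # assumed API group/version
kind: ThanosQuerier
metadata:
  name: example-querier                 # hypothetical name
spec:
  selector:
    matchLabels:
      system: example                   # hypothetical label carried by the MonitoringStack instances to query
  namespaceSelector:
    any: true                           # discover matching stacks in all namespaces
  replicaLabels:
  - prometheus_replica                  # assumed label used to deduplicate series across Prometheus replicas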
6.70. ThanosQuerier.spec.selector Copy linkLink copied to clipboard!
Selector to select monitoring stacks to unify.
Name | Type | Description | Required |
---|---|---|---|
|
| false | |
|
|
| false |
6.71. ThanosQuerier.spec.selector.matchExpressions[index] Copy linkLink copied to clipboard!
A label selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
Name | Type | Description | Required |
---|---|---|---|
| string |
| true |
| string |
| true |
|
|
| false |
6.72. ThanosQuerier.spec.namespaceSelector Copy linkLink copied to clipboard!
Selector to select which namespaces the MonitoringStack objects are discovered from.
Name | Type | Description | Required |
---|---|---|---|
| boolean | Boolean describing whether all namespaces are selected in contrast to a list restricting them. | false |
|
| List of namespace names. | false |
6.73. ThanosQuerier.spec.webTLSConfig Copy linkLink copied to clipboard!
Configure TLS options for the Thanos web server.
Name | Type | Description | Required |
---|---|---|---|
object | Reference to the TLS public certificate for the web server. | true | |
object | Reference to the root Certificate Authority used to verify the web server’s certificate. | true | |
object | Reference to the TLS private key for the web server. | true |
6.74. ThanosQuerier.spec.webTLSConfig.certificate Copy linkLink copied to clipboard!
Reference to the TLS public certificate for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.75. ThanosQuerier.spec.webTLSConfig.certificateAuthority Copy linkLink copied to clipboard!
Reference to the root Certificate Authority used to verify the web server’s certificate.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
6.76. ThanosQuerier.spec.webTLSConfig.privateKey Copy linkLink copied to clipboard!
Reference to the TLS private key for the web server.
Name | Type | Description | Required |
---|---|---|---|
| string | The key of the secret to select from. Must be a valid secret key. | true |
| string | The name of the secret in the object’s namespace to select from. | true |
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.