Chapter 9. Managing metrics

You can collect metrics to monitor how cluster components and your own workloads are performing.

9.1. Understanding metrics

In OpenShift Container Platform 4.17, cluster components are monitored by scraping metrics exposed through service endpoints. You can also configure metrics collection for user-defined projects. Metrics enable you to monitor how cluster components and your own workloads are performing.

You can define the metrics that you want to provide for your own workloads by using Prometheus client libraries at the application level.

In OpenShift Container Platform, metrics are exposed through an HTTP service endpoint under the /metrics canonical name. You can list all available metrics for a service by running a curl query against http://<endpoint>/metrics. For instance, you can expose a route to the prometheus-example-app example application and then run the following to view all of its available metrics:

```
$ curl http://<example_app_endpoint>/metrics
```

Example output

```
# HELP http_requests_total Count of all HTTP requests
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 4
http_requests_total{code="404",method="get"} 2
# HELP version Version information about this binary
# TYPE version gauge
version{version="v0.1.0"} 1
```
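
The output format is line-oriented: lines that start with # carry HELP and TYPE metadata, and every other line holds a metric name, optional labels in braces, and a value. As a rough illustration only, the following Python sketch parses the example output into tuples. The parse_metrics function and both regular expressions are hypothetical helpers written for this example. They assume that label values contain no escaped quotes, and they are not a substitute for the parsers that ship with the Prometheus client libraries.

```python
import re

# Matches sample lines such as: http_requests_total{code="200",method="get"} 4
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one label pair such as: code="200"
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

EXAMPLE_OUTPUT = """\
# HELP http_requests_total Count of all HTTP requests
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 4
http_requests_total{code="404",method="get"} 2
# HELP version Version information about this binary
# TYPE version gauge
version{version="v0.1.0"} 1
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


for name, labels, value in parse_metrics(EXAMPLE_OUTPUT):
    print(name, labels, value)
```

Each label set identifies a separate time series, which is why http_requests_total appears twice in the parsed result.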

Additional resources

Prometheus client library documentation

9.2. Setting up metrics collection for user-defined projects

You can create a ServiceMonitor resource to scrape metrics from a service endpoint in a user-defined project. This assumes that your application uses a Prometheus client library to expose metrics to the /metrics canonical name.

This section describes how to deploy a sample service in a user-defined project and then create a ServiceMonitor resource that defines how that service should be monitored.

9.2.1. Deploying a sample service

To test monitoring of a service in a user-defined project, you can deploy a sample service.

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role or as a user with administrative permissions for the namespace.

Procedure

Create a YAML file for the service configuration. In this example, it is called prometheus-example-app.yaml.

Add the following deployment and service configuration details to the file:

```
apiVersion: v1
kind: Namespace
metadata:
  name: ns1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-example-app
  name: prometheus-example-app
  namespace: ns1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-example-app
  template:
    metadata:
      labels:
        app: prometheus-example-app
    spec:
      containers:
      - image: ghcr.io/rhobs/prometheus-example-app:0.4.2
        imagePullPolicy: IfNotPresent
        name: prometheus-example-app
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus-example-app
  name: prometheus-example-app
  namespace: ns1
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
    name: web
  selector:
    app: prometheus-example-app
  type: ClusterIP
```

This configuration deploys a service named prometheus-example-app in the user-defined ns1 project. This service exposes the custom version metric.

Apply the configuration to the cluster:
```
$ oc apply -f prometheus-example-app.yaml
```
It takes some time to deploy the service.

You can check that the pod is running:
```
$ oc -n ns1 get pod
```

Example output

```
NAME                                      READY     STATUS    RESTARTS   AGE
prometheus-example-app-7857545cb7-sbgwq   1/1       Running   0          81m
```

9.2.2. Specifying how a service is monitored

To use the metrics exposed by your service, you must configure OpenShift Container Platform monitoring to scrape metrics from the /metrics endpoint. You can do this using a ServiceMonitor custom resource definition (CRD) that specifies how a service should be monitored, or a PodMonitor CRD that specifies how a pod should be monitored. The former requires a Service object, while the latter does not, allowing Prometheus to directly scrape metrics from the metrics endpoint exposed by a pod.

This procedure shows you how to create a ServiceMonitor resource for a service in a user-defined project.

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role or the monitoring-edit cluster role.
You have enabled monitoring for user-defined projects.
For this example, you have deployed the prometheus-example-app sample service in the ns1 project.
Note
The prometheus-example-app sample service does not support TLS authentication.

Procedure

Create a new YAML configuration file named example-app-service-monitor.yaml.
Add a ServiceMonitor resource to the YAML file. The following example creates a service monitor named prometheus-example-monitor to scrape metrics exposed by the prometheus-example-app service in the ns1 namespace:
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1 # 1
spec:
  endpoints:
  - interval: 30s
    port: web # 2
    scheme: http
  selector: # 3
    matchLabels:
      app: prometheus-example-app
```
1: Specify a user-defined namespace where your service runs.
2: Specify endpoint ports to be scraped by Prometheus.
3: Configure a selector to match your service based on its metadata labels.
Note
A ServiceMonitor resource in a user-defined namespace can only discover services in the same namespace. That is, the namespaceSelector field of the ServiceMonitor resource is always ignored.
Apply the configuration to the cluster:
```
$ oc apply -f example-app-service-monitor.yaml
```
It takes some time to deploy the ServiceMonitor resource.

Verify that the ServiceMonitor resource was created:
```
$ oc -n <namespace> get servicemonitor
```

Example output

```
NAME                         AGE
prometheus-example-monitor   81m
```
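
For the ServiceMonitor resource to find its target, three things in this section have to line up: the namespace, the label selector, and the port name. The following Python sketch models that matching on plain dictionaries that copy the relevant fields from the sample manifests. It is only an illustration of the rules described above, not how the Prometheus Operator is implemented, and the selector_matches and scraped_ports helpers are hypothetical names.

```python
# Plain-dict copies of the relevant fields from the sample manifests.
SERVICE = {
    "metadata": {
        "name": "prometheus-example-app",
        "namespace": "ns1",
        "labels": {"app": "prometheus-example-app"},
    },
    "spec": {"ports": [{"name": "web", "port": 8080, "targetPort": 8080}]},
}

SERVICE_MONITOR = {
    "metadata": {"name": "prometheus-example-monitor", "namespace": "ns1"},
    "spec": {
        "endpoints": [{"interval": "30s", "port": "web", "scheme": "http"}],
        "selector": {"matchLabels": {"app": "prometheus-example-app"}},
    },
}


def selector_matches(match_labels, object_labels):
    """True if every key/value pair in match_labels is present in object_labels."""
    return all(object_labels.get(key) == value for key, value in match_labels.items())


def scraped_ports(service_monitor, service):
    """Return the Service port entries the monitor would target, or an empty list."""
    # A monitor in a user-defined namespace only discovers services in that namespace.
    if service_monitor["metadata"]["namespace"] != service["metadata"]["namespace"]:
        return []
    selector = service_monitor["spec"]["selector"].get("matchLabels", {})
    if not selector_matches(selector, service["metadata"].get("labels", {})):
        return []
    # Endpoints refer to Service ports by name, not by number.
    wanted = {endpoint["port"] for endpoint in service_monitor["spec"]["endpoints"]}
    return [port for port in service["spec"]["ports"] if port.get("name") in wanted]


print(scraped_ports(SERVICE_MONITOR, SERVICE))
```

If the get servicemonitor command lists the resource but no metrics arrive, a mismatch in one of these three fields is a reasonable first thing to check.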

9.2.3. Example service endpoint authentication settings

You can configure authentication for service endpoints for user-defined project monitoring by using ServiceMonitor and PodMonitor custom resource definitions (CRDs).

The following samples show different authentication settings for a ServiceMonitor resource. Each sample shows how to configure a corresponding Secret object that contains authentication credentials and other relevant settings.

9.2.3.1. Sample YAML authentication with a bearer token

The following sample shows bearer token settings for a Secret object named example-bearer-auth in the ns1 namespace:

Example bearer token secret

```
apiVersion: v1
kind: Secret
metadata:
  name: example-bearer-auth
  namespace: ns1
stringData:
  token: <authentication_token> # 1
```

1: Specify an authentication token.

The following sample shows bearer token authentication settings for a ServiceMonitor CRD. The example uses a Secret object named example-bearer-auth:

Example bearer token authentication settings

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
  - authorization:
      credentials:
        key: token # 1
        name: example-bearer-auth # 2
    port: web
  selector:
    matchLabels:
      app: prometheus-example-app
```

1: The key that contains the authentication token in the specified Secret object.
2: The name of the Secret object that contains the authentication credentials.

Important

Do not use bearerTokenFile to configure a bearer token. If you use the bearerTokenFile configuration, the ServiceMonitor resource is rejected.

9.2.3.2. Sample YAML for Basic authentication

The following sample shows Basic authentication settings for a Secret object named example-basic-auth in the ns1 namespace:

Example Basic authentication secret

```
apiVersion: v1
kind: Secret
metadata:
  name: example-basic-auth
  namespace: ns1
stringData:
  user: <basic_username> # 1
  password: <basic_password> # 2
```

1: Specify a username for authentication.
2: Specify a password for authentication.

The following sample shows Basic authentication settings for a ServiceMonitor CRD. The example uses a Secret object named example-basic-auth:

Example Basic authentication settings

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
  - basicAuth:
      username:
        key: user # 1
        name: example-basic-auth # 2
      password:
        key: password # 3
        name: example-basic-auth # 4
    port: web
  selector:
    matchLabels:
      app: prometheus-example-app
```

1: The key that contains the username in the specified Secret object.
2, 4: The name of the Secret object that contains the Basic authentication credentials.
3: The key that contains the password in the specified Secret object.
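
At the HTTP level, bearer token and Basic authentication settings both end up as an Authorization header on each scrape request. The following Python sketch shows what those header values look like, assuming the standard HTTP Bearer and Basic schemes. The bearer_header and basic_header helpers are hypothetical names for this example, and the credentials are placeholders. The sketch can be handy when you want to reproduce a scrape manually to check that an endpoint accepts the credentials stored in the Secret object.

```python
import base64


def bearer_header(token):
    """Header sent when the endpoint is configured with a bearer token."""
    return {"Authorization": f"Bearer {token}"}


def basic_header(username, password):
    """Header sent for Basic authentication: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder credentials for illustration only.
print(bearer_header("example-token"))
print(basic_header("user", "pass"))
```

Base64 is an encoding, not encryption, so Basic credentials are only protected when the endpoint is served over TLS.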

9.2.3.3. Sample YAML authentication with OAuth 2.0

The following sample shows OAuth 2.0 settings for a Secret object named example-oauth2 in the ns1 namespace:

Example OAuth 2.0 secret

```
apiVersion: v1
kind: Secret
metadata:
  name: example-oauth2
  namespace: ns1
stringData:
  id: <oauth2_id> # 1
  secret: <oauth2_secret> # 2
```

1: Specify an OAuth 2.0 ID.
2: Specify an OAuth 2.0 secret.

The following sample shows OAuth 2.0 authentication settings for a ServiceMonitor CRD. The example uses a Secret object named example-oauth2:

Example OAuth 2.0 authentication settings

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-example-monitor
  namespace: ns1
spec:
  endpoints:
  - oauth2:
      clientId:
        secret:
          key: id # 1
          name: example-oauth2 # 2
      clientSecret:
        key: secret # 3
        name: example-oauth2 # 4
      tokenUrl: https://example.com/oauth2/token # 5
    port: web
  selector:
    matchLabels:
      app: prometheus-example-app
```

1: The key that contains the OAuth 2.0 ID in the specified Secret object.
2, 4: The name of the Secret object that contains the OAuth 2.0 credentials.
3: The key that contains the OAuth 2.0 secret in the specified Secret object.
5: The URL used to fetch a token with the specified clientId and clientSecret.

9.3. Viewing a list of available metrics

As a cluster administrator or as a user with view permissions for all projects, you can view a list of metrics available in a cluster and output the list in JSON format.

Prerequisites

You are a cluster administrator, or you have access to the cluster as a user with the cluster-monitoring-view cluster role.
You have installed the OpenShift Container Platform CLI (oc).
You have obtained the OpenShift Container Platform API route for Thanos Querier.
You are able to get a bearer token by using the oc whoami -t command.
Important
You can only use bearer token authentication to access the Thanos Querier API route.

Procedure

If you have not obtained the OpenShift Container Platform API route for Thanos Querier, run the following command:
```
$ oc get routes -n openshift-monitoring thanos-querier -o jsonpath='{.status.ingress[0].host}'
```
Retrieve a list of metrics in JSON format from the Thanos Querier API route by running the following command. This command uses oc to authenticate with a bearer token.
```
$ curl -k -H "Authorization: Bearer $(oc whoami -t)" https://<thanos_querier_route>/api/v1/metadata
```
Replace <thanos_querier_route> with the OpenShift Container Platform API route for Thanos Querier.
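
The JSON that this endpoint returns can be long. As a minimal sketch, the following Python code builds the same request and reduces a response body to a sorted list of metric names. It assumes the response follows the standard Prometheus HTTP API metadata format, with a status field and a data object keyed by metric name. The build_metadata_request and metric_names helpers are hypothetical, and SAMPLE_RESPONSE is a hand-written illustration based on the sample application, not output captured from a cluster.

```python
import json
import urllib.request

# Hand-written example body in the Prometheus metadata response format.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "version": [
            {"type": "gauge", "help": "Version information about this binary", "unit": ""}
        ],
        "http_requests_total": [
            {"type": "counter", "help": "Count of all HTTP requests", "unit": ""}
        ],
    },
})


def build_metadata_request(route_host, token):
    """Build, but do not send, the request for the metadata endpoint."""
    url = f"https://{route_host}/api/v1/metadata"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def metric_names(response_body):
    """Return the sorted metric names from a metadata response body (JSON text)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return sorted(payload.get("data", {}))


print(metric_names(SAMPLE_RESPONSE))
```

To send the request, pass the object returned by build_metadata_request to urllib.request.urlopen, using the route host and the token from oc whoami -t. Unlike the curl command with the -k option, this sketch does not skip TLS certificate verification, so the certificate authority that signed the route certificate must be trusted on the machine that runs it.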

9.4. Querying metrics

The OpenShift Container Platform monitoring dashboard enables you to run Prometheus Query Language (PromQL) queries to examine metrics visualized on a plot. This functionality provides information about the state of a cluster and any user-defined workloads that you are monitoring.

As a cluster administrator, you can query metrics for all core OpenShift Container Platform and user-defined projects.

As a developer, you must specify a project name when querying metrics. You must have the required privileges to view metrics for the selected project.
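
As an illustration of what a custom PromQL expression can look like, the PROMQL string in the following Python sketch computes the per-second request rate of the sample application over five minutes, grouped by response code. You can paste the expression itself into the Expression field of the Metrics UI. The rest of the sketch goes beyond the console workflow and rests on assumptions: it shows how the same expression could be sent to the Prometheus-compatible /api/v1/query endpoint, assumed here to be reachable on the Thanos Querier route, and how an instant query result could be read. The helper names and the sample payload are hypothetical.

```python
import urllib.parse

# Example expression for the http_requests_total counter of the sample app.
PROMQL = 'sum by (code) (rate(http_requests_total{namespace="ns1"}[5m]))'

# Hand-written payload in the Prometheus instant query response format.
SAMPLE_PAYLOAD = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"code": "200"}, "value": [1700000000, "0.5"]},
            {"metric": {"code": "404"}, "value": [1700000000, "0.25"]},
        ],
    },
}


def build_query_url(route_host, promql):
    """Return the instant query URL with the expression URL-encoded."""
    query = urllib.parse.urlencode({"query": promql})
    return f"https://{route_host}/api/v1/query?{query}"


def vector_samples(payload):
    """Return (labels, value) pairs from an instant vector query response."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']!r}")
    # Each value is a [timestamp, "string value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


print(build_query_url("<thanos_querier_route>", PROMQL))
print(vector_samples(SAMPLE_PAYLOAD))
```

The rate function is applied before the sum so that counter resets in individual series are handled per series.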

9.4.1. Querying metrics for all projects as a cluster administrator

As a cluster administrator or as a user with view permissions for all projects, you can access metrics for all default OpenShift Container Platform and user-defined projects in the Metrics UI.

The Metrics UI includes predefined queries, for example, CPU, memory, bandwidth, or network packet for all projects. You can also run custom Prometheus Query Language (PromQL) queries.

Prerequisites

You have access to the cluster as a user with the cluster-admin cluster role or with view permissions for all projects.
You have installed the OpenShift CLI (oc).

Procedure

In the Administrator perspective of the OpenShift Container Platform web console, click Observe and go to the Metrics tab.

To add one or more queries, perform any of the following actions:

Option	Description
Select an existing query.	From the Select query drop-down list, select an existing query.
Create a custom query.	Add your Prometheus Query Language (PromQL) query to the Expression field. As you type a PromQL expression, autocomplete suggestions appear in a drop-down list. These suggestions include functions, metrics, labels, and time tokens. Use the keyboard arrows to select one of these suggested items and then press Enter to add the item to your expression. Move your mouse pointer over a suggested item to view a brief description of that item.
Add multiple queries.	Click Add query.
Duplicate an existing query.	Click the options menu next to the query, then choose Duplicate query.
Disable a query from being run.	Click the options menu next to the query and choose Disable query.

To run queries that you created, click Run queries. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
Note
- When drawing time series graphs, queries that operate on large amounts of data might time out or overload the browser. To avoid this, click Hide graph and calibrate your query by using only the metrics table. Then, after finding a feasible query, enable the plot to draw the graphs.
- By default, the query table shows an expanded view that lists every metric and its current value. Click the ˅ down arrowhead to minimize the expanded view for a query.
Optional: The page URL now contains the queries you ran. To use this set of queries again in the future, save this URL.

Explore the visualized metrics. Initially, all metrics from all enabled queries are shown on the plot. Select which metrics are shown by performing any of the following actions:

Option	Description
Hide all metrics from a query.	Click the options menu for the query and click Hide all series.
Hide a specific metric.	Go to the query table and click the colored square near the metric name.
Zoom into the plot and change the time range.	Perform one of the following actions: Visually select the time range by clicking and dragging on the plot horizontally. Use the menu to select the time range.
Reset the time range.	Click Reset zoom.
Display outputs for all queries at a specific point in time.	Hover over the plot at the point you are interested in. The query outputs appear in a pop-up box.
Hide the plot.	Click Hide graph.

Additional resources

For more information about creating PromQL queries, see the Prometheus query documentation.

9.4.2. Querying metrics for user-defined projects as a developer

You can access metrics for a user-defined project as a developer or as a user with view permissions for the project.

The Metrics UI includes predefined queries, for example, CPU, memory, bandwidth, or network packet. These queries are restricted to the selected project. You can also run custom Prometheus Query Language (PromQL) queries for the project.

Note

Developers can only use the Developer perspective and not the Administrator perspective. As a developer, you can only query metrics for one project at a time.

Prerequisites

You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.
You have enabled monitoring for user-defined projects.
You have deployed a service in a user-defined project.
You have created a ServiceMonitor custom resource definition (CRD) for the service to define how the service is monitored.

Procedure

In the Developer perspective of the OpenShift Container Platform web console, click Observe and go to the Metrics tab.
Select the project that you want to view metrics for in the Project: list.

To add one or more queries, perform any of the following actions:

Option	Description
Select an existing query.	From the Select query drop-down list, select an existing query.
Create a custom query.	Add your Prometheus Query Language (PromQL) query to the Expression field. As you type a PromQL expression, autocomplete suggestions appear in a drop-down list. These suggestions include functions, metrics, labels, and time tokens. Use the keyboard arrows to select one of these suggested items and then press Enter to add the item to your expression. Move your mouse pointer over a suggested item to view a brief description of that item.
Add multiple queries.	Click Add query.
Duplicate an existing query.	Click the options menu next to the query, then choose Duplicate query.
Disable a query from being run.	Click the options menu next to the query and choose Disable query.

To run queries that you created, click Run queries. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
Note
- When drawing time series graphs, queries that operate on large amounts of data might time out or overload the browser. To avoid this, click Hide graph and calibrate your query by using only the metrics table. Then, after finding a feasible query, enable the plot to draw the graphs.
- By default, the query table shows an expanded view that lists every metric and its current value. Click the ˅ down arrowhead to minimize the expanded view for a query.
Optional: The page URL now contains the queries you ran. To use this set of queries again in the future, save this URL.

Explore the visualized metrics. Initially, all metrics from all enabled queries are shown on the plot. Select which metrics are shown by performing any of the following actions:

Option	Description
Hide all metrics from a query.	Click the options menu for the query and click Hide all series.
Hide a specific metric.	Go to the query table and click the colored square near the metric name.
Zoom into the plot and change the time range.	Perform one of the following actions: Visually select the time range by clicking and dragging on the plot horizontally. Use the menu to select the time range.
Reset the time range.	Click Reset zoom.
Display outputs for all queries at a specific point in time.	Hover over the plot at the point you are interested in. The query outputs appear in a pop-up box.
Hide the plot.	Click Hide graph.

Additional resources

For more information about creating PromQL queries, see the Prometheus query documentation.

9.5. Getting detailed information about a metrics target

In the Administrator perspective in the OpenShift Container Platform web console, you can use the Metrics targets page to view, search, and filter the endpoints that are currently targeted for scraping, which helps you to identify and troubleshoot problems. For example, you can view the current status of targeted endpoints to see when OpenShift Container Platform Monitoring is not able to scrape metrics from a targeted component.

The Metrics targets page shows targets for default OpenShift Container Platform projects and for user-defined projects.

Prerequisites

You have access to the cluster as an administrator for the project for which you want to view metrics targets.

Procedure

In the Administrator perspective, select Observe → Targets. The Metrics targets page opens with a list of all service endpoint targets that are being scraped for metrics.
This page shows details about targets for default OpenShift Container Platform and user-defined projects. This page lists the following information for each target:
- Service endpoint URL being scraped
- ServiceMonitor component being monitored
- The up or down status of the target
- Namespace
- Last scrape time
- Duration of the last scrape

Optional: The list of metrics targets can be long. To find a specific target, do any of the following:

Option	Description
Filter the targets by status and source.	Select filters in the Filter list. The following filtering options are available: Status filters: Up. The target is currently up and being actively scraped for metrics. Down. The target is currently down and not being scraped for metrics. Source filters: Platform. Platform-level targets relate only to default OpenShift Container Platform projects. These projects provide core OpenShift Container Platform functionality. User. User targets relate to user-defined projects. These projects are user-created and can be customized.
Search for a target by name or label.	Enter a search term in the Text or Label field next to the search box.
Sort the targets.	Click one or more of the Endpoint Status, Namespace, Last Scrape, and Scrape Duration column headers.

Click the URL in the Endpoint column for a target to navigate to its Target details page. This page provides information about the target, including the following:
- The endpoint URL being scraped for metrics
- The current Up or Down status of the target
- A link to the namespace
- A link to the ServiceMonitor details
- Labels attached to the target
- The most recent time that the target was scraped for metrics
