Chapter 2. Remote health monitoring with connected clusters

2.1. About remote health monitoring

OpenShift Container Platform collects telemetry and configuration data about your cluster and reports it to Red Hat by using the Telemeter Client and the Insights Operator. The data that is provided to Red Hat enables the benefits outlined in this document.

A cluster that reports data to Red Hat through Telemetry and the Insights Operator is considered a connected cluster.

Telemetry is the term that Red Hat uses to describe the information being sent to Red Hat by the OpenShift Container Platform Telemeter Client. Lightweight attributes are sent from connected clusters to Red Hat to enable subscription management automation, monitor the health of clusters, assist with support, and improve customer experience.

The Insights Operator gathers OpenShift Container Platform configuration data and sends it to Red Hat. The data is used to produce insights about potential issues that a cluster might be exposed to. These insights are communicated to cluster administrators on cloud.redhat.com/openshift.

This document provides more information about these two processes.

Telemetry and Insights Operator benefits

Telemetry and the Insights Operator enable the following benefits for end-users:

Enhanced identification and resolution of issues. Events that might seem normal to an end-user can be observed by Red Hat from a broader perspective across a fleet of clusters. Some issues can be more rapidly identified from this point of view and resolved without an end-user needing to open a support case or file a Bugzilla.
Advanced release management. OpenShift Container Platform offers the candidate, fast, and stable release channels, which enable you to choose an update strategy. The graduation of a release from fast to stable is dependent on the success rate of updates and on the events seen during upgrades. With the information provided by connected clusters, Red Hat can improve the quality of releases to stable channels and react more rapidly to issues found in the fast channels.
Targeted prioritization of new features and functionality. The data collected provides insights about which areas of OpenShift Container Platform are used most. With this information, Red Hat can focus on developing the new features and functionality that have the greatest impact for our customers.
A streamlined support experience. You can provide a cluster ID for a connected cluster when creating a support ticket on the Red Hat Customer Portal. This enables Red Hat to deliver a streamlined support experience that is specific to your cluster, by using the connected information. This document provides more information about that enhanced support experience.
Predictive analytics. The insights displayed for your cluster on cloud.redhat.com/openshift are enabled by the information collected from connected clusters. Red Hat is investing in applying deep learning, machine learning, and artificial intelligence automation to help identify issues that OpenShift Container Platform clusters are exposed to.

2.1.1. About Telemetry

Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. The Telemeter Client fetches the metric values every four minutes and thirty seconds and uploads the data to Red Hat. These metrics are described in this document.

This stream of data is used by Red Hat to monitor the clusters in real time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OpenShift Container Platform upgrades to customers in a way that minimizes service impact and continuously improves the upgrade experience.

This debugging information is available to Red Hat Support and Engineering teams with the same restrictions as accessing data reported through support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use.

Additional resources

See the OpenShift Container Platform update documentation for more information about updating or upgrading a cluster.

2.1.1.1. Information collected by Telemetry

The following information is collected by Telemetry:

The unique random identifier that is generated during an installation
Version information, including the OpenShift Container Platform cluster version and installed update details that are used to determine update version availability
Update information, including the number of updates available per cluster, the channel and image repository used for an update, update progress information, and the number of errors that occur in an update
The name of the provider platform that OpenShift Container Platform is deployed on and the data center location
Sizing information about clusters, machine types, and machines, including the number of CPU cores and the amount of RAM used for each
The number of etcd members and the number of objects stored in the etcd cluster
The OpenShift Container Platform framework components installed in a cluster and their condition and status
Usage information about components, features, and extensions
Usage details about Technology Previews and unsupported configurations
Information about degraded software
Information about nodes that are marked as NotReady
Events for all namespaces listed as "related objects" for a degraded Operator
Configuration details that help Red Hat Support to provide beneficial support for customers. This includes node configuration at the cloud infrastructure level, host names, IP addresses, Kubernetes pod names, namespaces, and services.
Information about the validity of certificates

Telemetry does not collect identifying information such as user names or passwords. Red Hat does not intend to collect personal information. If Red Hat discovers that personal information has been inadvertently received, Red Hat will delete such information. To the extent that any telemetry data constitutes personal data, please refer to the Red Hat Privacy Statement for more information about Red Hat’s privacy practices.

Additional resources

See Showing data collected by Telemetry for details about how to list the attributes that Telemetry gathers from Prometheus in OpenShift Container Platform.
See the upstream cluster-monitoring-operator source code for a list of the attributes that Telemetry gathers from Prometheus.
Telemetry is installed and enabled by default. If you need to opt out of remote health reporting, see Opting out of remote health reporting.

2.1.2. About the Insights Operator

The Insights Operator periodically gathers configuration and component failure status and, by default, reports that data to Red Hat every two hours. This information enables Red Hat to assess configuration and deeper failure data than is reported through Telemetry.

Users of OpenShift Container Platform can display the report of each cluster in Red Hat OpenShift Cluster Manager. If any issues have been identified, Insights provides further details and, if available, steps on how to solve a problem.

The Insights Operator does not collect identifying information, such as user names, passwords, or certificates. See Red Hat Insights Data & Application Security for information about Red Hat Insights data collection and controls.

Red Hat uses all connected cluster information to:

Proactively identify potential cluster issues and provide a solution and preventive actions in Red Hat OpenShift Cluster Manager
Improve OpenShift Container Platform by providing aggregated and critical information to product and support teams
Make OpenShift Container Platform more intuitive

Additional resources

The Insights Operator is installed and enabled by default. If you need to opt out of remote health reporting, see Opting out of remote health reporting.

2.1.2.1. Information collected by the Insights Operator

The following information is collected by the Insights Operator:

General information about your cluster and its components to identify issues that are specific to your OpenShift Container Platform version and environment
Configuration files, such as the image registry configuration, of your cluster to determine incorrect settings and issues that are specific to parameters you set
Errors that occur in the cluster components
Progress information of running updates, and the status of any component upgrades
Details of the platform that OpenShift Container Platform is deployed on, such as Amazon Web Services, and the region that the cluster is located in
If an Operator reports an issue, information is collected about core OpenShift Container Platform pods in the openshift-* and kube-* projects. This includes state, resource, security context, volume information, and more.

Additional resources

See Showing data collected by the Insights Operator for details about how to review the data that is collected by the Insights Operator.
The Insights Operator source code is available for review and contribution. See the Insights Operator upstream project for a list of the items collected by the Insights Operator.

2.1.3. Understanding Telemetry and Insights Operator data flow

The Telemeter Client collects selected time series data from the Prometheus API. The time series data is uploaded to api.openshift.com every four minutes and thirty seconds for processing.

The Insights Operator gathers selected data from the Kubernetes API and the Prometheus API into an archive. The archive is uploaded to cloud.redhat.com every two hours for processing. The Insights Operator also downloads the latest Insights analysis from cloud.redhat.com. This is used to populate the Insights status pop-up that is included in the Overview page in the OpenShift Container Platform web console.
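
As a rough sense of scale, the default intervals described above translate into a number of uploads per day. The following is a minimal arithmetic sketch. The variable and function names are illustrative, and it assumes that the default intervals are unchanged and that no uploads are skipped or retried.

```shell
# Default upload intervals from the text above, in seconds.
telemeter_interval=$((4 * 60 + 30))    # four minutes and thirty seconds
insights_interval=$((2 * 60 * 60))     # two hours
seconds_per_day=$((24 * 60 * 60))

# Illustrative helper: number of whole upload intervals that fit in one day.
uploads_per_day() {
  echo $(( seconds_per_day / $1 ))
}

echo "Telemeter Client uploads per day:  $(uploads_per_day "$telemeter_interval")"
echo "Insights Operator uploads per day: $(uploads_per_day "$insights_interval")"
```

Under these assumptions, the Telemeter Client uploads about 320 times per day and the Insights Operator about 12 times per day.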

All of the communication with Red Hat occurs over encrypted channels by using Transport Layer Security (TLS) and mutual certificate authentication. All of the data is encrypted in transit and at rest.

Access to the systems that handle customer data is controlled through multi-factor authentication and strict authorization controls. Access is granted on a need-to-know basis and is limited to required operations.

Figure: Telemetry and Insights Operator data flow

Additional resources

See About Cluster Monitoring for more information about the OpenShift Container Platform monitoring stack.
See Configuring your firewall for details about configuring a firewall and enabling endpoints for Telemetry and Insights.

2.1.4. Additional details about how remote health monitoring data is used

The information collected to enable remote health monitoring is detailed in Information collected by Telemetry and Information collected by the Insights Operator.

As further described in the preceding sections of this document, Red Hat collects data about your use of the Red Hat Product(s) for purposes such as providing support and upgrades, optimizing performance or configuration, minimizing service impacts, identifying and remediating threats, troubleshooting, improving the offerings and user experience, responding to issues, and for billing purposes if applicable.

Collection safeguards

Red Hat employs technical and organizational measures designed to protect the telemetry and configuration data.

Sharing

Red Hat may share the data collected through Telemetry and the Insights Operator internally within Red Hat to improve your user experience. Red Hat may share telemetry and configuration data with its business partners in an aggregated form that does not identify customers to help the partners better understand their markets and their customers’ use of Red Hat offerings or to ensure the successful integration of products jointly supported by those partners.

Third party service providers

Red Hat may engage certain service providers to assist in the collection and storage of the telemetry and configuration data.

User control / enabling and disabling telemetry and configuration data collection

You may disable OpenShift Container Platform Telemetry and the Insights Operator by following the instructions in Opting out of remote health reporting.

2.2. Showing data collected by remote health monitoring

As an administrator, you can review the metrics collected by Telemetry and the Insights Operator.

2.2.1. Showing data collected by Telemetry

You can see the cluster and components time series data captured by Telemetry.

Prerequisites

Install the OpenShift CLI (oc).
You must log in to the cluster with a user that has either the cluster-admin role or the cluster-monitoring-view role.

Procedure

Find the URL for the Prometheus service that runs in the OpenShift Container Platform cluster:

$ oc get route prometheus-k8s -n openshift-monitoring -o jsonpath="{.spec.host}"

Navigate to the URL.

Enter this query in the Expression input box and press Execute:

{__name__=~"cluster:usage:.*|count:up0|count:up1|cluster_version|cluster_version_available_updates|cluster_operator_up|cluster_operator_conditions|cluster_version_payload|cluster_installer|cluster_infrastructure_provider|cluster_feature_set|instance:etcd_object_counts:sum|ALERTS|code:apiserver_request_total:rate:sum|cluster:capacity_cpu_cores:sum|cluster:capacity_memory_bytes:sum|cluster:cpu_usage_cores:sum|cluster:memory_usage_bytes:sum|openshift:cpu_usage_cores:sum|openshift:memory_usage_bytes:sum|workload:cpu_usage_cores:sum|workload:memory_usage_bytes:sum|cluster:virt_platform_nodes:sum|cluster:node_instance_type_count:sum|cnv:vmi_status_running:count|node_role_os_version_machine:cpu_capacity_cores:sum|node_role_os_version_machine:cpu_capacity_sockets:sum|subscription_sync_total|csv_succeeded|csv_abnormal|ceph_cluster_total_bytes|ceph_cluster_total_used_raw_bytes|ceph_health_status|job:ceph_osd_metadata:count|job:kube_pv:count|job:ceph_pools_iops:total|job:ceph_pools_iops_bytes:total|job:ceph_versions_running:count|job:noobaa_total_unhealthy_buckets:sum|job:noobaa_bucket_count:sum|job:noobaa_total_object_count:sum|noobaa_accounts_num|noobaa_total_usage|console_url|cluster:network_attachment_definition_instances:max|cluster:network_attachment_definition_enabled_instance_up:max|insightsclient_request_send_total|cam_app_workload_migrations|cluster:apiserver_current_inflight_requests:sum:max_over_time:2m|cluster:telemetry_selected_series:count",alertstate=~"firing|"}

This query replicates the request that Telemetry makes against a running OpenShift Container Platform cluster’s Prometheus service and returns the full set of time series captured by Telemetry.
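
The same data can also be requested from the command line through the Prometheus HTTP API instead of the web UI. The following is a minimal sketch, not a documented procedure. The telemetry_query function name is hypothetical, and the sketch assumes that curl is installed, that your workstation trusts the certificate served by the route, and that the route accepts the bearer token returned by oc whoami -t for your user.

```shell
# Hypothetical helper: run an instant query against the cluster Prometheus route.
# The first argument is the query expression to send.
telemetry_query() {
  local host token
  # Same route lookup as in the procedure above.
  host=$(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath="{.spec.host}")
  # Token of the currently logged-in user (assumed to be accepted by the route).
  token=$(oc whoami -t)
  # -G sends the URL-encoded query as a GET parameter.
  curl -sS -G -H "Authorization: Bearer ${token}" \
    --data-urlencode "query=$1" \
    "https://${host}/api/v1/query"
}

# Example: pass any selector, such as a shortened form of the query above.
# telemetry_query '{__name__=~"cluster_version|cluster_operator_up"}'
```

The response is JSON, so it can be saved to a file or piped into other tools for review.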

2.2.2. Showing data collected by the Insights Operator

You can review the data that is collected by the Insights Operator.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.

Procedure

Find the name of the currently running pod for the Insights Operator:

$ INSIGHTS_OPERATOR_POD=$(oc get pods --namespace=openshift-insights -o custom-columns=:metadata.name --no-headers  --field-selector=status.phase=Running)

Copy the recent data archives collected by the Insights Operator:

$ oc cp openshift-insights/$INSIGHTS_OPERATOR_POD:/var/lib/insights-operator ./insights-data

The recent Insights Operator archives are now available in the insights-data directory.
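
To inspect what was copied, you can list the files in the most recent archive. The following is a minimal sketch that assumes the archives in insights-data are gzip-compressed tar files with a .tar.gz extension. The newest_archive helper is hypothetical and is not part of oc.

```shell
# Hypothetical helper: print the most recently modified .tar.gz file in a directory.
newest_archive() {
  local dir="$1" newest="" f
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue              # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example: list the file names stored in the newest archive without extracting it.
# tar -tzf "$(newest_archive ./insights-data)"
```

Listing the archive first lets you review which files were gathered before you extract anything.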

2.3. Opting out of remote health reporting

You may choose to opt out of reporting health and usage data for your cluster.

To opt out of remote health reporting, you must:

Modify the global cluster pull secret to disable remote health reporting.
Update the cluster to use this modified pull secret.

2.3.1. Consequences of disabling remote health reporting

In OpenShift Container Platform, customers can opt out of reporting usage information. However, connected clusters allow Red Hat to react more quickly to problems and better support our customers, as well as better understand how product upgrades impact clusters. Connected clusters also help to simplify the subscription and entitlement process and enable the Red Hat OpenShift Cluster Manager service to provide an overview of your clusters and their subscription status.

Red Hat strongly recommends leaving health and usage reporting enabled for pre-production and test clusters even if it is necessary to opt out for production clusters. This allows Red Hat to be a participant in qualifying OpenShift Container Platform in your environments and react more rapidly to product issues.

Some of the consequences of opting out of having a connected cluster are:

Red Hat will not be able to monitor the success of product upgrades or the health of your clusters without a support case being opened.
Red Hat will not be able to use configuration data to better triage customer support cases and identify which configurations our customers find important.
The Red Hat OpenShift Cluster Manager will not show data about your clusters including health and usage information.
Your subscription entitlement information must be manually entered via cloud.redhat.com without the benefit of automatic usage reporting.

In restricted networks, Telemetry and Insights data can still be reported through appropriate configuration of your proxy.

2.3.2. Modifying the global cluster pull secret to disable remote health reporting

You can modify your existing global cluster pull secret to disable remote health reporting. This disables both Telemetry and the Insights Operator.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.

Procedure

Download the global cluster pull secret to your local file system:

```
$ oc extract secret/pull-secret -n openshift-config --to=.
```

In a text editor, edit the .dockerconfigjson file that was downloaded.

Remove the cloud.openshift.com JSON entry, for example:

"cloud.openshift.com":{"auth":"<hash>","email":"<email_address>"}

Save the file.

You can now update your cluster to use this modified pull secret.
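
The manual edit described in this procedure can also be scripted. The following is a minimal sketch that assumes the jq utility is installed, which this document does not otherwise require, and that the registry entries sit under a top-level auths key in .dockerconfigjson. The names strip_cloud_auth and pull-secret-modified.json are hypothetical.

```shell
# Hypothetical helper: print the pull secret in file $1 without the
# cloud.openshift.com entry. All other registry entries are left unchanged.
strip_cloud_auth() {
  jq 'del(.auths["cloud.openshift.com"])' "$1"
}

# Example: write the result to a new file and keep the downloaded file as a backup.
# strip_cloud_auth .dockerconfigjson > pull-secret-modified.json
```

Writing to a separate file keeps the original pull secret available in case you need to restore remote health reporting later.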

2.3.3. Updating the global cluster pull secret

You can update the global pull secret for your cluster.

Warning

Cluster resources must adjust to the new pull secret, which can temporarily limit the usability of the cluster.

Warning

Updating the global pull secret will cause node reboots while the Machine Config Operator (MCO) syncs the changes.

Prerequisites

You have a new or modified pull secret file to upload.
You have access to the cluster as a user with the cluster-admin role.

Procedure

Enter the following command to update the global pull secret for your cluster:

$ oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull-secret-location>

Replace <pull-secret-location> with the path to the new pull secret file.

This update is rolled out to all nodes, which can take some time depending on the size of your cluster. During this time, nodes are drained and pods are rescheduled on the remaining nodes.
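
To follow the rollout from the command line, you can watch the machine config pools. This is a sketch under assumptions rather than a documented step: the function names are hypothetical, the default timeout of 30m is an arbitrary choice, and it assumes that the pools report the Updated condition only after the new configuration has been applied. If the wait returns immediately after you change the secret, the rollout might not have started yet, so check the pool status again.

```shell
# Hypothetical helper: show the current state of every machine config pool.
show_pool_status() {
  oc get machineconfigpool
}

# Hypothetical helper: block until every pool reports the Updated condition,
# or until the timeout (first argument, default 30m) expires.
wait_for_pools_updated() {
  oc wait machineconfigpool --all --for=condition=Updated --timeout="${1:-30m}"
}

# Example: check the pools once, then wait up to one hour for the rollout.
# show_pool_status
# wait_for_pools_updated 60m
```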

2.4. Using Insights to identify issues with your cluster

Insights repeatedly analyzes the data that the Insights Operator sends. Users of OpenShift Container Platform can display the report on the Insights tab of each cluster in Red Hat OpenShift Cluster Manager.

2.4.1. Displaying potential issues with your cluster

This section describes how to display the Insights report in the Red Hat OpenShift Cluster Manager.

Note that Insights repeatedly analyzes your cluster and shows the latest results. These results can change, for example, if you fix an issue or if a new issue is detected.

Prerequisites

Your cluster is registered in the Red Hat OpenShift Cluster Manager.
Remote health reporting is enabled, which is the default.
You are logged in to the Red Hat OpenShift Cluster Manager.

Procedure

Click the Clusters menu in the left pane.
Click the cluster’s name to display the details of the cluster.
Open the Insights tab of the cluster.
Depending on the result, the tab displays one of the following:
- Your cluster passed all health checks, if Insights did not identify any issues.
- A list of issues Insights has detected, prioritized by risk (low, moderate, important, and critical).
- No health checks to display, if Insights has not yet analyzed the cluster. The analysis starts shortly after the cluster has been installed and connected to the internet.
If any issues are displayed on the tab, click the > icon in front of the entry for further details.
Depending on the issue, the details can also contain a link to a Red Hat Knowledge Base article. For details and information on how to solve the problem, click How to remediate this issue.
