Chapter 3. Remote health monitoring with connected clusters


3.1. About remote health monitoring

OpenShift Container Platform collects anonymized aggregated information about the health, usage, and size of clusters and reports it to Red Hat via two integrated components: Telemetry and the Insights Operator. This information allows Red Hat to improve OpenShift Container Platform and to react to issues that impact customers more quickly. This also simplifies the subscription and entitlement process for Red Hat customers and enables the Red Hat OpenShift Cluster Manager service to provide an overview of your clusters and their health and subscription status.

A cluster that reports data to Red Hat via Telemetry and the Insights Operator is considered a connected cluster.

3.1.1. About Telemetry

Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. These metrics are sent continuously and describe:

  • The size of an OpenShift Container Platform cluster
  • The health and status of OpenShift Container Platform components
  • The health and status of any upgrade being performed
  • Limited usage information about OpenShift Container Platform components and features
  • Summary info about alerts reported by the cluster monitoring component

This continuous stream of data is used by Red Hat to monitor the health of clusters in real time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OpenShift Container Platform upgrades to customers so as to minimize service impact and continuously improve the upgrade experience.

This debugging information is available to Red Hat Support and engineering teams with the same restrictions as accessing data reported via support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use. None of the information is shared with third parties.

3.1.1.1. Information collected by Telemetry

Primary information collected by Telemetry includes:

  • The number of updates available per cluster
  • Channel and image repository used for an update
  • The number of errors that occurred during an update
  • Progress information of running updates
  • The number of machines per cluster
  • The number of CPU cores and size of RAM of the machines
  • The number of members in the etcd cluster and number of objects currently stored in the etcd cluster
  • The number of CPU cores and RAM used per machine type - infra or master
  • The number of CPU cores and RAM used per cluster
  • Use of OpenShift Container Platform framework components per cluster
  • The version of the OpenShift Container Platform cluster
  • Health, condition, and status for any OpenShift Container Platform framework component that is installed on the cluster, for example Cluster Version Operator, Cluster Monitoring, Image Registry, and Elasticsearch for Logging
  • A unique random identifier that is generated during installation
  • The name of the platform that OpenShift Container Platform is deployed on, such as Amazon Web Services

Telemetry does not collect identifying information such as user names, passwords, or the names or addresses of user resources.

3.1.2. About the Insights Operator

The Insights Operator periodically gathers anonymized configuration and component failure status and reports that to Red Hat. This is a subset of the information captured by the must-gather tool and allows Red Hat to assess important configuration and deeper failure data than is reported via Telemetry. This data is sent several times a day and describes:

  • Important configuration information about the environment that the cluster runs in
  • Details about the state of the cluster and its major components
  • Debugging information about infrastructure Pods or nodes that are reporting failures

This debugging information is available to Red Hat Support and engineering teams with the same restrictions as accessing data reported via support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use. None of the information is shared with third parties.

3.1.2.1. Information collected by the Insights Operator

Primary information collected by the Insights Operator includes:

  • The version of the cluster and its components, as well as the unique cluster identifier
  • Channel and image repository used for an update
  • Details about errors that have occurred in the cluster components
  • Progress and health information of running updates and the status of any component upgrades
  • Anonymized details about the cluster configuration that is relevant to Red Hat Support
  • Details about any Technology Preview or unsupported configurations that might impact Red Hat Support
  • Details of the platform that OpenShift Container Platform is deployed on, such as Amazon Web Services, and the region that the cluster is located in
  • Information about Pods of degraded OpenShift Container Platform cluster Operators
  • Information about nodes marked as NotReady
  • Events for all namespaces listed as "related objects" for Degraded operator

The Insights Operator does not collect identifying information such as user names, passwords, or the names or addresses of user resources.

3.2. Showing data collected by remote health monitoring

As an administrator, you can review the metrics collected by Telemetry and the Insights Operator.

3.2.1. Showing data collected by Telemetry

You can see the cluster and components time series data captured by Telemetry.

Prerequisites

  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  1. Find the URL for the Prometheus service that runs in the OpenShift Container Platform cluster:

    $ oc get route prometheus-k8s -n openshift-monitoring -o jsonpath="{.spec.host}"
  2. Navigate to the URL.
  3. Enter this query in the Expression input box and press Execute:

    {__name__="up"} or {__name__="cluster_version"} or {__name__="cluster_version_available_updates"} or {__name__="cluster_operator_up"} or {__name__="cluster_operator_conditions"} or {__name__="cluster_version_payload"} or {__name__="cluster_version_payload_errors"} or {__name__="instance:etcd_object_counts:sum"} or {__name__="ALERTS",alertstate="firing"} or {__name__="code:apiserver_request_count:rate:sum"} or {__name__="kube_pod_status_ready:etcd:sum"} or {__name__="kube_pod_status_ready:image_registry:sum"} or {__name__="cluster:capacity_cpu_cores:sum"} or {__name__="cluster:capacity_memory_bytes:sum"} or {__name__="cluster:cpu_usage_cores:sum"} or {__name__="cluster:memory_usage_bytes:sum"} or {__name__="openshift:cpu_usage_cores:sum"} or {__name__="openshift:memory_usage_bytes:sum"} or {__name__="cluster:node_instance_type_count:sum"}

    This query replicates the request that Telemetry makes against a running OpenShift Container Platform cluster’s Prometheus service and returns the full set of time series captured by Telemetry.

3.2.2. Showing data collected by the Insights Operator

You can review the data that is collected by the Insights Operator.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Find the name of the currently running Pod for the Insights Operator:

    $ INSIGHTS_OPERATOR_POD=$(oc get pods --namespace=openshift-insights -o custom-columns=:metadata.name --no-headers  --field-selector=status.phase=Running)
  2. Copy the recent data archives collected by the Insights Operator:

    $ oc cp openshift-insights/$INSIGHTS_OPERATOR_POD:/var/lib/insights-operator ./insights-data

The recent Insights Operator archives are now available in the insights-data directory.

3.3. Opting out of remote health reporting

You might need to opt out of reporting health and usage data for your cluster. For example, you might need to comply with privacy laws or standards governing how your organization reports monitoring data.

To opt out of remote health reporting, you must:

  1. Modify the global cluster pull secret to disable remote health reporting.
  2. Update the cluster to use this modified pull secret.

3.3.1. Consequences of disabling remote health reporting

In OpenShift Container Platform, customers can opt out of reporting health and usage information. However, connected clusters allow Red Hat to react more quickly to problems and better support our customers, as well as better understand how product upgrades impact clusters.

Red Hat strongly recommends leaving health and usage reporting enabled for pre-production and test clusters even if it is necessary to opt out for production clusters. This allows Red Hat to be a participant in qualifying OpenShift Container Platform in your environments and react more rapidly to product issues.

Some of the consequences of opting out of having a connected cluster are:

  • Red Hat will not be able to monitor the success of product upgrades or the health of your clusters without a support case being opened.
  • Red Hat will not be able to use anonymized configuration data to better triage customer support cases and identify which configurations our customers find important.
  • The Red Hat OpenShift Cluster Manager will not show data about your clusters including health and usage information.
  • Your subscription entitlement information must be manually entered via cloud.redhat.com without the benefit of automatic usage reporting.

In restricted networks, Telemetry and Insights data can still be reported through appropriate configuration of your proxy.

3.3.2. Modifying the global cluster pull secret to disable remote health reporting

You can modify your existing global cluster pull secret to disable remote health reporting. This disables both Telemetry and the Insights Operator.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Download the global cluster pull secret to your local file system.

    $ oc extract secret/pull-secret -n openshift-config --to=.
  2. In a text editor, edit the .dockerconfigjson file that was downloaded.
  3. Remove the cloud.openshift.com JSON entry, for example:

    "cloud.openshift.com":{"auth":"<hash>","email":"<email_address>"}
  4. Save the file.

You can now update your cluster to use this modified pull secret.

3.3.3. Updating the global cluster pull secret

You can update the global pull secret for your cluster.

Warning

Cluster resources must adjust to the new pull secret, which can temporarily limit the usability of the cluster.

Prerequisites

  • You have a new or modified pull secret file to upload.
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  • Run the following command to update the global pull secret for your cluster:

    $ oc set data secret/pull-secret -n openshift-config --from-file=.dockerconfigjson=<pull-secret-location> 1
    1
    Provide the path to the new pull secret file.

This update is rolled out to all nodes, which can take some time depending on the size of your cluster. During this time, nodes are drained and Pods are rescheduled on the remaining nodes.

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.