Chapter 2. Enabling the observability service
When you enable the observability service on your hub cluster, the multicluster-observability-operator
watches for new managed clusters and automatically deploys metric and alert collection services to the managed clusters. You can use metrics and configure Grafana dashboards to make cluster resource information visible, help you save cost, and prevent service disruptions.
Monitor the status of your managed clusters with the observability component, also known as the multicluster-observability-operator
pod.
Required access: Cluster administrator, the open-cluster-management:cluster-manager-admin
role, or S3 administrator.
2.1. Prerequisites
- You must install Red Hat Advanced Cluster Management for Kubernetes. See Installing while connected online for more information.
-
You must define a storage class in the
MultiClusterObservability
custom resource, if there is no default storage class specified. - Direct network access to the hub cluster is required. Network access to load balancers and proxies are not supported. For more information, see Networking.
You must configure an object store to create a storage solution.
- Important: When you configure your object store, ensure that you meet the encryption requirements that are necessary when sensitive data is persisted. The observability service uses Thanos supported, stable object stores. You might not be able to share an object store bucket by multiple Red Hat Advanced Cluster Management observability installations. Therefore, for each installation, provide a separate object store bucket.
Red Hat Advanced Cluster Management supports the following cloud providers with stable object stores:
- Amazon Web Services S3 (AWS S3)
- Red Hat Ceph (S3 compatible API)
- Google Cloud Storage
- Azure storage
- Red Hat OpenShift Data Foundation, formerly known as Red Hat OpenShift Container Storage
- Red Hat OpenShift on IBM (ROKS)
2.2. Enabling observability from the command line interface
Enable the observability service by creating a MultiClusterObservability
custom resource instance. Before you enable observability, see Observability pod capacity requests for more information.
Note:
-
When observability is enabled or disabled on OpenShift Container Platform managed clusters that are managed by Red Hat Advanced Cluster Management, the observability endpoint operator updates the
cluster-monitoring-config
config map by adding additionalalertmanager
configuration that automatically restarts the local Prometheus. -
The observability endpoint operator updates the
cluster-monitoring-config
config map by adding additionalalertmanager
configurations that automatically restart the local Prometheus. Therefore, when you insert thealertmanager
configuration in the OpenShift Container Platform managed cluster, the configuration removes the settings that relate to the retention of the Prometheus metrics.
Complete the following steps to enable the observability service:
- Log in to your Red Hat Advanced Cluster Management hub cluster.
Create a namespace for the observability service with the following command:
oc create namespace open-cluster-management-observability
Generate your pull-secret. If Red Hat Advanced Cluster Management is installed in the
open-cluster-management
namespace, run the following command:DOCKER_CONFIG_JSON=`oc extract secret/multiclusterhub-operator-pull-secret -n open-cluster-management --to=-`
If the
multiclusterhub-operator-pull-secret
is not defined in the namespace, copy thepull-secret
from theopenshift-config
namespace into theopen-cluster-management-observability
namespace. Run the following command:DOCKER_CONFIG_JSON=`oc extract secret/pull-secret -n openshift-config --to=-`
Then, create the pull-secret in the
open-cluster-management-observability
namespace, run the following command:oc create secret generic multiclusterhub-operator-pull-secret \ -n open-cluster-management-observability \ --from-literal=.dockerconfigjson="$DOCKER_CONFIG_JSON" \ --type=kubernetes.io/dockerconfigjson
Important: If you modify the global pull secret for your cluster by using the OpenShift Container Platform documentation, be sure to also update the global pull secret in the observability namespace. See Updating the global pull secret for more details.
Create a secret for your object storage for your cloud provider. Your secret must contain the credentials to your storage solution. For example, run the following command:
oc create -f thanos-object-storage.yaml -n open-cluster-management-observability
View the following examples of secrets for the supported object stores:
For Amazon S3 or S3 compatible, your secret might resemble the following file:
apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: s3 config: bucket: YOUR_S3_BUCKET endpoint: YOUR_S3_ENDPOINT 1 insecure: true access_key: YOUR_ACCESS_KEY secret_key: YOUR_SECRET_KEY
- 1
- Enter the URL without the protocol. Enter the URL for your Amazon S3 endpoint that might resemble the following URL:
s3.us-east-1.amazonaws.com
.
For more details, see the Amazon Simple Storage Service user guide.
For Google Cloud Platform, your secret might resemble the following file:
apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: GCS config: bucket: YOUR_GCS_BUCKET service_account: YOUR_SERVICE_ACCOUNT
For more details, see Google Cloud Storage.
For Azure your secret might resemble the following file:
apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: AZURE config: storage_account: YOUR_STORAGE_ACCT storage_account_key: YOUR_STORAGE_KEY container: YOUR_CONTAINER endpoint: blob.core.windows.net 1 max_retries: 0
- 1
- If you use the
msi_resource
path, the endpoint authentication is complete by using the system-assigned managed identity. Your value must resemble the following endpoint:https://<storage-account-name>.blob.core.windows.net
.
If you use the
user_assigned_id
path, endpoint authentication is complete by using the user-assigned managed identity. When you use theuser_assigned_id
, themsi_resource
endpoint default value ishttps:<storage_account>.<endpoint>
. For more details, see Azure Storage documentation.Note: If you use Azure as an object storage for a Red Hat OpenShift Container Platform cluster, the storage account associated with the cluster is not supported. You must create a new storage account.
For Red Hat OpenShift Data Foundation, your secret might resemble the following file:
apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: s3 config: bucket: YOUR_RH_DATA_FOUNDATION_BUCKET endpoint: YOUR_RH_DATA_FOUNDATION_ENDPOINT 1 insecure: false access_key: YOUR_RH_DATA_FOUNDATION_ACCESS_KEY secret_key: YOUR_RH_DATA_FOUNDATION_SECRET_KEY
- 1
- Enter the URL without the protocol. Enter the URL for your Red Hat OpenShift Data Foundation endpoint that might resemble the following URL:
example.redhat.com:443
.
For more details, see Red Hat OpenShift Data Foundation.
For Red Hat OpenShift on IBM (ROKS), your secret might resemble the following file:
apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: s3 config: bucket: YOUR_ROKS_S3_BUCKET endpoint: YOUR_ROKS_S3_ENDPOINT 1 insecure: true access_key: YOUR_ROKS_ACCESS_KEY secret_key: YOUR_ROKS_SECRET_KEY
- 1
- Enter the URL without the protocol. Enter the URL for your Red Hat OpenShift Data Foundation endpoint that might resemble the following URL:
example.redhat.com:443
.
For more details, follow the IBM Cloud documentation, Cloud Object Storage. Be sure to use the service credentials to connect with the object storage. For more details, follow the IBM Cloud documentation, Cloud Object Store and Service Credentials.
2.2.1. Configuring storage for AWS Security Token Service
For Amazon S3 or S3 compatible storage, you can also use short term, limited-privilege credentials that are generated with AWS Security Token Service (AWS STS). Refer to AWS Security Token Service documentation for more details.
Generating access keys using AWS Security Service require the following additional steps:
- Create an IAM policy that limits access to an S3 bucket.
- Create an IAM role with a trust policy to generate JWT tokens for OpenShift Container Platform service accounts.
- Specify annotations for the observability service accounts that requires access to the S3 bucket. You can find an example of how observability on Red Hat OpenShift Service on AWS (ROSA) cluster can be configured to work with AWS STS tokens in the Set environment step. See Red Hat OpenShift Service on AWS (ROSA) for more details, along with ROSA with STS explained for an in-depth description of the requirements and setup to use STS tokens.
2.2.2. Generating access keys using the AWS Security Service
Complete the following steps to generate access keys using the AWS Security Service:
Set up the AWS environment. Run the following commands:
export POLICY_VERSION=$(date +"%m-%d-%y") export TRUST_POLICY_VERSION=$(date +"%m-%d-%y") export CLUSTER_NAME=<my-cluster> export S3_BUCKET=$CLUSTER_NAME-acm-observability export REGION=us-east-2 export NAMESPACE=open-cluster-management-observability export SA=tbd export SCRATCH_DIR=/tmp/scratch export OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json | jq -r .spec.serviceAccountIssuer| sed -e "s/^https:\/\///") export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) export AWS_PAGER="" rm -rf $SCRATCH_DIR mkdir -p $SCRATCH_DIR
Create an S3 bucket with the following command:
aws s3 mb s3://$S3_BUCKET
Create a
s3-policy
JSON file for access to your S3 bucket. Run the following command:{ "Version": "$POLICY_VERSION", "Statement": [ { "Sid": "Statement", "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:GetObject", "s3:DeleteObject", "s3:PutObject", "s3:PutObjectAcl", "s3:CreateBucket", "s3:DeleteBucket" ], "Resource": [ "arn:aws:s3:::$S3_BUCKET/*", "arn:aws:s3:::$S3_BUCKET" ] } ] }
Apply the policy with the following command:
S3_POLICY=$(aws iam create-policy --policy-name $CLUSTER_NAME-acm-obs \ --policy-document file://$SCRATCH_DIR/s3-policy.json \ --query 'Policy.Arn' --output text) echo $S3_POLICY
Create a
TrustPolicy
JSON file. Run the following command:{ "Version": "$TRUST_POLICY_VERSION", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "${OIDC_PROVIDER}:sub": [ "system:serviceaccount:${NAMESPACE}:observability-thanos-query", "system:serviceaccount:${NAMESPACE}:observability-thanos-store-shard", "system:serviceaccount:${NAMESPACE}:observability-thanos-compact" "system:serviceaccount:${NAMESPACE}:observability-thanos-rule", "system:serviceaccount:${NAMESPACE}:observability-thanos-receive", ] } } } ] }
Create a role for AWS Prometheus and CloudWatch with the following command:
S3_ROLE=$(aws iam create-role \ --role-name "$CLUSTER_NAME-acm-obs-s3" \ --assume-role-policy-document file://$SCRATCH_DIR/TrustPolicy.json \ --query "Role.Arn" --output text) echo $S3_ROLE
Attach the policies to the role. Run the following command:
aws iam attach-role-policy \ --role-name "$CLUSTER_NAME-acm-obs-s3" \ --policy-arn $S3_POLICY
Your secret might resemble the following file. The
config
section specifiessignature_version2: false
and does not specifyaccess_key
andsecret_key
:apiVersion: v1 kind: Secret metadata: name: thanos-object-storage namespace: open-cluster-management-observability type: Opaque stringData: thanos.yaml: | type: s3 config: bucket: $S3_BUCKET endpoint: s3.$REGION.amazonaws.com signature_version2: false
-
Specify service account annotations when you the
MultiClusterObservability
custom resource as described in Creating the MultiClusterObservability custom resource section. You can retrieve the S3 access key and secret key for your cloud providers with the following commands. You must decode, edit, and encode your
base64
string in the secret:YOUR_CLOUD_PROVIDER_ACCESS_KEY=$(oc -n open-cluster-management-observability get secret <object-storage-secret> -o jsonpath="{.data.thanos\.yaml}" | base64 --decode | grep access_key | awk '{print $2}') echo $ACCESS_KEY YOUR_CLOUD_PROVIDER_SECRET_KEY=$(oc -n open-cluster-management-observability get secret <object-storage-secret> -o jsonpath="{.data.thanos\.yaml}" | base64 --decode | grep secret_key | awk '{print $2}') echo $SECRET_KEY
Verify that observability is enabled by checking the pods for the following deployments and stateful sets. You might receive the following information:
observability-thanos-query (deployment) observability-thanos-compact (statefulset) observability-thanos-receive-default (statefulset) observability-thanos-rule (statefulset) observability-thanos-store-shard-x (statefulsets)
2.2.3. Creating the MultiClusterObservability custom resource
Use the MultiClusterObservability
custom resource to specify the persistent volume storage size for various components. You must set the storage size during the initial creation of the MultiClusterObservability
custom resource. When you update the storage size values post-deployment, changes take effect only if the storage class supports dynamic volume expansion. For more information, see Expanding persistent volumes from the Red Hat OpenShift Container Platform documentation.
Complete the following steps to create the MultiClusterObservability
custom resource on your hub cluster:
Create the
MultiClusterObservability
custom resource YAML file namedmulticlusterobservability_cr.yaml
.View the following default YAML file for observability:
apiVersion: observability.open-cluster-management.io/v1beta2 kind: MultiClusterObservability metadata: name: observability spec: observabilityAddonSpec: {} storageConfig: metricObjectStorage: name: thanos-object-storage key: thanos.yaml
You might want to modify the value for the
retentionConfig
parameter in theadvanced
section. For more information, see Thanos Downsampling resolution and retention. Depending on the number of managed clusters, you might want to update the amount of storage for stateful sets. If your S3 bucket is configured to use STS tokens, annotate the service accounts to use STS with S3 role. View the following configuration:spec: advanced: compact: serviceAccountAnnotations: eks.amazonaws.com/role-arn: $S3_ROLE store: serviceAccountAnnotations: eks.amazonaws.com/role-arn: $S3_ROLE rule: serviceAccountAnnotations: eks.amazonaws.com/role-arn: $S3_ROLE receive: serviceAccountAnnotations: eks.amazonaws.com/role-arn: $S3_ROLE query: serviceAccountAnnotations: eks.amazonaws.com/role-arn: $S3_ROLE
See Observability API for more information.
To deploy on infrastructure machine sets, you must set a label for your set by updating the
nodeSelector
in theMultiClusterObservability
YAML. Your YAML might resemble the following content:nodeSelector: node-role.kubernetes.io/infra:
For more information, see Creating infrastructure machine sets.
Apply the observability YAML to your cluster by running the following command:
oc apply -f multiclusterobservability_cr.yaml
All the pods in
open-cluster-management-observability
namespace for Thanos, Grafana and Alertmanager are created. All the managed clusters connected to the Red Hat Advanced Cluster Management hub cluster are enabled to send metrics back to the Red Hat Advanced Cluster Management Observability service.- Validate that the observability service is enabled and the data is populated by launching the Grafana dashboards.
Click the Grafana link that is near the console header, from either the console Overview page or the Clusters page.
-
Alternatively, access the OpenShift Container Platform 3.11 Grafana dashboards with the following URL:
https://$ACM_URL/grafana/dashboards
. - To view the OpenShift Container Platform 3.11 dashboards, select the folder named OCP 3.11 .
-
Alternatively, access the OpenShift Container Platform 3.11 Grafana dashboards with the following URL:
Access the
multicluster-observability-operator
deployment to verify that themulticluster-observability-operator
pod is being deployed by themulticlusterhub-operator
deployment. Run the following command:oc get deploy multicluster-observability-operator -n open-cluster-management --show-labels NAME READY UP-TO-DATE AVAILABLE AGE LABELS multicluster-observability-operator 1/1 1 1 35m installer.name=multiclusterhub,installer.namespace=open-cluster-management
View the
labels
section of themulticluster-observability-operator
deployment for labels that are associated with the resource. Thelabels
section might contain the following details:labels: installer.name: multiclusterhub installer.namespace: open-cluster-management
. . Optional: If you want to exclude specific managed clusters from collecting the observability data, add the following cluster label to your clusters: observability: disabled
.
The observability service is enabled. After you enable the observability service, the following functions are initiated:
- All the alert managers from the managed clusters are forwarded to the Red Hat Advanced Cluster Management hub cluster.
All the managed clusters that are connected to the Red Hat Advanced Cluster Management hub cluster are enabled to send alerts back to the Red Hat Advanced Cluster Management observability service. You can configure the Red Hat Advanced Cluster Management Alertmanager to take care of deduplicating, grouping, and routing the alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie. You can also handle silencing and inhibition of the alerts.
Note: Alert forwarding to the Red Hat Advanced Cluster Management hub cluster feature is only supported by managed clusters with Red Hat OpenShift Container Platform version 4.12 or later. After you install Red Hat Advanced Cluster Management with observability enabled, alerts from OpenShift Container Platform 4.12 and later are automatically forwarded to the hub cluster. See Forwarding alerts to learn more.
2.3. Enabling observability from the Red Hat OpenShift Container Platform console
Optionally, you can enable observability from the Red Hat OpenShift Container Platform console, create a project named open-cluster-management-observability
. Be sure to create an image pull-secret named, multiclusterhub-operator-pull-secret
in the open-cluster-management-observability
project.
Create your object storage secret named, thanos-object-storage
in the open-cluster-management-observability
project. Enter the object storage secret details, then click Create. See step four of the Enabling observability section to view an example of a secret.
Create the MultiClusterObservability
custom resource instance. When you receive the following message, the observability service is enabled successfully from OpenShift Container Platform: Observability components are deployed and running
.
2.3.1. Verifying the Thanos version
After Thanos is deployed on your cluster, verify the Thanos version from the command line interface (CLI).
After you log in to your hub cluster, run the following command in the observability pods to receive the Thanos version:
thanos --version
The Thanos version is displayed.
2.4. Disabling observability
You can disable observability, which stops data collection on the Red Hat Advanced Cluster Management hub cluster.
2.4.1. Disabling observability on all clusters
Disable observability by removing observability components on all managed clusters. Update the multicluster-observability-operator
resource by setting enableMetrics
to false
. Your updated resource might resemble the following change:
spec: imagePullPolicy: Always imagePullSecret: multiclusterhub-operator-pull-secret observabilityAddonSpec: # The ObservabilityAddonSpec defines the global settings for all managed clusters which have observability add-on enabled enableMetrics: false #indicates the observability addon push metrics to hub server
2.4.2. Disabling observability on a single cluster
Disable observability by removing observability components on specific managed clusters. Add the observability: disabled
label to the managedclusters.cluster.open-cluster-management.io
custom resource. From the Red Hat Advanced Cluster Management console Clusters page, add the observability=disabled
label to the specified cluster.
Note: When a managed cluster with the observability component is detached, the metrics-collector
deployments are removed.
2.5. Removing observability
When you remove the MultiClusterObservability
custom resource, you are disabling and uninstalling the observability service. From the OpenShift Container Platform console navigation, select Operators > Installed Operators > Advanced Cluster Manager for Kubernetes. Remove the MultiClusterObservability
custom resource.
2.6. Additional resources
Links to cloud provider documentation for object storage information:
- See Using observability.
- To learn more about customizing the observability service, see Customizing observability.
- For more related topics, return to the Observability service introduction.