Chapter 2. Enabling the observability service


When you enable the observability service on your hub cluster, the multicluster-observability-operator watches for new managed clusters and automatically deploys metric and alert collection services to the managed clusters. You can use metrics and configure Grafana dashboards to make cluster resource information visible, help you save cost, and prevent service disruptions.

Monitor the status of your managed clusters with the observability component, also known as the multicluster-observability-operator pod.

Required access: Cluster administrator, the open-cluster-management:cluster-manager-admin role, or S3 administrator.

2.1. Prerequisites

  • You must install Red Hat Advanced Cluster Management for Kubernetes. See Installing while connected online for more information.
  • You must define a storage class in the MultiClusterObservability custom resource, if there is no default storage class specified.
  • Direct network access to the hub cluster is required. Network access to load balancers and proxies are not supported. For more information, see Networking.
  • You must configure an object store to create a storage solution.

    • Important: When you configure your object store, ensure that you meet the encryption requirements that are necessary when sensitive data is persisted. The observability service uses Thanos supported, stable object stores. You might not be able to share an object store bucket by multiple Red Hat Advanced Cluster Management observability installations. Therefore, for each installation, provide a separate object store bucket.
    • Red Hat Advanced Cluster Management supports the following cloud providers with stable object stores:

      • Amazon Web Services S3 (AWS S3)
      • Red Hat Ceph (S3 compatible API)
      • Google Cloud Storage
      • Azure storage
      • Red Hat OpenShift Data Foundation, formerly known as Red Hat OpenShift Container Storage
      • Red Hat OpenShift on IBM (ROKS)

2.2. Enabling observability from the command line interface

Enable the observability service by creating a MultiClusterObservability custom resource instance. Before you enable observability, see Observability pod capacity requests for more information.

Note:

  • When observability is enabled or disabled on OpenShift Container Platform managed clusters that are managed by Red Hat Advanced Cluster Management, the observability endpoint operator updates the cluster-monitoring-config config map by adding additional alertmanager configuration that automatically restarts the local Prometheus.
  • The observability endpoint operator updates the cluster-monitoring-config config map by adding additional alertmanager configurations that automatically restart the local Prometheus. Therefore, when you insert the alertmanager configuration in the OpenShift Container Platform managed cluster, the configuration removes the settings that relate to the retention of the Prometheus metrics.

Complete the following steps to enable the observability service:

  1. Log in to your Red Hat Advanced Cluster Management hub cluster.
  2. Create a namespace for the observability service with the following command:

    oc create namespace open-cluster-management-observability
  3. Generate your pull-secret. If Red Hat Advanced Cluster Management is installed in the open-cluster-management namespace, run the following command:

    DOCKER_CONFIG_JSON=`oc extract secret/multiclusterhub-operator-pull-secret -n open-cluster-management --to=-`

    If the multiclusterhub-operator-pull-secret is not defined in the namespace, copy the pull-secret from the openshift-config namespace into the open-cluster-management-observability namespace. Run the following command:

    DOCKER_CONFIG_JSON=`oc extract secret/pull-secret -n openshift-config --to=-`

    Then, create the pull-secret in the open-cluster-management-observability namespace, run the following command:

    oc create secret generic multiclusterhub-operator-pull-secret \
        -n open-cluster-management-observability \
        --from-literal=.dockerconfigjson="$DOCKER_CONFIG_JSON" \
        --type=kubernetes.io/dockerconfigjson

    Important: If you modify the global pull secret for your cluster by using the OpenShift Container Platform documentation, be sure to also update the global pull secret in the observability namespace. See Updating the global pull secret for more details.

  4. Create a secret for your object storage for your cloud provider. Your secret must contain the credentials to your storage solution. For example, run the following command:

    oc create -f thanos-object-storage.yaml -n open-cluster-management-observability

    View the following examples of secrets for the supported object stores:

    • For Amazon S3 or S3 compatible, your secret might resemble the following file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      stringData:
        thanos.yaml: |
          type: s3
          config:
            bucket: YOUR_S3_BUCKET
            endpoint: YOUR_S3_ENDPOINT 1
            insecure: true
            access_key: YOUR_ACCESS_KEY
            secret_key: YOUR_SECRET_KEY
      1
      Enter the URL without the protocol. Enter the URL for your Amazon S3 endpoint that might resemble the following URL: s3.us-east-1.amazonaws.com.

      For more details, see the Amazon Simple Storage Service user guide.

    • For Google Cloud Platform, your secret might resemble the following file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      stringData:
        thanos.yaml: |
          type: GCS
          config:
            bucket: YOUR_GCS_BUCKET
            service_account: YOUR_SERVICE_ACCOUNT

      For more details, see Google Cloud Storage.

    • For Azure your secret might resemble the following file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      stringData:
        thanos.yaml: |
          type: AZURE
          config:
            storage_account: YOUR_STORAGE_ACCT
            storage_account_key: YOUR_STORAGE_KEY
            container: YOUR_CONTAINER
            endpoint: blob.core.windows.net 1
            max_retries: 0
      1
      If you use the msi_resource path, the endpoint authentication is complete by using the system-assigned managed identity. Your value must resemble the following endpoint: https://<storage-account-name>.blob.core.windows.net.

      If you use the user_assigned_id path, endpoint authentication is complete by using the user-assigned managed identity. When you use the user_assigned_id, the msi_resource endpoint default value is https:<storage_account>.<endpoint>. For more details, see Azure Storage documentation.

      Note: If you use Azure as an object storage for a Red Hat OpenShift Container Platform cluster, the storage account associated with the cluster is not supported. You must create a new storage account.

    • For Red Hat OpenShift Data Foundation, your secret might resemble the following file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      stringData:
        thanos.yaml: |
          type: s3
          config:
            bucket: YOUR_RH_DATA_FOUNDATION_BUCKET
            endpoint: YOUR_RH_DATA_FOUNDATION_ENDPOINT 1
            insecure: false
            access_key: YOUR_RH_DATA_FOUNDATION_ACCESS_KEY
            secret_key: YOUR_RH_DATA_FOUNDATION_SECRET_KEY
      1
      Enter the URL without the protocol. Enter the URL for your Red Hat OpenShift Data Foundation endpoint that might resemble the following URL: example.redhat.com:443.

      For more details, see Red Hat OpenShift Data Foundation.

    • For Red Hat OpenShift on IBM (ROKS), your secret might resemble the following file:

      apiVersion: v1
      kind: Secret
      metadata:
        name: thanos-object-storage
        namespace: open-cluster-management-observability
      type: Opaque
      stringData:
        thanos.yaml: |
          type: s3
          config:
            bucket: YOUR_ROKS_S3_BUCKET
            endpoint: YOUR_ROKS_S3_ENDPOINT 1
            insecure: true
            access_key: YOUR_ROKS_ACCESS_KEY
            secret_key: YOUR_ROKS_SECRET_KEY
      1
      Enter the URL without the protocol. Enter the URL for your Red Hat OpenShift Data Foundation endpoint that might resemble the following URL: example.redhat.com:443.

      For more details, follow the IBM Cloud documentation, Cloud Object Storage. Be sure to use the service credentials to connect with the object storage. For more details, follow the IBM Cloud documentation, Cloud Object Store and Service Credentials.

2.2.1. Configuring storage for AWS Security Token Service

For Amazon S3 or S3 compatible storage, you can also use short term, limited-privilege credentials that are generated with AWS Security Token Service (AWS STS). Refer to AWS Security Token Service documentation for more details.

Generating access keys using AWS Security Service require the following additional steps:

  1. Create an IAM policy that limits access to an S3 bucket.
  2. Create an IAM role with a trust policy to generate JWT tokens for OpenShift Container Platform service accounts.
  3. Specify annotations for the observability service accounts that requires access to the S3 bucket. You can find an example of how observability on Red Hat OpenShift Service on AWS (ROSA) cluster can be configured to work with AWS STS tokens in the Set environment step. See Red Hat OpenShift Service on AWS (ROSA) for more details, along with ROSA with STS explained for an in-depth description of the requirements and setup to use STS tokens.

2.2.2. Generating access keys using the AWS Security Service

Complete the following steps to generate access keys using the AWS Security Service:

  1. Set up the AWS environment. Run the following commands:

    export POLICY_VERSION=$(date +"%m-%d-%y")
    export TRUST_POLICY_VERSION=$(date +"%m-%d-%y")
    export CLUSTER_NAME=<my-cluster>
    export S3_BUCKET=$CLUSTER_NAME-acm-observability
    export REGION=us-east-2
    export NAMESPACE=open-cluster-management-observability
    export SA=tbd
    export SCRATCH_DIR=/tmp/scratch
    export OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json | jq -r .spec.serviceAccountIssuer| sed -e "s/^https:\/\///")
    export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    export AWS_PAGER=""
    rm -rf $SCRATCH_DIR
    mkdir -p $SCRATCH_DIR
  2. Create an S3 bucket with the following command:

    aws s3 mb s3://$S3_BUCKET
  3. Create a s3-policy JSON file for access to your S3 bucket. Run the following command:

    {
        "Version": "$POLICY_VERSION",
        "Statement": [
            {
                "Sid": "Statement",
                "Effect": "Allow",
                "Action": [
                    "s3:ListBucket",
                    "s3:GetObject",
                    "s3:DeleteObject",
                    "s3:PutObject",
                    "s3:PutObjectAcl",
                    "s3:CreateBucket",
                    "s3:DeleteBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::$S3_BUCKET/*",
                    "arn:aws:s3:::$S3_BUCKET"
                ]
            }
        ]
     }
  4. Apply the policy with the following command:

    S3_POLICY=$(aws iam create-policy --policy-name $CLUSTER_NAME-acm-obs \
    --policy-document file://$SCRATCH_DIR/s3-policy.json \
    --query 'Policy.Arn' --output text)
    echo $S3_POLICY
  5. Create a TrustPolicy JSON file. Run the following command:

    {
     "Version": "$TRUST_POLICY_VERSION",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "${OIDC_PROVIDER}:sub": [
               "system:serviceaccount:${NAMESPACE}:observability-thanos-query",
               "system:serviceaccount:${NAMESPACE}:observability-thanos-store-shard",
               "system:serviceaccount:${NAMESPACE}:observability-thanos-compact"
               "system:serviceaccount:${NAMESPACE}:observability-thanos-rule",
               "system:serviceaccount:${NAMESPACE}:observability-thanos-receive",
             ]
           }
         }
       }
     ]
    }
  6. Create a role for AWS Prometheus and CloudWatch with the following command:

    S3_ROLE=$(aws iam create-role \
      --role-name "$CLUSTER_NAME-acm-obs-s3" \
      --assume-role-policy-document file://$SCRATCH_DIR/TrustPolicy.json \
      --query "Role.Arn" --output text)
    echo $S3_ROLE
  7. Attach the policies to the role. Run the following command:

    aws iam attach-role-policy \
      --role-name "$CLUSTER_NAME-acm-obs-s3" \
      --policy-arn $S3_POLICY

    Your secret might resemble the following file. The config section specifies signature_version2: false and does not specify access_key and secret_key:

    apiVersion: v1
    kind: Secret
    metadata:
      name: thanos-object-storage
      namespace: open-cluster-management-observability
    type: Opaque
    stringData:
      thanos.yaml: |
     type: s3
     config:
       bucket: $S3_BUCKET
       endpoint: s3.$REGION.amazonaws.com
       signature_version2: false
  8. Specify service account annotations when you the MultiClusterObservability custom resource as described in Creating the MultiClusterObservability custom resource section.
  9. You can retrieve the S3 access key and secret key for your cloud providers with the following commands. You must decode, edit, and encode your base64 string in the secret:

    YOUR_CLOUD_PROVIDER_ACCESS_KEY=$(oc -n open-cluster-management-observability get secret <object-storage-secret> -o jsonpath="{.data.thanos\.yaml}" | base64 --decode | grep access_key | awk '{print $2}')
    
    echo $ACCESS_KEY
    
    YOUR_CLOUD_PROVIDER_SECRET_KEY=$(oc -n open-cluster-management-observability get secret <object-storage-secret> -o jsonpath="{.data.thanos\.yaml}" | base64 --decode | grep secret_key | awk '{print $2}')
    
    echo $SECRET_KEY
  10. Verify that observability is enabled by checking the pods for the following deployments and stateful sets. You might receive the following information:

    observability-thanos-query (deployment)
    observability-thanos-compact (statefulset)
    observability-thanos-receive-default  (statefulset)
    observability-thanos-rule   (statefulset)
    observability-thanos-store-shard-x  (statefulsets)

2.2.3. Creating the MultiClusterObservability custom resource

Use the MultiClusterObservability custom resource to specify the persistent volume storage size for various components. You must set the storage size during the initial creation of the MultiClusterObservability custom resource. When you update the storage size values post-deployment, changes take effect only if the storage class supports dynamic volume expansion. For more information, see Expanding persistent volumes from the Red Hat OpenShift Container Platform documentation.

Complete the following steps to create the MultiClusterObservability custom resource on your hub cluster:

  1. Create the MultiClusterObservability custom resource YAML file named multiclusterobservability_cr.yaml.

    View the following default YAML file for observability:

    apiVersion: observability.open-cluster-management.io/v1beta2
    kind: MultiClusterObservability
    metadata:
      name: observability
    spec:
      observabilityAddonSpec: {}
      storageConfig:
        metricObjectStorage:
          name: thanos-object-storage
          key: thanos.yaml

    You might want to modify the value for the retentionConfig parameter in the advanced section. For more information, see Thanos Downsampling resolution and retention. Depending on the number of managed clusters, you might want to update the amount of storage for stateful sets. If your S3 bucket is configured to use STS tokens, annotate the service accounts to use STS with S3 role. View the following configuration:

    spec:
      advanced:
        compact:
           serviceAccountAnnotations:
               eks.amazonaws.com/role-arn: $S3_ROLE
        store:
           serviceAccountAnnotations:
              eks.amazonaws.com/role-arn: $S3_ROLE
        rule:
           serviceAccountAnnotations:
              eks.amazonaws.com/role-arn: $S3_ROLE
        receive:
           serviceAccountAnnotations:
              eks.amazonaws.com/role-arn: $S3_ROLE
        query:
           serviceAccountAnnotations:
              eks.amazonaws.com/role-arn: $S3_ROLE

    See Observability API for more information.

  2. To deploy on infrastructure machine sets, you must set a label for your set by updating the nodeSelector in the MultiClusterObservability YAML. Your YAML might resemble the following content:

      nodeSelector:
        node-role.kubernetes.io/infra:

    For more information, see Creating infrastructure machine sets.

  3. Apply the observability YAML to your cluster by running the following command:

    oc apply -f multiclusterobservability_cr.yaml

    All the pods in open-cluster-management-observability namespace for Thanos, Grafana and Alertmanager are created. All the managed clusters connected to the Red Hat Advanced Cluster Management hub cluster are enabled to send metrics back to the Red Hat Advanced Cluster Management Observability service.

  4. Validate that the observability service is enabled and the data is populated by launching the Grafana dashboards.
  5. Click the Grafana link that is near the console header, from either the console Overview page or the Clusters page.

    1. Alternatively, access the OpenShift Container Platform 3.11 Grafana dashboards with the following URL: https://$ACM_URL/grafana/dashboards.
    2. To view the OpenShift Container Platform 3.11 dashboards, select the folder named OCP 3.11 .
  6. Access the multicluster-observability-operator deployment to verify that the multicluster-observability-operator pod is being deployed by the multiclusterhub-operator deployment. Run the following command:

    oc get deploy multicluster-observability-operator -n open-cluster-management --show-labels
    
    NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE   LABELS
    multicluster-observability-operator   1/1     1            1           35m   installer.name=multiclusterhub,installer.namespace=open-cluster-management
  7. View the labels section of the multicluster-observability-operator deployment for labels that are associated with the resource. The labels section might contain the following details:

     labels:
        installer.name: multiclusterhub
        installer.namespace: open-cluster-management

. . Optional: If you want to exclude specific managed clusters from collecting the observability data, add the following cluster label to your clusters: observability: disabled.

The observability service is enabled. After you enable the observability service, the following functions are initiated:

  • All the alert managers from the managed clusters are forwarded to the Red Hat Advanced Cluster Management hub cluster.
  • All the managed clusters that are connected to the Red Hat Advanced Cluster Management hub cluster are enabled to send alerts back to the Red Hat Advanced Cluster Management observability service. You can configure the Red Hat Advanced Cluster Management Alertmanager to take care of deduplicating, grouping, and routing the alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie. You can also handle silencing and inhibition of the alerts.

    Note: Alert forwarding to the Red Hat Advanced Cluster Management hub cluster feature is only supported by managed clusters with Red Hat OpenShift Container Platform version 4.12 or later. After you install Red Hat Advanced Cluster Management with observability enabled, alerts from OpenShift Container Platform 4.12 and later are automatically forwarded to the hub cluster. See Forwarding alerts to learn more.

2.3. Enabling observability from the Red Hat OpenShift Container Platform console

Optionally, you can enable observability from the Red Hat OpenShift Container Platform console, create a project named open-cluster-management-observability. Be sure to create an image pull-secret named, multiclusterhub-operator-pull-secret in the open-cluster-management-observability project.

Create your object storage secret named, thanos-object-storage in the open-cluster-management-observability project. Enter the object storage secret details, then click Create. See step four of the Enabling observability section to view an example of a secret.

Create the MultiClusterObservability custom resource instance. When you receive the following message, the observability service is enabled successfully from OpenShift Container Platform: Observability components are deployed and running.

2.3.1. Verifying the Thanos version

After Thanos is deployed on your cluster, verify the Thanos version from the command line interface (CLI).

After you log in to your hub cluster, run the following command in the observability pods to receive the Thanos version:

thanos --version

The Thanos version is displayed.

2.4. Disabling observability

You can disable observability, which stops data collection on the Red Hat Advanced Cluster Management hub cluster.

2.4.1. Disabling observability on all clusters

Disable observability by removing observability components on all managed clusters. Update the multicluster-observability-operator resource by setting enableMetrics to false. Your updated resource might resemble the following change:

spec:
  imagePullPolicy: Always
  imagePullSecret: multiclusterhub-operator-pull-secret
  observabilityAddonSpec: # The ObservabilityAddonSpec defines the global settings for all managed clusters which have observability add-on enabled
    enableMetrics: false #indicates the observability addon push metrics to hub server

2.4.2. Disabling observability on a single cluster

Disable observability by removing observability components on specific managed clusters. Add the observability: disabled label to the managedclusters.cluster.open-cluster-management.io custom resource. From the Red Hat Advanced Cluster Management console Clusters page, add the observability=disabled label to the specified cluster.

Note: When a managed cluster with the observability component is detached, the metrics-collector deployments are removed.

2.5. Removing observability

When you remove the MultiClusterObservability custom resource, you are disabling and uninstalling the observability service. From the OpenShift Container Platform console navigation, select Operators > Installed Operators > Advanced Cluster Manager for Kubernetes. Remove the MultiClusterObservability custom resource.

2.6. Additional resources

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.