1.4. Customizing observability

Review the following sections to learn more about customizing, managing, and viewing data that is collected by the observability service.

Use the must-gather command to collect logs about new information that is created for observability resources. For more information, see the Must-gather section in the Troubleshooting documentation.

1.4.1. Creating custom rules

You can create custom rules for the observability installation by adding Prometheus recording rules and alerting rules to the observability resource. For more information, see Prometheus configuration.

Note: You can only create custom rules on the metrics that are collected from all managed clusters. View a list of the metrics that are collected by running the following command: kubectl describe cm observability-metrics-whitelist.

Define custom rules with Prometheus to create alert conditions, and send notifications to an external messaging service. Complete the following steps to create a custom rule:

  1. Log in to your Red Hat Advanced Cluster Management hub cluster.
  2. Create a ConfigMap named thanos-ruler-custom-rules in the open-cluster-management-observability namespace. The key must be named custom_rules.yaml, as shown in the following example. You can create multiple rules in the configuration.

    By default, the out-of-the-box alert rules are defined in the ConfigMap in the open-cluster-management-observability namespace.

    For example, you can create a custom alert rule that notifies you when your CPU usage passes your defined value:

    data:
      custom_rules.yaml: |
        groups:
          - name: cluster-health
            rules:
            - alert: ClusterCPUHealth-jb
              annotations:
                summary: Notify when CPU utilization on a cluster is greater than the defined utilization limit
                description: "The cluster has a high CPU usage: {{ $value }} core for {{ $labels.cluster }} {{ $labels.clusterID }}."
              expr: |
                max(cluster:cpu_usage_cores:sum) by (clusterID, cluster, prometheus) > 0
              for: 5s
              labels:
                cluster: "{{ $labels.cluster }}"
                prometheus: "{{ $labels.prometheus }}"
                severity: critical

    Note: If this is the first new custom rule, it is created immediately. For changes to the ConfigMap, you must restart the observability pods with the following command: kubectl rollout restart statefulset observability-observatorium-thanos-rule -n open-cluster-management-observability.

  3. If you want to verify that the alert rule is functioning appropriately, complete the following steps:

    1. Access your Grafana dashboard and select the Explore icon.
    2. In the Metrics exploration bar, type ALERTS and run the query. All alerts that are currently in the pending or firing state are displayed.
    3. If your alert is not displayed, revisit the rule to see if the expression is accurate.
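As a sketch of step 2, the following shell snippet writes a complete ConfigMap manifest to a local file. It assumes that the observability service expects the ConfigMap name thanos-ruler-custom-rules and the key custom_rules.yaml. The file name, the alert name ClusterCPUHigh, the 8-core threshold, the 5m duration, and the warning severity are hypothetical values chosen for illustration, not recommended settings.

```shell
# Write a complete ConfigMap manifest to a local file (the file name is arbitrary).
# The quoted 'EOF' stops the shell from expanding {{ $value }} and {{ $labels.* }},
# which must reach Prometheus templating unchanged.
cat > thanos-ruler-custom-rules.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-custom-rules
  namespace: open-cluster-management-observability
data:
  custom_rules.yaml: |
    groups:
      - name: cluster-health
        rules:
        - alert: ClusterCPUHigh
          annotations:
            summary: Notify when CPU utilization on a cluster exceeds 8 cores
            description: "The cluster has a high CPU usage: {{ $value }} core for {{ $labels.cluster }} {{ $labels.clusterID }}."
          expr: |
            max(cluster:cpu_usage_cores:sum) by (clusterID, cluster, prometheus) > 8
          for: 5m
          labels:
            cluster: "{{ $labels.cluster }}"
            severity: warning
EOF
```

You could then create the ConfigMap on the hub cluster with oc apply -f thanos-ruler-custom-rules.yaml.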

A custom rule is created.
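When many alerts are pending or firing, the plain ALERTS query can return a long list. The following sketch builds a narrower query that you could paste into the Grafana Explore view. It relies on the alertname and alertstate labels that Prometheus attaches to the ALERTS series. The variable names are hypothetical, and the alert name is taken from the example rule.

```shell
# Alert name from the example rule; substitute the name of your own rule.
alert_name="ClusterCPUHealth-jb"

# Match only this alert, in either the pending or the firing state.
query="ALERTS{alertname=\"${alert_name}\", alertstate=~\"pending|firing\"}"

# Print the query so that it can be copied into the Explore view.
printf '%s\n' "$query"
```

If the narrowed query returns nothing while the plain ALERTS query returns other alerts, the rule expression or its threshold is the likely place to look.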

1.4.1.1. Configuring rules for AlertManager

Integrate external messaging tools such as email, Slack, and PagerDuty to receive notifications from AlertManager. You must override the alertmanager-config secret in the open-cluster-management-observability namespace to add integrations, and configure routes for AlertManager. Complete the following steps to update the custom receiver rules:

  1. Extract the data from the alertmanager-config secret. Run the following command:

    oc -n open-cluster-management-observability get secret alertmanager-config --template='{{ index .data "alertmanager.yaml" }}' | base64 -d > alertmanager.yaml

  2. Edit and save the alertmanager.yaml file, then apply the updated configuration by running the following command:

    oc -n open-cluster-management-observability create secret generic alertmanager-config --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n open-cluster-management-observability replace secret --filename=-

    Your updated secret might resemble the following content:

    global:
      smtp_smarthost: 'localhost:25'
      smtp_from: 'alertmanager@example.org'
      smtp_auth_username: 'alertmanager'
      smtp_auth_password: 'password'
    templates:
    - '/etc/alertmanager/template/*.tmpl'
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: team-X-mails
      routes:
      - match_re:
          service: ^(foo1|foo2|baz)$
        receiver: team-X-mails

Your changes are applied immediately after the secret is modified. For an example of AlertManager, see prometheus/alertmanager.
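The route in the example refers to a receiver named team-X-mails, and each receiver that a route names needs a matching entry in a receivers section. The following sketch writes a small configuration with an email receiver and a Slack receiver to a separate file, so that it does not overwrite your extracted alertmanager.yaml. The file name, the team-X-slack receiver, the email address, the channel, and the webhook URL are hypothetical placeholders.

```shell
# Write an illustrative AlertManager configuration to a separate file.
# All addresses, channel names, and URLs below are placeholders.
cat > alertmanager-example.yaml <<'EOF'
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'alertmanager@example.org'
route:
  group_by: ['alertname', 'cluster']
  receiver: team-X-mails          # default receiver
  routes:
  - match:
      severity: critical          # critical alerts also go to Slack
    receiver: team-X-slack
receivers:
- name: team-X-mails
  email_configs:
  - to: 'team-X+alerts@example.org'
- name: team-X-slack
  slack_configs:
  - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK'
    channel: '#alerts'
EOF
```

You could copy the receivers that you need into alertmanager.yaml before you replace the secret. If the amtool utility is installed, amtool check-config alertmanager.yaml can validate the file first.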

© 2024 Red Hat, Inc.