Chapter 4. Customizing observability configuration
After you enable observability, customize the observability configuration to fit the needs of your environment. To learn more about managing and viewing the cluster fleet data that the observability service collects, read the following sections:
Required access: Cluster administrator
- Creating custom rules
- Adding custom metrics
- Adding advanced configuration
- Updating the MultiClusterObservability custom resource replicas from the console
- Customizing route certification
- Customizing certificates for accessing the object store
- Configuring proxy settings for observability add-ons
- Disabling proxy settings for observability add-ons
4.1. Creating custom rules
Create custom rules for the observability installation by adding Prometheus recording rules and alerting rules to the observability resource.
Use the recording rules to precalculate expensive expressions; the results are saved as a new set of time series. With the alerting rules, you can specify the alert conditions and how you want to send an alert to an external service.
Note: When you update your custom rules, observability-thanos-rule pods restart automatically.
Define custom rules with Prometheus to create alert conditions, and send notifications to an external messaging service. View the following examples of custom rules:
- Create a custom alert rule. Create a config map named thanos-ruler-custom-rules in the open-cluster-management-observability namespace. You must name the key custom_rules.yaml, as shown in the following example. You can create multiple rules in the configuration.
  - Create a custom alert rule that notifies you when your CPU usage passes your defined value. Your YAML might resemble the following content:
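The following is a minimal sketch of the config map; the group name, alert name, PromQL expression, and threshold are illustrative assumptions that you replace with your own values:

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-custom-rules
  namespace: open-cluster-management-observability
data:
  custom_rules.yaml: |
    groups:
      - name: cluster-health
        rules:
          # Example only: adjust the expression and the threshold (here > 0) to your own limit.
          - alert: ClusterCPUHealth
            annotations:
              summary: Notify when CPU utilization on a cluster is greater than the defined utilization limit
              description: "The cluster has a high CPU usage: {{ $value }} cores for {{ $labels.cluster }}"
            expr: |
              max(cluster:cpu_usage_cores:sum) by (clusterID, cluster, prometheus) > 0
            for: 5s
            labels:
              cluster: "{{ $labels.cluster }}"
              severity: critical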
  The default alert rules are in the thanos-ruler-default-rules config map of the open-cluster-management-observability namespace.
  - Create a custom recording rule within the thanos-ruler-custom-rules config map. Create a recording rule that gives you the ability to get the sum of the container memory cache of a pod. Your YAML might resemble the following content:
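A minimal sketch follows; the group name and the record name are illustrative choices for the aggregated series:

apiVersion: v1
kind: ConfigMap
metadata:
  name: thanos-ruler-custom-rules
  namespace: open-cluster-management-observability
data:
  custom_rules.yaml: |
    groups:
      - name: container-memory
        rules:
          # Sketch: the record name is an example; the expression sums the container memory cache by pod.
          - record: pod:container_memory_cache:sum
            expr: sum(container_memory_cache) BY (pod, container)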
  Note: After you make changes to the config map, the configuration automatically reloads. The configuration reloads because of the config-reload within the observability-thanos-ruler sidecar.
- To verify that the alert rules are functioning correctly, go to the Grafana dashboard, select the Explore page, and query ALERTS. The alert is only available in Grafana if you created the alert.
4.2. Adding custom metrics
Add metrics to the metrics_list.yaml file to collect them from managed clusters. Complete the following steps:
- Before you add a custom metric, verify that mco observability is enabled with the following command:
  oc get mco observability -o yaml
  Check that the following message appears in the status.conditions.message section:
  Observability components are deployed and running
- Create the observability-metrics-custom-allowlist config map in the open-cluster-management-observability namespace with the following command:
  oc apply -n open-cluster-management-observability -f observability-metrics-custom-allowlist.yaml
- Add the name of the custom metric to the metrics_list.yaml parameter. Your YAML for the config map might resemble the following content:
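The following sketch assumes one example metric name in the names section and one example recording rule in the rules section; replace them with your own entries:

kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      # Example metric name; replace with the custom metrics that you want to collect.
      - node_memory_MemTotal_bytes
    rules:
      # Example record and expr pair; replace with your own query expression.
      - record: apiserver_request_duration_seconds:histogram_quantile_90
        expr: histogram_quantile(0.90, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver", verb!="WATCH"}[5m])) by (verb, le))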
  - Optional: In the names section, add the names of the custom metrics that are to be collected from the managed cluster.
  - Optional: In the rules section, enter only one value for the expr and record parameter pair to define the query expression. The metrics are collected with the name that is defined in the record parameter from your managed cluster. The metric values that are returned are the results after you run the query expression.
  You can use either one or both of the sections. For user workload metrics, see the Adding user workload metrics section.
- Verify the data collection from your custom metric by querying the metric from the Explore page. You can also use the custom metrics in your own dashboard.
4.2.1. Adding user workload metrics
Collect user-defined metrics from workloads in OpenShift Container Platform to display the metrics from your Grafana dashboard. Complete the following steps:
- Enable monitoring on your OpenShift Container Platform cluster. See Enabling monitoring for user-defined projects in the Additional resources section.
- If you have a managed cluster with monitoring for user-defined workloads enabled, the user workloads are located in the test namespace and generate metrics. These metrics are collected by Prometheus from the OpenShift Container Platform user workload.
- Add user workload metrics to the observability-metrics-custom-allowlist config map to collect the metrics in the test namespace. View the following example:
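The following sketch assumes the uwl_metrics_list.yaml key for user workload metrics and two placeholder metric names:

kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: test
data:
  uwl_metrics_list.yaml: |
    names:
      # Placeholder metric names; replace with the user workload metrics that you want to collect.
      - sample_metric_1
      - sample_metric_2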
  - Enter the key for the config map data.
  - Enter the value of the config map data in YAML format. The names section includes the list of metric names that you want to collect from the test namespace.
  After you create the config map, the observability collector collects and pushes the metrics from the target namespace to the hub cluster.
4.2.2. Removing default metrics
If you do not want to collect data for a specific metric from your managed cluster, remove the metric from the observability-metrics-custom-allowlist.yaml file. When you remove a metric, the metric data is not collected from your managed clusters. Complete the following steps to remove a default metric:
- Verify that mco observability is enabled by using the following command:
  oc get mco observability -o yaml
- Add the name of the default metric to the metrics_list.yaml parameter with a hyphen (-) at the start of the metric name. View the following metric example, and the allowlist sketch after this procedure:
  -cluster_infrastructure_provider
- Create the observability-metrics-custom-allowlist config map in the open-cluster-management-observability namespace with the following command:
  oc apply -n open-cluster-management-observability -f observability-metrics-custom-allowlist.yaml
- Verify that the observability service is not collecting the specific metric from your managed clusters. When you query the metric from the Grafana dashboard, the metric is not displayed.
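The following sketch shows how the hyphen-prefixed metric name from the earlier step fits into the names section of the allowlist config map:

kind: ConfigMap
apiVersion: v1
metadata:
  name: observability-metrics-custom-allowlist
  namespace: open-cluster-management-observability
data:
  metrics_list.yaml: |
    names:
      # The leading hyphen stops collection of this default metric.
      - -cluster_infrastructure_provider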
4.3. Adding advanced configuration for retention
Add the advanced configuration section to update the retention for each observability component, according to your needs.
Edit the MultiClusterObservability custom resource and add the advanced section with the following command:
oc edit mco observability -o yaml
Your YAML file might resemble the following contents:
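The following is a trimmed sketch that focuses on the retentionConfig parameters of the advanced section; the values shown are illustrative, and other components can be tuned in the same advanced section:

spec:
  advanced:
    retentionConfig:
      # Example values only; set the retention to fit your environment.
      blockDuration: 2h
      deleteDelay: 48h
      retentionInLocal: 24h
      retentionResolutionRaw: 30d
      retentionResolution5m: 180d
      retentionResolution1h: 0d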
For descriptions of all the parameters that can be added to the advanced configuration, see the Observability API documentation.
4.4. Dynamic metrics for single-node OpenShift clusters
Dynamic metrics collection supports automatic metric collection based on certain conditions. By default, a single-node OpenShift cluster does not collect pod and container resource metrics. Once a single-node OpenShift cluster reaches a specific level of resource consumption, the defined granular metrics are collected dynamically. When the cluster resource consumption is consistently less than the threshold for a period of time, granular metric collection stops.
The metrics are collected dynamically based on the conditions on the managed cluster specified by a collection rule. Because these metrics are collected dynamically, the following Red Hat Advanced Cluster Management Grafana dashboards do not display any data. When a collection rule is activated and the corresponding metrics are collected, the following panels display data for the duration of the time that the collection rule is initiated:
- Kubernetes/Compute Resources/Namespace (Pods)
- Kubernetes/Compute Resources/Namespace (Workloads)
- Kubernetes/Compute Resources/Nodes (Pods)
- Kubernetes/Compute Resources/Pod
- Kubernetes/Compute Resources/Workload
A collection rule includes the following conditions:
- A set of metrics to collect dynamically.
- Conditions written as a PromQL expression.
- A time interval for the collection, which must be set to true.
- A match expression to select clusters where the collection rule must be evaluated.
By default, collection rules are evaluated continuously on managed clusters every 30 seconds, or at a specific time interval. The lowest value between the collection interval and time interval takes precedence. Once the collection rule condition persists for the duration specified by the for attribute, the collection rule starts and the metrics specified by the rule are automatically collected on the managed cluster. Metrics collection stops automatically after the collection rule condition no longer exists on the managed cluster, at least 15 minutes after it starts.
The collection rules are grouped together as a parameter section named collect_rules, where they can be enabled or disabled as a group. Red Hat Advanced Cluster Management installation includes the collection rule group SNOResourceUsage with two default collection rules: HighCPUUsage and HighMemoryUsage. The HighCPUUsage collection rule begins when the node CPU usage exceeds 70%. The HighMemoryUsage collection rule begins if the overall memory utilization of the single-node OpenShift cluster exceeds 70% of the available node memory. Currently, these thresholds are fixed and cannot be changed. When a collection rule condition persists for more than the interval that is specified by the for attribute, the system automatically starts collecting the metrics that are specified in the dynamic_metrics section.
View the list of dynamic metrics from the collect_rules section in the following YAML file:
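The following abbreviated sketch illustrates the shape of a collect_rules group; the selector syntax, rule name, expression, and dynamic metric names shown here are illustrative assumptions rather than the full default list:

collect_rules:
  - group: SNOResourceUsage
    selector:
      matchExpressions:
        - key: clusterType
          operator: In
          values: ["SNO"]
    rules:
      - collect: HighCPUUsage
        annotations:
          description: Collects the dynamic metrics when the cluster CPU usage is constantly more than 70% for two minutes.
        expr: (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100 > 70
        for: 2m
        dynamic_metrics:
          names:
            # Example metric names collected while the rule condition persists.
            - container_cpu_cfs_periods_total
            - kube_pod_container_resource_limits
            - node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate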
A collect_rules.group can be disabled in the custom-allowlist as shown in the following example. When a collect_rules.group is disabled, metrics collection reverts to the previous behavior, and the metrics are collected at the regular, specified intervals:
collect_rules:
  - group: -SNOResourceUsage
The data is only displayed in Grafana when the rule is initiated.
4.5. Updating the MultiClusterObservability custom resource replicas from the console
If your workload increases, increase the number of replicas of your observability pods. Navigate to the Red Hat OpenShift Container Platform console from your hub cluster. Locate the MultiClusterObservability custom resource, and update the replicas parameter value for the component where you want to change the replicas. Your updated YAML might resemble the following content:
spec:
  advanced:
    receive:
      replicas: 6
For more information about the parameters within the mco observability custom resource, see the Observability API documentation.
4.6. Customizing route certification
If you want to customize the OpenShift Container Platform route certification, you must add the routes to the alt_names section. To ensure your OpenShift Container Platform routes are accessible, add the following information: alertmanager.apps.<domainname>, observatorium-api.apps.<domainname>, rbac-query-proxy.apps.<domainname>.
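For example, if you generate the certificate from an OpenSSL configuration file, the alt_names section might resemble the following sketch; the surrounding file layout is an assumption and <domainname> is a placeholder:

# Hypothetical fragment of an OpenSSL certificate signing request configuration.
[ alt_names ]
DNS.1 = alertmanager.apps.<domainname>
DNS.2 = observatorium-api.apps.<domainname>
DNS.3 = rbac-query-proxy.apps.<domainname>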
For more details, see Replacing certificates for alertmanager route in the Governance documentation.
Note: Users are responsible for certificate rotations and updates.
4.7. Customizing certificates for accessing the object store
Complete the following steps to customize certificates for accessing the object store:
- Edit the http_config section by adding the certificate to the object store secret. View the following example:
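The following sketch assumes an S3-compatible object store; the bucket, endpoint, and credential values are placeholders, and the certificate paths assume the /etc/minio/certs mount path that is configured later in this procedure:

thanos.yaml: |
  type: s3
  config:
    bucket: "bucket"
    endpoint: "endpoint"
    insecure: false
    access_key: "access_key"
    secret_key: "secret_key"
    http_config:
      tls_config:
        # Paths assume the tlsSecretMountPath value /etc/minio/certs.
        ca_file: /etc/minio/certs/ca.crt
        cert_file: /etc/minio/certs/public.crt
        key_file: /etc/minio/certs/private.key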
- Add the object store secret in the open-cluster-management-observability namespace. The secret must contain the ca.crt that you defined in the previous secret example. If you want to enable mutual TLS, you need to provide public.crt and private.key in the previous secret. View the following example:
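A minimal sketch of the secret follows, assuming the tls-certs-secret name that is used later in this procedure; the data values are base64-encoded placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: tls-certs-secret
  namespace: open-cluster-management-observability
type: Opaque
data:
  ca.crt: <base64_encoded_ca_certificate>
  public.crt: <base64_encoded_public_certificate>   # required only for mutual TLS
  private.key: <base64_encoded_private_key>         # required only for mutual TLS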
  The path to the certificates and key values is for the thanos-object-storage secret.
- Configure the secret name and mount path by updating the tlsSecretName and tlsSecretMountPath parameters in the MultiClusterObservability custom resource. View the following example, where the secret name is tls-certs-secret and the mount path for the certificates and key value is the directory that is used in the prior example:
  metricObjectStorage:
    key: thanos.yaml
    name: thanos-object-storage
    tlsSecretName: tls-certs-secret
    tlsSecretMountPath: /etc/minio/certs
- Mount the secret in the tlsSecretMountPath resource of all the components that need to access the object store by renaming the existing TLS secret.
- To verify that you can access the object store, check that the pods are displayed.
4.8. Configuring proxy settings for observability add-ons
Configure the proxy settings to allow communication from the managed cluster to the hub cluster through an HTTP or HTTPS proxy server. Typically, add-ons do not need any special configuration to support HTTP and HTTPS proxy servers between a hub cluster and a managed cluster, but if you enabled the observability add-on, you must complete the proxy configuration.
4.9. Prerequisite
- You have a hub cluster.
- You have enabled the proxy settings between the hub cluster and managed cluster.
Complete the following steps to configure the proxy settings for the observability add-on:
- Go to the cluster namespace on your hub cluster.
- Create an AddOnDeploymentConfig resource with the proxy settings by adding a spec.proxyConfig parameter. View the following YAML example:
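The following sketch assumes placeholder names for the resource, the managed cluster namespace, and the proxy endpoints; only the spec.proxyConfig fields are specific to this procedure:

apiVersion: addon.open-cluster-management.io/v1alpha1
kind: AddOnDeploymentConfig
metadata:
  name: <addon-deploy-config-name>
  namespace: <managed-cluster-name>
spec:
  proxyConfig:
    httpProxy: "http://<username>:<password>@<ip>:<port>"
    httpsProxy: "https://<username>:<password>@<ip>:<port>"
    # Include the cluster IP address of the kubernetes service so that internal API
    # traffic bypasses the proxy; retrieve it with the command that follows.
    noProxy: ".cluster.local,.svc,<kubernetes_service_ip>"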
  To get the IP address, run the following command on your managed cluster:
  oc -n default describe svc kubernetes | grep IP:
- Go to the ManagedClusterAddOn resource and update it by referencing the AddOnDeploymentConfig resource that you made. View the following YAML example:
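The following sketch assumes that the observability add-on is named observability-controller and that the AddOnDeploymentConfig resource from the previous step is referenced through spec.configs:

apiVersion: addon.open-cluster-management.io/v1alpha1
kind: ManagedClusterAddOn
metadata:
  name: observability-controller
  namespace: <managed-cluster-name>
spec:
  configs:
    # Reference the AddOnDeploymentConfig resource that holds the proxy settings.
    - group: addon.open-cluster-management.io
      resource: addondeploymentconfigs
      name: <addon-deploy-config-name>
      namespace: <managed-cluster-name>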
- Verify the proxy settings. If you successfully configured the proxy settings, the metric collector that is deployed by the observability add-on agent on the managed cluster sends the data to the hub cluster. Complete the following steps:
  - Go to the hub cluster, then the managed cluster on the Grafana dashboard.
  - View the metrics for the proxy settings.
4.10. Disabling proxy settings for observability add-ons
If your development needs change, you might need to disable the proxy setting for the observability add-ons you configured for the hub cluster and managed cluster. You can disable the proxy settings for the observability add-on at any time. Complete the following steps:
- Go to the ManagedClusterAddOn resource.
- Remove the referenced AddOnDeploymentConfig resource.
4.11. Additional resources
- Refer to Prometheus configuration for more information. For more information about recording rules and alerting rules, refer to the recording rules and alerting rules from the Prometheus documentation.
- For more information about viewing the dashboard, see Using Grafana dashboards.
- See Exporting metrics to external endpoints.
- See Enabling monitoring for user-defined projects.
- See the Observability API.
- For information about updating the certificate for the alertmanager route, see Replacing certificates for alertmanager.
- For more details about observability alerts, see Observability alerts.
- To learn more about alert forwarding, see the Prometheus Alertmanager documentation.
- See Observability alerts for more information.
- For more topics about the observability service, see Observability service introduction.
- See Management Workload Partitioning for more information.
- Return to the beginning of this topic, Customizing observability.