Chapter 6. Using operational features of Service Telemetry Framework

6.1. Dashboards in Service Telemetry Framework
Copia collegamento

Use the third-party application, Grafana, to visualize system-level metrics that the data collectors collectd and Ceilometer gather for each individual host node.

For more information about configuring data collectors, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework using director”.

You can use dashboards to monitor a cloud:

Infrastructure dashboard

Use the infrastructure dashboard to view metrics for a single node at a time. Select a node from the upper left corner of the dashboard.

Cloud view dashboard

Use the cloud view dashboard to view panels to monitor service resource usage, API stats, and cloud events. You must enable API health monitoring and service monitoring to provide the data for this dashboard. API health monitoring is enabled by default in the STF base configuration. For more information, see Section 4.1.4, “Creating the base configuration for STF”.

For more information about API health monitoring, see Section 6.9, “Red Hat OpenStack Platform API status and containerized services health”.
For more information about RHOSP service monitoring, see Section 6.8, “Resource usage of Red Hat OpenStack Platform services”.

Virtual machine view dashboard

Use the virtual machine view dashboard to view panels to monitor virtual machine infrastructure usage. Select a cloud and project from the upper left corner of the dashboard. You must enable event storage if you want to enable the event annotations on this dashboard. For more information, see Section 3.2, “Creating a ServiceTelemetry object in Red Hat OpenShift Container Platform”.

Memcached view dashboard

Use the memcached view dashboard to view panels to monitor connections, availability, system metrics and cache performance. Select a cloud from the upper left corner of the dashboard.

6.1.1. Configuring Grafana to host the dashboard
Copia collegamento

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from community-operators CatalogSource. If you use the Service Telemetry Operator to deploy Grafana, it results in a Grafana instance and the configuration of the default data sources for the local STF deployment.

Procedure

Log in to your Red Hat OpenShift Container Platform environment where STF is hosted.

Subscribe to the Grafana Operator by using the community-operators CatalogSource:

Warning

Community Operators are Operators which have not been vetted or verified by Red Hat. Community Operators should be used with caution because their stability is unknown. Red Hat provides no support for community Operators.

Learn more about Red Hat’s third party software support policy

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/grafana-operator.openshift-operators: ""
  name: grafana-operator
  namespace: openshift-operators
spec:
  channel: v5
  installPlanApproval: Automatic
  name: grafana-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  labels:
    operators.coreos.com/grafana-operator.openshift-operators: ""
  name: grafana-operator
  namespace: openshift-operators
spec:
  channel: v5
  installPlanApproval: Automatic
  name: grafana-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF

Copy to Clipboard

Toggle word wrap

Verify that the Operator launched successfully. In the command output, if the value of the PHASE column is Succeeded, the Operator launched successfully:

oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace openshift-operators -l operators.coreos.com/grafana-operator.openshift-operators

$ oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace openshift-operators -l operators.coreos.com/grafana-operator.openshift-operators

clusterserviceversion.operators.coreos.com/grafana-operator.v5.6.0 condition met

Copy to Clipboard

Toggle word wrap

To launch a Grafana instance, create or modify the ServiceTelemetry object. Set graphing.enabled and graphing.grafana.ingressEnabled to true. Optionally, set the value of graphing.grafana.baseImage to the Grafana workload container image that will be deployed:
```
oc edit stf default
```
```
$ oc edit stf default

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  graphing:
    enabled: true
    grafana:
      ingressEnabled: true
      baseImage: 'registry.redhat.io/rhel8/grafana:9'
```
Copy to Clipboard Toggle word wrap

Verify that the Grafana instance deployed:

oc wait --for jsonpath="{.status.phase}"=Running pod -l app=default-grafana --timeout=600s

$ oc wait --for jsonpath="{.status.phase}"=Running pod -l app=default-grafana --timeout=600s

pod/default-grafana-deployment-669968df64-wz5s2 condition met

Copy to Clipboard

Toggle word wrap

Verify that the Grafana data sources installed correctly:

oc get grafanadatasources.grafana.integreatly.org

$ oc get grafanadatasources.grafana.integreatly.org

NAME                        NO MATCHING INSTANCES   LAST RESYNC   AGE
default-ds-stf-prometheus                           2m35s         2m56s

Copy to Clipboard

Toggle word wrap

Verify that the Grafana route exists:

oc get route default-grafana-route

$ oc get route default-grafana-route

NAME                    HOST/PORT                                                   PATH   SERVICES                  PORT   TERMINATION   WILDCARD
default-grafana-route   default-grafana-route-service-telemetry.apps.infra.watch           default-grafana-service   web    reencrypt     None

Copy to Clipboard

Toggle word wrap

6.1.2. Enabling dashboards
Copia collegamento

The Grafana Operator can import and manage dashboards by creating GrafanaDashboard objects. Service Telemetry Operator can enable a set of default dashboards that create the GrafanaDashboard objects that load dashboards into the Grafana instance.

Set the value of graphing.grafana.dashboards.enabled to true to load the following dashboards into Grafana :

Infrastructure dashboard
Cloud view dashboard
Virtual machine view dashboard
Memcached view dashboard

You can use the GrafanaDashboard object to create and load additional dashboards into Grafana. For more information about managing dashboards with Grafana Operator, see Dashboards in the Grafana Operator project documentation.

Prerequisites

You enabled graphing in the ServiceTelemetry object. For more information about graphing, see Section 6.1.1, “Configuring Grafana to host the dashboard”.

Procedure

To enable the managed dashboards, create or modify the ServiceTelemetry object. Set graphing.grafana.dashboards.enabled to true:

oc edit stf default

$ oc edit stf default

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  graphing:
    enabled: true
    grafana:
      dashboards:
        enabled: true

Copy to Clipboard

Toggle word wrap

Verify that the Grafana dashboards are created. The process of Service Telemetry Operator creating the dashboards might take some time.

oc get grafanadashboards.grafana.integreatly.org

$ oc get grafanadashboards.grafana.integreatly.org

NAME                          NO MATCHING INSTANCES   LAST RESYNC   AGE
memcached-dashboard-1                                 38s           38s
rhos-cloud-dashboard-1                                39s           39s
rhos-dashboard-1                                      39s           39s
virtual-machine-dashboard-1                           37s           37s

Copy to Clipboard

Toggle word wrap

Retrieve the Grafana route address:

oc get route default-grafana-route -ojsonpath='{.spec.host}'

$ oc get route default-grafana-route -ojsonpath='{.spec.host}'

default-grafana-route-service-telemetry.apps.infra.watch

Copy to Clipboard

Toggle word wrap

In a web browser, navigate to https://<grafana_route_address>. Replace <grafana_route_address> with the value that you retrieved in the previous step.
Log in with OpenShift credentials. For more information about logging in, see Section 3.3, “Accessing user interfaces for STF components”.
To view the dashboard, click Dashboards and Browse. The managed dashboards are available in the service-telemetry folder.

6.1.3. Connecting an external dashboard system
Copia collegamento

It is possible to configure third-party visualization tools to connect to the STF Prometheus for metrics retrieval. Access is controlled via an OAuth token, and a ServiceAccount is already created that has (only) the required permissions. A new OAuth token can be generated against this account for the external system to use.

To use the authentication token, the third-party tool must be configured to supply an HTTP Bearer Token Authorization header as described in RFC6750. Consult the documentation of the third-party tool for how to configure this header. For example Configure Prometheus - Custom HTTP Headers in the Grafana Documentation.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap

Create a new token secret for the stf-prometheus-reader service account

oc create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-prometheus-reader-token
  namespace: service-telemetry
  annotations:
    kubernetes.io/service-account.name: stf-prometheus-reader
type: kubernetes.io/service-account-token
EOF

$ oc create -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: my-prometheus-reader-token
  namespace: service-telemetry
  annotations:
    kubernetes.io/service-account.name: stf-prometheus-reader
type: kubernetes.io/service-account-token
EOF

Copy to Clipboard

Toggle word wrap

Retrieve the token from the secret

TOKEN=$(oc get secret my-prometheus-reader-token -o template='{{.data.token}}' | base64 -d)

$ TOKEN=$(oc get secret my-prometheus-reader-token -o template='{{.data.token}}' | base64 -d)

Copy to Clipboard

Toggle word wrap

Retrieve the Prometheus host name

PROM_HOST=$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')

$ PROM_HOST=$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')

Copy to Clipboard

Toggle word wrap

Test the access token

curl -k -H "Authorization: Bearer ${TOKEN}" https://${PROM_HOST}/api/v1/query?query=up

$ curl -k -H "Authorization: Bearer ${TOKEN}" https://${PROM_HOST}/api/v1/query?query=up

{"status":"success",[...]

Copy to Clipboard

Toggle word wrap

Configure your third-party tool with the PROM_HOST and TOKEN values from above
```
echo $PROM_HOST
echo $TOKEN
```
```
$ echo $PROM_HOST
$ echo $TOKEN
```
Copy to Clipboard Toggle word wrap
The token remains valid as long as the secret exists. You can revoke the token by deleting the secret.
```
oc delete secret my-prometheus-reader-token
```
```
$ oc delete secret my-prometheus-reader-token
secret "my-prometheus-reader-token" deleted
```
Copy to Clipboard Toggle word wrap

Additional information

For more information about service account token secrets, see Creating a service account token secret in the OpenShift Container Platform Documentation.

6.2. Metrics retention time period in Service Telemetry Framework
Copia collegamento

The default retention time for metrics stored in Service Telemetry Framework (STF) is 24 hours, which provides enough data for trends to develop for the purposes of alerting.

For long-term storage, use systems designed for long-term data retention, for example, Thanos.

Additional resources

To adjust STF for additional metrics retention time, see Section 6.2.1, “Editing the metrics retention time period in Service Telemetry Framework”.
For recommendations about Prometheus data storage and estimating storage space, see https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
For more information about Thanos, see https://thanos.io/

6.2.1. Editing the metrics retention time period in Service Telemetry Framework
Copia collegamento

You can adjust Service Telemetry Framework (STF) for additional metrics retention time.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap
Edit the ServiceTelemetry object:
```
oc edit stf default
```
```
$ oc edit stf default
```
Copy to Clipboard Toggle word wrap

Add retention: 7d to the storage section of backends.metrics.prometheus.storage to increase the retention period to seven days:

Note

If you set a long retention period, retrieving data from heavily populated Prometheus systems can result in queries returning results slowly.

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  ...
  backends:
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent
          retention: 7d
    ...

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  ...
  backends:
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent
          retention: 7d
    ...

Copy to Clipboard

Toggle word wrap

Save your changes and close the object.
Wait for prometheus to restart with the new settings.
```
oc get po -l app.kubernetes.io/name=prometheus -w
```
```
$ oc get po -l app.kubernetes.io/name=prometheus -w
```
Copy to Clipboard Toggle word wrap

Verify the new retention setting by checking the command line arguments used in the pod.

oc describe po prometheus-default-0 | grep retention.time

$ oc describe po prometheus-default-0 | grep retention.time
      --storage.tsdb.retention.time=24h

Copy to Clipboard

Toggle word wrap

Additional resources

For more information about the metrics retention time, see Section 6.2, “Metrics retention time period in Service Telemetry Framework”.

6.3. Alerts in Service Telemetry Framework
Copia collegamento

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

Create an alert rule in Prometheus. For more information, see Section 6.3.1, “Creating an alert rule in Prometheus”.
Create an alert route in Alertmanager. There are two ways in which you can create an alert route:
- Creating a standard alert route in Alertmanager.
- Creating an alert route with templating in Alertmanager.

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

6.3.1. Creating an alert rule in Prometheus
Copia collegamento

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap

Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

$ oc apply -f - <<EOF
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Collectd metrics receive rate is zero
          expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
EOF

$ oc apply -f - <<EOF
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Collectd metrics receive rate is zero
          expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
EOF

Copy to Clipboard

Toggle word wrap

To change the rule, edit the value of the expr parameter.

To verify that the Operator loaded the rules into Prometheus, run the curl command against the default-prometheus-proxy route with basic authentication:

curl -k -H "Authorization: Bearer $(oc create token stf-prometheus-reader)" https://$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')/api/v1/rules

$ curl -k -H "Authorization: Bearer $(oc create token stf-prometheus-reader)" https://$(oc get route default-prometheus-proxy -ogo-template='{{ .spec.host }}')/api/v1/rules

{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"state":"inactive","name":"Collectd metrics receive count is zero","query":"rate(sg_total_collectd_msg_received_count[1m]) == 0","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","evaluationTime":0.00034627,"lastEvaluation":"2021-12-07T17:23:22.160448028Z","type":"alerting"}],"interval":30,"evaluationTime":0.000353787,"lastEvaluation":"2021-12-07T17:23:22.160444017Z"}]}}

Copy to Clipboard

Toggle word wrap

Additional resources

For more information on alerting, see https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/developer/alerting.md.

6.3.2. Configuring custom alerts
Copia collegamento

You can add custom alerts to the PrometheusRule object that you created in Section 6.3.1, “Creating an alert rule in Prometheus”.

Procedure

Use the oc edit command:

oc edit prometheusrules.monitoring.rhobs prometheus-alarm-rules

$ oc edit prometheusrules.monitoring.rhobs prometheus-alarm-rules

Copy to Clipboard

Toggle word wrap

Edit the PrometheusRules manifest.
Save and close the manifest.

Additional resources

For more information about how to configure alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md

6.3.3. Creating a standard alert route in Alertmanager
Copia collegamento

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

Copy to Clipboard

Toggle word wrap

To deploy a custom Alertmanager route with STF, you must add a alertmanagerConfigManifest parameter to the Service Telemetry Operator that results in an updated secret, managed by the Prometheus Operator.

Note

If your alertmanagerConfigManifest contains a custom template, for example, to construct the title and text of the sent alert, you must deploy the contents of the alertmanagerConfigManifest using a base64-encoded configuration. For more information, see Section 6.3.4, “Creating an alert route with templating in Alertmanager”.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap
Edit the ServiceTelemetry object for your STF deployment:
```
oc edit stf default
```
```
$ oc edit stf default
```
Copy to Clipboard Toggle word wrap

Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

Note

This step loads the default template that the Service Telemetry Operator manages. To verify that the changes are populating correctly, change a value, return the alertmanager-default secret, and verify that the new value is loaded into memory. For example, change the value of the parameter global.resolve_timeout from 5m to 10m.

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  backends:
    metrics:
      prometheus:
        enabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  backends:
    metrics:
      prometheus:
        enabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'

Copy to Clipboard

Toggle word wrap

Verify that the configuration has been applied to the secret:

oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

$ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'

Copy to Clipboard

Toggle word wrap

Run the wget command from the prometheus pod against the alertmanager-proxy service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

oc exec -it prometheus-default-0 -c prometheus -- sh -c "wget --header \"Authorization: Bearer \$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://default-alertmanager-proxy:9095/api/v1/status -q -O -"

{"status":"success","data":{"configYAML":"...",...}}

$ oc exec -it prometheus-default-0 -c prometheus -- sh -c "wget --header \"Authorization: Bearer \$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://default-alertmanager-proxy:9095/api/v1/status -q -O -"

{"status":"success","data":{"configYAML":"...",...}}

Copy to Clipboard

Toggle word wrap

Verify that the configYAML field contains the changes you expect.

Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see Prometheus user guide on alerting.

6.3.4. Creating an alert route with templating in Alertmanager
Copia collegamento

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or other notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

Copy to Clipboard

Toggle word wrap

If the alertmanagerConfigManifest parameter contains a custom template, for example, to construct the title and text of the sent alert, you must deploy the contents of the alertmanagerConfigManifest by using a base64-encoded configuration.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap

Create the necessary alertmanager config in a file called alertmanager.yaml, for example:

$ cat > alertmanager.yaml <<EOF
global:
  resolve_timeout: 10m
  slack_api_url: <slack_api_url>
receivers:
  - name: slack
    slack_configs:
    - channel: #stf-alerts
      title: |-
        ...
      text: >-
        ...
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'slack'
EOF

$ cat > alertmanager.yaml <<EOF
global:
  resolve_timeout: 10m
  slack_api_url: <slack_api_url>
receivers:
  - name: slack
    slack_configs:
    - channel: #stf-alerts
      title: |-
        ...
      text: >-
        ...
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'slack'
EOF

Copy to Clipboard

Toggle word wrap

Generate the config manifest and add it to the ServiceTelemetry object for your STF deployment:

CONFIG_MANIFEST=$(oc create secret --dry-run=client generic alertmanager-default --from-file=alertmanager.yaml -o json)
oc patch stf default --type=merge -p '{"spec":{"alertmanagerConfigManifest":'"$CONFIG_MANIFEST"'}}'

$ CONFIG_MANIFEST=$(oc create secret --dry-run=client generic alertmanager-default --from-file=alertmanager.yaml -o json)
$ oc patch stf default --type=merge -p '{"spec":{"alertmanagerConfigManifest":'"$CONFIG_MANIFEST"'}}'

Copy to Clipboard

Toggle word wrap

Verify that the configuration has been applied to the secret:

Note

There will be a short delay as the operators update each object

oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

$ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
  slack_api_url: <slack_api_url>
receivers:
  - name: slack
    slack_configs:
    - channel: #stf-alerts
      title: |-
        ...
      text: >-
        ...
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'slack'

Copy to Clipboard

Toggle word wrap

Run the wget command from the prometheus pod against the alertmanager-proxy service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

oc exec -it prometheus-default-0 -c prometheus -- /bin/sh -c "wget --header \"Authorization: Bearer \$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://default-alertmanager-proxy:9095/api/v1/status -q -O -"

{"status":"success","data":{"configYAML":"...",...}}

$ oc exec -it prometheus-default-0 -c prometheus -- /bin/sh -c "wget --header \"Authorization: Bearer \$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\" https://default-alertmanager-proxy:9095/api/v1/status -q -O -"

{"status":"success","data":{"configYAML":"...",...}}

Copy to Clipboard

Toggle word wrap

Verify that the configYAML field contains the changes you expect.

Additional resources

For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see Prometheus user guide on alerting.

6.4. Sending alerts as SNMP traps
Copia collegamento

To enable SNMP traps, modify the ServiceTelemetry object and configure the snmpTraps parameters. SNMP traps are sent using version 2c.

6.4.1. Configuration parameters for snmpTraps
Copia collegamento

The snmpTraps parameter contains the following sub-parameters for configuring the alert receiver:

enabled: Set the value of this sub-parameter to true to enable the SNMP trap alert receiver. The default value is false.
target: Target address to send SNMP traps. Value is a string. Default is 192.168.24.254.
port: Target port to send SNMP traps. Value is an integer. Default is 162.
community: Target community to send SNMP traps to. Value is a string. Default is public.
retries: SNMP trap retry delivery limit. Value is an integer. Default is 5.
timeout: SNMP trap delivery timeout defined in seconds. Value is an integer. Default is 1.
alertOidLabel: Label name in the alert that defines the OID value to send the SNMP trap as. Value is a string. Default is oid.
trapOidPrefix: SNMP trap OID prefix for variable bindings. Value is a string. Default is 1.3.6.1.4.1.50495.15.
trapDefaultOid: SNMP trap OID when no alert OID label has been specified with the alert. Value is a string. Default is 1.3.6.1.4.1.50495.15.1.2.1.
trapDefaultSeverity: SNMP trap severity when no alert severity has been set. Value is a string. Defaults to an empty string.

Configure the snmpTraps parameter as part of the alerting.alertmanager.receivers definition in the ServiceTelemetry object:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          alertOidLabel: oid
          community: public
          enabled: true
          port: 162
          retries: 5
          target: 192.168.25.254
          timeout: 1
          trapDefaultOid: 1.3.6.1.4.1.50495.15.1.2.1
          trapDefaultSeverity: ""
          trapOidPrefix: 1.3.6.1.4.1.50495.15
...

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          alertOidLabel: oid
          community: public
          enabled: true
          port: 162
          retries: 5
          target: 192.168.25.254
          timeout: 1
          trapDefaultOid: 1.3.6.1.4.1.50495.15.1.2.1
          trapDefaultSeverity: ""
          trapOidPrefix: 1.3.6.1.4.1.50495.15
...

Copy to Clipboard

Toggle word wrap

6.4.2. Overview of the MIB definition
Copia collegamento

Delivery of SNMP traps uses object identifier (OID) value 1.3.6.1.4.1.50495.15.1.2.1 by default. The management information base (MIB) schema is available at https://github.com/infrawatch/prometheus-webhook-snmp/blob/master/PROMETHEUS-ALERT-CEPH-MIB.txt.

The OID number is comprised of the following component values: * The value 1.3.6.1.4.1 is a global OID defined for private enterprises. * The next identifier 50495 is a private enterprise number assigned by IANA for the Ceph organization. * The other values are child OIDs of the parent.

15: prometheus objects
15.1: prometheus alerts
15.1.2: prometheus alert traps
15.1.2.1: prometheus alert trap default

The prometheus alert trap default is an object comprised of several other sub-objects to OID 1.3.6.1.4.1.50495.15 which is defined by the alerting.alertmanager.receivers.snmpTraps.trapOidPrefix parameter:

<trapOidPrefix>.1.1.1: alert name
<trapOidPrefix>.1.1.2: status
<trapOidPrefix>.1.1.3: severity
<trapOidPrefix>.1.1.4: instance
<trapOidPrefix>.1.1.5: job
<trapOidPrefix>.1.1.6: description
<trapOidPrefix>.1.1.7: labels
<trapOidPrefix>.1.1.8: timestamp
<trapOidPrefix>.1.1.9: rawdata

The following is example output from a simple SNMP trap receiver that outputs the received trap to the console:

  SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.50495.15.1.2.1
  SNMPv2-SMI::enterprises.50495.15.1.1.1 = STRING: "TEST ALERT FROM PROMETHEUS PLEASE ACKNOWLEDGE"
  SNMPv2-SMI::enterprises.50495.15.1.1.2 = STRING: "firing"
  SNMPv2-SMI::enterprises.50495.15.1.1.3 = STRING: "warning"
  SNMPv2-SMI::enterprises.50495.15.1.1.4 = ""
  SNMPv2-SMI::enterprises.50495.15.1.1.5 = ""
  SNMPv2-SMI::enterprises.50495.15.1.1.6 = STRING: "TEST ALERT FROM "
  SNMPv2-SMI::enterprises.50495.15.1.1.7 = STRING: "{\"cluster\": \"TEST\", \"container\": \"sg-core\", \"endpoint\": \"prom-https\", \"prometheus\": \"service-telemetry/default\", \"service\": \"default-cloud1-coll-meter\", \"source\": \"SG\"}"
  SNMPv2-SMI::enterprises.50495.15.1.1.8 = Timeticks: (1676476389) 194 days, 0:52:43.89
  SNMPv2-SMI::enterprises.50495.15.1.1.9 = STRING: "{\"status\": \"firing\", \"labels\": {\"cluster\": \"TEST\", \"container\": \"sg-core\", \"endpoint\": \"prom-https\", \"prometheus\": \"service-telemetry/default\", \"service\": \"default-cloud1-coll-meter\", \"source\": \"SG\"}, \"annotations\": {\"action\": \"TESTING PLEASE ACKNOWLEDGE, NO FURTHER ACTION REQUIRED ONLY A TEST\"}, \"startsAt\": \"2023-02-15T15:53:09.109Z\", \"endsAt\": \"0001-01-01T00:00:00Z\", \"generatorURL\": \"http://prometheus-default-0:9090/graph?g0.expr=sg_total_collectd_msg_received_count+%3E+1&g0.tab=1\", \"fingerprint\": \"feefeb77c577a02f\"}"

  SNMPv2-MIB::snmpTrapOID.0 = OID: SNMPv2-SMI::enterprises.50495.15.1.2.1
  SNMPv2-SMI::enterprises.50495.15.1.1.1 = STRING: "TEST ALERT FROM PROMETHEUS PLEASE ACKNOWLEDGE"
  SNMPv2-SMI::enterprises.50495.15.1.1.2 = STRING: "firing"
  SNMPv2-SMI::enterprises.50495.15.1.1.3 = STRING: "warning"
  SNMPv2-SMI::enterprises.50495.15.1.1.4 = ""
  SNMPv2-SMI::enterprises.50495.15.1.1.5 = ""
  SNMPv2-SMI::enterprises.50495.15.1.1.6 = STRING: "TEST ALERT FROM "
  SNMPv2-SMI::enterprises.50495.15.1.1.7 = STRING: "{\"cluster\": \"TEST\", \"container\": \"sg-core\", \"endpoint\": \"prom-https\", \"prometheus\": \"service-telemetry/default\", \"service\": \"default-cloud1-coll-meter\", \"source\": \"SG\"}"
  SNMPv2-SMI::enterprises.50495.15.1.1.8 = Timeticks: (1676476389) 194 days, 0:52:43.89
  SNMPv2-SMI::enterprises.50495.15.1.1.9 = STRING: "{\"status\": \"firing\", \"labels\": {\"cluster\": \"TEST\", \"container\": \"sg-core\", \"endpoint\": \"prom-https\", \"prometheus\": \"service-telemetry/default\", \"service\": \"default-cloud1-coll-meter\", \"source\": \"SG\"}, \"annotations\": {\"action\": \"TESTING PLEASE ACKNOWLEDGE, NO FURTHER ACTION REQUIRED ONLY A TEST\"}, \"startsAt\": \"2023-02-15T15:53:09.109Z\", \"endsAt\": \"0001-01-01T00:00:00Z\", \"generatorURL\": \"http://prometheus-default-0:9090/graph?g0.expr=sg_total_collectd_msg_received_count+%3E+1&g0.tab=1\", \"fingerprint\": \"feefeb77c577a02f\"}"

Copy to Clipboard

Toggle word wrap

6.4.3. Configuring SNMP traps
Copia collegamento

Prerequisites

Ensure that you know the IP address or hostname of the SNMP trap receiver where you want to send the alerts to.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap
To enable SNMP traps, modify the ServiceTelemetry object:
```
oc edit stf default
```
```
$ oc edit stf default
```
Copy to Clipboard Toggle word wrap

Set the alerting.alertmanager.receivers.snmpTraps parameters:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          enabled: true
          target: 10.10.10.10

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          enabled: true
          target: 10.10.10.10

Copy to Clipboard

Toggle word wrap

Ensure that you set the value of target to the IP address or hostname of the SNMP trap receiver.

Additional Information

For more information about available parameters for snmpTraps, see Section 6.4.1, “Configuration parameters for snmpTraps”.

6.4.4. Creating alerts for SNMP traps
Copia collegamento

You can create alerts that are configured for delivery by SNMP traps by adding labels that are parsed by the prometheus-webhook-snmp middleware to define the trap information and delivered object identifiers (OID). Adding the oid or severity labels is only required if you need to change the default values for a particular alert definition.

Note

When you set the oid label, the top-level SNMP trap OID changes, but the sub-OIDs remain defined by the global trapOidPrefix value plus the child OID values .1.1.1 through .1.1.9. For more information about the MIB definition, see Section 6.4.2, “Overview of the MIB definition”.

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap

Create a PrometheusRule object that contains the alert rule and an oid label that contains the SNMP trap OID override value:

$ oc apply -f - <<EOF
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: default
    role: alert-rules
  name: prometheus-alarm-rules-snmp
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Collectd metrics receive rate is zero
          expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
          labels:
            oid: 1.3.6.1.4.1.50495.15.1.2.1
            severity: critical
EOF

$ oc apply -f - <<EOF
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: default
    role: alert-rules
  name: prometheus-alarm-rules-snmp
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Collectd metrics receive rate is zero
          expr: rate(sg_total_collectd_msg_received_count[1m]) == 0
          labels:
            oid: 1.3.6.1.4.1.50495.15.1.2.1
            severity: critical
EOF

Copy to Clipboard

Toggle word wrap

Additional information

For more information about configuring alerts, see Section 6.3, “Alerts in Service Telemetry Framework”.

6.5. Configuring the duration for the TLS certificates
Copia collegamento

To configure the duration of the TLS certificates that you use for the AMQ Interconnect connection in Service Telemetry Framework (STF), modify the ServiceTelemetry object and configure the certificates parameter.

6.5.1. Configuration parameters for the TLS certificates
Copia collegamento

You can configure the duration of the certificate with the following sub-parameters of the certificates parameter:

endpointCertDuration: The requested duration or lifetime of the endpoint Certificate. Minimum accepted duration is 1 hour. Value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration. The default value is 70080h.
caCertDuration: The requested duration or lifetime of the CA Certificate. Minimum accepted duration is 1 hour. Value must be in units accepted by Go time.ParseDuration https://golang.org/pkg/time/#ParseDuration. Default value is 70080h.

Note

The default duration of certificates is long, because you usually copy a subset of them in the Red Hat OpenStack Platform deployment when the certificates renew. For more information about the QDR CA Certificate renewal process, see Chapter 7, Renewing the AMQ Interconnect certificate.

You can configure the certificates parameter for QDR that is part of the transports.qdr definition in the ServiceTelemetry object:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
...
  transports:
    ...
    qdr:
      enabled: true
      certificates:
        endpointCertDuration: 70080h
        caCertDuration: 70080h
...

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
...
  transports:
    ...
    qdr:
      enabled: true
      certificates:
        endpointCertDuration: 70080h
        caCertDuration: 70080h
...

Copy to Clipboard

Toggle word wrap

6.5.2. Configuring TLS certificates duration
Copia collegamento

To configure the duration of the TLS certificates to use with Service Telemetry Framework (STF), modify the ServiceTelemetry object and configure the certificates parameter.

Prerequisites

You didn’t deploy an instance of Service Telemetry Operator already.

Note

When you create the ServiceTelemetry object, the required certificates and their secrets for STF are also created. For more information about how to modify the certificates and the secrets, see: Chapter 7, Renewing the AMQ Interconnect certificate. The following procedure is valid for new STF deployments.

Procedure

To edit the duration of the TLS certificate, you can set the QDR caCertDuration, for example 87600h for 10 years:

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  transport:
    qdr:
      enabled: true
      certificates:
        caCertDuration: 87600h
EOF

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  transport:
    qdr:
      enabled: true
      certificates:
        caCertDuration: 87600h
EOF

Copy to Clipboard

Toggle word wrap

Verification

Verify that the expiry date for the certificate is correct:

oc get secret  default-interconnect-selfsigned -o jsonpath='{.data.tls\.crt}' | base64 -d  | openssl x509 -in - -text | grep "Not After"

$ oc get secret  default-interconnect-selfsigned -o jsonpath='{.data.tls\.crt}' | base64 -d  | openssl x509 -in - -text | grep "Not After"
            Not After : Mar  9 21:00:16 2033 GMT

Copy to Clipboard

Toggle word wrap

6.6. High availability
Copia collegamento

Warning

STF high availability (HA) mode is deprecated and is not supported in production environments. Red Hat OpenShift Container Platform is a highly-available platform, and you can cause issues and complicate debugging in STF if you enable HA mode.

With high availability, Service Telemetry Framework (STF) can rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of an Red Hat OpenShift Container Platform node, deploy STF to an Red Hat OpenShift Container Platform cluster with three or more nodes.

Enabling high availability has the following effects:

The following components run two pods instead of the default one:
- AMQ Interconnect
- Alertmanager
- Prometheus
- Events Smart Gateway
- Metrics Smart Gateway
Recovery time from a lost pod in any of these services reduces to approximately 2 seconds.

6.6.1. Configuring high availability
Copia collegamento

To configure Service Telemetry Framework (STF) for high availability, add highAvailability.enabled: true to the ServiceTelemetry object in Red Hat OpenShift Container Platform. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:

Procedure

Log in to Red Hat OpenShift Container Platform.
Change to the service-telemetry namespace:
```
oc project service-telemetry
```
```
$ oc project service-telemetry
```
Copy to Clipboard Toggle word wrap
Use the oc command to edit the ServiceTelemetry object:
```
oc edit stf default
```
```
$ oc edit stf default
```
Copy to Clipboard Toggle word wrap

Add highAvailability.enabled: true to the spec section:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  highAvailability:
    enabled: true

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  highAvailability:
    enabled: true

Copy to Clipboard

Toggle word wrap

Save your changes and close the object.

6.7. Observability Strategy in Service Telemetry Framework
Copia collegamento

Service Telemetry Framework (STF) does not include event storage backends or dashboarding tools. STF can optionally create datasource configurations for Grafana using the community operator to provide a dashboarding interface.

Instead of having Service Telemetry Operator create custom resource requests, you can use your own deployments of these applications or other compatible applications, and scrape the metrics Smart Gateways for delivery to your own Prometheus-compatible system for telemetry storage. If you set the observabilityStrategy to none, then storage backends will not be deployed so persistent storage will not be required by STF.

Use the observabilityStrategy property on the STF object to specify which type of observability components will be deployed.

The following values are available:

Expand

value	meaning
use_redhat	Red Hat supported components are requested by STF. This includes Prometheus and Alertmanager from the Cluster Observability Operator, but no resource requests to Elastic Cloud on Kubernetes (ECK) Operator. If enabled, resources are also requested from the Grafana Operator (community component).
use_hybrid	In addition to the Red Hat supported components, Elasticsearch and Grafana resources are also requested (if specified in the ServiceTelemetry object)
use_community	The community version of Prometheus Operator is used instead of Cluster Observability Operator. Elasticsearch and Grafana resources are also requested (if specified in the ServiceTelemetry object)
none	No storage or alerting components are deployed

Note

Newly deployed STF environments as of 1.5.3 default to use_redhat. Existing STF deployments created before 1.5.3 default to use_community.

To migrate an existing STF deployment to use_redhat, see the Red Hat Knowledge Base article Migrating Service Telemetry Framework to fully supported operators.

6.7.1. Configuring an alternate observability strategy
Copia collegamento

To skip the deployment of storage, visualization, and alerting backends, add observabilityStrategy: none to the ServiceTelemetry spec. In this mode, you only deploy AMQ Interconnect routers and Smart Gateways, and you must configure an external Prometheus-compatible system to collect metrics from the STF Smart Gateways, and an external Elasticsearch to receive the forwarded events.

Procedure

Create a ServiceTelemetry object with the property observabilityStrategy: none in the spec parameter. The manifest shows results in a default deployment of STF that is suitable for receiving telemetry from a single cloud with all metrics collector types.

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  observabilityStrategy: none
EOF

$ oc apply -f - <<EOF
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  observabilityStrategy: none
EOF

Copy to Clipboard

Toggle word wrap

Delete the remaining objects that are managed by community operators

for o in alertmanagers.monitoring.rhobs/default prometheuses.monitoring.rhobs/default elasticsearch/elasticsearch grafana/default-grafana; do oc delete $o; done

$ for o in alertmanagers.monitoring.rhobs/default prometheuses.monitoring.rhobs/default elasticsearch/elasticsearch grafana/default-grafana; do oc delete $o; done

Copy to Clipboard

Toggle word wrap

To verify that all workloads are operating correctly, view the pods and the status of each pod:

oc get pods

$ oc get pods
NAME                                                      READY   STATUS    RESTARTS   AGE
default-cloud1-ceil-event-smartgateway-6f8547df6c-p2db5   3/3     Running   0          132m
default-cloud1-ceil-meter-smartgateway-59c845d65b-gzhcs   3/3     Running   0          132m
default-cloud1-coll-event-smartgateway-bf859f8d77-tzb66   3/3     Running   0          132m
default-cloud1-coll-meter-smartgateway-75bbd948b9-d5phm   3/3     Running   0          132m
default-cloud1-sens-meter-smartgateway-7fdbb57b6d-dh2g9   3/3     Running   0          132m
default-interconnect-668d5bbcd6-57b2l                     1/1     Running   0          132m
interconnect-operator-b8f5bb647-tlp5t                     1/1     Running   0          47h
service-telemetry-operator-566b9dd695-wkvjq               1/1     Running   0          156m
smart-gateway-operator-58d77dcf7-6xsq7                    1/1     Running   0          47h

Copy to Clipboard

Toggle word wrap

Additional resources

For more information about configuring additional clouds or to change the set of supported collectors, see Section 4.3.2, “Deploying Smart Gateways”.
To migrate an existing STF deployment to use_redhat, see the Red Hat Knowledge Base article Migrating Service Telemetry Framework to fully supported operators.

6.8. Resource usage of Red Hat OpenStack Platform services
Copia collegamento

You can monitor the resource usage of the Red Hat OpenStack Platform (RHOSP) services, such as the APIs and other infrastructure processes, to identify bottlenecks in the overcloud by showing services that run out of compute power. Resource usage monitoring is enabled by default.

Additional resources

To disable resource usage monitoring, see Section 6.8.1, “Disabling resource usage monitoring of Red Hat OpenStack Platform services”.

6.8.1. Disabling resource usage monitoring of Red Hat OpenStack Platform services
Copia collegamento

To disable the monitoring of RHOSP containerized service resource usage, you must set the CollectdEnableLibpodstats parameter to false.

Prerequisites

You have created the stf-connectors.yaml file. For more information, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework using director”.
You are using the most current version of Red Hat OpenStack Platform (RHOSP) 17.1.

Procedure

Open the stf-connectors.yaml file and add the CollectdEnableLibpodstats parameter to override the setting in enable-stf.yaml. Ensure that stf-connectors.yaml is called from the openstack overcloud deploy command after enable-stf.yaml:
```
  CollectdEnableLibpodstats: false
```
```
  CollectdEnableLibpodstats: false
```
Copy to Clipboard Toggle word wrap
Continue with the overcloud deployment procedure. For more information, see Section 4.1.6, “Deploying the overcloud”.

6.9. Red Hat OpenStack Platform API status and containerized services health
Copia collegamento

You can use the OCI (Open Container Initiative) standard to assess the container health status of each Red Hat OpenStack Platform (RHOSP) service by periodically running a health check script. Most RHOSP services implement a health check that logs issues and returns a binary status. For the RHOSP APIs, the health checks query the root endpoint and determine the health based on the response time.

Monitoring of RHOSP container health and API status is enabled by default.

Additional resources

To disable RHOSP container health and API status monitoring, see Section 6.9.1, “Disabling container health and API status monitoring”.

6.9.1. Disabling container health and API status monitoring
Copia collegamento

To disable RHOSP containerized service health and API status monitoring, you must set the CollectdEnableSensubility parameter to false.

Prerequisites

You have created the stf-connectors.yaml file in your templates directory. For more information, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework using director”.
You are using the most current version of Red Hat OpenStack Platform (RHOSP) 17.1.

Procedure

Open the stf-connectors.yaml and add the CollectdEnableSensubility parameter to override the setting in enable-stf.yaml. Ensure that stf-connectors.yaml is called from the openstack overcloud deploy command after enable-stf.yaml:
```
CollectdEnableSensubility: false
```
```
CollectdEnableSensubility: false
```
Copy to Clipboard Toggle word wrap
Continue with the overcloud deployment procedure. For more information, see Section 4.1.6, “Deploying the overcloud”.

Additional resources

For more information about multiple cloud addresses, see Section 4.3, “Configuring multiple clouds”.

Questo contenuto non è disponibile nella lingua selezionata.

6.1. Dashboards in Service Telemetry Framework
Copia collegamento

6.1.1. Configuring Grafana to host the dashboard
Copia collegamento

6.1.2. Enabling dashboards
Copia collegamento

6.1.3. Connecting an external dashboard system
Copia collegamento

6.2. Metrics retention time period in Service Telemetry Framework
Copia collegamento

6.2.1. Editing the metrics retention time period in Service Telemetry Framework
Copia collegamento

6.3. Alerts in Service Telemetry Framework
Copia collegamento

6.3.1. Creating an alert rule in Prometheus
Copia collegamento

6.3.2. Configuring custom alerts
Copia collegamento

6.3.3. Creating a standard alert route in Alertmanager
Copia collegamento

6.3.4. Creating an alert route with templating in Alertmanager
Copia collegamento

6.4. Sending alerts as SNMP traps
Copia collegamento

6.4.1. Configuration parameters for snmpTraps
Copia collegamento

6.4.2. Overview of the MIB definition
Copia collegamento

6.4.3. Configuring SNMP traps
Copia collegamento

6.4.4. Creating alerts for SNMP traps
Copia collegamento

6.5. Configuring the duration for the TLS certificates
Copia collegamento

6.5.1. Configuration parameters for the TLS certificates
Copia collegamento

6.5.2. Configuring TLS certificates duration
Copia collegamento

6.6. High availability
Copia collegamento

6.6.1. Configuring high availability
Copia collegamento

6.7. Observability Strategy in Service Telemetry Framework
Copia collegamento

6.7.1. Configuring an alternate observability strategy
Copia collegamento

6.8. Resource usage of Red Hat OpenStack Platform services
Copia collegamento

6.8.1. Disabling resource usage monitoring of Red Hat OpenStack Platform services
Copia collegamento

6.9. Red Hat OpenStack Platform API status and containerized services health
Copia collegamento

6.9.1. Disabling container health and API status monitoring
Copia collegamento

Formazione

Prova, acquista e vendi

Community

Informazioni sulla documentazione di Red Hat

Rendiamo l’open source più inclusivo

Informazioni su Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Questo contenuto non è disponibile nella lingua selezionata.

Chapter 6. Using operational features of Service Telemetry Framework

6.1. Dashboards in Service Telemetry FrameworkCopia collegamentoCollegamento copiato negli appunti!

6.1.1. Configuring Grafana to host the dashboardCopia collegamentoCollegamento copiato negli appunti!

6.1.2. Enabling dashboardsCopia collegamentoCollegamento copiato negli appunti!

6.1.3. Connecting an external dashboard systemCopia collegamentoCollegamento copiato negli appunti!

6.2. Metrics retention time period in Service Telemetry FrameworkCopia collegamentoCollegamento copiato negli appunti!

6.2.1. Editing the metrics retention time period in Service Telemetry FrameworkCopia collegamentoCollegamento copiato negli appunti!

6.3. Alerts in Service Telemetry FrameworkCopia collegamentoCollegamento copiato negli appunti!

6.3.1. Creating an alert rule in PrometheusCopia collegamentoCollegamento copiato negli appunti!

6.3.2. Configuring custom alertsCopia collegamentoCollegamento copiato negli appunti!

6.3.3. Creating a standard alert route in AlertmanagerCopia collegamentoCollegamento copiato negli appunti!

6.3.4. Creating an alert route with templating in AlertmanagerCopia collegamentoCollegamento copiato negli appunti!

6.4. Sending alerts as SNMP trapsCopia collegamentoCollegamento copiato negli appunti!

6.4.1. Configuration parameters for snmpTrapsCopia collegamentoCollegamento copiato negli appunti!

6.4.2. Overview of the MIB definitionCopia collegamentoCollegamento copiato negli appunti!

6.4.3. Configuring SNMP trapsCopia collegamentoCollegamento copiato negli appunti!

6.4.4. Creating alerts for SNMP trapsCopia collegamentoCollegamento copiato negli appunti!

6.5. Configuring the duration for the TLS certificatesCopia collegamentoCollegamento copiato negli appunti!

6.5.1. Configuration parameters for the TLS certificatesCopia collegamentoCollegamento copiato negli appunti!

6.5.2. Configuring TLS certificates durationCopia collegamentoCollegamento copiato negli appunti!

6.6. High availabilityCopia collegamentoCollegamento copiato negli appunti!

6.6.1. Configuring high availabilityCopia collegamentoCollegamento copiato negli appunti!

6.7. Observability Strategy in Service Telemetry FrameworkCopia collegamentoCollegamento copiato negli appunti!

6.7.1. Configuring an alternate observability strategyCopia collegamentoCollegamento copiato negli appunti!

6.8. Resource usage of Red Hat OpenStack Platform servicesCopia collegamentoCollegamento copiato negli appunti!

6.8.1. Disabling resource usage monitoring of Red Hat OpenStack Platform servicesCopia collegamentoCollegamento copiato negli appunti!

6.9. Red Hat OpenStack Platform API status and containerized services healthCopia collegamentoCollegamento copiato negli appunti!

6.9.1. Disabling container health and API status monitoringCopia collegamentoCollegamento copiato negli appunti!

Formazione

Prova, acquista e vendi

Community

Informazioni sulla documentazione di Red Hat

Rendiamo l’open source più inclusivo

Informazioni su Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

6.1. Dashboards in Service Telemetry Framework
Copia collegamento

6.1.1. Configuring Grafana to host the dashboard
Copia collegamento

6.1.2. Enabling dashboards
Copia collegamento

6.1.3. Connecting an external dashboard system
Copia collegamento

6.2. Metrics retention time period in Service Telemetry Framework
Copia collegamento

6.2.1. Editing the metrics retention time period in Service Telemetry Framework
Copia collegamento

6.3. Alerts in Service Telemetry Framework
Copia collegamento

6.3.1. Creating an alert rule in Prometheus
Copia collegamento

6.3.2. Configuring custom alerts
Copia collegamento

6.3.3. Creating a standard alert route in Alertmanager
Copia collegamento

6.3.4. Creating an alert route with templating in Alertmanager
Copia collegamento

6.4. Sending alerts as SNMP traps
Copia collegamento

6.4.1. Configuration parameters for snmpTraps
Copia collegamento

6.4.2. Overview of the MIB definition
Copia collegamento

6.4.3. Configuring SNMP traps
Copia collegamento

6.4.4. Creating alerts for SNMP traps
Copia collegamento

6.5. Configuring the duration for the TLS certificates
Copia collegamento

6.5.1. Configuration parameters for the TLS certificates
Copia collegamento

6.5.2. Configuring TLS certificates duration
Copia collegamento

6.6. High availability
Copia collegamento

6.6.1. Configuring high availability
Copia collegamento

6.7. Observability Strategy in Service Telemetry Framework
Copia collegamento

6.7.1. Configuring an alternate observability strategy
Copia collegamento

6.8. Resource usage of Red Hat OpenStack Platform services
Copia collegamento

6.8.1. Disabling resource usage monitoring of Red Hat OpenStack Platform services
Copia collegamento

6.9. Red Hat OpenStack Platform API status and containerized services health
Copia collegamento

6.9.1. Disabling container health and API status monitoring
Copia collegamento