Chapter 5. Using operational features of Service Telemetry Framework


You can use the following operational features to provide additional functionality to the Service Telemetry Framework (STF):

5.1. Dashboards in Service Telemetry Framework

Use the third-party application, Grafana, to visualize the system-level metrics that collectd and Ceilometer gather for each individual host node.

For more information about configuring collectd, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.

5.1.1. Configuring Grafana to host the dashboard

Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io. When you use the Service Telemetry Operator to deploy Grafana, it creates a Grafana instance and configures the default data sources for the local STF deployment.

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Deploy the Grafana operator:

    $ oc apply -f - <<EOF
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: grafana-operator
      namespace: service-telemetry
    spec:
      channel: alpha
      installPlanApproval: Automatic
      name: grafana-operator
      source: operatorhubio-operators
      sourceNamespace: openshift-marketplace
    EOF
  4. Verify that the Operator launched successfully. In the command output, if the value of the PHASE column is Succeeded, the Operator launched successfully:

    $ oc get csv --selector operators.coreos.com/grafana-operator.service-telemetry
    
    NAME                       DISPLAY            VERSION   REPLACES                   PHASE
    grafana-operator.v3.10.3   Grafana Operator   3.10.3    grafana-operator.v3.10.2   Succeeded
  5. To launch a Grafana instance, create or modify the ServiceTelemetry object. Set graphing.enabled and graphing.grafana.ingressEnabled to true:

    $ oc edit stf default
    
    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      graphing:
        enabled: true
        grafana:
          ingressEnabled: true
  6. Verify that the Grafana instance deployed:

    $ oc get pod -l app=grafana
    
    NAME                                  READY   STATUS    RESTARTS   AGE
    grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
  7. Verify that the Grafana data sources installed correctly:

    $ oc get grafanadatasources
    
    NAME                    AGE
    default-datasources     20h
  8. Verify that the Grafana route exists:

    $ oc get route grafana-route
    
    NAME            HOST/PORT                                          PATH   SERVICES          PORT   TERMINATION   WILDCARD
    grafana-route   grafana-route-service-telemetry.apps.infra.watch          grafana-service   3000   edge          None
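After the route exists, you can open the Grafana UI in a browser at the route host. As a minimal sketch, reusing the route name verified above and the go-template output format used in Section 5.7, you can print the host directly:

$ oc get route grafana-route -ogo-template='{{.spec.host}}'

Log in with the credentials described in Section 5.1.2, “Retrieving and setting Grafana login credentials”.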

5.1.2. Retrieving and setting Grafana login credentials

Service Telemetry Framework (STF) sets default login credentials when Grafana is enabled. You can override the credentials in the ServiceTelemetry object.

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. To retrieve the default username and password, describe the Grafana object:

    $ oc describe grafana default
  4. To modify the default values of the Grafana administrator username and password through the ServiceTelemetry object, use the graphing.grafana.adminUser and graphing.grafana.adminPassword parameters.
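For example, the following sketch of the ServiceTelemetry spec overrides both values. The credentials shown are placeholders; replace them with values of your own:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
...
spec:
  ...
  graphing:
    enabled: true
    grafana:
      ingressEnabled: true
      adminUser: stf-admin      # placeholder user name
      adminPassword: changeme   # placeholder password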

5.2. Metrics retention time period in Service Telemetry Framework

The default retention time for metrics stored in Service Telemetry Framework (STF) is 24 hours, which provides enough data for trends to develop for the purposes of alerting.

For long-term storage, use systems designed for long-term data retention, for example, Thanos.

5.2.1. Editing the metrics retention time period in Service Telemetry Framework

You can adjust Service Telemetry Framework (STF) to retain metrics for a longer time period.

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Edit the ServiceTelemetry object:

    $ oc edit stf default
  4. Add retention: 7d to the backends.metrics.prometheus.storage section to increase the retention period to seven days:

    Note

    If you set a long retention period, retrieving data from heavily populated Prometheus systems can result in queries returning results slowly.

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      ...
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: persistent
              retention: 7d
        ...
  5. Save your changes and close the object.
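As an optional check, you can confirm the value that the Prometheus Operator applies. This is a sketch that assumes the Service Telemetry Operator propagates the setting to a Prometheus custom resource named default in the service-telemetry namespace:

$ oc get prometheus default -ogo-template='{{.spec.retention}}'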

5.3. Alerts in Service Telemetry Framework

You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.

To create an alert, complete the following tasks:

  1. Create an alert rule in Prometheus. For more information, see Section 5.3.1, “Creating an alert rule in Prometheus”.
  2. Create an alert route in Alertmanager. For more information, see Section 5.3.3, “Creating an alert route in Alertmanager”.

Additional resources

For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/

To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts

5.3.1. Creating an alert rule in Prometheus

Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:

    $ oc apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      creationTimestamp: null
      labels:
        prometheus: default
        role: alert-rules
      name: prometheus-alarm-rules
      namespace: service-telemetry
    spec:
      groups:
        - name: ./openstack.rules
          rules:
            - alert: Metric Listener down
              expr: collectd_qpid_router_status < 1
    EOF

    To change the rule, edit the value of the expr parameter.
  4. To verify that the Operator loaded the rules into Prometheus, create a pod with access to curl:

    $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
  5. Run the curl command against the prometheus-operated service to return the rules that are loaded into memory:

    [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
    {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
  6. Verify that the output shows the rules loaded from the PrometheusRule object, for example the defined ./openstack.rules group, and then exit the pod:

    [ root@curl:/ ]$ exit
  7. Clean up the environment by deleting the curl pod:

    $ oc delete pod curl
    
    pod "curl" deleted

5.3.2. Configuring custom alerts

You can add custom alerts to the PrometheusRule object that you created in Section 5.3.1, “Creating an alert rule in Prometheus”.

Procedure

  1. Use the oc edit command:

    $ oc edit prometheusrules prometheus-alarm-rules
  2. Edit the PrometheusRule manifest, for example, by adding a new rule to the rules list, as shown in the sketch after this procedure.
  3. Save and close the manifest.
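The following sketch shows one possible addition: a hypothetical second rule that fires when the collectd_uptime metric is absent from Prometheus for five minutes. The metric name, expression, and annotation text are illustrative only; adjust them to the metrics and thresholds in your environment:

spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        - alert: Collectd metrics missing          # hypothetical example rule
          expr: absent(collectd_uptime)            # true only when no series exist
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: No collectd_uptime metrics received for 5 minutes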

5.3.3. Creating an alert route in Alertmanager

Use Alertmanager to deliver alerts to an external system, such as email, IRC, or another notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'

To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that the Prometheus Operator manages.
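
When you define your own receiver, you replace the 'null' receiver and reference the new receiver name from the route. The following fragment is a generic Alertmanager sketch with placeholder email and SMTP values, not an STF-specific default; you embed a complete configuration like this in the alertmanagerConfigManifest parameter, as described in the following procedure:

route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'email-notifications'
receivers:
- name: 'email-notifications'
  email_configs:
  - to: 'oncall@example.com'              # placeholder recipient
    from: 'alertmanager@example.com'      # placeholder sender
    smarthost: 'smtp.example.com:587'     # placeholder SMTP relay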

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Edit the ServiceTelemetry object for your STF deployment:

    $ oc edit stf default
  4. Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

    Note

    This step loads the default template that the Service Telemetry Operator manages. To verify that the changes are populating correctly, change a value, retrieve the alertmanager-default secret, and verify that the new value is loaded into memory. For example, change the value of the parameter global.resolve_timeout from 5m to 10m.

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: default
      namespace: service-telemetry
    spec:
      backends:
        metrics:
          prometheus:
            enabled: true
      alertmanagerConfigManifest: |
        apiVersion: v1
        kind: Secret
        metadata:
          name: 'alertmanager-default'
          namespace: 'service-telemetry'
        type: Opaque
        stringData:
          alertmanager.yaml: |-
            global:
              resolve_timeout: 10m
            route:
              group_by: ['job']
              group_wait: 30s
              group_interval: 5m
              repeat_interval: 12h
              receiver: 'null'
            receivers:
            - name: 'null'
  5. Verify that the configuration is applied to the secret:

    $ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'
    
    global:
      resolve_timeout: 10m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
    receivers:
    - name: 'null'
  6. To verify the configuration is loaded into Alertmanager, create a pod with access to curl:

    $ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
  7. Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

    [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
    
    {"status":"success","data":{"configYAML":"global:\n  resolve_timeout: 10m\n  http_config: {}\n  smtp_hello: localhost\n  smtp_require_tls: true\n  pagerduty_url: https://events.pagerduty.com/v2/enqueue\n  hipchat_api_url: https://api.hipchat.com/\n  opsgenie_api_url: https://api.opsgenie.com/\n  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n  receiver: \"null\"\n  group_by:\n  - job\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}
  8. Verify that the configYAML field contains the changes you expect.
  9. Exit the pod:

    [ root@curl:/ ]$ exit
  10. To clean up the environment, delete the curl pod:

    $ oc delete pod curl
    
    pod "curl" deleted

Additional resources

  • For more information about the Red Hat OpenShift Container Platform secret and the Prometheus Operator, see Alerting.

5.4. Configuring SNMP traps

You can integrate Service Telemetry Framework (STF) with an existing infrastructure monitoring platform that receives notifications through SNMP traps. To enable SNMP traps, modify the ServiceTelemetry object and configure the snmpTraps parameters.

For more information about configuring alerts, see Section 5.3, “Alerts in Service Telemetry Framework”.

Prerequisites

  • Know the IP address or hostname of the SNMP trap receiver where you want to send the alerts

Procedure

  1. To enable SNMP traps, modify the ServiceTelemetry object:

    $ oc edit stf default
  2. Set the alerting.alertmanager.receivers.snmpTraps parameters:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      alerting:
        alertmanager:
          receivers:
            snmpTraps:
              enabled: true
              target: 10.10.10.10
  3. Ensure that you set the value of target to the IP address or hostname of the SNMP trap receiver.
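
There is no single verification command for SNMP traps, because delivery depends on your receiving platform. As a hypothetical check, assuming the receiver host has the net-snmp tools installed, you can run snmptrapd in the foreground and watch for incoming traps while an alert fires; adjust the command and its access control configuration to your environment:

# run a foreground trap listener that logs received traps to standard output
$ sudo snmptrapd -f -Lo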

5.5. High availability

With high availability, Service Telemetry Framework (STF) can rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of a Red Hat OpenShift Container Platform node, deploy STF to a Red Hat OpenShift Container Platform cluster with three or more nodes.

Warning

STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.

Enabling high availability has the following effects:

  • Three ElasticSearch pods run instead of the default one.
  • The following components run two pods instead of the default one:

    • AMQ Interconnect
    • Alertmanager
    • Prometheus
    • Events Smart Gateway
    • Metrics Smart Gateway
  • Recovery time from a lost pod in any of these services reduces to approximately 2 seconds.

5.5.1. Configuring high availability

To configure Service Telemetry Framework (STF) for high availability, add highAvailability.enabled: true to the ServiceTelemetry object in Red Hat OpenShift Container Platform. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Use the oc command to edit the ServiceTelemetry object:

    $ oc edit stf default
  4. Add highAvailability.enabled: true to the spec section:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    ...
    spec:
      ...
      highAvailability:
        enabled: true
  5. Save your changes and close the object.
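
To confirm that high availability is active, list the pods in the service-telemetry namespace and check that the components listed in Section 5.5, “High availability” run the number of copies described there, for example three ElasticSearch pods and two Prometheus pods. Pod names vary between deployments:

$ oc get pods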

5.6. Ephemeral storage

You can use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform cluster.

Warning

If you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, and not production environments.

5.6.1. Configuring ephemeral storage

To configure STF components for ephemeral storage, add ...storage.strategy: ephemeral to the corresponding parameter. For example, to enable ephemeral storage for the Prometheus back end, set backends.metrics.prometheus.storage.strategy: ephemeral. Components that support configuration of ephemeral storage include alerting.alertmanager, backends.metrics.prometheus, and backends.events.elasticsearch. You can add ephemeral storage configuration at installation time or, if you already deployed STF, complete the following steps:

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. Edit the ServiceTelemetry object:

    $ oc edit stf default
  4. Add the ...storage.strategy: ephemeral parameter to the spec section of the relevant component:

    apiVersion: infra.watch/v1beta1
    kind: ServiceTelemetry
    metadata:
      name: stf-default
      namespace: service-telemetry
    spec:
      alerting:
        enabled: true
        alertmanager:
          storage:
            strategy: ephemeral
      backends:
        metrics:
          prometheus:
            enabled: true
            storage:
              strategy: ephemeral
        events:
          elasticsearch:
            enabled: true
            storage:
              strategy: ephemeral
  5. Save your changes and close the object.

5.7. Creating a route in Red Hat OpenShift Container Platform

In Red Hat OpenShift Container Platform, you can expose applications to the external network through a route. For more information, see Configuring ingress cluster traffic.

In Service Telemetry Framework (STF), routes are not exposed by default to limit the attack surface of STF deployments. To access some services deployed in STF, you must expose those services in Red Hat OpenShift Container Platform.

A common service to expose in STF is Prometheus, as shown in the following example:

Procedure

  1. Log in to Red Hat OpenShift Container Platform.
  2. Change to the service-telemetry namespace:

    $ oc project service-telemetry
  3. List the available services in the service-telemetry project:

    $ oc get services
    NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                         AGE
    alertmanager-operated                    ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP                      93m
    default-cloud1-ceil-meter-smartgateway   ClusterIP   172.30.114.195   <none>        8081/TCP                                        93m
    default-cloud1-coll-meter-smartgateway   ClusterIP   172.30.133.180   <none>        8081/TCP                                        93m
    default-interconnect                     ClusterIP   172.30.3.241     <none>        5672/TCP,8672/TCP,55671/TCP,5671/TCP,5673/TCP   93m
    ibm-auditlogging-operator-metrics        ClusterIP   172.30.216.249   <none>        8383/TCP,8686/TCP                               11h
    prometheus-operated                      ClusterIP   None             <none>        9090/TCP                                        93m
    service-telemetry-operator-metrics       ClusterIP   172.30.11.66     <none>        8383/TCP,8686/TCP                               11h
    smart-gateway-operator-metrics           ClusterIP   172.30.145.199   <none>        8383/TCP,8686/TCP                               11h
  4. Take note of the port and service name that you want to expose as a route, for example, service prometheus-operated and port 9090.
  5. Expose the prometheus-operated service as an edge route and redirect insecure traffic to the secure endpoint of port 9090:

    $ oc create route edge metrics-store --service=prometheus-operated --insecure-policy="Redirect" --port=9090
    route.route.openshift.io/metrics-store created
  6. To verify the route and find its exposed external DNS name, use the oc get route command:

    $ oc get route metrics-store -ogo-template='{{.spec.host}}'
    metrics-store-service-telemetry.apps.infra.watch

    The prometheus-operated service is now available at the exposed DNS address, for example, https://metrics-store-service-telemetry.apps.infra.watch

    Note

    The address of the route must be resolvable; the required DNS configuration is environment specific.
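
    As an optional check, assuming the example route host shown above, you can query the Prometheus HTTP API through the new route. The -k option skips certificate verification, which you might need with self-signed certificates:

    $ curl -k "https://metrics-store-service-telemetry.apps.infra.watch/api/v1/query?query=up"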
