Chapter 5. Using operational features of Service Telemetry Framework
You can use the following operational features to provide additional functionality to the Service Telemetry Framework (STF):
5.1. Dashboards in Service Telemetry Framework
Use the third-party application, Grafana, to visualize the system-level metrics that collectd and Ceilometer gather for each individual host node.
For more information about configuring collectd, see Section 4.1, “Deploying Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
5.1.1. Configuring Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io. When you use the Service Telemetry Operator to deploy Grafana, the result is a Grafana instance and the configuration of the default data sources for the local STF deployment.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Deploy the Grafana Operator:
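A minimal sketch of a Subscription object that deploys the Grafana Operator from an OperatorHub.io catalog follows; the channel, source, and sourceNamespace values are assumptions that depend on the catalogs configured in your cluster:

$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: service-telemetry
spec:
  channel: alpha                    # assumed channel; verify against the catalog
  installPlanApproval: Automatic
  name: grafana-operator
  source: operatorhubio-operators   # assumed catalog source name
  sourceNamespace: openshift-marketplace
EOF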
- Verify that the Operator launched successfully. In the command output, if the value of the PHASE column is Succeeded, the Operator launched successfully:

$ oc get csv --selector operators.coreos.com/grafana-operator.service-telemetry

NAME                       DISPLAY            VERSION   REPLACES                   PHASE
grafana-operator.v3.10.3   Grafana Operator   3.10.3    grafana-operator.v3.10.2   Succeeded

- To launch a Grafana instance, create or modify the ServiceTelemetry object. Set graphing.enabled and graphing.grafana.ingressEnabled to true:
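A minimal sketch of the relevant portion of the manifest follows, assuming the infra.watch/v1beta1 API version of the ServiceTelemetry custom resource; leave the other fields of your object unchanged:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  graphing:
    enabled: true
    grafana:
      ingressEnabled: true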
- Verify that the Grafana instance deployed:

$ oc get pod -l app=grafana
NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
- Verify that the Grafana data sources installed correctly:

$ oc get grafanadatasources
NAME                  AGE
default-datasources   20h
- Verify that the Grafana route exists:

$ oc get route grafana-route
NAME            HOST/PORT                                           PATH   SERVICES          PORT   TERMINATION   WILDCARD
grafana-route   grafana-route-service-telemetry.apps.infra.watch          grafana-service   3000   edge          None
5.1.2. Retrieving and setting Grafana login credentials
Service Telemetry Framework (STF) sets default login credentials when Grafana is enabled. You can override the credentials in the ServiceTelemetry object.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- To retrieve the default username and password, describe the Grafana object:

$ oc describe grafana default
- To modify the default values of the Grafana administrator username and password through the ServiceTelemetry object, use the graphing.grafana.adminUser and graphing.grafana.adminPassword parameters, as in the sketch that follows.
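The relevant portion of the ServiceTelemetry object, with placeholder credential values:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  graphing:
    grafana:
      adminUser: example-admin          # placeholder username
      adminPassword: example-password   # placeholder password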
5.2. Metrics retention time period in Service Telemetry Framework
The default retention time for metrics stored in Service Telemetry Framework (STF) is 24 hours, which provides enough data for trends to develop for the purposes of alerting.
For long-term storage, use systems designed for long-term data retention, for example, Thanos.
Additional resources
- To adjust STF for additional metrics retention time, see Section 5.2.1, “Editing the metrics retention time period in Service Telemetry Framework”.
- For recommendations about Prometheus data storage and estimating storage space, see https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects
- For more information about Thanos, see https://thanos.io/
5.2.1. Editing the metrics retention time period in Service Telemetry Framework
You can adjust Service Telemetry Framework (STF) for additional metrics retention time.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Edit the ServiceTelemetry object:

$ oc edit stf default

- Add retention: 7d to the storage section of backends.metrics.prometheus.storage to increase the retention period to seven days:

  Note: If you set a long retention period, retrieving data from heavily populated Prometheus systems can result in queries returning results slowly.
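A sketch of the relevant portion of the manifest follows; the strategy value assumes persistent storage, and the other fields of your object remain unchanged:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  backends:
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: persistent   # assumed; keep your existing strategy
          retention: 7d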
- Save your changes and close the object.
Additional resources
- For more information about the metrics retention time, see Section 5.2, “Metrics retention time period in Service Telemetry Framework”.
5.3. Alerts in Service Telemetry Framework
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications by using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
- Create an alert rule in Prometheus. For more information, see Section 5.3.1, “Creating an alert rule in Prometheus”.
- Create an alert route in Alertmanager. For more information, see Section 5.3.3, “Creating an alert route in Alertmanager”.
Additional resources
- For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
- To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
5.3.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Create a PrometheusRule object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus. To change the rule, edit the value of the expr parameter.
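The manifest sketched below is reconstructed from the rule shown in the verification output in the next step; the labels that select the rule for the default Prometheus instance are assumptions:

$ oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: default   # assumed rule-selector labels
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
  - name: ./openstack.rules
    rules:
    - alert: Metric Listener down
      expr: collectd_qpid_router_status < 1
EOF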
- To verify that the Operator loaded the rules into Prometheus, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

- Run the curl command against the prometheus-operated service to return the rules loaded into memory:

[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
{"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}

- After you verify that the output shows the rules loaded into the PrometheusRule object, for example that the output contains the defined ./openstack.rules, exit the pod:

[ root@curl:/ ]$ exit

- Clean up the environment by deleting the curl pod:

$ oc delete pod curl
pod "curl" deleted
Additional resources
- For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
5.3.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 5.3.1, “Creating an alert rule in Prometheus”.
Procedure
- Use the oc edit command:

$ oc edit prometheusrules prometheus-alarm-rules

- Edit the PrometheusRules manifest.
- Save and close the manifest.
Additional resources
- For more information about how to configure alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
- For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
5.3.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or another notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform secret. By default, Service Telemetry Framework (STF) deploys a basic configuration that results in no receivers:
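The default configuration can be reconstructed from the configYAML output shown later in this procedure; a sketch of the secret data follows, with resolve_timeout at its 5m default:

alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'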
To deploy a custom Alertmanager route with STF, you must pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that the Prometheus Operator manages.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Edit the ServiceTelemetry object for your STF deployment:

$ oc edit stf default

- Add the new parameter alertmanagerConfigManifest and the Secret object contents to define the alertmanager.yaml configuration for Alertmanager:

  Note: This step loads the default template that the Service Telemetry Operator manages. To verify that the changes are populating correctly, change a value, return the alertmanager-default secret, and verify that the new value is loaded into memory. For example, change the value of the parameter global.resolve_timeout from 5m to 10m.
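A sketch of the relevant portion of the ServiceTelemetry object follows, assuming the default template with global.resolve_timeout changed from 5m to 10m as described in the note:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'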
- Verify that the configuration is applied to the secret:
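One way to return the secret contents, assuming the alertmanager-default secret name from the note above:

$ oc get secret alertmanager-default -o go-template='{{index .data "alertmanager.yaml" | base64decode}}'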
- To verify that the configuration is loaded into Alertmanager, create a pod with access to curl:

$ oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty

- Run curl against the alertmanager-operated service to retrieve the status and configYAML contents, and verify that the supplied configuration matches the configuration in Alertmanager:

[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
{"status":"success","data":{"configYAML":"global:\n resolve_timeout: 10m\n http_config: {}\n smtp_hello: localhost\n smtp_require_tls: true\n pagerduty_url: https://events.pagerduty.com/v2/enqueue\n hipchat_api_url: https://api.hipchat.com/\n opsgenie_api_url: https://api.opsgenie.com/\n wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n receiver: \"null\"\n group_by:\n - job\n group_wait: 30s\n group_interval: 5m\n repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}
- Verify that the configYAML field contains the changes you expect.
- Exit the pod:
[ root@curl:/ ]$ exit

- To clean up the environment, delete the curl pod:

$ oc delete pod curl
pod "curl" deleted
Additional resources
- For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see Alerting.
5.4. Configuring SNMP traps
You can integrate Service Telemetry Framework (STF) with an existing infrastructure monitoring platform that receives notifications through SNMP traps. To enable SNMP traps, modify the ServiceTelemetry object and configure the snmpTraps parameters.
For more information about configuring alerts, see Section 5.3, “Alerts in Service Telemetry Framework”.
Prerequisites
- Know the IP address or hostname of the SNMP trap receiver where you want to send the alerts
Procedure
- To enable SNMP traps, modify the ServiceTelemetry object:

$ oc edit stf default

- Set the alerting.alertmanager.receivers.snmpTraps parameters:
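A sketch of the relevant portion of the manifest follows; the target value is a placeholder:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  alerting:
    alertmanager:
      receivers:
        snmpTraps:
          enabled: true
          target: 10.10.10.10   # placeholder; use your SNMP trap receiver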
- Ensure that you set the value of target to the IP address or hostname of the SNMP trap receiver.
5.5. High availability
With high availability, Service Telemetry Framework (STF) can rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, which reduces recovery time to approximately 2 seconds. To protect against failure of a Red Hat OpenShift Container Platform node, deploy STF to a Red Hat OpenShift Container Platform cluster with three or more nodes.
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
- Three ElasticSearch pods run instead of the default one.
The following components run two pods instead of the default one:
- AMQ Interconnect
- Alertmanager
- Prometheus
- Events Smart Gateway
- Metrics Smart Gateway
- Recovery time from a lost pod in any of these services reduces to approximately 2 seconds.
5.5.1. Configuring high availability
To configure Service Telemetry Framework (STF) for high availability, add highAvailability.enabled: true to the ServiceTelemetry object in Red Hat OpenShift Container Platform. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Use the oc command to edit the ServiceTelemetry object:

$ oc edit stf default

- Add highAvailability.enabled: true to the spec section:
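A sketch of the relevant portion of the manifest follows; the other fields of your object remain unchanged:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  highAvailability:
    enabled: true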
- Save your changes and close the object.
5.6. Ephemeral storage
You can use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform cluster.
If you use ephemeral storage, you might experience data loss if a pod is restarted, updated, or rescheduled onto another node. Use ephemeral storage only for development or testing, not for production environments.
5.6.1. Configuring ephemeral storage
To configure STF components for ephemeral storage, add ...storage.strategy: ephemeral to the corresponding parameter. For example, to enable ephemeral storage for the Prometheus back end, set backends.metrics.prometheus.storage.strategy: ephemeral. Components that support configuration of ephemeral storage include alerting.alertmanager, backends.metrics.prometheus, and backends.events.elasticsearch. You can add ephemeral storage configuration at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- Edit the ServiceTelemetry object:

$ oc edit stf default

- Add the ...storage.strategy: ephemeral parameter to the spec section of the relevant component:
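A sketch of the relevant portion of the manifest follows, using the Prometheus back end as the example component:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry
spec:
  backends:
    metrics:
      prometheus:
        enabled: true
        storage:
          strategy: ephemeral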
- Save your changes and close the object.
5.7. Creating a route in Red Hat OpenShift Container Platform
In Red Hat OpenShift Container Platform, you can expose applications to the external network through a route. For more information, see Configuring ingress cluster traffic.
In Service Telemetry Framework (STF), routes are not exposed by default to limit the attack surface of STF deployments. To access some services deployed in STF, you must expose the services in Red Hat OpenShift Container Platform for access.
A common service to expose in STF is Prometheus, as shown in the following example:
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the service-telemetry namespace:

$ oc project service-telemetry

- List the available services in the service-telemetry project:
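For example (the services and their names vary by deployment):

$ oc get services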
- Take note of the port and service name that you want to expose as a route, for example, service prometheus-operated and port 9090.
- Expose the prometheus-operated service as an edge route and redirect insecure traffic to the secure endpoint of port 9090:

$ oc create route edge metrics-store --service=prometheus-operated --insecure-policy="Redirect" --port=9090
route.route.openshift.io/metrics-store created

- To verify and find the exposed external DNS for the route, use the oc get route command:

$ oc get route metrics-store -ogo-template='{{.spec.host}}'
metrics-store-service-telemetry.apps.infra.watch

The prometheus-operated service is now available at the exposed DNS address, for example, https://metrics-store-service-telemetry.apps.infra.watch

Note: The address of the route must be resolvable; configuration is environment specific.
Additional resources
- For more information about Red Hat OpenShift Container Platform networking, see Understanding networking
- For more information about route configuration, see Route configuration
- For more information about ingress cluster traffic, see Configuring ingress cluster traffic overview