Chapter 4. Advanced features
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
- Customizing the deployment. For more information, see Section 4.1, “Customizing the deployment”.
- Alerts. For more information, see Section 4.2, “Alerts”.
- High availability. For more information, see Section 4.3, “High availability”.
- Dashboards. For more information, see Section 4.4, “Dashboards”.
- Multiple clouds. For more information, see Section 4.5, “Configuring multiple clouds”.
- Ephemeral storage. For more information, see Section 4.6, “Ephemeral storage”.
4.1. Customizing the deployment
The Service Telemetry Operator watches for a ServiceTelemetry manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, “Creating a ServiceTelemetry object in OCP”. When you deploy STF, use the oc get command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true parameter is configured in the ServiceTelemetry manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, “Manifest override parameters”.
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
| Override parameter | Description | Retrieval command |
|---|---|---|
| `alertmanagerConfigManifest` | Override the contents of the Alertmanager configuration `Secret`, which contains `alertmanager.yaml`. | `oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" \| base64decode }}'` |
| `alertmanagerManifest` | Override the `Alertmanager` object. | `oc get alertmanager stf-default -oyaml` |
| `elasticsearchManifest` | Override the `ElasticSearch` object. | `oc get elasticsearch elasticsearch -oyaml` |
| `interconnectManifest` | Override the AMQ `Interconnect` object. | `oc get interconnect stf-default-interconnect -oyaml` |
| `prometheusManifest` | Override the `Prometheus` object. | `oc get prometheus stf-default -oyaml` |
| `servicemonitorManifest` | Override the `ServiceMonitor` object. | `oc get servicemonitor stf-default -oyaml` |
| `smartgatewayCollectdMetricsManifest` | Override the collectd metrics `SmartGateway` object. | `oc get smartgateway stf-default-collectd-telemetry -oyaml` |
| `smartgatewayCollectdEventsManifest` | Override the collectd events `SmartGateway` object. | `oc get smartgateway stf-default-collectd-notification -oyaml` |
| `smartgatewayCeilometerEventsManifest` | Override the Ceilometer events `SmartGateway` object. | `oc get smartgateway stf-default-ceilometer-notification -oyaml` |
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, “Customizing the deployment”. The default ServiceTelemetry object is stf-default. Use oc get servicetelemetry to list the available STF deployments.
The oc edit command loads the default system editor. To use a different editor, set the EDITOR environment variable to the preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Load the `ServiceTelemetry` object into an editor:

  ```shell
  oc edit servicetelemetry stf-default
  ```

- To modify the `ServiceTelemetry` object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.

  Note: The trailing pipe (`|`) after the manifest override parameter indicates that the value provided is multi-line.

- Save and close.
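For example, a `ServiceTelemetry` object that overrides the Alertmanager configuration uses the `alertmanagerConfigManifest` parameter described in Section 4.2.3, “Creating an alert route in Alertmanager”. The following sketch shows the multi-line syntax only; the `apiVersion` value is an assumption, and the `Secret` body is abbreviated:

```yaml
# apiVersion is an assumption; match the value from oc get servicetelemetry -oyaml
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      # ... full alertmanager.yaml contents follow here ...
```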
4.2. Alerts
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
- Create an alert rule in Prometheus. For more information, see Section 4.2.1, “Creating an alert rule in Prometheus”.
- Create an alert route in Alertmanager. For more information, see Section 4.2.3, “Creating an alert route in Alertmanager”.
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Create a `PrometheusRule` object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus.
- To verify that the Operator loaded the rules into Prometheus, create a pod with access to `curl`:

  ```shell
  oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
  ```

- Run `curl` against the `prometheus-operated` service to return the rules loaded into memory:

  ```shell
  [ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules
  {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
  ```

- To verify that the output contains the rules defined in the `PrometheusRule` object, for example `./openstack.rules`, exit from the pod:

  ```shell
  [ root@curl:/ ]$ exit
  ```

- Clean up the environment by deleting the `curl` pod:

  ```shell
  $ oc delete pod curl
  pod "curl" deleted
  ```
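The `PrometheusRule` object that produces the `./openstack.rules` group shown in the verification output can be sketched as follows. The rule name and expression come directly from the output above; the `labels` are the conventional Prometheus Operator rule-selector labels and are an assumption:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-alarm-rules
  namespace: service-telemetry
  labels:
    prometheus: stf-default   # assumed rule-selector labels
    role: alert-rules
spec:
  groups:
    - name: ./openstack.rules
      rules:
        # Fires when the AMQP listener status metric drops below 1
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
```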
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule object that you created in Section 4.2.1, “Creating an alert rule in Prometheus”.
Procedure
- Use the `oc edit` command:

  ```shell
  oc edit prometheusrules prometheus-alarm-rules
  ```

- Edit the `PrometheusRules` manifest.
- Save and close.
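For example, a hypothetical custom rule that fires when the `collectd_uptime` metric disappears could be appended to the existing rules list. The alert name and `for` duration here are illustrative:

```yaml
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1
        # Illustrative custom rule: fires if no collectd_uptime series
        # has been present for 5 minutes
        - alert: Collectd metrics absent
          expr: absent(collectd_uptime) == 1
          for: 5m
```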
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or another notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers.
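Reconstructed from the `configYAML` verification output later in this section (with the default `resolve_timeout` of `5m`), the default `alertmanager.yaml` looks approximately like this:

```yaml
global:
  resolve_timeout: 5m
route:
  receiver: 'null'   # the "null" receiver discards all alerts
  group_by:
    - job
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
receivers:
  - name: 'null'
templates: []
```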
To deploy a custom Alertmanager route with STF, pass an alertmanagerConfigManifest parameter to the Service Telemetry Operator, which results in an updated secret that is managed by the Prometheus Operator. For more information, see Section 4.1.2, “Overriding a managed manifest”.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Edit the `ServiceTelemetry` object for your STF deployment:

  ```shell
  oc edit servicetelemetry stf-default
  ```

- Add a new parameter, `alertmanagerConfigManifest`, and the `Secret` object contents to define the `alertmanager.yaml` configuration for Alertmanager.

  Note: This loads the default template that is already managed by the Service Telemetry Operator. To validate that the changes are populating correctly, change a value, for example `global.resolve_timeout` from `5m` to `10m`, then retrieve the `alertmanager-stf-default` secret and verify that the new value is loaded into memory.

- Verify that the configuration was applied to the secret.
- To verify that the configuration has been loaded into Alertmanager, create a pod with access to `curl`:

  ```shell
  oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
  ```

- Run `curl` against the `alertmanager-operated` service to retrieve the status and `configYAML` contents, and verify that the supplied configuration matches the configuration loaded into Alertmanager:

  ```shell
  [ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status
  {"status":"success","data":{"configYAML":"global:\n resolve_timeout: 10m\n http_config: {}\n smtp_hello: localhost\n smtp_require_tls: true\n pagerduty_url: https://events.pagerduty.com/v2/enqueue\n hipchat_api_url: https://api.hipchat.com/\n opsgenie_api_url: https://api.opsgenie.com/\n wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n receiver: \"null\"\n group_by:\n - job\n group_wait: 30s\n group_interval: 5m\n repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}
  ```

- Verify that the `configYAML` field contains the expected changes, then exit from the pod:

  ```shell
  [ root@curl:/ ]$ exit
  ```

- To clean up the environment, delete the `curl` pod:

  ```shell
  $ oc delete pod curl
  pod "curl" deleted
  ```
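Put together, the `alertmanagerConfigManifest` addition from the procedure above might look like the following sketch, with `global.resolve_timeout` raised to `10m` as suggested in the note. The `apiVersion` of the `ServiceTelemetry` object is an assumption; match it against your deployed object:

```yaml
apiVersion: infra.watch/v1alpha1   # assumption; copy from your deployed object
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-stf-default
      namespace: service-telemetry
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          receiver: 'null'
          group_by:
            - job
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
        receivers:
          - name: 'null'
```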
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. High availability
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
- Two AMQ Interconnect pods run instead of the default one.
- Three ElasticSearch pods run instead of the default one.
- Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Use the `oc` command to edit the `ServiceTelemetry` object:

  ```shell
  $ oc edit ServiceTelemetry
  ```

- Add `highAvailabilityEnabled: true` to the `spec` section:

  ```yaml
  spec:
    eventsEnabled: true
    metricsEnabled: true
    highAvailabilityEnabled: true
  ```

- Save your changes and close the object.
4.4. Dashboards
Use the third-party application Grafana to visualize the system-level metrics that collectd gathers for each individual host node. For more information about configuring collectd, see Section 3.3, “Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Clone the dashboard repository:

  ```shell
  git clone https://github.com/infrawatch/dashboards
  cd dashboards
  ```

- Deploy the Grafana Operator:

  ```shell
  oc create -f deploy/subscription.yaml
  ```

- To verify that the Operator launched successfully, run the `oc get csv` command. If the value of the PHASE column is `Succeeded`, the Operator launched successfully:

  ```shell
  $ oc get csv
  NAME                      DISPLAY            VERSION   REPLACES   PHASE
  grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
  ...
  ```

- Launch a Grafana instance:

  ```shell
  $ oc create -f deploy/grafana.yaml
  ```

- Verify that the Grafana instance deployed:

  ```shell
  $ oc get pod -l app=grafana
  NAME                                  READY   STATUS    RESTARTS   AGE
  grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
  ```

- Create the datasource and dashboard resources:

  ```shell
  oc create -f deploy/datasource.yaml \
     -f deploy/rhos-dashboard.yaml
  ```

- Verify that the resources installed correctly.
- Navigate to https://<grafana-route-address> in a web browser. Use the `oc get routes` command to retrieve the Grafana route address:

  ```shell
  oc get routes
  ```

- To view the dashboard, click Dashboards and Manage.
Additional resources
- For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source”.
4.4.1.1. Viewing and editing queries
Procedure
- Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the `admin` user.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- To retrieve the default username and password, describe the Grafana object using `oc describe`:

  ```shell
  oc describe grafana service-telemetry-grafana
  ```
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
| Title | Unit | Description |
|---|---|---|
| Current Global Alerts | - | Current alerts fired by Prometheus |
| Recent Global Alerts | - | Recently fired alerts in 5m time steps |
| Status Panel | - | Node status: up, down, unavailable |
| Uptime | s/m/h/d/M/Y | Total operational time of node |
| CPU Cores | cores | Total number of cores |
| Memory | bytes | Total memory |
| Disk Size | bytes | Total storage size |
| Processes | processes | Total number of processes listed by type |
| Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue. |
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
| Panel | Unit | Description |
|---|---|---|
| Physical Interfaces Ingress Errors | errors | Total errors with incoming data |
| Physical Interfaces Egress Errors | errors | Total errors with outgoing data |
| Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors |
| Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors |
| Physical Interfaces Packets Ingress | pps | Incoming packets per second |
| Physical Interfaces Packets Egress | pps | Outgoing packets per second |
| Physical Interfaces Data Ingress | bytes/s | Incoming data rates |
| Physical Interfaces Data Egress | bytes/s | Outgoing data rates |
| Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate |
| Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate |
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
| Panel | Unit | Description |
|---|---|---|
| Current CPU Usage | percent | Instantaneous usage at the time of the last query. |
| Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node. |
| Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores. |
4.4.2.4. Memory panels
Panels that display memory usage on the node.
| Panel | Unit | Description |
|---|---|---|
| Memory Used | percent | Amount of memory being used at time of last query. |
| Huge Pages Used | hugepages | Number of hugepages being used. |
4.4.2.5. Disk/file system
Panels that display space used on disk.
| Panel | Unit | Description | Notes |
|---|---|---|---|
| Disk Space Usage | percent | Total disk use at time of last query. | |
| Inode Usage | percent | Total inode use at time of last query. | |
| Aggregate Disk Space Usage | bytes | Total disk space used and reserved. | Because this query relies on the |
| Disk Traffic | bytes/s | Shows rates for both reading and writing. | |
| Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. | |
| Operations/s | ops/s | Operations done per second | |
| Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. Configuring multiple clouds
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
- Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
- Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.
- Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF
4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes collect data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to their respective AMQP addresses, for example collectd/telemetry, and STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
- `collectd/cloud1-telemetry`
- `collectd/cloud1-notify`
- `anycast/ceilometer/cloud1-event.sample`
- `collectd/cloud2-telemetry`
- `collectd/cloud2-notify`
- `anycast/ceilometer/cloud2-event.sample`
- `collectd/us-east-1-telemetry`
- `collectd/us-west-3-telemetry`
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud; one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. To support multiple clouds, you deploy additional Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template for creating additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the `oc get smartgateways` command. For example, if you deployed STF with `metricsEnabled: true` and `eventsEnabled: true`, the output lists the Smart Gateways for collectd metrics, collectd events, and Ceilometer events.
- Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways.
- Modify the Smart Gateway manifest in the `/tmp/cloud1-smartgateways.yaml` file. Adjust the `metadata.name` and `spec.amqpUrl` fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”.
- Deploy your new Smart Gateways:

  ```shell
  oc apply -f /tmp/cloud1-smartgateways.yaml
  ```

- Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:

  ```shell
  oc get po -l app=smart-gateway
  ```
4.5.2.1. Example manifests
The content in the following examples might differ from the file content in your deployment. Copy the manifests from your deployment.
Ensure that the name and amqpUrl parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
Your output might have some additional metadata parameters that you can remove from the manifests that you load into OCP.
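A sketch of what the modified manifests for a cloud named `cloud1` might look like follows. The `apiVersion` and all fields other than `metadata.name` and `spec.amqpUrl` are assumptions; copy the real manifests from your own deployment and change only the name and address values described in the procedure:

```yaml
# apiVersion is an assumption; copy it from your deployment's manifests.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: cloud1-collectd-telemetry
  namespace: service-telemetry
spec:
  # Address suffix matches the prefix plan: collectd/cloud1-telemetry
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: cloud1-collectd-notification
  namespace: service-telemetry
spec:
  # Address suffix matches the prefix plan: collectd/cloud1-notify
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify
```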
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml file and adjust the values of CeilometerQdrEventsConfig and CollectdAmqpInstances to match the AMQP address prefix scheme. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
Remove enable-stf.yaml and ceilometer-write-qdr.yaml environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
- Create the `stf-connectors.yaml` file and modify it to match the AMQP address that you want for this cloud deployment. In the file:
  - Define the topic for Ceilometer events. This value is the address format of `anycast/ceilometer/cloud1-event.sample`.
  - Define the topic for collectd events. This value is the format of `collectd/cloud1-notify`.
  - Define the topic for collectd metrics. This value is the format of `collectd/cloud1-telemetry`.
  - Adjust the `MetricsQdrConnectors` host to the address of the STF route.
- Ensure that the naming convention in the `stf-connectors.yaml` file aligns with the `spec.amqpUrl` field in the Smart Gateway configuration. For example, configure the `CeilometerQdrEventsConfig.topic` field to a value of `cloud1-event`.
- Save the file in a directory for custom environment files, for example `/home/stack/custom_templates/`.
- Source the authentication file:

  ```shell
  [stack@undercloud-0 ~]$ source stackrc
  (undercloud) [stack@undercloud-0 ~]$
  ```

- Include the `stf-connectors.yaml` file in the `overcloud deploy` command, along with any other environment files relevant to your environment:

  ```shell
  (undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy \
    --templates /usr/share/openstack-tripleo-heat-templates \
    ... \
    -e /home/stack/custom_templates/stf-connectors.yaml \
    ...
  ```
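An `stf-connectors.yaml` consistent with the topic and connector notes above might look like the following sketch. The `MetricsQdrConnectors` host value is a placeholder, and the exact parameter structure is an assumption modeled on typical STF client configurations; verify the parameter names against your Red Hat OpenStack Platform version:

```yaml
parameter_defaults:
  # Placeholder host: set this to the address of your STF Interconnect route.
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.example.com
      port: 443
      role: edge
      sslProfile: sslProfile
      verifyHostname: false
  # Topic for Ceilometer events; results in anycast/ceilometer/cloud1-event.sample
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event
  # collectd instances; result in collectd/cloud1-telemetry and collectd/cloud1-notify
  CollectdAmqpInstances:
    cloud1-telemetry:
      format: JSON
      presettle: true
    cloud1-notify:
      format: JSON
      notify: true
      presettle: false
```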
Additional resources
For information about validating the deployment, see Section 3.3.3, “Validating client-side installation”.
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a PromQL query that matches the associated service label, for example: `collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}`.
4.6. Ephemeral storage
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended in a production environment because the data is volatile even when the platform operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage sections to the manifests of the data storage components.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
- Change to the `service-telemetry` namespace:

  ```shell
  oc project service-telemetry
  ```

- Edit the `ServiceTelemetry` object:

  ```shell
  $ oc edit ServiceTelemetry stf-default
  ```

- Add the `storageEphemeralEnabled: true` parameter to the `spec` section:

  ```yaml
  spec:
    eventsEnabled: true
    metricsEnabled: true
    storageEphemeralEnabled: true
  ```

- Save your changes and close the object.