Chapter 4. Advanced features
The following optional features can provide additional functionality to the Service Telemetry Framework (STF):
- Customizing the deployment. For more information, see Section 4.1, “Customizing the deployment”.
- Alerts. For more information, see Section 4.2, “Alerts”.
- High availability. For more information, see Section 4.3, “High availability”.
- Dashboards. For more information, see Section 4.4, “Dashboards”.
- Multiple clouds. For more information, see Section 4.5, “Configuring multiple clouds”.
- Ephemeral storage. For more information, see Section 4.6, “Ephemeral storage”.
4.1. Customizing the deployment
The Service Telemetry Operator watches for a ServiceTelemetry
manifest to load into Red Hat OpenShift Container Platform (OCP). The Operator then creates other objects in memory, which results in the dependent Operators creating the workloads they are responsible for managing.
When you override the manifest, you must provide the entire manifest contents, including object names or namespaces. There is no dynamic parameter substitution when you override a manifest.
To override a manifest successfully with Service Telemetry Framework (STF), deploy a default environment using the core options only. For more information about the core options, see Section 2.3.10, “Creating a ServiceTelemetry object in OCP”. When you deploy STF, use the oc get
command to retrieve the default deployed manifest. When you use a manifest that was originally generated by Service Telemetry Operator, the manifest is compatible with the other objects that are managed by the Operators.
For example, when the metricsEnabled: true
parameter is configured in the ServiceTelemetry
manifest, the Service Telemetry Operator requests components for metrics retrieval and storage using the default manifests. In some cases, you might want to override the default manifest. For more information, see Section 4.1.1, “Manifest override parameters”.
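For example, to start from the manifest that the Operator deployed for the collectd metrics Smart Gateway, retrieve it with oc get. This is a minimal sketch; the object name stf-default-collectd-telemetry assumes the default stf-default deployment name:
$ oc get smartgateway stf-default-collectd-telemetry -oyaml
You can then edit the returned YAML and supply it through the matching override parameter, for example smartgatewayCollectdMetricsManifest.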
4.1.1. Manifest override parameters
This table describes the available parameters that you can use to override a manifest, along with the corresponding retrieval commands.
Override parameter | Description | Retrieval command |
---|---|---|
alertmanagerConfigManifest | Override the contents of the secret that holds the alertmanager.yaml configuration for Alertmanager. | oc get secret alertmanager-stf-default -oyaml |
alertmanagerManifest | Override the Alertmanager object that the Prometheus Operator manages. | oc get alertmanager stf-default -oyaml |
elasticsearchManifest | Override the Elasticsearch object that the Elastic Cloud on Kubernetes Operator manages. | oc get elasticsearch elasticsearch -oyaml |
interconnectManifest | Override the Interconnect object that the AMQ Interconnect Operator manages. | oc get interconnect stf-default-interconnect -oyaml |
prometheusManifest | Override the Prometheus object that the Prometheus Operator manages. | oc get prometheus stf-default -oyaml |
servicemonitorManifest | Override the ServiceMonitor object that the Prometheus Operator manages. | oc get servicemonitor stf-default -oyaml |
smartgatewayCollectdMetricsManifest | Override the SmartGateway object for collectd metrics that the Smart Gateway Operator manages. | oc get smartgateway stf-default-collectd-telemetry -oyaml |
smartgatewayCollectdEventsManifest | Override the SmartGateway object for collectd events that the Smart Gateway Operator manages. | oc get smartgateway stf-default-collectd-notification -oyaml |
smartgatewayCeilometerEventsManifest | Override the SmartGateway object for Ceilometer events that the Smart Gateway Operator manages. | oc get smartgateway stf-default-ceilometer-notification -oyaml |
4.1.2. Overriding a managed manifest
Edit the ServiceTelemetry
object and provide a parameter and manifest. For a list of available manifest override parameters, see Section 4.1, “Customizing the deployment”. The default ServiceTelemetry
object is stf-default
. Use oc get servicetelemetry
to list the available STF deployments.
The oc edit
command loads the default system editor. To override the default editor, set the EDITOR environment variable to your preferred editor, for example, EDITOR=nano oc edit servicetelemetry stf-default
.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Load the
ServiceTelemetry
object into an editor:oc edit servicetelemetry stf-default
To modify the
ServiceTelemetry
object, provide a manifest override parameter and the contents of the manifest to write to OCP instead of the defaults provided by STF.
Note: The trailing pipe (|) after the manifest override parameter indicates that the value provided is multi-line.
$ oc edit servicetelemetry stf-default

apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infra.watch/v1alpha1","kind":"ServiceTelemetry","metadata":{"annotations":{},"name":"stf-default","namespace":"service-telemetry"},"spec":{"metricsEnabled":true}}
  creationTimestamp: "2020-04-14T20:29:42Z"
  generation: 1
  name: stf-default
  namespace: service-telemetry
  resourceVersion: "1949423"
  selfLink: /apis/infra.watch/v1alpha1/namespaces/service-telemetry/servicetelemetrys/stf-default
  uid: d058bc41-1bb0-49f5-9a8b-642f4b8adb95
spec:
  metricsEnabled: true
  smartgatewayCollectdMetricsManifest: | 1
    apiVersion: smartgateway.infra.watch/v2alpha1
    kind: SmartGateway
    metadata:
      name: stf-default-collectd-telemetry
      namespace: service-telemetry
    spec:
      amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/telemetry
      debug: true
      prefetch: 15000
      serviceType: metrics
      size: 1
      useTimestamp: true 2
status:
  conditions:
  - ansibleResult:
      changed: 0
      completion: 2020-04-14T20:32:19.079508
      failures: 0
      ok: 52
      skipped: 1
    lastTransitionTime: "2020-04-14T20:29:59Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
- Save and close.
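To confirm that the override was applied, you can retrieve the Smart Gateway manifest again and check for the overridden values, for example the prefetch: 15000 and debug: true settings from the example above:
$ oc get smartgateway stf-default-collectd-telemetry -oyaml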
4.2. Alerts
You create alert rules in Prometheus and alert routes in Alertmanager. Alert rules in Prometheus servers send alerts to an Alertmanager, which manages the alerts. Alertmanager can silence, inhibit, or aggregate alerts, and send notifications using email, on-call notification systems, or chat platforms.
To create an alert, complete the following tasks:
- Create an alert rule in Prometheus. For more information, see Section 4.2.1, “Creating an alert rule in Prometheus”.
- Create an alert route in Alertmanager. For more information, see Section 4.2.3, “Creating an alert route in Alertmanager”.
Additional resources
For more information about alerts or notifications with Prometheus and Alertmanager, see https://prometheus.io/docs/alerting/overview/
To view an example set of alerts that you can use with Service Telemetry Framework (STF), see https://github.com/infrawatch/service-telemetry-operator/tree/master/deploy/alerts
4.2.1. Creating an alert rule in Prometheus
Prometheus evaluates alert rules to trigger notifications. If the rule condition returns an empty result set, the condition is false. Otherwise, the rule is true and it triggers an alert. For example, the expression collectd_qpid_router_status < 1 used in the following procedure returns no results while the router reports a status of 1 or higher, so no alert fires; when the status drops below 1, the expression returns a result and the alert fires.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Create a
PrometheusRule
object that contains the alert rule. The Prometheus Operator loads the rule into Prometheus:
oc apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: stf-default
    role: alert-rules
  name: prometheus-alarm-rules
  namespace: service-telemetry
spec:
  groups:
    - name: ./openstack.rules
      rules:
        - alert: Metric Listener down
          expr: collectd_qpid_router_status < 1 # To change the rule, edit the value of the expr parameter.
EOF
To verify that the rules have been loaded into Prometheus by the Operator, create a pod with access to
curl
:oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
Run
curl
to access theprometheus-operated
service to return the rules loaded into memory:[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/rules {"status":"success","data":{"groups":[{"name":"./openstack.rules","file":"/etc/prometheus/rules/prometheus-stf-default-rulefiles-0/service-telemetry-prometheus-alarm-rules.yaml","rules":[{"name":"Metric Listener down","query":"collectd_qpid_router_status \u003c 1","duration":0,"labels":{},"annotations":{},"alerts":[],"health":"ok","type":"alerting"}],"interval":30}]}}
To verify that the output shows the rules loaded into the
PrometheusRule
object, for example the output contains the defined./openstack.rules
, exit from the pod:[ root@curl:/ ]$ exit
Clean up the environment by deleting the
curl
pod:$ oc delete pod curl pod "curl" deleted
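If you also want to check whether any alerts are currently firing, you can query the standard Prometheus alerts endpoint from a curl pod in the same way:
[ root@curl:/ ]$ curl prometheus-operated:9090/api/v1/alerts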
Additional resources
For more information on alerting, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.2. Configuring custom alerts
You can add custom alerts to the PrometheusRule
object that you created in Section 4.2.1, “Creating an alert rule in Prometheus”.
Procedure
Use the
oc edit
command:oc edit prometheusrules prometheus-alarm-rules
- Edit the PrometheusRules manifest.
- Save and close.
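For example, a custom rule appended to the existing ./openstack.rules group might look like the following sketch. The alert name, expression, and duration are illustrative only and are not part of the default STF deployment; replace them with values that are meaningful for your environment:
spec:
  groups:
  - name: ./openstack.rules
    rules:
    - alert: Metric Listener down
      expr: collectd_qpid_router_status < 1
    - alert: Collectd metrics absent  # illustrative custom rule
      expr: absent(collectd_uptime)
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: No collectd_uptime metrics have been received for 10 minutes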
Additional resources
For more information about configuring alerting rules, see https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/.
For more information about PrometheusRules objects, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.2.3. Creating an alert route in Alertmanager
Use Alertmanager to deliver alerts to an external system, such as email, IRC, or another notification channel. The Prometheus Operator manages the Alertmanager configuration as a Red Hat OpenShift Container Platform (OCP) secret. By default, STF deploys a basic configuration that results in no receivers:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'null'
  receivers:
  - name: 'null'
To deploy a custom Alertmanager route with STF, pass an alertmanagerConfigManifest
parameter to the Service Telemetry Operator. This results in an updated secret that the Prometheus Operator manages. For more information, see Section 4.1.2, “Overriding a managed manifest”.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Edit the
ServiceTelemetry
object for your STF deployment:
oc edit servicetelemetry stf-default
Add a new parameter,
alertmanagerConfigManifest
and theSecret
object contents to define thealertmanager.yaml
configuration for Alertmanager:
Note: This loads the default template that the Service Telemetry Operator already manages. To validate that your changes populate correctly, change a value, retrieve the alertmanager-stf-default secret, and verify that the new value is loaded into memory, for example, by changing the value of global.resolve_timeout from 5m to 10m.
apiVersion: infra.watch/v1alpha1
kind: ServiceTelemetry
metadata:
  name: stf-default
  namespace: service-telemetry
spec:
  metricsEnabled: true
  alertmanagerConfigManifest: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: 'alertmanager-stf-default'
      namespace: 'service-telemetry'
    type: Opaque
    stringData:
      alertmanager.yaml: |-
        global:
          resolve_timeout: 10m
        route:
          group_by: ['job']
          group_wait: 30s
          group_interval: 5m
          repeat_interval: 12h
          receiver: 'null'
        receivers:
        - name: 'null'
Verify that the configuration was applied to the secret:
$ oc get secret alertmanager-stf-default -o go-template='{{index .data "alertmanager.yaml" | base64decode }}'

global:
  resolve_timeout: 10m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'null'
receivers:
- name: 'null'
To verify the configuration has been loaded into Alertmanager, create a pod with access to
curl
:oc run curl --generator=run-pod/v1 --image=radial/busyboxplus:curl -i --tty
Run
curl
against thealertmanager-operated
service to retrieve the status andconfigYAML
contents and review the supplied configuration matches the configuration loaded into Alertmanager:[ root@curl:/ ]$ curl alertmanager-operated:9093/api/v1/status {"status":"success","data":{"configYAML":"global:\n resolve_timeout: 10m\n http_config: {}\n smtp_hello: localhost\n smtp_require_tls: true\n pagerduty_url: https://events.pagerduty.com/v2/enqueue\n hipchat_api_url: https://api.hipchat.com/\n opsgenie_api_url: https://api.opsgenie.com/\n wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/\n victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/\nroute:\n receiver: \"null\"\n group_by:\n - job\n group_wait: 30s\n group_interval: 5m\n repeat_interval: 12h\nreceivers:\n- name: \"null\"\ntemplates: []\n",...}}
Verify that the
configYAML
field contains the expected changes. Exit from the pod:[ root@curl:/ ]$ exit
To clean up the environment, delete the
curl
pod:$ oc delete pod curl pod "curl" deleted
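The default configuration routes every alert to a 'null' receiver, which discards it. The following alertmanager.yaml is a minimal sketch of an email receiver; the receiver name, addresses, and SMTP settings are illustrative placeholders, not values provided by STF. Deliver it through the alertmanagerConfigManifest parameter in the same way as the example above:
alertmanager.yaml: |-
  global:
    resolve_timeout: 5m
  route:
    group_by: ['job']
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h
    receiver: 'email-oncall'
  receivers:
  - name: 'email-oncall'
    email_configs:
    - to: 'oncall@example.com'
      from: 'alertmanager@example.com'
      smarthost: 'smtp.example.com:587'
      auth_username: 'alertmanager@example.com'
      auth_password: 'examplepassword'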
Additional resources
For more information about the Red Hat OpenShift Container Platform secret and the Prometheus operator, see https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md
4.3. High availability
High availability is the ability of Service Telemetry Framework (STF) to rapidly recover from failures in its component services. Although Red Hat OpenShift Container Platform (OCP) restarts a failed pod if nodes are available to schedule the workload, this recovery process might take more than one minute, during which time events and metrics are lost. A high availability configuration includes multiple copies of STF components, reducing recovery time to approximately 2 seconds. To protect against failure of an OCP node, deploy STF to an OCP cluster with three or more nodes.
STF is not yet a fully fault tolerant system. Delivery of metrics and events during the recovery period is not guaranteed.
Enabling high availability has the following effects:
- Two AMQ Interconnect pods run instead of the default 1.
- Three ElasticSearch pods run instead of the default 1.
- Recovery time from a lost pod in either of these services reduces to approximately 2 seconds.
4.3.1. Configuring high availability
To configure STF for high availability, add highAvailabilityEnabled: true
to the ServiceTelemetry object in OCP. You can set this parameter at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Use the oc command to edit the ServiceTelemetry object:
$ oc edit ServiceTelemetry
Add
highAvailabilityEnabled: true
to the spec
section:
spec:
  eventsEnabled: true
  metricsEnabled: true
  highAvailabilityEnabled: true
- Save your changes and close the object.
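To confirm that the additional pods were scheduled, you can list the pods for the affected services. The grep pattern is only an illustrative filter; exact pod names depend on your deployment:
$ oc get pods | grep -E 'interconnect|elasticsearch'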
4.4. Dashboards
Use the third-party application Grafana to visualize system-level metrics gathered by collectd for each individual host node. For more information about configuring collectd, see Section 3.3, “Configuring Red Hat OpenStack Platform overcloud for Service Telemetry Framework”.
4.4.1. Setting up Grafana to host the dashboard
Grafana is not included in the default Service Telemetry Framework (STF) deployment, so you must deploy the Grafana Operator from OperatorHub.io.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Clone the dashboard repository.
git clone https://github.com/infrawatch/dashboards
cd dashboards
Deploy the Grafana operator:
oc create -f deploy/subscription.yaml
To verify that the operator launched successfully, run the
oc get csv
command. If the value of the PHASE column is Succeeded, the operator launched successfully:
$ oc get csv

NAME                      DISPLAY            VERSION   REPLACES   PHASE
grafana-operator.v3.2.0   Grafana Operator   3.2.0                Succeeded
...
Launch a Grafana instance:
$ oc create -f deploy/grafana.yaml
Verify that the Grafana instance deployed:
$ oc get pod -l app=grafana

NAME                                  READY   STATUS    RESTARTS   AGE
grafana-deployment-7fc7848b56-sbkhv   1/1     Running   0          1m
Create the datasource and dashboard resources:
oc create -f deploy/datasource.yaml \
    -f deploy/rhos-dashboard.yaml
Verify that the resources installed correctly:
$ oc get grafanadashboards

NAME             AGE
rhos-dashboard   7d21h

$ oc get grafanadatasources

NAME                                  AGE
service-telemetry-grafanadatasource   1m
Navigate to https://<grafana-route-address> in a web browser. Use the
oc get routes
command to retrieve the Grafana route address:oc get routes
- To view the dashboard, click Dashboards and Manage.
Additional resources
- For more information about enabling the OperatorHub.io catalog source, see Section 2.3.5, “Enabling the OperatorHub.io Community Catalog Source”.
4.4.1.1. Viewing and editing queries
Procedure
-
Log in to Red Hat OpenShift Container Platform. To view and edit queries, log in as the
admin
user. Change to the
service-telemetry
namespace:oc project service-telemetry
To retrieve the default username and password, describe the Grafana object using
oc describe
:oc describe grafana service-telemetry-grafana
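If you prefer to extract only the credentials rather than reading the full output, you can filter the object YAML. This assumes that the default admin credentials appear under a security block in the Grafana custom resource, which can vary between Grafana Operator versions:
$ oc get grafana service-telemetry-grafana -o yaml | grep -A3 'security:'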
4.4.2. The Grafana infrastructure dashboard
The infrastructure dashboard shows metrics for a single node at a time. Select a node from the upper left corner of the dashboard.
4.4.2.1. Top panels
Title | Unit | Description |
---|---|---|
Current Global Alerts | - | Current alerts fired by Prometheus |
Recent Global Alerts | - | Recently fired alerts in 5m time steps |
Status Panel | - | Node status: up, down, unavailable |
Uptime | s/m/h/d/M/Y | Total operational time of node |
CPU Cores | cores | Total number of cores |
Memory | bytes | Total memory |
Disk Size | bytes | Total storage size |
Processes | processes | Total number of processes listed by type |
Load Average | processes | Load average represents the average number of running and uninterruptible processes residing in the kernel execution queue. |
4.4.2.2. Networking panels
Panels that display the network interfaces of the node.
Panel | Unit | Description |
---|---|---|
Physical Interfaces Ingress Errors | errors | Total errors with incoming data |
Physical Interfaces Egress Errors | errors | Total errors with outgoing data |
Physical Interfaces Ingress Error Rates | errors/s | Rate of incoming data errors |
Physical Interfaces Egress Error Rates | errors/s | Rate of outgoing data errors |
Physical Interfaces Packets Ingress | pps | Incoming packets per second |
Physical Interfaces Packets Egress | pps | Outgoing packets per second |
Physical Interfaces Data Ingress | bytes/s | Incoming data rates |
Physical Interfaces Data Egress | bytes/s | Outgoing data rates |
Physical Interfaces Drop Rate Ingress | pps | Incoming packets drop rate |
Physical Interfaces Drop Rate Egress | pps | Outgoing packets drop rate |
4.4.2.3. CPU panels
Panels that display CPU usage of the node.
Panel | Unit | Description |
---|---|---|
Current CPU Usage | percent | Instantaneous usage at the time of the last query. |
Aggregate CPU Usage | percent | Average non-idle CPU activity of all cores on a node. |
Aggr. CPU Usage by Type | percent | Shows time spent for each type of thread averaged across all cores. |
4.4.2.4. Memory panels
Panels that display memory usage on the node.
Panel | Unit | Description |
---|---|---|
Memory Used | percent | Amount of memory being used at time of last query. |
Huge Pages Used | hugepages | Number of hugepages being used. |
4.4.2.5. Disk/file system
Panels that display space used on disk.
Panel | Unit | Description | Notes |
---|---|---|---|
Disk Space Usage | percent | Total disk use at time of last query. | |
Inode Usage | percent | Total inode use at time of last query. | |
Aggregate Disk Space Usage | bytes | Total disk space used and reserved. | Because this query relies on the |
Disk Traffic | bytes/s | Shows rates for both reading and writing. | |
Disk Load | percent | Approximate percentage of total disk bandwidth being used. The weighted I/O time series includes the backlog that might be accumulating. For more information, see the collectd disk plugin docs. | |
Operations/s | ops/s | Operations done per second | |
Average I/O Operation Time | seconds | Average time each I/O operation took to complete. This average is not accurate; see the collectd disk plugin docs. |
4.5. Configuring multiple clouds
You can configure multiple Red Hat OpenStack Platform clouds to target a single instance of Service Telemetry Framework (STF):
- Plan the AMQP address prefixes that you want to use for each cloud. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
- Deploy metrics and events consumer Smart Gateways for each cloud to listen on the corresponding address prefixes. For more information, see Section 4.5.2, “Deploying Smart Gateways”.
- Configure each cloud to send its metrics and events to STF on the correct address. For more information, see Section 4.5.3, “Creating the OpenStack environment file”.
Figure 4.1. Two Red Hat OpenStack Platform clouds connect to STF

4.5.1. Planning AMQP address prefixes
By default, Red Hat OpenStack Platform nodes collect data through two data collectors: collectd and Ceilometer. These components send telemetry data or notifications to their respective AMQP addresses, for example, collectd/telemetry, and STF Smart Gateways listen on those addresses for monitoring data.
To support multiple clouds and to identify which cloud generated the monitoring data, configure each cloud to send data to a unique address. Prefix a cloud identifier to the second part of the address. The following list shows some example addresses and identifiers:
-
collectd/cloud1-telemetry
-
collectd/cloud1-notify
-
anycast/ceilometer/cloud1-event.sample
-
collectd/cloud2-telemetry
-
collectd/cloud2-notify
-
anycast/ceilometer/cloud2-event.sample
-
collectd/us-east-1-telemetry
-
collectd/us-west-3-telemetry
4.5.2. Deploying Smart Gateways
You must deploy a Smart Gateway for each of the data collection types for each cloud: one for collectd metrics, one for collectd events, and one for Ceilometer events. Configure each of the Smart Gateways to listen on the AMQP address that you define for the corresponding cloud.
When you deploy STF for the first time, Smart Gateway manifests are created that define the initial Smart Gateways for a single cloud. When deploying Smart Gateways for multiple cloud support, you deploy multiple Smart Gateways for each of the data collection types that handle the metrics and the events data for each cloud. The initial Smart Gateways act as a template to create additional Smart Gateways, along with any authentication information required to connect to the data stores.
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Use the initially deployed Smart Gateways as a template for additional Smart Gateways. List the currently deployed Smart Gateways with the oc get smartgateways
command. For example, if you deployed STF with metricsEnabled: true and eventsEnabled: true, the following Smart Gateways are displayed in the output:
$ oc get smartgateways

NAME                                  AGE
stf-default-ceilometer-notification   14d
stf-default-collectd-notification     14d
stf-default-collectd-telemetry        14d
Retrieve the manifests for each Smart Gateway and store the contents in a temporary file, which you can modify later and use to create the new set of Smart Gateways:
truncate --size 0 /tmp/cloud1-smartgateways.yaml && \
for sg in $(oc get smartgateways -oname)
do
  echo "---" >> /tmp/cloud1-smartgateways.yaml
  oc get ${sg} -oyaml --export >> /tmp/cloud1-smartgateways.yaml
done
-
Modify the Smart Gateway manifest in the
/tmp/cloud1-smartgateways.yaml
file. Adjust themetadata.name
andspec.amqpUrl
fields to include the cloud identifier from your schema. For more information, see Section 4.5.1, “Planning AMQP address prefixes”. To view example Smart Gateway manifests, see Section 4.5.2.1, “Example manifests”. Deploy your new Smart Gateways:
oc apply -f /tmp/cloud1-smartgateways.yaml
Verify that each Smart Gateway is running. This can take several minutes depending on the number of Smart Gateways:
oc get po -l app=smart-gateway
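As an additional illustrative check, you can confirm that the new per-cloud Smart Gateway objects were created by filtering the list for your cloud identifier, cloud1 in this example:
$ oc get smartgateways | grep cloud1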
4.5.2.1. Example manifests
The content in the following examples might differ from the file content in your deployment. Copy the manifests from your own deployment.
Ensure that the name
and amqpUrl
parameters of each Smart Gateway match the names that you want to use for your clouds. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
Your output may have some additional metadata
parameters that you can remove from the manifests that you load into OCP.
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-ceilometer-notification-cloud1 1
spec:
  amqpDataSource: ceilometer
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/anycast/ceilometer/cloud1-event.sample 2
  debug: false
  elasticPass: fkzfhghw......
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-notification-cloud1 3
spec:
  amqpDataSource: collectd
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-notify 4
  debug: false
  elasticPass: fkzfhghw......
  elasticUrl: https://elasticsearch-es-http.service-telemetry.svc.cluster.local:9200
  elasticUser: elastic
  resetIndex: false
  serviceType: events
  size: 1
  tlsCaCert: /config/certs/ca.crt
  tlsClientCert: /config/certs/tls.crt
  tlsClientKey: /config/certs/tls.key
  tlsServerName: elasticsearch-es-http.service-telemetry.svc.cluster.local
  useBasicAuth: true
  useTls: true
---
apiVersion: smartgateway.infra.watch/v2alpha1
kind: SmartGateway
metadata:
  name: stf-default-collectd-telemetry-cloud1 5
spec:
  amqpUrl: stf-default-interconnect.service-telemetry.svc.cluster.local:5672/collectd/cloud1-telemetry 6
  debug: false
  prefetch: 15000
  serviceType: metrics
  size: 1
  useTimestamp: true
4.5.3. Creating the OpenStack environment file
To label traffic according to the cloud of origin, you must create a configuration with cloud-specific instance names. Create an stf-connectors.yaml
file and adjust the values of CeilometerQdrEventsConfig
and CollectdAmqpInstances
to match the AMQP address prefix scheme. For more information, see Section 4.5.1, “Planning AMQP address prefixes”.
Remove enable-stf.yaml
and ceilometer-write-qdr.yaml
environment file references from your overcloud deployment. This configuration is redundant and results in duplicate information being sent from each cloud node.
Procedure
-
Create the
stf-connectors.yaml
file and modify it to match the AMQP address that you want for this cloud deployment:
resource_registry:
  OS::TripleO::Services::Collectd: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/collectd-container-puppet.yaml
  OS::TripleO::Services::MetricsQdr: /usr/share/openstack-tripleo-heat-templates/deployment/metrics/qdr-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentCentral: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-central-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentNotification: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-notification-container-puppet.yaml
  OS::TripleO::Services::CeilometerAgentIpmi: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-ipmi-container-puppet.yaml
  OS::TripleO::Services::ComputeCeilometerAgent: /usr/share/openstack-tripleo-heat-templates/deployment/ceilometer/ceilometer-agent-compute-container-puppet.yaml
  OS::TripleO::Services::Redis: /usr/share/openstack-tripleo-heat-templates/deployment/database/redis-pacemaker-puppet.yaml

parameter_defaults:
  EnableSTF: true
  EventPipelinePublishers: []
  CeilometerEnablePanko: false
  CeilometerQdrPublishEvents: true
  CeilometerQdrEventsConfig:
    driver: amqp
    topic: cloud1-event 1
  CollectdConnectionType: amqp1
  CollectdAmqpInterval: 5
  CollectdDefaultPollingInterval: 5
  CollectdAmqpInstances:
    cloud1-notify: 2
      notify: true
      format: JSON
      presettle: false
    cloud1-telemetry: 3
      format: JSON
      presettle: true
  MetricsQdrAddresses:
    - prefix: collectd
      distribution: multicast
    - prefix: anycast/ceilometer
      distribution: multicast
  MetricsQdrSSLProfiles:
    - name: sslProfile
  MetricsQdrConnectors:
    - host: stf-default-interconnect-5671-service-telemetry.apps.infra.watch 4
      port: 443
      role: edge
      verifyHostname: false
      sslProfile: sslProfile
1. Define the topic for Ceilometer events. This value is the address format of anycast/ceilometer/cloud1-event.sample.
2. Define the topic for collectd events. This value is the format of collectd/cloud1-notify.
3. Define the topic for collectd metrics. This value is the format of collectd/cloud1-telemetry.
4. Adjust the MetricsQdrConnectors host to the address of the STF route.
-
Ensure that the naming convention in the
stf-connectors.yaml
file aligns with thespec.amqpUrl
field in the Smart Gateway configuration. For example, configure theCeilometerQdrEventsConfig.topic
field to a value ofcloud1-event
. -
Save the file in a directory for custom environment files, for example
/home/stack/custom_templates/
. Source the authentication file:
[stack@undercloud-0 ~]$ source stackrc
(undercloud) [stack@undercloud-0 ~]$
Include the
stf-connectors.yaml
file in theovercloud deployment
command, along with any other environment files relevant to your environment:(undercloud) [stack@undercloud-0 ~]$ openstack overcloud deploy \ --templates /usr/share/openstack-tripleo-heat-templates \ ... -e /home/stack/custom_templates/stf-connectors.yaml \ ...
Additional resources
For information about validating the deployment, see Section 3.3.3, “Validating client-side installation”.
4.5.4. Querying metrics data from multiple clouds
Data stored in Prometheus has a service label attached according to the Smart Gateway it was scraped from. You can use this label to query data from a specific cloud.
To query data from a specific cloud, use a Prometheus promql
query that matches the associated service label; for example: collectd_uptime{service="stf-default-collectd-telemetry-cloud1-smartgateway"}
.
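You can also match several clouds at once with a regular-expression label matcher. The label values below assume the Smart Gateway naming scheme used in this chapter:
collectd_uptime{service=~"stf-default-collectd-telemetry-.*-smartgateway"}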
4.6. Ephemeral storage
Use ephemeral storage to run Service Telemetry Framework (STF) without persistently storing data in your Red Hat OpenShift Container Platform (OCP) cluster. Ephemeral storage is not recommended for production environments, because the data is volatile even when the platform operates correctly and as designed. For example, restarting a pod or rescheduling the workload to another node results in the loss of any local data written since the pod started.
If you enable ephemeral storage in STF, the Service Telemetry Operator does not add the relevant storage
sections to the manifests of the data storage components.
4.6.1. Configuring ephemeral storage
To configure STF for ephemeral storage, add storageEphemeralEnabled: true
to the ServiceTelemetry object in OCP. You can add storageEphemeralEnabled: true
at installation time or, if you already deployed STF, complete the following steps:
Procedure
- Log in to Red Hat OpenShift Container Platform.
Change to the
service-telemetry
namespace:oc project service-telemetry
Edit the ServiceTelemetry object:
$ oc edit ServiceTelemetry stf-default
Add the
storageEphemeralEnabled: true
parameter to the spec
section:
spec:
  eventsEnabled: true
  metricsEnabled: true
  storageEphemeralEnabled: true
- Save your changes and close the object.
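Because no persistent storage is requested in this mode, a quick illustrative check is to verify that no persistent volume claims exist for the STF data storage components after the pods restart:
$ oc get pvc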