3.10. 使用 Telemetry 服务进行容量 metering


OpenStack Telemetry 服务提供可用于计费、收费和回放用途的使用指标。第三方应用也可以使用此类指标数据来规划集群上的容量,也可以使用 OpenStack Heat 自动扩展虚拟实例。如需更多信息,请参阅 实例的自动扩展

您可以使用 Ceilometer 和 Gnocchi 的组合来监控和警报。这在小型集群中支持,以及已知的限制。对于实时监控,Red Hat OpenStack Platform 附带了提供指标数据的代理,并可以被独立的监控基础架构和应用程序使用。如需更多信息,请参阅监控工具配置

3.10.1. 查看测量结果

列出特定资源的所有测量结果:

# openstack metric measures show --resource-id UUID METER_NAME

仅列出特定资源的测量结果,位于时间戳范围内:

# openstack metric measures show --aggregation mean --start <START_TIME> --stop <STOP_TIME> --resource-id UUID METER_NAME

时间戳变量 <START_TIME> 和 <STOP_TIME> 使用格式 iso-dateThh:mm:ss

3.10.2. 创建新测量结果

您可以使用测量结果将数据发送到 Telemetry 服务,它们不需要与之前定义的量表对应。例如:

# openstack metrics measures add -m 2015-01-12T17:56:23@42 --resource-id UUID METER_NAME

3.10.3. 示例:查看云使用测量结果

本例显示每个项目的所有实例的平均内存用量。

# openstack metric measures aggregation --resource-type instance --groupby project_id -m memory

3.10.4. 查看现有警报

要列出现有的 Telemetry 警报,请使用 aodh 命令。例如:

# aodh alarm list
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
| alarm_id                             | type                                       | name                       | state             | severity | enabled |
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
| 922f899c-27c8-4c7d-a2cf-107be51ca90a | gnocchi_aggregation_by_resources_threshold | iops-monitor-read-requests | insufficient data | low      | True    |
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+

若要列出分配给资源的计量,请指定资源的 UUID (实例、镜像或卷等)。例如:

# gnocchi resource show 5e3fcbe2-7aab-475d-b42c-a440aa42e5ad

3.10.5. 创建警报

您可以使用 aodh 创建在达到阈值时激活的警报。在本例中,警报激活,并在单个实例的平均 CPU 使用率超过 80% 时添加一个日志条目。查询用于隔离特定实例的 id (94619081-abf5-4f1f-81c7-9cedaa872403)用于监控目的:

# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name cpu_usage_high --metric cpu_util --threshold 80 --aggregation-method sum --resource-type instance --query '{"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}}' --alarm-action 'log://'
+---------------------------+-------------------------------------------------------+
| Field                     | Value                                                 |
+---------------------------+-------------------------------------------------------+
| aggregation_method        | sum                                                   |
| alarm_actions             | [u'log://']                                           |
| alarm_id                  | b794adc7-ed4f-4edb-ace4-88cbe4674a94                  |
| comparison_operator       | eq                                                    |
| description               | gnocchi_aggregation_by_resources_threshold alarm rule |
| enabled                   | True                                                  |
| evaluation_periods        | 1                                                     |
| granularity               | 60                                                    |
| insufficient_data_actions | []                                                    |
| metric                    | cpu_util                                              |
| name                      | cpu_usage_high                                        |
| ok_actions                | []                                                    |
| project_id                | 13c52c41e0e543d9841a3e761f981c20                      |
| query                     | {"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}} |
| repeat_actions            | False                                                 |
| resource_type             | instance                                              |
| severity                  | low                                                   |
| state                     | insufficient data                                     |
| state_timestamp           | 2016-12-09T05:18:53.326000                            |
| threshold                 | 80.0                                                  |
| time_constraints          | []                                                    |
| timestamp                 | 2016-12-09T05:18:53.326000                            |
| type                      | gnocchi_aggregation_by_resources_threshold            |
| user_id                   | 32d3f2c9a234423cb52fb69d3741dbbc                      |
+---------------------------+-------------------------------------------------------+

要编辑现有的阈值警报,请使用 aodh alarm update 命令。例如,将警报阈值增加到 75%:

# aodh alarm update --name cpu_usage_high --threshold 75

3.10.6. 禁用或删除警报

禁用警报:

# aodh alarm update --name cpu_usage_high --enabled=false

删除警报:

# aodh alarm delete --name cpu_usage_high

3.10.7. 示例:监控实例的磁盘活动

以下示例演示了如何使用 Aodh 警报来监控特定项目中包含的所有实例的累积磁盘活动。

1.检查现有项目,然后选择您需要监控的项目的适当 UUID。本例使用 admin 项目:

$ openstack project list
+----------------------------------+----------+
| ID                               | Name     |
+----------------------------------+----------+
| 745d33000ac74d30a77539f8920555e7 | admin    |
| 983739bb834a42ddb48124a38def8538 | services |
| be9e767afd4c4b7ead1417c6dfedde2b | demo     |
+----------------------------------+----------+

2.使用项目的 UUID 创建一个警报,用于分析 admin 项目中实例生成的所有读取请求的 sum () (可以使用 --query 参数进行进一步的其余部分)。

# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name iops-monitor-read-requests --metric disk.read.requests.rate --threshold 42000 --aggregation-method sum --resource-type instance --query '{"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}}'
+---------------------------+-----------------------------------------------------------+
| Field                     | Value                                                     |
+---------------------------+-----------------------------------------------------------+
| aggregation_method        | sum                                                       |
| alarm_actions             | []                                                        |
| alarm_id                  | 192aba27-d823-4ede-a404-7f6b3cc12469                      |
| comparison_operator       | eq                                                        |
| description               | gnocchi_aggregation_by_resources_threshold alarm rule     |
| enabled                   | True                                                      |
| evaluation_periods        | 1                                                         |
| granularity               | 60                                                        |
| insufficient_data_actions | []                                                        |
| metric                    | disk.read.requests.rate                                   |
| name                      | iops-monitor-read-requests                                |
| ok_actions                | []                                                        |
| project_id                | 745d33000ac74d30a77539f8920555e7                          |
| query                     | {"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}} |
| repeat_actions            | False                                                     |
| resource_type             | instance                                                  |
| severity                  | low                                                       |
| state                     | insufficient data                                         |
| state_timestamp           | 2016-11-08T23:41:22.919000                                |
| threshold                 | 42000.0                                                   |
| time_constraints          | []                                                        |
| timestamp                 | 2016-11-08T23:41:22.919000                                |
| type                      | gnocchi_aggregation_by_resources_threshold                |
| user_id                   | 8c4aea738d774967b4ef388eb41fef5e                          |
+---------------------------+-----------------------------------------------------------+

3.10.8. 示例:监控 CPU 用量

如果要监控实例的性能,首先要检查 gnocchi 数据库以确定您可以监控哪些指标,如内存或 CPU 使用量。例如,针对实例运行 gnocchi 资源显示,以识别可以监控哪些指标:

  1. 查询特定实例 UUID 的可用指标:

    $ gnocchi resource show --type instance d71cdf9a-51dc-4bba-8170-9cd95edd3f66
    --------------------------------------------------------------------------------------------+
    | Field                 | Value                                                               |
    --------------------------------------------------------------------------------------------+
    | created_by_project_id | 44adccdc32614688ae765ed4e484f389                                    |
    | created_by_user_id    | c24fa60e46d14f8d847fca90531b43db                                    |
    | creator               | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389   |
    | display_name          | test-instance                                                       |
    | ended_at              | None                                                                |
    | flavor_id             | 14c7c918-df24-481c-b498-0d3ec57d2e51                                |
    | flavor_name           | m1.tiny                                                             |
    | host                  | overcloud-compute-0                                                 |
    | id                    | d71cdf9a-51dc-4bba-8170-9cd95edd3f66                                |
    | image_ref             | e75dff7b-3408-45c2-9a02-61fbfbf054d7                                |
    | metrics               | compute.instance.booting.time: c739a70d-2d1e-45c1-8c1b-4d28ff2403ac |
    |                       | cpu.delta: 700ceb7c-4cff-4d92-be2f-6526321548d6                     |
    |                       | cpu: 716d6128-1ea6-430d-aa9c-ceaff2a6bf32                           |
    |                       | cpu_l3_cache: 3410955e-c724-48a5-ab77-c3050b8cbe6e                  |
    |                       | cpu_util: b148c392-37d6-4c8f-8609-e15fc15a4728                      |
    |                       | disk.allocation: 9dd464a3-acf8-40fe-bd7e-3cb5fb12d7cc               |
    |                       | disk.capacity: c183d0da-e5eb-4223-a42e-855675dd1ec6                 |
    |                       | disk.ephemeral.size: 15d1d828-fbb4-4448-b0f2-2392dcfed5b6           |
    |                       | disk.iops: b8009e70-daee-403f-94ed-73853359a087                     |
    |                       | disk.latency: 1c648176-18a6-4198-ac7f-33ee628b82a9                  |
    |                       | disk.read.bytes.rate: eb35828f-312f-41ce-b0bc-cb6505e14ab7          |
    |                       | disk.read.bytes: de463be7-769b-433d-9f22-f3265e146ec8               |
    |                       | disk.read.requests.rate: 588ca440-bd73-4fa9-a00c-8af67262f4fd       |
    |                       | disk.read.requests: 53e5d599-6cad-47de-b814-5cb23e8aaf24            |
    |                       | disk.root.size: cee9d8b1-181e-4974-9427-aa7adb3b96d9                |
    |                       | disk.usage: 4d724c99-7947-4c6d-9816-abbbc166f6f3                    |
    |                       | disk.write.bytes.rate: 45b8da6e-0c89-4a6c-9cce-c95d49d9cc8b         |
    |                       | disk.write.bytes: c7734f1b-b43a-48ee-8fe4-8a31b641b565              |
    |                       | disk.write.requests.rate: 96ba2f22-8dd6-4b89-b313-1e0882c4d0d6      |
    |                       | disk.write.requests: 553b7254-be2d-481b-9d31-b04c93dbb168           |
    |                       | memory.bandwidth.local: 187f29d4-7c70-4ae2-86d1-191d11490aad        |
    |                       | memory.bandwidth.total: eb09a4fc-c202-4bc3-8c94-aa2076df7e39        |
    |                       | memory.resident: 97cfb849-2316-45a6-9545-21b1d48b0052               |
    |                       | memory.swap.in: f0378d8f-6927-4b76-8d34-a5931799a301                |
    |                       | memory.swap.out: c5fba193-1a1b-44c8-82e3-9fdc9ef21f69               |
    |                       | memory.usage: 7958d06d-7894-4ca1-8c7e-72ba572c1260                  |
    |                       | memory: a35c7eab-f714-4582-aa6f-48c92d4b79cd                        |
    |                       | perf.cache.misses: da69636d-d210-4b7b-bea5-18d4959e95c1             |
    |                       | perf.cache.references: e1955a37-d7e4-4b12-8a2a-51de4ec59efd         |
    |                       | perf.cpu.cycles: 5d325d44-b297-407a-b7db-cc9105549193               |
    |                       | perf.instructions: 973d6c6b-bbeb-4a13-96c2-390a63596bfc             |
    |                       | vcpus: 646b53d0-0168-4851-b297-05d96cc03ab2                         |
    | original_resource_id  | d71cdf9a-51dc-4bba-8170-9cd95edd3f66                                |
    | project_id            | 3cee262b907b4040b26b678d7180566b                                    |
    | revision_end          | None                                                                |
    | revision_start        | 2017-11-16T04:00:27.081865+00:00                                    |
    | server_group          | None                                                                |
    | started_at            | 2017-11-16T01:09:20.668344+00:00                                    |
    | type                  | instance                                                            |
    | user_id               | 1dbf5787b2ee46cf9fa6a1dfea9c9996                                    |
    --------------------------------------------------------------------------------------------+

    因此,metrics 值列出了您可以使用 Aodh 警报监控的组件,如 cpu_util

  2. 要监控 CPU 用量,您需要 cpu_util 指标。查看此指标的更多信息:

    $ gnocchi metric show --resource d71cdf9a-51dc-4bba-8170-9cd95edd3f66 cpu_util
    -------------------------------------------------------------------------------------------------------+
    | Field                              | Value                                                             |
    -------------------------------------------------------------------------------------------------------+
    | archive_policy/aggregation_methods | std, count, min, max, sum, mean                                   |
    | archive_policy/back_window         | 0                                                                 |
    | archive_policy/definition          | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00  |
    | archive_policy/name                | low                                                               |
    | created_by_project_id              | 44adccdc32614688ae765ed4e484f389                                  |
    | created_by_user_id                 | c24fa60e46d14f8d847fca90531b43db                                  |
    | creator                            | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 |
    | id                                 | b148c392-37d6-4c8f-8609-e15fc15a4728                              |
    | name                               | cpu_util                                                          |
    | resource/created_by_project_id     | 44adccdc32614688ae765ed4e484f389                                  |
    | resource/created_by_user_id        | c24fa60e46d14f8d847fca90531b43db                                  |
    | resource/creator                   | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 |
    | resource/ended_at                  | None                                                              |
    | resource/id                        | d71cdf9a-51dc-4bba-8170-9cd95edd3f66                              |
    | resource/original_resource_id      | d71cdf9a-51dc-4bba-8170-9cd95edd3f66                              |
    | resource/project_id                | 3cee262b907b4040b26b678d7180566b                                  |
    | resource/revision_end              | None                                                              |
    | resource/revision_start            | 2017-11-17T00:05:27.516421+00:00                                  |
    | resource/started_at                | 2017-11-16T01:09:20.668344+00:00                                  |
    | resource/type                      | instance                                                          |
    | resource/user_id                   | 1dbf5787b2ee46cf9fa6a1dfea9c9996                                  |
    | unit                               | None                                                              |
    -------------------------------------------------------------------------------------------------------+
    • archive_policy - 定义计算 std, count, min, max, sum, mean 值的聚合间隔。
  3. 使用 Aodh 创建查询 cpu_util 的监控任务。此任务将根据您指定的设置触发事件。例如,在延长持续时间内,当实例的 CPU 高峰超过 80% 时引发日志条目:

    aodh alarm create \
      --project-id 3cee262b907b4040b26b678d7180566b \
      --name high-cpu \
      --type gnocchi_resources_threshold \
      --description 'High CPU usage' \
      --metric cpu_util \
      --threshold 80.0 \
      --comparison-operator ge \
      --aggregation-method mean \
      --granularity 300 \
      --evaluation-periods 1 \
      --alarm-action 'log://' \
      --ok-action 'log://' \
      --resource-type instance \
      --resource-id d71cdf9a-51dc-4bba-8170-9cd95edd3f66
    +---------------------------+--------------------------------------+
    | Field                     | Value                                |
    +---------------------------+--------------------------------------+
    | aggregation_method        | mean                                 |
    | alarm_actions             | [u'log://']                          |
    | alarm_id                  | 1625015c-49b8-4e3f-9427-3c312a8615dd |
    | comparison_operator       | ge                                   |
    | description               | High CPU usage                       |
    | enabled                   | True                                 |
    | evaluation_periods        | 1                                    |
    | granularity               | 300                                  |
    | insufficient_data_actions | []                                   |
    | metric                    | cpu_util                             |
    | name                      | high-cpu                             |
    | ok_actions                | [u'log://']                          |
    | project_id                | 3cee262b907b4040b26b678d7180566b     |
    | repeat_actions            | False                                |
    | resource_id               | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
    | resource_type             | instance                             |
    | severity                  | low                                  |
    | state                     | insufficient data                    |
    | state_reason              | Not evaluated yet                    |
    | state_timestamp           | 2017-11-16T05:20:48.891365           |
    | threshold                 | 80.0                                 |
    | time_constraints          | []                                   |
    | timestamp                 | 2017-11-16T05:20:48.891365           |
    | type                      | gnocchi_resources_threshold          |
    | user_id                   | 1dbf5787b2ee46cf9fa6a1dfea9c9996     |
    +---------------------------+--------------------------------------+
    • compare-operator - ge 运算符定义,如果 CPU 使用率大于(或等于) 80%,警报将触发。
    • granularity - 指标关联了一个归档策略,策略可以具有各种粒度(例如,1 小时有 5 分钟的,一个月有 1 小时的聚合)。granularity 值必须与归档策略中描述的持续时间匹配。
    • evaluation-periods - 在警报触发前需要传递 的粒度 周期数。例如,将此值设置为 2 意味着,在警报触发前,CPU 使用率需要超过 80%。
    • [U'log://'] - 这个值会将事件记录到 Aodh 日志文件。

      注意

      您可以定义在警报触发时运行的不同操作(alarm_actions),以及返回正常状态(ok_actions),如 Webhook URL。

  4. 要检查您的警报是否已触发,请查询警报的历史记录:

    aodh alarm-history show 1625015c-49b8-4e3f-9427-3c312a8615dd --fit-width
    +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
    | timestamp                  | type             | detail                                                                                                                                            | event_id                             |
    +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
    | 2017-11-16T05:21:47.850094 | state transition | {"transition_reason": "Transition to ok due to 1 samples inside threshold, most recent: 0.0366665763", "state": "ok"}                             | 3b51f09d-ded1-4807-b6bb-65fdc87669e4 |
    +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+

3.10.9. 管理资源类型

以前硬编码的 Telemetry 资源类型现在可以由 gnocchi 客户端管理。您可以使用 gnocchi 客户端创建、查看和删除资源类型,您可以使用 gnocchi API 更新或删除属性。

1.创建新的 resource-type

$ gnocchi resource-type create testResource01 -a bla:string:True:min_length=123
+----------------+------------------------------------------------------------+
| Field          | Value                                                      |
+----------------+------------------------------------------------------------+
| attributes/bla | max_length=255, min_length=123, required=True, type=string |
| name           | testResource01                                             |
| state          | active                                                     |
+----------------+------------------------------------------------------------+

2.查看 resource-type 的配置:

$ gnocchi resource-type show testResource01
+----------------+------------------------------------------------------------+
| Field          | Value                                                      |
+----------------+------------------------------------------------------------+
| attributes/bla | max_length=255, min_length=123, required=True, type=string |
| name           | testResource01                                             |
| state          | active                                                     |
+----------------+------------------------------------------------------------+

3.删除 resource-type

$ gnocchi resource-type delete testResource01
注意

如果资源正在使用,则无法删除资源类型。

Red Hat logoGithubRedditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。 了解我们当前的更新.

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

© 2024 Red Hat, Inc.