3.10. 使用 Telemetry 服务进行容量 metering
OpenStack Telemetry 服务提供可用于计费、收费和回放用途的使用指标。第三方应用也可以使用此类指标数据来规划集群上的容量,也可以使用 OpenStack Heat 自动扩展虚拟实例。如需更多信息,请参阅 实例的自动扩展。
您可以使用 Ceilometer 和 Gnocchi 的组合来监控和警报。这在小型集群中支持,以及已知的限制。对于实时监控,Red Hat OpenStack Platform 附带了提供指标数据的代理,并可以被独立的监控基础架构和应用程序使用。如需更多信息,请参阅监控工具配置。
3.10.1. 查看测量结果
列出特定资源的所有测量结果:
# openstack metric measures show --resource-id UUID METER_NAME
仅列出特定资源的测量结果,位于时间戳范围内:
# openstack metric measures show --aggregation mean --start <START_TIME> --stop <STOP_TIME> --resource-id UUID METER_NAME
时间戳变量 <START_TIME> 和 <STOP_TIME> 使用格式 iso-dateThh:mm:ss。
3.10.2. 创建新测量结果
您可以使用测量结果将数据发送到 Telemetry 服务,它们不需要与之前定义的量表对应。例如:
# openstack metrics measures add -m 2015-01-12T17:56:23@42 --resource-id UUID METER_NAME
3.10.3. 示例:查看云使用测量结果
本例显示每个项目的所有实例的平均内存用量。
# openstack metric measures aggregation --resource-type instance --groupby project_id -m memory
3.10.4. 查看现有警报
要列出现有的 Telemetry 警报,请使用 aodh
命令。例如:
# aodh alarm list +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+ | alarm_id | type | name | state | severity | enabled | +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+ | 922f899c-27c8-4c7d-a2cf-107be51ca90a | gnocchi_aggregation_by_resources_threshold | iops-monitor-read-requests | insufficient data | low | True | +--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
若要列出分配给资源的计量,请指定资源的 UUID (实例、镜像或卷等)。例如:
# gnocchi resource show 5e3fcbe2-7aab-475d-b42c-a440aa42e5ad
3.10.5. 创建警报
您可以使用 aodh
创建在达到阈值时激活的警报。在本例中,警报激活,并在单个实例的平均 CPU 使用率超过 80% 时添加一个日志条目。查询用于隔离特定实例的 id (94619081-abf5-4f1f-81c7-9cedaa872403
)用于监控目的:
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name cpu_usage_high --metric cpu_util --threshold 80 --aggregation-method sum --resource-type instance --query '{"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}}' --alarm-action 'log://' +---------------------------+-------------------------------------------------------+ | Field | Value | +---------------------------+-------------------------------------------------------+ | aggregation_method | sum | | alarm_actions | [u'log://'] | | alarm_id | b794adc7-ed4f-4edb-ace4-88cbe4674a94 | | comparison_operator | eq | | description | gnocchi_aggregation_by_resources_threshold alarm rule | | enabled | True | | evaluation_periods | 1 | | granularity | 60 | | insufficient_data_actions | [] | | metric | cpu_util | | name | cpu_usage_high | | ok_actions | [] | | project_id | 13c52c41e0e543d9841a3e761f981c20 | | query | {"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}} | | repeat_actions | False | | resource_type | instance | | severity | low | | state | insufficient data | | state_timestamp | 2016-12-09T05:18:53.326000 | | threshold | 80.0 | | time_constraints | [] | | timestamp | 2016-12-09T05:18:53.326000 | | type | gnocchi_aggregation_by_resources_threshold | | user_id | 32d3f2c9a234423cb52fb69d3741dbbc | +---------------------------+-------------------------------------------------------+
要编辑现有的阈值警报,请使用 aodh alarm update
命令。例如,将警报阈值增加到 75%:
# aodh alarm update --name cpu_usage_high --threshold 75
3.10.6. 禁用或删除警报
禁用警报:
# aodh alarm update --name cpu_usage_high --enabled=false
删除警报:
# aodh alarm delete --name cpu_usage_high
3.10.7. 示例:监控实例的磁盘活动
以下示例演示了如何使用 Aodh 警报来监控特定项目中包含的所有实例的累积磁盘活动。
1.检查现有项目,然后选择您需要监控的项目的适当 UUID。本例使用 admin
项目:
$ openstack project list +----------------------------------+----------+ | ID | Name | +----------------------------------+----------+ | 745d33000ac74d30a77539f8920555e7 | admin | | 983739bb834a42ddb48124a38def8538 | services | | be9e767afd4c4b7ead1417c6dfedde2b | demo | +----------------------------------+----------+
2.使用项目的 UUID 创建一个警报,用于分析 admin
项目中实例生成的所有读取请求的 sum ()
(可以使用 --query
参数进行进一步的其余部分)。
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name iops-monitor-read-requests --metric disk.read.requests.rate --threshold 42000 --aggregation-method sum --resource-type instance --query '{"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}}' +---------------------------+-----------------------------------------------------------+ | Field | Value | +---------------------------+-----------------------------------------------------------+ | aggregation_method | sum | | alarm_actions | [] | | alarm_id | 192aba27-d823-4ede-a404-7f6b3cc12469 | | comparison_operator | eq | | description | gnocchi_aggregation_by_resources_threshold alarm rule | | enabled | True | | evaluation_periods | 1 | | granularity | 60 | | insufficient_data_actions | [] | | metric | disk.read.requests.rate | | name | iops-monitor-read-requests | | ok_actions | [] | | project_id | 745d33000ac74d30a77539f8920555e7 | | query | {"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}} | | repeat_actions | False | | resource_type | instance | | severity | low | | state | insufficient data | | state_timestamp | 2016-11-08T23:41:22.919000 | | threshold | 42000.0 | | time_constraints | [] | | timestamp | 2016-11-08T23:41:22.919000 | | type | gnocchi_aggregation_by_resources_threshold | | user_id | 8c4aea738d774967b4ef388eb41fef5e | +---------------------------+-----------------------------------------------------------+
3.10.8. 示例:监控 CPU 用量
如果要监控实例的性能,首先要检查 gnocchi 数据库以确定您可以监控哪些指标,如内存或 CPU 使用量。例如,针对实例运行 gnocchi 资源显示
,以识别可以监控哪些指标:
查询特定实例 UUID 的可用指标:
$ gnocchi resource show --type instance d71cdf9a-51dc-4bba-8170-9cd95edd3f66
-----------------------
---------------------------------------------------------------------+ | Field | Value |-----------------------
---------------------------------------------------------------------+ | created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | display_name | test-instance | | ended_at | None | | flavor_id | 14c7c918-df24-481c-b498-0d3ec57d2e51 | | flavor_name | m1.tiny | | host | overcloud-compute-0 | | id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | image_ref | e75dff7b-3408-45c2-9a02-61fbfbf054d7 | | metrics | compute.instance.booting.time: c739a70d-2d1e-45c1-8c1b-4d28ff2403ac | | | cpu.delta: 700ceb7c-4cff-4d92-be2f-6526321548d6 | | | cpu: 716d6128-1ea6-430d-aa9c-ceaff2a6bf32 | | | cpu_l3_cache: 3410955e-c724-48a5-ab77-c3050b8cbe6e | | | cpu_util: b148c392-37d6-4c8f-8609-e15fc15a4728 | | | disk.allocation: 9dd464a3-acf8-40fe-bd7e-3cb5fb12d7cc | | | disk.capacity: c183d0da-e5eb-4223-a42e-855675dd1ec6 | | | disk.ephemeral.size: 15d1d828-fbb4-4448-b0f2-2392dcfed5b6 | | | disk.iops: b8009e70-daee-403f-94ed-73853359a087 | | | disk.latency: 1c648176-18a6-4198-ac7f-33ee628b82a9 | | | disk.read.bytes.rate: eb35828f-312f-41ce-b0bc-cb6505e14ab7 | | | disk.read.bytes: de463be7-769b-433d-9f22-f3265e146ec8 | | | disk.read.requests.rate: 588ca440-bd73-4fa9-a00c-8af67262f4fd | | | disk.read.requests: 53e5d599-6cad-47de-b814-5cb23e8aaf24 | | | disk.root.size: cee9d8b1-181e-4974-9427-aa7adb3b96d9 | | | disk.usage: 4d724c99-7947-4c6d-9816-abbbc166f6f3 | | | disk.write.bytes.rate: 45b8da6e-0c89-4a6c-9cce-c95d49d9cc8b | | | disk.write.bytes: c7734f1b-b43a-48ee-8fe4-8a31b641b565 | | | disk.write.requests.rate: 96ba2f22-8dd6-4b89-b313-1e0882c4d0d6 | | | disk.write.requests: 553b7254-be2d-481b-9d31-b04c93dbb168 | | | memory.bandwidth.local: 187f29d4-7c70-4ae2-86d1-191d11490aad | | | memory.bandwidth.total: eb09a4fc-c202-4bc3-8c94-aa2076df7e39 | | | memory.resident: 97cfb849-2316-45a6-9545-21b1d48b0052 | | | memory.swap.in: f0378d8f-6927-4b76-8d34-a5931799a301 | | | memory.swap.out: c5fba193-1a1b-44c8-82e3-9fdc9ef21f69 | | | memory.usage: 7958d06d-7894-4ca1-8c7e-72ba572c1260 | | | memory: a35c7eab-f714-4582-aa6f-48c92d4b79cd | | | perf.cache.misses: da69636d-d210-4b7b-bea5-18d4959e95c1 | | | perf.cache.references: e1955a37-d7e4-4b12-8a2a-51de4ec59efd | | | perf.cpu.cycles: 5d325d44-b297-407a-b7db-cc9105549193 | | | perf.instructions: 973d6c6b-bbeb-4a13-96c2-390a63596bfc | | | vcpus: 646b53d0-0168-4851-b297-05d96cc03ab2 | | original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | project_id | 3cee262b907b4040b26b678d7180566b | | revision_end | None | | revision_start | 2017-11-16T04:00:27.081865+00:00 | | server_group | None | | started_at | 2017-11-16T01:09:20.668344+00:00 | | type | instance | | user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 |-----------------------
---------------------------------------------------------------------+因此,
metrics
值列出了您可以使用 Aodh 警报监控的组件,如cpu_util
。要监控 CPU 用量,您需要
cpu_util
指标。查看此指标的更多信息:$ gnocchi metric show --resource d71cdf9a-51dc-4bba-8170-9cd95edd3f66 cpu_util
------------------------------------
-------------------------------------------------------------------+ | Field | Value |------------------------------------
-------------------------------------------------------------------+ | archive_policy/aggregation_methods | std, count, min, max, sum, mean | | archive_policy/back_window | 0 | | archive_policy/definition | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00 | | archive_policy/name | low | | created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | id | b148c392-37d6-4c8f-8609-e15fc15a4728 | | name | cpu_util | | resource/created_by_project_id | 44adccdc32614688ae765ed4e484f389 | | resource/created_by_user_id | c24fa60e46d14f8d847fca90531b43db | | resource/creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 | | resource/ended_at | None | | resource/id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource/original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource/project_id | 3cee262b907b4040b26b678d7180566b | | resource/revision_end | None | | resource/revision_start | 2017-11-17T00:05:27.516421+00:00 | | resource/started_at | 2017-11-16T01:09:20.668344+00:00 | | resource/type | instance | | resource/user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 | | unit | None |------------------------------------
-------------------------------------------------------------------+-
archive_policy
- 定义计算std, count, min, max, sum, mean
值的聚合间隔。
-
使用 Aodh 创建查询
cpu_util
的监控任务。此任务将根据您指定的设置触发事件。例如,在延长持续时间内,当实例的 CPU 高峰超过 80% 时引发日志条目:aodh alarm create \ --project-id 3cee262b907b4040b26b678d7180566b \ --name high-cpu \ --type gnocchi_resources_threshold \ --description 'High CPU usage' \ --metric cpu_util \ --threshold 80.0 \ --comparison-operator ge \ --aggregation-method mean \ --granularity 300 \ --evaluation-periods 1 \ --alarm-action 'log://' \ --ok-action 'log://' \ --resource-type instance \ --resource-id d71cdf9a-51dc-4bba-8170-9cd95edd3f66 +---------------------------+--------------------------------------+ | Field | Value | +---------------------------+--------------------------------------+ | aggregation_method | mean | | alarm_actions | [u'log://'] | | alarm_id | 1625015c-49b8-4e3f-9427-3c312a8615dd | | comparison_operator | ge | | description | High CPU usage | | enabled | True | | evaluation_periods | 1 | | granularity | 300 | | insufficient_data_actions | [] | | metric | cpu_util | | name | high-cpu | | ok_actions | [u'log://'] | | project_id | 3cee262b907b4040b26b678d7180566b | | repeat_actions | False | | resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 | | resource_type | instance | | severity | low | | state | insufficient data | | state_reason | Not evaluated yet | | state_timestamp | 2017-11-16T05:20:48.891365 | | threshold | 80.0 | | time_constraints | [] | | timestamp | 2017-11-16T05:20:48.891365 | | type | gnocchi_resources_threshold | | user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 | +---------------------------+--------------------------------------+
-
compare-operator
-ge
运算符定义,如果 CPU 使用率大于(或等于) 80%,警报将触发。 -
granularity
- 指标关联了一个归档策略,策略可以具有各种粒度(例如,1 小时有 5 分钟的,一个月有 1 小时的聚合)。granularity
值必须与归档策略中描述的持续时间匹配。 -
evaluation-periods
- 在警报触发前需要传递的粒度
周期数。例如,将此值设置为2
意味着,在警报触发前,CPU 使用率需要超过 80%。 [U'log://']
- 这个值会将事件记录到 Aodh 日志文件。注意您可以定义在警报触发时运行的不同操作(
alarm_actions
),以及返回正常状态(ok_actions
),如 Webhook URL。
-
要检查您的警报是否已触发,请查询警报的历史记录:
aodh alarm-history show 1625015c-49b8-4e3f-9427-3c312a8615dd --fit-width +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ | timestamp | type | detail | event_id | +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ | 2017-11-16T05:21:47.850094 | state transition | {"transition_reason": "Transition to ok due to 1 samples inside threshold, most recent: 0.0366665763", "state": "ok"} | 3b51f09d-ded1-4807-b6bb-65fdc87669e4 | +----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
3.10.9. 管理资源类型
以前硬编码的 Telemetry 资源类型现在可以由 gnocchi 客户端管理。您可以使用 gnocchi 客户端创建、查看和删除资源类型,您可以使用 gnocchi API 更新或删除属性。
1.创建新的 resource-type :
$ gnocchi resource-type create testResource01 -a bla:string:True:min_length=123 +----------------+------------------------------------------------------------+ | Field | Value | +----------------+------------------------------------------------------------+ | attributes/bla | max_length=255, min_length=123, required=True, type=string | | name | testResource01 | | state | active | +----------------+------------------------------------------------------------+
2.查看 resource-type 的配置:
$ gnocchi resource-type show testResource01 +----------------+------------------------------------------------------------+ | Field | Value | +----------------+------------------------------------------------------------+ | attributes/bla | max_length=255, min_length=123, required=True, type=string | | name | testResource01 | | state | active | +----------------+------------------------------------------------------------+
3.删除 resource-type :
$ gnocchi resource-type delete testResource01
如果资源正在使用,则无法删除资源类型。