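3.10. Capacity metering using the Telemetry service
The OpenStack Telemetry service provides usage metrics that you can use for billing, charge-back, and show-back purposes. Third-party applications can also use this metrics data to plan capacity on the cluster, and OpenStack Heat can use it to auto-scale virtual instances. For more information, see Auto Scaling for Instances.
You can use a combination of Ceilometer and Gnocchi for monitoring and alarms. This is supported on small-size clusters and with known limitations. For real-time monitoring, Red Hat OpenStack Platform ships with agents that provide metrics data, which can be consumed by separate monitoring infrastructure and applications. For more information, see Monitoring Tools Configuration.
3.10.1. Viewing measures
List all of the measures for a particular resource: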
# openstack metric measures show --resource-id UUID METER_NAME
List only the measures for a particular resource that fall within a range of timestamps:
# openstack metric measures show --aggregation mean --start <START_TIME> --stop <STOP_TIME> --resource-id UUID METER_NAME
The timestamp variables <START_TIME> and <STOP_TIME> use the format iso-dateThh:mm:ss.
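The timestamp format can be produced from the shell. The following is a minimal sketch, not part of the product tooling: the helper name iso_ts is hypothetical, it assumes GNU date (which accepts -d @EPOCH), and it assumes that you want the window expressed in UTC.

```shell
# Hypothetical helper: format an epoch timestamp (in seconds) as
# iso-dateThh:mm:ss in UTC. Assumes GNU date, which accepts -d @EPOCH.
iso_ts() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%S
}

# Example: a one-hour window that ends now.
now=$(date -u +%s)
STOP_TIME=$(iso_ts "$now")
START_TIME=$(iso_ts "$((now - 3600))")
echo "--start $START_TIME --stop $STOP_TIME"
```

The two variables can then stand in for <START_TIME> and <STOP_TIME> in the measures command.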
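3.10.2. Creating new measures
You can use measures to send data to the Telemetry service. They do not need to correspond to a previously defined meter. Each measure is written as a timestamp and a value separated by @. For example:
# openstack metric measures add -m 2015-01-12T17:56:23@42 --resource-id UUID METER_NAME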
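When the timestamp and value come from a script, the -m argument can be assembled first. This is a small sketch only; the helper name measure_arg is hypothetical and is not part of the OpenStack client.

```shell
# Hypothetical helper: join a timestamp and a value into the
# TIMESTAMP@VALUE form used with -m in the example above.
measure_arg() {
  printf '%s@%s' "$1" "$2"
}

# Reproduces the measure from the example command.
measure_arg 2015-01-12T17:56:23 42; echo
```

Pass the result to -m, for example -m "$(measure_arg "$TIMESTAMP" "$VALUE")", where TIMESTAMP and VALUE are variables that you set yourself.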
3.10.3. Example: Viewing cloud usage measures
This example shows the average memory usage of all instances for each project.
# openstack metric measures aggregation --resource-type instance --groupby project_id -m memory
3.10.4. Viewing existing alarms
To list the existing Telemetry alarms, use the aodh command. For example:
# aodh alarm list
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
| alarm_id | type | name | state | severity | enabled |
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
| 922f899c-27c8-4c7d-a2cf-107be51ca90a | gnocchi_aggregation_by_resources_threshold | iops-monitor-read-requests | insufficient data | low | True |
+--------------------------------------+--------------------------------------------+----------------------------+-------------------+----------+---------+
To list the meters assigned to a resource, specify the UUID of the resource (an instance, image, or volume, among others). For example:
# gnocchi resource show 5e3fcbe2-7aab-475d-b42c-a440aa42e5ad
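3.10.5. Creating an alarm
You can use aodh to create an alarm that activates when a threshold value is reached. In this example, the alarm activates and adds a log entry when the average CPU utilization for an individual instance exceeds 80%. A query isolates the id of the specific instance (94619081-abf5-4f1f-81c7-9cedaa872403) to monitor: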
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name cpu_usage_high --metric cpu_util --threshold 80 --aggregation-method sum --resource-type instance --query '{"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}}' --alarm-action 'log://'
+---------------------------+-------------------------------------------------------+
| Field | Value |
+---------------------------+-------------------------------------------------------+
| aggregation_method | sum |
| alarm_actions | [u'log://'] |
| alarm_id | b794adc7-ed4f-4edb-ace4-88cbe4674a94 |
| comparison_operator | eq |
| description | gnocchi_aggregation_by_resources_threshold alarm rule |
| enabled | True |
| evaluation_periods | 1 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metric | cpu_util |
| name | cpu_usage_high |
| ok_actions | [] |
| project_id | 13c52c41e0e543d9841a3e761f981c20 |
| query | {"=": {"id": "94619081-abf5-4f1f-81c7-9cedaa872403"}} |
| repeat_actions | False |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_timestamp | 2016-12-09T05:18:53.326000 |
| threshold | 80.0 |
| time_constraints | [] |
| timestamp | 2016-12-09T05:18:53.326000 |
| type | gnocchi_aggregation_by_resources_threshold |
| user_id | 32d3f2c9a234423cb52fb69d3741dbbc |
+---------------------------+-------------------------------------------------------+
To edit an existing threshold alarm, use the aodh alarm update command. For example, to change the alarm threshold to 75%:
# aodh alarm update --name cpu_usage_high --threshold 75
3.10.6. Disabling or deleting an alarm
To disable an alarm:
# aodh alarm update --name cpu_usage_high --enabled=false
To delete an alarm:
# aodh alarm delete --name cpu_usage_high
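3.10.7. Example: Monitoring the disk activity of instances
The following example demonstrates how to use an Aodh alarm to monitor the cumulative disk activity of all the instances contained within a particular project.
1. Review the existing projects, and select the UUID of the project you need to monitor. This example uses the admin project: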
$ openstack project list
+----------------------------------+----------+
| ID | Name |
+----------------------------------+----------+
| 745d33000ac74d30a77539f8920555e7 | admin |
| 983739bb834a42ddb48124a38def8538 | services |
| be9e767afd4c4b7ead1417c6dfedde2b | demo |
+----------------------------------+----------+
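2. Use the project's UUID to create an alarm that analyses the sum() of all read requests generated by the instances in the admin project (the query can be further restricted with the --query parameter).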
# aodh alarm create --type gnocchi_aggregation_by_resources_threshold --name iops-monitor-read-requests --metric disk.read.requests.rate --threshold 42000 --aggregation-method sum --resource-type instance --query '{"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}}'
+---------------------------+-----------------------------------------------------------+
| Field | Value |
+---------------------------+-----------------------------------------------------------+
| aggregation_method | sum |
| alarm_actions | [] |
| alarm_id | 192aba27-d823-4ede-a404-7f6b3cc12469 |
| comparison_operator | eq |
| description | gnocchi_aggregation_by_resources_threshold alarm rule |
| enabled | True |
| evaluation_periods | 1 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metric | disk.read.requests.rate |
| name | iops-monitor-read-requests |
| ok_actions | [] |
| project_id | 745d33000ac74d30a77539f8920555e7 |
| query | {"=": {"project_id": "745d33000ac74d30a77539f8920555e7"}} |
| repeat_actions | False |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_timestamp | 2016-11-08T23:41:22.919000 |
| threshold | 42000.0 |
| time_constraints | [] |
| timestamp | 2016-11-08T23:41:22.919000 |
| type | gnocchi_aggregation_by_resources_threshold |
| user_id | 8c4aea738d774967b4ef388eb41fef5e |
+---------------------------+-----------------------------------------------------------+
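The --query value in the command above is a JSON filter on project_id. When the project UUID is held in a shell variable, the filter can be generated instead of typed by hand. This is a minimal sketch; the helper name project_query and the PROJECT_ID variable are hypothetical and are not part of aodh.

```shell
# Hypothetical helper: build the JSON filter passed to --query
# for a given project UUID.
project_query() {
  printf '{"=": {"project_id": "%s"}}' "$1"
}

# UUID of the admin project from the project list in this example.
PROJECT_ID=745d33000ac74d30a77539f8920555e7
project_query "$PROJECT_ID"; echo
```

The output can replace the literal JSON in the alarm command, for example --query "$(project_query "$PROJECT_ID")".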
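3.10.8. Example: Monitoring CPU usage
To monitor the performance of an instance, start by examining the gnocchi database to identify which metrics you can monitor, such as memory or CPU usage. For example, run gnocchi resource show against an instance to identify which metrics can be monitored.
Query the available metrics for a particular instance UUID: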
$ gnocchi resource show --type instance d71cdf9a-51dc-4bba-8170-9cd95edd3f66
+-----------------------+---------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------+
| created_by_project_id | 44adccdc32614688ae765ed4e484f389 |
| created_by_user_id | c24fa60e46d14f8d847fca90531b43db |
| creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 |
| display_name | test-instance |
| ended_at | None |
| flavor_id | 14c7c918-df24-481c-b498-0d3ec57d2e51 |
| flavor_name | m1.tiny |
| host | overcloud-compute-0 |
| id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
| image_ref | e75dff7b-3408-45c2-9a02-61fbfbf054d7 |
| metrics | compute.instance.booting.time: c739a70d-2d1e-45c1-8c1b-4d28ff2403ac |
| | cpu.delta: 700ceb7c-4cff-4d92-be2f-6526321548d6 |
| | cpu: 716d6128-1ea6-430d-aa9c-ceaff2a6bf32 |
| | cpu_l3_cache: 3410955e-c724-48a5-ab77-c3050b8cbe6e |
| | cpu_util: b148c392-37d6-4c8f-8609-e15fc15a4728 |
| | disk.allocation: 9dd464a3-acf8-40fe-bd7e-3cb5fb12d7cc |
| | disk.capacity: c183d0da-e5eb-4223-a42e-855675dd1ec6 |
| | disk.ephemeral.size: 15d1d828-fbb4-4448-b0f2-2392dcfed5b6 |
| | disk.iops: b8009e70-daee-403f-94ed-73853359a087 |
| | disk.latency: 1c648176-18a6-4198-ac7f-33ee628b82a9 |
| | disk.read.bytes.rate: eb35828f-312f-41ce-b0bc-cb6505e14ab7 |
| | disk.read.bytes: de463be7-769b-433d-9f22-f3265e146ec8 |
| | disk.read.requests.rate: 588ca440-bd73-4fa9-a00c-8af67262f4fd |
| | disk.read.requests: 53e5d599-6cad-47de-b814-5cb23e8aaf24 |
| | disk.root.size: cee9d8b1-181e-4974-9427-aa7adb3b96d9 |
| | disk.usage: 4d724c99-7947-4c6d-9816-abbbc166f6f3 |
| | disk.write.bytes.rate: 45b8da6e-0c89-4a6c-9cce-c95d49d9cc8b |
| | disk.write.bytes: c7734f1b-b43a-48ee-8fe4-8a31b641b565 |
| | disk.write.requests.rate: 96ba2f22-8dd6-4b89-b313-1e0882c4d0d6 |
| | disk.write.requests: 553b7254-be2d-481b-9d31-b04c93dbb168 |
| | memory.bandwidth.local: 187f29d4-7c70-4ae2-86d1-191d11490aad |
| | memory.bandwidth.total: eb09a4fc-c202-4bc3-8c94-aa2076df7e39 |
| | memory.resident: 97cfb849-2316-45a6-9545-21b1d48b0052 |
| | memory.swap.in: f0378d8f-6927-4b76-8d34-a5931799a301 |
| | memory.swap.out: c5fba193-1a1b-44c8-82e3-9fdc9ef21f69 |
| | memory.usage: 7958d06d-7894-4ca1-8c7e-72ba572c1260 |
| | memory: a35c7eab-f714-4582-aa6f-48c92d4b79cd |
| | perf.cache.misses: da69636d-d210-4b7b-bea5-18d4959e95c1 |
| | perf.cache.references: e1955a37-d7e4-4b12-8a2a-51de4ec59efd |
| | perf.cpu.cycles: 5d325d44-b297-407a-b7db-cc9105549193 |
| | perf.instructions: 973d6c6b-bbeb-4a13-96c2-390a63596bfc |
| | vcpus: 646b53d0-0168-4851-b297-05d96cc03ab2 |
| original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
| project_id | 3cee262b907b4040b26b678d7180566b |
| revision_end | None |
| revision_start | 2017-11-16T04:00:27.081865+00:00 |
| server_group | None |
| started_at | 2017-11-16T01:09:20.668344+00:00 |
| type | instance |
| user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 |
+-----------------------+---------------------------------------------------------------------+
In this result, the metrics value lists the components that you can monitor using Aodh alarms, for example cpu_util. To monitor CPU usage, you need the cpu_util metric. To see more information on this metric:
$ gnocchi metric show --resource d71cdf9a-51dc-4bba-8170-9cd95edd3f66 cpu_util
+------------------------------------+-------------------------------------------------------------------+
| Field | Value |
+------------------------------------+-------------------------------------------------------------------+
| archive_policy/aggregation_methods | std, count, min, max, sum, mean |
| archive_policy/back_window | 0 |
| archive_policy/definition | - points: 8640, granularity: 0:05:00, timespan: 30 days, 0:00:00 |
| archive_policy/name | low |
| created_by_project_id | 44adccdc32614688ae765ed4e484f389 |
| created_by_user_id | c24fa60e46d14f8d847fca90531b43db |
| creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 |
| id | b148c392-37d6-4c8f-8609-e15fc15a4728 |
| name | cpu_util |
| resource/created_by_project_id | 44adccdc32614688ae765ed4e484f389 |
| resource/created_by_user_id | c24fa60e46d14f8d847fca90531b43db |
| resource/creator | c24fa60e46d14f8d847fca90531b43db:44adccdc32614688ae765ed4e484f389 |
| resource/ended_at | None |
| resource/id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
| resource/original_resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
| resource/project_id | 3cee262b907b4040b26b678d7180566b |
| resource/revision_end | None |
| resource/revision_start | 2017-11-17T00:05:27.516421+00:00 |
| resource/started_at | 2017-11-16T01:09:20.668344+00:00 |
| resource/type | instance |
| resource/user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 |
| unit | None |
+------------------------------------+-------------------------------------------------------------------+
archive_policy - Defines the aggregation interval for calculating the std, count, min, max, sum, and mean values.
Use Aodh to create a monitoring task that queries cpu_util. This task triggers events based on the settings that you specify. For example, to raise a log entry when an instance's CPU usage stays above 80% for an extended duration:
$ aodh alarm create \
  --project-id 3cee262b907b4040b26b678d7180566b \
  --name high-cpu \
  --type gnocchi_resources_threshold \
  --description 'High CPU usage' \
  --metric cpu_util \
  --threshold 80.0 \
  --comparison-operator ge \
  --aggregation-method mean \
  --granularity 300 \
  --evaluation-periods 1 \
  --alarm-action 'log://' \
  --ok-action 'log://' \
  --resource-type instance \
  --resource-id d71cdf9a-51dc-4bba-8170-9cd95edd3f66
+---------------------------+--------------------------------------+
| Field | Value |
+---------------------------+--------------------------------------+
| aggregation_method | mean |
| alarm_actions | [u'log://'] |
| alarm_id | 1625015c-49b8-4e3f-9427-3c312a8615dd |
| comparison_operator | ge |
| description | High CPU usage |
| enabled | True |
| evaluation_periods | 1 |
| granularity | 300 |
| insufficient_data_actions | [] |
| metric | cpu_util |
| name | high-cpu |
| ok_actions | [u'log://'] |
| project_id | 3cee262b907b4040b26b678d7180566b |
| repeat_actions | False |
| resource_id | d71cdf9a-51dc-4bba-8170-9cd95edd3f66 |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_reason | Not evaluated yet |
| state_timestamp | 2017-11-16T05:20:48.891365 |
| threshold | 80.0 |
| time_constraints | [] |
| timestamp | 2017-11-16T05:20:48.891365 |
| type | gnocchi_resources_threshold |
| user_id | 1dbf5787b2ee46cf9fa6a1dfea9c9996 |
+---------------------------+--------------------------------------+
comparison-operator - The ge operator defines that the alarm triggers when the CPU usage is greater than or equal to 80%.
granularity - Metrics have an archive policy associated with them, and the policy can have various granularities (for example, 5-minute aggregation kept for 1 hour plus 1-hour aggregation kept for a month). The granularity value must match a duration described in the archive policy. In this example, 300 seconds matches the 0:05:00 granularity of the low archive policy reported by gnocchi metric show.
evaluation-periods - The number of granularity periods that must pass before the alarm triggers. For example, setting this value to 2 means that the CPU usage must be over 80% for two polling periods before the alarm triggers.
[u'log://'] - This value logs events to your Aodh log file.
Note: You can define different actions to run when an alarm is triggered (alarm_actions) and when it returns to a normal state (ok_actions), such as a webhook URL.
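Because the archive policy reports its granularity as h:mm:ss while --granularity takes seconds, a small conversion can help avoid a mismatch. The following is a sketch under stated assumptions: the helper name hms_to_seconds is hypothetical, and it assumes the granularity is printed in the h:mm:ss form shown in the sample gnocchi metric show output.

```shell
# Hypothetical helper: convert an h:mm:ss duration, as printed in
# archive_policy/definition, to the seconds value used by --granularity.
hms_to_seconds() {
  h=${1%%:*}; rest=${1#*:}
  m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as 08 are not read as octal.
  h=${h#0}; m=${m#0}; s=${s#0}
  echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Granularity of the low archive policy in this example.
hms_to_seconds 0:05:00
```

For the 0:05:00 granularity in this example, the helper yields the 300 that is passed to --granularity.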
To check whether your alarm has been triggered, query the alarm's history:
$ aodh alarm-history show 1625015c-49b8-4e3f-9427-3c312a8615dd --fit-width
+----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
| timestamp | type | detail | event_id |
+----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
| 2017-11-16T05:21:47.850094 | state transition | {"transition_reason": "Transition to ok due to 1 samples inside threshold, most recent: 0.0366665763", "state": "ok"} | 3b51f09d-ded1-4807-b6bb-65fdc87669e4 |
+----------------------------+------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+
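3.10.9. Managing resource types
Telemetry resource types that were previously hardcoded can now be managed by the gnocchi client. You can use the gnocchi client to create, view, and delete resource types, and you can use the gnocchi API to update or delete attributes.
1. Create a new resource-type: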
$ gnocchi resource-type create testResource01 -a bla:string:True:min_length=123
+----------------+------------------------------------------------------------+
| Field | Value |
+----------------+------------------------------------------------------------+
| attributes/bla | max_length=255, min_length=123, required=True, type=string |
| name | testResource01 |
| state | active |
+----------------+------------------------------------------------------------+
2. Review the configuration of the resource-type:
$ gnocchi resource-type show testResource01
+----------------+------------------------------------------------------------+
| Field | Value |
+----------------+------------------------------------------------------------+
| attributes/bla | max_length=255, min_length=123, required=True, type=string |
| name | testResource01 |
| state | active |
+----------------+------------------------------------------------------------+
3. Delete the resource-type:
$ gnocchi resource-type delete testResource01
You cannot delete a resource type if a resource is using it.