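3.4. Monitoring consensus latency for etcd
By using the etcdctl CLI, you can observe the latency with which etcd reaches consensus. You must identify one of the etcd pods and then retrieve the endpoint health.
This procedure, which validates and monitors cluster health, can be run only on an active cluster.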
Prerequisites
- During planning for cluster deployment, you completed the disk and network tests.
Procedure
Enter the following command:
# oc get pods -n openshift-etcd -l app=etcd
Example output
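NAME      READY   STATUS    RESTARTS   AGE
etcd-m0   4/4     Running   4          8h
etcd-m1   4/4     Running   4          8h
etcd-m2   4/4     Running   4          8h
Enter the following command. To better understand etcd consensus latency, you can run this command in a precise watch loop and check that the reported times stay below the ~66 ms threshold. The closer the consensus time gets to 100 ms, the more likely the cluster is to experience service-affecting events and instability.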
# oc exec -ti etcd-m0 -- etcdctl endpoint health -w table
Example output
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://198.18.111.12:2379 |   true |  3.798349ms |       |
| https://198.18.111.14:2379 |   true |  7.389608ms |       |
| https://198.18.111.13:2379 |   true |  6.263117ms |       |
+----------------------------+--------+-------------+-------+
Enter the following command to repeat the health check in a watch loop:
# oc exec -ti etcd-m0 -- watch -dp -c etcdctl endpoint health -w table
Example output
+----------------------------+--------+-------------+-------+
|          ENDPOINT          | HEALTH |    TOOK     | ERROR |
+----------------------------+--------+-------------+-------+
| https://198.18.111.12:2379 |   true |  9.533405ms |       |
| https://198.18.111.13:2379 |   true |  4.628054ms |       |
| https://198.18.111.14:2379 |   true |  5.803378ms |       |
+----------------------------+--------+-------------+-------+
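Reading the TOOK column by eye works for a quick check, but the comparison against the ~66 ms threshold can also be scripted. The following is a minimal sketch, not part of oc or etcdctl: check_etcd_latency is a hypothetical helper name, the default of 66 is taken from the threshold mentioned in this procedure, and the parsing assumes the table layout shown in the example output with simple durations such as ms, µs, ns, or s.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or etcdctl): reads the table printed by
# `etcdctl endpoint health -w table` on stdin and classifies each endpoint.
# $1 is the latency threshold in milliseconds (assumed default: 66).
# Returns 0 when every endpoint is healthy and below the threshold, else 1.
check_etcd_latency() {
    awk -F'|' -v limit="${1:-66}" '
        # Data rows are the ones whose first column holds an endpoint URL.
        $2 ~ /https?:\/\// {
            endpoint = $2; health = $3; took = $4
            gsub(/[ \t\r]/, "", endpoint)
            gsub(/[ \t\r]/, "", health)
            gsub(/[ \t\r]/, "", took)

            # Convert a simple Go-style duration to milliseconds.
            # Compound values such as "1m2s" are left unparsed on purpose.
            if      (took ~ /^[0-9.]+ms$/)      ms = took + 0
            else if (took ~ /^[0-9.]+(us|µs)$/) ms = (took + 0) / 1000
            else if (took ~ /^[0-9.]+ns$/)      ms = (took + 0) / 1000000
            else if (took ~ /^[0-9.]+s$/)       ms = (took + 0) * 1000
            else                                ms = -1

            if (health != "true") { status = "UNHEALTHY"; bad = 1 }
            else if (ms < 0)      { status = "UNKNOWN";   bad = 1 }
            else if (ms >= limit) { status = "SLOW";      bad = 1 }
            else                  { status = "OK" }

            print status, endpoint, took
        }
        END { exit bad + 0 }
    '
}
```

One possible way to use it is to pipe the health table into the function, for example `oc exec etcd-m0 -- etcdctl endpoint health -w table | check_etcd_latency 66`. The `-ti` flags are dropped in this usage because the output goes to a pipe rather than a terminal. The function prints one line per endpoint (status, endpoint, TOOK value) and returns a nonzero exit status if any endpoint is unhealthy, unparsed, or at or above the threshold.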