第 10 章 集群日志记录故障排除
10.1. 查看集群日志记录状态
您可以查看 Cluster Logging Operator 的状态以及多个集群日志记录组件的状态。
10.1.1. 查看 Cluster Logging Operator 的状态
您可以查看 Cluster Logging Operator 的状态。
先决条件
- 必须安装 Cluster Logging 和 Elasticsearch。
流程
进入
openshift-logging
项目。$ oc project openshift-logging
查看集群日志记录状态:
获取集群日志记录状态:
$ oc get clusterlogging instance -o yaml
输出示例
apiVersion: logging.openshift.io/v1 kind: ClusterLogging .... status: 1 collection: logs: fluentdStatus: daemonSet: fluentd 2 nodes: fluentd-2rhqp: ip-10-0-169-13.ec2.internal fluentd-6fgjh: ip-10-0-165-244.ec2.internal fluentd-6l2ff: ip-10-0-128-218.ec2.internal fluentd-54nx5: ip-10-0-139-30.ec2.internal fluentd-flpnn: ip-10-0-147-228.ec2.internal fluentd-n2frh: ip-10-0-157-45.ec2.internal pods: failed: [] notReady: [] ready: - fluentd-2rhqp - fluentd-54nx5 - fluentd-6fgjh - fluentd-6l2ff - fluentd-flpnn - fluentd-n2frh logstore: 3 elasticsearchStatus: - ShardAllocationEnabled: all cluster: activePrimaryShards: 5 activeShards: 5 initializingShards: 0 numDataNodes: 1 numNodes: 1 pendingTasks: 0 relocatingShards: 0 status: green unassignedShards: 0 clusterName: elasticsearch nodeConditions: elasticsearch-cdm-mkkdys93-1: nodeCount: 1 pods: client: failed: notReady: ready: - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c data: failed: notReady: ready: - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c master: failed: notReady: ready: - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c visualization: 4 kibanaStatus: - deployment: kibana pods: failed: [] notReady: [] ready: - kibana-7fb4fd4cc9-f2nls replicaSets: - kibana-7fb4fd4cc9 replicas: 1
10.1.1.1. 情况消息示例
以下示例是来自集群日志记录实例的 Status.Nodes
部分的一些情况消息。
类似于以下内容的状态消息表示节点已超过配置的低水位线,并且没有分片将分配给此节点:
输出示例
nodes: - conditions: - lastTransitionTime: 2019-03-15T15:57:22Z message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not be allocated on this node. reason: Disk Watermark Low status: "True" type: NodeStorage deploymentName: example-elasticsearch-clientdatamaster-0-1 upgradeStatus: {}
类似于以下内容的状态消息表示节点已超过配置的高水位线,并且分片将重新定位到其他节点:
输出示例
nodes: - conditions: - lastTransitionTime: 2019-03-15T16:04:45Z message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated from this node. reason: Disk Watermark High status: "True" type: NodeStorage deploymentName: cluster-logging-operator upgradeStatus: {}
类似于以下内容的状态消息表示 CR 中的 Elasticsearch 节点选择器与集群中的任何节点都不匹配:
输出示例
Elasticsearch Status: Shard Allocation Enabled: shard allocation unknown Cluster: Active Primary Shards: 0 Active Shards: 0 Initializing Shards: 0 Num Data Nodes: 0 Num Nodes: 0 Pending Tasks: 0 Relocating Shards: 0 Status: cluster health unknown Unassigned Shards: 0 Cluster Name: elasticsearch Node Conditions: elasticsearch-cdm-mkkdys93-1: Last Transition Time: 2019-06-26T03:37:32Z Message: 0/5 nodes are available: 5 node(s) didn't match node selector. Reason: Unschedulable Status: True Type: Unschedulable elasticsearch-cdm-mkkdys93-2: Node Count: 2 Pods: Client: Failed: Not Ready: elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49 elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl Ready: Data: Failed: Not Ready: elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49 elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl Ready: Master: Failed: Not Ready: elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49 elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl Ready:
类似于以下内容的状态消息表示请求的 PVC 无法绑定到 PV:
输出示例
Node Conditions: elasticsearch-cdm-mkkdys93-1: Last Transition Time: 2019-06-26T03:37:32Z Message: pod has unbound immediate PersistentVolumeClaims (repeated 5 times) Reason: Unschedulable Status: True Type: Unschedulable
类似于以下内容的状态消息表示无法调度 Fluentd Pod,因为节点选择器与任何节点都不匹配:
输出示例
Status: Collection: Logs: Fluentd Status: Daemon Set: fluentd Nodes: Pods: Failed: Not Ready: Ready:
10.1.2. 查看集群日志记录组件的状态
您可以查看多个集群日志记录组件的状态。
先决条件
- 必须安装 Cluster Logging 和 Elasticsearch。
流程
进入
openshift-logging
项目。$ oc project openshift-logging
查看集群日志记录环境的状态:
$ oc describe deployment cluster-logging-operator
输出示例
Name: cluster-logging-operator .... Conditions: Type Status Reason ---- ------ ------ Available True MinimumReplicasAvailable Progressing True NewReplicaSetAvailable .... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 62m deployment-controller Scaled up replica set cluster-logging-operator-574b8987df to 1----
查看集群日志记录副本集的状态:
获取副本集的名称:
输出示例
$ oc get replicaset
输出示例
NAME DESIRED CURRENT READY AGE cluster-logging-operator-574b8987df 1 1 1 159m elasticsearch-cdm-uhr537yu-1-6869694fb 1 1 1 157m elasticsearch-cdm-uhr537yu-2-857b6d676f 1 1 1 156m elasticsearch-cdm-uhr537yu-3-5b6fdd8cfd 1 1 1 155m kibana-5bd5544f87 1 1 1 157m
获取副本集的状态:
$ oc describe replicaset cluster-logging-operator-574b8987df
输出示例
Name: cluster-logging-operator-574b8987df .... Replicas: 1 current / 1 desired Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed .... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulCreate 66m replicaset-controller Created pod: cluster-logging-operator-574b8987df-qjhqv----