Chapter 10. Troubleshooting cluster logging

10.1. Viewing cluster logging status
Copy link

You can view the status of the Cluster Logging Operator and for a number of cluster logging components.

10.1.1. Viewing the status of the Cluster Logging Operator
Copy link

You can view the status of your Cluster Logging Operator.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Change to the openshift-logging project.
```
oc project openshift-logging
```
```
$ oc project openshift-logging
```
Copy to Clipboard Toggle word wrap

To view the cluster logging status:

Get the cluster logging status:

oc get clusterlogging instance -o yaml

$ oc get clusterlogging instance -o yaml

Copy to Clipboard

Toggle word wrap

Example output

apiVersion: logging.openshift.io/v1
kind: ClusterLogging

....

status:  
  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd  
        nodes:
          fluentd-2rhqp: ip-10-0-169-13.ec2.internal
          fluentd-6fgjh: ip-10-0-165-244.ec2.internal
          fluentd-6l2ff: ip-10-0-128-218.ec2.internal
          fluentd-54nx5: ip-10-0-139-30.ec2.internal
          fluentd-flpnn: ip-10-0-147-228.ec2.internal
          fluentd-n2frh: ip-10-0-157-45.ec2.internal
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-2rhqp
          - fluentd-54nx5
          - fluentd-6fgjh
          - fluentd-6l2ff
          - fluentd-flpnn
          - fluentd-n2frh
  logstore: 
    elasticsearchStatus:
    - ShardAllocationEnabled:  all
      cluster:
        activePrimaryShards:    5
        activeShards:           5
        initializingShards:     0
        numDataNodes:           1
        numNodes:               1
        pendingTasks:           0
        relocatingShards:       0
        status:                 green
        unassignedShards:       0
      clusterName:             elasticsearch
      nodeConditions:
        elasticsearch-cdm-mkkdys93-1:
      nodeCount:  1
      pods:
        client:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
        data:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
        master:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
visualization:  
    kibanaStatus:
    - deployment: kibana
      pods:
        failed: []
        notReady: []
        ready:
        - kibana-7fb4fd4cc9-f2nls
      replicaSets:
      - kibana-7fb4fd4cc9
      replicas: 1

apiVersion: logging.openshift.io/v1
kind: ClusterLogging

....

status:

1


  collection:
    logs:
      fluentdStatus:
        daemonSet: fluentd

2


        nodes:
          fluentd-2rhqp: ip-10-0-169-13.ec2.internal
          fluentd-6fgjh: ip-10-0-165-244.ec2.internal
          fluentd-6l2ff: ip-10-0-128-218.ec2.internal
          fluentd-54nx5: ip-10-0-139-30.ec2.internal
          fluentd-flpnn: ip-10-0-147-228.ec2.internal
          fluentd-n2frh: ip-10-0-157-45.ec2.internal
        pods:
          failed: []
          notReady: []
          ready:
          - fluentd-2rhqp
          - fluentd-54nx5
          - fluentd-6fgjh
          - fluentd-6l2ff
          - fluentd-flpnn
          - fluentd-n2frh
  logstore:

3


    elasticsearchStatus:
    - ShardAllocationEnabled:  all
      cluster:
        activePrimaryShards:    5
        activeShards:           5
        initializingShards:     0
        numDataNodes:           1
        numNodes:               1
        pendingTasks:           0
        relocatingShards:       0
        status:                 green
        unassignedShards:       0
      clusterName:             elasticsearch
      nodeConditions:
        elasticsearch-cdm-mkkdys93-1:
      nodeCount:  1
      pods:
        client:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
        data:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
        master:
          failed:
          notReady:
          ready:
          - elasticsearch-cdm-mkkdys93-1-7f7c6-mjm7c
visualization:

4


    kibanaStatus:
    - deployment: kibana
      pods:
        failed: []
        notReady: []
        ready:
        - kibana-7fb4fd4cc9-f2nls
      replicaSets:
      - kibana-7fb4fd4cc9
      replicas: 1

Copy to Clipboard

Toggle word wrap

1: In the output, the cluster status fields appear in the status stanza.
2: Information on the Fluentd pods.
3: Information on the Elasticsearch pods, including Elasticsearch cluster health, green, yellow, or red.
4: Information on the Kibana pods.

10.1.1.1. Example condition messages
Copy link

The following are examples of some condition messages from the Status.Nodes section of the cluster logging instance.

A status message similar to the following indicates a node has exceeded the configured low watermark and no shard will be allocated to this node:

Example output

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not
        be allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-clientdatamaster-0-1
    upgradeStatus: {}

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not
        be allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-clientdatamaster-0-1
    upgradeStatus: {}

Copy to Clipboard

Toggle word wrap

A status message similar to the following indicates a node has exceeded the configured high watermark and shards will be relocated to other nodes:

Example output

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: cluster-logging-operator
    upgradeStatus: {}

  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: cluster-logging-operator
    upgradeStatus: {}

Copy to Clipboard

Toggle word wrap

A status message similar to the following indicates the Elasticsearch node selector in the CR does not match any nodes in the cluster:

Example output

    Elasticsearch Status:
      Shard Allocation Enabled:  shard allocation unknown
      Cluster:
        Active Primary Shards:  0
        Active Shards:          0
        Initializing Shards:    0
        Num Data Nodes:         0
        Num Nodes:              0
        Pending Tasks:          0
        Relocating Shards:      0
        Status:                 cluster health unknown
        Unassigned Shards:      0
      Cluster Name:             elasticsearch
      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               0/5 nodes are available: 5 node(s) didn't match node selector.
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
        elasticsearch-cdm-mkkdys93-2:
      Node Count:  2
      Pods:
        Client:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Data:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Master:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:

    Elasticsearch Status:
      Shard Allocation Enabled:  shard allocation unknown
      Cluster:
        Active Primary Shards:  0
        Active Shards:          0
        Initializing Shards:    0
        Num Data Nodes:         0
        Num Nodes:              0
        Pending Tasks:          0
        Relocating Shards:      0
        Status:                 cluster health unknown
        Unassigned Shards:      0
      Cluster Name:             elasticsearch
      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               0/5 nodes are available: 5 node(s) didn't match node selector.
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable
        elasticsearch-cdm-mkkdys93-2:
      Node Count:  2
      Pods:
        Client:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Data:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:
        Master:
          Failed:
          Not Ready:
            elasticsearch-cdm-mkkdys93-1-75dd69dccd-f7f49
            elasticsearch-cdm-mkkdys93-2-67c64f5f4c-n58vl
          Ready:

Copy to Clipboard

Toggle word wrap

A status message similar to the following indicates that the requested PVC could not bind to PV:

Example output

      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable

      Node Conditions:
        elasticsearch-cdm-mkkdys93-1:
          Last Transition Time:  2019-06-26T03:37:32Z
          Message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
          Reason:                Unschedulable
          Status:                True
          Type:                  Unschedulable

Copy to Clipboard

Toggle word wrap

A status message similar to the following indicates that the Fluentd pods cannot be scheduled because the node selector did not match any nodes:

Example output

Status:
  Collection:
    Logs:
      Fluentd Status:
        Daemon Set:  fluentd
        Nodes:
        Pods:
          Failed:
          Not Ready:
          Ready:

Status:
  Collection:
    Logs:
      Fluentd Status:
        Daemon Set:  fluentd
        Nodes:
        Pods:
          Failed:
          Not Ready:
          Ready:

Copy to Clipboard

Toggle word wrap

10.1.2. Viewing the status of cluster logging components
Copy link

You can view the status for a number of cluster logging components.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Change to the openshift-logging project.
```
oc project openshift-logging
```
```
$ oc project openshift-logging
```
Copy to Clipboard Toggle word wrap

View the status of the cluster logging environment:

oc describe deployment cluster-logging-operator

$ oc describe deployment cluster-logging-operator

Copy to Clipboard

Toggle word wrap

Example output

Name:                   cluster-logging-operator

....

Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable

....

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  62m   deployment-controller  Scaled up replica set cluster-logging-operator-574b8987df to 1----

Name:                   cluster-logging-operator

....

Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable

....

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  62m   deployment-controller  Scaled up replica set cluster-logging-operator-574b8987df to 1----

Copy to Clipboard

Toggle word wrap

View the status of the cluster logging replica set:

Get the name of a replica set:

Example output

oc get replicaset

$ oc get replicaset

Copy to Clipboard

Toggle word wrap

Example output

NAME                                      DESIRED   CURRENT   READY   AGE
cluster-logging-operator-574b8987df       1         1         1       159m
elasticsearch-cdm-uhr537yu-1-6869694fb    1         1         1       157m
elasticsearch-cdm-uhr537yu-2-857b6d676f   1         1         1       156m
elasticsearch-cdm-uhr537yu-3-5b6fdd8cfd   1         1         1       155m
kibana-5bd5544f87                         1         1         1       157m

NAME                                      DESIRED   CURRENT   READY   AGE
cluster-logging-operator-574b8987df       1         1         1       159m
elasticsearch-cdm-uhr537yu-1-6869694fb    1         1         1       157m
elasticsearch-cdm-uhr537yu-2-857b6d676f   1         1         1       156m
elasticsearch-cdm-uhr537yu-3-5b6fdd8cfd   1         1         1       155m
kibana-5bd5544f87                         1         1         1       157m

Copy to Clipboard

Toggle word wrap

Get the status of the replica set:

oc describe replicaset cluster-logging-operator-574b8987df

$ oc describe replicaset cluster-logging-operator-574b8987df

Copy to Clipboard

Toggle word wrap

Example output

Name:           cluster-logging-operator-574b8987df

....

Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed

....

Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  66m   replicaset-controller  Created pod: cluster-logging-operator-574b8987df-qjhqv----

Name:           cluster-logging-operator-574b8987df

....

Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed

....

Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  66m   replicaset-controller  Created pod: cluster-logging-operator-574b8987df-qjhqv----

Copy to Clipboard

Toggle word wrap

10.2. Viewing the status of the log store
Copy link

You can view the status of the OpenShift Elasticsearch Operator and for a number of Elasticsearch components.

10.2.1. Viewing the status of the log store
Copy link

You can view the status of your log store.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

Change to the openshift-logging project.
```
oc project openshift-logging
```
```
$ oc project openshift-logging
```
Copy to Clipboard Toggle word wrap

To view the status:

Get the name of the log store instance:
```
oc get Elasticsearch
```
```
$ oc get Elasticsearch
```
Copy to Clipboard Toggle word wrap
Example output
```
NAME            AGE
elasticsearch   5h9m
```
```
NAME            AGE
elasticsearch   5h9m
```
Copy to Clipboard Toggle word wrap

Get the log store status:

oc get Elasticsearch <Elasticsearch-instance> -o yaml

$ oc get Elasticsearch <Elasticsearch-instance> -o yaml

Copy to Clipboard

Toggle word wrap

For example:

oc get Elasticsearch elasticsearch -n openshift-logging -o yaml

$ oc get Elasticsearch elasticsearch -n openshift-logging -o yaml

Copy to Clipboard

Toggle word wrap

The output includes information similar to the following:

Example output

status: 
  cluster: 
    activePrimaryShards: 30
    activeShards: 60
    initializingShards: 0
    numDataNodes: 3
    numNodes: 3
    pendingTasks: 0
    relocatingShards: 0
    status: green
    unassignedShards: 0
  clusterHealth: ""
  conditions: [] 
  nodes: 
  - deploymentName: elasticsearch-cdm-zjf34ved-1
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-zjf34ved-2
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-zjf34ved-3
    upgradeStatus: {}
  pods: 
    client:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
    data:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
    master:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
  shardAllocationEnabled: all

status:

1


  cluster:

2


    activePrimaryShards: 30
    activeShards: 60
    initializingShards: 0
    numDataNodes: 3
    numNodes: 3
    pendingTasks: 0
    relocatingShards: 0
    status: green
    unassignedShards: 0
  clusterHealth: ""
  conditions: []

3


  nodes:

4


  - deploymentName: elasticsearch-cdm-zjf34ved-1
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-zjf34ved-2
    upgradeStatus: {}
  - deploymentName: elasticsearch-cdm-zjf34ved-3
    upgradeStatus: {}
  pods:

5


    client:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
    data:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
    master:
      failed: []
      notReady: []
      ready:
      - elasticsearch-cdm-zjf34ved-1-6d7fbf844f-sn422
      - elasticsearch-cdm-zjf34ved-2-dfbd988bc-qkzjz
      - elasticsearch-cdm-zjf34ved-3-c8f566f7c-t7zkt
  shardAllocationEnabled: all

Copy to Clipboard

Toggle word wrap

1

In the output, the cluster status fields appear in the status stanza.

2

The status of the log store:

The number of active primary shards.
The number of active shards.
The number of shards that are initializing.
The number of log store data nodes.
The total number of log store nodes.
The number of pending tasks.
The log store status: green, red, yellow.
The number of unassigned shards.

3

Any status conditions, if present. The log store status indicates the reasons from the scheduler if a pod could not be placed. Any events related to the following conditions are shown:

Container Waiting for both the log store and proxy containers.
Container Terminated for both the log store and proxy containers.
Pod unschedulable. Also, a condition is shown for a number of issues, see Example condition messages.

4

The log store nodes in the cluster, with upgradeStatus.

5

The log store client, data, and master pods in the cluster, listed under 'failed`, notReady or ready state.

10.2.1.1. Example condition messages
Copy link

The following are examples of some condition messages from the Status section of the Elasticsearch instance.

This status message indicates a node has exceeded the configured low watermark and no shard will be allocated to this node.

status:
  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not
        be allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-cdm-0-1
    upgradeStatus: {}

status:
  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T15:57:22Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be not
        be allocated on this node.
      reason: Disk Watermark Low
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-cdm-0-1
    upgradeStatus: {}

Copy to Clipboard

Toggle word wrap

This status message indicates a node has exceeded the configured high watermark and shards will be relocated to other nodes.

status:
  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-cdm-0-1
    upgradeStatus: {}

status:
  nodes:
  - conditions:
    - lastTransitionTime: 2019-03-15T16:04:45Z
      message: Disk storage usage for node is 27.5gb (36.74%). Shards will be relocated
        from this node.
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
    deploymentName: example-elasticsearch-cdm-0-1
    upgradeStatus: {}

Copy to Clipboard

Toggle word wrap

This status message indicates the log store node selector in the CR does not match any nodes in the cluster:

status:
    nodes:
    - conditions:
      - lastTransitionTime: 2019-04-10T02:26:24Z
        message: '0/8 nodes are available: 8 node(s) didn''t match node selector.'
        reason: Unschedulable
        status: "True"
        type: Unschedulable

status:
    nodes:
    - conditions:
      - lastTransitionTime: 2019-04-10T02:26:24Z
        message: '0/8 nodes are available: 8 node(s) didn''t match node selector.'
        reason: Unschedulable
        status: "True"
        type: Unschedulable

Copy to Clipboard

Toggle word wrap

This status message indicates that the log store CR uses a non-existent PVC.

status:
   nodes:
   - conditions:
     - last Transition Time:  2019-04-10T05:55:51Z
       message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
       reason:                Unschedulable
       status:                True
       type:                  Unschedulable

status:
   nodes:
   - conditions:
     - last Transition Time:  2019-04-10T05:55:51Z
       message:               pod has unbound immediate PersistentVolumeClaims (repeated 5 times)
       reason:                Unschedulable
       status:                True
       type:                  Unschedulable

Copy to Clipboard

Toggle word wrap

This status message indicates that your log store cluster does not have enough nodes to support your log store redundancy policy.

status:
  clusterHealth: ""
  conditions:
  - lastTransitionTime: 2019-04-17T20:01:31Z
    message: Wrong RedundancyPolicy selected. Choose different RedundancyPolicy or
      add more nodes with data roles
    reason: Invalid Settings
    status: "True"
    type: InvalidRedundancy

status:
  clusterHealth: ""
  conditions:
  - lastTransitionTime: 2019-04-17T20:01:31Z
    message: Wrong RedundancyPolicy selected. Choose different RedundancyPolicy or
      add more nodes with data roles
    reason: Invalid Settings
    status: "True"
    type: InvalidRedundancy

Copy to Clipboard

Toggle word wrap

This status message indicates your cluster has too many control plane nodes (also known as the master nodes):

status:
  clusterHealth: green
  conditions:
    - lastTransitionTime: '2019-04-17T20:12:34Z'
      message: >-
        Invalid master nodes count. Please ensure there are no more than 3 total
        nodes with master roles
      reason: Invalid Settings
      status: 'True'
      type: InvalidMasters

status:
  clusterHealth: green
  conditions:
    - lastTransitionTime: '2019-04-17T20:12:34Z'
      message: >-
        Invalid master nodes count. Please ensure there are no more than 3 total
        nodes with master roles
      reason: Invalid Settings
      status: 'True'
      type: InvalidMasters

Copy to Clipboard

Toggle word wrap

10.2.2. Viewing the status of the log store components
Copy link

You can view the status for a number of the log store components.

Elasticsearch indices

You can view the status of the Elasticsearch indices.

Get the name of an Elasticsearch pod:

oc get pods --selector component=elasticsearch -o name

$ oc get pods --selector component=elasticsearch -o name

Copy to Clipboard

Toggle word wrap

Example output

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

Copy to Clipboard

Toggle word wrap

Get the status of the indices:

oc exec elasticsearch-cdm-4vjor49p-2-6d4d7db474-q2w7z -- indices

$ oc exec elasticsearch-cdm-4vjor49p-2-6d4d7db474-q2w7z -- indices

Copy to Clipboard

Toggle word wrap

Example output

Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-4vjor49p-2-6d4d7db474-q2w7z -n openshift-logging' to see all of the containers in this pod.

green  open   infra-000002                                                     S4QANnf1QP6NgCegfnrnbQ   3   1     119926            0        157             78
green  open   audit-000001                                                     8_EQx77iQCSTzFOXtxRqFw   3   1          0            0          0              0
green  open   .security                                                        iDjscH7aSUGhIdq0LheLBQ   1   1          5            0          0              0
green  open   .kibana_-377444158_kubeadmin                                     yBywZ9GfSrKebz5gWBZbjw   3   1          1            0          0              0
green  open   infra-000001                                                     z6Dpe__ORgiopEpW6Yl44A   3   1     871000            0        874            436
green  open   app-000001                                                       hIrazQCeSISewG3c2VIvsQ   3   1       2453            0          3              1
green  open   .kibana_1                                                        JCitcBMSQxKOvIq6iQW6wg   1   1          0            0          0              0
green  open   .kibana_-1595131456_user1                                        gIYFIEGRRe-ka0W3okS-mQ   3   1          1            0          0              0

Defaulting container name to elasticsearch.
Use 'oc describe pod/elasticsearch-cdm-4vjor49p-2-6d4d7db474-q2w7z -n openshift-logging' to see all of the containers in this pod.

green  open   infra-000002                                                     S4QANnf1QP6NgCegfnrnbQ   3   1     119926            0        157             78
green  open   audit-000001                                                     8_EQx77iQCSTzFOXtxRqFw   3   1          0            0          0              0
green  open   .security                                                        iDjscH7aSUGhIdq0LheLBQ   1   1          5            0          0              0
green  open   .kibana_-377444158_kubeadmin                                     yBywZ9GfSrKebz5gWBZbjw   3   1          1            0          0              0
green  open   infra-000001                                                     z6Dpe__ORgiopEpW6Yl44A   3   1     871000            0        874            436
green  open   app-000001                                                       hIrazQCeSISewG3c2VIvsQ   3   1       2453            0          3              1
green  open   .kibana_1                                                        JCitcBMSQxKOvIq6iQW6wg   1   1          0            0          0              0
green  open   .kibana_-1595131456_user1                                        gIYFIEGRRe-ka0W3okS-mQ   3   1          1            0          0              0

Copy to Clipboard

Toggle word wrap

Log store pods

You can view the status of the pods that host the log store.

Get the name of a pod:

oc get pods --selector component=elasticsearch -o name

$ oc get pods --selector component=elasticsearch -o name

Copy to Clipboard

Toggle word wrap

Example output

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

Copy to Clipboard

Toggle word wrap

Get the status of a pod:

oc describe pod elasticsearch-cdm-1godmszn-1-6f8495-vp4lw

$ oc describe pod elasticsearch-cdm-1godmszn-1-6f8495-vp4lw

Copy to Clipboard

Toggle word wrap

The output includes the following status information:

Example output

....
Status:             Running

....

Containers:
  elasticsearch:
    Container ID:   cri-o://b7d44e0a9ea486e27f47763f5bb4c39dfd2
    State:          Running
      Started:      Mon, 08 Jun 2020 10:17:56 -0400
    Ready:          True
    Restart Count:  0
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

  proxy:
    Container ID:  cri-o://3f77032abaddbb1652c116278652908dc01860320b8a4e741d06894b2f8f9aa1
    State:          Running
      Started:      Mon, 08 Jun 2020 10:18:38 -0400
    Ready:          True
    Restart Count:  0

....

Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True

....

Events:          <none>

....
Status:             Running

....

Containers:
  elasticsearch:
    Container ID:   cri-o://b7d44e0a9ea486e27f47763f5bb4c39dfd2
    State:          Running
      Started:      Mon, 08 Jun 2020 10:17:56 -0400
    Ready:          True
    Restart Count:  0
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

  proxy:
    Container ID:  cri-o://3f77032abaddbb1652c116278652908dc01860320b8a4e741d06894b2f8f9aa1
    State:          Running
      Started:      Mon, 08 Jun 2020 10:18:38 -0400
    Ready:          True
    Restart Count:  0

....

Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True

....

Events:          <none>

Copy to Clipboard

Toggle word wrap

Log storage pod deployment configuration

You can view the status of the log store deployment configuration.

Get the name of a deployment configuration:

oc get deployment --selector component=elasticsearch -o name

$ oc get deployment --selector component=elasticsearch -o name

Copy to Clipboard

Toggle word wrap

Example output

deployment.extensions/elasticsearch-cdm-1gon-1
deployment.extensions/elasticsearch-cdm-1gon-2
deployment.extensions/elasticsearch-cdm-1gon-3

deployment.extensions/elasticsearch-cdm-1gon-1
deployment.extensions/elasticsearch-cdm-1gon-2
deployment.extensions/elasticsearch-cdm-1gon-3

Copy to Clipboard

Toggle word wrap

Get the deployment configuration status:

oc describe deployment elasticsearch-cdm-1gon-1

$ oc describe deployment elasticsearch-cdm-1gon-1

Copy to Clipboard

Toggle word wrap

The output includes the following status information:

Example output

....
  Containers:
   elasticsearch:
    Image:      registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.3
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

Conditions:
  Type           Status   Reason
  ----           ------   ------
  Progressing    Unknown  DeploymentPaused
  Available      True     MinimumReplicasAvailable

....

Events:          <none>

....
  Containers:
   elasticsearch:
    Image:      registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.3
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

Conditions:
  Type           Status   Reason
  ----           ------   ------
  Progressing    Unknown  DeploymentPaused
  Available      True     MinimumReplicasAvailable

....

Events:          <none>

Copy to Clipboard

Toggle word wrap

Log store replica set

You can view the status of the log store replica set.

Get the name of a replica set:

oc get replicaSet --selector component=elasticsearch -o name

$ oc get replicaSet --selector component=elasticsearch -o name

replicaset.extensions/elasticsearch-cdm-1gon-1-6f8495
replicaset.extensions/elasticsearch-cdm-1gon-2-5769cf
replicaset.extensions/elasticsearch-cdm-1gon-3-f66f7d

Copy to Clipboard

Toggle word wrap

Get the status of the replica set:

oc describe replicaSet elasticsearch-cdm-1gon-1-6f8495

$ oc describe replicaSet elasticsearch-cdm-1gon-1-6f8495

Copy to Clipboard

Toggle word wrap

The output includes the following status information:

Example output

....
  Containers:
   elasticsearch:
    Image:      registry.redhat.io/openshift4/ose-logging-elasticsearch6@sha256:4265742c7cdd85359140e2d7d703e4311b6497eec7676957f455d6908e7b1c25
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

Events:          <none>

....
  Containers:
   elasticsearch:
    Image:      registry.redhat.io/openshift4/ose-logging-elasticsearch6@sha256:4265742c7cdd85359140e2d7d703e4311b6497eec7676957f455d6908e7b1c25
    Readiness:  exec [/usr/share/elasticsearch/probe/readiness.sh] delay=10s timeout=30s period=5s #success=1 #failure=3

....

Events:          <none>

Copy to Clipboard

Toggle word wrap

10.3. Understanding cluster logging alerts
Copy link

All of the logging collector alerts are listed on the Alerting UI of the OpenShift Container Platform web console.

10.3.1. Viewing logging collector alerts
Copy link

Alerts are shown in the OpenShift Container Platform web console, on the Alerts tab of the Alerting UI. Alerts are in one of the following states:

Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.
Pending The alert condition is currently true, but the timeout has not been reached.
Not Firing. The alert is not currently triggered.

Procedure

To view cluster logging and other OpenShift Container Platform alerts:

In the OpenShift Container Platform console, click Monitoring Alerting.
Click the Alerts tab. The alerts are listed, based on the filters selected.

10.3.2. About logging collector alerts
Copy link

The following alerts are generated by the logging collector. You can view these alerts in the OpenShift Container Platform web console, on the Alerts page of the Alerting UI.

Expand

Table 10.1. Fluentd Prometheus alerts
Alert	Message	Description	Severity
`FluentDHighErrorRate`	`<value> of records have resulted in an error by fluentd <instance>.`	The number of FluentD output errors is high, by default more than 10 in the previous 15 minutes.	Warning
`FluentdNodeDown`	`Prometheus could not scrape fluentd <instance> for more than 10m.`	Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.	Critical
`FluentdQueueLengthBurst`	`In the last minute, fluentd <instance> buffer queue length increased more than 32. Current value is <value>.`	Fluentd is reporting that it cannot keep up with the data being indexed.	Warning
`FluentdQueueLengthIncreasing`	`In the last 12h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.`	Fluentd is reporting that the queue size is increasing.	Critical
`FluentDVeryHighErrorRate`	`<value> of records have resulted in an error by fluentd <instance>.`	The number of FluentD output errors is very high, by default more than 25 in the previous 15 minutes.	Critical

10.3.3. About Elasticsearch alerting rules
Copy link

You can view these alerting rules in Prometheus.

Expand

Alert	Description	Severity
ElasticsearchClusterNotHealthy	The cluster health status has been RED for at least 2 minutes. The cluster does not accept writes, shards may be missing, or the master node hasn’t been elected yet.	critical
ElasticsearchClusterNotHealthy	The cluster health status has been YELLOW for at least 20 minutes. Some shard replicas are not allocated.	warning
ElasticsearchDiskSpaceRunningLow	The cluster is expected to be out of disk space within the next 6 hours.	Critical
ElasticsearchHighFileDescriptorUsage	The cluster is predicted to be out of file descriptors within the next hour.	warning
ElasticsearchJVMHeapUseHigh	The JVM Heap usage on the specified node is high.	Alert
ElasticsearchNodeDiskWatermarkReached	The specified node has hit the low watermark due to low free disk space. Shards can not be allocated to this node anymore. You should consider adding more disk space to the node.	info
ElasticsearchNodeDiskWatermarkReached	The specified node has hit the high watermark due to low free disk space. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node.	warning
ElasticsearchNodeDiskWatermarkReached	The specified node has hit the flood watermark due to low free disk space. Every index that has a shard allocated on this node is enforced a read-only block. The index block must be manually released when the disk use falls below the high watermark.	critical
ElasticsearchJVMHeapUseHigh	The JVM Heap usage on the specified node is too high.	alert
ElasticsearchWriteRequestsRejectionJumps	Elasticsearch is experiencing an increase in write rejections on the specified node. This node might not be keeping up with the indexing speed.	Warning
AggregatedLoggingSystemCPUHigh	The CPU used by the system on the specified node is too high.	alert
ElasticsearchProcessCPUHigh	The CPU used by Elasticsearch on the specified node is too high.	alert

10.4. Troubleshooting the log curator
Copy link

You can use information in this section for debugging log curation. Curator is used to remove data that is in the Elasticsearch index format prior to OpenShift Container Platform 4.6, and will be removed in a later release.

10.4.1. Troubleshooting log curation
Copy link

You can use information in this section for debugging log curation. For example, if curator is in a failed state, but the log messages do not provide a reason, you could increase the log level and trigger a new job, instead of waiting for another scheduled run of the cron job.

Prerequisites

Cluster logging and Elasticsearch must be installed.

Procedure

To enable the Curator debug log and trigger next Curator iteration manually:

Enable debug log of Curator:
```
oc set env cronjob/curator CURATOR_LOG_LEVEL=DEBUG CURATOR_SCRIPT_LOG_LEVEL=DEBUG
```
```
$ oc set env cronjob/curator CURATOR_LOG_LEVEL=DEBUG CURATOR_SCRIPT_LOG_LEVEL=DEBUG
```
Copy to Clipboard Toggle word wrap
Specify the log level:
- CRITICAL. Curator displays only critical messages.
- ERROR. Curator displays only error and critical messages.
- WARNING. Curator displays only error, warning, and critical messages.
- INFO. Curator displays only informational, error, warning, and critical messages.
- DEBUG. Curator displays only debug messages, in addition to all of the above.
  The default value is INFO.
  Note
  Cluster logging uses the OpenShift Container Platform custom environment variable CURATOR_SCRIPT_LOG_LEVEL in OpenShift Container Platform wrapper scripts (run.sh and convert.py). The environment variable takes the same values as CURATOR_LOG_LEVEL for script debugging, as needed.

Trigger next curator iteration:

oc create job --from=cronjob/curator <job_name>

$ oc create job --from=cronjob/curator <job_name>

Copy to Clipboard

Toggle word wrap

Use the following commands to control the cron job:

Suspend a cron job:

oc patch cronjob curator -p '{"spec":{"suspend":true}}'

$ oc patch cronjob curator -p '{"spec":{"suspend":true}}'

Copy to Clipboard

Toggle word wrap

Resume a cron job:

oc patch cronjob curator -p '{"spec":{"suspend":false}}'

$ oc patch cronjob curator -p '{"spec":{"suspend":false}}'

Copy to Clipboard

Toggle word wrap

Change a cron job schedule:
```
oc patch cronjob curator -p '{"spec":{"schedule":"0 0 * * *"}}'
```
```
$ oc patch cronjob curator -p '{"spec":{"schedule":"0 0 * * *"}}' 
```
1
Copy to Clipboard Toggle word wrap
1
The schedule option accepts schedules in cron format.

10.5. Collecting logging data for Red Hat Support
Copy link

When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.

The must-gather tool enables you to collect diagnostic information for project-level resources, cluster-level resources, and each of the cluster logging components.

For prompt support, supply diagnostic information for both OpenShift Container Platform and cluster logging.

Note

Do not use the hack/logging-dump.sh script. The script is no longer supported and does not collect data.

10.5.1. About the must-gather tool
Copy link

The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues.

For your cluster logging environment, must-gather collects the following information:

project-level resources, including pods, configuration maps, service accounts, roles, role bindings, and events at the project level
cluster-level resources, including nodes, roles, and role bindings at the cluster level
cluster logging resources in the openshift-logging and openshift-operators-redhat namespaces, including health status for the log collector, the log store, the curator, and the log visualizer

When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.

10.5.2. Prerequisites
Copy link

Cluster logging and Elasticsearch must be installed.

10.5.3. Collecting cluster logging data
Copy link

You can use the oc adm must-gather CLI command to collect information about your cluster logging environment.

Procedure

To collect cluster logging information with must-gather:

Navigate to the directory where you want to store the must-gather information.

Run the oc adm must-gather command against the cluster logging image:

oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')

$ oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')

Copy to Clipboard

Toggle word wrap

The must-gather tool creates a new directory that starts with must-gather.local within the current directory. For example: must-gather.local.4157245944708210408.

Create a compressed file from the must-gather directory that was just created. For example, on a computer that uses a Linux operating system, run the following command:
```
tar -cvaf must-gather.tar.gz must-gather.local.4157245944708210408
```
```
$ tar -cvaf must-gather.tar.gz must-gather.local.4157245944708210408
```
Copy to Clipboard Toggle word wrap
Attach the compressed file to your support case on the Red Hat Customer Portal.

10.1. Viewing cluster logging status
Copy link

10.1.1. Viewing the status of the Cluster Logging Operator
Copy link

10.1.1.1. Example condition messages
Copy link

10.1.2. Viewing the status of cluster logging components
Copy link

10.2. Viewing the status of the log store
Copy link

10.2.1. Viewing the status of the log store
Copy link

10.2.1.1. Example condition messages
Copy link

10.2.2. Viewing the status of the log store components
Copy link

10.3. Understanding cluster logging alerts
Copy link

10.3.1. Viewing logging collector alerts
Copy link

10.3.2. About logging collector alerts
Copy link

10.3.3. About Elasticsearch alerting rules
Copy link

10.4. Troubleshooting the log curator
Copy link

10.4.1. Troubleshooting log curation
Copy link

10.5. Collecting logging data for Red Hat Support
Copy link

10.5.1. About the must-gather tool
Copy link

10.5.2. Prerequisites
Copy link

10.5.3. Collecting cluster logging data
Copy link

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Chapter 10. Troubleshooting cluster logging

10.1. Viewing cluster logging statusCopy linkLink copied to clipboard!

10.1.1. Viewing the status of the Cluster Logging OperatorCopy linkLink copied to clipboard!

10.1.1.1. Example condition messagesCopy linkLink copied to clipboard!

10.1.2. Viewing the status of cluster logging componentsCopy linkLink copied to clipboard!

10.2. Viewing the status of the log storeCopy linkLink copied to clipboard!

10.2.1. Viewing the status of the log storeCopy linkLink copied to clipboard!

10.2.1.1. Example condition messagesCopy linkLink copied to clipboard!

10.2.2. Viewing the status of the log store componentsCopy linkLink copied to clipboard!

10.3. Understanding cluster logging alertsCopy linkLink copied to clipboard!

10.3.1. Viewing logging collector alertsCopy linkLink copied to clipboard!

10.3.2. About logging collector alertsCopy linkLink copied to clipboard!

10.3.3. About Elasticsearch alerting rulesCopy linkLink copied to clipboard!

10.4. Troubleshooting the log curatorCopy linkLink copied to clipboard!

10.4.1. Troubleshooting log curationCopy linkLink copied to clipboard!

10.5. Collecting logging data for Red Hat SupportCopy linkLink copied to clipboard!

10.5.1. About the must-gather toolCopy linkLink copied to clipboard!

10.5.2. PrerequisitesCopy linkLink copied to clipboard!

10.5.3. Collecting cluster logging dataCopy linkLink copied to clipboard!

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

10.1. Viewing cluster logging status
Copy link

10.1.1. Viewing the status of the Cluster Logging Operator
Copy link

10.1.1.1. Example condition messages
Copy link

10.1.2. Viewing the status of cluster logging components
Copy link

10.2. Viewing the status of the log store
Copy link

10.2.1. Viewing the status of the log store
Copy link

10.2.1.1. Example condition messages
Copy link

10.2.2. Viewing the status of the log store components
Copy link

10.3. Understanding cluster logging alerts
Copy link

10.3.1. Viewing logging collector alerts
Copy link

10.3.2. About logging collector alerts
Copy link

10.3.3. About Elasticsearch alerting rules
Copy link

10.4. Troubleshooting the log curator
Copy link

10.4.1. Troubleshooting log curation
Copy link

10.5. Collecting logging data for Red Hat Support
Copy link

10.5.1. About the must-gather tool
Copy link

10.5.2. Prerequisites
Copy link

10.5.3. Collecting cluster logging data
Copy link