Chapter 12. Troubleshooting Logging
12.1. Viewing OpenShift Logging status
You can view the status of the Red Hat OpenShift Logging Operator and of a number of OpenShift Logging components.
12.1.1. Viewing the status of the Red Hat OpenShift Logging Operator
You can view the status of your Red Hat OpenShift Logging Operator.
Prerequisites
- OpenShift Logging and Elasticsearch must be installed.
Procedure
Change to the openshift-logging project:

$ oc project openshift-logging

Get the OpenShift Logging status:

$ oc get clusterlogging instance -o yaml

Example output
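The example output is collapsed in this rendering. As a sketch only, the condition messages discussed in the next section appear under the status.nodes stanza of the saved YAML and can be surfaced with standard tools. The fragment below is illustrative, not captured from a real cluster:

```shell
# Illustrative fragment of a ClusterLogging status stanza (hypothetical values),
# plus a grep that surfaces the condition reasons and messages.
cat > /tmp/clusterlogging-status.yaml <<'EOF'
status:
  nodes:
  - conditions:
    - message: Shards will be relocated away from this node
      reason: Disk Watermark High
      status: "True"
      type: NodeStorage
EOF
grep -E 'reason:|message:' /tmp/clusterlogging-status.yaml
```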
12.1.1.1. Example condition messages
The following are examples of some condition messages from the Status.Nodes section of the OpenShift Logging instance.
A status message similar to the following indicates a node has exceeded the configured low watermark and no shard will be allocated to this node:
Example output
A status message similar to the following indicates a node has exceeded the configured high watermark and shards will be relocated to other nodes:
Example output
A status message similar to the following indicates the Elasticsearch node selector in the CR does not match any nodes in the cluster:
Example output
A status message similar to the following indicates that the requested PVC could not bind to a persistent volume (PV):
Example output
A status message similar to the following indicates that the Fluentd pods cannot be scheduled because the node selector did not match any nodes:
Example output
12.1.2. Viewing the status of OpenShift Logging components
You can view the status for a number of OpenShift Logging components.
Prerequisites
- OpenShift Logging and Elasticsearch must be installed.
Procedure
Change to the openshift-logging project:

$ oc project openshift-logging

View the status of the OpenShift Logging environment:

$ oc describe deployment cluster-logging-operator

Example output

View the status of the OpenShift Logging replica set:
Get the name of a replica set:
$ oc get replicaset

Example output

Get the status of the replica set:

$ oc describe replicaset cluster-logging-operator-574b8987df

Example output
12.2. Viewing the status of the log store
You can view the status of the OpenShift Elasticsearch Operator and of a number of Elasticsearch components.
12.2.1. Viewing the status of the log store
You can view the status of your log store.
Prerequisites
- OpenShift Logging and Elasticsearch must be installed.
Procedure
Change to the openshift-logging project:

$ oc project openshift-logging

Get the name of the log store instance:

$ oc get Elasticsearch

Example output

NAME            AGE
elasticsearch   5h9m

Get the log store status:

$ oc get Elasticsearch <Elasticsearch-instance> -o yaml

For example:

$ oc get Elasticsearch elasticsearch -n openshift-logging -o yaml

The output includes information similar to the following:
Example output
1. In the output, the cluster status fields appear in the status stanza.
2. The status of the log store:
   - The number of active primary shards.
   - The number of active shards.
   - The number of shards that are initializing.
   - The number of log store data nodes.
   - The total number of log store nodes.
   - The number of pending tasks.
   - The log store status: green, red, or yellow.
   - The number of unassigned shards.
3. Any status conditions, if present. The log store status indicates the reasons from the scheduler if a pod could not be placed. Any events related to the following conditions are shown:
   - Container Waiting for both the log store and proxy containers.
   - Container Terminated for both the log store and proxy containers.
   - Pod unschedulable. Also, a condition is shown for a number of issues; see Example condition messages.
4. The log store nodes in the cluster, with upgradeStatus.
5. The log store client, data, and master pods in the cluster, listed under failed, notReady, or ready state.
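As a worked sketch of the fields called out above, here is an abridged, illustrative status stanza (invented values, not captured from a real cluster) and an awk one-liner that pulls the overall health colour out of it:

```shell
# Abridged, hypothetical Elasticsearch CR status; the field names follow the
# callouts above, the values are invented for illustration.
cat > /tmp/es-status.yaml <<'EOF'
status:
  cluster:
    activePrimaryShards: 30
    activeShards: 60
    initializingShards: 0
    numDataNodes: 3
    numNodes: 3
    pendingTasks: 0
    status: green
    unassignedShards: 0
EOF
# Print the cluster health colour (the indented "status:" line under "cluster:").
awk '$1 == "status:" && NF == 2 {print $2}' /tmp/es-status.yaml
```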
12.2.1.1. Example condition messages
The following are examples of some condition messages from the Status section of the Elasticsearch instance.
The following status message indicates that a node has exceeded the configured low watermark, and no shard will be allocated to this node.
The following status message indicates that a node has exceeded the configured high watermark, and shards will be relocated to other nodes.
The following status message indicates that the log store node selector in the CR does not match any nodes in the cluster:
The following status message indicates that the log store CR uses a non-existent persistent volume claim (PVC).
The following status message indicates that your log store cluster does not have enough nodes to support the redundancy policy.
This status message indicates your cluster has too many control plane nodes (also known as the master nodes):
The following status message indicates that Elasticsearch storage does not support the change you tried to make.
For example:
The reason and type fields specify the type of unsupported change:
- StorageClassNameChangeIgnored: Unsupported change to the storage class name.
- StorageSizeChangeIgnored: Unsupported change to the storage size.
- StorageStructureChangeIgnored: Unsupported change between ephemeral and persistent storage structures.
Important

If you try to configure the ClusterLogging custom resource (CR) to switch from ephemeral to persistent storage, the OpenShift Elasticsearch Operator creates a persistent volume claim (PVC) but does not create a persistent volume (PV). To clear the StorageStructureChangeIgnored status, you must revert the change to the ClusterLogging CR and delete the PVC.
12.2.2. Viewing the status of the log store components
You can view the status for a number of the log store components.
- Elasticsearch indices
You can view the status of the Elasticsearch indices.
Get the name of an Elasticsearch pod:
$ oc get pods --selector component=elasticsearch -o name

Example output

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

Get the status of the indices:

$ oc exec elasticsearch-cdm-4vjor49p-2-6d4d7db474-q2w7z -- indices

Example output
- Log store pods
You can view the status of the pods that host the log store.
Get the name of a pod:
$ oc get pods --selector component=elasticsearch -o name

Example output

pod/elasticsearch-cdm-1godmszn-1-6f8495-vp4lw
pod/elasticsearch-cdm-1godmszn-2-5769cf-9ms2n
pod/elasticsearch-cdm-1godmszn-3-f66f7d-zqkz7

Get the status of a pod:

$ oc describe pod elasticsearch-cdm-1godmszn-1-6f8495-vp4lw

The output includes the following status information:

Example output
- Log storage pod deployment configuration
You can view the status of the log store deployment configuration.
Get the name of a deployment configuration:
$ oc get deployment --selector component=elasticsearch -o name

Example output

deployment.extensions/elasticsearch-cdm-1gon-1
deployment.extensions/elasticsearch-cdm-1gon-2
deployment.extensions/elasticsearch-cdm-1gon-3

Get the deployment configuration status:

$ oc describe deployment elasticsearch-cdm-1gon-1

The output includes the following status information:

Example output
- Log store replica set
You can view the status of the log store replica set.
Get the name of a replica set:
$ oc get replicaSet --selector component=elasticsearch -o name

Example output

replicaset.extensions/elasticsearch-cdm-1gon-1-6f8495
replicaset.extensions/elasticsearch-cdm-1gon-2-5769cf
replicaset.extensions/elasticsearch-cdm-1gon-3-f66f7d

Get the status of the replica set:

$ oc describe replicaSet elasticsearch-cdm-1gon-1-6f8495

The output includes the following status information:

Example output
12.3. Understanding OpenShift Logging alerts
All of the logging collector alerts are listed on the Alerting UI of the OpenShift Container Platform web console.
12.3.1. Viewing logging collector alerts
Alerts are shown in the OpenShift Container Platform web console, on the Alerts tab of the Alerting UI. Alerts are in one of the following states:
- Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.
- Pending. The alert condition is currently true, but the timeout has not been reached.
- Not Firing. The alert is not currently triggered.
Procedure
To view OpenShift Logging and other OpenShift Container Platform alerts:
- In the OpenShift Container Platform console, click Monitoring → Alerting.
- Click the Alerts tab. The alerts are listed, based on the filters selected.
12.3.2. About logging collector alerts
The following alerts are generated by the logging collector. You can view these alerts in the OpenShift Container Platform web console, on the Alerts page of the Alerting UI.
| Alert | Message | Description | Severity |
|---|---|---|---|
| | | The number of FluentD output errors is high, by default more than 10 in the previous 15 minutes. | Warning |
| | | Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance. | Critical |
| | | Fluentd is reporting that the queue size is increasing. | Critical |
| | | The number of FluentD output errors is very high, by default more than 25 in the previous 15 minutes. | Critical |
12.3.3. About Elasticsearch alerting rules
You can view these alerting rules in Prometheus.
| Alert | Description | Severity |
|---|---|---|
| | The cluster health status has been RED for at least 2 minutes. The cluster does not accept writes, shards may be missing, or the master node hasn't been elected yet. | Critical |
| | The cluster health status has been YELLOW for at least 20 minutes. Some shard replicas are not allocated. | Warning |
| | The cluster is expected to be out of disk space within the next 6 hours. | Critical |
| | The cluster is predicted to be out of file descriptors within the next hour. | Warning |
| | The JVM Heap usage on the specified node is high. | Alert |
| | The specified node has hit the low watermark due to low free disk space. Shards can not be allocated to this node anymore. You should consider adding more disk space to the node. | Info |
| | The specified node has hit the high watermark due to low free disk space. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node. | Warning |
| | The specified node has hit the flood watermark due to low free disk space. Every index that has a shard allocated on this node is enforced a read-only block. The index block must be manually released when the disk use falls below the high watermark. | Critical |
| | The JVM Heap usage on the specified node is too high. | Alert |
| | Elasticsearch is experiencing an increase in write rejections on the specified node. This node might not be keeping up with the indexing speed. | Warning |
| | The CPU used by the system on the specified node is too high. | Alert |
| | The CPU used by Elasticsearch on the specified node is too high. | Alert |
12.4. Collecting logging data for Red Hat Support
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.
The must-gather tool enables you to collect diagnostic information for project-level resources, cluster-level resources, and each of the OpenShift Logging components.
For prompt support, supply diagnostic information for both OpenShift Container Platform and OpenShift Logging.
Do not use the hack/logging-dump.sh script. The script is no longer supported and does not collect data.
12.4.1. About the must-gather tool
The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues.
For your OpenShift Logging environment, must-gather collects the following information:
- Project-level resources, including pods, configuration maps, service accounts, roles, role bindings, and events at the project level
- Cluster-level resources, including nodes, roles, and role bindings at the cluster level
- OpenShift Logging resources in the openshift-logging and openshift-operators-redhat namespaces, including health status for the log collector, the log store, and the log visualizer
When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.
12.4.2. Prerequisites
- OpenShift Logging and Elasticsearch must be installed.
12.4.3. Collecting OpenShift Logging data
You can use the oc adm must-gather CLI command to collect information about your OpenShift Logging environment.
Procedure
To collect OpenShift Logging information with must-gather:
- Navigate to the directory where you want to store the must-gather information.
- Run the oc adm must-gather command against the OpenShift Logging image:

  $ oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')

  The must-gather tool creates a new directory that starts with must-gather.local within the current directory. For example: must-gather.local.4157245944708210408.
- Create a compressed file from the must-gather directory that was just created. For example, on a computer that uses a Linux operating system, run the following command:

  $ tar -cvaf must-gather.tar.gz must-gather.local.4157245944708210408
- Attach the compressed file to your support case on the Red Hat Customer Portal.
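The packaging step can be rehearsed locally without a cluster. The directory name below is a stand-in; in practice, must-gather generates a directory with a random numeric suffix:

```shell
# Stand-in for the directory must-gather would create.
mkdir -p must-gather.local.example
echo "placeholder" > must-gather.local.example/gather.log
# Package it; -a lets tar infer gzip compression from the .tar.gz suffix.
tar -cvaf must-gather.tar.gz must-gather.local.example
ls -l must-gather.tar.gz
```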
12.5. Troubleshooting for Critical Alerts
12.5.1. Elasticsearch Cluster Health is Red
At least one primary shard and its replicas are not allocated to a node.
Troubleshooting
Check the Elasticsearch cluster health and verify that the cluster status is red:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- health

List the nodes that have joined the cluster:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cat/nodes?v

List the Elasticsearch pods and compare them with the nodes in the command output from the previous step:

$ oc -n openshift-logging get pods -l component=elasticsearch

If some of the Elasticsearch nodes have not joined the cluster, perform the following steps.

Confirm that Elasticsearch has an elected control plane node:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cat/master?v

Review the pod logs of the elected control plane node for issues:

$ oc logs <elasticsearch_master_pod_name> -c elasticsearch -n openshift-logging

Review the logs of nodes that have not joined the cluster for issues:

$ oc logs <elasticsearch_node_name> -c elasticsearch -n openshift-logging
If all the nodes have joined the cluster, check whether the cluster is in the process of recovering:
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cat/recovery?active_only=true

If there is no command output, the recovery process might be delayed or stalled by pending tasks.
Check if there are pending tasks.
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- health | grep number_of_pending_tasks

If there are pending tasks, monitor their status.
If their status changes and indicates that the cluster is recovering, continue waiting. The recovery time varies according to the size of the cluster and other factors.
Otherwise, if the status of the pending tasks does not change, this indicates that the recovery has stalled.
If it seems like the recovery has stalled, check whether cluster.routing.allocation.enable is set to none:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cluster/settings?pretty

If cluster.routing.allocation.enable is set to none, set it to all:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cluster/settings?pretty -X PUT -d '{"persistent": {"cluster.routing.allocation.enable":"all"}}'

Check which indices are still red:
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cat/indices?v

If any indices are still red, try to clear them by performing the following steps.
Clear the cache.
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_cache/clear?pretty

Increase the max allocation retries:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_settings?pretty -X PUT -d '{"index.allocation.max_retries":10}'

Delete all the scroll items:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_search/scroll/_all -X DELETE

Increase the timeout:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name>/_settings?pretty -X PUT -d '{"index.unassigned.node_left.delayed_timeout":"10m"}'
If the preceding steps do not clear the red indices, delete the indices individually.
Identify the red index name.
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cat/indices?v

Delete the red index:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_red_index_name> -X DELETE
If there are no red indices and the cluster status is red, check for a continuous heavy processing load on a data node.
Check if the Elasticsearch JVM Heap usage is high.
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_nodes/stats?pretty

In the command output, review the node_name.jvm.mem.heap_used_percent field to determine the JVM Heap usage.

- Check for high CPU utilization.
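To make the heap check concrete, here is a sketch that pulls heap_used_percent values out of a sample _nodes/stats response. The JSON fragment is invented for illustration; on a real cluster you would pipe the oc exec output instead, and the 75% threshold is an arbitrary example cut-off:

```shell
# Hypothetical _nodes/stats fragment; only the field of interest is kept.
stats='{"nodes":{"a1":{"name":"es-1","jvm":{"mem":{"heap_used_percent":82}}},"b2":{"name":"es-2","jvm":{"mem":{"heap_used_percent":41}}}}}'
# List each node's heap usage and flag anything above an example 75% threshold.
printf '%s' "$stats" \
  | grep -o '"heap_used_percent":[0-9]*' \
  | cut -d: -f2 \
  | awk '{print ($1 > 75) ? $1 " HIGH" : $1 " ok"}'
```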
12.5.2. Elasticsearch Cluster Health is Yellow
Replica shards for at least one primary shard are not allocated to nodes.
Troubleshooting
- Increase the node count by adjusting nodeCount in the ClusterLogging CR.
12.5.3. Elasticsearch Node Disk Low Watermark Reached
Elasticsearch does not allocate shards to nodes that reach the low watermark.
Troubleshooting
Identify the node on which Elasticsearch is deployed:

$ oc -n openshift-logging get po -o wide

Check if there are unassigned shards:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cluster/health?pretty | grep unassigned_shards

If there are unassigned shards, check the disk space on each node:

$ for pod in `oc -n openshift-logging get po -l component=elasticsearch -o jsonpath='{.items[*].metadata.name}'`; do echo $pod; oc -n openshift-logging exec -c elasticsearch $pod -- df -h /elasticsearch/persistent; done

Check the nodes.node_name.fs field to determine the free disk space on that node.

If the used disk percentage is above 85%, the node has exceeded the low watermark, and shards can no longer be allocated to this node.
- Try to increase the disk space on all nodes.
- If increasing the disk space is not possible, try adding a new data node to the cluster.
If adding a new data node is problematic, decrease the total cluster redundancy policy.
Check the current redundancyPolicy:

$ oc -n openshift-logging get es elasticsearch -o jsonpath='{.spec.redundancyPolicy}'

Note: If you are using a ClusterLogging CR, enter:

$ oc -n openshift-logging get cl -o jsonpath='{.items[*].spec.logStore.elasticsearch.redundancyPolicy}'

If the cluster redundancyPolicy is higher than SingleRedundancy, set it to SingleRedundancy and save this change.
If the preceding steps do not fix the issue, delete the old indices.
Check the status of all indices on Elasticsearch:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- indices

- Identify an old index that can be deleted.

Delete the index:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name> -X DELETE
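The 85% low-watermark check described above can be rehearsed against df-style output. The pod names and percentages below are invented; on a real cluster you would feed in the df -h results gathered from the Elasticsearch pods:

```shell
# Hypothetical used-disk percentages per Elasticsearch pod.
df_sample='elasticsearch-cdm-1 87%
elasticsearch-cdm-2 62%
elasticsearch-cdm-3 91%'
# Flag pods whose used disk percentage exceeds the 85% low watermark.
printf '%s\n' "$df_sample" \
  | awk '{pct=$2; gsub("%","",pct); if (pct+0 > 85) print $1, "over low watermark (" $2 ")"}'
```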
12.5.4. Elasticsearch Node Disk High Watermark Reached
Elasticsearch attempts to relocate shards away from a node that has reached the high watermark.
Troubleshooting
Identify the node on which Elasticsearch is deployed:

$ oc -n openshift-logging get po -o wide

Check the disk space on each node:

$ for pod in `oc -n openshift-logging get po -l component=elasticsearch -o jsonpath='{.items[*].metadata.name}'`; do echo $pod; oc -n openshift-logging exec -c elasticsearch $pod -- df -h /elasticsearch/persistent; done

Check if the cluster is rebalancing:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_cluster/health?pretty | grep relocating_shards

If the command output shows relocating shards, the high watermark has been exceeded. The default value of the high watermark is 90%.
The shards relocate to a node with low disk usage that has not crossed any watermark threshold limits.
- To allocate shards to a particular node, free up some space.
- Try to increase the disk space on all nodes.
- If increasing the disk space is not possible, try adding a new data node to the cluster.
If adding a new data node is problematic, decrease the total cluster redundancy policy.
Check the current redundancyPolicy:

$ oc -n openshift-logging get es elasticsearch -o jsonpath='{.spec.redundancyPolicy}'

Note: If you are using a ClusterLogging CR, enter:

$ oc -n openshift-logging get cl -o jsonpath='{.items[*].spec.logStore.elasticsearch.redundancyPolicy}'

If the cluster redundancyPolicy is higher than SingleRedundancy, set it to SingleRedundancy and save this change.
If the preceding steps do not fix the issue, delete the old indices.
Check the status of all indices on Elasticsearch:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- indices

- Identify an old index that can be deleted.

Delete the index:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name> -X DELETE
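To see why lowering redundancyPolicy frees disk space: each policy maps to a number of replica shards kept per primary. The mapping sketched below is an approximation for illustration only; check your release's documentation for the exact semantics of each policy:

```shell
# Approximate replicas-per-primary for each redundancyPolicy value, given the
# number of Elasticsearch data nodes. Illustrative only, not the operator's code.
replicas_for() {
  case "$1" in
    ZeroRedundancy)     echo 0 ;;
    SingleRedundancy)   echo 1 ;;
    MultipleRedundancy) echo $(( ($2 - 1) / 2 )) ;;  # roughly half the nodes
    FullRedundancy)     echo $(( $2 - 1 )) ;;        # a copy on every other node
  esac
}
replicas_for FullRedundancy 3
replicas_for SingleRedundancy 3
```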
12.5.5. Elasticsearch Node Disk Flood Watermark Reached
Elasticsearch enforces a read-only index block on every index that has both of these conditions:
- One or more shards are allocated to the node.
- One or more disks exceed the flood stage.
Troubleshooting
Check the disk space of the Elasticsearch node:

$ for pod in `oc -n openshift-logging get po -l component=elasticsearch -o jsonpath='{.items[*].metadata.name}'`; do echo $pod; oc -n openshift-logging exec -c elasticsearch $pod -- df -h /elasticsearch/persistent; done

Check the nodes.node_name.fs field to determine the free disk space on that node.

- If the used disk percentage is above 95%, it signifies that the node has crossed the flood watermark. Writing is blocked for shards allocated on this particular node.
- Try to increase the disk space on all nodes.
- If increasing the disk space is not possible, try adding a new data node to the cluster.
If adding a new data node is problematic, decrease the total cluster redundancy policy.
Check the current redundancyPolicy:

$ oc -n openshift-logging get es elasticsearch -o jsonpath='{.spec.redundancyPolicy}'

Note: If you are using a ClusterLogging CR, enter:

$ oc -n openshift-logging get cl -o jsonpath='{.items[*].spec.logStore.elasticsearch.redundancyPolicy}'

- If the cluster redundancyPolicy is higher than SingleRedundancy, set it to SingleRedundancy and save this change.
If the preceding steps do not fix the issue, delete the old indices.

- Check the status of all indices on Elasticsearch:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- indices

- Identify an old index that can be deleted.
- Delete the index:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name> -X DELETE
Continue freeing up and monitoring the disk space until the used disk space drops below 90%. Then, unblock writes to this particular node:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=_all/_settings?pretty -X PUT -d '{"index.blocks.read_only_allow_delete": null}'

Setting index.blocks.read_only_allow_delete to null removes the read-only block that Elasticsearch applied when the flood stage was reached.
12.5.6. Elasticsearch JVM Heap Use is High
The Elasticsearch node JVM Heap memory used is above 75%.
Troubleshooting
Consider increasing the heap size.
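The current heap use is reported by the Elasticsearch node stats API in the jvm.mem section (heap_used_in_bytes and heap_max_in_bytes). The calculation behind the 75% threshold is simple; the helper below is a hypothetical sketch of it, not part of the es_util tooling:

```shell
# Hypothetical helper: compute the JVM heap use percentage from the
# heap_used_in_bytes and heap_max_in_bytes values reported by the
# Elasticsearch _nodes/stats API (jvm.mem section).
heap_pct() {
  awk -v used="$1" -v max="$2" 'BEGIN { printf "%d\n", (used * 100) / max }'
}

# 3.2 GB used of a 4 GB heap is 80%, above the 75% alert threshold.
heap_pct 3200000000 4000000000
```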
12.5.7. Aggregated Logging System CPU is High
System CPU usage on the node is high.
Troubleshooting
Check the CPU of the cluster node. Consider allocating more CPU resources to the node.
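Assuming cluster metrics are available, oc adm top can help identify whether the load is node-wide or comes from the logging pods; this is a general sketch, not specific to this alert:

```
$ oc adm top nodes
$ oc adm top pods -n openshift-logging
```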
12.5.8. Elasticsearch Process CPU is High
Elasticsearch process CPU usage on the node is high.
Troubleshooting
Check the CPU of the cluster node. Consider allocating more CPU resources to the node.
12.5.9. Elasticsearch Disk Space is Running Low
The Elasticsearch Cluster is predicted to be out of disk space within the next 6 hours based on current disk usage.
Troubleshooting
Get the disk space of the Elasticsearch nodes:

$ for pod in `oc -n openshift-logging get po -l component=elasticsearch -o jsonpath='{.items[*].metadata.name}'`; do echo $pod; oc -n openshift-logging exec -c elasticsearch $pod -- df -h /elasticsearch/persistent; done

- In the command output, check the free disk space on each node.
- Try to increase the disk space on all nodes.
- If increasing the disk space is not possible, try adding a new data node to the cluster.
- If adding a new data node is problematic, decrease the total cluster redundancy policy.
Check the current redundancyPolicy:

$ oc -n openshift-logging get es elasticsearch -o jsonpath='{.spec.redundancyPolicy}'

Note: If you are using a ClusterLogging CR, enter:

$ oc -n openshift-logging get cl -o jsonpath='{.items[*].spec.logStore.elasticsearch.redundancyPolicy}'

- If the cluster redundancyPolicy is higher than SingleRedundancy, set it to SingleRedundancy and save this change.
If the preceding steps do not fix the issue, delete the old indices.

- Check the status of all indices on Elasticsearch:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- indices

- Identify an old index that can be deleted.
- Delete the index:

$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- es_util --query=<elasticsearch_index_name> -X DELETE
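The disk check in the first step boils down to comparing the Use% column of the df output against a watermark threshold. A hypothetical helper for that comparison, not part of any OpenShift tooling:

```shell
# Hypothetical helper: succeed when a df "Use%" value meets or exceeds
# a threshold, for example the 85% high watermark or 95% flood stage.
disk_over() {
  local pct=${1%\%}   # strip the trailing % sign from the df column
  [ "$pct" -ge "$2" ]
}

if disk_over "91%" 85; then
  echo "disk usage above threshold"
fi
```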
12.5.10. Elasticsearch FileDescriptor Usage is high
Based on current usage trends, the predicted number of file descriptors on the node is insufficient.
Troubleshooting
Check and, if needed, configure the value of max_file_descriptors for each node, as described in the Elasticsearch File descriptors topic.
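As a sketch, the per-node maximum can be read back through the Elasticsearch node stats API, which reports max_file_descriptors in its process section; the exact es_util invocation below is an assumption modeled on the other commands in this section:

```
$ oc exec -n openshift-logging -c elasticsearch <elasticsearch_pod_name> -- \
    es_util --query=_nodes/stats/process?pretty -X GET
```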