9.3. Understanding cluster logging alerts
All of the logging collector alerts are listed on the Alerting UI of the OpenShift Container Platform web console.
9.3.1. Viewing logging collector alerts
Alerts are shown in the OpenShift Container Platform web console, on the Alerts tab of the Alerting UI. Alerts are in one of the following states:
- Firing. The alert condition is true for the duration of the timeout. Click the Options menu at the end of the firing alert to view more information or silence the alert.
- Pending. The alert condition is currently true, but the timeout has not been reached.
- Not Firing. The alert is not currently triggered.
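The three states above can be read as a small decision rule. The following is a minimal sketch of that rule, not the platform's implementation; the function name `alert_state` and the 10-minute timeout are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def alert_state(active_since: Optional[datetime], now: datetime, timeout: timedelta) -> str:
    """Classify an alert from the time its condition became true.

    active_since is None when the condition is currently false.
    """
    if active_since is None:
        return "Not Firing"
    if now - active_since >= timeout:
        # The condition has held for the whole timeout.
        return "Firing"
    # The condition is true, but the timeout has not been reached.
    return "Pending"


# Illustrative values only (assumed, not taken from a real cluster).
now = datetime(2024, 1, 1, 12, 0)
timeout = timedelta(minutes=10)

print(alert_state(None, now, timeout))
print(alert_state(now - timedelta(minutes=3), now, timeout))
print(alert_state(now - timedelta(minutes=15), now, timeout))
```

In this sketch an alert whose condition has been true for exactly the timeout is treated as firing; that boundary choice is an assumption.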
Procedure
To view cluster logging and other OpenShift Container Platform alerts:
- In the OpenShift Container Platform console, click Monitoring → Alerting.
- Click the Alerts tab. The alerts are listed, based on the filters selected.
Additional resources
- For more information on the Alerting UI, see Managing cluster alerts.
9.3.2. About logging collector alerts
The following alerts are generated by the logging collector. You can view these alerts in the OpenShift Container Platform web console, on the Alerts page of the Alerting UI.
Alert | Message | Description | Severity |
---|---|---|---|
| | | Fluentd is reporting a higher number of issues than the specified number, default 10. | Critical |
| | | Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance. | Critical |
| | | Fluentd is reporting that it is overwhelmed. | Warning |
| | | Fluentd is reporting queue usage issues. | Critical |
9.3.3. About Elasticsearch alerting rules
You can view these alerting rules in Prometheus.
Alert | Description | Severity |
---|---|---|
ElasticsearchClusterNotHealthy | Cluster health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn’t been elected yet. | critical |
ElasticsearchClusterNotHealthy | Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated. | warning |
ElasticsearchBulkRequestsRejectionJumps | High Bulk Rejection Ratio at node in cluster. This node may not be keeping up with the indexing speed. | warning |
ElasticsearchNodeDiskWatermarkReached | Disk Low Watermark Reached at node in cluster. Shards cannot be allocated to this node anymore. You should consider adding more disk space to the node. | alert |
ElasticsearchNodeDiskWatermarkReached | Disk High Watermark Reached at node in cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node. | high |
ElasticsearchJVMHeapUseHigh | JVM Heap usage on the node in cluster is <value> | alert |
AggregatedLoggingSystemCPUHigh | System CPU usage on the node in cluster is <value> | alert |
ElasticsearchProcessCPUHigh | ES process CPU usage on the node in cluster is <value> | alert |
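If you export alert data from Prometheus, the rules in the table can be picked out by name and grouped by severity. The sketch below assumes a payload shaped like the response of the Prometheus HTTP API alerts endpoint (`/api/v1/alerts`). The sample payload is invented for illustration, and the helper name `group_logging_alerts` is hypothetical. How you reach Prometheus in your cluster is not covered here.

```python
import json
from collections import defaultdict

# Alert names taken from the table above.
ELASTICSEARCH_ALERTS = {
    "ElasticsearchClusterNotHealthy",
    "ElasticsearchBulkRequestsRejectionJumps",
    "ElasticsearchNodeDiskWatermarkReached",
    "ElasticsearchJVMHeapUseHigh",
    "AggregatedLoggingSystemCPUHigh",
    "ElasticsearchProcessCPUHigh",
}

# Invented sample data (assumption), shaped like a Prometheus alerts response.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "alerts": [
      {"labels": {"alertname": "ElasticsearchClusterNotHealthy", "severity": "critical"},
       "state": "firing"},
      {"labels": {"alertname": "ElasticsearchClusterNotHealthy", "severity": "warning"},
       "state": "pending"},
      {"labels": {"alertname": "SomeUnrelatedAlert", "severity": "warning"},
       "state": "firing"}
    ]
  }
}
"""


def group_logging_alerts(payload: dict) -> dict:
    """Group known Elasticsearch alerts by severity as (name, state) pairs."""
    grouped = defaultdict(list)
    for alert in payload.get("data", {}).get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "")
        if name in ELASTICSEARCH_ALERTS:
            severity = labels.get("severity", "unknown")
            grouped[severity].append((name, alert.get("state")))
    return dict(grouped)


for severity, alerts in group_logging_alerts(json.loads(SAMPLE_RESPONSE)).items():
    print(severity, alerts)
```

Matching on an explicit set of names rather than a prefix keeps `AggregatedLoggingSystemCPUHigh` in scope, since it does not start with `Elasticsearch`.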