Chapter 11. Logging alerts

11.1. Default logging alerts

Logging alerts are installed as part of the Red Hat OpenShift Logging Operator installation. Alerts depend on metrics exported by the log collection and log storage backends. These metrics are enabled if you selected the Enable operator recommended cluster monitoring on this namespace option when you installed the Red Hat OpenShift Logging Operator. For more information about installing logging Operators, see Installing logging using the web console.

Default logging alerts are sent to the OpenShift Container Platform monitoring stack Alertmanager in the openshift-monitoring namespace, unless you have disabled the local Alertmanager instance.
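
As an illustration of working with these alerts outside the web console, the following Python sketch filters a list of alerts down to the logging alerts named in this chapter. It assumes the input is shaped like the JSON list returned by the Alertmanager v2 API (`GET /api/v2/alerts`), where each alert carries a `labels` object. The sample payload, the `filter_logging_alerts` function, and the `LOGGING_ALERT_NAMES` set are hypothetical and are not part of the logging product.

```python
# Alert names taken from the collector alert tables in this chapter.
LOGGING_ALERT_NAMES = {
    "CollectorNodeDown",
    "CollectorHighErrorRate",
    "CollectorVeryHighErrorRate",
    "FluentdQueueLengthIncreasing",
}


def filter_logging_alerts(alerts, names=LOGGING_ALERT_NAMES):
    """Return only the alerts whose `alertname` label is in `names`."""
    return [a for a in alerts if a.get("labels", {}).get("alertname") in names]


# Hypothetical sample data, shaped like an Alertmanager v2 alert list.
SAMPLE_ALERTS = [
    {"labels": {"alertname": "CollectorNodeDown", "severity": "critical"}},
    {"labels": {"alertname": "KubePodCrashLooping", "severity": "warning"}},
    {"labels": {"alertname": "CollectorHighErrorRate", "severity": "critical"}},
]

if __name__ == "__main__":
    for alert in filter_logging_alerts(SAMPLE_ALERTS):
        labels = alert["labels"]
        print(f'{labels["alertname"]}: {labels["severity"]}')
```

Fetching the alert list from a live cluster requires an authenticated request to the Alertmanager route, which is omitted here.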

11.1.1. Accessing the Alerting UI in the Administrator and Developer perspectives

The Alerting UI is accessible through the Administrator perspective and the Developer perspective of the OpenShift Container Platform web console.

In the Administrator perspective, go to Observe → Alerting. The three main pages in the Alerting UI in this perspective are the Alerts, Silences, and Alerting rules pages.

In the Developer perspective, go to Observe → <project_name> → Alerts. In this perspective, alerts, silences, and alerting rules are all managed from the Alerts page. The results shown on the Alerts page are specific to the selected project.

Note

In the Developer perspective, you can select from core OpenShift Container Platform and user-defined projects that you have access to in the Project: <project_name> list. However, alerts, silences, and alerting rules relating to core OpenShift Container Platform projects are not displayed if you do not have cluster-admin privileges.

11.1.2. Logging collector alerts

In logging 5.8 and later versions, the following alerts are generated by the Red Hat OpenShift Logging Operator. You can view these alerts in the OpenShift Container Platform web console.

Alert Name	Message	Description	Severity
CollectorNodeDown	Prometheus could not scrape `namespace`/`pod` collector component for more than 10m.	Collector cannot be scraped.	Critical
CollectorHighErrorRate	`value`% of records have resulted in an error by `namespace`/`pod` collector component.	`namespace`/`pod` collector component errors are high.	Critical
CollectorVeryHighErrorRate	`value`% of records have resulted in an error by `namespace`/`pod` collector component.	`namespace`/`pod` collector component errors are very high.	Critical

11.1.3. Vector collector alerts

In logging 5.7 and later versions, the following alerts are generated by the Vector collector. You can view these alerts in the OpenShift Container Platform web console.

Table 11.1. Vector collector alerts
Alert	Message	Description	Severity
`CollectorHighErrorRate`	`<value> of records have resulted in an error by vector <instance>.`	The number of Vector output errors is high, by default more than 10 in the previous 15 minutes.	Warning
`CollectorNodeDown`	`Prometheus could not scrape vector <instance> for more than 10m.`	Vector is reporting that Prometheus could not scrape a specific Vector instance.	Critical
`CollectorVeryHighErrorRate`	`<value> of records have resulted in an error by vector <instance>.`	The number of Vector component errors is very high, by default more than 25 in the previous 15 minutes.	Critical
`FluentdQueueLengthIncreasing`	`In the last 1h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.`	Fluentd is reporting that the queue size is increasing.	Warning
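
The two error-rate thresholds in Table 11.1 can be read as a tiered check. The following Python sketch is illustrative only: it assumes that the thresholds are percentages of records, which the table does not state explicitly, and the function and constant names are hypothetical. It is not the rule definition that the Operator ships.

```python
# Default thresholds from Table 11.1, evaluated over the previous 15 minutes.
# Treating them as percentages of records is an assumption.
HIGH_ERROR_THRESHOLD = 10.0
VERY_HIGH_ERROR_THRESHOLD = 25.0


def classify_error_rate(error_percent):
    """Return the most severe matching alert name, or None if no alert applies."""
    if error_percent > VERY_HIGH_ERROR_THRESHOLD:
        return "CollectorVeryHighErrorRate"
    if error_percent > HIGH_ERROR_THRESHOLD:
        return "CollectorHighErrorRate"
    return None


if __name__ == "__main__":
    for sample in (5.0, 12.5, 40.0):
        print(sample, classify_error_rate(sample))
```

Both comparisons are strict because the table describes the thresholds as "more than" 10 and 25. The sketch reports only the most severe tier that matches.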

11.1.4. Fluentd collector alerts

The following alerts are generated by the legacy Fluentd log collector. You can view these alerts in the OpenShift Container Platform web console.

Table 11.2. Fluentd collector alerts
Alert	Message	Description	Severity
`FluentDHighErrorRate`	`<value> of records have resulted in an error by fluentd <instance>.`	The number of Fluentd output errors is high, by default more than 10 in the previous 15 minutes.	Warning
`FluentdNodeDown`	`Prometheus could not scrape fluentd <instance> for more than 10m.`	Fluentd is reporting that Prometheus could not scrape a specific Fluentd instance.	Critical
`FluentdQueueLengthIncreasing`	`In the last 1h, fluentd <instance> buffer queue length constantly increased more than 1. Current value is <value>.`	Fluentd is reporting that the queue size is increasing.	Warning
`FluentDVeryHighErrorRate`	`<value> of records have resulted in an error by fluentd <instance>.`	The number of Fluentd output errors is very high, by default more than 25 in the previous 15 minutes.	Critical

11.1.5. Elasticsearch alerting rules

You can view these alerting rules in the OpenShift Container Platform web console.

Table 11.3. Alerting rules
Alert	Description	Severity
`ElasticsearchClusterNotHealthy`	The cluster health status has been RED for at least 2 minutes. The cluster does not accept writes, shards may be missing, or the master node hasn’t been elected yet.	Critical
`ElasticsearchClusterNotHealthy`	The cluster health status has been YELLOW for at least 20 minutes. Some shard replicas are not allocated.	Warning
`ElasticsearchDiskSpaceRunningLow`	The cluster is expected to be out of disk space within the next 6 hours.	Critical
`ElasticsearchHighFileDescriptorUsage`	The cluster is predicted to be out of file descriptors within the next hour.	Warning
`ElasticsearchJVMHeapUseHigh`	The JVM Heap usage on the specified node is high.	Alert
`ElasticsearchNodeDiskWatermarkReached`	The specified node has hit the low watermark due to low free disk space. Shards cannot be allocated to this node anymore. Consider adding more disk space to the node.	Info
`ElasticsearchNodeDiskWatermarkReached`	The specified node has hit the high watermark due to low free disk space. Some shards will be re-allocated to different nodes if possible. Add more disk space to the node or drop old indices allocated to this node.	Warning
`ElasticsearchNodeDiskWatermarkReached`	The specified node has hit the flood watermark due to low free disk space. A read-only block is enforced on every index that has a shard allocated on this node. The index block must be manually released when the disk use falls below the high watermark.	Critical
`ElasticsearchJVMHeapUseHigh`	The JVM Heap usage on the specified node is too high.	Alert
`ElasticsearchWriteRequestsRejectionJumps`	Elasticsearch is experiencing an increase in write rejections on the specified node. This node might not be keeping up with the indexing speed.	Warning
`AggregatedLoggingSystemCPUHigh`	The CPU used by the system on the specified node is too high.	Alert
`ElasticsearchProcessCPUHigh`	The CPU used by Elasticsearch on the specified node is too high.	Alert
