Chapter 1. Default logging alerts
Logging alerts are installed as part of the Red Hat OpenShift Logging Operator installation. Alerts depend on metrics exported by the log collection and log storage backends. These metrics are enabled if you selected the option to Enable Operator recommended cluster monitoring on this namespace when installing the Red Hat OpenShift Logging Operator.
Default logging alerts are sent to the OpenShift Container Platform monitoring stack Alertmanager in the openshift-monitoring namespace, unless you have disabled the local Alertmanager instance.
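If you want to check these alerts from a script instead of the web console, the local Alertmanager exposes a v2 REST API. The following is a minimal sketch under stated assumptions: the route host, the token, and the openshift-logging namespace filter are placeholders for illustration, not values from this chapter.

```python
# Minimal sketch: list alerts from the logging namespace that are currently
# registered with the local Alertmanager. The route host, token, and the
# `openshift-logging` namespace are assumptions; substitute your own values
# (for example, `oc get route alertmanager-main -n openshift-monitoring`
# and `oc whoami -t`).
import requests

ALERTMANAGER = "https://alertmanager-main-openshift-monitoring.apps.example.com"  # assumed route host
TOKEN = "sha256~REPLACE_WITH_YOUR_TOKEN"  # assumed token

resp = requests.get(
    f"{ALERTMANAGER}/api/v2/alerts",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"filter": 'namespace="openshift-logging"'},
    verify=False,  # cluster CA not wired up in this sketch
)
resp.raise_for_status()

for alert in resp.json():
    labels = alert["labels"]
    print(labels.get("alertname"), labels.get("severity"), alert["status"]["state"])
```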
1.1. Accessing the Alerting UI from the Administrator perspective
You can access the Alerting user interface (UI) through the Administrator perspective of the OpenShift Container Platform web console.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
Procedure
- From the Administrator perspective, go to Observe → Alerting. The three main pages in the Alerting UI in this perspective are the Alerts, Silences, and Alerting rules pages.
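The rule definitions shown on the Alerting rules page are also available from the in-cluster monitoring API. The following is a minimal sketch, not part of the product procedure: the Thanos Querier route host and the token are assumptions that you must replace with values from your cluster.

```python
# Minimal sketch: list alerting rules visible to the monitoring stack and
# print logging-related ones. Route host and token are assumptions; replace
# them (for example, `oc get route thanos-querier -n openshift-monitoring`
# and `oc whoami -t`).
import requests

THANOS_QUERIER = "https://thanos-querier-openshift-monitoring.apps.example.com"  # assumed route host
TOKEN = "sha256~REPLACE_WITH_YOUR_TOKEN"  # assumed token

resp = requests.get(
    f"{THANOS_QUERIER}/api/v1/rules",
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # cluster CA not wired up in this sketch
)
resp.raise_for_status()

for group in resp.json()["data"]["groups"]:
    for rule in group["rules"]:
        if rule.get("type") != "alerting":
            continue
        name = rule["name"]
        if "loki" in name.lower() or "collector" in name.lower():
            print(name, rule.get("labels", {}).get("severity"))
```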
1.2. Red Hat OpenShift Logging Operator alerts
The following alerts are generated by the Vector collector. You can view these alerts in the OpenShift Container Platform web console.
| Alert | Message | Description | Severity |
| --- | --- | --- | --- |
| | | Vector is reporting that Prometheus could not scrape a specific Vector instance. | Critical |
| | | Collectors are consuming too much node disk on the host. | Warning |
| | | At least 10% of sent requests responded with "HTTP 403 Forbidden" for collector "<instance>" in namespace <namespace> for the output "<output>". | Critical |
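The collector alerts in the preceding table are defined in PrometheusRule custom resources created by the Operator. As a minimal sketch, assuming the default openshift-logging namespace and the kubernetes Python client, you can list those rule definitions directly:

```python
# Minimal sketch: print the alert names and severities defined in the
# PrometheusRule objects of the logging namespace. Assumes a kubeconfig with
# read access and that the Operator runs in `openshift-logging`.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

api = client.CustomObjectsApi()
rules = api.list_namespaced_custom_object(
    group="monitoring.coreos.com",
    version="v1",
    namespace="openshift-logging",
    plural="prometheusrules",
)

for item in rules["items"]:
    for group in item["spec"].get("groups", []):
        for rule in group.get("rules", []):
            if "alert" in rule:
                print(item["metadata"]["name"], rule["alert"],
                      rule.get("labels", {}).get("severity"))
```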
1.3. Loki Operator alerts
The following alerts are generated by the Loki Operator. You can view these alerts in the OpenShift Container Platform web console.
| Alert | Message | Description | Severity |
| --- | --- | --- | --- |
| | | At least 10% of requests result in 5xx server errors. | critical |
| | | At least 10% of write requests to the lokistack-gateway result in 5xx server errors. | critical |
| | | At least 10% of query requests to the lokistack-gateway result in 5xx server errors. | critical |
| | | A panic was triggered. | critical |
| | | The 99th percentile latency is higher than 1 second. | critical |
| | | At least 10% of requests receive the rate limit error code. | warning |
| | | The storage path is experiencing slow read response rates. | warning |
| | | The write path is experiencing high load, causing backpressure storage flushing. | warning |
| | | The read path has a high volume of queries, causing longer response times. | warning |
| | | Loki is discarding samples during ingestion because they fail validation. | warning |
| | | One or more of the deployed LokiStacks contains an outdated storage schema configuration. | warning |
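Warning-level alerts such as the storage and load alerts above can be silenced during planned maintenance. The following is a minimal sketch against the Alertmanager v2 silences API; the route host, token, alert name, and namespace are placeholders/assumptions, not values taken from this chapter.

```python
# Minimal sketch: silence a warning-level logging alert for two hours through
# the Alertmanager v2 silences API, for example during planned object storage
# maintenance. Route host, token, and the alert name are placeholders.
from datetime import datetime, timedelta, timezone

import requests

ALERTMANAGER = "https://alertmanager-main-openshift-monitoring.apps.example.com"  # assumed route host
TOKEN = "sha256~REPLACE_WITH_YOUR_TOKEN"  # assumed token

now = datetime.now(timezone.utc)
silence = {
    "matchers": [
        {"name": "alertname", "value": "REPLACE_WITH_ALERT_NAME", "isRegex": False},
        {"name": "namespace", "value": "openshift-logging", "isRegex": False},  # assumed namespace
    ],
    "startsAt": now.isoformat(),
    "endsAt": (now + timedelta(hours=2)).isoformat(),
    "createdBy": "logging-admin",
    "comment": "Planned maintenance window",
}

resp = requests.post(
    f"{ALERTMANAGER}/api/v2/silences",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=silence,
    verify=False,  # cluster CA not wired up in this sketch
)
resp.raise_for_status()
print("Created silence:", resp.json()["silenceID"])
```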