5.2. 日志转发故障排除
5.2.1. 重新部署 Fluentd pod 复制链接链接已复制到粘贴板!
当您创建 ClusterLogForwarder
自定义资源 (CR) 时,如果 Red Hat OpenShift Logging Operator 没有自动重新部署 Fluentd Pod,您可以删除 Fluentd Pod 来强制重新部署它们。
先决条件
-
您已创建了
ClusterLogForwarder
自定义资源 (CR) 对象。
流程
运行以下命令,删除 Fluentd pod 以强制重新部署:
oc delete pod --selector logging-infra=collector
$ oc delete pod --selector logging-infra=collector
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
5.2.2. Loki 速率限制错误故障排除 复制链接链接已复制到粘贴板!
如果 Log Forwarder API 将超过速率限制的大量信息转发到 Loki,Loki 会生成速率限制(429
)错误。
这些错误可能会在正常操作过程中发生。例如,当将 logging 添加到已具有某些日志的集群中时,logging 会尝试充分利用现有日志条目时可能会出现速率限制错误。在这种情况下,如果添加新日志的速度小于总速率限值,历史数据最终会被处理,并且不要求用户干预即可解决速率限制错误。
如果速率限制错误持续发生,您可以通过修改 LokiStack
自定义资源(CR)来解决此问题。
LokiStack
CR 在 Grafana 托管的 Loki 上不可用。本主题不适用于 Grafana 托管的 Loki 服务器。
Conditions
- Log Forwarder API 配置为将日志转发到 Loki。
您的系统向 Loki 发送大于 2 MB 的消息块。例如:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 输入
oc logs -n openshift-logging -l component=collector
后,集群中的收集器日志会显示包含以下错误消息之一的行:429 Too Many Requests Ingestion rate limit exceeded
429 Too Many Requests Ingestion rate limit exceeded
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Vector 错误消息示例
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Fluentd 错误消息示例
2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"
2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow 在接收结束时也会看到这个错误。例如,在 LokiStack ingester pod 中:
Loki ingester 错误消息示例
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
流程
更新
LokiStack
CR 中的ingestionBurstSize
和ingestionRate
字段:Copy to Clipboard Copied! Toggle word wrap Toggle overflow