6.2. Troubleshooting log forwarding
6.2.1. Redeploying Fluentd pods
When you create a ClusterLogForwarder custom resource (CR), if the Red Hat OpenShift Logging Operator does not redeploy the Fluentd pods automatically, you can delete the Fluentd pods to force them to redeploy.
Prerequisites
- You have created a ClusterLogForwarder custom resource (CR) object.
Procedure
- Delete the Fluentd pods to force them to redeploy by running the following command:

    $ oc delete pod --selector logging-infra=collector
6.2.2. Troubleshooting Loki rate limit errors
If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.
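These errors can occur during normal operation. For example, when you add logging to a cluster that already has some logs, rate limit errors might occur while logging tries to ingest all of the existing log entries. In this case, if the rate at which new logs are added is lower than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without user intervention.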
If the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).
The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.
Conditions
- The Log Forwarder API is configured to forward logs to Loki.
- Your system sends a block of messages that is larger than 2 MB to Loki.
- After you run oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:

    429 Too Many Requests Ingestion rate limit exceeded

  Example Vector error message

    2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

  Example Fluentd error message

    2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"

- The error is also visible on the receiving end. For example, in the LokiStack ingester pod:

  Example Loki ingester error message

    level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
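The collector-log check described in the conditions can be scripted. The following is a minimal sketch and not part of the documented procedure: count_rate_limit_errors is a hypothetical helper name, and the only assumption it makes is that rate limit errors contain the literal text 429 Too Many Requests, as the sample messages do.

```shell
# Hypothetical helper (not from the product documentation): reads log text
# on stdin and prints how many lines contain the 429 rate limit error text.
count_rate_limit_errors() {
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the function from reporting a failure.
  grep -c '429 Too Many Requests' || true
}

# Example against a live cluster (requires an authenticated oc session):
#   oc logs -n openshift-logging -l component=collector | count_rate_limit_errors
```

A count that stays above zero across repeated runs suggests that the errors are persistent rather than a temporary backlog of historical logs.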
Procedure
- Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR.
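One possible way to apply this update from the command line is a JSON merge patch with oc patch. The sketch below is an illustration under stated assumptions, not the documented procedure: the resource name logging-loki, the openshift-logging namespace, the spec.limits.global.ingestion field path, the MB units, and the values 16 and 8 are all assumptions to verify against your own LokiStack CR. build_ingestion_patch is a hypothetical helper.

```shell
# Hypothetical helper: prints a JSON merge patch that sets the two ingestion
# limits. $1 = ingestionBurstSize, $2 = ingestionRate (both assumed to be MB).
build_ingestion_patch() {
  printf '{"spec":{"limits":{"global":{"ingestion":{"ingestionBurstSize":%d,"ingestionRate":%d}}}}}\n' "$1" "$2"
}

# Print the patch so that it can be reviewed before it is applied.
build_ingestion_patch 16 8

# To apply it (assumed resource name and namespace; requires an oc session):
#   oc patch lokistack logging-loki -n openshift-logging --type merge \
#     -p "$(build_ingestion_patch 16 8)"
```

Printing the patch first keeps the change reviewable. The Fluentd sample message earlier in this section reports a limit of 4194304 bytes/sec against a push of 7820025 bytes, which gives a sense of the margin that the new values need to cover.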