主页
产品
OpenShift Container Platform
4.11
日志记录
10.3. 配置 LokiStack 日志存储

10.3. 配置 LokiStack 日志存储

在日志记录文档中，LokiStack 指的是 Loki 和 Web 代理与 OpenShift Container Platform 身份验证集成的日志记录组合。LokiStack 的代理使用 OpenShift Container Platform 身份验证来强制实施多租户。Loki 将日志存储指代为单个组件或外部存储。

10.3.1. 为 cluster-admin 用户角色创建新组
复制链接

重要

以 cluster-admin 用户身份查询多个命名空间的应用程序日志，其中集群中所有命名空间的字符总和大于 5120，会导致错误 Parse error: input size too long (XXXX > 5120)。为了更好地控制 LokiStack 中日志的访问，请使 cluster-admin 用户成为 cluster-admin 组的成员。如果 cluster-admin 组不存在，请创建它并将所需的用户添加到其中。

使用以下步骤为具有 cluster-admin 权限的用户创建新组。

流程

输入以下命令创建新组：
```
oc adm groups new cluster-admin
```
```
$ oc adm groups new cluster-admin
```
Copy to Clipboard Toggle word wrap
输入以下命令将所需的用户添加到 cluster-admin 组中：
```
oc adm groups add-users cluster-admin <username>
```
```
$ oc adm groups add-users cluster-admin <username>
```
Copy to Clipboard Toggle word wrap

输入以下命令在组中添加 cluster-admin 用户角色：

oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

Copy to Clipboard

Toggle word wrap

10.3.2. 使用 Loki 启用基于流的保留
复制链接

使用日志记录版本 5.6 及更高版本，您可以根据日志流配置保留策略。这些规则可全局设置，每个租户或两个都设置。如果同时配置这两个，则租户规则会在全局规则之前应用。

要启用基于流的保留，请创建一个 LokiStack 自定义资源(CR)：

全局的基于流的保留示例

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global: 
      retention: 
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}' 
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:


      retention:


        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}'


        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging

Copy to Clipboard

Toggle word wrap

1: 为所有日志流设置保留策略。注：此字段不会影响存储在对象存储中的保留周期。
2: 当此块被添加到 CR 时，集群中会启用保留。
3: 包含用于定义日志流的 LogQL 查询。

针对一个租户的基于流的保留示例

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants: 
      application:
        retention:
          days: 1
          streams:
            - days: 4
              selector: '{kubernetes_namespace_name=~"test.+"}' 
      infrastructure:
        retention:
          days: 5
          streams:
            - days: 1
              selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants:


      application:
        retention:
          days: 1
          streams:
            - days: 4
              selector: '{kubernetes_namespace_name=~"test.+"}'


      infrastructure:
        retention:
          days: 5
          streams:
            - days: 1
              selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v11
    secret:
      name: logging-loki-s3
      type: aws
  storageClassName: standard
  tenants:
    mode: openshift-logging

Copy to Clipboard

Toggle word wrap

1: 根据租户设置保留策略。有效的租户类型是 application、audit 和 infrastructure。
2: 包含用于定义日志流的 LogQL 查询。

应用 LokiStack CR：
```
oc apply -f <filename>.yaml
```
```
$ oc apply -f <filename>.yaml
```
Copy to Clipboard Toggle word wrap

注意

这不适用于为存储的日志管理保留。使用对象存储配置存储在支持的最大 30 天的全局保留周期。

10.3.3. Loki 速率限制错误故障排除
复制链接

如果 Log Forwarder API 将超过速率限制的大量信息转发到 Loki，Loki 会生成速率限制(429)错误。

这些错误可能会在正常操作过程中发生。例如，当将 logging 添加到已具有某些日志的集群中时，logging 会尝试充分利用现有日志条目时可能会出现速率限制错误。在这种情况下，如果添加新日志的速度小于总速率限值，历史数据最终会被处理，并且不要求用户干预即可解决速率限制错误。

如果速率限制错误持续发生，您可以通过修改 LokiStack 自定义资源(CR)来解决此问题。

重要

LokiStack CR 在 Grafana 托管的 Loki 上不可用。本主题不适用于 Grafana 托管的 Loki 服务器。

Conditions

Log Forwarder API 配置为将日志转发到 Loki。

您的系统向 Loki 发送大于 2 MB 的消息块。例如：

"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}

"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}

Copy to Clipboard

Toggle word wrap

输入 oc logs -n openshift-logging -l component=collector 后，集群中的收集器日志会显示包含以下错误消息之一的行：

429 Too Many Requests Ingestion rate limit exceeded

429 Too Many Requests Ingestion rate limit exceeded

Copy to Clipboard

Toggle word wrap

Vector 错误消息示例

2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

Copy to Clipboard

Toggle word wrap

Fluentd 错误消息示例

2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"

2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"

Copy to Clipboard

Toggle word wrap

在接收结束时也会看到这个错误。例如，在 LokiStack ingester pod 中：

Loki ingester 错误消息示例

level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream

level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream

Copy to Clipboard

Toggle word wrap

流程

更新 LokiStack CR 中的 ingestionBurstSize 和 ingestionRate 字段：
```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16 
        ingestionRate: 8 
# ...
```
```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16 
```
1
```
        ingestionRate: 8 
```
2
```
# ...
```
Copy to Clipboard Toggle word wrap
1
ingestionBurstSize 字段定义每个经销商副本的最大本地速率限制示例大小（以 MB 为单位）。这个值是一个硬限制。将此值设置为至少在单个推送请求中预期的最大日志大小。不允许大于 ingestionBurstSize 值的单个请求。
2
ingestionRate 字段是每秒最大最大样本量的软限制（以 MB 为单位）。如果日志速率超过限制，则会出现速率限制错误，但收集器会重试发送日志。只要总平均值低于限制，系统就会在没有用户干预的情况下解决错误。

返回顶部

10.3. 配置 LokiStack 日志存储

10.3.1. 为 cluster-admin 用户角色创建新组
复制链接

10.3.2. 使用 Loki 启用基于流的保留
复制链接

10.3.3. Loki 速率限制错误故障排除
复制链接

学习

尝试、购买和销售

社区

关于红帽文档

让开源更具包容性

關於紅帽

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

10.3. 配置 LokiStack 日志存储

10.3.1. 为 cluster-admin 用户角色创建新组复制链接链接已复制到粘贴板!

10.3.2. 使用 Loki 启用基于流的保留复制链接链接已复制到粘贴板!

10.3.3. Loki 速率限制错误故障排除复制链接链接已复制到粘贴板!

学习

尝试、购买和销售

社区

关于红帽文档

让开源更具包容性

關於紅帽

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

10.3.1. 为 cluster-admin 用户角色创建新组
复制链接

10.3.2. 使用 Loki 启用基于流的保留
复制链接

10.3.3. Loki 速率限制错误故障排除
复制链接