Chapter 2. Configuring the log store


You can configure a LokiStack custom resource (CR) to store application, audit, and infrastructure-related logs.

Loki is a horizontally scalable, highly available, multi-tenant log aggregation system offered as a GA log store for logging for Red Hat OpenShift that can be visualized with the OpenShift Observability UI. The Loki configuration provided by OpenShift Logging is a short-term log store designed to enable users to perform fast troubleshooting with the collected logs. For that purpose, the logging for Red Hat OpenShift configuration of Loki has short-term storage, and is optimized for very recent queries.

Important

For long-term storage or queries over a long time period, users should look to log stores external to their cluster. Loki sizing is only tested and supported for short term storage, for a maximum of 30 days.

2.1. Loki deployment sizing

Sizing for Loki follows the format of 1x.<size>, where 1x is the number of instances and <size> specifies performance capabilities.

The 1x.pico configuration defines a single Loki deployment with minimal resource and limit requirements, offering high availability (HA) support for all Loki components. This configuration is suited for deployments that do not require a single replication factor or auto-compaction.

Disk requests are similar across size configurations, allowing customers to test different sizes to determine the best fit for their deployment needs.

Important

It is not possible to change the number 1x for the deployment size.
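
For example, the deployment size is set through the spec.size field of the LokiStack custom resource (CR). The following fragment is a minimal sketch; the full CR is shown later in this chapter:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small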

Table 2.1. Loki sizing

|                                          | 1x.demo       | 1x.pico [6.1+ only] | 1x.extra-small     | 1x.small           | 1x.medium          |
|------------------------------------------|---------------|---------------------|--------------------|--------------------|--------------------|
| Data transfer                            | Demo use only | 50GB/day            | 100GB/day          | 500GB/day          | 2TB/day            |
| Queries per second (QPS)                 | Demo use only | 1-25 QPS at 200ms   | 1-25 QPS at 200ms  | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
| Replication factor                       | None          | 2                   | 2                  | 2                  | 2                  |
| Total CPU requests                       | None          | 7 vCPUs             | 14 vCPUs           | 34 vCPUs           | 54 vCPUs           |
| Total CPU requests if using the ruler    | None          | 8 vCPUs             | 16 vCPUs           | 42 vCPUs           | 70 vCPUs           |
| Total memory requests                    | None          | 17Gi                | 31Gi               | 67Gi               | 139Gi              |
| Total memory requests if using the ruler | None          | 18Gi                | 35Gi               | 83Gi               | 171Gi              |
| Total disk requests                      | 40Gi          | 590Gi               | 430Gi              | 430Gi              | 590Gi              |
| Total disk requests if using the ruler   | 60Gi          | 910Gi               | 750Gi              | 750Gi              | 910Gi              |

2.2. Loki object storage

The Loki Operator supports AWS S3, as well as other S3-compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.

The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>.

The following table shows the type values within the LokiStack custom resource (CR) for each storage provider. For more information, see the section on your storage provider.

Table 2.2. Secret type quick reference

| Storage provider          | Secret type value |
|---------------------------|-------------------|
| AWS                       | s3                |
| Azure                     | azure             |
| Google Cloud              | gcs               |
| Minio                     | s3                |
| OpenShift Data Foundation | s3                |
| Swift                     | swift             |
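
When you create the LokiStack custom resource (CR), the secret type value is set in the storage section. The following fragment is a sketch that uses the s3 type and a secret named logging-loki-aws; both values depend on your storage provider:

spec:
  storage:
    secret:
      name: logging-loki-aws
      type: s3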

2.2.1. AWS storage

You can create an object storage in Amazon Web Services (AWS) to store logs.

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a bucket on AWS.

Procedure

  • Create an object storage secret with the name logging-loki-aws by running the following command:

    $ oc create secret generic logging-loki-aws \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="<aws_bucket_endpoint>" \
      --from-literal=access_key_id="<aws_access_key_id>" \
      --from-literal=access_key_secret="<aws_access_key_secret>" \
      --from-literal=region="<aws_region_of_your_bucket>" \
      --from-literal=forcepathstyle="true"

    where:

      • logging-loki-aws is the name of the secret.
      • forcepathstyle controls the S3 addressing style. AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, which is equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.
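
    You can optionally confirm that the secret exists before referencing it from the LokiStack custom resource (CR). The following check is a sketch that assumes the secret was created in the openshift-logging namespace:

    $ oc get secret logging-loki-aws -n openshift-logging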

2.2.1.1. AWS storage for STS enabled clusters

If your cluster has STS enabled, the Cloud Credential Operator (CCO) supports short-term authentication by using AWS tokens.

You can create the Loki object storage secret manually by running the following command:

$ oc -n openshift-logging create secret generic "logging-loki-aws" \
  --from-literal=bucketnames="<s3_bucket_name>" \
  --from-literal=region="<bucket_region>" \
  --from-literal=audience="<oidc_audience>"

where the audience value is optional; the default value is openshift.

2.2.2. Azure storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a bucket on Azure.

Procedure

  • Create an object storage secret with the name logging-loki-azure by running the following command:

    $ oc create secret generic logging-loki-azure \
      --from-literal=container="<azure_container_name>" \
      --from-literal=environment="<azure_environment>" \
      --from-literal=account_name="<azure_account_name>" \
      --from-literal=account_key="<azure_account_key>"

    Supported values for the environment field are AzureGlobal, AzureChinaCloud, AzureGermanCloud, or AzureUSGovernment.

If your cluster has Microsoft Entra Workload ID enabled, the Cloud Credential Operator (CCO) supports short-term authentication using Workload ID.

You can create the Loki object storage secret manually by running the following command:

$ oc -n openshift-logging create secret generic logging-loki-azure \
  --from-literal=environment="<azure_environment>" \
  --from-literal=account_name="<storage_account_name>" \
  --from-literal=container="<container_name>"

2.2.3. Google Cloud Platform storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a project on Google Cloud Platform (GCP).
  • You created a bucket in the same project.
  • You created a service account in the same project for GCP authentication.

Procedure

  1. Copy the service account credentials received from GCP into a file called key.json.
  2. Create an object storage secret with the name logging-loki-gcs by running the following command:

    $ oc create secret generic logging-loki-gcs \
      --from-literal=bucketname="<bucket_name>" \
      --from-file=key.json="<path/to/key.json>"

2.2.4. Minio storage

You can create an object storage in Minio to store logs.

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You have Minio deployed on your cluster.
  • You created a bucket on Minio.

Procedure

  • Create an object storage secret with the name logging-loki-minio by running the following command:

    $ oc create secret generic logging-loki-minio \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="<minio_bucket_endpoint>" \
      --from-literal=access_key_id="<minio_access_key_id>" \
      --from-literal=access_key_secret="<minio_access_key_secret>" \
      --from-literal=forcepathstyle="true"

    where:

      • logging-loki-minio is the name of the secret.
      • forcepathstyle controls the S3 addressing style. AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, which is equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.

2.2.5. OpenShift Data Foundation storage

You can create an object storage in OpenShift Data Foundation storage to store logs.

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You deployed OpenShift Data Foundation on your cluster.

Procedure

  1. Create an ObjectBucketClaim custom resource in the openshift-logging namespace:

    apiVersion: objectbucket.io/v1alpha1
    kind: ObjectBucketClaim
    metadata:
      name: loki-bucket-odf
      namespace: openshift-logging
    spec:
      generateBucketName: loki-bucket-odf
      storageClassName: openshift-storage.noobaa.io
  2. Get bucket properties from the associated ConfigMap object by running the following command:

    BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}')
    BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}')
    BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
  3. Get bucket access key from the associated secret by running the following command:

    ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
    SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
  4. Create an object storage secret with the name logging-loki-odf by running the following command:

    $ oc create -n openshift-logging secret generic logging-loki-odf \
      --from-literal=access_key_id="<access_key_id>" \
      --from-literal=access_key_secret="<secret_access_key>" \
      --from-literal=bucketnames="<bucket_name>" \
      --from-literal=endpoint="https://<bucket_host>:<bucket_port>" \
      --from-literal=forcepathstyle="true"

    where:

      • logging-loki-odf is the name of the secret.
      • forcepathstyle controls the S3 addressing style. AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, which is equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.

2.2.6. Swift storage

Prerequisites

  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).
  • You created a bucket on Swift.

Procedure

  • Create an object storage secret with the name logging-loki-swift by running the following command:

    $ oc create secret generic logging-loki-swift \
      --from-literal=auth_url="<swift_auth_url>" \
      --from-literal=username="<swift_usernameclaim>" \
      --from-literal=user_domain_name="<swift_user_domain_name>" \
      --from-literal=user_domain_id="<swift_user_domain_id>" \
      --from-literal=user_id="<swift_user_id>" \
      --from-literal=password="<swift_password>" \
      --from-literal=domain_id="<swift_domain_id>" \
      --from-literal=domain_name="<swift_domain_name>" \
      --from-literal=container_name="<swift_container_name>"
  • You can optionally provide project-specific data, region, or both by running the following command:

    $ oc create secret generic logging-loki-swift \
      --from-literal=auth_url="<swift_auth_url>" \
      --from-literal=username="<swift_usernameclaim>" \
      --from-literal=user_domain_name="<swift_user_domain_name>" \
      --from-literal=user_domain_id="<swift_user_domain_id>" \
      --from-literal=user_id="<swift_user_id>" \
      --from-literal=password="<swift_password>" \
      --from-literal=domain_id="<swift_domain_id>" \
      --from-literal=domain_name="<swift_domain_name>" \
      --from-literal=container_name="<swift_container_name>" \
      --from-literal=project_id="<swift_project_id>" \
      --from-literal=project_name="<swift_project_name>" \
      --from-literal=project_domain_id="<swift_project_domain_id>" \
      --from-literal=project_domain_name="<swift_project_domain_name>" \
      --from-literal=region="<swift_region>"

For some storage providers, you can use the Cloud Credential Operator utility (ccoctl) during installation to implement short-term credentials. These credentials are created and managed outside the OpenShift Container Platform cluster. For more information, see Manual mode with short-term credentials for components.

Note

Short-term credential authentication must be configured during a new installation of Loki Operator, on a cluster that uses this credentials strategy. You cannot configure an existing cluster that uses a different credentials strategy to use this feature.

You can use workload identity federation with short-lived tokens to authenticate to cloud-based log stores. With workload identity federation, you do not have to store long-lived credentials in your cluster, which reduces the risk of credential leaks and simplifies secret management.

Prerequisites

  • You have administrator permissions.

Procedure

  • Use one of the following options to enable authentication:

    • If you used the OpenShift Container Platform web console to install the Loki Operator, the system automatically detects clusters that use short-lived tokens. You are prompted to create roles and supply the data required for the Loki Operator to create a CredentialsRequest object, which populates a secret.
    • If you used the OpenShift CLI (oc) to install the Loki Operator, you must manually create a Subscription object. Use the appropriate template for your storage provider, as shown in the following samples. This authentication strategy supports only the storage providers indicated within the samples.

      Microsoft Azure sample subscription

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: loki-operator
        namespace: openshift-operators-redhat
      spec:
        channel: "stable-6.3"
        installPlanApproval: Manual
        name: loki-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        config:
          env:
            - name: CLIENTID
              value: <your_client_id>
            - name: TENANTID
              value: <your_tenant_id>
            - name: SUBSCRIPTIONID
              value: <your_subscription_id>
            - name: REGION
              value: <your_region>

      Amazon Web Services (AWS) sample subscription

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: loki-operator
        namespace: openshift-operators-redhat
      spec:
        channel: "stable-6.3"
        installPlanApproval: Manual
        name: loki-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        config:
          env:
          - name: ROLEARN
            value: <role_ARN>

      Google Cloud Platform (GCP) sample subscription

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: loki-operator
        namespace: openshift-operators-redhat
      spec:
        channel: "stable-6.3"
        installPlanApproval: Manual
        name: loki-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
        config:
          env:
          - name: PROJECT_NUMBER
            value: <your_project_number>
          - name: POOL_ID
            value: <your_pool_id>
          - name: PROVIDER_ID
            value: <your_provider_id>
          - name: SERVICE_ACCOUNT_EMAIL
            value: example@mydomain.iam.gserviceaccount.com

You can create a LokiStack custom resource (CR) by using the OpenShift Container Platform web console.

Prerequisites

  • You have administrator permissions.
  • You have access to the OpenShift Container Platform web console.
  • You installed the Loki Operator.

Procedure

  1. Go to the Operators → Installed Operators page. Click the All instances tab.
  2. From the Create new drop-down list, select LokiStack.
  3. Select YAML view, and then use the following template to create a LokiStack CR:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki # 1
      namespace: openshift-logging
    spec:
      size: 1x.small # 2
      storage:
        schemas:
          - effectiveDate: '2023-10-15'
            version: v13
        secret:
          name: logging-loki-s3 # 3
          type: s3 # 4
          credentialMode: # 5
      storageClassName: <storage_class_name> # 6
      tenants:
        mode: openshift-logging

    1. Use the name logging-loki.
    2. Specify the deployment size. In logging 5.8 and later versions, the supported size options for production instances of Loki are 1x.extra-small, 1x.small, or 1x.medium.
    3. Specify the secret used for your log storage.
    4. Specify the corresponding storage type.
    5. Optional field, logging 5.9 and later. The supported user-configured values are as follows: static is the default authentication mode, available for all supported object storage types, and uses credentials stored in a Secret. token is for short-lived tokens retrieved from a credential source; in this mode, the static configuration does not contain the credentials needed for the object storage. Instead, they are generated during runtime by using a service, which allows for shorter-lived credentials and much more granular control. This authentication mode is not supported for all object storage types. token-cco is the default value when Loki is running in managed STS mode and using CCO on STS/WIF clusters.
    6. Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using the oc get storageclasses command.
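
    For example, to list the storage classes that are available in your cluster, run:

    $ oc get storageclasses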

To configure Loki object storage, you must create a secret. You can do this by using the OpenShift CLI (oc).

Prerequisites

  • You have administrator permissions.
  • You installed the Loki Operator.
  • You installed the OpenShift CLI (oc).

Procedure

  • From the directory that contains your certificate and key files, create a secret by running the following command:

    $ oc create secret generic -n openshift-logging <your_secret_name> \
      --from-file=tls.key=<your_key_file> \
      --from-file=tls.crt=<your_crt_file> \
      --from-file=ca-bundle.crt=<your_bundle_file> \
      --from-literal=username=<your_username> \
      --from-literal=password=<your_password>
Note

Use generic or opaque secrets for best results.

Verification

  • Verify that a secret was created by running the following command:

    $ oc get secrets

2.2.8. Fine grained access for Loki logs

The Red Hat OpenShift Logging Operator does not grant all users access to logs by default. As an administrator, you must configure your users' access unless the Operator was upgraded and prior configurations are in place. Depending on your configuration and needs, you can configure fine-grained access to logs by using the following:

  • Cluster-wide policies
  • Namespace-scoped policies
  • Creation of custom admin groups

As an administrator, you need to create the role bindings and cluster role bindings appropriate for your deployment. The Red Hat OpenShift Logging Operator provides the following cluster roles:

  • cluster-logging-application-view grants permission to read application logs.
  • cluster-logging-infrastructure-view grants permission to read infrastructure logs.
  • cluster-logging-audit-view grants permission to read audit logs.
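
As a minimal sketch, you can also bind one of these cluster roles to a user directly from the CLI. The username here is a placeholder; this command grants cluster-wide read access to application logs:

$ oc adm policy add-cluster-role-to-user cluster-logging-application-view <username>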

If you have upgraded from a prior version, an additional cluster role logging-application-logs-reader and associated cluster role binding logging-all-authenticated-application-logs-reader provide backward compatibility, allowing any authenticated user read access in their namespaces.

Note

Users with access by namespace must provide a namespace when querying application logs.
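
For example, a namespaced application log query includes a namespace selector. The following LogQL sketch uses a placeholder namespace name:

{kubernetes_namespace_name="my-namespace"} |= "error"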

2.2.8.1. Cluster wide access

Cluster role binding resources reference cluster roles, and set permissions cluster wide.

Example ClusterRoleBinding

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: logging-all-application-logs-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view # 1
subjects: # 2
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io

1. Additional ClusterRoles are cluster-logging-infrastructure-view and cluster-logging-audit-view.
2. Specifies the users or groups this object applies to.

2.2.8.2. Namespaced access

You can use RoleBinding resources with ClusterRole objects to define the namespace for which a user or group can access logs.

Example RoleBinding

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: allow-read-logs
  namespace: log-test-0 # 1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: testuser-0

1. Specifies the namespace this RoleBinding applies to.

2.2.8.3. Custom admin group access

If you have a large deployment with several users who require broader permissions, you can create a custom group by using the adminGroups field. Users who are members of any group specified in the adminGroups field of the LokiStack CR are considered administrators.

Administrator users have access to all application logs in all namespaces if they are also assigned the cluster-logging-application-view role.

Example LokiStack CR

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  tenants:
    mode: openshift-logging # 1
    openshift:
      adminGroups: # 2
      - cluster-admin
      - custom-admin-group # 3

1. Custom admin groups are only available in this mode.
2. Entering an empty list [] value for this field disables admin groups.
3. Overrides the default groups (system:cluster-admins, cluster-admin, dedicated-admin).
Important

Querying application logs for multiple namespaces as a cluster-admin user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120). For better control over access to logs in LokiStack, make the cluster-admin user a member of the cluster-admin group. If the cluster-admin group does not exist, create it and add the desired users to it.

Use the following procedure to create a new group for users with cluster-admin permissions.

Procedure

  1. Enter the following command to create a new group:

    $ oc adm groups new cluster-admin
  2. Enter the following command to add the desired user to the cluster-admin group:

    $ oc adm groups add-users cluster-admin <username>
  3. Enter the following command to add the cluster-admin cluster role to the group:

    $ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin

2.3. Enhanced reliability and performance

Use the following configurations to ensure reliability and efficiency of Loki in production.

2.3.1. Loki pod placement

You can control which nodes the Loki pods run on, and prevent other workloads from using those nodes, by using tolerations or node selectors on the pods.

You can apply tolerations to the log store pods with the LokiStack custom resource (CR) and apply taints to a node with the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not allow the taint. Using a specific key:value pair that is not on other pods ensures that only the log store pods can run on that node.
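
For example, the following command is a sketch of applying such a taint to a node. The node name is a placeholder, and the key, value, and effect match the tolerations used in the examples that follow:

$ oc adm taint nodes <node_name> node-role.kubernetes.io/infra=reserved:NoSchedule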

Example LokiStack with node selectors

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor: # 1
      nodeSelector:
        node-role.kubernetes.io/infra: "" # 2
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
# ...

1. Specifies the component pod type that applies to the node selector.
2. Specifies the pods that are moved to nodes containing the defined label.

Example LokiStack CR with node selectors and tolerations

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
# ...

To configure the nodeSelector and tolerations fields of the LokiStack custom resource (CR), you can use the oc explain command to view the description and fields for a particular resource:

$ oc explain lokistack.spec.template

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: template <Object>

DESCRIPTION:
     Template defines the resource/limits/tolerations/nodeselectors per
     component

FIELDS:
   compactor	<Object>
     Compactor defines the compaction component spec.

   distributor	<Object>
     Distributor defines the distributor component spec.
...

For more detailed information, you can add a specific field:

$ oc explain lokistack.spec.template.compactor

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: compactor <Object>

DESCRIPTION:
     Compactor defines the compaction component spec.

FIELDS:
   nodeSelector	<map[string]string>
     NodeSelector defines the labels required by a node to schedule the
     component onto it.
...

2.3.2. Configuring Loki to tolerate node failure

In the logging 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.

Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.

In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.

The Operator sets default, preferred podAntiAffinity rules for all Loki components, which includes the compactor, distributor, gateway, indexGateway, ingester, querier, queryFrontend, and ruler components.

You can override the preferred podAntiAffinity settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution field:

Example user settings for the ingester component

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
      # ...
        requiredDuringSchedulingIgnoredDuringExecution: # 1
        - labelSelector:
            matchLabels: # 2
              app.kubernetes.io/component: ingester
          topologyKey: kubernetes.io/hostname
# ...

1. The stanza to define a required rule.
2. The key-value pair (label) that must be matched to apply the rule.

2.3.3. Enabling stream-based retention with Loki

You can configure retention policies based on log streams. You can set retention rules globally, per-tenant, or both. If you configure both, tenant rules apply before global rules.

Important

If there is no retention period defined on the s3 bucket or in the LokiStack custom resource (CR), then the logs are not pruned and they stay in the s3 bucket forever, which might fill up the s3 storage.

Note
  • Although logging version 5.9 and later supports schema v12, schema v13 is recommended for future compatibility.
  • For cost-effective log pruning, configure retention policies directly on the object storage provider. Use the lifecycle management features of the storage provider to ensure automatic deletion of old logs. This also avoids extra processing by Loki and extra delete requests to S3; a sketch of such a lifecycle rule follows this note.

    If the object storage does not support lifecycle policies, you must configure LokiStack to enforce retention internally. The supported retention period is up to 30 days.
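
The following is a minimal sketch of an S3 lifecycle rule that expires objects after 30 days, assuming an AWS S3 bucket and the AWS CLI. The bucket name is a placeholder, and other providers expose equivalent lifecycle features:

$ cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-loki-chunks",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }
  ]
}
EOF
$ aws s3api put-bucket-lifecycle-configuration \
  --bucket <bucket_name> \
  --lifecycle-configuration file://lifecycle.json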

Prerequisites

  • You have administrator permissions.
  • You have installed the Loki Operator.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. To enable stream-based retention, create a LokiStack CR and save it as a YAML file. In the following example, it is called lokistack.yaml.

    Example global stream-based retention for S3

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global: # 1
          retention: # 2
            days: 20
            streams:
            - days: 4
              priority: 1
              selector: '{kubernetes_namespace_name=~"test.+"}' # 3
            - days: 1
              priority: 1
              selector: '{log_type="infrastructure"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v13
        secret:
          name: logging-loki-s3
          type: s3
      storageClassName: gp3-csi
      tenants:
        mode: openshift-logging

    1. Set the retention policy for all log streams. This policy does not impact the retention period for stored logs in object storage.
    2. Enable retention in the cluster by adding the retention block to the CR.
    3. Specify the LogQL query to match log streams to the retention rule.

    Example per-tenant stream-based retention for S3

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          retention:
            days: 20
        tenants: # 1
          application:
            retention:
              days: 1
              streams:
                - days: 4
                  selector: '{kubernetes_namespace_name=~"test.+"}' # 2
          infrastructure:
            retention:
              days: 5
              streams:
                - days: 1
                  selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
      managementState: Managed
      replicationFactor: 1
      size: 1x.small
      storage:
        schemas:
        - effectiveDate: "2020-10-11"
          version: v13
        secret:
          name: logging-loki-s3
          type: s3
      storageClassName: gp3-csi
      tenants:
        mode: openshift-logging

    1. Set the retention policy per tenant. Valid tenant types are application, audit, and infrastructure.
    2. Specify the LogQL query to match log streams to the retention rule.
  2. Apply the LokiStack CR:

    $ oc apply -f lokistack.yaml

In an OpenShift Container Platform cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.

As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack custom resource (CR) to use the podIP address in the hashRing spec. To configure the LokiStack CR, use the following command:

$ oc patch LokiStack logging-loki -n openshift-logging --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP"},"type":"memberlist"}}}'

Example LokiStack to include podIP

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  hashRing:
    type: memberlist
    memberlist:
      instanceAddrType: podIP
# ...

2.3.5. LokiStack behavior during cluster restarts

When an OpenShift Container Platform cluster is restarted, LokiStack ingestion and the query path continue to operate within the CPU and memory resources available for the node. This means that there is no downtime for the LokiStack during OpenShift Container Platform cluster updates. This behavior is achieved by using PodDisruptionBudget resources. The Loki Operator provisions PodDisruptionBudget resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
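
For example, you can list the budgets that the Operator provisions; this check is a sketch and assumes the default openshift-logging namespace:

$ oc get poddisruptionbudget -n openshift-logging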

2.4. Advanced deployment and scalability

To configure high availability, scalability, and error handling, use the following information.

2.4.1. Zone aware data replication

The Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra-small, 1x.small, or 1x.medium, the replication.factor field is automatically set to 2.

To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.
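
As a quick check, you can list the zone label on your nodes to confirm that the cluster spans at least as many zones as the replication factor. This sketch uses the standard topology label:

$ oc get nodes -L topology.kubernetes.io/zone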

Example LokiStack CR with zone replication enabled

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  replicationFactor: 2 # 1
  replication:
    factor: 2 # 2
    zones:
    - maxSkew: 1 # 3
      topologyKey: topology.kubernetes.io/zone # 4

1. Deprecated field; values entered are overwritten by replication.factor.
2. This value is automatically set when the deployment size is selected at setup.
3. The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
4. Defines zones in the form of a topology key that corresponds to a node label.

2.4.2. Recovering Loki pods from failed zones

In OpenShift Container Platform a zone failure happens when specific availability zone resources become inaccessible. Availability zones are isolated areas within a cloud provider’s data center, aimed at enhancing redundancy and fault tolerance. If your OpenShift Container Platform cluster is not configured to handle this, a zone failure can lead to service or data loss.

Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.

Warning

The following procedure will delete the PVCs in the failed zone, and all data contained therein. To avoid complete data loss, the replication factor field of the LokiStack CR should always be set to a value greater than 1 to ensure that Loki is replicating.

Prerequisites

  • Verify your LokiStack CR has a replication factor greater than 1.
  • The control plane has detected the zone failure, and your cloud provider integration has marked the nodes in the failed zone.

The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.

Procedure

  1. List the pods in Pending status by running the following command:

    $ oc get pods --field-selector status.phase==Pending -n openshift-logging

    Example oc get pods output

    NAME                           READY   STATUS    RESTARTS   AGE
    logging-loki-index-gateway-1   0/1     Pending   0          17m
    logging-loki-ingester-1        0/1     Pending   0          16m
    logging-loki-ruler-1           0/1     Pending   0          16m

    These pods are in Pending status because their corresponding PVCs are in the failed zone.
  2. List the PVCs in Pending status by running the following command:

    $ oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r

    Example oc get pvc output

    storage-logging-loki-index-gateway-1
    storage-logging-loki-ingester-1
    wal-logging-loki-ingester-1
    storage-logging-loki-ruler-1
    wal-logging-loki-ruler-1

  3. Delete the PVC(s) for a pod by running the following command:

    $ oc delete pvc <pvc_name> -n openshift-logging
  4. Delete the pod(s) by running the following command:

    $ oc delete pod <pod_name> -n openshift-logging

    Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.

The PVCs might hang in the Terminating state without being deleted if the PVC metadata finalizers are set to kubernetes.io/pv-protection. Removing the finalizers allows the PVCs to be deleted successfully.

  • Remove the finalizer for each PVC by running the command below, then retry deletion.

    $ oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging

2.4.3. Troubleshooting Loki rate limit errors

If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.

These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.

In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).

Important

The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions

  • The Log Forwarder API is configured to forward logs to Loki.
  • Your system sends a block of messages that is larger than 2 MB to Loki. For example:

    "values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
    .......
    ......
    ......
    ......
    \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
  • After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:

    429 Too Many Requests Ingestion rate limit exceeded

    Example Vector error message

    2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

    The error is also visible on the receiving end. For example, in the LokiStack ingester pod:

    Example Loki ingester error message

    level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream

Procedure

  • Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:

    apiVersion: loki.grafana.com/v1
    kind: LokiStack
    metadata:
      name: logging-loki
      namespace: openshift-logging
    spec:
      limits:
        global:
          ingestion:
            ingestionBurstSize: 16 # 1
            ingestionRate: 8 # 2
    # ...

    1. The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
    2. The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
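
    To change these values, you can edit the LokiStack CR in place; this is a sketch that assumes the default name and namespace:

    $ oc edit lokistack logging-loki -n openshift-logging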

2.5. Log-based alerts

Administrators can allow users to create and manage their own alerting and recording rules by binding cluster roles to usernames. Cluster roles are defined as ClusterRole objects that contain necessary role-based access control (RBAC) permissions for users.

The following cluster roles for alerting and recording rules are available for LokiStack:

  • alertingrules.loki.grafana.com-v1-admin: Users with this role have administrative-level access to manage alerting rules. This cluster role grants permissions to create, read, update, delete, list, and watch AlertingRule resources within the loki.grafana.com/v1 API group.
  • alertingrules.loki.grafana.com-v1-crdview: Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to AlertingRule resources within the loki.grafana.com/v1 API group, but do not have permissions for modifying or managing these resources.
  • alertingrules.loki.grafana.com-v1-edit: Users with this role have permission to create, update, and delete AlertingRule resources.
  • alertingrules.loki.grafana.com-v1-view: Users with this role can read AlertingRule resources within the loki.grafana.com/v1 API group. They can inspect configurations, labels, and annotations for existing alerting rules but cannot make any modifications to them.
  • recordingrules.loki.grafana.com-v1-admin: Users with this role have administrative-level access to manage recording rules. This cluster role grants permissions to create, read, update, delete, list, and watch RecordingRule resources within the loki.grafana.com/v1 API group.
  • recordingrules.loki.grafana.com-v1-crdview: Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to RecordingRule resources within the loki.grafana.com/v1 API group, but do not have permissions for modifying or managing these resources.
  • recordingrules.loki.grafana.com-v1-edit: Users with this role have permission to create, update, and delete RecordingRule resources.
  • recordingrules.loki.grafana.com-v1-view: Users with this role can read RecordingRule resources within the loki.grafana.com/v1 API group. They can inspect configurations, labels, and annotations for existing recording rules but cannot make any modifications to them.

2.5.1.1. Examples

To apply cluster roles for a user, you must bind an existing cluster role to a specific username.

Cluster roles can be cluster or namespace scoped, depending on which type of role binding you use. When a RoleBinding object is used, as when using the oc adm policy add-role-to-user command, the cluster role only applies to the specified namespace. When a ClusterRoleBinding object is used, as when using the oc adm policy add-cluster-role-to-user command, the cluster role applies to all namespaces in the cluster.

The following example command gives the specified user create, read, update and delete (CRUD) permissions for alerting rules in a specific namespace in the cluster:

Example cluster role binding command for alerting rule CRUD permissions in a specific namespace

$ oc adm policy add-role-to-user alertingrules.loki.grafana.com-v1-admin -n <namespace> <username>

The following command gives the specified user administrator permissions for alerting rules in all namespaces:

Example cluster role binding command for administrator permissions

$ oc adm policy add-cluster-role-to-user alertingrules.loki.grafana.com-v1-admin <username>

The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:

  • If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
  • If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
  • If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
  • If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
  • If none of the above applies, an alerting rule is considered valid.
Table 2.3. AlertingRule definitions

| Tenant type    | Valid namespaces for AlertingRule CRs |
|----------------|---------------------------------------|
| application    | <your_application_namespace>          |
| audit          | openshift-logging                     |
| infrastructure | openshift-*, kube-*, default          |

Procedure

  1. Create an AlertingRule custom resource (CR):

    Example infrastructure AlertingRule CR

      apiVersion: loki.grafana.com/v1
      kind: AlertingRule
      metadata:
        name: loki-operator-alerts
        namespace: openshift-operators-redhat # 1
        labels: # 2
          openshift.io/<label_name>: "true"
      spec:
        tenantID: "infrastructure" # 3
        groups:
          - name: LokiOperatorHighReconciliationError
            rules:
              - alert: HighPercentageError
                expr: | # 4
                  sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
                    /
                  sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
                    > 0.01
                for: 10s
                labels:
                  severity: critical # 5
                annotations:
                  summary: High Loki Operator Reconciliation Errors # 6
                  description: High Loki Operator Reconciliation Errors # 7

    1. The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
    2. The labels block must match the LokiStack spec.rules.selector definition.
    3. AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
    4. The value for kubernetes_namespace_name: must match the value for metadata.namespace.
    5. The value of this mandatory field must be critical, warning, or info.
    6. This field is mandatory.
    7. This field is mandatory.

    Example application AlertingRule CR

      apiVersion: loki.grafana.com/v1
      kind: AlertingRule
      metadata:
        name: app-user-workload
        namespace: app-ns # 1
        labels: # 2
          openshift.io/<label_name>: "true"
      spec:
        tenantID: "application"
        groups:
          - name: AppUserWorkloadHighError
            rules:
              - alert:
                expr: | # 3
                  sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
                for: 10s
                labels:
                  severity: critical # 4
                annotations:
                  summary: # 5
                  description: # 6

    1. The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
    2. The labels block must match the LokiStack spec.rules.selector definition.
    3. The value for kubernetes_namespace_name: must match the value for metadata.namespace.
    4. The value of this mandatory field must be critical, warning, or info.
    5. The value of this mandatory field is a summary of the rule.
    6. The value of this mandatory field is a detailed description of the rule.
  2. Apply the AlertingRule CR:

    $ oc apply -f <filename>.yaml