Chapter 2. Configuring the log store
You can configure a LokiStack
custom resource (CR) to store application, audit, and infrastructure-related logs.
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system offered as a GA log store for logging for Red Hat OpenShift that can be visualized with the OpenShift Observability UI. The Loki configuration provided by OpenShift Logging is a short-term log store designed to enable users to perform fast troubleshooting with the collected logs. For that purpose, the logging for Red Hat OpenShift configuration of Loki has short-term storage, and is optimized for very recent queries.
For long-term storage or queries over a long time period, users should look to log stores external to their cluster. Loki sizing is only tested and supported for short-term storage, for a maximum of 30 days.
2.1. Loki deployment sizing
Sizing for Loki follows the format of 1x.<size>, where the value 1x is the number of instances and <size> specifies performance capabilities.
The 1x.pico configuration defines a single Loki deployment with minimal resource and limit requirements, offering high availability (HA) support for all Loki components. This configuration is suited for deployments that do not require a single replication factor or auto-compaction.
Disk requests are similar across size configurations, allowing customers to test different sizes to determine the best fit for their deployment needs.
It is not possible to change the number 1x for the deployment size.
 | 1x.demo | 1x.pico [6.1+ only] | 1x.extra-small | 1x.small | 1x.medium |
---|---|---|---|---|---|
Data transfer | Demo use only | 50GB/day | 100GB/day | 500GB/day | 2TB/day |
Queries per second (QPS) | Demo use only | 1-25 QPS at 200ms | 1-25 QPS at 200ms | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
Replication factor | None | 2 | 2 | 2 | 2 |
Total CPU requests | None | 7 vCPUs | 14 vCPUs | 34 vCPUs | 54 vCPUs |
Total CPU requests if using the ruler | None | 8 vCPUs | 16 vCPUs | 42 vCPUs | 70 vCPUs |
Total memory requests | None | 17Gi | 31Gi | 67Gi | 139Gi |
Total memory requests if using the ruler | None | 18Gi | 35Gi | 83Gi | 171Gi |
Total disk requests | 40Gi | 590Gi | 430Gi | 430Gi | 590Gi |
Total disk requests if using the ruler | 60Gi | 910Gi | 750Gi | 750Gi | 910Gi |
2.2. Loki object storage
The Loki Operator supports AWS S3, as well as other S3 compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.
The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>.
The following table shows the type values within the LokiStack custom resource (CR) for each storage provider. For more information, see the section on your storage provider.
Storage provider | Secret type value |
---|---|
AWS | s3 |
Azure | azure |
Google Cloud | gcs |
Minio | s3 |
OpenShift Data Foundation | s3 |
Swift | swift |
2.2.1. AWS storage
You can create object storage in Amazon Web Services (AWS) to store logs.
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on AWS.
- You created an AWS IAM Policy and IAM User.
Procedure
Create an object storage secret with the name logging-loki-aws by running the following command:
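A representative form of the command follows; the placeholder values and the exact set of --from-literal keys (bucketnames, endpoint, region, access_key_id, access_key_secret, forcepathstyle) are assumptions to adapt to your AWS configuration:
$ oc -n openshift-logging create secret generic logging-loki-aws \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<aws_bucket_endpoint>" \
  --from-literal=region="<aws_region_of_your_bucket>" \
  --from-literal=access_key_id="<aws_access_key_id>" \
  --from-literal=access_key_secret="<aws_access_key_secret>" \
  --from-literal=forcepathstyle="false"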
In this command:
- logging-loki-aws is the name of the secret.
- AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.
2.2.1.1. AWS storage for STS enabled clusters
If your cluster has the AWS Security Token Service (STS) enabled, the Cloud Credential Operator (CCO) supports short-term authentication by using AWS tokens.
You can create the Loki object storage secret manually by running the following command:
$ oc -n openshift-logging create secret generic "logging-loki-aws" \
--from-literal=bucketnames="<s3_bucket_name>" \
--from-literal=region="<bucket_region>" \
--from-literal=audience="<oidc_audience>"
The audience value is optional; the default value is openshift.
2.2.2. Azure storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on Azure.
Procedure
Create an object storage secret with the name logging-loki-azure by running the following command:
$ oc create secret generic logging-loki-azure \
  --from-literal=container="<azure_container_name>" \
  --from-literal=environment="<azure_environment>" \
  --from-literal=account_name="<azure_account_name>" \
  --from-literal=account_key="<azure_account_key>"
The supported environment values are AzureGlobal, AzureChinaCloud, AzureGermanCloud, or AzureUSGovernment.
2.2.2.1. Azure storage for Microsoft Entra Workload ID enabled clusters
If your cluster has Microsoft Entra Workload ID enabled, the Cloud Credential Operator (CCO) supports short-term authentication using Workload ID.
You can create the Loki object storage secret manually by running the following command:
$ oc -n openshift-logging create secret generic logging-loki-azure \
--from-literal=environment="<azure_environment>" \
--from-literal=account_name="<storage_account_name>" \
--from-literal=container="<container_name>"
2.2.3. Google Cloud Platform storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a project on Google Cloud Platform (GCP).
- You created a bucket in the same project.
- You created a service account in the same project for GCP authentication.
Procedure
Copy the service account credentials received from GCP into a file called key.json.
Create an object storage secret with the name logging-loki-gcs by running the following command:
$ oc create secret generic logging-loki-gcs \
  --from-literal=bucketname="<bucket_name>" \
  --from-file=key.json="<path/to/key.json>"
2.2.4. Minio storage
You can create object storage in Minio to store logs.
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on Minio.
Procedure
Create an object storage secret with the name logging-loki-minio by running the following command:
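A representative form of the command follows; the placeholder values and the exact set of --from-literal keys (bucketnames, endpoint, access_key_id, access_key_secret, forcepathstyle) are assumptions to adapt to your Minio deployment:
$ oc -n openshift-logging create secret generic logging-loki-minio \
  --from-literal=bucketnames="<bucket_name>" \
  --from-literal=endpoint="<minio_bucket_endpoint>" \
  --from-literal=access_key_id="<minio_access_key_id>" \
  --from-literal=access_key_secret="<minio_access_key_secret>" \
  --from-literal=forcepathstyle="true"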
In this command:
- logging-loki-minio is the name of the secret.
- AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.
2.2.5. OpenShift Data Foundation storage
You can create object storage in OpenShift Data Foundation to store logs.
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You deployed OpenShift Data Foundation.
- You configured your OpenShift Data Foundation cluster for object storage.
Procedure
Create an ObjectBucketClaim custom resource in the openshift-logging namespace:
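A minimal sketch of the ObjectBucketClaim CR follows; the loki-bucket-odf name matches the ConfigMap and secret referenced in the next steps, while the generateBucketName value and the openshift-storage.noobaa.io storage class are assumptions to adapt to your environment:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: loki-bucket-odf
  namespace: openshift-logging
spec:
  generateBucketName: loki-bucket-odf
  storageClassName: openshift-storage.noobaa.io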
Get bucket properties from the associated ConfigMap object by running the following commands:
BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}')
BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}')
BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
Get the bucket access key from the associated secret by running the following commands:
ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
Create an object storage secret with the name logging-loki-odf by running the following command:
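A representative form of the command follows, reusing the variables captured in the previous steps; the endpoint scheme and the exact set of --from-literal keys are assumptions to adapt to your cluster:
$ oc -n openshift-logging create secret generic logging-loki-odf \
  --from-literal=access_key_id="${ACCESS_KEY_ID}" \
  --from-literal=access_key_secret="${SECRET_ACCESS_KEY}" \
  --from-literal=bucketnames="${BUCKET_NAME}" \
  --from-literal=endpoint="https://${BUCKET_HOST}:${BUCKET_PORT}"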
In this command:
- logging-loki-odf is the name of the secret.
- AWS endpoints (those ending in .amazonaws.com) use a virtual-hosted style by default, which is equivalent to setting the forcepathstyle attribute to false. Conversely, non-AWS endpoints use a path style, equivalent to setting the forcepathstyle attribute to true. If you need to use a virtual-hosted style with non-AWS S3 services, you must explicitly set forcepathstyle to false.
2.2.6. Swift storage
Prerequisites
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
- You created a bucket on Swift.
Procedure
Create an object storage secret with the name logging-loki-swift by running the following command:
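A representative form of the command follows; the key names reflect the Swift connection fields, and the placeholder values are assumptions to adapt to your Swift deployment:
$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_username>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>"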
You can optionally provide project-specific data, region, or both by running the following command:
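A sketch of the extended command, with the additional project and region keys as assumptions:
$ oc create secret generic logging-loki-swift \
  --from-literal=auth_url="<swift_auth_url>" \
  --from-literal=username="<swift_username>" \
  --from-literal=user_domain_name="<swift_user_domain_name>" \
  --from-literal=user_domain_id="<swift_user_domain_id>" \
  --from-literal=user_id="<swift_user_id>" \
  --from-literal=password="<swift_password>" \
  --from-literal=domain_id="<swift_domain_id>" \
  --from-literal=domain_name="<swift_domain_name>" \
  --from-literal=container_name="<swift_container_name>" \
  --from-literal=project_id="<swift_project_id>" \
  --from-literal=project_name="<swift_project_name>" \
  --from-literal=project_domain_id="<swift_project_domain_id>" \
  --from-literal=project_domain_name="<swift_project_domain_name>" \
  --from-literal=region="<swift_region>"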
2.2.7. Deploying a Loki log store on a cluster that uses short-term credentials
For some storage providers, you can use the Cloud Credential Operator utility (ccoctl) during installation to implement short-term credentials. These credentials are created and managed outside the OpenShift Container Platform cluster. For more information, see Manual mode with short-term credentials for components.
Short-term credential authentication must be configured during a new installation of the Loki Operator on a cluster that uses this credentials strategy. You cannot configure an existing cluster that uses a different credentials strategy to use this feature.
2.2.7.1. Authenticating with workload identity federation to access cloud-based log stores
You can use workload identity federation with short-lived tokens to authenticate to cloud-based log stores. With workload identity federation, you do not have to store long-lived credentials in your cluster, which reduces the risk of credential leaks and simplifies secret management.
Prerequisites
- You have administrator permissions.
Procedure
Use one of the following options to enable authentication:
- If you used the OpenShift Container Platform web console to install the Loki Operator, the system automatically detects clusters that use short-lived tokens. You are prompted to create roles and supply the data required for the Loki Operator to create a CredentialsRequest object, which populates a secret.
- If you used the OpenShift CLI (oc) to install the Loki Operator, you must manually create a Subscription object. Use the appropriate template for your storage provider, as shown in the following samples. This authentication strategy supports only the storage providers indicated within the samples.
Microsoft Azure sample subscription
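A minimal sketch of the Subscription object, assuming the openshift-operators-redhat namespace and a stable-6.1 channel; the CLIENTID, TENANTID, SUBSCRIPTIONID, and REGION environment variables carry the Azure identity values:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: "stable-6.1"
  installPlanApproval: Automatic
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: CLIENTID
      value: <your_client_id>
    - name: TENANTID
      value: <your_tenant_id>
    - name: SUBSCRIPTIONID
      value: <your_subscription_id>
    - name: REGION
      value: <your_region>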
Amazon Web Services (AWS) sample subscription
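A corresponding sketch for AWS STS, where the ROLEARN environment variable supplies the IAM role that the Operator assumes; the channel and namespace values are the same assumptions as above:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: "stable-6.1"
  installPlanApproval: Automatic
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: ROLEARN
      value: <role_ARN>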
Google Cloud Platform (GCP) sample subscription
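A corresponding sketch for GCP workload identity federation, where AUDIENCE and SERVICE_ACCOUNT_EMAIL identify the federated provider and service account, again under the same channel and namespace assumptions:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: openshift-operators-redhat
spec:
  channel: "stable-6.1"
  installPlanApproval: Automatic
  name: loki-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  config:
    env:
    - name: AUDIENCE
      value: <audience_url>
    - name: SERVICE_ACCOUNT_EMAIL
      value: <service_account_email>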
2.2.7.2. Creating a LokiStack custom resource by using the web console
You can create a LokiStack custom resource (CR) by using the OpenShift Container Platform web console.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed the Loki Operator.
Procedure
- Go to the Operators → Installed Operators page. Click the All instances tab.
- From the Create new drop-down list, select LokiStack.
Select YAML view, and then use the following template to create a LokiStack CR:
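A minimal sketch of the template, assuming an S3-backed secret named logging-loki-s3 and a v13 schema; adapt the schema effective date, secret, and storage class to your environment:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
    - version: v13
      effectiveDate: "<yyyy>-<mm>-<dd>"
    secret:
      name: logging-loki-s3
      type: s3
      credentialMode: static
  storageClassName: <storage_class_name>
  tenants:
    mode: openshift-logging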
In this template:
- name: Use the name logging-loki.
- size: Specify the deployment size. In the logging 5.8 and later versions, the supported size options for production instances of Loki are 1x.extra-small, 1x.small, or 1x.medium.
- secret.name: Specify the secret used for your log storage.
- secret.type: Specify the corresponding storage type.
- credentialMode: Optional field, logging 5.9 and later. Supported user-configured values are as follows:
  - static is the default authentication mode, available for all supported object storage types, and uses credentials stored in a Secret.
  - token is for short-lived tokens retrieved from a credential source. In this mode, the static configuration does not contain the credentials needed for the object storage. Instead, they are generated during runtime by using a service, which allows for shorter-lived credentials and much more granular control. This authentication mode is not supported for all object storage types.
  - token-cco is the default value when Loki is running in managed STS mode and using CCO on STS/WIF clusters.
- storageClassName: Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. You can list the available storage classes for your cluster by using the oc get storageclasses command.
2.2.7.3. Creating a secret for Loki object storage by using the CLI
To configure Loki object storage, you must create a secret. You can do this by using the OpenShift CLI (oc).
Prerequisites
- You have administrator permissions.
- You installed the Loki Operator.
- You installed the OpenShift CLI (oc).
Procedure
Create a secret in the directory that contains your certificate and key files by running the following command:
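A representative form of the command follows; the file arguments and the tls.key, tls.crt, and ca-bundle.crt key names are assumptions to adapt to your certificate material:
$ oc create secret generic -n openshift-logging <your_secret_name> \
  --from-file=tls.key=<your_key_file> \
  --from-file=tls.crt=<your_crt_file> \
  --from-file=ca-bundle.crt=<your_bundle_file> \
  --from-literal=username=<your_username> \
  --from-literal=password=<your_password>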
Use generic or opaque secrets for best results.
Verification
Verify that a secret was created by running the following command:
$ oc get secrets
2.2.8. Fine-grained access for Loki logs
The Red Hat OpenShift Logging Operator does not grant all users access to logs by default. As an administrator, you must configure your users' access unless the Operator was upgraded and prior configurations are in place. Depending on your configuration and need, you can configure fine-grained access to logs by using the following:
- Cluster wide policies
- Namespace scoped policies
- Creation of custom admin groups
As an administrator, you need to create the role bindings and cluster role bindings appropriate for your deployment. The Red Hat OpenShift Logging Operator provides the following cluster roles:
- cluster-logging-application-view grants permission to read application logs.
- cluster-logging-infrastructure-view grants permission to read infrastructure logs.
- cluster-logging-audit-view grants permission to read audit logs.
If you have upgraded from a prior version, an additional cluster role logging-application-logs-reader and associated cluster role binding logging-all-authenticated-application-logs-reader provide backward compatibility, allowing any authenticated user read access in their namespaces.
Users with access by namespace must provide a namespace when querying application logs.
2.2.8.1. Cluster wide access
Cluster role binding resources reference cluster roles, and set permissions cluster wide.
Example ClusterRoleBinding
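A sketch of a ClusterRoleBinding that grants every authenticated user the application-log reader role; the binding name is an assumption:
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: logging-all-application-logs-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io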
2.2.8.2. Namespaced access
You can use RoleBinding resources with ClusterRole objects to define the namespace for which a user or group can access logs.
Example RoleBinding
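A sketch of a RoleBinding that grants one user read access to application logs in a single namespace; the binding name, namespace, and username are assumptions:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: allow-read-logs
  namespace: log-test-0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: testuser-0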
The metadata.namespace field specifies the namespace this RoleBinding applies to.
2.2.8.3. Custom admin group access
If you have a large deployment with several users who require broader permissions, you can create a custom group by using the adminGroups field. Users who are members of any group specified in the adminGroups field of the LokiStack CR are considered administrators.
Administrator users have access to all application logs in all namespaces, if they also get assigned the cluster-logging-application-view role.
Example LokiStack CR
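A sketch of the relevant LokiStack CR fields, assuming the openshift-logging tenancy mode; the group names are placeholders:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  tenants:
    mode: openshift-logging
    openshift:
      adminGroups:
      - cluster-admin
      - custom-admin-group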
2.2.9. Creating a new group for the cluster-admin user role
Querying application logs for multiple namespaces as a cluster-admin user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120). For better control over access to logs in LokiStack, make the cluster-admin user a member of the cluster-admin group. If the cluster-admin group does not exist, create it and add the desired users to it.
Use the following procedure to create a new group for users with cluster-admin permissions.
Procedure
Enter the following command to create a new group:
$ oc adm groups new cluster-admin
Enter the following command to add the desired user to the cluster-admin group:
$ oc adm groups add-users cluster-admin <username>
Enter the following command to add the cluster-admin user role to the group:
$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
2.3. Enhanced reliability and performance
Use the following configurations to ensure reliability and efficiency of Loki in production.
2.3.1. Loki pod placement
You can control which nodes the Loki pods run on, and prevent other workloads from using those nodes, by using tolerations or node selectors on the pods.
You can apply tolerations to the log store pods with the LokiStack custom resource (CR) and apply taints to a node with the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not allow the taint. Using a specific key:value pair that is not on other pods ensures that only the log store pods can run on that node.
Example LokiStack with node selectors
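A sketch that pins Loki components to infrastructure nodes through nodeSelector; only two components are shown, and the remaining components (distributor, gateway, indexGateway, querier, queryFrontend, ruler) accept the same fields:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""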
Example LokiStack CR with node selectors and tolerations
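A sketch that adds matching tolerations so the pods can land on tainted infrastructure nodes; the taint key, value, and effects are assumptions that must mirror the taints you applied:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute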
To configure the nodeSelector and tolerations fields of the LokiStack CR, you can use the oc explain command to view the description and fields for a particular resource:
$ oc explain lokistack.spec.template
Example output
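A truncated sketch of the output, assuming the standard oc explain format; the field list mirrors the Loki components named below:
KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: template <Object>

FIELDS:
   compactor    <Object>
     Compactor defines the compaction component spec.

   distributor  <Object>
     Distributor defines the distributor component spec.
...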
For more detailed information, you can add a specific field:
$ oc explain lokistack.spec.template.compactor
Example output
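A similar sketch of the more specific query, again assuming the standard oc explain output format:
KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: compactor <Object>

FIELDS:
   nodeSelector   <map[string]string>
     NodeSelector defines the labels required by a node to schedule the
     component onto it.

   tolerations    <[]Object>
     Tolerations defines the tolerations required by a node to schedule the
     component onto it.
...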
2.3.2. Configuring Loki to tolerate node failure
In the logging 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.
In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.
The Operator sets default, preferred podAntiAffinity rules for all Loki components, which includes the compactor, distributor, gateway, indexGateway, ingester, querier, queryFrontend, and ruler components.
You can override the preferred podAntiAffinity settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution field:
Example user settings for the ingester component
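A sketch of a required anti-affinity rule for the ingester, assuming the app.kubernetes.io/component label identifies ingester pods:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    ingester:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: ingester
          topologyKey: kubernetes.io/hostname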
2.3.3. Enabling stream-based retention with Loki
You can configure retention policies based on log streams. You can set retention rules globally, per-tenant, or both. If you configure both, tenant rules apply before global rules.
If there is no retention period defined on the S3 bucket or in the LokiStack custom resource (CR), then the logs are not pruned and they stay in the S3 bucket forever, which might fill up the S3 storage.
Although logging version 5.9 and later supports schema v12, schema v13 is recommended for future compatibility.
For cost-effective log pruning, configure retention policies directly on the object storage provider. Use the lifecycle management features of the storage provider to ensure automatic deletion of old logs. This also avoids extra processing from Loki and delete requests to S3.
If the object storage does not support lifecycle policies, you must configure LokiStack to enforce retention internally. The supported retention period is up to 30 days.
Prerequisites
- You have administrator permissions.
- You have installed the Loki Operator.
- You have installed the OpenShift CLI (oc).
Procedure
To enable stream-based retention, create a LokiStack CR and save it as a YAML file. In the following example, it is called lokistack.yaml.
Example global stream-based retention for S3
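A sketch of a global retention configuration; the retention periods, stream selectors, and storage details are assumptions to adapt:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}'
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  size: 1x.small
  storage:
    schemas:
    - version: v13
      effectiveDate: "<yyyy>-<mm>-<dd>"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: <storage_class_name>
  tenants:
    mode: openshift-logging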
In this example:
- The retention block, added to the limits.global section of the CR, enables retention in the cluster.
- The days field sets the retention policy for all log streams. This policy does not impact the retention period for stored logs in object storage.
- Each selector field specifies the LogQL query to match log streams to the retention rule.
Example per-tenant stream-based retention for S3
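A sketch of a per-tenant retention configuration; only the limits section differs from the previous example, and the values are again assumptions:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants:
      application:
        retention:
          days: 1
          streams:
          - days: 4
            selector: '{kubernetes_namespace_name=~"test.+"}'
      infrastructure:
        retention:
          days: 5
          streams:
          - days: 1
            selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  size: 1x.small
  storage:
    schemas:
    - version: v13
      effectiveDate: "<yyyy>-<mm>-<dd>"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: <storage_class_name>
  tenants:
    mode: openshift-logging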
In this example:
- The limits.tenants section sets the retention policy per-tenant. Valid tenant types are application, audit, and infrastructure.
- Each selector field specifies the LogQL query to match log streams to the retention rule.
Apply the LokiStack CR:
$ oc apply -f lokistack.yaml
2.3.4. Configuring Loki to tolerate memberlist creation failure
In an OpenShift Container Platform cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.
As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack custom resource (CR) to use the podIP address in the hashRing spec. To configure the LokiStack CR, use the following command:
$ oc patch LokiStack logging-loki -n openshift-logging --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP"},"type":"memberlist"}}}'
Example LokiStack to include podIP
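A sketch of the resulting hashRing configuration in the LokiStack CR, mirroring the patch above:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  hashRing:
    type: memberlist
    memberlist:
      instanceAddrType: podIP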
2.3.5. LokiStack behavior during cluster restarts
When an OpenShift Container Platform cluster is restarted, LokiStack ingestion and the query path continue to operate within the CPU and memory resources available for the node. This means that there is no downtime for the LokiStack during OpenShift Container Platform cluster updates. This behavior is achieved by using PodDisruptionBudget resources. The Loki Operator provisions PodDisruptionBudget resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
2.4. Advanced deployment and scalability
To configure high availability, scalability, and error handling, use the following information.
2.4.1. Zone aware data replication
The Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra-small, 1x.small, or 1x.medium, the replication.factor field is automatically set to 2.
To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.
Example LokiStack CR with zone replication enabled
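A sketch of the replication settings, assuming the standard zone topology key used for node labels:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  replicationFactor: 2
  replication:
    factor: 2
    zones:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
  size: 1x.small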
In this example:
- replicationFactor is a deprecated field; values entered are overwritten by replication.factor.
- replication.factor is automatically set when the deployment size is selected at setup.
- maxSkew is the maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
- topologyKey defines zones in the form of a topology key that corresponds to a node label.
2.4.2. Recovering Loki pods from failed zones
In OpenShift Container Platform, a zone failure happens when specific availability zone resources become inaccessible. Availability zones are isolated areas within a cloud provider’s data center, aimed at enhancing redundancy and fault tolerance. If your OpenShift Container Platform cluster is not configured to handle this, a zone failure can lead to service or data loss.
Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.
The following procedure deletes the PVCs in the failed zone, and all data contained therein. To avoid complete data loss, the replication factor field of the LokiStack CR should always be set to a value greater than 1 to ensure that Loki is replicating.
Prerequisites
- Verify your LokiStack CR has a replication factor greater than 1.
- Zone failure is detected by the control plane, and nodes in the failed zone are marked by cloud provider integration.
The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.
Procedure
List the pods in Pending status by running the following command:
$ oc get pods --field-selector status.phase==Pending -n openshift-logging
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
oc get pods
outputNAME READY STATUS RESTARTS AGE logging-loki-index-gateway-1 0/1 Pending 0 17m logging-loki-ingester-1 0/1 Pending 0 16m logging-loki-ruler-1 0/1 Pending 0 16m
NAME READY STATUS RESTARTS AGE
1 logging-loki-index-gateway-1 0/1 Pending 0 17m logging-loki-ingester-1 0/1 Pending 0 16m logging-loki-ruler-1 0/1 Pending 0 16m
Copy to Clipboard Copied! Toggle word wrap Toggle overflow - 1
- These pods are in
Pending
status because their corresponding PVCs are in the failed zone.
List the PVCs in Pending status by running the following command:
$ oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
oc get pvc
outputstorage-logging-loki-index-gateway-1 storage-logging-loki-ingester-1 wal-logging-loki-ingester-1 storage-logging-loki-ruler-1 wal-logging-loki-ruler-1
storage-logging-loki-index-gateway-1 storage-logging-loki-ingester-1 wal-logging-loki-ingester-1 storage-logging-loki-ruler-1 wal-logging-loki-ruler-1
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Delete the PVC(s) for a pod by running the following command:
oc delete pvc <pvc_name> -n openshift-logging
$ oc delete pvc <pvc_name> -n openshift-logging
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Delete the pod(s) by running the following command:
oc delete pod <pod_name> -n openshift-logging
$ oc delete pod <pod_name> -n openshift-logging
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.
2.4.2.1. Troubleshooting PVC in a terminating state
The PVCs might hang in the terminating state without being deleted if the PVC metadata finalizers are set to kubernetes.io/pv-protection. Removing the finalizers should allow the PVCs to delete successfully.
Remove the finalizer for each PVC by running the following command, then retry deletion:
$ oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging
2.4.3. Troubleshooting Loki rate limit errors
If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.
These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.
In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).
The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.
Conditions
- The Log Forwarder API is configured to forward logs to Loki.
Your system sends a block of messages that is larger than 2 MB to Loki.
After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:
429 Too Many Requests Ingestion rate limit exceeded
Example Vector error message
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
The error is also visible on the receiving end. For example, in the LokiStack ingester pod:
Example Loki ingester error message
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
Procedure
Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:
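A sketch of the relevant limits section; the numeric values are placeholders to tune for your workload:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16
        ingestionRate: 8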
In this example:
- The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
- The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
2.5. Log-based alerts
2.5.1. Authorizing LokiStack rules RBAC permissions
Administrators can allow users to create and manage their own alerting and recording rules by binding cluster roles to usernames. Cluster roles are defined as ClusterRole objects that contain the necessary role-based access control (RBAC) permissions for users.
The following cluster roles for alerting and recording rules are available for LokiStack:
Rule name | Description |
---|---|
alertingrules.loki.grafana.com-v1-admin | Users with this role have administrative-level access to manage alerting rules. This cluster role grants permissions to create, read, update, delete, list, and watch AlertingRule resources. |
alertingrules.loki.grafana.com-v1-crdview | Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to AlertingRule resources, but do not have permissions for modifying or managing these resources. |
alertingrules.loki.grafana.com-v1-edit | Users with this role have permission to create, update, and delete AlertingRule resources. |
alertingrules.loki.grafana.com-v1-view | Users with this role can read AlertingRule resources. They can inspect configurations, labels, and annotations for existing alerting rules but cannot modify them. |
recordingrules.loki.grafana.com-v1-admin | Users with this role have administrative-level access to manage recording rules. This cluster role grants permissions to create, read, update, delete, list, and watch RecordingRule resources. |
recordingrules.loki.grafana.com-v1-crdview | Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to RecordingRule resources, but do not have permissions for modifying or managing these resources. |
recordingrules.loki.grafana.com-v1-edit | Users with this role have permission to create, update, and delete RecordingRule resources. |
recordingrules.loki.grafana.com-v1-view | Users with this role can read RecordingRule resources. They can inspect configurations, labels, and annotations for existing recording rules but cannot modify them. |
2.5.1.1. Examples
To apply cluster roles for a user, you must bind an existing cluster role to a specific username.
Cluster roles can be cluster or namespace scoped, depending on which type of role binding you use. When a RoleBinding object is used, as when using the oc adm policy add-role-to-user command, the cluster role only applies to the specified namespace. When a ClusterRoleBinding object is used, as when using the oc adm policy add-cluster-role-to-user command, the cluster role applies to all namespaces in the cluster.
The following example command gives the specified user create, read, update, and delete (CRUD) permissions for alerting rules in a specific namespace in the cluster:
Example cluster role binding command for alerting rule CRUD permissions in a specific namespace
$ oc adm policy add-role-to-user alertingrules.loki.grafana.com-v1-admin -n <namespace> <username>
The following command gives the specified user administrator permissions for alerting rules in all namespaces:
Example cluster role binding command for administrator permissions
$ oc adm policy add-cluster-role-to-user alertingrules.loki.grafana.com-v1-admin <username>
2.5.2. Creating a log-based alerting rule with Loki
The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:
- If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
- If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
- If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
- If none of the above applies, an alerting rule is considered valid.
Tenant type | Valid namespaces for AlertingRule CRs |
---|---|
application | <your_application_namespace> |
audit | openshift-logging |
infrastructure | openshift-*, kube-*, default |
Procedure
Create an AlertingRule custom resource (CR):
Example infrastructure AlertingRule CR
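A sketch of an infrastructure rule that alerts on the Loki Operator's own error rate; the rule name, expression, and threshold are assumptions:
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: loki-operator-alerts
  namespace: openshift-operators-redhat
  labels:
    openshift.io/<label_name>: "true"
spec:
  tenantID: "infrastructure"
  groups:
  - name: LokiOperatorHighReconciliationError
    rules:
    - alert: HighPercentageError
      expr: |
        sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
          /
        sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
          > 0.01
      for: 10s
      labels:
        severity: critical
      annotations:
        summary: High Loki Operator Reconciliation Errors
        description: High Loki Operator Reconciliation Errors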
In this example:
- The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
- The labels block must match the LokiStack spec.rules.selector definition.
- AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
- The value for kubernetes_namespace_name: must match the value for metadata.namespace.
- The severity label is mandatory, and its value must be critical, warning, or info.
- The summary annotation is mandatory.
- The description annotation is mandatory.
Example application AlertingRule CR
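A corresponding sketch for an application tenant; again, the rule name, namespace, expression, and threshold are assumptions:
apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-user-workload
  namespace: app-ns
  labels:
    openshift.io/<label_name>: "true"
spec:
  tenantID: "application"
  groups:
  - name: AppUserWorkloadHighError
    rules:
    - alert: HighPercentageError
      expr: |
        sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"<pod_name>.*"} |= "error" [1m])) by (job)
          /
        sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"<pod_name>.*"}[1m])) by (job)
          > 0.01
      for: 10s
      labels:
        severity: critical
      annotations:
        summary: <summary_of_the_rule>
        description: <detailed_description_of_the_rule>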
In this example:
- The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
- The labels block must match the LokiStack spec.rules.selector definition.
- The value for kubernetes_namespace_name: must match the value for metadata.namespace.
- The severity label is mandatory, and its value must be critical, warning, or info.
- The summary annotation is mandatory; its value is a summary of the rule.
- The description annotation is mandatory; its value is a detailed description of the rule.
Apply the AlertingRule CR:
$ oc apply -f <filename>.yaml