Chapter 6. Loki query performance troubleshooting
This documentation details methods for optimizing your Logging stack to improve query performance and provides steps for troubleshooting.
6.1. Best practices for Loki query performance
You can take the following steps to improve Loki query performance:
- Ensure that you are running the latest version of the Loki Operator.
- Ensure that you have migrated the LokiStack schema to the `v13` version.
- Ensure that you use reliable and fast object storage. Loki places significant demands on object storage. If you are not using an object storage solution from a cloud provider, use solid-state drives (SSDs) for your object storage. By using SSDs, you can benefit from the high parallelization capabilities of Loki.

  To better understand the utilization of object storage by Loki, you can use the following query in the Metrics dashboard in the OpenShift Container Platform web console:

  ```
  sum by(status, container, operation) (label_replace(rate(loki_s3_request_duration_seconds_count{namespace="openshift-logging"}[5m]), "status", "${1}xx", "status_code", "([0-9]).."))
  ```

- The Loki Operator enables automatic stream sharding by default. The default automatic stream sharding mechanism is adequate in most cases, and you should not need to configure the `perStream*` attributes.
- If you use the OpenTelemetry Protocol (OTLP) data model, you can configure additional stream labels in LokiStack. For more information, see Best practices for Loki labels.
- Different types of queries have different performance characteristics. Use simple filter queries instead of regular expressions for better performance.
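As an illustration of the last point, the following LogQL queries search the same hypothetical stream; the stream selector and filter values are only examples. The exact-match line filter (`|=`) is typically much cheaper to evaluate than the regular expression filter (`|~`):

```
# Faster: exact-substring line filter
{log_type="application"} |= "connection refused"

# Slower: regular expression evaluated against every line
{log_type="application"} |~ "connection (refused|reset)"
```

When a regular expression is unavoidable, placing a cheap `|=` filter before it reduces the number of lines the regular expression must process.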
6.2. Best practices for Loki labels
Labels in Loki are the keyspace on which Loki shards incoming data. They are also the index used for finding logs at query-time. You can optimize query performance by properly using labels.
Consider the following criteria when creating labels:
- Labels should describe infrastructure. This could include regions, clusters, servers, applications, namespaces, or environments.
- Labels are long-lived. Label values should generate logs perpetually, or at least for several hours.
- Labels are intuitive for querying.
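For example, a query whose stream selector narrows the search with infrastructure labels before applying a filter might look like the following sketch; the label names match the OpenShift Logging defaults, but the values are illustrative:

```
{log_type="infrastructure", kubernetes_namespace_name="openshift-etcd"} |= "error"
```

Because labels are the index, a selective stream selector like this lets Loki skip most chunks entirely, whereas a filter-only query must scan every stream.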
6.3. Configuration of stream labels in Loki Operator
Configuring which labels the Loki Operator uses as stream labels depends on the data model you are using: ViaQ or OpenTelemetry Protocol (OTLP).
Both models come with a predefined set of stream labels. For more information, see OpenTelemetry data model.
- ViaQ model
  ViaQ does not support structured metadata. To configure stream labels for the ViaQ model, add the configuration in the `ClusterLogForwarder` resource. The `lokiStack.labelKeys` field contains the configuration that maps log record keys to Loki labels used to identify streams.
- OTLP model
  In the OTLP model, all labels that are not specified as stream labels are attached as structured metadata.
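As a sketch of the ViaQ configuration, a `ClusterLogForwarder` output for LokiStack can set the `lokiStack.labelKeys` field. The field layout below is an assumption based on the `observability.openshift.io/v1` API, and the key names are illustrative; verify the exact schema against the CRD installed in your cluster:

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  outputs:
  - name: default-lokistack
    type: lokiStack
    lokiStack:
      target:
        name: logging-loki
        namespace: openshift-logging
      labelKeys:
        global:                            # illustrative log record keys mapped to Loki stream labels
        - log_type
        - kubernetes.namespace_name
```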
The following are the best practices for creating stream labels:
- The labels have a low cardinality, with at most tens of values.
- The values are long lived. For example, the first level of an HTTP path: `/load`, `/save`, and `/update`.
- The labels can be used in queries to improve query performance.
6.4. Analyzing Loki query performance
Every query and subquery in Loki generates a `metrics.go` log line with performance statistics. Subqueries emit the log line in the queriers. Every query also has a single associated summary `metrics.go` line emitted by the query frontend. Use these statistics to calculate the query performance metrics.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed and configured Loki Operator.
Procedure
- In the OpenShift Container Platform web console, navigate to Observe → Metrics. Note the following values:
- duration: Denotes the amount of time a query took to run.
- queue_time: Denotes the time a query spent in the queue before being processed.
- chunk_refs_fetch_time: Denotes the amount of time spent in getting chunk information from the index.
- store_chunks_download_time: Denotes the amount of time spent in getting chunks from cache or storage.
Calculate the following performance metrics:
- Calculate the total query time as `total_duration`:

  ```
  total_duration = duration + queue_time
  ```

- Calculate the percentage of the total duration that a query spent in the queue as `Queue Time`:

  ```
  Queue Time = queue_time / total_duration * 100
  ```

- Calculate the percentage of the total duration that was spent in getting chunk information from the index as `Chunk Refs Fetch Time`:

  ```
  Chunk Refs Fetch Time = chunk_refs_fetch_time / total_duration * 100
  ```

- Calculate the percentage of the total duration that was spent in getting chunks from cache or storage as `Chunks Download Time`:

  ```
  Chunks Download Time = store_chunks_download_time / total_duration * 100
  ```

- Calculate the percentage of the total duration that was spent in executing the query as `Execution Time`:

  ```
  Execution Time = (duration - chunk_refs_fetch_time - store_chunks_download_time) / total_duration * 100
  ```
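The calculations above can be sketched as a small helper; the sample numbers are illustrative, not taken from a real `metrics.go` line:

```python
def query_breakdown(duration, queue_time, chunk_refs_fetch_time,
                    store_chunks_download_time):
    """Return each query phase as a percentage of the total query time."""
    total_duration = duration + queue_time
    return {
        "queue_time_pct": queue_time / total_duration * 100,
        "chunk_refs_fetch_pct": chunk_refs_fetch_time / total_duration * 100,
        "chunks_download_pct": store_chunks_download_time / total_duration * 100,
        "execution_pct": (duration - chunk_refs_fetch_time
                          - store_chunks_download_time) / total_duration * 100,
    }

# Example: a query that ran for 4 s after waiting 1 s in the queue.
breakdown = query_breakdown(duration=4.0, queue_time=1.0,
                            chunk_refs_fetch_time=0.5,
                            store_chunks_download_time=1.5)
for name, pct in breakdown.items():
    print(f"{name}: {pct:.0f}%")
```

Note that the four percentages always sum to 100, which is a quick sanity check on the values you read from the log line.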
- Refer to Query performance analysis to understand the reason for each metric and how each metric affects query performance.
6.5. Query performance analysis
For best query performance, you want as much time as possible spent in execution, denoted by the Execution Time metric. See the following table for the reasons the other performance metrics might be high and the steps you can take to improve them. You can also reduce the execution time by modifying your queries, thereby improving the overall performance.
| Issue | Reason | Fix |
|---|---|---|
| High `Execution Time` | Queries might be doing many CPU-intensive operations, such as regular expression processing. | Simplify the queries, for example by replacing regular expression filters with simple line filters. |
| High `Execution Time` | Your queries have many small log lines. | If your queries have many small lines, execution becomes dependent on how fast Loki can iterate the lines themselves. This becomes a CPU clock frequency bottleneck. To make things faster, you need a faster CPU. |
| High `Queue Time` | You do not have enough queriers running. | The only fix is to increase the number of querier replicas in the `LokiStack` custom resource. |
| High `Chunk Refs Fetch Time` | Insufficient number of index-gateway replicas in the `LokiStack` custom resource. | Increase the number of index-gateway replicas or ensure that they have enough CPU resources. |
| High `Chunks Download Time` | The chunks might be too small. | Check the average chunk size by dividing the total bytes downloaded by the number of chunks downloaded, as reported in the `metrics.go` line. |
| Query timing out | The query timeout value might be too low. | Increase the query timeout value in the `LokiStack` custom resource. |
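As a sketch of the replica-related fixes in the table, the `LokiStack` custom resource exposes per-component replica counts under `spec.template`. The replica values below are illustrative, not recommendations; size them against your own queue and fetch times:

```yaml
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  # ...existing size and storage configuration...
  template:
    querier:
      replicas: 3        # more querier replicas reduce Queue Time
    indexGateway:
      replicas: 2        # more index-gateway replicas reduce Chunk Refs Fetch Time
```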