
Chapter 6. Loki query performance troubleshooting


This documentation details methods for optimizing your Logging stack to improve query performance and provides steps for troubleshooting.

6.1. Best practices for Loki query performance

You can take the following steps to improve Loki query performance:

  • Ensure that you are running the latest version of the Loki Operator.
  • Ensure that you have migrated the LokiStack schema to version v13.
  • Ensure that you use reliable and fast object storage. Loki places significant demands on object storage. If you are not using an object storage solution from a cloud provider, use solid-state drives (SSDs) for your object storage. By using SSDs, you can benefit from the high parallelization capabilities of Loki.

    To better understand the utilization of object storage by Loki, you can use the following query in the Metrics dashboard in the OpenShift Container Platform web console:

    sum by(status, container, operation) (label_replace(rate(loki_s3_request_duration_seconds_count{namespace="openshift-logging"}[5m]), "status", "${1}xx", "status_code", "([0-9]).."))
  • The Loki Operator enables automatic stream sharding by default. The default mechanism is adequate in most cases, and you should not need to configure the perStream* attributes.
  • If you use the OpenTelemetry Protocol (OTLP) data model, you can configure additional stream labels in LokiStack. For more information, see Best practices for Loki labels.
  • Different types of queries have different performance characteristics. Use simple filter queries instead of regular expressions for better performance, as shown in the following example.
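
    For example, a line filter that matches a literal string is generally cheaper than a regular expression filter. The following LogQL queries are illustrative only; the kubernetes_namespace_name stream label and the matched text are assumptions that depend on your configuration.

    Literal line filter (faster):

    {kubernetes_namespace_name="my-app"} |= "level=error"

    Regular expression filter evaluated against every line (slower):

    {kubernetes_namespace_name="my-app"} |~ "level=(error|fatal)"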

6.2. Best practices for Loki labels

Labels in Loki are the keyspace on which Loki shards incoming data. They are also the index used for finding logs at query time. You can optimize query performance by using labels properly.

Consider the following criteria when creating labels:

  • Labels should describe infrastructure. This could include regions, clusters, servers, applications, namespaces, or environments.
  • Labels are long-lived. Label values should generate logs perpetually, or at least for several hours.
  • Labels are intuitive for querying, as shown in the example after this list.
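
Because labels form the index, a query that narrows the stream selector by label reads far less data than one that scans every stream for the same value. The following queries are a sketch; the label names match the default OpenShift Logging stream labels but are assumptions that depend on your configuration.

Uses the label index to read only the streams for one namespace:

{log_type="application", kubernetes_namespace_name="payments"} |= "timeout"

Reads every application stream and relies on line filters instead:

{log_type="application"} |= "payments" |= "timeout"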

6.3. Configuration of stream labels in Loki Operator

How you configure which labels the Loki Operator uses as stream labels depends on the data model you use: ViaQ or OpenTelemetry Protocol (OTLP).

Both models come with a predefined set of stream labels. For more information, see OpenTelemetry data model.

ViaQ model

ViaQ does not support structured metadata. To configure stream labels for the ViaQ model, add the configuration to the ClusterLogForwarder resource. For example:

apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logging-collector
  outputs:
    - name: lokistack-out
      type: lokiStack
      lokiStack:
        target:
          name: logging-loki
          namespace: openshift-logging
        labelKeys:
          application:
            ignoreGlobal: <true_or_false>
            labelKeys: []
          audit:
            ignoreGlobal: <true_or_false>
            labelKeys: []
          infrastructure:
            ignoreGlobal: <true_or_false>
            labelKeys: []
          global: []

The lokiStack.labelKeys field contains the configuration that maps log record keys to the Loki labels used to identify streams.
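
As an illustration only, the following fragment shows how the placeholders might be filled in. The specific keys are assumptions and must correspond to fields that exist in your log records:

        labelKeys:
          global:
            - log_type
            - kubernetes.namespace_name
          application:
            ignoreGlobal: false
            labelKeys:
              - kubernetes.pod_name

In this sketch, all tenants use log_type and kubernetes.namespace_name as stream labels, and application logs additionally use kubernetes.pod_name.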

OTLP model
In the OTLP model, all attributes that are not configured as stream labels are attached as structured metadata. You can configure additional stream labels on the LokiStack resource, as shown in the example after the following list of best practices.

The following are best practices for creating stream labels:

  • Labels have a low cardinality, with at most tens of values.
  • Label values are long lived. For example, the first level of an HTTP path: /load, /save, and /update.
  • Labels can be used in queries to improve query performance.
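
The following LokiStack fragment is a sketch of how additional OTLP stream labels can be configured; the exact field names can vary with the Loki Operator version, and the selected resource attributes are assumptions:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      otlp:
        streamLabels:
          resourceAttributes:
            - name: k8s.namespace.name
            - name: k8s.pod.name

Any OTLP attribute that is not listed as a stream label remains available as structured metadata, which keeps label cardinality low.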

6.4. Analyzing Loki query performance

Every query and subquery in Loki generates a metrics.go log line with performance statistics. Queriers emit these log lines for subqueries, and the query frontend emits a single summary metrics.go line for each query. Use these statistics to calculate the query performance metrics.

Prerequisites

  • You have administrator permissions.
  • You have access to the OpenShift Container Platform web console.
  • You have installed and configured the Loki Operator.

Procedure

  1. In the OpenShift Container Platform web console, navigate to the Observe → Metrics tab.
  2. Note the following values:

    • duration: Denotes the amount of time a query took to run.
    • queue_time: Denotes the time a query spent in the queue before being processed.
    • chunk_refs_fetch_time: Denotes the amount of time spent getting chunk information from the index.
    • store_chunks_download_time: Denotes the amount of time spent getting chunks from cache or storage.
  3. Calculate the following performance metrics, which are also shown in the worked example after this procedure:

    • Total query time, as total_duration:

      total_duration = duration + queue_time
    • Percentage of the total duration that the query spent in the queue, as Queue Time:

      Queue Time = queue_time / total_duration * 100
    • Percentage of the total duration spent getting chunk information from the index, as Chunk Refs Fetch Time:

      Chunk Refs Fetch Time = chunk_refs_fetch_time / total_duration * 100
    • Percentage of the total duration spent getting chunks from cache or storage, as Chunks Download Time:

      Chunks Download Time = store_chunks_download_time / total_duration * 100
    • Percentage of the total duration spent executing the query, as Execution Time:

      Execution Time = (duration - chunk_refs_fetch_time - store_chunks_download_time) / total_duration * 100
  4. Refer to Query performance analysis to understand the reason for each metric and how each metric affects query performance.
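
For example, consider a hypothetical metrics.go line that reports a duration of 8 seconds, a queue_time of 2 seconds, a chunk_refs_fetch_time of 1 second, and a store_chunks_download_time of 3 seconds. The values are illustrative only:

total_duration = 8 + 2 = 10 seconds
Queue Time = 2 / 10 * 100 = 20%
Chunk Refs Fetch Time = 1 / 10 * 100 = 10%
Chunks Download Time = 3 / 10 * 100 = 30%
Execution Time = (8 - 1 - 3) / 10 * 100 = 40%

In this example, most of the total duration is spent executing the query, and the next largest contributor is downloading chunks.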

6.5. Query performance analysis

For the best query performance, you want as much of the total duration as possible to be spent in query execution, denoted by the Execution Time metric. The following entries describe the reasons why the other performance metrics might be high and the steps you can take to improve them. You can also reduce the execution time by modifying your queries, thereby improving the overall performance.

Issue: High Execution Time
Reason: Queries might be performing many CPU-intensive operations, such as regular expression processing.
Fix: Change your queries to reduce or remove regular expressions, or add more CPU resources.

Issue: High Execution Time
Reason: Your queries process many small log lines.
Fix: When queries iterate over many small lines, execution time depends on how fast Loki can iterate the lines themselves, which makes CPU clock frequency the bottleneck. Faster CPUs reduce this time.

Issue: High Queue Time
Reason: You do not have enough queriers running.
Fix: Increase the number of querier replicas in the LokiStack spec, as shown in the example at the end of this section.

Issue: High Chunk Refs Fetch Time
Reason: There are not enough index-gateway replicas in the LokiStack spec.
Fix: Increase the number of index-gateway replicas, or ensure that they have enough CPU resources.

Issue: High Chunks Download Time
Reason: The chunks might be too small.
Fix: Check the average chunk size by dividing the total_bytes value by the cache_chunk_req value. The result is the average number of uncompressed bytes per chunk and should be on the order of megabytes for best performance. If the chunks are only a few hundred bytes or a few kilobytes in size, revisit your labels to ensure that you are not splitting your data into very small chunks.

Issue: Query timing out
Reason: The query timeout value might be too low.
Fix: Increase the queryTimeout value in the LokiStack spec.
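
Several of these fixes are applied in the LokiStack spec. The following fragment is a sketch of where the relevant settings live; the replica counts and timeout value are assumptions that you must adapt to your workload, and field availability can vary with the Loki Operator version:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  template:
    querier:
      replicas: 3        # more querier replicas reduce Queue Time
    indexGateway:
      replicas: 2        # more index-gateway replicas reduce Chunk Refs Fetch Time
  limits:
    global:
      queries:
        queryTimeout: 5m # raise this value if queries time out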
