Chapter 5. Tuning the log store

5.1. Prerequisites

You have created a LokiStack custom resource.

5.2. Enhanced reliability and performance for Loki

Use the following configurations to ensure reliability and efficiency of Loki in production environments. These settings help optimize pod placement, data retention, cluster hardening, and recovery behavior to minimize data loss and maintain consistent performance under load.

5.2.1. Loki pod placement

You can control which nodes the Loki pods run on, and prevent other workloads from using those nodes, by using tolerations or node selectors on the pods.

You can apply tolerations to the log store pods with the LokiStack custom resource (CR) and apply taints to a node with the node specification. A taint on a node is a key:value pair that instructs the node to repel all pods that do not allow the taint. Using a specific key:value pair that is not on other pods ensures that only the log store pods can run on that node.
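As an illustration of the node side of this pairing, the following sketch shows how a matching taint and label might appear in a node specification. The key, value, and effects mirror the tolerations and node selector used in the LokiStack examples in this section. The node name is a placeholder, and in practice taints are usually set with the `oc adm taint nodes` command rather than by editing the Node object directly.

```yaml
# Sketch of a tainted infra node; <node_name> is a placeholder.
apiVersion: v1
kind: Node
metadata:
  name: <node_name>
  labels:
    node-role.kubernetes.io/infra: ""   # matched by the nodeSelector on the Loki pods
spec:
  taints:
  - key: node-role.kubernetes.io/infra
    value: reserved
    effect: NoSchedule                  # new pods without a matching toleration are not scheduled here
  - key: node-role.kubernetes.io/infra
    value: reserved
    effect: NoExecute                   # running pods without a matching toleration are evicted
```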

Example LokiStack with node selectors

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor: # 1
      nodeSelector:
        node-role.kubernetes.io/infra: "" # 2
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
# ...

1: Specifies the component pod type to which the node selector applies.
2: Specifies the pods that are moved to nodes containing the defined label.

Example LokiStack CR with node selectors and tolerations

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    compactor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    distributor:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    indexGateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ingester:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    querier:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    queryFrontend:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    ruler:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
    gateway:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/infra
        value: reserved
      - effect: NoExecute
        key: node-role.kubernetes.io/infra
        value: reserved
# ...

To configure the nodeSelector and tolerations fields of the LokiStack custom resource (CR), you can use the oc explain command to view the description and fields for a particular resource:

$ oc explain lokistack.spec.template

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: template <Object>

DESCRIPTION:
     Template defines the resource/limits/tolerations/nodeselectors per
     component

FIELDS:
   compactor	<Object>
     Compactor defines the compaction component spec.

   distributor	<Object>
     Distributor defines the distributor component spec.
...

For more detailed information, you can add a specific field:

$ oc explain lokistack.spec.template.compactor

Example output

KIND:     LokiStack
VERSION:  loki.grafana.com/v1

RESOURCE: compactor <Object>

DESCRIPTION:
     Compactor defines the compaction component spec.

FIELDS:
   nodeSelector	<map[string]string>
     NodeSelector defines the labels required by a node to schedule the
     component onto it.
...

5.2.2. Configuring Loki to tolerate node failure

In logging 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.

Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.

In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.

The Operator sets default, preferred podAntiAffinity rules for all Loki components, which includes the compactor, distributor, gateway, indexGateway, ingester, querier, queryFrontend, and ruler components.

You can override the preferred podAntiAffinity settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution field:

Example user settings for the ingester component

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
      # ...
        requiredDuringSchedulingIgnoredDuringExecution: # 1
        - labelSelector:
            matchLabels: # 2
              app.kubernetes.io/component: ingester
          topologyKey: kubernetes.io/hostname
# ...

1: The stanza to define a required rule.
2: The key-value pair (label) that must be matched to apply the rule.
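
For comparison with the required rule, a preferred (soft) rule uses the `preferredDuringSchedulingIgnoredDuringExecution` field of the standard Kubernetes pod anti-affinity API. The following is only a sketch of the general shape of such a rule: the weight and the label shown are illustrative assumptions, not the exact defaults that the Operator generates.

```yaml
# Sketch only: weight and labels are illustrative assumptions.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  template:
    ingester:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100                    # valid range is 1-100; higher weights count more in scheduling
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: ingester
            topologyKey: kubernetes.io/hostname
# ...
```

With a preferred rule, the scheduler tries to spread the pods but still places them if no other node is available. With a required rule, a pod that cannot satisfy the rule stays in Pending status.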

5.2.3. Enabling stream-based retention with Loki

You can configure retention policies based on log streams. You can set retention rules globally, per-tenant, or both. If you configure both, tenant rules apply before global rules.

Important

If no retention period is defined on the S3 bucket or in the LokiStack custom resource (CR), the logs are never pruned and remain in the S3 bucket indefinitely, which might fill up the S3 storage.

Note

Although logging version 5.9 and later supports schema v12, schema v13 is recommended for future compatibility.
For cost-effective log pruning, configure retention policies directly on the object storage provider. Use the lifecycle management features of the storage provider to ensure automatic deletion of old logs. This also avoids extra processing from Loki and delete requests to S3.
If the object storage does not support lifecycle policies, you must configure LokiStack to enforce retention internally. The supported retention period is up to 30 days.

Prerequisites

You have administrator permissions.
You have installed the Loki Operator.
You have installed the OpenShift CLI (oc).

Procedure

To enable stream-based retention, create a LokiStack CR and save it as a YAML file. In the following example, it is called lokistack.yaml.

Example global stream-based retention for S3

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global: # 1
      retention: # 2
        days: 20
        streams:
        - days: 4
          priority: 1
          selector: '{kubernetes_namespace_name=~"test.+"}' # 3
        - days: 1
          priority: 1
          selector: '{log_type="infrastructure"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v13
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging

1: Set the retention policy for all log streams. This policy does not impact the retention period for stored logs in object storage.
2: Enable retention in the cluster by adding the retention block to the CR.
3: Specify the LogQL query to match log streams to the retention rule.

Example per-tenant stream-based retention for S3

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      retention:
        days: 20
    tenants: # 1
      application:
        retention:
          days: 1
          streams:
            - days: 4
              selector: '{kubernetes_namespace_name=~"test.+"}' # 2
      infrastructure:
        retention:
          days: 5
          streams:
            - days: 1
              selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}'
  managementState: Managed
  replicationFactor: 1
  size: 1x.small
  storage:
    schemas:
    - effectiveDate: "2020-10-11"
      version: v13
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: gp3-csi
  tenants:
    mode: openshift-logging

1: Set the retention policy per-tenant. Valid tenant types are application, audit, and infrastructure.
2: Specify the LogQL query to match log streams to the retention rule.

Apply the LokiStack CR:
```
$ oc apply -f lokistack.yaml
```

5.2.4. Configuring Loki to tolerate memberlist creation failure

In an OpenShift Container Platform cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.

As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack custom resource (CR) to use the podIP address in the hashRing spec. To configure the LokiStack CR, use the following command:

$ oc patch LokiStack logging-loki -n openshift-logging  --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP"},"type":"memberlist"}}}'

Example LokiStack to include podIP

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
# ...
  hashRing:
    type: memberlist
    memberlist:
      instanceAddrType: podIP
# ...

5.2.5. LokiStack behavior during cluster restarts

When an OpenShift Container Platform cluster is restarted, LokiStack ingestion and the query path continue to operate within the CPU and memory resources available on the node. This means that there is no downtime for the LokiStack during OpenShift Container Platform cluster updates. This behavior is achieved by using PodDisruptionBudget resources. The Loki Operator provisions PodDisruptionBudget resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
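
To make the mechanism concrete, the following sketch shows the general shape of a PodDisruptionBudget resource in the `policy/v1` API. The Loki Operator creates and manages these objects itself, so you do not need to apply this manifest. The resource name, the label, and the `minAvailable` value are assumptions for illustration, not the values that the Operator generates.

```yaml
# Illustrative only: the Operator provisions the real objects.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: logging-loki-ingester            # hypothetical name
  namespace: openshift-logging
spec:
  minAvailable: 1                        # assumed value; voluntary evictions that would go below this are refused
  selector:
    matchLabels:
      app.kubernetes.io/component: ingester   # assumed label
```

To see the budgets that actually exist in your cluster, you can run `oc get poddisruptionbudget -n openshift-logging`.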

5.3. Advanced deployment and scalability for Loki

Configure high availability, scalability, and error handling for Loki to support large-scale deployments across multiple availability zones. These features enable Loki to tolerate zone failures, manage rate limit errors, and scale horizontally to handle increased log ingestion rates.

5.3.1. Zone aware data replication

The Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra-small, 1x.small, or 1x.medium, the replication.factor field is automatically set to 2.

To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.

Example LokiStack CR with zone replication enabled

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
 name: logging-loki
 namespace: openshift-logging
spec:
 replicationFactor: 2 # 1
 replication:
   factor: 2 # 2
   zones:
   -  maxSkew: 1 # 3
      topologyKey: topology.kubernetes.io/zone # 4

1: Deprecated field, values entered are overwritten by replication.factor.
2: This value is automatically set when deployment size is selected at setup.
3: The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
4: Defines zones in the form of a topology key that corresponds to a node label.

5.3.2. Recovering Loki pods from failed zones

In OpenShift Container Platform, a zone failure happens when the resources of a specific availability zone become inaccessible. Availability zones are isolated areas within a cloud provider's data center, aimed at enhancing redundancy and fault tolerance. If your OpenShift Container Platform cluster is not configured to handle this, a zone failure can lead to service or data loss.

Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.

Warning

The following procedure deletes the PVCs in the failed zone and all data contained in them. To avoid complete data loss, always set the replication factor field of the LokiStack CR to a value greater than 1 so that Loki replicates its data.

Prerequisites

You have verified that your LokiStack CR has a replication factor greater than 1.
The control plane has detected the zone failure, and the cloud provider integration has marked the nodes in the failed zone.

The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.

Procedure

List the pods in Pending status by running the following command:

$ oc get pods --field-selector status.phase==Pending -n openshift-logging

Example oc get pods output

NAME                           READY   STATUS    RESTARTS   AGE
logging-loki-index-gateway-1   0/1     Pending   0          17m
logging-loki-ingester-1        0/1     Pending   0          16m
logging-loki-ruler-1           0/1     Pending   0          16m

These pods are in Pending status because their corresponding PVCs are in the failed zone.

List the PVCs in Pending status by running the following command:

$ oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r

Example oc get pvc output

storage-logging-loki-index-gateway-1
storage-logging-loki-ingester-1
wal-logging-loki-ingester-1
storage-logging-loki-ruler-1
wal-logging-loki-ruler-1

Delete the PVC(s) for a pod by running the following command:
```
$ oc delete pvc <pvc_name> -n openshift-logging
```
Delete the pod(s) by running the following command:
```
$ oc delete pod <pod_name> -n openshift-logging
```
Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.

5.3.2.1. Troubleshooting PVC in a terminating state

The PVCs might hang in the terminating state without being deleted if the PVC metadata finalizers are set to kubernetes.io/pvc-protection. Removing the finalizers should allow the PVCs to delete successfully.

Remove the finalizer for each PVC by running the command below, then retry deletion.

$ oc patch pvc <pvc_name> -p '{"metadata":{"finalizers":null}}' -n openshift-logging

5.3.3. Troubleshooting Loki rate limit errors

If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429) errors.

These errors can occur during normal operation. For example, when you add logging to a cluster that already has some logs, rate limit errors might occur while logging tries to ingest all of the existing log entries. In this case, if the rate at which new logs are added is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.

In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack custom resource (CR).

Important

The LokiStack CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.

Conditions

The Log Forwarder API is configured to forward logs to Loki.

Your system sends a block of messages that is larger than 2 MB to Loki. For example:

"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\
.......
......
......
......
\"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}

After you enter oc logs -n openshift-logging -l component=collector, the collector logs in your cluster show a line containing one of the following error messages:

429 Too Many Requests Ingestion rate limit exceeded

Example Vector error message

2023-08-25T16:08:49.301780Z  WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true

The error is also visible on the receiving end. For example, in the LokiStack ingester pod:

Example Loki ingester error message

level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream

Procedure

Update the ingestionBurstSize and ingestionRate fields in the LokiStack CR:
```
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 16 # 1
        ingestionRate: 8 # 2
# ...
```

1: The ingestionBurstSize field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum log size expected in a single push request. Single requests that are larger than the ingestionBurstSize value are not permitted.
2: The ingestionRate field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
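
The example ingester message in this section reports a per-stream limit rather than the global ingestion rate. As a hedged sketch, assuming your version of the LokiStack API exposes the `perStreamRateLimit` and `perStreamRateLimitBurst` fields (you can check with `oc explain lokistack.spec.limits.global.ingestion`), the limits might also be raised for a single tenant instead of globally. All numbers are illustrative, not recommendations.

```yaml
# Sketch only: verify field availability with `oc explain` before use.
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  limits:
    tenants:
      application:                       # applies to the application tenant only
        ingestion:
          ingestionBurstSize: 16         # MB, illustrative value
          ingestionRate: 8               # MB per second, illustrative value
          perStreamRateLimit: 5          # MB per second per stream; assumed field, illustrative value
          perStreamRateLimitBurst: 20    # MB; assumed field, illustrative value
# ...
```

Scoping the change to one tenant keeps the default limits in place for the other tenants, which limits the blast radius if a single noisy workload is the cause of the errors.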

5.4. Loki network policies for added security

The Loki Operator can deploy and manage a set of network policies that restrict communications to and from Loki components to enhance security. These network policies control ingress and egress traffic at the pod level, limiting exposure to only necessary services while allowing integration with external monitoring systems when required.

5.4.1. Loki network policies

You can enable the Loki Operator to automatically create a NetworkPolicy resource that implements a "default deny" security model with explicit allow rules for required communications. Network policies provide network segmentation for your LokiStack deployment by controlling ingress and egress traffic between Loki components and external services. The network policies in Loki Operator are designed to be secure by default while maintaining compatibility across diverse environments.

Network policies for Loki on OpenShift Container Platform include the following additional integrations:

Monitoring: Automatic integration with the OpenShift Container Platform monitoring stack.
DNS: Support for both standard and OpenShift Container Platform DNS services (port 5353).

5.4.2. Configuring a network policy for Loki

Enable or disable the deployment of NetworkPolicies per LokiStack by setting the networkPolicies field.

Prerequisites

You have administrator permissions.
You have installed the OpenShift CLI (oc).
You have installed the Loki Operator.
You have created a LokiStack custom resource (CR).

Procedure

Update the LokiStack CR:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
    - version: v13
      effectiveDate: "<yyyy>-<mm>-<dd>"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: <storage_class_name>
  tenants:
    mode: openshift-logging
  networkPolicies:
    ruleSet: RestrictIngressEgress

You can set one of the following values for the spec.networkPolicies.ruleSet field:

None

Loki Operator will not deploy any network policy.

RestrictIngressEgress

Loki Operator will deploy a set of network policies that restrict the communications to and from the Loki components.

If you do not define a spec.networkPolicies.ruleSet value, the platform and operator default values are inherited and full network access is allowed.

Apply the LokiStack CR object by running the following command:
```
$ oc apply -f <filename>.yaml
```

5.4.3. Loki NetworkPolicy resources

When network policies are enabled, the Loki Operator creates several NetworkPolicy resources to secure different aspects of your LokiStack deployment.

Policy name	Purpose	Components affected
{name}-default-deny	A baseline deny-all policy	All LokiStack pods
{name}-loki-allow	Inter-component communication allowed	All Loki components
{name}-loki-allow-metrics	Allow metric scraping on the prometheus endpoint	All Loki components
{name}-loki-allow-bucket-egress	Policy for object storage access	ingester, querier, index-gateway, compactor, ruler
{name}-loki-allow-gateway-ingress	Allow gateway access to Loki components	distributor, query-frontend, ruler
{name}-gateway-allow	Gateway external and monitoring access	LokiStack-gateway
{name}-gateway-allow-metrics	Allow metric scraping on the prometheus endpoint	LokiStack-gateway
{name}-ruler-allow-alert-egress	Allow ruler egress to AlertManager	ruler
{name}-loki-allow-query-frontend	Query frontend external access	query-frontend (OpenShift network mode)

5.4.4. Integrating Loki network policy with external systems

To integrate Loki with external systems such as custom dashboards, or external alerting, create additional network policies. You can select specific components by using the label app.kubernetes.io/component. Always include the labels app.kubernetes.io/name=lokistack and app.kubernetes.io/instance={name} to avoid collision with other pods deployed in the namespace.

Prerequisites

You have administrator permissions.
You have installed the OpenShift CLI (oc).
You have installed the Loki Operator.
You have created a LokiStack custom resource (CR).

Procedure

Create a network policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: <name>
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: lokistack
      app.kubernetes.io/instance: <instance_name>
      app.kubernetes.io/component: <loki_component>
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: <namespace_name>
    ports:
    - protocol: TCP
      port: <port_number>

Replace <loki_component> with the component you want to integrate with.

Apply the network policy:
```
$ oc apply -f <file_name>.yaml
```
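
The procedure above covers egress from a Loki component. For the opposite direction, such as a custom dashboard that queries Loki, an ingress rule would be needed instead. The following is a minimal sketch that uses the standard Kubernetes NetworkPolicy API. All bracketed values are placeholders, and the namespace is an assumption based on the other examples in this chapter.

```yaml
# Sketch of an ingress variant; bracketed values are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: <name>
  namespace: openshift-logging           # assumes the LokiStack runs here
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: lokistack
      app.kubernetes.io/instance: <instance_name>
      app.kubernetes.io/component: <loki_component>
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: <namespace_name>   # namespace of the external client
    ports:
    - protocol: TCP
      port: <port_number>                # port that the selected component listens on
```

Because network policies are additive, this policy only adds an allowed path and does not weaken the deny rules that the Operator manages for other traffic.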

5.5. Log-based alerts for Loki

Configure log-based alerts for Loki by creating AlertingRule custom resources. Log-based alerting enables you to trigger alerts based on log patterns and volumes, complementing metric-based alerting to provide comprehensive observability. These alerts integrate with Red Hat OpenShift Logging monitoring and can route to external alerting systems.

5.5.1. Authorizing LokiStack rules RBAC permissions

Administrators bind cluster roles to users to enable them to create and manage alerting and recording rules. A cluster role is defined as a ClusterRole object that has the required role-based access control (RBAC) permissions.

The following cluster roles for alerting and recording rules are available for LokiStack:

Role name	Description
`alertingrules.loki.grafana.com-v1-admin`	Users with this role have administrative-level access to manage alerting rules. This cluster role grants permissions to create, read, update, delete, list, and watch `AlertingRule` resources within the `loki.grafana.com/v1` API group.
`alertingrules.loki.grafana.com-v1-crdview`	Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to `AlertingRule` resources within the `loki.grafana.com/v1` API group, but do not have permissions for modifying or managing these resources.
`alertingrules.loki.grafana.com-v1-edit`	Users with this role have permission to create, update, and delete `AlertingRule` resources.
`alertingrules.loki.grafana.com-v1-view`	Users with this role can read `AlertingRule` resources within the `loki.grafana.com/v1` API group. They can inspect configurations, labels, and annotations for existing alerting rules but cannot make any modifications to them.
`recordingrules.loki.grafana.com-v1-admin`	Users with this role have administrative-level access to manage recording rules. This cluster role grants permissions to create, read, update, delete, list, and watch `RecordingRule` resources within the `loki.grafana.com/v1` API group.
`recordingrules.loki.grafana.com-v1-crdview`	Users with this role can view the definitions of Custom Resource Definitions (CRDs) related to `RecordingRule` resources within the `loki.grafana.com/v1` API group, but do not have permissions for modifying or managing these resources.
`recordingrules.loki.grafana.com-v1-edit`	Users with this role have permission to create, update, and delete `RecordingRule` resources.
`recordingrules.loki.grafana.com-v1-view`	Users with this role can read `RecordingRule` resources within the `loki.grafana.com/v1` API group. They can inspect configurations, labels, and annotations for existing recording rules but cannot make any modifications to them.

5.5.1.1. Examples

To apply cluster roles for a user, you must bind an existing cluster role to a specific username.

Cluster roles can be cluster or namespace scoped, depending on which type of role binding you use. When a RoleBinding object is used, as when using the oc adm policy add-role-to-user command, the cluster role only applies to the specified namespace. When a ClusterRoleBinding object is used, as when using the oc adm policy add-cluster-role-to-user command, the cluster role applies to all namespaces in the cluster.

The following example command binds the cluster role with a RoleBinding object, which gives the specified user create, read, update, and delete (CRUD) permissions for alerting rules in a specific namespace:

$ oc adm policy add-role-to-user alertingrules.loki.grafana.com-v1-admin -n <namespace> <username>

The following command gives the specified user administrator permissions for alerting rules in all namespaces:

$ oc adm policy add-cluster-role-to-user alertingrules.loki.grafana.com-v1-admin <username>
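
If you prefer to manage bindings declaratively, the namespace-scoped command can be expressed as a RoleBinding manifest. This is a sketch that uses the standard `rbac.authorization.k8s.io/v1` API; the binding name is hypothetical, and the namespace and username are placeholders.

```yaml
# Declarative sketch of a namespace-scoped binding to the admin cluster role.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: alertingrules-admin-binding      # hypothetical name
  namespace: <namespace>                 # permissions apply only in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alertingrules.loki.grafana.com-v1-admin
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: <username>
```

Changing `kind: RoleBinding` to `kind: ClusterRoleBinding` and dropping `metadata.namespace` would correspond to the cluster-wide form of the command.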

5.5.2. Creating a log-based alerting rule with Loki

The AlertingRule CR contains a set of specifications and webhook validation definitions to declare groups of alerting rules for a single LokiStack instance. In addition, the webhook validation definition provides support for rule validation conditions:

If an AlertingRule CR includes an invalid interval period, it is an invalid alerting rule.
If an AlertingRule CR includes an invalid for period, it is an invalid alerting rule.
If an AlertingRule CR includes an invalid LogQL expr, it is an invalid alerting rule.
If an AlertingRule CR includes two groups with the same name, it is an invalid alerting rule.
If none of the above applies, an alerting rule is considered valid.

Table 5.1. AlertingRule definitions
Tenant type	Valid namespaces for `AlertingRule` CRs
application	`<your_application_namespace>`
audit	`openshift-logging`
infrastructure	`openshift-*`, `kube-*`, `default`

Procedure

Create an AlertingRule custom resource (CR):

Example infrastructure AlertingRule CR

  apiVersion: loki.grafana.com/v1
  kind: AlertingRule
  metadata:
    name: loki-operator-alerts
    namespace: openshift-operators-redhat # 1
    labels: # 2
      openshift.io/<label_name>: "true"
  spec:
    tenantID: "infrastructure" # 3
    groups:
      - name: LokiOperatorHighReconciliationError
        rules:
          - alert: HighPercentageError
            expr: | # 4
              sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"} |= "error" [1m])) by (job)
                /
              sum(rate({kubernetes_namespace_name="openshift-operators-redhat", kubernetes_pod_name=~"loki-operator-controller-manager.*"}[1m])) by (job)
                > 0.01
            for: 10s
            labels:
              severity: critical # 5
            annotations:
              summary: High Loki Operator Reconciliation Errors # 6
              description: High Loki Operator Reconciliation Errors # 7

1: The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
2: The labels block must match the LokiStack spec.rules.selector definition.
3: AlertingRule CRs for infrastructure tenants are only supported in the openshift-*, kube-*, or default namespaces.
4: The value for kubernetes_namespace_name: must match the value for metadata.namespace.
5: The value of this mandatory field must be critical, warning, or info.
6: This field is mandatory.
7: This field is mandatory.

Example application AlertingRule CR

  apiVersion: loki.grafana.com/v1
  kind: AlertingRule
  metadata:
    name: app-user-workload
    namespace: app-ns # 1
    labels: # 2
      openshift.io/<label_name>: "true"
  spec:
    tenantID: "application"
    groups:
      - name: AppUserWorkloadHighError
        rules:
          - alert:
            expr: | # 3
              sum(rate({kubernetes_namespace_name="app-ns", kubernetes_pod_name=~"podName.*"} |= "error" [1m])) by (job)
            for: 10s
            labels:
              severity: critical # 4
            annotations:
              summary: # 5
              description: # 6

1: The namespace where this AlertingRule CR is created must have a label matching the LokiStack spec.rules.namespaceSelector definition.
2: The labels block must match the LokiStack spec.rules.selector definition.
3: The value of kubernetes_namespace_name must match the value of metadata.namespace.
4: The value of this mandatory field must be critical, warning, or info.
5: The value of this mandatory field is a summary of the rule.
6: The value of this mandatory field is a detailed description of the rule.
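
Note 1 requires the namespace itself to carry a label that the LokiStack spec.rules.namespaceSelector matches. As a hedged illustration, the namespace from the example above could be labeled declaratively with a manifest such as the following. The namespace name app-ns comes from the example CR, the label key keeps the <label_name> placeholder, and the assumption is that the selector matches on this label.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: app-ns                           # namespace that holds the AlertingRule CR
  labels:
    openshift.io/<label_name>: "true"    # assumed to match spec.rules.namespaceSelector
```

If the namespace already exists, add only the label rather than replacing other metadata on it.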

Apply the AlertingRule CR:
```
$ oc apply -f <filename>.yaml
```
