Chapter 10. Log storage
10.1. About log storage
You can use an internal Loki or Elasticsearch log store on your cluster for storing logs, or you can use a ClusterLogForwarder
custom resource (CR) to forward logs to an external store.
10.1.1. Log storage types
Loki is a horizontally scalable, highly available, multi-tenant log aggregation system offered as an alternative to Elasticsearch as a log store for the logging.
Elasticsearch indexes incoming log records completely during ingestion. Loki only indexes a few fixed labels during ingestion and defers more complex parsing until after the logs have been stored. This means Loki can collect logs more quickly.
10.1.1.1. About the Elasticsearch log store
The logging Elasticsearch instance is optimized and tested for short term storage, approximately seven days. If you want to retain your logs over a longer term, it is recommended you move the data to a third-party storage system.
Elasticsearch organizes the log data from Fluentd into datastores, or indices, then subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in an Elasticsearch cluster. You can configure Elasticsearch to make copies of the shards, called replicas, which Elasticsearch also spreads across the Elasticsearch nodes. The ClusterLogging
custom resource (CR) allows you to specify how the shards are replicated to provide data redundancy and resilience to failure. You can also specify how long the different types of logs are retained using a retention policy in the ClusterLogging
CR.
The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.
The Red Hat OpenShift Logging Operator and companion OpenShift Elasticsearch Operator ensure that each Elasticsearch node is deployed using a unique deployment that includes its own storage volume. You can use a ClusterLogging
custom resource (CR) to increase the number of Elasticsearch nodes, as needed. See the Elasticsearch documentation for considerations involved in configuring storage.
A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host.
Role-based access control (RBAC) applied on the Elasticsearch indices enables the controlled access of the logs to the developers. Administrators can access all logs and developers can access only the logs in their projects.
10.1.2. Querying log stores
You can query Loki by using the LogQL log query language.
10.1.3. Additional resources
10.2. Installing log storage
You can use the OpenShift CLI (oc
) or the OpenShift Container Platform web console to deploy a log store on your OpenShift Container Platform cluster.
The Logging 5.9 release does not contain an updated version of the OpenShift Elasticsearch Operator. If you currently use the OpenShift Elasticsearch Operator released with Logging 5.8, it will continue to work with Logging until the EOL of Logging 5.8. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator. For more information on the Logging lifecycle dates, see Platform Agnostic Operators.
10.2.1. Deploying a Loki log store
You can use the Loki Operator to deploy an internal Loki log store on your OpenShift Container Platform cluster. After install the Loki Operator, you must configure Loki object storage by creating a secret, and create a LokiStack
custom resource (CR).
10.2.1.1. Loki deployment sizing
Sizing for Loki follows the format of 1x.<size>
where the value 1x
is number of instances and <size>
specifies performance capabilities.
It is not possible to change the number 1x
for the deployment size.
1x.demo | 1x.extra-small | 1x.small | 1x.medium | |
---|---|---|---|---|
Data transfer | Demo use only | 100GB/day | 500GB/day | 2TB/day |
Queries per second (QPS) | Demo use only | 1-25 QPS at 200ms | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
Replication factor | None | 2 | 2 | 2 |
Total CPU requests | None | 14 vCPUs | 34 vCPUs | 54 vCPUs |
Total CPU requests if using the ruler | None | 16 vCPUs | 42 vCPUs | 70 vCPUs |
Total memory requests | None | 31Gi | 67Gi | 139Gi |
Total memory requests if using the ruler | None | 35Gi | 83Gi | 171Gi |
Total disk requests | 40Gi | 430Gi | 430Gi | 590Gi |
Total disk requests if using the ruler | 80Gi | 750Gi | 750Gi | 910Gi |
10.2.1.2. Installing the Loki Operator by using the OpenShift Container Platform web console
To install and configure logging on your OpenShift Container Platform cluster, additional Operators must be installed. This can be done from the Operator Hub within the web console.
OpenShift Container Platform Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.
Prerequisites
- You have access to a supported object store (AWS S3, Google Cloud Storage, Azure, Swift, Minio, OpenShift Data Foundation).
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
Procedure
-
In the OpenShift Container Platform web console Administrator perspective, go to Operators
OperatorHub. Type Loki Operator in the Filter by keyword field. Click Loki Operator in the list of available Operators, and then click Install.
ImportantThe Community Loki Operator is not supported by Red Hat.
Select stable or stable-x.y as the Update channel.
NoteThe stable channel only provides updates to the most recent release of logging. To continue receiving updates for prior releases, you must change your subscription channel to stable-x.y, where
x.y
represents the major and minor version of logging you have installed. For example, stable-5.7.The Loki Operator must be deployed to the global operator group namespace
openshift-operators-redhat
, so the Installation mode and Installed Namespace are already selected. If this namespace does not already exist, it is created for you.Select Enable operator-recommended cluster monitoring on this namespace.
This option sets the
openshift.io/cluster-monitoring: "true"
label in theNamespace
object. You must select this option to ensure that cluster monitoring scrapes theopenshift-operators-redhat
namespace.For Update approval select Automatic, then click Install.
If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.
Verification
-
Go to Operators
Installed Operators. - Make sure the openshift-logging project is selected.
- In the Status column, verify that you see green checkmarks with InstallSucceeded and the text Up to date.
An Operator might display a Failed
status before the installation finishes. If the Operator install completes with an InstallSucceeded
message, refresh the page.
10.2.1.3. Creating a secret for Loki object storage by using the web console
To configure Loki object storage, you must create a secret. You can create a secret by using the OpenShift Container Platform web console.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed the Loki Operator.
Procedure
-
Go to Workloads
Secrets in the Administrator perspective of the OpenShift Container Platform web console. - From the Create drop-down list, select From YAML.
Create a secret that uses the
access_key_id
andaccess_key_secret
fields to specify your credentials and thebucketnames
,endpoint
, andregion
fields to define the object storage location. AWS is used in the following example:Example
Secret
objectapiVersion: v1 kind: Secret metadata: name: logging-loki-s3 namespace: openshift-logging stringData: access_key_id: AKIAIOSFODNN7EXAMPLE access_key_secret: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY bucketnames: s3-bucket-name endpoint: https://s3.eu-central-1.amazonaws.com region: eu-central-1
Additional resources
10.2.1.4. Creating a LokiStack custom resource by using the web console
You can create a LokiStack
custom resource (CR) by using the OpenShift Container Platform web console.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You installed the Loki Operator.
Procedure
-
Go to the Operators
Installed Operators page. Click the All instances tab. - From the Create new drop-down list, select LokiStack.
Select YAML view, and then use the following template to create a
LokiStack
CR:apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki 1 namespace: openshift-logging spec: size: 1x.small 2 storage: schemas: - version: v12 effectiveDate: '2022-06-01' secret: name: logging-loki-s3 3 type: s3 4 storageClassName: <storage_class_name> 5 tenants: mode: openshift-logging 6
- 1
- Use the name
logging-loki
. - 2
- Specify the deployment size. In the logging 5.8 and later versions, the supported size options for production instances of Loki are
1x.extra-small
,1x.small
, or1x.medium
.ImportantIt is not possible to change the number
1x
for the deployment size. - 3
- Specify the secret used for your log storage.
- 4
- Specify the corresponding storage type.
- 5
- Enter the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using the
oc get storageclasses
command. - 6
- LokiStack defaults to running in multi-tenant mode, which cannot be modified. One tenant is provided for each log type: audit, infrastructure, and application logs. This enables access control for individual users and user groups to different log streams.
- Click Create.
10.2.1.5. Installing Loki Operator by using the CLI
To install and configure logging on your OpenShift Container Platform cluster, additional Operators must be installed. This can be done from the OpenShift Container Platform CLI.
OpenShift Container Platform Operators use custom resources (CR) to manage applications and their components. High-level configuration and settings are provided by the user within a CR. The Operator translates high-level directives into low-level actions, based on best practices embedded within the Operator’s logic. A custom resource definition (CRD) defines a CR and lists all the configurations available to users of the Operator. Installing an Operator creates the CRDs, which are then used to generate CRs.
Prerequisites
- You have administrator permissions.
-
You installed the OpenShift CLI (
oc
). - You have access to a supported object store. For example: AWS S3, Google Cloud Storage, Azure, Swift, Minio, or OpenShift Data Foundation.
Procedure
Create a
Subscription
object:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: loki-operator namespace: openshift-operators-redhat 1 spec: channel: stable 2 name: loki-operator source: redhat-operators 3 sourceNamespace: openshift-marketplace
- 1
- You must specify the
openshift-operators-redhat
namespace. - 2
- Specify
stable
, orstable-5.<y>
as the channel. - 3
- Specify
redhat-operators
. If your OpenShift Container Platform cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of theCatalogSource
object you created when you configured the Operator Lifecycle Manager (OLM).
Apply the
Subscription
object:$ oc apply -f <filename>.yaml
10.2.1.6. Creating a secret for Loki object storage by using the CLI
To configure Loki object storage, you must create a secret. You can do this by using the OpenShift CLI (oc
).
Prerequisites
- You have administrator permissions.
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
).
Procedure
Create a secret in the directory that contains your certificate and key files by running the following command:
$ oc create secret generic -n openshift-logging <your_secret_name> \ --from-file=tls.key=<your_key_file> --from-file=tls.crt=<your_crt_file> --from-file=ca-bundle.crt=<your_bundle_file> --from-literal=username=<your_username> --from-literal=password=<your_password>
Use generic or opaque secrets for best results.
Verification
Verify that a secret was created by running the following command:
$ oc get secrets
Additional resources
10.2.1.7. Creating a LokiStack custom resource by using the CLI
You can create a LokiStack
custom resource (CR) by using the OpenShift CLI (oc
).
Prerequisites
- You have administrator permissions.
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
).
Procedure
Create a
LokiStack
CR:Example
LokiStack
CRapiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: size: 1x.small 1 storage: schemas: - version: v12 effectiveDate: "2022-06-01" secret: name: logging-loki-s3 2 type: s3 3 storageClassName: <storage_class_name> 4 tenants: mode: openshift-logging 5
- 1
- Specify the deployment size. In the logging 5.8 and later versions, the supported size options for production instances of Loki are
1x.extra-small
,1x.small
, or1x.medium
.ImportantIt is not possible to change the number
1x
for the deployment size. - 2
- Specify the name of your log store secret.
- 3
- Specify the type of your log store secret.
- 4
- Specify the name of a storage class for temporary storage. For best performance, specify a storage class that allocates block storage. Available storage classes for your cluster can be listed by using the
oc get storageclasses
command. - 5
- LokiStack defaults to running in multi-tenant mode, which cannot be modified. One tenant is provided for each log type: audit, infrastructure, and application logs. This enables access control for individual users and user groups to different log streams.
Apply the
LokiStack
CR by running the following command:$ oc apply -f <filename>.yaml
Verification
Verify the installation by listing the pods in the
openshift-logging
project by running the following command and observing the output:$ oc get pods -n openshift-logging
Confirm that you see several pods for components of the logging, similar to the following list:
Example output
NAME READY STATUS RESTARTS AGE cluster-logging-operator-78fddc697-mnl82 1/1 Running 0 14m collector-6cglq 2/2 Running 0 45s collector-8r664 2/2 Running 0 45s collector-8z7px 2/2 Running 0 45s collector-pdxl9 2/2 Running 0 45s collector-tc9dx 2/2 Running 0 45s collector-xkd76 2/2 Running 0 45s logging-loki-compactor-0 1/1 Running 0 8m2s logging-loki-distributor-b85b7d9fd-25j9g 1/1 Running 0 8m2s logging-loki-distributor-b85b7d9fd-xwjs6 1/1 Running 0 8m2s logging-loki-gateway-7bb86fd855-hjhl4 2/2 Running 0 8m2s logging-loki-gateway-7bb86fd855-qjtlb 2/2 Running 0 8m2s logging-loki-index-gateway-0 1/1 Running 0 8m2s logging-loki-index-gateway-1 1/1 Running 0 7m29s logging-loki-ingester-0 1/1 Running 0 8m2s logging-loki-ingester-1 1/1 Running 0 6m46s logging-loki-querier-f5cf9cb87-9fdjd 1/1 Running 0 8m2s logging-loki-querier-f5cf9cb87-fp9v5 1/1 Running 0 8m2s logging-loki-query-frontend-58c579fcb7-lfvbc 1/1 Running 0 8m2s logging-loki-query-frontend-58c579fcb7-tjf9k 1/1 Running 0 8m2s logging-view-plugin-79448d8df6-ckgmx 1/1 Running 0 46s
10.2.2. Loki object storage
The Loki Operator supports AWS S3, as well as other S3 compatible object stores such as Minio and OpenShift Data Foundation. Azure, GCS, and Swift are also supported.
The recommended nomenclature for Loki storage is logging-loki-<your_storage_provider>
.
The following table shows the type
values within the LokiStack
custom resource (CR) for each storage provider. For more information, see the section on your storage provider.
Storage provider | Secret type value |
---|---|
AWS | s3 |
Azure | azure |
Google Cloud | gcs |
Minio | s3 |
OpenShift Data Foundation | s3 |
Swift | swift |
10.2.2.1. AWS storage
Prerequisites
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
). - You created a bucket on AWS.
- You created an AWS IAM Policy and IAM User.
Procedure
Create an object storage secret with the name
logging-loki-aws
by running the following command:$ oc create secret generic logging-loki-aws \ --from-literal=bucketnames="<bucket_name>" \ --from-literal=endpoint="<aws_bucket_endpoint>" \ --from-literal=access_key_id="<aws_access_key_id>" \ --from-literal=access_key_secret="<aws_access_key_secret>" \ --from-literal=region="<aws_region_of_your_bucket>"
10.2.2.2. Azure storage
Prerequisites
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
). - You created a bucket on Azure.
Procedure
Create an object storage secret with the name
logging-loki-azure
by running the following command:$ oc create secret generic logging-loki-azure \ --from-literal=container="<azure_container_name>" \ --from-literal=environment="<azure_environment>" \ 1 --from-literal=account_name="<azure_account_name>" \ --from-literal=account_key="<azure_account_key>"
- 1
- Supported environment values are
AzureGlobal
,AzureChinaCloud
,AzureGermanCloud
, orAzureUSGovernment
.
10.2.2.3. Google Cloud Platform storage
Prerequisites
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
). - You created a project on Google Cloud Platform (GCP).
- You created a bucket in the same project.
- You created a service account in the same project for GCP authentication.
Procedure
-
Copy the service account credentials received from GCP into a file called
key.json
. Create an object storage secret with the name
logging-loki-gcs
by running the following command:$ oc create secret generic logging-loki-gcs \ --from-literal=bucketname="<bucket_name>" \ --from-file=key.json="<path/to/key.json>"
10.2.2.4. Minio storage
Prerequisites
Procedure
Create an object storage secret with the name
logging-loki-minio
by running the following command:$ oc create secret generic logging-loki-minio \ --from-literal=bucketnames="<bucket_name>" \ --from-literal=endpoint="<minio_bucket_endpoint>" \ --from-literal=access_key_id="<minio_access_key_id>" \ --from-literal=access_key_secret="<minio_access_key_secret>"
10.2.2.5. OpenShift Data Foundation storage
Prerequisites
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
). - You deployed OpenShift Data Foundation.
- You configured your OpenShift Data Foundation cluster for object storage.
Procedure
Create an
ObjectBucketClaim
custom resource in theopenshift-logging
namespace:apiVersion: objectbucket.io/v1alpha1 kind: ObjectBucketClaim metadata: name: loki-bucket-odf namespace: openshift-logging spec: generateBucketName: loki-bucket-odf storageClassName: openshift-storage.noobaa.io
Get bucket properties from the associated
ConfigMap
object by running the following command:BUCKET_HOST=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_HOST}') BUCKET_NAME=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_NAME}') BUCKET_PORT=$(oc get -n openshift-logging configmap loki-bucket-odf -o jsonpath='{.data.BUCKET_PORT}')
Get bucket access key from the associated secret by running the following command:
ACCESS_KEY_ID=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d) SECRET_ACCESS_KEY=$(oc get -n openshift-logging secret loki-bucket-odf -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
Create an object storage secret with the name
logging-loki-odf
by running the following command:$ oc create -n openshift-logging secret generic logging-loki-odf \ --from-literal=access_key_id="<access_key_id>" \ --from-literal=access_key_secret="<secret_access_key>" \ --from-literal=bucketnames="<bucket_name>" \ --from-literal=endpoint="https://<bucket_host>:<bucket_port>"
10.2.2.6. Swift storage
Prerequisites
- You installed the Loki Operator.
-
You installed the OpenShift CLI (
oc
). - You created a bucket on Swift.
Procedure
Create an object storage secret with the name
logging-loki-swift
by running the following command:$ oc create secret generic logging-loki-swift \ --from-literal=auth_url="<swift_auth_url>" \ --from-literal=username="<swift_usernameclaim>" \ --from-literal=user_domain_name="<swift_user_domain_name>" \ --from-literal=user_domain_id="<swift_user_domain_id>" \ --from-literal=user_id="<swift_user_id>" \ --from-literal=password="<swift_password>" \ --from-literal=domain_id="<swift_domain_id>" \ --from-literal=domain_name="<swift_domain_name>" \ --from-literal=container_name="<swift_container_name>"
You can optionally provide project-specific data, region, or both by running the following command:
$ oc create secret generic logging-loki-swift \ --from-literal=auth_url="<swift_auth_url>" \ --from-literal=username="<swift_usernameclaim>" \ --from-literal=user_domain_name="<swift_user_domain_name>" \ --from-literal=user_domain_id="<swift_user_domain_id>" \ --from-literal=user_id="<swift_user_id>" \ --from-literal=password="<swift_password>" \ --from-literal=domain_id="<swift_domain_id>" \ --from-literal=domain_name="<swift_domain_name>" \ --from-literal=container_name="<swift_container_name>" \ --from-literal=project_id="<swift_project_id>" \ --from-literal=project_name="<swift_project_name>" \ --from-literal=project_domain_id="<swift_project_domain_id>" \ --from-literal=project_domain_name="<swift_project_domain_name>" \ --from-literal=region="<swift_region>"
10.2.3. Deploying an Elasticsearch log store
You can use the OpenShift Elasticsearch Operator to deploy an internal Elasticsearch log store on your OpenShift Container Platform cluster.
The Logging 5.9 release does not contain an updated version of the OpenShift Elasticsearch Operator. If you currently use the OpenShift Elasticsearch Operator released with Logging 5.8, it will continue to work with Logging until the EOL of Logging 5.8. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator. For more information on the Logging lifecycle dates, see Platform Agnostic Operators.
10.2.3.1. Storage considerations for Elasticsearch
A persistent volume is required for each Elasticsearch deployment configuration. On OpenShift Container Platform this is achieved using persistent volume claims (PVCs).
If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block
in the LocalVolume
object. Elasticsearch cannot use raw block volumes.
The OpenShift Elasticsearch Operator names the PVCs using the Elasticsearch resource name.
Fluentd ships any logs from systemd journal and /var/log/containers/*.log to Elasticsearch.
Elasticsearch requires sufficient memory to perform large merge operations. If it does not have enough memory, it becomes unresponsive. To avoid this problem, evaluate how much application log data you need, and allocate approximately double that amount of free storage capacity.
By default, when storage capacity is 85% full, Elasticsearch stops allocating new data to the node. At 90%, Elasticsearch attempts to relocate existing shards from that node to other nodes if possible. But if no nodes have a free capacity below 85%, Elasticsearch effectively rejects creating new indices and becomes RED.
These low and high watermark values are Elasticsearch defaults in the current release. You can modify these default values. Although the alerts use the same default values, you cannot change these values in the alerts.
10.2.3.2. Installing the OpenShift Elasticsearch Operator by using the web console
The OpenShift Elasticsearch Operator creates and manages the Elasticsearch cluster used by OpenShift Logging.
Prerequisites
Elasticsearch is a memory-intensive application. Each Elasticsearch node needs at least 16GB of memory for both memory requests and limits, unless you specify otherwise in the
ClusterLogging
custom resource.The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. You must add additional nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory, up to a maximum of 64GB for each Elasticsearch node.
Elasticsearch nodes can operate with a lower memory setting, though this is not recommended for production environments.
Ensure that you have the necessary persistent storage for Elasticsearch. Note that each Elasticsearch node requires its own storage volume.
NoteIf you use a local volume for persistent storage, do not use a raw block volume, which is described with
volumeMode: block
in theLocalVolume
object. Elasticsearch cannot use raw block volumes.
Procedure
-
In the OpenShift Container Platform web console, click Operators
OperatorHub. - Click OpenShift Elasticsearch Operator from the list of available Operators, and click Install.
- Ensure that the All namespaces on the cluster is selected under Installation mode.
Ensure that openshift-operators-redhat is selected under Installed Namespace.
You must specify the
openshift-operators-redhat
namespace. Theopenshift-operators
namespace might contain Community Operators, which are untrusted and could publish a metric with the same name as OpenShift Container Platform metric, which would cause conflicts.Select Enable operator recommended cluster monitoring on this namespace.
This option sets the
openshift.io/cluster-monitoring: "true"
label in theNamespace
object. You must select this option to ensure that cluster monitoring scrapes theopenshift-operators-redhat
namespace.- Select stable-5.x as the Update channel.
Select an Update approval strategy:
- The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
- The Manual strategy requires a user with appropriate credentials to approve the Operator update.
- Click Install.
Verification
-
Verify that the OpenShift Elasticsearch Operator installed by switching to the Operators
Installed Operators page. - Ensure that OpenShift Elasticsearch Operator is listed in all projects with a Status of Succeeded.
10.2.3.3. Installing the OpenShift Elasticsearch Operator by using the CLI
You can use the OpenShift CLI (oc
) to install the OpenShift Elasticsearch Operator.
Prerequisites
Ensure that you have the necessary persistent storage for Elasticsearch. Note that each Elasticsearch node requires its own storage volume.
NoteIf you use a local volume for persistent storage, do not use a raw block volume, which is described with
volumeMode: block
in theLocalVolume
object. Elasticsearch cannot use raw block volumes.Elasticsearch is a memory-intensive application. By default, OpenShift Container Platform installs three Elasticsearch nodes with memory requests and limits of 16 GB. This initial set of three OpenShift Container Platform nodes might not have enough memory to run Elasticsearch within your cluster. If you experience memory issues that are related to Elasticsearch, add more Elasticsearch nodes to your cluster rather than increasing the memory on existing nodes.
- You have administrator permissions.
-
You have installed the OpenShift CLI (
oc
).
Procedure
Create a
Namespace
object as a YAML file:apiVersion: v1 kind: Namespace metadata: name: openshift-operators-redhat 1 annotations: openshift.io/node-selector: "" labels: openshift.io/cluster-monitoring: "true" 2
- 1
- You must specify the
openshift-operators-redhat
namespace. To prevent possible conflicts with metrics, configure the Prometheus Cluster Monitoring stack to scrape metrics from theopenshift-operators-redhat
namespace and not theopenshift-operators
namespace. Theopenshift-operators
namespace might contain community Operators, which are untrusted and could publish a metric with the same name as metric, which would cause conflicts. - 2
- String. You must specify this label as shown to ensure that cluster monitoring scrapes the
openshift-operators-redhat
namespace.
Apply the
Namespace
object by running the following command:$ oc apply -f <filename>.yaml
Create an
OperatorGroup
object as a YAML file:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: openshift-operators-redhat namespace: openshift-operators-redhat 1 spec: {}
- 1
- You must specify the
openshift-operators-redhat
namespace.
Apply the
OperatorGroup
object by running the following command:$ oc apply -f <filename>.yaml
Create a
Subscription
object to subscribe the namespace to the OpenShift Elasticsearch Operator:Example Subscription
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: elasticsearch-operator namespace: openshift-operators-redhat 1 spec: channel: stable-x.y 2 installPlanApproval: Automatic 3 source: redhat-operators 4 sourceNamespace: openshift-marketplace name: elasticsearch-operator
- 1
- You must specify the
openshift-operators-redhat
namespace. - 2
- Specify
stable
, orstable-x.y
as the channel. See the following note. - 3
Automatic
allows the Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.Manual
requires a user with appropriate credentials to approve the Operator update.- 4
- Specify
redhat-operators
. If your OpenShift Container Platform cluster is installed on a restricted network, also known as a disconnected cluster, specify the name of theCatalogSource
object created when you configured the Operator Lifecycle Manager (OLM).
NoteSpecifying
stable
installs the current version of the latest stable release. Usingstable
withinstallPlanApproval: "Automatic"
automatically upgrades your Operators to the latest stable major and minor release.Specifying
stable-x.y
installs the current minor version of a specific major release. Usingstable-x.y
withinstallPlanApproval: "Automatic"
automatically upgrades your Operators to the latest stable minor release within the major release.Apply the subscription by running the following command:
$ oc apply -f <filename>.yaml
The OpenShift Elasticsearch Operator is installed to the
openshift-operators-redhat
namespace and copied to each project in the cluster.
Verification
Run the following command:
$ oc get csv -n --all-namespaces
Observe the output and confirm that pods for the OpenShift Elasticsearch Operator exist in each namespace
Example output
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE default elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded kube-node-lease elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded kube-public elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded kube-system elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded non-destructive-test elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded openshift-apiserver-operator elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded openshift-apiserver elasticsearch-operator.v5.8.1 OpenShift Elasticsearch Operator 5.8.1 elasticsearch-operator.v5.8.0 Succeeded ...
10.2.4. Configuring log storage
You can configure which log storage type your logging uses by modifying the ClusterLogging
custom resource (CR).
Prerequisites
- You have administrator permissions.
-
You have installed the OpenShift CLI (
oc
). - You have installed the Red Hat OpenShift Logging Operator and an internal log store that is either the LokiStack or Elasticsearch.
-
You have created a
ClusterLogging
CR.
The Logging 5.9 release does not contain an updated version of the OpenShift Elasticsearch Operator. If you currently use the OpenShift Elasticsearch Operator released with Logging 5.8, it will continue to work with Logging until the EOL of Logging 5.8. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator. For more information on the Logging lifecycle dates, see Platform Agnostic Operators.
Procedure
Modify the
ClusterLogging
CRlogStore
spec:ClusterLogging
CR exampleapiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: # ... spec: # ... logStore: type: <log_store_type> 1 elasticsearch: 2 nodeCount: <integer> resources: {} storage: {} redundancyPolicy: <redundancy_type> 3 lokistack: 4 name: {} # ...
- 1
- Specify the log store type. This can be either
lokistack
orelasticsearch
. - 2
- Optional configuration options for the Elasticsearch log store.
- 3
- Specify the redundancy type. This value can be
ZeroRedundancy
,SingleRedundancy
,MultipleRedundancy
, orFullRedundancy
. - 4
- Optional configuration options for LokiStack.
Example
ClusterLogging
CR to specify LokiStack as the log storeapiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: name: instance namespace: openshift-logging spec: managementState: Managed logStore: type: lokistack lokistack: name: logging-loki # ...
Apply the
ClusterLogging
CR by running the following command:$ oc apply -f <filename>.yaml
10.3. Configuring the LokiStack log store
In logging documentation, LokiStack refers to the logging supported combination of Loki and web proxy with OpenShift Container Platform authentication integration. LokiStack’s proxy uses OpenShift Container Platform authentication to enforce multi-tenancy. Loki refers to the log store as either the individual component or an external store.
10.3.1. Creating a new group for the cluster-admin user role
Querying application logs for multiple namespaces as a cluster-admin
user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error Parse error: input size too long (XXXX > 5120)
. For better control over access to logs in LokiStack, make the cluster-admin
user a member of the cluster-admin
group. If the cluster-admin
group does not exist, create it and add the desired users to it.
Use the following procedure to create a new group for users with cluster-admin
permissions.
Procedure
Enter the following command to create a new group:
$ oc adm groups new cluster-admin
Enter the following command to add the desired user to the
cluster-admin
group:$ oc adm groups add-users cluster-admin <username>
Enter the following command to add
cluster-admin
user role to the group:$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
10.3.2. LokiStack behavior during cluster restarts
In logging version 5.8 and newer versions, when an OpenShift Container Platform cluster is restarted, LokiStack ingestion and the query path continue to operate within the available CPU and memory resources available for the node. This means that there is no downtime for the LokiStack during OpenShift Container Platform cluster updates. This behavior is achieved by using PodDisruptionBudget
resources. The Loki Operator provisions PodDisruptionBudget
resources for Loki, which determine the minimum number of pods that must be available per component to ensure normal operations under certain conditions.
Additional resources
10.3.3. Configuring Loki to tolerate node failure
In the logging 5.8 and later versions, the Loki Operator supports setting pod anti-affinity rules to request that pods of the same component are scheduled on different available nodes in the cluster.
Affinity is a property of pods that controls the nodes on which they prefer to be scheduled. Anti-affinity is a property of pods that prevents a pod from being scheduled on a node.
In OpenShift Container Platform, pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key-value labels on other pods.
The Operator sets default, preferred podAntiAffinity
rules for all Loki components, which includes the compactor
, distributor
, gateway
, indexGateway
, ingester
, querier
, queryFrontend
, and ruler
components.
You can override the preferred podAntiAffinity
settings for Loki components by configuring required settings in the requiredDuringSchedulingIgnoredDuringExecution
field:
Example user settings for the ingester component
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: # ... template: ingester: podAntiAffinity: # ... requiredDuringSchedulingIgnoredDuringExecution: 1 - labelSelector: matchLabels: 2 app.kubernetes.io/component: ingester topologyKey: kubernetes.io/hostname # ...
10.3.4. Zone aware data replication
In the logging 5.8 and later versions, the Loki Operator offers support for zone-aware data replication through pod topology spread constraints. Enabling this feature enhances reliability and safeguards against log loss in the event of a single zone failure. When configuring the deployment size as 1x.extra.small
, 1x.small
, or 1x.medium,
the replication.factor
field is automatically set to 2.
To ensure proper replication, you need to have at least as many availability zones as the replication factor specifies. While it is possible to have more availability zones than the replication factor, having fewer zones can lead to write failures. Each zone should host an equal number of instances for optimal operation.
Example LokiStack CR with zone replication enabled
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: replicationFactor: 2 1 replication: factor: 2 2 zones: - maxSkew: 1 3 topologyKey: topology.kubernetes.io/zone 4
- 1
- Deprecated field, values entered are overwritten by
replication.factor
. - 2
- This value is automatically set when deployment size is selected at setup.
- 3
- The maximum difference in number of pods between any two topology domains. The default is 1, and you cannot specify a value of 0.
- 4
- Defines zones in the form of a topology key that corresponds to a node label.
10.3.4.1. Recovering Loki pods from failed zones
In OpenShift Container Platform a zone failure happens when specific availability zone resources become inaccessible. Availability zones are isolated areas within a cloud provider’s data center, aimed at enhancing redundancy and fault tolerance. If your OpenShift Container Platform cluster isn’t configured to handle this, a zone failure can lead to service or data loss.
Loki pods are part of a StatefulSet, and they come with Persistent Volume Claims (PVCs) provisioned by a StorageClass
object. Each Loki pod and its PVCs reside in the same zone. When a zone failure occurs in a cluster, the StatefulSet controller automatically attempts to recover the affected pods in the failed zone.
The following procedure will delete the PVCs in the failed zone, and all data contained therein. To avoid complete data loss the replication factor field of the LokiStack
CR should always be set to a value greater than 1 to ensure that Loki is replicating.
Prerequisites
- Logging version 5.8 or later.
-
Verify your
LokiStack
CR has a replication factor greater than 1. - Zone failure detected by the control plane, and nodes in the failed zone are marked by cloud provider integration.
The StatefulSet controller automatically attempts to reschedule pods in a failed zone. Because the associated PVCs are also in the failed zone, automatic rescheduling to a different zone does not work. You must manually delete the PVCs in the failed zone to allow successful re-creation of the stateful Loki Pod and its provisioned PVC in the new zone.
Procedure
List the pods in
Pending
status by running the following command:oc get pods --field-selector status.phase==Pending -n openshift-logging
Example
oc get pods
outputNAME READY STATUS RESTARTS AGE 1 logging-loki-index-gateway-1 0/1 Pending 0 17m logging-loki-ingester-1 0/1 Pending 0 16m logging-loki-ruler-1 0/1 Pending 0 16m
- 1
- These pods are in
Pending
status because their corresponding PVCs are in the failed zone.
List the PVCs in
Pending
status by running the following command:oc get pvc -o=json -n openshift-logging | jq '.items[] | select(.status.phase == "Pending") | .metadata.name' -r
Example
oc get pvc
outputstorage-logging-loki-index-gateway-1 storage-logging-loki-ingester-1 wal-logging-loki-ingester-1 storage-logging-loki-ruler-1 wal-logging-loki-ruler-1
Delete the PVC(s) for a pod by running the following command:
oc delete pvc __<pvc_name>__ -n openshift-logging
Then delete the pod(s) by running the following command:
oc delete pod __<pod_name>__ -n openshift-logging
Once these objects have been successfully deleted, they should automatically be rescheduled in an available zone.
10.3.4.1.1. Troubleshooting PVC in a terminating state
The PVCs might hang in the terminating state without being deleted, if PVC metadata finalizers are set to kubernetes.io/pv-protection
. Removing the finalizers should allow the PVCs to delete successfully.
Remove the finalizer for each PVC by running the command below, then retry deletion.
oc patch pvc __<pvc_name>__ -p '{"metadata":{"finalizers":null}}' -n openshift-logging
10.3.5. Fine grained access for Loki logs
In logging 5.8 and later, the Red Hat OpenShift Logging Operator does not grant all users access to logs by default. As an administrator, you must configure your users' access unless the Operator was upgraded and prior configurations are in place. Depending on your configuration and need, you can configure fine grain access to logs using the following:
- Cluster wide policies
- Namespace scoped policies
- Creation of custom admin groups
As an administrator, you need to create the role bindings and cluster role bindings appropriate for your deployment. The Red Hat OpenShift Logging Operator provides the following cluster roles:
-
cluster-logging-application-view
grants permission to read application logs. -
cluster-logging-infrastructure-view
grants permission to read infrastructure logs. -
cluster-logging-audit-view
grants permission to read audit logs.
If you have upgraded from a prior version, an additional cluster role logging-application-logs-reader
and associated cluster role binding logging-all-authenticated-application-logs-reader
provide backward compatibility, allowing any authenticated user read access in their namespaces.
Users with access by namespace must provide a namespace when querying application logs.
10.3.5.1. Cluster wide access
Cluster role binding resources reference cluster roles, and set permissions cluster wide.
Example ClusterRoleBinding
kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: name: logging-all-application-logs-reader roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-logging-application-view 1 subjects: 2 - kind: Group name: system:authenticated apiGroup: rbac.authorization.k8s.io
10.3.5.2. Namespaced access
RoleBinding
resources can be used with ClusterRole
objects to define the namespace a user or group has access to logs for.
Example RoleBinding
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: allow-read-logs
namespace: log-test-0 1
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-logging-application-view
subjects:
- kind: User
apiGroup: rbac.authorization.k8s.io
name: testuser-0
- 1
- Specifies the namespace this
RoleBinding
applies to.
10.3.5.3. Custom admin group access
If you have a large deployment with several users who require broader permissions, you can create a custom group using the adminGroup
field. Users who are members of any group specified in the adminGroups
field of the LokiStack
CR are considered administrators.
Administrator users have access to all application logs in all namespaces, if they also get assigned the cluster-logging-application-view
role.
Example LokiStack CR
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: tenants: mode: openshift-logging 1 openshift: adminGroups: 2 - cluster-admin - custom-admin-group 3
Additional resources
10.3.6. Enabling stream-based retention with Loki
With Logging version 5.6 and higher, you can configure retention policies based on log streams. Rules for these may be set globally, per tenant, or both. If you configure both, tenant rules apply before global rules.
If there is no retention period defined on the s3 bucket or in the LokiStack custom resource (CR), then the logs are not pruned and they stay in the s3 bucket forever, which might fill up the s3 storage.
Although logging version 5.9 and higher supports schema v12, v13 is recommended.
To enable stream-based retention, create a
LokiStack
CR:Example global stream-based retention
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: limits: global: 1 retention: 2 days: 20 streams: - days: 4 priority: 1 selector: '{kubernetes_namespace_name=~"test.+"}' 3 - days: 1 priority: 1 selector: '{log_type="infrastructure"}' managementState: Managed replicationFactor: 1 size: 1x.small storage: schemas: - effectiveDate: "2020-10-11" version: v11 secret: name: logging-loki-s3 type: aws storageClassName: standard tenants: mode: openshift-logging
- 1
- Sets retention policy for all log streams. Note: This field does not impact the retention period for stored logs in object storage.
- 2
- Retention is enabled in the cluster when this block is added to the CR.
- 3
- Contains the LogQL query used to define the log stream.
Example per-tenant stream-based retention
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: limits: global: retention: days: 20 tenants: 1 application: retention: days: 1 streams: - days: 4 selector: '{kubernetes_namespace_name=~"test.+"}' 2 infrastructure: retention: days: 5 streams: - days: 1 selector: '{kubernetes_namespace_name=~"openshift-cluster.+"}' managementState: Managed replicationFactor: 1 size: 1x.small storage: schemas: - effectiveDate: "2020-10-11" version: v11 secret: name: logging-loki-s3 type: aws storageClassName: standard tenants: mode: openshift-logging
- 1
- Sets retention policy by tenant. Valid tenant types are
application
,audit
, andinfrastructure
. - 2
- Contains the LogQL query used to define the log stream.
Apply the
LokiStack
CR:$ oc apply -f <filename>.yaml
This is not for managing the retention for stored logs. Global retention periods for stored logs to a supported maximum of 30 days is configured with your object storage.
10.3.7. Troubleshooting Loki rate limit errors
If the Log Forwarder API forwards a large block of messages that exceeds the rate limit to Loki, Loki generates rate limit (429
) errors.
These errors can occur during normal operation. For example, when adding the logging to a cluster that already has some logs, rate limit errors might occur while the logging tries to ingest all of the existing log entries. In this case, if the rate of addition of new logs is less than the total rate limit, the historical data is eventually ingested, and the rate limit errors are resolved without requiring user intervention.
In cases where the rate limit errors continue to occur, you can fix the issue by modifying the LokiStack
custom resource (CR).
The LokiStack
CR is not available on Grafana-hosted Loki. This topic does not apply to Grafana-hosted Loki servers.
Conditions
- The Log Forwarder API is configured to forward logs to Loki.
Your system sends a block of messages that is larger than 2 MB to Loki. For example:
"values":[["1630410392689800468","{\"kind\":\"Event\",\"apiVersion\":\ ....... ...... ...... ...... \"received_at\":\"2021-08-31T11:46:32.800278+00:00\",\"version\":\"1.7.4 1.6.0\"}},\"@timestamp\":\"2021-08-31T11:46:32.799692+00:00\",\"viaq_index_name\":\"audit-write\",\"viaq_msg_id\":\"MzFjYjJkZjItNjY0MC00YWU4LWIwMTEtNGNmM2E5ZmViMGU4\",\"log_type\":\"audit\"}"]]}]}
After you enter
oc logs -n openshift-logging -l component=collector
, the collector logs in your cluster show a line containing one of the following error messages:429 Too Many Requests Ingestion rate limit exceeded
Example Vector error message
2023-08-25T16:08:49.301780Z WARN sink{component_kind="sink" component_id=default_loki_infra component_type=loki component_name=default_loki_infra}: vector::sinks::util::retries: Retrying after error. error=Server responded with an error: 429 Too Many Requests internal_log_rate_limit=true
Example Fluentd error message
2023-08-30 14:52:15 +0000 [warn]: [default_loki_infra] failed to flush the buffer. retry_times=2 next_retry_time=2023-08-30 14:52:19 +0000 chunk="604251225bf5378ed1567231a1c03b8b" error_class=Fluent::Plugin::LokiOutput::LogPostError error="429 Too Many Requests Ingestion rate limit exceeded for user infrastructure (limit: 4194304 bytes/sec) while attempting to ingest '4082' lines totaling '7820025' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased\n"
The error is also visible on the receiving end. For example, in the LokiStack ingester pod:
Example Loki ingester error message
level=warn ts=2023-08-30T14:57:34.155592243Z caller=grpc_logging.go:43 duration=1.434942ms method=/logproto.Pusher/Push err="rpc error: code = Code(429) desc = entry with timestamp 2023-08-30 14:57:32.012778399 +0000 UTC ignored, reason: 'Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream
Procedure
Update the
ingestionBurstSize
andingestionRate
fields in theLokiStack
CR:apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: limits: global: ingestion: ingestionBurstSize: 16 1 ingestionRate: 8 2 # ...
- 1
- The
ingestionBurstSize
field defines the maximum local rate-limited sample size per distributor replica in MB. This value is a hard limit. Set this value to at least the maximum logs size expected in a single push request. Single requests that are larger than theingestionBurstSize
value are not permitted. - 2
- The
ingestionRate
field is a soft limit on the maximum amount of ingested samples per second in MB. Rate limit errors occur if the rate of logs exceeds the limit, but the collector retries sending the logs. As long as the total average is lower than the limit, the system recovers and errors are resolved without user intervention.
10.3.8. Configuring Loki to tolerate memberlist creation failure
In an OpenShift cluster, administrators generally use a non-private IP network range. As a result, the LokiStack memberlist configuration fails because, by default, it only uses private IP networks.
As an administrator, you can select the pod network for the memberlist configuration. You can modify the LokiStack CR to use the podIP
in the hashRing
spec. To configure the LokiStack CR, use the following command:
$ oc patch LokiStack logging-loki -n openshift-logging --type=merge -p '{"spec": {"hashRing":{"memberlist":{"instanceAddrType":"podIP","type": "memberlist"}}}}'
Example LokiStack to include podIP
apiVersion: loki.grafana.com/v1 kind: LokiStack metadata: name: logging-loki namespace: openshift-logging spec: # ... hashRing: type: memberlist memberlist: instanceAddrType: podIP # ...
10.3.9. Additional resources
10.4. Configuring the Elasticsearch log store
You can use Elasticsearch 6 to store and organize log data.
You can make modifications to your log store, including:
- Storage for your Elasticsearch cluster
- Shard replication across data nodes in the cluster, from full replication to no replication
- External access to Elasticsearch data
10.4.1. Configuring log storage
You can configure which log storage type your logging uses by modifying the ClusterLogging
custom resource (CR).
Prerequisites
- You have administrator permissions.
-
You have installed the OpenShift CLI (
oc
). - You have installed the Red Hat OpenShift Logging Operator and an internal log store that is either the LokiStack or Elasticsearch.
-
You have created a
ClusterLogging
CR.
The Logging 5.9 release does not contain an updated version of the OpenShift Elasticsearch Operator. If you currently use the OpenShift Elasticsearch Operator released with Logging 5.8, it will continue to work with Logging until the EOL of Logging 5.8. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator. For more information on the Logging lifecycle dates, see Platform Agnostic Operators.
Procedure
Modify the
ClusterLogging
CRlogStore
spec:ClusterLogging
CR exampleapiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: # ... spec: # ... logStore: type: <log_store_type> 1 elasticsearch: 2 nodeCount: <integer> resources: {} storage: {} redundancyPolicy: <redundancy_type> 3 lokistack: 4 name: {} # ...
- 1
- Specify the log store type. This can be either
lokistack
orelasticsearch
. - 2
- Optional configuration options for the Elasticsearch log store.
- 3
- Specify the redundancy type. This value can be
ZeroRedundancy
,SingleRedundancy
,MultipleRedundancy
, orFullRedundancy
. - 4
- Optional configuration options for LokiStack.
Example
ClusterLogging
CR to specify LokiStack as the log storeapiVersion: logging.openshift.io/v1 kind: ClusterLogging metadata: name: instance namespace: openshift-logging spec: managementState: Managed logStore: type: lokistack lokistack: name: logging-loki # ...
Apply the
ClusterLogging
CR by running the following command:$ oc apply -f <filename>.yaml
10.4.2. Forwarding audit logs to the log store
In a logging deployment, container and infrastructure logs are forwarded to the internal log store defined in the ClusterLogging
custom resource (CR) by default.
Audit logs are not forwarded to the internal log store by default because this does not provide secure storage. You are responsible for ensuring that the system to which you forward audit logs is compliant with your organizational and governmental regulations, and is properly secured.
If this default configuration meets your needs, you do not need to configure a ClusterLogForwarder
CR. If a ClusterLogForwarder
CR exists, logs are not forwarded to the internal log store unless a pipeline is defined that contains the default
output.
Procedure
To use the Log Forward API to forward audit logs to the internal Elasticsearch instance:
Create or edit a YAML file that defines the
ClusterLogForwarder
CR object:Create a CR to send all log types to the internal Elasticsearch instance. You can use the following example without making any changes:
apiVersion: logging.openshift.io/v1 kind: ClusterLogForwarder metadata: name: instance namespace: openshift-logging spec: pipelines: 1 - name: all-to-default inputRefs: - infrastructure - application - audit outputRefs: - default
- 1
- A pipeline defines the type of logs to forward using the specified output. The default output forwards logs to the internal Elasticsearch instance.
NoteYou must specify all three types of logs in the pipeline: application, infrastructure, and audit. If you do not specify a log type, those logs are not stored and will be lost.
If you have an existing
ClusterLogForwarder
CR, add a pipeline to the default output for the audit logs. You do not need to define the default output. For example:apiVersion: "logging.openshift.io/v1" kind: ClusterLogForwarder metadata: name: instance namespace: openshift-logging spec: outputs: - name: elasticsearch-insecure type: "elasticsearch" url: http://elasticsearch-insecure.messaging.svc.cluster.local insecure: true - name: elasticsearch-secure type: "elasticsearch" url: https://elasticsearch-secure.messaging.svc.cluster.local secret: name: es-audit - name: secureforward-offcluster type: "fluentdForward" url: https://secureforward.offcluster.com:24224 secret: name: secureforward pipelines: - name: container-logs inputRefs: - application outputRefs: - secureforward-offcluster - name: infra-logs inputRefs: - infrastructure outputRefs: - elasticsearch-insecure - name: audit-logs inputRefs: - audit outputRefs: - elasticsearch-secure - default 1
- 1
- This pipeline sends the audit logs to the internal Elasticsearch instance in addition to an external instance.
Additional resources
10.4.3. Configuring log retention time
You can configure a retention policy that specifies how long the default Elasticsearch log store keeps indices for each of the three log sources: infrastructure logs, application logs, and audit logs.
To configure the retention policy, you set a maxAge
parameter for each log source in the ClusterLogging
custom resource (CR). The CR applies these values to the Elasticsearch rollover schedule, which determines when Elasticsearch deletes the rolled-over indices.
Elasticsearch rolls over an index, moving the current index and creating a new index, when an index matches any of the following conditions:
-
The index is older than the
rollover.maxAge
value in theElasticsearch
CR. - The index size is greater than 40 GB × the number of primary shards.
- The index doc count is greater than 40960 KB × the number of primary shards.
Elasticsearch deletes the rolled-over indices based on the retention policy you configure. If you do not create a retention policy for any log sources, logs are deleted after seven days by default.
Prerequisites
- The Red Hat OpenShift Logging Operator and the OpenShift Elasticsearch Operator must be installed.
Procedure
To configure the log retention time:
Edit the
ClusterLogging
CR to add or modify theretentionPolicy
parameter:apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" ... spec: managementState: "Managed" logStore: type: "elasticsearch" retentionPolicy: 1 application: maxAge: 1d infra: maxAge: 7d audit: maxAge: 7d elasticsearch: nodeCount: 3 ...
- 1
- Specify the time that Elasticsearch should retain each log source. Enter an integer and a time designation: weeks(w), hours(h/H), minutes(m) and seconds(s). For example,
1d
for one day. Logs older than themaxAge
are deleted. By default, logs are retained for seven days.
You can verify the settings in the
Elasticsearch
custom resource (CR).For example, the Red Hat OpenShift Logging Operator updated the following
Elasticsearch
CR to configure a retention policy that includes settings to roll over active indices for the infrastructure logs every eight hours and the rolled-over indices are deleted seven days after rollover. OpenShift Container Platform checks every 15 minutes to determine if the indices need to be rolled over.apiVersion: "logging.openshift.io/v1" kind: "Elasticsearch" metadata: name: "elasticsearch" spec: ... indexManagement: policies: 1 - name: infra-policy phases: delete: minAge: 7d 2 hot: actions: rollover: maxAge: 8h 3 pollInterval: 15m 4 ...
- 1
- For each log source, the retention policy indicates when to delete and roll over logs for that source.
- 2
- When OpenShift Container Platform deletes the rolled-over indices. This setting is the
maxAge
you set in theClusterLogging
CR. - 3
- The index age for OpenShift Container Platform to consider when rolling over the indices. This value is determined from the
maxAge
you set in theClusterLogging
CR. - 4
- When OpenShift Container Platform checks if the indices should be rolled over. This setting is the default and cannot be changed.
NoteModifying the
Elasticsearch
CR is not supported. All changes to the retention policies must be made in theClusterLogging
CR.The OpenShift Elasticsearch Operator deploys a cron job to roll over indices for each mapping using the defined policy, scheduled using the
pollInterval
.$ oc get cronjob
Example output
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE elasticsearch-im-app */15 * * * * False 0 <none> 4s elasticsearch-im-audit */15 * * * * False 0 <none> 4s elasticsearch-im-infra */15 * * * * False 0 <none> 4s
10.4.4. Configuring CPU and memory requests for the log store
Each component specification allows for adjustments to both the CPU and memory requests. You should not have to manually adjust these values as the OpenShift Elasticsearch Operator sets values sufficient for your environment.
In large-scale clusters, the default memory limit for the Elasticsearch proxy container might not be sufficient, causing the proxy container to be OOMKilled. If you experience this issue, increase the memory requests and limits for the Elasticsearch proxy.
Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each pod. Preferably you should allocate as much as possible, up to 64Gi per pod.
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
Procedure
Edit the
ClusterLogging
custom resource (CR) in theopenshift-logging
project:$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" .... spec: logStore: type: "elasticsearch" elasticsearch:1 resources: limits: 2 memory: "32Gi" requests: 3 cpu: "1" memory: "16Gi" proxy: 4 resources: limits: memory: 100Mi requests: memory: 100Mi
- 1
- Specify the CPU and memory requests for Elasticsearch as needed. If you leave these values blank, the OpenShift Elasticsearch Operator sets default values that should be sufficient for most deployments. The default values are
16Gi
for the memory request and1
for the CPU request. - 2
- The maximum amount of resources a pod can use.
- 3
- The minimum resources required to schedule a pod.
- 4
- Specify the CPU and memory requests for the Elasticsearch proxy as needed. If you leave these values blank, the OpenShift Elasticsearch Operator sets default values that are sufficient for most deployments. The default values are
256Mi
for the memory request and100m
for the CPU request.
When adjusting the amount of Elasticsearch memory, the same value should be used for both requests
and limits
.
For example:
resources: limits: 1 memory: "32Gi" requests: 2 cpu: "8" memory: "32Gi"
Kubernetes generally adheres the node configuration and does not allow Elasticsearch to use the specified limits. Setting the same value for the requests
and limits
ensures that Elasticsearch can use the memory you want, assuming the node has the memory available.
10.4.5. Configuring replication policy for the log store
You can define how Elasticsearch shards are replicated across data nodes in the cluster.
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
Procedure
Edit the
ClusterLogging
custom resource (CR) in theopenshift-logging
project:$ oc edit clusterlogging instance
apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" .... spec: logStore: type: "elasticsearch" elasticsearch: redundancyPolicy: "SingleRedundancy" 1
- 1
- Specify a redundancy policy for the shards. The change is applied upon saving the changes.
- FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.
- MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.
- SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. Better performance than MultipleRedundancy, when using 5 or more nodes. You cannot apply this policy on deployments of single Elasticsearch node.
- ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
The number of primary shards for the index templates is equal to the number of Elasticsearch data nodes.
10.4.6. Scaling down Elasticsearch pods
Reducing the number of Elasticsearch pods in your cluster can result in data loss or Elasticsearch performance degradation.
If you scale down, you should scale down by one pod at a time and allow the cluster to re-balance the shards and replicas. After the Elasticsearch health status returns to green
, you can scale down by another pod.
If your Elasticsearch cluster is set to ZeroRedundancy
, you should not scale down your Elasticsearch pods.
10.4.7. Configuring persistent storage for the log store
Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance.
Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
Procedure
Edit the
ClusterLogging
CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" # ... spec: logStore: type: "elasticsearch" elasticsearch: nodeCount: 3 storage: storageClassName: "gp2" size: "200G"
This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
If you use a local volume for persistent storage, do not use a raw block volume, which is described with volumeMode: block
in the LocalVolume
object. Elasticsearch cannot use raw block volumes.
10.4.8. Configuring the log store for emptyDir storage
You can use emptyDir with your log store, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.
When using emptyDir, if log storage is restarted or redeployed, you will lose data.
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
Procedure
Edit the
ClusterLogging
CR to specify emptyDir:spec: logStore: type: "elasticsearch" elasticsearch: nodeCount: 3 storage: {}
10.4.9. Performing an Elasticsearch rolling cluster restart
Perform a rolling restart when you change the elasticsearch
config map or any of the elasticsearch-*
deployment configurations.
Also, a rolling restart is recommended if the nodes on which an Elasticsearch pod runs requires a reboot.
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
Procedure
To perform a rolling cluster restart:
Change to the
openshift-logging
project:$ oc project openshift-logging
Get the names of the Elasticsearch pods:
$ oc get pods -l component=elasticsearch
Scale down the collector pods so they stop sending new logs to Elasticsearch:
$ oc -n openshift-logging patch daemonset/collector -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-collector": "false"}}}}}'
Perform a shard synced flush using the OpenShift Container Platform es_util tool to ensure there are no pending operations waiting to be written to disk prior to shutting down:
$ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_flush/synced" -XPOST
For example:
$ oc exec -c elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query="_flush/synced" -XPOST
Example output
{"_shards":{"total":4,"successful":4,"failed":0},".security":{"total":2,"successful":2,"failed":0},".kibana_1":{"total":2,"successful":2,"failed":0}}
Prevent shard balancing when purposely bringing down nodes using the OpenShift Container Platform es_util tool:
$ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "primaries" } }'
For example:
$ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "primaries" } }'
Example output
{"acknowledged":true,"persistent":{"cluster":{"routing":{"allocation":{"enable":"primaries"}}}},"transient":
After the command is complete, for each deployment you have for an ES cluster:
By default, the OpenShift Container Platform Elasticsearch cluster blocks rollouts to their nodes. Use the following command to allow rollouts and allow the pod to pick up the changes:
$ oc rollout resume deployment/<deployment-name>
For example:
$ oc rollout resume deployment/elasticsearch-cdm-0-1
Example output
deployment.extensions/elasticsearch-cdm-0-1 resumed
A new pod is deployed. After the pod has a ready container, you can move on to the next deployment.
$ oc get pods -l component=elasticsearch-
Example output
NAME READY STATUS RESTARTS AGE elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k 2/2 Running 0 22h elasticsearch-cdm-5ceex6ts-2-f799564cb-l9mj7 2/2 Running 0 22h elasticsearch-cdm-5ceex6ts-3-585968dc68-k7kjr 2/2 Running 0 22h
After the deployments are complete, reset the pod to disallow rollouts:
$ oc rollout pause deployment/<deployment-name>
For example:
$ oc rollout pause deployment/elasticsearch-cdm-0-1
Example output
deployment.extensions/elasticsearch-cdm-0-1 paused
Check that the Elasticsearch cluster is in a
green
oryellow
state:$ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query=_cluster/health?pretty=true
NoteIf you performed a rollout on the Elasticsearch pod you used in the previous commands, the pod no longer exists and you need a new pod name here.
For example:
$ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query=_cluster/health?pretty=true
{ "cluster_name" : "elasticsearch", "status" : "yellow", 1 "timed_out" : false, "number_of_nodes" : 3, "number_of_data_nodes" : 3, "active_primary_shards" : 8, "active_shards" : 16, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 1, "delayed_unassigned_shards" : 0, "number_of_pending_tasks" : 0, "number_of_in_flight_fetch" : 0, "task_max_waiting_in_queue_millis" : 0, "active_shards_percent_as_number" : 100.0 }
- 1
- Make sure this parameter value is
green
oryellow
before proceeding.
- If you changed the Elasticsearch configuration map, repeat these steps for each Elasticsearch pod.
After all the deployments for the cluster have been rolled out, re-enable shard balancing:
$ oc exec <any_es_pod_in_the_cluster> -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "all" } }'
For example:
$ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6 -c elasticsearch -- es_util --query="_cluster/settings" -XPUT -d '{ "persistent": { "cluster.routing.allocation.enable" : "all" } }'
Example output
{ "acknowledged" : true, "persistent" : { }, "transient" : { "cluster" : { "routing" : { "allocation" : { "enable" : "all" } } } } }
Scale up the collector pods so they send new logs to Elasticsearch.
$ oc -n openshift-logging patch daemonset/collector -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-collector": "true"}}}}}'
10.4.10. Exposing the log store service as a route
By default, the log store that is deployed with logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to the log store service for those tools that access its data.
Externally, you can access the log store by creating a reencrypt route, your OpenShift Container Platform token and the installed log store CA certificate. Then, access a node that hosts the log store service with a cURL request that contains:
-
The
Authorization: Bearer ${token}
- The Elasticsearch reencrypt route and an Elasticsearch API request.
Internally, you can access the log store service using the log store cluster IP, which you can get by using either of the following commands:
$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging
Example output
172.30.183.229
$ oc get service elasticsearch -n openshift-logging
Example output
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP 172.30.183.229 <none> 9200/TCP 22h
You can check the cluster IP address with a command similar to the following:
$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -n openshift-logging -- curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"
Example output
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 29 100 29 0 0 108 0 --:--:-- --:--:-- --:--:-- 108
Prerequisites
- The Red Hat OpenShift Logging and Elasticsearch Operators must be installed.
- You must have access to the project to be able to access to the logs.
Procedure
To expose the log store externally:
Change to the
openshift-logging
project:$ oc project openshift-logging
Extract the CA certificate from the log store and write to the admin-ca file:
$ oc extract secret/elasticsearch --to=. --keys=admin-ca
Example output
admin-ca
Create the route for the log store service as a YAML file:
Create a YAML file with the following:
apiVersion: route.openshift.io/v1 kind: Route metadata: name: elasticsearch namespace: openshift-logging spec: host: to: kind: Service name: elasticsearch tls: termination: reencrypt destinationCACertificate: | 1
- 1
- Add the log store CA certifcate or use the command in the next step. You do not have to set the
spec.tls.key
,spec.tls.certificate
, andspec.tls.caCertificate
parameters required by some reencrypt routes.
Run the following command to add the log store CA certificate to the route YAML you created in the previous step:
$ cat ./admin-ca | sed -e "s/^/ /" >> <file-name>.yaml
Create the route:
$ oc create -f <file-name>.yaml
Example output
route.route.openshift.io/elasticsearch created
Check that the Elasticsearch service is exposed:
Get the token of this service account to be used in the request:
$ token=$(oc whoami -t)
Set the elasticsearch route you created as an environment variable.
$ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:
curl -tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}"
The response appears similar to the following:
Example output
{ "name" : "elasticsearch-cdm-i40ktba0-1", "cluster_name" : "elasticsearch", "cluster_uuid" : "0eY-tJzcR3KOdpgeMJo-MQ", "version" : { "number" : "6.8.1", "build_flavor" : "oss", "build_type" : "zip", "build_hash" : "Unknown", "build_date" : "Unknown", "build_snapshot" : true, "lucene_version" : "7.7.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "<tagline>" : "<for search>" }
10.4.11. Removing unused components if you do not use the default Elasticsearch log store
As an administrator, in the rare case that you forward logs to a third-party log store and do not use the default Elasticsearch log store, you can remove several unused components from your logging cluster.
In other words, if you do not use the default Elasticsearch log store, you can remove the internal Elasticsearch logStore
and Kibana visualization
components from the ClusterLogging
custom resource (CR). Removing these components is optional but saves resources.
Prerequisites
Verify that your log forwarder does not send log data to the default internal Elasticsearch cluster. Inspect the
ClusterLogForwarder
CR YAML file that you used to configure log forwarding. Verify that it does not have anoutputRefs
element that specifiesdefault
. For example:outputRefs: - default
Suppose the ClusterLogForwarder
CR forwards log data to the internal Elasticsearch cluster, and you remove the logStore
component from the ClusterLogging
CR. In that case, the internal Elasticsearch cluster will not be present to store the log data. This absence can cause data loss.
Procedure
Edit the
ClusterLogging
custom resource (CR) in theopenshift-logging
project:$ oc edit ClusterLogging instance
-
If they are present, remove the
logStore
andvisualization
stanzas from theClusterLogging
CR. Preserve the
collection
stanza of theClusterLogging
CR. The result should look similar to the following example:apiVersion: "logging.openshift.io/v1" kind: "ClusterLogging" metadata: name: "instance" namespace: "openshift-logging" spec: managementState: "Managed" collection: type: "fluentd" fluentd: {}
Verify that the collector pods are redeployed:
$ oc get pods -l component=collector -n openshift-logging