Chapter 6. Configuring your cluster logging deployment
6.1. About configuring cluster logging
After installing cluster logging into your cluster, you can make the following configurations.
You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.
6.1.1. About deploying and configuring cluster logging
OpenShift Container Platform cluster logging is designed to be used with the default configuration, which is tuned for small to medium sized OpenShift Container Platform clusters.
The installation instructions that follow include a sample Cluster Logging Custom Resource (CR), which you can use to create a cluster logging instance and configure your cluster logging deployment.
If you want to use the default cluster logging install, you can use the sample CR directly.
If you want to customize your deployment, make changes to the sample CR as needed. The following describes the configurations you can make when installing your cluster logging instance or modify after installation. See the Configuring sections for more information on working with each component, including modifications you can make outside of the Cluster Logging Custom Resource.
6.1.1.1. Configuring and Tuning Cluster Logging
You can configure your cluster logging environment by modifying the Cluster Logging Custom Resource deployed in the openshift-logging project.
You can modify any of the following components upon install or after install:
- Memory and CPU
You can adjust both the CPU and memory limits for each component by modifying the resources block with valid memory and CPU values:
spec:
  logStore:
    elasticsearch:
      resources:
        limits:
          cpu:
          memory:
        requests:
          cpu: 1
          memory: 16Gi
    type: "elasticsearch"
  collection:
    logs:
      fluentd:
        resources:
          limits:
            cpu:
            memory:
          requests:
            cpu:
            memory:
      type: "fluentd"
  visualization:
    kibana:
      resources:
        limits:
          cpu:
          memory:
        requests:
          cpu:
          memory:
    type: kibana
  curation:
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
    type: "curator"
- Elasticsearch storage
You can configure a persistent storage class and size for the Elasticsearch cluster using the storageClass name and size parameters. The Cluster Logging Operator creates a PersistentVolumeClaim for each data node in the Elasticsearch cluster based on these parameters.
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      storage:
        storageClassName: "gp2"
        size: "200G"
This example specifies each data node in the cluster will be bound to a PersistentVolumeClaim that requests "200G" of "gp2" storage. Each primary shard will be backed by a single replica.
Omitting the storage block results in a deployment that includes ephemeral storage only.
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      storage: {}
- Elasticsearch replication policy
You can set the policy that defines how Elasticsearch shards are replicated across data nodes in the cluster:
  - FullRedundancy. The shards for each index are fully replicated to every data node.
  - MultipleRedundancy. The shards for each index are spread over half of the data nodes.
  - SingleRedundancy. A single copy of each shard. Logs are always available and recoverable as long as at least two data nodes exist.
  - ZeroRedundancy. No copies of any shards. Logs may be unavailable (or lost) in the event a node is down or fails.
- Curator schedule
You specify the schedule for Curator in the [cron format](https://en.wikipedia.org/wiki/Cron).
spec:
  curation:
    type: "curator"
    resources:
    curator:
      schedule: "30 3 * * *"
6.1.1.2. Sample modified Cluster Logging Custom Resource
The following is an example of a Cluster Logging Custom Resource modified using the options previously described.
Sample modified Cluster Logging Custom Resource
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 2
      resources:
        limits:
          memory: 2Gi
        requests:
          cpu: 200m
          memory: 2Gi
      storage: {}
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      resources:
        limits:
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
      replicas: 1
  curation:
    type: "curator"
    curator:
      resources:
        limits:
          memory: 200Mi
        requests:
          cpu: 200m
          memory: 200Mi
      schedule: "*/5 * * * *"
  collection:
    logs:
      type: "fluentd"
      fluentd:
        resources:
          limits:
            memory: 1Gi
          requests:
            cpu: 200m
            memory: 1Gi
6.1.2. Moving the cluster logging resources
You can configure the Cluster Logging Operator to deploy the pods for any or all of the Cluster Logging components (Elasticsearch, Kibana, and Curator) to different nodes. You cannot move the Cluster Logging Operator pod from its installed location.
For example, you can move the Elasticsearch pods to a separate node because of high CPU, memory, and disk requirements.
You should set your MachineSet to use at least 6 replicas.
Prerequisites
- Cluster logging and Elasticsearch must be installed. These features are not installed by default.
Procedure
Edit the Cluster Logging Custom Resource in the openshift-logging project:
$ oc edit ClusterLogging instance
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
....
spec:
  collection:
    logs:
      fluentd:
        resources: null
      rsyslog:
        resources: null
      type: fluentd
  curation:
    curator:
      nodeSelector: 1
        node-role.kubernetes.io/infra: ''
      resources: null
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      nodeSelector: 2
        node-role.kubernetes.io/infra: ''
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          cpu: 500m
          memory: 16Gi
        requests:
          cpu: 500m
          memory: 16Gi
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      nodeSelector: 3
        node-role.kubernetes.io/infra: '' 4
      proxy:
        resources: null
      replicas: 1
      resources: null
    type: kibana
....
6.2. Changing cluster logging management state
In order to modify certain components managed by the Cluster Logging Operator or the Elasticsearch Operator, you must set the operator to the unmanaged state.
In unmanaged state, the operators do not respond to changes in the CRs. The administrator assumes full control of individual component configurations and upgrades when in unmanaged state.
In managed state, the Cluster Logging Operator (CLO) responds to changes in the Cluster Logging Custom Resource (CR) and attempts to update the cluster to match the CR.
The OpenShift Container Platform documentation indicates in a prerequisite step when you must set the cluster to Unmanaged.
If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.
6.2.1. Changing the cluster logging management state
You must set the operator to the unmanaged state in order to modify the components managed by the Cluster Logging Operator:
- the Curator CronJob,
- the Elasticsearch CR,
- the Kibana Deployment,
- the log collector DaemonSet.
If you make changes to these components in managed state, the Cluster Logging Operator reverts those changes.
An unmanaged cluster logging environment does not receive updates until you return the Cluster Logging Operator to Managed state.
Prerequisites
- The Cluster Logging Operator must be installed.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  managementState: "Managed" 1
- 1
- Specify the management state as Managed or Unmanaged.
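The same change can also be made without an interactive editor. The following is a minimal sketch, assuming the CR is named instance in the openshift-logging project as in the examples in this section; the helper name management_state_patch is hypothetical and only builds the JSON merge patch.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that sets managementState.
# Only the two states named in this section are accepted.
management_state_patch() {
  case "$1" in
    Managed|Unmanaged)
      printf '{"spec":{"managementState":"%s"}}\n' "$1"
      ;;
    *)
      echo "managementState must be Managed or Unmanaged" >&2
      return 1
      ;;
  esac
}

# Example usage (requires a logged-in oc session):
#   oc patch ClusterLogging instance -n openshift-logging \
#     --type merge -p "$(management_state_patch Unmanaged)"
```

Validating the value first avoids writing a misspelled state into the CR.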
6.2.2. Changing the Elasticsearch management state
You must set the operator to the unmanaged state in order to modify the Elasticsearch deployment files, which are managed by the Elasticsearch Operator.
If you make changes to these components in managed state, the Elasticsearch Operator reverts those changes.
An unmanaged Elasticsearch cluster does not receive updates until you return the Elasticsearch Operator to Managed state.
Prerequisite
- The Elasticsearch Operator must be installed.
- Have the name of the Elasticsearch CR, in the openshift-logging project:
$ oc get -n openshift-logging Elasticsearch
NAME            AGE
elasticsearch   28h
Procedure
Edit the Elasticsearch Custom Resource (CR) in the openshift-logging project:
$ oc edit Elasticsearch elasticsearch
apiVersion: logging.openshift.io/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
....
spec:
  managementState: "Managed" 1
- 1
- Specify the management state as Managed or Unmanaged.
If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.
6.3. Configuring cluster logging
Cluster logging is configurable using a Cluster Logging Custom Resource (CR) deployed in the openshift-logging project.
The Cluster Logging Operator watches for changes to Cluster Logging CRs, creates any missing logging components, and adjusts the logging deployment accordingly.
The Cluster Logging CR is based on the Cluster Logging Custom Resource Definition (CRD), which defines a complete cluster logging deployment and includes all the components of the logging stack to collect, store and visualize logs.
Sample Cluster Logging Custom Resource (CR)
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  creationTimestamp: '2019-03-20T18:07:02Z'
  generation: 1
  name: instance
  namespace: openshift-logging
spec:
  collection:
    logs:
      fluentd:
        resources: null
      rsyslog:
        resources: null
      type: fluentd
  curation:
    curator:
      resources: null
      schedule: 30 3 * * *
    type: curator
  logStore:
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
      resources:
        limits:
          cpu:
          memory:
        requests:
          cpu:
          memory:
      storage: {}
    type: elasticsearch
  managementState: Managed
  visualization:
    kibana:
      proxy:
        resources: null
      replicas: 1
      resources: null
    type: kibana
You can configure the following for cluster logging:
- You can place cluster logging into an unmanaged state that allows an administrator to assume full control of individual component configurations and upgrades.
- You can overwrite the image for each cluster logging component by modifying the appropriate environment variable in the cluster-logging-operator Deployment.
- You can specify specific nodes for the logging components using node selectors.
6.3.1. Understanding the cluster logging component images
There are several components in cluster logging, each one implemented with one or more images. Each image is specified by an environment variable defined in the cluster-logging-operator deployment in the openshift-logging project and should not be changed.
You can view the images by running the following command:
$ oc -n openshift-logging set env deployment/cluster-logging-operator --list | grep _IMAGE
ELASTICSEARCH_IMAGE=registry.redhat.io/openshift4/ose-logging-elasticsearch5:v4.1 1
FLUENTD_IMAGE=registry.redhat.io/openshift4/ose-logging-fluentd:v4.1 2
KIBANA_IMAGE=registry.redhat.io/openshift4/ose-logging-kibana5:v4.1 3
CURATOR_IMAGE=registry.redhat.io/openshift4/ose-logging-curator5:v4.1 4
OAUTH_PROXY_IMAGE=registry.redhat.io/openshift4/ose-oauth-proxy:v4.1 5
The values might be different depending on your environment.
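To compare tags across components at a glance, the listing can be post-processed. This is a minimal sketch, assuming each line has the NAME_IMAGE=repository:tag shape shown in the sample output; the helper name list_image_tags is hypothetical, and an image pinned by digest would print the digest value instead of a tag.

```shell
#!/bin/sh
# Hypothetical helper: read NAME_IMAGE=repository:tag lines on stdin and
# print "<component> <tag>" pairs. Lines that do not define a *_IMAGE
# variable are skipped.
list_image_tags() {
  while IFS='=' read -r name image; do
    case "$name" in
      *_IMAGE)
        component=${name%_IMAGE}   # strip the _IMAGE suffix
        tag=${image##*:}           # keep the text after the last colon
        printf '%s %s\n' "$component" "$tag"
        ;;
    esac
  done
}

# Example usage (requires a logged-in oc session):
#   oc -n openshift-logging set env deployment/cluster-logging-operator --list \
#     | list_image_tags
```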
6.4. Configuring Elasticsearch to store and organize log data
OpenShift Container Platform uses Elasticsearch (ES) to store and organize the log data.
You can configure your Elasticsearch deployment to:
- configure storage for your Elasticsearch cluster;
- define how shards are replicated across data nodes in the cluster, from full replication to no replication;
- configure external access to Elasticsearch data.
Scaling down Elasticsearch nodes is not supported. When scaling down, Elasticsearch pods can be accidentally deleted, possibly resulting in shards not being allocated and replica shards being lost.
Elasticsearch is a memory-intensive application. Each Elasticsearch node needs 16G of memory for both memory requests and memory limits, unless you specify otherwise in the ClusterLogging Custom Resource. The initial set of OpenShift Container Platform nodes might not be large enough to support the Elasticsearch cluster. In that case, you must add nodes to the OpenShift Container Platform cluster to run with the recommended or higher memory.
Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments.
If you set the Elasticsearch Operator (EO) to unmanaged and leave the Cluster Logging Operator (CLO) as managed, the CLO will revert changes you make to the EO, as the EO is managed by the CLO.
6.4.1. Configuring Elasticsearch CPU and memory limits
Each component specification allows for adjustments to both the CPU and memory limits. You should not have to manually adjust these values as the Elasticsearch Operator sets values sufficient for your environment.
Each Elasticsearch node can operate with a lower memory setting though this is not recommended for production deployments. For production use, you should have no less than the default 16Gi allocated to each Pod. Preferably you should allocate as much as possible, up to 64Gi per Pod.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      resources: 1
        limits:
          memory: "16Gi"
        requests:
          cpu: "1"
          memory: "16Gi"
- 1
- Specify the CPU and memory limits as needed. If you leave these values blank, the Elasticsearch Operator sets default values that should be sufficient for most deployments.
6.4.2. Configuring Elasticsearch replication policy
You can define how Elasticsearch shards are replicated across data nodes in the cluster:
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:
$ oc edit clusterlogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      redundancyPolicy: "SingleRedundancy" 1
- 1
- Specify a redundancy policy for the shards. The change is applied upon saving the changes.
- FullRedundancy. Elasticsearch fully replicates the primary shards for each index to every data node. This provides the highest safety, but at the cost of the highest amount of disk required and the poorest performance.
- MultipleRedundancy. Elasticsearch fully replicates the primary shards for each index to half of the data nodes. This provides a good tradeoff between safety and performance.
- SingleRedundancy. Elasticsearch makes one copy of the primary shards for each index. Logs are always available and recoverable as long as at least two data nodes exist. This policy performs better than MultipleRedundancy when using 5 or more nodes. You cannot apply this policy to a deployment with a single Elasticsearch node.
- ZeroRedundancy. Elasticsearch does not make copies of the primary shards. Logs might be unavailable or lost in the event a node is down or fails. Use this mode when you are more concerned with performance than safety, or have implemented your own disk/PVC backup/restore strategy.
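A policy value can be checked before it is written into the CR. The following is a minimal sketch that encodes only what the list above states: the four policy names, and the rule that SingleRedundancy cannot be used with a single Elasticsearch node. The helper name check_redundancy_policy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check a redundancyPolicy value against the number of
# Elasticsearch data nodes before editing the CR.
check_redundancy_policy() {
  policy=$1
  nodes=$2
  case "$policy" in
    FullRedundancy|MultipleRedundancy|ZeroRedundancy)
      ;;
    SingleRedundancy)
      # The list above rules this policy out for single-node deployments.
      if [ "$nodes" -lt 2 ]; then
        echo "SingleRedundancy needs at least two data nodes" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown redundancy policy: $policy" >&2
      return 1
      ;;
  esac
  echo "ok: $policy with $nodes data node(s)"
}

# Example usage:
#   check_redundancy_policy SingleRedundancy 3
```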
6.4.3. Configuring Elasticsearch storage
Elasticsearch requires persistent storage. The faster the storage, the faster the Elasticsearch performance is.
Using NFS storage as a volume or a persistent volume (or via NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
Edit the Cluster Logging CR to specify that each data node in the cluster is bound to a Persistent Volume Claim.
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: "gp2"
        size: "200G"
This example specifies each data node in the cluster is bound to a Persistent Volume Claim that requests "200G" of AWS General Purpose SSD (gp2) storage.
6.4.4. Configuring Elasticsearch for emptyDir storage
You can use emptyDir with Elasticsearch, which creates an ephemeral deployment in which all of a pod’s data is lost upon restart.
When using emptyDir, if Elasticsearch is restarted or redeployed, you will lose data.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
Edit the Cluster Logging CR to specify emptyDir:
spec:
  logStore:
    type: "elasticsearch"
    elasticsearch:
      nodeCount: 3
      storage: {}
6.4.5. Exposing Elasticsearch as a route
By default, Elasticsearch deployed with cluster logging is not accessible from outside the logging cluster. You can enable a route with re-encryption termination for external access to Elasticsearch for those tools that access its data.
Externally, you can access Elasticsearch by using a reencrypt route, your OpenShift Container Platform token, and the installed Elasticsearch CA certificate. Then, access an Elasticsearch node with a cURL request that contains:
- The Authorization: Bearer ${token} header
- The Elasticsearch reencrypt route and an Elasticsearch API request.
Internally, you can access Elasticsearch using the Elasticsearch cluster IP:
$ oc get service elasticsearch -o jsonpath={.spec.clusterIP} -n openshift-logging
172.30.183.229
$ oc get service elasticsearch
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
elasticsearch   ClusterIP   172.30.183.229   <none>        9200/TCP   22h
$ oc exec elasticsearch-cdm-oplnhinv-1-5746475887-fj2f8 -- curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://172.30.183.229:9200/_cat/health"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    29  100    29    0     0    108      0 --:--:-- --:--:-- --:--:--   108
Prerequisites
- Cluster logging and Elasticsearch must be installed.
- You must have access to the project in order to access the logs.
Procedure
To expose Elasticsearch externally:
Change to the openshift-logging project:
$ oc project openshift-logging
Extract the CA certificate from Elasticsearch and write to the admin-ca file:
$ oc extract secret/elasticsearch --to=. --keys=admin-ca
admin-ca
Create the route for the Elasticsearch service as a YAML file:
Create a YAML file with the following:
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: | 1
- 1
- Add the Elasticsearch CA certificate or use the command in the next step. You do not have to set the spec.tls.key, spec.tls.certificate, and spec.tls.caCertificate parameters required by some reencrypt routes.
Add the Elasticsearch CA certificate to the route YAML you created:
$ cat ./admin-ca | sed -e "s/^/      /" >> <file-name>.yaml
Create the route:
$ oc create -f <file-name>.yaml
route.route.openshift.io/elasticsearch created
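The file-building steps above can be combined so that the certificate always lands at the correct indentation. The following is a minimal sketch, assuming the CA certificate was already extracted to a local file; the helper name write_es_route and the output file name es-route.yaml are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write the route definition to $2, embedding the CA
# certificate read from $1 under destinationCACertificate.
write_es_route() {
  ca_file=$1
  out_file=$2
  {
    cat <<'EOF'
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: elasticsearch
  namespace: openshift-logging
spec:
  host:
  to:
    kind: Service
    name: elasticsearch
  tls:
    termination: reencrypt
    destinationCACertificate: |
EOF
    # Indent every certificate line deeper than the key so that it stays
    # inside the YAML block scalar.
    sed -e 's/^/      /' "$ca_file"
  } > "$out_file"
}

# Example usage (requires a logged-in oc session):
#   oc extract secret/elasticsearch --to=. --keys=admin-ca
#   write_es_route ./admin-ca es-route.yaml
#   oc create -f es-route.yaml
```

Writing the whole file in one pass, rather than appending to an existing file, means that rerunning the helper does not add the certificate twice.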
Check that the Elasticsearch service is exposed:
Get the token of this ServiceAccount to be used in the request:
$ token=$(oc whoami -t)
Set the elasticsearch route you created as an environment variable.
$ routeES=`oc get route elasticsearch -o jsonpath={.spec.host}`
To verify the route was successfully created, run the following command that accesses Elasticsearch through the exposed route:
$ curl --tlsv1.2 --insecure -H "Authorization: Bearer ${token}" "https://${routeES}/.operations.*/_search?size=1" | jq
The response appears similar to the following:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   944  100   944    0     0     62      0  0:00:15  0:00:15 --:--:--   204
{
  "took": 441,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 89157,
    "max_score": 1,
    "hits": [
      {
        "_index": ".operations.2019.03.15",
        "_type": "com.example.viaq.common",
        "_id": "ODdiNWIyYzAtMjg5Ni0TAtNWE3MDY1MjMzNTc3",
        "_score": 1,
        "_source": {
          "_SOURCE_MONOTONIC_TIMESTAMP": "673396",
          "systemd": {
            "t": {
              "BOOT_ID": "246c34ee9cdeecb41a608e94",
              "MACHINE_ID": "e904a0bb5efd3e36badee0c",
              "TRANSPORT": "kernel"
            },
            "u": {
              "SYSLOG_FACILITY": "0",
              "SYSLOG_IDENTIFIER": "kernel"
            }
          },
          "level": "info",
          "message": "acpiphp: Slot [30] registered",
          "hostname": "localhost.localdomain",
          "pipeline_metadata": {
            "collector": {
              "ipaddr4": "10.128.2.12",
              "ipaddr6": "fe80::xx:xxxx:fe4c:5b09",
              "inputname": "fluent-plugin-systemd",
              "name": "fluentd",
              "received_at": "2019-03-15T20:25:06.273017+00:00",
              "version": "1.3.2 1.6.0"
            }
          },
          "@timestamp": "2019-03-15T20:00:13.808226+00:00",
          "viaq_msg_id": "ODdiNWIyYzAtMYTAtNWE3MDY1MjMzNTc3"
        }
      }
    ]
  }
}
6.4.6. About Elasticsearch alerting rules
You can view these alerting rules in Prometheus.
Alert | Description | Severity |
---|---|---|
ElasticsearchClusterNotHealthy | Cluster health status has been RED for at least 2m. Cluster does not accept writes, shards may be missing or master node hasn’t been elected yet. | critical |
ElasticsearchClusterNotHealthy | Cluster health status has been YELLOW for at least 20m. Some shard replicas are not allocated. | warning |
ElasticsearchBulkRequestsRejectionJumps | High Bulk Rejection Ratio at node in cluster. This node may not be keeping up with the indexing speed. | warning |
ElasticsearchNodeDiskWatermarkReached | Disk Low Watermark Reached at node in cluster. Shards can not be allocated to this node anymore. You should consider adding more disk to the node. | alert |
ElasticsearchNodeDiskWatermarkReached | Disk High Watermark Reached at node in cluster. Some shards will be re-allocated to different nodes if possible. Make sure more disk space is added to the node or drop old indices allocated to this node. | high |
ElasticsearchJVMHeapUseHigh | JVM Heap usage on the node in cluster is <value> | alert |
AggregatedLoggingSystemCPUHigh | System CPU usage on the node in cluster is <value> | alert |
ElasticsearchProcessCPUHigh | ES process CPU usage on the node in cluster is <value> | alert |
6.5. Configuring Kibana
OpenShift Container Platform uses Kibana to display the log data collected by Fluentd and indexed by Elasticsearch.
You can scale Kibana for redundancy and configure the CPU and memory for your Kibana nodes.
You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.
6.5.1. Configure Kibana CPU and memory limits
Each component specification allows for adjustments to both the CPU and memory limits.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  visualization:
    type: "kibana"
    kibana:
      replicas:
      resources: 1
        limits:
          memory: 1Gi
        requests:
          cpu: 500m
          memory: 1Gi
      proxy: 2
        resources:
          limits:
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
6.5.2. Scaling Kibana for redundancy
You can scale the Kibana deployment for redundancy.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the openshift-logging project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  visualization:
    type: "kibana"
    kibana:
      replicas: 1 1
- 1
- Specify the number of Kibana nodes.
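For scripted changes, the replica count can be set with a merge patch instead of an interactive edit. This is a minimal sketch, assuming the CR is named instance in the openshift-logging project as in the example above; the helper name kibana_replicas_patch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a JSON merge patch that sets the number of
# Kibana replicas. Rejects empty values, non-digits, and leading zeros,
# which would not be valid JSON numbers.
kibana_replicas_patch() {
  case "$1" in
    ''|*[!0-9]*|0?*)
      echo "replicas must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"visualization":{"kibana":{"replicas":%s}}}}\n' "$1"
}

# Example usage (requires a logged-in oc session):
#   oc patch ClusterLogging instance -n openshift-logging \
#     --type merge -p "$(kibana_replicas_patch 2)"
```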
6.5.3. Installing the Kibana Visualize tool
Kibana’s Visualize tab enables you to create visualizations and dashboards for monitoring container logs, allowing administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container.
Procedure
To load dashboards and other Kibana UI objects:
If necessary, get the Kibana route, which is created by default upon installation of the Cluster Logging Operator:
$ oc get routes -n openshift-logging
NAMESPACE           NAME     HOST/PORT                                     PATH   SERVICES   PORT    TERMINATION          WILDCARD
openshift-logging   kibana   kibana-openshift-logging.apps.openshift.com          kibana     <all>   reencrypt/Redirect   None
Get the name of your Elasticsearch pods.
$ oc get pods -l component=elasticsearch
NAME                                            READY   STATUS    RESTARTS   AGE
elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k    2/2     Running   0          22h
elasticsearch-cdm-5ceex6ts-2-f799564cb-l9mj7    2/2     Running   0          22h
elasticsearch-cdm-5ceex6ts-3-585968dc68-k7kjr   2/2     Running   0          22h
Create the necessary per-user configuration that this procedure requires:
Log in to the Kibana dashboard as the user you want to add the dashboards to.
https://kibana-openshift-logging.apps.openshift.com 1
- 1
- Where the URL is Kibana route.
- If the Authorize Access page appears, select all permissions and click Allow selected permissions.
- Log out of the Kibana dashboard.
Run the following command from the project where the pod is located using the name of any of your Elasticsearch pods:
$ oc exec <es-pod> -- es_load_kibana_ui_objects <user-name>
For example:
$ oc exec elasticsearch-cdm-5ceex6ts-1-dcd6c4c7c-jpw6k -- es_load_kibana_ui_objects <user-name>
6.6. Curation of Elasticsearch Data
The Elasticsearch Curator tool performs scheduled maintenance operations on a global and/or on a per-project basis. Curator performs actions daily based on its configuration.
The Cluster Logging Operator installs Curator and its configuration. You can configure the Curator cron schedule using the Cluster Logging Custom Resource. Further configuration options are in the Curator ConfigMap, curator, in the openshift-logging project, which incorporates the Curator configuration file, curator5.yaml, and an OpenShift Container Platform custom configuration file, config.yaml.
OpenShift Container Platform uses the config.yaml internally to generate the Curator action file.
Optionally, you can use the action file directly. Editing this file allows you to use any action that Curator has available to it to be run periodically. However, this is only recommended for advanced users as modifying the file can be destructive to the cluster and can cause removal of required indices/settings from Elasticsearch. Most users need to modify only the Curator configuration map and never need to edit the action file.
6.6.1. Configuring the Curator schedule
You can specify the schedule for Curator using the cluster logging Custom Resource created by the cluster logging installation.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
To configure the Curator schedule:
Edit the Cluster Logging Custom Resource in the openshift-logging project:
$ oc edit clusterlogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
...
  curation:
    curator:
      schedule: 30 3 * * * 1
    type: curator
- 1
- Specify the schedule for Curator in cron format.
Note: The time zone is set based on the host node where the Curator pod runs.
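A schedule string can be sanity-checked before it is saved. The following is a minimal sketch that only counts fields (minute, hour, day of month, month, day of week) and does not validate the values inside each field; the helper name is_five_field_cron is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the schedule has exactly five
# whitespace-separated fields. The body runs in a subshell so that the
# noglob setting does not leak into the caller.
is_five_field_cron() (
  set -f        # keep "*" from being expanded as a filename glob
  set -- $1     # split the schedule on whitespace
  [ "$#" -eq 5 ]
)

# Example usage:
#   is_five_field_cron "30 3 * * *" && echo "schedule has five fields"
```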
6.6.2. Configuring Curator index deletion
You can configure Curator to delete Elasticsearch data based on retention settings. You can configure per-project and global settings. Global settings apply to any project not specified. Per-project settings override global settings.
Prerequisite
- Cluster logging must be installed.
Procedure
To delete indices:
Edit the OpenShift Container Platform custom Curator configuration file:
$ oc edit configmap/curator
Set the following parameters as needed:
config.yaml: |
  project_name:
    action
      unit:value
The available parameters are:
Table 6.1. Project options
Variable Name | Description |
---|---|
project_name | The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name .operations as the project name. |
action | The action to take, currently only delete is allowed. |
unit | The period to use for deletion, days, weeks, or months. |
value | The number of units. |
Table 6.2. Filter options
Variable Name | Description |
---|---|
.defaults | Use .defaults as the project_name to set the defaults for projects that are not specified. |
.regex | The list of regular expressions that match project names. |
pattern | The valid and properly escaped regular expression pattern enclosed by single quotation marks. |
For example, to configure Curator to:
- Delete indices in the myapp-dev project older than 1 day
- Delete indices in the myapp-qe project older than 1 week
- Delete operations logs older than 8 weeks
- Delete all other projects' indices after they are 31 days old
- Delete indices older than 1 day that are matched by the ^project\..+\-dev.*$ regex
- Delete indices older than 2 days that are matched by the ^project\..+\-test.*$ regex
Use:
config.yaml: |
  .defaults:
    delete:
      days: 31
  .operations:
    delete:
      weeks: 8
  myapp-dev:
    delete:
      days: 1
  myapp-qe:
    delete:
      weeks: 1
  .regex:
    - pattern: '^project\..+\-dev\..*$'
      delete:
        days: 1
    - pattern: '^project\..+\-test\..*$'
      delete:
        days: 2
When you use months as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).
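The month arithmetic described above can be worked through as a short calculation. This is a minimal sketch of the rule as stated in this section, not of Curator's own code; the helper name curator_month_cutoff is hypothetical, and the year 2019 in the example is an assumption because the text gives only the month and day.

```shell
#!/bin/sh
# Hypothetical helper: print the cutoff date implied by "delete: months: N",
# given the current year and month. The day of the month is ignored because
# counting starts at the first day of the current month.
curator_month_cutoff() {
  year=$1
  month=${2#0}      # drop a leading zero so 08 and 09 are not read as octal
  count=$3
  # Months elapsed since year 0, counted from the first of the current month.
  total=$(( year * 12 + (month - 1) - count ))
  printf '%04d-%02d-01\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# April 15 with "delete: months: 2" gives a cutoff of February 1.
curator_month_cutoff 2019 4 2
```

The same call with a January date crosses the year boundary, for example `curator_month_cutoff 2019 1 2` yields a cutoff in November of the previous year.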
6.6.3. Troubleshooting Curator
You can use information in this section for debugging Curator. For example, if Curator is in a failed state but the log messages do not provide a reason, you can increase the log level and trigger a new job instead of waiting for another scheduled run of the cron job.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
Procedure
Enable the Curator debug log and trigger the next Curator iteration manually.
Enable debug log of Curator:
$ oc set env cronjob/curator CURATOR_LOG_LEVEL=DEBUG CURATOR_SCRIPT_LOG_LEVEL=DEBUG
Specify the log level:
- CRITICAL. Curator displays only critical messages.
- ERROR. Curator displays only error and critical messages.
- WARNING. Curator displays only error, warning, and critical messages.
- INFO. Curator displays only informational, error, warning, and critical messages.
- DEBUG. Curator displays debug messages in addition to all of the above.
The default value is INFO.
Cluster logging uses the OpenShift Container Platform custom environment variable CURATOR_SCRIPT_LOG_LEVEL in OpenShift Container Platform wrapper scripts (run.sh and convert.py). The environment variable takes the same values as CURATOR_LOG_LEVEL for script debugging, as needed.
Trigger the next Curator iteration:
$ oc create job --from=cronjob/curator <job_name>
Use the following commands to control the CronJob:
Suspend a CronJob:
$ oc patch cronjob curator -p '{"spec":{"suspend":true}}'
Resume a CronJob:
$ oc patch cronjob curator -p '{"spec":{"suspend":false}}'
Change a CronJob schedule:
$ oc patch cronjob curator -p '{"spec":{"schedule":"0 0 * * *"}}' 1
- 1
- The schedule option accepts schedules in cron format.
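The debug steps in this section can be chained into one helper. The following is a minimal sketch, assuming the Curator CronJob lives in the openshift-logging project; the helper name curator_debug_run, the job name in the usage example, and the 300-second timeout are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: raise the Curator log level, start a one-off job from
# the CronJob, and print that job's log.
curator_debug_run() {
  job_name=$1
  oc set env cronjob/curator -n openshift-logging \
    CURATOR_LOG_LEVEL=DEBUG CURATOR_SCRIPT_LOG_LEVEL=DEBUG || return 1
  oc create job -n openshift-logging --from=cronjob/curator "$job_name" || return 1
  # Wait for the job to finish before reading its log. If the job fails, the
  # wait times out and the log is still printed for inspection.
  oc wait -n openshift-logging --for=condition=complete --timeout=300s "job/$job_name"
  oc logs -n openshift-logging "job/$job_name"
}

# Example usage (requires a logged-in oc session):
#   curator_debug_run curator-debug-1
```

After debugging, the log level can be returned to the default by setting both variables back to INFO with the same `oc set env` command.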
6.6.4. Configuring Curator in scripted deployments
Use the information in this section if you must configure Curator in scripted deployments.
Prerequisites
- Cluster logging and Elasticsearch must be installed.
- Set cluster logging to the unmanaged state.
Procedure
Use the following snippets to configure Curator in your scripts:
For scripted deployments
Create and modify the configuration:
Copy the Curator configuration file and the OpenShift Container Platform custom configuration file from the Curator configuration map and create separate files for each:
$ oc extract configmap/curator --keys=curator5.yaml,config.yaml --to=/my/config
- Edit the /my/config/curator5.yaml and /my/config/config.yaml files.
Delete the existing Curator config map and add the edited YAML files to a new Curator config map.
$ oc delete configmap curator ; sleep 1
$ oc create configmap curator \
    --from-file=curator5.yaml=/my/config/curator5.yaml \
    --from-file=config.yaml=/my/config/config.yaml \
    ; sleep 1
The next iteration will use this configuration.
If you are using the action file:
Create and modify the configuration:
Copy the Curator configuration file and the action file from the Curator configuration map and create separate files for each:
$ oc extract configmap/curator --keys=curator5.yaml,actions.yaml --to=/my/config
- Edit the /my/config/curator5.yaml and /my/config/actions.yaml files.
Delete the existing Curator config map and add the edited YAML files to a new Curator config map.
$ oc delete configmap curator ; sleep 1
$ oc create configmap curator \
    --from-file=curator5.yaml=/my/config/curator5.yaml \
    --from-file=actions.yaml=/my/config/actions.yaml \
    ; sleep 1
The next iteration will use this configuration.
6.6.5. Using the Curator Action file
The Curator ConfigMap in the openshift-logging
project includes a Curator action file where you configure any Curator action to be run periodically.
However, when you use the action file, OpenShift Container Platform ignores the config.yaml
section of the curator ConfigMap, which is configured to ensure important internal indices do not get deleted by mistake. In order to use the action file, you should add an exclude rule to your configuration to retain these indices. You also must manually add all the other patterns following the steps in this topic.
The actions
and config.yaml
are mutually-exclusive configuration files. Once the actions
file exists, OpenShift Container Platform ignores the config.yaml
file. Using the action file is recommended only for advanced users as using this file can be destructive to the cluster and can cause removal of required indices/settings from Elasticsearch.
Prerequisite
- Cluster logging and Elasticsearch must be installed.
- Set cluster logging to the unmanaged state.
Procedure
To configure Curator to delete indices:
Edit the Curator ConfigMap:
$ oc edit cm/curator -n openshift-logging
Make the following changes to the
action
file:
actions:
  1:
    action: delete_indices 1
    description: >-
      Delete .operations indices older than 30 days.
      Ignore the error if the filter does not
      result in an actionable list of indices (ignore_empty_list).
      See https://www.elastic.co/guide/en/elasticsearch/client/curator/5.2/ex_delete_indices.html
    options:
      # Swallow curator.exception.NoIndices exception
      ignore_empty_list: True
      # In seconds, default is 300
      timeout_override: ${CURATOR_TIMEOUT}
      # Don't swallow any other exceptions
      continue_if_exception: False
      # Optionally disable action, useful for debugging
      disable_action: False
    # All filters are bound by logical AND
    filters: 2
    - filtertype: pattern
      kind: regex
      value: '^\.operations\..*$'
      exclude: False 3
    - filtertype: age
      # Parse timestamp from index name
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
      exclude: False
- 1
- Specify
delete_indices
to delete the specified index.
- 2
- Use the
filters
parameters to specify the index to be deleted. See the Elasticsearch Curator documentation for information on these parameters.
- 3
- Specify
false
to allow the index to be deleted.
6.7. Configuring Fluentd
OpenShift Container Platform uses Fluentd to collect operations and application logs from your cluster, which OpenShift Container Platform enriches with Kubernetes Pod and Namespace metadata.
You can configure log rotation and log location, use an external log aggregator, and make other configurations.
You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.
6.7.1. Viewing Fluentd pods
You can use the oc get pods -o wide
command to see the nodes where the Fluentd pods are deployed.
Procedure
Run the following command in the openshift-logging
project:
$ oc get pods -o wide | grep fluentd
NAME            READY   STATUS    RESTARTS   AGE     IP            NODE                           NOMINATED NODE
fluentd-5mr28   1/1     Running   0          4m56s   10.129.2.12   ip-10-0-164-233.ec2.internal   <none>
fluentd-cnc4c   1/1     Running   0          4m56s   10.128.2.13   ip-10-0-155-142.ec2.internal   <none>
fluentd-nlp8z   1/1     Running   0          4m56s   10.131.0.13   ip-10-0-138-77.ec2.internal    <none>
fluentd-rknlk   1/1     Running   0          4m56s   10.128.0.33   ip-10-0-128-130.ec2.internal   <none>
fluentd-rsm49   1/1     Running   0          4m56s   10.129.0.37   ip-10-0-163-191.ec2.internal   <none>
fluentd-wjt8s   1/1     Running   0          4m56s   10.130.0.42   ip-10-0-156-251.ec2.internal   <none>
6.7.2. Viewing Fluentd logs
How you view logs depends upon the LOGGING_FILE_PATH
setting.
If
LOGGING_FILE_PATH
points to a file, which is the default, use the logs utility from the project where the pod is located to print the contents of the Fluentd log files:
$ oc exec <any-fluentd-pod> -- logs 1
- 1
- Specify the name of a Fluentd pod. Note the space before
logs
.
For example:
$ oc exec fluentd-ht42r -n openshift-logging -- logs
To view the current setting:
$ oc -n openshift-logging set env daemonset/fluentd --list | grep LOGGING_FILE_PATH
If you are using
LOGGING_FILE_PATH=console
, Fluentd writes logs to stdout/stderr. You can retrieve the logs with the
oc logs [-f] <pod_name>
command, where the
-f
is optional, from the project where the pod is located.
$ oc logs -f <any-fluentd-pod> 1
- 1
- Specify the name of a Fluentd pod. Use the
-f
option to follow what is being written into the logs.
For example:
$ oc logs -f fluentd-ht42r -n openshift-logging
The contents of log files are printed out, starting with the oldest log.
6.7.3. Configure Fluentd CPU and memory limits
Each component specification allows for adjustments to both the CPU and memory limits.
Procedure
Edit the Cluster Logging Custom Resource (CR) in the
openshift-logging
project:
$ oc edit ClusterLogging instance
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
....
spec:
  collection:
    logs:
      fluentd:
        resources:
          limits: 1
            cpu: 250m
            memory: 1Gi
          requests:
            cpu: 250m
            memory: 1Gi
- 1
- Specify the CPU and memory limits as needed. The values shown are the default values.
6.7.4. Configuring Fluentd log location
Fluentd writes logs to a specified file or to the default location, /var/log/fluentd/fluentd.log
, based on the LOGGING_FILE_PATH
environment variable.
Prerequisite
Set cluster logging to the unmanaged state.
Procedure
To set the output location for the Fluentd logs:
Edit the
LOGGING_FILE_PATH
parameter in the
fluentd
daemonset. You can specify a particular file or
console
:
spec:
  template:
    spec:
      containers:
          env:
            - name: LOGGING_FILE_PATH
              value: console 1
LOGGING_FILE_PATH= 2
- 1
- Specify the log output method:
-
Use
console
to write the log output to stdout/stderr. Retrieve the logs with the
oc logs [-f] <pod_name>
command.
-
Use
<path-to-log/fluentd.log>
to send the log output to the specified file. Retrieve the logs with the
oc exec <pod_name> -- logs
command. This is the default setting.
Or, use the CLI:
$ oc -n openshift-logging set env daemonset/fluentd LOGGING_FILE_PATH=console
6.7.5. Throttling Fluentd logs
For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed. By throttling, you deliberately slow down the rate at which you are reading logs, so Kibana might take longer to display records.
Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.
Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source and no log files, so file-based throttling is not available, and there is no method of restricting the log entries that are read into the Fluentd process.
Prerequisite
Set cluster logging to the unmanaged state.
Procedure
To configure Fluentd to restrict specific projects, edit the throttle configuration in the Fluentd ConfigMap after deployment:
$ oc edit configmap/fluentd
The format of the throttle-config.yaml key is a YAML file that contains project names and the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:
throttle-config.yaml: |
  - openshift-logging:
      read_lines_limit: 10
  - .operations:
      read_lines_limit: 100
6.7.6. Understanding Buffer Chunk Limiting for Fluentd
If the Fluentd logger is unable to keep up with a high number of logs, it will need to switch to file buffering to reduce memory usage and prevent data loss.
Fluentd file buffering stores records in chunks. Chunks are stored in buffers.
The Fluentd buffer_chunk_limit
is determined by the environment variable BUFFER_SIZE_LIMIT
, which has the default value 8m
. The file buffer size per output is determined by the environment variable FILE_BUFFER_LIMIT
, which has the default value 256Mi
. The persistent volume size must be larger than FILE_BUFFER_LIMIT
multiplied by the number of outputs.
On the Fluentd pods, the persistent volume /var/lib/fluentd should be prepared by a PVC or hostmount, for example. That area is then used for the file buffers.
The buffer_type
and buffer_path
are configured in the Fluentd configuration files as follows:
$ egrep "buffer_type|buffer_path" *.conf
output-es-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-config`
output-es-ops-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-ops-config`
The Fluentd buffer_queue_limit
is the value of the variable BUFFER_QUEUE_LIMIT
. This value is 32
by default.
The environment variable BUFFER_QUEUE_LIMIT
is calculated as (FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT))
.
If the BUFFER_QUEUE_LIMIT
variable has the default set of values:
-
FILE_BUFFER_LIMIT = 256Mi
-
number_of_outputs = 1
-
BUFFER_SIZE_LIMIT = 8Mi
The value of buffer_queue_limit
will be 32
. To change the buffer_queue_limit
, you must change the value of FILE_BUFFER_LIMIT
.
In this formula, number_of_outputs
is 1
if all the logs are sent to a single resource, and it is incremented by 1
for each additional resource. For example, the value of number_of_outputs
is:
-
1
- if all logs are sent to a single Elasticsearch pod
-
2
- if application logs are sent to an Elasticsearch pod and ops logs are sent to another Elasticsearch pod
-
4
- if application logs are sent to an Elasticsearch pod, ops logs are sent to another Elasticsearch pod, and both of them are forwarded to other Fluentd instances
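The formula above can be checked with a short Python sketch. The helper names parse_size and buffer_queue_limit are hypothetical, the size suffixes are assumed to be binary units (so 8m and 8Mi are both treated as 8 x 1024 x 1024 bytes), and rounding down to a whole number of chunks is an assumption rather than documented behavior.

```python
import re

# Binary multipliers assumed for the k/m/g suffixes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value: str) -> int:
    """Parse sizes such as '8m', '8Mi' or '256Mi' into bytes."""
    match = re.fullmatch(r"(\d+)\s*([kKmMgG]?)i?", value.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def buffer_queue_limit(file_buffer_limit: str,
                       buffer_size_limit: str,
                       number_of_outputs: int) -> int:
    """FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT), rounded down."""
    if number_of_outputs < 1:
        raise ValueError("number_of_outputs must be at least 1")
    per_chunk = number_of_outputs * parse_size(buffer_size_limit)
    return parse_size(file_buffer_limit) // per_chunk


if __name__ == "__main__":
    for outputs in (1, 2, 4):
        print(outputs, buffer_queue_limit("256Mi", "8m", outputs))
```

With the default values, one output gives a queue limit of 32. Under the same assumptions, each additional output shrinks the queue limit unless FILE_BUFFER_LIMIT is raised accordingly.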
6.7.7. Configuring Fluentd JSON parsing
You can configure Fluentd to inspect each log message to determine if the message is in JSON format and merge the message into the JSON payload document posted to Elasticsearch. This feature is disabled by default.
You can enable or disable this feature by editing the MERGE_JSON_LOG
environment variable in the fluentd daemonset.
Enabling this feature comes with risks, including:
- Possible log loss due to Elasticsearch rejecting documents due to inconsistent type mappings.
- Potential buffer storage leak caused by rejected message cycling.
- Overwriting of data for fields with the same names.
The features in this topic should be used only by experienced Fluentd and Elasticsearch users.
Prerequisites
Set cluster logging to the unmanaged state.
Procedure
Use the following command to enable this feature:
$ oc set env ds/fluentd MERGE_JSON_LOG=true 1
- 1
- Set this to
false
to disable this feature or
true
to enable this feature.
Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING
If you set the MERGE_JSON_LOG
and CDM_UNDEFINED_TO_STRING
environment variables to true
, you might receive an Elasticsearch 400 error. The error occurs because when MERGE_JSON_LOG=true, Fluentd adds fields with data types other than string. When you set CDM_UNDEFINED_TO_STRING=true
, Fluentd attempts to add those fields as a string value resulting in the Elasticsearch 400 error. The error clears when the indices roll over for the next day.
When Fluentd rolls over the indices for the next day’s logs, it will create a brand new index. The field definitions are updated and you will not get the 400 error.
Records that have hard errors, such as schema violations, corrupted data, and so forth, cannot be retried. Fluentd sends the records for error handling. If you add a <label @ERROR>
section to your Fluentd config, as the last <label>, you can handle these records as needed.
For example:
data:
  fluent.conf:
....
    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>
This section writes error records to the Elasticsearch dead letter queue (DLQ) file. See the Fluentd documentation for more information about the file output.
You can then edit the file to clean up the records manually, format it for use with the Elasticsearch /_bulk index
API and use cURL to add those records. For more information on Elasticsearch Bulk API, see the Elasticsearch documentation.
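As a rough illustration of the reformatting step, the following Python sketch turns already cleaned-up records into the newline-delimited JSON body that the Elasticsearch Bulk API expects. The function name to_bulk_payload and the target index name are hypothetical, and some Elasticsearch versions may also require a _type entry in each action line.

```python
import json
from typing import Dict, Iterable


def to_bulk_payload(records: Iterable[Dict], index: str) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each record becomes two lines: an "index" action line naming the
    target index, followed by the record itself.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical records recovered from the DLQ file after manual cleanup.
    recovered = [{"message": "first"}, {"message": "second"}]
    print(to_bulk_payload(recovered, "recovered-logs"), end="")
```

The resulting text can be saved to a file and posted to the /_bulk endpoint with cURL, typically as a binary body so that the newlines are preserved.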
6.7.8. Configuring how the log collector normalizes logs
Cluster Logging uses a specific data model, like a database schema, to store log records and their metadata in the logging store. There are some restrictions on the data:
-
There must be a
"message"
field containing the actual log message.
-
There must be a
"@timestamp"
field containing the log record timestamp in RFC 3339 format, preferably millisecond or better resolution.
-
There must be a
"level"
field with the log level, such as
err
,
info
,
unknown
, and so forth.
For more information on the data model, see Exported Fields.
Because of these requirements, conflicts and inconsistencies can arise with log data collected from different subsystems.
For example, if you use the MERGE_JSON_LOG
feature (MERGE_JSON_LOG=true
), it can be extremely useful to have your applications log their output in JSON, and have the log collector automatically parse and index the data in Elasticsearch. However, this leads to several problems, including:
- field names can be empty, or contain characters that are illegal in Elasticsearch;
- different applications in the same namespace might output the same field name with different value data types;
- applications might emit too many fields;
- fields may conflict with the cluster logging built-in fields.
You can configure how cluster logging treats fields from disparate sources by editing the log collector daemonset, Fluentd or Rsyslog, and setting environment variables in the table below.
Undefined fields. One of the problems with log data from disparate systems is that some fields might be unknown to the ViaQ data model. Such fields are called undefined. ViaQ requires all top-level fields to be defined and described.
Use the parameters to configure how OpenShift Container Platform moves any undefined fields under a top-level field called
undefined
to avoid conflicting with the well known ViaQ top-level fields. You can add undefined fields to the top-level fields and move others to an
undefined
container.
You can also replace special characters in undefined fields and convert undefined fields to their JSON string representation. Converting to JSON string preserves the structure of the value, so that you can retrieve the value later and convert it back to a map or an array.
-
Simple scalar values like numbers and booleans are changed to a quoted string. For example:
10
becomes
"10"
,
3.1415
becomes
"3.1415"
,
false
becomes
"false"
.
-
Map/dict values and array values are converted to their JSON string representation:
"mapfield":{"key":"value"}
becomes
"mapfield":"{\"key\":\"value\"}"
and
"arrayfield":[1,2,"three"]
becomes
"arrayfield":"[1,2,\"three\"]"
.
Defined fields. You can also configure which defined fields appear in the top levels of the logs.
The default top-level fields, defined through the
CDM_DEFAULT_KEEP_FIELDS
parameter, are
CEE, time, @timestamp, aushape, ci_job, collectd, docker, fedora-ci, file, foreman, geoip, hostname, ipaddr4, ipaddr6, kubernetes, level, message, namespace_name, namespace_uuid, offset, openstack, ovirt, pid, pipeline_metadata, rsyslog, service, systemd, tags, testcase, tlog, viaq_msg_id.
Any fields not included in
${CDM_DEFAULT_KEEP_FIELDS}
or
${CDM_EXTRA_KEEP_FIELDS}
are moved to
${CDM_UNDEFINED_NAME}
if
CDM_USE_UNDEFINED
is
true
.
Note
The
CDM_DEFAULT_KEEP_FIELDS
parameter is only for advanced users, or if you are instructed to do so by Red Hat support.
Empty fields. You can determine which empty fields to retain from disparate logs.
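Parameters | Definition | Example |
---|---|---|
CDM_EXTRA_KEEP_FIELDS | Specify an extra set of defined fields to be kept at the top level of the logs in addition to the CDM_DEFAULT_KEEP_FIELDS. | |
CDM_KEEP_EMPTY_FIELDS | Specify fields to retain even if empty in CSV format. Empty defined fields not specified are dropped. The default is "message", keep empty messages. | |
CDM_USE_UNDEFINED | Set to true to move undefined fields to the top-level undefined field. | |
CDM_UNDEFINED_NAME | Specify a name for the undefined top level field if using CDM_USE_UNDEFINED. | |
CDM_UNDEFINED_MAX_NUM_FIELDS | If the number of undefined fields is greater than this number, all undefined fields are converted to their JSON string representation and stored in the CDM_UNDEFINED_NAME field. Note: This parameter is honored even if CDM_USE_UNDEFINED is not set. | |
CDM_UNDEFINED_TO_STRING | Set to true to convert undefined fields to their JSON string representation. | |
CDM_UNDEFINED_DOT_REPLACE_CHAR | Specify a character to use in place of a dot character '.' in an undefined field. | |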
If you set the MERGE_JSON_LOG
parameter in the log collector daemonset and CDM_UNDEFINED_TO_STRING
environment variables to true, you might receive an Elasticsearch 400 error. The error occurs because when MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. When you set CDM_UNDEFINED_TO_STRING=true
, the log collector attempts to add those fields as a string value resulting in the Elasticsearch 400 error. The error clears when the log collector rolls over the indices for the next day’s logs.
When the log collector rolls over the indices, it creates a brand new index. The field definitions are updated and you will not get the 400 error.
Procedure
Use the CDM_*
parameters to configure undefined and empty field processing.
Configure how to process fields, as needed:
-
Specify the fields to move using
CDM_EXTRA_KEEP_FIELDS
.
-
Specify any empty fields to retain in the
CDM_KEEP_EMPTY_FIELDS
parameter in CSV format.
Configure how to process undefined fields, as needed:
-
Set
CDM_USE_UNDEFINED
to
true
to move undefined fields to the top-level
undefined
field.
-
Specify a name for the undefined fields using the
CDM_UNDEFINED_NAME
parameter.
-
Set
CDM_UNDEFINED_MAX_NUM_FIELDS
to a value other than the default
-1
, to set an upper bound on the number of undefined fields in a single record.
-
Specify
CDM_UNDEFINED_DOT_REPLACE_CHAR
to change any dot
.
characters in an undefined field name to another character. For example, if
CDM_UNDEFINED_DOT_REPLACE_CHAR=@@@
and there is a field named
foo.bar.baz
the field is transformed into
foo@@@bar@@@baz
.
-
Set
CDM_UNDEFINED_TO_STRING
to
true
to convert undefined fields to their JSON string representation.
If you configure the CDM_UNDEFINED_TO_STRING
or CDM_UNDEFINED_MAX_NUM_FIELDS
parameters, use the CDM_UNDEFINED_NAME
parameter to change the undefined field name. This is needed because CDM_UNDEFINED_TO_STRING
or CDM_UNDEFINED_MAX_NUM_FIELDS
could change the value type of the undefined field. When CDM_UNDEFINED_TO_STRING
is set to true, or when CDM_UNDEFINED_MAX_NUM_FIELDS
is set and a log contains more undefined fields than that limit, the value type becomes string
. Elasticsearch stops accepting records if the value type is changed, for example, from JSON to JSON string.
For example, when CDM_UNDEFINED_TO_STRING
is false
or CDM_UNDEFINED_MAX_NUM_FIELDS
is the default, -1
, the value type of the undefined field is json
. If you change CDM_UNDEFINED_MAX_NUM_FIELDS
to a value other than default and there are more undefined fields in a log, the value type becomes string
(json string). Elasticsearch stops accepting records if the value type is changed.
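The transformations described in this section can be mimicked in a few lines of Python. This is only a sketch of the documented behavior, not the log collector's implementation, and the function names are hypothetical.

```python
import json
from typing import Any


def undefined_to_string(value: Any) -> str:
    """Mimic CDM_UNDEFINED_TO_STRING=true for a single field value.

    Scalars become quoted strings; maps and arrays become their
    compact JSON string representation. Strings are left unchanged.
    """
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))


def replace_dots(field_name: str, replacement: str) -> str:
    """Mimic CDM_UNDEFINED_DOT_REPLACE_CHAR for a single field name."""
    return field_name.replace(".", replacement)


if __name__ == "__main__":
    for sample in (10, 3.1415, False, {"key": "value"}, [1, 2, "three"]):
        print(repr(sample), "->", repr(undefined_to_string(sample)))
    print(replace_dots("foo.bar.baz", "@@@"))
```

Because the stored value is valid JSON, json.loads can turn it back into the original map or array later. The same conversion is why the Elasticsearch field type changes from an object to a string.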
6.7.9. Configuring Fluentd using environment variables
You can use environment variables to modify your Fluentd configuration.
Prerequisite
Set cluster logging to the unmanaged state.
Procedure
Set any of the Fluentd environment variables as needed:
$ oc set env ds/fluentd <env-var>=<value>
For example:
$ oc set env ds/fluentd LOGGING_FILE_AGE=30
6.8. Sending OpenShift Container Platform logs to external devices
You can send Elasticsearch logs to external devices, such as an externally-hosted Elasticsearch instance or an external syslog server. You can also configure Fluentd to send logs to an external log aggregator.
You must set cluster logging to Unmanaged state before performing these configurations, unless otherwise noted. For more information, see Changing the cluster logging management state.
6.8.1. Configuring Fluentd to send logs to an external Elasticsearch instance
Fluentd sends logs to the value of the ES_HOST
, ES_PORT
, OPS_HOST
, and OPS_PORT
environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST
destination, and operations logs to OPS_HOST
.
Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service
plug-in.
Prerequisite
- Cluster logging and Elasticsearch must be installed.
- Set cluster logging to the unmanaged state.
Procedure
To direct logs to a specific Elasticsearch instance:
Edit the
fluentd
DaemonSet in the openshift-logging project:
$ oc edit ds/fluentd
spec:
  template:
    spec:
      containers:
          env:
          - name: ES_HOST
            value: elasticsearch
          - name: ES_PORT
            value: '9200'
          - name: ES_CLIENT_CERT
            value: /etc/fluent/keys/app-cert
          - name: ES_CLIENT_KEY
            value: /etc/fluent/keys/app-key
          - name: ES_CA
            value: /etc/fluent/keys/app-ca
          - name: OPS_HOST
            value: elasticsearch
          - name: OPS_PORT
            value: '9200'
          - name: OPS_CLIENT_CERT
            value: /etc/fluent/keys/infra-cert
          - name: OPS_CLIENT_KEY
            value: /etc/fluent/keys/infra-key
          - name: OPS_CA
            value: /etc/fluent/keys/infra-ca
-
Set
ES_HOST
and
OPS_HOST
to the same destination, while ensuring that
ES_PORT
and
OPS_PORT
also have the same value for an external Elasticsearch instance to contain both application and operations logs.
- Configure your externally-hosted Elasticsearch instance for TLS. Only externally-hosted Elasticsearch instances that use Mutual TLS are allowed.
If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.
6.8.2. Configuring Fluentd to send logs to an external syslog server
Use the fluent-plugin-remote-syslog
plug-in on the host to send logs to an external syslog server.
Prerequisite
Set cluster logging to the unmanaged state.
Procedure
Set environment variables in the
fluentd
daemonset in the
openshift-logging
project:
spec:
  template:
    spec:
      containers:
        - name: fluentd
          image: 'registry.redhat.io/openshift4/ose-logging-fluentd:v4.1'
          env:
            - name: REMOTE_SYSLOG_HOST 1
              value: host1
            - name: REMOTE_SYSLOG_HOST_BACKUP
              value: host2
            - name: REMOTE_SYSLOG_PORT_BACKUP
              value: '5555'
- 1
- The desired remote syslog host. Required for each host.
This will build two destinations. The syslog server on
host1
will be receiving messages on the default port of
514
, while
host2
will be receiving the same messages on port
5555
.
Alternatively, you can configure your own custom
fluentd
daemonset in the
openshift-logging
project.
Fluentd Environment Variables
Parameter | Description |
---|---|
USE_REMOTE_SYSLOG | Defaults to false. Set to true to enable use of the fluent-plugin-remote-syslog gem. |
REMOTE_SYSLOG_HOST | (Required) Hostname or IP address of the remote syslog server. |
REMOTE_SYSLOG_PORT | Port number to connect on. Defaults to 514. |
REMOTE_SYSLOG_SEVERITY | Set the syslog severity level. Defaults to debug. |
REMOTE_SYSLOG_FACILITY | Set the syslog facility. Defaults to local0. |
REMOTE_SYSLOG_USE_RECORD | Defaults to false. Set to true to use the record’s severity and facility fields to set on the syslog message. |
REMOTE_SYSLOG_REMOVE_TAG_PREFIX | Removes the prefix from the tag, defaults to '' (empty). |
REMOTE_SYSLOG_TAG_KEY | If specified, uses this field as the key to look on the record, to set the tag on the syslog message. |
REMOTE_SYSLOG_PAYLOAD_KEY | If specified, uses this field as the key to look on the record, to set the payload on the syslog message. |
REMOTE_SYSLOG_TYPE | Set the transport layer protocol type. Defaults to syslog_buffered, which sets the TCP protocol. To switch to UDP, set this to syslog. |
Warning
This implementation is insecure, and should only be used in environments where you can guarantee no snooping on the connection.
6.8.3. Configuring Fluentd to send logs to an external log aggregator
You can configure Fluentd to send a copy of its logs to an external log aggregator, and not the default Elasticsearch, using the out_forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.
The forward
plug-in is supported by Fluentd only. The out_forward plug-in implements the client side (sender) and the in_forward plug-in implements the server side (receiver).
To configure OpenShift Container Platform to send logs using out_forward, create a ConfigMap called secure-forward
in the openshift-logging
namespace that points to a receiver. On the receiver, configure the in_forward plug-in to receive the logs from OpenShift Container Platform. For more information on using the in_forward plug-in, see the Fluentd documentation.
Default secure-forward.conf
section
# <store>
#   @type forward
#   <security>
#     self_hostname ${hostname} # ${hostname} is a placeholder.
#     shared_key <shared_key_between_forwarder_and_forwardee>
#   </security>
#   transport tls
#   tls_verify_hostname true           # Set false to ignore server cert hostname.
#   tls_cert_path /path/for/certificate/ca_cert.pem
#   <buffer>
#     @type file
#     path '/var/lib/fluentd/forward'
#     queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '1024' }"
#     chunk_limit_size "#{ENV['BUFFER_SIZE_LIMIT'] || '1m' }"
#     flush_interval "#{ENV['FORWARD_FLUSH_INTERVAL'] || '5s'}"
#     flush_at_shutdown "#{ENV['FLUSH_AT_SHUTDOWN'] || 'false'}"
#     flush_thread_count "#{ENV['FLUSH_THREAD_COUNT'] || 2}"
#     retry_max_interval "#{ENV['FORWARD_RETRY_WAIT'] || '300'}"
#     retry_forever true
#     # the systemd journald 0.0.8 input plugin will just throw away records if the buffer
#     # queue limit is hit - 'block' will halt further reads and keep retrying to flush the
#     # buffer to the remote - default is 'exception' because in_tail handles that case
#     overflow_action "#{ENV['BUFFER_QUEUE_FULL_ACTION'] || 'exception'}"
#   </buffer>
#   <server>
#     host server.fqdn.example.com  # or IP
#     port 24284
#   </server>
#   <server>
#     host 203.0.113.8 # ip address to connect
#     name server.fqdn.example.com # The name of the server. Used for logging and certificate verification in TLS transport (when host is address).
#   </server>
# </store>
Procedure
To send a copy of Fluentd logs to an external log aggregator:
Edit the
secure-forward.conf
section of the Fluentd configuration map:
$ oc edit configmap/fluentd -n openshift-logging
Enter the name, host, and port for your external Fluentd server:
#   <server>
#     host server.fqdn.example.com  # or IP
#     port 24284
#   </server>
#   <server>
#     host 203.0.113.8 # ip address to connect
#     name server.fqdn.example.com # The name of the server. Used for logging and certificate verification in TLS transport (when host is address).
#   </server>
For example:
  <server>
    name externalserver1 1
    host 192.168.1.1 2
    port 24224 3
  </server>
  <server> 4
    name externalserver1
    host 192.168.1.2
    port 24224
  </server>
</store>
Add the path to your CA certificate and private key to the
secure-forward.conf
section:
#   <security>
#     self_hostname ${hostname} # ${hostname} is a placeholder. 1
#     shared_key <shared_key_between_forwarder_and_forwardee> 2
#   </security>
#   tls_cert_path /path/for/certificate/ca_cert.pem 3
For example:
  <security>
    self_hostname client.fqdn.local
    shared_key cluster_logging_key
  </security>
  tls_cert_path /etc/fluent/keys/ca.crt
To use mTLS, see the Fluentd documentation for information about client certificate and key parameters and other settings.
Add certificates to be used in
secure-forward.conf
to the existing secret that is mounted on the Fluentd pods. The
your_ca_cert
and
your_private_key
values must match what is specified in
secure-forward.conf
in
configmap/logging-fluentd
:
$ oc patch secrets/fluentd --type=json \
    --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 -w0 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/fluentd --type=json \
    --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 -w0 /path/to/your_private_key.pem)'}]"
Note
Replace
your_private_key
with a generic name. This is a link to the JSON path, not a path on your host system.
For example:
$ oc patch secrets/fluentd --type=json \
    --patch "[{'op':'add','path':'/data/ca.crt','value':'$(base64 -w0 /etc/fluent/keys/ca.crt)'}]"
$ oc patch secrets/fluentd --type=json \
    --patch "[{'op':'add','path':'/data/ext-agg','value':'$(base64 -w0 /etc/fluent/keys/ext-agg.pem)'}]"
Configure the
secure-forward.conf
file on the external aggregator to accept messages securely from Fluentd.
For further explanation of how to set up the in_forward plug-in and the out_forward plug-in, see the Fluentd documentation.
6.9. Configuring systemd-journald and rsyslog
Because Fluentd and rsyslog read from the journal, and the journal default settings are very low, journal entries can be lost because the journal cannot keep up with the logging rate from system services.
We recommend setting RateLimitInterval=1s
and RateLimitBurst=10000
(or even higher if necessary) to prevent the journal from losing entries.
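To show why a low burst value loses entries, the following Python sketch models rate limiting as a simple fixed window: within each interval, at most a burst's worth of messages is kept and the rest are dropped. This is a simplified model for illustration only, not journald's actual algorithm, and the function name apply_rate_limit is hypothetical.

```python
from typing import Iterable, List


def apply_rate_limit(timestamps: Iterable[float],
                     interval: float,
                     burst: int) -> List[bool]:
    """Return, for each message timestamp (seconds, ascending), whether it is kept.

    A new window starts when a message arrives at least `interval`
    seconds after the current window began; only the first `burst`
    messages in a window are kept.
    """
    kept = []
    window_start = None
    count = 0
    for ts in timestamps:
        if window_start is None or ts - window_start >= interval:
            window_start = ts
            count = 0
        count += 1
        kept.append(count <= burst)
    return kept


if __name__ == "__main__":
    # Six messages: five in a quick burst, one after the interval has elapsed.
    arrivals = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0]
    print(apply_rate_limit(arrivals, interval=1.0, burst=3))
```

In this model, raising the burst value or shortening the interval lets more messages through before any are dropped, which is the effect the recommended settings aim for.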
6.9.1. Configuring systemd-journald for cluster logging
As you scale up your project, the default logging environment might need some adjustments.
For example, if you are missing logs, you might have to increase the rate limits for journald. You can adjust the number of messages to retain for a specified period of time to ensure that cluster logging does not use excessive resources without dropping logs.
You can also determine if you want the logs compressed, how long to retain logs, how or if the logs are stored, and other settings.
Procedure
Create a
journald.conf
file with the required settings:
Compress=no 1
ForwardToConsole=yes 2
ForwardToSyslog=no 3
MaxRetentionSec=30s 4
RateLimitBurst=10000 5
RateLimitInterval=1s 6
Storage=volatile 7
SyncIntervalSec=1s 8
SystemMaxUse=8g 9
SystemKeepFree=20% 10
SystemMaxFileSize=10M 11
- 1
- Specify whether you want logs compressed before they are written to the file system. Specify
yes
to compress the message orno
to not compress. The default isyes
. - 2 3
- Configure whether to forward log messages. Defaults to
no
for each. Specify:-
ForwardToConsole
to forward logs to the system console. -
ForwardToKsmg
to forward logs to the kernel log buffer. -
ForwardToSyslog
to forward to a syslog daemon. -
ForwardToWall
to forward messages as wall messages to all logged-in users.
-
- 4
- Specify the maximum time to store journal entries. Enter digits to specify seconds, or include a unit: "year", "month", "week", "day", "h" or "m". Enter 0 to disable. The default is 1month.
- 5 6
- Configure rate limiting. If, during the time interval defined by RateLimitInterval, more logs than specified in RateLimitBurst are received, all further messages within the interval are dropped until the interval is over. It is recommended to set RateLimitInterval=1s and RateLimitBurst=10000, which are the defaults.
- 7
- Specify how logs are stored. The default is persistent:
  - volatile to store logs in memory in /run/log/journal/.
  - persistent to store logs to disk in /var/log/journal/. systemd creates the directory if it does not exist.
  - auto to store logs in /var/log/journal/ if the directory exists. If it does not exist, systemd temporarily stores logs in /run/log/journal/.
  - none to not store logs. systemd drops all logs.
- 8
- Specify the timeout before synchronizing journal files to disk for ERR, WARNING, NOTICE, INFO, and DEBUG logs. systemd immediately syncs after receiving a CRIT, ALERT, or EMERG log. The default is 1s.
- 9
- Specify the maximum size the journal can use. The default is 8g.
- 10
- Specify how much disk space systemd must leave free. The default is 20%.
- 11
- Specify the maximum size for individual journal files stored persistently in /var/log/journal. The default is 10M.
Note
If you are removing the rate limit, you might see increased CPU utilization on the system logging daemons as they process any messages that would have previously been throttled.
For more information on systemd settings, see https://www.freedesktop.org/software/systemd/man/journald.conf.html. The default settings listed on that page might not apply to OpenShift Container Platform.
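The file described in this step can be sketched as a small shell script that writes it and runs a basic format check. The scratch directory and the `[Journal]` section header are assumptions: the sample above does not show a header, but the journald.conf format documented by systemd places these settings under one. The check itself is a simple illustration, not a full validator.

```shell
# Write to a scratch directory; the location is an assumption for illustration.
workdir=$(mktemp -d)
conf="${workdir}/journald.conf"

# Quoted 'EOF' keeps the contents literal (no shell expansion).
cat > "$conf" <<'EOF'
[Journal]
Compress=no
ForwardToConsole=yes
ForwardToSyslog=no
MaxRetentionSec=30s
RateLimitBurst=10000
RateLimitInterval=1s
Storage=volatile
SyncIntervalSec=1s
SystemMaxUse=8g
SystemKeepFree=20%
SystemMaxFileSize=10M
EOF

# Count lines that are neither the section header nor KEY=VALUE.
# grep -c exits non-zero when the count is 0, hence the "|| true".
bad_lines=$(grep -c -v -e '^\[Journal\]$' -e '^[A-Za-z]*=..*$' "$conf" || true)
echo "malformed lines: ${bad_lines}"
```

A check like this flags a setting that is missing its = sign before the file is encoded and rolled out to nodes.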
Convert the journald.conf file to base64:
$ export jrnl_cnf=$( cat /journald.conf | base64 -w0 )
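Before embedding the encoded value, it can be worth confirming that it decodes back to the original file. The following is a minimal sketch that uses a stand-in two-line file in a scratch directory (both are assumptions for illustration) and the GNU coreutils `base64` options `-w0` and `-d`.

```shell
# Stand-in file in a scratch directory; in the procedure this is your journald.conf.
workdir=$(mktemp -d)
conf="${workdir}/journald.conf"
printf 'RateLimitBurst=10000\nRateLimitInterval=1s\n' > "$conf"

# -w0 disables line wrapping (GNU coreutils), so the value is a single line.
jrnl_cnf=$(base64 -w0 < "$conf")

# Decode the value and compare it byte for byte with the original file.
if printf '%s' "$jrnl_cnf" | base64 -d | cmp -s - "$conf"; then
  roundtrip=ok
else
  roundtrip=mismatch
fi
echo "round trip: ${roundtrip}"
```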
Create a new MachineConfig for master or worker and add the journald.conf parameters. For example:
...
config:
  storage:
    files:
    - contents:
        source: data:text/plain;charset=utf-8;base64,${jrnl_cnf}
        verification: {}
      filesystem: root
      mode: 0644 1
      path: /etc/systemd/journald.conf 2
  systemd: {}
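The example contains a `${jrnl_cnf}` placeholder, and `oc apply` does not expand shell variables inside a file, so the value has to be substituted when the file is written. One way to do that, sketched here under assumptions, is a shell here-document with an unquoted delimiter. The file name `journald-fragment.yaml` and the stand-in encoded value are hypothetical, and only the fragment shown in the example is written; the fields elided by `...` are left elided.

```shell
workdir=$(mktemp -d)

# Stand-in value; in the procedure this comes from the base64 step.
jrnl_cnf=$(printf 'Storage=volatile\n' | base64 -w0)

fragment="${workdir}/journald-fragment.yaml"   # hypothetical file name

# Unquoted EOF delimiter: the shell expands ${jrnl_cnf} while writing.
cat > "$fragment" <<EOF
# ... (leading fields elided, as in the original example)
config:
  storage:
    files:
    - contents:
        source: data:text/plain;charset=utf-8;base64,${jrnl_cnf}
        verification: {}
      filesystem: root
      mode: 0644
      path: /etc/systemd/journald.conf
  systemd: {}
EOF

# Confirm that no unexpanded placeholder remains in the file.
if grep -q 'jrnl_cnf' "$fragment"; then
  placeholder=present
else
  placeholder=expanded
fi
echo "placeholder: ${placeholder}"
```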
Create the MachineConfig:
$ oc apply -f <filename>.yaml
The controller detects the new MachineConfig and generates a new rendered-worker-<hash> version.
Monitor the status of the rollout of the new rendered configuration to each node:
$ oc describe machineconfigpool/worker
Name:         worker
Namespace:
Labels:       machineconfiguration.openshift.io/mco-built-in=
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1
Kind:         MachineConfigPool
...
Conditions:
  Message:
  Reason:     All nodes are updating to rendered-worker-913514517bcea7c93bd446f4830bc64e
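Instead of re-running `oc describe` by hand, the rollout can also be awaited from a script. The following is a minimal sketch, assuming the pool reports an `Updated` condition when the rollout finishes; the helper name `wait_for_pool` and the 30m default timeout are hypothetical choices, not values from this procedure. The function is only defined here and contacts the cluster only when called.

```shell
# wait_for_pool POOL [TIMEOUT]
# Blocks until the named MachineConfigPool reports condition Updated,
# or until TIMEOUT elapses (the default of 30m is an arbitrary assumption).
wait_for_pool() {
  pool=$1
  timeout=${2:-30m}
  oc wait "machineconfigpool/${pool}" --for=condition=Updated --timeout="${timeout}"
}

# Example call (requires a logged-in oc session):
#   wait_for_pool worker 45m
```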