Chapter 36. Aggregating Container Logs

36.1. Overview

As an OpenShift Container Platform cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Container Platform services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.

The EFK stack is a modified version of the ELK stack and consists of:

Elasticsearch (ES): An object store where all logs are stored.
Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
Kibana: A web UI for Elasticsearch.

After deployment in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.

Note

Managing Docker Container Logs discusses the use of json-file logging driver options to manage container logs and prevent filling node disks.

36.2. Pre-deployment Configuration

An Ansible playbook is available to deploy and upgrade aggregated logging. You should familiarize yourself with the Installing Clusters guide. This provides information for preparing to use Ansible and includes information about configuration. Parameters are added to the Ansible inventory file to configure various areas of the EFK stack.
Review the sizing guidelines to determine how best to configure your deployment.
Ensure that you have deployed a router for the cluster.
Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch node requires its own storage volume. See Elasticsearch for more information.
Determine if you need highly-available Elasticsearch. A highly-available environment requires at least three Elasticsearch nodes, each on a different host. By default, OpenShift Container Platform creates one shard for each index and zero replicas of those shards. High availability also requires at least one replica of each shard. To create high availability, set the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher. See Elasticsearch for more information.
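
The high-availability requirements above might translate into inventory settings like the following. This is a minimal sketch, not a complete inventory: the `[OSEv3:vars]` group name and the scratch file are assumptions made for illustration, and the node selector value is the example given in the parameter table in this chapter.

```shell
# Write a hypothetical inventory fragment to a scratch file for review.
# Merge the variable lines into your real inventory by hand.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
[OSEv3:vars]
openshift_logging_install_logging=true
# Three Elasticsearch nodes; each must run on a different host.
openshift_logging_es_cluster_size=3
# One replica per primary shard (the default is 0).
openshift_logging_es_number_of_replicas=1
# Mandatory when installing logging.
openshift_logging_es_nodeselector={"node-role.kubernetes.io/infra":"true"}
EOF
cat "$fragment"
```

Each Elasticsearch node still needs its own storage volume, which this fragment does not configure.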

36.3. Specifying Logging Ansible Variables

You can override the default parameter values by specifying parameters for the EFK deployment in the inventory host file.

Read the Elasticsearch and the Fluentd sections before choosing parameters.

Note

By default, the Elasticsearch service uses port 9300 for TCP communication between nodes in a cluster.

Parameter	Description
`openshift_logging_install_logging`	Set to `true` to install logging. Set to `false` to uninstall logging. When set to `true`, you must specify a node selector using `openshift_logging_es_nodeselector`.
`openshift_logging_use_ops`	If set to `true`, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs, which consists of the logs from the projects default, openshift, and openshift-infra, as well as Docker, OpenShift, and system logs from the journal. This means a second Elasticsearch cluster and Kibana are deployed. The deployments are distinguishable by the -ops suffix included in their names and have parallel deployment options listed below and described in Creating the Curator Configuration. If set to `true`, `openshift_logging_es_ops_nodeselector` is mandatory.
`openshift_logging_master_url`	The URL for the Kubernetes master. This does not need to be public facing but should be accessible from within the cluster. For example, https://<PRIVATE-MASTER-URL>:8443.
`openshift_logging_purge_logging`	The common uninstall keeps PVC to prevent unwanted data loss during reinstalls. To ensure that the Ansible playbook completely and irreversibly removes all logging persistent data including PVC, set `openshift_logging_install_logging` to `false` to trigger uninstallation and `openshift_logging_purge_logging` to `true`. The default is set to `false`.
`openshift_logging_install_eventrouter`	Coupled with `openshift_logging_install_logging`. When both are set to `true`, eventrouter will be installed. When both are `false`, eventrouter will be uninstalled.
`openshift_logging_eventrouter_image`	The image version for Eventrouter. For example: `registry.redhat.io/openshift3/ose-logging-eventrouter:v3.11`
`openshift_logging_eventrouter_image_version`	The image version for the logging eventrouter.
`openshift_logging_eventrouter_sink`	Select a sink for eventrouter. Supported values are `stdout` and `glog`. The default is set to `stdout`.
`openshift_logging_eventrouter_nodeselector`	A map of labels, such as `"node":"infra"`,`"region":"west"`, to select the nodes where the pod will land.
`openshift_logging_eventrouter_replicas`	The default is set to `1`.
`openshift_logging_eventrouter_cpu_limit`	The minimum amount of CPU to allocate to eventrouter. The default is set to `100m`.
`openshift_logging_eventrouter_memory_limit`	The memory limit for eventrouter pods. The default is set to `128Mi`.
`openshift_logging_eventrouter_namespace`	The project where eventrouter is deployed. The default is set to `default`. Important: Do not set the project to anything other than `default` or `openshift-*`. If you specify a different project, event information from the other project can leak into indices that are not restricted to operations users. To use a non-default project, create the project as usual using `oc new-project`.
`openshift_logging_image_pull_secret`	Specify the name of an existing pull secret to be used for pulling component images from an authenticated registry.
`openshift_logging_curator_image`	The image version for Curator. For example: `registry.redhat.io/openshift3/ose-logging-curator5:v3.11`
`openshift_logging_curator_default_days`	The default minimum age (in days) Curator uses for deleting log records.
`openshift_logging_curator_run_hour`	The hour of the day Curator will run.
`openshift_logging_curator_run_minute`	The minute of the hour Curator will run.
`openshift_logging_curator_run_timezone`	The timezone Curator uses for figuring out its run time. Provide the timezone as a string in the tzselect(8) or timedatectl(1) "Region/Locality" format, for example `America/New_York` or `UTC`.
`openshift_logging_curator_script_log_level`	The script log level for Curator.
`openshift_logging_curator_log_level`	The log level for the Curator process.
`openshift_logging_curator_cpu_limit`	The amount of CPU to allocate to Curator.
`openshift_logging_curator_memory_limit`	The amount of memory to allocate to Curator.
`openshift_logging_curator_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Curator instances.
`openshift_logging_curator_ops_cpu_limit`	Equivalent to `openshift_logging_curator_cpu_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_curator_ops_memory_limit`	Equivalent to `openshift_logging_curator_memory_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_curator_replace_configmap`	Set to `no` to prevent the upgrade from replacing the `logging-curator` ConfigMap. Set to `yes` to allow the ConfigMap to be overridden.
`openshift_logging_kibana_image`	The image version for Kibana. For example: `registry.redhat.io/openshift3/ose-logging-kibana5:v3.11`
`openshift_logging_kibana_hostname`	The external host name for web clients to reach Kibana.
`openshift_logging_kibana_cpu_limit`	The amount of CPU to allocate to Kibana.
`openshift_logging_kibana_memory_limit`	The amount of memory to allocate to Kibana.
`openshift_logging_kibana_proxy_image`	The image version for the Kibana proxy. For example: `registry.redhat.io/openshift3/oauth-proxy:v3.11`
`openshift_logging_kibana_proxy_debug`	When `true`, set the Kibana Proxy log level to `DEBUG`.
`openshift_logging_kibana_proxy_cpu_limit`	The amount of CPU to allocate to Kibana proxy.
`openshift_logging_kibana_proxy_memory_limit`	The amount of memory to allocate to Kibana proxy.
`openshift_logging_kibana_replica_count`	The number of nodes to which Kibana should be scaled up.
`openshift_logging_kibana_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Kibana instances.
`openshift_logging_kibana_env_vars`	A map of environment variables to add to the Kibana deployment configuration. For example, {"ELASTICSEARCH_REQUESTTIMEOUT":"30000"}.
`openshift_logging_kibana_key`	The public facing key to use when creating the Kibana route.
`openshift_logging_kibana_cert`	The cert that matches the key when creating the Kibana route.
`openshift_logging_kibana_ca`	Optional. The CA that goes with the key and cert used when creating the Kibana route.
`openshift_logging_kibana_ops_hostname`	Equivalent to `openshift_logging_kibana_hostname` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_cpu_limit`	Equivalent to `openshift_logging_kibana_cpu_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_memory_limit`	Equivalent to `openshift_logging_kibana_memory_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_proxy_debug`	Equivalent to `openshift_logging_kibana_proxy_debug` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_proxy_cpu_limit`	Equivalent to `openshift_logging_kibana_proxy_cpu_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_proxy_memory_limit`	Equivalent to `openshift_logging_kibana_proxy_memory_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_kibana_ops_replica_count`	Equivalent to `openshift_logging_kibana_replica_count` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_allow_external`	Set to `true` to expose Elasticsearch as a reencrypt route. Set to `false` by default.
`openshift_logging_es_hostname`	The external-facing hostname to use for the route and the TLS server certificate. The default is set to `es`. For example, if `openshift_master_default_subdomain` is set to `example.test`, then the default value of `openshift_logging_es_hostname` will be `es.example.test`.
`openshift_logging_es_cert`	The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert.
`openshift_logging_es_key`	The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key.
`openshift_logging_es_ca_ext`	The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA.
`openshift_logging_es_ops_allow_external`	Set to `true` to expose Elasticsearch as a reencrypt route. Set to `false` by default.
`openshift_logging_es_ops_hostname`	The external-facing hostname to use for the route and the TLS server certificate. The default is set to `es-ops`. For example, if `openshift_master_default_subdomain` is set to `example.test`, then the default value of `openshift_logging_es_ops_hostname` will be `es-ops.example.test`.
`openshift_logging_es_ops_cert`	The location of the certificate Elasticsearch uses for the external TLS server cert. The default is a generated cert.
`openshift_logging_es_ops_key`	The location of the key Elasticsearch uses for the external TLS server cert. The default is a generated key.
`openshift_logging_es_ops_ca_ext`	The location of the CA cert Elasticsearch uses for the external TLS server cert. The default is the internal CA.
`openshift_logging_fluentd_image`	The image version for Fluentd. For example: `registry.redhat.io/openshift3/ose-logging-fluentd:v3.11`
`openshift_logging_fluentd_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Fluentd instances. Any node where Fluentd should run (typically, all) must have this label before Fluentd is able to run and collect logs. When scaling up the Aggregated Logging cluster after installation, the `openshift_logging` role labels nodes provided by `openshift_logging_fluentd_hosts` with this node selector. As part of the installation, it is recommended that you add the Fluentd node selector label to the list of persisted node labels.
`openshift_logging_fluentd_cpu_limit`	The CPU limit for Fluentd pods.
`openshift_logging_fluentd_memory_limit`	The memory limit for Fluentd pods.
`openshift_logging_fluentd_journal_read_from_head`	Set to `true` if Fluentd should read from the head of Journal when first starting up. Using this may cause a delay in Elasticsearch receiving current log records.
`openshift_logging_fluentd_hosts`	List of nodes that should be labeled for Fluentd to be deployed, for example ['host1.example.com', 'host2.example.com']. The default is to label all nodes with ['--all']. The null value is `openshift_logging_fluentd_hosts={}`. To spin up Fluentd pods, update the daemonset’s `nodeSelector` to a valid label.
`openshift_logging_fluentd_audit_container_engine`	When set to `true`, the audit log of the container engine is collected and stored in ES. Enabling this variable allows the EFK stack to watch the specified audit log file or the default `/var/log/audit/audit.log` file, collect audit information for the container engine for the platform, and put it into Kibana.
`openshift_logging_fluentd_audit_file`	Location of the audit log file. The default is `/var/log/audit/audit.log`. The EFK stack watches this file when `openshift_logging_fluentd_audit_container_engine` is set to `true`.
`openshift_logging_fluentd_audit_pos_file`	Location of the Fluentd `in_tail` position file for the audit log file. The default is `/var/log/audit/audit.log.pos`. The position file is used when `openshift_logging_fluentd_audit_container_engine` is set to `true`.
`openshift_logging_fluentd_merge_json_log`	Set to `true` to enable processing of JSON logs embedded in the `log` or `MESSAGE` field of the record. The default is `true`.
`openshift_logging_fluentd_extra_keep_fields`	Specify a comma-separated list of fields that you do not want to be altered when processing the extra fields generated when using `openshift_logging_fluentd_merge_json_log`. Otherwise, Fluentd processes the fields according to the other undefined field settings below. The default is empty.
`openshift_logging_fluentd_keep_empty_fields`	Specify a list of comma-delimited fields to keep as empty fields when using `openshift_logging_fluentd_merge_json_log`. By default, Fluentd removes fields with empty values from the record, except for the `message` field.
`openshift_logging_fluentd_replace_configmap`	Set to `no` to prevent the upgrade from replacing the `logging-fluentd` ConfigMap. Set to `yes` to allow the ConfigMap to be overridden.
`openshift_logging_fluentd_use_undefined`	Set to `true` to move fields generated by `openshift_logging_fluentd_merge_json_log` into a sub-field named by the `openshift_logging_fluentd_undefined_name` parameter. By default, Fluentd keeps these at the top-level of the record, which can lead to Elasticsearch conflicts and schema errors.
`openshift_logging_fluentd_undefined_name`	Specify the name of the field to move undefined fields into when using `openshift_logging_fluentd_use_undefined`. The default is `undefined`.
`openshift_logging_fluentd_undefined_to_string`	Set to `true` to convert all undefined field values into their JSON string representation when using `openshift_logging_fluentd_merge_json_log`. The default is `false`.
`openshift_logging_fluentd_undefined_dot_replace_char`	Specify a character, such as `_`, to replace any `.` characters in a field name when using `openshift_logging_fluentd_merge_json_log`. Undefined fields with a `.` character in the name cause problems with Elasticsearch. The default is `UNUSED`, which means `.` in the field name is preserved.
`openshift_logging_fluentd_undefined_max_num_fields`	Specify a limit to the number of undefined fields when using `openshift_logging_fluentd_merge_json_log`. Logs can contain hundreds of undefined fields, which causes problems with Elasticsearch. If there are more than the specified number of fields, the fields will be converted into a JSON hash string and stored in the `openshift_logging_fluentd_undefined_name` field. The default value is `-1` which means an unlimited number of fields.
`openshift_logging_fluentd_use_multiline_json`	Set to `true` to force Fluentd to reconstruct any split log lines into a single line when using `openshift_logging_fluentd_merge_json_log`. With the `json-file` driver, Docker splits log lines at a size of 16k bytes. The default is `false`.
`openshift_logging_fluentd_use_multiline_journal`	Set to `true` to force Fluentd to reconstruct the split lines into a single line when using `openshift_logging_fluentd_merge_json_log`. With the `journald` driver, Docker splits log lines at a size of 16k bytes. The default is `false`.
`openshift_logging_es_host`	The name of the Elasticsearch service where Fluentd should send logs.
`openshift_logging_es_port`	The port for the Elasticsearch service where Fluentd should send logs.
`openshift_logging_es_ca`	The location of the CA Fluentd uses to communicate with `openshift_logging_es_host`.
`openshift_logging_es_client_cert`	The location of the client certificate Fluentd uses for `openshift_logging_es_host`.
`openshift_logging_es_client_key`	The location of the client key Fluentd uses for `openshift_logging_es_host`.
`openshift_logging_es_cluster_size`	Elasticsearch nodes to deploy. High availability requires three or more.
`openshift_logging_es_cpu_limit`	The CPU limit for the Elasticsearch cluster.
`openshift_logging_es_memory_limit`	Amount of RAM to reserve per Elasticsearch instance. It must be at least 512M. Possible suffixes are G,g,M,m.
`openshift_logging_es_number_of_replicas`	The number of replicas per primary shard for each new index. Defaults to '0'. A minimum of `1` is advisable for production clusters. For a highly-available environment, set this value to `1` or higher and have at least three Elasticsearch nodes, each on a different host. If you change the number of replicas, the new value applies to the new indices only. The new number does not apply to existing indices. For information on how to change the number of replicas for the existing indices, see Changing the Number of Elasticsearch Replicas.
`openshift_logging_es_number_of_shards`	The number of primary shards for every new index created in ES. Defaults to `1`.
`openshift_logging_es_pv_selector`	A key/value map added to a PVC in order to select specific PVs.
`openshift_logging_es_pvc_dynamic`	To dynamically provision the backing storage, set the parameter value to `true`. When set to `true`, the storageClass spec is omitted from the PVC definition. When set to `false`, you must specify a value for the `openshift_logging_es_pvc_size` parameter. If you set a value for the `openshift_logging_es_pvc_storage_class_name` parameter, its value overrides the value of the `openshift_logging_es_pvc_dynamic` parameter.
`openshift_logging_es_pvc_storage_class_name`	To use a non-default storage class, specify the storage class name, such as `glusterprovisioner` or `cephrbdprovisioner`. After you specify the storage class name, dynamic volume provisioning is active regardless of the `openshift_logging_es_pvc_dynamic` value.
`openshift_logging_es_pvc_size`	Size of the persistent volume claim to create per Elasticsearch instance. For example, 100G. If omitted, no PVCs are created, and ephemeral volumes are used instead. If you set this parameter, the logging installer sets `openshift_logging_elasticsearch_storage_type` to `pvc`. If the `openshift_logging_es_pvc_dynamic` parameter has been set to `false`, you must set a value for this parameter. Read the description of `openshift_logging_es_pvc_prefix` for more information.
`openshift_logging_elasticsearch_image`	The image version for Elasticsearch. For example: `registry.redhat.io/openshift3/ose-logging-elasticsearch5:v3.11`
`openshift_logging_elasticsearch_storage_type`	Sets the Elasticsearch storage type. If you are using Persistent Elasticsearch Storage, the logging installer sets this to `pvc`.
`openshift_logging_es_pvc_prefix`	Prefix for the names of persistent volume claims to be used as storage for Elasticsearch nodes. A number is appended per node, such as logging-es-1. If they do not already exist, they are created with the size given by `openshift_logging_es_pvc_size`. When `openshift_logging_es_pvc_prefix` is set: if `openshift_logging_es_pvc_dynamic`=`true`, the value for `openshift_logging_es_pvc_size` is optional; if `openshift_logging_es_pvc_dynamic`=`false`, the value for `openshift_logging_es_pvc_size` must be set.
`openshift_logging_es_recover_after_time`	The amount of time Elasticsearch will wait before it tries to recover. Supported time units are seconds (s) or minutes (m).
`openshift_logging_es_storage_group`	Number of a supplemental group ID for access to Elasticsearch storage volumes. Backing volumes should allow access by this group ID.
`openshift_logging_es_nodeselector`	A node selector specified as a map that determines which nodes are eligible targets for deploying Elasticsearch nodes. Use this map to place these instances on nodes that are reserved or optimized for running them. For example, the selector could be `{"node-role.kubernetes.io/infra":"true"}`. At least one active node must have this label before Elasticsearch will deploy. This parameter is mandatory when installing logging.
`openshift_logging_es_ops_host`	Equivalent to `openshift_logging_es_host` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_port`	Equivalent to `openshift_logging_es_port` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_ca`	Equivalent to `openshift_logging_es_ca` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_client_cert`	Equivalent to `openshift_logging_es_client_cert` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_client_key`	Equivalent to `openshift_logging_es_client_key` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_cluster_size`	Equivalent to `openshift_logging_es_cluster_size` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_cpu_limit`	Equivalent to `openshift_logging_es_cpu_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_memory_limit`	Equivalent to `openshift_logging_es_memory_limit` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_pv_selector`	Equivalent to `openshift_logging_es_pv_selector` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_pvc_dynamic`	Equivalent to `openshift_logging_es_pvc_dynamic` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_pvc_size`	Equivalent to `openshift_logging_es_pvc_size` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_pvc_prefix`	Equivalent to `openshift_logging_es_pvc_prefix` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_storage_group`	Equivalent to `openshift_logging_es_storage_group` for Ops cluster when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_es_ops_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Elasticsearch nodes. This can be used to place these instances on nodes reserved or optimized for running them. For example, the selector could be `node-type=infrastructure`. At least one active node must have this label before Elasticsearch will deploy. This parameter is mandatory when `openshift_logging_use_ops` is set to `true`.
`openshift_logging_elasticsearch_kibana_index_mode`	The default value, `unique`, allows users to each have their own Kibana index. In this mode, their saved queries, visualizations, and dashboards are not shared. You may also set the value `shared_ops`. In this mode, all operations users share a Kibana index which allows each operations user to see the same queries, visualizations, and dashboards. To determine if you are an operations user, run `oc auth can-i view pod/logs -n default`. If the output is `yes`, you are an operations user. If you do not have appropriate access, contact your cluster administrator.
`openshift_logging_elasticsearch_poll_timeout_minutes`	Adjusts the time that the Ansible playbook waits for the Elasticsearch cluster to enter a green state after upgrading a given Elasticsearch node. Large shards, 50 GB or more, can take more than 60 minutes to initialize, causing the Ansible playbook to abort the upgrade procedure. The default is `60`.
`openshift_logging_kibana_ops_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Kibana instances.
`openshift_logging_curator_ops_nodeselector`	A node selector that specifies which nodes are eligible targets for deploying Curator instances.
`openshift_logging_elasticsearch_replace_configmap`	Set to `true` to replace your `logging-elasticsearch` ConfigMap with the current default values. Your current ConfigMap is saved to `logging-elasticsearch.old`, which you can use to copy customizations to the new ConfigMap. In some cases, using an older ConfigMap can cause the upgrade to fail. The default is set to `false`.
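
As an illustration of how the storage parameters in the table combine, the following sketch writes a hypothetical inventory fragment that requests dynamically provisioned persistent storage for Elasticsearch. The `[OSEv3:vars]` group name and the scratch file are assumptions; the size and prefix values are taken from the examples in the parameter descriptions.

```shell
# Write a hypothetical inventory fragment to a scratch file for review.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
[OSEv3:vars]
# Provision the backing storage dynamically.
openshift_logging_es_pvc_dynamic=true
# Size of the claim created for each Elasticsearch instance.
# Setting this makes the installer use the pvc storage type.
openshift_logging_es_pvc_size=100G
# Claims are named from this prefix plus a number, such as logging-es-1.
openshift_logging_es_pvc_prefix=logging-es
EOF
cat "$fragment"
```

If `openshift_logging_es_pvc_size` is omitted, no PVCs are created and ephemeral volumes are used instead.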

Custom Certificates

You can specify custom certificates using the following inventory variables instead of relying on those generated during the deployment process. These certificates are used to encrypt and secure communication between a user’s browser and Kibana. The security-related files will be generated if they are not supplied.

File Name	Description
`openshift_logging_kibana_cert`	A browser-facing certificate for the Kibana server.
`openshift_logging_kibana_key`	A key to be used with the browser-facing Kibana certificate.
`openshift_logging_kibana_ca`	The absolute path on the control node to the CA file to use for the browser facing Kibana certs.
`openshift_logging_kibana_ops_cert`	A browser-facing certificate for the Ops Kibana server.
`openshift_logging_kibana_ops_key`	A key to be used with the browser-facing Ops Kibana certificate.
`openshift_logging_kibana_ops_ca`	The absolute path on the control node to the CA file to use for the browser facing ops Kibana certs.

If you need to redeploy these certificates, see Redeploy EFK Certificates.

36.4. Deploying the EFK Stack

The EFK stack is deployed using an Ansible playbook. Run the playbook from the default OpenShift Ansible location using the default inventory file.

$ cd /usr/share/ansible/openshift-ansible
$ ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml

Running the playbook deploys all resources needed to support the stack, such as Secrets, ServiceAccounts, and DeploymentConfigs, to the openshift-logging project. The playbook waits to deploy the component pods until the stack is running. If the wait steps fail, the deployment could still be successful; the pods might still be retrieving the component images from the registry, which can take up to a few minutes. You can watch the process with:

$ oc get pods -w

logging-curator-1541129400-l5h77           0/1       Running   0          11h
logging-es-data-master-ecu30lr4-1-deploy   0/1       Running   0          11h
logging-fluentd-2lgwn                      1/1       Running   0          11h
logging-fluentd-lmvms                      1/1       Running   0          11h
logging-fluentd-p9nd7                      1/1       Running   0          11h
logging-kibana-1-zk94k                     2/2       Running   0          11h

logging-curator: The Curator pod. Only one pod is needed for Curator.
logging-es-data-master: The Elasticsearch pod on this host.
logging-fluentd: The Fluentd pods. There is one pod for each node in the cluster.
logging-kibana: The Kibana pods.

You can use the `oc get pods -o wide` command to see the nodes where the Fluentd pods are deployed:

$ oc get pods -o wide
NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE                         NOMINATED NODE
logging-es-data-master-5av030lk-1-2x494    2/2       Running   0          38m       154.128.0.80   ip-153-12-8-6.wef.internal   <none>
logging-fluentd-lqdxg                      1/1       Running   0          2m        154.128.0.85   ip-153-12-8-6.wef.internal   <none>
logging-kibana-1-gj5kc                     2/2       Running   0          39m       154.128.0.77   ip-153-12-8-6.wef.internal   <none>

The pods eventually enter Running status. For additional details about the status of the pods during deployment, retrieve the associated events:

$ oc describe pods/<pod_name>

Check the logs if the pods do not run successfully:

$ oc logs -f <pod_name>

36.5. Understanding and Adjusting the Deployment

This section describes adjustments that you can make to deployed components.

36.5.1. Ops Cluster

Note

The logs for the default, openshift, and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface.

The project where you have deployed the EFK stack (openshift-logging, as documented here) is not aggregated into .operations and is found under its ID.

If you set openshift_logging_use_ops to true in your inventory file, Fluentd is configured to split logs between the main Elasticsearch cluster and another cluster reserved for operations logs, which are defined as node system logs and the projects default, openshift, and openshift-infra. Therefore, a separate Elasticsearch cluster, a separate Kibana, and a separate Curator are deployed to index, access, and manage operations logs. These deployments are set apart with names that include -ops. Keep these separate deployments in mind if you enable this option. Most of the following discussion also applies to the operations cluster if present, just with the names changed to include -ops.
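
Because `openshift_logging_es_ops_nodeselector` is mandatory when `openshift_logging_use_ops` is set to `true`, a quick pre-flight check on the inventory can catch the omission before the playbook runs. The sketch below is illustrative only: `check_ops_selector` is a hypothetical helper, the `[OSEv3:vars]` group name and scratch file are assumptions, and the ops selector value reuses the map form shown for `openshift_logging_es_nodeselector`, which is also an assumption.

```shell
# Hypothetical helper: returns non-zero if the inventory enables the ops
# cluster without the mandatory ops node selector. It only inspects plain
# key=value lines and is not a substitute for Ansible's own validation.
check_ops_selector() {
  inventory_file="$1"
  if grep -Eq '^openshift_logging_use_ops[[:space:]]*=[[:space:]]*true' "$inventory_file" &&
     ! grep -Eq '^openshift_logging_es_ops_nodeselector[[:space:]]*=' "$inventory_file"; then
    echo "openshift_logging_es_ops_nodeselector is required when openshift_logging_use_ops=true" >&2
    return 1
  fi
  return 0
}

# Hypothetical inventory fragment that enables the ops cluster.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
[OSEv3:vars]
openshift_logging_use_ops=true
openshift_logging_es_ops_nodeselector={"node-role.kubernetes.io/infra":"true"}
EOF

check_ops_selector "$fragment" && echo "ops settings look consistent"
```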

36.5.2. Elasticsearch

Elasticsearch (ES) is an object store where all logs are stored.

Elasticsearch organizes the log data into datastores, each called an index. Elasticsearch subdivides each index into multiple pieces called shards, which it spreads across a set of Elasticsearch nodes in your cluster. You can configure Elasticsearch to make copies of the shards, called replicas. Elasticsearch also spreads replicas across the Elasticsearch nodes. The combination of shards and replicas is intended to provide redundancy and resilience to failure. For example, if you configure three shards for the index with one replica, Elasticsearch generates a total of six shards for that index: three primary shards and three replicas as a backup.
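The shard arithmetic in the example above can be sketched in a few lines of Python. This is an illustrative calculation only; the function name total_shard_copies is hypothetical and is not part of OpenShift Container Platform or Elasticsearch.

```python
def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Return primaries plus replica copies for a single index."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("primary_shards must be >= 1 and replicas must be >= 0")
    # Every primary shard gets `replicas` additional copies.
    return primary_shards * (1 + replicas)


# The case from the text: three shards with one replica.
print(total_shard_copies(3, 1))  # prints 6
```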

The OpenShift Container Platform logging installer ensures each Elasticsearch node is deployed using a unique deployment configuration that includes its own storage volume. You can create an additional deployment configuration for each Elasticsearch node you add to the logging system. During installation, you can use the openshift_logging_es_cluster_size Ansible variable to specify the number of Elasticsearch nodes.

Alternatively, you can scale up your existing cluster by modifying the openshift_logging_es_cluster_size in the inventory file and re-running the logging playbook. Additional clustering parameters can be modified and are described in Specifying Logging Ansible Variables.

Refer to Elastic’s documentation for considerations involved in choosing storage and network location as directed below.

Note

A highly-available Elasticsearch environment requires at least three Elasticsearch nodes, each on a different host, and setting the openshift_logging_es_number_of_replicas Ansible variable to a value of 1 or higher to create replicas.

Viewing all Elasticsearch Deployments

To view all current Elasticsearch deployments:

oc get dc --selector logging-infra=elasticsearch


Configuring Elasticsearch for High Availability

Use the following scenarios as a guide for an OpenShift Container Platform cluster with three Elasticsearch nodes:

With openshift_logging_es_number_of_replicas set to 1, two nodes have a copy of all of the Elasticsearch data in the cluster. This ensures that if a node with Elasticsearch data goes down, another node has a copy of all of the Elasticsearch data in the cluster.
With openshift_logging_es_number_of_replicas set to 3, four nodes have a copy of all of the Elasticsearch data in the cluster. This ensures that if three nodes with Elasticsearch data go down, one node has a copy of all of the Elasticsearch data in the cluster.
In this scenario, with multiple Elasticsearch nodes going down, Elasticsearch status would be RED, and new Elasticsearch shards would not be allocated. However, because of the high availability, you do not lose your Elasticsearch data.

Note that there is a trade-off between high availability and performance. For example, having openshift_logging_es_number_of_replicas=2 and openshift_logging_es_number_of_shards=3 requires Elasticsearch to spend significant resources replicating the shard data among the nodes in the cluster. Also, using a higher number of replicas requires doubling or tripling the data storage requirements on each node, so you must take that into account when planning persistent storage for Elasticsearch.
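As a rough planning aid, the storage effect of replicas can be estimated as follows. This sketch assumes that Elasticsearch spreads primaries and replicas evenly across the nodes, which real clusters only approximate; the function name storage_per_node_gb and the 300 GB figure are hypothetical.

```python
def storage_per_node_gb(primary_data_gb: float, replicas: int, nodes: int) -> float:
    """Estimate per-node storage, assuming shards are spread evenly across nodes."""
    if nodes < 1 or replicas < 0:
        raise ValueError("nodes must be >= 1 and replicas must be >= 0")
    # Each replica adds one more full copy of the primary data to the cluster.
    total_gb = primary_data_gb * (1 + replicas)
    return total_gb / nodes


# 300 GB of primary data on a three-node cluster (assumed numbers).
for replicas in (0, 1, 2):
    print(replicas, storage_per_node_gb(300, replicas, nodes=3))
```

Under these assumptions the estimate is 100 GB per node without replicas, 200 GB with one replica, and 300 GB with two, which matches the doubling or tripling described above.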

Considerations when Configuring the Number of Shards

For the openshift_logging_es_number_of_shards parameter, consider:

For higher performance, increase the number of shards. For example, in a three node cluster, set openshift_logging_es_number_of_shards=3. This will cause each index to be split into three parts (shards), and the load for processing the index will be spread out over all 3 nodes.
If you have a large number of projects, you might see performance degradation if you have more than a few thousand shards in the cluster. Either reduce the number of shards or reduce the curation time.
If you have a small number of very large indices, you might want to configure openshift_logging_es_number_of_shards=3 or higher. Elasticsearch recommends using a maximum shard size of less than 50 GB.
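To relate the 50 GB guideline to a shard count, the following sketch computes the smallest number of shards that keeps every shard below a size limit. It assumes that data is split evenly across shards; the function name min_shards_for_index and the sample sizes are hypothetical.

```python
MAX_SHARD_SIZE_GB = 50  # guideline quoted in the text


def min_shards_for_index(index_size_gb: float, max_shard_gb: float = MAX_SHARD_SIZE_GB) -> int:
    """Return the smallest shard count that keeps each shard strictly below the limit."""
    if index_size_gb <= 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Floor division plus one guarantees index_size_gb / shards < max_shard_gb.
    return int(index_size_gb // max_shard_gb) + 1


print(min_shards_for_index(120))  # prints 3 (40 GB per shard)
```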

Node Selector

Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other and to any remote storage. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.

To configure a node selector, specify the openshift_logging_es_nodeselector configuration option in the inventory file. This applies to all Elasticsearch deployments; if you need to individualize the node selectors, you must manually edit each deployment configuration after deployment. The node selector is specified as a Python-compatible dict, for example, {"node-type":"infra", "region":"east"}.
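Before adding a node selector to the inventory file, you can check that the value parses as a Python dict. The following is a minimal sketch; the helper parse_node_selector is hypothetical, and the string-only check is an assumption of this sketch based on node labels being string key-value pairs.

```python
import ast


def parse_node_selector(value: str) -> dict:
    """Parse a node selector literal and check that it is a dict of strings."""
    selector = ast.literal_eval(value)  # evaluates literals only, never arbitrary code
    if not isinstance(selector, dict):
        raise ValueError("node selector must be a dict")
    for key, val in selector.items():
        if not isinstance(key, str) or not isinstance(val, str):
            raise ValueError("node selector keys and values must be strings")
    return selector


print(parse_node_selector('{"node-type":"infra", "region":"east"}'))
```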

36.5.2.1. Persistent Elasticsearch Storage

By default, the openshift_logging Ansible role creates an ephemeral deployment in which all data in a pod is lost upon pod restart.

For production environments, each Elasticsearch deployment configuration requires a persistent storage volume. You can specify an existing persistent volume claim or allow OpenShift Container Platform to create one.

Use existing PVCs. If you create your own PVCs for the deployment, OpenShift Container Platform uses those PVCs.
Name the PVCs to match the openshift_logging_es_pvc_prefix setting, which defaults to logging-es. Assign each PVC a name with a sequence number added to it: logging-es-0, logging-es-1, logging-es-2, and so on.

Allow OpenShift Container Platform to create a PVC. If a PVC for Elasticsearch does not exist, OpenShift Container Platform creates the PVC based on parameters in the Ansible inventory file.


Parameter	Description
`openshift_logging_es_pvc_size`	Specify the size of the PVC request.
`openshift_logging_elasticsearch_storage_type`	Specify the storage type as `pvc`. Note: This is an optional parameter. If you set the `openshift_logging_es_pvc_size` parameter to a value greater than 0, the logging installer automatically sets this parameter to `pvc`.
`openshift_logging_es_pvc_prefix`	Optionally, specify a custom prefix for the PVC.

For example:

openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_size=104802308Ki
openshift_logging_es_pvc_prefix=es-logging


If using dynamically provisioned PVs, the OpenShift Container Platform logging installer creates PVCs that use the default storage class or the storage class specified with the openshift_logging_elasticsearch_pvc_storage_class_name parameter.

If using NFS storage, the OpenShift Container Platform installer creates the persistent volumes, based on the openshift_logging_storage_* parameters and the OpenShift Container Platform logging installer creates PVCs, using the openshift_logging_es_pvc_* parameters. Make sure you specify the correct parameters in order to use persistent volumes with EFK. Also set the openshift_enable_unsupported_configurations=true parameter in the Ansible inventory file, as the logging installer blocks the installation of NFS with core infrastructure by default.

Warning

Using NFS storage as a volume or a persistent volume, or using NAS such as Gluster, is not supported for Elasticsearch storage, as Lucene relies on file system behavior that NFS does not supply. Data corruption and other problems can occur.

If your environment requires NFS storage, use one of the following methods:

36.5.2.1.1. Using NFS as a persistent volume

You can deploy NFS as an automatically provisioned persistent volume or using a predefined NFS volume.

For more information, see Sharing an NFS mount across two persistent volume claims to leverage shared storage for use by two separate containers.

Using automatically provisioned NFS

To use NFS as a persistent volume where NFS is automatically provisioned:

Add the following lines to the Ansible inventory file to create an NFS auto-provisioned storage class and dynamically provision the backing storage:
```
openshift_logging_es_pvc_storage_class_name=$nfsclass
openshift_logging_es_pvc_dynamic=true
```

Use the following command to deploy the NFS volume using the logging playbook:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml


Use the following steps to create a PVC:
1. Edit the Ansible inventory file to set the PVC size:
  openshift_logging_es_pvc_size=50Gi
  Note
  The logging playbook selects a volume based on size and might use an unexpected volume if any other persistent volume has the same size.
2. Use the following command to rerun the Ansible deploy_cluster.yml playbook:
  ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
  The installer playbook creates the NFS volume based on the openshift_logging_storage variables.

Using a predefined NFS volume

To deploy logging alongside the OpenShift Container Platform cluster using an existing NFS volume:

Edit the Ansible inventory file to configure the NFS volume and set the PVC size:

openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_nfs_directory=/share 
openshift_logging_storage_nfs_options='*(rw,root_squash)' 
openshift_logging_storage_labels={'storage': 'logging'}
openshift_logging_elasticsearch_storage_type=pvc
openshift_logging_es_pvc_size=10Gi
openshift_logging_es_pvc_storage_class_name=''
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_prefix=logging


The openshift_logging_storage_nfs_directory and openshift_logging_storage_nfs_options parameters work only with the /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml installation playbook. They do not work with the /usr/share/ansible/openshift-ansible/playbooks/openshift-logging/config.yml playbook.

Use the following command to redeploy the EFK stack:

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml


36.5.2.1.2. Using NFS as local storage

You can allocate a large file on an NFS server and mount the file to the nodes. You can then use the file as a host path device.

mount -t nfs nfserver:/nfs/storage/elasticsearch-1 /usr/local/es-storage
chown 1000:1000 /usr/local/es-storage

Then, use /usr/local/es-storage as a host-mount as described below. Use a different backing file as storage for each Elasticsearch node.

This loopback must be maintained manually outside of OpenShift Container Platform, on the node. You must not maintain it from inside a container.

It is possible to use a local disk volume (if available) on each node host as storage for an Elasticsearch replica. Doing so requires some preparation as follows.

The relevant service account must be given the privilege to mount and edit a local volume:
```
oc adm policy add-scc-to-user privileged  \
         system:serviceaccount:openshift-logging:aggregated-logging-elasticsearch
```
Note
If you upgraded from an earlier version of OpenShift Container Platform, cluster logging might have been installed in the logging project. You should adjust the service account accordingly.

Each Elasticsearch node definition must be patched to claim that privilege, for example:

for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done


The Elasticsearch replicas must be located on the correct nodes to use the local storage, and must not move around, even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it. To configure a node selector, edit each Elasticsearch deployment configuration, adding or editing the nodeSelector section to specify a unique label that you have applied for each desired node:

apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1"

This label must uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1.

Create a node selector for each required node.
Use the oc label command to apply labels to as many nodes as needed.

For example, if your deployment has three infrastructure nodes, you could add labels for those nodes as follows:

oc label node <nodename1> logging-es-node=0
oc label node <nodename2> logging-es-node=1
oc label node <nodename3> logging-es-node=2


For information about adding a label to a node, see Updating Labels on Nodes.

To automate applying the node selector, you can instead use the oc patch command:

oc patch dc/logging-es-<suffix> \
   -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"0"}}}}}'


Once you have completed these steps, you can apply a local host mount to each replica. The following example assumes storage is mounted at the same path on each node.

for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc rollout latest $dc
    oc scale $dc --replicas=1
  done


36.5.2.1.3. Configuring hostPath storage for Elasticsearch

You can provision OpenShift Container Platform clusters using hostPath storage for Elasticsearch.

To use a local disk volume on each node host as storage for an Elasticsearch replica:

Create a local mount point on each infrastructure node for the local Elasticsearch storage:
```
mkdir /usr/local/es-storage
```
Create a filesystem on the Elasticsearch volume:
```
mkfs.ext4 /dev/xxx
```
Mount the Elasticsearch volume:
```
mount /dev/xxx /usr/local/es-storage
```
Add the following line to /etc/fstab:
```
/dev/xxx /usr/local/es-storage ext4
```
Change ownership for the mount point:
```
chown 1000:1000 /usr/local/es-storage
```
Give the privilege to mount and edit a local volume to the relevant service account:
```
oc adm policy add-scc-to-user privileged  \
         system:serviceaccount:openshift-logging:aggregated-logging-elasticsearch
```
Note
If you upgraded from an earlier version of OpenShift Container Platform, cluster logging might have been installed in the logging project. You should adjust the service account accordingly.

To claim that privilege, patch each Elasticsearch replica definition, as shown in the following example. For an Ops cluster, specify --selector component=es-ops instead of --selector component=es:

for dc in $(oc get deploymentconfig --selector component=es -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
done

Locate the Elasticsearch replicas on the correct nodes to use the local storage, and do not move them around, even if those nodes are taken down for a period of time. To specify the node location, give each Elasticsearch replica a node selector that is unique to a node where an administrator has allocated storage for it.
To configure a node selector, edit each Elasticsearch deployment configuration, adding or editing the nodeSelector section to specify a unique label that you have applied for each node you desire:
```
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        logging-es-node: "1"
```
The label must uniquely identify a replica with a single node that bears that label, in this case logging-es-node=1.

Create a node selector for each required node. Use the oc label command to apply labels to as many nodes as needed.

For example, if your deployment has three infrastructure nodes, you could add labels for those nodes as follows:

oc label node <nodename1> logging-es-node=0
oc label node <nodename2> logging-es-node=1
oc label node <nodename3> logging-es-node=2

To automate applying the node selector, use the oc patch command instead of editing each deployment configuration manually:

oc patch dc/logging-es-<suffix> \
     -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-es-node":"1"}}}}}'


Once you have completed these steps, you can apply a local host mount to each replica. The following example assumes storage is mounted at the same path on each node. For an Ops cluster, specify --selector component=es-ops instead of --selector component=es.

for dc in $(oc get deploymentconfig --selector component=es -o name);
do
    oc set volume $dc \
          --add --overwrite --name=elasticsearch-storage \
          --type=hostPath --path=/usr/local/es-storage
    oc rollout latest $dc
    oc scale $dc --replicas=1
done


36.5.2.1.4. Changing the Scale of Elasticsearch

If you need to scale up the number of Elasticsearch nodes in your cluster, you can create a deployment configuration for each Elasticsearch node you want to add.

Due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster, you cannot simply increase the nodes in an Elasticsearch deployment configuration.

The simplest way to change the scale of Elasticsearch is to modify the inventory host file and re-run the logging playbook as described previously. If you have supplied persistent storage for the deployment, this should not be disruptive.

Note

You can resize an Elasticsearch cluster with the logging playbook only by scaling up, that is, when the new openshift_logging_es_cluster_size value is higher than the current number of Elasticsearch nodes in the cluster.

36.5.2.1.5. Changing the Number of Elasticsearch Replicas

You can change the number of Elasticsearch replicas by editing the openshift_logging_es_number_of_replicas value in the inventory host file and re-running the logging playbook as described previously.

The changes apply only to new indices. Existing indices continue to use the previous number of replicas. For example, if you change the number of replicas from 3 to 2, your cluster will use 2 replicas for new indices and 3 replicas for existing indices.

You can modify the replica count for the existing indices by running the following command:

oc exec -c elasticsearch $pod -- es_util --query=project.* -d '{"index":{"number_of_replicas":"2"}}'


In the command above, set number_of_replicas to the number of replicas you want for existing indices.

36.5.2.1.6. Expose Elasticsearch as a Route

By default, Elasticsearch deployed with OpenShift aggregated logging is not accessible from outside the logging cluster. You can enable a route for external access to Elasticsearch for those tools that want to access its data.

You have access to Elasticsearch using your OpenShift token, and you can provide the external Elasticsearch and Elasticsearch Ops hostnames when creating the server certificate (similar to Kibana).

To access Elasticsearch as a reencrypt route, define the following variables:

openshift_logging_es_allow_external=True
openshift_logging_es_hostname=elasticsearch.example.com


Change to the playbook directory and run the following Ansible playbook:

cd /usr/share/ansible/openshift-ansible
ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml


To log in to Elasticsearch remotely, the request must contain three HTTP headers:

Authorization: Bearer $token
X-Proxy-Remote-User: $username
X-Forwarded-For: $ip_address


You must have access to the project to be able to access its logs. For example:

oc login <user1>
oc new-project <user1project>
oc new-app <httpd-example>


Get the token of this user to use in the request:
```
token=$(oc whoami -t)
```

Using the token previously configured, you should be able to access Elasticsearch through the exposed route:

curl -k -H "Authorization: Bearer $token" -H "X-Proxy-Remote-User: $(oc whoami)" -H "X-Forwarded-For: 127.0.0.1" "https://es.example.test/project.my-project.*/_search?q=level:err" | python -mjson.tool
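For tools that build the request programmatically, the three required headers and the search URL can be assembled as in the following sketch. It only constructs the URL and headers and does not send anything; the function name build_es_search_request is hypothetical, and the host, index pattern, and placeholder values are taken from the curl example above.

```python
from urllib.parse import urlencode


def build_es_search_request(route_host, index_pattern, query, token, username, client_ip):
    """Return the URL and HTTP headers for a search through the exposed route."""
    headers = {
        "Authorization": f"Bearer {token}",   # OpenShift token, for example from `oc whoami -t`
        "X-Proxy-Remote-User": username,      # user name, for example from `oc whoami`
        "X-Forwarded-For": client_ip,
    }
    # urlencode percent-encodes the query string, so "level:err" becomes "level%3Aerr".
    url = f"https://{route_host}/{index_pattern}/_search?{urlencode({'q': query})}"
    return url, headers


url, headers = build_es_search_request(
    "es.example.test", "project.my-project.*", "level:err",
    token="<token>", username="<username>", client_ip="127.0.0.1",
)
print(url)
print(sorted(headers))
```

The returned values can be passed to any HTTP client. Certificate verification settings, which the curl example bypasses with -k, are left to the caller.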

36.5.3. Fluentd

Fluentd is deployed as a DaemonSet that deploys pods to nodes according to a node label selector. You can specify the selector with the inventory parameter openshift_logging_fluentd_nodeselector; the default is logging-infra-fluentd. As part of the OpenShift cluster installation, it is recommended that you add the Fluentd node selector to the list of persisted node labels.

Fluentd uses journald as the system log source. These are log messages from the operating system, the container runtime, and OpenShift.

The available container runtimes provide minimal information to identify the source of log messages. Log collection and normalization of logs can occur after a pod is deleted and additional metadata cannot be retrieved from the API server, such as labels or annotations.

If a pod with a given name and namespace is deleted before the log collector finishes processing logs, there might not be a way to distinguish the log messages from a similarly named pod and namespace. This can cause logs to be indexed and annotated to an index that is not owned by the user who deployed the pod.

Important

The available container runtimes provide minimal information to identify the source of log messages and do not guarantee unique individual log messages or that these messages can be traced to their source.

Clean installations of OpenShift Container Platform 3.9 or later use json-file as the default log driver, but environments upgraded from OpenShift Container Platform 3.7 will maintain their existing journald log driver configuration. It is recommended to use the json-file log driver. See Changing the Aggregated Logging Driver for instructions to change your existing log driver configuration to json-file.

Viewing Fluentd Logs

How you view logs depends upon the LOGGING_FILE_PATH setting.

If LOGGING_FILE_PATH points to a file, use the logs utility to print out the contents of Fluentd log files:
```
oc exec <pod> -- logs 
```
For <pod>, specify the name of the Fluentd pod. Note the space before logs.
For example:
```
oc exec logging-fluentd-lmvms -- logs
```
The contents of the log files are printed out, starting with the oldest log. Use the -f option to follow what is being written into the logs.
If you are using LOGGING_FILE_PATH=console, Fluentd writes its logs to the console (standard output) instead of to a file. You can retrieve the logs with the oc logs -f <pod_name> command.
For example:
```
oc logs -f logging-fluentd-lmvms
```

Configuring Fluentd Log Location

Fluentd writes logs to a specified file, by default /var/log/fluentd/fluentd.log, or to the console, based on the LOGGING_FILE_PATH environment variable.

To change the default output location for the Fluentd logs, use the LOGGING_FILE_PATH parameter in the default inventory file. You can specify a particular file or use the Fluentd default location:

LOGGING_FILE_PATH=console 
LOGGING_FILE_PATH=<path-to-log/fluentd.log>


LOGGING_FILE_PATH=console: Sends the log output to the console (standard output). Retrieve the logs with the oc logs -f <pod_name> command.
LOGGING_FILE_PATH=<path-to-log/fluentd.log>: Sends the log output to the specified file. Retrieve the logs with the oc exec <pod_name> -- logs command.

After changing these parameters, re-run the logging installer playbook:

cd /usr/share/ansible/openshift-ansible
ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml


Configuring Fluentd Log Rotation

When the current Fluentd log file reaches a specified size, OpenShift Container Platform automatically renames the fluentd.log log file so that new logging data can be collected. Log rotation is enabled by default.

The following example shows logs in a cluster where the maximum log size is 1Mb and four logs should be retained. When the fluentd.log reaches 1Mb, OpenShift Container Platform deletes the current fluentd.log.4, renames each of the Fluentd logs in turn, and creates a new fluentd.log.

fluentd.log     0b
fluentd.log.1  1Mb
fluentd.log.2  1Mb
fluentd.log.3  1Mb
fluentd.log.4  1Mb


You can use environment variables to control the size of the Fluentd log files and how many of the renamed files OpenShift Container Platform retains.


Table 36.1. Parameters for configuring Fluentd log rotation
Parameter	Description
`LOGGING_FILE_SIZE`	The maximum size of a single Fluentd log file in bytes. If the size of the fluentd.log file exceeds this value, OpenShift Container Platform renames the fluentd.log.* files and creates a new fluentd.log. The default is 1024000 (1MB).
`LOGGING_FILE_AGE`	The number of logs that Fluentd retains before deleting. The default value is `10`.

For example:

oc set env ds/logging-fluentd LOGGING_FILE_AGE=30 LOGGING_FILE_SIZE=1024000
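The rename-based rotation described in this section can be simulated as follows. This is a sketch of the described behavior, not the Fluentd implementation; the function rotate and its retained argument are hypothetical, with retained standing in for the number of renamed files to keep.

```python
def rotate(logs: dict, retained: int) -> dict:
    """Simulate one rotation. `logs` maps file names to sizes in bytes."""
    rotated = {}
    # Shift fluentd.log.N to fluentd.log.N+1; the oldest file is not carried over.
    for n in range(retained - 1, 0, -1):
        old_name = f"fluentd.log.{n}"
        if old_name in logs:
            rotated[f"fluentd.log.{n + 1}"] = logs[old_name]
    # The full current file becomes fluentd.log.1 and a new, empty fluentd.log is started.
    if "fluentd.log" in logs:
        rotated["fluentd.log.1"] = logs["fluentd.log"]
    rotated["fluentd.log"] = 0
    return rotated


before = {"fluentd.log": 1024000, "fluentd.log.1": 11, "fluentd.log.2": 22,
          "fluentd.log.3": 33, "fluentd.log.4": 44}
for name, size in sorted(rotate(before, retained=4).items()):
    print(name, size)
```

In this run the content of the old fluentd.log.4 is dropped, every other file moves up by one, and fluentd.log starts again at zero bytes.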

Turn off log rotation by setting LOGGING_FILE_PATH=console. This causes Fluentd to write its logs to the console (standard output) rather than to a file. You can retrieve them using the oc logs -f <pod_name> command.

oc set env ds/logging-fluentd LOGGING_FILE_PATH=console

Disabling JSON parsing of logs with MERGE_JSON_LOG

By default, Fluentd determines if a log message is in JSON format and merges the message into the JSON payload document posted to Elasticsearch.

When using JSON parsing you might experience:

log loss due to Elasticsearch rejecting documents due to inconsistent type mappings;
buffer storage leaks caused by rejected message cycling;
overwritten data for fields with same names.

For information on how to mitigate some of these problems, see Configuring how the log collector normalizes logs.

You can disable JSON parsing to avoid these problems or if you do not need to parse JSON from your logs.

To disable JSON parsing:

Run the following command:
```
oc set env ds/logging-fluentd MERGE_JSON_LOG=false 
```
Set MERGE_JSON_LOG to false to disable this feature or to true to enable it.
To ensure this setting is applied each time you run Ansible, add openshift_logging_fluentd_merge_json_log="false" to your Ansible inventory.

Configuring how the log collector normalizes logs

Cluster Logging uses a specific data model, like a database schema, to store log records and their metadata in the logging store. There are some restrictions on the data:

There must be a "message" field containing the actual log message.
There must be a "@timestamp" field containing the log record timestamp in RFC 3339 format, preferably millisecond or better resolution.
There must be a "level" field with the log level, such as err, info, unknown, and so forth.

Note

For more information on the data model, see Exported Fields.

Because of these requirements, conflicts and inconsistencies can arise with log data collected from different subsystems.

For example, if you use the MERGE_JSON_LOG feature (MERGE_JSON_LOG=true), it can be extremely useful to have your applications log their output in JSON, and have the log collector automatically parse and index the data in Elasticsearch. However, this leads to several problems, including:

field names can be empty, or contain characters that are illegal in Elasticsearch;
different applications in the same namespace might output the same field name with a different value data type;
applications might emit too many fields;
fields may conflict with the cluster logging built-in fields.

You can configure how cluster logging treats fields from disparate sources by editing the Fluentd log collector daemonset and setting environment variables in the table below.

Undefined fields. Fields unknown to the ViaQ data model are called undefined. Log data from disparate systems can contain undefined fields. The data model requires all top-level fields to be defined and described.
Use the parameters to configure how OpenShift Container Platform moves any undefined fields under a top-level field called undefined to avoid conflicting with the well known top-level fields. You can add undefined fields to the top-level fields and move others to an undefined container.
You can also replace special characters in undefined fields and convert undefined fields to their JSON string representation. Converting to JSON string preserves the structure of the value, so that you can retrieve the value later and convert it back to a map or an array.
- Simple scalar values like numbers and booleans are changed to a quoted string. For example: 10 becomes "10", 3.1415 becomes "3.1415", false becomes "false".
- Map/dict values and array values are converted to their JSON string representation: "mapfield":{"key":"value"} becomes "mapfield":"{\"key\":\"value\"}" and "arrayfield":[1,2,"three"] becomes "arrayfield":"[1,2,\"three\"]".
Defined fields. Defined fields appear in the top levels of the logs. You can configure which fields are considered defined fields.
The default top-level fields, defined through the CDM_DEFAULT_KEEP_FIELDS parameter, are CEE, time, @timestamp, aushape, ci_job, collectd, docker, fedora-ci, file, foreman, geoip, hostname, ipaddr4, ipaddr6, kubernetes, level, message, namespace_name, namespace_uuid, offset, openstack, ovirt, pid, pipeline_metadata, service, systemd, tags, testcase, tlog, viaq_msg_id.
Any fields not included in ${CDM_DEFAULT_KEEP_FIELDS} or ${CDM_EXTRA_KEEP_FIELDS} are moved to ${CDM_UNDEFINED_NAME} if CDM_USE_UNDEFINED is true. See the table below for more information on these parameters.
Note
Change the CDM_DEFAULT_KEEP_FIELDS parameter only if you are an advanced user or if Red Hat support instructs you to do so.
Empty fields. Empty fields have no data. You can determine which empty fields to retain from logs.
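
The JSON string conversion described above can be illustrated with a short Python sketch. This is only a model of the conversion, not the log collector's code; the helper name `to_json_string` is hypothetical, and leaving values that are already strings unchanged is an assumption.

```python
import json


def to_json_string(value):
    """Return the compact JSON string representation of a field value."""
    if isinstance(value, str):
        # Assumption: values that are already strings are left as they are.
        return value
    # Compact separators reproduce the form shown in the examples above.
    return json.dumps(value, separators=(",", ":"))


print(to_json_string(10))                  # 10 (now a string)
print(to_json_string(False))               # false (now a string)
print(to_json_string({"key": "value"}))    # {"key":"value"}
print(to_json_string([1, 2, "three"]))     # [1,2,"three"]

# The structure is preserved, so the value can be converted back later.
print(json.loads(to_json_string({"key": "value"})))
```

Because the result is valid JSON, a consumer can call `json.loads` on the stored string to recover the original map or array.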

Table 36.2. Environment parameters for log normalization
Parameters	Definition	Example
`CDM_EXTRA_KEEP_FIELDS`	Specify an extra set of defined fields to be kept at the top level of the logs in addition to the `CDM_DEFAULT_KEEP_FIELDS`. The default is "".	`CDM_EXTRA_KEEP_FIELDS="broker"`
`CDM_KEEP_EMPTY_FIELDS`	Specify fields to retain in CSV format even if empty. Empty defined fields not specified are dropped. The default is "message", which keeps empty messages.	`CDM_KEEP_EMPTY_FIELDS="message"`
`CDM_USE_UNDEFINED`	Set to `true` to move undefined fields to the `undefined` top level field. The default is `false`. If `true`, values in `CDM_DEFAULT_KEEP_FIELDS` and `CDM_EXTRA_KEEP_FIELDS` are not moved to `undefined`.	`CDM_USE_UNDEFINED=true`
`CDM_UNDEFINED_NAME`	Specify a name for the undefined top level field if using `CDM_USE_UNDEFINED`. The default is `undefined`. Enabled only when `CDM_USE_UNDEFINED` is `true`.	`CDM_UNDEFINED_NAME="undef"`
`CDM_UNDEFINED_MAX_NUM_FIELDS`	If a record contains more undefined fields than this number, no further processing takes place on these fields. Instead, all undefined fields are converted to a single JSON string value and stored in the top-level `CDM_UNDEFINED_NAME` field. The default of `-1` allows an unlimited number of undefined fields, which is not recommended. NOTE: This parameter is honored even if `CDM_USE_UNDEFINED` is false.	`CDM_UNDEFINED_MAX_NUM_FIELDS=4`
`CDM_UNDEFINED_TO_STRING`	Set to `true` to convert all undefined fields to their JSON string representation. The default is `false`.	`CDM_UNDEFINED_TO_STRING=true`
`CDM_UNDEFINED_DOT_REPLACE_CHAR`	Specify a character to use in place of a dot character '.' in an undefined field. `MERGE_JSON_LOG` must be `true`. The default is `UNUSED`. If you set the `MERGE_JSON_LOG` parameter to `true`, see the Note below.	`CDM_UNDEFINED_DOT_REPLACE_CHAR="_"`

Note

If you set both the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING environment variables to true in the Fluentd log collector daemonset, you might receive an Elasticsearch 400 error. When MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. If you set CDM_UNDEFINED_TO_STRING=true, the log collector attempts to add those fields as a string value, resulting in the Elasticsearch 400 error. The error clears when the log collector rolls over the indices for the next day’s logs.

When the log collector rolls over the indices, it creates a brand new index. The field definitions are updated and you will not get the 400 error. For more information, see Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING.

To configure undefined and empty field processing, edit the logging-fluentd daemonset:

Configure how to process fields, as needed:
1. Specify the extra fields to keep at the top level using CDM_EXTRA_KEEP_FIELDS.
2. Specify any empty fields to retain in the CDM_KEEP_EMPTY_FIELDS parameter in CSV format.
Configure how to process undefined fields, as needed:
1. Set CDM_USE_UNDEFINED to true to move undefined fields to the top-level undefined field.
2. Specify a name for the undefined fields using the CDM_UNDEFINED_NAME parameter.
3. Set CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than the default -1, to set an upper bound on the number of undefined fields in a single record.
Specify CDM_UNDEFINED_DOT_REPLACE_CHAR to change any dot . characters in an undefined field name to another character. For example, if CDM_UNDEFINED_DOT_REPLACE_CHAR=@@@ and there is a field named foo.bar.baz the field is transformed into foo@@@bar@@@baz.
Set CDM_UNDEFINED_TO_STRING to true to convert undefined fields to their JSON string representation.
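
To make the interaction of these parameters easier to follow, the sketch below models them in Python. It is a simplified model built only from the parameter descriptions in Table 36.2, not the log collector's implementation. The function name `normalize`, its arguments, and the order in which the options are applied are assumptions made for illustration.

```python
import json


def normalize(record, keep_fields, use_undefined=False,
              undefined_name="undefined", dot_replace_char=None,
              to_string=False, max_num_fields=-1):
    """Simplified model of undefined-field handling (illustration only)."""
    # Fields listed in the keep lists stay at the top level.
    result = {k: v for k, v in record.items() if k in keep_fields}
    undefined = {k: v for k, v in record.items() if k not in keep_fields}

    # CDM_UNDEFINED_MAX_NUM_FIELDS: too many undefined fields are collapsed
    # into one JSON string, even when CDM_USE_UNDEFINED is false.
    if max_num_fields >= 0 and len(undefined) > max_num_fields:
        result[undefined_name] = json.dumps(undefined, separators=(",", ":"))
        return result

    processed = {}
    for name, value in undefined.items():
        if dot_replace_char is not None:
            # CDM_UNDEFINED_DOT_REPLACE_CHAR
            name = name.replace(".", dot_replace_char)
        if to_string and not isinstance(value, str):
            # CDM_UNDEFINED_TO_STRING
            value = json.dumps(value, separators=(",", ":"))
        processed[name] = value

    if use_undefined:
        # CDM_USE_UNDEFINED: move undefined fields under one top-level field.
        if processed:
            result[undefined_name] = processed
    else:
        result.update(processed)
    return result


record = {"message": "hello", "level": "info", "foo.bar": 1, "extra": {"a": 1}}
keep = {"message", "level"}
print(normalize(record, keep, use_undefined=True, dot_replace_char="_"))
print(normalize(record, keep, max_num_fields=1))
```

In the second call the record has two undefined fields, which exceeds the limit of 1, so both are stored as a single JSON string. This is the value type change that the Note below warns about.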

Note

If you configure the CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS parameters, use CDM_UNDEFINED_NAME to change the undefined field name. This is needed because CDM_UNDEFINED_TO_STRING or CDM_UNDEFINED_MAX_NUM_FIELDS could change the value type of the undefined field. When CDM_UNDEFINED_TO_STRING is set to true, or when CDM_UNDEFINED_MAX_NUM_FIELDS is set and a log contains more undefined fields than that limit, the value type becomes string. Elasticsearch stops accepting records if the value type is changed, for example, from JSON to JSON string.

For example, when CDM_UNDEFINED_TO_STRING is false or CDM_UNDEFINED_MAX_NUM_FIELDS is the default, -1, the value type of the undefined field is json. If you change CDM_UNDEFINED_MAX_NUM_FIELDS to a value other than default and there are more undefined fields in a log, the value type becomes string (JSON string). Elasticsearch stops accepting records if the value type is changed.

Setting MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING

If you set the MERGE_JSON_LOG and CDM_UNDEFINED_TO_STRING environment variables to true, you might receive an Elasticsearch 400 error. When MERGE_JSON_LOG=true, the log collector adds fields with data types other than string. If you set CDM_UNDEFINED_TO_STRING=true, Fluentd attempts to add those fields as a string value resulting in the Elasticsearch 400 error. The error clears when the indices roll over for the next day.

When Fluentd rolls over the indices for the next day’s logs, it will create a brand new index. The field definitions are updated and you will not get the 400 error.

Records that have hard errors, such as schema violations, corrupted data, and so forth, cannot be retried. The log collector sends the records for error handling. If you add a <label @ERROR> section to your Fluentd config, as the last <label>, you can handle these records as needed.

For example:

data:
  fluent.conf:

....

    <label @ERROR>
      <match **>
        @type file
        path /var/log/fluent/dlq
        time_slice_format %Y%m%d
        time_slice_wait 10m
        time_format %Y%m%dT%H%M%S%z
        compress gzip
      </match>
    </label>

This section writes error records to the Elasticsearch dead letter queue (DLQ) file. See the fluentd documentation for more information about the file output.

You can then edit the file to clean up the records manually, format it for use with the Elasticsearch /_bulk index API, and use cURL to add those records. For more information on the Elasticsearch Bulk API, see the Elasticsearch documentation.
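
As a rough illustration of the reformatting step, the following Python sketch turns DLQ lines into a Bulk API request body, which alternates an action line and a document line. It assumes each DLQ line uses the Fluentd file output's tab-separated layout of time, tag, and JSON record. The function name `dlq_lines_to_bulk` and the index name are hypothetical, and your Elasticsearch version might require additional action metadata.

```python
import json


def dlq_lines_to_bulk(lines, index):
    """Build a newline-delimited Bulk API body from DLQ lines."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: each line is "<time>\t<tag>\t<JSON record>".
        record = json.loads(line.split("\t", 2)[-1])
        out.append(json.dumps({"index": {"_index": index}}))  # action line
        out.append(json.dumps(record))                        # document line
    # The Bulk API expects the body to end with a newline.
    return "\n".join(out) + "\n" if out else ""


sample = ['2019-01-01T00:00:00+0000\tkubernetes.var.log\t{"message": "hi"}']
print(dlq_lines_to_bulk(sample, "project.example"), end="")
```

Because the example configuration sets compress gzip, the DLQ files can be read with `gzip.open(path, "rt")` before passing the lines to this function. Clean up or correct each record before resubmitting it, otherwise Elasticsearch is likely to reject it again.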

Join Multi-line Docker Logs

You can configure Fluentd to reconstruct whole log records from Docker log partial fragments. With this feature active, Fluentd reads multi-line Docker logs, reconstructs them, and stores the logs as one record in Elasticsearch with no missing data.

However, because this feature can cause a performance regression, the feature is off by default and must be manually enabled.
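
The following Python sketch shows the general idea of reconstructing records from json-file fragments. It is not the Fluentd implementation. It assumes that a fragment is partial when its log value does not end with a newline, and that fragments of the same stream arrive in order; the function name `join_partial_fragments` is hypothetical.

```python
import json


def join_partial_fragments(raw_lines):
    """Join json-file log fragments into whole messages, per stream."""
    records = []
    pending = {}  # stream name -> text accumulated so far
    for raw in raw_lines:
        entry = json.loads(raw)
        stream = entry.get("stream", "stdout")
        pending[stream] = pending.get(stream, "") + entry["log"]
        # Assumption: a trailing newline marks the final fragment.
        if entry["log"].endswith("\n"):
            records.append(pending.pop(stream).rstrip("\n"))
    return records


fragments = [
    json.dumps({"log": "part one, ", "stream": "stdout"}),
    json.dumps({"log": "part two\n", "stream": "stdout"}),
    json.dumps({"log": "single\n", "stream": "stdout"}),
]
print(join_partial_fragments(fragments))
```

Holding fragments in memory until the final one arrives is part of the extra work that this feature adds.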

The following Fluentd environment variables configure cluster logging to process multi-line Docker logs:

Parameter	Description
USE_MULTILINE_JSON	Set to `true` to process multi-line Docker logs when using the `json-file` log driver. This parameter is set to `false` by default.
USE_MULTILINE_JOURNAL	Set to `true` to process multi-line Docker logs when using the `journald` log driver. Fluentd reconstructs whole log records from the Docker log partial fragments. This parameter is set to `false` by default.

You can use the following command to determine which log driver is being used:

$ docker info | grep -i log

One of the following is output:

Logging Driver: json-file

Logging Driver: journald

To turn on multi-line Docker logs processing:

Use the following command to enable the multiline Docker logs:
- For the json-file log driver:
  oc set env daemonset/logging-fluentd USE_MULTILINE_JSON=true
- For the journald log driver:
  oc set env daemonset/logging-fluentd USE_MULTILINE_JOURNAL=true
The Fluentd pods in the cluster restart.

Configuring Fluentd to Send Logs to an External Log Aggregator

You can configure Fluentd to send a copy of its logs to an external log aggregator, in addition to the default Elasticsearch, using the secure-forward plug-in. From there, you can further process log records after the locally hosted Fluentd has processed them.

Important

You cannot configure the secure_forward plug-in with a client certificate. Authentication can be run through the SSL/TLS protocol, but it requires the shared_key, and the destination Fluentd must be configured with the secure_forward input plug-in.

The logging deployment provides a secure-forward.conf section in the Fluentd configmap for configuring the external aggregator:

<store>
@type secure_forward
self_hostname pod-${HOSTNAME}
shared_key thisisasharedkey
secure yes
enable_strict_verification yes
ca_cert_path /etc/fluent/keys/your_ca_cert
ca_private_key_path /etc/fluent/keys/your_private_key
ca_private_key_passphrase passphrase
<server>
  host ose1.example.com
  port 24284
</server>
<server>
  host ose2.example.com
  port 24284
  standby
</server>
<server>
  host ose3.example.com
  port 24284
  standby
</server>
</store>

This can be updated using the oc edit command:

$ oc edit configmap/logging-fluentd

Certificates to be used in secure-forward.conf can be added to the existing secret that is mounted on the Fluentd pods. The your_ca_cert and your_private_key values must match what is specified in secure-forward.conf in configmap/logging-fluentd:

$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_ca_cert','value':'$(base64 -w 0 /path/to/your_ca_cert.pem)'}]"
$ oc patch secrets/logging-fluentd --type=json \
  --patch "[{'op':'add','path':'/data/your_private_key','value':'$(base64 -w 0 /path/to/your_private_key.pem)'}]"

Note

Replace your_private_key with a generic name. This is a link to the JSON path, not a path on your host system.
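
If you prefer to generate the patch document in a script, the following Python sketch builds the same JSON patch structure that the commands above pass to --patch. The helper name `secret_patch` is hypothetical. The base64 encoding produces unwrapped output, matching `base64 -w 0`.

```python
import base64
import json


def secret_patch(key_name, pem_bytes):
    """Return a JSON patch that adds one base64-encoded key to a secret."""
    value = base64.b64encode(pem_bytes).decode("ascii")
    # key_name is the key inside the secret's data, not a host path.
    return json.dumps([{"op": "add", "path": "/data/" + key_name, "value": value}])


# Placeholder bytes stand in for the contents of your PEM file.
print(secret_patch("your_ca_cert", b"-----BEGIN CERTIFICATE-----\n"))
```

The key name in the patch path must be the same name that secure-forward.conf references under /etc/fluent/keys/.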

When configuring the external aggregator, it must be able to accept messages securely from Fluentd.

If the external aggregator is another Fluentd server, it must have the fluent-plugin-secure-forward plug-in installed and make use of the input plug-in it provides:

<source>
  @type secure_forward

  self_hostname ${HOSTNAME}
  bind 0.0.0.0
  port 24284

  shared_key thisisasharedkey

  secure yes
  cert_path        /path/for/certificate/cert.pem
  private_key_path /path/for/certificate/key.pem
  private_key_passphrase secret_foo_bar_baz
</source>

You can find further explanation of how to set up the fluent-plugin-secure-forward plug-in in the fluent-plugin-secure-forward repository.

Reducing the Number of Connections from Fluentd to the API Server

Important

mux is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information on Red Hat Technology Preview features support scope, see https://access.redhat.com/support/offerings/techpreview/.

mux is a Secure Forward listener service.

Parameter	Description
`openshift_logging_use_mux`	The default is set to `False`. If set to `True`, a service called `mux` is deployed. This service acts as a Fluentd `secure_forward` aggregator for the node agent Fluentd daemonsets running in the cluster. Use `openshift_logging_use_mux` to reduce the number of connections to the OpenShift API server, and configure each node in Fluentd to send raw logs to `mux` and turn off the Kubernetes metadata plug-in. This requires the use of `openshift_logging_mux_client_mode`.
`openshift_logging_mux_client_mode`	Values for `openshift_logging_mux_client_mode` are `minimal` and `maximal`, and there is no default. `openshift_logging_mux_client_mode` causes the Fluentd node agent to send logs to mux rather than directly to Elasticsearch. The value `maximal` means that Fluentd does as much processing as possible at the node before sending the records to `mux`. The `maximal` value is recommended for using `mux`. The value `minimal` means that Fluentd does no processing at all, and sends the raw logs to `mux` for processing. It is not recommended to use the `minimal` value.
`openshift_logging_mux_allow_external`	The default is set to `False`. If set to `True`, the `mux` service is deployed, and it is configured to allow Fluentd clients running outside of the cluster to send logs using `secure_forward`. This allows OpenShift logging to be used as a central logging service for clients other than OpenShift, or other OpenShift clusters.
`openshift_logging_mux_hostname`	The default is `mux` plus `openshift_master_default_subdomain`. This is the hostname `external_clients` will use to connect to `mux`, and is used in the TLS server cert subject.
`openshift_logging_mux_port`	24284
`openshift_logging_mux_cpu_limit`	500M
`openshift_logging_mux_memory_limit`	2Gi
`openshift_logging_mux_default_namespaces`	The default is `mux-undefined`. The first value in the list is the namespace to use for undefined projects, followed by any additional namespaces to create by default. Usually, you do not need to set this value.
`openshift_logging_mux_namespaces`	The default value is empty, allowing for additional namespaces to create for external `mux` clients to associate with their logs. You will need to set this value.

Throttling logs in Fluentd

For projects that are especially verbose, an administrator can throttle down the rate at which the logs are read in by Fluentd before being processed.

Warning

Throttling can contribute to log aggregation falling behind for the configured projects; log entries can be lost if a pod is deleted before Fluentd catches up.

Note

Throttling does not work when using the systemd journal as the log source. The throttling implementation depends on being able to throttle the reading of the individual log files for each project. When reading from the journal, there is only a single log source and no log files, so no file-based throttling is available. There is no method of restricting the log entries that are read into the Fluentd process.

To tell Fluentd which projects it should be restricting, edit the throttle configuration in its ConfigMap after deployment:

$ oc edit configmap/logging-fluentd

The throttle-config.yaml key contains YAML that maps project names to the desired rate at which logs are read in on each node. The default is 1000 lines at a time per node. For example:

Projects

project-name:
  read_lines_limit: 50

second-project-name:
  read_lines_limit: 100

Logging

logging:
  read_lines_limit: 500

test-project:
  read_lines_limit: 10

.operations:
  read_lines_limit: 100

To make changes to Fluentd, change the configuration and restart the Fluentd pods to apply the changes. To make changes to Elasticsearch, you must first scale down Fluentd and then scale down Elasticsearch to zero. After making your changes, scale Elasticsearch back up first and then scale Fluentd back to its original setting.

To scale Elasticsearch to zero:

$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration to match zero nodes:

Get the Fluentd node selector:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       logging-infra-fluentd: "true"

Use the oc patch command to modify the daemonset nodeSelector:

$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'

Verify the new Fluentd node selector:

$ oc get ds logging-fluentd -o yaml |grep -A 1 Selector
     nodeSelector:
       nonexistlabel: "true"

Scale Elasticsearch back up from zero:

$ oc scale --replicas=# dc/<ELASTICSEARCH_DC>

Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".

Use the oc patch command to modify the daemonset nodeSelector:

$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'

Tune Buffer Chunk Limit

If the Fluentd logger is unable to keep up with a high volume of logs, it needs to switch to file buffering to reduce memory usage and prevent data loss.

The Fluentd buffer_chunk_limit is determined by the environment variable BUFFER_SIZE_LIMIT, which has the default value 8m. The file buffer size per output is determined by the environment variable FILE_BUFFER_LIMIT, which has the default value 256Mi. The persistent volume size must be larger than FILE_BUFFER_LIMIT multiplied by the number of outputs.

On the Fluentd and Mux pods, the persistent volume /var/lib/fluentd should be prepared by, for example, a PVC or hostmount. That area is then used for the file buffers.

The buffer_type and buffer_path are configured in the Fluentd configuration files as follows:

$ egrep "buffer_type|buffer_path" *.conf
output-es-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-config`
output-es-ops-config.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-output-es-ops-config`
filter-pre-mux-client.conf:
  buffer_type file
  buffer_path `/var/lib/fluentd/buffer-mux-client`

The Fluentd buffer_queue_limit is the value of the variable BUFFER_QUEUE_LIMIT. This value is 32 by default.

The environment variable BUFFER_QUEUE_LIMIT is calculated as (FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)).

For example, with the default values:

FILE_BUFFER_LIMIT = 256Mi
number_of_outputs = 1
BUFFER_SIZE_LIMIT = 8Mi

The value of buffer_queue_limit will be 32. To change the buffer_queue_limit, you need to change the value of FILE_BUFFER_LIMIT.

In this formula, number_of_outputs is 1 if all the logs are sent to a single resource, and it is incremented by 1 for each additional resource. For example, the value of number_of_outputs is:

1 - if all logs are sent to a single Elasticsearch pod
2 - if application logs are sent to an Elasticsearch pod and ops logs are sent to another Elasticsearch pod
4 - if application logs are sent to an Elasticsearch pod, ops logs are sent to another Elasticsearch pod, and both of them are forwarded to other Fluentd instances
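
The formula can be checked with a short Python sketch. The helper names `parse_size` and `buffer_queue_limit` are hypothetical. The sketch assumes that the m and Mi suffixes both mean binary megabytes, as the text above uses 8m and 8Mi interchangeably, and that the result is rounded down to a whole number.

```python
def parse_size(text):
    """Convert a size such as '8m' or '256Mi' to bytes (binary units assumed)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    s = text.strip().lower()
    if s.endswith("i"):
        s = s[:-1]
    if s and s[-1] in units:
        return int(s[:-1]) * units[s[-1]]
    return int(s)


def buffer_queue_limit(file_buffer_limit="256Mi", buffer_size_limit="8m",
                       number_of_outputs=1):
    """FILE_BUFFER_LIMIT / (number_of_outputs * BUFFER_SIZE_LIMIT)."""
    return parse_size(file_buffer_limit) // (
        number_of_outputs * parse_size(buffer_size_limit))


print(buffer_queue_limit())                           # 32
print(buffer_queue_limit(number_of_outputs=2))        # 16
print(buffer_queue_limit(file_buffer_limit="512Mi"))  # 64
```

With two outputs and the default FILE_BUFFER_LIMIT, the queue limit halves to 16. Doubling FILE_BUFFER_LIMIT to 512Mi restores it to 32, and the persistent volume must then be larger than 1Gi.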

36.5.4. Kibana
To access the Kibana console from the OpenShift Container Platform web console, add the loggingPublicURL parameter in the master webconsole-config configmap file, with the URL of the Kibana console (the kibana-hostname parameter). The value must be an HTTPS URL:

...
clusterInfo:
  ...
  loggingPublicURL: "https://kibana.example.com"
...

Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Container Platform web console under the Browse → Pods → <pod_name> → Logs tab. This links to the Kibana console.

Note

You need to log in to the Kibana console when you do not have a valid login cookie or when it expires. For example, you need to log in:

on the first use
after logging out

You can scale the Kibana deployment as usual for redundancy:

$ oc scale dc/logging-kibana --replicas=2

Note

To ensure the scale persists across multiple executions of the logging playbook, make sure to update the openshift_logging_kibana_replica_count in the inventory file.

You can see the user interface by visiting the site specified by the openshift_logging_kibana_hostname variable.

See the Kibana documentation for more information on Kibana.

Kibana Visualize

Kibana Visualize enables you to create visualizations and dashboards for monitoring container and pod logs. It allows administrator users (cluster-admin or cluster-reader) to view logs by deployment, namespace, pod, and container.

Kibana Visualize exists inside the Elasticsearch and ES-OPS pod, and must be run inside those pods. To load dashboards and other Kibana UI objects, you must first log in to Kibana as the user you want to add the dashboards to, then log out. This will create the necessary per-user configuration that the next step relies on. Then, run:

$ oc exec <$espod> -- es_load_kibana_ui_objects <user-name>

Where $espod is the name of any one of your Elasticsearch pods.

Adding Custom Fields to Kibana Visualize

If your OpenShift Container Platform cluster generates logs in JSON format that contain custom fields that are not defined in the Elasticsearch .operations.* or the project.* indices, you cannot create visualizations with these fields because the custom fields are not available in Kibana.

However, you can add the custom fields to the Elasticsearch indices, which allows you to add the fields to the Kibana index patterns for use in Kibana Visualize.

Note

The custom fields are applied to only the indices created after the template is updated.

To add custom fields to Kibana Visualize:

Add custom fields to an Elasticsearch index template:

Determine which Elasticsearch index you want to add the fields to, either the .operations.* or the project.* index. If there is a specific project that has the custom fields, you add the fields to a specific index for the project, for example: project.this-project-has-time-fields.*.

Create a JSON file for the custom fields, similar to the following:

{
	"order": 20,
	"mappings": {
		"_default_": {
			"properties": {
				"mytimefield1": { 
					"doc_values": true,
					"format": "yyyy-MM-dd HH:mm:ss,SSSZ||yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ||yyyy-MM-dd'T'HH:mm:ssZ||dateOptionalTime",
					"index": "not_analyzed",
					"type": "date"
				},
				"mytimefield2": {
					"doc_values": true,
					"format": "yyyy-MM-dd HH:mm:ss,SSSZ||yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ||yyyy-MM-dd'T'HH:mm:ssZ||dateOptionalTime",
					"index": "not_analyzed",
					"type": "date"
				}
			}
		}
	},
	"template": "project.<project-name>.*" 
}

mytimefield1: Add a custom field and its parameters.
template: Specify the .operations.* or project.* index.

Change to the openshift-logging project:
```
$ oc project openshift-logging
```

Get the name of one of the Elasticsearch pods:

$ oc get -n logging pods -l component=es

NAME                                       READY     STATUS    RESTARTS   AGE       IP             NODE                         NOMINATED NODE
logging-es-data-master-5av030lk-1-2x494    2/2       Running   0          38m       154.128.0.80   ip-153-12-8-6.wef.internal   <none>

Load the JSON file into the Elasticsearch pod:

$ cat <json-file-name> | \
oc exec -n logging -i -c elasticsearch <es-pod-name> -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_template/<json-file-name> -XPUT -d@- | \
python -mjson.tool

<json-file-name>: The name of the JSON file you created.
<es-pod-name>: The name of the Elasticsearch pod.

The output appears similar to the following:

{
    "acknowledged": true
}

If you have a separate OPS cluster, get the name of one of the es-ops Elasticsearch pods:

$ oc get -n logging pods -l component=es-ops

NAME                                           READY     STATUS    RESTARTS   AGE       IP             NODE                         NOMINATED NODE
logging-es-ops-data-master-o7nhcbo4-5-b7stm    2/2       Running   0          38m       154.128.0.80   ip-153-12-8-6.wef.internal   <none>

Load the JSON file into the es-ops Elasticsearch pod:

cat <json-file-name> | \
oc exec -n logging -i -c elasticsearch <esops-pod-name> -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    https://localhost:9200/_template/<json-file-name> -XPUT -d@- | \
python -mjson.tool


<json-file-name>: The name of the JSON file you created.
<esops-pod-name>: The name of the OPS cluster Elasticsearch pod.

The output appears similar to the following:

{
    "acknowledged": true
}


Verify that the indices are updated:

oc exec -n logging -i -c elasticsearch <es-pod-name> -- \
    curl -s -k --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/project.*/_search?sort=<custom-field>:desc' | \
python -mjson.tool


<es-pod-name>: The name of the Elasticsearch or OPS cluster Elasticsearch pod.
<custom-field>: The name of a custom field you added.

The command outputs the index records for your custom fields sorted in descending order.

Note

The settings do not apply to existing indices. If you want to apply the settings to existing indices, perform a re-index.

Add the custom fields to Kibana:

Get the existing index pattern file from your Elasticsearch container:

mkdir index_patterns
cd index_patterns
oc project openshift-logging
for espod in $( oc get pods -l component=es -o jsonpath='{.items[*].metadata.name}' ) ; do
 for ff in $( oc exec -c elasticsearch "$espod" -- ls /usr/share/elasticsearch/index_patterns ) ; do
   oc exec -c elasticsearch "$espod" -- cat /usr/share/elasticsearch/index_patterns/$ff > $ff
 done
 break
done


The index pattern files are copied from the /usr/share/elasticsearch/index_patterns directory in the pod to the local index_patterns directory.

For example:

index_patterns $ ls

com.redhat.viaq-openshift.index-pattern.json


Edit the corresponding index pattern files to add a definition for each custom field to the fields value:
For example:
```
{\"count\": 0, \"name\": \"mytimefield2\", \"searchable\": true, \"aggregatable\": true, \"readFromDocValues\": true, \"type\": \"date\", \"scripted\": false},
```
The definition must contain the \"searchable\": true, and \"aggregatable\": true, parameters in order to be used in visualizations. The data type must correspond to the Elasticsearch field definition you added above. For example, if you added the myfield field in Elasticsearch that is a number type, you cannot add myfield to Kibana as a string type.
Add the name of the Kibana index pattern to the title value of each index pattern file:
For example, to use the operations.\* index pattern:
```
"title": "*operations.*"
```
To use the project.MYNAMESPACE.\* index pattern:
```
"title": "project.MYNAMESPACE.*"
```
Identify the user name and get the hash value of the user name. The index patterns are stored using the hash of the user name. Run the following two commands in order:
```
get_hash() {
    printf "%s" "$1" | sha1sum | awk '{print $1}'
}
```
```
get_hash admin

d033e22ae348aeb5660fc2140aec35850c4da997
```
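When scripting the next step, the hash lookup can be folded into one call. The following is a minimal sketch rather than anything shipped with the product: kibana_index_for_user is a hypothetical helper name, and it assumes that sha1sum and awk are available where you run it. It prints the per-user index name in the .kibana.<user-hash> form.
```shell
# Hypothetical helper (not part of OpenShift): print the Kibana index name
# for a user in the form .kibana.<sha1-of-user-name>.
kibana_index_for_user() {
    # printf '%s' keeps a trailing newline out of the hashed input.
    printf '.kibana.%s\n' "$(printf '%s' "$1" | sha1sum | awk '{print $1}')"
}

# Usage: store the index name for the admin user in a variable.
kibana_index=$(kibana_index_for_user admin)
```
The helper hashes the user name exactly as get_hash does, so both produce the same hash for the same user.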

Apply the index pattern file to Elasticsearch:

cat com.redhat.viaq-openshift.index-pattern.json | \
  oc exec -i -c elasticsearch <espod-name> -- es_util \
    --query=".kibana.<user-hash>/index-pattern/<index>" -XPUT --data-binary @- | \
  python -mjson.tool


com.redhat.viaq-openshift.index-pattern.json: The name of the index pattern file.
<user-hash> and <index>: The user hash and the index, either .operations.* or project.*.

For example:

cat index-pattern.json | \
  oc exec -i -c elasticsearch mypod-23-gb9pl -- es_util \
    --query=".kibana.d033e22ae348aeb5660fc2140aec35850c4da997/index-pattern/project.MYNAMESPACE.*" -XPUT --data-binary @- | \
  python -mjson.tool


The output appears similar to the following:

{
    "_id": ".operations.*",
    "_index": ".kibana.d033e22ae348aeb5660fc2140aec35850c4da997",
    "_shards": {
        "failed": 0,
        "successful": 2,
        "total": 2
    },
    "_type": "index-pattern",
    "_version": 1,
    "created": true,
    "result": "created"
}


Exit and restart the Kibana console for the custom fields to appear in the Available Fields list and in the fields list on the Management Index Patterns page.

36.5.5. Curator

Curator allows administrators to configure scheduled Elasticsearch maintenance operations to be performed automatically on a per-project basis. It is scheduled to perform actions daily based on its configuration. Only one Curator pod is recommended per Elasticsearch cluster. A Curator pod runs only at the time stated in the cronjob and terminates upon completion. Curator is configured via a YAML configuration file with the following structure:

Note

The time zone is set based on the host node where the curator pod runs.

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE

$PROJECT_NAME:
  $ACTION:
    $UNIT: $VALUE
 ...


The available parameters are:


Variable Name	Description
`PROJECT_NAME`	The actual name of a project, such as myapp-devel. For OpenShift Container Platform operations logs, use the name `.operations` as the project name.
`ACTION`	The action to take, currently only `delete` is allowed.
`UNIT`	One of `days`, `weeks`, or `months`.
`VALUE`	An integer for the number of units.
`.defaults`	Use `.defaults` as the `$PROJECT_NAME` to set the defaults for projects that are not specified.
`.regex`	The list of regular expressions that match project names.
`pattern`	The valid and properly escaped regular expression pattern enclosed by single quotation marks.

For example, to configure Curator to:

Delete indices in the myapp-dev project older than 1 day
Delete indices in the myapp-qe project older than 1 week
Delete operations logs older than 8 weeks
Delete all other projects' indices after they are 31 days old
Delete indices older than 1 day that are matched by the '^project\..+\-dev\..*$' regex
Delete indices older than 2 days that are matched by the '^project\..+\-test\..*$' regex

Use:

config.yaml: |
  myapp-dev:
    delete:
      days: 1

  myapp-qe:
    delete:
      weeks: 1

  .operations:
    delete:
      weeks: 8

  .defaults:
    delete:
      days: 31

  .regex:
    - pattern: '^project\..+\-dev\..*$'
      delete:
        days: 1
    - pattern: '^project\..+\-test\..*$'
      delete:
        days: 2


Important

When you use months as the $UNIT for an operation, Curator starts counting at the first day of the current month, not the current day of the current month. For example, if today is April 15, and you want to delete indices that are 2 months older than today (delete: months: 2), Curator does not delete indices that are dated older than February 15; it deletes indices older than February 1. That is, it goes back to the first day of the current month, then goes back two whole months from that date. If you want to be exact with Curator, it is best to use days (for example, delete: days: 30).
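
The month arithmetic described above can be sketched in a few lines of shell. This is an illustration of the rule as stated in this section, not Curator's own code: curator_month_cutoff is a hypothetical helper name, and the year 2019 is an arbitrary choice for the April 15 example.

```shell
# Sketch of the month rule described above (not Curator's implementation).
# $1 = today's date as YYYY-MM-DD, $2 = the months value from the configuration.
# Prints the cutoff date; indices dated before it are eligible for deletion.
curator_month_cutoff() (
    year=${1%%-*}        # 2019-04-15 -> 2019
    rest=${1#*-}         # 2019-04-15 -> 04-15
    month=${rest%%-*}    # 04-15      -> 04
    month=${month#0}     # drop a leading zero so 08 and 09 are not read as octal
    # Count whole months, step back by $2, then convert back to year and month.
    total=$(( $year * 12 + $month - 1 - $2 ))
    printf '%04d-%02d-01\n' $(( total / 12 )) $(( total % 12 + 1 ))
)

# April 15 with "delete: months: 2" gives a cutoff of February 1.
curator_month_cutoff 2019-04-15 2   # prints 2019-02-01
```

The day of the month in the first argument never affects the result, which is the point of the warning: the cutoff always lands on the first day of a month.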

36.5.5.1. Using the Curator Actions File

Setting the OpenShift Container Platform custom configuration file format ensures internal indices are not mistakenly deleted.

To use the actions file, add an exclude rule to your Curator configuration to retain these indices. You must manually add all of the required patterns.

actions.yaml: |
  actions:
    1:
      action: delete_indices
      description: be careful!
      filters:
      - exclude: false
        kind: regex
        filtertype: pattern
        value: '^project\.myapp\..*$'
      - direction: older
        filtertype: age
        source: name
        timestring: '%Y.%m.%d'
        unit_count: 7
        unit: days
      options:
        continue_if_exception: false
        timeout_override: '300'
        ignore_empty_list: true
    2:
      action: delete_indices
      description: be careful!
      filters:
      - exclude: false
        kind: regex
        filtertype: pattern
        value: '^\.operations\..*$'
      - direction: older
        filtertype: age
        source: name
        timestring: '%Y.%m.%d'
        unit_count: 56
        unit: days
      options:
        continue_if_exception: false
        timeout_override: '300'
        ignore_empty_list: true
    3:
      action: delete_indices
      description: be careful!
      filters:
      - exclude: true
        kind: regex
        filtertype: pattern
        value: '^project\.myapp\..*$|^\.operations\..*$|^\.searchguard\..*$|^\.kibana$'
      - direction: older
        filtertype: age
        source: name
        timestring: '%Y.%m.%d'
        unit_count: 30
        unit: days
      options:
        continue_if_exception: false
        timeout_override: '300'
        ignore_empty_list: true


36.5.5.2. Creating the Curator Configuration

The openshift_logging Ansible role provides a ConfigMap from which Curator reads its configuration. You may edit or replace this ConfigMap to reconfigure Curator. Currently the logging-curator ConfigMap is used to configure both your ops and non-ops Curator instances. Any .operations configurations are in the same location as your application logs configurations.

To create the Curator configuration, edit the configuration in the deployed ConfigMap:

oc edit configmap/logging-curator


Or, manually create the jobs from a cronjob:

oc create job --from=cronjob/logging-curator <job_name>


For scripted deployments, copy the configuration file that was created by the installer and create your new OpenShift Container Platform custom configuration:

oc extract configmap/logging-curator --keys=curator5.yaml,config.yaml --to=/my/config
  edit /my/config/curator5.yaml
  edit /my/config/config.yaml
oc delete configmap logging-curator ; sleep 1
oc create configmap logging-curator \
    --from-file=curator5.yaml=/my/config/curator5.yaml \
    --from-file=config.yaml=/my/config/config.yaml \
    ; sleep 1


Alternatively, if you are using the actions file:

oc extract configmap/logging-curator --keys=curator5.yaml,actions.yaml --to=/my/config
  edit /my/config/curator5.yaml
  edit /my/config/actions.yaml
oc delete configmap logging-curator ; sleep 1
oc create configmap logging-curator \
    --from-file=curator5.yaml=/my/config/curator5.yaml \
    --from-file=actions.yaml=/my/config/actions.yaml \
    ; sleep 1


The next scheduled job uses this configuration.

You can use the following commands to control the cronjob:

# suspend cronjob
oc patch cronjob logging-curator -p '{"spec":{"suspend":true}}'

# resume cronjob
oc patch cronjob logging-curator -p '{"spec":{"suspend":false}}'

# change cronjob schedule
oc patch cronjob logging-curator -p '{"spec":{"schedule":"0 0 * * *"}}'


The schedule option accepts schedules in cron format.
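
A cron schedule has five fields: minute, hour, day of month, month, and day of week, so "0 0 * * *" runs once a day at midnight. As a small sketch, the patch body for a different schedule can be generated rather than typed by hand. curator_schedule_patch is a hypothetical helper name, and the 03:30 schedule is only an example value.

```shell
# Hypothetical helper: build the JSON patch body for a new Curator schedule.
# $1 is a five-field cron expression, for example '30 3 * * *' (03:30 daily).
curator_schedule_patch() {
    printf '{"spec":{"schedule":"%s"}}\n' "$1"
}

# Usage (needs a logged-in oc client, so it is left commented out):
# oc patch cronjob logging-curator -p "$(curator_schedule_patch '30 3 * * *')"
```

The helper does not validate the expression; an invalid schedule is only rejected when the patch is submitted.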

36.6. Cleanup

Remove everything generated during the deployment.

cd /usr/share/ansible/openshift-ansible
ansible-playbook [-i </path/to/inventory>] \
    playbooks/openshift-logging/config.yml \
    -e openshift_logging_install_logging=False


36.7. Sending Logs to an External Elasticsearch Instance

Fluentd sends logs to the value of the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables of the Elasticsearch deployment configuration. The application logs are directed to the ES_HOST destination, and operations logs to OPS_HOST.

Note

Sending logs directly to an AWS Elasticsearch instance is not supported. Use Fluentd Secure Forward to direct logs to an instance of Fluentd that you control and that is configured with the fluent-plugin-aws-elasticsearch-service plug-in.

To direct logs to a specific Elasticsearch instance, edit the Fluentd daemon set and replace the value of the above variables with the desired instance:

oc edit ds/<daemon_set>


For an external Elasticsearch instance to contain both application and operations logs, you can set ES_HOST and OPS_HOST to the same destination, while ensuring that ES_PORT and OPS_PORT also have the same value.

Only mutual TLS configuration is supported, because that is what the provided Elasticsearch instance uses. Patch or recreate the logging-fluentd secret with your client key, client cert, and CA.

Note

If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.

36.8. Sending Logs to an External Syslog Server

Use the fluent-plugin-remote-syslog plug-in on the host to send logs to an external syslog server.

Set environment variables in the logging-fluentd or logging-mux daemonsets:

- name: REMOTE_SYSLOG_HOST 
  value: host1
- name: REMOTE_SYSLOG_HOST_BACKUP
  value: host2
- name: REMOTE_SYSLOG_PORT_BACKUP
  value: 5555


REMOTE_SYSLOG_HOST: The desired remote syslog host. Required for each host.

This builds two destinations. The syslog server on host1 receives messages on the default port of 514, while host2 receives the same messages on port 5555.

Alternatively, you can configure your own custom fluent.conf in the logging-fluentd or logging-mux ConfigMaps.

Fluentd Environment Variables


Parameter	Description
`USE_REMOTE_SYSLOG`	Defaults to `false`. Set to `true` to enable use of the `fluent-plugin-remote-syslog` gem
`REMOTE_SYSLOG_HOST`	(Required) Hostname or IP address of the remote syslog server.
`REMOTE_SYSLOG_PORT`	Port number to connect on. Defaults to `514`.
`REMOTE_SYSLOG_SEVERITY`	Set the syslog severity level. Defaults to `debug`.
`REMOTE_SYSLOG_FACILITY`	Set the syslog facility. Defaults to `local0`.
`REMOTE_SYSLOG_USE_RECORD`	Defaults to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`REMOTE_SYSLOG_REMOVE_TAG_PREFIX`	Removes the prefix from the tag, defaults to `''` (empty).
`REMOTE_SYSLOG_TAG_KEY`	If specified, uses this field as the key to look on the record, to set the tag on the syslog message.
`REMOTE_SYSLOG_PAYLOAD_KEY`	If specified, uses this field as the key to look on the record, to set the payload on the syslog message.
`REMOTE_SYSLOG_TYPE`	Set the transport layer protocol type. Defaults to `syslog_buffered`, which sets the TCP protocol. To switch to UDP, set this to `syslog`.

Warning

This implementation is insecure, and should only be used in environments where you can guarantee no snooping on the connection.

Fluentd Logging Ansible Variables


Parameter	Description
`openshift_logging_fluentd_remote_syslog`	The default is set to `false`. Set to `true` to enable use of the fluent-plugin-remote-syslog gem.
`openshift_logging_fluentd_remote_syslog_host`	Hostname or IP address of the remote syslog server, this is mandatory.
`openshift_logging_fluentd_remote_syslog_port`	Port number to connect on, defaults to `514`.
`openshift_logging_fluentd_remote_syslog_severity`	Set the syslog severity level, defaults to `debug`.
`openshift_logging_fluentd_remote_syslog_facility`	Set the syslog facility, defaults to `local0`.
`openshift_logging_fluentd_remote_syslog_use_record`	The default is set to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`openshift_logging_fluentd_remote_syslog_remove_tag_prefix`	Removes the prefix from the tag, defaults to `''` (empty).
`openshift_logging_fluentd_remote_syslog_tag_key`	If string is specified, uses this field as the key to look on the record, to set the tag on the syslog message.
`openshift_logging_fluentd_remote_syslog_payload_key`	If string is specified, uses this field as the key to look on the record, to set the payload on the syslog message.

Mux Logging Ansible Variables


Parameter	Description
`openshift_logging_mux_remote_syslog`	The default is set to `false`. Set to `true` to enable use of the fluent-plugin-remote-syslog gem.
`openshift_logging_mux_remote_syslog_host`	Hostname or IP address of the remote syslog server, this is mandatory.
`openshift_logging_mux_remote_syslog_port`	Port number to connect on, defaults to `514`.
`openshift_logging_mux_remote_syslog_severity`	Set the syslog severity level, defaults to `debug`.
`openshift_logging_mux_remote_syslog_facility`	Set the syslog facility, defaults to `local0`.
`openshift_logging_mux_remote_syslog_use_record`	The default is set to `false`. Set to `true` to use the record’s severity and facility fields to set on the syslog message.
`openshift_logging_mux_remote_syslog_remove_tag_prefix`	Removes the prefix from the tag, defaults to `''` (empty).
`openshift_logging_mux_remote_syslog_tag_key`	If string is specified, uses this field as the key to look on the record, to set the tag on the syslog message.
`openshift_logging_mux_remote_syslog_payload_key`	If string is specified, uses this field as the key to look on the record, to set the payload on the syslog message.

36.9. Performing Administrative Elasticsearch Operations

As of logging version 3.2.0, an administrator certificate, key, and CA that can be used to communicate with and perform administrative operations on Elasticsearch are provided within the logging-elasticsearch secret.

Note

To confirm whether or not your EFK installation provides these, run:

oc describe secret logging-elasticsearch


Connect to an Elasticsearch pod that is in the cluster on which you are attempting to perform maintenance.

To find a pod in a cluster use either:

oc get pods -l component=es -o name | head -1
oc get pods -l component=es-ops -o name | head -1


Connect to a pod:
```
oc rsh <your_Elasticsearch_pod>
```
Once connected to an Elasticsearch container, you can use the certificates mounted from the secret to communicate with Elasticsearch per its Indices APIs documentation.
Fluentd sends its logs to Elasticsearch using the index format project.{project_name}.{project_uuid}.YYYY.MM.DD where YYYY.MM.DD is the date of the log record.
For example, to delete all logs for the logging project with uuid 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 from June 15, 2016, run:
```
curl --key /etc/elasticsearch/secret/admin-key \
  --cert /etc/elasticsearch/secret/admin-cert \
  --cacert /etc/elasticsearch/secret/admin-ca -XDELETE \
  "https://localhost:9200/project.logging.3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3.2016.06.15"
```
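Index names for other projects or dates follow the same project.{project_name}.{project_uuid}.YYYY.MM.DD format. The following sketch assembles such a name from its parts; project_index_name is a hypothetical helper name, and it assumes the date is supplied as YYYY-MM-DD.
```shell
# Hypothetical helper: build the name of the index that holds one day of
# logs for a project. $1 = project name, $2 = project UUID, $3 = YYYY-MM-DD.
project_index_name() {
    # Convert the date separators from dashes to dots (YYYY.MM.DD).
    printf 'project.%s.%s.%s\n' "$1" "$2" "$(printf '%s' "$3" | sed 's/-/./g')"
}

# Usage: rebuild the index name from the example above.
index=$(project_index_name logging 3b3594fa-2ccd-11e6-acb7-0eb6b35eaee3 2016-06-15)
```
The resulting value can be appended to https://localhost:9200/ in the curl command. Check the name carefully first, because deleting an index cannot be undone.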

36.10. Redeploying EFK Certificates

You can use an Ansible playbook to perform a certificate rotation for the EFK stack without needing to run the install/upgrade playbook.

This playbook deletes the current certificate files, generates new EFK certificates, updates certificate secrets, and restarts Kibana and Elasticsearch to force those components to read in the updated certificates.

To redeploy EFK certificates:

Use the Ansible playbook to redeploy the EFK certificates:

cd /usr/share/ansible/openshift-ansible
ansible-playbook playbooks/openshift-logging/redeploy-certificates.yml


36.11. Changing the Aggregated Logging Driver

For aggregated logging, it is recommended to use the json-file log driver.

Important

When using the json-file driver, ensure that you are using Docker version docker-1.12.6-55.gitc4618fb.el7_4 or later.

Fluentd determines the driver Docker is using by checking the /etc/docker/daemon.json and /etc/sysconfig/docker files.

You can determine which driver Docker is using with the docker info command:

docker info | grep Logging

Logging Driver: journald
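
When checking many nodes from a script, it can help to reduce the docker info output to the driver name alone. The following is a minimal sketch: logging_driver is a hypothetical helper name, and it assumes the output contains a "Logging Driver:" line like the one shown above.

```shell
# Hypothetical helper: read `docker info` output on standard input and
# print only the name of the logging driver.
logging_driver() {
    sed -n 's/^[[:space:]]*Logging Driver:[[:space:]]*//p'
}

# Usage on a node (left commented out because it needs a running Docker daemon):
# docker info 2>/dev/null | logging_driver
```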


To change to json-file:

Modify either the /etc/sysconfig/docker or /etc/docker/daemon.json files.

For example:

cat /etc/sysconfig/docker
OPTIONS=' --selinux-enabled --log-driver=json-file --log-opt max-size=1M --log-opt max-file=3 --signature-verification=False'

cat /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "1M",
        "max-file": "1"
    }
}


Restart the Docker service:
```
systemctl restart docker
```
Restart Fluentd.
Warning
Restarting Fluentd on more than a dozen nodes at once creates a large load on the Kubernetes scheduler. Exercise caution when using the following directions to restart Fluentd.
There are two methods for restarting Fluentd. You can restart Fluentd on one node or a set of nodes, or on all nodes.
1. The following steps demonstrate how to restart Fluentd on one node or a set of nodes.
  1. List the nodes where Fluentd is running:
    
    $ oc get nodes -l logging-infra-fluentd=true
    
  2. For each node, remove the label and turn off Fluentd:
    
    $ oc label node $node logging-infra-fluentd-
    
  3. Verify Fluentd is off:
    
    $ oc get pods -l component=fluentd
    
  4. For each node, restart Fluentd:
    
    $ oc label node $node logging-infra-fluentd=true
    
2. The following steps demonstrate how to restart Fluentd on all nodes.
  1. Turn off Fluentd on all nodes:
    
    $ oc label node -l logging-infra-fluentd=true --overwrite logging-infra-fluentd=false
    
  2. Verify Fluentd is off:
    
    $ oc get pods -l component=fluentd
    
  3. Restart Fluentd on all nodes:
    
    $ oc label node -l logging-infra-fluentd=false --overwrite logging-infra-fluentd=true
    
  4. Verify Fluentd is on:
    
    $ oc get pods -l component=fluentd
    

36.12. Manual Elasticsearch Rollouts

As of OpenShift Container Platform 3.7, the Aggregated Logging stack updated the Elasticsearch Deployment Config object so that it no longer has a Config Change Trigger, meaning any changes to the dc will not result in an automatic rollout. This was to prevent unintended restarts happening in the Elasticsearch cluster, which could create excessive shard rebalancing as cluster members restart.

This section presents two restart procedures: rolling-restart and full-restart. A rolling restart applies appropriate changes to the Elasticsearch cluster without downtime (provided three masters are configured), and a full restart safely applies major changes without risk to existing data.

36.12.1. Performing an Elasticsearch Rolling Cluster Restart

A rolling restart is recommended when any of the following changes are made:

nodes on which Elasticsearch pods run require a reboot
logging-elasticsearch configmap
logging-es-* deployment configuration
new image deployment, or upgrade

This will be the recommended restart policy going forward.

Note

Any action you do for an Elasticsearch cluster will need to be repeated for the ops cluster if openshift_logging_use_ops was configured to be True.

Prevent shard balancing when purposely bringing down nodes:

oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'


Once complete, for each dc you have for an Elasticsearch cluster, run oc rollout latest to deploy the latest version of the dc object:
```
oc rollout latest <dc_name>
```
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.

Once all `dc`s for the cluster have been rolled out, re-enable shard balancing:

oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "all" } }'
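
The disable and re-enable steps of the rolling restart differ only in the value of cluster.routing.allocation.enable. As a sketch, the request body can be generated from a single argument so that a typo cannot slip in between the two steps. allocation_settings is a hypothetical helper name, not a tool provided by the logging stack.

```shell
# Hypothetical helper: print the cluster settings body that sets shard
# allocation to "none" (before the rollout) or "all" (after the rollout).
allocation_settings() {
    case "$1" in
        none|all)
            printf '{ "transient": { "cluster.routing.allocation.enable" : "%s" } }\n' "$1"
            ;;
        *)
            # Reject anything else instead of sending it to Elasticsearch.
            echo "usage: allocation_settings none|all" >&2
            return 1
            ;;
    esac
}

# Usage (needs a logged-in oc client, so it is left commented out):
# oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
#     curl -s \
#     --cacert /etc/elasticsearch/secret/admin-ca \
#     --cert /etc/elasticsearch/secret/admin-cert \
#     --key /etc/elasticsearch/secret/admin-key \
#     -XPUT 'https://localhost:9200/_cluster/settings' \
#     -d "$(allocation_settings none)"
```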


36.12.2. Performing an Elasticsearch Full Cluster Restart

A full restart is recommended when changing major versions of Elasticsearch or making other changes that might put data integrity at risk during the change process.

Note

Any action you perform on an Elasticsearch cluster must be repeated for the ops cluster if openshift_logging_use_ops is set to True.

Note

When making changes to the logging-es-ops service, use the components "es-ops-blocked" and "es-ops" in the patch instead.

Disable all external communications to the Elasticsearch cluster while it is down. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) so that it no longer matches the running Elasticsearch pods:
```
oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es-blocked","provider":"openshift"}}}'
```

Perform a shard synced flush to ensure there are no pending operations waiting to be written to disk prior to shutting down:

oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPOST 'https://localhost:9200/_flush/synced'


Prevent shard balancing when purposely bringing down nodes:

oc exec -c elasticsearch <any_es_pod_in_the_cluster> -- \
          curl -s \
          --cacert /etc/elasticsearch/secret/admin-ca \
          --cert /etc/elasticsearch/secret/admin-cert \
          --key /etc/elasticsearch/secret/admin-key \
          -XPUT 'https://localhost:9200/_cluster/settings' \
          -d '{ "transient": { "cluster.routing.allocation.enable" : "none" } }'


Once complete, for each `dc` in the Elasticsearch cluster, scale down all nodes:
```
oc scale dc <dc_name> --replicas=0
```
Once the scale down is complete, for each `dc` in the Elasticsearch cluster, run `oc rollout latest` to deploy the latest version of the `dc` object:
```
oc rollout latest <dc_name>
```
You will see a new pod deployed. Once the pod has two ready containers, you can move on to the next dc.
Once the deployment is complete, for each `dc` in the Elasticsearch cluster, scale up the nodes:
```
oc scale dc <dc_name> --replicas=1
```
Once the scale up is complete, enable all external communications to the Elasticsearch cluster. Edit your non-cluster logging service (for example, logging-es, logging-es-ops) so that it matches the running Elasticsearch pods again:
```
oc patch svc/logging-es -p '{"spec":{"selector":{"component":"es","provider":"openshift"}}}'
```
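
The blocking and unblocking patches in this procedure differ only in the `component` value, and the ops service uses its own pair of values. The following sketch derives the value from the service name and the desired state. The function names `component_for`, `selector_patch`, and `set_service_access` are hypothetical helpers, and the state names `blocked` and `open` are labels chosen for this example.

```shell
# Map a logging service name to the selector component for a given state.
# $1 is the service name; $2 is "blocked" (cluster down) or "open" (normal).
component_for() {
  case "$1" in
    logging-es)     base="es" ;;
    logging-es-ops) base="es-ops" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
  case "$2" in
    blocked) echo "${base}-blocked" ;;
    open)    echo "$base" ;;
    *) echo "state must be 'blocked' or 'open'" >&2; return 1 ;;
  esac
}

# Build the selector patch body for a component value.
selector_patch() {
  printf '{"spec":{"selector":{"component":"%s","provider":"openshift"}}}' "$1"
}

# Patch the service so that it does or does not match the Elasticsearch pods.
set_service_access() {
  component="$(component_for "$1" "$2")" || return 1
  oc patch "svc/$1" -p "$(selector_patch "$component")"
}
```

For example, `set_service_access logging-es-ops blocked` before the shutdown and `set_service_access logging-es-ops open` after the scale up.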

36.13. Troubleshooting EFK

This section describes a number of commonly identified issues with cluster logging deployments.

36.13.1. Troubleshooting related to all EFK components

The following troubleshooting issues apply to the EFK stack in general.

Deployment fails, ReplicationControllers scaled to 0

If you perform a deployment that does not successfully bring up an instance before a ten-minute timeout, OpenShift Container Platform considers the deployment as failed and scales down to zero instances. The oc get pods command shows a deployer pod with a non-zero exit code and no deployed pods.

In the following example, the deployer pod name for an Elasticsearch deployment is shown; this is from ReplicationController logging-es-2e7ut0iq-1, which is a deployment of DeploymentConfig logging-es-2e7ut0iq.

NAME                           READY     STATUS             RESTARTS   AGE
logging-es-2e7ut0iq-1-deploy   1/1       ExitCode:255       0          1m


Deployment failure can happen for a number of transitory reasons, such as the image pull taking too long or nodes being unresponsive.

Examine the deployer pod logs for possible reasons or attempt to redeploy:

oc rollout latest dc/logging-es-2e7ut0iq


Alternatively, attempt to scale up the existing deployment:

oc scale --replicas=1 rc/logging-es-2e7ut0iq-1


If the problem persists, examine the pod, events, and systemd unit logs to determine the source of the problem.
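
As a starting point for that investigation, failed deployer pods can be picked out of the pod listing. This is a minimal sketch, assuming the column layout shown in the example above (NAME first, STATUS third) and that a failure appears as `Error` or a non-zero `ExitCode:` status. The function name `failed_deployers` is a hypothetical helper.

```shell
# Read `oc get pods` output on stdin and print the deployer pods whose
# STATUS column indicates a failure.
# Assumes NAME is column 1 and STATUS is column 3, as in the sample listing.
failed_deployers() {
  awk 'NR > 1 && $1 ~ /-deploy$/ && $3 ~ /^(Error|ExitCode:[1-9])/ { print $1 }'
}
```

For example, `oc get pods | failed_deployers` prints the candidate pods, and `oc logs <deployer_pod>` and `oc get events` can then be run to look for the cause.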

Cannot resolve kubernetes.default.svc.cluster.local

This internal alias for the master must be resolvable by the included DNS server on the master. Depending on your platform, you can run the dig command (for example, in a container) against the master to check whether this is the case:

dig kubernetes.default.svc.cluster.local @localhost
[...]
;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A   172.30.0.1


Older versions of cluster logging did not automatically define this internal alias for the master. You might need to upgrade your cluster in order to use aggregated logging. If your cluster is up to date, there might be a problem with your pods reaching the SkyDNS resolver at the master or the pod could have been blocked from running. You must resolve this problem before deploying again.
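
The dig check above can be reduced to a pass or fail result. The following is a sketch only, assuming `dig` is available where the check runs and using its `+short` option to print just the answer. The function names `is_ipv4` and `check_master_alias` are hypothetical helpers.

```shell
# Succeed if the argument looks like a dotted-quad IPv4 address.
is_ipv4() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Resolve the internal master alias against a DNS server.
# $1 is the server to query; it defaults to localhost.
check_master_alias() {
  server="${1:-localhost}"
  addr="$(dig +short kubernetes.default.svc.cluster.local "@$server" | head -n 1)"
  if is_ipv4 "$addr"; then
    echo "resolved: $addr"
  else
    echo "not resolved"
    return 1
  fi
}
```

A non-zero exit status from `check_master_alias` indicates that the alias did not resolve to an address on the queried server.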

Cannot connect to the master or services

If DNS resolution does not return at all, or the address cannot be reached from within a pod (such as the Fluentd pod), a system firewall or network problem is the likely cause. You must debug this problem.

36.13.2. Troubleshooting related to ElasticSearch

The following troubleshooting issues apply to the Elasticsearch components of the EFK stack.

Elasticsearch deployments never succeed and roll back to the previous version

This situation typically occurs on OpenShift Container Platform with cluster logging deployed on AWS. Describing the Elasticsearch pods typically reveals issues re-attaching the pod's storage:

oc describe pod <elasticsearch-pod>


Consider patching each Elasticsearch deployment configuration to allow more time for AWS to make the storage available:

oc patch dc <elasticsearch-deployment-config> -p '{"spec":{"strategy":{"recreateParams": {"timeoutSeconds":1800}}}}'


Searchguard index remains red

This is a known issue related to upgrading and moving to a single SearchGuard index per cluster instead of one index per deployment configuration. Use the Elasticsearch Explain API to discover the reason, then remove the index-to-node assignment:

oc -c elasticsearch exec ${pod} -- es_util --query=".searchguard/_settings" -XPUT -d "{\"index.routing.allocation.include._name\": \"\"}"
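
To see why the shard is unassigned before or after removing the assignment, the cluster allocation explain endpoint (`_cluster/allocation/explain`) can be queried through `es_util`. This is a sketch under assumptions: that `es_util` forwards the extra arguments to `curl` as it does in the command above, and that the shard of interest is primary shard 0 of `.searchguard`. The function names `explain_body` and `explain_searchguard` are hypothetical helpers.

```shell
# Build a request body for the allocation explain API.
# $1 is the index name, $2 the shard number, $3 "true" or "false" for primary.
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Ask Elasticsearch why the .searchguard shard is not allocated.
# Assumption: shard 0 and primary=true; adjust both for your cluster.
explain_searchguard() {
  oc exec -c elasticsearch "$1" -- es_util \
    --query="_cluster/allocation/explain?pretty" \
    -XPOST -d "$(explain_body .searchguard 0 true)"
}
```

For example, `explain_searchguard <any_es_pod_in_the_cluster>` prints the allocation decision for that shard.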


Elasticsearch pods never become ready

This is a known issue that occurs when the initialization and seeding process fails, which can be caused by a red .searchguard index.

# Create the init_failures file in every Elasticsearch pod.
for p in $(oc get pods -l component=es -o jsonpath='{.items[*].metadata.name}'); do
  oc exec -c elasticsearch "$p" -- touch /opt/app-root/src/init_failures
done


36.13.3. Kibana

The following troubleshooting issues apply to the Kibana components of the EFK stack.

Looping log in on Kibana

If you launch the Kibana console and log in successfully, you are incorrectly redirected back to Kibana, which immediately redirects back to the login screen.

The likely cause for this issue is that the OAuth2 proxy in front of Kibana must share a secret with the master’s OAuth2 server in order to identify it as a valid client. This problem could indicate that the secrets do not match. Nothing reports this problem in a way that can be exposed.

This can happen when you deploy logging more than once. For example, if you fix the initial deployment, the secret used by Kibana is replaced while the matching master oauthclient entry is not.

To resolve this, delete the existing oauthclient entry:

oc delete oauthclient/kibana-proxy


Follow the openshift-ansible instructions to re-run the openshift_logging role. This replaces the oauthclient and your next successful login should not loop.


Login error on Kibana

When attempting to visit the Kibana console, you might receive a browser error instead:

{"error":"invalid_request","error_description":"The request is missing a required parameter,
 includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}


This problem can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in. If there is a mismatch, the error message is shown.

This can be caused by an oauthclient entry lingering from a previous deployment, in which case you can replace it:

oc delete oauthclient/kibana-proxy


Follow the openshift-ansible instructions to re-run the openshift_logging role, which replaces the oauthclient entry. Return to the Kibana console and log in again.

If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port.

You can adjust the server whitelist by editing its oauthclient:

oc edit oauthclient/kibana-proxy


Edit the list of redirect URIs accepted to include the address you are actually using. After you save and exit, this should resolve the error.
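
Before editing, it can help to compare the address you are using with the registered redirect URIs. The sketch below compares only the scheme, host, and port, which is enough to catch the forwarded-port case described above. It assumes the `redirectURIs` field of the oauthclient object can be read with a jsonpath template; the server's own matching rules may differ. The function names are hypothetical helpers.

```shell
# Print the scheme://host[:port] part of a URL.
origin_of() {
  printf '%s\n' "$1" | sed -E 's#^(https?://[^/]+).*$#\1#'
}

# Succeed if the origin of URL $1 equals the origin of any URI in the
# space-separated list $2.
origin_is_listed() {
  wanted="$(origin_of "$1")"
  for uri in $2; do
    if [ "$(origin_of "$uri")" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Check a Kibana URL against the redirect URIs of the kibana-proxy client.
kibana_url_is_registered() {
  uris="$(oc get oauthclient kibana-proxy -o jsonpath='{.redirectURIs[*]}')"
  origin_is_listed "$1" "$uris"
}
```

For example, `kibana_url_is_registered https://kibana.example.com:1443/` fails when only `https://kibana.example.com` is registered, which points to the forwarded port as the cause.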

Kibana access shows 503 error

If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.

Kibana might not be recognizing pods. If Elasticsearch is slow in starting up, Kibana might error out trying to reach Elasticsearch, and Kibana does not consider it alive. You can check whether the relevant service has any endpoints:
```
oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>
```
If any Kibana pods are live, endpoints should be listed. If they are not, check the state of the Kibana pod(s) and deployment.
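
The endpoint check can also be scripted. This is a minimal sketch, assuming the endpoints object for the service exposes its addresses under `.subsets[*].addresses[*].ip`. The function name `has_endpoints` is a hypothetical helper.

```shell
# Succeed if the named service has at least one endpoint address.
has_endpoints() {
  ips="$(oc get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  [ -n "$ips" ]
}
```

For example, `has_endpoints logging-kibana || oc get pods` lists the pods only when the service has no endpoints, so that the state of the Kibana pods can be inspected.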

The named route for accessing the Kibana service might be masked.

This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router only routes to the first destination created. Check the problematic route to see if it is defined in multiple places:

oc get route --all-namespaces --selector logging-infra=support
NAMESPACE   NAME         HOST/PORT                 PATH      SERVICE
logging     kibana       kibana.example.com                  logging-kibana
logging     kibana-ops   kibana-ops.example.com              logging-kibana-ops


In this example there are no overlapping routes.
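
With many projects, overlapping routes are easier to spot by filtering the listing. The following sketch prints every host that appears on more than one route, assuming the column layout shown above, where HOST/PORT is the third column. The function name `duplicate_route_hosts` is a hypothetical helper.

```shell
# Read `oc get route --all-namespaces` output on stdin and print each
# host that is claimed by more than one route.
# Assumes HOST/PORT is column 3, as in the sample listing.
duplicate_route_hosts() {
  awk 'NR > 1 { count[$3]++ } END { for (h in count) if (count[h] > 1) print h }'
}
```

For example, `oc get route --all-namespaces --selector logging-infra=support | duplicate_route_hosts` prints nothing when no routes overlap.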
