Chapter 20. Aggregating Container Logs
20.1. Overview
As an OpenShift Enterprise cluster administrator, you can deploy the EFK stack to aggregate logs for a range of OpenShift Enterprise services. Application developers can view the logs of the projects for which they have view access. The EFK stack aggregates logs from hosts and applications, whether coming from multiple containers or even deleted pods.
The EFK stack is a modified version of the ELK stack and consists of:
- Elasticsearch: An object store where all logs are stored.
- Fluentd: Gathers logs from nodes and feeds them to Elasticsearch.
- Kibana: A web UI for Elasticsearch.
Once deployed in a cluster, the stack aggregates logs from all nodes and projects into Elasticsearch, and provides a Kibana UI to view any logs. Cluster administrators can view all logs, but application developers can only view logs for projects they have permission to view. The stack components communicate securely.
Managing Docker Container Logs discusses the use of json-file logging driver options to manage container logs and prevent filling node disks.
20.2. Pre-deployment Configuration
- Ensure that you have deployed a router for the cluster.
- Ensure that you have the necessary storage for Elasticsearch. Note that each Elasticsearch replica requires its own storage volume. See Elasticsearch for more information.
Ansible-based installs should create the logging-deployer-template template in the openshift project. Otherwise you can create it with the following command:
$ oc create -n openshift -f \
    /usr/share/openshift/examples/infrastructure-templates/enterprise/logging-deployer.yaml
Create a new project. Once implemented in a single project, the EFK stack collects logs for every project within your OpenShift Enterprise cluster. The examples in this topic use logging as an example project:
$ oadm new-project logging --node-selector=""
$ oc project logging
Note: Specifying a non-empty node selector on the project is not recommended, as this would restrict where Fluentd can be deployed. Instead, specify node selectors for the deployer to be applied to your other deployment configurations.
Create a secret to provide security-related files to the deployer. While the secret is necessary, the contents of the secret are optional, and will be generated for you if none are supplied.
You can supply the following files when creating a new secret:
- kibana.crt: A browser-facing certificate for the Kibana server.
- kibana.key: A key to be used with the Kibana certificate.
- kibana-ops.crt: A browser-facing certificate for the Ops Kibana server.
- kibana-ops.key: A key to be used with the Ops Kibana certificate.
- server-tls.json: JSON TLS options to override the Kibana server defaults. Refer to Node.JS docs for available options.
- ca.crt: A certificate for a CA that will be used to sign all certificates generated by the deployer.
- ca.key: A matching CA key.
For example:
$ oc secrets new logging-deployer \
    kibana.crt=/path/to/cert kibana.key=/path/to/key
If a certificate file is not passed as a secret, the deployer will generate a self-signed certificate instead. However, a secret is still required for the deployer to run. In this case, you can create a "dummy" secret that does not specify a certificate value:
$ oc secrets new logging-deployer nothing=/dev/null
Create the deployer service account:
$ oc create -f - <<API
apiVersion: v1
kind: ServiceAccount
metadata:
  name: logging-deployer
secrets:
- name: logging-deployer
API
The deployer will create a Fluentd service account that requires special privileges to operate Fluentd. Add this service account to the privileged security context constraint:
$ oadm policy add-scc-to-user privileged \
    system:serviceaccount:logging:aggregated-logging-fluentd
In the service account name, use the project you created earlier (for example, logging).
Give the Fluentd service account permission to read labels from all pods:
$ oadm policy add-cluster-role-to-user cluster-reader \
    system:serviceaccount:logging:aggregated-logging-fluentd
In the service account name, use the project you created earlier (for example, logging).
20.3. Deploying the EFK Stack
The EFK stack is deployed using a template.
Run the deployer, specifying at least the parameters in the following example (more are described in the table below):
$ oc new-app logging-deployer-template \
    --param KIBANA_HOSTNAME=kibana.example.com \
    --param ES_CLUSTER_SIZE=1 \
    --param PUBLIC_MASTER_URL=https://localhost:8443
Be sure to replace at least KIBANA_HOSTNAME and PUBLIC_MASTER_URL with values relevant to your deployment.
The available parameters are:
- PUBLIC_MASTER_URL: (Required with the oc process command) The external URL for the master. For OAuth use.
- ENABLE_OPS_CLUSTER: If set to true, configures a second Elasticsearch cluster and Kibana for operations logs. Fluentd splits logs between the main cluster and a cluster reserved for operations logs (which consists of /var/log/messages on nodes and the logs from the projects default, openshift, and openshift-infra). This means a second Elasticsearch and Kibana are deployed. The deployments are distinguishable by the -ops included in their names and have parallel deployment options listed below.
- KIBANA_HOSTNAME, KIBANA_OPS_HOSTNAME: (Required with the oc process command) The external host name for web clients to reach Kibana.
- ES_CLUSTER_SIZE, ES_OPS_CLUSTER_SIZE: (Required with the oc process command) The number of instances of Elasticsearch to deploy. Redundancy requires at least three, and more can be used for scaling.
- ES_INSTANCE_RAM, ES_OPS_INSTANCE_RAM: Amount of RAM to reserve per Elasticsearch instance. The default is 8G (for 8GB), and it must be at least 512M. Possible suffixes are G, g, M, m.
- ES_NODE_QUORUM, ES_OPS_NODE_QUORUM: The quorum required to elect a new master. Should be more than half the intended cluster size.
- ES_RECOVER_AFTER_NODES, ES_OPS_RECOVER_AFTER_NODES: When restarting the cluster, require this many nodes to be present before starting recovery. Defaults to one less than the cluster size to allow for one missing node.
- ES_RECOVER_EXPECTED_NODES, ES_OPS_RECOVER_EXPECTED_NODES: When restarting the cluster, wait for this number of nodes to be present before starting recovery. By default, the same as the cluster size.
- ES_RECOVER_AFTER_TIME, ES_OPS_RECOVER_AFTER_TIME: When restarting the cluster, this is a timeout for waiting for the expected number of nodes to be present. Defaults to "5m".
- IMAGE_PREFIX: The prefix for logging component images. For example, setting the prefix to registry.access.redhat.com/openshift3/ose- creates registry.access.redhat.com/openshift3/ose-logging-deployment:latest.
- IMAGE_VERSION: The version for logging component images. For example, setting the version to 3.1.1 creates registry.access.redhat.com/openshift3/logging-deployment:3.1.1.
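The ES_INSTANCE_RAM constraint described above (a G, g, M, or m suffix and a minimum of 512M) can be checked before running the deployer. The following is a minimal sketch, not part of the deployer: the function names ram_to_mb and valid_es_instance_ram are hypothetical, and the sketch assumes whole-number values without leading zeros and treats 1G as 1024M.

```shell
# Hypothetical pre-flight check for ES_INSTANCE_RAM values.
# Assumptions: whole numbers only, no leading zeros, 1G = 1024M.

# Print the value in megabytes, or return 1 if it is not <number><G|g|M|m>.
ram_to_mb() {
    value=$1
    number=${value%[GgMm]}      # strip one trailing suffix character, if any
    suffix=${value#"$number"}   # whatever was stripped (empty if no suffix)
    case $number in
        ''|*[!0-9]*|0*) return 1 ;;   # reject empty, non-numeric, leading zero
    esac
    case $suffix in
        G|g) echo $(( number * 1024 )) ;;
        M|m) echo "$number" ;;
        *) return 1 ;;                # a suffix is required in this sketch
    esac
}

# Succeed only if the value parses and is at least 512M.
valid_es_instance_ram() {
    mb=$(ram_to_mb "$1") || return 1
    [ "$mb" -ge 512 ]
}

for candidate in 8G 512M 256m; do
    if valid_es_instance_ram "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

A check like this only catches malformed or undersized values early; the deployer remains the authority on which values it accepts.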
Running the deployer creates a deployer pod and prints its name.
Wait until the pod is running. It may take several minutes for OpenShift Enterprise to retrieve the deployer image from the registry.
Note: The logs for the openshift and openshift-infra projects are automatically aggregated and grouped into the .operations item in the Kibana interface. The project where you have deployed the EFK stack (logging, as documented here) is not aggregated into .operations and is found under its ID.
You can watch the progress of the deployer pod with:
$ oc get pod/<pod_name> -w
If it seems to be taking too long to start, you can retrieve more details about the pod and any associated events with:
$ oc describe pod/<pod_name>
When it runs, you can check the logs of the resulting pod to see if the deployment was successful:
$ oc logs -f <pod_name>
As a cluster administrator, deploy the logging-support-template template that the deployer created:
$ oc process logging-support-template | oc create -f -
Important: Deployment of logging components should begin automatically. However, because deployment is triggered based on tags being imported into the ImageStreams created in this step, and not all tags are automatically imported, this mechanism has become unreliable as multiple versions are released. Therefore, manual importing may be necessary as follows.
For each ImageStream (logging-auth-proxy, logging-kibana, logging-elasticsearch, and logging-fluentd), manually import the tag corresponding to the IMAGE_VERSION specified (or defaulted) for the deployer.
$ oc import-image <name>:<version> --from <prefix><name>:<tag>
For example:
$ oc import-image logging-auth-proxy:3.1.1 \
    --from registry.access.redhat.com/openshift3/logging-auth-proxy:3.1.1
$ oc import-image logging-kibana:3.1.1 \
    --from registry.access.redhat.com/openshift3/logging-kibana:3.1.1
$ oc import-image logging-elasticsearch:3.1.1 \
    --from registry.access.redhat.com/openshift3/logging-elasticsearch:3.1.1
$ oc import-image logging-fluentd:3.1.1 \
    --from registry.access.redhat.com/openshift3/logging-fluentd:3.1.1
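Because the four import commands differ only in the ImageStream name, they can be generated in a loop. The following is a minimal dry-run sketch: it prints the commands instead of running them, the function name print_import_commands is hypothetical, and the prefix and version shown are the example values from this topic rather than values detected from your deployment.

```shell
# Dry-run sketch: print one import command per logging ImageStream.
# PREFIX and VERSION are example values; substitute the IMAGE_PREFIX and
# IMAGE_VERSION that were used for the deployer.
PREFIX="registry.access.redhat.com/openshift3/"
VERSION="3.1.1"

print_import_commands() {
    prefix=$1
    version=$2
    for name in logging-auth-proxy logging-kibana logging-elasticsearch logging-fluentd; do
        echo "oc import-image ${name}:${version} --from ${prefix}${name}:${version}"
    done
}

print_import_commands "$PREFIX" "$VERSION"
```

Review the printed commands first; once they look correct, they can be piped to sh to run them.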
20.4. Post-deployment Configuration
20.4.1. Elasticsearch
A highly-available environment requires at least three replicas of Elasticsearch, each on a different host. Elasticsearch replicas require their own storage, but an OpenShift Enterprise deployment configuration shares storage volumes between all its pods. So, when scaled up, the EFK deployer ensures each replica of Elasticsearch has its own deployment configuration.
Viewing all Elasticsearch Deployments
To view all current Elasticsearch deployments:
$ oc get dc --selector logging-infra=elasticsearch
Persistent Elasticsearch Storage
The deployer creates an ephemeral deployment in which all of a pod’s data is lost upon restart. For production usage, add a persistent storage volume to each Elasticsearch deployment configuration.
The best-performing volumes are local disks, if it is possible to use them. Doing so requires some preparation as follows.
The relevant service account must be given the privilege to mount and edit a local volume, as follows:
$ oadm policy add-scc-to-user privileged \
    system:serviceaccount:logging:aggregated-logging-elasticsearch
In the service account name, use the project you created earlier (for example, logging).
Each Elasticsearch replica definition must be patched to claim that privilege, for example:
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc scale $dc --replicas=0
    oc patch $dc \
       -p '{"spec":{"template":{"spec":{"containers":[{"name":"elasticsearch","securityContext":{"privileged": true}}]}}}}'
  done
- The Elasticsearch pods must be located on the correct nodes to use the local storage, and should not move around even if those nodes are taken down for a period of time. This requires giving each Elasticsearch replica a node selector that is unique to the node where an administrator has allocated storage for it. See below for directions on setting a node selector.
Once these steps are taken, a local host mount can be applied to each replica as in this example (where we assume storage is mounted at the same path on each node):
$ for dc in $(oc get deploymentconfig --selector logging-infra=elasticsearch -o name); do
    oc volume $dc \
       --add --overwrite --name=elasticsearch-storage \
       --type=hostPath --path=/usr/local/es-storage
    oc scale $dc --replicas=1
  done
If using host mounts is impractical or undesirable, it may be necessary to attach block storage as a PersistentVolumeClaim as in the following example:
$ oc volume dc/logging-es-<unique> \
    --add --overwrite --name=elasticsearch-storage \
    --type=persistentVolumeClaim --claim-name=logging-es-1
Using NFS storage directly or as a PersistentVolume (or via other NAS such as Gluster) is not supported for Elasticsearch storage, as Lucene relies on filesystem behavior that NFS does not supply. Data corruption and other problems can occur. If NFS storage is a requirement, you can allocate a large file on that storage to serve as a storage device and treat it as a host mount on each host. For example:
$ truncate -s 1T /nfs/storage/elasticsearch-1
$ mkfs.xfs /nfs/storage/elasticsearch-1
$ mount -o loop /nfs/storage/elasticsearch-1 /usr/local/es-storage
$ chown 1000:1000 /usr/local/es-storage
Then, use /usr/local/es-storage as a host-mount as described above. Performance under this solution is significantly worse than using actual local drives.
Node Selector
Because Elasticsearch can use a lot of resources, all members of a cluster should have low latency network connections to each other. Ensure this by directing the instances to dedicated nodes, or a dedicated region within your cluster, using a node selector.
To configure a node selector, edit each deployment configuration and add the nodeSelector parameter to specify the label of the desired nodes:
apiVersion: v1
kind: DeploymentConfig
spec:
  template:
    spec:
      nodeSelector:
        nodelabel: logging-es-node-1
Alternatively, you can use the oc patch command:
$ oc patch dc/logging-es-<unique_name> \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"<label_name>":"<label_value>"}}}}}'
Changing the Scale of Elasticsearch
If you need to scale up the number of Elasticsearch instances your cluster uses, it is not as simple as changing the number of Elasticsearch cluster nodes. This is due to the nature of persistent volumes and how Elasticsearch is configured to store its data and recover the cluster. Instead, you must create a deployment configuration for each Elasticsearch cluster node.
During installation, the deployer creates templates with the Elasticsearch configurations provided to it: logging-es-template and logging-es-ops-template if the deployer was run with ENABLE_OPS_CLUSTER=true.
The node quorum and recovery settings were initially set based on the CLUSTER_SIZE value provided to the deployer. Since the cluster size is changing, those values need to be updated.
- Prior to changing the number of Elasticsearch cluster nodes, the EFK stack should first be scaled down to preserve log data as described in Upgrading the EFK Logging Stack.
Edit the cluster template you are scaling up and change the parameters to the desired values:
- NODE_QUORUM is the intended cluster size / 2 (rounded down) + 1. For an intended cluster size of 5, the quorum would be 3.
- RECOVER_EXPECTED_NODES is the same as the intended cluster size.
- RECOVER_AFTER_NODES is the intended cluster size - 1.
$ oc edit template logging-es[-ops]-template
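The three values can be derived from the intended cluster size with integer arithmetic. The following is a minimal sketch of the rules stated above; the function name es_cluster_settings is hypothetical and is not provided by the deployer.

```shell
# Hypothetical helper: print the size-dependent settings for an
# intended Elasticsearch cluster size, following the rules above.
es_cluster_settings() {
    size=$1
    # Shell integer division rounds down, so size / 2 + 1 is a strict majority.
    quorum=$(( size / 2 + 1 ))
    # Recovery may begin with one node missing.
    recover_after=$(( size - 1 ))
    printf 'NODE_QUORUM=%s\n' "$quorum"
    printf 'RECOVER_EXPECTED_NODES=%s\n' "$size"
    printf 'RECOVER_AFTER_NODES=%s\n' "$recover_after"
}

es_cluster_settings 5
```

For a cluster size of 5, this prints NODE_QUORUM=3, RECOVER_EXPECTED_NODES=5, and RECOVER_AFTER_NODES=4, matching the worked example in the list.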
In addition to updating the template, all of the deployment configurations for that cluster also need to have the three environment variable values above updated. To edit each of the configurations for the cluster in series, use the following command:
$ oc get dc -l component=es[-ops] -o name | xargs -r oc edit
To create an additional deployment configuration, run the following command against the template for the Elasticsearch cluster that you want to scale up (logging-es-template or logging-es-ops-template):
$ oc new-app logging-es[-ops]-template
These deployments will be named differently, but all will have the logging-es prefix. Be aware of the cluster parameters (described in the deployer parameters) based on cluster size that may need corresponding adjustment in the template, as well as existing deployments.
After the intended number of deployment configurations are created, scale up your cluster, starting with Elasticsearch as described in Upgrading the EFK Logging Stack.
Note: The oc new-app logging-es[-ops]-template command creates a deployment configuration with a non-persistent volume. If you want to create an Elasticsearch cluster node with a persistent volume attached to it, you can instead run the following command at creation time to create your deployment configuration with a persistent volume claim (PVC) attached:
$ oc process logging-es-template | oc volume -f - \
    --add --overwrite --name=elasticsearch-storage \
    --type=persistentVolumeClaim --claim-name={your_pvc}
20.4.2. Fluentd
Once Elasticsearch is running, scale Fluentd to every node to feed logs into Elasticsearch. The following example is for an OpenShift Enterprise instance with three nodes:
$ oc scale dc/logging-fluentd --replicas=3
You will need to scale Fluentd again if nodes are added or removed.
When you make changes to any part of the EFK stack, specifically Elasticsearch or Fluentd, you should first scale Elasticsearch down to zero and change the Fluentd node selector so that it does not match any nodes. Then, make the changes and scale Elasticsearch and Fluentd back.
To scale Elasticsearch to zero:
$ oc scale --replicas=0 dc/<ELASTICSEARCH_DC>
Change nodeSelector in the daemonset configuration to match zero nodes:
Get the Fluentd node selector:
$ oc get ds logging-fluentd -o yaml | grep -A 1 Selector
    nodeSelector:
      logging-infra-fluentd: "true"
Use the oc patch command to modify the daemonset nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"nonexistlabel":"true"}}}}}'
Verify the updated Fluentd node selector:
$ oc get ds logging-fluentd -o yaml | grep -A 1 Selector
    nodeSelector:
      nonexistlabel: "true"
Scale Elasticsearch back up from zero:
$ oc scale --replicas=<number_of_replicas> dc/<ELASTICSEARCH_DC>
Change nodeSelector in the daemonset configuration back to logging-infra-fluentd: "true".
Use the oc patch command to modify the daemonset nodeSelector:
$ oc patch ds logging-fluentd -p '{"spec":{"template":{"spec":{"nodeSelector":{"logging-infra-fluentd":"true"}}}}}'
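The two patch commands in this procedure differ only in the label they set, so the JSON can be generated rather than typed by hand. The following is a minimal dry-run sketch: the function name fluentd_selector_patch is hypothetical, it prints the commands instead of running them, and it assumes label names and values that need no JSON escaping.

```shell
# Hypothetical helper: build the nodeSelector patch for a label/value pair.
# Assumes the label and value contain no characters that need JSON escaping.
fluentd_selector_patch() {
    label=$1
    value=$2
    printf '{"spec":{"template":{"spec":{"nodeSelector":{"%s":"%s"}}}}}\n' "$label" "$value"
}

# Dry run: print the command that parks Fluentd on no nodes, then the
# command that restores the original selector.
echo "oc patch ds logging-fluentd -p '$(fluentd_selector_patch nonexistlabel true)'"
echo "oc patch ds logging-fluentd -p '$(fluentd_selector_patch logging-infra-fluentd true)'"
```

Generating the patch this way keeps the brace nesting identical between the two steps, which is the part most easily mistyped.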
20.4.3. Kibana
To access the Kibana console from the OpenShift Enterprise web console, add the loggingPublicURL parameter in the /etc/origin/master/master-config.yaml file, with the URL of the Kibana console (the KIBANA_HOSTNAME parameter). The value must be an HTTPS URL:
...
assetConfig:
  ...
  loggingPublicURL: "https://kibana.example.com"
...
Setting the loggingPublicURL parameter creates a View Archive button on the OpenShift Enterprise web console under the Browse
You can scale the Kibana deployment as usual for redundancy:
$ oc scale dc/logging-kibana --replicas=2
You can see the UI by visiting the site specified in the KIBANA_HOSTNAME variable.
See the Kibana documentation for more information on Kibana.
20.4.4. Cleanup
You can remove everything generated during the deployment while leaving other project contents intact:
$ oc delete all --selector logging-infra=kibana
$ oc delete all --selector logging-infra=fluentd
$ oc delete all --selector logging-infra=elasticsearch
$ oc delete all --selector logging-infra=curator
$ oc delete all,sa,oauthclient --selector logging-infra=support
$ oc delete secret logging-fluentd logging-elasticsearch \
    logging-es-proxy logging-kibana logging-kibana-proxy \
    logging-kibana-ops-proxy
20.5. Upgrading
To upgrade the EFK logging stack, see Manual Upgrades.
20.6. Troubleshooting Kibana
Using the Kibana console with OpenShift Enterprise can cause problems that are easily solved but are not accompanied by useful error messages. Check the following troubleshooting sections if you are experiencing any problems when deploying Kibana on OpenShift Enterprise:
Login Loop
The OAuth2 proxy on the Kibana console must share a secret with the master host’s OAuth2 server. If the secret is not identical on both servers, it can cause a login loop where you are continuously redirected back to the Kibana login page.
To fix this issue, delete the current oauthclient, and create a new one, using the same template as before:
$ oc delete oauthclient/kibana-proxy
$ oc process logging-support-template | oc create -f -
Cryptic Error When Viewing the Console
When attempting to visit the Kibana console, you may instead receive a browser error:
{"error":"invalid_request","error_description":"The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed."}
This can be caused by a mismatch between the OAuth2 client and server. The return address for the client must be in a whitelist so the server can securely redirect back after logging in.
Fix this issue by replacing the OAuth client entry:
$ oc delete oauthclient/kibana-proxy
$ oc process logging-support-template | oc create -f -
If the problem persists, check that you are accessing Kibana at a URL listed in the OAuth client. This issue can be caused by accessing the URL at a forwarded port, such as 1443 instead of the standard 443 HTTPS port. You can adjust the server whitelist by editing the OAuth client:
$ oc edit oauthclient/kibana-proxy
503 Error When Viewing the Console
If you receive a proxy error when viewing the Kibana console, it could be caused by one of two issues.
First, Kibana may not be recognizing pods. If Elasticsearch is slow in starting up, Kibana may time out trying to reach it. Check whether the relevant service has any endpoints:
$ oc describe service logging-kibana
Name:                   logging-kibana
[...]
Endpoints:              <none>
If any Kibana pods are live, endpoints will be listed. If they are not, check the state of the Kibana pods and deployment. You may need to scale the deployment down and back up again.
The second possible issue may be caused if the route for accessing the Kibana service is masked. This can happen if you perform a test deployment in one project, then deploy in a different project without completely removing the first deployment. When multiple routes are sent to the same destination, the default router will only route to the first created. Check the problematic route to see if it is defined in multiple places:
$ oc get route --all-namespaces --selector logging-infra=support
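When many routes are listed, a masked route is easier to spot by looking for host names that appear more than once. The following is a minimal sketch under stated assumptions: the function name duplicate_route_hosts is hypothetical, and it expects "namespace name host" triples on standard input, which you would need to extract from the route listing yourself because the column layout of oc get route can differ between versions. The sample projects and routes are invented for illustration.

```shell
# Hypothetical helper: read "namespace name host" triples, one per line,
# and print every host that is claimed by more than one route.
duplicate_route_hosts() {
    awk '{ count[$3]++ } END { for (h in count) if (count[h] > 1) print h }' | sort
}

# Example input with invented projects: kibana.example.com is claimed twice.
printf '%s\n' \
    'logging kibana kibana.example.com' \
    'logging-test kibana kibana.example.com' \
    'logging kibana-ops kibana-ops.example.com' |
    duplicate_route_hosts
```

Any host printed by this check is served by more than one route, so the routes in the older or test project are candidates for removal.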
20.7. External Elasticsearch Instance with Fluentd
It is possible to configure the Fluentd pod created with aggregated logging to connect to an externally hosted Elasticsearch instance.
Fluentd knows where to send its logs based on the ES_HOST, ES_PORT, OPS_HOST, and OPS_PORT environment variables. If you have an external Elasticsearch instance that will contain both application and operations logs, ensure that ES_HOST and OPS_HOST are the same and that ES_PORT and OPS_PORT are also the same. Fluentd is configured to send its application logs to the ES_HOST destination and all of its operations logs to OPS_HOST.
If your externally hosted Elasticsearch does not use TLS, update the *_CLIENT_CERT, *_CLIENT_KEY, and *_CA variables to be empty. If it uses TLS but not mutual TLS, update the *_CLIENT_CERT and *_CLIENT_KEY variables to be empty and patch or recreate the logging-fluentd secret with the appropriate *_CA for communicating with your Elasticsearch. If it uses mutual TLS, as the provided Elasticsearch does, you only need to patch or recreate the logging-fluentd secret with your client key, client certificate, and CA.
You can use oc edit dc/logging-fluentd to update your Fluentd configuration. It is advised that you first scale down your number of replicas to 0 before editing the DeploymentConfig.
If you are not using the provided Kibana and Elasticsearch images, you will not have the same multi-tenant capabilities and your data will not be restricted by user access to a particular project.