Chapter 10. Monitoring Cluster Events and Logs
10.1. Introduction
In addition to security measures mentioned in other sections of this guide, the ability to monitor and audit an OpenShift Container Platform cluster is an important part of safeguarding the cluster and its users against inappropriate usage.
There are two main sources of cluster-level information that are useful for this purpose: events and logs.
10.2. Cluster Events
Cluster administrators are encouraged to familiarize themselves with the Event resource type and review the list of events to determine which events are of interest. Depending on the master controller and plugin configuration, there are typically more potential event types than those that appear in any such list.
Events are associated with a namespace, either the namespace of the resource they are related to or, for cluster events, the default namespace. The default namespace holds relevant events for monitoring or auditing a cluster, such as Node events and resource events related to infrastructure components.
The master API and oc command do not provide parameters to scope a listing of events to only those related to nodes. A simple approach is to use grep:
$ oc get event -n default | grep Node

1h         20h         3         origin-node-1.example.local   Node      Normal    NodeHasDiskPressure   ...
A more flexible approach is to output the events in a form that other tools can process. For example, the following example uses the jq tool against JSON output to extract only NodeHasDiskPressure events:
$ oc get events -n default -o json \
  | jq '.items[] | select(.involvedObject.kind == "Node" and .reason == "NodeHasDiskPressure")'

{
  "apiVersion": "v1",
  "count": 3,
  "involvedObject": {
    "kind": "Node",
    "name": "origin-node-1.example.local",
    "uid": "origin-node-1.example.local"
  },
  "kind": "Event",
  "reason": "NodeHasDiskPressure",
  ...
}
Events related to resource creation, modification, or deletion can also be good candidates for detecting misuse of the cluster. The following query, for example, can be used to look for excessive pulling of images:
$ oc get events --all-namespaces -o json \
  | jq '[.items[] | select(.involvedObject.kind == "Pod" and .reason == "Pulling")] | length'

4
When a namespace is deleted, its events are deleted as well. Events also expire and are deleted to prevent filling up etcd storage. Events are not stored as a permanent record, so frequent polling is necessary to capture statistics over time.
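As a minimal sketch of such polling (not part of the product; the interval, output path, and event reason below are illustrative), counts can be appended to a file at a fixed interval so the statistics survive event expiry:

$ while true; do
    count=$(oc get events --all-namespaces -o json \
      | jq '[.items[] | select(.reason == "Pulling")] | length')
    echo "$(date -u +%FT%TZ) pulling=$count" >> /tmp/event-pull-counts.log
    sleep 300
  done

A cron job running the same commands at the desired interval works equally well.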
10.3. Cluster Logs
This section describes the types of operational logs produced on the cluster.
10.3.1. Service Logs
OpenShift Container Platform produces logs for services that run as static pods in a cluster, as well as for the node service:
- API (use master-logs api api)
- Controllers (use master-logs controllers controllers)
- etcd (use master-logs etcd etcd)
- atomic-openshift-node (use journalctl -u atomic-openshift-node.service)
These logs are intended more for debugging purposes than for security auditing. You can retrieve the logs for each service with the master-logs api api, master-logs controllers controllers, or master-logs etcd etcd commands. If your cluster runs an aggregated logging stack, as in an Ops cluster, cluster administrators can retrieve these logs from the logging .operations indexes.
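For example, assuming the aggregated logging (EFK) stack is deployed in the logging project, a cluster administrator could query the .operations indexes from inside an Elasticsearch pod. This is a sketch only: the project name, pod label, container name, and certificate paths are assumptions that vary by installation.

$ oc get pods -n logging -l component=es
$ oc exec -n logging <elasticsearch-pod> -c elasticsearch -- curl -s \
    --cacert /etc/elasticsearch/secret/admin-ca \
    --cert /etc/elasticsearch/secret/admin-cert \
    --key /etc/elasticsearch/secret/admin-key \
    'https://localhost:9200/.operations.*/_count'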
The API server, controllers, and etcd static pods run in the kube-system namespace.
10.3.2. Master API Audit Log
To log master API requests by users, administrators, or system components, enable audit logging for the master API. Audit entries are written to a file on each master host or, if no file is configured, to the service's journal. Entries in the journal can be found by searching for "AUDIT".
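For example, when entries go to the journal rather than a dedicated file, a quick way to surface recent audit entries is to filter the API server's log stream for the AUDIT marker. This is a minimal sketch that reuses the master-logs command described in the previous section:

$ master-logs api api 2>&1 | grep AUDIT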
Audit log entries consist of one line recording each REST request when it is received and one line with the HTTP response code when it completes. For example, here is a record of the system administrator requesting a list of nodes:
2017-10-17T13:12:17.635085787Z AUDIT: id="410eda6b-88d4-4491-87ff-394804ca69a1" ip="192.168.122.156" method="GET" user="system:admin" groups="\"system:cluster-admins\",\"system:authenticated\"" as="<self>" asgroups="<lookup>" namespace="<none>" uri="/api/v1/nodes"
2017-10-17T13:12:17.636081056Z AUDIT: id="410eda6b-88d4-4491-87ff-394804ca69a1" response="200"
It might be useful to poll the log periodically for the number of recent requests per response code, as shown in the following example:
$ tail -5000 /var/log/origin/audit-ocp.log \
  | grep -Po 'response="..."' \
  | sort | uniq -c | sort -rn

   3288 response="200"
      8 response="404"
      6 response="201"
The following list describes some of the response codes in more detail:
- 200 or 201 response codes indicate a successful request.
- 400 response codes may be of interest as they indicate a malformed request, which should not occur with most clients.
- 404 response codes are typically benign requests for a resource that does not exist.
- 500-599 response codes indicate server errors, which can be a result of bugs, system failures, or even malicious activity.
If an unusual number of error responses is found, the audit log entries for the corresponding requests can be retrieved for further investigation.
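For example, a sketch along the following lines (assuming the basic audit format and the log path shown above) collects the request IDs of recent server errors and then pulls the matching request lines:

$ tail -5000 /var/log/origin/audit-ocp.log \
  | grep -Po 'id="\K[^"]+(?=".*response="5..")' \
  | sort -u \
  | while read id; do grep "id=\"$id\"" /var/log/origin/audit-ocp.log; done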
The IP address recorded for a request is typically that of a cluster host or the API load balancer; there is no record of the client IP address behind a load balancer proxy request, although the load balancer's own logs can be useful for determining the origin of a request.
It can be useful to look for unusual numbers of requests by a particular user or group.
The following example lists the top 10 users by number of requests in the last 5000 lines of the audit log:
$ tail -5000 /var/log/origin/audit-ocp.log \
  | grep -Po ' user="(.*?)(?<!\\)"' \
  | sort | uniq -c | sort -rn | head -10

    976 user="system:openshift-master"
    270 user="system:node:origin-node-1.example.local"
    270 user="system:node:origin-master.example.local"
     66 user="system:anonymous"
     32 user="system:serviceaccount:kube-system:cronjob-controller"
     24 user="system:serviceaccount:kube-system:pod-garbage-collector"
     18 user="system:serviceaccount:kube-system:endpoint-controller"
     14 user="system:serviceaccount:openshift-infra:serviceaccount-pull-secrets-controller"
     11 user="test user"
      4 user="test \" user"
More advanced queries generally require the use of additional log analysis tools. Auditors will need a detailed familiarity with the OpenShift v1 API and Kubernetes v1 API to aggregate request summaries from the audit log according to which kind of resource is involved (the uri field). See the REST API Reference for details.
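As a starting point before reaching for heavier tooling, a simple per-URI breakdown can be produced in the same style as the per-user example above. This is a minimal sketch; grouping the paths into API resource kinds is still a manual step:

$ tail -5000 /var/log/origin/audit-ocp.log \
  | grep -Po ' uri="(.*?)"' \
  | sort | uniq -c | sort -rn | head -10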
More advanced audit logging capabilities are also available. This feature lets you provide an audit policy file that controls which requests are logged and at what level of detail. Advanced audit log entries provide more detail in JSON format and can be sent to a webhook instead of a file or the system journal.
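Because advanced audit entries are JSON, they are straightforward to summarize with jq. The following is a hypothetical sketch: the log path is illustrative, and it assumes the advanced audit log is written to a file as one JSON entry per line.

$ tail -5000 /var/log/origin/audit.log \
  | jq -r 'select(.verb != null) | [.verb, .user.username] | @tsv' \
  | sort | uniq -c | sort -rn | head -10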