41.6. Diagnosing an OOM kill

OpenShift Container Platform can kill a process in a container if the total memory usage of all processes in the container exceeds the memory limit, or if the node is severely short of memory.

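When a process is OOM killed, the container might or might not exit immediately. If the container's PID 1 process receives the SIGKILL, the container exits immediately. Otherwise, what happens to the container depends on how the remaining processes behave.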

If the container does not exit immediately, an OOM kill can be detected as follows:

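A container process exited with code 137 (128 + 9, where 9 is the signal number of SIGKILL), indicating that it received a SIGKILL signal.
The oom_kill counter in /sys/fs/cgroup/memory/memory.oom_control has been incremented.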

$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 0
$ sed -e '' </dev/zero  # provoke an OOM kill
Killed
$ echo $?
137
$ grep '^oom_kill ' /sys/fs/cgroup/memory/memory.oom_control
oom_kill 1
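
The two indicators above can also be checked from a script. The following is a minimal sketch, not part of OpenShift Container Platform: the function names signal_from_status and oom_kill_count are hypothetical helpers introduced here for illustration. The default path is the cgroup v1 file used in the example above; on hosts that use cgroup v2, an oom_kill field is normally exposed in the memory.events file instead, so treat the path as an assumption to verify inside your container.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print the signal number encoded in a shell exit status.
# Shells report "killed by signal N" as 128 + N, so 137 maps to 9 (SIGKILL).
# A process can also call exit() with such a value itself, so this is a hint,
# not proof that a signal was delivered.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    else
        echo 0   # exited on its own; no signal encoded
    fi
}

# Print the value of the oom_kill counter from a cgroup file.
# $1: path to the counter file (defaults to the cgroup v1 path used above).
oom_kill_count() {
    file=${1:-/sys/fs/cgroup/memory/memory.oom_control}
    # Match the field name exactly so that oom_kill_disable is not picked up.
    awk '$1 == "oom_kill" { print $2 }' "$file"
}
```

One way to use these helpers is to record oom_kill_count before and after running a workload and compare the two values; an increase, together with signal_from_status reporting 9, suggests that the workload was OOM killed.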

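If one or more processes in a pod are OOM killed, then when the pod subsequently exits, whether immediately or not, its phase is Failed and its reason is OOMKilled. An OOM-killed pod might be restarted, depending on the value of restartPolicy. If it is not restarted, a controller such as a ReplicationController notices the pod's failed status and creates a new pod to replace the old one.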
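
To see which of these outcomes applies to a given pod, it can help to look at its restart policy and phase side by side. The following is a rough sketch that assumes the oc client is logged in to the cluster; pod_restart_info is a hypothetical helper name, and the pod name is whatever you pass in (the examples in this section use test).

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Print the restart policy and current phase of a pod on one line.
# $1: pod name
pod_restart_info() {
    pod=$1
    # .spec.restartPolicy and .status.phase are standard pod fields.
    policy=$(oc get pod "$pod" -o jsonpath='{.spec.restartPolicy}')
    phase=$(oc get pod "$pod" -o jsonpath='{.status.phase}')
    echo "restartPolicy=$policy phase=$phase"
}
```

For example, pod_restart_info test prints the restartPolicy and phase values of the pod named test. A phase of Failed indicates that the pod exited and was not restarted in place.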

If the pod is not restarted, its status is as follows:

$ oc get pod test
NAME      READY     STATUS      RESTARTS   AGE
test      0/1       OOMKilled   0          1m

$ oc get pod test -o yaml
...
status:
  containerStatuses:
  - name: test
    ready: false
    restartCount: 0
    state:
      terminated:
        exitCode: 137
        reason: OOMKilled
  phase: Failed

If the pod is restarted, its status is as follows:

$ oc get pod test
NAME      READY     STATUS    RESTARTS   AGE
test      1/1       Running   1          1m

$ oc get pod test -o yaml
...
status:
  containerStatuses:
  - name: test
    ready: true
    restartCount: 1
    lastState:
      terminated:
        exitCode: 137
        reason: OOMKilled
    state:
      running:
  phase: Running
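
The two status listings differ in where the OOMKilled reason appears: under state.terminated when the pod was not restarted, and under lastState.terminated when it was. The following sketch checks both places. It is an illustration under stated assumptions rather than a supported tool: oom_kill_reason is a hypothetical helper name, only the first container in containerStatuses is inspected, and the oc client is assumed to print an empty string for fields that are absent (its default behavior for missing template keys).

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Report whether the first container of a pod was OOM killed, and whether
# the kill is recorded in its current state or in its last state.
# $1: pod name
oom_kill_reason() {
    pod=$1
    current=$(oc get pod "$pod" \
        -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}')
    last=$(oc get pod "$pod" \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}')
    if [ "$current" = "OOMKilled" ]; then
        echo "OOM killed, not restarted"
    elif [ "$last" = "OOMKilled" ]; then
        echo "OOM killed, restarted"
    else
        echo "no OOM kill recorded"
    fi
}
```

Pods with several containers would need the same check for each entry in containerStatuses; the index 0 here is a simplification.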
