Chapter 7. Monitoring application health by using health checks
In software systems, components can become unhealthy due to transient issues such as temporary connectivity loss, configuration errors, or problems with external dependencies. OpenShift Container Platform applications have a number of options to detect and handle unhealthy containers.
7.1. Understanding health checks
A health check periodically performs diagnostics on a running container using any combination of the readiness and liveness health checks.
You can include one or more probes in the specification for the pod that contains the container on which you want to perform the health checks.
If you want to add or edit health checks in an existing pod, you must edit the pod DeploymentConfig object or use the Developer perspective in the web console. You cannot use the CLI to add or edit health checks for an existing pod.
- Readiness probe
A readiness probe determines if a container is ready to accept service requests. If the readiness probe fails for a container, the kubelet removes the pod from the list of available service endpoints.
After a failure, the probe continues to examine the pod. If the pod becomes available, the kubelet adds the pod to the list of available service endpoints.
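The readiness behavior described above maps onto a readinessProbe stanza in the container spec. The following is a minimal sketch, not a complete pod definition; the /ready path and port 8080 are hypothetical placeholders:

```yaml
# Hypothetical readiness probe: the kubelet polls the endpoint every 5 seconds.
# While the probe fails, the pod is left off the service endpoint list; once it
# succeeds again, the pod is added back.
readinessProbe:
  httpGet:
    path: /ready   # hypothetical readiness endpoint
    port: 8080     # hypothetical application port
  periodSeconds: 5
```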
- Liveness probe
A liveness probe determines if a container is still running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container. The pod then responds based on its restart policy.
For example, a liveness probe on a pod with a restartPolicy of Always or OnFailure kills and restarts the container.
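The interaction between the liveness probe and the restart policy can be sketched as follows; the pod name, container name, and port are illustrative only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-restart-example   # hypothetical pod name
spec:
  restartPolicy: Always            # with Always or OnFailure, a failed liveness
                                   # probe kills and restarts the container
  containers:
  - name: app                      # hypothetical container name
    image: k8s.gcr.io/goproxy:0.1
    livenessProbe:
      tcpSocket:
        port: 8080                 # hypothetical listening port
```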
You can configure liveness and readiness probes with any of the following types of tests:
- HTTP GET: When using an HTTP GET test, the test determines the healthiness of the container by using a web hook. The test is successful if the HTTP response code is between 200 and 399. You can use an HTTP GET test with applications that return HTTP status codes when completely initialized.
- Container Command: When using a container command test, the probe executes a command inside the container. The probe is successful if the test exits with a 0 status.
- TCP socket: When using a TCP socket test, the probe attempts to open a socket to the container. The container is only considered healthy if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.
You can configure several fields to control the behavior of a probe:
- initialDelaySeconds: The time, in seconds, after the container starts before the probe can be scheduled. The default is 0.
- periodSeconds: The delay, in seconds, between performing probes. The default is 10.
- timeoutSeconds: The number of seconds of inactivity after which the probe times out and the container is assumed to have failed. The default is 1.
- successThreshold: The number of times that the probe must report success after a failure in order to reset the container status to successful. The value must be 1 for a liveness probe. The default is 1.
- failureThreshold: The number of times that the probe is allowed to fail. The default is 3. After the specified attempts:
  - for a liveness probe, the container is restarted
  - for a readiness probe, the pod is marked Unready

Note: The timeoutSeconds parameter has no effect on container command probes, as OpenShift Container Platform cannot time out on an exec call into the container. One way to implement a timeout in a container command probe is to use the timeout command to run your liveness or readiness probes, as shown in the examples.
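Taken together, these fields set how quickly the kubelet reacts to a failure. The following sketch (hypothetical port) waits 5 seconds after start, probes every 10 seconds with a 1-second timeout, and restarts the container after 3 consecutive failures, roughly 30 seconds after the first failed probe:

```yaml
livenessProbe:
  tcpSocket:
    port: 8080            # hypothetical listening port
  initialDelaySeconds: 5  # wait 5 seconds after the container starts
  periodSeconds: 10       # probe every 10 seconds
  timeoutSeconds: 1       # each probe times out after 1 second of inactivity
  successThreshold: 1     # must be 1 for a liveness probe
  failureThreshold: 3     # restart after 3 consecutive failures
```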
Example probes
The following are samples of different probes as they would appear in an object specification.
Sample readiness probe with a container command test in a pod spec
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: health-check
  name: my-application
...
spec:
  containers:
  - name: goproxy-app
    args:
    image: k8s.gcr.io/goproxy:0.1
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
...
Sample liveness probe with a container command test that uses a timeout in a pod spec
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: health-check
  name: my-application
...
spec:
  containers:
  - name: goproxy-app
    args:
    image: k8s.gcr.io/goproxy:0.1
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - '-c'
        - timeout 60 /opt/eap/bin/livenessProbe.sh
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
...
- name: The container name.
- image: The container image to deploy.
- livenessProbe: The liveness probe.
- exec: The type of probe, here a container command probe.
- command: The command line to execute inside the container.
- periodSeconds: How often, in seconds, to perform the probe.
- successThreshold: The number of consecutive successes needed to show success after a failure.
- failureThreshold: The number of times to try the probe after a failure.
Sample readiness probe and liveness probe with a TCP socket test in a deployment
kind: Deployment
apiVersion: apps/v1
...
spec:
...
  template:
    spec:
      containers:
      - resources: {}
        readinessProbe:
          tcpSocket:
            port: 8080
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
        terminationMessagePath: /dev/termination-log
        name: ruby-ex
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          timeoutSeconds: 1
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
...
7.2. Configuring health checks
To configure readiness and liveness probes, add one or more probes to the specification for the pod that contains the container that you want to perform the health checks on.
If you want to add or edit health checks in an existing pod, you must edit the pod DeploymentConfig object or use the Developer perspective in the web console. You cannot use the CLI to add or edit health checks for an existing pod.
Procedure
To add probes for a container:
Create a Pod object to add one or more probes:

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: health-check
  name: my-application
spec:
  containers:
  - name: my-container
    args:
    image: k8s.gcr.io/goproxy:0.1
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      timeoutSeconds: 1
    readinessProbe:
      httpGet:
        host: my-host
        scheme: HTTPS
        path: /healthz
        port: 8080

- name: Specify the container name.
- image: Specify the container image to deploy.
- livenessProbe: Optional: Create a liveness probe.
- tcpSocket: Specify a test to perform, here a TCP socket test.
- port (under tcpSocket): Specify the port on which the container is listening.
- initialDelaySeconds: Specify the time, in seconds, after the container starts before the probe can be scheduled.
- timeoutSeconds: Specify the number of seconds of inactivity after which the probe times out and the container is assumed to have failed.
- readinessProbe: Optional: Create a readiness probe.
- httpGet: Specify the type of test to perform, here an HTTP GET test.
- host: Specify a host IP address. When host is not defined, the PodIP is used.
- scheme: Specify HTTP or HTTPS. When scheme is not defined, the HTTP scheme is used.
- port (under httpGet): Specify the port on which the container is listening.
Note: If the initialDelaySeconds value is lower than the periodSeconds value, the first readiness probe occurs at some point between the two periods due to an issue with timers.

Create the Pod object:

$ oc create -f <file-name>.yaml

Verify the state of the health check pod:

$ oc describe pod my-application

Example output

Events:
  Type    Reason     Age  From                                  Message
  ----    ------     ---- ----                                  -------
  Normal  Scheduled  9s   default-scheduler                     Successfully assigned openshift-logging/liveness-exec to ip-10-0-143-40.ec2.internal
  Normal  Pulling    2s   kubelet, ip-10-0-143-40.ec2.internal  pulling image "k8s.gcr.io/liveness"
  Normal  Pulled     1s   kubelet, ip-10-0-143-40.ec2.internal  Successfully pulled image "k8s.gcr.io/liveness"
  Normal  Created    1s   kubelet, ip-10-0-143-40.ec2.internal  Created container
  Normal  Started    1s   kubelet, ip-10-0-143-40.ec2.internal  Started container

The following is the output of a failed probe that restarted a container:
Sample Liveness check output with unhealthy container
$ oc describe pod pod1

Example output

....
Events:
  Type     Reason          Age                From                                               Message
  ----     ------          ----               ----                                               -------
  Normal   Scheduled       <unknown>                                                             Successfully assigned aaa/liveness-http to ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj
  Normal   AddedInterface  47s                multus                                             Add eth0 [10.129.2.11/23]
  Normal   Pulled          46s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 773.406244ms
  Normal   Pulled          28s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 233.328564ms
  Normal   Created         10s (x3 over 46s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Created container liveness
  Normal   Started         10s (x3 over 46s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Started container liveness
  Warning  Unhealthy       10s (x6 over 34s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing         10s (x2 over 28s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Container liveness failed liveness probe, will be restarted
  Normal   Pulling         10s (x3 over 47s)  kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Pulling image "k8s.gcr.io/liveness"
  Normal   Pulled          10s                kubelet, ci-ln-37hz77b-f76d1-wdpjv-worker-b-snzrj  Successfully pulled image "k8s.gcr.io/liveness" in 244.116568ms