This documentation is for a release that is no longer maintained
See documentation for the latest supported version 3 or the latest supported version 4.
Chapter 6. Monitoring application health
In software systems, components can become unhealthy due to transient issues such as temporary connectivity loss, configuration errors, or problems with external dependencies. OpenShift Container Platform applications have a number of options to detect and handle unhealthy containers.
6.1. Understanding health checks
A probe is a Kubernetes action that periodically performs diagnostics on a running container. Currently, two types of probes exist, each serving a different purpose.
- Readiness Probe
- A Readiness check determines if the container in which it is scheduled is ready to service requests. If the readiness probe fails for a container, the endpoints controller removes the container's IP address from the endpoints of all services. A readiness probe can signal to the endpoints controller that even though a container is running, it should not receive any traffic from a proxy.
For example, a Readiness check can control which Pods are used. When a Pod is not ready, it is removed from the service endpoints and stops receiving traffic until the check passes again.
- Liveness Probe
- A Liveness check determines if the container in which it is scheduled is still running. If the liveness probe fails due to a condition such as a deadlock, the kubelet kills the container. The container then responds based on its restart policy.
For example, a liveness probe on a pod with a restartPolicy of Always or OnFailure kills and restarts the container on the node.
Sample Liveness Check
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness-http
    image: k8s.gcr.io/liveness  # (1)
    args:
    - /server
    livenessProbe:  # (2)
      httpGet:  # (3)
        # host: my-host
        # scheme: HTTPS
        path: /healthz
        port: 8080
        httpHeaders:
        - name: X-Custom-Header
          value: Awesome
      initialDelaySeconds: 15  # (4)
      timeoutSeconds: 1  # (5)
- 1
- Specifies the image to use for the liveness probe.
- 2
- Specifies the type of health check.
- 3
- Specifies the type of Liveness check:
  - HTTP Checks. Specify httpGet.
  - Container Execution Checks. Specify exec.
  - TCP Socket Checks. Specify tcpSocket.
- 4
- Specifies the number of seconds before performing the first probe after the container starts.
- 5
- Specifies the number of seconds after which the probe times out (timeoutSeconds). The interval between probes is set separately with periodSeconds.
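The timing fields of a probe are easy to confuse, so the fragment below is a minimal sketch that sets them side by side on an HTTP liveness check. The values are illustrative assumptions rather than recommendations.

```yaml
# Illustrative values only; tune them for your application.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15   # wait 15s after the container starts before the first probe
  periodSeconds: 10         # run the probe every 10s
  timeoutSeconds: 1         # a probe that takes longer than 1s counts as a failure
  failureThreshold: 3       # consecutive failures before the kubelet restarts the container
```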
Sample Liveness check output with unhealthy container
$ oc describe pod pod1
....
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
37s 37s 1 {default-scheduler } Normal Scheduled Successfully assigned liveness-exec to worker0
36s 36s 1 {kubelet worker0} spec.containers{liveness} Normal Pulling pulling image "k8s.gcr.io/busybox"
36s 36s 1 {kubelet worker0} spec.containers{liveness} Normal Pulled Successfully pulled image "k8s.gcr.io/busybox"
36s 36s 1 {kubelet worker0} spec.containers{liveness} Normal Created Created container with docker id 86849c15382e; Security:[seccomp=unconfined]
36s 36s 1 {kubelet worker0} spec.containers{liveness} Normal Started Started container with docker id 86849c15382e
2s 2s 1 {kubelet worker0} spec.containers{liveness} Warning Unhealthy Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
6.1.1. Understanding the types of health checks
Liveness checks and Readiness checks can be configured in three ways:
- HTTP Checks
- The kubelet uses a web hook to determine the healthiness of the container. The check is deemed successful if the HTTP response code is between 200 and 399.
An HTTP check is ideal for applications that return HTTP status codes when completely initialized.
- Container Execution Checks
- The kubelet executes a command inside the container. Exiting the check with status 0 is considered a success.
- TCP Socket Checks
- The kubelet attempts to open a socket to the container. The container is only considered healthy if the check can establish a connection. A TCP socket check is ideal for applications that do not start listening until initialization is complete.
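As a rough illustration of how the check types can be combined, the sketch below defines one container with an HTTP readiness check and a TCP socket liveness check. The pod name, container name, image, and /ready path are hypothetical placeholders, not values from this guide.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-types-example                  # hypothetical name
spec:
  containers:
  - name: app                                # hypothetical container
    image: registry.example.com/app:latest   # placeholder image
    ports:
    - containerPort: 8080
    readinessProbe:          # HTTP check: succeeds on a response code from 200 to 399
      httpGet:
        path: /ready         # assumed endpoint exposed by the application
        port: 8080
      initialDelaySeconds: 5
    livenessProbe:           # TCP socket check: succeeds if a connection can be opened
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
```

A Container Execution Check would use an exec block with a command list in place of httpGet or tcpSocket.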
6.2. Configuring health checks
To configure health checks, create a pod for each type of check you want.
Procedure
To create health checks:
Create a Liveness Container Execution Check:
Create a YAML file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - args:
    image: k8s.gcr.io/liveness
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 15
...
Create the check:
$ oc create -f <file-name>.yaml
Verify the state of the health check pod:
$ oc describe pod liveness-exec
Events:
  Type    Reason     Age  From                                  Message
  ----    ------     ---- ----                                  -------
  Normal  Scheduled  9s   default-scheduler                     Successfully assigned openshift-logging/liveness-exec to ip-10-0-143-40.ec2.internal
  Normal  Pulling    2s   kubelet, ip-10-0-143-40.ec2.internal  pulling image "k8s.gcr.io/liveness"
  Normal  Pulled     1s   kubelet, ip-10-0-143-40.ec2.internal  Successfully pulled image "k8s.gcr.io/liveness"
  Normal  Created    1s   kubelet, ip-10-0-143-40.ec2.internal  Created container
  Normal  Started    1s   kubelet, ip-10-0-143-40.ec2.internal  Started container
Note
The timeoutSeconds parameter has no effect on the Readiness and Liveness probes for Container Execution Checks. OpenShift Container Platform cannot time out on an exec call into the container, so implement the timeout inside the probe itself. One way to do this is to run your liveness or readiness probe through the timeout command:
spec:
  containers:
    livenessProbe:
      exec:
        command:
          - /bin/bash
          - '-c'
          - timeout 60 /opt/eap/bin/livenessProbe.sh  # (1)
      timeoutSeconds: 1
      periodSeconds: 10
      successThreshold: 1
      failureThreshold: 3
- 1
- Timeout value and path to the probe script.
Create a Liveness TCP Socket Check:
Create a YAML file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcp
spec:
  containers:
  - name: container1
    image: k8s.gcr.io/liveness
    ports:
    - containerPort: 8080
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      timeoutSeconds: 1
Create the check:
$ oc create -f <file-name>.yaml
Create a Readiness HTTP Check:
Create a YAML file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - args:
    image: k8s.gcr.io/readiness  # (1)
    readinessProbe:  # (2)
      httpGet:
        # host: my-host  (3)
        # scheme: HTTPS  (4)
        path: /healthz
        port: 8080
      initialDelaySeconds: 15  # (5)
      timeoutSeconds: 1  # (6)
- 1
- Specify the image to use for the readiness probe.
- 2
- Specify the Readiness health check and the type of Readiness check.
- 3
- Specify a host IP address. When host is not defined, the PodIP is used.
- 4
- Specify HTTP or HTTPS. When scheme is not defined, the HTTP scheme is used.
- 5
- Specify the number of seconds before performing the first probe after the container starts.
- 6
- Specify the number of seconds after which the probe times out (timeoutSeconds). The interval between probes is set separately with periodSeconds.
Create the check:
$ oc create -f <file-name>.yaml
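Because a failing readiness check removes the pod from service endpoints, its effect is easiest to observe through a Service. The following is a minimal sketch, assuming the readiness-http pod above; the Service name is hypothetical, and the ports assume the container listens on 8080, the same port the probe uses.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: readiness-http-svc   # hypothetical name
spec:
  selector:
    test: readiness          # matches the label on the readiness-http pod
  ports:
  - port: 8080               # port exposed by the Service
    targetPort: 8080         # assumed container port, same as the probe port
```

While the readiness probe is failing, running oc get endpoints readiness-http-svc should show no ready address for the pod.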