Chapter 7. Working with containers
7.1. Understanding Containers
The basic units of OpenShift Container Platform applications are called containers. Linux container technologies are lightweight mechanisms for isolating running processes so that they are limited to interacting with only their designated resources.
Many application instances can be running in containers on a single host without visibility into each other's processes, files, network, and so on. Typically, each container provides a single service (often called a "microservice"), such as a web server or a database, though containers can be used for arbitrary workloads.
The Linux kernel has been incorporating capabilities for container technologies for years. OpenShift Container Platform and Kubernetes add the ability to orchestrate containers across multi-host installations.
7.1.1. About containers and RHEL kernel memory
Due to Red Hat Enterprise Linux (RHEL) behavior, a container on a node with high CPU usage might seem to consume more memory than expected. The higher memory consumption could be caused by the kmem_cache in the RHEL kernel. The RHEL kernel creates a kmem_cache for each cgroup. For added performance, the kmem_cache contains a cpu_cache and a node cache for any NUMA nodes. These caches all consume kernel memory.
The amount of memory stored in those caches is proportional to the number of CPUs that the system uses. As a result, a higher number of CPUs results in a greater amount of kernel memory being held in these caches. Higher amounts of kernel memory in these caches can cause OpenShift Container Platform containers to exceed the configured memory limits, resulting in the container being killed.
To avoid losing containers due to kernel memory issues, ensure that the containers request sufficient memory. You can use the following formula to estimate the amount of memory consumed by the kmem_cache, where nproc is the number of processing units available that are reported by the nproc command:
$(nproc) X 1/2 MiB
The lower limit of container requests should be this value plus the container memory requirements.
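The following POSIX shell sketch applies the formula above. It assumes a Linux host where nproc is available. It works in KiB (1/2 MiB = 512 KiB) so that integer arithmetic stays exact. The function names and the 256 MiB application requirement are hypothetical placeholders, not recommendations.

```shell
#!/bin/sh
# Estimate the kmem_cache overhead as $(nproc) x 1/2 MiB, expressed in KiB
# (1/2 MiB = 512 KiB) so that POSIX integer arithmetic stays exact.
kmem_overhead_kib() {
  _cpus="$1"
  echo $((_cpus * 512))
}

# Lower limit for a container memory request, in KiB: the kmem_cache
# estimate plus the container's own requirement (given in MiB).
min_request_kib() {
  _cpus="$1"
  _app_mib="$2"
  echo $((_app_mib * 1024 + $(kmem_overhead_kib "$_cpus")))
}

# Example run; 256 MiB is a hypothetical application requirement.
cpus="$(nproc)"
echo "processing units reported by nproc: ${cpus}"
echo "estimated kmem_cache overhead: $(kmem_overhead_kib "$cpus")Ki"
echo "suggested minimum memory request: $(min_request_kib "$cpus" 256)Ki"
```

Consider a node where nproc reports 8. The estimate there is 4096Ki (4 MiB), so a container that itself needs 256 MiB would request at least 266240Ki. The Ki suffix follows the Kubernetes quantity notation used in resources.requests.memory.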
7.1.2. About the container engine and container runtime
A container engine is a piece of software that processes user requests, including command line options and image pulls. The container engine uses a container runtime, also called a lower-level container runtime, to run and manage the components required to deploy and operate containers. You likely will not need to interact with the container engine or container runtime.
The OpenShift Container Platform documentation uses the term container runtime to refer to the lower-level container runtime. Other documentation can refer to the container engine as the container runtime.
OpenShift Container Platform uses CRI-O as the container engine and runC or crun as the container runtime. The default container runtime is runC. Both container runtimes adhere to the Open Container Initiative (OCI) runtime specifications.
CRI-O is a Kubernetes-native container engine implementation that integrates closely with the operating system to deliver an efficient and optimized Kubernetes experience. The CRI-O container engine runs as a systemd service on each OpenShift Container Platform cluster node.
runC, developed by Docker and maintained by the Open Container Project, is a lightweight, portable container runtime written in Go. crun, developed by Red Hat, is a fast and low-memory container runtime fully written in C. As of OpenShift Container Platform 4.15, you can select between the two.
crun has several improvements over runC, including:
- Smaller binary
- Quicker processing
- Lower memory footprint
runC has some benefits over crun, including:
- Most popular OCI container runtime.
- Longer tenure in production.
- Default container runtime of CRI-O.
You can move between the two container runtimes as needed.
For information on setting which container runtime to use, see Creating a ContainerRuntimeConfig CR to edit CRI-O parameters.
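The following shell function is a sketch only. It prints a ContainerRuntimeConfig manifest that sets the default runtime for one machine config pool. The manifest assumes three things: the machineconfiguration.openshift.io/v1 API, the spec.containerRuntimeConfig.defaultRuntime field, and the pools.operator.machineconfiguration.openshift.io/<pool> label. Verify all three against Creating a ContainerRuntimeConfig CR to edit CRI-O parameters for your cluster version before you apply anything. The function name runtime_config_manifest and the generated metadata name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a ContainerRuntimeConfig manifest that selects
# the default OCI runtime (runc or crun) for one machine config pool.
#   $1 runtime (runc or crun), $2 pool name (default: worker)
runtime_config_manifest() {
  _runtime="$1"
  _pool="${2:-worker}"
  case "$_runtime" in
    runc|crun) ;;
    *)
      echo "unsupported runtime: $_runtime (expected runc or crun)" >&2
      return 1
      ;;
  esac
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-${_runtime}-${_pool}
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/${_pool}: ""
  containerRuntimeConfig:
    defaultRuntime: ${_runtime}
EOF
}

# Print the manifest for the worker pool so that it can be reviewed first.
runtime_config_manifest crun worker
```

After reviewing the output, you could pipe it to oc apply -f - to submit it.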
7.2. Using Init Containers to perform tasks before a pod is deployed
OpenShift Container Platform provides init containers, which are specialized containers that run before application containers and can contain utilities or setup scripts not present in an app image.
7.2.1. Understanding Init Containers
You can use an Init Container resource to perform tasks before the rest of a pod is deployed.
A pod can have Init Containers in addition to application containers. Init containers allow you to reorganize setup scripts and binding code.
An Init Container can:
- Contain and run utilities that are not desirable to include in the app Container image for security reasons.
- Contain utilities or custom code for setup that is not present in an app image. For example, there is no requirement to make an image FROM another image just to use a tool like sed, awk, python, or dig during setup.
- Use Linux namespaces so that they have different filesystem views from app containers, such as access to secrets that application containers are not able to access.
Each Init Container must complete successfully before the next one is started. As a result, Init Containers provide an easy way to block or delay the startup of app containers until a set of preconditions is met.
For example, the following are some ways you can use Init Containers:
- Wait for a service to be created with a shell command like:
for i in {1..100}; do sleep 1; if dig myservice; then exit 0; fi; done; exit 1
- Register this pod with a remote server from the downward API with a command like:
$ curl -X POST http://$MANAGEMENT_SERVICE_HOST:$MANAGEMENT_SERVICE_PORT/register -d 'instance=$()&ip=$()'
- Wait for some time before starting the app Container with a command like sleep 60.
- Clone a git repository into a volume.
- Place values into a configuration file and run a template tool to dynamically generate a configuration file for the main app Container. For example, place the POD_IP value in a configuration and generate the main app configuration file using Jinja.
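The wait-for-service pattern above can also be written as a bounded loop. With a bound, an init container fails instead of hanging when a dependency never appears. This sketch resolves the name with getent hosts, the same approach the UBI-based example pod in Creating Init Containers uses. The function name wait_for_host and its retry parameters are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wait until a host name resolves, giving up after a
# fixed number of attempts so that the init container exits non-zero.
#   $1 host name to resolve (for example, a service name such as myservice)
#   $2 maximum number of attempts (default 100)
#   $3 seconds to sleep between attempts (default 1)
wait_for_host() {
  _host="$1"
  _attempts="${2:-100}"
  _delay="${3:-1}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if getent hosts "$_host" >/dev/null 2>&1; then
      echo "$_host resolved after $_i attempt(s)"
      return 0
    fi
    echo "waiting for $_host ($_i/$_attempts)" >&2
    sleep "$_delay"
    _i=$((_i + 1))
  done
  echo "gave up waiting for $_host" >&2
  return 1
}
```

Take an init container whose command ends with wait_for_host myservice 100 1. If myservice never resolves within those attempts, the container exits with a non-zero status, and the pod does not move past that init container.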
See the Kubernetes documentation for more information.
7.2.2. Creating Init Containers
The following example outlines a simple pod that has two Init Containers. The first waits for myservice and the second waits for mydb. After both Init Containers complete, the app container in the pod starts.
Procedure
Create the pod for the Init Container:
Create a YAML file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: myapp-container
    image: registry.access.redhat.com/ubi9/ubi:latest
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  initContainers:
  - name: init-myservice
    image: registry.access.redhat.com/ubi9/ubi:latest
    command: ['sh', '-c', 'until getent hosts myservice; do echo waiting for myservice; sleep 2; done;']
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  - name: init-mydb
    image: registry.access.redhat.com/ubi9/ubi:latest
    command: ['sh', '-c', 'until getent hosts mydb; do echo waiting for mydb; sleep 2; done;']
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
Create the pod:
$ oc create -f myapp.yaml
View the status of the pod:
$ oc get pods
Example output
NAME        READY   STATUS     RESTARTS   AGE
myapp-pod   0/1     Init:0/2   0          5s
The pod status, Init:0/2, indicates that it is waiting for the two services.
Create the myservice service.
Create a YAML file similar to the following:
kind: Service
apiVersion: v1
metadata:
  name: myservice
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9376
Create the service:
$ oc create -f myservice.yaml
View the status of the pod:
$ oc get pods
Example output
NAME        READY   STATUS     RESTARTS   AGE
myapp-pod   0/1     Init:1/2   0          5s
The pod status, Init:1/2, indicates that it is waiting for one service, in this case the mydb service.
Create the mydb service.
Create a YAML file similar to the following:
kind: Service
apiVersion: v1
metadata:
  name: mydb
spec:
  ports:
  - protocol: TCP
    port: 80
    targetPort: 9377
Create the service:
$ oc create -f mydb.yaml
View the status of the pod:
$ oc get pods
Example output
NAME        READY   STATUS    RESTARTS   AGE
myapp-pod   1/1     Running   0          2m
The pod status indicates that it is no longer waiting for the services and is running.
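In this procedure, the STATUS value Init:N/M reports how many init containers have completed (N) out of the total number (M). The following hypothetical shell function is a sketch that decodes that value. It handles only the Init:N/M form shown above. Other statuses, such as Init:Error or Init:CrashLoopBackOff, are rejected.

```shell
#!/bin/sh
# Hypothetical helper: describe an "Init:N/M" pod status, where N is the
# number of init containers that have completed and M is the total number.
describe_init_status() {
  case "$1" in
    Init:*/*)
      _progress="${1#Init:}"      # strip the "Init:" prefix -> "N/M"
      _done="${_progress%/*}"     # text before the slash -> N
      _total="${_progress#*/}"    # text after the slash  -> M
      echo "${_done} of ${_total} init containers completed"
      ;;
    *)
      echo "not an Init:N/M status: $1" >&2
      return 1
      ;;
  esac
}
```

Against a live cluster, you could pass in the STATUS column, for example: describe_init_status "$(oc get pod myapp-pod --no-headers | awk '{print $3}')".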
7.3. Using volumes to persist container data
Files in a container are ephemeral. As such, when a container crashes or stops, the data is lost. You can use volumes to persist the data used by the containers in a pod. A volume is a directory, accessible to the containers in a pod, where data is stored for the life of the pod.
7.3.1. Understanding volumes
Volumes are mounted file systems available to pods and their containers, which can be backed by a number of host-local or network-attached storage endpoints. Containers are not persistent by default; on restart, their contents are cleared.
To ensure that the file system on the volume contains no errors and, if errors are present, to repair them when possible, OpenShift Container Platform invokes the fsck utility prior to the mount utility. This occurs when either adding a volume or updating an existing volume.
The simplest volume type is emptyDir, which is a temporary directory on a single machine. Administrators may also allow you to request a persistent volume that is automatically attached to your pods.
emptyDir volume storage may be restricted by a quota based on the pod's FSGroup, if the FSGroup parameter is enabled by your cluster administrator.
7.3.2. Working with volumes using the OpenShift Container Platform CLI
You can use the CLI command oc set volume to add and remove volumes and volume mounts for any object that has a pod template, like replication controllers or deployment configs. You can also list volumes in pods or any object that has a pod template.
The oc set volume command uses the following general syntax:
$ oc set volume <object_selection> <operation> <mandatory_parameters> <options>
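- Object selection
- Specify one of the following for the object_selection parameter in the oc set volume command:
Syntax | Description | Example |
---|---|---|
<object_type> <name> | Selects <name> of type <object_type>. | deploymentConfig registry |
<object_type>/<name> | Selects <name> of type <object_type>. | deploymentConfig/registry |
<object_type> --selector=<object_label_selector> | Selects resources of type <object_type> that match the given label selector. | deploymentConfig --selector="name=registry" |
<object_type> --all | Selects all resources of type <object_type>. | deploymentConfig --all |
-f or --filename=<file_name> | File name, directory, or URL to file to use to edit the resource. | -f registry-deployment-config.json |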
- Operation
- Specify --add or --remove for the operation parameter in the oc set volume command.
- Mandatory parameters
- Any mandatory parameters are specific to the selected operation and are discussed in later sections.
- Options
- Any options are specific to the selected operation and are discussed in later sections.
7.3.3. Listing volumes and volume mounts in a pod
You can list volumes and volume mounts in pods or pod templates:
Procedure
To list volumes:
$ oc set volume <object_type>/<name> [options]
Supported options for listing volumes:
Option | Description | Default |
---|---|---|
--name | Name of the volume. | |
-c, --containers | Select containers by name. It can also take wildcard '*' that matches any character. | '*' |
For example:
To list all volumes for pod p1:
$ oc set volume pod/p1
To list volume v1 defined on all deployment configs:
$ oc set volume dc --all --name=v1
7.3.4. Adding volumes to a pod
You can add volumes and volume mounts to a pod.
Procedure
To add a volume, a volume mount, or both to pod templates:
$ oc set volume <object_type>/<name> --add [options]
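Option | Description | Default |
---|---|---|
--name | Name of the volume. | Automatically generated, if not specified. |
-t, --type | Name of the volume source. Supported values: emptyDir, hostPath, secret, configmap, persistentVolumeClaim, or projected. | emptyDir |
-c, --containers | Select containers by name. It can also take wildcard '*' that matches any character. | '*' |
-m, --mount-path | Mount path inside the selected containers. Do not mount to the container root, /, or any path that is the same in the host and the container. | |
--path | Host path. Mandatory parameter for --type=hostPath. | |
--secret-name | Name of the secret. Mandatory parameter for --type=secret. | |
--configmap-name | Name of the configmap. Mandatory parameter for --type=configmap. | |
--claim-name | Name of the persistent volume claim. Mandatory parameter for --type=persistentVolumeClaim. | |
--source | Details of volume source as a JSON string. Recommended if the desired volume source is not supported by --type. | |
-o, --output | Display the modified objects instead of updating them on the server. Supported values: json, yaml. | |
--output-version | Output the modified objects with the given version. | |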
For example:
To add a new volume source emptyDir to the registry DeploymentConfig object:
$ oc set volume dc/registry --add
Tip: You can alternatively apply the following YAML to add the volume:
Example 7.1. Sample deployment config with an added volume
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  name: registry
  namespace: registry
spec:
  replicas: 3
  selector:
    app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      volumes: # 1
      - name: volume-pppsw
        emptyDir: {}
      containers:
      - name: httpd
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        ports:
        - containerPort: 8080
          protocol: TCP
- 1
- Add the volume source emptyDir.
To add volume v1 with secret secret1 for replication controller r1 and mount inside the containers at /data:
$ oc set volume rc/r1 --add --name=v1 --type=secret --secret-name='secret1' --mount-path=/data
Tip: You can alternatively apply the following YAML to add the volume:
Example 7.2. Sample replication controller with added volume and secret
kind: ReplicationController
apiVersion: v1
metadata:
  name: example-1
  namespace: example
spec:
  replicas: 0
  selector:
    app: httpd
    deployment: example-1
    deploymentconfig: example
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: httpd
        deployment: example-1
        deploymentconfig: example
    spec:
      volumes: # Add volume v1, backed by secret secret1.
      - name: v1
        secret:
          secretName: secret1
          defaultMode: 420
      containers:
      - name: httpd
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        volumeMounts: # Mount volume v1 at /data.
        - name: v1
          mountPath: /data
To add existing persistent volume v1 with claim name pvc1 to deployment configuration dc.json on disk, mount the volume on container c1 at /data, and update the DeploymentConfig object on the server:
$ oc set volume -f dc.json --add --name=v1 --type=persistentVolumeClaim \
  --claim-name=pvc1 --mount-path=/data --containers=c1
Tip: You can alternatively apply the following YAML to add the volume:
Example 7.3. Sample deployment config with persistent volume added
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  name: example
  namespace: example
spec:
  replicas: 3
  selector:
    app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      volumes:
      - name: volume-pppsw
        emptyDir: {}
      - name: v1 # Add volume v1, backed by persistent volume claim pvc1.
        persistentVolumeClaim:
          claimName: pvc1
      containers:
      - name: httpd
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        ports:
        - containerPort: 8080
          protocol: TCP
        volumeMounts: # Mount volume v1 at /data.
        - name: v1
          mountPath: /data
To add a volume v1 based on Git repository https://github.com/namespace1/project1 with revision 5125c45f9f563 for all replication controllers:
$ oc set volume rc --all --add --name=v1 \
  --source='{"gitRepo": { "repository": "https://github.com/namespace1/project1", "revision": "5125c45f9f563" }}'
7.3.5. Updating volumes and volume mounts in a pod
You can modify the volumes and volume mounts in a pod.
Procedure
Update existing volumes by using the --overwrite option:
$ oc set volume <object_type>/<name> --add --overwrite [options]
For example:
To replace existing volume v1 for replication controller r1 with existing persistent volume claim pvc1:
$ oc set volume rc/r1 --add --overwrite --name=v1 --type=persistentVolumeClaim --claim-name=pvc1
Tip: You can alternatively apply the following YAML to replace the volume:
Example 7.4. Sample replication controller with persistent volume claim named pvc1
kind: ReplicationController
apiVersion: v1
metadata:
  name: example-1
  namespace: example
spec:
  replicas: 0
  selector:
    app: httpd
    deployment: example-1
    deploymentconfig: example
  template:
    metadata:
      labels:
        app: httpd
        deployment: example-1
        deploymentconfig: example
    spec:
      volumes:
      - name: v1 # 1
        persistentVolumeClaim:
          claimName: pvc1
      containers:
      - name: httpd
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        ports:
        - containerPort: 8080
          protocol: TCP
        volumeMounts:
        - name: v1
          mountPath: /data
- 1
- Set the persistent volume claim to pvc1.
To change the DeploymentConfig object d1 mount point to /opt for volume v1:
$ oc set volume dc/d1 --add --overwrite --name=v1 --mount-path=/opt
Tip: You can alternatively apply the following YAML to change the mount point:
Example 7.5. Sample deployment config with mount point set to /opt
kind: DeploymentConfig
apiVersion: apps.openshift.io/v1
metadata:
  name: example
  namespace: example
spec:
  replicas: 3
  selector:
    app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      volumes:
      - name: volume-pppsw
        emptyDir: {}
      - name: v2
        persistentVolumeClaim:
          claimName: pvc1
      - name: v1
        persistentVolumeClaim:
          claimName: pvc1
      containers:
      - name: httpd
        image: >-
          image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
        ports:
        - containerPort: 8080
          protocol: TCP
        volumeMounts: # 1
        - name: v1
          mountPath: /opt
- 1
- Set the mount point to /opt.
7.3.6. Removing volumes and volume mounts from a pod
You can remove a volume or volume mount from a pod.
Procedure
To remove a volume from pod templates:
$ oc set volume <object_type>/<name> --remove [options]
Option | Description | Default |
---|---|---|
--name | Name of the volume. | |
-c, --containers | Select containers by name. It can also take wildcard '*' that matches any character. | '*' |
--confirm | Indicate that you want to remove multiple volumes at once. | |
-o, --output | Display the modified objects instead of updating them on the server. Supported values: json, yaml. | |
--output-version | Output the modified objects with the given version. | |
For example:
To remove a volume v1 from the DeploymentConfig object d1:
$ oc set volume dc/d1 --remove --name=v1
To unmount volume v1 from container c1 for the DeploymentConfig object d1 and remove the volume v1 if it is not referenced by any containers on d1:
$ oc set volume dc/d1 --remove --name=v1 --containers=c1
To remove all volumes for replication controller r1:
$ oc set volume rc/r1 --remove --confirm
7.3.7. Configuring volumes for multiple uses in a pod
You can share one volume for multiple uses in a single pod by using the volumeMounts.subPath property to specify a subPath value inside the volume instead of the volume's root.
You cannot add a subPath parameter to an existing scheduled pod.
Procedure
To view the list of files in the volume, run the oc rsh command:
$ oc rsh <pod>
Example output
sh-4.2$ ls /path/to/volume/subpath/mount
example_file1 example_file2 example_file3
Specify the subPath:
Example Pod spec with subPath parameter
apiVersion: v1
kind: Pod
metadata:
  name: my-site
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: mysql
    image: mysql
    volumeMounts:
    - mountPath: /var/lib/mysql
      name: site-data
      subPath: mysql # MySQL data is stored in the mysql subdirectory of the volume.
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  - name: php
    image: php
    volumeMounts:
    - mountPath: /var/www/html
      name: site-data
      subPath: html # HTML content is stored in the html subdirectory of the volume.
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  volumes:
  - name: site-data
    persistentVolumeClaim:
      claimName: my-site-data
7.4. Mapping volumes using projected volumes
A projected volume maps several existing volume sources into the same directory.
The following types of volume sources can be projected:
- Secrets
- Config Maps
- Downward API
All sources are required to be in the same namespace as the pod.
7.4.1. Understanding projected volumes
Projected volumes can map any combination of these volume sources into a single directory, allowing you to:
- automatically populate a single volume with the keys from multiple secrets, config maps, and with downward API information, so that you can synthesize a single directory with various sources of information;
- populate a single volume with the keys from multiple secrets, config maps, and with downward API information, explicitly specifying paths for each item, so that you can have full control over the contents of that volume.
When the RunAsUser permission is set in the security context of a Linux-based pod, the projected files have the correct permissions set, including container user ownership. However, when the Windows equivalent RunAsUsername permission is set in a Windows pod, the kubelet is unable to correctly set ownership on the files in the projected volume.
Therefore, the RunAsUsername permission set in the security context of a Windows pod is not honored for Windows projected volumes running in OpenShift Container Platform.
The following general scenarios show how you can use projected volumes.
- Config map, secrets, Downward API.
- Projected volumes allow you to deploy containers with configuration data that includes passwords. An application using these resources could be deploying Red Hat OpenStack Platform (RHOSP) on Kubernetes. The configuration data might have to be assembled differently depending on whether the services are going to be used for production or for testing. If a pod is labeled with production or testing, the downward API selector metadata.labels can be used to produce the correct RHOSP configs.
- Config map + secrets.
- Projected volumes allow you to deploy containers involving configuration data and passwords. For example, you might execute a config map with some sensitive encrypted tasks that are decrypted using a vault password file.
- ConfigMap + Downward API.
- Projected volumes allow you to generate a config including the pod name (available via the metadata.name selector). This application can then pass the pod name along with requests to easily determine the source without using IP tracking.
- Secrets + Downward API.
- Projected volumes allow you to use a secret as a public key to encrypt the namespace of the pod (available via the metadata.namespace selector). This example allows the Operator to use the application to deliver the namespace information securely without using an encrypted transport.
7.4.1.1. Example Pod specs
The following are examples of Pod specs for creating projected volumes.
Pod with a secret, a Downward API, and a config map
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: container-test
    image: busybox
    volumeMounts: # 1
    - name: all-in-one
      mountPath: "/projected-volume" # 2
      readOnly: true # 3
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  volumes: # 4
  - name: all-in-one # 5
    projected:
      defaultMode: 0400 # 6
      sources:
      - secret:
          name: mysecret # 7
          items:
          - key: username
            path: my-group/my-username # 8
      - downwardAPI: # 9
          items:
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
          - path: "cpu_limit"
            resourceFieldRef:
              containerName: container-test
              resource: limits.cpu
      - configMap: # 10
          name: myconfigmap
          items:
          - key: config
            path: my-group/my-config
            mode: 0777 # 11
- 1
- Add a volumeMounts section for each container that needs the secret.
- 2
- Specify a path to an unused directory where the secret will appear.
- 3
- Set readOnly to true.
- 4
- Add a volumes block to list each projected volume source.
- 5
- Specify any name for the volume.
- 6
- Set the default permissions on the projected files. The value 0400 grants read access to the file owner only.
- 7
- Add a secret. Enter the name of the secret object. Each secret you want to use must be listed.
- 8
- Specify the path to the secrets file under the mountPath. Here, the secrets file is in /projected-volume/my-group/my-username.
- 9
- Add a Downward API source.
- 10
- Add a ConfigMap source.
- 11
- Set the mode for the specific projection.
If there are multiple containers in the pod, each container needs a volumeMounts section, but only one volumes section is needed.
Pod with multiple secrets with a non-default permission mode set
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: container-test
    image: busybox
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  volumes:
  - name: all-in-one
    projected:
      defaultMode: 0755
      sources:
      - secret:
          name: mysecret
          items:
          - key: username
            path: my-group/my-username
      - secret:
          name: mysecret2
          items:
          - key: password
            path: my-group/my-password
            mode: 511
The defaultMode can only be specified at the projected level and not for each volume source. However, as illustrated above, you can explicitly set the mode for each individual projection.
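Mode values are file permission bits, which are normally written in octal. In YAML, a value with a leading zero, such as 0755, is read as octal. JSON has no octal notation, so the same value must be given in decimal there. This is why mode: 511 in the example above is equivalent to 0777. The following shell sketch converts between the two notations with printf and shell arithmetic. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for converting file modes between the octal form
# usually written in YAML and the decimal form that JSON requires.

# Decimal -> octal, for example 511 -> 777.
mode_to_octal() {
  printf '%o\n' "$1"
}

# Octal -> decimal, for example 0777 or 777 -> 511.
# One leading 0 is stripped and re-added, so shell arithmetic always reads
# the digits as an octal constant.
mode_to_decimal() {
  echo $((0${1#0}))
}
```

For example, mode_to_octal 511 prints 777, and mode_to_decimal 0400 prints 256. Likewise, the defaultMode: 420 used in other examples in this chapter corresponds to 0644.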
7.4.1.2. Pathing Considerations
- Collisions Between Keys when Configured Paths are Identical
- If you configure any keys with the same path, the pod spec will not be accepted as valid. In the following example, the specified paths for mysecret and myconfigmap are the same:
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: container-test
    image: busybox
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items:
          - key: username
            path: my-group/data
      - configMap:
          name: myconfigmap
          items:
          - key: config
            path: my-group/data
Consider the following situations related to the volume file paths.
- Collisions Between Keys without Configured Paths
- The only run-time validation that can occur is when all the paths are known at pod creation, similar to the above scenario. Otherwise, when a conflict occurs the most recent specified resource will overwrite anything preceding it (this is true for resources that are updated after pod creation as well).
- Collisions when One Path is Explicit and the Other is Automatically Projected
- In the event that there is a collision due to a user-specified path matching data that is automatically projected, the latter resource will overwrite anything preceding it, as before.
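Explicitly configured paths are known before the pod is created, so you can check a projected volume for identical paths before you submit it. The following sketch assumes that you have already extracted the configured path values one per line, for example my-group/data from both sources in the example above. The function name find_duplicate_paths is hypothetical. The check does not cover keys that are projected without a configured path.

```shell
#!/bin/sh
# Hypothetical helper: read projected volume paths, one per line, on stdin
# and print each path that appears more than once. Returns 1 if any
# duplicate is found, because such a pod spec would not be accepted.
find_duplicate_paths() {
  _dups="$(sort | uniq -d)"
  if [ -n "$_dups" ]; then
    printf '%s\n' "$_dups"
    return 1
  fi
  return 0
}
```

For example, printf '%s\n' my-group/data my-group/data | find_duplicate_paths prints my-group/data and returns a non-zero status.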
7.4.2. Configuring a Projected Volume for a Pod
When creating projected volumes, consider the volume file path situations described in Understanding projected volumes.
The following example shows how to use a projected volume to mount an existing secret volume source. You can use these steps to create user name and password secrets from local files. You then create a pod that runs one container, using a projected volume to mount the secrets into the same shared directory.
The user name and password values can be any valid string that is base64 encoded.
The following example shows admin in base64:
$ echo -n "admin" | base64
Example output
YWRtaW4=
The following example shows the password 1f2d1e2e67df in base64:
$ echo -n "1f2d1e2e67df" | base64
Example output
MWYyZDFlMmU2N2Rm
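The following shell sketch performs the same encoding and builds a Secret manifest like the one in the procedure below. It assumes a base64 utility that supports -d for decoding, as in GNU coreutils. It uses printf rather than echo -n for portability and removes any line wrapping from the encoded output. The function names are hypothetical. Alternatively, oc create secret generic with --from-literal options performs the encoding for you.

```shell
#!/bin/sh
# Hypothetical helpers for preparing Secret data values.

# Base64-encode a string without a trailing newline and without line wrapping.
b64_encode() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 value back to the original string.
b64_decode() {
  printf '%s' "$1" | base64 -d
}

# Print an Opaque Secret manifest with base64-encoded user and pass keys.
#   $1 secret name, $2 user name, $3 password
secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $1
type: Opaque
data:
  pass: $(b64_encode "$3")
  user: $(b64_encode "$2")
EOF
}

secret_manifest mysecret admin 1f2d1e2e67df
```

After you create the secret, you could check a stored value by decoding a single key, for example: b64_decode "$(oc get secret mysecret -o jsonpath='{.data.user}')".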
Procedure
To use a projected volume to mount an existing secret volume source:
Create the secret:
Create a YAML file similar to the following, replacing the password and user information as appropriate:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  pass: MWYyZDFlMmU2N2Rm
  user: YWRtaW4=
Use the following command to create the secret:
$ oc create -f <secrets-filename>
For example:
$ oc create -f secret.yaml
Example output
secret "mysecret" created
You can check that the secret was created using the following commands:
$ oc get secret <secret-name>
For example:
$ oc get secret mysecret
Example output
NAME       TYPE     DATA   AGE
mysecret   Opaque   2      17h
$ oc get secret <secret-name> -o yaml
For example:
$ oc get secret mysecret -o yaml
apiVersion: v1
data:
  pass: MWYyZDFlMmU2N2Rm
  user: YWRtaW4=
kind: Secret
metadata:
  creationTimestamp: 2017-05-30T20:21:38Z
  name: mysecret
  namespace: default
  resourceVersion: "2107"
  selfLink: /api/v1/namespaces/default/secrets/mysecret
  uid: 959e0424-4575-11e7-9f97-fa163e4bd54c
type: Opaque
Create a pod with a projected volume.
Create a YAML file similar to the following, including a volumes section:
apiVersion: v1
kind: Pod
metadata:
  name: test-projected-volume
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-projected-volume
    image: busybox
    args:
    - sleep
    - "86400"
    volumeMounts:
    - name: all-in-one
      mountPath: "/projected-volume"
      readOnly: true
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: [ALL]
  volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret # 1
- 1
- The name of the secret you created.
Create the pod from the configuration file:
$ oc create -f <your_yaml_file>.yaml
For example:
$ oc create -f secret-pod.yaml
Example output
pod "test-projected-volume" created
Verify that the pod container is running, and then watch for changes to the pod:
$ oc get pod <name>
For example:
$ oc get pod test-projected-volume
The output should appear similar to the following:
Example output
NAME                    READY   STATUS    RESTARTS   AGE
test-projected-volume   1/1     Running   0          14s
In another terminal, use the oc exec command to open a shell to the running container:
$ oc exec -it <pod> -- <command>
For example:
$ oc exec -it test-projected-volume -- /bin/sh
In your shell, verify that the projected-volume directory, which contains your projected sources, is present:
/ # ls
Example output
bin home root tmp dev proc run usr etc projected-volume sys var
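The projected secret source in this example lists no items. Each key of mysecret is therefore expected to appear under /projected-volume as a file named after the key: pass and user. The following POSIX sh sketch checks for those files and is written to run in a minimal shell such as the one in the busybox image. The function name check_projected_keys is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that each expected key is present as a file
# in a projected volume directory.
#   $1 mount path, remaining arguments: expected key names
check_projected_keys() {
  _dir="$1"
  shift
  _missing=0
  for _key in "$@"; do
    # -f follows symbolic links, so it also accepts linked key files.
    if [ -f "$_dir/$_key" ]; then
      echo "found: $_dir/$_key"
    else
      echo "missing: $_dir/$_key" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

Inside the container, you could run check_projected_keys /projected-volume user pass after defining the function in your shell.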
7.5. Allowing containers to consume API objects
The Downward API is a mechanism that allows containers to consume information about API objects without coupling to OpenShift Container Platform. Such information includes the pod’s name, namespace, and resource values. Containers can consume information from the downward API using environment variables or a volume plugin.
7.5.1. Expose pod information to Containers using the Downward API
The Downward API contains such information as the pod’s name, project, and resource values. Containers can consume information from the downward API using environment variables or a volume plugin.
Fields within the pod are selected using the FieldRef API type. FieldRef has two fields:
Field | Description |
---|---|
fieldPath | The path of the field to select, relative to the pod. |
apiVersion | The API version to interpret the fieldPath selector within. |
Currently, the valid selectors in the v1 API include:
Selector | Description |
---|---|
metadata.name | The pod's name. This is supported in both environment variables and volumes. |
metadata.namespace | The pod's namespace. This is supported in both environment variables and volumes. |
metadata.labels | The pod's labels. This is only supported in volumes and not in environment variables. |
metadata.annotations | The pod's annotations. This is only supported in volumes and not in environment variables. |
status.podIP | The pod's IP. This is only supported in environment variables and not volumes. |
The apiVersion field, if not specified, defaults to the API version of the enclosing pod template.
7.5.2. Understanding how to consume container values using the downward API
Your containers can consume API values using environment variables or a volume plugin. Depending on the method you choose, containers can consume:
- Pod name
- Pod project/namespace
- Pod annotations
- Pod labels
Annotations and labels are available using only a volume plugin.
7.5.2.1. Consuming container values using environment variables
When using a container's environment variables, use the EnvVar type's valueFrom field (of type EnvVarSource) to specify that the variable's value should come from a FieldRef source instead of the literal value specified by the value field.
Only constant attributes of the pod can be consumed this way, as environment variables cannot be updated once a process is started in a way that allows the process to be notified that the value of a variable has changed. The fields supported using environment variables are:
- Pod name
- Pod project/namespace
Procedure
Create a new pod spec that contains the environment variables you want the container to consume:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: env-test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  restartPolicy: Never
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
Verification
Check the container’s logs for the MY_POD_NAME and MY_POD_NAMESPACE values:
$ oc logs dapi-env-test-pod
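If you prefer to check this output from a script, the following minimal Python sketch parses the NAME=value lines that the env command writes to the container log. It assumes you have captured the output of oc logs dapi-env-test-pod as a string; the sample log text, including the namespace myproject, is hypothetical.

```python
from typing import Dict


def parse_env_output(text: str) -> Dict[str, str]:
    """Parse `env` output (one NAME=value per line) into a dict.

    Assumes values do not span multiple lines; lines without '=' are skipped.
    """
    env: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # keep only NAME=value lines; split on the first '=' only
            env[name] = value
    return env


# Hypothetical excerpt of `oc logs dapi-env-test-pod` output.
logs = "HOSTNAME=dapi-env-test-pod\nMY_POD_NAME=dapi-env-test-pod\nMY_POD_NAMESPACE=myproject\n"
env = parse_env_output(logs)
print(env["MY_POD_NAME"], env["MY_POD_NAMESPACE"])  # dapi-env-test-pod myproject
```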
7.5.2.2. Consuming container values using a volume plugin
Your containers can consume API values by using a volume plugin.
Containers can consume:
- Pod name
- Pod project/namespace
- Pod annotations
- Pod labels
Procedure
To use the volume plugin:
Create a new pod spec that contains the downward API volume you want the container to consume:
Create a volume-pod.yaml file similar to the following:
kind: Pod
apiVersion: v1
metadata:
  labels:
    zone: us-east-coast
    cluster: downward-api-test-cluster1
    rack: rack-123
  name: dapi-volume-test-pod
  annotations:
    annotation1: "345"
    annotation2: "456"
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: volume-test-container
      image: gcr.io/google_containers/busybox
      command: ["sh", "-c", "cat /tmp/etc/pod_labels /tmp/etc/pod_annotations"]
      volumeMounts:
        - name: podinfo
          mountPath: /tmp/etc
          readOnly: false
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  volumes:
    - name: podinfo
      downwardAPI:
        defaultMode: 420
        items:
          - fieldRef:
              fieldPath: metadata.name
            path: pod_name
          - fieldRef:
              fieldPath: metadata.namespace
            path: pod_namespace
          - fieldRef:
              fieldPath: metadata.labels
            path: pod_labels
          - fieldRef:
              fieldPath: metadata.annotations
            path: pod_annotations
  restartPolicy: Never
# ...
Create the pod from the volume-pod.yaml file:
$ oc create -f volume-pod.yaml
Verification
Check the container’s logs and verify the presence of the configured fields:
$ oc logs dapi-volume-test-pod
Example output
cluster=downward-api-test-cluster1
rack=rack-123
zone=us-east-coast
annotation1=345
annotation2=456
kubernetes.io/config.source=api
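To consume these files from application code, you can parse them line by line. The following Python sketch assumes each line holds one key=value pair. Depending on the version, values might be written wrapped in double quotes, so the sketch strips surrounding quotes but does not decode escape sequences inside them. The function name parse_downward_file is hypothetical.

```python
from typing import Dict


def parse_downward_file(text: str) -> Dict[str, str]:
    """Parse a downward API labels or annotations file into a dict.

    Assumes one key=value pair per line. Surrounding double quotes are
    stripped, but escape sequences inside quoted values are not decoded.
    """
    result: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes only
        result[key] = value
    return result


sample = 'cluster=downward-api-test-cluster1\nrack=rack-123\nzone="us-east-coast"\n'
labels = parse_downward_file(sample)
print(labels["zone"])  # us-east-coast
```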
7.5.3. Understanding how to consume container resources using the Downward API
When creating pods, you can use the Downward API to inject information about computing resource requests and limits so that image and application authors can correctly create an image for specific environments.
You can do this by using environment variables or a volume plugin.
7.5.3.1. Consuming container resources using environment variables
When creating pods, you can use the Downward API to inject information about computing resource requests and limits using environment variables.
When creating the pod configuration, specify environment variables that correspond to the contents of the resources field in the spec.containers field.
If the resource limits are not included in the container configuration, the downward API defaults to the node’s CPU and memory allocatable values.
Procedure
Create a new pod spec that contains the resources you want to inject:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "env" ]
      resources:
        requests:
          memory: "32Mi"
          cpu: "125m"
        limits:
          memory: "64Mi"
          cpu: "250m"
      env:
        - name: MY_CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              resource: requests.cpu
        - name: MY_CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: MY_MEM_REQUEST
          valueFrom:
            resourceFieldRef:
              resource: requests.memory
        - name: MY_MEM_LIMIT
          valueFrom:
            resourceFieldRef:
              resource: limits.memory
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
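The values that the container sees are not always the literal quantities from the pod spec. To our understanding of upstream Kubernetes behavior, the quantity is divided by the optional divisor field of resourceFieldRef (default "1") and rounded up to a whole number, so with the example above MY_CPU_LIMIT is 1 rather than 250m, and MY_MEM_LIMIT is 67108864 bytes. The following Python sketch models that arithmetic for a subset of quantity suffixes; it is an illustration, not the Kubernetes implementation.

```python
import math
from fractions import Fraction

# Subset of Kubernetes quantity suffixes (no E, P, Ei, or Pi).
# Two-letter binary suffixes are listed first so "Mi" is tried before "M".
SUFFIXES = {
    "Ki": Fraction(2**10), "Mi": Fraction(2**20), "Gi": Fraction(2**30), "Ti": Fraction(2**40),
    "k": Fraction(10**3), "M": Fraction(10**6), "G": Fraction(10**9), "T": Fraction(10**12),
    "m": Fraction(1, 1000),
}


def parse_quantity(quantity: str) -> Fraction:
    """Convert a quantity such as '250m' or '64Mi' to an exact number."""
    for suffix, multiplier in SUFFIXES.items():
        if quantity.endswith(suffix):
            return Fraction(quantity[: -len(suffix)]) * multiplier
    return Fraction(quantity)  # plain number, for example "1" or "0.5"


def exposed_value(quantity: str, divisor: str = "1") -> int:
    """Value seen by the container: quantity divided by divisor, rounded up."""
    return math.ceil(parse_quantity(quantity) / parse_quantity(divisor))


print(exposed_value("250m"))        # 1
print(exposed_value("125m", "1m"))  # 125
print(exposed_value("64Mi"))        # 67108864
print(exposed_value("32Mi", "1Mi")) # 32
```

Setting a divisor such as 1m for CPU or 1Mi for memory is how you would keep the injected numbers readable and avoid the rounding to whole cores.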
7.5.3.2. Consuming container resources using a volume plugin
When creating pods, you can use the Downward API to inject information about computing resource requests and limits using a volume plugin.
When creating the pod configuration, use the spec.volumes.downwardAPI.items field to describe the desired resources that correspond to the spec.containers.resources field.
If the resource limits are not included in the container configuration, the Downward API defaults to the node’s CPU and memory allocatable values.
Procedure
Create a new pod spec that contains the resources you want to inject:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  containers:
    - name: client-container
      image: gcr.io/google_containers/busybox:1.24
      command: ["sh", "-c", "while true; do echo; if [[ -e /etc/cpu_limit ]]; then cat /etc/cpu_limit; fi; if [[ -e /etc/cpu_request ]]; then cat /etc/cpu_request; fi; if [[ -e /etc/mem_limit ]]; then cat /etc/mem_limit; fi; if [[ -e /etc/mem_request ]]; then cat /etc/mem_request; fi; sleep 5; done"]
      resources:
        requests:
          memory: "32Mi"
          cpu: "125m"
        limits:
          memory: "64Mi"
          cpu: "250m"
      volumeMounts:
        - name: podinfo
          mountPath: /etc
          readOnly: false
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            resourceFieldRef:
              containerName: client-container
              resource: limits.cpu
          - path: "cpu_request"
            resourceFieldRef:
              containerName: client-container
              resource: requests.cpu
          - path: "mem_limit"
            resourceFieldRef:
              containerName: client-container
              resource: limits.memory
          - path: "mem_request"
            resourceFieldRef:
              containerName: client-container
              resource: requests.memory
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
7.5.4. Consuming secrets using the Downward API
When creating pods, you can use the downward API to inject secrets so image and application authors can create an image for specific environments.
Procedure
Create a secret to inject:
Create a secret.yaml file similar to the following. The values in the data field must be base64-encoded:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
data:
  password: <password>
  username: <username>
type: kubernetes.io/basic-auth
Create the secret object from the secret.yaml file:
$ oc create -f secret.yaml
Create a pod that references the username field from the above Secret object:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: env-test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: MY_SECRET_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  restartPolicy: Never
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
Verification
Check the container’s logs for the MY_SECRET_USERNAME value:
$ oc logs dapi-env-test-pod
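The data field of a Secret expects base64-encoded values. A minimal Python sketch for producing and checking them follows; the username admin is a hypothetical value. Alternatively, the stringData field of a Secret accepts plain-text values and the API server stores them encoded in data for you.

```python
import base64
from typing import Dict


def encode_secret_data(values: Dict[str, str]) -> Dict[str, str]:
    """Base64-encode plain-text values for the data field of a Secret."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }


def decode_secret_data(data: Dict[str, str]) -> Dict[str, str]:
    """Reverse the encoding, for example to check what a Secret contains."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


print(encode_secret_data({"username": "admin"}))  # {'username': 'YWRtaW4='}
```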
7.5.5. Consuming configuration maps using the Downward API
When creating pods, you can use the Downward API to inject configuration map values so image and application authors can create an image for specific environments.
Procedure
Create a config map with the values to inject:
Create a configmap.yaml file similar to the following:
apiVersion: v1
kind: ConfigMap
metadata:
  name: myconfigmap
data:
  mykey: myvalue
Create the config map from the configmap.yaml file:
$ oc create -f configmap.yaml
Create a pod that references the above config map:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: env-test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: MY_CONFIGMAP_VALUE
          valueFrom:
            configMapKeyRef:
              name: myconfigmap
              key: mykey
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  restartPolicy: Always
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
Verification
Check the container’s logs for the MY_CONFIGMAP_VALUE value:
$ oc logs -p dapi-env-test-pod
7.5.6. Referencing environment variables
When creating pods, you can reference the value of a previously defined environment variable by using the $(VAR_NAME) syntax. If the environment variable reference cannot be resolved, the reference is left unchanged as the provided string.
Procedure
Create a pod that references an existing environment variable:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: env-test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: MY_EXISTING_ENV
          value: my_value
        - name: MY_ENV_VAR_REF_ENV
          value: $(MY_EXISTING_ENV)
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  restartPolicy: Never
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
Verification
Check the container’s logs for the MY_ENV_VAR_REF_ENV value:
$ oc logs dapi-env-test-pod
7.5.7. Escaping environment variable references
When creating a pod, you can escape an environment variable reference by using a double dollar sign. The double dollar sign is reduced to a single one and the reference is not expanded; for example, $$(SOME_OTHER_ENV) produces the literal string $(SOME_OTHER_ENV).
Procedure
Create a pod that contains an escaped environment variable reference:
Create a pod.yaml file similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dapi-env-test-pod
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: env-test-container
      image: gcr.io/google_containers/busybox
      command: [ "/bin/sh", "-c", "env" ]
      env:
        - name: MY_NEW_ENV
          value: $$(SOME_OTHER_ENV)
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  restartPolicy: Never
# ...
Create the pod from the pod.yaml file:
$ oc create -f pod.yaml
Verification
Check the container’s logs for the MY_NEW_ENV value:
$ oc logs dapi-env-test-pod
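The reference and escaping rules in this section and the previous one can be summarized in a small model. The following Python sketch is a simplified re-implementation written for illustration, not the Kubernetes code: $(NAME) is replaced only when NAME was defined earlier in the env list, an unresolvable reference is left unchanged, and $$ collapses to a single $. Service environment variables, which Kubernetes can also reference, are not modeled.

```python
from typing import Dict, Iterable, Tuple


def expand(value: str, env: Dict[str, str]) -> str:
    """Expand $(NAME) references in value using the variables in env."""
    out = []
    i = 0
    while i < len(value):
        if value.startswith("$$", i):
            out.append("$")  # escaped dollar sign: $$(X) becomes the literal $(X)
            i += 2
        elif value.startswith("$(", i):
            end = value.find(")", i + 2)
            if end == -1:
                out.append("$(")  # unterminated reference: emit literally, keep scanning
                i += 2
            else:
                name = value[i + 2:end]
                # Unknown names are left exactly as written, e.g. "$(MISSING)".
                out.append(env.get(name, value[i:end + 1]))
                i = end + 1
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def resolve_env(entries: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Resolve env entries in order; each entry sees only the entries before it."""
    env: Dict[str, str] = {}
    for name, raw in entries:
        env[name] = expand(raw, env)
    return env


env = resolve_env([("MY_EXISTING_ENV", "my_value"), ("MY_ENV_VAR_REF_ENV", "$(MY_EXISTING_ENV)")])
print(env["MY_ENV_VAR_REF_ENV"])            # my_value
print(expand("$$(SOME_OTHER_ENV)", {}))     # $(SOME_OTHER_ENV)
```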
7.6. Copying files to or from an OpenShift Container Platform container
You can use the CLI to copy local files to or from a remote directory in a container using the rsync
command.
7.6.1. Understanding how to copy files
The oc rsync
command, or remote sync, is a useful tool for copying database archives to and from your pods for backup and restore purposes. You can also use oc rsync
to copy source code changes into a running pod for development debugging, when the running pod supports hot reload of source files.
$ oc rsync <source> <destination> [-c <container>]
7.6.1.1. Requirements
- Specifying the Copy Source
The source argument of the oc rsync command must point to either a local directory or a pod directory. Individual files are not supported.
When specifying a pod directory, the directory name must be prefixed with the pod name:
<pod name>:<dir>
If the directory name ends in a path separator (/), only the contents of the directory are copied to the destination. Otherwise, the directory and its contents are copied to the destination.
- Specifying the Copy Destination
The destination argument of the oc rsync command must point to a directory. If the directory does not exist and rsync is used for the copy, the directory is created for you.
- Deleting Files at the Destination
The --delete flag may be used to delete any files in the remote directory that are not in the local directory.
- Continuous Syncing on File Change
Using the --watch option causes the command to monitor the source path for any file system changes, and synchronizes changes when they occur. With this argument, the command runs forever.
Synchronization occurs after short quiet periods to ensure a rapidly changing file system does not result in continuous synchronization calls.
When using the --watch option, the behavior is effectively the same as manually invoking oc rsync repeatedly, including any arguments normally passed to oc rsync. Therefore, you can control the behavior by using the same flags used with manual invocations of oc rsync, such as --delete.
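The trailing-separator rule and the <pod name>:<dir> syntax can be illustrated with a short Python sketch. It only computes paths and copies nothing; the helper names are hypothetical, and the rule used to tell a pod prefix from a local path is a simplification that assumes POSIX-style local paths.

```python
import posixpath
from typing import Optional, Tuple


def split_location(spec: str) -> Tuple[Optional[str], str]:
    """Split '<pod name>:<dir>' into (pod, dir); a local path yields (None, path).

    Simplified rule: a Windows drive letter such as C: would be misread as a pod.
    """
    pod, sep, path = spec.partition(":")
    if sep and pod and "/" not in pod:
        return pod, path
    return None, spec


def copy_root(source_dir: str, dest_dir: str) -> str:
    """Return the directory under dest_dir that receives the copied files."""
    if source_dir.endswith("/"):
        # Trailing separator: only the directory contents are copied.
        return dest_dir
    # No trailing separator: the directory itself is recreated inside dest_dir.
    return posixpath.join(dest_dir, posixpath.basename(source_dir))


print(split_location("devpod1234:/src"))         # ('devpod1234', '/src')
print(copy_root("/home/user/source", "/src"))    # /src/source
print(copy_root("/home/user/source/", "/src"))   # /src
```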
7.6.2. Copying files to and from containers
Support for copying local files to or from a container is built into the CLI.
Prerequisites
When working with oc rsync, note the following:
- rsync must be installed. The oc rsync command uses the local rsync tool, if present on the client machine and the remote container.
If rsync is not found locally or in the remote container, a tar archive is created locally and sent to the container where the tar utility is used to extract the files. If tar is not available in the remote container, the copy will fail.
The tar copy method does not provide the same functionality as oc rsync. For example, oc rsync creates the destination directory if it does not exist and only sends files that are different between the source and the destination.
Note
In Windows, the cwRsync client should be installed and added to the PATH for use with the oc rsync command.
Procedure
To copy a local directory to a pod directory:
$ oc rsync <local-dir> <pod-name>:/<remote-dir> -c <container-name>
For example:
$ oc rsync /home/user/source devpod1234:/src -c user-container
To copy a pod directory to a local directory:
$ oc rsync devpod1234:/src /home/user/source
Example output
$ oc rsync devpod1234:/src/status.txt /home/user/
7.6.3. Using advanced Rsync features
The oc rsync command exposes fewer command-line options than standard rsync. If you want to use a standard rsync command-line option that is not available in oc rsync, for example the --exclude-from=FILE option, you might be able to use the standard rsync --rsh (-e) option or the RSYNC_RSH environment variable as a workaround, as follows:
$ rsync --rsh='oc rsh' --exclude-from=<file_name> <local-dir> <pod-name>:/<remote-dir>
or:
Export the RSYNC_RSH
variable:
$ export RSYNC_RSH='oc rsh'
Then, run the rsync command:
$ rsync --exclude-from=<file_name> <local-dir> <pod-name>:/<remote-dir>
Both of the above examples configure standard rsync
to use oc rsh
as its remote shell program to enable it to connect to the remote pod, and are an alternative to running oc rsync
.
7.7. Executing remote commands in an OpenShift Container Platform container
You can use the CLI to execute remote commands in an OpenShift Container Platform container.
7.7.1. Executing remote commands in containers
Support for remote container command execution is built into the CLI.
Procedure
To run a command in a container:
$ oc exec <pod> [-c <container>] -- <command> [<arg_1> ... <arg_n>]
For example:
$ oc exec mypod -- date
Example output
Thu Apr 9 02:21:53 UTC 2015
For security purposes, the oc exec
command does not work when accessing privileged containers except when the command is executed by a cluster-admin
user.
7.7.2. Protocol for initiating a remote command from a client
Clients initiate the execution of a remote command in a container by issuing a request to the Kubernetes API server:
/proxy/nodes/<node_name>/exec/<namespace>/<pod>/<container>?command=<command>
In the above URL:
- <node_name> is the FQDN of the node.
- <namespace> is the project of the target pod.
- <pod> is the name of the target pod.
- <container> is the name of the target container.
- <command> is the desired command to be executed.
For example:
/proxy/nodes/node123.openshift.com/exec/myns/mypod/mycontainer?command=date
Additionally, the client can add parameters to the request to indicate if:
- the client should send input to the remote container’s command (stdin).
- the client’s terminal is a TTY.
- the remote container’s command should send output from stdout to the client.
- the remote container’s command should send output from stderr to the client.
After sending an exec
request to the API server, the client upgrades the connection to one that supports multiplexed streams; the current implementation uses HTTP/2.
The client creates one stream each for stdin, stdout, and stderr. To distinguish among the streams, the client sets the streamType
header on the stream to one of stdin
, stdout
, or stderr
.
The client closes all streams, the upgraded connection, and the underlying connection when it is finished with the remote command execution request.
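The request path and stream layout described above can be sketched in Python as follows. The example names come from this section. Sending each command argument as a repeated command query parameter is an assumption about how multi-argument commands are encoded, and the stdin, TTY, stdout, and stderr request parameters are deliberately omitted because their names are not given here.

```python
from typing import Dict, List, Sequence
from urllib.parse import quote, urlencode


def exec_path(node: str, namespace: str, pod: str, container: str,
              command: Sequence[str]) -> str:
    """Build /proxy/nodes/<node_name>/exec/<namespace>/<pod>/<container>?command=..."""
    node_part = quote(node, safe="")
    target = "/".join(quote(part, safe="") for part in (namespace, pod, container))
    # Assumption: one `command` parameter per argument, in order.
    query = urlencode([("command", arg) for arg in command])
    return f"/proxy/nodes/{node_part}/exec/{target}?{query}"


def stream_headers(stdin: bool, stdout: bool, stderr: bool) -> List[Dict[str, str]]:
    """Headers for each stream the client opens, tagged with streamType."""
    wanted = (("stdin", stdin), ("stdout", stdout), ("stderr", stderr))
    return [{"streamType": name} for name, enabled in wanted if enabled]


print(exec_path("node123.openshift.com", "myns", "mypod", "mycontainer", ["date"]))
# /proxy/nodes/node123.openshift.com/exec/myns/mypod/mycontainer?command=date
print(stream_headers(False, True, True))
# [{'streamType': 'stdout'}, {'streamType': 'stderr'}]
```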
7.8. Using port forwarding to access applications in a container
OpenShift Container Platform supports port forwarding to pods.
7.8.1. Understanding port forwarding
You can use the CLI to forward one or more local ports to a pod. This allows you to listen on a given or random port locally, and have data forwarded to and from given ports in the pod.
Support for port forwarding is built into the CLI:
$ oc port-forward <pod> [<local_port>:]<remote_port> [...[<local_port_n>:]<remote_port_n>]
The CLI listens on each local port specified by the user, forwarding using the protocol described below.
Ports may be specified using the following formats:
Format | Description |
---|---|
| 5000 | The client listens on port 5000 locally and forwards to 5000 in the pod. |
| 6000:5000 | The client listens on port 6000 locally and forwards to 5000 in the pod. |
| :5000 or 0:5000 | The client selects a free local port and forwards to 5000 in the pod. |
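The port formats above can be parsed with a few lines of Python. In this sketch, which is not the oc implementation, a local port of 0 stands for "let the client choose a free local port", so :5000 and 0:5000 produce the same result.

```python
from typing import Tuple


def parse_port_spec(spec: str) -> Tuple[int, int]:
    """Parse an oc port-forward port argument into (local_port, remote_port).

    A local port of 0 means the client picks any free local port.
    """
    local_s, sep, remote_s = spec.partition(":")
    if not sep:
        remote_s = local_s  # "5000": same port number locally and in the pod
    local = int(local_s) if local_s else 0  # ":5000" behaves like "0:5000"
    remote = int(remote_s)
    if not (0 <= local <= 65535 and 1 <= remote <= 65535):
        raise ValueError(f"invalid port specification: {spec!r}")
    return local, remote


for spec in ("5000", "6000:5000", ":5000", "0:5000"):
    print(spec, parse_port_spec(spec))
```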
OpenShift Container Platform handles port-forward requests from clients. Upon receiving a request, OpenShift Container Platform upgrades the response and waits for the client to create port-forwarding streams. When OpenShift Container Platform receives a new stream, it copies data between the stream and the pod’s port.
Architecturally, there are options for forwarding to a pod’s port. The supported OpenShift Container Platform implementation invokes nsenter
directly on the node host to enter the pod’s network namespace, then invokes socat
to copy data between the stream and the pod’s port. However, a custom implementation could include running a helper pod that then runs nsenter
and socat
, so that those binaries are not required to be installed on the host.
7.8.2. Using port forwarding
You can use the CLI to port-forward one or more local ports to a pod.
Procedure
Use the following command to listen on the specified port in a pod:
$ oc port-forward <pod> [<local_port>:]<remote_port> [...[<local_port_n>:]<remote_port_n>]
For example:
Use the following command to listen on ports 5000 and 6000 locally and forward data to and from ports 5000 and 6000 in the pod:
$ oc port-forward <pod> 5000 6000
Example output
Forwarding from 127.0.0.1:5000 -> 5000
Forwarding from [::1]:5000 -> 5000
Forwarding from 127.0.0.1:6000 -> 6000
Forwarding from [::1]:6000 -> 6000
Use the following command to listen on port 8888 locally and forward to 5000 in the pod:
$ oc port-forward <pod> 8888:5000
Example output
Forwarding from 127.0.0.1:8888 -> 5000
Forwarding from [::1]:8888 -> 5000
Use the following command to listen on a free port locally and forward to 5000 in the pod:
$ oc port-forward <pod> :5000
Example output
Forwarding from 127.0.0.1:42390 -> 5000
Forwarding from [::1]:42390 -> 5000
Or:
$ oc port-forward <pod> 0:5000
7.8.3. Protocol for initiating port forwarding from a client
Clients initiate port forwarding to a pod by issuing a request to the Kubernetes API server:
/proxy/nodes/<node_name>/portForward/<namespace>/<pod>
In the above URL:
- <node_name> is the FQDN of the node.
- <namespace> is the namespace of the target pod.
- <pod> is the name of the target pod.
For example:
/proxy/nodes/node123.openshift.com/portForward/myns/mypod
After sending a port forward request to the API server, the client upgrades the connection to one that supports multiplexed streams; the current implementation uses Hypertext Transfer Protocol Version 2 (HTTP/2).
The client creates a stream with the port
header containing the target port in the pod. All data written to the stream is delivered via the kubelet to the target pod and port. Similarly, all data sent from the pod for that forwarded connection is delivered back to the same stream in the client.
The client closes all streams, the upgraded connection, and the underlying connection when it is finished with the port forwarding request.
7.9. Using sysctls in containers
Sysctl settings are exposed through Kubernetes, allowing users to modify certain kernel parameters at runtime. Only sysctls that are namespaced can be set independently on pods. A sysctl that is not namespaced is called node-level, and you must set it by another method, such as by using the Node Tuning Operator.
Network sysctls are a special category of sysctl. Network sysctls include:
- System-wide sysctls, for example net.ipv4.ip_local_port_range, that are valid for all networking. You can set these independently for each pod on a node.
- Interface-specific sysctls, for example net.ipv4.conf.IFNAME.accept_local, that only apply to a specific additional network interface for a given pod. You can set these independently for each additional network configuration. You set these by using a configuration in the tuning-cni after the network interfaces are created.
Moreover, only sysctls that are considered safe are allowed by default; you can manually enable other, unsafe sysctls on the node to make them available to the user.
If the sysctl you are setting is node-level, you can find information on this procedure in the section Using the Node Tuning Operator.
7.9.1. About sysctls
In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available from the /proc/sys/
virtual process file system. The parameters cover various subsystems, such as:
- kernel (common prefix: kernel.)
- networking (common prefix: net.)
- virtual memory (common prefix: vm.)
- MDADM (common prefix: dev.)
More subsystems are described in Kernel documentation. To get a list of all parameters, run:
$ sudo sysctl -a
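Because each parameter is a file under /proc/sys/, a dotted sysctl name maps directly to a path. The following Python sketch shows that mapping and groups names by the prefixes listed above. It assumes no name component contains a literal dot (VLAN interface names such as eth0.100 do, and need special handling), and read_sysctl only works on Linux.

```python
from pathlib import Path, PurePosixPath
from typing import Optional

# Prefixes listed in this section; other subsystems exist (for example fs.).
SUBSYSTEMS = {
    "kernel.": "kernel",
    "net.": "networking",
    "vm.": "virtual memory",
    "dev.": "MDADM",
}


def proc_path(name: str) -> PurePosixPath:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return PurePosixPath("/proc/sys", *name.split("."))


def subsystem(name: str) -> Optional[str]:
    """Return the subsystem label for a sysctl name, or None if not listed."""
    for prefix, label in SUBSYSTEMS.items():
        if name.startswith(prefix):
            return label
    return None


def read_sysctl(name: str) -> str:
    """Read the current value, like `sysctl -n <name>` (Linux only)."""
    return Path(proc_path(name)).read_text().strip()


print(proc_path("net.ipv4.tcp_syncookies"))  # /proc/sys/net/ipv4/tcp_syncookies
print(subsystem("vm.swappiness"))            # virtual memory
```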
7.9.2. Namespaced and node-level sysctls
A number of sysctls are namespaced in the Linux kernel. This means that you can set them independently for each pod on a node. Being namespaced is a requirement for sysctls to be accessible in a pod context within Kubernetes.
The following sysctls are known to be namespaced:
- kernel.shm*
- kernel.msg*
- kernel.sem
- fs.mqueue.*
Additionally, most of the sysctls in the net.*
group are known to be namespaced. Their namespace adoption differs based on the kernel version and distributor.
Sysctls that are not namespaced are called node-level and must be set manually by the cluster administrator, either by means of the underlying Linux distribution of the nodes, such as by modifying the /etc/sysctl.conf
file, or by using a daemon set with privileged containers. You can use the Node Tuning Operator to set node-level sysctls.
Consider marking nodes with special sysctls as tainted. Only schedule pods onto them that need those sysctl settings. Use the taints and toleration feature to mark the nodes.
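As a rough first check, you can compare a sysctl name against the namespaced patterns listed above. The following Python sketch does only that: because namespacing of net.* sysctls depends on the kernel version, it reports those as needing a kernel check, and it conservatively treats every other name as node-level. The function name namespacing is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns listed in this section as known to be namespaced.
KNOWN_NAMESPACED = ("kernel.shm*", "kernel.msg*", "kernel.sem", "fs.mqueue.*")


def namespacing(name: str) -> str:
    """Classify a sysctl name as 'namespaced', 'check kernel', or 'node-level'."""
    if any(fnmatchcase(name, pattern) for pattern in KNOWN_NAMESPACED):
        return "namespaced"
    if name.startswith("net."):
        # Most net.* sysctls are namespaced, but this varies by kernel version.
        return "check kernel"
    return "node-level"  # conservative default for anything not listed


for name in ("kernel.shm_rmid_forced", "net.core.somaxconn", "vm.swappiness"):
    print(name, namespacing(name))
```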
7.9.3. Safe and unsafe sysctls
Sysctls are grouped into safe and unsafe sysctls.
For system-wide sysctls to be considered safe, they must be namespaced. A namespaced sysctl ensures there is isolation between namespaces and therefore pods. If you set a sysctl for one pod, it must not do any of the following:
- Influence any other pod on the node
- Harm the node health
- Gain CPU or memory resources outside of the resource limits of a pod
Being namespaced alone is not sufficient for the sysctl to be considered safe.
Any sysctl that is not added to the allowed list on OpenShift Container Platform is considered unsafe for OpenShift Container Platform.
Unsafe sysctls are not allowed by default. For system-wide sysctls the cluster administrator must manually enable them on a per-node basis. Pods with disabled unsafe sysctls are scheduled but do not launch.
You cannot manually enable interface-specific unsafe sysctls.
OpenShift Container Platform adds the following system-wide and interface-specific safe sysctls to an allowed safe list:
sysctl | Description |
---|---|
|
When set to |
|
Defines the local port range that is used by TCP and UDP to choose the local port. The first number is the first port number, and the second number is the last local port number. If possible, it is better if these numbers have different parity (one even and one odd value). They must be greater than or equal to |
|
When |
|
This restricts |
|
This defines the first unprivileged port in the network namespace. To disable all privileged ports, set this to |
| Specify a range of comma-separated local ports that you want to reserve for applications or services. |
sysctl | Description |
---|---|
| net.ipv4.conf.IFNAME.accept_redirects | Accept IPv4 ICMP redirect messages. |
| net.ipv4.conf.IFNAME.accept_source_route | Accept IPv4 packets with strict source route (SRR) option. |
| net.ipv4.conf.IFNAME.arp_accept | Define behavior for gratuitous ARP frames with an IPv4 address that is not already present in the ARP table. |
| net.ipv4.conf.IFNAME.arp_notify | Define mode for notification of IPv4 address and device changes. |
| net.ipv4.conf.IFNAME.disable_policy | Disable IPSEC policy (SPD) for this IPv4 interface. |
| net.ipv4.conf.IFNAME.secure_redirects | Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list. |
| net.ipv4.conf.IFNAME.send_redirects | Send redirects is enabled only if the node acts as a router. That is, a host should not send an ICMP redirect message. It is used by routers to notify the host about a better routing path that is available for a particular destination. |
| net.ipv6.conf.IFNAME.accept_ra | Accept IPv6 Router advertisements; autoconfigure using them. It also determines whether or not to transmit router solicitations. Router solicitations are transmitted only if the functional setting is to accept router advertisements. |
| net.ipv6.conf.IFNAME.accept_redirects | Accept IPv6 ICMP redirect messages. |
| net.ipv6.conf.IFNAME.accept_source_route | Accept IPv6 packets with SRR option. |
| net.ipv6.conf.IFNAME.arp_accept | Define behavior for gratuitous ARP frames with an IPv6 address that is not already present in the ARP table. |
| net.ipv6.conf.IFNAME.arp_notify | Define mode for notification of IPv6 address and device changes. |
| net.ipv6.neigh.IFNAME.base_reachable_time_ms | This parameter controls the hardware address to IP mapping lifetime in the neighbour table for IPv6. |
| net.ipv6.neigh.IFNAME.retrans_time_ms | Set the retransmit timer for neighbor discovery messages. |
When setting these values using the tuning
CNI plugin, use the value IFNAME
literally. The interface name is represented by the IFNAME
token, and is replaced with the actual name of the interface at runtime.
7.9.4. Updating the interface-specific safe sysctls list
OpenShift Container Platform includes a predefined list of safe interface-specific sysctls. You can modify this list by updating the cni-sysctl-allowlist config map in the openshift-multus namespace.
The support for updating the interface-specific safe sysctls list is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Follow this procedure to modify the predefined list of safe sysctls. This procedure describes how to extend the default allow list.
Procedure
View the existing predefined list by running the following command:
$ oc get cm -n openshift-multus cni-sysctl-allowlist -oyaml
Expected output
apiVersion: v1
data:
  allowlist.conf: |-
    ^net.ipv4.conf.IFNAME.accept_redirects$
    ^net.ipv4.conf.IFNAME.accept_source_route$
    ^net.ipv4.conf.IFNAME.arp_accept$
    ^net.ipv4.conf.IFNAME.arp_notify$
    ^net.ipv4.conf.IFNAME.disable_policy$
    ^net.ipv4.conf.IFNAME.secure_redirects$
    ^net.ipv4.conf.IFNAME.send_redirects$
    ^net.ipv6.conf.IFNAME.accept_ra$
    ^net.ipv6.conf.IFNAME.accept_redirects$
    ^net.ipv6.conf.IFNAME.accept_source_route$
    ^net.ipv6.conf.IFNAME.arp_accept$
    ^net.ipv6.conf.IFNAME.arp_notify$
    ^net.ipv6.neigh.IFNAME.base_reachable_time_ms$
    ^net.ipv6.neigh.IFNAME.retrans_time_ms$
kind: ConfigMap
metadata:
  annotations:
    kubernetes.io/description: |
      Sysctl allowlist for nodes.
    release.openshift.io/version: 4.15.0-0.nightly-2022-11-16-003434
  creationTimestamp: "2022-11-17T14:09:27Z"
  name: cni-sysctl-allowlist
  namespace: openshift-multus
  resourceVersion: "2422"
  uid: 96d138a3-160e-4943-90ff-6108fa7c50c3
Edit the list by using the following command:
$ oc edit cm -n openshift-multus cni-sysctl-allowlist -oyaml
For example, to be able to implement stricter reverse path forwarding, add ^net.ipv4.conf.IFNAME.rp_filter$ and ^net.ipv6.conf.IFNAME.rp_filter$ to the list as shown here:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  allowlist.conf: |-
    ^net.ipv4.conf.IFNAME.accept_redirects$
    ^net.ipv4.conf.IFNAME.accept_source_route$
    ^net.ipv4.conf.IFNAME.arp_accept$
    ^net.ipv4.conf.IFNAME.arp_notify$
    ^net.ipv4.conf.IFNAME.disable_policy$
    ^net.ipv4.conf.IFNAME.secure_redirects$
    ^net.ipv4.conf.IFNAME.send_redirects$
    ^net.ipv4.conf.IFNAME.rp_filter$
    ^net.ipv6.conf.IFNAME.accept_ra$
    ^net.ipv6.conf.IFNAME.accept_redirects$
    ^net.ipv6.conf.IFNAME.accept_source_route$
    ^net.ipv6.conf.IFNAME.arp_accept$
    ^net.ipv6.conf.IFNAME.arp_notify$
    ^net.ipv6.neigh.IFNAME.base_reachable_time_ms$
    ^net.ipv6.neigh.IFNAME.retrans_time_ms$
    ^net.ipv6.conf.IFNAME.rp_filter$
Save the changes to the file and exit.
Note
The removal of sysctls is also supported. Edit the file, remove the sysctl or sysctls, then save the changes and exit.
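Each line of allowlist.conf is a regular expression. The following Python sketch shows how an edit like the one above changes whether a sysctl name matches. It assumes the name is checked in the form written in the tuning configuration, with the literal IFNAME token; how the tuning plugin actually performs the comparison is not described here, and the two-line default list is only an excerpt.

```python
import re
from typing import List


def parse_allowlist(conf: str) -> List[re.Pattern]:
    """Compile each non-empty line of allowlist.conf as a regular expression."""
    return [re.compile(line.strip()) for line in conf.splitlines() if line.strip()]


def is_allowed(sysctl: str, patterns: List[re.Pattern]) -> bool:
    """True if the sysctl name (with the literal IFNAME token) matches any pattern."""
    return any(pattern.search(sysctl) for pattern in patterns)


default = "^net.ipv4.conf.IFNAME.accept_redirects$\n^net.ipv4.conf.IFNAME.send_redirects$"
extended = default + "\n^net.ipv4.conf.IFNAME.rp_filter$"
print(is_allowed("net.ipv4.conf.IFNAME.rp_filter", parse_allowlist(default)))   # False
print(is_allowed("net.ipv4.conf.IFNAME.rp_filter", parse_allowlist(extended)))  # True
```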
Verification
Follow this procedure to enforce stricter reverse path forwarding for IPv4. For more information on reverse path forwarding, see Reverse Path Forwarding.
Create a network attachment definition, such as reverse-path-fwd-example.yaml, with the following content:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: tuningnad
  namespace: default
spec:
  config: '{
      "cniVersion": "0.4.0",
      "name": "tuningnad",
      "plugins": [
        {
          "type": "bridge"
        },
        {
          "type": "tuning",
          "sysctl": {
            "net.ipv4.conf.IFNAME.rp_filter": "1"
          }
        }
      ]
    }'
Apply the yaml by running the following command:
$ oc apply -f reverse-path-fwd-example.yaml
Example output
networkattachmentdefinition.k8s.cni.cncf.io/tuningnad created
Create a pod such as examplepod.yaml using the following YAML:
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: httpd
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: tuningnad # 1
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: httpd
      image: 'image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest'
      ports:
        - containerPort: 8080
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
- 1 Specify the name of the configured NetworkAttachmentDefinition.
Apply the yaml by running the following command:
$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod
Example output
NAME      READY   STATUS    RESTARTS   AGE
example   1/1     Running   0          47s
Log in to the pod by running the following command:
$ oc rsh example
Verify the value of the configured sysctl flag. For example, find the value of net.ipv4.conf.net1.rp_filter by running the following command:
sh-4.4# sysctl net.ipv4.conf.net1.rp_filter
Expected output
net.ipv4.conf.net1.rp_filter = 1
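The net1 name in the verification step comes from replacing the IFNAME token with the actual interface name at runtime. The following Python sketch reproduces that substitution for the tuning plugin entries of a CNI configuration such as the one in reverse-path-fwd-example.yaml. It assumes the additional interface is named net1, as in the example output, and is an illustration rather than the plugin's code.

```python
import json
from typing import Dict


def resolved_sysctls(cni_config: str, ifname: str) -> Dict[str, str]:
    """Return the tuning plugin sysctls with IFNAME replaced by a real interface name."""
    config = json.loads(cni_config)
    resolved: Dict[str, str] = {}
    for plugin in config.get("plugins", []):
        if plugin.get("type") != "tuning":
            continue  # only the tuning plugin carries sysctl settings here
        for key, value in plugin.get("sysctl", {}).items():
            resolved[key.replace("IFNAME", ifname)] = value
    return resolved


config = """{
  "cniVersion": "0.4.0",
  "name": "tuningnad",
  "plugins": [
    {"type": "bridge"},
    {"type": "tuning", "sysctl": {"net.ipv4.conf.IFNAME.rp_filter": "1"}}
  ]
}"""
print(resolved_sysctls(config, "net1"))  # {'net.ipv4.conf.net1.rp_filter': '1'}
```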
7.9.5. Starting a pod with safe sysctls
You can set sysctls on pods using the pod’s securityContext
. The securityContext
applies to all containers in the same pod.
Safe sysctls are allowed by default.
This example uses the pod securityContext
to set the following safe sysctls:
- kernel.shm_rmid_forced
- net.ipv4.ip_local_port_range
- net.ipv4.tcp_syncookies
- net.ipv4.ping_group_range
To avoid destabilizing your operating system, modify sysctl parameters only after you understand their effects.
Use this procedure to start a pod with the configured sysctl settings.
In most cases you modify an existing pod definition and add the securityContext
spec.
Procedure
Create a YAML file
sysctl_pod.yaml
that defines an example pod and add thesecurityContext
spec, as shown in the following example:apiVersion: v1 kind: Pod metadata: name: sysctl-example namespace: default spec: containers: - name: podexample image: centos command: ["bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 1 runAsGroup: 3000 2 allowPrivilegeEscalation: false 3 capabilities: 4 drop: ["ALL"] securityContext: runAsNonRoot: true 5 seccompProfile: 6 type: RuntimeDefault sysctls: - name: kernel.shm_rmid_forced value: "1" - name: net.ipv4.ip_local_port_range value: "32770 60666" - name: net.ipv4.tcp_syncookies value: "0" - name: net.ipv4.ping_group_range value: "0 200000000"
(1) runAsUser controls which user ID the container is run with.
(2) runAsGroup controls which primary group ID the container is run with.
(3) allowPrivilegeEscalation determines if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether the no_new_privs flag gets set on the container process.
(4) capabilities permit privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.
(5) runAsNonRoot: true requires that the container run as a user with a UID other than 0.
(6) RuntimeDefault enables the default seccomp profile for a pod or container workload.
Create the pod by running the following command:
$ oc apply -f sysctl_pod.yaml
Verify that the pod is created by running the following command:
$ oc get pod
Example output
NAME             READY   STATUS    RESTARTS   AGE
sysctl-example   1/1     Running   0          14s
Log in to the pod by running the following command:
$ oc rsh sysctl-example
Verify the values of the configured sysctl flags. For example, find the value of kernel.shm_rmid_forced by running the following command:
sh-4.4# sysctl kernel.shm_rmid_forced
Expected output
kernel.shm_rmid_forced = 1
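To check all four values from sysctl_pod.yaml at once, one option is a single oc exec call that passes every key to sysctl, which prints one KEY = VALUE line per key. The following sketch compares those lines with the values from the pod specification. The function name check_pod_sysctls and the EXPECTED variable are hypothetical; the sketch assumes that oc is logged in to the cluster, and because sysctl can print multi-part values such as net.ipv4.ip_local_port_range with tabs, it normalizes whitespace before comparing.

```shell
#!/bin/sh
# Hypothetical helper: check every sysctl from sysctl_pod.yaml with one
# `oc exec` call. Assumes `oc` is logged in and the image provides `sysctl`.

# KEY=VALUE pairs copied from securityContext.sysctls in sysctl_pod.yaml.
EXPECTED='kernel.shm_rmid_forced=1
net.ipv4.ip_local_port_range=32770 60666
net.ipv4.tcp_syncookies=0
net.ipv4.ping_group_range=0 200000000'

# Usage: check_pod_sysctls POD
# Returns 0 if all values match, 1 on any mismatch, 2 if `oc exec` fails.
check_pod_sysctls() {
  pod=$1
  keys=$(printf '%s\n' "$EXPECTED" | cut -d= -f1)
  # $keys is unquoted on purpose: each key becomes a separate argument.
  # `sysctl KEY...` prints one "KEY = VALUE" line per key.
  output=$(oc exec "$pod" -- sysctl $keys) || return 2
  rc=0
  while IFS= read -r pair; do
    key=${pair%%=*}
    want=${pair#*=}
    # Keep this key's line, drop "KEY = ", and turn tabs into single spaces.
    got=$(printf '%s\n' "$output" |
      awk -v k="$key" '$1 == k { sub(/^[^=]*=[ \t]*/, ""); print }' |
      tr -s '[:space:]' ' ' | sed 's/ $//')
    if [ "$got" = "$want" ]; then
      printf 'OK   %s = %s\n' "$key" "$got"
    else
      printf 'FAIL %s = %s (expected %s)\n' "$key" "$got" "$want" >&2
      rc=1
    fi
  done <<EOF
$EXPECTED
EOF
  return "$rc"
}
```

Running check_pod_sysctls sysctl-example returns 0 only if every key matches; otherwise a FAIL line on standard error names the key that differs.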
7.9.6. Starting a pod with unsafe sysctls
A pod with unsafe sysctls fails to launch on any node unless the cluster administrator explicitly enables unsafe sysctls for that node. As with node-level sysctls, use the taints and tolerations feature or labels on nodes to schedule those pods onto the right nodes.
The following example uses the pod securityContext to set a safe sysctl, kernel.shm_rmid_forced, and two unsafe sysctls, net.core.somaxconn and kernel.msgmax. There is no distinction between safe and unsafe sysctls in the specification.
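Because the pod specification does not mark which sysctls are unsafe, you might want to classify the names yourself before you create the pod. The following sketch labels each name as safe or unsafe. It is illustrative only: SAFE_SYSCTLS contains just the four safe sysctls listed in this chapter, not the full set that your Kubernetes version treats as safe, and classify_sysctls is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: label sysctl names as "safe" or "unsafe".
# ASSUMPTION: the safe list holds only the four safe sysctls named in this
# chapter; the authoritative list depends on your Kubernetes version.
SAFE_SYSCTLS='kernel.shm_rmid_forced
net.ipv4.ip_local_port_range
net.ipv4.tcp_syncookies
net.ipv4.ping_group_range'

# Usage: classify_sysctls NAME...
# Prints "NAME<TAB>safe" or "NAME<TAB>unsafe" for each argument.
classify_sysctls() {
  for name in "$@"; do
    # -x: whole-line match; -F: fixed string, so dots are not regex wildcards.
    if printf '%s\n' "$SAFE_SYSCTLS" | grep -qxF -- "$name"; then
      printf '%s\tsafe\n' "$name"
    else
      printf '%s\tunsafe\n' "$name"
    fi
  done
}
```

For this example, classify_sysctls kernel.shm_rmid_forced net.core.somaxconn kernel.msgmax reports kernel.shm_rmid_forced as safe and the other two as unsafe, which matches the description above.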
To avoid destabilizing your operating system, modify sysctl parameters only after you understand their effects.
The following example illustrates what happens when you add safe and unsafe sysctls to a pod specification:
Procedure
Create a YAML file sysctl-example-unsafe.yaml that defines an example pod and add the securityContext specification, as shown in the following example:
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example-unsafe
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.core.somaxconn
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
Create the pod by running the following command:
$ oc apply -f sysctl-example-unsafe.yaml
Verify that the pod is scheduled but does not deploy, because unsafe sysctls are not allowed on the node, by running the following command:
$ oc get pod
Example output
NAME                    READY   STATUS            RESTARTS   AGE
sysctl-example-unsafe   0/1     SysctlForbidden   0          14s
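The STATUS column of oc get pod shows the pod's .status.reason field when that field is set, so a script can read the field directly with a JSONPath query. The following sketch prints the reason and message for a pod and succeeds only when the reason is SysctlForbidden. The function name is_sysctl_forbidden is hypothetical, and the sketch assumes that oc is logged in to the cluster.

```shell
#!/bin/sh
# Hypothetical helper: show why a pod did not start.
# Assumes `oc` is logged in to the cluster.
# Usage: is_sysctl_forbidden POD
# Returns 0 if .status.reason is SysctlForbidden, 1 otherwise, 2 if oc fails.
is_sysctl_forbidden() {
  pod=$1
  reason=$(oc get pod "$pod" -o jsonpath='{.status.reason}') || return 2
  message=$(oc get pod "$pod" -o jsonpath='{.status.message}') || return 2
  printf '%s: reason=%s\n' "$pod" "${reason:-<none>}"
  # Print the status message, if the pod has one.
  if [ -n "$message" ]; then
    printf '%s\n' "$message"
  fi
  [ "$reason" = "SysctlForbidden" ]
}
```

For this example, is_sysctl_forbidden sysctl-example-unsafe returns 0 while the pod is rejected for its unsafe sysctls.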
7.9.7. Enabling unsafe sysctls
A cluster administrator can allow certain unsafe sysctls for special situations, such as high-performance or real-time application tuning.
If you want to use unsafe sysctls, a cluster administrator must enable them individually for a specific type of node. The sysctls must be namespaced.
You can further control which sysctls are set in pods by specifying lists of sysctls or sysctl patterns in the allowedUnsafeSysctls field of the Security Context Constraints.
The allowedUnsafeSysctls option controls specific needs such as high performance or real-time application tuning.
Because of their unsafe nature, you use unsafe sysctls at your own risk. They can lead to severe problems, such as improper container behavior, resource shortages, or a broken node.
Procedure
List the existing MachineConfigPool objects in your OpenShift Container Platform cluster to decide which machine config pool to label by running the following command:
$ oc get machineconfigpool
Example output
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-bfb92f0cd1684e54d8e234ab7423cc96   True      False      False      3              3                   3                     0                      42m
worker   rendered-worker-21b6cb9a0f8919c88caf39db80ac1fce   True      False      False      3              3                   3                     0                      42m
Add a label to the machine config pool where the containers with the unsafe sysctls will run by running the following command:
$ oc label machineconfigpool worker custom-kubelet=sysctl
Create a YAML file set-sysctl-worker.yaml that defines a KubeletConfig custom resource (CR):
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-kubelet
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: sysctl # (1)
  kubeletConfig:
    allowedUnsafeSysctls: # (2)
      - "kernel.msg*"
      - "net.core.somaxconn"
(1) Specify the label that you added to the machine config pool.
(2) List the unsafe sysctls that you want to allow.
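The entries in allowedUnsafeSysctls are either exact sysctl names or patterns that end in *, such as kernel.msg*. The following sketch checks whether a sysctl name would match the list in set-sysctl-worker.yaml. It assumes that a trailing * matches any name that begins with the preceding text and that every other entry must match exactly; the kubelet performs the authoritative check, and is_allowed_unsafe is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: check a sysctl name against an allowedUnsafeSysctls
# list such as the one in set-sysctl-worker.yaml.
# ASSUMPTION: an entry ending in "*" matches every name that starts with the
# text before the "*"; any other entry must match the name exactly.
ALLOWED_UNSAFE='kernel.msg*
net.core.somaxconn'

# Usage: is_allowed_unsafe NAME   (returns 0 if some entry matches NAME)
is_allowed_unsafe() {
  name=$1
  while IFS= read -r pattern; do
    case $pattern in
      *\*)
        prefix=${pattern%\*}
        # Quoting "$prefix" makes it literal; only the trailing * is a glob.
        case $name in "$prefix"*) return 0 ;; esac ;;
      *)
        [ "$name" = "$pattern" ] && return 0 ;;
    esac
  done <<EOF
$ALLOWED_UNSAFE
EOF
  return 1
}
```

With this list, kernel.msgmax and net.core.somaxconn match, while kernel.shmmax does not.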
Create the object by running the following command:
$ oc apply -f set-sysctl-worker.yaml
Wait for the Machine Config Operator to generate the new rendered configuration and apply it to the machines by running the following command:
$ oc get machineconfigpool worker -w
After some minutes, the UPDATING status changes from True to False:
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-f1704a00fc6f30d3a7de9a15fd68a800   False     True       False      3              2                   2                     0                      71m
worker   rendered-worker-f1704a00fc6f30d3a7de9a15fd68a800   False     True       False      3              2                   3                     0                      72m
worker   rendered-worker-0188658afe1f3a183ec8c4f14186f4d5   True      False      False      3              3                   3                     0                      72m
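Instead of watching the output, you can poll the pool's status conditions from a script. The following sketch waits for the pool to report Updating=True, which shows that the rollout has started, and then Updated=True. It assumes that the pool exposes conditions with the type names Updating and Updated, corresponding to the UPDATING and UPDATED columns above; the function names and the POLL_SECONDS and MAX_POLLS variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: wait for a machine config pool to finish rolling out.
# ASSUMPTION: the pool reports conditions named "Updating" and "Updated",
# matching the UPDATING and UPDATED columns of `oc get machineconfigpool`.
POLL_SECONDS=${POLL_SECONDS:-30}
MAX_POLLS=${MAX_POLLS:-60}

# Print the status ("True" or "False") of condition $2 on pool $1.
pool_condition() {
  oc get machineconfigpool "$1" \
    -o jsonpath="{.status.conditions[?(@.type==\"$2\")].status}"
}

# Poll until condition $2 of pool $1 is True; give up after MAX_POLLS checks.
wait_for_condition() {
  i=1
  while :; do
    [ "$(pool_condition "$1" "$2")" = "True" ] && return 0
    [ "$i" -ge "$MAX_POLLS" ] && break
    i=$((i + 1))
    sleep "$POLL_SECONDS"
  done
  printf 'timed out waiting for %s=True on pool %s\n' "$2" "$1" >&2
  return 1
}

# Usage: wait_for_pool POOL
# Waits for the rollout to start (Updating=True), then to finish (Updated=True).
wait_for_pool() {
  wait_for_condition "$1" Updating || return 1
  wait_for_condition "$1" Updated
}
```

Waiting for Updating first avoids reporting success before the Machine Config Operator picks up the new KubeletConfig. The trade-off is that a rollout that starts and finishes between two polls would be missed, so keep POLL_SECONDS short relative to the expected rollout time.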
Create a YAML file sysctl-example-safe-unsafe.yaml that defines an example pod and add the securityContext spec, as shown in the following example:
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example-safe-unsafe
spec:
  containers:
  - name: podexample
    image: centos
    command: ["/bin/bash", "-c", "sleep INF"]
    securityContext:
      runAsUser: 2000
      runAsGroup: 3000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.core.somaxconn
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
Create the pod by running the following command:
$ oc apply -f sysctl-example-safe-unsafe.yaml
Expected output
Warning: would violate PodSecurity "restricted:latest": forbidden sysctls (net.core.somaxconn, kernel.msgmax)
pod/sysctl-example-safe-unsafe created
Verify that the pod is created by running the following command:
$ oc get pod
Example output
NAME                         READY   STATUS    RESTARTS   AGE
sysctl-example-safe-unsafe   1/1     Running   0          19s
Log in to the pod by running the following command:
$ oc rsh sysctl-example-safe-unsafe
Verify the values of the configured sysctl flags. For example, find the value of net.core.somaxconn by running the following command:
sh-4.4# sysctl net.core.somaxconn
Expected output
net.core.somaxconn = 1024
The unsafe sysctl is now allowed and the value is set as defined in the securityContext spec of the updated pod specification.
7.9.8. Additional resources
7.10. Accessing faster builds with /dev/fuse
You can configure your pods with the /dev/fuse device to access faster builds.
7.10.1. Configuring /dev/fuse on unprivileged pods
As an alternative to the virtual filesystem, you can add the /dev/fuse device to the io.kubernetes.cri-o.Devices annotation to access faster builds within unprivileged pods. Using /dev/fuse is secure, efficient, and scalable, and allows unprivileged users to mount an overlay filesystem as if the unprivileged pod were privileged.
Procedure
Create the pod.
$ oc exec -ti no-priv -- /bin/bash
$ cat >> Dockerfile <<EOF
FROM registry.access.redhat.com/ubi9
EOF
$ podman build .
Implement /dev/fuse by adding the /dev/fuse device to the io.kubernetes.cri-o.Devices annotation:
io.kubernetes.cri-o.Devices: "/dev/fuse"
For example:
apiVersion: v1
kind: Pod
metadata:
  name: podman-pod
  annotations:
    io.kubernetes.cri-o.Devices: "/dev/fuse"
Configure the /dev/fuse device in your pod specifications:
spec:
  containers:
  - name: podman-container
    image: quay.io/podman/stable
    args:
    - sleep
    - "1000000"
    securityContext:
      runAsUser: 1000
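After the pod is running, you can confirm that CRI-O added the device by checking for a character device at /dev/fuse inside the container. The following sketch assumes that oc is logged in to the cluster and that the image provides a test binary, as Fedora-based images such as quay.io/podman/stable typically do; has_fuse is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: confirm that /dev/fuse is present in a running pod.
# ASSUMPTION: `oc` is logged in and the image provides a `test` binary.
# Usage: has_fuse POD   (returns 0 if /dev/fuse is a character device)
has_fuse() {
  if oc exec "$1" -- test -c /dev/fuse; then
    printf '%s: /dev/fuse is present\n' "$1"
  else
    printf '%s: /dev/fuse is missing; check the io.kubernetes.cri-o.Devices annotation\n' "$1" >&2
    return 1
  fi
}
```

For the example in this section, run has_fuse podman-pod.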