OpenShift Container Platform 4.5, Nodes

5.9. Using sysctls in containers

Sysctl settings are exposed through Kubernetes, which allows users to modify certain kernel parameters at runtime for the namespaces within a container. Only namespaced sysctls can be set independently on pods. A sysctl that is not namespaced is called node-level and cannot be set within OpenShift Container Platform. Moreover, only sysctls that are considered safe are whitelisted by default; a cluster administrator can manually enable other, unsafe sysctls on a node to make them available to users.

5.9.1. About sysctls

In Linux, the sysctl interface allows an administrator to modify kernel parameters at runtime. Parameters are available via the /proc/sys/ virtual process file system. The parameters cover various subsystems, such as:

kernel (common prefix: kernel.)
networking (common prefix: net.)
virtual memory (common prefix: vm.)
MDADM (common prefix: dev.)

More subsystems are described in the Linux kernel documentation. To get a list of all parameters, run:

$ sudo sysctl -a
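
Because each parameter is exposed as a file under /proc/sys/, a dotted sysctl name can usually be translated into a path by replacing the dots with slashes. The following Python sketch illustrates that mapping. The helper names sysctl_path and read_sysctl are hypothetical and not part of any tool, and the sketch assumes that no component of the sysctl name itself contains a dot.

```python
from pathlib import PurePosixPath


def sysctl_path(name: str, root: str = "/proc/sys") -> str:
    """Translate a dotted sysctl name into its file path under root."""
    return str(PurePosixPath(root).joinpath(*name.split(".")))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as handle:
        return handle.read().strip()
```

For example, sysctl_path("net.ipv4.ip_local_port_range") returns /proc/sys/net/ipv4/ip_local_port_range. The root argument exists only so that the helper can be exercised against a scratch directory instead of the live kernel interface.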

5.9.1.1. Namespaced versus node-level sysctls

A number of sysctls are namespaced in the Linux kernel. This means that you can set them independently for each pod on a node. A sysctl must be namespaced to be accessible in a pod context within Kubernetes.

The following sysctls are known to be namespaced:

kernel.shm*
kernel.msg*
kernel.sem
fs.mqueue.*

Additionally, most of the sysctls in the net.* group are known to be namespaced. Their namespace adoption differs based on the kernel version and distributor.

Sysctls that are not namespaced are called node-level and must be set manually by the cluster administrator, either through the underlying Linux distribution of the nodes, such as by modifying the /etc/sysctl.conf file, or by using a daemon set with privileged containers.

Note

Consider marking nodes with special sysctls as tainted, and schedule onto them only the pods that need those sysctl settings. Use the taints and tolerations feature to mark the nodes.

5.9.1.2. Safe versus unsafe sysctls

Sysctls are grouped into safe and unsafe sysctls.

For a sysctl to be considered safe, it must use proper namespacing and must be properly isolated between pods on the same node. This means that if you set a sysctl for one pod it must not:

Influence any other pod on the node
Harm the node’s health
Gain CPU or memory resources outside of the resource limits of a pod

OpenShift Container Platform supports, or whitelists, the following sysctls in the safe set:

kernel.shm_rmid_forced
net.ipv4.ip_local_port_range
net.ipv4.tcp_syncookies

All safe sysctls are enabled by default. You can use a sysctl in a pod by modifying the Pod spec.

Any sysctl not whitelisted by OpenShift Container Platform is considered unsafe for OpenShift Container Platform. Note that being namespaced alone is not sufficient for the sysctl to be considered safe.
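
As an illustration of the rule above, the following Python sketch splits a list of requested sysctls into safe and unsafe ones by checking membership in the safe set listed in this section. It is not how OpenShift Container Platform implements the check, the name partition_sysctls is hypothetical, and the sketch does not test whether a sysctl is namespaced.

```python
# Safe set as listed in this section for OpenShift Container Platform 4.5.
SAFE_SYSCTLS = frozenset({
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
})


def partition_sysctls(names):
    """Split requested sysctl names into (safe, unsafe) lists, keeping order."""
    safe = [name for name in names if name in SAFE_SYSCTLS]
    unsafe = [name for name in names if name not in SAFE_SYSCTLS]
    return safe, unsafe


safe, unsafe = partition_sysctls(
    ["kernel.shm_rmid_forced", "net.core.somaxconn", "kernel.msgmax"]
)
print("safe:", safe)
print("unsafe:", unsafe)
```

In this sketch kernel.msgmax lands in the unsafe list even though kernel.msg* sysctls are namespaced, which reflects the point that being namespaced alone does not make a sysctl safe.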

All unsafe sysctls are disabled by default, and the cluster administrator must manually enable them on a per-node basis. Pods that request disabled unsafe sysctls are scheduled but do not launch, as the pod status shows:

$ oc get pod

Example output

NAME        READY   STATUS            RESTARTS   AGE
hello-pod   0/1     SysctlForbidden   0          14s

5.9.2. Setting sysctls for a pod

You can set sysctls on pods using the pod’s securityContext. The securityContext applies to all containers in the same pod.

Safe sysctls are allowed by default. A pod with unsafe sysctls fails to launch on any node unless the cluster administrator explicitly enables unsafe sysctls for that node. As with node-level sysctls, use the taints and toleration feature or labels on nodes to schedule those pods onto the right nodes.

The following example uses the pod securityContext to set a safe sysctl kernel.shm_rmid_forced and two unsafe sysctls, net.core.somaxconn and kernel.msgmax. There is no distinction between safe and unsafe sysctls in the specification.

Warning

To avoid destabilizing your operating system, modify sysctl parameters only after you understand their effects.

Procedure

To use safe and unsafe sysctls:

Modify the YAML file that defines the pod and add the securityContext spec, as shown in the following example:

apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced
      value: "0"
    - name: net.core.somaxconn
      value: "1024"
    - name: kernel.msgmax
      value: "65536"
  ...

Create the pod:

$ oc apply -f <file-name>.yaml

If the unsafe sysctls are not allowed for the node, the pod is scheduled, but does not deploy:

$ oc get pod

Example output

NAME        READY   STATUS            RESTARTS   AGE
hello-pod   0/1     SysctlForbidden   0          14s
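
The securityContext from the procedure above can also be generated programmatically, which may help when the sysctl values come from another source. The following Python sketch builds the same sysctls list and prints the manifest as JSON, a format that oc apply -f also accepts. The helper name sysctl_security_context is hypothetical, and the rest of the pod spec is left out, as in the YAML example.

```python
import json


def sysctl_security_context(sysctls):
    """Return a pod-level securityContext dict for a mapping of sysctl names to values."""
    return {
        "sysctls": [
            # Sysctl values are strings in the pod spec, hence the conversion.
            {"name": name, "value": str(value)}
            for name, value in sysctls.items()
        ]
    }


pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "sysctl-example"},
    "spec": {
        "securityContext": sysctl_security_context({
            "kernel.shm_rmid_forced": 0,
            "net.core.somaxconn": 1024,
            "kernel.msgmax": 65536,
        }),
        # The remainder of the spec (containers and so on) is omitted,
        # as in the YAML example in the procedure.
    },
}

print(json.dumps(pod, indent=2))
```

Converting every value with str() mirrors the quoted values in the YAML example and avoids submitting a number where the pod spec expects a string.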

5.9.3. Enabling unsafe sysctls

A cluster administrator can allow certain unsafe sysctls for very special situations such as high performance or real-time application tuning.

If you want to use unsafe sysctls, a cluster administrator must enable them individually for a specific type of node. The sysctls must be namespaced.

You can further control which sysctls can be set in pods by specifying lists of sysctls or sysctl patterns in the forbiddenSysctls and allowedUnsafeSysctls fields of the Security Context Constraints.

The forbiddenSysctls option excludes specific sysctls.
The allowedUnsafeSysctls option allows specific unsafe sysctls for needs such as high performance or real-time application tuning.
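
The interaction of the two fields can be sketched in Python as follows. This is a simplified model rather than the actual admission logic: it assumes that a pattern is either an exact name or a prefix ending in *, that forbiddenSysctls takes precedence over everything else, and that the safe set is the one listed earlier in this section. The function names matches and sysctl_permitted are hypothetical.

```python
# Safe set as listed earlier in this section.
SAFE_SYSCTLS = frozenset({
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
})


def matches(pattern, name):
    """Match a sysctl name against an exact name or a pattern ending in '*'."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return pattern == name


def sysctl_permitted(name, forbidden=(), allowed_unsafe=()):
    """Model whether a pod may set a sysctl under the given lists (assumed semantics)."""
    if any(matches(pattern, name) for pattern in forbidden):
        return False  # assumption: a forbidden entry wins over any allowance
    if name in SAFE_SYSCTLS:
        return True   # safe sysctls are enabled by default
    return any(matches(pattern, name) for pattern in allowed_unsafe)


print(sysctl_permitted("kernel.msgmax"))
print(sysctl_permitted("kernel.msgmax", allowed_unsafe=["kernel.msg*"]))
```

Under these assumptions, the first call reports that kernel.msgmax is not permitted, and the second reports that it is permitted once the kernel.msg* pattern is allowed.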

Warning

Unsafe sysctls are used at your own risk. They can lead to severe problems, such as improper behavior of containers, resource shortage, or breaking a node.

Procedure

Add a label to the machine config pool where the containers with the unsafe sysctls will run:

$ oc edit machineconfigpool worker

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  creationTimestamp: 2019-02-08T14:52:39Z
  generation: 1
  labels:
    custom-kubelet: sysctl  # 1

1: Add a key: value pair label.

Create a KubeletConfig custom resource (CR):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: custom-kubelet
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: sysctl  # 1
  kubeletConfig:
    allowedUnsafeSysctls:  # 2
      - "kernel.msg*"
      - "net.core.somaxconn"

1: Specify the label from the machine config pool.
2: List the unsafe sysctls you want to allow.

Create the object:
```
$ oc apply -f set-sysctl-worker.yaml
```

A new MachineConfig object is created, with a name in the format 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet.

Wait for the cluster to reboot, using the machineconfigpool object status fields to track progress:

For example:

status:
  conditions:
    - lastTransitionTime: '2019-08-11T15:32:00Z'
      message: >-
        All nodes are updating to
        rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13
      reason: ''
      status: 'True'
      type: Updating

A message similar to the following appears when the cluster is ready:

    - lastTransitionTime: '2019-08-11T16:00:00Z'
      message: >-
        All nodes are updated with
        rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13
      reason: ''
      status: 'True'
      type: Updated

When the cluster is ready, check for the merged KubeletConfig object in the new MachineConfig object:

$ oc get machineconfig 99-worker-XXXXXX-XXXXX-XXXX-XXXXX-kubelet -o json | grep ownerReference -A7

        "ownerReferences": [
            {
                "apiVersion": "machineconfiguration.openshift.io/v1",
                "blockOwnerDeletion": true,
                "controller": true,
                "kind": "KubeletConfig",
                "name": "custom-kubelet",
                "uid": "3f64a766-bae8-11e9-abe8-0a1a2a4813f2"

You can now add unsafe sysctls to pods as needed.
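
The wait step in the procedure above can be scripted by inspecting the status conditions of the machine config pool. The following Python sketch assumes that the status has already been loaded into a dictionary, for example by parsing the JSON output of oc get machineconfigpool worker -o json, and it uses only the condition fields shown in the examples above. The function name pool_is_updated is hypothetical.

```python
def pool_is_updated(status):
    """Return True when the status has an Updated condition whose status is 'True'."""
    for condition in status.get("conditions", []):
        if condition.get("type") == "Updated" and condition.get("status") == "True":
            return True
    return False


# Status fragments copied from the examples in the procedure.
updating = {
    "conditions": [
        {
            "lastTransitionTime": "2019-08-11T15:32:00Z",
            "message": "All nodes are updating to rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13",
            "reason": "",
            "status": "True",
            "type": "Updating",
        }
    ]
}
updated = {
    "conditions": [
        {
            "lastTransitionTime": "2019-08-11T16:00:00Z",
            "message": "All nodes are updated with rendered-worker-ccbfb5d2838d65013ab36300b7b3dc13",
            "reason": "",
            "status": "True",
            "type": "Updated",
        }
    ]
}

print(pool_is_updated(updating))
print(pool_is_updated(updated))
```

A script could call such a check in a polling loop and proceed to the ownerReferences verification only after it returns True.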
