Chapter 4. Configuring MCO-related custom resources
Besides managing MachineConfig
objects, the MCO manages two custom resources (CRs): KubeletConfig
and ContainerRuntimeConfig
. Those CRs let you change node-level settings impacting how the kubelet and CRI-O container runtime services behave.
4.1. Creating a KubeletConfig CRD to edit kubelet parameters
The kubelet configuration is currently serialized as an Ignition configuration, so it can be directly edited. However, there is also a new kubelet-config-controller
added to the Machine Config Controller (MCC). This lets you use a KubeletConfig
custom resource (CR) to edit the kubelet parameters.
As the fields in the kubeletConfig
object are passed directly to the kubelet from upstream Kubernetes, the kubelet validates those values directly. Invalid values in the kubeletConfig
object might cause cluster nodes to become unavailable. For valid values, see the Kubernetes documentation.
Consider the following guidance:
-
Edit an existing
KubeletConfig
CR to modify existing settings or add new settings, instead of creating a CR for each change. It is recommended that you create a CR only to modify a different machine config pool, or for changes that are intended to be temporary, so that you can revert the changes. -
Create one
KubeletConfig
CR for each machine config pool with all the config changes you want for that pool. -
As needed, create multiple
KubeletConfig
CRs with a limit of 10 per cluster. For the firstKubeletConfig
CR, the Machine Config Operator (MCO) creates a machine config appended withkubelet
. With each subsequent CR, the controller creates anotherkubelet
machine config with a numeric suffix. For example, if you have akubelet
machine config with a-2
suffix, the nextkubelet
machine config is appended with-3
.
If you are applying a kubelet or container runtime config to a custom machine config pool, the custom role in the machineConfigSelector
must match the name of the custom machine config pool.
For example, because the following custom machine config pool is named infra
, the custom role must also be infra
:
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: infra spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]} # ...
If you want to delete the machine configs, delete them in reverse order to avoid exceeding the limit. For example, you delete the kubelet-3
machine config before deleting the kubelet-2
machine config.
If you have a machine config with a kubelet-9
suffix, and you create another KubeletConfig
CR, a new machine config is not created, even if there are fewer than 10 kubelet
machine configs.
Example KubeletConfig
CR
$ oc get kubeletconfig
NAME AGE set-max-pods 15m
Example showing a KubeletConfig
machine config
$ oc get mc | grep kubelet
... 99-worker-generated-kubelet-1 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 26m ...
The following procedure is an example to show how to configure the maximum number of pods per node on the worker nodes.
Prerequisites
Obtain the label associated with the static
MachineConfigPool
CR for the type of node you want to configure. Perform one of the following steps:View the machine config pool:
$ oc describe machineconfigpool <name>
For example:
$ oc describe machineconfigpool worker
Example output
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: creationTimestamp: 2019-02-08T14:52:39Z generation: 1 labels: custom-kubelet: set-max-pods 1
- 1
- If a label has been added it appears under
labels
.
If the label is not present, add a key/value pair:
$ oc label machineconfigpool worker custom-kubelet=set-max-pods
Procedure
View the available machine configuration objects that you can select:
$ oc get machineconfig
By default, the two kubelet-related configs are
01-master-kubelet
and01-worker-kubelet
.Check the current value for the maximum pods per node:
$ oc describe node <node_name>
For example:
$ oc describe node ci-ln-5grqprb-f76d1-ncnqq-worker-a-mdv94
Look for
value: pods: <value>
in theAllocatable
stanza:Example output
Allocatable: attachable-volumes-aws-ebs: 25 cpu: 3500m hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 15341844Ki pods: 250
Set the maximum pods per node on the worker nodes by creating a custom resource file that contains the kubelet configuration:
ImportantKubelet configurations that target a specific machine config pool also affect any dependent pools. For example, creating a kubelet configuration for the pool containing worker nodes will also apply to any subset pools, including the pool containing infrastructure nodes. To avoid this, you must create a new machine config pool with a selection expression that only includes worker nodes, and have your kubelet configuration target this new pool.
apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: set-max-pods spec: machineConfigPoolSelector: matchLabels: custom-kubelet: set-max-pods 1 kubeletConfig: maxPods: 500 2
NoteThe rate at which the kubelet talks to the API server depends on queries per second (QPS) and burst values. The default values,
50
forkubeAPIQPS
and100
forkubeAPIBurst
, are sufficient if there are limited pods running on each node. It is recommended to update the kubelet QPS and burst rates if there are enough CPU and memory resources on the node.apiVersion: machineconfiguration.openshift.io/v1 kind: KubeletConfig metadata: name: set-max-pods spec: machineConfigPoolSelector: matchLabels: custom-kubelet: set-max-pods kubeletConfig: maxPods: <pod_count> kubeAPIBurst: <burst_rate> kubeAPIQPS: <QPS>
Update the machine config pool for workers with the label:
$ oc label machineconfigpool worker custom-kubelet=set-max-pods
Create the
KubeletConfig
object:$ oc create -f change-maxPods-cr.yaml
Verify that the
KubeletConfig
object is created:$ oc get kubeletconfig
Example output
NAME AGE set-max-pods 15m
Depending on the number of worker nodes in the cluster, wait for the worker nodes to be rebooted one by one. For a cluster with 3 worker nodes, this could take about 10 to 15 minutes.
Verify that the changes are applied to the node:
Check on a worker node that the
maxPods
value changed:$ oc describe node <node_name>
Locate the
Allocatable
stanza:... Allocatable: attachable-volumes-gce-pd: 127 cpu: 3500m ephemeral-storage: 123201474766 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 14225400Ki pods: 500 1 ...
- 1
- In this example, the
pods
parameter should report the value you set in theKubeletConfig
object.
Verify the change in the
KubeletConfig
object:$ oc get kubeletconfigs set-max-pods -o yaml
This should show a status of
True
andtype:Success
, as shown in the following example:spec: kubeletConfig: maxPods: 500 machineConfigPoolSelector: matchLabels: custom-kubelet: set-max-pods status: conditions: - lastTransitionTime: "2021-06-30T17:04:07Z" message: Success status: "True" type: Success
4.2. Creating a ContainerRuntimeConfig CR to edit CRI-O parameters
You can change some of the settings associated with the OpenShift Container Platform CRI-O runtime for the nodes associated with a specific machine config pool (MCP). Using a ContainerRuntimeConfig
custom resource (CR), you set the configuration values and add a label to match the MCP. The MCO then rebuilds the crio.conf
and storage.conf
configuration files on the associated nodes with the updated values.
To revert the changes implemented by using a ContainerRuntimeConfig
CR, you must delete the CR. Removing the label from the machine config pool does not revert the changes.
You can modify the following settings by using a ContainerRuntimeConfig
CR:
PIDs limit: Setting the PIDs limit in the
ContainerRuntimeConfig
is expected to be deprecated. If PIDs limits are required, it is recommended to use thepodPidsLimit
field in theKubeletConfig
CR instead. The defaultpodPidsLimit
value is4096
and the defaultpids_limit
value is0
. IfpodPidsLimit
is lower thanpids_limit
then the effective container PIDs limit is defined by the value set inpodPidsLimit
.NoteThe CRI-O flag is applied on the cgroup of the container, while the Kubelet flag is set on the cgroup of the pod. Please adjust the PIDs limit accordingly.
-
Log level: The
logLevel
parameter sets the CRI-Olog_level
parameter, which is the level of verbosity for log messages. The default isinfo
(log_level = info
). Other options includefatal
,panic
,error
,warn
,debug
, andtrace
. -
Overlay size: The
overlaySize
parameter sets the CRI-O Overlay storage driversize
parameter, which is the maximum size of a container image. -
Maximum log size: Setting the maximum log size in the
ContainerRuntimeConfig
is expected to be deprecated. If a maximum log size is required, it is recommended to use thecontainerLogMaxSize
field in theKubeletConfig
CR instead. -
Container runtime: The
defaultRuntime
parameter sets the container runtime to eitherrunc
orcrun
. The default isrunc
.
You should have one ContainerRuntimeConfig
CR for each machine config pool with all the config changes you want for that pool. If you are applying the same content to all the pools, you only need one ContainerRuntimeConfig
CR for all the pools.
You should edit an existing ContainerRuntimeConfig
CR to modify existing settings or add new settings instead of creating a new CR for each change. It is recommended to create a new ContainerRuntimeConfig
CR only to modify a different machine config pool, or for changes that are intended to be temporary so that you can revert the changes.
You can create multiple ContainerRuntimeConfig
CRs, as needed, with a limit of 10 per cluster. For the first ContainerRuntimeConfig
CR, the MCO creates a machine config appended with containerruntime
. With each subsequent CR, the controller creates a new containerruntime
machine config with a numeric suffix. For example, if you have a containerruntime
machine config with a -2
suffix, the next containerruntime
machine config is appended with -3
.
If you want to delete the machine configs, you should delete them in reverse order to avoid exceeding the limit. For example, you should delete the containerruntime-3
machine config before deleting the containerruntime-2
machine config.
If you have a machine config with a containerruntime-9
suffix, and you create another ContainerRuntimeConfig
CR, a new machine config is not created, even if there are fewer than 10 containerruntime
machine configs.
Example showing multiple ContainerRuntimeConfig
CRs
$ oc get ctrcfg
Example output
NAME AGE ctr-overlay 15m ctr-level 5m45s
Example showing multiple containerruntime
machine configs
$ oc get mc | grep container
Example output
... 01-master-container-runtime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 57m ... 01-worker-container-runtime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 57m ... 99-worker-generated-containerruntime b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 26m 99-worker-generated-containerruntime-1 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 17m 99-worker-generated-containerruntime-2 b5c5119de007945b6fe6fb215db3b8e2ceb12511 3.2.0 7m26s ...
The following example sets the log_level
field to debug
and sets the overlay size to 8 GB:
Example ContainerRuntimeConfig
CR
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: overlay-size spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/worker: '' 1 containerRuntimeConfig: logLevel: debug 2 overlaySize: 8G 3 defaultRuntime: "crun" 4
- 1
- Specifies the machine config pool label. For a container runtime config, the role must match the name of the associated machine config pool.
- 2
- Optional: Specifies the level of verbosity for log messages.
- 3
- Optional: Specifies the maximum size of a container image.
- 4
- Optional: Specifies the container runtime to deploy to new containers. The default value is
runc
.
Procedure
To change CRI-O settings using the ContainerRuntimeConfig
CR:
Create a YAML file for the
ContainerRuntimeConfig
CR:apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: overlay-size spec: machineConfigPoolSelector: matchLabels: pools.operator.machineconfiguration.openshift.io/worker: '' 1 containerRuntimeConfig: 2 logLevel: debug overlaySize: 8G
Create the
ContainerRuntimeConfig
CR:$ oc create -f <file_name>.yaml
Verify that the CR is created:
$ oc get ContainerRuntimeConfig
Example output
NAME AGE overlay-size 3m19s
Check that a new
containerruntime
machine config is created:$ oc get machineconfigs | grep containerrun
Example output
99-worker-generated-containerruntime 2c9371fbb673b97a6fe8b1c52691999ed3a1bfc2 3.2.0 31s
Monitor the machine config pool until all are shown as ready:
$ oc get mcp worker
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE worker rendered-worker-169 False True False 3 1 1 0 9h
Verify that the settings were applied in CRI-O:
Open an
oc debug
session to a node in the machine config pool and runchroot /host
.$ oc debug node/<node_name>
sh-4.4# chroot /host
Verify the changes in the
crio.conf
file:sh-4.4# crio config | grep 'log_level'
Example output
log_level = "debug"
Verify the changes in the `storage.conf`file:
sh-4.4# head -n 7 /etc/containers/storage.conf
Example output
[storage] driver = "overlay" runroot = "/var/run/containers/storage" graphroot = "/var/lib/containers/storage" [storage.options] additionalimagestores = [] size = "8G"
4.3. Setting the default maximum container root partition size for Overlay with CRI-O
The root partition of each container shows all of the available disk space of the underlying host. Follow this guidance to set a maximum partition size for the root disk of all containers.
To configure the maximum Overlay size, as well as other CRI-O options like the log level, you can create the following ContainerRuntimeConfig
custom resource definition (CRD):
apiVersion: machineconfiguration.openshift.io/v1 kind: ContainerRuntimeConfig metadata: name: overlay-size spec: machineConfigPoolSelector: matchLabels: custom-crio: overlay-size containerRuntimeConfig: logLevel: debug overlaySize: 8G
Procedure
Create the configuration object:
$ oc apply -f overlaysize.yml
To apply the new CRI-O configuration to your worker nodes, edit the worker machine config pool:
$ oc edit machineconfigpool worker
Add the
custom-crio
label based on thematchLabels
name you set in theContainerRuntimeConfig
CRD:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: creationTimestamp: "2020-07-09T15:46:34Z" generation: 3 labels: custom-crio: overlay-size machineconfiguration.openshift.io/mco-built-in: ""
Save the changes, then view the machine configs:
$ oc get machineconfigs
New
99-worker-generated-containerruntime
andrendered-worker-xyz
objects are created:Example output
99-worker-generated-containerruntime 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 3.2.0 7m42s rendered-worker-xyz 4173030d89fbf4a7a0976d1665491a4d9a6e54f1 3.2.0 7m36s
After those objects are created, monitor the machine config pool for the changes to be applied:
$ oc get mcp worker
The worker nodes show
UPDATING
asTrue
, as well as the number of machines, the number updated, and other details:Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE worker rendered-worker-xyz False True False 3 2 2 0 20h
When complete, the worker nodes transition back to
UPDATING
asFalse
, and theUPDATEDMACHINECOUNT
number matches theMACHINECOUNT
:Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE worker rendered-worker-xyz True False False 3 3 3 0 20h
Looking at a worker machine, you see that the new 8 GB max size configuration is applied to all of the workers:
Example output
head -n 7 /etc/containers/storage.conf [storage] driver = "overlay" runroot = "/var/run/containers/storage" graphroot = "/var/lib/containers/storage" [storage.options] additionalimagestores = [] size = "8G"
Looking inside a container, you see that the root partition is now 8 GB:
Example output
~ $ df -h Filesystem Size Used Available Use% Mounted on overlay 8.0G 8.0K 8.0G 0% /