Specialized hardware and driver enablement
Learn about hardware enablement on OpenShift Container Platform
Abstract
Chapter 1. About specialized hardware and driver enablement
The Driver Toolkit (DTK) is a container image in the OpenShift Container Platform payload which is meant to be used as a base image on which to build driver containers. The Driver Toolkit image contains the kernel packages commonly required as dependencies to build or install kernel modules as well as a few tools needed in driver containers. The version of these packages will match the kernel version running on the RHCOS nodes in the corresponding OpenShift Container Platform release.
Driver containers are container images used for building and deploying out-of-tree kernel modules and drivers on container operating systems such as Red Hat Enterprise Linux CoreOS (RHCOS). Kernel modules and drivers are software libraries running with a high level of privilege in the operating system kernel. They extend the kernel functionalities or provide the hardware-specific code required to control new devices. Examples include hardware devices like field-programmable gate arrays (FPGA) or graphics processing units (GPU), and software-defined storage solutions, which all require kernel modules on client machines. Driver containers are the first layer of the software stack used to enable these technologies on OpenShift Container Platform deployments.
Chapter 2. Driver Toolkit
Learn about the Driver Toolkit and how you can use it as a base image for driver containers for enabling special software and hardware devices on OpenShift Container Platform deployments.
2.1. About the Driver Toolkit

2.1.1. Background
The Driver Toolkit is a container image in the OpenShift Container Platform payload used as a base image on which you can build driver containers. The Driver Toolkit image includes the kernel packages commonly required as dependencies to build or install kernel modules, as well as a few tools needed in driver containers. The version of these packages will match the kernel version running on the Red Hat Enterprise Linux CoreOS (RHCOS) nodes in the corresponding OpenShift Container Platform release.
Driver containers are container images used for building and deploying out-of-tree kernel modules and drivers on container operating systems like RHCOS. Kernel modules and drivers are software libraries running with a high level of privilege in the operating system kernel. They extend the kernel functionalities or provide the hardware-specific code required to control new devices. Examples include hardware devices like Field Programmable Gate Arrays (FPGA) or GPUs, and software-defined storage (SDS) solutions, such as Lustre parallel file systems, which require kernel modules on client machines. Driver containers are the first layer of the software stack used to enable these technologies on Kubernetes.
The list of kernel packages in the Driver Toolkit includes the following and their dependencies:

- kernel-core
- kernel-devel
- kernel-headers
- kernel-modules
- kernel-modules-extra
In addition, the Driver Toolkit also includes the corresponding real-time kernel packages:

- kernel-rt-core
- kernel-rt-devel
- kernel-rt-modules
- kernel-rt-modules-extra
The Driver Toolkit also has several tools that are commonly needed to build and install kernel modules, including:

- elfutils-libelf-devel
- kmod
- binutils
- kabi-dw
- kernel-abi-whitelists
- dependencies for the above
2.1.2. Purpose

Prior to the Driver Toolkit's existence, users would install kernel packages in a pod or build config on OpenShift Container Platform using entitled builds or by installing from the kernel RPMs in the hosts' machine-os-content.

The Driver Toolkit is also used by Kernel Module Management (KMM), which is currently available as a community Operator on OperatorHub. KMM supports out-of-tree and third-party kernel drivers and the support software for the underlying operating system. Users can create modules for KMM to build and deploy a driver container, as well as support software like a device plugin or metrics. Modules can include a build config to build a driver container based on the Driver Toolkit, or KMM can deploy a prebuilt driver container.
2.2. Pulling the Driver Toolkit container image

The driver-toolkit image is available from registry.redhat.io and in the OpenShift Container Platform release payload. The image URL for a specific release can be found by using the oc adm CLI command.

2.2.1. Pulling the Driver Toolkit container image from registry.redhat.io

Instructions for pulling the driver-toolkit image from registry.redhat.io with podman or in OpenShift Container Platform builds can be found on the Red Hat Ecosystem Catalog. The driver-toolkit image for the latest minor release is tagged with the minor release version on registry.redhat.io, for example:

registry.redhat.io/openshift4/driver-toolkit-rhel8:v4.19
2.2.2. Finding the Driver Toolkit image URL in the payload

Prerequisites

- You obtained the image pull secret from Red Hat OpenShift Cluster Manager.
- You installed the OpenShift CLI (oc).
Procedure
Use the oc adm command to extract the image URL of the driver-toolkit corresponding to a certain release.

For an x86 image, the command is as follows:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.19.z-x86_64 --image-for=driver-toolkit

For an ARM image, the command is as follows:

$ oc adm release info quay.io/openshift-release-dev/ocp-release:4.19.z-aarch64 --image-for=driver-toolkit
Example output
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b53883ca2bac5925857148c4a1abc300ced96c222498e3bc134fe7ce3a1dd404

Obtain this image using a valid pull secret, such as the pull secret required to install OpenShift Container Platform:

$ podman pull --authfile=path/to/pullsecret.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<SHA>
2.3. Using the Driver Toolkit

As an example, the Driver Toolkit can be used as the base image for building a very simple kernel module called simple-kmod.

The Driver Toolkit includes the necessary dependencies, openssl, mokutil, and keyutils, needed to sign a kernel module. A signed simple-kmod kernel module is required on nodes where Secure Boot is enabled.
2.3.1. Build and run the simple-kmod driver container on a cluster

Prerequisites

- You have a running OpenShift Container Platform cluster.
- You set the Image Registry Operator state to Managed for your cluster.
- You installed the OpenShift CLI (oc).
- You are logged into the OpenShift CLI as a user with cluster-admin privileges.
Procedure
Create a namespace. For example:
$ oc new-project simple-kmod-demo
The YAML defines an ImageStream for storing the simple-kmod driver container image, and a BuildConfig for building the container. Save this YAML as 0000-buildconfig.yaml.template:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  labels:
    app: simple-kmod-driver-container
  name: simple-kmod-driver-container
  namespace: simple-kmod-demo
spec: {}
---
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  labels:
    app: simple-kmod-driver-build
  name: simple-kmod-driver-build
  namespace: simple-kmod-demo
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  runPolicy: "Serial"
  triggers:
    - type: "ConfigChange"
    - type: "ImageChange"
  source:
    dockerfile: |
      ARG DTK
      FROM ${DTK} as builder

      ARG KVER

      WORKDIR /build/

      RUN git clone https://github.com/openshift-psap/simple-kmod.git

      WORKDIR /build/simple-kmod

      RUN make all install KVER=${KVER}

      FROM registry.redhat.io/ubi8/ubi-minimal

      ARG KVER

      # Required for installing `modprobe`
      RUN microdnf install kmod

      COPY --from=builder /lib/modules/${KVER}/simple-kmod.ko /lib/modules/${KVER}/
      COPY --from=builder /lib/modules/${KVER}/simple-procfs-kmod.ko /lib/modules/${KVER}/
      RUN depmod ${KVER}
  strategy:
    dockerStrategy:
      buildArgs:
        - name: KMODVER
          value: DEMO
        # $ oc adm release info quay.io/openshift-release-dev/ocp-release:<cluster version>-x86_64 --image-for=driver-toolkit
        - name: DTK
          value: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34864ccd2f4b6e385705a730864c04a40908e57acede44457a783d739e377cae
        - name: KVER
          value: 4.18.0-372.26.1.el8_6.x86_64
  output:
    to:
      kind: ImageStreamTag
      name: simple-kmod-driver-container:demo

Substitute the correct driver toolkit image for the OpenShift Container Platform version you are running in place of "DRIVER_TOOLKIT_IMAGE" with the following commands.
$ OCP_VERSION=$(oc get clusterversion/version -ojsonpath={.status.desired.version})

$ DRIVER_TOOLKIT_IMAGE=$(oc adm release info $OCP_VERSION --image-for=driver-toolkit)

$ sed "s#DRIVER_TOOLKIT_IMAGE#${DRIVER_TOOLKIT_IMAGE}#" 0000-buildconfig.yaml.template > 0000-buildconfig.yaml

Create the image stream and build config with:

$ oc create -f 0000-buildconfig.yaml

After the builder pod completes successfully, deploy the driver container image as a DaemonSet.

The driver container must run with the privileged security context in order to load the kernel modules on the host. The following YAML file contains the RBAC rules and the DaemonSet for running the driver container. Save this YAML as 1000-drivercontainer.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: simple-kmod-driver-container
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: simple-kmod-driver-container
rules:
- apiGroups:
  - security.openshift.io
  resources:
  - securitycontextconstraints
  verbs:
  - use
  resourceNames:
  - privileged
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: simple-kmod-driver-container
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: simple-kmod-driver-container
subjects:
- kind: ServiceAccount
  name: simple-kmod-driver-container
userNames:
- system:serviceaccount:simple-kmod-demo:simple-kmod-driver-container
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: simple-kmod-driver-container
spec:
  selector:
    matchLabels:
      app: simple-kmod-driver-container
  template:
    metadata:
      labels:
        app: simple-kmod-driver-container
    spec:
      serviceAccount: simple-kmod-driver-container
      serviceAccountName: simple-kmod-driver-container
      containers:
      - image: image-registry.openshift-image-registry.svc:5000/simple-kmod-demo/simple-kmod-driver-container:demo
        name: simple-kmod-driver-container
        imagePullPolicy: Always
        command: [sleep, infinity]
        lifecycle:
          postStart:
            exec:
              command: ["modprobe", "-v", "-a", "simple-kmod", "simple-procfs-kmod"]
          preStop:
            exec:
              command: ["modprobe", "-r", "-a", "simple-kmod", "simple-procfs-kmod"]
        securityContext:
          privileged: true
      nodeSelector:
        node-role.kubernetes.io/worker: ""

Create the RBAC rules and daemon set:

$ oc create -f 1000-drivercontainer.yaml
After the pods are running on the worker nodes, verify that the simple_kmod kernel module is loaded successfully on the host machines with lsmod.

Verify that the pods are running:

$ oc get pod -n simple-kmod-demo

Example output

NAME                                 READY   STATUS      RESTARTS   AGE
simple-kmod-driver-build-1-build     0/1     Completed   0          6m
simple-kmod-driver-container-b22fd   1/1     Running     0          40s
simple-kmod-driver-container-jz9vn   1/1     Running     0          40s
simple-kmod-driver-container-p45cc   1/1     Running     0          40s

Execute the lsmod command in the driver container pod:

$ oc exec -it pod/simple-kmod-driver-container-p45cc -- lsmod | grep simple

Example output

simple_procfs_kmod     16384  0
simple_kmod            16384  0
Chapter 3. Node Feature Discovery Operator
Learn about the Node Feature Discovery (NFD) Operator and how you can use it to expose node-level information by orchestrating Node Feature Discovery, a Kubernetes add-on for detecting hardware features and system configuration.
The Node Feature Discovery Operator (NFD) manages the detection of hardware features and configuration in an OpenShift Container Platform cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.
The NFD Operator can be found on the Operator Hub by searching for “Node Feature Discovery”.
3.1. Installing the Node Feature Discovery Operator
The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the NFD daemon set. As a cluster administrator, you can install the NFD Operator by using the OpenShift Container Platform CLI or the web console.
3.1.1. Installing the NFD Operator using the CLI
As a cluster administrator, you can install the NFD Operator using the CLI.
Prerequisites
- An OpenShift Container Platform cluster
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a namespace for the NFD Operator.
Create the following Namespace custom resource (CR) that defines the openshift-nfd namespace, and then save the YAML in the nfd-namespace.yaml file. Set the openshift.io/cluster-monitoring label to "true":

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
  labels:
    name: openshift-nfd
    openshift.io/cluster-monitoring: "true"

Create the namespace by running the following command:

$ oc create -f nfd-namespace.yaml
Install the NFD Operator in the namespace you created in the previous step by creating the following objects:
Create the following OperatorGroup CR and save the YAML in the nfd-operatorgroup.yaml file:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-nfd-
  name: openshift-nfd
  namespace: openshift-nfd
spec:
  targetNamespaces:
  - openshift-nfd

Create the OperatorGroup CR by running the following command:

$ oc create -f nfd-operatorgroup.yaml

Create the following Subscription CR and save the YAML in the nfd-sub.yaml file:

Example Subscription

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace

Create the subscription object by running the following command:

$ oc create -f nfd-sub.yaml

Change to the openshift-nfd project:

$ oc project openshift-nfd
Verification
To verify that the Operator deployment is successful, run:
$ oc get pods

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          10m

A successful deployment shows a Running status.
3.1.2. Installing the NFD Operator using the web console
As a cluster administrator, you can install the NFD Operator using the web console.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Node Feature Discovery from the list of available Operators, and then click Install.
- On the Install Operator page, select A specific namespace on the cluster, and then click Install. You do not need to create a namespace because it is created for you.
Verification
To verify that the NFD Operator installed successfully:
- Navigate to the Operators → Installed Operators page.
Ensure that Node Feature Discovery is listed in the openshift-nfd project with a Status of InstallSucceeded.
Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
Troubleshooting
If the Operator does not appear as installed, troubleshoot further:
- Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the openshift-nfd project.
3.2. Using the Node Feature Discovery Operator

The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the Node-Feature-Discovery daemon set by watching for a NodeFeatureDiscovery custom resource (CR). Based on the NodeFeatureDiscovery CR, the Operator creates the operand (NFD) components in the selected namespace. You can edit the CR to choose, among other options, another namespace, image, image pull policy, and nfd-worker-conf config map.

As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift CLI (oc) or the web console.

Note: Starting with version 4.12, the operand.image field in the NodeFeatureDiscovery CR is mandatory. If the NFD Operator is deployed by using Operator Lifecycle Manager (OLM), OLM automatically sets the operand.image field. If you create the NodeFeatureDiscovery CR by using the OpenShift Container Platform CLI or the web console, you must set the operand.image field explicitly.
3.2.1. Creating a NodeFeatureDiscovery CR by using the CLI

As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift CLI (oc).

Note: The spec.operand.image setting requires an image with the -rhel9 suffix to be defined. The following example shows the use of a -rhel9 image.

Prerequisites

- You have access to an OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
Procedure
Create a NodeFeatureDiscovery CR:

Example NodeFeatureDiscovery CR

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  instance: "" # instance is empty by default
  topologyupdater: false # False by default
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.19 # [1]
    imagePullPolicy: Always
  workerConfig:
    configData: |
      core:
      #  labelWhiteList:
      #  noPublish: false
        sleepInterval: 60s
      #  sources: [all]
      #  klog:
      #    addDirHeader: false
      #    alsologtostderr: false
      #    logBacktraceAt:
      #    logtostderr: true
      #    skipHeaders: false
      #    stderrthreshold: 2
      #    v: 0
      #    vmodule:
      ##   NOTE: the following options are not dynamically run-time configurable
      ##         and require a nfd-worker restart to take effect after being changed
      #    logDir:
      #    logFile:
      #    logFileMaxSize: 1800
      #    skipLogHeaders: false
      sources:
        cpu:
          cpuid:
      #     NOTE: whitelist has priority over blacklist
            attributeBlacklist:
              - "BMI1"
              - "BMI2"
              - "CLMUL"
              - "CMOV"
              - "CX16"
              - "ERMS"
              - "F16C"
              - "HTT"
              - "LZCNT"
              - "MMX"
              - "MMXEXT"
              - "NX"
              - "POPCNT"
              - "RDRAND"
              - "RDSEED"
              - "RDTSCP"
              - "SGX"
              - "SSE"
              - "SSE2"
              - "SSE3"
              - "SSE4.1"
              - "SSE4.2"
              - "SSSE3"
            attributeWhitelist:
        kernel:
          kconfigFile: "/path/to/kconfig"
          configOpts:
            - "NO_HZ"
            - "X86"
            - "DMI"
        pci:
          deviceClassWhitelist:
            - "0200"
            - "03"
            - "12"
          deviceLabelFields:
            - "class"
      customConfig:
        configData: |
          - name: "more.kernel.features"
            matchOn:
            - loadedKMod: ["example_kmod3"]

[1] The operand.image field is mandatory.

Create the NodeFeatureDiscovery CR by running the following command:

$ oc apply -f <filename>
Verification
Check that the NodeFeatureDiscovery CR was created by running the following command:

$ oc get pods

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          11m
nfd-master-hcn64                          1/1     Running   0          60s
nfd-master-lnnxx                          1/1     Running   0          60s
nfd-master-mp6hr                          1/1     Running   0          60s
nfd-worker-vgcz9                          1/1     Running   0          60s
nfd-worker-xqbws                          1/1     Running   0          60s

A successful deployment shows a Running status.
3.2.2. Creating a NodeFeatureDiscovery CR by using the CLI in a disconnected environment

As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift CLI (oc).

Prerequisites

- You have access to an OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
- You have access to a mirror registry with the required images.
- You installed the skopeo CLI tool.
Procedure
Determine the digest of the registry image:
Run the following command:

$ skopeo inspect docker://registry.redhat.io/openshift4/ose-node-feature-discovery:<openshift_version>

Example command

$ skopeo inspect docker://registry.redhat.io/openshift4/ose-node-feature-discovery:v4.12

Inspect the output to identify the image digest:

Example output

{
  ...
  "Digest": "sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef",
  ...
}

Use the skopeo CLI tool to copy the image from registry.redhat.io to your mirror registry, by running the following command:

$ skopeo copy docker://registry.redhat.io/openshift4/ose-node-feature-discovery@<image_digest> docker://<mirror_registry>/openshift4/ose-node-feature-discovery@<image_digest>

Example command

$ skopeo copy docker://registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef docker://<your-mirror-registry>/openshift4/ose-node-feature-discovery@sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef

Create a NodeFeatureDiscovery CR:

Example NodeFeatureDiscovery CR

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
spec:
  operand:
    image: <mirror_registry>/openshift4/ose-node-feature-discovery@<image_digest> # [1]
    imagePullPolicy: Always
  workerConfig:
    configData: |
      core:
      #  labelWhiteList:
      #  noPublish: false
        sleepInterval: 60s
      #  sources: [all]
      #  klog:
      #    addDirHeader: false
      #    alsologtostderr: false
      #    logBacktraceAt:
      #    logtostderr: true
      #    skipHeaders: false
      #    stderrthreshold: 2
      #    v: 0
      #    vmodule:
      ##   NOTE: the following options are not dynamically run-time configurable
      ##         and require a nfd-worker restart to take effect after being changed
      #    logDir:
      #    logFile:
      #    logFileMaxSize: 1800
      #    skipLogHeaders: false
      sources:
        cpu:
          cpuid:
      #     NOTE: whitelist has priority over blacklist
            attributeBlacklist:
              - "BMI1"
              - "BMI2"
              - "CLMUL"
              - "CMOV"
              - "CX16"
              - "ERMS"
              - "F16C"
              - "HTT"
              - "LZCNT"
              - "MMX"
              - "MMXEXT"
              - "NX"
              - "POPCNT"
              - "RDRAND"
              - "RDSEED"
              - "RDTSCP"
              - "SGX"
              - "SSE"
              - "SSE2"
              - "SSE3"
              - "SSE4.1"
              - "SSE4.2"
              - "SSSE3"
            attributeWhitelist:
        kernel:
          kconfigFile: "/path/to/kconfig"
          configOpts:
            - "NO_HZ"
            - "X86"
            - "DMI"
        pci:
          deviceClassWhitelist:
            - "0200"
            - "03"
            - "12"
          deviceLabelFields:
            - "class"
      customConfig:
        configData: |
          - name: "more.kernel.features"
            matchOn:
            - loadedKMod: ["example_kmod3"]

[1] The operand.image field is mandatory.

Create the NodeFeatureDiscovery CR by running the following command:

$ oc apply -f <filename>
Verification
Check the status of the NodeFeatureDiscovery CR by running the following command:

$ oc get nodefeaturediscovery nfd-instance -o yaml

Check that the pods are running without ImagePullBackOff errors by running the following command:

$ oc get pods -n <nfd_namespace>
3.2.3. Creating a NodeFeatureDiscovery CR by using the web console

As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift Container Platform web console.

Prerequisites

- You have access to an OpenShift Container Platform cluster.
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
Procedure
- Navigate to the Operators → Installed Operators page.
- In the Node Feature Discovery section, under Provided APIs, click Create instance.
- Edit the values of the NodeFeatureDiscovery CR.
- Click Create.

Note: Starting with version 4.12, the operand.image field in the NodeFeatureDiscovery CR is mandatory. If the NFD Operator is deployed by using Operator Lifecycle Manager (OLM), OLM automatically sets the operand.image field. If you create the NodeFeatureDiscovery CR by using the OpenShift Container Platform CLI or the web console, you must set the operand.image field explicitly.
3.3. Configuring the Node Feature Discovery Operator

3.3.1. core

The core section contains common configuration options that are not specific to any particular feature source.
3.3.1.1. core.sleepInterval

core.sleepInterval specifies the interval between consecutive passes of feature detection or re-detection, and thus also the interval between node re-labeling. A non-positive value implies an infinite sleep interval; no re-detection or re-labeling is done.

This value is overridden by the deprecated --sleep-interval command-line flag, if specified.
Example usage
core:
sleepInterval: 60s
The default value is 60s.
3.3.1.2. core.sources

core.sources specifies the list of enabled feature sources. A special value, all, enables all feature sources.

This value is overridden by the deprecated --sources command-line flag, if specified.

Default: [all]
Example usage
core:
sources:
- system
- custom
3.3.1.3. core.labelWhiteList

core.labelWhiteList specifies a regular expression for filtering feature labels based on the label name. Non-matching labels are not published.

The regular expression is only matched against the basename part of the label, the part of the name after '/'. The label prefix, or namespace, is omitted.

This value is overridden by the deprecated --label-whitelist command-line flag, if specified.

Default: null
Example usage
core:
labelWhiteList: '^cpu-cpuid'
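The basename-only matching can be illustrated with a small shell sketch. This is an illustrative model of the filtering behavior, not NFD's actual implementation, and the label names used here are examples:

```shell
# Sketch: labelWhiteList is matched against the basename only, i.e. the
# part of the label name after the last '/'; the namespace prefix is ignored.
whitelist='^cpu-cpuid'
published=""
for label in \
  feature.node.kubernetes.io/cpu-cpuid.AVX \
  feature.node.kubernetes.io/kernel-version.full
do
  base=${label##*/}   # strip the label prefix (namespace)
  if printf '%s\n' "$base" | grep -qE "$whitelist"; then
    published="$published$label "
  fi
done
echo "published: $published"
```

With this whitelist, only the cpu-cpuid.AVX label survives; the kernel-version label is filtered out because its basename does not match the regex.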
3.3.1.4. core.noPublish Copy linkLink copied to clipboard!
Setting
core.noPublish
true
nfd-master
nfd-worker
nfd-master
This value is overridden by the
--no-publish
Example:
Example usage
core:
noPublish: true
The default value is
false
3.3.2. core.klog
The following options specify the logger configuration, most of which can be dynamically adjusted at run-time.
The logger options can also be specified using command-line flags, which take precedence over any corresponding config file options.
3.3.2.1. core.klog.addDirHeader

If set to true, core.klog.addDirHeader adds the file directory to the header of the log messages.

Default: false
Run-time configurable: yes
3.3.2.2. core.klog.alsologtostderr

Log to standard error as well as files.

Default: false
Run-time configurable: yes
3.3.2.3. core.klog.logBacktraceAt

When logging hits line file:N, emit a stack trace.

Default: empty
Run-time configurable: yes
3.3.2.4. core.klog.logDir

If non-empty, write log files in this directory.

Default: empty
Run-time configurable: no
3.3.2.5. core.klog.logFile

If not empty, use this log file.

Default: empty
Run-time configurable: no
3.3.2.6. core.klog.logFileMaxSize

core.klog.logFileMaxSize defines the maximum size a log file can grow to, in megabytes. If the value is 0, the maximum file size is unlimited.

Default: 1800
Run-time configurable: no
3.3.2.7. core.klog.logtostderr

Log to standard error instead of files.

Default: true
Run-time configurable: yes
3.3.2.8. core.klog.skipHeaders Copy linkLink copied to clipboard!
If
core.klog.skipHeaders
true
Default:
false
Run-time configurable: yes
3.3.2.9. core.klog.skipLogHeaders

If core.klog.skipLogHeaders is set to true, avoid headers when opening log files.

Default: false
Run-time configurable: no
3.3.2.10. core.klog.stderrthreshold

Logs at or above this threshold go to stderr.

Default: 2
Run-time configurable: yes
3.3.2.11. core.klog.v

core.klog.v is the number for the log level verbosity.

Default: 0
Run-time configurable: yes
3.3.2.12. core.klog.vmodule

core.klog.vmodule is a comma-separated list of pattern=N settings for file-filtered logging.

Default: empty
Run-time configurable: yes
3.3.3. sources

The sources section contains feature source specific configuration parameters.
3.3.3.1. sources.cpu.cpuid.attributeBlacklist

Prevent publishing cpuid features listed in this option.

This value is overridden by sources.cpu.cpuid.attributeWhitelist, if specified.

Default: [BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SGXLC, SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSSE3]
Example usage
sources:
cpu:
cpuid:
attributeBlacklist: [MMX, MMXEXT]
3.3.3.2. sources.cpu.cpuid.attributeWhitelist

Only publish the cpuid features listed in this option.

sources.cpu.cpuid.attributeWhitelist takes priority over sources.cpu.cpuid.attributeBlacklist.

Default: empty
Example usage
sources:
cpu:
cpuid:
attributeWhitelist: [AVX512BW, AVX512CD, AVX512DQ, AVX512F, AVX512VL]
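The precedence between the two lists can be sketched as follows. This is an illustrative model of the documented behavior (whitelist has priority over blacklist), not NFD's actual code:

```shell
# Sketch: a non-empty whitelist decides alone; the blacklist
# is only consulted when no whitelist is configured.
publish_attr() {
  attr=$1; whitelist=$2; blacklist=$3
  if [ -n "$whitelist" ]; then
    case " $whitelist " in *" $attr "*) return 0 ;; *) return 1 ;; esac
  fi
  case " $blacklist " in *" $attr "*) return 1 ;; *) return 0 ;; esac
}

publish_attr AVX512F "AVX512BW AVX512F" "MMX MMXEXT" && echo "AVX512F published"
publish_attr MMX     ""                 "MMX MMXEXT" || echo "MMX filtered by blacklist"
publish_attr MMX     "MMX"              "MMX MMXEXT" && echo "whitelist overrides blacklist"
```

Note the third call: an attribute that appears in both lists is still published, because the whitelist takes priority.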
3.3.3.3. sources.kernel.kconfigFile

sources.kernel.kconfigFile is the path of the kernel config file. If empty, NFD runs a search in the well-known standard locations.

Default: empty
Example usage
sources:
kernel:
kconfigFile: "/path/to/kconfig"
3.3.3.4. sources.kernel.configOpts

sources.kernel.configOpts represents kernel configuration options to publish as feature labels.

Default: [NO_HZ, NO_HZ_IDLE, NO_HZ_FULL, PREEMPT]
Example usage
sources:
kernel:
configOpts: [NO_HZ, X86, DMI]
3.3.3.5. sources.pci.deviceClassWhitelist

sources.pci.deviceClassWhitelist is a list of PCI device class IDs for which to publish a label. It can be specified as a main class only (for example, 03) or as a full class-subclass combination (for example, 0300). The former implies that all subclasses are accepted. The format of the labels can be further configured with deviceLabelFields.

Default: ["03", "0b40", "12"]
Example usage
sources:
pci:
deviceClassWhitelist: ["0200", "03"]
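The main-class form acts as a prefix match over the full class-subclass ID. The following shell sketch models that behavior (illustrative only; the device class IDs are standard PCI codes used as examples):

```shell
# Sketch: an entry like "03" (display controllers) matches any subclass,
# while a full ID like "0200" matches only that exact class-subclass pair.
matches_class() {
  dev_class=$1; shift
  for entry in "$@"; do
    case $dev_class in "$entry"*) return 0 ;; esac
  done
  return 1
}

matches_class 0300 "0200" "03" && echo "0300 (VGA controller) matched by main class 03"
matches_class 0108 "0200" "03" || echo "0108 (NVMe controller) not whitelisted"
```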
3.3.3.6. sources.pci.deviceLabelFields

sources.pci.deviceLabelFields is the set of PCI ID fields to use when constructing the name of the feature label. Valid fields are class, vendor, device, subsystem_vendor, and subsystem_device.

Default: [class, vendor]
Example usage
sources:
pci:
deviceLabelFields: [class, vendor, device]
With the example config above, NFD would publish labels such as feature.node.kubernetes.io/pci-<class-id>_<vendor-id>_<device-id>.present=true.
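Label construction from the configured fields is simple string assembly, as the following sketch shows. The vendor and device IDs below are hypothetical examples, not values from this document:

```shell
# Sketch: compose a PCI feature label from class, vendor, and device IDs.
class_id=0300; vendor_id=10de; device_id=1db6   # hypothetical IDs
label="feature.node.kubernetes.io/pci-${class_id}_${vendor_id}_${device_id}.present=true"
echo "$label"
```

Dropping a field from deviceLabelFields simply removes the corresponding segment from the label name.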
3.3.3.7. sources.usb.deviceClassWhitelist

sources.usb.deviceClassWhitelist is a list of USB device class IDs for which to publish a feature label. The format of the labels can be further configured with deviceLabelFields.

Default: ["0e", "ef", "fe", "ff"]
Example usage
sources:
usb:
deviceClassWhitelist: ["ef", "ff"]
3.3.3.8. sources.usb.deviceLabelFields

sources.usb.deviceLabelFields is the set of USB ID fields from which to compose the name of the feature label. Valid fields are class, vendor, and device.

Default: [class, vendor, device]
Example usage

sources:
  usb:
    deviceLabelFields: [class, vendor]
With the example config above, NFD would publish labels like feature.node.kubernetes.io/usb-<class-id>_<vendor-id>.present=true.
3.3.3.9. sources.custom

sources.custom is a list of rules to process in the custom feature source to create user-specific labels.

Default: empty
Example usage
sources:
custom:
- name: "my.custom.feature"
matchOn:
- loadedKMod: ["e1000e"]
- pciId:
class: ["0200"]
vendor: ["8086"]
3.4. About the NodeFeatureRule custom resource Copy linkLink copied to clipboard!
NodeFeatureRule
NodeFeatureDiscovery
NodeFeatureRule
3.5. Using the NodeFeatureRule custom resource

Create a NodeFeatureRule object to label nodes based on a set of rules.
Procedure
Create a custom resource file named nodefeaturerule.yaml that contains the following text:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureRule
metadata:
  name: example-rule
spec:
  rules:
    - name: "example rule"
      labels:
        "example-custom-feature": "true"
      # Label is created if all of the rules below match
      matchFeatures:
        # Match if "veth" kernel module is loaded
        - feature: kernel.loadedmodule
          matchExpressions:
            veth: {op: Exists}
        # Match if any PCI device with vendor 8086 exists in the system
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["8086"]}

This custom resource specifies that labelling occurs when the veth module is loaded and any PCI device with vendor code 8086 exists in the cluster.

Apply the nodefeaturerule.yaml file to your cluster by running the following command:

$ oc apply -f https://raw.githubusercontent.com/kubernetes-sigs/node-feature-discovery/v0.13.6/examples/nodefeaturerule.yaml

The example applies the feature label on nodes with the veth module loaded and any PCI device with vendor code 8086.

Note: A relabeling delay of up to 1 minute might occur.
3.6. Using the NFD Topology Updater

The Node Feature Discovery (NFD) Topology Updater is a daemon responsible for examining allocated resources on a worker node. It accounts for resources that are available to be allocated to new pods on a per-zone basis, where a zone can be a Non-Uniform Memory Access (NUMA) node. The NFD Topology Updater communicates the information to nfd-master, which creates a NodeResourceTopology custom resource (CR) corresponding to the worker node.

To enable the Topology Updater workers in NFD, set the topologyupdater variable to true in the NodeFeatureDiscovery CR.
3.6.1. NodeResourceTopology CR

When run with NFD Topology Updater, NFD creates custom resource instances corresponding to the node resource hardware topology, such as:
apiVersion: topology.node.k8s.io/v1alpha1
kind: NodeResourceTopology
metadata:
name: node1
topologyPolicies: ["SingleNUMANodeContainerLevel"]
zones:
- name: node-0
type: Node
resources:
- name: cpu
capacity: 20
allocatable: 16
available: 10
- name: vendor/nic1
capacity: 3
allocatable: 3
available: 3
- name: node-1
type: Node
resources:
- name: cpu
capacity: 30
allocatable: 30
available: 15
- name: vendor/nic2
capacity: 6
allocatable: 6
available: 6
- name: node-2
type: Node
resources:
- name: cpu
capacity: 30
allocatable: 30
available: 15
- name: vendor/nic1
capacity: 3
allocatable: 3
available: 3
3.6.2. NFD Topology Updater command-line flags

To view available command-line flags, run the nfd-topology-updater -help command. For example:
$ podman run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-topology-updater -help
3.6.2.1. -ca-file

The -ca-file flag is one of the three flags, together with the -cert-file and -key-file flags, that controls mutual TLS authentication on the NFD Topology Updater. This flag specifies the TLS root certificate that is used for verifying the authenticity of nfd-master.

Default: empty

Important: The -ca-file flag must be specified together with the -cert-file and -key-file flags.

Example
Example
$ nfd-topology-updater -ca-file=/opt/nfd/ca.crt -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key
3.6.2.2. -cert-file

The -cert-file flag is one of the three flags, together with the -ca-file and -key-file flags, that controls mutual TLS authentication on the NFD Topology Updater. This flag specifies the TLS certificate presented for authenticating outgoing requests.

Default: empty

Important: The -cert-file flag must be specified together with the -ca-file and -key-file flags.
Example
$ nfd-topology-updater -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key -ca-file=/opt/nfd/ca.crt
3.6.2.3. -h, -help
Print usage and exit.
3.6.2.4. -key-file

The -key-file flag is one of the three flags, together with the -ca-file and -cert-file flags, that controls mutual TLS authentication on the NFD Topology Updater. This flag specifies the private key corresponding to the given certificate file, or -cert-file, that is used for authenticating outgoing requests.

Default: empty

Important: The -key-file flag must be specified together with the -ca-file and -cert-file flags.
Example
$ nfd-topology-updater -key-file=/opt/nfd/updater.key -cert-file=/opt/nfd/updater.crt -ca-file=/opt/nfd/ca.crt
3.6.2.5. -kubelet-config-file

The -kubelet-config-file flag specifies the path to the Kubelet's configuration file.

Default: /host-var/lib/kubelet/config.yaml
Example
$ nfd-topology-updater -kubelet-config-file=/var/lib/kubelet/config.yaml
3.6.2.6. -no-publish
The -no-publish flag disables all communication with nfd-master, making it a dry run flag for nfd-topology-updater. NFD Topology Updater runs resource hardware topology detection normally, but no CR requests are sent to nfd-master.
Default: false
Example
$ nfd-topology-updater -no-publish
3.6.2.7. -oneshot
The -oneshot flag causes the NFD Topology Updater to exit after one pass of resource hardware topology detection.
Default: false
Example
$ nfd-topology-updater -oneshot -no-publish
3.6.2.8. -podresources-socket
The -podresources-socket flag specifies the path to the Unix socket where the kubelet exports a gRPC service to enable discovery of in-use CPUs and devices, and to provide metadata for them.
Default: /host-var/lib/kubelet/pod-resources/kubelet.sock
Example
$ nfd-topology-updater -podresources-socket=/var/lib/kubelet/pod-resources/kubelet.sock
3.6.2.9. -server
The -server flag specifies the address of the nfd-master endpoint to connect to.
Default: localhost:8080
Example
$ nfd-topology-updater -server=nfd-master.nfd.svc.cluster.local:443
3.6.2.10. -server-name-override
The -server-name-override flag specifies the common name (CN) to expect from the nfd-master TLS certificate. This flag is mostly intended for development and debugging purposes.
Default: empty
Example
$ nfd-topology-updater -server-name-override=localhost
3.6.2.11. -sleep-interval
The -sleep-interval flag specifies the interval between resource hardware topology re-examinations and custom resource updates. A non-positive value implies an infinite sleep interval; no re-detection is done.
Default: 60s
Example
$ nfd-topology-updater -sleep-interval=1h
3.6.2.12. -version
Print version and exit.
3.6.2.13. -watch-namespace
The -watch-namespace flag restricts resource hardware topology examination to pods running in the specified namespace. Pods that are not running in the specified namespace are not considered during resource accounting. This is particularly useful for testing and debugging purposes. A * value means that all of the pods across all namespaces are considered during the accounting process.
Default: *
Example
$ nfd-topology-updater -watch-namespace=rte
Chapter 4. Kernel Module Management Operator
Learn about the Kernel Module Management (KMM) Operator and how you can use it to deploy out-of-tree kernel modules and device plugins on OpenShift Container Platform clusters.
4.1. About the Kernel Module Management Operator
The Kernel Module Management (KMM) Operator manages, builds, signs, and deploys out-of-tree kernel modules and device plugins on OpenShift Container Platform clusters.
KMM adds a new Module CRD that describes an out-of-tree kernel module and its associated device plugin. You can use Module resources to configure how to load the module, define ModuleLoader images for kernel versions, and include instructions for building and signing modules for specific kernel versions.
KMM is designed to accommodate multiple kernel versions at once for any kernel module, allowing for seamless node upgrades and reduced application downtime.
4.2. Installing the Kernel Module Management Operator
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator by using the OpenShift CLI or the web console.
The KMM Operator is supported on OpenShift Container Platform 4.12 and later. Installing KMM on version 4.11 does not require specific additional steps. For details on installing KMM on version 4.10 and earlier, see the section "Installing the Kernel Module Management Operator on earlier versions of OpenShift Container Platform".
4.2.1. Installing the Kernel Module Management Operator using the web console
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator using the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
Install the Kernel Module Management Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select Kernel Module Management Operator from the list of available Operators, and then click Install.
- From the Installed Namespace list, select the openshift-kmm namespace.
- Click Install.
Verification
To verify that the KMM Operator installed successfully:
- Navigate to the Operators → Installed Operators page.
Ensure that Kernel Module Management Operator is listed in the openshift-kmm project with a Status of InstallSucceeded.
Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
Troubleshooting
To troubleshoot issues with Operator installation:
- Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the openshift-kmm project.
4.2.2. Installing the Kernel Module Management Operator by using the CLI
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator by using the OpenShift CLI.
Prerequisites
- You have a running OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You are logged in to the OpenShift CLI as a user with cluster-admin privileges.
Procedure
Install KMM in the openshift-kmm namespace:
- Create the following Namespace CR and save the YAML file, for example, kmm-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kmm
- Create the following OperatorGroup CR and save the YAML file, for example, kmm-op-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kernel-module-management
  namespace: openshift-kmm
- Create the following Subscription CR and save the YAML file, for example, kmm-sub.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kernel-module-management
  namespace: openshift-kmm
spec:
  channel: release-1.0
  installPlanApproval: Automatic
  name: kernel-module-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: kernel-module-management.v1.0.0
- Create the Subscription object by running the following command:
$ oc create -f kmm-sub.yaml
Verification
To verify that the Operator deployment is successful, run the following command:
$ oc get -n openshift-kmm deployments.apps kmm-operator-controller
Example output
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kmm-operator-controller   1/1     1            1           97s
The Operator is available.
4.2.3. Installing the Kernel Module Management Operator on earlier versions of OpenShift Container Platform
The KMM Operator is supported on OpenShift Container Platform 4.12 and later. For version 4.10 and earlier, you must create a new SecurityContextConstraint object and bind it to the Operator's ServiceAccount before installing the Operator.
Prerequisites
- You have a running OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You are logged in to the OpenShift CLI as a user with cluster-admin privileges.
Procedure
Install KMM in the openshift-kmm namespace:
- Create the following Namespace CR and save the YAML file, for example, kmm-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kmm
- Create the following SecurityContextConstraint object and save the YAML file, for example, kmm-security-constraint.yaml:
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: false
allowPrivilegedContainer: false
allowedCapabilities:
  - NET_BIND_SERVICE
apiVersion: security.openshift.io/v1
defaultAddCapabilities: null
fsGroup:
  type: MustRunAs
groups: []
kind: SecurityContextConstraints
metadata:
  name: restricted-v2
priority: null
readOnlyRootFilesystem: false
requiredDropCapabilities:
  - ALL
runAsUser:
  type: MustRunAsRange
seLinuxContext:
  type: MustRunAs
seccompProfiles:
  - runtime/default
supplementalGroups:
  type: RunAsAny
users: []
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
- Bind the SecurityContextConstraint object to the Operator's ServiceAccount by running the following commands:
$ oc apply -f kmm-security-constraint.yaml
$ oc adm policy add-scc-to-user kmm-security-constraint -z kmm-operator-controller -n openshift-kmm
- Create the following OperatorGroup CR and save the YAML file, for example, kmm-op-group.yaml:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: kernel-module-management
  namespace: openshift-kmm
- Create the following Subscription CR and save the YAML file, for example, kmm-sub.yaml:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: kernel-module-management
  namespace: openshift-kmm
spec:
  channel: release-1.0
  installPlanApproval: Automatic
  name: kernel-module-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: kernel-module-management.v1.0.0
- Create the Subscription object by running the following command:
$ oc create -f kmm-sub.yaml
$ oc create -f kmm-sub.yaml
Verification
To verify that the Operator deployment is successful, run the following command:
$ oc get -n openshift-kmm deployments.apps kmm-operator-controller
Example output
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kmm-operator-controller   1/1     1            1           97s
The Operator is available.
4.3. Configuring the Kernel Module Management Operator
In most cases, the default configuration for the Kernel Module Management (KMM) Operator does not need to be modified. However, you can modify the Operator settings to suit your environment.
Procedure
To modify any setting, create a ConfigMap with the name kmm-operator-manager-config in the Operator namespace with the relevant data, and then restart the controller by running the following command:
$ oc rollout restart -n "$namespace" deployment/kmm-operator-controller
The value of $namespace depends on your installation method.
Example ConfigMap
apiVersion: v1
data:
  controller_config.yaml: |
    worker:
      firmwareHostPath: /example/different/firmware/path
kind: ConfigMap
metadata:
  name: kmm-operator-manager-config
  namespace: openshift-kmm
If you want to configure KMM Hub, follow the same procedure, but create the ConfigMap with the name kmm-operator-hub-manager-config instead.
| Parameter | Description |
|---|---|
|
| Defines the address on which the Operator monitors for kubelet health probes. The recommended value is
|
|
| Defines the duration for which successful build pods should be preserved before they are deleted. For information about the valid values for this setting, see ParseDuration. The default value is
|
|
| Determines whether leader election is used to ensure that only one replica of the KMM Operator is running at any time. For more information, see Leases. The default value is
|
|
| Determines the name of the resource that leader election uses for holding the leader lock. The default value for KMM is
|
|
| Determines the bind address for the metrics server. Set this to "0" to disable the metrics server. The default value is
|
|
| If
|
|
| Determines if metrics are authenticated using
For authentication and authorization, the controller needs a
To scrape metrics, for example, using Prometheus, the client needs a
The default value is
|
|
| Determines whether the metrics are served over HTTPS instead of HTTP. The default value is
|
|
| If
|
|
| Defines the port on which the Operator monitors webhook requests. The default value is
|
|
| Determines the value of the
|
|
| Determines the value of the
|
|
| If set, the value of this field is written by the worker container into the /sys/module/firmware_class/parameters/path file on the node. For more information see Setting the kernel’s firmware search path. The default value is
|
4.3.1. Unloading the kernel module
You must unload kernel modules when upgrading to a newer version or if the modules introduce an undesirable side effect on the node.
Procedure
To unload a module loaded with KMM from nodes, delete the corresponding Module resource. KMM then creates worker pods, where required, to run modprobe -r and unload the kernel module from the nodes.
Warning: When unloading, KMM needs all the resources it used when loading the kernel module. This includes the ServiceAccount referenced in the Module, as well as any RBAC defined to allow privileged KMM worker pods to run. It also includes any pull secret referenced in .spec.imageRepoSecret.
To avoid situations where KMM is unable to unload the kernel module from nodes, make sure those resources are not deleted while the Module resource is still present in the cluster in any state, including Terminating. KMM includes a validating admission webhook that rejects the deletion of namespaces that contain at least one Module resource.
4.3.2. Setting the kernel firmware search path
The Linux kernel accepts the firmware_class.path parameter as a search path for firmware files.
KMM worker pods can set this value on nodes by writing to sysfs before attempting to load kmods.
Procedure
- To define a firmware search path, set worker.setFirmwareClassPath to /var/lib/firmware in the Operator configuration.
4.4. Uninstalling the Kernel Module Management Operator
Use one of the following procedures to uninstall the Kernel Module Management (KMM) Operator, depending on how the KMM Operator was installed.
4.4.1. Uninstalling a Red Hat catalog installation
Use this procedure if KMM was installed from the Red Hat catalog.
Procedure
Use the following method to uninstall the KMM Operator:
- Use the OpenShift console under Operators → Installed Operators to locate and uninstall the Operator.
Alternatively, you can delete the Subscription resource.
4.4.2. Uninstalling a CLI installation
Use this command if the KMM Operator was installed using the OpenShift CLI.
Procedure
Run the following command to uninstall the KMM Operator:
$ oc delete -k https://github.com/rh-ecosystem-edge/kernel-module-management/config/default
Note: Using this command deletes the Module CRD and all Module instances in the cluster.
4.5. Kernel module deployment
Kernel Module Management (KMM) monitors Node and Module resources in the cluster to determine if a kernel module should be loaded on or unloaded from a node.
To be eligible for a module, a node must contain the following:
- Labels that match the module's .spec.selector field.
- A kernel version matching one of the items in the module's .spec.moduleLoader.container.kernelMappings field.
- If ordered upgrade is configured in the module, a label that matches its .spec.moduleLoader.container.version field.
When KMM reconciles nodes with the desired state as configured in the Module resource, it creates worker pods on the target nodes.
Worker pods run the KMM worker binary, which performs the following actions:
- Pulls the kmod image configured in the Module resource. Kmod images are standard OCI images that contain .ko files.
- Extracts the image in the pod's filesystem.
- Runs modprobe with the specified arguments to perform the necessary action.
4.5.1. The Module custom resource definition Copy linkLink copied to clipboard!
The Module custom resource definition (CRD) represents a kernel module that can be loaded on all or select nodes in the cluster through a kmod image. A Module custom resource (CR) specifies one or more kernel versions with which it is compatible, as well as a node selector.
The compatible versions for a Module resource are listed under .spec.moduleLoader.container.kernelMappings. A kernel mapping can either match a literal version or use regexp to match many of them at the same time.
The reconciliation loop for the Module resource runs the following steps:
- List all nodes matching .spec.selector.
- Build a set of all kernel versions running on those nodes.
- For each kernel version:
  - Go through .spec.moduleLoader.container.kernelMappings and find the appropriate container image name. If the kernel mapping has build or sign defined and the container image does not already exist, run the build, the signing pod, or both, as needed.
  - Create a worker pod to pull the container image determined in the previous step and run modprobe.
  - If .spec.devicePlugin is defined, create a device plugin daemon set using the configuration specified under .spec.devicePlugin.container.
- Run garbage-collect on:
  - Obsolete device plugin DaemonSets that do not target any node.
  - Successful build pods.
  - Successful signing pods.
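To make the mapping step above concrete, here is a minimal Python sketch of the kernel-mapping semantics (an illustration only, not KMM's implementation; the helper name is invented). The first literal or regexp entry that matches the node's kernel version selects the container image, and ${KERNEL_FULL_VERSION} is substituted into the image name:

```python
import re

def resolve_container_image(kernel_version, kernel_mappings):
    """Mimic .spec.moduleLoader.container.kernelMappings resolution:
    the first mapping whose 'literal' equals, or whose 'regexp'
    matches, the kernel version wins."""
    for mapping in kernel_mappings:
        if mapping.get("literal") == kernel_version or (
            "regexp" in mapping and re.match(mapping["regexp"], kernel_version)
        ):
            # KMM substitutes ${KERNEL_FULL_VERSION} in the image name.
            return mapping["containerImage"].replace(
                "${KERNEL_FULL_VERSION}", kernel_version
            )
    return None  # no mapping: the module is not loaded for this kernel

mappings = [
    {"literal": "6.0.15-300.fc37.x86_64",
     "containerImage": "some.registry/org/my-kmod:6.0.15-300.fc37.x86_64"},
    {"regexp": r"^.+$",
     "containerImage": "some.registry/org/my-kmod:${KERNEL_FULL_VERSION}"},
]

# Matches the catch-all regexp mapping and substitutes the version:
print(resolve_container_image("5.14.0-284.el9.x86_64", mappings))
```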
4.5.2. Set soft dependencies between kernel modules
Some configurations require that several kernel modules be loaded in a specific order to work properly, even though the modules do not directly depend on each other through symbols. These are called soft dependencies.
depmod is usually not aware of these dependencies, so they do not appear in the files it produces. For example, if mod_a has a soft dependency on mod_b, running modprobe mod_a does not load mod_b.
You can resolve these situations by declaring soft dependencies in the Module custom resource definition (CRD) using the modulesLoadingOrder field:
# ...
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: mod_a
        dirName: /opt
        firmwarePath: /firmware
        parameters:
          - param=1
        modulesLoadingOrder:
          - mod_a
          - mod_b
In the configuration above, the worker pod first tries to unload the in-tree mod_b before loading mod_a from the kmod image. When the worker pod is terminated and the modules are unloaded, mod_b is not loaded again.
Note: The first value in the list, which is loaded last, must be equivalent to the moduleName of the module.
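The ordering rule can be sketched as follows (illustrative Python, based on the behavior described above; the helpers are hypothetical): the first entry of modulesLoadingOrder is loaded last, so the load sequence is the reversed list, while unloading follows the list order:

```python
def load_sequence(modules_loading_order):
    """modulesLoadingOrder lists modules with the first entry
    (the moduleName) loaded last, so loading reverses the list."""
    return list(reversed(modules_loading_order))

def unload_sequence(modules_loading_order):
    """Unloading happens in list order: the main module first,
    then its soft dependencies."""
    return list(modules_loading_order)

order = ["mod_a", "mod_b"]  # the modulesLoadingOrder value from the example
print(load_sequence(order))    # mod_b is loaded before mod_a
print(unload_sequence(order))  # mod_a is unloaded before mod_b
```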
4.6. Security and permissions
Loading kernel modules is a highly sensitive operation. After they are loaded, kernel modules have all possible permissions to do any kind of operation on the node.
4.6.1. ServiceAccounts and SecurityContextConstraints
Kernel Module Management (KMM) creates a privileged workload to load the kernel modules on nodes. That workload needs ServiceAccounts allowed to use the privileged SecurityContextConstraint (SCC).
The authorization model for that workload depends on the namespace of the Module resource, as well as its spec:
- If the .spec.moduleLoader.serviceAccountName or .spec.devicePlugin.serviceAccountName fields are set, they are always used.
- If those fields are not set, then:
  - If the Module resource is created in the Operator's namespace (openshift-kmm by default), then KMM uses its default, powerful ServiceAccounts to run the worker and device plugin pods.
  - If the Module resource is created in any other namespace, then KMM runs the pods with the namespace's default ServiceAccount. The Module resource cannot run a privileged workload unless you manually enable it to use the privileged SCC.
Important: openshift-kmm is a trusted namespace. When setting up RBAC permissions, remember that any user or ServiceAccount creating a Module resource in the openshift-kmm namespace results in KMM automatically running privileged workloads.
To allow any ServiceAccount to use the privileged SCC, and therefore to run worker or device plugin pods, use the following oc adm policy command:
$ oc adm policy add-scc-to-user privileged -z "${serviceAccountName}" [ -n "${namespace}" ]
4.6.2. Pod security standards
OpenShift runs a synchronization mechanism that sets the namespace Pod Security level automatically based on the security contexts in use. No action is needed.
4.7. Replacing in-tree modules with out-of-tree modules
You can use Kernel Module Management (KMM) to build kernel modules that can be loaded or unloaded into the kernel on demand. These modules extend the functionality of the kernel without the need to reboot the system. Modules can be configured as built-in or dynamically loaded.
Dynamically loaded modules include in-tree modules and out-of-tree (OOT) modules. In-tree modules are internal to the Linux kernel tree, that is, they are already part of the kernel. Out-of-tree modules are external to the Linux kernel tree. They are generally written for development and testing purposes, such as testing the new version of a kernel module that is shipped in-tree, or to deal with incompatibilities.
Some modules that are loaded by KMM could replace in-tree modules that are already loaded on the node. To unload in-tree modules before loading your module, set the value of the .spec.moduleLoader.container.inTreeModulesToRemove field, as in the following example:
# ...
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: mod_a
      inTreeModulesToRemove: [mod_a, mod_b]
In this example, the moduleLoader pod uses inTreeModulesToRemove to unload the in-tree mod_a and mod_b before loading mod_a from the moduleLoader image. When the moduleLoader pod is terminated and mod_a is unloaded, mod_b is not loaded again.
The following is an example for module replacement for specific kernel mappings:
# ...
spec:
  moduleLoader:
    container:
      kernelMappings:
        - literal: 6.0.15-300.fc37.x86_64
          containerImage: "some.registry/org/my-kmod:${KERNEL_FULL_VERSION}"
          inTreeModulesToRemove: [<module_name>, <module_name>]
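The load/unload sequence implied by inTreeModulesToRemove can be sketched as follows (illustrative Python, not KMM source; the helper is hypothetical): each listed in-tree module is removed with modprobe -r before the out-of-tree module is loaded:

```python
def plan_replacement(module_name, in_tree_modules_to_remove):
    """Build the ordered list of modprobe actions the worker pod performs
    when inTreeModulesToRemove is set: remove each listed in-tree module,
    then load the out-of-tree module from the kmod image."""
    actions = [("modprobe -r", m) for m in in_tree_modules_to_remove]
    actions.append(("modprobe", module_name))
    return actions

for verb, mod in plan_replacement("mod_a", ["mod_a", "mod_b"]):
    print(verb, mod)
```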
4.7.1. Example Module CR
The following is an annotated Module example:
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: <my_kmod>
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: <my_kmod>
        dirName: /opt
        firmwarePath: /firmware
        parameters:
          - param=1
      kernelMappings:
        - literal: 6.0.15-300.fc37.x86_64
          containerImage: some.registry/org/my-kmod:6.0.15-300.fc37.x86_64
        - regexp: '^.+\fc37\.x86_64$'
          containerImage: "some.other.registry/org/<my_kmod>:${KERNEL_FULL_VERSION}"
        - regexp: '^.+$'
          containerImage: "some.registry/org/<my_kmod>:${KERNEL_FULL_VERSION}"
          build:
            buildArgs:
              - name: ARG_NAME
                value: <some_value>
            secrets:
              - name: <some_kubernetes_secret>
            baseImageRegistryTLS:
              insecure: false
              insecureSkipTLSVerify: false
            dockerfileConfigMap:
              name: <my_kmod_dockerfile>
          sign:
            certSecret:
              name: <cert_secret>
            keySecret:
              name: <key_secret>
            filesToSign:
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/<my_kmod>.ko
          registryTLS:
            insecure: false
            insecureSkipTLSVerify: false
    serviceAccountName: <sa_module_loader>
  devicePlugin:
    container:
      image: some.registry/org/device-plugin:latest
      env:
        - name: MY_DEVICE_PLUGIN_ENV_VAR
          value: SOME_VALUE
      volumeMounts:
        - mountPath: /some/mountPath
          name: <device_plugin_volume>
    volumes:
      - name: <device_plugin_volume>
        configMap:
          name: <some_configmap>
    serviceAccountName: <sa_device_plugin>
  imageRepoSecret:
    name: <secret_name>
  selector:
    node-role.kubernetes.io/worker: ""
- 1
- Required.
- 2
- Optional.
- 3
- Optional: Copies the contents of this path into the path specified in worker.setFirmwareClassPath (which is preset to /var/lib/firmware) of the kmm-operator-manager-config config map. This action occurs before modprobe is called to insert the kernel module.
- Optional.
- 5
- At least one kernel item is required.
- 6
- For each node running a kernel matching the regular expression, KMM checks if you have included a tag or a digest. If you have not specified a tag or digest in the container image, then the validation webhook returns an error and does not apply the module.
- 7
- For any other kernel, build the image using the Dockerfile in the my-kmod ConfigMap.
- The container image that holds the customer's kmods. This container should contain the cp binary.
- Optional.
- 10
- Optional: A value for some-kubernetes-secret can be obtained from the build environment at /run/secrets/some-kubernetes-secret.
- This field has no effect. When building kmod images or signing kmods within a kmod image, you might sometimes need to pull base images from a registry that serves a certificate signed by an untrusted Certificate Authority (CA). In order for KMM to trust that CA, it must also trust the new CA by replacing the cluster’s CA bundle.
See "Additional resources" to learn how to replace the cluster’s CA bundle.
- 12
- Optional: Avoid using this parameter. If set to true, the build skips any TLS server certificate validation when pulling the image in the Dockerfile FROM instruction using plain HTTP.
- Required.
- 14
- Required: A secret holding the public secureboot key with the key 'cert'.
- 15
- Required: A secret holding the private secureboot key with the key 'key'.
- 16
- Optional: Avoid using this parameter. If set to true, KMM is allowed to check if the container image already exists using plain HTTP.
- Optional: Avoid using this parameter. If set to true, KMM skips any TLS server certificate validation when checking if the container image already exists.
- Optional.
- 19
- Optional.
- 20
- Required: If the device plugin section is present.
- 21
- Optional.
- 22
- Optional.
- 23
- Optional.
- 24
- Optional: Used to pull module loader and device plugin images.
4.8. Using in-tree modules with the device plugin
In some cases, you might need to configure the KMM Module to avoid loading an out-of-tree kernel module and instead use the in-tree module, running only the device plugin. In such cases, omit the moduleLoader section so that the Module CR contains only the devicePlugin section.
Example Module CR
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod
spec:
  selector:
    node-role.kubernetes.io/worker: ""
  devicePlugin:
    container:
      image: some.registry/org/my-device-plugin:latest
4.9. Symbolic links for in-tree dependencies
Some kernel modules depend on other kernel modules that are shipped with the node's operating system. To avoid copying those dependencies into the kmod image, Kernel Module Management (KMM) mounts /usr/lib/modules into both the build and the worker pod's filesystems.
By creating a symlink from /opt/usr/lib/modules/<kernel_version>/<symlink_name> to /usr/lib/modules/<kernel_version>, depmod can use the in-tree kmods on the building node's filesystem to resolve dependencies.
At runtime, the worker pod extracts the entire image, including the <symlink_name> symbolic link. That link points to /usr/lib/modules/<kernel_version> in the worker pod, which is mounted from the node's filesystem. modprobe can then follow that link and load the in-tree dependencies as needed.
In the following example, host is the symbolic link name under /opt/usr/lib/modules/<kernel_version>:
ARG DTK_AUTO
FROM ${DTK_AUTO} as builder
#
# Build steps
#
FROM ubi9/ubi
ARG KERNEL_FULL_VERSION
RUN dnf update && dnf install -y kmod
COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_a.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_b.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
# Create the symbolic link
RUN ln -s /lib/modules/${KERNEL_FULL_VERSION} /opt/lib/modules/${KERNEL_FULL_VERSION}/host
RUN depmod -b /opt ${KERNEL_FULL_VERSION}
Note: depmod generates dependency files based on the kernel modules present on the node that runs the kmod image build. On the node on which KMM loads the kernel modules, modprobe expects the files to be present under /usr/lib/modules/<kernel_version>, with the same filesystem layout. It is highly recommended that the build and the target nodes share the same operating system and release.
4.10. Creating a kmod image
Kernel Module Management (KMM) works with purpose-built kmod images, which are standard OCI images that contain .ko files. The location of the .ko files must match the following pattern: <prefix>/lib/modules/[kernel-version]/.
Keep the following in mind when working with the .ko files:
- In most cases, <prefix> should be equal to /opt. This is the Module CRD's default value.
- kernel-version must not be empty and must be equal to the kernel version the kernel modules were built for.
In addition to the .ko files, the image should also contain the cp binary, which is used to copy the .ko files.
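The layout rule can be checked mechanically. The following Python sketch (a hypothetical helper, not part of KMM) validates a .ko path against the <prefix>/lib/modules/<kernel-version>/ pattern with the /opt default prefix:

```python
import re

# <prefix>/lib/modules/<kernel-version>/<name>.ko
KO_PATH_PATTERN = re.compile(
    r"^(?P<prefix>/[^\s]*?)/lib/modules/(?P<kernel_version>[^/]+)/[^/]+\.ko$"
)

def validate_ko_path(path, expected_kernel_version, prefix="/opt"):
    """Check that a .ko file is placed under
    <prefix>/lib/modules/<kernel-version>/ as KMM expects."""
    m = KO_PATH_PATTERN.match(path)
    return (m is not None
            and m.group("prefix") == prefix
            and m.group("kernel_version") == expected_kernel_version)

print(validate_ko_path(
    "/opt/lib/modules/6.0.15-300.fc37.x86_64/kmm_ci_a.ko",
    "6.0.15-300.fc37.x86_64"))  # True
print(validate_ko_path(
    "/opt/lib/modules//kmm_ci_a.ko", ""))  # False: kernel-version must not be empty
```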
4.10.1. Running depmod
It is recommended to run depmod at the end of the build process to generate modules.dep and .map files. This is only useful if your kmod image contains several kernel modules and one of the modules depends on another.
Note: You must have a Red Hat subscription to download the kernel-devel package.
Procedure
Generate modules.dep and .map files for a specific kernel version by running the following command:
$ depmod -b /opt ${KERNEL_FULL_VERSION}
Example Dockerfile
If you are building your image on OpenShift Container Platform, consider using the Driver Toolkit (DTK).
For further information, see using an entitled build.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kmm-ci-dockerfile
data:
  dockerfile: |
    ARG DTK_AUTO
    FROM ${DTK_AUTO} as builder
    ARG KERNEL_FULL_VERSION
    WORKDIR /usr/src
    RUN ["git", "clone", "https://github.com/rh-ecosystem-edge/kernel-module-management.git"]
    WORKDIR /usr/src/kernel-module-management/ci/kmm-kmod
    RUN KERNEL_SRC_DIR=/lib/modules/${KERNEL_FULL_VERSION}/build make all
    FROM registry.redhat.io/ubi9/ubi-minimal
    ARG KERNEL_FULL_VERSION
    RUN microdnf install kmod
    COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_a.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
    COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_b.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
    RUN depmod -b /opt ${KERNEL_FULL_VERSION}
4.10.2. Building in the cluster
KMM can build kmod images in the cluster. Follow these guidelines:
- Provide build instructions using the build section of a kernel mapping.
- Copy the Dockerfile for your container image into a ConfigMap resource, under the dockerfile key.
- Ensure that the ConfigMap is located in the same namespace as the Module.
KMM checks if the image name specified in the containerImage field exists. If it does, the build is skipped. Otherwise, KMM creates a Build object to build your image. After the image is built, KMM proceeds with the Module reconciliation. See the following example:
# ...
- regexp: '^.+$'
  containerImage: "some.registry/org/<my_kmod>:${KERNEL_FULL_VERSION}"
  build:
    buildArgs:
      - name: ARG_NAME
        value: <some_value>
    secrets:
      - name: <some_kubernetes_secret>
    baseImageRegistryTLS:
      insecure: false
      insecureSkipTLSVerify: false
    dockerfileConfigMap:
      name: <my_kmod_dockerfile>
  registryTLS:
    insecure: false
    insecureSkipTLSVerify: false
- 1
- Optional.
- 2
- Optional.
- 3
- Will be mounted in the build pod as /run/secrets/some-kubernetes-secret.
- Optional: Avoid using this parameter. If set to true, the build will be allowed to pull the image in the Dockerfile FROM instruction using plain HTTP.
- Optional: Avoid using this parameter. If set to true, the build will skip any TLS server certificate validation when pulling the image in the Dockerfile FROM instruction.
- Required.
- 7
- Optional: Avoid using this parameter. If set to true, KMM will be allowed to check if the container image already exists using plain HTTP.
- Optional: Avoid using this parameter. If set to true, KMM will skip any TLS server certificate validation when checking if the container image already exists.
Successful build pods are garbage collected immediately, unless the job.gcDelay parameter is set in the Operator configuration.
4.10.3. Using the Driver Toolkit
The Driver Toolkit (DTK) is a convenient base image for building kmod images. It contains tools and libraries for the OpenShift version currently running in the cluster.
Procedure
Use DTK as the first stage of a multi-stage Dockerfile:
- Build the kernel modules.
- Copy the .ko files into a smaller end-user image such as ubi-minimal.
- To leverage DTK in your in-cluster build, use the DTK_AUTO build argument. The value is automatically set by KMM when creating the Build resource. See the following example:
ARG DTK_AUTO
FROM ${DTK_AUTO} as builder
ARG KERNEL_FULL_VERSION
WORKDIR /usr/src
RUN ["git", "clone", "https://github.com/rh-ecosystem-edge/kernel-module-management.git"]
WORKDIR /usr/src/kernel-module-management/ci/kmm-kmod
RUN KERNEL_SRC_DIR=/lib/modules/${KERNEL_FULL_VERSION}/build make all
FROM ubi9/ubi-minimal
ARG KERNEL_FULL_VERSION
RUN microdnf install kmod
COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_a.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
COPY --from=builder /usr/src/kernel-module-management/ci/kmm-kmod/kmm_ci_b.ko /opt/lib/modules/${KERNEL_FULL_VERSION}/
RUN depmod -b /opt ${KERNEL_FULL_VERSION}
4.11. Using signing with Kernel Module Management (KMM)
On a Secure Boot enabled system, all kernel modules (kmods) must be signed with a public/private key pair enrolled into the Machine Owner's Key (MOK) database. Drivers distributed as part of a distribution should already be signed by the distribution's private key, but for kernel modules built out-of-tree, KMM supports signing kernel modules using the sign section of the kernel mapping.
For more details on using Secure Boot, see Generating a public and private key pair.
Prerequisites
- A public/private key pair in the correct (DER) format.
- At least one secure-boot enabled node with the public key enrolled in its MOK database.
- Either a pre-built driver container image, or the source code and Dockerfile needed to build one in-cluster.
4.12. Adding the keys for secureboot
To use Kernel Module Management (KMM) to sign kernel modules, a certificate and private key are required. For details on how to create these, see Generating a public and private key pair.
For details on how to extract the public and private key pair, see Signing kernel modules with the private key. Use steps 1 through 4 to extract the keys into files.
Procedure
Create the sb_cert.cer file that contains the certificate and the sb_cert.priv file that contains the private key:
$ openssl req -x509 -new -nodes -utf8 -sha256 -days 36500 -batch -config configuration_file.config -outform DER -out my_signing_key_pub.der -keyout my_signing_key.priv
Add the files as secrets directly:
$ oc create secret generic my-signing-key --from-file=key=<my_signing_key.priv>$ oc create secret generic my-signing-key-pub --from-file=cert=<my_signing_key_pub.der>Add the files by base64 encoding them:
$ cat sb_cert.priv | base64 -w 0 > my_signing_key2.base64$ cat sb_cert.cer | base64 -w 0 > my_signing_key_pub.base64
Add the encoded text to a YAML file:
apiVersion: v1 kind: Secret metadata: name: my-signing-key-pub namespace: default1 type: Opaque data: cert: <base64_encoded_secureboot_public_key> --- apiVersion: v1 kind: Secret metadata: name: my-signing-key namespace: default2 type: Opaque data: key: <base64_encoded_secureboot_private_key>Apply the YAML file:
$ oc apply -f <yaml_filename>
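The base64 -w 0 step can also be reproduced programmatically when generating Secret manifests. The following Python sketch (a hypothetical helper, not part of KMM) produces the value for the data.key or data.cert field:

```python
import base64

def secret_data_value(raw: bytes) -> str:
    """Base64-encode key material exactly as `base64 -w 0` does:
    standard alphabet, no line wrapping."""
    return base64.b64encode(raw).decode("ascii")

# Hypothetical key bytes; in practice, read sb_cert.priv or sb_cert.cer.
print(secret_data_value(b"-----BEGIN PRIVATE KEY-----\n..."))
```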
4.12.1. Checking the keys
After you have added the keys, you must check them to ensure they are set correctly.
Procedure
Check to ensure the public key secret is set correctly:
$ oc get secret -o yaml <certificate secret name> | awk '/cert/{print $2; exit}' | base64 -d | openssl x509 -inform der -text
This should display a certificate with a Serial Number, Issuer, Subject, and more.
Check to ensure the private key secret is set correctly:
$ oc get secret -o yaml <private key secret name> | awk '/key/{print $2; exit}' | base64 -d
This should display the key enclosed in the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines.
4.13. Signing kmods in a pre-built image
Use this procedure if you have a pre-built image, such as an image either distributed by a hardware vendor or built elsewhere.
The following YAML file adds the public/private key-pair as secrets with the required key names: key for the private key and cert for the certificate. The cluster pulls down the unsignedImage image, opens it, signs the kernel modules listed in filesToSign, adds them back, and pushes the resulting image as containerImage.

KMM then loads the signed kmods onto all the nodes that match the selector. The kmods are successfully loaded on any nodes that have the public key in their MOK database, and on any nodes that are not secure-boot enabled, which ignore the signature.
Prerequisites
- The keySecret and certSecret secrets have been created in the same namespace as the rest of the resources.
Procedure
Apply the YAML file:
---
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: example-module
spec:
  moduleLoader:
    serviceAccountName: default
    container:
      modprobe:
        moduleName: '<module_name>'
      kernelMappings:
        # the kmods will be deployed on all nodes in the cluster with a kernel that matches the regexp
        - regexp: '^.*\.x86_64$'
          # the container to produce containing the signed kmods
          containerImage: <image_name>
          sign:
            # the image containing the unsigned kmods (we need this because we are not building the kmods within the cluster)
            unsignedImage: <image_name>
            keySecret: # a secret holding the private secureboot key with the key 'key'
              name: <private_key_secret_name>
            certSecret: # a secret holding the public secureboot key with the key 'cert'
              name: <certificate_secret_name>
            filesToSign:
              # full path within the unsignedImage container to the kmod(s) to sign
              - /opt/lib/modules/4.18.0-348.2.1.el8_5.x86_64/kmm_ci_a.ko
          imageRepoSecret:
            # the name of a secret containing credentials to pull unsignedImage and push containerImage to the registry
            name: repo-pull-secret
  selector:
    kubernetes.io/arch: amd64
4.14. Building and signing a kmod image
Use this procedure if you have source code and must build your image first.
The following YAML file builds a new container image using the source code from the repository. The image produced is saved back in the registry with a temporary name, and this temporary image is then signed using the parameters in the sign section.

The temporary image name is based on the final image name and is set to be <containerImage>:<tag>-<namespace>_<module name>_kmm_unsigned.

For example, using the following YAML file, Kernel Module Management (KMM) builds an image named example.org/repository/minimal-driver:final-default_example-module_kmm_unsigned containing the unsigned kmods and pushes it to the registry. KMM then signs it and produces the final image, example.org/repository/minimal-driver:final.
After it is signed, you can safely delete the temporary image from the registry. It will be rebuilt, if needed.
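The temporary-name rule above can be expressed as plain string manipulation. The following sketch is illustrative only (it is not how KMM computes the name internally) and uses the example values from this section:

```shell
# Example values from this section.
containerImage="example.org/repository/minimal-driver:final"
namespace="default"
moduleName="example-module"

tag="${containerImage##*:}"    # tag portion: "final"
repo="${containerImage%:*}"    # repository portion: "example.org/repository/minimal-driver"

# <containerImage>:<tag>-<namespace>_<module name>_kmm_unsigned
unsigned="${repo}:${tag}-${namespace}_${moduleName}_kmm_unsigned"
echo "${unsigned}"
# example.org/repository/minimal-driver:final-default_example-module_kmm_unsigned
```

The printed name matches the temporary image name shown in the example above.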
Prerequisites
- The keySecret and certSecret secrets have been created in the same namespace as the rest of the resources.
Procedure
Apply the YAML file:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-module-dockerfile
  namespace: <namespace> 1
data:
  dockerfile: |
    ARG DTK_AUTO
    ARG KERNEL_VERSION
    FROM ${DTK_AUTO} as builder
    WORKDIR /build/
    RUN git clone -b main --single-branch https://github.com/rh-ecosystem-edge/kernel-module-management.git
    WORKDIR kernel-module-management/ci/kmm-kmod/
    RUN make
    FROM registry.access.redhat.com/ubi9/ubi:latest
    ARG KERNEL_VERSION
    RUN yum -y install kmod && yum clean all
    RUN mkdir -p /opt/lib/modules/${KERNEL_VERSION}
    COPY --from=builder /build/kernel-module-management/ci/kmm-kmod/*.ko /opt/lib/modules/${KERNEL_VERSION}/
    RUN /usr/sbin/depmod -b /opt
---
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: example-module
  namespace: <namespace> 2
spec:
  moduleLoader:
    serviceAccountName: default 3
    container:
      modprobe:
        moduleName: simple_kmod
      kernelMappings:
        - regexp: '^.*\.x86_64$'
          containerImage: <final_driver_container_name>
          build:
            dockerfileConfigMap:
              name: example-module-dockerfile
          sign:
            keySecret:
              name: <private_key_secret_name>
            certSecret:
              name: <certificate_secret_name>
            filesToSign:
              - /opt/lib/modules/4.18.0-348.2.1.el8_5.x86_64/kmm_ci_a.ko
          imageRepoSecret: 4
            name: repo-pull-secret
  selector: # top-level selector
    kubernetes.io/arch: amd64
1 2 Replace default with a valid namespace.
3 The default serviceAccountName does not have the required permissions to run a module that is privileged. For information on creating a service account, see "Creating service accounts" in the "Additional resources" of this section.
4 Used as imagePullSecrets in the DaemonSet object and to pull and push for the build and sign features.
4.15. Using tolerations for kernel module scheduling
There are circumstances where you need to evacuate workloads on a node before upgrading a kernel module, which you can do through taints. You can use a taint to schedule only pods that contain a matching toleration on the node.
However, you also need to set tolerations to allow Kernel Module Management (KMM) pods that run housekeeping operations, such as kernel module upgrades. The tolerations must match the taint that is added to the nodes.
You can add user-defined tolerations to kernel modules to schedule selected KMM pods on a cordoned node. For example, during a device driver upgrade, you can cordon a node while still running the housekeeping pods that perform the driver upgrade.
The tolerations are defined in the ModuleSpec, as shown in the example module specification in the next section.
4.16. Applying tolerations to kernel module pods
Taints and tolerations consist of the effect, key, value, operator, and tolerationSeconds parameters:

effect
    Indicates the taint effect to match. If left empty, all taint effects are matched. When you set effect, valid values are: NoSchedule, PreferNoSchedule, or NoExecute.
key
    The taint key that the toleration applies to. If left empty, all taint keys are matched. If the key is empty, you must set the operator parameter to Exists. This combination matches all values and all keys.
value
    The taint value the toleration matches to. If the operator parameter is Exists, the value must be empty, otherwise use a regular string.
operator
    Represents a relationship of a key to the value. Valid operator parameters are Exists and Equal. The default value is Equal. Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of a particular category.
tolerationSeconds
    Represents the period of time the toleration (which must be of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, it is not set and the taint is tolerated forever without eviction. Zero and negative values are treated as 0 and immediately evicted by the system.
Example taint in a node specification
apiVersion: v1
kind: Node
metadata:
  name: <my_node>
  #...
spec:
  taints:
  - effect: NoSchedule
    key: key1
    value: value1
#...
Example toleration in a module specification
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: <my_kmod>
spec:
  ...
  tolerations:
  - effect: NoSchedule
    key: key1
    operator: Equal
    tolerationSeconds: 36000
    value: value1
Toleration values must match the taint that is added to the nodes. A toleration matches a taint:

If the operator parameter is set to Equal:
- the key parameters are the same;
- the value parameters are the same;
- the effect parameters are the same.

If the operator parameter is set to Exists:
- the key parameters are the same;
- the effect parameters are the same.
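The matching rules above can be restated as a small predicate. This is an illustrative sketch only; the empty-key wildcard case is omitted for brevity, and this is not the scheduler's actual implementation:

```shell
# matches OPERATOR TOL_KEY TOL_VALUE TOL_EFFECT TAINT_KEY TAINT_VALUE TAINT_EFFECT
# Succeeds (exit 0) if the toleration matches the taint.
matches() {
  op=$1; tkey=$2; tval=$3; teff=$4; nkey=$5; nval=$6; neff=$7
  # An empty toleration effect matches all taint effects.
  [ -z "$teff" ] || [ "$teff" = "$neff" ] || return 1
  case "$op" in
    Equal)  [ "$tkey" = "$nkey" ] && [ "$tval" = "$nval" ] ;;
    Exists) [ "$tkey" = "$nkey" ] ;;   # value is ignored for Exists
  esac
}

# The toleration from the example module specification vs. the example taint:
matches Equal key1 value1 NoSchedule key1 value1 NoSchedule && echo "tolerated"
# A mismatched value does not tolerate the taint:
matches Equal key1 value2 NoSchedule key1 value1 NoSchedule || echo "not tolerated"
```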
4.17. KMM hub and spoke
In hub and spoke scenarios, many spoke clusters are connected to a central, powerful hub cluster. Kernel Module Management (KMM) depends on Red Hat Advanced Cluster Management (RHACM) to operate in hub and spoke environments.
KMM is compatible with hub and spoke environments through decoupling of KMM features. A ManagedClusterModule custom resource definition (CRD) wraps the existing Module CRD and extends it to select the spoke clusters on which the kernel module should be deployed.
In hub and spoke setups, spokes are focused, resource-constrained clusters that are centrally managed by a hub cluster. Spokes run the single-cluster edition of KMM, with those resource-intensive features disabled. To adapt KMM to this environment, you should reduce the workload running on the spokes to the minimum, while the hub takes care of the expensive tasks.
Resource-intensive tasks, such as building kernel module images and signing the .ko files, should run on the hub. The scheduling of the module loader and device plugin DaemonSets on the spokes is the responsibility of RHACM.
4.17.1. KMM-Hub
The KMM project provides KMM-Hub, an edition of KMM dedicated to hub clusters. KMM-Hub monitors all kernel versions running on the spokes and determines the nodes on the cluster that should receive a kernel module.
KMM-Hub runs all compute-intensive tasks, such as image builds and kmod signing, and prepares the trimmed-down Module resource to be transferred to the spokes through RHACM.
KMM-Hub cannot be used to load kernel modules on the hub cluster. Install the regular edition of KMM to load kernel modules.
4.17.2. Installing KMM-Hub
You can use one of the following methods to install KMM-Hub:
- With the Operator Lifecycle Manager (OLM)
- Creating KMM resources
4.17.2.1. Installing KMM-Hub using the Operator Lifecycle Manager
Use the Operators section of the OpenShift console to install KMM-Hub.
4.17.2.2. Installing KMM-Hub by creating KMM resources
Procedure
- If you want to install KMM-Hub programmatically, you can use the following resources to create the Namespace, OperatorGroup, and Subscription resources:
---
apiVersion: v1
kind: Namespace
metadata:
name: openshift-kmm-hub
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: kernel-module-management-hub
namespace: openshift-kmm-hub
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: kernel-module-management-hub
namespace: openshift-kmm-hub
spec:
channel: stable
installPlanApproval: Automatic
name: kernel-module-management-hub
source: redhat-operators
sourceNamespace: openshift-marketplace
4.17.3. Using the ManagedClusterModule CRD
Use the ManagedClusterModule custom resource definition (CRD) to configure the deployment of kernel modules on spoke clusters. This CRD is cluster-scoped, wraps a Module spec, and adds additional fields for spoke selection:
apiVersion: hub.kmm.sigs.x-k8s.io/v1beta1
kind: ManagedClusterModule
metadata:
  name: <my-mcm>
  # No namespace, because this resource is cluster-scoped.
spec:
  moduleSpec:
    selector:
      node-wants-my-mcm: 'true'
  spokeNamespace: <some-namespace>
  selector:
    wants-my-mcm: 'true'
If build or signing instructions are present in .spec.moduleSpec, those pods are run on the hub cluster in the operator's namespace.

When the .spec.selector matches one or more ManagedCluster resources, KMM-Hub creates a ManifestWork resource in the corresponding namespaces. The ManifestWork contains a trimmed-down Module resource, with kernel mappings preserved but all build and sign sections removed. containerImage fields that contain image names ending with a tag are replaced with their digest equivalent.
4.17.4. Running KMM on the spoke
After installing Kernel Module Management (KMM) on the spoke, no further action is required. Create a ManagedClusterModule object from the hub to deploy kernel modules on spoke clusters.

Procedure

You can install KMM on the spoke cluster through a RHACM Policy object. In addition to installing KMM from the OperatorHub and running it in a lightweight spoke mode, the Policy configures additional RBAC required for the RHACM agent to be able to manage Module resources.
Use the following RHACM policy to install KMM on spoke clusters:
---
apiVersion: policy.open-cluster-management.io/v1
kind: Policy
metadata:
  name: install-kmm
spec:
  remediationAction: enforce
  disabled: false
  policy-templates:
    - objectDefinition:
        apiVersion: policy.open-cluster-management.io/v1
        kind: ConfigurationPolicy
        metadata:
          name: install-kmm
        spec:
          severity: high
          object-templates:
            - complianceType: mustonlyhave
              objectDefinition:
                apiVersion: v1
                kind: Namespace
                metadata:
                  name: openshift-kmm
            - complianceType: mustonlyhave
              objectDefinition:
                apiVersion: operators.coreos.com/v1
                kind: OperatorGroup
                metadata:
                  name: kmm
                  namespace: openshift-kmm
                spec:
                  upgradeStrategy: Default
            - complianceType: mustonlyhave
              objectDefinition:
                apiVersion: operators.coreos.com/v1alpha1
                kind: Subscription
                metadata:
                  name: kernel-module-management
                  namespace: openshift-kmm
                spec:
                  channel: stable
                  config:
                    env:
                      - name: KMM_MANAGED
                        value: "1"
                  installPlanApproval: Automatic
                  name: kernel-module-management
                  source: redhat-operators
                  sourceNamespace: openshift-marketplace
            - complianceType: mustonlyhave
              objectDefinition:
                apiVersion: rbac.authorization.k8s.io/v1
                kind: ClusterRole
                metadata:
                  name: kmm-module-manager
                rules:
                  - apiGroups: [kmm.sigs.x-k8s.io]
                    resources: [modules]
                    verbs: [create, delete, get, list, patch, update, watch]
            - complianceType: mustonlyhave
              objectDefinition:
                apiVersion: rbac.authorization.k8s.io/v1
                kind: ClusterRoleBinding
                metadata:
                  name: klusterlet-kmm
                subjects:
                  - kind: ServiceAccount
                    name: klusterlet-work-sa
                    namespace: open-cluster-management-agent
                roleRef:
                  kind: ClusterRole
                  name: kmm-module-manager
                  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps.open-cluster-management.io/v1
kind: PlacementRule
metadata:
  name: all-managed-clusters
spec:
  clusterSelector:
    matchExpressions: []
---
apiVersion: policy.open-cluster-management.io/v1
kind: PlacementBinding
metadata:
  name: install-kmm
placementRef:
  apiGroup: apps.open-cluster-management.io
  kind: PlacementRule
  name: all-managed-clusters
subjects:
  - apiGroup: policy.open-cluster-management.io
    kind: Policy
    name: install-kmm
4.18. Customizing upgrades for kernel modules
Use this procedure to upgrade the kernel module while running maintenance operations on the node, including rebooting the node, if needed. To minimize the impact on the workloads running in the cluster, run the kernel upgrade process sequentially, one node at a time.
This procedure requires knowledge of the workload utilizing the kernel module and must be managed by the cluster administrator.
Prerequisites
- Before upgrading, set the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=$moduleVersion label on all the nodes that are used by the kernel module.
- Terminate all user application workloads on the node or move them to another node.
- Unload the currently loaded kernel module.
- Ensure that the user workload (the application running in the cluster that is accessing the kernel module) is not running on the node prior to kernel module unloading, and that the workload is running on the node again after the new kernel module version has been loaded.
Procedure
1. Ensure that the device plugin managed by KMM on the node is unloaded.
2. Update the following fields in the Module custom resource (CR):
   - containerImage (to the appropriate kernel version)
   - version

   The update should be atomic; that is, both the containerImage and version fields must be updated simultaneously.
3. Terminate any workload using the kernel module on the node being upgraded.
4. Remove the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name> label on the node. Run the following command to unload the kernel module from the node:

   $ oc label node/<node_name> kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>-
5. If required, as the cluster administrator, perform any additional maintenance required on the node for the kernel module upgrade.

   If no additional upgrading is needed, you can skip Steps 3 through 6 by updating the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name> label value to the new $moduleVersion as set in the Module CR.
6. Run the following command to add the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=$moduleVersion label to the node. The $moduleVersion must be equal to the new value of the version field in the Module CR:

   $ oc label node/<node_name> kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=<desired_version>

   Note: Because of Kubernetes limitations in label names, the combined length of the Module name and namespace must not exceed 39 characters.
7. Restore any workload that leverages the kernel module on the node.
8. Reload the device plugin managed by KMM on the node.
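Because of the label-name note above, you can check the combined length of the Module name and namespace before creating the resource. A minimal sketch, assuming example names:

```shell
module_namespace="openshift-kmm"   # example namespace
module_name="example-module"       # example Module name

# The combined length of Module name and namespace must not exceed 39 characters.
len=$(( ${#module_namespace} + ${#module_name} ))
if [ "$len" -le 39 ]; then
  echo "ok: ${module_namespace}/${module_name} (${len} characters)"
else
  echo "too long: ${module_namespace}/${module_name} (${len} characters)"
fi
```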
4.19. Day 1 kernel module loading
Kernel Module Management (KMM) is typically a Day 2 Operator. Kernel modules are loaded only after the complete initialization of a Linux (RHCOS) server. However, in some scenarios the kernel module must be loaded at an earlier stage. Day 1 functionality allows you to use the Machine Config Operator (MCO) to load kernel modules during the Linux systemd initialization stage.
4.19.1. Day 1 supported use cases
The Day 1 functionality supports a limited number of use cases. The main use case is to allow loading out-of-tree (OOT) kernel modules prior to NetworkManager service initialization. It does not support loading a kernel module at the initramfs stage.
The following are the conditions needed for Day 1 functionality:
- The kernel module is not loaded in the kernel.
- The in-tree kernel module is loaded into the kernel, but can be unloaded and replaced by the OOT kernel module. This means that the in-tree module is not referenced by any other kernel modules.
- In order for Day 1 functionality to work, the node must have a functional network interface, that is, an in-tree kernel driver for that interface. The OOT kernel module can be a network driver that will replace the functional network driver.
4.19.2. OOT kernel module loading flow
The loading of the out-of-tree (OOT) kernel module leverages the Machine Config Operator (MCO). The flow sequence is as follows:
Procedure
1. Apply a MachineConfig resource to the existing running cluster. In order to identify the necessary nodes that need to be updated, you must create an appropriate MachineConfigPool resource.
2. MCO applies the machine configuration and reboots the nodes one by one. On any rebooted node, two new systemd services are deployed: the pull service and the load service.
3. The load service is configured to run prior to the NetworkConfiguration service. The service tries to pull a predefined kernel module image and then, using that image, to unload an in-tree module and load an OOT kernel module.
4. The pull service is configured to run after the NetworkManager service. The service checks if the preconfigured kernel module image is located on the node's filesystem. If it is, the service exits normally and the server continues with the boot process. If not, it pulls the image onto the node and reboots the node afterwards.
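The pull service's decision described in the last step can be sketched as follows. The functions image_present, pull_image, and reboot_node are hypothetical stand-ins for illustration, not the real service implementation:

```shell
# Hypothetical stand-ins for the real checks and actions.
image_present() { [ -e ./kmm-image-present ]; }  # is the kernel module image on the node's filesystem?
pull_image()    { echo "pulling kernel module image"; touch ./kmm-image-present; }
reboot_node()   { echo "rebooting node"; }

if image_present; then
  # The load service already had the image available at boot.
  echo "image already on node, continuing boot"
else
  # Fetch the image, then reboot so that the load service can use it
  # early in the next boot, before network configuration.
  pull_image
  reboot_node
fi
```

On the first run the else branch fires (pull, then reboot); after the image is present, the service exits normally and boot continues.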
4.19.3. The kernel module image
The Day 1 functionality uses the same DTK-based image leveraged by Day 2 KMM builds. The out-of-tree kernel module should be located under /opt/lib/modules/${kernelVersion}.
4.19.4. In-tree module replacement
The Day 1 functionality always tries to replace the in-tree kernel module with the OOT version. If the in-tree kernel module is not loaded, the flow is not affected; the service proceeds and loads the OOT kernel module.
4.19.5. MCO yaml creation
KMM provides an API to create an MCO YAML manifest for the Day 1 functionality:
ProduceMachineConfig(machineConfigName, machineConfigPoolRef, kernelModuleImage, kernelModuleName string) (string, error)
The returned output is a string representation of the MCO YAML manifest to be applied. It is up to the customer to apply this YAML.
The parameters are:
machineConfigName
    The name of the MCO YAML manifest. This parameter is set as the name parameter of the metadata of the MCO YAML manifest.
machineConfigPoolRef
    The MachineConfigPool name used to identify the targeted nodes.
kernelModuleImage
    The name of the container image that includes the OOT kernel module.
kernelModuleName
    The name of the OOT kernel module. This parameter is used both to unload the in-tree kernel module (if loaded into the kernel) and to load the OOT kernel module.
The API is located under the pkg/mcproducer package of the KMM source code. You can import the pkg/mcproducer package into your own operator or utility code and call the API; the KMM Operator does not need to be installed for the Day 1 functionality to work.
4.19.6. The MachineConfigPool
The MachineConfigPool identifies a collection of MachineConfig resources to be applied to a defined set of nodes:
kind: MachineConfigPool
metadata:
  name: sfc
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker, sfc]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/sfc: ""
  paused: false
  maxUnavailable: 1
There are predefined MachineConfigPools in the OpenShift Container Platform cluster:

- worker: targets all worker nodes in the cluster
- master: targets all master nodes in the cluster

Define the following MachineConfig to target the master MachineConfigPool:

metadata:
  labels:
    machineconfiguration.openshift.io/role: master

Define the following MachineConfig to target the worker MachineConfigPool:

metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
4.20. Debugging and troubleshooting
If the kmods in your driver container are not signed or are signed with the wrong key, then the container can enter a PostStartHookError or CrashLoopBackOff status. You can verify this by running the oc describe command on your pod, which displays the following message in this scenario:

modprobe: ERROR: could not insert '<your_kmod_name>': Required key not available
4.21. KMM firmware support
Kernel modules sometimes need to load firmware files from the file system. KMM supports copying firmware files from the kmod image to the node’s file system.
The contents of .spec.moduleLoader.container.modprobe.firmwarePath are copied into the path /var/lib/firmware on the node before running the modprobe command to insert the kernel module.

All files and empty directories are removed from that location before running the modprobe -r command to unload the kernel module, when the pod is terminated.
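The copy and cleanup behavior described above can be simulated locally with stand-in directories (image_fw for the image's firmwarePath contents, node_fw for /var/lib/firmware on the node). This is a sketch of the semantics, not KMM code:

```shell
set -e
# image_fw stands in for the firmwarePath inside the kmod image;
# node_fw stands in for /var/lib/firmware on the node.
mkdir -p image_fw/subdir node_fw
printf 'blob' > image_fw/firmware.bin

# Before modprobe: the contents of firmwarePath are copied onto the node.
cp -r image_fw/. node_fw/
ls node_fw                      # firmware.bin  subdir

# Before modprobe -r: the copied files and empty directories are removed again.
rm -rf node_fw/firmware.bin node_fw/subdir
ls -A node_fw                   # no output: the directory is empty again
```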
4.21.2. Building a kmod image
Procedure
In addition to building the kernel module itself, include the binary firmware in the builder image:
FROM registry.redhat.io/ubi9/ubi-minimal as builder

# Build the kmod

RUN ["mkdir", "/firmware"]
RUN ["curl", "-o", "/firmware/firmware.bin", "https://artifacts.example.com/firmware.bin"]

FROM registry.redhat.io/ubi9/ubi-minimal

# Copy the kmod, install modprobe, run depmod

COPY --from=builder /firmware /firmware
4.21.3. Tuning the Module resource
Procedure
Set .spec.moduleLoader.container.modprobe.firmwarePath in the Module custom resource (CR):

apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod  # Required
        firmwarePath: /firmware 1

1 Optional: Copies /firmware/* into /var/lib/firmware/ on the node.
4.22. Day 0 through Day 2 kmod installation
You can install some kernel modules (kmods) during Day 0 through Day 2 operations without Kernel Module Management (KMM). This could assist in the transition of the kmods to KMM.
Use the following criteria to determine suitable kmod installations.
- Day 0
The most basic kmods that are required for a node to become Ready in the cluster. Examples of these types of kmods include:
- A storage driver that is required to mount the rootFS as part of the boot process
- A network driver that is required for the machine to access the machine-config-server on the bootstrap node to pull the ignition and join the cluster
- Day 1
Kmods that are not required for a node to become Ready in the cluster, but cannot be unloaded when the node is Ready.

An example of this type of kmod is an out-of-tree (OOT) network driver that replaces an outdated in-tree driver to exploit the full potential of the NIC while NetworkManager depends on it. When the node is Ready, you cannot unload the driver because of the NetworkManager dependency.
Kmods that can be dynamically loaded to the kernel or removed from it without interfering with the cluster infrastructure, for example, connectivity.
Examples of these types of kmods include:
- GPU operators
- Secondary network adapters
- field-programmable gate arrays (FPGAs)
4.22.1. Layering background
When a Day 0 kmod is installed in the cluster, layering is applied through the Machine Config Operator (MCO) and OpenShift Container Platform upgrades do not trigger node upgrades.
You only need to recompile the driver if you add new features to it, because the node’s operating system will remain the same.
4.22.2. Lifecycle management
You can leverage KMM to manage the Day 0 through Day 2 lifecycle of kmods without a reboot when the driver allows it.
This will not work if the upgrade requires a node reboot, for example, when rebuilding the initramfs file is needed.
Use one of the following options for lifecycle management.
4.22.2.1. Treat the kmod as an in-tree driver
Use this method when you want to upgrade the kmods. In this case, treat the kmod as an in-tree driver and create a Module in the cluster with the inTreeRemoval field to unload the old version of the driver.
Note the following characteristics of treating the kmod as an in-tree driver:
- Downtime might occur as KMM tries to unload and load the kmod on all the selected nodes simultaneously.
- This works if removing the driver makes the node lose connectivity because KMM uses a single pod to unload and load the driver.
4.22.2.2. Use ordered upgrade
You can use ordered upgrade (ordered_upgrade.md) to create a versioned Module in the cluster.
Note the following characteristics of using ordered upgrade:
- There is no cluster downtime because you control the pace of the upgrade and how many nodes are upgraded at the same time; therefore, an upgrade with no downtime is possible.
- This method will not work if unloading the driver results in losing connection to the node, because KMM creates two different worker pods: one for unloading the driver and another for loading it. If the node loses connectivity after unloading, the loading pod cannot be scheduled.
4.23. Troubleshooting KMM
When troubleshooting KMM installation issues, you can monitor logs to determine at which stage issues occur. Then, retrieve diagnostic data relevant to that stage.
4.23.1. Reading Operator logs
You can use the oc logs command to read Operator logs.
Example command for KMM controller
$ oc logs -fn openshift-kmm deployments/kmm-operator-controller
Example command for KMM webhook server
$ oc logs -fn openshift-kmm deployments/kmm-operator-webhook-server
Example command for KMM-Hub controller
$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-controller
Example command for KMM-Hub webhook server
$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-webhook-server
4.23.2. Observing events
Use the following methods to view KMM events.
Build & sign

KMM publishes events whenever it starts a kmod image build or observes its outcome. These events are attached to Module objects and are available at the end of the output of oc describe module, as shown in the following example:
$ oc describe modules.kmm.sigs.x-k8s.io kmm-ci-a
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BuildCreated 2m29s kmm Build created for kernel 6.6.2-201.fc39.x86_64
Normal BuildSucceeded 63s kmm Build job succeeded for kernel 6.6.2-201.fc39.x86_64
Normal SignCreated 64s (x2 over 64s) kmm Sign created for kernel 6.6.2-201.fc39.x86_64
Normal SignSucceeded 57s kmm Sign job succeeded for kernel 6.6.2-201.fc39.x86_64
Module load or unload

KMM publishes events whenever it successfully loads or unloads a kernel module on a node. These events are attached to Node objects and are available at the end of the output of oc describe node, as shown in the following example:
$ oc describe node my-node
[...]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
[...]
Normal ModuleLoaded 4m17s kmm Module default/kmm-ci-a loaded into the kernel
Normal ModuleUnloaded 2s kmm Module default/kmm-ci-a unloaded from the kernel
4.23.3. Using the must-gather tool
The oc adm must-gather command is the preferred way to collect a support archive and to provide debugging information to Red Hat Support.
4.23.3.1. Gathering data for KMM
Procedure
Gather the data for the KMM Operator controller manager:
1. Set the MUST_GATHER_IMAGE variable:

   $ export MUST_GATHER_IMAGE=$(oc get deployment -n openshift-kmm kmm-operator-controller -ojsonpath='{.spec.template.spec.containers[?(@.name=="manager")].env[?(@.name=="RELATED_IMAGE_MUST_GATHER")].value}')

   Note: Use the -n <namespace> switch to specify a namespace if you installed KMM in a custom namespace.

2. Run the must-gather tool:

   $ oc adm must-gather --image="${MUST_GATHER_IMAGE}" -- /usr/bin/gather
View the Operator logs:
$ oc logs -fn openshift-kmm deployments/kmm-operator-controller

Example 4.1. Example output
I0228 09:36:37.352405 1 request.go:682] Waited for 1.001998746s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s I0228 09:36:40.767060 1 listener.go:44] kmm/controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:8080" I0228 09:36:40.769483 1 main.go:234] kmm/setup "msg"="starting manager" I0228 09:36:40.769907 1 internal.go:366] kmm "msg"="Starting server" "addr"={"IP":"127.0.0.1","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics" I0228 09:36:40.770025 1 internal.go:366] kmm "msg"="Starting server" "addr"={"IP":"::","Port":8081,"Zone":""} "kind"="health probe" I0228 09:36:40.770128 1 leaderelection.go:248] attempting to acquire leader lease openshift-kmm/kmm.sigs.x-k8s.io... I0228 09:36:40.784396 1 leaderelection.go:258] successfully acquired lease openshift-kmm/kmm.sigs.x-k8s.io I0228 09:36:40.784876 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" "source"="kind source: *v1beta1.Module" I0228 09:36:40.784925 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" "source"="kind source: *v1.DaemonSet" I0228 09:36:40.784968 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" "source"="kind source: *v1.Build" I0228 09:36:40.785001 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" "source"="kind source: *v1.Job" I0228 09:36:40.785025 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" "source"="kind source: *v1.Node" I0228 09:36:40.785039 1 controller.go:193] kmm "msg"="Starting Controller" 
"controller"="Module" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="Module" I0228 09:36:40.785458 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="PodNodeModule" "controllerGroup"="" "controllerKind"="Pod" "source"="kind source: *v1.Pod" I0228 09:36:40.786947 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="PreflightValidation" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidation" "source"="kind source: *v1beta1.PreflightValidation" I0228 09:36:40.787406 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="PreflightValidation" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidation" "source"="kind source: *v1.Build" I0228 09:36:40.787474 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="PreflightValidation" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidation" "source"="kind source: *v1.Job" I0228 09:36:40.787488 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="PreflightValidation" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidation" "source"="kind source: *v1beta1.Module" I0228 09:36:40.787603 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="NodeKernel" "controllerGroup"="" "controllerKind"="Node" "source"="kind source: *v1.Node" I0228 09:36:40.787634 1 controller.go:193] kmm "msg"="Starting Controller" "controller"="NodeKernel" "controllerGroup"="" "controllerKind"="Node" I0228 09:36:40.787680 1 controller.go:193] kmm "msg"="Starting Controller" "controller"="PreflightValidation" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidation" I0228 09:36:40.785607 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream" "source"="kind source: *v1.ImageStream" I0228 09:36:40.787822 1 controller.go:185] kmm "msg"="Starting EventSource" 
"controller"="preflightvalidationocp" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidationOCP" "source"="kind source: *v1beta1.PreflightValidationOCP" I0228 09:36:40.787853 1 controller.go:193] kmm "msg"="Starting Controller" "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream" I0228 09:36:40.787879 1 controller.go:185] kmm "msg"="Starting EventSource" "controller"="preflightvalidationocp" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidationOCP" "source"="kind source: *v1beta1.PreflightValidation" I0228 09:36:40.787905 1 controller.go:193] kmm "msg"="Starting Controller" "controller"="preflightvalidationocp" "controllerGroup"="kmm.sigs.x-k8s.io" "controllerKind"="PreflightValidationOCP" I0228 09:36:40.786489 1 controller.go:193] kmm "msg"="Starting Controller" "controller"="PodNodeModule" "controllerGroup"="" "controllerKind"="Pod"
4.23.3.2. Gathering data for KMM-Hub
Procedure
Gather the data for the KMM Operator hub controller manager:
Set the MUST_GATHER_IMAGE variable:

$ export MUST_GATHER_IMAGE=$(oc get deployment -n openshift-kmm-hub kmm-operator-hub-controller -ojsonpath='{.spec.template.spec.containers[?(@.name=="manager")].env[?(@.name=="RELATED_IMAGE_MUST_GATHER")].value}')

Note: Use the -n <namespace> switch to specify a namespace if you installed KMM in a custom namespace.

Run the must-gather tool:

$ oc adm must-gather --image="${MUST_GATHER_IMAGE}" -- /usr/bin/gather -u
View the Operator logs:
$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-controller

Example 4.2. Example output
I0417 11:34:08.807472 1 request.go:682] Waited for 1.023403273s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/tuned.openshift.io/v1?timeout=32s
I0417 11:34:12.373413 1 listener.go:44] kmm-hub/controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:8080"
I0417 11:34:12.376253 1 main.go:150] kmm-hub/setup "msg"="Adding controller" "name"="ManagedClusterModule"
I0417 11:34:12.376621 1 main.go:186] kmm-hub/setup "msg"="starting manager"
I0417 11:34:12.377690 1 leaderelection.go:248] attempting to acquire leader lease openshift-kmm-hub/kmm-hub.sigs.x-k8s.io...
I0417 11:34:12.378078 1 internal.go:366] kmm-hub "msg"="Starting server" "addr"={"IP":"127.0.0.1","Port":8080,"Zone":""} "kind"="metrics" "path"="/metrics"
I0417 11:34:12.378222 1 internal.go:366] kmm-hub "msg"="Starting server" "addr"={"IP":"::","Port":8081,"Zone":""} "kind"="health probe"
I0417 11:34:12.395703 1 leaderelection.go:258] successfully acquired lease openshift-kmm-hub/kmm-hub.sigs.x-k8s.io
I0417 11:34:12.396334 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "source"="kind source: *v1beta1.ManagedClusterModule"
I0417 11:34:12.396403 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "source"="kind source: *v1.ManifestWork"
I0417 11:34:12.396430 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "source"="kind source: *v1.Build"
I0417 11:34:12.396469 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "source"="kind source: *v1.Job"
I0417 11:34:12.396522 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "source"="kind source: *v1.ManagedCluster"
I0417 11:34:12.396543 1 controller.go:193] kmm-hub "msg"="Starting Controller" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule"
I0417 11:34:12.397175 1 controller.go:185] kmm-hub "msg"="Starting EventSource" "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream" "source"="kind source: *v1.ImageStream"
I0417 11:34:12.397221 1 controller.go:193] kmm-hub "msg"="Starting Controller" "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream"
I0417 11:34:12.498335 1 filter.go:196] kmm-hub "msg"="Listing all ManagedClusterModules" "managedcluster"="local-cluster"
I0417 11:34:12.498570 1 filter.go:205] kmm-hub "msg"="Listed ManagedClusterModules" "count"=0 "managedcluster"="local-cluster"
I0417 11:34:12.498629 1 filter.go:238] kmm-hub "msg"="Adding reconciliation requests" "count"=0 "managedcluster"="local-cluster"
I0417 11:34:12.498687 1 filter.go:196] kmm-hub "msg"="Listing all ManagedClusterModules" "managedcluster"="sno1-0"
I0417 11:34:12.498750 1 filter.go:205] kmm-hub "msg"="Listed ManagedClusterModules" "count"=0 "managedcluster"="sno1-0"
I0417 11:34:12.498801 1 filter.go:238] kmm-hub "msg"="Adding reconciliation requests" "count"=0 "managedcluster"="sno1-0"
I0417 11:34:12.501947 1 controller.go:227] kmm-hub "msg"="Starting workers" "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream" "worker count"=1
I0417 11:34:12.501948 1 controller.go:227] kmm-hub "msg"="Starting workers" "controller"="ManagedClusterModule" "controllerGroup"="hub.kmm.sigs.x-k8s.io" "controllerKind"="ManagedClusterModule" "worker count"=1
I0417 11:34:12.502285 1 imagestream_reconciler.go:50] kmm-hub "msg"="registered imagestream info mapping" "ImageStream"={"name":"driver-toolkit","namespace":"openshift"} "controller"="imagestream" "controllerGroup"="image.openshift.io" "controllerKind"="ImageStream" "dtkImage"="quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:df42b4785a7a662b30da53bdb0d206120cf4d24b45674227b16051ba4b7c3934" "name"="driver-toolkit" "namespace"="openshift" "osImageVersion"="412.86.202302211547-0" "reconcileID"="e709ff0a-5664-4007-8270-49b5dff8bae9"
Chapter 5. Kernel Module Management Operator release notes
5.1. Release notes for Kernel Module Management Operator 2.2
5.1.1. New features
- KMM now uses the CRI-O container engine to pull container images in the worker pod instead of using HTTP calls directly from the worker container. For more information, see Example Module CR.
- The Kernel Module Management (KMM) Operator images are now based on rhel-els-minimal container images instead of the rhel-els images. This change results in a greatly reduced image footprint, while still maintaining FIPS compliance.
- In this release, the firmware search path has been updated to copy the contents of the specified path into the path specified in worker.setFirmwareClassPath (default: /var/lib/firmware). For more information, see Example Module CR.
- For each node running a kernel matching the regular expression, KMM now checks if you have included a tag or a digest. If you have not specified a tag or digest in the container image, then the validation webhook returns an error and does not apply the module. For more information, see Example Module CR.
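As an illustration of the tag-or-digest validation above, each kernel mapping's containerImage must pin either a tag or a digest. The following sketch uses hypothetical module and image names, not values from this document:

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod                      # hypothetical module name
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod
      kernelMappings:
        - regexp: '^.+$'
          # containerImage: quay.io/example/my-kmod                  <- rejected by the webhook: no tag or digest
          containerImage: quay.io/example/my-kmod:v1.0.0             # accepted: tag
          # containerImage: quay.io/example/my-kmod@sha256:<digest>  <- accepted: digest
  selector:
    node-role.kubernetes.io/worker: ""
```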
5.2. Release notes for Kernel Module Management Operator 2.3
5.2.1. New features
- In this release, KMM uses version 1.23 of the Golang programming language to ensure test continuity for partners.
- You can now schedule KMM pods by defining taints and tolerations. For more information, see Using tolerations for kernel module scheduling.
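The toleration-based scheduling mentioned above can be sketched in a Module CR as follows; the taint key, module name, and image are hypothetical placeholders, and the exact field layout should be checked against "Using tolerations for kernel module scheduling":

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod                         # hypothetical
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod
      kernelMappings:
        - regexp: '^.+$'
          containerImage: quay.io/example/my-kmod:v1.0.0
  selector:
    node-role.kubernetes.io/worker: ""
  tolerations:                          # field name assumed from the tolerations feature
    - key: "example.com/special-node"   # hypothetical taint key
      operator: "Exists"
      effect: "NoSchedule"
```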
5.3. Release notes for Kernel Module Management Operator 2.4
5.3.1. New features and enhancements
- In this release, you now have the option to configure the Kernel Module Management (KMM) module to not load an out-of-tree kernel driver and use the in-tree driver instead, and run only the device plugin. For more information, see Using in-tree modules with the device plugin.
In this release, KMM configurations are now persistent after cluster and KMM Operator upgrades and redeployments of KMM.
In earlier releases, a cluster or KMM upgrade, or any other action that redeploys KMM, such as changing a non-default configuration like the firmware path, could require reconfiguring KMM. In this release, KMM configurations remain persistent regardless of such actions.
For more information, see Configuring the Kernel Module Management Operator.
- Improvements have been added to KMM so that GPU Operator vendors do not need to replicate KMM functionality in their code and can instead use KMM as is. This change greatly reduces the Operators' code size and improves their tests and reliability.
- In this release, KMM no longer uses direct HTTP(S) requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This removes the need to access container image registries directly over HTTP(S) and manually handle tasks such as reading /etc/containers/registries.conf for the mirroring configuration, accessing the image cluster resource for the TLS configuration, mounting the CAs from the node, and maintaining your own cache in hub and spoke environments.
- The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the Red Hat Catalog.
- You can now install KMM on compute nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the compute nodes do not have the node-role.kubernetes.io/control-plane or node-role.kubernetes.io/master labels, the Kernel Module Management Operator might need further configuration. An internal code change has resolved this issue.
- In this release, the heartbeat filter for the NMC reconciler has been updated to filter the following events on nodes:
  - node.spec
  - metadata.labels
  - status.nodeInfo
  - status.conditions[] (NodeReady only), while still filtering heartbeats
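The in-tree driver option described in the first bullet of this section can be sketched as a Module that omits the out-of-tree loader and runs only the device plugin. The names and image below are hypothetical, and the exact shape of the CR should be checked against "Using in-tree modules with the device plugin":

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-device                    # hypothetical
spec:
  # No out-of-tree kernel module is loaded; the in-tree driver
  # already shipped with the kernel is used as is.
  devicePlugin:
    container:
      image: quay.io/example/my-device-plugin:v1.0.0   # hypothetical
  selector:
    node-role.kubernetes.io/worker: ""
```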
5.3.2. Notable technical changes
- In this release, the preflight validation resource in the cluster has been modified. You can use the preflight validation to verify kernel modules to be installed on the nodes after cluster upgrades and possible kernel upgrades. Preflight validation also reports on the status and progress of each module in the cluster that it attempts or has attempted to validate. For more information, see Preflight validation for Kernel Module Management (KMM) Modules.
- A requirement when creating a kmod image is that both the kernel module .ko files and the cp binary must be included, because cp is required for copying files during the image loading process. For more information, see Creating a kmod image.
- The capabilities field that refers to the Operator maturity level has been changed from Seamless upgrades to Basic Install. Basic Install indicates that the Operator does not have an upgrade option. This is not the case for KMM, where seamless upgrades are supported.
5.3.3. Bug fixes
Webhook deployment has been renamed from webhook to webhook-server.
- Cause: Generating files with controller-gen generated a service called webhook-service that is not configurable. And, when deploying KMM with Operator Lifecycle Manager (OLM), OLM deploys a service for the webhook called webhook-service.
- Consequence: Two services were generated for the same deployment: one generated by controller-gen and added to the bundle manifests, and the other created by OLM.
- Fix: Make OLM find an already existing service called webhook-service in the cluster, because the deployment is called webhook.
- Result: A second service is no longer created.
Using the imageRepoSecret object in conjunction with DTK as the image stream results in an authorization required error.
- Cause: On the Kernel Module Management (KMM) Operator, when you set the imageRepoSecret object in the KMM module, and the build's resulting container image is defined to be stored in the cluster's internal registry, the build fails to push the final image and generates an authorization required error.
- Consequence: The KMM Operator does not work as expected.
- Fix: When the imageRepoSecret object is user-defined, it is used as both a pull and push secret by the build process. To support using the cluster's internal registry, you must add the authorization token for that registry to the imageRepoSecret object. You can obtain the token from the "build" service account of the KMM module's namespace.
- Result: The KMM Operator works as expected.
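One way to apply the fix above, adding the internal registry's authorization token to the imageRepoSecret, might look like the following sketch. The namespace and secret name are hypothetical, and the exact secret layout expected by KMM may differ:

```shell
# Request a token for the "build" service account in the module's namespace
# (namespace name is hypothetical).
TOKEN=$(oc -n my-kmm-module create token build)

# Store it as a docker-registry secret for the cluster's internal registry,
# then reference this secret from the module's imageRepoSecret field.
oc -n my-kmm-module create secret docker-registry internal-registry-auth \
  --docker-server=image-registry.openshift-image-registry.svc:5000 \
  --docker-username=build \
  --docker-password="${TOKEN}"
```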
Creating or deleting the image or creating an MCM module does not load the module on the spoke.
- Cause: In a hub and spoke environment, when creating or deleting the image in the registry, or when creating a ManagedClusterModule (MCM), the module on the spoke cluster is not loaded.
- Consequence: The module on the spoke is not created.
- Fix: Remove the cache package and image translation from the hub and spoke environment.
- Result: The module on the spoke is created the second time the MCM object is created.
KMM cannot pull images from a private registry while doing in-cluster builds.
- Cause: The Kernel Module Management (KMM) Operator cannot pull images from a private registry while doing in-cluster builds.
- Consequence: Images in private registries that are used in the build process cannot be pulled.
- Fix: The imageRepoSecret object configuration is now also used in the build process. The imageRepoSecret object specified must include all registries that are being used.
- Result: You can now use private registries when doing in-cluster builds.
KMM worker pod is orphaned when deleting a module with a container image that cannot be pulled.
- Cause: A Kernel Module Management (KMM) Operator worker pod is orphaned when deleting a module with a container image that cannot be pulled.
- Consequence: Failing worker pods are left on the cluster and are never garbage collected.
- Fix: KMM now garbage collects orphaned failing pods upon module deletion.
- Result: The module is successfully deleted, and all associated orphaned failing pods are also deleted.
The KMM Operator tries to create a MIC even when the node selector does not match.
- Cause: The Kernel Module Management (KMM) Operator tries to create a ModuleImagesConfig (MIC) resource even when the node selector does not match any actual nodes, and fails.
- Consequence: The KMM Operator reports an error when reconciling a module that does not target any node.
- Fix: The images field in the MIC resource is now optional.
- Result: The KMM Operator can successfully create the MIC resource even when there are no images in it.
KMM does not reload the kernel module in case the node reboot sequence is too quick.
- Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module in case the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
- Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
- Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id file of the server node, so it is updated after each reboot.
- Result: The more accurate timestamps enable the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
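The BootID value used by this fix can be inspected directly; <node-name> is a placeholder:

```shell
# Boot ID that kubelet reports for the node; it changes after every reboot.
oc get node <node-name> -o jsonpath='{.status.nodeInfo.bootID}{"\n"}'

# The same value as read on the node itself.
oc debug node/<node-name> -- chroot /host cat /proc/sys/kernel/random/boot_id
```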
Filtering out node heartbeat events for the Node Machine Configuration (NMC) controller.
- Cause: The NMC controller gets spammed with events from node heartbeats. The node heartbeats let the Kubernetes API server know that the node is still connected and functional.
- Consequence: The spamming causes constant reconciliation even when no module, and therefore no NMC, is applied to the cluster.
- Fix: The NMC controller now filters the node's heartbeats from its reconciliation loop.
- Result: The NMC controller only gets real events and filters out node heartbeats.
NMC status contains toleration values, even though there are no tolerations in the NMC.spec or in the module.
- Cause: The Node Machine Configuration (NMC) status contains toleration values, even though there are no tolerations in the NMC.spec or in the module.
- Consequence: Tolerations other than Kernel Module Management-specific tolerations can appear in the status.
- Fix: The NMC status now gets its tolerations from a dedicated annotation rather than from the worker pod.
- Result: The NMC status only contains the module's tolerations.
The KMM Operator version 2.4 fails to start properly and cannot list the modulebuildsignconfigs resource.
- Cause: On the Kernel Module Management (KMM) Operator, when the Operator is installed using Red Hat Konflux, it does not start properly and the log files contain errors.
- Consequence: The KMM Operator does not work as expected.
- Fix: The Cluster Service Version (CSV) file is updated to list the modulebuildsignconfigs and moduleimagesconfig resources.
- Result: The KMM Operator works as expected.
The Red Hat Konflux build does not include version and git commit ID in the Operator logs.
- Cause: On the Kernel Module Management (KMM) Operator, when the Operator was built using Communications Platform as a Service (CPaas), the build included the Operator version and git commit ID in the log files. However, with Red Hat Konflux these details are not included in the log files.
- Consequence: Important information is missing from the log files.
- Fix: Some modifications are introduced in Konflux to resolve this issue.
- Result: The KMM Operator build now includes the Operator version and git commit ID in the log files.
The KMM Operator does not load the module after a node with a taint is rebooted.
- Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module in case the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
- Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
- Fix: Instead of relying on the condition state, NMC can rely on the Status.NodeInfo.BootID field. This field is set by kubelet based on the /proc/sys/kernel/random/boot_id file of the server node, so it is updated after each reboot.
- Result: The more accurate timestamps enable the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
Redeploying a module that uses in-cluster builds fails with the ImagePullBackOff policy.
- Cause: On the Kernel Module Management (KMM) Operator, the image pull policy for the puller pod and the worker pod is different.
- Consequence: An image can be considered as existing when, in fact, it is not.
- Fix: Make the image pull policy of the pull pod the same as the pull policy defined in the KMM module, since it is the same policy that is used by the worker pod.
- Result: The MIC represents the state of the image in the same way the worker pod accesses it.
The MIC controller creates two pull-pods when it should create just one.
- Cause: On the Kernel Module Management (KMM) Operator, the ModuleImagesConfig (MIC) controller may create multiple pull-pods for the same image.
- Consequence: Resources are not used appropriately or as intended.
- Fix: The MIC API CreateOrPatch receives a slice of ImageSpecs. Because the input is created by going over the target nodes and adding their images to the slice, any duplicate ImageSpecs are now filtered out.
- Result: The KMM Operator works as expected.
The job.gcDelay example in the documentation should specify 0s instead of 0.
- Cause: The Kernel Module Management (KMM) Operator default for the job.gcDelay duration field is 0s, but the documentation mentions the value as 0.
- Consequence: Entering a custom value of 60 instead of 60s or 1m might result in an error due to the wrong input type.
- Fix: The job.gcDelay field in the documentation is updated to the default value of 0s.
- Result: Users are less likely to get confused.
The KMM Operator Hub environment does not work because of missing MIC and MBSC CRDs.
- Cause: The Kernel Module Management (KMM) Operator hub environment only generates Custom Resource Definition (CRD) files based on the api-hub/ directory. As a result, it does not contain some CRDs that are required for the KMM Operator Hub environment, such as the ModuleImagesConfig (MIC) and ModuleBuildSignConfig (MBSC) resources.
- Consequence: The KMM Operator hub environment cannot work because it tries to start controllers reconciling CRDs that do not exist in the cluster.
- Fix: The fix generates all CRD files into the config/crd-hub/bases directory, but only applies the resources to the cluster that it actually needs.
- Result: The KMM Operator hub environment works as expected.
The KMM OperatorHub environment cannot build when finalizers are not set on a resource.
- Cause: The Kernel Module Management (KMM) Operator displays an error with the ManagedClusterModule controller failing to build. This is due to the missing ModuleImagesConfig (MIC) resource finalizers and Role-Based Access Control (RBAC) permissions for the KMM OperatorHub environment.
- Consequence: The KMM OperatorHub environment cannot build images.
- Fix: The RBAC permissions are updated to allow updating finalizers on the MIC resource, and the appropriate rules are created.
- Result: The KMM OperatorHub environment builds images without errors with the ManagedClusterModule controller.
The PreflightValidationOCP custom resource, with a kernelVersion: tesdt, causes the KMM Operator to panic.
- Cause: Creating a PreflightValidationOCP custom resource (CR) with a kernelVersion flag that is set to tesdt causes the Kernel Module Management (KMM) Operator to generate a panic runtime error.
- Consequence: Entering invalid kernel versions causes the KMM Operator to panic.
- Fix: A webhook is now added to validate the PreflightValidationOCP CR.
- Result: A PreflightValidationOCP CR with invalid kernel versions can no longer be applied to the cluster, preventing the Operator from generating a panic runtime error.
The PreflightValidationOCP custom resource, with a kernelVersion flag that is different from that of the cluster, does not work.
- Cause: Creating a PreflightValidationOCP custom resource (CR) with a kernelVersion flag that is different from that of the cluster does not work.
- Consequence: The Kernel Module Management (KMM) Operator is unable to find the Driver Toolkit (DTK) input image for the new kernel version.
- Fix: You must use the PreflightValidationOCP CR and explicitly set the dtkImage field in the CR.
- Result: Using the kernelVersion and dtkImage fields, the feature can build installed modules for target OpenShift Container Platform versions.
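A PreflightValidationOCP CR that sets both fields, as the fix above requires, might look like the following sketch; the API version, kernel version, and DTK image reference are hypothetical placeholders:

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta2   # assumed API version
kind: PreflightValidationOCP
metadata:
  name: preflight
spec:
  kernelVersion: 5.14.0-427.el9.x86_64                       # hypothetical target kernel
  dtkImage: quay.io/example/driver-toolkit@sha256:<digest>   # DTK image matching that kernel
```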
The KMM Operator version 2.4 documentation is updated with PreflightValidationOCP information.
- Cause: Previously, when creating a PreflightValidationOCP CR, you were required to supply the release-image. This has now changed and you need to set the kernelVersion and dtkImage fields.
- Consequence: The documentation was outdated and required an update.
- Fix: The documentation is updated with the new support details.
- Result: The KMM preflight feature is documented as expected.
5.3.4. Known issues
The ModuleUnloaded event does not appear when a module is unloaded.
- Cause: When a module is loaded (creating a ModuleLoaded event) or unloaded (creating a ModuleUnloaded event), the events might not appear. This happens when you load and unload the kernel module in quick succession.
- Consequence: The ModuleLoaded and ModuleUnloaded events might not appear in OpenShift Container Platform.
- Fix: Introduce an alerting mechanism for this potential behavior and for awareness when working with modules.
- Result: Not yet available.
5.4. Release notes for Kernel Module Management Operator 2.4.1
5.4.1. Known issues
If you are running KMM-hub version 2.3.0 or earlier and you are not running KMM, the upgrade to KMM-hub 2.4.0 is not reliable. Instead, you must upgrade to KMM-hub 2.4.1. KMM is not affected by this issue. For more information, see RHEA-2025:10778 - Product Enhancement Advisory.
Legal Notice
Copyright © Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of the OpenJS Foundation.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.