Specialized hardware and driver enablement
Learn about hardware enablement on OpenShift Container Platform
Abstract
Chapter 1. About specialized hardware and driver enablement
The Driver Toolkit (DTK) is a container image in the OpenShift Container Platform payload which is meant to be used as a base image on which to build driver containers. The Driver Toolkit image contains the kernel packages commonly required as dependencies to build or install kernel modules as well as a few tools needed in driver containers. The version of these packages will match the kernel version running on the RHCOS nodes in the corresponding OpenShift Container Platform release.
Driver containers are container images used for building and deploying out-of-tree kernel modules and drivers on container operating systems such as Red Hat Enterprise Linux CoreOS (RHCOS). Kernel modules and drivers are software libraries running with a high level of privilege in the operating system kernel. They extend the kernel functionalities or provide the hardware-specific code required to control new devices. Examples include hardware devices like field-programmable gate arrays (FPGA) or graphics processing units (GPU), and software-defined storage solutions, which all require kernel modules on client machines. Driver containers are the first layer of the software stack used to enable these technologies on OpenShift Container Platform deployments.
Chapter 2. Driver Toolkit
Learn about the Driver Toolkit and how you can use it as a base image for driver containers for enabling special software and hardware devices on OpenShift Container Platform deployments.
2.1. About the Driver Toolkit
2.1.1. Background
The Driver Toolkit is a container image in the OpenShift Container Platform payload used as a base image on which you can build driver containers. The Driver Toolkit image includes the kernel packages commonly required as dependencies to build or install kernel modules, as well as a few tools needed in driver containers. The version of these packages will match the kernel version running on the Red Hat Enterprise Linux CoreOS (RHCOS) nodes in the corresponding OpenShift Container Platform release.
Driver containers are container images used for building and deploying out-of-tree kernel modules and drivers on container operating systems like RHCOS. Kernel modules and drivers are software libraries running with a high level of privilege in the operating system kernel. They extend the kernel functionalities or provide the hardware-specific code required to control new devices. Examples include hardware devices like Field Programmable Gate Arrays (FPGA) or GPUs, and software-defined storage (SDS) solutions, such as Lustre parallel file systems, which require kernel modules on client machines. Driver containers are the first layer of the software stack used to enable these technologies on Kubernetes.
The Driver Toolkit includes the following kernel packages and their dependencies:
- kernel-core
- kernel-devel
- kernel-headers
- kernel-modules
- kernel-modules-extra
The Driver Toolkit also includes the corresponding real-time kernel packages:
- kernel-rt-core
- kernel-rt-devel
- kernel-rt-modules
- kernel-rt-modules-extra
The Driver Toolkit also has several tools that are commonly needed to build and install kernel modules, including:
- elfutils-libelf-devel
- kmod
- binutils
- kabi-dw
- kernel-abi-whitelists
- dependencies for the above
2.1.2. Purpose
Before the Driver Toolkit existed, users installed kernel packages in a pod or build config on OpenShift Container Platform in one of two ways: by using entitled builds, or by installing from the kernel RPMs in the host's machine-os-content. The Driver Toolkit simplifies the process by removing the entitlement step. It also avoids the privileged operation of accessing the machine-os-content in a pod. Partners who have access to pre-release OpenShift Container Platform versions can also use the Driver Toolkit to prebuild driver containers for their hardware devices for future OpenShift Container Platform releases.
The Driver Toolkit is also used by the Kernel Module Management (KMM), which is currently available as a community Operator on OperatorHub. KMM supports out-of-tree and third-party kernel drivers and the support software for the underlying operating system. Users can create modules for KMM to build and deploy a driver container, as well as support software such as a device plugin or metrics. Modules can include a build config to build a driver container based on the Driver Toolkit, or KMM can deploy a prebuilt driver container.
2.2. Pulling the Driver Toolkit container image
				The driver-toolkit image is available from the Container images section of the Red Hat Ecosystem Catalog and in the OpenShift Container Platform release payload. The image corresponding to the most recent minor release of OpenShift Container Platform will be tagged with the version number in the catalog. The image URL for a specific release can be found using the oc adm CLI command.
			
2.2.1. Pulling the Driver Toolkit container image from registry.redhat.io
Instructions for pulling the driver-toolkit image from registry.redhat.io with podman or in OpenShift Container Platform can be found on the Red Hat Ecosystem Catalog. The driver-toolkit image for the latest minor release is tagged with the minor release version on registry.redhat.io, for example: registry.redhat.io/openshift4/driver-toolkit-rhel8:v4.15.
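The catalog tags the image only with the minor release (for example, v4.15). A script that starts from a full cluster version such as 4.15.3 therefore has to drop the patch number first. The following POSIX shell sketch shows one way to do that. The function name driver_toolkit_ref is hypothetical, and the default repository is copied from the example above. Pass a different repository as the second argument if your catalog entry differs.

```shell
# Hypothetical helper: build a registry.redhat.io reference for the
# Driver Toolkit from an OpenShift version such as "4.15.3".
# Assumption: the catalog tag has the form "v<major>.<minor>".
driver_toolkit_ref() {
    version=$1
    repo=${2:-registry.redhat.io/openshift4/driver-toolkit-rhel8}

    # Loose check that the input looks like "<major>.<minor>[.<patch>]".
    case $version in
        [0-9]*.[0-9]*) ;;
        *) echo "unexpected version: $version" >&2; return 1 ;;
    esac

    major=${version%%.*}     # "4.15.3" -> "4"
    rest=${version#*.}       # "4.15.3" -> "15.3"
    minor=${rest%%.*}        # "15.3"   -> "15"
    printf '%s:v%s.%s\n' "$repo" "$major" "$minor"
}

# Example: prints registry.redhat.io/openshift4/driver-toolkit-rhel8:v4.15
driver_toolkit_ref 4.15.3
```

The result can be passed directly to podman pull. The function only formats a string; it does not check that the tag exists in the catalog.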
				
2.2.2. Finding the Driver Toolkit image URL in the payload
Prerequisites
- You obtained the image pull secret from Red Hat OpenShift Cluster Manager.
- You installed the OpenShift CLI (oc).
Procedure
- Use the oc adm command to extract the image URL of the driver-toolkit image corresponding to a certain release:
  - For an x86 image, run the following command:
    $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.z-x86_64 --image-for=driver-toolkit
  - For an ARM image, run the following command:
    $ oc adm release info quay.io/openshift-release-dev/ocp-release:4.15.z-aarch64 --image-for=driver-toolkit
  Example output
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b53883ca2bac5925857148c4a1abc300ced96c222498e3bc134fe7ce3a1dd404
- Obtain this image by using a valid pull secret, such as the pull secret required to install OpenShift Container Platform:
    $ podman pull --authfile=path/to/pullsecret.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:<SHA>
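The release pullspecs in the commands above follow a <version>-<architecture> pattern. If you script the lookup, a helper like the following POSIX shell sketch can compose the pullspec. By default it uses the architecture reported by uname -m, which is x86_64 or aarch64 on the hosts described above. The function name release_pullspec is hypothetical. The version check assumes that you pass a concrete x.y.z release rather than the 4.15.z placeholder.

```shell
# Hypothetical helper: compose the release image pullspec passed to
# "oc adm release info", for example
# quay.io/openshift-release-dev/ocp-release:4.15.3-x86_64
release_pullspec() {
    version=$1
    arch=${2:-$(uname -m)}    # default to the local machine architecture

    # Expect a full x.y.z release number, not a minor version such as 4.15.
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "expected an x.y.z version, got: $version" >&2
        return 1
    fi
    printf 'quay.io/openshift-release-dev/ocp-release:%s-%s\n' "$version" "$arch"
}
```

For example, oc adm release info "$(release_pullspec 4.15.3 aarch64)" --image-for=driver-toolkit looks up the ARM Driver Toolkit image for an example release 4.15.3.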
2.3. Using the Driver Toolkit
				As an example, the Driver Toolkit can be used as the base image for building a very simple kernel module called simple-kmod.
			
					The Driver Toolkit includes the necessary dependencies, openssl, mokutil, and keyutils, needed to sign a kernel module. However, in this example, the simple-kmod kernel module is not signed and therefore cannot be loaded on systems with Secure Boot enabled.
				
2.3.1. Build and run the simple-kmod driver container on a cluster
Prerequisites
- You have a running OpenShift Container Platform cluster.
- You set the Image Registry Operator state to Managed for your cluster.
- You installed the OpenShift CLI (oc).
- You are logged in to the OpenShift CLI as a user with cluster-admin privileges.
Procedure
- Create a namespace. For example:
  $ oc new-project simple-kmod-demo
- Create a YAML file that defines an ImageStream for storing the simple-kmod driver container image and a BuildConfig for building the container. Save this YAML as 0000-buildconfig.yaml.template.
- Substitute the correct Driver Toolkit image for the OpenShift Container Platform version you are running in place of "DRIVER_TOOLKIT_IMAGE" by running the following commands:
  $ OCP_VERSION=$(oc get clusterversion/version -ojsonpath={.status.desired.version})
  $ DRIVER_TOOLKIT_IMAGE=$(oc adm release info $OCP_VERSION --image-for=driver-toolkit)
  $ sed "s#DRIVER_TOOLKIT_IMAGE#${DRIVER_TOOLKIT_IMAGE}#" 0000-buildconfig.yaml.template > 0000-buildconfig.yaml
- Create the image stream and build config by running the following command:
  $ oc create -f 0000-buildconfig.yaml
- After the builder pod completes successfully, deploy the driver container image as a DaemonSet.
  - The driver container must run with the privileged security context to load the kernel modules on the host. Create a YAML file that contains the RBAC rules and the DaemonSet for running the driver container, and save it as 1000-drivercontainer.yaml.
  - Create the RBAC rules and daemon set:
    $ oc create -f 1000-drivercontainer.yaml
 
- After the pods are running on the worker nodes, verify that the simple_kmod kernel module is loaded successfully on the host machines with lsmod.
  - Verify that the pods are running:
    $ oc get pod -n simple-kmod-demo
    Example output
    NAME                                 READY   STATUS      RESTARTS   AGE
    simple-kmod-driver-build-1-build     0/1     Completed   0          6m
    simple-kmod-driver-container-b22fd   1/1     Running     0          40s
    simple-kmod-driver-container-jz9vn   1/1     Running     0          40s
    simple-kmod-driver-container-p45cc   1/1     Running     0          40s
  - Run the lsmod command in the driver container pod:
    $ oc exec -it pod/simple-kmod-driver-container-p45cc -- lsmod | grep simple
    Example output
    simple_procfs_kmod     16384  0
    simple_kmod            16384  0
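The grep simple filter above matches both simple_kmod and simple_procfs_kmod. If you automate these checks, you can match column values exactly to avoid false positives. The following POSIX shell sketch does that. The function names module_loaded and pods_running are hypothetical. Both read the text that lsmod and oc get pod print, as in the example outputs above, from standard input, so they can be tried without a cluster.

```shell
# Hypothetical helper: succeed only if a module with exactly this name is in
# the first column of lsmod-style text on stdin. The lsmod header line starts
# with "Module", so it never matches a real module name.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: succeed only if at least one pod name starts with the
# given prefix and every such pod has STATUS "Running" in "oc get pod" output.
pods_running() {
    awk -v p="$1" '
        NR > 1 && index($1, p) == 1 { seen++; if ($3 != "Running") bad++ }
        END { exit ((seen == 0 || bad > 0) ? 1 : 0) }
    '
}
```

Usage sketch. The -it flags are dropped because the output is piped rather than viewed interactively:
$ oc get pod -n simple-kmod-demo | pods_running simple-kmod-driver-container
$ oc exec pod/simple-kmod-driver-container-p45cc -- lsmod | module_loaded simple_kmod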
 
Chapter 3. Node Feature Discovery Operator
Learn about the Node Feature Discovery (NFD) Operator and how you can use it to expose node-level information by orchestrating Node Feature Discovery, a Kubernetes add-on for detecting hardware features and system configuration.
The Node Feature Discovery Operator (NFD) manages the detection of hardware features and configuration in an OpenShift Container Platform cluster by labeling the nodes with hardware-specific information. NFD labels the host with node-specific attributes, such as PCI cards, kernel, operating system version, and so on.
You can find the NFD Operator in OperatorHub by searching for “Node Feature Discovery”.
3.1. Installing the Node Feature Discovery Operator
The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the NFD daemon set. As a cluster administrator, you can install the NFD Operator by using the OpenShift Container Platform CLI or the web console.
3.1.1. Installing the NFD Operator using the CLI
As a cluster administrator, you can install the NFD Operator using the CLI.
Prerequisites
- An OpenShift Container Platform cluster.
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
- Create a namespace for the NFD Operator.
  - Create a Namespace custom resource (CR) that defines the openshift-nfd namespace, and then save the YAML in the nfd-namespace.yaml file. Set cluster-monitoring to "true".
  - Create the namespace by running the following command:
    $ oc create -f nfd-namespace.yaml
- Install the NFD Operator in the namespace you created in the previous step by creating the following objects:
  - Create an OperatorGroup CR and save the YAML in the nfd-operatorgroup.yaml file.
  - Create the OperatorGroup CR by running the following command:
    $ oc create -f nfd-operatorgroup.yaml
  - Create a Subscription CR and save the YAML in the nfd-sub.yaml file.
  - Create the subscription object by running the following command:
    $ oc create -f nfd-sub.yaml
  - Change to the openshift-nfd project:
    $ oc project openshift-nfd
 
Verification
- To verify that the Operator deployment is successful, run:
  $ oc get pods
  Example output
  NAME                                      READY   STATUS    RESTARTS   AGE
  nfd-controller-manager-7f86ccfb58-vgr4x   2/2     Running   0          10m
  A successful deployment shows a Running status.
3.1.2. Installing the NFD Operator using the web console
As a cluster administrator, you can install the NFD Operator using the web console.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Node Feature Discovery from the list of available Operators, and then click Install.
- On the Install Operator page, select A specific namespace on the cluster, and then click Install. You do not need to create a namespace because it is created for you.
Verification
To verify that the NFD Operator installed successfully:
- Navigate to the Operators → Installed Operators page.
- Ensure that Node Feature Discovery is listed in the openshift-nfd project with a Status of InstallSucceeded.
Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
Troubleshooting
If the Operator does not appear as installed, troubleshoot further:
- Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
- Navigate to the Workloads → Pods page and check the logs for pods in the openshift-nfd project.
3.2. Using the Node Feature Discovery Operator
				The Node Feature Discovery (NFD) Operator orchestrates all resources needed to run the Node-Feature-Discovery daemon set by watching for a NodeFeatureDiscovery custom resource (CR). Based on the NodeFeatureDiscovery CR, the Operator creates the operand (NFD) components in the selected namespace. You can edit the CR to use another namespace, image, image pull policy, and nfd-worker-conf config map, among other options.
			
				As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift CLI (oc) or the web console.
			
Starting with version 4.12, the operand.image field in the NodeFeatureDiscovery CR is mandatory. If the NFD Operator is deployed by using Operator Lifecycle Manager (OLM), OLM automatically sets the operand.image field. If you create the NodeFeatureDiscovery CR by using the OpenShift Container Platform CLI or the OpenShift Container Platform web console, you must set the operand.image field explicitly.
				
3.2.1. Creating a NodeFeatureDiscovery CR by using the CLI
					As a cluster administrator, you can create a NodeFeatureDiscovery CR instance by using the OpenShift CLI (oc).
				
						The spec.operand.image setting requires a -rhel9 image to be defined for use with OpenShift Container Platform releases 4.13 and later.
					
					The following example shows the use of -rhel9 to acquire the correct image.
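If you template the CR in a script, you can use the cluster version to decide whether the -rhel9 requirement applies. One source for the version is the .status.desired.version value that the Driver Toolkit procedure reads with oc get clusterversion. The POSIX shell sketch below is hypothetical. It encodes only the "4.13 and later" rule stated above and does not build the image reference itself.

```shell
# Hypothetical helper: succeed if an OpenShift version such as "4.15.3" is
# 4.13 or later, the releases for which spec.operand.image must reference
# a -rhel9 image. Returns 2 for input that does not look like a version.
needs_rhel9_operand() {
    case $1 in
        *.*) ;;
        *) return 2 ;;
    esac
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    # Both parts must be non-empty and purely numeric.
    for part in "$major" "$minor"; do
        case $part in ''|*[!0-9]*) return 2 ;; esac
    done
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 13 ]; }
}
```

For example, if needs_rhel9_operand "$OCP_VERSION"; then echo "use a -rhel9 operand image"; fi prints the reminder on 4.13 and later clusters.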
				
Prerequisites
- You have access to an OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
Procedure
- Create a NodeFeatureDiscovery CR. The operand.image field is mandatory.
 
- Create the NodeFeatureDiscovery CR by running the following command:
  $ oc apply -f <filename>
Verification
- Check that the NodeFeatureDiscovery CR was created by running the following command:
  $ oc get pods
  A successful deployment shows a Running status.
3.2.2. Creating a NodeFeatureDiscovery CR by using the CLI in a disconnected environment
					As a cluster administrator, you can create a NodeFeatureDiscovery CR instance by using the OpenShift CLI (oc).
				
Prerequisites
- You have access to an OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
- You have access to a mirror registry with the required images.
- You installed the skopeo CLI tool.
Procedure
- Determine the digest of the registry image:
  - Run the following command:
    $ skopeo inspect docker://registry.redhat.io/openshift4/ose-node-feature-discovery:<openshift_version>
    Example command
    $ skopeo inspect docker://registry.redhat.io/openshift4/ose-node-feature-discovery:v4.12
  - Inspect the output to identify the image digest:
    Example output
    {
      ...
      "Digest": "sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef",
      ...
    }
 
- Use the skopeo CLI tool to copy the image from registry.redhat.io to your mirror registry by running the following command:
  $ skopeo copy docker://registry.redhat.io/openshift4/ose-node-feature-discovery@<image_digest> docker://<mirror_registry>/openshift4/ose-node-feature-discovery@<image_digest>
  Example command
  $ skopeo copy docker://registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef docker://<your-mirror-registry>/openshift4/ose-node-feature-discovery@sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
- Create a NodeFeatureDiscovery CR. The operand.image field is mandatory.
 
- Create the NodeFeatureDiscovery CR by running the following command:
  $ oc apply -f <filename>
Verification
- Check the status of the NodeFeatureDiscovery CR by running the following command:
  $ oc get nodefeaturediscovery nfd-instance -o yaml
- Check that the pods are running without ImagePullBackOff errors by running the following command:
  $ oc get pods -n <nfd_namespace>
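The digest lookup and the mirror copy in the procedure above can be scripted. The following POSIX shell sketch makes these assumptions:
- The skopeo inspect output contains a "Digest": "sha256:..." field, as in the example output.
- The function names inspect_digest and mirror_refs are hypothetical.
- The mirror host is a placeholder.

The sketch uses sed so that it does not depend on jq. skopeo inspect also offers a --format option that takes a Go template, which may be simpler if your skopeo version supports it.

```shell
# Hypothetical helper: print the image digest from "skopeo inspect" JSON on
# stdin. Assumes a "Digest": "sha256:<64 hex chars>" field, as in the example
# output, and keeps only the first match, on the assumption that skopeo
# prints the top-level digest before any per-layer entries.
inspect_digest() {
    sed -n 's/.*"Digest": *"\(sha256:[0-9a-f]\{64\}\)".*/\1/p' | head -n 1
}

# Hypothetical helper: print the source and destination references for the
# "skopeo copy" step, one per line, given a digest and a mirror registry.
mirror_refs() {
    repo=openshift4/ose-node-feature-discovery
    printf 'docker://registry.redhat.io/%s@%s\n' "$repo" "$1"
    printf 'docker://%s/%s@%s\n' "$2" "$repo" "$1"
}
```

Usage sketch, with mirror.example.com:5000 standing in for your mirror registry:
$ digest=$(skopeo inspect docker://registry.redhat.io/openshift4/ose-node-feature-discovery:v4.12 | inspect_digest)
$ set -- $(mirror_refs "$digest" mirror.example.com:5000)
$ skopeo copy "$1" "$2"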
3.2.3. Creating a NodeFeatureDiscovery CR by using the web console
					As a cluster administrator, you can create a NodeFeatureDiscovery CR by using the OpenShift Container Platform web console.
				
Prerequisites
- You have access to an OpenShift Container Platform cluster.
- You logged in as a user with cluster-admin privileges.
- You installed the NFD Operator.
Procedure
- Navigate to the Operators → Installed Operators page.
- In the Node Feature Discovery section, under Provided APIs, click Create instance.
- Edit the values of the NodeFeatureDiscovery CR.
- Click Create.
Starting with version 4.12, the operand.image field in the NodeFeatureDiscovery CR is mandatory. If the NFD Operator is deployed by using Operator Lifecycle Manager (OLM), OLM automatically sets the operand.image field. If you create the NodeFeatureDiscovery CR by using the OpenShift Container Platform CLI or the OpenShift Container Platform web console, you must set the operand.image field explicitly.
					
3.3. Configuring the Node Feature Discovery Operator
3.3.1. core
					The core section contains common configuration settings that are not specific to any particular feature source.
				
3.3.1.1. core.sleepInterval
						core.sleepInterval specifies the interval between consecutive passes of feature detection or re-detection, and thus also the interval between node re-labeling. A non-positive value implies infinite sleep interval; no re-detection or re-labeling is done.
					
						This value is overridden by the deprecated --sleep-interval command-line flag, if specified.
					
Example usage
core:
  sleepInterval: 60s 
						The default value is 60s.
					
3.3.1.2. core.sources
						core.sources specifies the list of enabled feature sources. A special value all enables all feature sources.
					
						This value is overridden by the deprecated --sources command-line flag, if specified.
					
						Default: [all]
					
Example usage
core:
  sources:
    - system
    - custom
3.3.1.3. core.labelWhiteList
						core.labelWhiteList specifies a regular expression for filtering feature labels based on the label name. Non-matching labels are not published.
					
The regular expression is only matched against the basename part of the label, the part of the name after '/'. The label prefix, or namespace, is omitted.
						This value is overridden by the deprecated --label-whitelist command-line flag, if specified.
					
						Default: null
					
Example usage
core:
  labelWhiteList: '^cpu-cpuid'
3.3.1.4. core.noPublish
						Setting core.noPublish to true disables all communication with the nfd-master. It is effectively a dry run flag; nfd-worker runs feature detection normally, but no labeling requests are sent to nfd-master.
					
						This value is overridden by the --no-publish command-line flag, if specified.
					
Example usage
core:
  noPublish: true 
						The default value is false.
					
3.3.2. core.klog
The following options specify the logger configuration, most of which can be dynamically adjusted at run-time.
The logger options can also be specified using command-line flags, which take precedence over any corresponding config file options.
3.3.2.1. core.klog.addDirHeader
						If set to true, core.klog.addDirHeader adds the file directory to the header of the log messages.
					
						Default: false
					
Run-time configurable: yes
3.3.2.2. core.klog.alsologtostderr
Log to standard error as well as files.
						Default: false
					
Run-time configurable: yes
3.3.2.3. core.klog.logBacktraceAt
When logging hits line file:N, emit a stack trace.
Default: empty
Run-time configurable: yes
3.3.2.4. core.klog.logDir
If non-empty, write log files in this directory.
Default: empty
Run-time configurable: no
3.3.2.5. core.klog.logFile
If not empty, use this log file.
Default: empty
Run-time configurable: no
3.3.2.6. core.klog.logFileMaxSize
						core.klog.logFileMaxSize defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited.
					
						Default: 1800
					
Run-time configurable: no
3.3.2.7. core.klog.logtostderr
Log to standard error instead of files.
						Default: true
					
Run-time configurable: yes
3.3.2.8. core.klog.skipHeaders
						If core.klog.skipHeaders is set to true, avoid header prefixes in the log messages.
					
						Default: false
					
Run-time configurable: yes
3.3.2.9. core.klog.skipLogHeaders
						If core.klog.skipLogHeaders is set to true, avoid headers when opening log files.
					
						Default: false
					
Run-time configurable: no
3.3.2.10. core.klog.stderrthreshold
Logs at or above this threshold go to stderr.
						Default: 2
					
Run-time configurable: yes
3.3.2.11. core.klog.v
						core.klog.v is the number for the log level verbosity.
					
						Default: 0
					
Run-time configurable: yes
3.3.2.12. core.klog.vmodule
						core.klog.vmodule is a comma-separated list of pattern=N settings for file-filtered logging.
					
Default: empty
Run-time configurable: yes
3.3.3. sources
					The sources section contains feature source specific configuration parameters.
				
3.3.3.1. sources.cpu.cpuid.attributeBlacklist
						Prevent publishing cpuid features listed in this option.
					
						This value is overridden by sources.cpu.cpuid.attributeWhitelist, if specified.
					
						Default: [BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SGXLC, SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSSE3]
					
Example usage
sources:
  cpu:
    cpuid:
      attributeBlacklist: [MMX, MMXEXT]
3.3.3.2. sources.cpu.cpuid.attributeWhitelist
						Only publish the cpuid features listed in this option.
					
						sources.cpu.cpuid.attributeWhitelist takes precedence over sources.cpu.cpuid.attributeBlacklist.
					
Default: empty
Example usage
sources:
  cpu:
    cpuid:
      attributeWhitelist: [AVX512BW, AVX512CD, AVX512DQ, AVX512F, AVX512VL]
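The two cpuid options interact as follows. If a whitelist is set, only its features are published and the blacklist is ignored. Otherwise, every feature except the blacklisted ones is published. The POSIX shell sketch below encodes that rule. The function name cpuid_published and its space-separated list arguments are hypothetical conventions, not NFD configuration syntax.

```shell
# Hypothetical helper for the whitelist/blacklist precedence rule.
# Arguments: feature name, whitelist, blacklist (space-separated lists).
cpuid_published() {
    feature=$1
    whitelist=$2
    blacklist=$3
    if [ -n "$whitelist" ]; then
        # A non-empty whitelist wins: publish only the listed features.
        case " $whitelist " in *" $feature "*) return 0 ;; esac
        return 1
    fi
    # Otherwise, publish everything that is not blacklisted.
    case " $blacklist " in *" $feature "*) return 1 ;; esac
    return 0
}
```

For example, cpuid_published MMX "" "MMX MMXEXT" fails because MMX is blacklisted. cpuid_published MMX "MMX AVX512F" "MMX MMXEXT" succeeds because the whitelist takes precedence.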
3.3.3.3. sources.kernel.kconfigFile
						sources.kernel.kconfigFile is the path of the kernel config file. If empty, NFD runs a search in the well-known standard locations.
					
Default: empty
Example usage
sources:
  kernel:
    kconfigFile: "/path/to/kconfig"
3.3.3.4. sources.kernel.configOpts
						sources.kernel.configOpts represents kernel configuration options to publish as feature labels.
					
						Default: [NO_HZ, NO_HZ_IDLE, NO_HZ_FULL, PREEMPT]
					
Example usage
sources:
  kernel:
    configOpts: [NO_HZ, X86, DMI]
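The following POSIX shell sketch shows how configOpts could be turned into labels from a kconfig file. It rests on two assumptions that are taken from upstream NFD behavior, not from this document:
- An option counts as enabled when it is set to y or m.
- The label has the form feature.node.kubernetes.io/kernel-config.<OPTION>=true.

The function name kconfig_labels is hypothetical. It also does not model the search of standard locations that NFD performs when kconfigFile is empty.

```shell
# Hypothetical sketch: print kernel-config labels for the requested options
# that are enabled in a kconfig file. Assumptions: "enabled" means "=y" or
# "=m", and labels use the feature.node.kubernetes.io/kernel-config.<OPT> form.
kconfig_labels() {
    kconfig=$1
    shift
    for opt in "$@"; do
        # Anchor both ends so NO_HZ does not match CONFIG_NO_HZ_IDLE=y.
        if grep -Eq "^CONFIG_${opt}=(y|m)\$" "$kconfig"; then
            printf 'feature.node.kubernetes.io/kernel-config.%s=true\n' "$opt"
        fi
    done
}
```

For example, kconfig_labels /path/to/kconfig NO_HZ X86 DMI prints one label for each of those options that the file enables.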
3.3.3.5. sources.pci.deviceClassWhitelist
sources.pci.deviceClassWhitelist is a list of PCI device class IDs for which to publish a label. Each entry can be a main class only (for example, 03) or a full class-subclass combination (for example, 0300). A main class alone accepts all of its subclasses. The format of the labels can be further configured with deviceLabelFields.
					
						Default: ["03", "0b40", "12"]
					
Example usage
sources:
  pci:
    deviceClassWhitelist: ["0200", "03"]
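The matching rule above can be sketched in POSIX shell as follows. The function name pci_class_allowed is hypothetical, and class IDs are compared as literal lowercase hex strings.

```shell
# Hypothetical helper for the deviceClassWhitelist matching rule: a two-digit
# entry is a main class and accepts every subclass, while a four-digit entry
# must match the full class-subclass ID.
pci_class_allowed() {
    class=$1            # full class-subclass ID, for example 0300
    shift
    for entry in "$@"; do
        case ${#entry} in
            2) [ "$entry" = "$(printf '%.2s' "$class")" ] && return 0 ;;
            4) [ "$entry" = "$class" ] && return 0 ;;
        esac
    done
    return 1
}
```

With the example whitelist ["0200", "03"], pci_class_allowed 0302 0200 03 succeeds because 03 covers every display subclass. pci_class_allowed 0280 0200 03 fails because only the exact 0200 subclass of class 02 is listed.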
3.3.3.6. sources.pci.deviceLabelFields
						sources.pci.deviceLabelFields is the set of PCI ID fields to use when constructing the name of the feature label. Valid fields are class, vendor, device, subsystem_vendor and subsystem_device.
					
						Default: [class, vendor]
					
Example usage
sources:
  pci:
    deviceLabelFields: [class, vendor, device]
						With the example config above, NFD would publish labels such as feature.node.kubernetes.io/pci-<class-id>_<vendor-id>_<device-id>.present=true
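The label names above join the selected ID fields with underscores after a source prefix. The same pattern applies to the usb source described next. The following POSIX shell sketch composes such a name. The function name device_label is hypothetical, and it joins the values in the order you pass them. The IDs in the example are for illustration only.

```shell
# Hypothetical helper: compose a device feature label such as
# feature.node.kubernetes.io/pci-<class>_<vendor>.present=true
# First argument: source prefix (pci or usb); remaining arguments: ID values.
device_label() {
    src=$1
    shift
    joined=$(IFS=_; printf '%s' "$*")   # join remaining arguments with "_"
    printf 'feature.node.kubernetes.io/%s-%s.present=true\n' "$src" "$joined"
}
```

For example, device_label pci 0300 10de prints feature.node.kubernetes.io/pci-0300_10de.present=true, which matches the default [class, vendor] fields.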
					
3.3.3.7. sources.usb.deviceClassWhitelist
						sources.usb.deviceClassWhitelist is a list of USB device class IDs for which to publish a feature label. The format of the labels can be further configured with deviceLabelFields.
					
						Default: ["0e", "ef", "fe", "ff"]
					
Example usage
sources:
  usb:
    deviceClassWhitelist: ["ef", "ff"]
3.3.3.8. sources.usb.deviceLabelFields
						sources.usb.deviceLabelFields is the set of USB ID fields from which to compose the name of the feature label. Valid fields are class, vendor, and device.
					
						Default: [class, vendor, device]
					
Example usage
sources:
  usb:
    deviceLabelFields: [class, vendor]
						With the example config above, NFD would publish labels like: feature.node.kubernetes.io/usb-<class-id>_<vendor-id>.present=true.
					
3.3.3.9. sources.custom
						sources.custom is the list of rules to process in the custom feature source to create user-specific labels.
					
Default: empty
Example usage
3.4. About the NodeFeatureRule custom resource
NodeFeatureRule objects are a Node Feature Discovery (NFD) custom resource designed for rule-based custom labeling of nodes. Use cases include application-specific labeling, or hardware vendors distributing rules that create specific labels for their devices.

NodeFeatureRule objects provide a method to create vendor- or application-specific labels and taints. They use a flexible rule-based mechanism for creating labels, and optionally taints, based on node features.
			
3.5. Using the NodeFeatureRule custom resource
				Create a NodeFeatureRule object to label nodes if a set of rules match the conditions.
			
Procedure
- Create a custom resource file named nodefeaturerule.yaml that contains the following text:
  This custom resource specifies that labeling occurs when the veth module is loaded and any PCI device with vendor code 8086 exists in the cluster.
- Apply the nodefeaturerule.yaml file to your cluster by running the following command:
$ oc apply -f https://raw.githubusercontent.com/kubernetes-sigs/node-feature-discovery/v0.13.6/examples/nodefeaturerule.yaml
  The example applies the feature label to nodes that have the veth module loaded and any PCI device with vendor code 8086.
  Note: A relabeling delay of up to 1 minute might occur.
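The rule described above combines two conditions, and the label is applied only when both of them match. The following Python sketch models that evaluation for a single node. The module list, the device records and the label name are hypothetical. NFD's rule engine supports many more operators than the two checks shown here.

```python
from typing import Dict, List


def rule_matches(loaded_modules: List[str], pci_devices: List[Dict[str, str]]) -> bool:
    """Model of the example rule: every condition must match."""
    # The veth kernel module must be loaded.
    veth_loaded = "veth" in loaded_modules
    # At least one PCI device must have vendor code 8086.
    vendor_8086 = any(dev.get("vendor") == "8086" for dev in pci_devices)
    return veth_loaded and vendor_8086


def labels_for(loaded_modules: List[str], pci_devices: List[Dict[str, str]]) -> Dict[str, str]:
    # The label name is a hypothetical placeholder.
    if rule_matches(loaded_modules, pci_devices):
        return {"hypothetical-custom-feature": "true"}
    return {}


nic = [{"vendor": "8086", "device": "1572"}]
print(labels_for(["veth", "overlay"], nic))  # both conditions match
print(labels_for(["overlay"], nic))          # veth missing, so no label
```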
3.6. Using the NFD Topology Updater
The Node Feature Discovery (NFD) Topology Updater is a daemon responsible for examining allocated resources on a worker node. It accounts for resources that are available to be allocated to a new pod on a per-zone basis, where a zone can be a Non-Uniform Memory Access (NUMA) node. The NFD Topology Updater communicates the information to nfd-master, which creates a NodeResourceTopology custom resource (CR) for each worker node in the cluster. One instance of the NFD Topology Updater runs on each node of the cluster.
			
				To enable the Topology Updater workers in NFD, set the topologyupdater variable to true in the NodeFeatureDiscovery CR, as described in the section Using the Node Feature Discovery Operator.
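The per-zone accounting described above can be pictured with a small Python model. For each NUMA zone, the available amount is the zone's allocatable quantity minus what running pods already use in that zone. The optional filter mirrors the -watch-namespace flag described later in this section.

The zone names, quantities and pod allocations are hypothetical. The sketch also ignores details the real daemon handles, such as reading allocations from the kubelet pod resources API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (zone, resource) -> allocatable quantity; zone names such as "node-0" are hypothetical.
Allocatable = Dict[Tuple[str, str], int]
# (pod namespace, zone, resource, quantity) for each allocation held by a running pod.
Allocation = Tuple[str, str, str, int]


def available_per_zone(
    allocatable: Allocatable,
    allocations: List[Allocation],
    watch_namespace: str = "*",
) -> Dict[str, Dict[str, int]]:
    """Return what is still available in each zone after subtracting pod allocations."""
    used: Dict[Tuple[str, str], int] = defaultdict(int)
    for namespace, zone, resource, quantity in allocations:
        # Like -watch-namespace: "*" counts pods from every namespace.
        if watch_namespace != "*" and namespace != watch_namespace:
            continue
        used[(zone, resource)] += quantity
    result: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (zone, resource), total in allocatable.items():
        result[zone][resource] = total - used[(zone, resource)]
    return dict(result)


allocatable = {
    ("node-0", "cpu"): 16, ("node-0", "memory_gib"): 64,
    ("node-1", "cpu"): 16, ("node-1", "memory_gib"): 64,
}
allocations = [
    ("default", "node-0", "cpu", 4),
    ("rte", "node-1", "cpu", 2),
    ("rte", "node-0", "memory_gib", 8),
]
print(available_per_zone(allocatable, allocations))
```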
			
3.6.1. NodeResourceTopology CR
When run with NFD Topology Updater, NFD creates custom resource instances corresponding to the node resource hardware topology, such as:
3.6.2. NFD Topology Updater command-line flags
					To view available command-line flags, run the nfd-topology-updater -help command. For example, in a podman container, run the following command:
				
$ podman run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-topology-updater -help
3.6.2.1. -ca-file
The -ca-file flag is one of three flags, together with the -cert-file and -key-file flags, that control mutual TLS authentication on the NFD Topology Updater. This flag specifies the TLS root certificate that is used for verifying the authenticity of nfd-master.
					
Default: empty
							The -ca-file flag must be specified together with the -cert-file and -key-file flags.
						
Example
$ nfd-topology-updater -ca-file=/opt/nfd/ca.crt -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key
3.6.2.2. -cert-file
The -cert-file flag is one of three flags, together with the -ca-file and -key-file flags, that control mutual TLS authentication on the NFD Topology Updater. This flag specifies the TLS certificate presented for authenticating outgoing requests.
					
Default: empty
							The -cert-file flag must be specified together with the -ca-file and -key-file flags.
						
Example
$ nfd-topology-updater -cert-file=/opt/nfd/updater.crt -key-file=/opt/nfd/updater.key -ca-file=/opt/nfd/ca.crt
3.6.2.3. -h, -help
Print usage and exit.
3.6.2.4. -key-file
The -key-file flag is one of three flags, together with the -ca-file and -cert-file flags, that control mutual TLS authentication on the NFD Topology Updater. This flag specifies the private key corresponding to the given certificate file (-cert-file) that is used for authenticating outgoing requests.
					
Default: empty
							The -key-file flag must be specified together with the -ca-file and -cert-file flags.
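Because -ca-file, -cert-file and -key-file only work as a set, a wrapper script that launches nfd-topology-updater might check them before starting the daemon. The following Python sketch shows one way to do that. The function name is hypothetical, and the check only mirrors the rule stated in this section; it does not reproduce the daemon's own validation.

```python
from typing import List, Optional


def build_tls_args(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> List[str]:
    """Return nfd-topology-updater TLS arguments, enforcing the all-or-nothing rule."""
    values = {"-ca-file": ca_file, "-cert-file": cert_file, "-key-file": key_file}
    provided = [flag for flag, value in values.items() if value]
    if not provided:
        return []  # mutual TLS disabled: pass none of the three flags
    if len(provided) != len(values):
        missing = sorted(set(values) - set(provided))
        raise ValueError(f"mutual TLS requires all three flags; missing: {', '.join(missing)}")
    return [f"{flag}={value}" for flag, value in values.items()]


args = build_tls_args("/opt/nfd/ca.crt", "/opt/nfd/updater.crt", "/opt/nfd/updater.key")
print(" ".join(["nfd-topology-updater", *args]))
```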
						
Example
$ nfd-topology-updater -key-file=/opt/nfd/updater.key -cert-file=/opt/nfd/updater.crt -ca-file=/opt/nfd/ca.crt
3.6.2.5. -kubelet-config-file
The -kubelet-config-file flag specifies the path to the kubelet configuration file.
					
						Default: /host-var/lib/kubelet/config.yaml
					
Example
$ nfd-topology-updater -kubelet-config-file=/var/lib/kubelet/config.yaml
3.6.2.6. -no-publish
						The -no-publish flag disables all communication with the nfd-master, making it a dry run flag for nfd-topology-updater. NFD Topology Updater runs resource hardware topology detection normally, but no CR requests are sent to nfd-master.
					
						Default: false
					
Example
$ nfd-topology-updater -no-publish
3.6.2.7. -oneshot
						The -oneshot flag causes the NFD Topology Updater to exit after one pass of resource hardware topology detection.
					
						Default: false
					
Example
$ nfd-topology-updater -oneshot -no-publish
3.6.2.8. -podresources-socket
						The -podresources-socket flag specifies the path to the Unix socket where kubelet exports a gRPC service to enable discovery of in-use CPUs and devices, and to provide metadata for them.
					
Default: /host-var/lib/kubelet/pod-resources/kubelet.sock
					
Example
$ nfd-topology-updater -podresources-socket=/var/lib/kubelet/pod-resources/kubelet.sock
3.6.2.9. -server
						The -server flag specifies the address of the nfd-master endpoint to connect to.
					
						Default: localhost:8080
					
Example
$ nfd-topology-updater -server=nfd-master.nfd.svc.cluster.local:443
3.6.2.10. -server-name-override
The -server-name-override flag specifies the common name (CN) to expect in the nfd-master TLS certificate. This flag is mostly intended for development and debugging purposes.
					
Default: empty
Example
$ nfd-topology-updater -server-name-override=localhost
3.6.2.11. -sleep-interval
The -sleep-interval flag specifies the interval between resource hardware topology re-examination and custom resource updates. A non-positive value implies an infinite sleep interval, so no re-detection is done.
					
						Default: 60s
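The interaction between -sleep-interval, -oneshot and -no-publish can be summarized as a control loop. The following Python sketch is a simplified model of the behavior described in these flag sections, not the daemon's source. The detect and publish callables are hypothetical, and max_passes exists only so that the example terminates.

```python
import time
from typing import Callable, Dict, List

Topology = Dict[str, Dict[str, int]]


def run_updater(
    detect: Callable[[], Topology],
    publish: Callable[[Topology], None],
    sleep_interval_s: float,
    oneshot: bool = False,
    no_publish: bool = False,
    max_passes: int = 3,
) -> int:
    """Simplified model of the updater loop; returns the number of detection passes."""
    passes = 0
    while passes < max_passes:
        topology = detect()  # one pass of resource hardware topology detection
        passes += 1
        if not no_publish:
            publish(topology)  # -no-publish skips sending updates to nfd-master
        if oneshot:
            break  # -oneshot exits after one pass
        if sleep_interval_s <= 0:
            # Non-positive interval: no re-detection. The real daemon keeps running
            # without re-examining; this model simply stops.
            break
        time.sleep(sleep_interval_s)
    return passes


def detect() -> Topology:
    return {"node-0": {"cpu": 12}}  # hypothetical detection result


published: List[Topology] = []
print(run_updater(detect, published.append, sleep_interval_s=0.01), len(published))
```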
					
Example
$ nfd-topology-updater -sleep-interval=1h
3.6.2.12. -version
Print version and exit.
3.6.2.13. -watch-namespace
The -watch-namespace flag restricts resource hardware topology examination to the pods running in the specified namespace. Pods that are not running in the specified namespace are not considered during resource accounting. This is particularly useful for testing and debugging purposes. A * value means that all of the pods across all namespaces are considered during the accounting process.
					
						Default: *
					
Example
$ nfd-topology-updater -watch-namespace=rte
Chapter 4. Kernel Module Management Operator
Learn about the Kernel Module Management (KMM) Operator and how you can use it to deploy out-of-tree kernel modules and device plugins on OpenShift Container Platform clusters.
4.1. About the Kernel Module Management Operator
The Kernel Module Management (KMM) Operator manages, builds, signs, and deploys out-of-tree kernel modules and device plugins on OpenShift Container Platform clusters.
				KMM adds a new Module CRD which describes an out-of-tree kernel module and its associated device plugin. You can use Module resources to configure how to load the module, define ModuleLoader images for kernel versions, and include instructions for building and signing modules for specific kernel versions.
			
KMM is designed to accommodate multiple kernel versions at once for any kernel module, allowing for seamless node upgrades and reduced application downtime.
4.2. Installing the Kernel Module Management Operator
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator by using the OpenShift CLI or the web console.
The KMM Operator is supported on OpenShift Container Platform 4.12 and later. Installing KMM on version 4.11 does not require specific additional steps. For details on installing KMM on version 4.10 and earlier, see the section "Installing the Kernel Module Management Operator on earlier versions of OpenShift Container Platform".
4.2.1. Installing the Kernel Module Management Operator using the web console
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator using the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- Install the Kernel Module Management Operator:
  - In the OpenShift Container Platform web console, click Operators → OperatorHub.
  - Select Kernel Module Management Operator from the list of available Operators, and then click Install.
  - From the Installed Namespace list, select the openshift-kmm namespace.
  - Click Install.
Verification
To verify that KMM Operator installed successfully:
- Navigate to the Operators → Installed Operators page.
- Ensure that Kernel Module Management Operator is listed in the openshift-kmm project with a Status of InstallSucceeded.
  Note: During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
Troubleshooting
- To troubleshoot issues with Operator installation:
  - Navigate to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
  - Navigate to the Workloads → Pods page and check the logs for pods in the openshift-kmm project.
4.2.2. Installing the Kernel Module Management Operator by using the CLI
As a cluster administrator, you can install the Kernel Module Management (KMM) Operator by using the OpenShift CLI.
Prerequisites
- You have a running OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You are logged in to the OpenShift CLI as a user with cluster-admin privileges.
Procedure
- Install KMM in the openshift-kmm namespace:
  - Create the following Namespace CR and save the YAML file, for example, kmm-namespace.yaml:
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-kmm
  - Create the following OperatorGroup CR and save the YAML file, for example, kmm-op-group.yaml:
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: kernel-module-management
      namespace: openshift-kmm
  - Create the following Subscription CR and save the YAML file, for example, kmm-sub.yaml:
  - Create the Namespace and OperatorGroup objects by running the following commands:
    $ oc create -f kmm-namespace.yaml
    $ oc create -f kmm-op-group.yaml
  - Create the subscription object by running the following command:
    $ oc create -f kmm-sub.yaml
Verification
- To verify that the Operator deployment is successful, run the following command:
$ oc get -n openshift-kmm deployments.apps kmm-operator-controller
  Example output
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kmm-operator-controller   1/1     1            1           97s
  The Operator is available.
4.2.3. Installing the Kernel Module Management Operator on earlier versions of OpenShift Container Platform
					The KMM Operator is supported on OpenShift Container Platform 4.12 and later. For version 4.10 and earlier, you must create a new SecurityContextConstraint object and bind it to the Operator’s ServiceAccount. As a cluster administrator, you can install the Kernel Module Management (KMM) Operator by using the OpenShift CLI.
				
Prerequisites
- You have a running OpenShift Container Platform cluster.
- You installed the OpenShift CLI (oc).
- You are logged in to the OpenShift CLI as a user with cluster-admin privileges.
Procedure
- Install KMM in the openshift-kmm namespace:
  - Create the following Namespace CR and save the YAML file, for example, kmm-namespace.yaml:
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-kmm
  - Create the following SecurityContextConstraint object and save the YAML file, for example, kmm-security-constraint.yaml:
  - Bind the SecurityContextConstraint object to the Operator’s ServiceAccount by running the following commands:
    $ oc apply -f kmm-security-constraint.yaml
    $ oc adm policy add-scc-to-user kmm-security-constraint -z kmm-operator-controller -n openshift-kmm
  - Create the following OperatorGroup CR and save the YAML file, for example, kmm-op-group.yaml:
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: kernel-module-management
      namespace: openshift-kmm
  - Create the following Subscription CR and save the YAML file, for example, kmm-sub.yaml:
  - Create the Namespace and OperatorGroup objects by running the following commands:
    $ oc create -f kmm-namespace.yaml
    $ oc create -f kmm-op-group.yaml
  - Create the subscription object by running the following command:
    $ oc create -f kmm-sub.yaml
Verification
- To verify that the Operator deployment is successful, run the following command:
$ oc get -n openshift-kmm deployments.apps kmm-operator-controller
  Example output
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kmm-operator-controller   1/1     1            1           97s
  The Operator is available.
4.3. Configuring the Kernel Module Management Operator
In most cases, the default configuration for the Kernel Module Management (KMM) Operator does not need to be modified. However, you can modify the Operator settings to suit your environment using the following procedure.
				The Operator configuration is set in the kmm-operator-manager-config ConfigMap in the Operator namespace.
			
Procedure
- To modify the settings, edit the ConfigMap data by entering the following command:
$ oc edit configmap -n "$namespace" kmm-operator-manager-config
  Example output
Table 4.1. Operator configuration parameters
- healthProbeBindAddress: Defines the address on which the Operator monitors for kubelet health probes. The recommended value is :8081.
- job.gcDelay: Defines the duration that successful build pods should be preserved for before they are deleted. There is no recommended value for this setting. For information about the valid values for this setting, see ParseDuration.
- leaderElection.enabled: Determines whether leader election is used to ensure that only one replica of the KMM Operator is running at any time. For more information, see Leases. The recommended value is true.
- leaderElection.resourceID: Determines the name of the resource that leader election uses for holding the leader lock. The recommended value is kmm.sigs.x-k8s.io.
- webhook.disableHTTP2: If true, disables HTTP/2 for the webhook server, as a mitigation for CVE-2023-44487. The recommended value is true.
- webhook.port: Defines the port on which the Operator monitors webhook requests. The recommended value is 9443.
- metrics.enableAuthnAuthz: Determines if metrics are authenticated using TokenReviews and authorized using SubjectAccessReviews with the kube-apiserver.
  For authentication and authorization, the controller needs a ClusterRole with the following rules:
  - apiGroups: authentication.k8s.io, resources: tokenreviews, verbs: create
  - apiGroups: authorization.k8s.io, resources: subjectaccessreviews, verbs: create
  To scrape metrics, for example, using Prometheus, the client needs a ClusterRole with the following rule:
  - nonResourceURLs: "/metrics", verbs: get
  The recommended value is true.
- metrics.disableHTTP2: If true, disables HTTP/2 for the metrics server as a mitigation for CVE-2023-44487. The recommended value is true.
- metrics.bindAddress: Determines the bind address for the metrics server. If unspecified, the default is :8080. To disable the metrics server, set to 0. The recommended value is 0.0.0.0:8443.
- metrics.secureServing: Determines whether the metrics are served over HTTPS instead of HTTP. The recommended value is true.
- worker.runAsUser: Determines the value of the runAsUser field of the worker container’s security context. For more information, see SecurityContext. The recommended value is 9443.
- worker.seLinuxType: Determines the value of the seLinuxOptions.type field of the worker container’s security context. For more information, see SecurityContext. The recommended value is spc_t.
- worker.setFirmwareClassPath: Sets the kernel’s firmware search path into the /sys/module/firmware_class/parameters/path file on the node. The recommended value is /var/lib/firmware if you need to set that value through the worker app. Otherwise, unset.
- After modifying the settings, restart the controller with the following command:
$ oc delete pod -n "<namespace>" -l app.kubernetes.io/component=kmm
  Note: The value of <namespace> depends on your original installation method.
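The job.gcDelay parameter takes a Go duration string, as parsed by ParseDuration. Before editing the ConfigMap, you might want to sanity-check a value. The following Python sketch approximates Go's duration syntax:

- a possibly signed sequence of decimal numbers,
- each with a unit suffix of ns, us or µs, ms, s, m, or h,
- plus the special value 0.

It is an approximation for illustration, not a port of the Go implementation, and it does not reproduce Go's overflow checks.

```python
import re

# Seconds per unit accepted by Go's time.ParseDuration ("µs" and "μs" are two spellings of micro).
UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "μs": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One "number + unit" component; multi-letter units are tried before "s" and "m".
COMPONENT = r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|μs|ms|s|m|h)"
DURATION = re.compile(rf"[-+]?(?:{COMPONENT})+")


def parse_go_duration(value: str) -> float:
    """Approximate Go duration parsing; returns seconds or raises ValueError."""
    if value in ("0", "+0", "-0"):
        return 0.0  # Go accepts a bare zero without a unit
    if not DURATION.fullmatch(value):
        raise ValueError(f"invalid duration: {value!r}")
    sign = -1.0 if value.startswith("-") else 1.0
    total = sum(float(number) * UNIT_SECONDS[unit] for number, unit in re.findall(COMPONENT, value))
    return sign * total


for candidate in ["1h30m", "300ms", "-1.5h", "5"]:
    try:
        print(candidate, parse_go_duration(candidate))
    except ValueError as err:
        print(err)  # "5" has no unit, so it is rejected
```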
4.3.1. Unloading the kernel module
You must unload the kernel modules when moving to a newer version or if they introduce some undesirable side effect on the node.
Procedure
- To unload a module loaded with KMM from nodes, delete the corresponding Module resource. KMM then creates worker pods, where required, to run modprobe -r and unload the kernel module from the nodes.
  Warning: When unloading a kernel module, KMM needs all the resources it used when loading the kernel module. This includes the ServiceAccount referenced in the Module as well as any RBAC defined to allow privileged KMM worker pods to run. It also includes any pull secret referenced in .spec.imageRepoSecret.
  To avoid situations where KMM is unable to unload the kernel module from nodes, make sure those resources are not deleted while the Module resource is still present in the cluster in any state, including Terminating. KMM includes a validating admission webhook that rejects the deletion of namespaces that contain at least one Module resource.
4.3.2. Setting the kernel firmware search path
					The Linux kernel accepts the firmware_class.path parameter as a search path for firmware, as explained in Firmware search paths.
				
KMM worker pods can set this value on nodes by writing to sysfs before attempting to load kmods.
Procedure
- To define a firmware search path, set worker.setFirmwareClassPath to /var/lib/firmware in the Operator configuration.
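The worker's sysfs write described above amounts to writing the search path into the firmware_class path parameter file before loading kmods. The following Python sketch shows the idea. It takes the parameter file location as an argument so that it can be tried against a temporary file, because writing the real /sys/module/firmware_class/parameters/path file requires root privileges on the node. It is an illustration, not the KMM worker code.

```python
import tempfile
from pathlib import Path

# Kernel parameter file named in this section; writing it needs root on a real node.
FIRMWARE_PARAM = Path("/sys/module/firmware_class/parameters/path")


def set_firmware_class_path(search_path: str, param_file: Path = FIRMWARE_PARAM) -> str:
    """Write the firmware search path, then read the stored value back."""
    param_file.write_text(search_path)
    return param_file.read_text().strip()


# Try it against a scratch file instead of the real sysfs entry.
with tempfile.TemporaryDirectory() as tmp:
    print(set_firmware_class_path("/var/lib/firmware", Path(tmp) / "path"))
```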
4.4. Uninstalling the Kernel Module Management Operator
Use one of the following procedures to uninstall the Kernel Module Management (KMM) Operator, depending on how the KMM Operator was installed.
4.4.1. Uninstalling a Red Hat catalog installation
Use this procedure if KMM was installed from the Red Hat catalog.
Procedure
Use the following method to uninstall the KMM Operator:
- Use the OpenShift console under Operators → Installed Operators to locate and uninstall the Operator.
						Alternatively, you can delete the Subscription resource in the KMM namespace.
					
4.4.2. Uninstalling a CLI installation
Use this command if the KMM Operator was installed using the OpenShift CLI.
Procedure
- Run the following command to uninstall the KMM Operator:
$ oc delete -k https://github.com/rh-ecosystem-edge/kernel-module-management/config/default
  Note: Using this command deletes the Module CRD and all Module instances in the cluster.
4.5. Kernel module deployment
				Kernel Module Management (KMM) monitors Node and Module resources in the cluster to determine if a kernel module should be loaded on or unloaded from a node.
			
To be eligible for a module, a node must contain the following:
- Labels that match the module’s .spec.selector field.
- A kernel version matching one of the items in the module’s .spec.moduleLoader.container.kernelMappings field.
- If ordered upgrade is configured in the module, a label that matches its .spec.moduleLoader.container.version field.
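The three eligibility conditions can be expressed as a single predicate. The following Python sketch is a simplified model under these assumptions:

- .spec.selector is treated as an exact label match.
- Each kernel mapping is either a literal version or a regular expression matched with an unanchored search.
- The ordered-upgrade label key is a hypothetical placeholder, because this section does not spell out its name.

```python
import re
from typing import Dict, List, Optional

# Hypothetical placeholder; this section does not name the ordered-upgrade label key.
VERSION_LABEL_KEY = "example.com/hypothetical-module-version"


def kernel_matches(kernel: str, mapping: Dict[str, str]) -> bool:
    # A mapping holds either a literal version or a regular expression.
    if "literal" in mapping:
        return mapping["literal"] == kernel
    return re.search(mapping["regexp"], kernel) is not None


def node_is_eligible(
    node_labels: Dict[str, str],
    node_kernel: str,
    selector: Dict[str, str],
    kernel_mappings: List[Dict[str, str]],
    version: Optional[str] = None,
) -> bool:
    # 1. Every key/value pair in .spec.selector must be present on the node.
    if any(node_labels.get(key) != value for key, value in selector.items()):
        return False
    # 2. The node's kernel must match at least one kernel mapping.
    if not any(kernel_matches(node_kernel, m) for m in kernel_mappings):
        return False
    # 3. With ordered upgrade configured, the node's version label must match.
    if version is not None and node_labels.get(VERSION_LABEL_KEY) != version:
        return False
    return True


worker = {"node-role.kubernetes.io/worker": ""}
mappings = [{"regexp": r"^.+\.el9_\d+\.x86_64$"}]
print(node_is_eligible(worker, "5.14.0-284.30.1.el9_2.x86_64", worker, mappings))
```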
				When KMM reconciles nodes with the desired state as configured in the Module resource, it creates worker pods on the target nodes to run the necessary action. The KMM Operator monitors the outcome of the pods and records the information. The Operator uses this information to label the Node objects when the module is successfully loaded, and to run the device plugin, if configured.
			
				Worker pods run the KMM worker binary that performs the following tasks:
			
- Pulls the kmod image configured in the Module resource. Kmod images are standard OCI images that contain .ko files.
- Extracts the image in the pod’s filesystem.
- Runs modprobe with the specified arguments to perform the necessary action.
4.5.1. The Module custom resource definition
					The Module custom resource definition (CRD) represents a kernel module that can be loaded on all or select nodes in the cluster, through a kmod image. A Module custom resource (CR) specifies one or more kernel versions with which it is compatible, and a node selector.
				
The compatible versions for a Module resource are listed under .spec.moduleLoader.container.kernelMappings. A kernel mapping can either match a literal version or use a regular expression (regexp) to match many versions at the same time.
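Resolving a kernel mapping for a node's kernel can be sketched in three steps:

1. Walk the mappings in order.
2. Take the first literal or regexp entry that matches.
3. Substitute ${KERNEL_FULL_VERSION} in its containerImage.

This Python model assumes that the first match wins and that regexp matching is unanchored. The registry name is hypothetical. Refer to the KMM documentation for the authoritative matching rules.

```python
import re
from typing import Dict, List, Optional


def resolve_image(kernel: str, mappings: List[Dict[str, str]]) -> Optional[str]:
    """Return the container image for a kernel, or None if no mapping matches."""
    for mapping in mappings:
        if "literal" in mapping:
            matched = mapping["literal"] == kernel
        else:
            matched = re.search(mapping["regexp"], kernel) is not None
        if matched:
            # Replace the placeholder with the node's full kernel version.
            return mapping["containerImage"].replace("${KERNEL_FULL_VERSION}", kernel)
    return None  # the module is not loaded on nodes running this kernel


mappings = [
    {"literal": "5.14.0-70.58.1.el9_0.x86_64",
     "containerImage": "registry.example.com/my-kmod:special"},
    {"regexp": r"^.+\.el9_\d+\.x86_64$",
     "containerImage": "registry.example.com/my-kmod:${KERNEL_FULL_VERSION}"},
]
print(resolve_image("5.14.0-284.30.1.el9_2.x86_64", mappings))
```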
				
					The reconciliation loop for the Module resource runs the following steps:
				
- List all nodes matching .spec.selector.
- Build a set of all kernel versions running on those nodes.
- For each kernel version:
  - Go through .spec.moduleLoader.container.kernelMappings and find the appropriate container image name. If the kernel mapping has build or sign defined and the container image does not already exist, run the build, the signing pod, or both, as needed.
  - Create a worker pod to pull the container image determined in the previous step and run modprobe.
  - If .spec.devicePlugin is defined, create a device plugin daemon set using the configuration specified under .spec.devicePlugin.container.
- Run garbage collection on:
  - Obsolete device plugin DaemonSets that do not target any node.
  - Successful build pods.
  - Successful signing pods.
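The reconciliation steps listed above can be condensed into a planning function. The following Python sketch only computes which actions a reconciliation pass would take; it does not talk to a cluster.

The node records, the image_exists callback and the action strings are hypothetical. Images are looked up in a plain dictionary keyed by kernel version instead of evaluating kernel mappings, and device plugin handling is left out to keep the example short.

```python
from typing import Callable, Dict, List


def plan_reconcile(
    nodes: List[Dict],
    selector: Dict[str, str],
    images_by_kernel: Dict[str, Dict],
    image_exists: Callable[[str], bool],
) -> List[str]:
    """Return the actions one reconciliation pass would take (planning only)."""
    actions: List[str] = []
    # List all nodes matching .spec.selector.
    selected = [n for n in nodes if all(n["labels"].get(k) == v for k, v in selector.items())]
    # Build the set of kernel versions on those nodes (sorted for stable output).
    for kernel in sorted({n["kernel"] for n in selected}):
        entry = images_by_kernel.get(kernel)
        if entry is None:
            continue  # no kernel mapping for this kernel version
        image = entry["image"]
        # Build and/or sign only when configured and the image does not exist yet.
        if not image_exists(image):
            if entry.get("build"):
                actions.append(f"build {image}")
            if entry.get("sign"):
                actions.append(f"sign {image}")
        # Worker pods on the nodes running this kernel pull the image and run modprobe.
        for node in selected:
            if node["kernel"] == kernel:
                actions.append(f"worker-pod {node['name']} {image}")
    # Finally, garbage-collect obsolete DaemonSets and successful build and signing pods.
    actions.append("garbage-collect")
    return actions


worker = {"node-role.kubernetes.io/worker": ""}
nodes = [
    {"name": "worker-0", "kernel": "k1", "labels": worker},
    {"name": "worker-1", "kernel": "k2", "labels": worker},
    {"name": "master-0", "kernel": "k1", "labels": {"node-role.kubernetes.io/master": ""}},
]
images = {
    "k1": {"image": "registry.example.com/kmod:k1", "build": True},
    "k2": {"image": "registry.example.com/kmod:k2", "build": True, "sign": True},
}
existing = {"registry.example.com/kmod:k1"}
for action in plan_reconcile(nodes, worker, images, existing.__contains__):
    print(action)
```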
4.5.2. Set soft dependencies between kernel modules
					Some configurations require that several kernel modules be loaded in a specific order to work properly, even though the modules do not directly depend on each other through symbols. These are called soft dependencies. depmod is usually not aware of these dependencies, and they do not appear in the files it produces. For example, if mod_a has a soft dependency on mod_b, modprobe mod_a will not load mod_b.
				
					You can resolve these situations by declaring soft dependencies in the Module custom resource definition (CRD) using the modulesLoadingOrder field.
				
					In the configuration above, the worker pod will first try to unload the in-tree mod_b before loading mod_a from the kmod image. When the worker pod is terminated and mod_a is unloaded, mod_b will not be loaded again.
				
The first value in the list, which is loaded last, must be equivalent to the moduleName.
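The ordering rule above can be checked mechanically. The following Python sketch validates a modulesLoadingOrder list against moduleName and derives the load sequence. It assumes that the whole list is loaded from the last entry to the first, which extends what the text states about the first entry. The module names are the placeholders used in this section.

```python
from typing import List


def loading_sequence(module_name: str, modules_loading_order: List[str]) -> List[str]:
    """Return the order in which modules would be loaded (simplified model)."""
    if not modules_loading_order:
        return [module_name]  # no soft dependencies declared
    if modules_loading_order[0] != module_name:
        raise ValueError("the first entry of modulesLoadingOrder must equal moduleName")
    # The first entry is loaded last, so the list is loaded in reverse.
    return list(reversed(modules_loading_order))


print(loading_sequence("mod_a", ["mod_a", "mod_b"]))
```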
					
4.6. Security and permissions
Loading kernel modules is a highly sensitive operation. After they are loaded, kernel modules have all possible permissions to do any kind of operation on the node.
4.6.1. ServiceAccounts and SecurityContextConstraints
					Kernel Module Management (KMM) creates a privileged workload to load the kernel modules on nodes. That workload needs ServiceAccounts allowed to use the privileged SecurityContextConstraint (SCC) resource.
				
					The authorization model for that workload depends on the namespace of the Module resource, as well as its spec.
				
- If the .spec.moduleLoader.serviceAccountName or .spec.devicePlugin.serviceAccountName fields are set, they are always used.
- If those fields are not set, then:
  - If the Module resource is created in the Operator’s namespace (openshift-kmm by default), then KMM uses its default, powerful ServiceAccounts to run the worker and device plugin pods.
  - If the Module resource is created in any other namespace, then KMM runs the pods with the namespace’s default ServiceAccount. The Module resource cannot run a privileged workload unless you manually enable it to use the privileged SCC.
						openshift-kmm is a trusted namespace.
					
						When setting up RBAC permissions, remember that any user or ServiceAccount creating a Module resource in the openshift-kmm namespace results in KMM automatically running privileged workloads on potentially all nodes in the cluster.
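The ServiceAccount selection rules above reduce to a short decision function, which applies equally to worker pods and device plugin pods. In the following Python sketch, the returned strings are descriptive placeholders rather than real ServiceAccount names, because this section does not name the Operator's default ServiceAccounts.

```python
from typing import Optional


def pod_service_account(
    module_namespace: str,
    service_account_name: Optional[str],
    operator_namespace: str = "openshift-kmm",
) -> str:
    """Return "<namespace>/<ServiceAccount>" used for a Module's worker or device plugin pods."""
    # An explicitly set serviceAccountName field is always used.
    if service_account_name:
        return f"{module_namespace}/{service_account_name}"
    # In the Operator's namespace, KMM falls back to its own privileged ServiceAccounts.
    if module_namespace == operator_namespace:
        return f"{operator_namespace}/<operator default ServiceAccount>"
    # Elsewhere the namespace's "default" ServiceAccount is used; it needs the
    # privileged SCC granted manually before privileged pods can run.
    return f"{module_namespace}/default"


print(pod_service_account("my-namespace", None))
```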
					
					To allow any ServiceAccount to use the privileged SCC and run worker or device plugin pods, you can use the oc adm policy command, as in the following example:
				
$ oc adm policy add-scc-to-user privileged -z "${serviceAccountName}" [ -n "${namespace}" ]
4.6.2. Pod security standards
OpenShift runs a synchronization mechanism that sets the namespace Pod Security level automatically based on the security contexts in use. No action is needed.
4.7. Replacing in-tree modules with out-of-tree modules
You can use Kernel Module Management (KMM) to build kernel modules that can be loaded into or unloaded from the kernel on demand. These modules extend the functionality of the kernel without the need to reboot the system. Modules can be configured as built-in or dynamically loaded.
Dynamically loaded modules include in-tree modules and out-of-tree (OOT) modules. In-tree modules are internal to the Linux kernel tree, that is, they are already part of the kernel. Out-of-tree modules are external to the Linux kernel tree. They are generally written for development and testing purposes, such as testing the new version of a kernel module that is shipped in-tree, or to deal with incompatibilities.
				Some modules that are loaded by KMM could replace in-tree modules that are already loaded on the node. To unload in-tree modules before loading your module, set the value of the .spec.moduleLoader.container.inTreeModulesToRemove field to the modules that you want to unload. The following example demonstrates module replacement for all kernel mappings:
			
In this example, the moduleLoader pod uses inTreeModulesToRemove to unload the in-tree mod_a and mod_b before loading mod_a from the moduleLoader image. When the moduleLoader pod is terminated and mod_a is unloaded, mod_b is not loaded again.
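The sequence described above can be traced with a small Python simulation of a node's loaded modules, keyed by module name and tagged with their origin. It models the documented behavior only. The function names are hypothetical, and real worker pods run modprobe rather than manipulating a dictionary.

```python
from typing import Dict, List

Loaded = Dict[str, str]  # module name -> "in-tree" or "out-of-tree"


def load_with_replacement(loaded: Loaded, module: str, in_tree_to_remove: List[str]) -> Loaded:
    # First unload the in-tree modules listed in inTreeModulesToRemove.
    state = {name: origin for name, origin in loaded.items() if name not in in_tree_to_remove}
    # Then load the module from the moduleLoader image.
    state[module] = "out-of-tree"
    return state


def unload_on_termination(loaded: Loaded, module: str) -> Loaded:
    # Only the out-of-tree module is unloaded; removed in-tree modules are not reloaded.
    return {name: origin for name, origin in loaded.items() if name != module}


node = {"mod_a": "in-tree", "mod_b": "in-tree", "xfs": "in-tree"}
after_load = load_with_replacement(node, "mod_a", ["mod_a", "mod_b"])
after_unload = unload_on_termination(after_load, "mod_a")
print(after_load)
print(after_unload)
```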
			
The following is an example for module replacement for specific kernel mappings:
4.7.1. Example Module CR
					The following is an annotated Module example:
				
- 1: Required.
- 2: Optional.
- 3: Optional: Copies /firmware/* into /var/lib/firmware/ on the node.
- 4: Optional.
- 5: At least one kernel item is required.
- 6: For each node running a kernel matching the regular expression, KMM creates a DaemonSet resource running the image specified in containerImage with ${KERNEL_FULL_VERSION} replaced with the kernel version.
- 7: For any other kernel, build the image using the Dockerfile in the my-kmod ConfigMap.
- 8: Optional.
- 9: Optional: A value for some-kubernetes-secret can be obtained from the build environment at /run/secrets/some-kubernetes-secret.
- 10: This field has no effect. When building kmod images or signing kmods within a kmod image, you might sometimes need to pull base images from a registry that serves a certificate signed by an untrusted Certificate Authority (CA). For KMM to trust that CA, the cluster must also trust it, which you achieve by replacing the cluster’s CA bundle. See "Additional resources" to learn how to replace the cluster’s CA bundle.
- 11: Optional: Avoid using this parameter. If set to true, the build will skip any TLS server certificate validation when pulling the image in the Dockerfile FROM instruction using plain HTTP.
- 12: Required.
- 13: Required: A secret holding the public Secure Boot key with the key 'cert'.
- 14: Required: A secret holding the private Secure Boot key with the key 'key'.
- 15: Optional: Avoid using this parameter. If set to true, KMM will be allowed to check if the container image already exists using plain HTTP.
- 16: Optional: Avoid using this parameter. If set to true, KMM will skip any TLS server certificate validation when checking if the container image already exists.
- 17: Optional.
- 18: Optional.
- 19: Required: If the device plugin section is present.
- 20: Optional.
- 21: Optional.
- 22: Optional.
- 23: Optional: Used to pull module loader and device plugin images.
4.8. Symbolic links for in-tree dependencies
				Some kernel modules depend on other kernel modules that are shipped with the node’s operating system. To avoid copying those dependencies into the kmod image, Kernel Module Management (KMM) mounts /usr/lib/modules into both the build and the worker pod’s filesystems.
			
If you create a symbolic link from /opt/usr/lib/modules/<kernel_version>/<symlink_name> to /usr/lib/modules/<kernel_version>, depmod can use the in-tree kmods on the building node’s filesystem to resolve dependencies.
			
				At runtime, the worker pod extracts the entire image, including the <symlink_name> symbolic link. That symbolic link points to /usr/lib/modules/<kernel_version> in the worker pod, which is mounted from the node’s filesystem. modprobe can then follow that link and load the in-tree dependencies as needed.
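The symbolic link mechanism can be reproduced in a scratch directory. The following Python sketch:

- builds a fake root with a stand-in in-tree module under usr/lib/modules/<kernel_version>,
- creates a host link under opt/usr/lib/modules/<kernel_version>,
- shows that a path through the link reaches the in-tree file.

The kernel version and module file are hypothetical. In a real image, the link target is the absolute path /usr/lib/modules/<kernel_version>. This sketch points at the scratch root instead so that it works on any machine that allows creating symbolic links.

```python
import os
import tempfile
from pathlib import Path

KERNEL = "5.14.0-284.30.1.el9_2.x86_64"  # hypothetical kernel version


def build_layout(root: Path) -> Path:
    """Create the in-tree tree and the "host" link; return the link path."""
    in_tree_dir = root / "usr/lib/modules" / KERNEL
    (in_tree_dir / "kernel/drivers/net").mkdir(parents=True)
    (in_tree_dir / "kernel/drivers/net/dep.ko").write_bytes(b"")  # stand-in in-tree kmod
    kmod_dir = root / "opt/usr/lib/modules" / KERNEL
    kmod_dir.mkdir(parents=True)
    link = kmod_dir / "host"
    # In a kmod image, this link would point at /usr/lib/modules/<kernel_version>.
    os.symlink(in_tree_dir, link, target_is_directory=True)
    return link


with tempfile.TemporaryDirectory() as tmp:
    link = build_layout(Path(tmp))
    print(link.is_symlink(), (link / "kernel/drivers/net/dep.ko").exists())
```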
			
				In the following example, host is the symbolic link name under /opt/usr/lib/modules/<kernel_version>:
			
					depmod generates dependency files based on the kernel modules present on the node that runs the kmod image build.
				
					On the node on which KMM loads the kernel modules, modprobe expects the files to be present under /usr/lib/modules/<kernel_version>, and the same filesystem layout. It is highly recommended that the build and the target nodes share the same operating system and release.
				
4.9. Creating a kmod image
				Kernel Module Management (KMM) works with purpose-built kmod images, which are standard OCI images that contain .ko files. The location of the .ko files must match the following pattern: <prefix>/lib/modules/[kernel-version]/.
			
				Keep the following in mind when working with the .ko files:
			
- In most cases, <prefix> should be equal to /opt. This is the Module CRD’s default value.
- kernel-version must not be empty and must be equal to the kernel version the kernel modules were built for.
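The path rules above can be checked mechanically. The following Go sketch builds the expected directory and reports whether a given .ko path falls under it. The function names are hypothetical. The sketch also treats subdirectories of that directory as valid, which is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// moduleDir returns the directory in which KMM looks for .ko files:
// <prefix>/lib/modules/<kernelVersion>/.
func moduleDir(prefix, kernelVersion string) (string, error) {
	if kernelVersion == "" {
		return "", fmt.Errorf("kernel version must not be empty")
	}
	return path.Join(prefix, "lib", "modules", kernelVersion) + "/", nil
}

// inModuleDir reports whether koPath is a .ko file at or below the expected
// directory. Image paths are slash-separated, so package path is used.
func inModuleDir(koPath, prefix, kernelVersion string) bool {
	dir, err := moduleDir(prefix, kernelVersion)
	if err != nil {
		return false
	}
	return strings.HasSuffix(koPath, ".ko") && strings.HasPrefix(path.Clean(koPath), dir)
}

func main() {
	// "/opt" is the Module CRD default prefix; the kernel version is illustrative.
	for _, p := range []string{
		"/opt/lib/modules/5.14.0-example/my_kmod.ko",
		"/lib/modules/5.14.0-example/my_kmod.ko",
	} {
		fmt.Printf("%-45s ok=%v\n", p, inModuleDir(p, "/opt", "5.14.0-example"))
	}
}
```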
4.9.1. Running depmod
					It is recommended to run depmod at the end of the build process to generate modules.dep and .map files. This is especially useful if your kmod image contains several kernel modules and if one of the modules depends on another module.
				
						You must have a Red Hat subscription to download the kernel-devel package.
					
Procedure
- Generate modules.dep and .map files for a specific kernel version by running the following command:

$ depmod -b /opt ${KERNEL_FULL_VERSION}
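If your build tooling is written in Go, you can assemble the same command with os/exec, as in the following minimal sketch. It assumes two things:
- depmod is on the PATH of the build environment.
- The kernel version comes from the KERNEL_FULL_VERSION environment variable used in the command above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// depmodCommand builds `depmod -b <baseDir> <kernelVersion>`, which writes
// modules.dep and the .map files under <baseDir>/lib/modules/<kernelVersion>.
func depmodCommand(baseDir, kernelVersion string) *exec.Cmd {
	cmd := exec.Command("depmod", "-b", baseDir, kernelVersion)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Same variable as in the shell command above.
	version := os.Getenv("KERNEL_FULL_VERSION")
	if version == "" {
		fmt.Fprintln(os.Stderr, "KERNEL_FULL_VERSION is not set")
		os.Exit(1)
	}
	if err := depmodCommand("/opt", version).Run(); err != nil {
		fmt.Fprintln(os.Stderr, "depmod failed:", err)
		os.Exit(1)
	}
}
```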
4.9.1.1. Example Dockerfile
If you are building your image on OpenShift Container Platform, consider using the Driver Toolkit (DTK).
For further information, see Using an entitled build.
4.9.2. Building in the cluster
KMM can build kmod images in the cluster. Follow these guidelines:
- Provide build instructions using the build section of a kernel mapping.
- Copy the Dockerfile for your container image into a ConfigMap resource, under the dockerfile key.
- Ensure that the ConfigMap is located in the same namespace as the Module.
					KMM checks if the image name specified in the containerImage field exists. If it does, the build is skipped.
				
					Otherwise, KMM creates a Build resource to build your image. After the image is built, KMM proceeds with the Module reconciliation. See the following example.
				
- 1
- Optional.
- 2
- Optional.
- 3
- Will be mounted in the build pod as /run/secrets/some-kubernetes-secret.
- 4
- Optional: Avoid using this parameter. If set to true, the build will be allowed to pull the image in the Dockerfile FROM instruction using plain HTTP.
- 5
- Optional: Avoid using this parameter. If set to true, the build will skip any TLS server certificate validation when pulling the image in the Dockerfile FROM instruction.
- 6
- Required.
- 7
- Optional: Avoid using this parameter. If set to true, KMM will be allowed to check if the container image already exists using plain HTTP.
- 8
- Optional: Avoid using this parameter. If set to true, KMM will skip any TLS server certificate validation when checking if the container image already exists.
					Successful build pods are garbage collected immediately, unless the job.gcDelay parameter is set in the Operator configuration. Failed build pods are always preserved and must be deleted manually by the administrator for the build to be restarted.
				
4.9.3. Using the Driver Toolkit
The Driver Toolkit (DTK) is a convenient base image for building kmod loader images. It contains tools and libraries for the OpenShift version currently running in the cluster.
Procedure
Use DTK as the first stage of a multi-stage Dockerfile.
- Build the kernel modules.
- Copy the .ko files into a smaller end-user image such as ubi-minimal.
- To leverage DTK in your in-cluster build, use the DTK_AUTO build argument. The value is automatically set by KMM when creating the Build resource. See the following example.
4.10. Using signing with Kernel Module Management (KMM)
On a Secure Boot enabled system, all kernel modules (kmods) must be signed with a public/private key pair whose public key is enrolled in the Machine Owner’s Key (MOK) database. Drivers distributed as part of a distribution should already be signed by the distribution’s private key. For kernel modules built out-of-tree, KMM supports signing them using the sign section of the kernel mapping.

For more details on using Secure Boot, see Generating a public and private key pair.
Prerequisites
- A public/private key pair in the correct (DER) format.
- At least one Secure Boot enabled node with the public key enrolled in its MOK database.
- Either a pre-built driver container image, or the source code and Dockerfile needed to build one in-cluster.
4.11. Adding the keys for secureboot
To use Kernel Module Management (KMM) to sign kernel modules, a certificate and private key are required. For details on how to create these, see Generating a public and private key pair.
For details on how to extract the public and private key pair, see Signing kernel modules with the private key. Use steps 1 through 4 to extract the keys into files.
Procedure
- Create the my_signing_key_pub.der file that contains the certificate and the my_signing_key.priv file that contains the private key:

$ openssl req -x509 -new -nodes -utf8 -sha256 -days 36500 -batch -config configuration_file.config -outform DER -out my_signing_key_pub.der -keyout my_signing_key.priv
- Add the files by using one of the following methods:
  - Add the files as secrets directly:

    $ oc create secret generic my-signing-key --from-file=key=<my_signing_key.priv>
    $ oc create secret generic my-signing-key-pub --from-file=cert=<my_signing_key_pub.der>
  - Add the files by base64 encoding them:

    $ cat my_signing_key.priv | base64 -w 0 > my_signing_key2.base64
    $ cat my_signing_key_pub.der | base64 -w 0 > my_signing_key_pub.base64
 
  - Add the encoded text to a YAML file.
  - Apply the YAML file:

    $ oc apply -f <yaml_filename>
4.11.1. Checking the keys
After you have added the keys, you must check them to ensure they are set correctly.
Procedure
- Check to ensure the public key secret is set correctly:

$ oc get secret -o yaml <certificate secret name> | awk '/cert/{print $2; exit}' | base64 -d | openssl x509 -inform der -text

This should display a certificate with a Serial Number, Issuer, Subject, and more.
- Check to ensure the private key secret is set correctly:

$ oc get secret -o yaml <private key secret name> | awk '/key/{print $2; exit}' | base64 -d

This should display the key enclosed in the -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY----- lines.
4.12. Signing kmods in a pre-built image
Use this procedure if you have a pre-built image, such as an image either distributed by a hardware vendor or built elsewhere.
The following YAML file adds the public/private key pair as secrets with the required key names: key for the private key and cert for the public key. The cluster then pulls down the unsignedImage image, opens it, signs the kernel modules listed in filesToSign, adds them back, and pushes the resulting image as containerImage.

KMM then loads the signed kmods onto all the nodes that match the selector. The kmods are successfully loaded on any nodes that have the public key in their MOK database. They also load on any nodes that are not Secure Boot enabled, because those nodes ignore the signature.
Prerequisites
- The keySecret and certSecret secrets have been created in the same namespace as the rest of the resources.
Procedure
- Apply the YAML file.
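The loading rule in this section reduces to a single condition. In the following Go sketch, the nodeState type and its fields are hypothetical simplifications, not KMM API types.

```go
package main

import "fmt"

// nodeState is a hypothetical, simplified view of a node for this sketch.
type nodeState struct {
	name       string
	secureBoot bool // Secure Boot enabled
	keyInMOK   bool // signing public key enrolled in the node's MOK database
}

// canLoadSignedKmod applies the rule above: nodes without Secure Boot ignore
// the signature; Secure Boot nodes need the public key in their MOK database.
func canLoadSignedKmod(n nodeState) bool {
	return !n.secureBoot || n.keyInMOK
}

func main() {
	nodes := []nodeState{
		{"worker-0", true, true},
		{"worker-1", true, false},
		{"worker-2", false, false},
	}
	for _, n := range nodes {
		fmt.Printf("%s: loads=%v\n", n.name, canLoadSignedKmod(n))
	}
}
```

Consider a node like worker-1, which has Secure Boot enabled but does not have the key enrolled. That is the case that leads to the "Required key not available" error described under debugging and troubleshooting.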
4.13. Building and signing a kmod image
Use this procedure if you have source code and must build your image first.
				The following YAML file builds a new container image using the source code from the repository. The image produced is saved back in the registry with a temporary name, and this temporary image is then signed using the parameters in the sign section.
			
				The temporary image name is based on the final image name and is set to be <containerImage>:<tag>-<namespace>_<module name>_kmm_unsigned.
			
				For example, using the following YAML file, Kernel Module Management (KMM) builds an image named example.org/repository/minimal-driver:final-default_example-module_kmm_unsigned containing the build with unsigned kmods and pushes it to the registry. Then it creates a second image named example.org/repository/minimal-driver:final that contains the signed kmods. It is this second image that is pulled by the worker pods and contains the kmods to be loaded on the cluster nodes.
			
After it is signed, you can safely delete the temporary image from the registry. It will be rebuilt, if needed.
Prerequisites
- The keySecret and certSecret secrets have been created in the same namespace as the rest of the resources.
Procedure
- Apply the YAML file.
- 1 2
- Replace default with a valid namespace.
- 3
- The default serviceAccountName does not have the required permissions to run a module that is privileged. For information on creating a service account, see "Creating service accounts" in the "Additional resources" of this section.
- 4
- Used as imagePullSecrets in the DaemonSet object and to pull and push for the build and sign features.
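The temporary name follows a fixed pattern, so it can be derived from the final image name. The following Go sketch uses a hypothetical function name. It assumes containerImage is a tag reference, and it rejects digest references and names without a tag. With the values from the example above, it prints example.org/repository/minimal-driver:final-default_example-module_kmm_unsigned.

```go
package main

import (
	"fmt"
	"strings"
)

// unsignedImageName derives the temporary image name
// <containerImage>:<tag>-<namespace>_<module name>_kmm_unsigned,
// assuming containerImage is a tag reference (not a digest).
func unsignedImageName(containerImage, namespace, moduleName string) (string, error) {
	if strings.Contains(containerImage, "@") {
		return "", fmt.Errorf("%q is a digest reference, expected a tag", containerImage)
	}
	// The tag follows the last colon only if that colon comes after the last
	// slash; a colon before it belongs to a registry port such as :5000.
	if strings.LastIndex(containerImage, ":") <= strings.LastIndex(containerImage, "/") {
		return "", fmt.Errorf("%q has no tag", containerImage)
	}
	return fmt.Sprintf("%s-%s_%s_kmm_unsigned", containerImage, namespace, moduleName), nil
}

func main() {
	name, err := unsignedImageName("example.org/repository/minimal-driver:final", "default", "example-module")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name)
}
```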
4.14. KMM hub and spoke
In hub and spoke scenarios, many spoke clusters are connected to a central, powerful hub cluster. Kernel Module Management (KMM) depends on Red Hat Advanced Cluster Management (RHACM) to operate in hub and spoke environments.
				KMM is compatible with hub and spoke environments through decoupling KMM features. A ManagedClusterModule custom resource definition (CRD) is provided to wrap the existing Module CRD and extend it to select Spoke clusters. Also provided is KMM-Hub, a new standalone controller that builds images and signs modules on the hub cluster.
			
In hub and spoke setups, spokes are focused, resource-constrained clusters that are centrally managed by a hub cluster. Spokes run the single-cluster edition of KMM, with those resource-intensive features disabled. To adapt KMM to this environment, you should reduce the workload running on the spokes to the minimum, while the hub takes care of the expensive tasks.
Building kernel module images and signing the .ko files should run on the hub. The scheduling of the Module Loader and Device Plugin DaemonSets can only happen on the spokes.
			
4.14.1. KMM-Hub
The KMM project provides KMM-Hub, an edition of KMM dedicated to hub clusters. KMM-Hub monitors all kernel versions running on the spokes and determines the nodes on the cluster that should receive a kernel module.
					KMM-Hub runs all compute-intensive tasks such as image builds and kmod signing, and prepares the trimmed-down Module to be transferred to the spokes through RHACM.
				
KMM-Hub cannot be used to load kernel modules on the hub cluster. Install the regular edition of KMM to load kernel modules.
4.14.2. Installing KMM-Hub
You can use one of the following methods to install KMM-Hub:
- With the Operator Lifecycle Manager (OLM)
- By creating KMM resources
4.14.2.1. Installing KMM-Hub using the Operator Lifecycle Manager
Use the Operators section of the OpenShift console to install KMM-Hub.
4.14.2.2. Installing KMM-Hub by creating KMM resources
Procedure
- If you want to install KMM-Hub programmatically, you can use the following resources to create the Namespace, OperatorGroup and Subscription resources:
4.14.3. Using the ManagedClusterModule CRD
					Use the ManagedClusterModule Custom Resource Definition (CRD) to configure the deployment of kernel modules on spoke clusters. This CRD is cluster-scoped, wraps a Module spec and adds the following additional fields:
				
					If build or signing instructions are present in .spec.moduleSpec, those pods are run on the hub cluster in the operator’s namespace.
				
When the .spec.selector matches one or more ManagedCluster resources, KMM-Hub creates a ManifestWork resource in the corresponding namespace(s). ManifestWork contains a trimmed-down Module resource, with kernel mappings preserved but all build and sign subsections removed. containerImage fields that contain image names ending with a tag are replaced with their digest equivalent.
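The following Go sketch makes the trimming concrete. It drops the build and sign subsections and replaces tagged image names with digest references. The types are heavily simplified stand-ins for the Module API. The resolve function, which would look up a digest in the registry, is hypothetical. The sketch only illustrates the transformation described above, not how KMM-Hub implements it.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, heavily simplified stand-ins for the Module API types.
type buildSpec struct{ dockerfileConfigMap string }
type signSpec struct{ keySecret, certSecret string }

type kernelMapping struct {
	regexp         string
	containerImage string
	build          *buildSpec
	sign           *signSpec
}

// tagged reports whether image ends with a tag rather than a digest.
func tagged(image string) bool {
	return !strings.Contains(image, "@") &&
		strings.LastIndex(image, ":") > strings.LastIndex(image, "/")
}

// trimForSpoke keeps the kernel mappings, drops build and sign, and replaces
// tag references with digest references using resolve (a hypothetical
// registry lookup that returns a digest such as "sha256:...").
func trimForSpoke(in []kernelMapping, resolve func(image string) (string, error)) ([]kernelMapping, error) {
	out := make([]kernelMapping, 0, len(in))
	for _, km := range in {
		img := km.containerImage
		if tagged(img) {
			digest, err := resolve(img)
			if err != nil {
				return nil, fmt.Errorf("resolving %s: %w", img, err)
			}
			img = img[:strings.LastIndex(img, ":")] + "@" + digest
		}
		out = append(out, kernelMapping{regexp: km.regexp, containerImage: img})
	}
	return out, nil
}

func main() {
	hub := []kernelMapping{{
		regexp:         `^.+$`,
		containerImage: "example.org/repository/minimal-driver:final",
		build:          &buildSpec{dockerfileConfigMap: "my-dockerfile"},
		sign:           &signSpec{keySecret: "my-signing-key", certSecret: "my-signing-key-pub"},
	}}
	fakeResolve := func(string) (string, error) { return "sha256:0123abcd", nil }
	spoke, err := trimForSpoke(hub, fakeResolve)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", spoke[0])
}
```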
				
4.14.4. Running KMM on the spoke
					After installing Kernel Module Management (KMM) on the spoke, no further action is required. Create a ManagedClusterModule object from the hub to deploy kernel modules on spoke clusters.
				
Procedure
You can install KMM on the spoke clusters through a RHACM Policy object. The Policy installs KMM from the OperatorHub and runs it in a lightweight spoke mode. It also configures additional RBAC required for the RHACM agent to be able to manage Module resources.
					
4.15. Customizing upgrades for kernel modules
Use this procedure to upgrade the kernel module while running maintenance operations on the node, including rebooting the node, if needed. To minimize the impact on the workloads running in the cluster, run the kernel upgrade process sequentially, one node at a time.
This procedure requires knowledge of the workload utilizing the kernel module and must be managed by the cluster administrator.
Prerequisites
- Before upgrading, set the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=$moduleVersion label on all the nodes that are used by the kernel module.
- Terminate all user application workloads on the node or move them to another node.
- Unload the currently loaded kernel module.
- Ensure that the user workload (the application running in the cluster that is accessing kernel module) is not running on the node prior to kernel module unloading and that the workload is back running on the node after the new kernel module version has been loaded.
Procedure
- Ensure that the device plugin managed by KMM on the node is unloaded.
- Update the following fields in the Module custom resource (CR):
  - containerImage (to the appropriate kernel version)
  - version
  The update should be atomic; that is, both the containerImage and version fields must be updated simultaneously.
- Terminate any workload using the kernel module on the node being upgraded.
- Remove the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name> label on the node. Run the following command to unload the kernel module from the node:

$ oc label node/<node_name> kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>-
- If required, as the cluster administrator, perform any additional maintenance required on the node for the kernel module upgrade. If no additional upgrading is needed, you can skip Steps 3 through 6 by updating the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name> label value to the new $moduleVersion as set in the Module.
- Run the following command to add the kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=$moduleVersion label to the node. The $moduleVersion must be equal to the new value of the version field in the Module CR.

$ oc label node/<node_name> kmm.node.kubernetes.io/version-module.<module_namespace>.<module_name>=<desired_version>

Note: Because of Kubernetes limitations in label names, the combined length of Module name and namespace must not exceed 39 characters.
- Restore any workload that leverages the kernel module on the node.
- Reload the device plugin managed by KMM on the node.
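The label key used throughout this procedure is built from the Module namespace and name. The following Go sketch assembles the key, enforces the documented 39-character limit, and prints the corresponding oc label commands. The node name, Module name, namespace and version are placeholders.

```go
package main

import "fmt"

const versionLabelPrefix = "kmm.node.kubernetes.io/version-module."

// versionLabel builds the per-module node label key used for ordered upgrades
// and enforces the documented 39-character limit on name plus namespace.
func versionLabel(namespace, name string) (string, error) {
	if n := len(namespace) + len(name); n > 39 {
		return "", fmt.Errorf("namespace and module name total %d characters; the limit is 39", n)
	}
	return versionLabelPrefix + namespace + "." + name, nil
}

func main() {
	// Placeholder node, namespace, module, and version values.
	key, err := versionLabel("default", "example-module")
	if err != nil {
		fmt.Println(err)
		return
	}
	node, version := "worker-0", "2"
	fmt.Printf("unload: oc label node/%s %s-\n", node, key)
	fmt.Printf("load:   oc label node/%s %s=%s\n", node, key, version)
}
```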
4.16. Day 1 kernel module loading
				Kernel Module Management (KMM) is typically a Day 2 Operator. Kernel modules are loaded only after the complete initialization of a Linux (RHCOS) server. However, in some scenarios the kernel module must be loaded at an earlier stage. Day 1 functionality allows you to use the Machine Config Operator (MCO) to load kernel modules during the Linux systemd initialization stage.
			
4.16.1. Day 1 supported use cases
The Day 1 functionality supports a limited number of use cases. The main use case is to allow loading out-of-tree (OOT) kernel modules prior to NetworkManager service initialization. It does not support loading kernel modules at the initramfs stage.
				
The following are the conditions needed for Day 1 functionality:
- The kernel module is not loaded in the kernel.
- The in-tree kernel module is loaded into the kernel, but can be unloaded and replaced by the OOT kernel module. This means that the in-tree module is not referenced by any other kernel modules.
- For Day 1 functionality to work, the node must have a functional network interface, that is, an in-tree kernel driver for that interface. The OOT kernel module can be a network driver that will replace the functional network driver.
4.16.2. OOT kernel module loading flow
The loading of the out-of-tree (OOT) kernel module leverages the Machine Config Operator (MCO). The flow sequence is as follows:
Procedure
- Apply a MachineConfig resource to the existing running cluster. To identify the nodes that need to be updated, you must create an appropriate MachineConfigPool resource.
- MCO applies the configuration and reboots the nodes one by one. On each rebooted node, two new systemd services are deployed: the pull service and the load service.
- The load service is configured to run before the NetworkConfiguration service. The service tries to pull a predefined kernel module image. It then uses that image to unload an in-tree module and load an OOT kernel module.
- The pull service is configured to run after the NetworkManager service. The service checks if the preconfigured kernel module image is located on the node’s filesystem. If it is, the service exits normally, and the server continues with the boot process. If not, it pulls the image onto the node and reboots the node afterwards.
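The following Go sketch models the boot-time behavior of the two services as plain decision functions. It is an interpretation of the flow above, not the actual service units. In particular, it assumes that the load service does nothing when the image is not yet on the node; the pull service handles that case by pulling the image and rebooting. The function names and the module name my_kmod are hypothetical.

```go
package main

import "fmt"

// loadServiceSteps sketches the load service, which runs before networking is
// up. It can only use an image that is already on the node; if the image is
// missing it does nothing and the pull service fetches it later.
func loadServiceSteps(imageOnNode, inTreeLoaded bool, module string) []string {
	if !imageOnNode {
		return nil
	}
	var steps []string
	if inTreeLoaded {
		steps = append(steps, "unload in-tree "+module)
	}
	return append(steps, "load OOT "+module+" from the image")
}

// pullServiceAction sketches the pull service, which runs after NetworkManager.
func pullServiceAction(imageOnNode bool) string {
	if imageOnNode {
		return "exit normally and continue booting"
	}
	return "pull the image, then reboot the node"
}

func main() {
	// First boot: image absent, so load does nothing and pull triggers a reboot.
	fmt.Println(loadServiceSteps(false, true, "my_kmod"), pullServiceAction(false))
	// Second boot: image present, so the OOT module replaces the in-tree one.
	fmt.Println(loadServiceSteps(true, true, "my_kmod"), pullServiceAction(true))
}
```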
4.16.3. The kernel module image
					The Day 1 functionality uses the same DTK based image leveraged by Day 2 KMM builds. The out-of-tree kernel module should be located under /opt/lib/modules/${kernelVersion}.
				
4.16.4. In-tree module replacement
The Day 1 functionality always tries to replace the in-tree kernel module with the OOT version. If the in-tree kernel module is not loaded, the flow is not affected; the service proceeds and loads the OOT kernel module.
4.16.5. MCO yaml creation
KMM provides an API to create an MCO YAML manifest for the Day 1 functionality:
ProduceMachineConfig(machineConfigName, machineConfigPoolRef, kernelModuleImage, kernelModuleName string) (string, error)

The returned output is a string representation of the MCO YAML manifest to be applied. It is up to the customer to apply this YAML.
The parameters are:
- machineConfigName
- The name of the MCO YAML manifest. This parameter is set as the name parameter of the metadata of the MCO YAML manifest.
- machineConfigPoolRef
- The MachineConfigPool name used to identify the targeted nodes.
- kernelModuleImage
- The name of the container image that includes the OOT kernel module.
- kernelModuleName
- The name of the OOT kernel module. This parameter is used both to unload the in-tree kernel module (if loaded into the kernel) and to load the OOT kernel module.
The API is located under the pkg/mcproducer package of the KMM source code. The KMM Operator does not need to be running to use the Day 1 functionality. You only need to import the pkg/mcproducer package into your operator or utility code, call the API, and apply the produced MCO YAML to the cluster.
				
4.16.6. The MachineConfigPool
					The MachineConfigPool identifies a collection of nodes that are affected by the applied MCO.
				
					There are predefined MachineConfigPools in the OCP cluster:
				
- worker: Targets all worker nodes in the cluster
- master: Targets all master nodes in the cluster
					Define the following MachineConfig to target the master MachineConfigPool:
				
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
					Define the following MachineConfig to target the worker MachineConfigPool:
				
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
4.17. Debugging and troubleshooting
				If the kmods in your driver container are not signed or are signed with the wrong key, then the container can enter a PostStartHookError or CrashLoopBackOff status. You can verify by running the oc describe command on your container, which displays the following message in this scenario:
			
modprobe: ERROR: could not insert '<your_kmod_name>': Required key not available
4.18. KMM firmware support
Kernel modules sometimes need to load firmware files from the file system. KMM supports copying firmware files from the kmod image to the node’s file system.
				The contents of .spec.moduleLoader.container.modprobe.firmwarePath are copied into the /var/lib/firmware path on the node before running the modprobe command to insert the kernel module.
			
				All files and empty directories are removed from that location before running the modprobe -r command to unload the kernel module, when the pod is terminated.
			
4.18.2. Building a kmod image
Procedure
- In addition to building the kernel module itself, include the binary firmware in the builder image.
4.18.3. Tuning the Module resource
Procedure
- Set .spec.moduleLoader.container.modprobe.firmwarePath in the Module custom resource (CR).
- 1
- Optional: Copies /firmware/* into /var/lib/firmware/ on the node.
 
4.19. Day 0 through Day 2 kmod installation
You can install some kernel modules (kmods) during Day 0 through Day 2 operations without Kernel Module Management (KMM). This could assist in the transition of the kmods to KMM.
Use the following criteria to determine suitable kmod installations.
- Day 0
- The most basic kmods that are required for a node to become Ready in the cluster. Examples of these types of kmods include:
  - A storage driver that is required to mount the rootFS as part of the boot process
  - A network driver that is required for the machine to access machine-config-server on the bootstrap node to pull the ignition and join the cluster
 
- Day 1
- Kmods that are not required for a node to become Ready in the cluster but cannot be unloaded when the node is Ready. An example of this type of kmod is an out-of-tree (OOT) network driver that NetworkManager depends on. It replaces an outdated in-tree driver to exploit the full potential of the NIC. When the node is Ready, you cannot unload the driver because of the NetworkManager dependency.
- Day 2
- Kmods that can be dynamically loaded to the kernel or removed from it without interfering with the cluster infrastructure, for example, connectivity. Examples of these types of kmods include:
  - GPU operators
  - Secondary network adapters
  - Field-programmable gate arrays (FPGAs)
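The three criteria come down to two questions about a kmod:
- Is it required for the node to become Ready?
- Can it be unloaded once the node is Ready?

The following Go sketch encodes those questions. The type and function names are illustrative, and the example kmods are taken from the list above.

```go
package main

import "fmt"

type installDay int

const (
	day0 installDay = iota
	day1
	day2
)

func (d installDay) String() string {
	return [...]string{"Day 0", "Day 1", "Day 2"}[d]
}

// classify applies the criteria above to one kmod.
func classify(requiredForReady, unloadableWhenReady bool) installDay {
	switch {
	case requiredForReady:
		return day0
	case !unloadableWhenReady:
		return day1
	default:
		return day2
	}
}

func main() {
	examples := []struct {
		name                         string
		requiredForReady, unloadable bool
	}{
		{"storage driver for the rootFS", true, false},
		{"OOT NIC driver that NetworkManager depends on", false, false},
		{"secondary network adapter driver", false, true},
	}
	for _, e := range examples {
		fmt.Printf("%-48s %v\n", e.name, classify(e.requiredForReady, e.unloadable))
	}
}
```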
 
4.19.1. Layering background
When a Day 0 kmod is installed in the cluster, layering is applied through the Machine Config Operator (MCO) and OpenShift Container Platform upgrades do not trigger node upgrades.
You only need to recompile the driver if you add new features to it, because the node’s operating system will remain the same.
4.19.2. Lifecycle management
You can leverage KMM to manage the Day 0 through Day 2 lifecycle of kmods without a reboot when the driver allows it.
						This will not work if the upgrade requires a node reboot, for example, when rebuilding initramfs files is needed.
					
Use one of the following options for lifecycle management.
4.19.2.1. Treat the kmod as an in-tree driver
						Use this method when you want to upgrade the kmods. In this case, treat the kmod as an in-tree driver and create a Module in the cluster with the inTreeRemoval field to unload the old version of the driver.
					
Note the following characteristics of treating the kmod as an in-tree driver:
- Downtime might occur as KMM tries to unload and load the kmod on all the selected nodes simultaneously.
- This works if removing the driver makes the node lose connectivity because KMM uses a single pod to unload and load the driver.
4.19.2.2. Use ordered upgrade
You can use ordered upgrade (see "Customizing upgrades for kernel modules") to create a versioned Module in the cluster that represents the kmods. Creating this Module has no effect, because the kmods are already loaded.
					
Note the following characteristics of using ordered upgrade:
- There is no cluster downtime because you control the pace of the upgrade and how many nodes are upgraded at the same time.
- This method will not work if unloading the driver results in losing connection to the node, because KMM creates two different worker pods: one for unloading and another for loading. These pods will not be scheduled.
4.20. Troubleshooting KMM
When troubleshooting KMM installation issues, you can monitor logs to determine at which stage issues occur. Then, retrieve diagnostic data relevant to that stage.
4.20.1. Reading Operator logs
					You can use the oc logs command to read Operator logs, as in the following examples.
				
Example command for KMM controller
$ oc logs -fn openshift-kmm deployments/kmm-operator-controller

Example command for KMM webhook server

$ oc logs -fn openshift-kmm deployments/kmm-operator-webhook-server

Example command for KMM-Hub controller

$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-controller

Example command for KMM-Hub webhook server
$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-webhook-server
4.20.2. Observing events
Use the following methods to view KMM events.
Build & sign
KMM publishes events whenever it starts a kmod image build or observes its outcome. These events are attached to Module objects and are available at the end of the output of the oc describe module command, as in the following example:

Module load or unload
KMM publishes events whenever it successfully loads or unloads a kernel module on a node. These events are attached to Node objects and are available at the end of the output of the oc describe node command, as in the following example:
				
4.20.3. Using the must-gather tool
					The oc adm must-gather command is the preferred way to collect a support bundle and provide debugging information to Red Hat Support. Collect specific information by running the command with the appropriate arguments as described in the following sections.
				
4.20.3.1. Gathering data for KMM
Procedure
- Gather the data for the KMM Operator controller manager:
  - Set the MUST_GATHER_IMAGE variable:

    $ export MUST_GATHER_IMAGE=$(oc get deployment -n openshift-kmm kmm-operator-controller -ojsonpath='{.spec.template.spec.containers[?(@.name=="manager")].env[?(@.name=="RELATED_IMAGE_MUST_GATHER")].value}')

    Note: Use the -n <namespace> switch to specify a namespace if you installed KMM in a custom namespace.
- Run the must-gather tool:

$ oc adm must-gather --image="${MUST_GATHER_IMAGE}" -- /usr/bin/gather
 
- View the Operator logs:

$ oc logs -fn openshift-kmm deployments/kmm-operator-controller

Example 4.1. Example output
4.20.3.2. Gathering data for KMM-Hub
Procedure
- Gather the data for the KMM Operator hub controller manager:
  - Set the MUST_GATHER_IMAGE variable:

    $ export MUST_GATHER_IMAGE=$(oc get deployment -n openshift-kmm-hub kmm-operator-hub-controller -ojsonpath='{.spec.template.spec.containers[?(@.name=="manager")].env[?(@.name=="RELATED_IMAGE_MUST_GATHER")].value}')

    Note: Use the -n <namespace> switch to specify a namespace if you installed KMM in a custom namespace.
- Run the must-gather tool:

$ oc adm must-gather --image="${MUST_GATHER_IMAGE}" -- /usr/bin/gather -u
 
- View the Operator logs:

$ oc logs -fn openshift-kmm-hub deployments/kmm-operator-hub-controller

Example 4.2. Example output
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.