Chapter 5. Kernel Module Management Operator release notes
5.1. Release notes for Kernel Module Management Operator 2.2
5.1.1. New features
- KMM now uses the CRI-O container engine to pull container images in the worker pod instead of making HTTP calls directly from the worker container. For more information, see Example Module CR.
- The Kernel Module Management (KMM) Operator images are now based on `rhel-els-minimal` container images instead of the `rhel-els` images. This change results in a greatly reduced image footprint, while still maintaining FIPS compliance.
- In this release, the firmware search path has been updated to copy the contents of the specified path into the path specified in `worker.setFirmwareClassPath` (default: `/var/lib/firmware`). For more information, see Example Module CR and the sketch after this list.
- For each node running a kernel matching the regular expression, KMM now checks whether you have included a tag or a digest. If you have not specified a tag or digest in the container image, the validation webhook returns an error and does not apply the module. For more information, see Example Module CR and the sketch after this list.
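The following is a minimal sketch of a `Module` CR that illustrates both items above: the `containerImage` carries an explicit tag so the validation webhook accepts it, and `modprobe.firmwarePath` points to the directory in the image whose contents KMM copies into the node path set by `worker.setFirmwareClassPath`. The module name and image are hypothetical placeholders.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod                  # hypothetical module name
  namespace: openshift-kmm
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod
        # Directory inside the kmod image; its contents are copied into the
        # path set by worker.setFirmwareClassPath (default: /var/lib/firmware).
        firmwarePath: /firmware
      kernelMappings:
        - regexp: '^.+$'
          # An explicit tag (or digest) is required; an image reference
          # without one is rejected by the validation webhook.
          containerImage: quay.io/example/my-kmod:${KERNEL_FULL_VERSION}
  selector:
    node-role.kubernetes.io/worker: ""
```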
5.2. Release notes for Kernel Module Management Operator 2.3
5.2.1. New features
- In this release, KMM uses version 1.23 of the Golang programming language to ensure test continuity for partners.
- You can now schedule KMM pods by defining taints and tolerations. For more information, see Using tolerations for kernel module scheduling and the sketch after this list.
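A minimal sketch of what such scheduling configuration can look like, assuming tolerations are set in the `Module` spec as described in that section; the taint key, value, and images below are hypothetical placeholders.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod
  namespace: openshift-kmm
spec:
  # Tolerations let KMM pods be scheduled onto nodes carrying matching taints.
  tolerations:
    - key: example.com/dedicated      # hypothetical taint key
      operator: Equal
      value: kmm
      effect: NoSchedule
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod
      kernelMappings:
        - regexp: '^.+$'
          containerImage: quay.io/example/my-kmod:${KERNEL_FULL_VERSION}
  selector:
    node-role.kubernetes.io/worker: ""
```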
5.3. Release notes for Kernel Module Management Operator 2.4
5.3.1. New features and enhancements
- In this release, you can configure the Kernel Module Management (KMM) module to skip loading an out-of-tree kernel driver, use the in-tree driver instead, and run only the device plugin. For more information, see Using in-tree modules with the device plugin and the sketch at the end of this list.
- In this release, KMM configurations persist across cluster and KMM Operator upgrades and redeployments of KMM. In earlier releases, a cluster or KMM upgrade, or any other action that redeploys KMM, such as changing a non-default configuration like the firmware path, could require you to reconfigure KMM. KMM configurations now remain persistent regardless of such actions. For more information, see Configuring the Kernel Module Management Operator.
- Improvements have been added to KMM so that GPU Operator vendors do not need to replicate KMM functionality in their code, but can instead use KMM as is. This change greatly reduces the size of those Operators' code and improves their tests and reliability.
- In this release, KMM no longer uses direct HTTP(S) requests to check if a kmod image exists. Instead, CRI-O is used internally to check for the images. This removes the need to access container image registries directly with HTTP(S) requests and to manually handle tasks such as reading `/etc/containers/registries.conf` for mirroring configuration, accessing the image cluster resource for TLS configuration, mounting the CAs from the node, and maintaining your own cache in hub and spoke environments.
- The KMM and KMM-hub Operators have been assigned the "Meets Best Practices" label in the Red Hat Catalog.
- You can now install KMM on compute nodes, if needed. Previously, it was not possible to deploy workloads on the control-plane nodes. Because the compute nodes do not have the `node-role.kubernetes.io/control-plane` or `node-role.kubernetes.io/master` labels, the Kernel Module Management Operator might need further configuration. An internal code change has resolved this issue.
- In this release, the heartbeat filter for the NMC reconciler has been updated to filter the following events on nodes, while still filtering out heartbeats:
  - `node.spec`
  - `metadata.labels`
  - `status.nodeInfo`
  - `status.conditions[]` (`NodeReady` only)
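For the in-tree case in the first item of this list, the following is a minimal sketch of a `Module` that runs only the device plugin and loads no out-of-tree driver; the plugin image is a hypothetical placeholder, and the exact field requirements are described in Using in-tree modules with the device plugin.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: in-tree-example            # hypothetical name
  namespace: openshift-kmm
spec:
  # No moduleLoader section: the in-tree kernel driver is used as is,
  # and KMM manages only the device plugin.
  devicePlugin:
    container:
      image: quay.io/example/device-plugin:v1.0.0   # hypothetical image
  selector:
    node-role.kubernetes.io/worker: ""
```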
5.3.2. Notable technical changes
- In this release, the preflight validation resource in the cluster has been modified. You can use preflight validation to verify that kernel modules can be installed on the nodes after cluster upgrades and possible kernel upgrades. Preflight validation also reports on the status and progress of each module in the cluster that it attempts or has attempted to validate. For more information, see Preflight validation for Kernel Module Management (KMM) Modules.
- When creating a kmod image, both the `.ko` kernel module files and the `cp` binary must be included in the image. The `cp` binary is required for copying files during the image loading process. For more information, see Creating a kmod image.
- The `capabilities` field that refers to the Operator maturity level has been changed from `Basic Install` to `Seamless upgrades`. `Basic Install` indicates that the Operator does not have an upgrade option. This is not the case for KMM, where seamless upgrades are supported.
5.3.3. Bug fixes
The webhook deployment has been renamed from `webhook-server` to `webhook`.
- Cause: Generating files with `controller-gen` generated a service called `webhook-service` that is not configurable. In addition, when deploying KMM with Operator Lifecycle Manager (OLM), OLM deploys a service for the webhook with a `-service` suffix.
- Consequence: Two services were generated for the same deployment: one generated by `controller-gen` and added to the bundle manifests, and the other created by OLM.
- Fix: Make OLM find the already existing service called `webhook-service` in the cluster, because the deployment is called `webhook`.
- Result: A second service is no longer created.
Using the `imageRepoSecret` object in conjunction with DTK as the image stream results in an `authorization required` error.
- Cause: On the Kernel Module Management (KMM) Operator, when you set the `imageRepoSecret` object in the KMM module and the build's resulting container image is defined to be stored in the cluster's internal registry, the build fails to push the final image and generates an `authorization required` error.
- Consequence: The KMM Operator does not work as expected.
- Fix: When the `imageRepoSecret` object is user-defined, it is used as both a pull and a push secret by the build process. To support using the cluster's internal registry, you must add the authorization token for that registry to the `imageRepoSecret` object. You can obtain the token from the "build" service account of the KMM module's namespace.
- Result: The KMM Operator works as expected.
Creating or deleting the image, or creating an MCM module, does not load the module on the spoke.
- Cause: In a hub and spoke environment, when creating or deleting the image in the registry, or when creating a `ManagedClusterModule` (MCM), the module on the spoke cluster is not loaded.
- Consequence: The module on the spoke is not created.
- Fix: Remove the cache package and image translation from the hub and spoke environment.
- Result: The module on the spoke is created, even when the MCM object is created for a second time.
KMM cannot pull images from a private registry while doing in-cluster builds.
- Cause: The Kernel Module Management (KMM) Operator cannot pull images from a private registry while doing in-cluster builds.
- Consequence: Images in private registries that are used in the build process cannot be pulled.
- Fix: The `imageRepoSecret` object configuration is now also used in the build process. The `imageRepoSecret` object specified must include all registries that are being used, as shown in the sketch after this list.
- Result: You can now use private registries when doing in-cluster builds.
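As a sketch of the fixed behavior, the secret referenced by `imageRepoSecret` is used for both pulling and pushing during in-cluster builds, so it must hold credentials for every registry involved; the secret and ConfigMap names below are hypothetical.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod
  namespace: openshift-kmm
spec:
  # Used as both the pull secret and the push secret during in-cluster builds.
  # It must include credentials for all registries that the build touches.
  imageRepoSecret:
    name: my-registry-credentials        # hypothetical secret name
  moduleLoader:
    container:
      modprobe:
        moduleName: my-kmod
      kernelMappings:
        - regexp: '^.+$'
          containerImage: quay.io/example/my-kmod:${KERNEL_FULL_VERSION}
          build:
            dockerfileConfigMap:
              name: my-kmod-dockerfile   # hypothetical ConfigMap holding the build Dockerfile
  selector:
    node-role.kubernetes.io/worker: ""
```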
A KMM worker pod is orphaned when deleting a module with a container image that cannot be pulled.
- Cause: A Kernel Module Management (KMM) Operator worker pod is orphaned when deleting a module with a container image that cannot be pulled.
- Consequence: Failing worker pods are left on the cluster and are never garbage collected.
- Fix: KMM now garbage collects orphaned failing pods upon module deletion.
- Result: The module is successfully deleted, and all associated orphaned failing pods are also deleted.
The KMM Operator tries to create a MIC even when the node selector does not match.
- Cause: The Kernel Module Management (KMM) Operator tries to create a `ModuleImagesConfig` (MIC) resource even when the node selector does not match any actual nodes, and fails.
- Consequence: The KMM Operator reports an error when reconciling a module that does not target any node.
- Fix: The `Images` field in the MIC resource is now optional.
- Result: The KMM Operator can successfully create the MIC resource even when there are no images in it.
KMM does not reload the kernel module if the node reboot sequence is too quick.
- Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module if the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
- Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
- Fix: Instead of relying on the condition state, NMC can rely on the `Status.NodeInfo.BootID` field. This field is set by the kubelet based on the `/proc/sys/kernel/random/boot_id` file of the server node, so it is updated after each reboot.
- Result: The more accurate reboot detection enables the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
Filtering out node heartbeat events for the Node Machine Configuration (NMC) controller.
- Cause: The NMC controller gets spammed with events from node heartbeats. The node heartbeats let the Kubernetes API server know that the node is still connected and functional.
- Consequence: The spamming causes constant reconciliation, even when no module, and therefore no NMC, is applied to the cluster.
- Fix: The NMC controller now filters the node's heartbeat out of its reconciliation loop.
- Result: The NMC controller only receives real events and filters out node heartbeats.
The NMC status contains toleration values, even though there are no tolerations in the `NMC.spec` or in the module.
- Cause: The Node Machine Configuration (NMC) status contains toleration values, even though there are no tolerations in the `NMC.spec` or in the module.
- Consequence: Tolerations other than Kernel Module Management-specific tolerations can appear in the status.
- Fix: The NMC status now gets its tolerations from a dedicated annotation rather than from the worker pod.
- Result: The NMC status only contains the module's tolerations.
The KMM Operator version 2.4 fails to start properly and cannot list the `modulebuildsignconfigs` resource.
- Cause: On the Kernel Module Management (KMM) Operator, when the Operator is installed using Red Hat Konflux, it does not start properly and the log files contain errors.
- Consequence: The KMM Operator does not work as expected.
- Fix: The Cluster Service Version (CSV) file is updated to list the `modulebuildsignconfigs` and the `moduleimagesconfigs` resources.
- Result: The KMM Operator works as expected.
The Red Hat Konflux build does not include the version and git commit ID in the Operator logs.
- Cause: On the Kernel Module Management (KMM) Operator, when the Operator was built using Communications Platform as a Service (CPaaS), the build included the Operator version and git commit ID in the log files. However, with Red Hat Konflux, these details are not included in the log files.
- Consequence: Important information is missing from the log files.
- Fix: Some modifications are introduced in Konflux to resolve this issue.
- Result: The KMM Operator build now includes the Operator version and git commit ID in the log files.
The KMM Operator does not load the module after a node with a taint is rebooted.
- Cause: The Kernel Module Management (KMM) Operator does not reload the kernel module if the node reboot sequence is too quick. The reboot is determined based on the timestamp of the status condition being later than the timestamp in the Node Machine Configuration (NMC) status.
- Consequence: When the reboot happens quickly, in less time than the grace period, the node state does not change. After the node reboots, KMM does not load the kernel module again.
- Fix: Instead of relying on the condition state, NMC can rely on the `Status.NodeInfo.BootID` field. This field is set by the kubelet based on the `/proc/sys/kernel/random/boot_id` file of the server node, so it is updated after each reboot.
- Result: The more accurate reboot detection enables the Kernel Module Management (KMM) Operator to reload the kernel module after the node reboot sequence.
Redeploying a module that uses in-cluster builds fails with an `ImagePullBackOff` error.
- Cause: On the Kernel Module Management (KMM) Operator, the image pull policy for the puller pod and the worker pod is different.
- Consequence: An image can be considered to exist when, in fact, it does not.
- Fix: Make the image pull policy of the pull pod the same as the pull policy defined in the KMM module, because it is the same policy that is used by the worker pod.
- Result: The MIC represents the state of the image in the same way the worker pod accesses it.
The MIC controller creates two pull-pods when it should create just one.
- Cause: On the Kernel Module Management (KMM) Operator, the `ModuleImagesConfig` (MIC) controller may create multiple pull-pods for the same image.
- Consequence: Resources are not used appropriately or as intended.
- Fix: The `CreateOrPatch` MIC API receives a slice of `ImageSpecs`. Because the input is created by iterating over the target nodes and adding their images to the slice, any duplicate `ImageSpecs` are now filtered out.
- Result: The KMM Operator works as expected.
The `job.gcDelay` example in the documentation should specify `0s` instead of `0`.
- Cause: The Kernel Module Management (KMM) Operator default `job.gcDelay` duration field is `0s`, but the documentation mentions the value as `0`.
- Consequence: Entering a custom value of `60` instead of `60s` or `1m` might result in an error due to the wrong input type.
- Fix: The `job.gcDelay` field in the documentation is updated to the default value of `0s`, as shown in the sketch after this list.
- Result: Users are less likely to get confused.
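As a sketch of where this value is set, assuming the ConfigMap layout described in Configuring the Kernel Module Management Operator; field names other than `job.gcDelay` are shown for context only.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kmm-operator-manager-config
  namespace: openshift-kmm
data:
  controller_config.yaml: |
    job:
      # Duration values need a unit suffix: "0s", "60s", or "1m",
      # not a bare number such as "0" or "60".
      gcDelay: 0s
    worker:
      setFirmwareClassPath: /var/lib/firmware
```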
The KMM Operator hub environment does not work because of missing MIC and MBSC CRDs.
- Cause: The Kernel Module Management (KMM) Operator hub environment only generates Custom Resource Definition (CRD) files based on the `api-hub/` directory. As a result, it does not contain some CRDs that are required for the KMM Operator hub environment, such as the `ModuleImagesConfig` (MIC) and `ModuleBuildSignConfig` (MBSC) resources.
- Consequence: The KMM Operator hub environment cannot work because it tries to start controllers reconciling CRDs that do not exist in the cluster.
- Fix: The fix generates all CRD files into the `config/crd-hub/bases` directory, but only applies the resources to the cluster that are actually needed.
- Result: The KMM Operator hub environment works as expected.
The KMM OperatorHub environment cannot build when finalizers are not set on a resource.
- Cause: The Kernel Module Management (KMM) Operator displays an error with the `ManagedClusterModule` controller failing to build. This is due to the missing `ModuleImagesConfig` (MIC) resource finalizers and Role-Based Access Control (RBAC) permissions for the KMM OperatorHub environment.
- Consequence: The KMM OperatorHub environment cannot build images.
- Fix: The RBAC permissions are updated to allow updating finalizers on the MIC resource, and the appropriate rules are created.
- Result: The KMM OperatorHub environment builds images without errors with the `ManagedClusterModule` controller.
A `PreflightValidationOCP` custom resource with `kernelVersion: tesdt` causes the KMM Operator to panic.
- Cause: Creating a `PreflightValidationOCP` custom resource (CR) with a `kernelVersion` flag that is set to `tesdt` causes the Kernel Module Management (KMM) Operator to generate a panic runtime error.
- Consequence: Entering invalid kernel versions causes the KMM Operator to panic.
- Fix: A webhook that validates the CR is now added to the `PreflightValidationOCP` CR.
- Result: A `PreflightValidationOCP` CR with an invalid kernel version can no longer be applied to the cluster, preventing the Operator from generating a panic runtime error.
A `PreflightValidationOCP` custom resource with a `kernelVersion` flag that is different from the one of the cluster does not work.
- Cause: Creating a `PreflightValidationOCP` custom resource (CR) with a `kernelVersion` flag that is different from the one of the cluster does not work.
- Consequence: The Kernel Module Management (KMM) Operator is unable to find the Driver Toolkit (DTK) input image for the new kernel version.
- Fix: You must use the `PreflightValidationOCP` CR and explicitly set the `dtkImage` field in the CR.
- Result: Using the `kernelVersion` and `dtkImage` fields, the feature can build installed modules for target OpenShift Container Platform versions, as shown in the sketch after this list.
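A minimal sketch of such a CR; the API version, kernel version, and DTK image reference below are placeholders, assuming the `v1beta2` API.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta2    # assumed API version
kind: PreflightValidationOCP
metadata:
  name: preflight
spec:
  # Kernel version to validate modules against (placeholder value).
  kernelVersion: 5.14.0-570.el9.x86_64
  # Explicit Driver Toolkit image for that kernel (placeholder reference).
  dtkImage: quay.io/example/driver-toolkit@sha256:<digest>
```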
The KMM Operator version 2.4 documentation is updated with `PreflightValidationOCP` information.
- Cause: Previously, when creating a `PreflightValidationOCP` CR, you were required to supply the release image. This has now changed, and you need to set the `kernelVersion` and `dtkImage` fields.
- Consequence: The documentation was outdated and required an update.
- Fix: The documentation is updated with the new support details.
- Result: The KMM preflight feature is documented as expected.
5.3.4. Known issues
The `ModuleUnloaded` event does not appear when a module is unloaded.
- Cause: When a module is loaded (which creates a `ModuleLoad` event) or unloaded (which creates a `ModuleUnloaded` event), the events might not appear. This happens when you load and unload the kernel module in quick succession.
- Consequence: The `ModuleLoad` and the `ModuleUnloaded` events might not appear in OpenShift Container Platform.
- Fix: Introduce an alerting mechanism for this potential behavior and for awareness when working with modules.
- Result: Not yet available.
5.4. Release notes for Kernel Module Management Operator 2.4.1
5.4.1. Known issues
If you are running KMM-hub version 2.3.0 or earlier and you are not running KMM, the upgrade to KMM-hub 2.4.0 is not reliable. Instead, you must upgrade to KMM-hub 2.4.1. KMM is not affected by this issue. For more information, see RHEA-2025:10778 - Product Enhancement Advisory.