Chapter 7. Bug fixes
This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.20.
7.1. Multicloud Object Gateway
Noobaa certificate verification for NamespaceStore endpoints
Previously, missing validation of the CA bundle when mounting NamespaceStore endpoints caused failures in loading and consuming the provided CA bundles. Validation for CA bundles has now been added to ensure proper certificate verification.
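After supplying a CA bundle, a quick way to confirm that the NamespaceStore accepts it is to check the resource status; this sketch assumes the default openshift-storage namespace and uses a placeholder resource name:
# List NamespaceStore resources and check their phase
$ oc get namespacestore -n openshift-storage
# Inspect conditions and events for a specific NamespaceStore (name is a placeholder)
$ oc describe namespacestore <namespacestore-name> -n openshift-storage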
Support for AWS region ap-east-2 in Noobaa operator
Previously, the ap-east-2 region was missing from the MCG operator-supported regions list, preventing creation of a default BackingStore when deployed in this region. The missing region has now been added to the supported list.
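To confirm that the default BackingStore is created when running in ap-east-2, you can list the BackingStore resources; the namespace and resource name below assume a default installation:
# The default BackingStore is typically named noobaa-default-backing-store
$ oc get backingstore -n openshift-storage
$ oc describe backingstore noobaa-default-backing-store -n openshift-storage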
Noobaa no longer fails to issue deletes to RGW
A configuration change caused delays in deleting large numbers of small objects from the underlying RGW storage. This impacted performance during high-volume delete operations. The issue was resolved by reverting the configuration change, eliminating the delay in deletion from the underlying storage.
7.2. Disaster recovery
ACM console view persistence on hard refresh
Previously, a hard refresh from the ACM console caused the view to revert to the OCP (local-cluster) console. This was because Multicluster Orchestrator console routes were not registered properly for ACM (all clusters) view, which disrupted the expected navigation behavior. The routing logic has now been corrected, and refreshing the browser no longer changes the active view. Users remain in the ACM console as intended.
DR status now visible for VMs
Previously, the DR Status was missing on the VM list page, and the Remove disaster recovery option was not available when managing VMs protected using label selectors. This happened because the UI could not correctly identify the VM’s cluster and its DRPC.
The issue was fixed by reading the VM cluster from the correct field and improving how DRPCs are parsed when label selectors are used. Now, both the DR Status and the Remove disaster recovery options work as expected.
Disabling DR for a CephFS application with consistency groups enabled no longer leaves some resources behind
Previously, disabling DR for a CephFS application with consistency groups enabled left some resources behind, which required manual cleanup. This issue has been fixed, and no resources are left behind after DR is disabled; manual cleanup is no longer required.
s3StoreProfile in ramen-hub-operator-config after upgrade from 4.18 to 4.19
Previously, after upgrading from 4.18 to 4.19, the ramen-hub-operator-config ConfigMap was overwritten with default values from the Ramen-hub CSV. This caused loss of custom S3Profiles and other configurations added by the Multicluster Orchestrator (MCO) operator. The issue has been fixed to preserve custom entries during upgrade, preventing disruption in S3 profile configurations.
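To verify that custom entries survived the upgrade, you can inspect the ConfigMap on the hub cluster; the namespace shown is an assumption and may differ in your deployment:
# Check that custom s3StoreProfiles entries are still present after the upgrade
$ oc get configmap ramen-hub-operator-config -n openshift-operators -o yaml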
virtualmachines.kubevirt.io resource no longer fails restore due to MAC allocation failure on relocate
Previously, when a virtual machine was relocated back to the preferred cluster, the relocation could fail because its MAC address was unavailable. This occurred if the virtual machine was not fully cleaned up on the preferred cluster after being failed over to the failover cluster. This cleanup process has been corrected, ensuring successful relocation to the preferred cluster.
Failover process no longer fails when the ReplicationDestination resource has not been created yet
Previously, if the user initiated a failover before the LastGroupSyncTime was updated, the failover process would fail. This failure was accompanied by an error message indicating that the ReplicationDestination does not exist. This issue has been resolved, and failover works as expected.
After relocation of consistency group-based workloads, synchronization no longer stops
Previously, when applications using CephRBD volumes with volume consistency groups were running and the secondary managed cluster went offline, replication for these volumes could stop indefinitely, even after the secondary cluster came back online. During this condition, the VolumeSynchronizationDelay alert was triggered, starting with a Warning status and later escalating to Critical, indicating that replication had ceased for the affected volumes. This issue has been resolved to ensure replication resumes automatically when the secondary cluster is restored.
7.3. Rook
Ceph monitor endpoints fully visible
Previously, only one of the three Ceph monitor endpoints appeared due to missing entries in the CSI ConfigMap. This meant that CSI had only a single mon endpoint available, providing no fault tolerance.
The issue was fixed by adding all monitor endpoints to the ConfigMap. Now, all mons are visible, and CSI communication is fault-tolerant.
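You can confirm that all monitor endpoints are listed by inspecting the Rook CSI ConfigMap; the ConfigMap and namespace names assume a default internal-mode deployment:
# The csi-cluster-config-json key should list all Ceph monitor endpoints
$ oc get configmap rook-ceph-csi-config -n openshift-storage -o yaml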
7.4. OpenShift Data Foundation console
Fixed StorageSystem creation wizard issues
Previously, the Network Type field for Host was missing, resulting in empty network details and a misleading tooltip that described Multus instead of the actual host configuration. This caused confusion in the summary view, where users saw no network information and an inaccurate tooltip.
With this update, the tooltips were removed and replaced with radio buttons featuring correct labels and descriptions.
Force delete option restored for stuck StorageConsumer
Previously, users were unable to forcefully delete a StorageConsumer resource if it was stuck in a deletion state due to the presence of a deletionTimeStamp. This issue has been resolved by updating the Actions menu to enable Delete StorageConsumer even when a deletionTimeStamp is present. As a result, you can force delete StorageConsumer resources when required.
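To check whether a StorageConsumer is stuck in deletion before using the force delete action, you can look for a deletionTimestamp; the namespace assumes a default provider-mode installation:
# A non-empty DELETION column indicates the resource is stuck in a deleting state
$ oc get storageconsumer -n openshift-storage \
    -o custom-columns=NAME:.metadata.name,DELETION:.metadata.deletionTimestamp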
Fix for Disaster Recovery misconfiguration after upgrade from v4.17.z to v4.18
Previously, the upgrade process resulted in incorrect DR resource configurations, impacting workloads that rely on the ocs-storagecluster-ceph-rbd and ocs-storagecluster-ceph-rbd-virtualization storage classes. With this fix, the DR resources are correctly configured after the upgrade.
Warning message in the UI right after creation of StorageCluster no longer appears
Previously, a warning popup appeared in the UI during the creation of a StorageSystem or StorageCluster. This was caused by the Virtualization StorageClass not being annotated with storageclass.kubevirt.io/is-default-virt-class: "true" by default after deployment. With this fix, the required annotation is applied automatically, preventing unnecessary warnings.
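If you need to verify or reapply the annotation manually, for example on an older deployment, the commands look roughly like this; the StorageClass name is illustrative:
# Check whether the annotation is present (empty output means it is missing)
$ oc get storageclass ocs-storagecluster-ceph-rbd-virtualization \
    -o jsonpath='{.metadata.annotations.storageclass\.kubevirt\.io/is-default-virt-class}'
# Apply the annotation manually if required
$ oc annotate storageclass ocs-storagecluster-ceph-rbd-virtualization \
    storageclass.kubevirt.io/is-default-virt-class="true"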
PVC type misclassification resolved in UI
Previously, the UI incorrectly displayed block PVCs as filesystem PVCs due to an outdated filtering method that relied on assumptions based on VRG naming conventions. This led to confusion because the PVC type was inaccurately reported.
To address this, the filter distinguishing block and filesystem PVCs has been removed, acknowledging that a group can contain both types. This change eliminates the misclassification and ensures accurate representation of PVCs in the UI.
Bucket Lifecycle rule deletion now supported
Previously, it was not possible to delete the last remaining bucket lifecycle rule due to a backend error: attempting to update the LifecycleConfiguration with empty rules triggered a 500 response. This has been fixed by switching to deleteBucketLifecycle for cases where the entire lifecycle configuration needs to be cleaned up. As a result, you can delete all bucket lifecycle rules without encountering errors.
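For reference, the same cleanup can be performed directly against the S3 API, for example with the AWS CLI pointed at the MCG S3 endpoint; the bucket name and endpoint URL are placeholders:
# Remove the entire lifecycle configuration for a bucket
$ aws s3api delete-bucket-lifecycle \
    --bucket <bucket-name> \
    --endpoint-url https://<mcg-s3-endpoint>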
CephFS volume filtering corrected in the UI
Previously, the UI filtering for CephFS volumes did not function correctly and mistakenly excluded CephFS PVCs when the "block" option was selected. This was due to an outdated filtering method based on VRG naming assumptions that no longer apply.
To resolve this, the block/filesystem filter has been removed, recognizing that a group might contain both types of PVCs. This fix eliminates the misclassification and ensures accurate display of CephFS volumes in the UI.
Resource distribution disabled for internal client
Previously, the UI allowed users to distribute resources, such as StorageClasses, to the local or internal client, including adding or removing them. While backend logic would automatically restore removed resources, this behavior was misleading from a user experience perspective.
To improve clarity, the internal client row has been disabled in the client table, and the Distribute resources option has been removed from the Action menu for internal StorageConsumer entries. You can no longer perform resource distribution actions on the internal client using the UI.
Alert for essential OpenShift Data Foundation pods down during capacity addition
Previously, there was no check to verify whether the essential OpenShift Data Foundation pods were running, which led to errors when adding capacity.
To address this issue, if essential pods are down when attempting to add capacity, the user is alerted and not allowed to proceed.
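A simple way to spot unhealthy pods before adding capacity is a command like the following, assuming the default openshift-storage namespace:
# Lists any pods that are not in Running or Completed state
$ oc get pods -n openshift-storage | grep -Ev 'Running|Completed'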
Support external Red Hat Ceph Storage deployment on KubeVirt nodes
Previously, on OpenShift Container Platform deployed on KubeVirt nodes, there was no option to deploy OpenShift Data Foundation with external Red Hat Ceph Storage (RHCS) due to the Infrastructure CR reporting oVirt and KubeVirt as separate platforms. With this fix, KubeVirt is added to the allowed list of platforms. As a result, you can create or link external RHCS storage systems from the UI.
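You can check which platform the Infrastructure CR reports on your cluster with a command such as:
# Prints the platform type detected by the cluster
$ oc get infrastructure cluster -o jsonpath='{.status.platformStatus.type}'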
7.5. OCS operator
Missing Toleration for Prometheus Operator in ROSA HCP Deployments
Previously, the prometheus-operator pod in Red Hat OpenShift Service on AWS (ROSA) with hosted control planes (HCP) deployments was missing the required tolerations, so it was necessary to manually patch the pod after creation to apply them. With this fix, the tolerations are correctly applied during deployment, eliminating the need for manual intervention.
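For reference, the earlier manual workaround amounted to patching tolerations onto the workload; the deployment name, namespace, and taint key below are illustrative assumptions only and depend on the specific ROSA HCP environment:
# Patch tolerations onto the prometheus-operator workload (placeholder values)
$ oc patch deployment prometheus-operator -n openshift-storage \
    --type=merge \
    -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"<taint-key>","operator":"Exists","effect":"NoSchedule"}]}}}}'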
Service "ocs-provider-server" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocatederror no longer appears while reconcilingPreviously, the
ocs-operatordeployed a service using port 31659, which could conflict with an existing nodePort service that is already using the same port. This conflict caused theocs-operatordeployment to fail, resulting in upgrade reconciliation getting stuck.With this fix, the port allocation is handled more safely to avoid clashes with existing services.
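If a similar conflict occurs, you can identify which service already owns the node port with a quick check such as:
# Find any service already using nodePort 31659
$ oc get svc --all-namespaces | grep 31659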
ocs-metrics-exporter inherits node selector
Previously, the ocs-metrics-exporter did not inherit the node selector configuration, causing scheduling issues. This has been resolved by ensuring the node selector is properly applied, as detailed in this Red Hat Solution.
7.6. Ceph monitoring
Clone count alert now fires promptly when 200+ clones are created
The clone count alert was previously stuck in a Pending state and failed to fire in a timely manner when over 200 clones were created. This was caused by the alert’s firing threshold being set to 30 minutes, resulting in a long delay. To resolve this, the firing time was reduced from 30 minutes to 30 seconds. As a result, the alert now fires as expected, providing timely notifications when the clone count exceeds the threshold.
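To confirm the evaluation window on your cluster, you can inspect the relevant PrometheusRule; the grep pattern is only a rough filter, since the exact rule name is not listed here:
# Look for the clone count alert definition and its "for:" duration
$ oc get prometheusrules -n openshift-storage -o yaml | grep -i -B 2 -A 8 clone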
Correct runbook URL for HighRBDCloneSnapshotCount alert
The runbook URL linked to the HighRBDCloneSnapshotCount alert was previously incorrect, leading users to a non-existent help page. This issue has been fixed by updating the alert configuration with the correct URL.