4.19 Release Notes
Abstract
Release notes for features and enhancements, known issues, and other important information.
Chapter 1. Overview
Red Hat OpenShift Data Foundation is software-defined storage that is optimized for container environments. It runs as an operator on OpenShift Container Platform to provide highly integrated and simplified persistent storage management for containers.
Red Hat OpenShift Data Foundation is integrated into the latest Red Hat OpenShift Container Platform to address platform services, application portability, and persistence challenges. It provides a highly scalable backend for the next generation of cloud-native applications, built on a technology stack that includes Red Hat Ceph Storage, the Rook.io Operator, and NooBaa’s Multicloud Object Gateway technology.
Red Hat OpenShift Data Foundation is designed for FIPS. When running on RHEL or RHEL CoreOS booted in FIPS mode, OpenShift Container Platform core components use the RHEL cryptographic libraries submitted to NIST for FIPS validation on only the x86_64, ppc64le, and s390x architectures. For more information about the NIST validation program, see Cryptographic Module Validation Program. For the latest NIST status for the individual versions of the RHEL cryptographic libraries submitted for validation, see Compliance Activities and Government Standards.
Red Hat OpenShift Data Foundation provides a trusted, enterprise-grade application development environment that simplifies and enhances the user experience across the application lifecycle in a number of ways:
- Provides block storage for databases.
- Provides shared file storage for continuous integration, messaging, and data aggregation.
- Provides object storage for cloud-first development, archival, backup, and media storage.
- Scales applications and data exponentially.
- Attaches and detaches persistent data volumes at an accelerated rate.
- Stretches clusters across multiple data centers or availability zones.
- Establishes a comprehensive application container registry.
- Supports the next generation of OpenShift workloads such as Data Analytics, Artificial Intelligence, Machine Learning, Deep Learning, and Internet of Things (IoT).
- Dynamically provisions not only application containers, but also data service volumes and containers, as well as additional OpenShift Container Platform nodes, Elastic Block Store (EBS) volumes, and other infrastructure services.
1.1. About this release
Red Hat OpenShift Data Foundation 4.19 is now available. New enhancements, features, and known issues that pertain to OpenShift Data Foundation 4.19 are included in this topic.
Red Hat OpenShift Data Foundation 4.19 is supported on the Red Hat OpenShift Container Platform version 4.19. For more information, see Red Hat OpenShift Data Foundation Supportability and Interoperability Checker.
For Red Hat OpenShift Data Foundation life cycle information, refer to Red Hat OpenShift Data Foundation Life Cycle.
Chapter 2. New features
This section describes new features introduced in Red Hat OpenShift Data Foundation 4.19.
2.1. Disaster recovery solution
2.1.1. Multi volume consistency for Disaster Recovery
Red Hat OpenShift Data Foundation Disaster Recovery (DR) provides crash-consistent multi-volume consistency groups for Regional-DR, to be used by applications that are deployed over multiple volumes. This is especially important for virtual machines, which sometimes have multiple disks attached to them. For more information, see Multi-volume consistency for disaster recovery.
2.1.2. Replication delay for RHACM applications
Health status for Red Hat Advanced Cluster Management (RHACM) managed applications is displayed on the application list page, which helps monitor the disaster recovery health status. For more information, see Viewing health status of ApplicationSet-based and Subscription-based applications.
2.1.3. Additional disaster recovery recipe capabilities for CephFS-based applications
The capabilities of DR recipes are enhanced to support more applications, providing automated disaster recovery for CephFS-based applications that are deployed with imperative models.
2.1.4. Multiple storage classes in RHACM managed clusters for Regional Disaster Recovery operations
Red Hat Advanced Cluster Management (RHACM) managed clusters allow replication of data that uses non-default storage classes managed by OpenShift Data Foundation. This is important for customers leveraging replica-2 storage classes. This capability is available for Regional-DR using Ceph RBD block volumes.
2.2. Multicloud Object Gateway
2.2.1. High availability option for Multicloud Object Gateway metadata database
Starting with this release, Multicloud Object Gateway (MCG) runs with high availability for the metadata database (DB). This avoids a single point of failure for the MCG DB, which would put data at risk in case of a node failure.
2.2.2. Cross-origin resource sharing support for Multicloud Object Gateway buckets
Cross-origin resource sharing (CORS) is supported for Multicloud Object Gateway buckets for increased coverage and compatibility with AWS S3. CORS defines a way for client web applications that are loaded in one domain to interact with resources in a different domain.
For more information, see Creating Cross Origin Resource Sharing (CORS) rule and Editing Cross Origin Resource Sharing (CORS) rule.
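The following is a minimal sketch of applying a CORS rule with the standard AWS CLI against the MCG S3 endpoint; the bucket name, allowed origin, and endpoint URL are placeholders, and the linked procedures are the authoritative reference:
$ cat > cors.json <<EOF
{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://app.example.com"],
      "AllowedMethods": ["GET", "PUT"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF
$ aws s3api put-bucket-cors --bucket <bucket-name> --cors-configuration file://cors.json --endpoint-url <mcg-s3-endpoint>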
2.2.3. PublicAccessBlock policy option for Multicloud Object Gateway
An overriding policy option can be created to block public access to buckets. This increases compatibility with Amazon S3. This option enables administrators and bucket owners to limit public access to their resources, and the limits are enforced regardless of how the resources are created.
For more information, see Configuring or modifying the PublicAccessBlock configuration for S3 bucket.
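The following is a minimal sketch of blocking public access with the standard AWS CLI against the MCG S3 endpoint; the bucket name and endpoint URL are placeholders, and the linked procedure is the authoritative reference:
$ aws s3api put-public-access-block --bucket <bucket-name> --endpoint-url <mcg-s3-endpoint> \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"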
2.2.4. Additional expiration rules in Multicloud Object Gateway lifecycle configuration
Additional expiration rules are supported to avoid size inflation and manual operations to identify and delete undesired objects. These rules help the bucket owner to better control storage usage and fine-tune which objects to keep or expire. The following rules are now supported through the S3 API and are available in the object browser user interface:
- NoncurrentVersionExpiration rule
- AbortIncompleteMultipartUpload rule
- ExpiredObjectDeleteMarker rule
For more information, see Lifecycle bucket configuration in Multicloud Object Gateway.
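The following is a minimal sketch of a lifecycle configuration combining the three rules listed above, applied with the standard AWS CLI against the MCG S3 endpoint; the bucket name, day counts, and endpoint URL are placeholders, and the linked procedure is the authoritative reference:
$ cat > lifecycle.json <<EOF
{
  "Rules": [
    {
      "ID": "cleanup-noncurrent-and-incomplete",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
      "Expiration": {"ExpiredObjectDeleteMarker": true}
    }
  ]
}
EOF
$ aws s3api put-bucket-lifecycle-configuration --bucket <bucket-name> --lifecycle-configuration file://lifecycle.json --endpoint-url <mcg-s3-endpoint>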
2.3. Automatic scaling of storage for dynamic storage devices
Automatic capacity scaling can be enabled on clusters deployed using dynamic storage devices. When automatic scaling is enabled, additional raw capacity equivalent to the configured deployment size is automatically added to the cluster after the used capacity reaches 70%.
For more information, see the Creating OpenShift Data Foundation cluster section in your respective deployment guides.
2.4. Prevention of unauthorized volume mode conversion
Volume mode conversion is prevented during restore when the original volume mode of the persistent volume claim (PVC) from which the snapshot was taken does not match the volume mode of the new PVC created from the existing volume snapshot. This also helps to verify that conversions work properly.
2.5. Easy configuration of Ceph target size ratio
The target size ratio parameter can be set depending on the cluster usage and how the cluster is expected to fill among the three types of storage: block, shared filesystem, and object storage. The target size ratio is a relative value that influences the allocation of Ceph Placement Groups across storage pools.
For more information, see Configuring Ceph target size ratios.
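A hedged sketch of the underlying Ceph setting, assuming the Ceph toolbox pod is enabled and using the default block pool name only as an example; the documented OpenShift Data Foundation procedure takes precedence:
$ oc -n openshift-storage rsh deploy/rook-ceph-tools
sh-5.1$ ceph osd pool set ocs-storagecluster-cephblockpool target_size_ratio 0.49
sh-5.1$ ceph osd pool autoscale-status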
2.6. Reduce data transfer and improve performance using read affinity for RGW
In Local Storage deployments, the rados_replica_read_policy is set to localize for the RADOS Gateway (RGW) daemons. This helps to reduce the data transfer costs and improve performance by routing all the RGW read requests to the nearest OSD. For more information, see Performing localized reads.
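A hedged way to confirm the setting, assuming the Ceph toolbox pod is enabled and that the option is applied through the Ceph configuration database; if so, it appears in the configuration dump:
$ oc -n openshift-storage rsh deploy/rook-ceph-tools
sh-5.1$ ceph config dump | grep rados_replica_read_policy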
Chapter 3. Enhancements
This section describes the major enhancements introduced in Red Hat OpenShift Data Foundation 4.19.
3.1. Multicloud Object Gateway object browser
Multicloud Object Gateway (MCG) object browser is enhanced to provide additional browsing features, such as setting expiration policies, setting bucket policies, sorting buckets by name, modified time, or size, and managing versioning.
For more information, see Creating and managing buckets using MCG object browser.
3.2. Multicloud Object Gateway backing store data distribution and rebalancing
Previously, MCG backing stores configured with multiple volumes and a 'spread' policy did not evenly distribute preexisting data across volumes. As a result, NooBaa continued writing only to volumes with available space, leading to uneven utilization and repeated “nearly full” PVC alerts.
With this release, MCG automatically balances data evenly across all volumes in a backing store configured with a 'spread' policy, so that storage efficiency is maximized without needing to manually redistribute data. This allows scaling storage easily and maintaining consistent and efficient storage utilization.
3.3. Allow to modify the matchLabel of topologySpreadConstraints for MCG’s backing store pods
Previously, MCG always set the same label for PV-pool backing store pods, even in deployments with multiple such backing stores.
With this release, modifying the matchLabel of topologySpreadConstraints for MCG’s backing store pods is allowed, so that the pods of different backing stores can be differentiated using different labels.
3.4. Increased placement group count for OpenShift Data Foundation pools
Placement group (PG) counts per OSD across OpenShift Data Foundation Ceph pools are increased to a default of 200 PGs per OSD. Previously, the default was 100. This leads to an increased PG count across ODF pools. This helps to achieve better performance by increasing parallelism during I/O operations and improves the balancing of capacity across underlying storage devices.
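A hedged check of the new default, assuming the Ceph toolbox pod is enabled and that the change is surfaced through the mon_target_pg_per_osd option used by the PG autoscaler:
$ oc -n openshift-storage rsh deploy/rook-ceph-tools
sh-5.1$ ceph config get mon mon_target_pg_per_osd
sh-5.1$ ceph osd pool autoscale-status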
3.5. Prevention of unauthorized volume mode conversion
Previously, there was no validation to check whether the mode of an original volume (filesystem or raw block), whose snapshot was taken, matched the mode of a newly created volume. This presented a security gap that could allow malicious users to potentially exploit an as-yet-unknown vulnerability in the host operating system.
However, some users have a legitimate need to perform such conversions. This feature allows cluster administrators to provide these rights (ability to perform update or patch operations on VolumeSnapshotContents objects) only to trusted users or applications, such as backup vendors.
To convert a volume mode, an authorized user needs to set the annotation snapshot.storage.kubernetes.io/allow-volume-mode-change: "true" on the VolumeSnapshotContent of the snapshot source.
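A minimal sketch of setting the annotation, assuming the standard oc annotate syntax; <vsc-name> is a placeholder for the VolumeSnapshotContent bound to the snapshot source:
$ oc annotate volumesnapshotcontent <vsc-name> snapshot.storage.kubernetes.io/allow-volume-mode-change="true"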
3.6. Local storage operator UI enhancement
The user interface for the creation of Local Volume Set and Local Volume Discovery for the local storage operator is updated in the OpenShift console.
For more information, see Persistent storage using local volumes.
Chapter 4. Technology previews
This section describes the technology preview features introduced in Red Hat OpenShift Data Foundation 4.19 under Technology Preview support limitations.
Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
Technology Preview features are provided with a limited support scope, as detailed on the Customer Portal: Technology Preview Features Support Scope.
4.1. Enabling granular level disaster recovery for individual virtual machines or groups of virtual machines in a namespace
Enable granular disaster recovery (DR) for individual virtual machines (VMs) or VM groups within the same namespace, allowing independent failover and relocation actions.
Each discovered and ACM-managed VM can have its own DR policy, enabling independent DR operations without impacting other VMs in the namespace.
For more information, see Enabling granular disaster recovery for individual or groups of virtual machines in a namespace.
4.2. RHACM Kubevirt disaster recovery integration
OpenShift Data Foundation disaster recovery is added to the Kubevirt integration of Red Hat Advanced Cluster Management (RHACM).
Chapter 5. Developer previews
This section describes the developer preview features introduced in Red Hat OpenShift Data Foundation 4.19.
Developer preview features are subject to Developer preview support limitations. Developer preview releases are not intended to be run in production environments. Clusters deployed with developer preview features are considered development clusters and are not supported through the Red Hat Customer Portal case management system. If you need assistance with developer preview features, reach out to the ocs-devpreview@redhat.com mailing list and a member of the Red Hat Development Team will assist you as quickly as possible based on availability and work schedules.
5.1. OpenShift Data Foundation Multus configuration for existing cluster
Existing clusters can be configured to use the Multus network, which is useful in situations where network isolation is needed, including data plane and control plane separation. For more information, see the knowledgebase article OpenShift Data Foundation Multus support for existing cluster.
5.2. Encryption at rest for existing clusters
The existing internal mode clusters without encryption can transition to use encryption at rest. This is applicable to both dynamic and local storage devices. For more information, see the knowledgebase article, Enabling encryption at rest as a post deployment operation.
5.3. Troubleshooting disaster recovery with ODF CLI commands
Deploying and configuring clusters for disaster recovery is complicated. To verify that the system is configured correctly, a simple application can be deployed and tested for the real disaster recovery flow. The odf dr command makes this verification easy. Starting with OpenShift Data Foundation 4.19, the odf dr command is introduced with two subcommands, init and test. For more information, see the knowledgebase article, Testing disaster recovery with odf dr command.
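An illustrative invocation of the two subcommands named above; the exact flags, prompts, and output are described in the knowledgebase article:
$ odf dr init
$ odf dr test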
Chapter 6. Bug fixes
This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.19.
6.1. Multicloud Object Gateway
Using PostgreSQL through environment variables
Previously, there was a risk of exposing PostgreSQL connection details because they were passed as environment variables.
With this fix, the PostgreSQL secret is passed as a volume mount instead of as an environment variable.
Backingstore is stuck in Rejected phase due to IO Errors
Previously, when Multicloud Object Gateway (MCG) detected errors while accessing data on a backing store, MCG disconnected the backing store to force it to reload and clear the issue. This resulted in the backing store being in a Rejected state and not serving requests due to false positives.
With this fix, the disconnection behavior of the backing store is fine-tuned to avoid the false positives.
"ap-southeast-7" region is missing from noobaa-operator code
Previously, the default backing store was not created when deployed in the new ap-southeast-7 and mx-central-1 AWS regions because these regions were missing from the MCG operator's supported regions. With this fix, the two regions were added to the list of supported regions.
Multicloud Object Gateway Prometheus tags not updated after bucket creation
Previously, the updated bucket tagging was not reflected in exported Prometheus metrics of MCG.
With this fix, the updated bucket tagging is collected and exposed to Prometheus metrics.
Multicloud Object Gateway backing store PV-pool Rejected - setting permissions of /noobaa_storage
Previously, when there were a lot of blocks under the noobaa_storage directory, the pod took a long time to start after every restart. This was because the MCG PV pool pod tried to recursively change permissions on the noobaa_storage directory under the PV before starting the pod. With this fix, the requirement to change permissions was removed as it is no longer needed.
Postgres queries on object metadata and data blocks take too long to complete
Previously, when the MCG DB was large, the entire system experienced slowness and operations failed because the Agent Blocks Reclaimer in MCG looked for deleted unreclaimed blocks in the MCG DB and the query it used was not indexed. With this fix, a new index is added to the MCG DB to optimize the query.
MCG long query causing timeouts on endpoints
Previously, slowness was seen in all flows that used the MCG DB because of the short delay between object reclaimer runs and the lack of optimized indexes for the object reclaimer queries. This caused extra load on the MCG DB.
With this fix, the interval between object reclaimer runs and the indexes for its queries are changed. As a result, slowness is no longer seen in the flows that use the MCG DB.
6.2. Ceph container storage interface (CSI) Driver
kubelet_volume metrics not reported for some CephFS PVCs - NodeGetVolumeStats: health-check has not responded
Previously, PV health metrics were not reported for certain CephFS pods, even though the volumes were mounted, because an issue in the Ceph CSI driver caused PV health metrics to return an error for CephFS pods in certain scenarios.
With this fix, the issue with the Ceph CSI driver is fixed. As a result, all health metrics for CephFS PVs are successfully reported in all scenarios.
6.3. Ceph container storage interface (CSI) addons
ceph-csi-controller-manager pods OOMKilled
Previously, when the ReclaimSpace operation was run on PVCs provisioned by a driver other than RADOS block device (RBD), the csi-addons controller crashed due to a panic caused by incorrect logging.
With this fix, the logging format that caused the panic was fixed. As a result, the csi-addons controller handles the scenario gracefully.
6.4. Ceph monitoring
Prometheus rule evaluation errors
Previously, the Prometheus query evaluation failed with the error 'many-to-many matching not allowed: matching labels must be unique on one side' because a unique label was missing from the alert query.
With this fix, the unique 'managedBy' label was added to the query, which made the query result unique and resolved the issue.
Chapter 7. Known issues
This section describes the known issues in Red Hat OpenShift Data Foundation 4.19.
7.1. Disaster recovery
After relocation of consistency group based workloads, synchronization is stopped
When applications using CephRBD volumes with volume consistency groups enabled are running, and the secondary managed cluster goes offline, replication for these volumes might halt indefinitely. This issue can persist even after the secondary cluster comes back online.
The VolumeSynchronizationDelay alert is triggered, initially with a Warning status and later escalating to Critical. This indicates that replication has stopped for the CephRBD volumes within the volume consistency groups for the impacted applications.
Workaround: Contact Red Hat Support.
Node crash results in kubelet service failure causing Data Foundation in error state
An unexpected node crash in an OpenShift cluster might lead to the node being stuck in the NotReady state and affect the storage cluster.
Workaround:
Get the pending CSRs:
$ oc get csr | grep Pending
Approve the pending CSRs.
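The following is a minimal sketch of approving a pending CSR, assuming the standard OpenShift approval command; <csr_name> is a placeholder taken from the output of the previous step:
$ oc adm certificate approve <csr_name>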
Missing s3StoreProfile in ramen-hub-operator-config after upgrading from 4.18 to 4.19
When a configmap is overridden with the default values, the custom S3 profiles and other such details added by the Multicluster Orchestrator (MCO) operator are lost. This happens because, after the Ramen DR hub operator is upgraded, OLM overwrites the existing ramen-hub-operator-config configmap with the default values provided by the Ramen hub CSV.
Workaround: Restart the MCO operator on the hub cluster. As a result, the required values, such as S3 profiles, are updated in the configmap.
CIDR range does not persist in the csiaddonsnode object when the respective node is down
When a node is down, the Classless Inter-Domain Routing (CIDR) information disappears from the csiaddonsnode object. This impacts the fencing mechanism when it is required to fence the impacted nodes.
Workaround: Collect the CIDR information immediately after the NetworkFenceClass object is created.
After node replacement, new mon pod is failing to schedule
After node replacement, the new mon pod fails to schedule itself on the newly added node. As a result, the mon pod is stuck in the Pending state, which impacts the storage cluster status because a mon is unavailable.
Workaround: Manually update the new mon deployment with the correct nodeSelector.
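A hedged sketch of the workaround, assuming the typical Rook mon deployment name (rook-ceph-mon-<id>) and the standard hostname node label; adjust both to your environment:
$ oc -n openshift-storage patch deployment rook-ceph-mon-<id> --type merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"<new-node-name>"}}}}}'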
Disaster Recovery is misconfigured after upgrade from v4.17.z to v4.18
When ODF Multicluster Orchestrator and OpenShift DR Hub Operator are upgraded from 4.17.z to 4.18, certain Disaster Recovery resources are misconfigured in internal mode deployments. This impacts Disaster Recovery of workloads using the ocs-storagecluster-ceph-rbd and ocs-storagecluster-ceph-rbd-virtualization StorageClasses.
To work around this issue, follow the instructions in this knowledgebase article.
ceph df reports an invalid MAX AVAIL value when the cluster is in stretch mode
When a CRUSH rule in a Red Hat Ceph Storage cluster has multiple take steps, the ceph df report shows the wrong maximum available size for associated pools.
DRPCs protect all persistent volume claims created on the same namespace
Namespaces that host multiple disaster recovery (DR) protected workloads protect all the persistent volume claims (PVCs) within the namespace for each DRPlacementControl resource in the same namespace on the hub cluster that does not specify and isolate PVCs based on the workload using its spec.pvcSelector field.
This results in PVCs that match the DRPlacementControl spec.pvcSelector across multiple workloads, or, if the selector is missing across all workloads, replication management potentially managing each PVC multiple times and causing data corruption or invalid operations based on individual DRPlacementControl actions.
Workaround: Label the PVCs that belong to a workload uniquely, and use the selected label as the DRPlacementControl spec.pvcSelector to disambiguate which DRPlacementControl protects and manages which subset of PVCs within a namespace. It is not possible to specify the spec.pvcSelector field for the DRPlacementControl using the user interface, hence the DRPlacementControl for such applications must be deleted and created using the command line (a sketch follows below).
Result: PVCs are no longer managed by multiple DRPlacementControl resources and do not cause any operation and data inconsistencies.
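A hedged sketch of uniquely labeling a workload's PVCs and matching them in the DRPlacementControl; the label key and values are placeholders, and the DRPlacementControl must be recreated from the command line as noted above:
$ oc label pvc <pvc-name> -n <workload-namespace> workload=<workload-name>
Then, in the DRPlacementControl spec:
  pvcSelector:
    matchLabels:
      workload: <workload-name>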
Disaster recovery workloads remain stuck when deleted
When deleting a workload from a cluster, the corresponding pods might not terminate, with events such as FailedKillPod. This might cause delay or failure in garbage collecting dependent DR resources such as the PVC, VolumeReplication, and VolumeReplicationGroup. It would also prevent a future deployment of the same workload to the cluster as the stale resources are not yet garbage collected.
Workaround: Reboot the worker node on which the pod is currently running and stuck in a terminating state. This results in successful pod termination and, subsequently, related DR API resources are also garbage collected.
Regional-DR CephFS based application failover show warning about subscription
After the application is failed over or relocated, the hub subscriptions show errors stating, "Some resources failed to deploy. Use the View status YAML link to view the details." This is because application persistent volume claims (PVCs) that use CephFS as the backing storage provisioner, are deployed using Red Hat Advanced Cluster Management for Kubernetes (RHACM) subscriptions, and are DR protected are owned by the respective DR controllers.
Workaround: There are no workarounds to rectify the errors in the subscription status. However, the subscription resources that failed to deploy can be checked to make sure they are PVCs. This ensures that the other resources do not have problems. If the only resources in the subscription that fail to deploy are the ones that are DR protected, the error can be ignored.
Disabled PeerReady flag prevents changing the action to Failover
The DR controller executes full reconciliation as and when needed. When a cluster becomes inaccessible, the DR controller performs a sanity check. If the workload is already relocated, this sanity check causes the PeerReady flag associated with the workload to be disabled, and the sanity check does not complete due to the cluster being offline. As a result, the disabled PeerReady flag prevents you from changing the action to Failover.
Workaround: Use the command-line interface to change the DR action to Failover despite the disabled PeerReady flag.
Ceph becomes inaccessible and IO is paused when connection is lost between the two data centers in stretch cluster
When two data centers lose connection with each other but are still connected to the Arbiter node, there is a flaw in the election logic that causes an infinite election among Ceph Monitors. As a result, the Monitors are unable to elect a leader and the Ceph cluster becomes unavailable. Also, IO is paused during the connection loss.
Workaround: Shut down the monitors of any one data zone by bringing down the zone nodes. Additionally, you can reset the connection scores of the surviving Monitor pods.
As a result, the Monitors can form a quorum, Ceph becomes available again, and I/O resumes.
RBD applications fail to Relocate when using stale Ceph pool IDs from replacement cluster
For applications created before the new peer cluster is created, it is not possible to mount the RBD PVC because, when a peer cluster is replaced, the CephBlockPoolID mapping in the CSI configmap is not updated.
Workaround: Update the rook-ceph-csi-mapping-config configmap with the CephBlockPoolID mapping on the peer cluster that is not replaced. This enables mounting the RBD PVC for the application.
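A minimal sketch of opening the configmap named in the workaround for editing; the exact structure of the mapping entry is not shown here and should follow the existing entries in the configmap:
$ oc edit configmap rook-ceph-csi-mapping-config -n openshift-storage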
Information about lastGroupSyncTime is lost after hub recovery for the workloads that are primary on the unavailable managed cluster
Applications that were previously failed over to a managed cluster do not report a lastGroupSyncTime, thereby triggering the VolumeSynchronizationDelay alert. This is because, when the ACM hub and a managed cluster that are part of the DRPolicy are unavailable, a new ACM hub cluster is reconstructed from the backup.
Workaround: If the managed cluster to which the workload was failed over is unavailable, you can still fail over to a surviving managed cluster.
MCO operator reconciles the veleroNamespaceSecretKeyRef and CACertificates fields
When the OpenShift Data Foundation operator is upgraded, the CACertificates and veleroNamespaceSecretKeyRef fields under s3StoreProfiles in the Ramen config are lost.
Workaround: If the Ramen config has custom values for the CACertificates and veleroNamespaceSecretKeyRef fields, set those custom values again after the upgrade is performed.
Instability of the token-exchange-agent pod after upgrade
The token-exchange-agent pod on the managed cluster is unstable because the old deployment resources are not cleaned up properly. This might cause the application failover action to fail.
Workaround: Refer to the knowledgebase article, "token-exchange-agent" pod on managed cluster is unstable after upgrade to ODF 4.17.0.
Result: If the workaround is followed, the "token-exchange-agent" pod is stabilized and the failover action works as expected.
virtualmachines.kubevirt.io resource fails restore due to MAC allocation failure on relocate
When a virtual machine is relocated to the preferred cluster, it might fail to complete the relocation due to unavailability of its MAC address. This happens if the virtual machine is not fully cleaned up on the preferred cluster when it is failed over to the failover cluster.
Workaround: Ensure that the workload is completely removed from the preferred cluster before relocating the workload.
Disabling DR for a CephFS application with consistency groups enabled may leave some resources behind
Disabling DR for a CephFS application with consistency groups enabled may leave some resources behind. In such cases, manual cleanup might be required.
Workaround: Clean up resources manually by following the steps below:
On the Secondary Cluster:
Manually delete the ReplicationGroupDestination.
$ oc delete rgd -n <namespace>
Confirm that the following resources have been deleted:
- ReplicationGroupDestination
- VolumeSnapshot
- VolumeSnapshotContent
- ReplicationDestination
- VolumeReplicationGroup
On the Primary Cluster:
Manually delete the ReplicationGroupSource.
$ oc delete rgs -n <namespace>
Confirm that the following resources have been deleted:
- ReplicationGroupSource
- VolumeGroupSnapshot
- VolumeGroupSnapshotContent
- VolumeSnapshot
- VolumeSnapshotContent
- ReplicationSource
- VolumeReplicationGroup
For discovered apps with CephFS, sync stops after failover
For CephFS-based workloads, synchronization of discovered applications may stop at some point after a failover or relocation. This can occur with a Permission Denied error reported in the ReplicationSource status.
Workaround:
For Non-Discovered Applications
Delete the VolumeSnapshot:
$ oc delete volumesnapshot -n <vrg-namespace> <volumesnapshot-name>
The snapshot name usually starts with the PVC name followed by a timestamp.
Delete the VolSync Job:
$ oc delete job -n <vrg-namespace> <pvc-name>
The job name matches the PVC name.
For Discovered Applications
Use the same steps as above, except <namespace> refers to the application workload namespace, not the VRG namespace.
For Workloads Using Consistency Groups
Delete the ReplicationGroupSource:
$ oc delete replicationgroupsource -n <namespace> <name>
Delete All VolSync Jobs in that Namespace:
$ oc delete jobs --all -n <namespace>
In this case, <namespace> refers to the namespace of the workload (either discovered or not), and <name> refers to the name of the ReplicationGroupSource resource.
Remove DR option is not available for discovered apps on the Virtual machines page
The Remove DR option is not available for discovered applications listed on the Virtual machines page.
Workaround:
Add the missing label to the DRPlacementControl:
$ oc label drplacementcontrol <drpcname> \
    odf.console.selector/resourcetype=virtualmachine \
    -n openshift-dr-ops
Add the PROTECTED_VMS recipe parameter with the virtual machine name as its value:
$ oc patch drplacementcontrol <drpcname> \
    -n openshift-dr-ops \
    --type='merge' \
    -p '{"spec":{"kubeObjectProtection":{"recipeParameters":{"PROTECTED_VMS":["<vm-name>"]}}}}'
DR Status is not displayed for discovered apps on the Virtual machines page
DR Status is not displayed for discovered applications listed on the Virtual machines page.
Workaround:
Add the missing label to the DRPlacementControl:
$ oc label drplacementcontrol <drpcname> \
    odf.console.selector/resourcetype=virtualmachine \
    -n openshift-dr-ops
Add the PROTECTED_VMS recipe parameter with the virtual machine name as its value:
$ oc patch drplacementcontrol <drpcname> \
    -n openshift-dr-ops \
    --type='merge' \
    -p '{"spec":{"kubeObjectProtection":{"recipeParameters":{"PROTECTED_VMS":["<vm-name>"]}}}}'
PVCs deselected after failover do not clean up the stale entries in the secondary VRG, causing the subsequent relocate to fail
If PVCs were deselected after a workload failover, and a subsequent relocate operation is performed back to the preferredCluster, stale PVCs may still be reported in the VRG. As a result, the DRPC may report its Protected condition as False, with a message similar to the following: VolumeReplicationGroup (/) on cluster is not reporting any lastGroupSyncTime as primary, retrying till status is met.
Workaround:
To resolve this issue, manually clean up the stale PVCs (that is, those deselected after failover) from VRG status.
- Identify the stale PVCs that were deselected after failover and are no longer intended to be protected.
Edit the VRG status on the ManagedCluster named <managed-cluster-name>:
$ oc edit --subresource=status -n <vrg-namespace> <vrg-name>
Remove the stale PVC entries from the status.protectedPVCs section.
Once the stale entries are removed, the DRPC recovers and reports as healthy.
Secondary PVCs aren’t removed when DR protection is removed for discovered apps
On the secondary cluster, CephFS PVCs linked to a workload are usually managed by the VolumeReplicationGroup (VRG). However, when a workload is discovered using the Discovered Applications feature, the associated CephFS PVCs are not marked as VRG-owned. As a result, when the workload is disabled, these PVCs are not automatically cleaned up and become orphaned.
Workaround: To clean up the orphaned CephFS PVCs after disabling DR protection for a discovered workload, manually delete them using the following command:
$ oc delete pvc <pvc-name> -n <pvc-namespace>
Failover process fails when the ReplicationDestination resource has not been created yet
If the user initiates a failover before the LastGroupSyncTime is updated, the failover process might fail. This failure is accompanied by an error message indicating that the ReplicationDestination does not exist.
Workaround:
Edit the ManifestWork for the VRG on the hub cluster.
Delete the following section from the manifest:
/spec/workload/manifests/0/spec/volsync
Save the changes.
Applying this workaround ensures that the VRG skips attempting to restore the PVC using the ReplicationDestination resource. If the PVC already exists, the application uses it as is. If the PVC does not exist, a new PVC is created.
Ceph in warning state after adding capacity to cluster
After device replacement or capacity addition, it is observed that Ceph is in the HEALTH_WARN state with mon reporting slow ops. However, there is no impact to the usability of the cluster.
OSD pods restart during add capacity
OSD pods restart after performing cluster expansion by adding capacity to the cluster. However, no impact to the cluster is observed apart from the pods restarting.
7.2. Multicloud Object Gateway
NooBaa Core cannot assume role with web identity due to a missing entry in the role’s trust policy
For OpenShift Data Foundation deployments on AWS using the AWS Security Token Service (STS), you need to add another entry in the trust policy for the noobaa-core account. This is because, with the release of OpenShift Data Foundation 4.17, the service account changed from noobaa to noobaa-core.
For instructions to add an entry in the trust policy for the noobaa-core account, see the final bullet in the prerequisites section of Updating Red Hat OpenShift Data Foundation 4.16 to 4.17.
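The following is a hedged sketch of the kind of web identity statement the trust policy needs; the account ID, OIDC endpoint, and namespace are placeholders, and the linked procedure is the authoritative reference:
{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::<aws-account-id>:oidc-provider/<oidc-endpoint>"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "<oidc-endpoint>:sub": [
        "system:serviceaccount:<storage-namespace>:noobaa",
        "system:serviceaccount:<storage-namespace>:noobaa-core"
      ]
    }
  }
}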
Upgrade to OpenShift Data Foundation 4.17 results in noobaa-db pod CrashLoopBackOff state
Upgrading to OpenShift Data Foundation 4.17 from OpenShift Data Foundation 4.15 fails when the PostgreSQL upgrade fails in Multicloud Object Gateway, which always starts with PostgreSQL version 15. If there is a PostgreSQL upgrade failure, the noobaa-db-pg-0 pod fails to start.
Workaround: Refer to the knowledgebase article Recover NooBaa’s PostgreSQL upgrade failure in OpenShift Data Foundation 4.17.
7.3. Ceph
Poor CephFS performance on stretch clusters
Workloads with many small metadata operations might exhibit poor performance because of the arbitrary placement of metadata server pods (MDS) on multi-site Data Foundation clusters.
SELinux relabelling issue with a very high number of files
When attaching volumes to pods in Red Hat OpenShift Container Platform, the pods sometimes do not start or take an excessive amount of time to start. This behavior is generic and is tied to how SELinux relabelling is handled by kubelet. This issue is observed with any filesystem-based volume that has a very high file count. In OpenShift Data Foundation, the issue is seen when using CephFS-based volumes with a very high number of files. There are multiple ways to work around this issue. Depending on your business needs, you can choose one of the workarounds from the knowledgebase solution https://access.redhat.com/solutions/6221251.
7.4. CSI Driver
Automatic flattening of snapshots is not working
When volume snapshot, restore, and delete snapshot operations are performed in a sequence more than 450 times on a single common parent RBD PVC, it is not possible to take additional volume snapshots or clones of the common parent RBD PVC.
To work around this issue, instead of performing volume snapshot, restore, and delete operations in a sequence, you can use PVC to PVC cloning to completely avoid this issue.
If you encounter this issue, contact customer support to perform manual flattening of the final restored PVCs to continue to take volume snapshot or clone of the common parent PVC again.
7.5. OpenShift Data Foundation console
UI shows an "Unauthorized" error and a blank loading screen temporarily during ODF operator installation
During OpenShift Data Foundation operator installation, the InstallPlan sometimes transiently goes missing, which causes the page to show an unknown status. This does not happen regularly. As a result, the messages and title go missing for a few seconds.
Warning message in the UI right after creation of StorageCluster
A popup warning is seen when a StorageSystem or StorageCluster is created from the user interface (UI). This is because the Virtualization StorageClass is not annotated with storageclass.kubevirt.io/is-default-virt-class: "true" by default after the deployment.
Workaround: After the deployment, patch the StorageCluster from the command-line interface (CLI) so that the default virtualization StorageClass is set:
$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json -p '[ {"path": "/spec/managedResources/cephBlockPools/defaultVirtualizationStorageClass", "op": "add", "value": true} ]'
The Virtualization StorageClass is annotated and can be used now. However, the popup warning message is still seen during the deployment because the UI sets the wrong field in the StorageCluster CR.
Optimize DRPC creation when multiple workloads are deployed in a single namespace
When multiple applications refer to the same placement, enabling DR for any of the applications enables it for all the applications that refer to that placement.
If the applications are created after the creation of the DRPC, the PVC label selector in the DRPC might not match the labels of the newer applications.
Workaround: In such cases, disabling DR and enabling it again with the right label selector is recommended.
7.6. OCS operator
Increasing MDS memory is erasing CPU values when pods are in CLBO state
When the metadata server (MDS) memory is increased while the MDS pods are in a crash loop back off (CLBO) state, CPU request or limit for the MDS pods is removed. As a result, the CPU request or the limit that is set for the MDS changes.
Workaround: Run the oc patch command to adjust the CPU limits.
For example:
$ oc patch -n openshift-storage storagecluster ocs-storagecluster \
    --type merge \
    --patch '{"spec": {"resources": {"mds": {"limits": {"cpu": "3"}, "requests": {"cpu": "3"}}}}}'
Error while reconciling: Service "ocs-provider-server" is invalid: spec.ports[0].nodePort: Invalid value: 31659: provided port is already allocated
From OpenShift Data Foundation 4.18, the ocs-operator deploys a service with the port 31659, which might conflict with an existing service nodePort. Because of this, no other service can use this port if it is already in use. As a result, ocs-operator always errors out while deploying the service. This causes the upgrade reconciliation to be stuck.
Workaround: Replace the nodePort with ClusterIP to avoid the collision:
$ oc patch -n openshift-storage storagecluster ocs-storagecluster --type merge -p '{"spec": {"providerAPIServerServiceType": "ClusterIP"}}'
prometheus-operator pod is missing toleration in Red Hat OpenShift Service on AWS (ROSA) with hosted control planes (HCP) deployments
Due to a known issue during Red Hat OpenShift Data Foundation on ROSA HCP deployment, a toleration needs to be manually applied for prometheus-operator after pod creation. To apply the toleration, run the following patch command:
$ oc patch csv odf-prometheus-operator.v4.18.0-rhodf -n odf-storage --type=json -p='[{"op": "add", "path": "/spec/install/spec/deployments/0/spec/template/spec/tolerations", "value": [{"key": "node.ocs.openshift.io/storage", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]}]'
7.7. ODF-CLI
ODF-CLI tools misidentify stale volumes
The stale subvolume CLI tool misidentifies valid CephFS persistent volume claims (PVCs) as stale due to an issue in the stale subvolume identification tool. As a result, the stale subvolume identification functionality is not available until the issue is fixed.