Chapter 6. Bug fixes


This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.21.

6.1. Multicloud Object Gateway

  • Multicloud Object Gateway core pod restarting due to non‑deterministic StorageClass selection

    Previously, the Multicloud Object Gateway reconciler selected a StorageClass by listing all StorageClasses and choosing the first match for the Ceph RBD provisioner. Because the Kubernetes List API does not guarantee ordering, this resulted in non‑deterministic and sometimes incorrect StorageClass selection, causing the noobaa-core-0 pod to restart after a fresh deployment.

    With this fix, the controller now uses a static, deterministic StorageClass name instead of relying on the unordered list.

    As a result, StorageClass selection is consistent, preventing configuration issues and pod restart loops.

    (DFBUGS-4763)
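    The difference between the two selection strategies can be sketched in Python. This is an illustrative sketch only; the StorageClass names and provisioner string below are hypothetical, not the operator's actual identifiers.

```python
# Sketch: first-match selection over an unordered List vs. a static,
# deterministic StorageClass name (names are illustrative).

def pick_first_match(storage_classes, provisioner):
    """Old behavior: first match wins, but List order is not guaranteed."""
    for sc in storage_classes:
        if sc["provisioner"] == provisioner:
            return sc["name"]
    return None

def pick_static(storage_classes, expected_name):
    """New behavior: look up one fixed, deterministic StorageClass name."""
    for sc in storage_classes:
        if sc["name"] == expected_name:
            return sc["name"]
    return None

# Two RBD StorageClasses; reordering the List changes the old result
# but not the new one.
a = [{"name": "ocs-storagecluster-ceph-rbd", "provisioner": "rbd.csi.ceph.com"},
     {"name": "other-rbd", "provisioner": "rbd.csi.ceph.com"}]
b = list(reversed(a))

assert pick_first_match(a, "rbd.csi.ceph.com") != pick_first_match(b, "rbd.csi.ceph.com")
assert pick_static(a, "ocs-storagecluster-ceph-rbd") == pick_static(b, "ocs-storagecluster-ceph-rbd")
```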

  • Incorrect free‑space calculation for PV Pool

    Previously, the PV Pool free‑space calculation was performed per pod instead of averaging usage across all pods in the pool. As a result, the Multicloud Object Gateway could report high usage even when only a single pod exceeded 80%.

    With this fix, free space is now calculated as the average used space across all pods in the PV Pool.

    (DFBUGS-4152)
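    The effect of averaging can be sketched as follows; the per-pod usage figures are made up for illustration, and the "old" behavior is simplified to a worst-pod calculation.

```python
# Sketch: per-pod worst case vs. pool-wide average (figures are illustrative).

def used_fraction_old(pod_usage):
    # Simplified old behavior: a single busy pod drives the reported usage.
    return max(pod_usage)

def used_fraction_new(pod_usage):
    # New behavior: average used space across all pods in the PV Pool.
    return sum(pod_usage) / len(pod_usage)

usage = [0.85, 0.20, 0.25]             # one pod above 80%, pool mostly empty
assert used_fraction_old(usage) > 0.80  # old: pool looks nearly full
assert used_fraction_new(usage) < 0.50  # new: reflects pool-wide usage
```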

  • Multicloud Object Gateway database (NooBaa DB) high availability failure during upgrade

    Previously, the MCG database (NooBaa DB) did not have a default value defined for the PostgreSQL shared_buffers parameter. As a result, if no value was provided, the database was configured with shared_buffers=0, causing high availability (HA) to fail during upgrade.

    With this fix, a default value of shared_buffers=1G is applied to ensure proper database behavior and prevent HA failures.

    (DFBUGS-4746)
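    The defaulting logic amounts to filling in a fallback when the parameter is absent, as in this sketch (illustrative only, not the operator's actual code):

```python
# Sketch: apply a default when no shared_buffers value is provided.

def effective_pg_config(user_config):
    cfg = dict(user_config)
    # Previously a missing value fell through to shared_buffers=0;
    # now a default of 1G is applied instead.
    cfg.setdefault("shared_buffers", "1G")
    return cfg

assert effective_pg_config({})["shared_buffers"] == "1G"
assert effective_pg_config({"shared_buffers": "2G"})["shared_buffers"] == "2G"
```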

  • Database corruption caused by concurrent PostgreSQL instances during DB pod replacement

    Previously, when the database pod was force‑deleted, a new pod started before the old pod’s container had fully stopped. This race condition resulted in two PostgreSQL instances running concurrently on the same data directory and caused database corruption. The HA controller (HAC) in the Multicloud Object Gateway operator was identified as the component triggering the force deletion. In addition, the database PVC used the ReadWriteOnce access mode, which did not prevent concurrent mounts by two containers on the same node.

    This fix disables the HAC by default. For new deployments, the database PVC now uses the ReadWriteOncePod access mode to prevent concurrent mounting.

    As a result, the DB pod is no longer force‑deleted by internal components, and new deployments benefit from stronger volume protection to prevent corruption even if the pod is force‑deleted manually.

    (DFBUGS-4757)
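    The access-mode change matters because ReadWriteOncePod restricts the volume to a single pod cluster-wide, whereas ReadWriteOnce only restricts it to a single node. A PVC along these lines (shown here as a Python dict; the PVC name is hypothetical) is what new deployments use:

```python
# Illustrative PVC shape for a new deployment (name is hypothetical).
# ReadWriteOncePod: only one pod in the whole cluster may mount the volume.
# ReadWriteOnce:    only one *node* may mount it, so two pods co-located on
#                   that node could still mount it concurrently.
db_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "db-noobaa-db-pg-0"},   # hypothetical name
    "spec": {
        "accessModes": ["ReadWriteOncePod"],     # was: ["ReadWriteOnce"]
        "resources": {"requests": {"storage": "50Gi"}},
    },
}
assert db_pvc["spec"]["accessModes"] == ["ReadWriteOncePod"]
```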

  • PVCs in PV Pool backingstores inherited irrelevant CPU and memory limits

    Previously, when provisioning a PVC for a PV Pool backingstore, the provisioning logic copied all fields into the PVC template. This caused the PVC to incorrectly inherit CPU and memory limits.

    With this fix, only the relevant fields are copied into the PVC template, preventing unintended resource settings.

    (DFBUGS-3986)
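    The fix amounts to whitelisting the PVC-relevant fields instead of copying everything, as in this sketch (field names simplified for illustration):

```python
# Sketch: copy only PVC-relevant fields into the template.

PVC_RELEVANT_FIELDS = {"storage_request", "storage_class", "access_modes"}

def build_pvc_template(backingstore_spec):
    # Old behavior copied all fields, dragging CPU/memory limits into the PVC.
    # New behavior: keep only the fields a PVC actually uses.
    return {k: v for k, v in backingstore_spec.items() if k in PVC_RELEVANT_FIELDS}

spec = {"storage_request": "50Gi", "storage_class": "gp3",
        "cpu_limit": "2", "memory_limit": "4Gi"}
pvc = build_pvc_template(spec)
assert "cpu_limit" not in pvc and "memory_limit" not in pvc
assert pvc["storage_request"] == "50Gi"
```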

  • ThanosRuleHighRuleEvaluationWarnings firing due to incorrect Multicloud Object Gateway metric naming

    Previously, some Multicloud Object Gateway metrics did not end with the required _total, _sum, _count, or _bucket suffix. As a result, the ThanosRuleHighRuleEvaluationWarnings info alert continued to fire in the Red Hat OpenShift Container Platform web console.

    With this fix, the affected metrics now use the appropriate suffixes, preventing this alert from firing for this issue.

    (DFBUGS-3822)
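    The naming rule itself is easy to check mechanically; the metric names below are made up, not the actual Multicloud Object Gateway metric names.

```python
# Sketch: the Prometheus naming convention behind the warning. Counter and
# histogram series must end in _total, _sum, _count, or _bucket.

VALID_SUFFIXES = ("_total", "_sum", "_count", "_bucket")

def has_valid_suffix(metric_name):
    return metric_name.endswith(VALID_SUFFIXES)

assert not has_valid_suffix("noobaa_requests")    # old style: triggers warning
assert has_valid_suffix("noobaa_requests_total")  # fixed style: compliant
```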

6.2. Disaster recovery

  • Stuck PVs after final sync due to Retain reclaim policy

    Previously, after the final sync, temporary PVs/PVCs were deleted, but some PVs remained because their persistentVolumeReclaimPolicy was set to Retain, preventing cleanup.

    With this fix, Ramen now properly resolves conflicts during resource updates, ensuring that cleanup is not skipped.

    As a result, no PVs remain stuck after failover or final sync.

    (DFBUGS-4535)
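    A common way to resolve such update conflicts is an optimistic-concurrency retry loop: refetch the latest object and reapply the change instead of giving up. The sketch below shows that generic pattern, not Ramen's actual code; `ConflictError` stands in for an HTTP 409 from the API server.

```python
# Generic retry-on-conflict pattern (a sketch, not Ramen's actual code).

class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict from the API server."""

def update_with_retry(get_latest, mutate, update, retries=5):
    for _ in range(retries):
        obj = get_latest()
        mutate(obj)
        try:
            update(obj)
            return True
        except ConflictError:
            continue   # another writer won the race; refetch and retry
    return False

# Fake store that rejects the first write, as a concurrent writer would.
state = {"reclaim_policy": "Retain", "conflicts_left": 1}

def get_latest():
    return dict(state)

def mutate(obj):
    obj["reclaim_policy"] = "Delete"

def update(obj):
    if state["conflicts_left"] > 0:
        state["conflicts_left"] -= 1
        raise ConflictError()
    state.update(obj)

assert update_with_retry(get_latest, mutate, update)
assert state["reclaim_policy"] == "Delete"
```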

  • Certificate rotation race condition caused empty certificate files

    Previously, Kubernetes nodes could fail to start after a reboot because certificate files were empty, even though the files and symlinks existed. This occurred due to a race condition during kubelet certificate rotation.

    The issue happened because when new certificates were written to disk, the data was not explicitly forced to persist to the physical disk. If a reboot occurred before the OS flushed the buffered data, the certificate files ended up empty.

    With this fix, certificate data is now immediately and explicitly written to disk after being generated.

    As a result, certificate files remain valid and non‑empty even if certificate rotation occurs during node reboot.

    (DFBUGS-3636)
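    The underlying pattern is a durable write: flushing user-space buffers is not enough, the data must also be fsynced so the kernel pushes it to the physical disk. A minimal sketch (the file name is illustrative):

```python
# Sketch: durable file write. Without os.fsync, a reboot before the OS
# flushes its buffers can leave a newly created file empty.
import os
import tempfile

def write_durably(path, data):
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # flush user-space buffers to the kernel
        os.fsync(f.fileno())   # force the kernel to write to physical disk

path = os.path.join(tempfile.mkdtemp(), "kubelet-client.pem")  # illustrative
write_durably(path, b"-----BEGIN CERTIFICATE-----\n...")
assert os.path.getsize(path) > 0
```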

  • Incorrect MAX AVAIL value shown in ceph df for stretch‑mode clusters

    Previously, Red Hat Ceph Storage clusters operating in stretch mode displayed an incorrect MAX AVAIL value in the output of the ceph df command. This resulted in inaccurate reporting of available storage capacity.

    With this update, OpenShift Data Foundation now correctly computes and reports the MAX AVAIL metric, ensuring accurate capacity visualization across Ceph pools in stretch‑mode deployments.

    This fix improves cluster observability and prevents misinterpretation of storage utilization.

    (DFBUGS-1748)

6.3. Ceph

  • OSD crashes caused by BlueStore Elastic Shared Blob extent‑resharding logic

    Previously, a bug in BlueStore’s Elastic Shared Blob extent‑resharding logic caused incorrect allocation‑unit (AU) boundary calculations. This triggered the assertion ceph_assert(diff <= bytes_per_au[pos]) during resharding, and resulted in OSD crashes. The issue was fixed upstream and included in the Ceph 8.1z5 downstream branch.

    With this fix, OSDs now handle BlueStore resharding and object deletion operations without crashing.

    As a result, OSDs no longer enter crash loops, and placement groups (PGs) avoid degraded or incomplete states caused by this issue. This bug affected OSDs created with bluestore_elastic_shared_blobs=1 in Squid (19.2.x / Ceph 8.x).

    (DFBUGS-4828)

6.4. OCS Operator

  • Improved Ceph PG autoscaling by removing default target size ratios from DF‑created pools

    Previously, Data Foundation set a default target_size_ratio of 0.49 on the data pools it created. Over time, it was observed that having target size ratios on pools led to poor PG autoscaling and balancing, causing delays in rebalancing across pools.

    With this fix, Data Foundation‑created pools no longer use a default target size ratio.

    As a result, PG autoscaling is faster and overall pool balancing is improved.

    (DFBUGS-2665)

6.5. CSI Addons

  • Orphaned CSIAddonsNode resources causing errant sidecar connection attempts

    Previously, deleting a worker node left behind stale CSIAddonsNode resources owned by the DaemonSet on that node. These orphaned resources caused the csi-addons-controller-manager pod to make repeated, incorrect connection attempts.

    With this fix, the controller manager now watches for and removes stale CSIAddonsNode resources automatically.

    As a result, no orphaned CSIAddonsNode resources remain in the cluster.

    (DFBUGS-4466)
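    The garbage-collection idea can be sketched as filtering the per-node resources against the set of nodes that still exist; object shapes and names below are illustrative.

```python
# Sketch: identify CSIAddonsNode objects whose node no longer exists.

def stale_csiaddonsnodes(csiaddonsnodes, existing_nodes):
    existing = set(existing_nodes)
    return [r for r in csiaddonsnodes if r["node"] not in existing]

resources = [{"name": "csi-addons-node-a", "node": "worker-a"},
             {"name": "csi-addons-node-b", "node": "worker-b"}]
nodes = ["worker-a"]                              # worker-b was deleted
stale = stale_csiaddonsnodes(resources, nodes)
assert [r["name"] for r in stale] == ["csi-addons-node-b"]
```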

6.6. Ceph monitoring

  • ocs-metrics-exporter and ocs-provider-server pods stuck in Pending state

    Previously, the ocs-metrics-exporter and ocs-provider-server pods did not inherit custom tolerations defined under spec.placement.all in the StorageCluster custom resource, causing the pods to remain in a Pending state.

    With this fix, tolerations configured under the pod-specific placement keys are correctly applied. Administrators must configure extra tolerations under spec.placement.metrics-exporter for the ocs-metrics-exporter pod and under spec.placement.api-server for the ocs-provider-server pod. The all placement key remains reserved for rook-ceph resources.

    A bug that previously overwrote existing tolerations when adding new ones to metrics-exporter or api-server has also been resolved.

    (DFBUGS-5253)
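    The overwrite fix amounts to appending new tolerations to any existing ones instead of replacing the list. The sketch below mirrors that idea; the toleration keys and values are illustrative, not a prescribed configuration.

```python
# Sketch: merge extra tolerations without discarding existing ones.

def merge_tolerations(existing, extra):
    merged = list(existing)          # keep what was already configured
    for t in extra:
        if t not in merged:          # avoid duplicate entries
            merged.append(t)
    return merged

existing = [{"key": "node.ocs.openshift.io/storage", "operator": "Equal",
             "value": "true", "effect": "NoSchedule"}]
extra = [{"key": "custom-taint", "operator": "Exists", "effect": "NoSchedule"}]
merged = merge_tolerations(existing, extra)
assert existing[0] in merged and extra[0] in merged
```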

  • Incorrect pool statistics used for pool‑quota monitoring

    Previously, the CephPoolQuotaBytesCriticallyExhausted and CephPoolQuotaBytesNearExhaustion alerts evaluated quota status using incorrect pool statistics, causing the UI to display false warnings.

    With this fix, these alerts now use the correct pool‑quota values for evaluation.

    (DFBUGS-4611)
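    Conceptually, the corrected evaluation compares a pool's used bytes against its own configured quota; the sketch below illustrates that, with made-up threshold values rather than the alerts' actual thresholds.

```python
# Sketch: evaluate pool-quota exhaustion from the pool's own statistics.
# Thresholds and byte values are illustrative.

def quota_alert_level(used_bytes, quota_bytes, near=0.70, critical=0.90):
    if not quota_bytes:              # no quota set: nothing to evaluate
        return None
    ratio = used_bytes / quota_bytes
    if ratio >= critical:
        return "CephPoolQuotaBytesCriticallyExhausted"
    if ratio >= near:
        return "CephPoolQuotaBytesNearExhaustion"
    return None

assert quota_alert_level(50, 100) is None
assert quota_alert_level(75, 100) == "CephPoolQuotaBytesNearExhaustion"
assert quota_alert_level(95, 100) == "CephPoolQuotaBytesCriticallyExhausted"
```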
