Chapter 5. Bug fixes


This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.17.

5.1. Disaster recovery

  • Failover of applications is hung in FailingOver state

    Previously, applications were not successfully DR protected because of errors in protecting the required resources to the provided S3 stores. As a result, failing over such applications left them in the FailingOver state.

    With this fix, a metric and a related alert are added to the application DR protection health. The alert prompts you to rectify protection issues after the applications are DR protected. As a result, the applications that are successfully protected are failed over.

    (BZ#2248723)

  • Post hub recovery, applications which were in FailedOver state consistently report FailingOver

    Previously, after recovering a DR setup from the loss of a hub and a ManagedCluster to a passive hub, applications that were in FailedOver state to the lost ManagedCluster consistently reported FailingOver status. Failing over such applications to the surviving cluster was allowed, but the checks required to ensure that the failover could be initiated were missing on the surviving cluster.

    With this fix, the Ramen hub operator checks whether the target cluster is ready for a failover operation before initiating the action. As a result, any failover that is initiated succeeds, or, if stale resources still exist on the failover target cluster, the operator stalls the failover until the stale resources are cleaned up.

    (BZ#2247847)

  • Post hub recovery, subscription app pods now come up after Failover

    Previously, post hub recovery, the subscription application pods did not come up after failover from the primary to the secondary managed cluster. This happened because an RBAC error occurred in the AppSub subscription resource on the managed cluster, due to a timing issue in the backup and restore scenario.

    This issue has been fixed, and subscription app pods now come up after failover from primary to secondary managed clusters.

    (BZ#2295782)

  • Application namespaces are no longer left behind in managed clusters after deleting the application

    Previously, if an application was deleted on the RHACM hub cluster and its corresponding namespace was deleted on the managed clusters, the namespace reappeared on the managed cluster.

    With this fix, once the corresponding namespace is deleted, the namespace no longer reappears on the managed cluster.

    (BZ#2059669)

  • odf-client-info config map is now created

    Previously, the controller inside MCO was not properly filtering the ManagedClusterView resource. This led to a key config map, odf-client-info, not being created.

    With this update, the filtering mechanism has been fixed, and the odf-client-info config map is created as expected.

    (BZ#2308144)
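The readiness check described in the "Post hub recovery, applications which were in FailedOver state" entry can be pictured with a minimal sketch. This is only an illustrative model, not the Ramen hub operator's code: the `TargetCluster` type, the `failover_decision` function, and the resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TargetCluster:
    """Hypothetical view of the failover target cluster."""
    name: str
    stale_resources: list[str] = field(default_factory=list)


def failover_decision(target: TargetCluster) -> str:
    """Return "proceed" when the target is clean, otherwise "stall"."""
    # Stale resources on the target block the failover until they are cleaned up.
    if target.stale_resources:
        return "stall"
    return "proceed"


# Hypothetical surviving cluster that still holds one stale resource.
surviving = TargetCluster("cluster-2", stale_resources=["stale-app-resource"])
print(failover_decision(surviving))   # stall

surviving.stale_resources.clear()     # cleanup has finished
print(failover_decision(surviving))   # proceed
```

The point of the sketch is the ordering: the readiness check runs before the failover is initiated, so a failover either starts on a clean target or waits.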

5.2. Multicloud Object Gateway

  • Ability to change log level of backingstore pods

    Previously, there was no way to change the log level of backingstore pods. With this update, changing NOOBAA_LOG_LEVEL in the config map changes the debug level of the pv-pools backingstore pods accordingly.

    (BZ#2297448)

  • STS token expiration now works as expected

    Previously, incorrect calculation and display of the STS token expiration time caused STS tokens to remain valid long past their expiration time. Users would also see the wrong expiration time when trying to assume a role.

    With this update, the STS code has been reworked to fix these problems, and support for the CLI flag --duration-seconds has been added. STS token expiration now works as expected and is shown to the user correctly.

    (BZ#2299801)

  • Block deletion of OBC via regular S3 flow

    S3 buckets can be created both via an object bucket claim (OBC) and directly via the S3 operation. When a bucket is created with an OBC and deleted via S3, the OBC entity is left dangling and the state is inconsistent. With this update, deleting an OBC-created bucket via the regular S3 flow is blocked, avoiding an inconsistent state.

    (BZ#2301657)

  • NooBaa Backingstore no longer stuck in Connecting post upgrade

    Previously, the NooBaa backingstore blocked the upgrade because it remained in the Connecting phase, leaving storagecluster.yaml in the Progressing phase. This issue has been fixed, and the upgrade progresses as expected.

    (BZ#2302507)

  • NooBaa DB cleanup no longer fails

    Previously, the NooBaa DB cleanup would stop after DB_CLEANER_BACK_TIME elapsed from the start time of the noobaa-core pod. As a result, NooBaa DB PVC consumption kept rising. This issue has been fixed, and NooBaa DB cleanup works as expected.

    (BZ#2305978)

  • MCG standalone upgrade now works as expected

    Previously, a bug caused NooBaa pods to have incorrect affinity settings, leaving them stuck in the pending state.

    This fix ensures that any previously incorrect affinity settings on the NooBaa pods are cleared. Affinity is now only applied when the proper conditions are met, preventing the issue from recurring after the upgrade.

    After upgrading to the fixed version, the pending NooBaa pods do not restart automatically. To finalize the upgrade, manually delete the old pending pods. The new pods then start with the correct affinity settings and can run successfully.

    (BZ#2314636)
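As an illustration of the behavior described in the "STS token expiration now works as expected" entry, the following minimal sketch models how an expiration time derived from a requested duration is expected to behave. It is not NooBaa code: the function names are hypothetical, and the `duration_seconds` parameter only mirrors the --duration-seconds flag named above.

```python
from datetime import datetime, timedelta, timezone


def token_expiration(issued_at: datetime, duration_seconds: int) -> datetime:
    """Expiration is the issue time plus the requested duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return issued_at + timedelta(seconds=duration_seconds)


def is_token_valid(expires_at: datetime, now: datetime) -> bool:
    """A token is valid only strictly before its expiration time."""
    return now < expires_at


issued = datetime(2024, 10, 1, 12, 0, 0, tzinfo=timezone.utc)
expires = token_expiration(issued, duration_seconds=900)

print(expires.isoformat())                                       # 2024-10-01T12:15:00+00:00
print(is_token_valid(expires, issued + timedelta(seconds=899)))  # True
print(is_token_valid(expires, issued + timedelta(seconds=900)))  # False
```

Keeping the calculation and the validity check on timezone-aware timestamps means the time shown to the user and the time enforced on the token come from the same value.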

5.3. Ceph

  • New restored or cloned CephFS PVC creation no longer slows down due to parallel clone limit

    Previously, upon reaching the limit of parallel CephFS clones, the rest of the clones would queue up, slowing down the cloning.

    With this enhancement, once the limit of parallel clones is reached, new clone creation requests are rejected. The default parallel clone creation limit is 4.

    To increase the limit, contact customer support.

    (BZ#2190161)
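The difference between queueing and rejecting clone requests can be sketched as follows. This is a hypothetical model rather than Ceph or CSI code: the `CloneTracker` class and the clone names are invented for illustration, and only the default limit of 4 comes from the text above.

```python
class CloneLimitReached(Exception):
    """Raised when a new clone request exceeds the parallel clone limit."""


class CloneTracker:
    """Hypothetical model of a parallel clone limit; not Ceph code."""

    def __init__(self, max_parallel_clones: int = 4) -> None:
        self.max_parallel_clones = max_parallel_clones
        self.in_progress = set()

    def start_clone(self, name: str) -> None:
        # Reject the request outright instead of queueing it behind running clones.
        if len(self.in_progress) >= self.max_parallel_clones:
            raise CloneLimitReached(
                f"{name}: {self.max_parallel_clones} clones already in progress"
            )
        self.in_progress.add(name)

    def finish_clone(self, name: str) -> None:
        self.in_progress.discard(name)


tracker = CloneTracker()
for i in range(4):
    tracker.start_clone(f"clone-{i}")

try:
    tracker.start_clone("clone-4")
except CloneLimitReached as err:
    print(f"rejected: {err}")

tracker.finish_clone("clone-0")
tracker.start_clone("clone-4")  # accepted once a slot is free
```

In this model a rejected request returns immediately, so the caller can retry later instead of waiting in a queue of unknown length.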

5.4. OpenShift Data Foundation console

  • Pods created in openshift-storage by end users no longer cause errors

    Previously, when a pod was created in openshift-storage by an end user it would cause the console topology page to break. This was because pods without any ownerReferences were not considered to be part of the design.

    With this fix, pods without owner references are filtered out, and only pods with correct ownerReferences are shown. This allows for the topology page to work correctly even when pods are arbitrarily added to the openshift-storage namespace.

    (BZ#2245068)

  • Applying an object bucket claim (OBC) no longer causes an error

    Previously, when attaching an OBC to a deployment using the OpenShift Web Console, the error "Address form errors to proceed" was shown even when there were no errors in the form. With this fix, the form validations have been changed, and the error is no longer shown.

    (BZ#2302575)

  • Automatic mounting of service account tokens disabled to increase security

    By default, OpenShift automatically mounts a service account token into every pod, regardless of whether the pod needs to interact with the OpenShift API. This behavior can expose the pod’s service account token to unintended use. If a pod is compromised, the attacker could gain access to this token, leading to possible privilege escalation within the cluster.

    If the default service account token is unnecessarily mounted, and the pod becomes compromised, the attacker can use the service account credentials to interact with the OpenShift API. This access could lead to serious security breaches, such as unauthorized actions within the cluster, exposure of sensitive information, or privilege escalation across the cluster.

    To mitigate this vulnerability, the automatic mounting of service account tokens is disabled unless the application running in the pod explicitly needs it. For the ODF console pod, the fix disables the automatic mounting of the default service account token by setting automountServiceAccountToken: false in the pod or service account definition.

    With this fix, pods no longer automatically mount the service account token unless explicitly needed. This reduces the risk of privilege escalation or misuse of the service account in case of a compromised pod.

    (BZ#2302857)

  • Provider mode clusters no longer have the option to connect to external RHCS cluster

    Previously, during provider mode deployment there was the option to deploy external RHCS. This resulted in an unsupported deployment.

    With this fix, connecting to external RHCS is now blocked so that users do not end up with an unsupported deployment.

    (BZ#2312442)
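The pod filtering described in the "Pods created in openshift-storage by end users no longer cause errors" entry can be sketched on plain pod manifests. This is a minimal illustration, not the console's implementation: it assumes that a "correct" owner reference simply means a non-empty metadata.ownerReferences list, and the pod names are hypothetical.

```python
def has_owner(pod: dict) -> bool:
    """True when the pod manifest lists at least one owner reference."""
    return bool(pod.get("metadata", {}).get("ownerReferences"))


def pods_for_topology(pods: list) -> list:
    """Keep only pods that are owned by another resource."""
    return [pod for pod in pods if has_owner(pod)]


pods = [
    {
        "metadata": {
            "name": "rook-ceph-mon-a-1",
            "ownerReferences": [{"kind": "ReplicaSet", "name": "rook-ceph-mon-a"}],
        }
    },
    {"metadata": {"name": "user-debug-pod"}},  # created by hand, no owner
]

print([pod["metadata"]["name"] for pod in pods_for_topology(pods)])
# ['rook-ceph-mon-a-1']
```

Filtering before rendering means that an unexpected pod is skipped rather than breaking the whole view.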

5.5. Rook

  • Rook.io Operator no longer gets stuck when removing a mon from quorum

    Previously, mon quorum could be lost when removing a mon from quorum due to a race condition. This was because there might not have been enough quorum to complete the removal of the mon from quorum.

    This issue has been fixed, and the Rook.io Operator no longer gets stuck when removing a mon from quorum.

    (BZ#2292435)

  • Network Fence for non-graceful node shutdown taint no longer blocks volume mount on surviving zone

    Previously, Rook created a NetworkFence CR with an incorrect IP address when a node was tainted as out-of-service. Fencing the wrong IP address blocked the application pods from moving to another node when the taint was added.

    With this fix, auto NetworkFence has been disabled in Rook when the out-of-service taint is added on the node, and application pods are no longer blocked from moving to another node.

    (BZ#2315666)
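The quorum concern in the "Rook.io Operator no longer gets stuck when removing a mon from quorum" entry rests on simple majority arithmetic: Ceph monitors need a strict majority of the monitors in the monitor map to form a quorum. The sketch below shows only that arithmetic. The function names are hypothetical, and it does not model the race condition itself or Rook's fix.

```python
def quorum_size(total_mons: int) -> int:
    """Smallest strict majority of the monitors in the monitor map."""
    return total_mons // 2 + 1


def has_quorum(total_mons: int, mons_up: int) -> bool:
    """True when enough monitors are up to form a majority."""
    return mons_up >= quorum_size(total_mons)


print(quorum_size(3))     # 2
print(has_quorum(3, 2))   # True
print(has_quorum(3, 1))   # False
```

With three mons, one can be down and quorum still holds. If a second mon becomes unavailable while the first is being removed, no majority remains to complete the removal.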

5.6. Ceph monitoring

  • Invalid KMIP configurations now treated as errors

    Previously, Thales Enterprise Key Management (KMIP) was not included among the recognized KMS services. As a result, an invalid KMIP configuration was not treated as an error.

    With this fix, the Thales KMIP service has been added as a valid KMS service. This enables KMS services to propagate KMIP configuration statuses correctly, so any misconfigurations are treated as errors.

    (BZ#2271773)

5.7. CSI Driver

  • Pods no longer get stuck during upgrade

    Previously, if there was a node with an empty label, PVC mounting would fail during upgrade.

    With this fix, nodes labeled with an empty value are not considered for the crush_location mount, so they no longer block PVC mounting.

    (BZ#2297265)
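The effect of skipping empty label values can be sketched as follows. This is an illustrative model, not the CSI driver's code. It assumes that the crush_location value is a "|"-separated list of type:value pairs, as in the Ceph kernel client mount option, and that the bucket type is taken from the part of the label key after the last "/". The label keys and the function name are hypothetical.

```python
def build_crush_location(node_labels: dict, crush_label_keys: list) -> str:
    """Build a crush_location value, skipping labels with empty values."""
    pairs = []
    for key in crush_label_keys:
        value = node_labels.get(key, "").strip()
        if not value:
            # Missing or empty label: leave it out rather than emit "region:".
            continue
        # Assumption: the text after the last "/" names the CRUSH bucket type.
        bucket_type = key.rsplit("/", 1)[-1]
        pairs.append(f"{bucket_type}:{value}")
    return "|".join(pairs)


labels = {
    "topology.kubernetes.io/zone": "zone-a",
    "topology.kubernetes.io/region": "",   # label present but empty
}
keys = ["topology.kubernetes.io/zone", "topology.kubernetes.io/region"]

print(build_crush_location(labels, keys))  # zone:zone-a
```

An empty value would otherwise produce a malformed pair such as "region:", which is the kind of input that can make a mount fail.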
