Chapter 6. Bug fixes


This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.15.

6.1. Disaster recovery

Fencing takes more time than expected

Previously, fencing operations took more time than expected. This was because the Ramen hub controller reconciled a couple of times and requeued with a delay, as extra checks were added to ensure that the fencing operation was complete on the managed cluster.

With this fix, the hub controller is registered for updates to the fencing state. As a result, changes to the fencing status are received immediately and the fencing operation takes less time to finish.

(BZ#2249462)

6.2. Multicloud Object Gateway

Multicloud Object Gateway failing to use the new internal certificate after rotation

Previously, the Multicloud Object Gateway (MCG) client was not able to connect to S3 using the new certificate unless the MCG endpoint pods were restarted. Although the MCG endpoint pods loaded the certificate for the S3 service when the pod started, changes to the certificate were not watched, so rotating a certificate did not affect the endpoint until the pods were restarted.

With this fix, a watch that checks for changes to the certificate of the endpoint pods is added. As a result, the pods load the new certificate without the need for a restart.
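
The general reload-on-change pattern can be sketched as follows. This is a minimal illustration, not the MCG implementation: the CertReloader class, its method name, and the choice to detect changes by comparing the file's modification time and size are all assumptions made for the example.

```python
import os


class CertReloader:
    """Hypothetical helper that re-reads a certificate file when it changes."""

    def __init__(self, path):
        self.path = path
        self._stamp = None  # (mtime_ns, size) of the last loaded file
        self.cert = None
        self.reload_if_changed()

    def reload_if_changed(self):
        """Reload the certificate if the file changed; return True if reloaded."""
        st = os.stat(self.path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp == self._stamp:
            return False  # nothing changed, keep serving the current cert
        with open(self.path, "rb") as f:
            self.cert = f.read()
        self._stamp = stamp
        return True
```

A server using such a helper would call reload_if_changed() periodically, or from a file-system notification, instead of reading the certificate only once at startup.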

(BZ#2237903)

Regenerating S3 credentials for OBC in all namespaces

Previously, the Multicloud Object Gateway command for obc regenerate did not have the app-namespace flag, although this flag is available for other object bucket claim (OBC) operations such as creation and deletion of an OBC. With this fix, the app-namespace flag is added to the obc regenerate command. As a result, S3 credentials can be regenerated for OBCs in all namespaces.

(BZ#2242414)

Signature validation failure

Previously, in Multicloud Object Gateway, operations failed because signatures could not be verified. This happened because the AWS C++ software development kit (SDK) does not encode the "=" sign in signature calculations when it appears as part of the key name.

With this fix, MCG’s decoding of the path in the HTTP request is corrected so that the signature is verified successfully.
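
The sketch below illustrates why the client and the server must agree on how the "=" sign in a key name is encoded before signing. It is a toy model under stated assumptions, not MCG's code and not the full AWS signing algorithm: the key, the paths, and the canonical_path and sign helpers are hypothetical, and the "signature" is a plain HMAC over the path string.

```python
import hashlib
import hmac
from urllib.parse import quote, unquote

DEMO_KEY = b"demo-secret"  # hypothetical key, for illustration only


def canonical_path(request_path):
    """Decode the path once, then re-encode it with a single fixed rule."""
    return quote(unquote(request_path), safe="/")


def sign(path_as_signed):
    """Toy signature: HMAC-SHA256 over the path string exactly as given."""
    return hmac.new(DEMO_KEY, path_as_signed.encode(), hashlib.sha256).hexdigest()


plain = "/bucket/report=2024.csv"      # "=" sent as-is
encoded = "/bucket/report%3D2024.csv"  # same key, "=" percent-encoded

# Signing the raw strings disagrees; signing the normalized form agrees.
assert sign(plain) != sign(encoded)
assert sign(canonical_path(plain)) == sign(canonical_path(encoded))
```

If one side signs the path with "=" left as-is and the other signs it with "=" percent-encoded, the two signatures differ even though both refer to the same object key.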

(BZ#2265288)

6.3. Ceph

Metadata server runs out of memory and reports over-sized cache

Previously, the metadata server (MDS) would run out of memory as the standby-replay MDS daemons would not trim their caches.

With this fix, the MDS trims its cache when in standby-replay. As a result, the MDS no longer runs out of memory.

(BZ#2141422)

Ceph is inaccessible after crash or shutdown tests are run

Previously, in a stretch cluster, when a monitor was revived and was in the probing stage for other monitors to receive the latest information such as MonitorMap or OSDMap, it was unable to enter stretch_mode. This prevented it from correctly setting the elector’s disallowed_leaders list, which led to the monitors getting stuck in election, and Ceph eventually became unresponsive.

With this fix, the marked-down monitors are unconditionally added to the disallowed_leaders list. This fixes the problem of newly revived monitors having a different disallowed_leaders set and getting stuck in an election.

(BZ#2241937)

6.4. Ceph container storage interface (CSI)

Snapshot persistent volume claim in pending state

Previously, due to a bug, creation of a ReadOnlyMany (ROX) CephFS persistent volume claim (PVC) from a snapshot source failed when a pool parameter was present in the storage class.

With this fix, the check for the pool parameter is removed as it is not required. As a result, creation of a ROX CephFS PVC from a snapshot source succeeds.

(BZ#2248117)

6.5. OpenShift Data Foundation console

Incorrect tooltip message for the raw capacity card

Previously, the tooltip for the raw capacity card on the block pool page showed an incorrect message. With this fix, the tooltip content for the raw capacity card has been changed to display an appropriate message: "Raw capacity shows the total physical capacity from all the storage pools in the StorageSystem".

(BZ#2237895)

System raw capacity card not showing external mode StorageSystem

Previously, the System raw capacity card did not display the Ceph external StorageSystem, as the Multicloud Object Gateway (MCG) standalone and Ceph external StorageSystems were filtered out from the card.

With this fix, only the StorageSystems that do not report a total capacity, according to the odf_system_raw_capacity_total_bytes metric, are filtered out. As a result, any StorageSystem that reports its total raw capacity is displayed on the System raw capacity card, and only the StorageSystems that do not report a total capacity are omitted from the card.
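
The filtering rule can be sketched as follows. This is an illustration of the rule only, not the console's code: the function name, the StorageSystem names, the capacity values, and the use of None to represent "no sample reported" are assumptions made for the example.

```python
def systems_on_capacity_card(total_bytes_by_system):
    """Return the systems that report a total raw capacity sample.

    total_bytes_by_system maps a StorageSystem name to the value of
    odf_system_raw_capacity_total_bytes, or None when no sample exists.
    """
    return {
        name: total
        for name, total in total_bytes_by_system.items()
        if total is not None
    }


samples = {
    "ceph-internal": 3 * 2**40,   # hypothetical names and values
    "ceph-external": 10 * 2**40,
    "mcg-standalone": None,       # assumed here to report no capacity sample
}
shown = systems_on_capacity_card(samples)
```

In this sketch the decision depends only on whether a capacity value is present, not on the type of the StorageSystem.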

(BZ#2257441)

6.6. Rook

Provisioning object bucket claim with the same bucket name

Previously, for the green field use case, creation of two object bucket claims (OBCs) with the same bucket name was successful from the user interface. Even though two OBCs were created, the second one pointed to invalid credentials.

With this fix, creation of the second OBC with the same bucket name is blocked and it is no longer possible to create two OBCs with the same bucket name for green field use cases.

(BZ#2228785)

Change of the parameter name for the Python script used in external mode deployment

Previously, while deploying OpenShift Data Foundation using Ceph storage in external mode, the Python script used to extract Ceph cluster details had a parameter name, --cluster-name, which could be misunderstood to be the name of the Ceph cluster. However, it represented the name of the OpenShift cluster that the Ceph administrator provided.

With this fix, the --cluster-name flag is changed to --k8s-cluster-name. The legacy flag --cluster-name is also supported to cater to the upgraded clusters used in automation.
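
One common way to rename a command-line flag while still accepting the old name is to register both option strings for the same destination. The sketch below shows this with Python's argparse module. It is not the source of the actual script: the build_parser function, the description, the help text, and the example cluster names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that accepts the new flag and the legacy alias."""
    parser = argparse.ArgumentParser(
        description="Hypothetical stand-in for the external mode script"
    )
    parser.add_argument(
        "--k8s-cluster-name",   # new, unambiguous name
        "--cluster-name",       # legacy alias kept for existing automation
        dest="k8s_cluster_name",
        default="",
        help="Name of the OpenShift cluster (not the Ceph cluster)",
    )
    return parser


new_style = build_parser().parse_args(["--k8s-cluster-name", "ocp-prod"])
old_style = build_parser().parse_args(["--cluster-name", "ocp-prod"])
```

Both invocations store the value under the same attribute, so the rest of the script does not need to know which spelling was used.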

(BZ#2244609)

Incorrect pod placement configurations while detecting Multus Network Attachment Definition CIDRs

Previously, some OpenShift Data Foundation clusters failed when the network "canary" pods were scheduled on nodes without Multus cluster networks, as OpenShift Data Foundation did not process pod placement configurations correctly while detecting Multus Network Attachment Definition CIDRs.

With this fix, OpenShift Data Foundation correctly processes pod placement for Multus network "canary" pods. As a result, network "canary" scheduling errors are no longer experienced.

(BZ#2249678)

Deployment strategy to avoid rook-ceph-exporter pod restart

Previously, the rook-ceph-exporter pod restarted multiple times on a freshly installed HCI cluster, which resulted in the exporter pod crashing and the Ceph health showing the WARN status. This was because restarting the exporter using the RollingUpdate strategy caused a race condition that crashed the exporter.

With this fix, the deployment strategy is changed to Recreate. As a result, exporter pods no longer crash and the Ceph health no longer shows the WARN status.

(BZ#2250995)

rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a pod stuck in CrashLoopBackOff state

Previously, the rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a pod was stuck in CrashLoopBackOff state as the RADOS Gateway (RGW) multisite zonegroup was not getting created and fetched, and the error handling reported incorrect text.

With this release, the error handling bug in the multisite configuration is fixed, and fetching the zonegroup is improved by fetching it for the particular rgw-realm that was created earlier. As a result, the multisite configuration and the rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a pod are created successfully.

(BZ#2253185)

6.7. Ceph monitoring

TargetDown alert reported for ocs-metrics-exporter

Previously, the metrics endpoint of ocs-metrics-exporter was unresponsive as the persistent volume resync by ocs-metrics-exporter was blocked indefinitely.

With this fix, the blocking operations are removed from the persistent volume resync in ocs-metrics-exporter and the metrics endpoint is responsive. Also, the TargetDown alert for ocs-metrics-exporter no longer appears.

(BZ#2168042)

Label references of object bucket claim alerts

Previously, the label for the object bucket claim alerts was not displayed correctly as the format of the label-template was wrong. Also, a blank object bucket claim name was displayed and the description text was incomplete.

With this fix, the format is corrected. As a result, the description text is correct and complete, with the appropriate object bucket claim name.

(BZ#2188032)

Discrepancy in storage metrics

Previously, the capacity of a pool was reported incorrectly as an incorrect metrics query was used in the Raw Capacity card in the Block Pool dashboard.

With this fix, the metrics query in the user interface is updated. As a result, the total capacity of a block pool is reported correctly.

(BZ#2252035)

Add managedBy label to rook-ceph-exporter metrics and alerts

Previously, the metrics generated by rook-ceph-exporter did not have the managedBy label. As a result, the OpenShift console user interface could not identify which StorageSystem the metrics were generated from.

With this fix, the managedBy label, which has the name of the StorageSystem as a value, is added through the OCS operator to the storage cluster’s Monitoring spec. This spec is read by the Rook operator and it relabels the ceph-exporter’s ServiceMonitor endpoint labels. As a result, all the metrics generated from this exporter will have the new label managedBy.
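
The effect of the relabeling on individual metric samples can be pictured with the sketch below. It models only the outcome, not the Rook operator or the ServiceMonitor mechanics: the relabel function, the sample structure, the metric name, and the StorageSystem name are hypothetical.

```python
def relabel(samples, storage_system):
    """Return copies of the metric samples with a managedBy label added."""
    return [
        {**sample, "labels": {**sample["labels"], "managedBy": storage_system}}
        for sample in samples
    ]


# Hypothetical samples as an exporter might emit them, before relabeling.
raw = [
    {"name": "example_metric", "labels": {"pool": "a"}, "value": 1.0},
    {"name": "example_metric", "labels": {"pool": "b"}, "value": 2.0},
]
labeled = relabel(raw, "example-storagesystem")
```

With the label present on every sample, a consumer can group or filter metrics by StorageSystem without any other knowledge of where they came from.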

(BZ#2255491)

6.8. Must gather

Must gather logs not collected after upgrade

Previously, the must-gather tool failed to collect logs after the upgrade as the Collection started <time> message was seen twice.

With this fix, the must-gather tool is updated to run the pre-install script only once. As a result, the tool is able to collect the logs successfully after upgrade.

(BZ#2255240)
