Chapter 6. Bug fixes
This section describes the notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.19.
6.1. Multicloud Object Gateway
Using PostgreSQL through environment variables
Previously, PostgreSQL connection details were passed as environment variables, which created a risk of exposing those connection details.
With this fix, the PostgreSQL secret is passed as a volume mount instead of an environment variable.
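The following is a minimal sketch of this pattern, assuming a hypothetical secret name noobaa-db-conn and mount path /etc/noobaa-db; it illustrates the approach rather than the operator's actual code.

```go
// Illustrative only: mount the database secret as a volume so connection
// details never appear in the container environment. The secret name,
// mount path, and image are assumptions, not the operator's real values.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func dbContainerWithSecretVolume() (corev1.Container, corev1.Volume) {
	vol := corev1.Volume{
		Name: "db-conn",
		VolumeSource: corev1.VolumeSource{
			Secret: &corev1.SecretVolumeSource{SecretName: "noobaa-db-conn"},
		},
	}
	c := corev1.Container{
		Name:  "core",
		Image: "noobaa-core:example",
		VolumeMounts: []corev1.VolumeMount{
			{Name: "db-conn", MountPath: "/etc/noobaa-db", ReadOnly: true},
		},
		// Previously the equivalent information was injected through Env,
		// where it is visible to anyone who can describe the pod.
	}
	return c, vol
}

func main() {
	c, vol := dbContainerWithSecretVolume()
	fmt.Println(c.Name, vol.Name)
}
```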
Backingstore is stuck in Rejected phase due to IO Errors
Previously, when Multicloud Object Gateway (MCG) detected errors while accessing data on a backing store, MCG disconnected the backing store to force it to reload and clear the issue. This caused the backing store to move to a Rejected state and stop serving because of false positives.
With this fix, the disconnection behavior of the backing store is fine-tuned to avoid false positives.
"ap-southeast-7" region is missing from noobaa-operator code
Previously, the default backing store was not created when deployed in the new ap-southeast-7 and mx-central-1 AWS regions because these regions were missing from the MCG operator's list of supported regions.
With this fix, the two regions were added to the list of supported regions.
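A simplified illustration of this kind of fix follows; the variable and function names are assumptions, not the actual noobaa-operator source.

```go
// Illustrative only: a lookup table of AWS regions in which the operator
// can create the default backing store. Names are hypothetical.
package main

import "fmt"

var supportedRegions = map[string]bool{
	"us-east-1": true,
	"eu-west-1": true,
	// ...other previously supported regions...
	"ap-southeast-7": true, // added by this fix
	"mx-central-1":   true, // added by this fix
}

func defaultBackingStoreSupported(region string) bool {
	return supportedRegions[region]
}

func main() {
	fmt.Println(defaultBackingStoreSupported("ap-southeast-7")) // true after the fix
}
```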
Multicloud Object Gateway Prometheus tags not updated after bucket creation
Previously, updated bucket tagging was not reflected in the Prometheus metrics exported by MCG.
With this fix, the updated tagging is picked up while collecting the metrics and is exposed to Prometheus.
Multicloud Object Gateway backing store PV-pool Rejected - setting permissions of /noobaa_storage
Previously, when there were many blocks under the noobaa_storage directory, the pod took a long time to start after every restart. This was because the MCG PV pool pod recursively changed the permissions of the noobaa_storage directory under the PV before starting.
With this fix, the permission change was removed as it is no longer needed.
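The sketch below shows, in simplified form, why such a recursive permission change scales with the number of blocks; it is a stand-in under assumed paths and IDs, not the actual MCG init code.

```go
// Illustrative only: a recursive ownership change issues one syscall per
// file, so with millions of blocks under noobaa_storage it dominates pod
// startup time. Paths and IDs here are hypothetical.
package main

import (
	"io/fs"
	"log"
	"os"
	"path/filepath"
)

func chownRecursive(root string, uid, gid int) error {
	return filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		return os.Chown(path, uid, gid)
	})
}

func main() {
	// Removing this step (the actual fix) avoids the slow walk entirely.
	if err := chownRecursive("/noobaa_storage", 10001, 0); err != nil {
		log.Fatal(err)
	}
}
```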
Postgres queries on object metadata and data blocks take too long to complete
Previously, when the MCG DB was large, the entire system experienced slowness and operations failed because the Agent Blocks Reclaimer in MCG searched the MCG DB for deleted, unreclaimed blocks with a query that was not indexed.
With this fix, a new index is added to the MCG DB to optimize the query.
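The following sketch shows the general shape of such a fix; the table name, column layout, and index definition are hypothetical, not the real MCG schema.

```go
// Illustrative only: create a partial index so the reclaimer's lookup of
// deleted, unreclaimed blocks becomes an index scan instead of a full
// table scan. Schema names here are assumptions.
package mcgdb

import (
	"context"
	"database/sql"
)

const createReclaimIndex = `
CREATE INDEX IF NOT EXISTS idx_blocks_deleted_unreclaimed
    ON datablocks ((data->>'deleted'))
    WHERE data->>'reclaimed' IS NULL;`

// ensureReclaimIndex would typically run once during a DB upgrade step.
func ensureReclaimIndex(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx, createReclaimIndex)
	return err
}
```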
MCG long query causing timeouts on endpoints
Previously, slowness was seen in all flows that used the MCG DB because the object reclaimer ran with short delays between runs and there were no optimized indexes for its queries. This caused extra load on the MCG DB.
With this fix, the timeout interval between object reclaimer runs and the indexes for its queries are changed. As a result, slowness is no longer seen in the flows that use the MCG DB.
6.2. Ceph container storage interface (CSI) Driver
kubelet_volume metrics not reported for some CephFS PVC - NodeGetVolumeStats: health-check has not responded
Previously, PV health metrics were not reported for certain CephFS pods even though the volumes were mounted, because an issue in the Ceph CSI driver caused the PV health check to return an error for CephFS pods in certain scenarios.
With this fix, the issue in the Ceph CSI driver is resolved and, as a result, all health metrics for CephFS PVs are reported successfully in all scenarios.
6.3. Ceph container storage interface (CSI) addons
ceph-csi-controller-manager pods OOMKilled
Previously, when the ReclaimSpace operation was run on PVCs provisioned by a driver other than the RADOS block device (RBD) driver, the csi-addons controller crashed with a panic caused by incorrect logging.
With this fix, the logging that caused the panic was corrected and, as a result, the csi-addons controller handles this scenario gracefully.
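The sketch below shows one way an incorrect logging call can panic when handed a volume from an unexpected driver; it is a heavily simplified stand-in, and the handle format and function names are assumptions rather than the actual csi-addons code.

```go
// Illustrative only: building a log message under the assumption that the
// volume handle has an RBD-style length panics for shorter handles from
// other drivers, crashing the controller.
package main

import (
	"fmt"
	"log"
)

func logReclaimRequest(volumeHandle string) {
	// BUG: assumes the handle is at least 16 characters long.
	log.Printf("reclaim space requested for volume %s", volumeHandle[:16])
}

func logReclaimRequestFixed(volumeHandle string) {
	// Fixed: log the handle as-is without assuming its format.
	log.Printf("reclaim space requested for volume %q", volumeHandle)
}

func main() {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("controller would have crashed:", r)
			logReclaimRequestFixed("short-handle")
		}
	}()
	logReclaimRequest("short-handle") // panics: slice bounds out of range
}
```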
6.4. Ceph monitoring
Prometheus rule evaluation errors
Previously, Prometheus query evaluation failed with the error 'many-to-many matching not allowed: matching labels must be unique on one side' because a unique label was missing from the alert query.
With this fix, the unique 'managedBy' label is added to the query, which makes the query result unique and resolves the issue.