Chapter 6. Bug fixes
This section describes notable bug fixes introduced in Red Hat OpenShift Data Foundation 4.9.
Multicloud Object Gateway storage class deleted during uninstall
Previously, the Multicloud Object Gateway (MCG) storage class, which was deployed as part of the OpenShift Data Foundation deployment, was not deleted during uninstall.
With this update, the Multicloud Object Gateway (MCG) storage class is removed when OpenShift Data Foundation is uninstalled.
OpenShift Container Platform alert when OpenShift Container Storage quorum lost
Previously, the CephMonQuorumAtRisk alert was fired when the mon quorum was about to be lost, but no alert was triggered after the quorum was lost. As a result, no notification was sent when the mon quorum was completely lost.
With this release, a new alert, CephMonQuorumLost, is introduced. This alert is triggered when only one node is left and a single mon is running on it. At this point the cluster is in an unrecoverable state, and the alert serves as a notification of the issue.
(BZ#1944513)
Reduce the mon_data_avail_warn threshold from 30% to 15%
Previously, the mon_data_avail_warn alert was triggered when the available space at the mon store was less than 30%, which did not match the 15% threshold of OpenShift Container Platform’s garbage collector for images. With this release, the alert is shown when the available storage at the mon store location is less than 15% instead of 30%.
(BZ#1964055)
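mon_data_avail_warn is a regular Ceph monitor option, so the effective threshold can be checked or overridden from the Ceph toolbox. The commands below are an illustrative sketch, not steps required by this fix:

```
# Inspect the current warning threshold (percentage of available mon store space)
ceph config get mon mon_data_avail_warn

# Override it cluster-wide, for example back to the old 30% value
ceph config set mon mon_data_avail_warn 30
```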
OSD pods do not log anything if the initial deployment is OpenShift Container Storage 4.4
Previously, object storage daemon (OSD) logs were not generated when OpenShift Container Storage 4.4 was deployed. With this update, the OSD logs are generated correctly.
(BZ#1974343)
Multicloud Object Gateway was not able to initialize in a fresh deployment
Previously, after the internal database change from MongoDB to PostgreSQL, duplicate entities that should have been unique could be added to the database, because the uniqueness checks that MongoDB had enforced were not carried over. As a result, Multicloud Object Gateway (MCG) was not working. With this release, duplicate entities are prevented.
(BZ#1975645)
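The class of fix described above is typically implemented with a database-level uniqueness constraint. The following standalone sketch (using Python's built-in sqlite3 module, not MCG's actual schema) shows how such a constraint makes the database itself reject a duplicate insert:

```python
import sqlite3

# In-memory database standing in for the MCG PostgreSQL store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute("INSERT INTO accounts (name) VALUES ('admin')")

try:
    # A second row with the same unique value is rejected by the database itself,
    # rather than relying on the application to check first.
    conn.execute("INSERT INTO accounts (name) VALUES ('admin')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print("duplicate allowed:", duplicate_allowed)  # duplicate allowed: False
```

With the constraint in place, a race between two concurrent writers can at worst surface as an IntegrityError to one of them, never as two conflicting rows.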
PVC is restored when using two different backend paths for the encrypted parent
Previously, when restoring a persistent volume claim (PVC) from a volume snapshot into a different storage class with a different encryption KMSID, the restored PVC went into the Bound state but failed to attach to a Pod. This was because the encryption passphrase was copied using the parent PVC’s storage class encryption KMSID config. With this release, the restored PVC’s encryption passphrase is copied using the correct encryption KMSID config from the destination storage class. Hence, the PVC is successfully restored into a storage class with a different encryption KMSID than its parent PVC.
(BZ#1975730)
Deletion of data is allowed when the storage cluster is full
Previously, when the storage cluster was full, the Ceph Manager hung on checking pool permissions while reading the configuration file. The Ceph Metadata Server (MDS) did not allow write operations when a Ceph OSD was full, resulting in an ENOSPACE error. As a result, when the storage cluster hit the full ratio, users could not delete data to free space using the Ceph Manager and the ceph-volume plugin.
With this release, the new FULL feature is introduced. This feature gives the Ceph Manager the FULL capability, which bypasses the Ceph OSD full check. Additionally, the client_check_pool_permission option can be disabled. Because the Ceph Manager has the FULL capability, the MDS no longer blocks Ceph Manager calls. This allows the Ceph Manager to free up space by deleting subvolumes and snapshots when the storage cluster is full.
(BZ#1978769)
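client_check_pool_permission is a standard CephFS client option; as a hedged sketch (not a step mandated by this release), it could be disabled from the Ceph toolbox as follows:

```
# Skip client-side pool permission checks so a full cluster does not hang on
# the permission lookup (review the security implications before changing this).
ceph config set client client_check_pool_permission false
```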
Keys are completely destroyed in Vault after deleting encrypted persistent volume claims (PVCs) while using the kv-v2 secret engine
HashiCorp Vault added a feature to the key-value (kv-v2) secret store where the contents of a deleted key can still be recovered unless the key's metadata is removed in a separate step. When key-value v2 storage was used for secrets in HashiCorp Vault, deleting a volume did not remove the metadata of the encryption passphrase from the KMS.
With this update, the keys in HashiCorp Vault are completely destroyed by default when a PVC is deleted. You can set the new configuration option VAULT_DESTROY_KEYS to false to restore the previous behavior. In that case, the metadata of the keys is kept in HashiCorp Vault so that the encryption passphrase of the removed PVC can be recovered.
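As an illustration, the option would sit alongside the other Vault connection settings for the cluster. The connection name and the surrounding keys below are assumptions modeled on a typical KMS connection entry; VAULT_DESTROY_KEYS is the only option this fix introduces:

```
{
  "1-vault": {
    "KMS_PROVIDER": "vaulttokens",
    "KMS_SERVICE_NAME": "vault",
    "VAULT_ADDR": "https://vault.example.com:8200",
    "VAULT_BACKEND_PATH": "secret",
    "VAULT_DESTROY_KEYS": "false"
  }
}
```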
Multicloud Object Gateway object bucket creation stuck in the Pending phase
Previously, after the internal database change from MongoDB to PostgreSQL, duplicate entries that should have been unique could be added to the database, because the uniqueness checks that MongoDB had enforced were not carried over. As a result, creation of new resources such as buckets and backing stores failed. With this release, duplicate entries are prevented.
(BZ#1980299)
Deletion of CephBlockPool gets stuck and blocks the creation of new pools
Previously, in a Multus-enabled cluster, the Rook Operator did not have access to the object storage daemon (OSD) network because it lacked the required network annotations. As a result, rbd commands run during a pool cleanup would hang because the OSDs could not be contacted.
With this release, the operator proxies the rbd command through a sidecar container in the mgr pod, and it runs successfully during the pool cleanup.
Standalone Multicloud Object Gateway failing to connect
Previously, the Multicloud Object Gateway (MCG) CR was not updated properly because of the change in the internal DB from MongoDB to PostgreSQL. This caused issues in certain flows; as a result, MCG components were not able to communicate with one another, and MCG failures occurred on upgrade.
With this release, the MCG CR is updated properly.
Monitoring spec is getting reset in CephCluster resource in external mode
Previously, when OpenShift Container Storage was upgraded, the monitoring endpoints were reset in the external CephCluster’s monitoring spec. This was unexpected behavior and was caused by the way the monitoring endpoints were passed to the CephCluster. With this update, the way the endpoints are passed is changed: before the CephCluster is created, the endpoints are read directly from the JSON secret rook-ceph-external-cluster-details, and the CephCluster spec is updated. As a result, the monitoring endpoint spec in the CephCluster is updated with the appropriate values even after an OpenShift Container Storage upgrade.
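To see the endpoint data the operator reads, the secret named above can be inspected directly. This is an illustrative sketch; the namespace and the data key are assumptions for a typical external-mode deployment:

```
# Dump the external cluster details that are read before the CephCluster is
# created (namespace and data key may differ in your deployment).
oc get secret rook-ceph-external-cluster-details -n openshift-storage \
  -o jsonpath='{.data.external_cluster_details}' | base64 -d
```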
CrashLoopBackOff state of noobaa-db-pg-0 pod when enabling hugepages
Previously, enabling hugepages on an OpenShift Container Platform cluster caused the Multicloud Object Gateway (MCG) database pod to go into a CrashLoopBackOff state. This was due to incorrect initialization of PostgreSQL. With this release, the MCG database pod’s initialization of PostgreSQL is fixed.
Multicloud Object Gateway unable to create new object bucket claims
Previously, performance degradation when working against the Multicloud Object Gateway (MCG) DB caused back pressure on all the MCG components, which resulted in failures to execute flows within the system, such as configuration flows and I/O flows.
With this update, the most time-consuming queries are fixed, the DB clears quickly, and no back pressure is created.
Buckets fail during creation because of an issue with checking attached resources
Previously, because of a problem in checking the resources attached to a bucket during its creation, the bucket would fail to be created. With this update, the conditions in the resource validation during bucket creation have been fixed, and buckets are created as expected.
NooBaa Operator still checks for the noobaa-db service after upgrading
Previously, when OpenShift Container Storage was updated from version 4.6, both the old and the new noobaa-db StatefulSets had to be retained for migration purposes, and the code still supports both names. Due to a small issue in the code, the operator kept checking the status of the old noobaa-db StatefulSet even though it was no longer relevant, which generated a failure message.
With this update, the operator stops checking the status of the old noobaa-db StatefulSet.
(BZ#2008821)
Changes to the config maps of the Multicloud Object Gateway (MCG) DB pod do not get reconciled after upgrade
Previously, changes to the config maps of the MCG DB pod were not applied after an upgrade. With this update, the flow has been fixed to properly take the variables from the config maps for the DB pod.
(BZ#2012930)