Chapter 7. Known issues
This section describes known issues in Red Hat OpenShift Data Foundation 4.9.
odf-operator is missing when OpenShift Container Storage is upgraded from version 4.8 to 4.9
Currently, while upgrading the ocs-operator, if you change the channel in the OpenShift Container Storage subscription without installing the odf-operator, the cluster will only have OpenShift Data Foundation and Multicloud Object Gateway (MCG) installed, and the odf-operator will be missing from the cluster.
Workaround: Install the odf-operator from the graphical user interface (GUI) or backend. Ensure that the subscription name is odf-operator if you create it via the backend.
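If you create the subscription from the backend, a minimal sketch of the Subscription resource is shown below; the channel, catalog source, and approval values are assumptions and should be verified against your cluster's catalog:

    $ cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: odf-operator              # the subscription name must be odf-operator
      namespace: openshift-storage
    spec:
      channel: stable-4.9             # assumed channel for OpenShift Data Foundation 4.9
      name: odf-operator
      source: redhat-operators        # assumed catalog source
      sourceNamespace: openshift-marketplace
      installPlanApproval: Automatic  # assumed approval strategy
    EOF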
Multicloud Object Gateway insecure storage account does not support TLS 1.2
Multicloud Object Gateway (MCG) does not support a Microsoft Azure storage account configured with Transport Layer Security (TLS) 1.2. As a result, you cannot create the default backing store or any new backing store on a storage account that enforces a TLS 1.2-only policy.
Critical alert notification is sent after installation of arbiter storage cluster, when Ceph object user for cephobjectstore fails to be created during storage cluster reinstallation
In a storage cluster containing a CephCluster and one or more CephObjectStores, if the CephCluster resource is deleted before all of the CephObjectStore resources are fully deleted, the Rook Operator can still keep connection details about the CephObjectStores in memory. If the same CephCluster and CephObjectStores are re-created, the CephObjectStores might enter a Failed state.
To avoid this issue, delete the CephObjectStores completely before removing the CephCluster. If you do not want to wait for the CephObjectStores to be deleted, restart the Rook Operator (by deleting the Operator Pod) after the uninstall to avoid the issue. If you are actively experiencing this issue, restart the Rook Operator to resolve it; this clears the Operator’s memory of stale CephObjectStore connection details.
(BZ#1974344)
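For example, assuming the default openshift-storage namespace and the standard app=rook-ceph-operator pod label, the cleanup and restart might look like the following sketch:

    # Delete the CephObjectStore resources completely before removing the CephCluster
    $ oc delete cephobjectstore <objectstore-name> -n openshift-storage

    # Restart the Rook Operator by deleting its pod; the replacement pod starts
    # with a clean in-memory state
    $ oc delete pod -l app=rook-ceph-operator -n openshift-storage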
Poor performance of stretch clusters on CephFS
Workloads with many small metadata operations might exhibit poor performance because of the arbitrary placement of metadata server (MDS) on multi-site OpenShift Data Foundation clusters.
rook-ceph-operator-config ConfigMap is not updated when OpenShift Container Storage is upgraded from version 4.5 to another version
ocs-operator uses the rook-ceph-operator-config ConfigMap to configure rook-ceph-operator behaviors; however, it only creates the ConfigMap once and then does not reconcile it. As a result, the default values for the product are not updated as they evolve.
Workaround: Administrators can manually change the rook-ceph-operator-config values.
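For example, administrators can edit the ConfigMap directly or patch a single key; the ROOK_LOG_LEVEL key and value below are only illustrative:

    $ oc edit configmap rook-ceph-operator-config -n openshift-storage

    # Or patch one setting (illustrative key and value):
    $ oc patch configmap rook-ceph-operator-config -n openshift-storage \
        --type merge -p '{"data":{"ROOK_LOG_LEVEL":"DEBUG"}}'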
Automate the creation of cephobjectstoreuser for object bucket claim metrics collector
Currently, the object bucket claim (OBC) metrics collection fails because the ocs-metrics-exporter expects a Ceph object store user named prometheus-user.
Workaround: Manually create prometheus-user and provide the appropriate permissions after the storage cluster creation. For more information, refer to the Prerequisites section of the Knowledge Base article https://access.redhat.com/articles/6541861.
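One possible way to create the user is with a CephObjectStoreUser resource, as sketched below; the object store name is an assumption (the default is typically ocs-storagecluster-cephobjectstore), and the exact permissions the user requires are described in the Knowledge Base article:

    $ cat <<EOF | oc apply -f -
    apiVersion: ceph.rook.io/v1
    kind: CephObjectStoreUser
    metadata:
      name: prometheus-user
      namespace: openshift-storage
    spec:
      store: ocs-storagecluster-cephobjectstore   # assumed default object store name
      displayName: prometheus-user
    EOF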
StorageCluster and StorageSystem ocs-storagecluster are in error state for a few minutes when installing StorageSystem
During StorageCluster creation, there is a small window of time where it appears in an error state before moving on to a successful or ready state. This is an intermittent but expected behavior that usually resolves itself.
Workaround: Wait and watch status messages or logs for more information.
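For example, you can watch the resource status and inspect events while the installation settles; the resource name ocs-storagecluster matches the default name used in this issue:

    $ oc get storagesystem,storagecluster -n openshift-storage -w
    $ oc describe storagecluster ocs-storagecluster -n openshift-storage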
Tenant config does not override backendpath if the key is specified in upper case
Key Management Service (KMS) provider options set in a Tenant's namespace are more advanced than the key/value settings that the OpenShift Container Storage user interface supports. As a result, the configuration options for KMS providers set in a Tenant's namespace need to be formatted in camel case instead of upper case. This can be confusing for users who have access to both the KMS provider configuration in the openshift-storage namespace and the configuration in a Tenant's namespace, because the options in the openshift-storage namespace are in upper case, whereas the options in the Tenant's namespace are in camel case.
Workaround: Use camel case formatting for the KMS provider options.
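As a sketch of the expected formatting, and assuming the tenant override ConfigMap is named ceph-csi-kms-config (an assumption; verify the name for your deployment), the backend path key is written in camel case rather than upper case:

    $ cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ceph-csi-kms-config      # assumed tenant override ConfigMap name
      namespace: <tenant-namespace>
    data:
      vaultBackendPath: "secret/"    # camel case key, not VAULT_BACKEND_PATH
    EOF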
Deleting a protected application that has been failed over and later relocated does not delete the RADOS block device image on the secondary or failover site
Deleting a disaster recovery (DR) protected workload may leak RADOS block device (RBD) images on the secondary DR cluster. The deleted images would then occupy space on the secondary cluster. To resolve this issue, use a toolbox pod to detect and clean up the images on the secondary cluster that are no longer in use for DR protection. This workaround ensures space reclamation on the secondary cluster.
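A sketch of that cleanup from the toolbox pod, assuming the rook-ceph-tools deployment is enabled and the default ocs-storagecluster-cephblockpool pool name; confirm that an image is genuinely unused before removing it:

    $ oc rsh -n openshift-storage deployment/rook-ceph-tools

    # List RBD images in the pool and check whether an image still has watchers
    sh-4.4$ rbd ls -p ocs-storagecluster-cephblockpool
    sh-4.4$ rbd status ocs-storagecluster-cephblockpool/<image-name>

    # Remove an image that is confirmed to be unused
    sh-4.4$ rbd rm ocs-storagecluster-cephblockpool/<image-name>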
Failover action reports RADOS block device image mount failed on the pod with RPC error still in use
Failing over a disaster recovery (DR) protected workload may result in pods that use the volume on the failover cluster being stuck reporting that the RADOS block device (RBD) image is still in use. This prevents the pods from starting up for a long duration (up to several hours).
Relocate action results in PVCs in Terminating state and workload not moved to the preferred cluster
Relocating a disaster recovery (DR) protected workload can result in the workload not stopping on the current primary cluster and the PVCs of the workload remaining in the Terminating state. This prevents the pods and PVCs from being relocated to the preferred cluster. To recover, perform a failover action to move the workload to the preferred cluster. The workload is recovered on the preferred cluster, but the action may include a loss of data because it is a failover.
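For reference, the failover is typically triggered through the workload's DRPlacementControl resource; the following is only a sketch, and the resource, field names, and values should be verified against the disaster recovery documentation for your version:

    $ oc edit drplacementcontrol <drpc-name> -n <application-namespace>

    # In the spec, set the target cluster and the action, for example:
    #   spec:
    #     failoverCluster: <preferred-cluster-name>
    #     action: Failover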
Failover action reports RADOS block device image mount failed on the pod with RPC error fsck
Failing over a disaster recovery (DR) protected workload may result in pods not starting with volume mount errors that state the volume has file system consistency check (fsck) errors. This prevents the workload from failing over to the failover cluster.
Overprovision Level Policy Control does not support custom storage class
OpenShift Data Foundation limits the storage classes allowed in overprovision-control to Ceph sub-types only. As a result, if a user-defined storage class is used in overprovision-control, the StorageCluster CRD is marked as invalid and that storage class cannot be used with overprovision-control.
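As an illustration only, an overprovision control entry in the StorageCluster spec must reference a Ceph-backed storage class such as the default ocs-storagecluster-ceph-rbd; the field names in this sketch are assumptions and may differ between versions:

    $ oc edit storagecluster ocs-storagecluster -n openshift-storage

    # Illustrative entry (field names are a sketch):
    #   spec:
    #     overprovisionControl:
    #     - capacity: 100Gi
    #       storageClassName: ocs-storagecluster-ceph-rbd   # must be a Ceph sub-type class
    #       quotaName: rbd-overprovision-quota
    #       selector:
    #         labels:
    #           matchLabels:
    #             storagequota: enabled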