Chapter 6. Bug fixes


This section describes bugs with significant user impact that were fixed in this release of Red Hat Ceph Storage. In addition, it includes descriptions of fixed known issues found in previous versions.

6.1. The Cephadm utility

.rgw.root pool is no longer automatically created by the Ceph Object Gateway multisite check

Previously, cephadm performed a Ceph Object Gateway multisite check to help signal regressions in the release. This check caused the .rgw.root pool to be created, and to be recreated whenever it was deleted, so users were stuck with the .rgw.root pool even when they were not using the Ceph Object Gateway.

With this fix, the check that created the pool is removed and the pool is no longer created automatically. Users who already have the pool but do not want it can now delete it without it being recreated, and users who do not have the pool do not have it created for them.
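
For example, administrators who no longer need the pool can now remove it after enabling pool deletion on the monitors. This is only an illustrative sketch using the standard Ceph pool-deletion commands:

# Allow pool deletion (disabled by default), remove the unused pool, then re-disable deletion
ceph config set mon mon_allow_pool_delete true
ceph osd pool delete .rgw.root .rgw.root --yes-i-really-really-mean-it
ceph config set mon mon_allow_pool_delete false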

(BZ#2090395)

6.2. Ceph File System

Reintegrating stray entries does not fail when the destination directory is full

Previously, Ceph Metadata Servers reintegrated an unlinked file if it was still referenced, that is, if the deleted file had hard links or was part of a snapshot. The reintegration, which is essentially an internal rename operation, failed if the destination directory was full. This resulted in the Ceph Metadata Server failing to reintegrate stray or deleted entries.

With this release, free-space checks are ignored when reintegrating stray entries, so stray entries are reintegrated even when the destination directory is full.
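
As a hedged illustration, the number of stray entries that an MDS is tracking, and how many have been reintegrated, can be inspected through its performance counters. The file system name and rank below, and the exact counter names, are assumptions for this sketch:

# Query stray-related counters on the active MDS (rank 0 of file system "cephfs" assumed)
ceph tell mds.cephfs:0 perf dump mds_cache | grep -E 'num_strays|strays_reintegrated'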

(BZ#2041572)

MDS daemons no longer crash when receiving metrics from new clients

Previously, in certain scenarios, newer clients were used with older CephFS clusters. For example, while upgrading an older CephFS cluster, cephadm or the Ceph Manager used newer clients to perform checks, tests, or configuration against the old cluster. Due to this, the MDS daemons crashed when they received unknown metrics from the newer clients.

With this fix, the libceph clients send to the MDS, by default, only those metrics that the MDS daemons support. An option is also added to force-enable all the metrics when users consider it safe to do so.

(BZ#2081914)

Ceph Metadata Server no longer crashes during concurrent lookup and unlink operations

Previously, an assertion in the code was based on an incorrect assumption and was hit during concurrent lookup and unlink operations from a Ceph client, causing the Ceph Metadata Server to crash.

The latest fix moves the assertion to the relevant place where its assumption holds during concurrent lookup and unlink operations. As a result, the Ceph Metadata Server continues to serve Ceph client operations without crashing.

(BZ#2093064)

6.3. Ceph Object Gateway

Usage of MD5 for non-cryptographic purposes in a FIPS environment is allowed

Previously, in a FIPS-enabled environment, the use of the MD5 digest was not allowed by default unless it was explicitly excluded for non-cryptographic purposes. Due to this, a segfault occurred during the S3 complete multipart upload operation.

With this fix, the use of MD5 for non-cryptographic purposes is explicitly allowed for S3 complete multipart PUT operations in a FIPS environment, and S3 multipart operations can be completed.
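
For example, a multipart upload that previously could trigger the segfault when completed against a FIPS-enabled Ceph Object Gateway now succeeds. The endpoint, bucket, key, and part layout below are placeholders used only to illustrate the flow with the AWS CLI:

# Start a multipart upload, upload one part, then complete it (placeholder values)
aws --endpoint-url http://rgw.example.com:8080 s3api create-multipart-upload \
    --bucket testbucket --key bigfile
aws --endpoint-url http://rgw.example.com:8080 s3api upload-part \
    --bucket testbucket --key bigfile --part-number 1 --body part1.bin --upload-id <UploadId>
aws --endpoint-url http://rgw.example.com:8080 s3api complete-multipart-upload \
    --bucket testbucket --key bigfile --upload-id <UploadId> \
    --multipart-upload '{"Parts":[{"ETag":"<etag-of-part-1>","PartNumber":1}]}'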

(BZ#2088601)

6.4. RADOS

ceph-objectstore-tool command allows manual trimming of the accumulated PG log dups entries

Previously, trimming of PG log dups entries was prevented during the low-level PG split operation, which the PG autoscaler uses far more frequently than a human operator would. Stalling the trimming of dups resulted in significant memory growth of the PG log, leading to OSD crashes as the OSD ran out of memory. Restarting an OSD did not solve the problem because the PG log is stored on disk and reloaded into RAM on startup.

With this fix, the ceph-objectstore-tool command allows manual trimming of the accumulated PG log dups entries to unblock the automatic trimming machinery. A debug improvement that prints the number of dups entries to the OSD's log is also implemented to help future investigations.
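
A hedged sketch of the manual trimming workflow follows. The OSD ID, PG ID, and data path are placeholders, the OSD must be stopped before ceph-objectstore-tool is run against its store, and the exact operation name should be confirmed with ceph-objectstore-tool --help on your build:

# Stop the OSD, trim the accumulated dups entries for one PG, then restart the OSD
systemctl stop ceph-osd@0   # for cephadm-managed clusters, stop the OSD's container unit instead
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.0 --op trim-pg-log-dups
systemctl start ceph-osd@0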

(BZ#2094069)

6.5. RBD Mirroring

Snapshot-based mirroring process no longer gets cancelled

Previously, as a result of an internal race condition, the rbd mirror snapshot schedule add command would be silently cancelled. If no other existing schedules were applicable, the snapshot-based mirroring process for the affected image would not start.

With this release, the race condition is fixed and the snapshot-based mirroring process starts as expected.
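
For example, adding a schedule for a single image now reliably starts snapshot-based mirroring. The pool name, image name, and interval below are placeholders:

# Schedule mirror snapshots for one image every 30 minutes and verify the schedule
rbd mirror snapshot schedule add --pool data --image image1 30m
rbd mirror snapshot schedule ls --pool data --image image1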

(BZ#2099799)

Existing schedules take effect when an image is promoted to primary

Previously, due to an ill-considered optimization, existing schedules did not take effect after an image was promoted to primary. As a result, the snapshot-based mirroring process did not start for a recently promoted image.

With this release, the optimization causing this issue is removed. Existing schedules now take effect when an image is promoted to primary, and the snapshot-based mirroring process starts as expected.
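
For example, after a promotion such as the following, previously configured schedules apply to the newly primary image. The pool and image names are placeholders:

# Promote the image, then confirm that the existing schedule is being honored
rbd mirror image promote data/image1
rbd mirror snapshot schedule status --pool data --image image1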

(BZ#2100519)

rbd-mirror daemon no longer acquires the exclusive lock on de-facto primary images

Previously, due to a logic error, the rbd-mirror daemon could acquire the exclusive lock on a de-facto primary image. Due to this, the snapshot-based mirroring process for the affected image would stop, reporting a "failed to unlink local peer from remote image" error.

With this release, the logic error is fixed. The rbd-mirror daemon no longer acquires the exclusive lock on a de-facto primary image, and the snapshot-based mirroring process does not stop and works as expected.

(BZ#2100520)

Mirror snapshot queue used by rbd-mirror is extended and in-use mirror snapshots are no longer removed

Previously, as a result of an internal race condition, a mirror snapshot still in use by the rbd-mirror daemon on the secondary cluster would be removed, causing the snapshot-based mirroring process for the affected image to stop and report a "split-brain" error.

With this release, the mirror snapshot queue is extended in length and the mirror snapshot cleanup procedure is amended accordingly. This prevents the automatic removal of mirror snapshots that are still in use by the rbd-mirror daemon on the secondary cluster, and the snapshot-based mirroring process no longer stops.
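
As a hedged verification step, the mirroring state of a previously affected image can be checked on the secondary cluster. The pool and image names are placeholders:

# Confirm that the image no longer reports a split-brain state
rbd mirror image status data/image1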

(BZ#2092843)

6.6. The Ceph Ansible utility

Adoption playbook can now install cephadm on OSD nodes

Previously, because the tools repository was disabled on OSD nodes, cephadm could not be installed on the OSD nodes, resulting in the failure of the adoption playbook.

With this fix, the tools repository is enabled on OSD nodes and the adoption playbook can now install cephadm on OSD nodes.
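
For reference, the repository that the playbook now enables can also be enabled manually. The repository name below assumes Red Hat Ceph Storage 5 on RHEL 8 and may differ for other versions:

# Enable the Ceph tools repository on an OSD node, then install cephadm
subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms
dnf install -y cephadm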

(BZ#2073480)

Removal of legacy directory ensures error-free cluster post adoption

Previously, cephadm displayed unexpected behavior in its configuration-inferring function whenever a legacy directory, such as /var/lib/ceph/mon, was found. Due to this behavior, after adoption, the cluster was left with the following error: "CEPHADM_REFRESH_FAILED: failed to probe daemons or devices".

With this release, the adoption playbook ensures the removal of this directory and the cluster is not left in an error state after the adoption.
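
After adoption, the removal of legacy daemon directories and the absence of the probe error can be verified with standard commands, for example:

# List daemons known to cephadm on the host and check that the refresh error is gone
cephadm ls
ceph health detail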

(BZ#2075510)
