Chapter 4. Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The Cephadm utility
The ceph-volume commands do not block OSDs and devices and run as expected
Previously, ceph-volume commands, such as ceph-volume lvm list and ceph-volume inventory, did not complete, which prevented the execution of other ceph-volume commands for creating OSDs, listing devices, and listing OSDs.
With this update, the default output of these commands is no longer added to the Cephadm log, so all ceph-volume commands run in a container launched by the cephadm binary complete as expected.
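For illustration, the following is one common way to run these commands in a container through the cephadm shell; the invocation shown is a minimal sketch, not output from this release:
Syntax
cephadm shell -- ceph-volume lvm list
cephadm shell -- ceph-volume inventory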
Searching Ceph OSD id claim matches a host’s fully-qualified domain name to a host name
Previously, when replacing a failed Ceph OSD, the name in the CRUSH map appeared only as a host name, but the search for the Ceph OSD id claim used the fully-qualified domain name (FQDN) instead. As a result, the Ceph OSD id claim was not found. With this release, the Ceph OSD id claim search correctly matches an FQDN to a host name, and replacing the Ceph OSD works as expected.
The ceph orch ls command correctly displays the number of daemons running for a given service
Previously, the ceph orch ls --service-type SERVICE_TYPE command incorrectly reported 0 daemons running for a service that had running daemons, and users were unable to see how many daemons were running for a specific service. With this release, the ceph orch ls --service-type SERVICE_TYPE command now correctly displays how many daemons are running for that given service.
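For example, to check the daemon count for a specific service type, where SERVICE_TYPE is a placeholder such as mgr or osd:
Syntax
ceph orch ls --service-type SERVICE_TYPE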
Users are no longer able to remove the Ceph Manager service using cephadm
Previously, if a user ran the ceph orch rm mgr command, it would cause cephadm to remove all the Ceph Manager daemons in the storage cluster, making the storage cluster inaccessible.
With this release, attempting to remove the Ceph Manager, Ceph Monitor, or Ceph OSD service using the ceph orch rm SERVICE_NAME command displays a warning message stating that it is not safe to remove these services, and no action is taken.
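For example, running the following command now only prints a warning; the exact warning text is not reproduced here:
Syntax
ceph orch rm mgr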
The node-exporter and alert-manager container versions have been updated
Previously, the Red Hat Ceph Storage 5.0 node-exporter and alert-manager container versions defaulted to version 4.5, when version 4.6 was available and in use in Red Hat Ceph Storage 4.2.
With this release, using the cephadm command to upgrade from Red Hat Ceph Storage 5.0 to Red Hat Ceph Storage 5.0z1 results in the node-exporter and alert-manager container versions being updated to version 4.6.
4.2. Ceph Dashboard
Secure cookie-based sessions are enabled for accessing the Red Hat Ceph Storage Dashboard
Previously, storing information in LocalStorage made the Red Hat Ceph Storage Dashboard accessible to all sessions running in a browser, making the dashboard vulnerable to XSS attacks. With this release, LocalStorage is replaced with secure cookie-based sessions, so the session secret is available only to the current browser instance.
4.3. Ceph File System
The MDS daemon no longer crashes when receiving unsupported metrics
Previously, the MDS daemon could not handle new metrics sent by the kernel client, causing the MDS daemon to crash when it received any unsupported metrics.
With this release, the MDS discards any unsupported metrics and works as expected.
Deletion of data is allowed when the storage cluster is full
Previously, when the storage cluster was full, the Ceph Manager hung on checking pool permissions while reading the configuration file. The Ceph Metadata Server (MDS) did not allow write operations to occur when the Ceph OSD was full, resulting in an ENOSPC error. When the storage cluster hit the full ratio, users could not delete data to free space using the Ceph Manager volume plugin.
With this release, the new FULL capability is introduced. With the FULL capability, the Ceph Manager bypasses the Ceph OSD full check. The client_check_pool_permission option is disabled by default, whereas in previous releases it was enabled. With the Ceph Manager having the FULL capability, the MDS no longer blocks Ceph Manager calls. As a result, the Ceph Manager can free up space by deleting subvolumes and snapshots when a storage cluster is full.
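For reference, the current value of the option can be inspected from the centralized configuration database; this is a minimal sketch and assumes the option is queried for the generic client section:
Syntax
ceph config get client client_check_pool_permission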
Ceph monitors no longer crash when processing authentication requests from Ceph File System clients
Previously, if a client did not have permission to view a legacy file system, the Ceph monitors would crash when processing authentication requests from clients. This caused the Ceph monitors to become unavailable. With this release, the code update fixes the handling of legacy file system authentication requests and authentication requests work as expected.
A KeyError message no longer appears every few milliseconds in the MGR log
Previously, a KeyError was logged to the Ceph Manager log every few milliseconds. This was due to an attempt to remove an element from the client_metadata[in_progress] dictionary with a non-existent key, resulting in a KeyError. As a result, locating other stack traces in the logs was difficult. This release fixes the code logic in the Ceph File System performance metrics, and KeyError messages no longer appear in the Ceph Manager log.
Deleting a subvolume clone is no longer allowed for certain clone states
Previously, if you tried to remove a subvolume clone with the force option when the clone was not in a COMPLETED or CANCELLED state, the clone was not removed from the index tracking the ongoing clones. This caused the corresponding cloner thread to retry the cloning indefinitely, eventually resulting in an ENOENT failure. With the default number of cloner threads set to four, attempts to delete four clones resulted in all four threads entering a blocked state, allowing none of the pending clones to complete.
With this release, a clone is not removed unless it is in a COMPLETED or CANCELLED state. The cloner threads no longer block, because clones are deleted along with their entry in the index tracking the ongoing clones. As a result, pending clones continue to complete as expected.
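For example, you can check the state of a clone before removing it, and cancel an in-progress clone if needed; VOLUME_NAME and CLONE_NAME are placeholders:
Syntax
ceph fs clone status VOLUME_NAME CLONE_NAME
ceph fs clone cancel VOLUME_NAME CLONE_NAME
ceph fs subvolume rm VOLUME_NAME CLONE_NAME --force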
The ceph fs snapshot mirror daemon status command no longer requires a file system name
Previously, users were required to give at least one file system name to the ceph fs snapshot mirror daemon status command. With this release, the user no longer needs to specify a file system name as a command argument, and daemon status displays each file system separately.
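For example, the command can now be run without any file system argument:
Syntax
ceph fs snapshot mirror daemon status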
Stopping the cephfs-mirror daemon no longer results in an unclean shutdown
Previously, the cephfs-mirror process would terminate uncleanly due to a race condition in the cephfs-mirror shutdown process. With this release, the race condition is resolved, and the cephfs-mirror daemon shuts down gracefully.
The Ceph Metadata Server no longer falsely reports metadata damage and failure warnings
Previously, the Ceph Monitor assigned a rank to standby-replay daemons during creation. This behavior could lead to the Ceph Metadata Servers (MDS) reporting false metadata damage and failure warnings. With this release, Ceph Monitors no longer assign a rank to standby-replay daemons during creation, eliminating the false metadata damage and failure warnings.
4.4. Ceph Manager plugins
The pg_autoscaler module no longer reports a failed op error
Previously, the pg_autoscaler module reported a KeyError for op when trying to get the pool status if any pool had the CRUSH rule step set_chooseleaf_vary_r 1. As a result, the Ceph cluster health displayed HEALTH_ERR with a Module 'pg_autoscaler' has failed: op error. With this release, only steps with op are iterated for a CRUSH rule while getting the pool status, and the pg_autoscaler module no longer reports the failed op error.
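To confirm that the autoscaler can report pool status again, you can review its output; this is a minimal check, not output from this release:
Syntax
ceph osd pool autoscale-status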
4.5. Ceph Object Gateway
S3 lifecycle expiration header feature identifies the objects as expected
Previously, some objects without a lifecycle expiration were incorrectly identified in GET or HEAD requests as having a lifecycle expiration due to an error in the logic of the feature when comparing object names to stored lifecycle policy. With this update, the S3 lifecycle expiration header feature works as expected and identifies the objects correctly.
The radosgw-admin user list command no longer takes a long time to execute in Red Hat Ceph Storage cluster 4
Previously, in Red Hat Ceph Storage cluster 4, the performance of many radosgw-admin commands was affected because the value of the rgw_gc_max_objs configuration variable, which controls the number of GC shards, was increased significantly. This included radosgw-admin commands that were not related to GC. With this release, after an upgrade from Red Hat Ceph Storage cluster 3 to Red Hat Ceph Storage cluster 4, the radosgw-admin user list command no longer takes a long time to execute. Only the performance of radosgw-admin commands that require GC to operate is affected by the value of the rgw_gc_max_objs configuration.
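For example, you can list users and inspect the current rgw_gc_max_objs value; the client.rgw section shown here is an assumption, and your Ceph Object Gateway instances may use a more specific name:
Syntax
radosgw-admin user list
ceph config get client.rgw rgw_gc_max_objs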
Policies with invalid Amazon resource name elements no longer lead to privilege escalations
Previously, incorrect handling of invalid Amazon resource name (ARN) elements in IAM policy documents, such as bucket policies, could grant unintentional permissions to users who are not part of the policy. With this release, the fix prevents storing policies with invalid ARN elements and, if such policies are already stored, evaluates them correctly.
4.6. RADOS
Setting bluestore_cache_trim_max_skip_pinned to 10000 enables trimming of the object’s metadata
The least recently used (LRU) cache is used for the object’s metadata. Trimming of the cache is done from the least recently accessed objects. Objects that are pinned are exempted from eviction, which means they are still being used by BlueStore.
Previously, the configuration variable bluestore_cache_trim_max_skip_pinned controlled how many pinned objects were visited, and the scrubbing process caused objects to be pinned for a long time. When the number of objects pinned at the bottom of the LRU metadata cache became larger than bluestore_cache_trim_max_skip_pinned, trimming of the cache could not complete.
With this release, you can set bluestore_cache_trim_max_skip_pinned to 10000, which is larger than the possible count of metadata cache objects. This enables trimming, and the metadata cache size adheres to the configuration settings.
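For example, the option can be raised through the centralized configuration database; 10000 is the value recommended above:
Syntax
ceph config set osd bluestore_cache_trim_max_skip_pinned 10000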
Upgrading the storage cluster from Red Hat Ceph Storage 4 to 5 completes with a HEALTH_WARN state
When upgrading a Red Hat Ceph Storage cluster from a previously supported version to Red Hat Ceph Storage 5, the upgrade completes with the storage cluster in a HEALTH_WARN state stating that monitors are allowing insecure global_id reclaim. This is due to a patched CVE, the details of which are available in CVE-2021-20288.
Recommendations to mute health warnings:
- Identify clients that are not updated by checking the ceph health detail output for the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
- Upgrade all clients to the Red Hat Ceph Storage 5.0 release.
- If all the clients are not upgraded immediately, mute the health alerts temporarily:
Syntax
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM 1w   # 1 week
ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w   # 1 week
- After validating all clients have been updated and the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert is no longer present for a client, set auth_allow_insecure_global_id_reclaim to false:
Syntax
ceph config set mon auth_allow_insecure_global_id_reclaim false
- Ensure that no clients are listed with the AUTH_INSECURE_GLOBAL_ID_RECLAIM alert.
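Once the option is set, you can verify it and recheck the cluster health; this is a minimal verification sketch:
Syntax
ceph config get mon auth_allow_insecure_global_id_reclaim
ceph health detail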
The trigger condition for RocksDB flush and compactions works as expected
BlueStore organizes data into chunks called blobs, the size of which is 64K by default. Large writes are split into a sequence of 64K blob writes.
Previously, when the deferred size was equal to or greater than the blob size, all the data was deferred and placed under the “L” column family. A typical example is the HDD configuration, where the value is 64K for both the bluestore_prefer_deferred_size_hdd and bluestore_max_blob_size_hdd parameters. This consumed the “L” column faster, resulting in the RocksDB flush count and the compactions becoming more frequent. The trigger condition for this scenario was data size in blob <= minimum deferred size.
With this release, the deferred trigger condition checks the size of extents on disks instead of blobs. Extents smaller than deferred_size go to the deferred mechanism, and larger extents are written to the disk immediately. The trigger condition is changed to data size in extent < minimum deferred size.
The small writes are placed under the “L” column, and the growth of this column is slow, with no extra compactions.
The bluestore_prefer_deferred_size parameter controls deferred writes without any interference from the blob size and works as per its description of “writes smaller than this size”.
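For reference, the related options can be inspected from the configuration database; this is a minimal sketch for HDD-backed OSDs:
Syntax
ceph config get osd bluestore_prefer_deferred_size_hdd
ceph config get osd bluestore_max_blob_size_hdd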
The Ceph Manager no longer crashes during large increases to pg_num and pgp_num
Previously, the code that adjusts placement groups did not handle large increases to the pg_num and pgp_num parameters correctly, which led to an integer underflow that could crash the Ceph Manager.
With this release, the code that adjusts placement groups was fixed. As a result, large increases to placement groups do not cause the Ceph Manager to crash.
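For example, a large placement group increase no longer crashes the Ceph Manager; POOL_NAME and the target value shown here are placeholders:
Syntax
ceph osd pool set POOL_NAME pg_num 256
ceph osd pool set POOL_NAME pgp_num 256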
4.7. RADOS Block Devices (RBD)
The librbd code honors the CEPH_OSD_FLAG_FULL_TRY flag
Previously, you could set the CEPH_OSD_FLAG_FULL_TRY flag with the rados_set_pool_full_try() API function. In Red Hat Ceph Storage 5, librbd stopped honoring this flag. This resulted in write operations stalling while waiting for space when a pool became full or reached a quota limit, even if CEPH_OSD_FLAG_FULL_TRY was set.
With this release, librbd honors the CEPH_OSD_FLAG_FULL_TRY flag. When the flag is set and a pool becomes full or reaches its quota, write operations either succeed or fail with an ENOSPC or EDQUOT error. The ability to remove RADOS Block Device (RBD) images from a full or at-quota pool is restored.
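For example, removing an image from a full or at-quota pool works again; POOL_NAME and IMAGE_NAME are placeholders:
Syntax
rbd rm POOL_NAME/IMAGE_NAME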
4.8. RBD Mirroring
Improvements to the rbd mirror pool peer bootstrap import command
Previously, running the rbd mirror pool peer bootstrap import command caused librados to log errors about a missing key ring file in cases where a key ring was not required. This could confuse site administrators, because it appeared as though the command failed due to a missing key ring. With this release, librados no longer logs errors in cases where a remote storage cluster’s key ring is not required, such as when the bootstrap token contains the key.
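For example, a typical import on the secondary site looks like the following; SITE_NAME, POOL_NAME, and TOKEN_PATH are placeholders:
Syntax
rbd mirror pool peer bootstrap import --site-name SITE_NAME POOL_NAME TOKEN_PATH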
4.9. iSCSI Gateway
The gwcli tool now shows the correct erasure coded pool profile
Previously, the gwcli tool would show incorrect k+m values for the erasure coded pool.
With this release, the gwcli tool pulls the erasure coded pool settings from the associated erasure coded profile, and the Red Hat Ceph Storage cluster shows the correct erasure coded pool profile.
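To compare, the k and m values of a pool’s profile can also be checked directly; PROFILE_NAME is a placeholder:
Syntax
ceph osd erasure-code-profile get PROFILE_NAME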
The upgrade of the storage cluster with iSCSI configured now works as expected
Previously, upgrading a storage cluster with iSCSI configured would fail because the latest ceph-iscsi packages did not provide the deprecated ceph-iscsi-tools package.
With this release, the ceph-iscsi-tools package is marked as obsolete in the RPM specification file, and the upgrade succeeds as expected.
The tcmu-runner no longer fails to remove “blocklist” entries
Previously, the tcmu-runner would execute incorrect commands to remove the “blocklist” entries, resulting in degraded performance for iSCSI LUNs.
With this release, the tcmu-runner was updated to execute the correct command when removing blocklist entries. The blocklist entries are cleaned up by tcmu-runner, and the iSCSI LUNs work as expected.
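You can list the current blocklist entries to confirm that they are being cleaned up; this is a minimal check:
Syntax
ceph osd blocklist ls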
The tcmu-runner process now closes normally
Previously, the tcmu-runner process incorrectly handled a failed path, causing the release of uninitialized g_object memory. This could cause the tcmu-runner process to terminate unexpectedly. With this release, the source code has been modified to skip the release of uninitialized g_object memory, resulting in the tcmu-runner process exiting normally.
The RADOS Block Device handler correctly parses configuration strings
Previously, the RADOS Block Device (RBD) handler used the strtok() function while parsing configuration strings, which is not thread-safe. This caused incorrect parsing of the configuration string of image names when creating or reopening an image. This resulted in the image failing to open. With this release, the RBD handler uses the thread-safe strtok_r() function, allowing for the correct parsing of configuration strings.
4.10. The Ceph Ansible utility
The cephadm-adopt playbook now enables the pool application on the pool when creating a new nfs-ganesha pool
Previously, when the cephadm-adopt playbook created a new nfs-ganesha pool, it did not enable the pool application on the pool. This resulted in a warning that one pool did not have the pool application enabled. With this update, the cephadm-adopt playbook sets the pool application on the created pool, and the warning no longer occurs after adoption.
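For reference, the pool application can also be enabled manually; POOL_NAME is a placeholder, and the nfs application name shown here is an assumption based on the defaults used for NFS Ganesha pools:
Syntax
ceph osd pool application enable POOL_NAME nfs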
The cephadm-adopt playbook does not create default realms for multisite configuration
Previously, it was required for the cephadm-adopt playbook to create the default realms during the adoption process, even when there was no multisite configuration present.
With this release, the cephadm-adopt playbook does not enforce the creation of default realms when there is no multisite configuration deployed.
The Ceph Ansible cephadm-adopt.yml playbook can add nodes with a host’s fully-qualified domain name
Previously, the task that adds nodes in cephadm using the Ceph Ansible cephadm-adopt.yml playbook used the short host name and did not match the current fully-qualified domain name (FQDN) of a node. As a result, the adoption playbook failed because no match to the FQDN host name was found.
With this release, the playbook uses the ansible_nodename fact instead of the ansible_hostname fact, allowing the adoption playbook to add nodes configured with an FQDN.
The Ceph Ansible cephadm-adopt playbook now pulls container images successfully
Previously, the Ceph Ansible cephadm-adopt playbook was not logging into the container registry on storage clusters that were being adopted. With this release, the Ceph Ansible cephadm-adopt playbook logs into the container registry and pulls container images as expected.
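For reference, the registry login that the playbook now performs can also be done manually with cephadm; the registry URL and credentials are placeholders:
Syntax
cephadm registry-login --registry-url REGISTRY_URL --registry-username USER_NAME --registry-password PASSWORD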