Chapter 6. Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
6.1. The Cephadm utility
Users can upgrade to a local repo image without any issues
Previously, in cephadm, docker.io would be added to the start of the image name by default if the image name was not a qualified domain name. Due to this, users were unable to upgrade to images on local repositories.
With this fix, care has been taken to identify the images to which docker.io is added by default. Users with a local repository image can now upgrade to that image without encountering issues.
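For example, assuming a local registry image (the registry host, repository, and tag below are hypothetical), an upgrade to that image might be started as follows:

    ceph orch upgrade start --image registry.example.com:5000/rhceph/rhceph-5-rhel8:latest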
6.2. Ceph File System
snap-schedules are no longer lost on restarts of Ceph Manager services
Previously, the in-memory databases were not written to persistent storage on every change to the schedule. This caused snap-schedules to be lost on restart of Ceph Manager services.
With this fix, the in-memory databases are dumped into persistent storage on every change or addition to the snap-schedules. Retention now continues to work across restarts of Ceph Manager services.
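As an illustration of the schedules that now survive Ceph Manager restarts, a snapshot schedule and retention policy for a CephFS path might be defined as in the following sketch (the path and intervals are examples only):

    ceph mgr module enable snap_schedule
    ceph fs snap-schedule add /some/dir 1h
    ceph fs snap-schedule retention add /some/dir h 24
    ceph fs snap-schedule status /some/dir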
The standby-replay Metadata Server daemon is no longer unexpectedly removed
Previously, the Ceph Monitor would remove a standby-replay Metadata Server (MDS) daemon from the MDS map under certain conditions. This caused the standby-replay MDS daemon to be removed from the Metadata Server cluster, which generated cluster warnings.
With this fix, the logic that Ceph Monitors use when considering the removal of an MDS daemon from the MDS map now takes into account standby-replay MDS daemons holding a rank. This ensures that standby-replay MDS daemons are no longer unexpectedly removed from the MDS cluster.
6.3. Ceph Manager plugins
Ceph Manager Alert emails are not tagged as spam anymore
Previously, emails sent by the Ceph Manager Alerts module did not have the “Message-Id” and “Date” headers. This increased the chances of the emails being flagged as spam.
With this fix, both headers are added to the emails sent by the Ceph Manager Alerts module, and the messages are no longer flagged as spam.
6.4. The Ceph Volume utility
The volume list remains empty when no ceph-osd container is found and the cephvolumescan actor no longer fails
Previously, if Ceph containers ran collocated with other containers without a ceph-osd container present among them, the process would try to retrieve the volume list from a non-Ceph container, which would not work. Due to this, the cephvolumescan actor would fail and the upgrade would not complete.
With this fix, if no ceph-osd container is found, the volume list remains empty and the cephvolumescan actor does not fail.
Ceph OSD deployment no longer fails when ceph-volume treats multiple devices
Previously, ceph-volume computed wrong sizes when there were multiple devices to treat, resulting in a failure to deploy OSDs.
With this fix, ceph-volume computes the correct sizes when multiple devices are to be treated, and deployment of OSDs works as expected.
6.5. Ceph Object Gateway
Users can now set up Kafka connectivity with SASL in a non-TLS environment
Previously, due to a failure in configuring the TLS certificate for the Ceph Object Gateway, it was not possible to configure a Kafka topic with SASL (user and password).
With this fix, a new configuration parameter, rgw_allow_notification_secrets_in_cleartext, is added. Users can now set up Kafka connectivity with SASL in a non-TLS environment.
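A minimal sketch of enabling the new parameter for the Ceph Object Gateway, assuming the setting is applied to the client.rgw section:

    ceph config set client.rgw rgw_allow_notification_secrets_in_cleartext true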
Internal handling of tokens is fixed
Previously, the internal handling of tokens in the refresh path of the Java-based client authentication provider JAR for the AWS SDK for Java and the Hadoop S3A Connector did not deal correctly with large tokens. This resulted in improper processing of some tokens and prevented the renewal of client tokens.
With this fix, the internal token handling is corrected and works as expected.
The object version access is corrected preventing object lock violation
Previously, inadvertent slicing of version information occurred in some call paths, which allowed object versions protected by object lock to be deleted contrary to policy.
With this fix, the object version access is corrected, thereby preventing object lock violation.
Ceph Object Gateway no longer crashes with malformed URLs
Previously, a refactoring abstraction replaced a bucket value with a pointer to a bucket value that was not always initialized. As a result, malformed URLs corresponding to bucket operations with no valid bucket caused the Ceph Object Gateway to crash.
With this fix, a check on the pointer has been added to the call path, and the Ceph Object Gateway returns a permission error, rather than crashing, if the pointer is uninitialized.
The code that parses dates in x-amz-date format is changed
Previously, the standard format for x-amz-date was changed, which caused issues because new software uses the new date format. New software built with the latest go libraries would not talk to the Ceph Object Gateway.
With this fix, the code in the Ceph Object Gateway that parses dates in x-amz-date format is changed to also accept the new date format.
(BZ#2109675)
New logic in processing of lifecycle shards prevents stalling due to deleted buckets
Previously, changes were made to cause lifecycle processing to continuously cycle across days, that is, to not restart from the beginning of the list of eligible buckets each day. However, the changes contained a bug that could stall the processing of lifecycle shards that contained deleted buckets.
With this fix, logic is introduced to skip over deleted buckets, and the processing no longer stalls.
Header processing no longer causes sporadic swift-protocol authentication failures
Previously, a combination of incorrect HTTP header processing and timestamp handling logic would either cause an invalid Keystone admin token to be used for operations or prevent renewal of Keystone’s admin token when required. Due to this, sporadic swift-protocol authentication failures would occur.
With this fix, header processing is corrected and new diagnostics are added. The logic now works as expected.
Warnings are no longer logged in inappropriate circumstances
Previously, inverted logic would occasionally report an incorrect warning, unable to find head object, causing the warning to be logged when it was not applicable in a Ceph Object Gateway configuration.
With this fix, the corrected logic no longer logs the warning in inappropriate circumstances.
PUT object operation writes to the correct bucket index shards
Previously, due to a race condition, a PUT object operation would rarely write to a former bucket index shard. This caused the former bucket index shard to be recreated, and the object would not appear in the proper bucket index. Therefore, the object would not be listed when the bucket was listed.
With this fix, care is taken to prevent various operations from creating bucket index shards and to recover when the race condition is encountered. PUT object operations now always write to the correct bucket index shards.
6.6. Multi-site Ceph Object Gateway
Suspending bucket versioning in the primary zone no longer suspends bucket versioning in the archive zone
Previously, if bucket versioning was suspended in the primary zone, bucket versioning in the archive zone would also be suspended.
With this fix, archive zone versioning is always enabled irrespective of bucket versioning changes on other zones. Bucket versioning in the archive zone no longer gets suspended.
The radosgw-admin sync status command in multi-site replication works as expected
Previously, in a multi-site replication, if one or more participating Ceph Object Gateway nodes were down, running the radosgw-admin sync status command would return a (5) Input/output error. This status should be resolved after all the Ceph Object Gateway nodes are back online.
With this update, the radosgw-admin sync status command does not get stuck and works as expected.
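For example, the replication status can now be checked reliably from a Ceph Object Gateway node, even while a peer node is temporarily unreachable:

    radosgw-admin sync status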
Processes trimming retired bucket index entries no longer cause the radosgw instance to crash
Previously, under some circumstances, processes trimming retired bucket index entries could access an uninitialized pointer variable, causing the radosgw instance to crash.
With this fix, the pointer is initialized immediately before use and the radosgw instance no longer crashes.
Bucket sync run is given control logic to sync all objects
Previously, to support dynamic bucket resharding on multi-site clusters, a singular bucket index log was replaced with multiple bucket index log generations. However, due to how bucket sync run was implemented, only the oldest outstanding generation would be synced.
With this fix, bucket sync run is given control logic that enables it to run the sync from the oldest outstanding generation to the current one, and all objects are now synced as expected.
Per-bucket replication logical error fix executes policies correctly
Previously, an internal logic error caused failures in per-bucket replication, due to which per-bucket replication policies did not work in some circumstances.
With this fix, the logic error responsible for confusing the source and destination bucket information is corrected and the policies execute correctly.
Variable access no longer causes undefined program behavior
Previously, a Coverity scan identified two cases where variables could be used after a move, potentially causing undefined program behavior.
With this fix, the variable access is corrected and the potential fault can no longer occur.
Requests with a tenant but no bucket no longer cause a crash
Previously, an upstream refactoring replaced uninitialized bucket data fields with uninitialized pointers. Due to this, any bucket request containing a URL referencing no valid bucket caused crashes.
With this fix, requests that access the bucket but do not specify a valid bucket are denied, resulting in an error instead of a crash.
6.7. RADOS
Performing a DR test with a two-site stretch cluster no longer causes Ceph to become unresponsive
Previously, when performing a DR test with a two-site stretch cluster, removing and adding new monitors to the cluster would cause an incorrect rank in the ConnectionTracker class. Due to this, the monitor would fail to identify itself in the peer_tracker copy and would never update its correct field, causing a deadlock in the election process which would lead to Ceph becoming unresponsive.
With this fix, the following corrections are made:
- Added an assert in the function notify_rank_removed() to compare the expected rank provided by the Monmap against the rank that is manually adjusted, as a sanity check.
- Clear the variable removed_ranks from every Monmap update.
- Added an action to manually reset peer_tracker.rank when executing the command ceph connection scores reset for each monitor. The peer_tracker.rank matches the current rank of the monitor.
- Added functions in the Elector and ConnectionTracker classes to check for a clean peer_tracker when upgrading the monitors, including when booting up. If found unclean, peer_tracker is cleared.
- In Red Hat Ceph Storage, the user can choose to manually remove a monitor rank before shutting down the monitor, causing inconsistency in the Monmap. Therefore, in Monitor::notify_new_monmap(), the function is prevented from removing our rank or ranks that do not exist in the Monmap.
The cluster now works as expected and there is no unwarranted downtime. The cluster no longer becomes unresponsive when performing a DR test with a two-site stretch cluster.
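A hedged sketch of inspecting and resetting the connectivity scores on a monitor through its admin socket (the monitor name host01 is hypothetical, and the exact invocation may vary by release):

    ceph daemon mon.host01 connection scores dump
    ceph daemon mon.host01 connection scores reset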
Rank is removed from the live_pinging and dead_pinging sets to mitigate the inconsistent connectivity score issue
Previously, when removing two monitors consecutively, if the rank size was equal to the Paxos size, the monitor would hit a condition and would not remove the rank from the dead_pinging set. Due to this, the rank remained in the dead_pinging set, which would cause problems such as an inconsistent connectivity score when the stretch-cluster mode was enabled.
With this fix, a case is added to handle the removal of the highest-ranked monitor, that is, when the rank is equal to the Paxos size, the rank is removed from the live_pinging and dead_pinging sets. The monitor stays healthy with clean live_pinging and dead_pinging sets.
The Prometheus metrics now reflect the correct Ceph version for all Ceph Monitors whenever requested
Previously, the Prometheus metrics reported mismatched Ceph versions for Ceph Monitors when the monitor was upgraded. As a result, the active Ceph Manager daemon needed to be restarted to resolve this inconsistency.
With this fix, the Ceph Monitors explicitly send metadata update requests with mon metadata to mgr when the MON election is over.
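For example, the per-monitor version that feeds these metrics can be cross-checked from the command line, alongside the ceph_mon_metadata series exposed to Prometheus (the monitor name is hypothetical):

    ceph mon metadata host01
    ceph versions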
The ceph daemon heap status command shows the heap status
Previously, due to a failure to get heap information through the ceph daemon command, the ceph daemon heap stats command would return empty output instead of returning the current heap usage for a Ceph daemon. This was because ceph::osd_cmds::heap() was confusing the stderr and stdout concept, which caused the difference in output.
With this fix, the ceph daemon heap stats command returns heap usage information for a Ceph daemon, similar to what is obtained using the ceph tell command.
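For example, both of the following invocations should now report comparable heap usage for the same daemon (the OSD ID is an example, and the heap commands require the daemon to be built with tcmalloc):

    ceph daemon osd.0 heap stats
    ceph tell osd.0 heap stats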
Ceph Monitors no longer crash when using the ceph orch apply mon <num> command
Previously, when the ceph orch apply mon <num> command was used to decrease the number of monitors in a cluster, the monitors were removed before shutting down in cephadm, causing the monitors to crash.
With this fix, a sanity check is added to all code paths that checks whether the peer rank is greater than or equal to the size of the ranks from the monitor map. If the condition is satisfied, certain operations that lead to the monitor crashing are skipped. The peer rank eventually resolves itself in the next version of the monitor map. The monitors no longer crash when removed from the monitor map before shutting down.
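For example, shrinking the monitor count with a command such as the following no longer causes the removed monitors to crash (the target count is an example):

    ceph orch apply mon 3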
End users can now see the scrub or deep-scrub starts message in the Ceph cluster log
Previously, due to the scrub or deep-scrub starts message missing from the Ceph cluster log, end users could not tell from the Ceph cluster log whether PG scrubbing had started for a PG.
With this fix, the scrub or deep-scrub starts message is reintroduced. The Ceph cluster log now shows the message for a PG whenever it begins a scrubbing or deep-scrubbing process.
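For example, the reintroduced messages can be observed by watching the cluster log, a hedged sketch being:

    ceph -w | grep -E 'scrub starts|deep-scrub starts'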
No assertion during the Ceph Manager failover
Previously, when activating, the Ceph Manager would receive several service_map versions sent by the previously active manager. An incorrect check in the code would cause an assertion failure when the newly activated manager received a map with a higher version sent by the previously active manager.
With this fix, the check in the manager that deals with the initial service map is relaxed and there is no assertion during Ceph Manager failover.
Users can remove cloned objects after upgrading a cluster
Previously, after upgrading a cluster from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5, removing snapshots of objects created in earlier versions would leave clones which could not be removed. This was because the SnapMapper keys were wrongly converted.
With this fix, SnapMapper’s legacy conversion is updated to match the new key format. The cloned objects created in earlier versions of Ceph can now be easily removed after an upgrade.
RocksDB error does not occur for small writes
BlueStore employs a strategy of deferring small writes for HDDs and stores data in RocksDB. Cleaning deferred data from RocksDB is a background process which is not synchronized with BlueFS.
With this fix, deferred replay no longer overwrites BlueFS data and some RocksDB errors do not occur, such as:
- osd_superblock corruption.
- CURRENT does not end with newline.
- .sst files checksum error.
Deferred data is not written when the write location might either contain a proper object or be empty. It is not possible to corrupt object data this way; BlueFS is the only entity that can allocate this space.
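As context for the deferred-write behavior described above, the size threshold below which BlueStore defers writes on HDDs is controlled by a configuration option; a hedged example of inspecting it (the default value may vary by release):

    ceph config get osd bluestore_prefer_deferred_size_hdd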
Corrupted dups entries of a PG Log can be removed by off-line and on-line trimming
Previously, trimming of PG log dups entries could be prevented during the low-level PG split operation, which is used by the PG autoscaler with far higher frequency than by a human operator. Stalling the trimming of dups resulted in significant memory growth of PG log, leading to OSD crashes as it ran out of memory. Restarting an OSD did not solve the problem as the PG log is stored on disk and reloaded to RAM on startup.
With this fix, both off-line trimming, using the ceph-objectstore-tool command, and on-line trimming, within the OSD, can remove the corrupted dups entries of a PG log that jammed the on-line trimming machinery and were responsible for the memory growth. A debug improvement is also implemented that prints the number of dups entries to the OSD’s log to help future investigations.
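A hedged sketch of the off-line trimming path on a stopped OSD (the data path and PG ID are hypothetical, and the operation name should be verified against the ceph-objectstore-tool version in use):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op trim-pg-log-dups --pgid 2.5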
6.8. RADOS Block Devices (RBD)
The rbd info command no longer fails if executed while the image is being flattened
Previously, due to an implementation defect, the rbd info command would fail, although rarely, if run while the image was being flattened. This caused a transient No such file or directory error, although, upon rerun, the command always succeeded.
With this fix, the implementation defect is fixed and the rbd info command no longer fails even if executed while the image is being flattened.
Removing a pool with pending Block Device tasks no longer causes all the tasks to hang
Previously, due to an implementation defect, removing a pool with pending Block Device tasks caused all Block Device tasks, including those of other pools, to hang. To resume the hung Block Device tasks, the administrator had to restart the ceph-mgr daemon.
With this fix, the implementation defect is fixed and removing a pool with pending RBD tasks no longer causes any hangs. Block Device tasks for the removed pool are cleaned up. Block Device tasks for other pools continue executing uninterrupted.
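For example, pending Block Device tasks, including those cleaned up when their pool is removed, can be listed through the Ceph Manager:

    ceph rbd task list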
6.9. RBD Mirroring
The image replayer shuts down as expected
Previously, due to an implementation defect, a request to shut down a particular image replayer would cause the rbd-mirror daemon to hang indefinitely, especially in cases where the daemon was blocklisted on the remote storage cluster.
With this fix, the implementation defect is fixed and a request to shut down a particular image replayer no longer causes the rbd-mirror daemon to hang; the image replayer shuts down as expected.
The rbd mirror pool peer bootstrap create command guarantees correct monitor addresses in the bootstrap token
Previously, a bootstrap token generated with the rbd mirror pool peer bootstrap create command contained monitor addresses as specified by the mon_host option in the ceph.conf file. This was fragile and caused issues for users, such as confusion between V1 and V2 endpoints, specifying only one of them, grouping them incorrectly, and the like.
With this fix, the rbd mirror pool peer bootstrap create command is changed to extract monitor addresses from the cluster itself, guaranteeing that the monitor addresses contained in a bootstrap token are correct.
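For example, a bootstrap token with cluster-derived monitor addresses might be generated and imported on the peer site as follows (the site names, pool name, and file path are hypothetical):

    rbd mirror pool peer bootstrap create --site-name site-a pool1 > /root/bootstrap_token_site-a
    rbd mirror pool peer bootstrap import --site-name site-b --direction rx-tx pool1 /root/bootstrap_token_site-a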
6.10. iSCSI Gateway
Upgrade from Red Hat Ceph Storage 4.x to 5.x with iSCSI works as expected
Previously, due to a version conflict between some of the ceph-iscsi dependent libraries, upgrades from Red Hat Ceph Storage 4.x to 5.x would lead to a persistent HTTP 500 error.
With this fix, the version conflict is resolved and the upgrade works as expected. However, as a result of this fix, iSCSI REST API responses are not pretty-printed.
6.11. The Ceph Ansible utility
Upgrade workflow with Ceph Object Gateway configuration is fixed
Previously, whenever set_radosgw_address.yml was called from the dashboard playbook execution, the fact is_rgw_instances_defined was expected to be set if rgw_instances was defined in group_vars/host_vars by the user. Otherwise, the next task that sets the fact rgw_instances would be executed under the assumption that it was not user defined. This caused the upgrade workflow to break when deploying the Ceph Object Gateway multi-site and the Ceph Dashboard.
With this fix, ceph-ansible sets the parameter when the set_radosgw_address.yml playbook is called from the dashboard playbook, and the upgrade workflow works as expected.
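A hedged sketch of the workflow in question: with rgw_instances defined by the user in group_vars or host_vars (the values below are illustrative, not a reference format), the upgrade is then run with the rolling_update.yml playbook:

    # group_vars/rgws.yml (illustrative values)
    #   rgw_instances:
    #     - instance_name: rgw0
    #       radosgw_address: 192.168.122.10
    #       radosgw_frontend_port: 8080
    ansible-playbook -vv -i hosts infrastructure-playbooks/rolling_update.yml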
The fact condition is updated to execute only on the Ceph Object Gateway nodes
Previously, due to set_fact _radosgw_address to radosgw_address_block ipv4 being executed on all nodes, including those where no Ceph Object Gateway network range was present, the playbooks failed to work.
With this fix, the when condition is updated to execute the fact setting only on the Ceph Object Gateway nodes, and it now works as expected.