Chapter 4. Bug fixes
This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The Cephadm utility
Using the --name NODE flag with the cephadm shell to start a stopped OSD no longer uses the wrong container image
Previously, in some cases, the cephadm shell --name NODE command would start the container with the wrong version of the tools. This occurred when the host had a newer Ceph container image than the one that the OSDs were using.
With this fix, Cephadm determines the container image for stopped daemons when the cephadm shell command is used with the --name flag, and the command works as expected.
4.2. The Ceph Ansible utility
Playbooks now remove the RHCS version repositories matching the running RHEL version
Previously, playbooks would try to remove Red Hat Ceph Storage 4 repositories from RHEL 9 even though they do not exist on RHEL 9. This would cause the playbooks to fail.
With this fix, playbooks remove only the existing Red Hat Ceph Storage repositories that match the running RHEL version, so the correct repositories are removed.
4.3. NFS Ganesha
All memory consumed by the configuration reload process is now released
Previously, reloading exports did not release all the memory consumed by the configuration reload process, causing the memory footprint to grow.
With this fix, all memory consumed by the configuration reload process is released, reducing the memory footprint.
4.4. Ceph Dashboard
Users can create volumes with multiple hosts in the Ceph dashboard
With this fix, users can now create volumes with multiple hosts in the Ceph dashboard.
Unset subvolume size is no longer set as 'infinite'
Previously, the unset subvolume size was set to 'infinite', resulting in the failure of the update.
With this fix, the code that sets the size to 'infinite' is removed and the update works as expected.
Missing options are added in the kernel mount command
Previously, a few options were missing from the kernel mount command for attaching the file system, so the command did not work as intended.
With this fix, the missing options are added and the kernel mount command works as expected.
Ceph dashboard now supports both NFS v3 and v4-enabled export management
Previously, the Ceph dashboard supported management of NFSv4-enabled exports only, not NFSv3-enabled exports. Due to this, any NFSv3 export management done through the CLI was corrupted.
With this fix, NFSv3-based export management is enabled through an additional checkbox, and the Ceph dashboard now supports both v3 and v4-enabled export management.
Access/secret keys are no longer compulsory when creating a zone
Previously, access/secret keys were compulsory when creating a zone in Ceph Object Gateway multi-site. Due to this, users had to first set the non-system user’s keys in the zone and later update with the system user’s keys.
With this fix, access/secret keys are not compulsory while creating a zone.
Importing multi-site configuration no longer throws an error on submitting the form
Previously, the multi-site period information did not contain the 'realm' name. Due to this, importing the multi-site configuration threw an error on submitting the form.
With this fix, the check for fetching 'realm' name from period information is removed and the token import works as expected.
The Ceph Object Gateway metrics label names are aligned with the Prometheus label naming format and they are now visible in Prometheus
Previously, the metrics label names were not aligned with the Prometheus label naming format, causing the Ceph Object Gateway metrics to not be visible in Prometheus.
With this fix, the hyphen (-) is replaced with an underscore (_) in Ceph Object Gateway metrics label names wherever applicable, and all Ceph Object Gateway metrics are now visible in Prometheus.
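For illustration, classic Prometheus label names must match the pattern [a-zA-Z_][a-zA-Z0-9_]*, which is why a hyphen makes a label name invalid. The following Python sketch checks that rule and applies the hyphen-to-underscore replacement described above. It is a minimal sketch, not the Ceph Object Gateway exporter code. The helper names is_valid_label_name and sanitize_label_name, as well as the sample label names, are hypothetical.

```python
import re

# Classic Prometheus rule for label names: letters, digits, and underscores,
# and the name must not start with a digit.
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label_name(name: str) -> bool:
    """Return True if the whole string is a valid Prometheus label name."""
    return LABEL_NAME_RE.fullmatch(name) is not None


def sanitize_label_name(name: str) -> str:
    """Hypothetical helper: replace hyphens with underscores, as the fix describes."""
    return name.replace("-", "_")


if __name__ == "__main__":
    for raw in ("bucket-name", "instance_id"):
        fixed = sanitize_label_name(raw)
        print(raw, is_valid_label_name(raw), "->", fixed, is_valid_label_name(fixed))
```

Names that are already valid pass through unchanged, so the replacement only needs to be applied where a label name contains a hyphen.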
Full names can now include a dot in the Ceph dashboard
Previously, in the Ceph dashboard, it was not possible to create or modify a full name that contained a dot, due to incorrect validation.
With this fix, the validation is corrected to allow a dot in full names in the Ceph dashboard.
4.5. Ceph File System
MDS metadata with FSMap changes are now added in batches to ensure consistency
Previously, monitors would sometimes lose track of MDS metadata during upgrades and cancelled PAXOS transactions, resulting in MDS metadata no longer being available.
With this fix, MDS metadata with FSMap changes are added in batches to ensure consistency. The ceph mds metadata command now functions as intended across upgrades.
The ENOTEMPTY output is detected and the message is displayed correctly
Previously, when running the subvolume group rm command, the ENOTEMPTY output was not detected in the volumes plugin, causing a generalized error message instead of a specific message.
With this fix, the ENOTEMPTY output is detected for the subvolume group rm command when a subvolume is present inside the subvolume group, and the message is displayed correctly.
MDS now queues the next client replay request automatically as part of request cleanup
Previously, the MDS would sometimes not queue the next client request for replay in the up:client-replay state, causing the MDS to hang.
With this fix, the next client replay request is queued automatically as part of request cleanup and MDS proceeds with failover recovery normally.
cephfs-mirroring overall performance is improved
With this fix, the incremental snapshot sync is corrected, which improves the overall performance of cephfs-mirroring.
The loner member is set to true
Previously, for a file lock in the LOCK_EXCL_XSYN state, the non-loner clients would be issued empty caps. However, because the loner member of this state was set to false, the locker could issue the Fcb caps to them, which is incorrect. This caused some client requests to incorrectly revoke some caps and wait indefinitely, resulting in slow requests.
With this fix, the loner member is set to true and, as a result, the corresponding request is not blocked.
snap-schedule repeat and retention specification for monthly snapshots is changed from m to M
Previously, the snap-schedule repeat specification and retention specification for monthly snapshots were not consistent with other Ceph components.
With this fix, the specifications are changed from m to M and are now consistent with other Ceph components. For example, to retain 5 monthly snapshots, issue the following command:
# ceph fs snap-schedule retention add /some/path M 5 --fs cephfs
ceph-mds no longer crashes when some inodes are replicated in a multi-MDS cluster
Previously, due to an incorrect lock assertion in ceph-mds, ceph-mds would crash when some inodes were replicated in a multi-MDS cluster.
With this fix, the lock state in the assertion is validated and no crash is observed.
Missing fields, such as date, client_count, and filters, are added to the --dump output
With this fix, missing fields, such as date, client_count, and filters, are added to the --dump output.
MDS no longer fails with the assert function during recovery
Previously, MDS would sometimes report metadata damage incorrectly when recovering a failed rank and thus, fail with an assert function.
With this fix, the startup procedure is corrected and the MDS does not fail with the assert function during recovery.
The target mon_host details are removed from the peer list and mirror daemon status
Previously, the snapshot mirror peer-list showed more information than just the peer list. This output caused confusion about whether only one MON IP or all the MON host IPs should be displayed.
With this fix, mon_host is removed from the fs snapshot mirror peer_list command, and the target mon_host details are removed from the peer list and mirror daemon status.
Batch operations no longer cause slow requests in the MDS
Previously, a regression was introduced by the quiesce protocol code. When killing client requests, the MDS would skip choosing a new batch head for the batch operations. This caused stale batch head requests to stay in the MDS cache forever and be treated as slow requests.
With this fix, a new batch head is chosen when requests are killed, and batch operations no longer cause slow requests.
File system upgrade happens even when no MDS is up
Previously, monitors would not allow an MDS to upgrade a file system when all MDS daemons were down. Due to this, upgrades would fail when the fail_fs setting was set to 'true'.
With this fix, monitors allow upgrades to happen when no MDS is up.
4.6. Ceph Object Gateway
Auto-generated internal topics are no longer shown in the admin topic list command
Previously, auto-generated internal topics were exposed to the user through the topic list command, so users could see many more topics than they had created.
With this fix, internal, auto-generated topics are not shown in the admin topic list command and users now see only the expected list of topics.
The deprecated bucket name field is no longer shown in the topic list command
Previously, in pull mode notifications (pubsub), the notifications were stored in a bucket. However, even though this mode is deprecated, an empty bucket name field was still shown in the topic list command.
With this fix, the empty bucket name field is removed.
Notifications are now sent on lifecycle transition
Previously, logic to dispatch on transition (as distinct from expiration) was missed. Due to this, notifications were not seen on transition.
With this fix, new logic is added and notifications are now sent on lifecycle transition.
RGWCopyObjRequest is fixed and rename operations work as expected
Previously, incorrect initialization of RGWCopyObjRequest after the zipper conversion broke the rename operation. Due to this, many rgw_rename() scenarios failed to copy the source object and, due to a secondary issue, also deleted the source even though the copy had failed.
With this fix, the initialization of RGWCopyObjRequest is corrected, and several unit test cases are added for different rename operations.
Ceph Object Gateway no longer accesses a role variable before it is initialized
Previously, a variable representing a Ceph Object Gateway role was being accessed before it was initialized, resulting in a segfault.
With this fix, operations are reordered and there is no illegal access. The roles are enforced as required.
An error message is now shown for a malformed CSV object structure
Previously, a CSV file with unclosed double quotes would cause an assert, followed by a crash.
With this fix, an error message is returned for each malformed CSV object structure instead of a crash.
Users no longer encounter 'user not found' error when querying user-related information in the Ceph dashboard
Previously, in the Ceph dashboard, end users could not retrieve user-related information from the Ceph Object Gateway because the full user_id contained a namespace that the dashboard did not identify, resulting in the “user not found” error.
With this fix, when a GET request is sent to admin ops to fetch user information, the response includes a fully constructed user ID, consisting of tenant, namespace, and user_id, as well as each field individually. End users can now retrieve the correct user_id, which can be used to fetch other user-related information from the Ceph Object Gateway.
Ceph Object Gateway now passes requests with well-formed payloads of the new stream encoding forms
Previously, the Ceph Object Gateway did not recognize the STREAMING-AWS4-HMAC-SHA256-PAYLOAD and STREAMING-UNSIGNED-PAYLOAD-TRAILER encoding forms, resulting in request failures.
With this fix, logic is implemented to recognize, parse, and, wherever applicable, verify the new trailing request signatures provided for the new encoding forms. The Ceph Object Gateway now passes requests with well-formed payloads of the new stream encoding forms.
The radosgw-admin bucket check and bucket reshard stat calculations are now correct
Previously, due to a code change, radosgw-admin bucket check stat calculation and bucket reshard stat calculation were incorrect when there were objects that transitioned from unversioned to versioned.
With this fix, the calculations are corrected and incorrect bucket stat outputs are no longer generated.
Tail objects are no longer lost during a multipart upload failure
Previously, during a multipart upload, if an upload of a part failed due to scenarios, such as a time-out, and the upload was restarted, the cleaning up of the first attempt would remove tail objects from the subsequent attempt. Due to this, the resulting Ceph Object Gateway multipart object would be damaged as some tail objects would be missing. It would respond to a HEAD request but fail during a GET request.
With this fix, the code cleans up the first attempt correctly. The resulting Ceph Object Gateway multipart object is no longer damaged and can be read by clients.
ETag values in the CompleteMultipartUpload and its notifications are now present
Previously, changes related to notifications caused the object handle corresponding to the completing multipart upload to not contain the resulting ETag. Due to this, ETags were not present for completing multipart uploads as the result of CompleteMultipartUpload and its notifications. (The correct ETag was computed and stored, so subsequent operations contained a correct ETag result.)
With this fix, CompleteMultipartUpload refreshes the object and also prints it as expected. ETag values in the CompleteMultipartUpload and its notifications are present.
Listing a container (bucket) via swift no longer causes a Ceph Object Gateway crash
Previously, a swift-object-storage call path was missing a call to update an object handle with its corresponding bucket (a zipper backport issue). Due to this, listing a container (bucket) via Swift would cause a Ceph Object Gateway crash when an S3 website was configured for the same bucket.
With this fix, the required zipper logic is added and the crash no longer occurs.
Processing a lifecycle on a bucket with no lifecycle policy no longer crashes
Previously, attempting to manually process a lifecycle on a bucket with no lifecycle policy caused a null pointer dereference, crashing the radosgw-admin program.
With this fix, a check for a null bucket handle is made before operating on the handle to avoid the crash.
Zone details for a datapool can now be modified
The rgw::zone_create() function initializes the default placement target and pool name on zone creation. Previously, this function was also used for radosgw-admin zone set with exclusive=false. As a result, zone set did not allow the data_pool of the STANDARD storage class to be modified.
With this fix, the default-placement target is not overwritten if it already exists, and the zone details for a data pool can be modified as expected.
Modulo operations on float numbers now return correct results
Previously, modulo operation on float numbers returned wrong results.
With this fix, the SQL engine is enhanced to handle modulo operations on floats and return correct results.
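For reference, a floating-point remainder must keep the fractional part of the operands. The following Python sketch contrasts a float-aware modulo with a variant that truncates both operands to integers first. That truncating variant is only one plausible way such a bug can appear; the source does not say what the wrong results were. The sketch assumes truncated-division semantics, where the result takes the sign of the dividend as C's fmod does. The function names sql_mod and naive_int_mod are hypothetical, and this is not the Ceph SQL engine's code.

```python
import math


def sql_mod(dividend: float, divisor: float) -> float:
    """Floating-point remainder with truncated-division semantics (like C fmod).

    Hypothetical helper: the result has the sign of the dividend.
    """
    if divisor == 0:
        raise ZeroDivisionError("modulo by zero")
    return math.fmod(dividend, divisor)


def naive_int_mod(dividend: float, divisor: float) -> int:
    """Hypothetical buggy variant that drops the fractional parts before dividing."""
    return int(dividend) % int(divisor)


if __name__ == "__main__":
    print(sql_mod(5.5, 2.0))        # 1.5
    print(naive_int_mod(5.5, 2.0))  # 1, the fractional part is lost
    print(sql_mod(-5.5, 2.0))       # -1.5
```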
SQL statements correctly return results for case-insensitive boolean expressions
Previously, SQL statements containing a boolean expression with capital letters in parts of the statement were interpreted incorrectly, producing wrong results.
With this fix, the interpretation of a statement is case-insensitive and hence, the correct results are returned for any case.
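SQL keywords such as TRUE, FALSE, AND, OR, and NOT are case-insensitive, so a tokenizer should normalize their case before matching them. The following minimal Python sketch treats TRUE, True, and true alike and leaves identifiers and string literals unchanged. The tokenize function, its token pattern, and the keyword set are hypothetical, and the sketch is not the Ceph SQL engine's parser.

```python
import re

# Hypothetical keyword table; tokens outside it keep their original case.
BOOLEAN_KEYWORDS = {"true", "false", "and", "or", "not"}
# Words, runs of comparison/grouping characters, or single-quoted literals.
TOKEN_RE = re.compile(r"\s*(\w+|[()=<>!]+|'[^']*')")


def tokenize(expr: str) -> list[str]:
    """Split an expression into tokens, upper-casing recognized keywords."""
    expr = expr.rstrip()
    tokens = []
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {expr[pos]!r}")
        tok = m.group(1)
        # Case-insensitive keyword match: compare in lowercase, emit one canonical form.
        tokens.append(tok.upper() if tok.lower() in BOOLEAN_KEYWORDS else tok)
        pos = m.end()
    return tokens
```

Because every keyword is mapped to a single canonical form, later parsing stages only need to recognize that one spelling.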
SQL engine returns the correct NULL value
Previously, SQL statements containing a cast from NULL into a type returned a wrong result instead of NULL.
With this fix, the SQL engine identifies cast from NULL and returns NULL.
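In SQL, casting NULL to any type yields NULL rather than a default value such as 0 or an empty string. The following Python sketch models NULL as None and shows that propagation rule. The sql_cast function and its table of target type names are hypothetical and do not describe the Ceph SQL engine's internals.

```python
from typing import Callable, Optional, Union

Scalar = Union[int, float, str]
Value = Optional[Scalar]

# Hypothetical table of supported target types.
CONVERTERS: dict[str, Callable[[Scalar], Scalar]] = {
    "int": int,
    "float": float,
    "string": str,
}


def sql_cast(value: Value, target: str) -> Value:
    """Cast value to target; a NULL input (None) yields NULL for every target type."""
    converter = CONVERTERS.get(target)
    if converter is None:
        raise ValueError(f"unsupported target type: {target}")
    if value is None:
        return None  # CAST(NULL AS <type>) is NULL, not 0 or ''
    return converter(value)
```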
Sending workloads with an embedded slash (/) in object names to cloud-sync no longer causes sync failures
Previously, incorrect URL-escaping of object paths during cloud sync caused sync failures when workloads contained objects with an embedded slash (/) in their names, that is, when virtual directory paths were used.
With this fix, the escaping is corrected and workloads with an embedded slash (/) in object names can be sent to cloud-sync as expected.
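When an object name contains slashes that act as virtual directory separators, the escaping step determines whether those slashes survive in the request path. The source does not say exactly how the escaping was wrong. The following Python sketch uses urllib.parse.quote to contrast escaping every slash with keeping slashes literal while still percent-encoding other characters. The object name is made up, and this is not the cloud-sync code.

```python
from urllib.parse import quote

object_name = "photos/2024/summer trip.jpg"  # hypothetical object with virtual directories

# Escaping every "/" turns the virtual directory path into one opaque segment.
over_escaped = quote(object_name, safe="")
# Keeping "/" literal preserves the directory structure in the request path.
path_escaped = quote(object_name, safe="/")

print(over_escaped)   # photos%2F2024%2Fsummer%20trip.jpg
print(path_escaped)   # photos/2024/summer%20trip.jpg
```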
SQL statements containing boolean expression return boolean types
Previously, SQL statements containing boolean expression (a projection) would return a string type instead of boolean type.
With this fix, the engine identifies a string as a boolean expression, according to the statement syntax, and the engine successfully returns a boolean type (true/false).
The work scheduler now takes the next date into account in the should_work function
Previously, the logic used in the should_work function, which decides whether the lifecycle should start running at the current time, did not take the next date into account. As a result, any custom work time "XY:TW-AB:CD" would break lifecycle processing when AB < XY.
With this fix, the work scheduler now takes the next date into account and the various custom lifecycle work schedules now function as expected.
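A work time such as "XY:TW-AB:CD" describes a daily window. When the end hour AB is earlier than the start hour XY, the window crosses midnight and ends on the next date. The following Python sketch shows a window check that handles that case, next to a naive check that ignores it. The function names parse_window, in_work_window, and naive_in_window are hypothetical, as is the exact "HH:MM-HH:MM" parsing. This is not the Ceph should_work implementation.

```python
from datetime import time


def parse_window(spec: str) -> tuple[time, time]:
    """Parse a hypothetical "HH:MM-HH:MM" work-time string into start and end times."""
    start_s, end_s = spec.split("-")
    sh, sm = (int(x) for x in start_s.split(":"))
    eh, em = (int(x) for x in end_s.split(":"))
    return time(sh, sm), time(eh, em)


def naive_in_window(now: time, spec: str) -> bool:
    """Ignores the next date, so it is never true when end < start."""
    start, end = parse_window(spec)
    return start <= now < end


def in_work_window(now: time, spec: str) -> bool:
    """Return True if `now` falls inside the daily window described by spec."""
    start, end = parse_window(spec)
    if start <= end:
        # Same-day window, for example 01:00-06:00.
        return start <= now < end
    # The window wraps past midnight, for example 22:00-04:00: it covers the rest
    # of the current date and continues into the next date until `end`.
    return now >= start or now < end
```

For a window such as 22:00-04:00, naive_in_window always returns False, which matches the broken behavior described above, while in_work_window accepts both 23:30 and 03:00.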
merge_and_store_attrs() method no longer causes attribute update operations to fail
Previously, a bug in the merge_and_store_attrs() method, which reconciles changed and unchanged bucket instance attributes, caused some attribute update operations to fail silently. Due to this, some metadata operations on a subset of buckets would fail. For example, a bucket owner change would fail on a bucket with a rate limit set.
With this fix, the merge_and_store_attrs() method is fixed and all affected scenarios now work correctly.
Checksum and malformed trailers can no longer induce a crash
Previously, an exception thrown from AWSv4ComplMulti during the Java AWS4Test.testMultipartUploadWithPauseAWS4 test led to a crash. The crash could be induced by some client input, specifically input that uses checksum trailers.
With this fix, an exception handler is implemented in do_aws4_auth_completion(). Checksum and malformed trailers can no longer induce a crash.
Implementation of improved trailing chunk boundary detection
Previously, one valid form of 0-length trailing chunk boundary formatting was not handled. Due to this, the Ceph Object Gateway failed to correctly recognize the start of the trailing chunk, leading to a 403 error.
With this fix, improved trailing chunk boundary detection is implemented and the unexpected 403 error in the anonymous access case no longer occurs.
Default values for Kafka message and idle timeouts no longer cause hangs
Previously, the default values for Kafka message and idle timeouts caused infrequent hangs while waiting for the Kafka broker.
With this fix, the timeouts are adjusted and the hangs no longer occur.
Delete bucket tagging no longer fails
Previously, incorrect logic in RADOS SAL merge_and_store_attrs() caused deleted attributes to not materialize. This also affected DeleteLifecycle. As a result, a pure attribute delete did not take effect in some code paths.
With this fix, the logic to store bucket tags uses RADOS SAL put_info() instead of merge_and_store_attrs(). Deleting bucket tagging now succeeds as expected.
Object mtime now advances on S3 PutACL and ACL changes replicate properly
Previously, S3 PutACL operations would not update the object mtime. Due to this, ACL changes, once applied, would not replicate because the timestamp-based object-change check incorrectly returned false.
With this fix, the object mtime always advances on S3 PutACL and ACL changes properly replicate.
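The replication issue comes down to a timestamp comparison. If a metadata-only change such as an ACL update does not advance mtime, a check that compares modification times sees nothing new to replicate. The following Python sketch models that check. The ObjectState record and the needs_sync and put_acl functions are hypothetical simplifications and are not the Ceph Object Gateway multi-site sync logic.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObjectState:
    """Hypothetical per-object record: modification time and ACL."""
    mtime: float
    acl: str


def needs_sync(source: ObjectState, replica: ObjectState) -> bool:
    """Timestamp-based change check: replicate only if the source is newer."""
    return source.mtime > replica.mtime


def put_acl(obj: ObjectState, acl: str, now: float, advance_mtime: bool) -> ObjectState:
    """Apply an ACL change; the fixed behavior also advances mtime to `now`."""
    return replace(obj, acl=acl, mtime=now if advance_mtime else obj.mtime)


if __name__ == "__main__":
    original = ObjectState(mtime=100.0, acl="private")
    replica = original
    before_fix = put_acl(original, "public-read", now=200.0, advance_mtime=False)
    after_fix = put_acl(original, "public-read", now=200.0, advance_mtime=True)
    print(needs_sync(before_fix, replica))  # False: the ACL change is never replicated
    print(needs_sync(after_fix, replica))   # True
```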
All transition cases can now dispatch notifications
Previously, the logic to dispatch notifications on transition was mistakenly scoped to the cloud-transition case, so notifications on pool transition were not sent.
With this fix, notification dispatch is added to the pool transition scope and all transition cases can dispatch notifications.
RetainUntilDate after the year 2106 no longer truncates and works as expected for new PutObjectRetention requests
Previously, PutObjectRetention requests specifying a RetainUntilDate after the year 2106 would truncate the date, resulting in an earlier date being used for object lock enforcement. This did not affect PutBucketObjectLockConfiguration requests, where the duration is specified in days.
With this fix, the RetainUntilDate is saved and works as expected for new PutObjectRetention requests. Retention dates saved by earlier requests are not automatically repaired. To fix them, identify the affected objects by using a HeadObject request and checking the x-amz-object-lock-retain-until-date value, and then save the RetainUntilDate again.
For more information, see S3 put object retention
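The year 2106 is where an unsigned 32-bit count of seconds since the Unix epoch overflows: 2^32 seconds after 1970-01-01T00:00:00Z is 2106-02-07T06:28:16Z. The source does not state the storage type, so this is an assumption. Under that assumption, the following Python sketch shows how a later RetainUntilDate wraps around to a much earlier date when only the low 32 bits are kept. The helper truncate_to_uint32 and the sample date are hypothetical.

```python
from datetime import datetime, timezone

UINT32_MASK = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def truncate_to_uint32(ts: int) -> int:
    """Keep only the low 32 bits, as storing into a uint32 field would."""
    return ts & UINT32_MASK


retain_until = datetime(2110, 1, 1, tzinfo=timezone.utc)
seconds = int(retain_until.timestamp())
stored = truncate_to_uint32(seconds)

print(datetime.fromtimestamp(2**32, tz=timezone.utc))   # 2106-02-07 06:28:16+00:00
print(datetime.fromtimestamp(stored, tz=timezone.utc))  # a date in the 1970s
```

Dates before the overflow point are unaffected by the mask, which is consistent with only RetainUntilDate values after 2106 being truncated.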
Bucket lifecycle processing rules are no longer stalled
Previously, enumeration of per-shard bucket-lifecycle rules contained a logical error related to concurrent removal of lifecycle rules for a bucket. Due to this, a shard could enter a state which would stall processing of that shard, causing some bucket lifecycle rules to not be processed.
With this fix, enumeration can now skip past a removed entry and the lifecycle processing stalls related to this issue are resolved.
Deleting objects in versioned buckets causes statistics mismatch
Due to versioned buckets having a mix of current and non-current objects, deleting objects might cause bucket and user statistics discrepancies on local and remote sites. This does not cause object leaks on either site, just statistics mismatch.
4.7. Multi-site Ceph Object Gateway
Ceph Object Gateway no longer deadlocks during object deletion
Previously, during object deletion, running S3 DeleteObjects requests in a multi-site deployment could cause the Ceph Object Gateway to deadlock and stop accepting new requests. This was caused by DeleteObjects requests processing several object deletions at a time.
With this fix, the replication logs are serialized and the deadlock is prevented.
CURL path normalization is now disabled at startup
Previously, the "path normalization" that CURL (part of the Ceph Object Gateway replication stack) performs by default illegally reformatted object names during replication. Due to this, objects whose names contained embedded . and .. were not replicated.
With this fix, the CURL path normalization is disabled at startup and the affected objects replicate as expected.
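Path normalization here refers to dot-segment removal as defined in RFC 3986, section 5.2.4. During that step, "." segments are dropped and ".." removes the preceding segment. When an object name is carried inside a URL path, that rewrite changes which object the request refers to. The following Python sketch is a simplified version that handles absolute paths only. It shows the effect on hypothetical object names and is not CURL's implementation.

```python
def remove_dot_segments(path: str) -> str:
    """Simplified RFC 3986 dot-segment removal for absolute paths ("/...")."""
    segments = path.split("/")[1:]  # drop the empty string before the leading "/"
    out: list[str] = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                out.append("")  # "/a/." becomes "/a/"
        elif seg == "..":
            if out:
                out.pop()       # ".." removes the preceding segment
            if last:
                out.append("")  # "/a/b/.." becomes "/a/"
        else:
            out.append(seg)
    return "/" + "/".join(out)


if __name__ == "__main__":
    # The object "a/../b" in bucket "bucket" would be requested as "/bucket/b".
    print(remove_dot_segments("/bucket/a/../b"))
    print(remove_dot_segments("/bucket/./c"))
```

Disabling normalization keeps such object names byte-for-byte intact in the replicated request path.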
The authentication of the forwarded request on the primary site no longer fails
Previously, an S3 request issued to the secondary site failed if temporary credentials returned by STS were used to sign the request. The failure occurred because the request was forwarded to the primary site and signed using a system user’s credentials, which do not match the temporary credentials in the session token of the forwarded request. As a result of the mismatched credentials, the authentication of the forwarded request on the primary site failed, which caused the S3 operation to fail.
With this fix, when a request is forwarded from the secondary site to the primary site, authentication with the temporary credentials in the session token is bypassed, and the system user’s credentials are used to complete the authentication successfully.
4.8. RADOS
Ceph reports a POOL_APP_NOT_ENABLED warning even if the pool has zero objects stored in it
Previously, Ceph status failed to report the pool application warning if the pool was empty, resulting in RGW bucket creation failure if the application tag was enabled for RGW pools.
With this fix, Ceph reports a POOL_APP_NOT_ENABLED warning even if the pool has zero objects stored in it.
Checks are added for uneven OSD weights between two sites in a stretch cluster
Previously, there were no checks for equal OSD weights after stretch cluster deployment. Due to this, users could make OSD weights unequal.
With this fix, checks are added for uneven OSD weights between two sites in a stretch cluster. The cluster now gives a warning about uneven OSD weight between two sites.
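The new check compares the total OSD weight on each of the two data sites. The following Python sketch shows that comparison. The site names, weights, helper names, and the tolerance parameter are hypothetical, and the actual check and warning text in Ceph may differ.

```python
from collections import defaultdict
from typing import Optional


def site_weights(osds: list[tuple[str, float]]) -> dict[str, float]:
    """Sum OSD weights per site from (site, weight) pairs."""
    totals: dict[str, float] = defaultdict(float)
    for site, weight in osds:
        totals[site] += weight
    return dict(totals)


def uneven_weight_warning(osds: list[tuple[str, float]], tolerance: float = 1e-6) -> Optional[str]:
    """Return a warning string if the two sites' total weights differ, else None."""
    totals = site_weights(osds)
    if len(totals) != 2:
        raise ValueError("expected exactly two data sites")
    (site_a, weight_a), (site_b, weight_b) = sorted(totals.items())
    if abs(weight_a - weight_b) > tolerance:
        return f"uneven OSD weight between {site_a} ({weight_a:g}) and {site_b} ({weight_b:g})"
    return None
```

A small tolerance avoids reporting differences that come only from floating-point rounding when many weights are summed.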
Autoscaler no longer runs while the norecover flag is set
Previously, the autoscaler would run while the norecover flag was set, leading to the creation of new PGs that then had to be backfilled. Running the autoscaler while the norecover flag is set was allowed in cases where I/O is blocked on missing or degraded objects, in order to avoid client I/O hanging indefinitely.
With this fix, the autoscaler does not run while the norecover flag is set.
The ceph config dump command output is now consistent
Previously, the ceph config dump command without pretty-print formatted output showed the localized option name and its value. An example of a normalized and a localized option is shown below:
Normalized: mgr/dashboard/ssl_server_port
Localized: mgr/dashboard/x/ssl_server_port
However, the pretty-printed (for example, JSON) version of the command only showed the normalized option name, as shown in the example above. The ceph config dump command result was inconsistent with and without the pretty-print option.
With this fix, the output is consistent and always shows the localized option name when using the ceph config dump --format TYPE command, with TYPE as the pretty-print type.
MGR module no longer takes up one CPU core every minute and CPU usage is normal
Previously, expensive calls from the placement group auto-scaler module to get OSDMap from the Monitor resulted in the MGR module taking up one CPU core every minute. Due to this, the CPU usage was high in the MGR daemon.
With this fix, the number of OSD map calls made from the placement group auto-scaler module is reduced. The CPU usage is now normal.
The correct CRUSH location of the OSDs' parent (host) is determined
Previously, when the osd_memory_target_autotune option was enabled, the memory target was applied at the host level. This was done by using a host mask when auto-tuning the memory. However, the code that applied the memory target did not determine the correct CRUSH location of the parent host for the change to be propagated to the OSD(s) of the host. As a result, none of the OSDs hosted by the machine were notified by the config observer, and the osd_memory_target remained unchanged for that set of OSDs.
With this fix, the correct CRUSH location of the OSDs' parent (host) is determined based on the host mask. This allows the change to propagate to the OSDs on the host. All the OSDs hosted by the machine are notified whenever the auto-tuner applies a new osd_memory_target, and the change is reflected.
Monitors no longer get stuck in elections during crash/shutdown tests
Previously, the disallowed_leaders attribute of the MonitorMap was conditionally filled only when entering stretch_mode. However, there were instances where revived monitors would not enter stretch_mode right away because they were in a probing state. This led to a mismatch in the disallowed_leaders set between the monitors across the cluster. Due to this, monitors would fail to elect a leader, and the election would be stuck, resulting in Ceph being unresponsive.
With this fix, monitors do not have to be in stretch_mode to fill the disallowed_leaders attribute. Monitors no longer get stuck in elections during crash/shutdown tests.
'Error getting attr on' message no longer occurs
Previously, ceph-objectstore-tool listed pgmeta objects when using --op list, resulting in the "Error getting attr on" message.
With this fix, pgmeta objects are skipped and the error message no longer appears.
LBA alignment in the allocators is no longer used and the OSD daemon does not assert due to allocation failure
Previously, OSD daemons would assert and fail to restart, which could sometimes lead to data unavailability or data loss. This would happen because the OSD daemon would assert if the allocator got to 4000 requests and was configured with a different allocation unit.
With this fix, the LBA alignment in the allocators is not used and the OSD daemon does not assert due to allocation failure.
A SQLite database using the libcephsqlite library can no longer be corrupted by short reads that fail to correctly zero memory pages
Previously, libcephsqlite did not handle short reads correctly, which could cause corruption of SQLite databases.
With this fix, libcephsqlite zeros pages correctly for short reads to avoid potential corruption.
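SQLite's VFS interface requires that when an xRead call returns fewer bytes than requested, the unread part of the buffer is filled with zeros and the call reports SQLITE_IOERR_SHORT_READ. SQLite's documentation warns that failing to zero-fill short reads eventually leads to database corruption. The following Python sketch models that contract with an in-memory file. read_page is a hypothetical helper, not libcephsqlite code.

```python
import io
from typing import BinaryIO


def read_page(f: BinaryIO, offset: int, size: int) -> tuple[bytes, bool]:
    """Read `size` bytes at `offset`, zero-filling anything past the end of the data.

    Returns (page, short_read). short_read corresponds to the case in which a
    SQLite VFS would return SQLITE_IOERR_SHORT_READ.
    """
    f.seek(offset)
    data = f.read(size)
    page = bytearray(size)    # bytearray(n) starts out all zeros
    page[: len(data)] = data  # copy only the bytes that were actually read
    return bytes(page), len(data) < size


if __name__ == "__main__":
    page, short = read_page(io.BytesIO(b"abc"), 0, 8)
    print(page, short)  # the last 5 bytes are zeros and short is True
```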
4.9. RBD Mirroring
The image status description now shows "orphan (force promoting)" when a peer site is down during force promotion
Previously, upon a force promotion, when a peer site went down, the image status description showed "local image linked to unknown peer", which is not a clear description.
With this fix, the mirror daemon is improved to show image status description as "orphan (force promoting)".
rbd_support module no longer fails to recover from repeated block-listing of its client
Previously, the rbd_support module failed to recover from repeated block-listing of its client due to a recursive deadlock in the rbd_support module, a race condition in the rbd_support module’s librbd client, and a bug in the librbd cython bindings that sometimes crashed the ceph-mgr.
With this release, all three issues are fixed and the rbd_support module no longer fails to recover from repeated block-listing of its client.