Chapter 4. Bug fixes
This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The Cephadm utility
The original_weight field is added as an attribute for the OSD removal queue
Previously, the cephadm OSD removal queue did not have a parameter for original_weight. As a result, the cephadm module would crash during OSD removal. With this fix, the original_weight field is added as an attribute for the OSD removal queue and cephadm no longer crashes during OSD removal.
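For reference, OSD removals handled by this queue can be started and monitored with the orchestrator CLI. A minimal sketch, assuming OSD 3 is the OSD being removed; the OSD ID is a placeholder:
# Queue OSD 3 for removal; cephadm drains it and tracks it in the removal queue
ceph orch osd rm 3
# Check the state of the removal queue while the OSD is drained
ceph orch osd rm status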
Cephadm no longer randomly marks hosts offline during large deployments
Previously, in some cases, when Cephadm had a short command timeout on larger deployments, hosts would randomly be marked offline during the host checks.
With this fix, the short timeout is removed and Cephadm now relies on the timeout specified by the mgr/cephadm/default_cephadm_command_timeout setting.
The ssh_keepalive_interval and ssh_keepalive_count_max settings are also now configurable through the mgr/cephadm/ssh_keepalive_interval and mgr/cephadm/ssh_keepalive_count_max settings.
These settings give users better control over how hosts are marked offline in their Cephadm-managed clusters, and Cephadm no longer randomly marks hosts offline during larger deployments.
Bugzilla:2308688
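These values can be adjusted with the standard Ceph Manager configuration commands. A minimal sketch; the numbers shown are illustrative values, not recommendations:
# Give Cephadm commands more time to complete on large deployments (value in seconds)
ceph config set mgr mgr/cephadm/default_cephadm_command_timeout 900
# Tune how SSH connections to hosts are probed before a host is treated as unreachable
ceph config set mgr mgr/cephadm/ssh_keepalive_interval 7
ceph config set mgr mgr/cephadm/ssh_keepalive_count_max 3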
Custom webhooks are now specified under the custom-receiver receiver
Previously, custom Alertmanager webhooks were being specified within the default receiver in the Alertmanager configuration file. As a result, custom alerts were not being sent to the specified webhook unless the alerts did not match any other receiver.
With this fix, custom webhooks are now specified under the custom-receiver receiver. Alerts are now sent to custom webhooks even if the alerts match another receiver.
Bugzilla:2313614
4.2. Ceph Dashboard
cherrypy no longer gets stuck during network security scanning
Previously, due to a bug in the cheroot package, cherrypy would get stuck during some security scans that were scanning the network. As a result, the Ceph Dashboard became unresponsive and the mgr module needed to be restarted.
With this fix, the cheroot package is updated and the issue is resolved.
Zone storage class details now display the correct compression information
Previously, the wrong compression information was being set for zone details. As a result, the zone details under the storage classes section were showing incorrect compression information.
With this fix, the information is corrected for storage classes and the zone details now show the correct compression information.
Zone storage class detail values are now set correctly
Previously, wrong data pool values were set for storage classes in zone details. As a result, data pool values were incorrect in the user interface when multiple storage classes were created.
With this fix, the correct values are being set for storage classes in zone details.
_nogroup is now listed in the Subvolume Group list even if there are no subvolumes in _nogroup
Previously, while cloning a subvolume, the _nogroup subvolume group was not listed if there were no subvolumes in _nogroup. As a result, users were not able to select _nogroup as a subvolume group.
With this fix, while cloning a subvolume, _nogroup is listed in the Subvolume Group list, even if there are no subvolumes in _nogroup.
Correct UID containing $ in the name is displayed in the dashboard
Previously, when a user with $ in the name was created through the CLI, the wrong UID was displayed in the Ceph Dashboard.
With this fix, the correct UID is displayed, even if a user with $ in the name is created by using the CLI.
NFS in File and Object now have separate routing
Previously, the same route was used for both NFS in File and NFS in Object. This caused usability issues, as both the File and Object navigation links for NFS were highlighted, and the user was also required to select the storage backend in both the File and Object views.
With this fix, NFS in File and Object have separate routing and do not ask users to enter the storage backend, thereby improving usability.
Validation for pseudo path and CephFS path is now added during NFS export creation
Previously, during the creation of an NFS export, the pseudo path had to be entered manually. As a result, the CephFS path could not be validated.
With this fix, the pseudo path field is left blank for the user to input a path, and the CephFS path gets the updated path from the selected subvolume group and subvolume. Validation is now also in place for invalid CephFS path values. If a user attempts to change the CephFS path to an invalid value, the export creation fails.
Users are now prompted to enter a path when creating an export
Previously, creating an export did not prompt for a path and / was entered by default.
With this fix, when attempting to create the export directly on the file system, it prompts for a path. If an invalid path is entered, creation is not permitted. Additionally, when entering the path of the CephFS file system directly, a warning appears stating "Export on CephFS volume '/' not allowed".
Snapshots containing "." and "/" in the name cannot be deleted
When a snapshot is created with "." as a name, it cannot be deleted.
As a workaround, users must avoid creating snapshot names containing "." and "/".
A period update commit is now added after migrating to multi-site
Previously, a period commit was not performed after completing the migration to multi-site through the form. As a result, a warning about the master zone not having an endpoint was displayed despite the endpoint being configured.
With this fix, a period update commit is performed after migrating to multi-site through the form and the warning is no longer emitted.
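The dashboard change corresponds to the period commit that an administrator would otherwise run manually. A minimal sketch of that manual step, shown for reference only:
# Commit the pending period so peers see the updated master zone endpoints
radosgw-admin period update --commit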
Performance statistics latency graph now displays the correct data
Previously, the latency graph in Object > Overview > Performance statistics was not displaying data because of the way the NaN values were handled in the code.
With this fix, the latency graph displays the correct data, as expected.
"Delete realm" dialog is now displayed when deleting a realm
Previously, when clicking “Delete realm”, the delete realm dialog was not being displayed as it was broken.
With this fix, the delete realm dialog loads properly and users can delete the realm.
Configuration values, such as rgw_realm, rgw_zonegroup, and rgw_zone, are now set before deploying the Ceph Object Gateway daemons
Previously, configuration values like rgw_realm, rgw_zonegroup, and rgw_zone were being set after deploying the Ceph Object Gateway daemons. This would cause the Ceph Object Gateway daemons to deploy in the default realm, zone group, and zone configurations rather than the specified configurations. This would require a restart to deploy them under the correct realm, zone group, and zone configuration.
With this fix, the configuration values are set before deploying the Ceph Object Gateway daemons and the daemons are deployed in the realm, zone group, and zone given in the specification.
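Realm, zone group, and zone values are supplied through the Ceph Object Gateway service specification. A minimal sketch of such a specification, assuming it is saved as rgw-spec.yaml and applied with ceph orch apply -i rgw-spec.yaml; the service ID, host, realm, zone group, and zone names are placeholders:
service_type: rgw
service_id: myrgw
placement:
  hosts:
    - host01
spec:
  rgw_realm: myrealm
  rgw_zonegroup: myzonegroup
  rgw_zone: myzone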
4.3. Ceph File System
Exceptions during cephfs-top are fixed
Previously, in some cases where there was insufficient space on the terminal, the cephfs-top command would not have enough space to run and would throw an exception.
With this fix, the exceptions that arose when running the cephfs-top command in large and small terminal windows are fixed.
The path-restricted cephx credential no longer fails permission checks on a removed snapshot of a directory
Previously, path restriction checks were constructed on anonymous paths for unlinked directories accessed through a snapshot. As a result, a path-restricted cephx credential would fail permission checks on a removed snapshot of a directory.
With this fix, the path used for the access check is constructed from the original path of the directory at the time of the snapshot and the access checks pass successfully.
Bugzilla:2293353
MDS no longer requests unnecessary authorization PINs
Previously, MDS would unnecessarily acquire remote authorization PINs for some workloads causing slower metadata operations.
With this fix, MDS no longer requests unnecessary authorization PINs, resulting in normal metadata performance.
The erroneous message from the kernel driver is processed appropriately and MDS no longer enters an infinite loop
Previously, an erroneous patch to the kernel driver caused the MDS to enter an infinite loop while processing an operation, which made the MDS largely unavailable.
With this fix, the erroneous message from the kernel driver is processed appropriately and the MDS no longer enters an infinite loop.
Bugzilla:2303693
Mirror daemon now restarts when blocklisted or failed
Previously, the time difference calculation could yield negative seconds and never reach the threshold interval. As a result, the mirror daemon would not restart when it was blocklisted or had failed.
With this fix, the time difference calculation is corrected and the mirror daemon restarts as expected.
JSON output of the ceph fs status command now correctly prints the rank field
Previously, due to a bug in the JSON output of the ceph fs status command, the rank field for standby-replay MDS daemons was incorrect. Instead of the format {rank}-s, where {rank} is the rank of the active MDS that the standby-replay daemon is following, it displayed a random {rank}.
With this fix, the JSON output of the ceph fs status command correctly prints the rank field for the standby-replay MDS in the format {rank}-s.
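The corrected rank field can be checked directly in the JSON output. A minimal check, assuming a file system with an active rank 0 and a standby-replay daemon following it:
# The standby-replay daemon should report its rank as 0-s rather than a random rank
ceph fs status --format json-pretty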
sync_duration is now calculated in seconds
Previously, the sync duration was calculated in milliseconds. This would cause usability issues, as all other calculations were in seconds.
With this fix, sync_duration is now displayed in seconds.
A lock is now implemented to guard access to the shared data structure
Previously, access to a shared data structure without a lock caused applications using the CephFS client library to throw an error.
With this fix, a lock (a mutex) is implemented to guard access to the shared data structure, and applications using the CephFS client library work as expected.
The snap-schedule manager module correctly enforces the global mds_max_snaps_per_dir configuration option
Previously, the configuration value was not being correctly retrieved from the MDS. As a result, the snap-schedule manager module would not enforce the mds_max_snaps_per_dir setting and would enforce a default limit of 100.
With this fix, the configuration item is correctly fetched from the MDS and the snap-schedule manager module correctly enforces the global mds_max_snaps_per_dir configuration option.
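The global limit that the module now honors can be inspected and adjusted with the standard configuration commands. A minimal sketch; 150 is an illustrative value, not a recommendation:
# Show the current per-directory snapshot limit enforced by the MDS
ceph config get mds mds_max_snaps_per_dir
# Raise the limit; the snap-schedule module now respects this value instead of a hard-coded 100
ceph config set mds mds_max_snaps_per_dir 150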
CephFS FUSE clients can now correctly access the specified mds auth caps path
Previously, due to a path parsing issue while validating mds auth caps, FUSE clients were not able to access a specific path, even when the path was specified as rw in the mds auth caps.
With this fix, the path parsing issue while validating mds auth caps is fixed and the paths can be accessed as expected.
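Path-restricted capabilities of this kind are typically created with ceph fs authorize. A minimal sketch, assuming a file system named cephfs, a client named client.appuser, and the path /apps/data; all three names are placeholders:
# Grant read/write access restricted to /apps/data; FUSE clients using this credential can now reach the path as expected
ceph fs authorize cephfs client.appuser /apps/data rw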
4.4. Ceph Object Gateway
SQL queries on a JSON statement no longer confuse key with array or object
Previously, in some cases, the result of an SQL statement on a JSON structure would confuse a key with an array or an object. As a result, a query for venue.id, with id defined as the key value in the venue object, would not find it and would keep traversing the whole JSON object.
With this fix, the SQL engine no longer mixes up a key with an array or an object and returns the correct results according to the query.
Error code of local authentication engine is now returned correctly
Previously, when the local authentication engine was specified last in the authentication order and the preceding authentication engines were not applicable, incorrect error codes were returned.
With this fix, the error code of the local authentication engine is returned when the preceding external authentication engines are not applicable for authenticating a request, and the correct error codes are returned.
Lifecycle transition now works for the rules containing "Date"
Previously, due to a bug in the lifecycle transition code, rules containing "Date" were not processed, causing objects that met the criteria not to be transitioned to other storage classes.
With this fix, the lifecycle transition works for the rules containing "Date".
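A lifecycle rule of the kind that is now processed uses a fixed Date in its transition action. A minimal sketch using the AWS CLI against a Ceph Object Gateway endpoint; the endpoint, bucket name, date, and COLD storage class are placeholders:
aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-lifecycle-configuration \
  --bucket mybucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "transition-on-date",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [
          {"Date": "2025-06-01T00:00:00Z", "StorageClass": "COLD"}
        ]
      }
    ]
  }'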
Notifications are now sent on lifecycle transition
Previously, the logic to dispatch notifications on transition (as distinct from expiration) was missing. Due to this, notifications were not seen on transition.
With this fix, new logic is added and notifications are now sent on lifecycle transition.
Batch object deletion is now allowed with IAM policy permissions
Previously, during a batch delete process, also known as multi-object delete, incorrect evaluation of IAM policies returned an AccessDenied output even when no explicit or implicit deny was present. The AccessDenied occurred even if there were Allow privileges. As a result, batch deletion failed with the AccessDenied error.
With this fix, the policies are evaluated as expected and batch deletion succeeds when IAM policies are enabled.
Removing an S3 object now properly frees storage space
Previously, in some cases, when removing an object created by a CopyObject operation and the size was larger than 4 MB, the object did not properly free all of the storage space that it used. With this fix, the source and destination handles are passed explicitly into various RGWRados call paths and the storage is freed up, as expected.
Quota and rate limit settings for assume-roles are properly enforced for S3 requests with temporary credentials
Previously, information of a user using an assume-role was not loaded successfully from the backend store when temporary credentials were being used to serve an S3 request. As a result, the user quota or rate limit settings were not applied with the temporary credentials.
With this fix, the information is loaded from the backend store, even when authenticating with temporary credentials and all settings are applied successfully.
Pre-signed URLs are now accepted with Keystone EC2 authentication
Previously, properly constructed pre-signed HTTP PUT URLs failed unexpectedly with a 403 Access Denied error. This happened because a change in the processing of HTTP OPTIONS requests containing CORS changed the implied AWSv4 request signature calculation for some pre-signed URLs when authentication was through Keystone EC2 (Swift S3 emulation).
With this fix, a new workflow for CORS HTTP OPTIONS is introduced for the Keystone EC2 case and pre-signed URLs no longer unexpectedly fail.
Malformed JSON of radosgw-admin notification output is now corrected
Previously, when bucket notifications were configured with metadata and tag filters, the radosgw-admin notification get/list output was malformed JSON. As a result, any JSON parser, such as jquery, reading the output would fail.
With this fix, the JSON output for radosgw-admin is corrected.
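The corrected output can be verified by listing the configured notifications and feeding the result to a JSON parser. A minimal check, assuming a bucket named mybucket with notifications configured; the bucket name is a placeholder:
# The get/list output is now well-formed JSON that parsers can consume
radosgw-admin notification list --bucket mybucket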
Clusters can now be configured with both QAT and non-QAT Ceph Object Gateway daemons
Previously, QAT could only be configured on new setups (Greenfield only). As a result, QAT Ceph Object Gateway daemons could not be configured in the same cluster as non-QAT (regular) Ceph Object Gateway daemons.
With this fix, both QAT and non-QAT Ceph Object Gateway daemons can be configured in the same cluster.
Ceph Object Gateway now tolerates minio SDK with checksums and other hypothetical traffic
Previously, some versions of the minio client SDK would be missing an appended part number for multipart objects. This would result in unexpected errors for multipart uploads.
With this fix, checksums, both with and without part number suffixes, are accepted. The fix also permits the checksum type to be inferred from the init-multipart when a checksum is not asserted in part uploads.
Lifecycle transition no longer fails for non-current object with an empty instance
Previously, when bucket versioning was enabled, old plain object entries would get converted to versioned by updating their instance as "null" in the raw head/old object. Due to this, the lifecycle transition would fail for a non-current object with an empty instance.
With this fix, the code is corrected to keep the instance empty while updating bucket index entries and the lifecycle transition works for all plain entries which are converted to versioned.
The AST structure SQL statement no longer causes a crash
Previously, in some cases, due to an erroneous semantic combined with the Parquet flow, the AST produced by the SQL engine was wrong and a crash would occur.
With this fix, more safety checks are in place for the AST structure, the statement processing is fixed, and a crash is avoided.
Bucket policy authorizations now work as expected
Previously, only a bucket owner was able to set, get, and delete the configurations for bucket notifications from a bucket. This was the case even if the bucket policy authorized another user for running these operations.
With this fix, authorization for configuring bucket notifications works as expected.
Bucket policy evaluations now work as expected and allow cross-tenant access for actions that are allowed by the policy
Previously, due to an incorrect bucket tenant value, access was denied during bucket policy evaluation for S3 operations, even if they were explicitly allowed in the bucket policies. As a result, the bucket policy evaluation failed and S3 operations that were marked as allowed by the bucket policy were denied.
With this fix, the requested bucket tenant name is correctly passed when getting the bucket policy from the backend store. The tenant is then matched against the bucket tenant which was passed in as part of the S3 operation request, and S3 operations work as expected.
SSL sessions can now reuse connections for uploading multiple objects
Previously, during consecutive object uploads using SSL, the cipher negotiations occurred for each object. As a result, the objects-per-second transfer rate was low.
With this fix, the SSL session reuse mechanism is activated, allowing supporting clients to reuse existing SSL connections for uploading multiple objects. This avoids the performance penalty of renegotiating the SSL connection for each object.
4.5. Multi-site Ceph Object Gateway
Objects with null version IDs are now deleted on the second site
Previously, in a multi-site environment, deleting an object with a null version ID on one of the sites did not delete the object on the second site.
With this fix, objects with null version IDs are deleted on the second site.
Bucket creations in a secondary zone no longer fail
Previously, when a secondary zone forwarded a create_bucket request with a location constraint, the request would set the content_length to a non-zero value. However, the content_length was not parsed on the primary zone when forwarded from the secondary zone. As a result, when running a create_bucket operation with a content_length of 0 and an existing payload hash, the bucket failed to replicate.
With this fix, a request body is included when the CreateBucket operation is forwarded to the primary zone and the bucket is created as expected.
CopyObject requests now replicate as expected
Previously, a copy_object operation would retain the source attributes by default. As a result, during a check for RGW_ATTR_OBJ_REPLICATION_TRACE, a NOT_MODIFIED error would be emitted if the destination zone was already present in the trace. This would cause a failure to replicate the copied object.
With this fix, the source object's RGW_ATTR_OBJ_REPLICATION_TRACE attribute is removed during a copy_object operation and CopyObject requests replicate as expected.
4.6. RADOS
Newly added capacity is no longer marked as allocated
Previously, newly added capacity was automatically marked as allocated. As a result, added disk capacity did not add available space.
With this fix, added capacity is marked as free and available, and after the OSD is restarted, the newly added capacity is recognized as added space, as expected.
Bugzilla:2296247
BlueStore now works as expected with OSDs
Previously, the ceph-bluestore-tool show-label command would not work on mounted OSDs and the ceph-volume lvm zap command was not able to erase the identity of an OSD. With this fix, the show-label attribute does not require exclusive access to the disk. In addition, the ceph-volume command now uses ceph-bluestore-tool zap to clear OSD devices.
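The affected commands can be exercised as follows. A minimal sketch; /dev/sdX is a placeholder device, and zapping destroys all data on it:
# Read the BlueStore label; this no longer requires exclusive access, so it also works for a mounted OSD
ceph-bluestore-tool show-label --dev /dev/sdX
# Erase the identity of a removed OSD's device before reuse
ceph-volume lvm zap /dev/sdX --destroy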
BlueStore no longer overwrites labels
Previously, BlueFS would write over the location reserved for a label. As a result, the OSD would not start as expected.
With this fix, the label location is marked as reserved and does not get overwritten. BlueStore now mounts and the OSD starts as expected.
RocksDB files now only take as much space as needed
Previously, RocksDB files were generously preallocated but never truncated. This resulted in wasted disk space that was assigned to files that would never be used.
With this fix, proper truncation is implemented, moving unused allocations back to the free pool.
Monitors no longer get stuck in elections during crash or shutdown tests
Previously, the disallowed_leaders attribute of the MonitorMap was conditionally filled only when entering stretch_mode. However, there were instances where Monitors that had just been revived would not enter stretch_mode right away because they were in a probing state. This led to a mismatch in the disallowed_leaders set between the monitors across the cluster. Due to this, Monitors would fail to elect a leader, and the election would be stuck, resulting in Ceph being unresponsive.
With this fix, Monitors do not have to be in stretch_mode to fill the disallowed_leaders attribute. Monitors no longer get stuck in elections during crash or shutdown tests.
4.7. RADOS Block Devices (RBD)
librbd no longer crashes when handling discard I/O requests
Previously, due to an implementation defect, librbd would crash when handling discard I/O requests that straddled multiple RADOS objects on an image with the journaling feature enabled. The workaround was to set the rbd_skip_partial_discard option to 'false' (the default is 'true').
With this fix, the implementation defect is corrected, librbd no longer crashes, and the workaround is no longer necessary.
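Clusters that applied the earlier workaround can return to the default behavior once this fix is in place. A minimal sketch, assuming the option was overridden at the client level:
# Remove the override so rbd_skip_partial_discard falls back to its default of 'true'
ceph config rm client rbd_skip_partial_discard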
rbd du command no longer crashes if a 0-sized block device image is encountered
Previously, due to an implementation defect, the rbd du command would crash if it encountered a 0-sized RBD image.
With this fix, the implementation defect is corrected and the rbd du command no longer crashes if a 0-sized RBD image is encountered.
rbd_diff_iterate2() API returns correct results for block device images with LUKS encryption loaded
Previously, due to an implementation defect, the rbd_diff_iterate2() API returned incorrect results for RBD images with LUKS encryption loaded.
With this fix, the rbd_diff_iterate2() API returns correct results for RBD images with LUKS encryption loaded.
Bugzilla:2292562
Encrypted image decryption is no longer skipped after importing or live migration
Previously, due to an implementation defect, when reading from an encrypted image that was live migrated or imported, decryption was skipped. As a result, an encrypted buffer (ciphertext), instead of the actual stored data (plaintext), was returned to the user.
With this fix, decryption is no longer skipped when reading from an encrypted image that is being live migrated or imported and the actual stored data (plaintext) is returned to the user, as expected.
Bugzilla:2303528
Encryption specification now always propagates to the migration source
Previously, due to an implementation defect, when opening an encrypted clone image that is being live migrated or imported, the encryption specification would not be propagated to the migration source. As a result, the encrypted clone image that is being live migrated or imported would not open. The only workaround for the user was to pass the encryption specification twice by duplicating it.
With this fix, the encryption specification always propagates to the migration source.
Bugzilla:2308345
4.8. RBD Mirroring
rbd-mirror daemon now properly disposes of outdated PoolReplayer instances
Previously, due to an implementation defect, the rbd-mirror daemon did not properly dispose of outdated PoolReplayer instances, in particular when refreshing the mirror peer configuration. Due to this, there was unnecessary resource consumption and a number of PoolReplayer instances competed against each other, causing the rbd-mirror daemon health to be reported as ERROR and replication to hang in some cases. To resume replication, the administrator had to restart the rbd-mirror daemon.
With this fix, the implementation defect is corrected and the rbd-mirror daemon now properly disposes of outdated PoolReplayer instances.
4.9. NFS Ganesha
All memory consumed by the configuration reload process is now released
Previously, reloading exports did not release all of the memory consumed by the configuration reload process, causing the memory footprint to increase.
With this fix, all memory consumed by the configuration reload process is released resulting in reduced memory footprint.
The reap_expired_client_list no longer causes a deadlock
Previously, in some cases, the reap_expired_client_list would create a deadlock. This would occur due to two threads waiting on each other to acquire a lock.
With this fix, the lock order is resolved and no deadlock occurs.
File parsing and startup time is significantly reduced
Previously, due to poor management of parsed tokens, configuration file parsing was too slow.
With this fix, token lookup is replaced with an AVL tree, reducing parsing time and startup time.