Chapter 4. Bug fixes


This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.

4.1. The Cephadm utility

The haproxy daemon no longer fails deployment when using the haproxy_qat_support setting in the ingress specification

Previously, the haproxy_qat_support setting was present in the ingress specification but was not functional. The setting was added to allow haproxy to offload encryption operations on machines with QAT hardware, with the intent of improving performance. Due to an incomplete code update, the added function did not work as intended. If the haproxy_qat_support setting was used, the haproxy daemon failed to deploy.

With this fix, the haproxy_qat_support setting works as intended and does not fail the haproxy daemon during deployment.

Bugzilla:2308344

PROMETHEUS_API_HOST gets set during Cephadm Prometheus deployment

Previously, PROMETHEUS_API_HOST was not always set when Cephadm initially deployed Prometheus. This issue was seen most commonly when bootstrapping a cluster with --skip-monitoring-stack and then deploying Prometheus at a later time. As a result, monitoring information could be unavailable.

With this fix, PROMETHEUS_API_HOST gets set during Cephadm Prometheus deployment and monitoring information is available, as expected.

Bugzilla:2315072

4.2. Ceph Dashboard

Creating a sub-user from the dashboard now works as expected

Previously, a Python coding error prevented the dashboard and API from creating sub-users.

With this fix, the corrected code allows the dashboard and API to create sub-users successfully.

Bugzilla:2325221

Editing RGW-related configurations from the dashboard is now supported

Previously, the dashboard relied on incorrect flag data to determine whether a configuration was editable, preventing users from modifying RGW-related settings that were editable via the CLI.

With this fix, the dashboard now allows editing all configurations starting with rgw, ensuring consistency with CLI capabilities.

Bugzilla:2308641

4.3. Ceph File System

Invalid headers no longer cause a segmentation fault during journal import

Previously, the cephfs-journal-tool did not check for headers during a journal import operation. This would cause a segmentation fault.

With this fix, headers are checked when running the journal import command and segmentation faults no longer occur with missing headers.

Bugzilla:2303640

cephfs-data-scan during disaster recovery now completes as expected

Previously, in some cases, cephfs-data-scan ran during disaster recovery but did not recreate missing directory fragments from the backtraces, or it created duplicate links. As a result, directories were inaccessible or crashed the MDS.

With this fix, cephfs-data-scan now properly recreates missing directory fragments and corrects the duplicate links, as expected.

Bugzilla:2343968

Inode invalidation operations now complete faster

Previously, an extra reference to an inode was taken in some cases that was never released. As a result, operations requiring inode invalidation were delayed until a timeout elapsed, making them very slow.

With this fix, the extra reference is avoided, allowing these operations to complete much faster without unnecessary delays.

Bugzilla:2355691

Space larger than the NFS export disk size can no longer be allocated

Previously, an empty file could be created without storage blocks allocated. These empty files could cause write operations to fail when writing to the desired file region, with a command such as fallocate.

With this fix, the fallocate command fails on the NFS mount point with an "Operation not supported" error and no empty files are created without storage blocks allocated.

Bugzilla:2301434

Proxy daemon logs now update immediately

Previously, log messages from the proxy daemon were buffered by the glibc library, causing delays in log file updates. As a result, in the event of a crash, some log entries could be lost, making troubleshooting and debugging more difficult.

With this fix, messages are now written directly to the log file, bypassing glibc buffering, ensuring that logs are immediately visible.
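
The difference between buffered and direct writes can be illustrated with a minimal Python analogue. This is only a sketch of the idea, not the proxy daemon's code: the file name proxy.log and the helper names open_log and log_line are hypothetical.

```python
import os
import tempfile

def open_log(path):
    # O_APPEND makes every write land at the end of the file.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

def log_line(fd, message):
    # os.write hands the bytes to the kernel right away; there is no
    # user-space buffer that could be lost if the process crashes.
    os.write(fd, (message + "\n").encode("utf-8"))

log_path = os.path.join(tempfile.mkdtemp(), "proxy.log")
fd = open_log(log_path)
log_line(fd, "proxy started")

# The entry is visible to other readers before the descriptor is closed.
with open(log_path, encoding="utf-8") as reader:
    visible = reader.read()
os.close(fd)
print(visible, end="")
```

With a buffered stream, the same entry would typically stay in process memory until the buffer filled or was flushed.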

Bugzilla:2357488

Async write deadlock fixed under OSD full conditions

Previously, when asynchronous writes were ongoing and the OSD became full, the client received a notification to cancel the writes. The cancellation method and the callback invoked after the write was canceled both attempted to acquire the same lock. As a result, this led to a deadlock, causing the client to hang indefinitely during an OSD full scenario.

With this fix, the deadlock in the client code has been resolved. Consequently, asynchronous writes during an OSD full scenario no longer cause the client to hang.
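
The locking pattern described above can be sketched in Python. This is an illustrative model under stated assumptions, not the Ceph client code: the class AsyncWriter and its methods are hypothetical, and the approach shown (invoking callbacks only after the lock is released) is one common way to avoid this kind of deadlock, not necessarily the one used in the fix.

```python
import threading

class AsyncWriter:
    """Toy model of a client with cancellable asynchronous writes."""

    def __init__(self):
        self._lock = threading.Lock()   # non-reentrant, like a plain mutex
        self._pending = {}              # write id -> completion callback
        self.cancelled = []             # ids whose callbacks have run

    def submit(self, write_id):
        with self._lock:
            self._pending[write_id] = self._on_cancelled

    def _on_cancelled(self, write_id):
        # The callback needs the lock to update shared state.
        with self._lock:
            self.cancelled.append(write_id)

    def cancel_all(self):
        # Detach the pending writes while holding the lock ...
        with self._lock:
            callbacks = list(self._pending.items())
            self._pending.clear()
        # ... and invoke the callbacks only after the lock is released.
        # Calling them inside the `with` block above would block forever,
        # because the callback tries to take the same non-reentrant lock.
        for write_id, callback in callbacks:
            callback(write_id)


writer = AsyncWriter()
for i in range(3):
    writer.submit(i)
writer.cancel_all()
print(writer.cancelled)
```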

Bugzilla:2291163

Expanded removexattr support for CephFS virtual extended attributes

Previously, removexattr was not supported on all appropriate Ceph virtual extended attributes, resulting in attempts to remove an extended attribute failing with a "No such attribute" error.

With this fix, support for removexattr has been expanded to cover all pertinent CephFS virtual extended attributes. You can now properly use removexattr to remove attributes. You can also remove the layout on the root inode. Removing the layout restores the configuration to the default layout.

Bugzilla:2297166

MDS and FS IDs are now verified during health warning checks for fail commands

Previously, the MDS and FS IDs were not checked when executing the ceph mds fail and ceph fs fail commands. As a result, these commands would fail with a "permission denied" error for healthy MDS or FS instances when another instance in the cluster exhibited health warnings.

With this fix, the system now validates the MDS and FS IDs during the health warning check. This change ensures that the ceph mds fail and ceph fs fail commands succeed for healthy instances, even if other MDS or FS instances in the cluster have health warnings.

Bugzilla:2328008

Error mapping now displays specific error message

Previously, an incorrect mapping of the error code to the user message resulted in a generic message being displayed. As a result, users did not see the specific details of the error encountered.

With this fix, the mapping has been corrected to show an error-specific message, ensuring that users receive detailed feedback for the error.

Bugzilla:2359598

fscrypt now decrypts long file names

Previously, the alternate name, which holds the raw encrypted version of the file name, was not provided in all decryption cases. As a result, long file names were not being decrypted correctly, and incomplete directory entry data was produced.

With this fix, the alternate name is provided during decryption, so fscrypt can now decrypt long file names properly.

Bugzilla:2362278

Snapshot names are now stored in plain text

Previously, snapshots could be created regardless of whether the fscrypt key was present. When a snapshot was created using the mgr subvolume snapshot create command without the key, the snapshot name was not encrypted during creation. As a result, subsequent attempts to decrypt the plain text name produced unreadable output.

With this fix, snapshot names are stored as plain text without encryption. This change helps ensure that snapshot names remain readable, whether the fscrypt key is present or not.

Bugzilla:2362859

4.4. Ceph Object Gateway

Multipart uploads using AWS CLI no longer cause Ceph Object Gateway to crash

Previously, during a multipart upload using AWS CLI, RGW crashed due to checksum algorithms and reporting behavior introduced in AWS S3 and AWS SDKs, specifically the new CRC64NVME checksum algorithm.

With this fix, Ceph Object Gateway safely handles the unknown checksum string. As a result, AWS CLI no longer causes multipart uploads to crash Ceph Object Gateways.

Bugzilla:2352427

Deleted objects no longer appear in the bucket listing

Previously, a race between CompleteMultipart and AbortMultipart uploads could lead to inconsistent results. As a result, objects could appear in the bucket listing, even when they were no longer present.

With this fix, a serializer is now used in AbortMultipart uploads and properly deleted objects no longer appear in a bucket listing.
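
The role of a serializer can be sketched as follows. This is a simplified in-process model, assuming a single lock per upload; the class MultipartUpload and its methods are hypothetical and do not reflect the Ceph Object Gateway implementation.

```python
import threading

class MultipartUpload:
    """Toy upload whose final state is decided by exactly one operation."""

    def __init__(self):
        self._serializer = threading.Lock()
        self.state = "in-progress"

    def _finish(self, new_state):
        # Only one of complete/abort can hold the serializer at a time,
        # and the loser sees that the upload is already finished.
        with self._serializer:
            if self.state != "in-progress":
                return False
            self.state = new_state
            return True

    def complete(self):
        return self._finish("completed")

    def abort(self):
        return self._finish("aborted")


upload = MultipartUpload()
results = {}

def run(name, operation):
    results[name] = operation()

threads = [
    threading.Thread(target=run, args=("complete", upload.complete)),
    threading.Thread(target=run, args=("abort", upload.abort)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(upload.state, results)
```

Whichever operation acquires the serializer first decides the outcome; the other becomes a no-op instead of leaving a stale listing entry behind.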

Bugzilla:2331908

Ceph Object Gateway no longer crashes during object deletion

Previously, in some cases, an uninitialized check_objv parameter variable could lead to accessing an invalid memory address in the object delete path. As a result, there was a segmentation fault.

With this fix, the check_objv parameter is always initialized and the object deletion completes as expected.

Bugzilla:2350607

Tail objects no longer wrongly deleted with copy-object

Previously, there was a reference count invariant on tail objects that was not maintained when an object was copied to itself. This caused the existing object to be changed, rather than copied. As a result, references to tail objects were being decremented. When the refcount on tail objects dropped to 0, they were deleted during the next garbage collection (GC) cycle.

With this fix, the refcount on tail objects is no longer decremented when completing a copy-to-self.

Bugzilla:2356678

AssumeRoleWithWebIdentity operations now fail as expected when incorrect thumbprints are added

Previously, due to a boolean flag being incorrectly set in the code, the AssumeRoleWithWebIdentity operation succeeded even when an incorrect thumbprint was registered in the CreateOIDCProvider call. As a result, AssumeRoleWithWebIdentity was able to succeed when it should have failed.

With this fix, the boolean flag is not set when no correct thumbprints are found registered in the CreateOIDCProvider call. As a result, if the end user does not provide a correct thumbprint in the CreateOIDCProvider call, the AssumeRoleWithWebIdentity operation now fails as expected.

Bugzilla:2324227

Ceph Object Gateway can now delete objects when RADOS is at maximum pool capacity

Previously, when a RADOS pool was near its maximum quota, the Ceph Object Gateway was not able to delete objects.

With this fix, Ceph Object Gateway can delete objects even when RADOS has reached its maximum pool threshold.

Bugzilla:2342928

User Put Object permissions are now recognized on copied buckets

Previously, bucket policies of a copy source bucket with access permissions for Put Object were not recognized on the copied bucket. As a result, when accessing the copied bucket, an Access Denied error was emitted.

With this fix, copy source bucket policies are loaded during permission evaluation of Put Object and user access on the copied bucket is recognized, as expected.

Bugzilla:2348708

Large queries on Parquet objects no longer emit an Out of memory error

Previously, in some cases, when a query was processed on a Parquet object, that object was read in large chunks. This caused the Ceph Object Gateway to load a larger buffer into the memory, which was too big for low-end machines. The memory would especially be affected when Ceph Object Gateway was co-located with OSD processes, which consume a large amount of memory. With the Out of memory error, the OS killed the Ceph Object Gateway process.

With this fix, there is an updated limit on the reader-buffer size for reading column chunks. The default size is now 16 MB and the size can be changed through the Ceph Object Gateway configuration file.
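
The effect of a reader-buffer limit can be sketched in Python. This is an illustration of bounded reads only, not the gateway's Parquet reader: the function read_column_chunk and its parameters are hypothetical, and only the 16 MB default comes from the description above.

```python
import io

MAX_READER_BUFFER = 16 * 1024 * 1024  # 16 MB, the default named above

def read_column_chunk(stream, chunk_length, buffer_limit=MAX_READER_BUFFER):
    """Yield a column chunk in pieces no larger than buffer_limit bytes."""
    remaining = chunk_length
    while remaining > 0:
        piece = stream.read(min(remaining, buffer_limit))
        if not piece:
            break  # stream ended early
        remaining -= len(piece)
        yield piece

# A small limit is used here so the behavior is easy to see.
data = io.BytesIO(b"x" * 100)
sizes = [len(p) for p in read_column_chunk(data, 100, buffer_limit=32)]
print(sizes)  # [32, 32, 32, 4]
```

Peak memory use is then bounded by the buffer limit rather than by the size of the column chunk.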

Bugzilla:2365146

radosgw-admin no longer crashes on non-positive --num-shards values

Previously, when running a radosgw-admin bucket reshard command, using a non-positive --num-shards value, such as a zero or a negative number, would cause radosgw-admin to crash.

With this fix, the --num-shards value is checked and an error message is emitted if a non-positive value is provided. As a result, radosgw-admin reshard commands run as expected and no longer crash.
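
This kind of input check can be sketched with Python's argparse. The sketch is not radosgw-admin code: the parser name reshard-sketch, the helper positive_int, and the error text are assumptions made for illustration.

```python
import argparse

def positive_int(text):
    """argparse type that accepts only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError("--num-shards must be a positive integer")
    return value

def parse_reshard_args(argv):
    parser = argparse.ArgumentParser(prog="reshard-sketch")
    parser.add_argument("--num-shards", type=positive_int, required=True)
    return parser.parse_args(argv)

args = parse_reshard_args(["--num-shards", "11"])
print(args.num_shards)  # 11
```

Passing 0 or a negative number makes the parser exit with an error message instead of handing an invalid shard count to later code.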

Bugzilla:2312578

Ceph Object Gateway no longer fails during signature validation

Previously, if the JSON Web Token (JWT) was not signed using the first x5c certificate, the signature validation failed.

With this fix, the correct certificate is chosen for signature validation, even if it is not the first certificate. As a result, the signature validation completes as expected.

Bugzilla:2242261

Objects are now removed as per the lifecycle rules set when bucket versioning is suspended

Previously, due to an error in the lifecycle code, the lifecycle process did not remove the objects if the bucket versioning was in the suspended state. As a result, the objects were still seen in the bucket listing.

With this fix, the lifecycle code is fixed and now the lifecycle process removes objects as per the rules set and the objects are no longer listed in the bucket listing.

Bugzilla:2319199

Multipart uploads can now add object tags

Previously, the Ceph Object Gateway S3 multipart upload object tags were not recognized when sent by the client. As a result, clients were not able to successfully apply object tags during initial object creation during a multipart upload.

With this fix, object tags are collected and stored. As a result, object tags can now be added and are recognized during multipart uploads.

Bugzilla:2323604

STS implementation now supports encryption keys larger than 1024 bytes

Previously, Ceph Object Gateway STS implementation did not support encryption keys larger than 1024 bytes.

With this fix, encryption keys larger than 1024 bytes are supported, as expected.

Bugzilla:2237854

Bucket logging configurations no longer allow setting the same source and target buckets

Previously, there was no check in place when setting a bucket logging configuration, verifying that the source and target buckets were different.

With this fix, bucket logging configuration settings are rejected when the source and destination are the same, as expected.

Bugzilla:2321568

Ceph Object Gateway no longer crashes due to mishandled kafka error messages

Previously, error conditions with the kafka message broker were not handled correctly. As a result, in some cases, Ceph Object Gateway would crash.

With this fix, kafka error messages are handled correctly and do not cause Ceph Object Gateway crashes.

Bugzilla:2327774, Bugzilla:2343980

ACL bucket operations now work as expected

Previously, a local variable 'uri' shadowed a member variable with the same name. As a result, a subset of bucket ACL operations would fail.

With this fix, the shadowing local duplicated variable has been removed and ACL bucket operations now work as expected.

Bugzilla:2338149

Target buckets now need a bucket policy for users to write logs to them

Previously, no permission checks were run on the target bucket for bucket logging. As a result, any user could write logs to a target bucket, without needing specific permissions.

With this fix, a bucket policy must be added on a target bucket to allow specific users to write logs to it.

Bugzilla:2345305

S3 requests no longer rejected if local is listed before external for the authentication order

Previously, S3 requests were rejected when the request was not authenticated successfully by the local authentication engine. As a result, S3 requests using OpenStack Keystone EC2 credentials failed to authenticate with Ceph Object Gateway when the authentication order had local before external.

With this fix, S3 requests signed using OpenStack Keystone EC2 credentials successfully authenticate with Ceph Object Gateway, even when the authentication order has local listed before external.

Bugzilla:2316975

Ceph Object Gateway internal HTTP headers are no longer sent while transitioning the object to Cloud

Previously, some Ceph Object Gateway internal HTTP header values were sent to the Cloud endpoint, when transitioning the object to Cloud. As a result, some S3 cloud services did not recognize the headers and failed the transition or failed to restore the operation of the objects.

With this fix, internal HTTP headers are not sent to Cloud and transitioning to Cloud works as expected.

Bugzilla:2344731

The radosgw-admin bucket logging flush command now works as expected

Previously, using the radosgw-admin bucket logging flush command would return the next log object name. As a result, the user did not know the name of the log object that was flushed without listing the log bucket.

With this fix, the correct name of the object that was flushed is now returned as expected.

Bugzilla:2344993

Upgrading clusters now fetches notification_v2 topics correctly

Previously, upgrading clusters upgraded bucket notifications to notification_v2. As a result, topics in notification_v2 were not retrieved as expected.

With this fix, notification_v2 topics are retrieved as expected after a cluster upgrade.

Bugzilla:2355272

olh get now completes as expected

Previously, a 2023 fix for a versioning-related bug caused an internal code path to reference an incorrect attribute name for the object logical head (OLH). As a result, an error was emitted when running the radosgw-admin olh get command.

With this fix, the internal attribute name has been corrected, ensuring proper functionality.

Bugzilla:2338402

Swift container listings now report object last modified time

Previously, the Ceph Object Gateway Swift container listing implementation was missing the logic to send the last_modified JSON field. As a result, Swift container listings did not report the last modified time of objects.

With this fix, the last_modified JSON field has been added to the Swift container listing response, ensuring that object modification times are correctly reported.
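
A listing entry that carries last_modified might look like the following sketch. The field names follow the OpenStack Swift container listing convention; the helper listing_entry, the sample values, and the exact timestamp format emitted by Ceph Object Gateway are assumptions.

```python
import json
from datetime import datetime, timezone

def listing_entry(name, size, etag, content_type, mtime):
    """Build one object entry for a JSON container listing."""
    return {
        "name": name,
        "bytes": size,
        "hash": etag,
        "content_type": content_type,
        # The field that was previously missing from the response.
        "last_modified": mtime.strftime("%Y-%m-%dT%H:%M:%S.%f"),
    }

entry = listing_entry(
    "photo.jpg",
    1024,
    "d41d8cd98f00b204e9800998ecf8427e",
    "image/jpeg",
    datetime(2025, 1, 15, 16, 41, 49, 390270, tzinfo=timezone.utc),
)
print(json.dumps([entry], indent=2))
```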

Bugzilla:2343732

Ceph Object Gateway now recognizes additional checksums from their checksum-type specific headers and trailers

Previously, the aws-sdk-go-v2 checksum behavior differed from other SDKs, as it did not send either x-amz-checksum-algorithm or x-amz-sdk-checksum and never included x-amz-decoded-content-length, despite AWS documentation requiring it. As a result, additional checksums were not recognized when sent, and some AWS-chunked requests failed an assertion check for decoded content length with an InvalidArgument error.

With this fix, Ceph Object Gateway can now recognize additional checksums from their checksum-type specific header or trailer. Ceph Object Gateway no longer tests and asserts for decoded content length, as it is unnecessary due to chunk signature calculations.

Bugzilla:2367319

Shadow users for the AssumeRoleWithWebIdentity call are now created within the oidc namespace

Previously, an incorrect method was used to load the bucket stats, which caused the shadow users for AssumeRoleWithWebIdentity call to not be created within the oidc namespace. As a result, users were not able to differentiate between the shadow users and local rgw users.

With this fix, bucket stats are now loaded correctly and the user is correctly created within the oidc namespace. Users can now correctly identify a shadow user that corresponds to a federated user making the AssumeRoleWithWebIdentity call.

Bugzilla:2346829

4.5. Multi-site Ceph Object Gateway

Replicating metadata from earlier versions of Red Hat Ceph Storage no longer renders user access keys as “inactive”

Previously, when a secondary zone running Red Hat Ceph Storage 8.0 replicated user metadata from a pre-8.0 metadata master zone, the access keys of those users were erroneously marked as "inactive". Inactive keys cannot be used to authenticate requests, so those users were denied access to the secondary zone.

With this fix, secondary zone storage replication works as expected and access keys can still authenticate requests.

Bugzilla:2327402

Invalid URL-encoded text from the client no longer creates errors

Previously, the system improperly handled scenarios where URL decoding resulted in an empty key.name. The empty key.name was caused by invalid URL-encoded text from the client. As a result, an assertion error would occur during the copy operation, sometimes leading to a crash later.

With this fix, invalid empty key.name values are now ignored, and copy operations no longer trigger assertions or cause crashes.

Bugzilla:2356922

Network error code is now mapped correctly

Previously, when one or more of the Ceph Object Gateways in a target zone were down, the HTTP client in the Ceph Object Gateway in the source zone did not map the network connection error code correctly internally. As a result, the client kept attempting to connect to a downed Ceph Object Gateway instead of falling back to other active ones.

With this fix, the network error code is now mapped correctly. The HTTP client in the source zone detects the network error and fails over to communicate with the functioning Ceph Object Gateways in the target zone.
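
The failover behavior can be sketched in Python. This models the idea only: the exception EndpointDown, the functions map_error and send_with_failover, and the endpoint names rgw-a and rgw-b are hypothetical and do not come from the Ceph Object Gateway code.

```python
class EndpointDown(Exception):
    """Internal error category meaning 'try another endpoint'."""

def map_error(exc):
    # Connection-level failures become EndpointDown; anything else is
    # passed through unchanged so real request errors are not hidden.
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return EndpointDown(str(exc))
    return exc

def send_with_failover(endpoints, send):
    last_error = None
    for endpoint in endpoints:
        try:
            return send(endpoint)
        except Exception as exc:
            mapped = map_error(exc)
            if not isinstance(mapped, EndpointDown):
                raise
            last_error = mapped  # remember it and move on to the next gateway
    raise last_error or EndpointDown("no endpoints configured")

def fake_send(endpoint):
    # Hypothetical transport in which the first gateway is down.
    if endpoint == "rgw-a":
        raise ConnectionRefusedError("connection refused")
    return f"ok via {endpoint}"

result = send_with_failover(["rgw-a", "rgw-b"], fake_send)
print(result)  # ok via rgw-b
```

If the connection error were mapped to the wrong category, the loop would not recognize the endpoint as down and would never advance to the next gateway.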

Bugzilla:2275856

sync error trim now runs as expected with optional --shard-id input

Previously, the sync error trim command did not mark the --shard-id option as optional.

With this fix, the --shard-id option is recognized as optional and is marked as optional in the radosgw-admin help.

Bugzilla:2282369

Objects restored from Cloud/Tape now synchronize correctly to remote locations

Previously, objects restored from Cloud/Tape retained their original mtime, making it insufficient for multisite sync checks. As a result, these restored objects were not synchronized to remote locations.

With this fix, a new extended attribute, internal_mtime, is introduced specifically for multisite usage, ensuring that restored objects are synchronized to remote locations when needed.

Bugzilla:2309701

Sync rate now works as expected

Previously, in some cases, an incorrect internal error return caused sync operations to run slower than expected.

With this fix, the error return was fixed and the expected sync rate is sustained.

Bugzilla:2317153

4.6. RADOS

rgw daemons no longer crash due to stack overflows

Previously, large clusters of over 10,000 daemons with a stack-based allocation of variable-length arrays caused a stack overflow. As a result, the Ceph Object Gateway daemons crashed.

With this fix, stack-based allocation of variable-length arrays is no longer used and stack overflows are avoided, with Ceph Object Gateway daemons working as expected.

Bugzilla:2346896

Pool removals now remove pg_upmap_primary mappings from the OSDMap

Previously, deleting pools did not remove pg_upmap_primary mappings from the OSDMap. As a result, pg_upmap_primary mappings were seen but could not be removed, since the pool and pgid no longer existed.

With this fix, pg_upmap_primary mappings are now automatically removed from the OSDMap each time that a pool is deleted.
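
The cleanup can be sketched with a toy map in Python. The dictionary layout and the function remove_pool are hypothetical simplifications; the real OSDMap is a C++ structure with many more fields.

```python
def remove_pool(osdmap, pool_id):
    """Delete a pool and every pg_upmap_primary entry that belongs to it."""
    osdmap["pools"].pop(pool_id, None)
    osdmap["pg_upmap_primary"] = {
        pgid: osd
        for pgid, osd in osdmap["pg_upmap_primary"].items()
        if pgid[0] != pool_id  # pgid is modeled as (pool_id, pg_seed)
    }

osdmap = {
    "pools": {1: "rbd", 2: "cephfs_data"},
    "pg_upmap_primary": {(1, 0): 3, (1, 5): 7, (2, 1): 4},
}
remove_pool(osdmap, 1)
print(osdmap["pg_upmap_primary"])  # {(2, 1): 4}
```

Without the filtering step, the entries for pool 1 would remain in the map with no pool or pgid left to address them by.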

Bugzilla:2293847

Destroyed OSDs are no longer listed by the ceph node ls command

Previously, destroyed OSDs were listed without any indication of their status, leading to user confusion and causing cephadm to incorrectly report them as stray.

With this fix, the command filters out destroyed OSDs by checking their status before displaying them, ensuring accurate and reliable output.

Bugzilla:2269003

AVX512 support for the ISA-L erasure code plugin is now enabled

Previously, due to an issue in the build scripts, the plugin did not take advantage of the AVX512 instruction set—even on CPUs that supported it—resulting in reduced performance.

With this fix, the build scripts correctly enable AVX512 support, allowing the plugin to utilize the available CPU capabilities for improved performance.

Bugzilla:2310433

OSDs no longer crash while replaying BlueFS with ceph_assert(delta.offset == fnode.allocated)

Previously, a fix was implemented to prevent RocksDB’s SST files from expanding, but it contained a bug. As a result, the BlueFS log became corrupted, causing an error that prevented OSD bootup, even though it could be ignored.

With this fix, a flag skips the error, and BlueFS is updated to prevent the error from occurring. Now, the original fix for preventing RocksDB disk space overbloat functions as intended.

Bugzilla:2338097

BlueFS log no longer gets corrupted due to race conditions

Previously, a rare condition between truncate and unlink operations in BlueFS caused the log to reference deleted files. This corrupted the BlueFS log, triggering an assertion failure during OSD startup and resulting in a crash loop.

With this fix, the operations are now correctly sequenced using proper locking, preventing log corruption and eliminating the assertion failure.

Bugzilla:2354192
