9.0 Release Notes
Release notes for Red Hat Ceph Storage 9.0
Abstract
Making open source more inclusive
Red Hat is committed to replacing problematic language in our code, documentation, and web properties. We are beginning with these four terms: master, slave, blacklist, and whitelist. Because of the enormity of this endeavor, these changes will be implemented gradually over several upcoming releases. For more details, see our CTO Chris Wright’s message.
Providing feedback on Red Hat Ceph Storage documentation
We appreciate your input on our documentation. Please let us know how we could make it better. To do so, create a Bugzilla ticket:
- Go to the Bugzilla website.
- In the Component drop-down, select Documentation.
- In the Sub-Component drop-down, select the appropriate sub-component.
- Select the appropriate version of the document.
- Fill in the Summary and Description fields with your suggestion for improvement. Include a link to the relevant part(s) of the documentation.
- Optional: Add an attachment, if any.
- Click Submit Bug.
Chapter 1. Introduction
Red Hat Ceph Storage is a massively scalable, open, software-defined storage platform that combines the most stable version of the Ceph storage system with a Ceph management platform, deployment utilities, and support services.
The Red Hat Ceph Storage documentation is available at https://docs.redhat.com/en/documentation/red_hat_ceph_storage/9.
Chapter 2. Acknowledgments
Red Hat Ceph Storage version 9.0 contains many contributions from the Red Hat Ceph Storage team. In addition, the Ceph project is seeing amazing growth in the quality and quantity of contributions from individuals and organizations in the Ceph community. We would like to thank all members of the Red Hat Ceph Storage team, all of the individual contributors in the Ceph community, and the contributions from organizations including, but not limited to:
- Intel®
- Fujitsu®
- UnitedStack
- Yahoo™
- Ubuntu Kylin
- Mellanox®
- CERN™
- Deutsche Telekom
- Mirantis®
- SanDisk™
- SUSE®
Chapter 3. New features and enhancements
This section lists all the major updates and enhancements introduced in this release of Red Hat Ceph Storage.
3.1. cephadm utility
Learn about the key enhancements and new features for the cephadm utility included in this release to improve functionality and user experience.
New cephadm certificate lifecycle management for improved Ceph cluster security
Cephadm certificate lifecycle management was previously available as a limited release. This enhancement provides full availability for new and existing customers in production environments.
With this enhancement, cephadm now has certificate lifecycle management in the certmgr subsystem. This feature provides a unified mechanism to provision, rotate, and apply TLS certificates for Ceph services, supporting both user-provided and automatically generated cephadm-signed certificates. As part of this feature, certmgr periodically checks the status of all certificates managed by cephadm and issues health warnings for any that are nearing expiration, misconfigured, or invalid. This improves Ceph cluster security and simplifies certificate management through automation and proactive alerts.
Multiple container registries can now be defined in registry credentials
Previously, only a single container registry credential could be configured. However, users may have different registries for different service containers.
With this enhancement, registry credentials can now define multiple container registries. To store multiple registry credentials, use the following command:
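For example, the following is a minimal sketch of storing credentials for two registries. It assumes that the JSON input to the existing ceph cephadm registry-login -i command accepts a list of registries; the file name and field layout shown here are illustrative only:

# Hypothetical credentials file, registries.json, listing two registries.
{
  "registries": [
    {"url": "registry.redhat.io", "username": "myuser1", "password": "mypassword1"},
    {"url": "quay.io", "username": "myuser2", "password": "mypassword2"}
  ]
}
# Store the credentials with cephadm.
ceph cephadm registry-login -i registries.json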
Enhanced config parameter to set the maximum number of OSDs to upgrade in parallel
With this enhancement, the mgr/cephadm/max_parallel_osd_upgrades config parameter sets the maximum number of OSDs that can be upgraded in parallel. The default value is 16.
For example,
[ceph: root@ceph-node-0 ceph]# ceph config get mgr mgr/cephadm/max_parallel_osd_upgrades
16
[ceph: root@ceph-node-0 ceph]#
[ceph: root@ceph-node-0 ceph]# ceph config set mgr mgr/cephadm/max_parallel_osd_upgrades 32
3.2. Ceph Dashboard
New support for managing Ceph Object Gateway accounts
Previously, managing Ceph Object Gateway accounts was only possible through the command-line interface (CLI) using radosgw-admin commands.
With this enhancement, you can now view account details, create new accounts, manage quotas, and link users and buckets to an account directly from the Ceph Dashboard.
As a result, Ceph Object Gateway environments align more closely with AWS-style account and IAM semantics, improving usability, scalability, and security governance.
New migration from Promtail to Grafana Alloy for centralized logging
Previously, centralized logging relied on Promtail, which is now deprecated and no longer recommended for new deployments.
With this enhancement, Red Hat Ceph Storage uses Grafana Alloy for log scraping and forwarding. Grafana Alloy provides a unified, modern, and more efficient agent for log collection, processing, and forwarding.
Grafana Alloy simplifies configuration management across clusters and improves performance and reliability. As a result, centralized logging reduces maintenance overhead, improves observability performance, and aligns the monitoring stack with current Grafana best practices.
For more information, see Viewing centralized logs of the Ceph cluster on the dashboard.
3.3. Ceph File System (CephFS)
Learn about the key enhancements and new features for Ceph File System (CephFS) included in this release to improve functionality and user experience.
Case sensitivity and Unicode normalization can now be configured during subvolume group creation
Previously, it was possible to configure Unicode normalization and case sensitivity when creating a subvolume, but not when creating a subvolume group. To apply these settings, users had to run additional commands after the group was created.
With this enhancement, new command arguments allow users to configure Unicode normalization and case sensitivity directly during subvolume group creation, eliminating the need for extra steps.
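For example, the following sketch creates a subvolume group with case-insensitive, normalized entry names. The flag names (--casesensitive and --normalization) are assumptions modeled on the equivalent subvolume creation options and may differ in the shipped release:

# Create a subvolume group with case-insensitive, NFD-normalized names (flag names assumed).
ceph fs subvolumegroup create cephfs mygroup --casesensitive=false --normalization=nfd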
Source information of clone subvolumes is now preserved
Previously, after cloning was completed, the source information (subvolume or snapshot) of the clone was removed from the .meta file. As a result, when users ran the subvolume info command for a clone subvolume, they could not view details about its source.
With this enhancement, source information for a clone subvolume is now preserved even after cloning is complete. This allows the subvolume info command to include details about the source subvolume in its output, making it easier for users to find and view the origin of a clone.
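For example, a hedged sketch of querying a clone for its origin; the volume, clone, and group names are illustrative:

# Query a clone subvolume; the output now retains details about its source subvolume or snapshot.
ceph fs subvolume info cephfs clone1 --group_name mygroup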
Now supports monitoring subvolume-level metrics
CephFS now provides performance metrics at the subvolume level, including IOPS, throughput, and latency. These metrics help administrators monitor IO allocations for applications and protocol gateways that use CephFS subvolumes. Metrics are available through Prometheus, the Ceph Manager stats module, and the Ceph Dashboard.
For more information, see Viewing subvolume metrics for CephFS metadata server clients.
3.4. Ceph Object Gateway
Learn about the key enhancements and new features for Ceph Object Gateway included in this release to improve functionality and user experience.
Bucket logging support for Ceph Object Gateway with bug fixes and enhancements
Bucket logging was previously available as a limited release. This enhancement provides full availability for new and existing customers in production environments.
Bucket logging provides a mechanism for logging all access to a bucket. The log data can be used to monitor bucket activity, detect unauthorized access, gain insights into bucket usage, and serve as a journal for bucket changes. The log records are stored in objects in a separate bucket and can be analyzed later.
Bucket logging includes support for source and destination buckets across different tenants, suffix/prefix-based key filtering, and standardized AWS operation names in log records.
For more information, see Bucket logging.
Bugzilla:2308169, Bugzilla:2341711
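For example, a hedged sketch of enabling bucket logging through the S3-compatible API with the AWS CLI against a Ceph Object Gateway endpoint; the endpoint, bucket names, and prefix are illustrative:

# Enable logging for "mybucket", writing log objects into "mylogbucket" (names illustrative).
aws --endpoint-url http://rgw.example.com:8080 s3api put-bucket-logging \
    --bucket mybucket \
    --bucket-logging-status '{"LoggingEnabled": {"TargetBucket": "mylogbucket", "TargetPrefix": "mybucket-logs/"}}'

# Review the current logging configuration.
aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-logging --bucket mybucket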
Restore objects transitioned to a remote cloud endpoint back into the Ceph Object Gateway using the cloud-restore feature
The cloud-restore feature was previously available as a limited release. This enhancement provides full availability for new and existing customers in production environments.
This feature allows users to restore objects transitioned to a remote cloud endpoint back into the Ceph Object Gateway, using either the S3 restore-object API or by rehydrating them with the read-through option.
For more information, see Using the radosgw-admin CLI for cloud restore operations.
New support for updating the restoration period for archived objects
With this enhancement, you can now update the expiry date of a restored object by reissuing the restore-object API request with a new restoration period. The updated period is calculated from the current time, allowing you to retain data longer or expire it sooner without re-downloading from the remote cloud endpoint.
For more information, see Restoring objects from S3 cloud-tier storage.
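For example, a hedged sketch of extending the restoration period by reissuing the S3 restore-object request; the endpoint, bucket, and key are illustrative:

# Initial restore request for 3 days (names illustrative).
aws --endpoint-url http://rgw.example.com:8080 s3api restore-object \
    --bucket mybucket --key mykey --restore-request '{"Days": 3}'

# Reissue the request to extend the restoration period to 7 days, counted from the current time.
aws --endpoint-url http://rgw.example.com:8080 s3api restore-object \
    --bucket mybucket --key mykey --restore-request '{"Days": 7}'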
New CLI commands introduced to help monitor and debug restore operations
Previously, administrators had limited visibility into object restore operations, which made monitoring and debugging difficult.
With this enhancement, the system introduces two new CLI commands:
radosgw-admin restore list - Lists the restore status of objects in a bucket.
radosgw-admin restore status - Displays restore attributes for a specific object.
The bucket statistics also include restore-related information for easier monitoring.
For more information, see Using the radosgw-admin CLI for cloud restore operations.
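For example, a hedged sketch of the new commands; the --bucket and --object options are assumptions based on common radosgw-admin conventions:

# List the restore status of objects in a bucket (option names assumed).
radosgw-admin restore list --bucket=mybucket

# Display restore attributes for a specific object (option names assumed).
radosgw-admin restore status --bucket=mybucket --object=mykey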
Improved CLI output for topic management
The radosgw-admin topic list command has been enhanced for better usability. The output format is now consistent across v1 and v2 topics and excludes the topics section, reducing complexity for automation and scripting.
Enhanced conditional operations
This enhancement introduces support for conditional PUT and DELETE operations, including bulk and multi-delete requests. These conditional operations improve data consistency for some workloads.
The conditional InitMultipartUpload is not implemented in this release.
Bugzilla:2375000, Bugzilla:2350732
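For example, a hedged sketch of a conditional PUT through the S3-compatible API with the AWS CLI; the --if-none-match flag requires a recent AWS CLI version, and the endpoint and names are illustrative:

# Upload only if the key does not already exist; an existing object results in a 412 Precondition Failed error.
aws --endpoint-url http://rgw.example.com:8080 s3api put-object \
    --bucket mybucket --key mykey --body ./mykey.dat --if-none-match '*'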
Flushed object name now emitted
Previously, users had no direct way to identify the last object that was flushed. This made it harder to determine the correct starting point when traversing log objects in the log bucket.
With this enhancement, the system now replies with the name of the last flushed object. As a result, users can easily identify the most recent object and streamline log traversal operations.
Reduced client impact during bucket resharding
With this enhancement, bucket resharding now does most of its processing before it starts to block write operations. This should significantly reduce the client-visible impact of resharding on large buckets.
Committed objects now added to log buckets even without pending records
Previously, when committing an object, it was not added to the log bucket if there were no log records pending. This made it harder for consumers to reliably determine the last committed object when listing log bucket contents.
With this enhancement, committed objects are now added to the log bucket even if no log records are pending. As a result, consumers can easily identify the last committed object and traverse log objects more efficiently.
Clear error propagation for logging failures in journal mode
Previously, when logging failed in journal mode, the customer received generic or misleading error messages. For example, a customer performing a regular S3 operation could see a 403 error if permissions were missing on the log bucket, even though permissions were correct on the target bucket.
With this enhancement, the system now propagates a clear error message indicating that the failure occurred during logging, not the primary operation. As a result, customers can quickly identify and resolve logging-related issues without confusion.
Automatic permission setting for D3N cache directory
Previously, configuring the RGW D3N cache directory required manual steps to set permissions, such as running chmod a+rwx rgw_d3n_l1_datacache_persistent_path. This added complexity and increased setup time.
With this enhancement, the correct permissions are automatically applied when the D3N cache directory is created. As a result, customers experience fewer manual configuration steps, improving setup efficiency and overall usability.
New support for AWS S3 GetAccountSummary
Previously, AWS S3 GetAccountSummary was not supported, which limited certain workloads that require account-level information, such as Terraform-based automation.
With this enhancement, AWS S3 GetAccountSummary is now supported.
New support for AWS STS GetCallerIdentity
Previously, AWS STS GetCallerIdentity was not supported, limiting the ability to validate user identities and enforce access policies before creating or modifying policies. This gap impacted workflows that rely on identity verification, such as Terraform-based automation.
With this enhancement, AWS STS GetCallerIdentity is now supported. As a result, customers can securely validate identities and access policies, enabling more robust policy management and seamless integration with Terraform workflows.
Aligned operation names with AWS for consistent log integration
Previously, operation names in Ceph logs were inconsistent with the operation types used by AWS. This required different approaches for log consumption depending on whether Ceph or AWS logs were being processed.
With this enhancement, operation names in Ceph logs now match the names used in AWS logs. This alignment simplifies integration and makes log consumption more consistent across systems.
3.5. Multi-site Ceph Object Gateway
Learn about the key enhancements and new features for multi-site Ceph Object Gateway included in this release to improve functionality and user experience.
Improved reliability for multi-site replication data log delivery
Previously, in rare cases, replication data logs could lose updates, which created the appearance of stalled replication even though data consistency was not affected.
With this enhancement, the multi-site replication process is hardened to prevent such occurrences. As a result, replication performance is smoother, and log reduction happens more promptly, improving overall system responsiveness.
Cleanup added for index segments of replicated buckets
Previously, dynamic resharding with multi-site replication had a long-standing limitation: old index segments were not cleaned up due to simultaneous access to old and new index shards during replication. This resulted in persistent space leakage.
With this enhancement, cleanup for index segments of replicated buckets has been added. As a result, the space leakage issue is resolved, improving storage efficiency and overall system health.
3.6. RADOS
Learn about the key enhancements and new features for RADOS included in this release to improve functionality and user experience.
Enhanced support for moving stretch mode to normal mode
Previously, Ceph clusters operating in stretch mode could not be reverted to normal mode without manual intervention.
With this enhancement, Ceph introduces a command that allows users to gracefully exit stretch mode.
ceph mon disable_stretch_mode CRUSH_RULE --yes-i-really-mean-it
Users may optionally specify a CRUSH rule to which all pools should be migrated. If no rule is provided, Ceph automatically selects a default replicated CRUSH rule.
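For example, a hedged usage sketch that exits stretch mode and migrates all pools to the default replicated rule; the rule name is illustrative:

# Exit stretch mode and move all pools to the replicated_rule CRUSH rule (rule name illustrative).
ceph mon disable_stretch_mode replicated_rule --yes-i-really-mean-it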
Enhanced detection of network partitions under connectivity election strategy
Previously, monitors operating under the connectivity election strategy did not provide user-facing alerts when network partitions occurred.
With this enhancement, monitors can detect network partitions between themselves. The elected leader monitor evaluates connectivity scores shared by its peers to identify partitioned connectivity groups.
When a netsplit is detected, monitors emit health warnings.
- Example of a complete location-level partition warning:
Netsplit detected between dc1 and dc2
- Example of an individual monitor disconnection warning:
Netsplit detected between mon.a and mon.d
New ISA plugin support for erasure coded pools
Previously, erasure coded pools only supported the Jerasure plugin.
With this enhancement, the ISA plugin is now the default, and both plugins are supported.
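For example, you can confirm which plugin a profile uses with the erasure-code-profile commands; the exact output depends on your cluster configuration:

# Display the default erasure code profile, including its plugin setting.
ceph osd erasure-code-profile get default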
General enhancements for RADOS and RADOS BlueStore
This version provides several enhancements for RADOS and RADOS BlueStore. These enhancements include the following:
- BlueStore discard optimization: Actively triggers block device discards to prevent excessive queue growth on SSDs and improves performance on lower-grade drives.
- Faster device scanning: ceph-volume scans devices up to 100 times faster, streamlining day-one cluster setup operations.
- Improved write latency: Uses a single consolidated fdatasync call in the WAL to reduce latency and improve overall write performance in BlueStore.
- RADOS OMAP iteration: Optimizes object map (OMAP) iteration to reduce latency during large-scale operations and improve responsiveness in complex workloads.
Erasure coding ratio support enhancements
This release introduces new support and qualification for 5+2 and 6+2 erasure coding ratios. These configurations deliver an optimal balance of performance, scalability, and cost efficiency, making them ideal for clusters that require high storage utilization and robust data protection.
For more information, see Erasure code profiles.
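For example, a minimal sketch that creates a 6+2 profile and an erasure-coded pool from it; the profile name, pool name, placement group counts, and failure domain are illustrative:

# Create a 6+2 erasure code profile (values illustrative).
ceph osd erasure-code-profile set ec-6-2 k=6 m=2 crush-failure-domain=host

# Create an erasure-coded pool that uses the profile.
ceph osd pool create ecpool 32 32 erasure ec-6-2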
3.7. Ceph Block Device mirroring (rbd-mirror)
Learn about the key enhancements and new features for Ceph Block Device mirroring included in this release to improve functionality and user experience.
Improved tracking of mirror group snapshot states
Previously, rbd-mirror tracked the progress of a mirror group snapshot without distinguishing between a snapshot that was created and one that was fully synced.
With this enhancement, a new internal field (complete) is integrated into the GroupSnapshotNamespaceMirror structure. This field determines whether a snapshot is completely synced. The existing state field of creating and created continues to indicate whether the snapshot has been created. Together, these fields provide a more precise distinction between snapshots that are created (metadata available) and those that are fully synced.
As a result, mirror group snapshot status tracking is more accurate and consistent, improving compatibility and robustness in the rbd-mirror process. The user-facing output of the rbd group snap ls command is also updated to reflect clearer state names: creating and created instead of incomplete and complete. A mirror group snapshot is completely synced on the secondary cluster when the NAMESPACE column shows as copied, and still syncing when it shows not copied.
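For example, a hedged sketch of checking mirror group snapshot state on the secondary cluster; the pool and group names are illustrative, and the exact column layout may differ:

# List group snapshots; a fully synced mirror group snapshot shows "copied" in the NAMESPACE column.
rbd group snap ls mypool/mygroup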
Chapter 4. Deprecated functionality
This section provides an overview of functionality that has been deprecated in all minor releases up to this release of Red Hat Ceph Storage.
Deprecated functionality continues to be supported until the end of life of Red Hat Ceph Storage 9. Deprecated functionality will likely not be supported in future major releases of this product and is not recommended for new deployments. For the most recent list of deprecated functionality within a particular major release, refer to the latest version of release documentation.
Deprecated method of configuring OIDC federation and IAM roles at the tenant level
All OIDC resources are now managed as resources within a Ceph Object Gateway account. These OIDC resources include providers, roles, and policies. As a result, all OIDC operations that target a tenant, including the global or empty tenant, are considered deprecated. The deprecated operations include creating providers, creating roles, and assuming roles.
With the newer per-account model, federated users are directly associated with the account and Ceph Object Gateway no longer creates shadow users (for example, TENANT$USER_NAMESPACE) upon role assumption. The account itself tracks all resources and identities.
Tenant-based OIDC federation users should migrate their configurations to the new Ceph Object Gateway per-account model before feature removal.
For more information, see Secure Token Service.
Chapter 5. Bug fixes
This section describes bugs with significant user impact, which were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
5.1. Ceph Dashboard
Learn about bug fixes for Ceph Dashboard included in this release.
Ceph Object Gateway page now loads after a multi-site configuration
Previously, the Ceph Object Gateway page did not load because the dashboard could not find the correct access key and secret key for the new realm during multi-site configuration.
With this fix, the Ceph Object Gateway page can find the correct access and secret key for the new realm and loads as expected.
5.2. Ceph File System (CephFS)
Learn about bug fixes for Ceph File System included in this release.
enctag value length is now restricted to 255 characters
Previously, enctags could be stored with values longer than 255 characters. However, operations such as enctag get only supported displaying values up to 255 characters. If an enctag with a longer value was stored, the system returned a general Unexpected error output.
With this fix, the system now enforces a maximum enctag value length of 255 characters. Only valid enctags are accepted and stored, allowing operations such as enctag get to display the enctag successfully.
Unsupported encryption algorithms now return an error on CephFS
Previously, the CephFS userspace did not validate encryption algorithms when setting up fscrypt. Only AES-256-XTS and AES-256-CTS were supported, but if a different algorithm was requested, CephFS silently used the default supported algorithm without notifying the user.
With this fix, a validation check ensures that only supported encryption algorithms are allowed when setting up fscrypt on CephFS. If an unsupported algorithm is supplied, the system returns an EINVAL error code.
5.3. Ceph Object Gateway
Learn about bug fixes for Ceph Object Gateway included in this release.
Cloud-S3 restore requests no longer lost after Ceph Object Gateway restart
Previously, if the Ceph Object Gateway (RGW) service restarted while a restore request for the cloud-s3 cloud tier was in progress, the request state was lost because it was not stored persistently. As a result, the restore operation was not retried.
With this fix, the system now stores the state of restore requests persistently. If the Ceph Object Gateway service restarts during processing, the request is resumed automatically, ensuring continuity without manual intervention.
For more information, see Restoring objects from S3 cloud-tier storage.
Local reads now work as expected
Previously, local reads were sometimes unavailable for recently created or modified RADOS objects due to protocol limitations.
With this fix, eligible reads can now be performed locally, improving consistency and reliability in all environments.
max_objs_per_shard no longer goes below a safe minimum
Previously, in versioned buckets, the max_objs_per_shard value was reduced by a factor of three to account for the additional index entries created by object versioning. In some cases, such as debugging, this value could also be set artificially low to trigger early resharding. In combination, these adjustments could result in a max_objs_per_shard value of 0, leading to a division-by-zero crash.
With this fix, the value cannot be reduced below a safe minimum.
RGW STS now supports encryption keys larger than 1024 bytes
Previously, the RGW STS implementation did not support encryption keys larger than 1024 bytes. Users had to manually adjust Keycloak settings by lowering the priority of the rsa-enc-generated provider and reducing the keySize to 1024.
With this fix, RGW STS now supports encryption keys larger than 1024 bytes without requiring manual configuration changes in Keycloak. This improves security and simplifies setup.
Checksum type and checksum algorithm now display with uncompleted multipart uploads
Previously, when listing parts for uncompleted multipart uploads, the checksum type and checksum algorithm were missing because the logic to extract these fields was not implemented.
With this fix, the logic to extract the checksum type and checksum algorithm has been added, so these fields now appear when listing parts for uncompleted multipart uploads.
radosgw-admin no longer crashes on non-positive values
Previously, when running the radosgw-admin bucket reshard command, using a non-positive --num-shards value, such as zero or a negative number, would cause radosgw-admin to crash.
With this fix, the --num-shards value is checked and an error message is emitted if a non-positive value is provided. As a result, radosgw-admin reshard commands run as expected and no longer crash.
Empty string in the HTTP_X_AMZ_COPY_SOURCE header no longer causes crashing
Previously, the HTTP_X_AMZ_COPY_SOURCE header could contain an empty string rather than NULL. When the empty string was passed to RGWCopyObj::parse_copy_location(), the empty name caused a crash.
With this fix, a check ensures that the header contains a valid string.
Multipart upload completion logs now include object size
Previously, the log record for multipart upload completion did not include the object size.
With this fix, the object size is now included in the completion log record.
Operation names in log records now match AWS naming conventions
Previously, the name of the operation in Ceph Object Gateway log records did not match the name used in AWS. As a result, consumers that could parse AWS-generated records failed when processing records generated by Ceph Object Gateway.
With this fix, the operation name in log records is now consistent with AWS naming conventions. The same consumer can be used for records from both sources.
Bucket logging configuration changes no longer cause data loss
Previously, a race condition occurred when the bucket logging configuration was changed while the system was running. This caused log objects to be garbage collected, and log records were lost after the garbage collector ran.
With this fix, the race condition has been resolved. Users can now safely change bucket logging configurations without risking data loss.
Copied objects from versioned to non-versioned buckets are now accessible
Previously, when copying an object from a versioned bucket to a non-versioned bucket, some versioning attributes were mistakenly copied to the destination object. This could make the copied object inaccessible.
With this fix, versioning attributes are removed when copying to a non-versioned bucket and the copied object is now accessible.
Data transition to AWS non-default regions is now supported
Previously, the cloud tier module did not handle the location_constraint parameter required by AWS when creating a bucket in a non-default region. As a result, data transition to an AWS cloud endpoint failed if the target bucket was in a non-default region.
With this fix, a new parameter, location_constraint, has been added to the tier_config configuration. This parameter must be set or updated along with region when using AWS non-default regions. Data can now be transitioned to AWS non-default regions successfully.
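For example, a hedged sketch of setting the new parameter on a cloud-tier storage class; the zone group, placement target, storage class, and region values are illustrative:

# Add location_constraint, together with region, to the cloud tier configuration (values illustrative).
radosgw-admin zonegroup placement modify --rgw-zonegroup=default \
    --placement-id=default-placement --storage-class=CLOUDTIER \
    --tier-config=region=eu-west-1,location_constraint=eu-west-1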
Tenant user policy and role-based permissions now work as expected after upgrade
Previously, some policy or role-based permissions involving legacy tenant users behaved differently after upgrading to releases that support IAM accounts. As a result, expected access grants would fail.
With this fix, a configuration option has been introduced to allow backward compatibility with previous version behavior.
Predefined ACLs are now correctly matched
Previously, reversed logic in the comparison functor for predefined ACL matching caused all predefined ACLs to be rejected.
With this fix, calls to compare() have been replaced with operator== and predefined ACLs now match correctly.
Permission checks for multipart upload initialization are now correct
Previously, the permission check for InitMultipart incorrectly used the bucket Amazon Resource Name (ARN) instead of the object ARN. This could cause PutObject requests to fail unexpectedly.
With this fix, permission checks now use the object ARN as intended. Multipart upload initialization and subsequent object operations work correctly.
Improved bulk delete performance in versioned buckets
Previously, the Ceph Object Gateway object deletion logic was inefficient, particularly in how it invoked update_olh. This caused high latency and could eventually lead to system lockups when processing bulk deletes of up to 1,000 object versions per request in versioned buckets.
With this fix, update_olh is now limited to a single invocation per bulk-delete request. This change significantly improves system behavior under heavy bulk-delete workloads.
ACL checks after AssumeRole are now correctly enforced
Previously, incorrect logic failed to verify ACLs after an AssumeRole operation. As a result, checks for explicit ACL grants failed incorrectly.
With this fix, RoleApplier::get_perms_from_aclspec() now calls rgw_perms_from_aclspec_default_strategy() to check for matching ACL grants. Additionally, missing RoleApplier support has been added to grant access based on ACLs.
Log records now correctly indicate ACL-based authorization
Previously, the aclRequired field in the log record displayed a hyphen (-), even when the request was authorized by an ACL. This was misleading because it suggested that the operation was authorized by a bucket policy.
With this fix, the field is set to Yes whenever a request is authorized by an ACL.
Log records now correctly indicate authentication type for unauthenticated requests
Previously, the AuthenticationType field in the log record was incorrectly set to QueryString for unauthenticated requests.
With this fix, the field is set to hyphen (-) for unauthenticated requests.
IAM policy now recognizes AbortMultipartUpload Deny requests
Previously, a session policy incorrectly used the AbortMultipartUpload action. As a result, a Deny statement for AbortMultipartUpload in the IAM policy was not respected when PutObject was allowed.
With this fix, the action in the IAM policy was corrected. The Deny for AbortMultipartUpload in the IAM policy is now properly enforced.
Delete bucket policy now returns correct status code
Previously, the delete bucket policy operation returned HTTP status code 204, instead of the correct 200 code.
With this fix, the HTTP status code was corrected, and delete bucket policy now returns 200, as expected.
api_name field now initializes during zone group rename
Previously, the api_name field in the zone group map was not initialized during a zone group rename because the variable assignment was missing.
With this fix, the api_name variable is now assigned. The api_name field is now correctly initialized during zone group rename.
Multipart object decryption now works for partNumber requests
Previously, if a multipart object was encrypted using SSE-C or SSE-S3, a get object request with partNumber did not decrypt the part.
With this fix, the logic was updated to attach the saved crypt prefix, if present, when the get action is a get-part operation. This enables Ceph Object Gateway to decrypt the part for get object requests with partNumber.
5.4. Multi-site Ceph Object Gateway
Learn about bug fixes for Ceph Object Gateway multi-site included in this release.
Rate limit configurations now synchronize across all Ceph Object Gateway multi-site zones
Previously, user and bucket quota rate limit configurations set on a secondary Ceph Object Gateway site were not synchronized back to the primary site in a multi-site setup. This caused configuration inconsistencies, resulting in different rate-limiting behaviors between zones.
With this fix, the metadata synchronization process for Ceph Object Gateway multi-site has been improved. Rate limit updates from secondary zones are now correctly propagated to the primary zone, validated, and applied. All user and bucket rate limit configurations now synchronize across all zones, ensuring consistent behavior throughout the Ceph Object Gateway multi-site cluster.
RGW multi-site now automates cleanup of deleted bucket instances and index objects
Previously, deleted bucket instances and related index objects were retained and not trimmed during bilog trimming to allow multi-site sync to finish processing deletions on other zones.
With this fix, a mechanism has been introduced to automatically clean up deleted bucket instances and index objects across all zones as part of the bucket index log trimming process. Additionally, the DeleteBucket API now returns 409 BucketNotEmpty errors until the bucket is empty on all zones when a sync policy is enabled.
Object lock configuration rule is now synchronized to the secondary zone
Previously, when object lock was enabled on a bucket, the rule failed to replicate to other zones, causing object lock inconsistencies between zones.
With this fix, multi-site replication now synchronizes the object lock configuration to all zones, ensuring consistent buckets across zones.
RGWBucketFullSyncCR no longer spins indefinitely when the source bucket has been deleted
Previously, the coroutine RGWListRemoteBucketCR reused the bucket_list_result member without clearing its prior state. Stale entries and the is_truncated flag from a previous iteration could persist, causing the loop to continue even after the bucket was deleted.
With this fix, the constructor of RGWListRemoteBucketCR clears the provided bucket_list_result at the start. This ensures that each listing begins with a clean state and accurately reflects the current remote bucket contents.
Chapter 6. Technology preview
This section provides an overview of Technology Preview features introduced or updated in this release of Red Hat Ceph Storage.
Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them for production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
cephadm utility
Get to know the Technology Preview features introduced or updated for the cephadm utility in this release.
New Ceph Management gateway and the OAuth2 Proxy service for unified access and high availability
With this enhancement, the Ceph Dashboard introduces the Ceph Management gateway (mgmt-gateway) and the OAuth2 Proxy service (oauth2-proxy). With both services in place, nginx automatically directs the user through the oauth2-proxy to the configured Identity Provider (IdP) when single sign-on (SSO) is configured.
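For example, a hedged sketch of deploying both services with cephadm; the placement and the oauth2-proxy specification file referenced here are illustrative, and the spec contents depend on your Identity Provider:

# Deploy the Ceph Management gateway (placement illustrative).
ceph orch apply mgmt-gateway --placement="host01"

# Deploy the OAuth2 Proxy service from a service specification file that carries
# the IdP details, such as the client ID, client secret, and issuer URL.
ceph orch apply -i oauth2-proxy-spec.yaml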
6.1. Ceph Dashboard
Get to know the Technology Preview features introduced or updated for the Ceph Dashboard in this release.
New OAuth2 SSO
OAuth2 SSO uses the oauth2-proxy service to work with the Ceph Management gateway (mgmt-gateway), providing unified access and improved user experience.
6.2. Ceph Object Gateway
Get to know the Technology Preview features introduced or updated for Ceph Object Gateway in this release.
New per-user and per-bucket usage counters in Prometheus
Ceph Object Gateway now exports per-user and per-bucket usage counters via performance counters automatically collected by the ceph-exporter and made available in Prometheus. This provides low-overhead, real-time visibility into the following:
- Per-bucket metrics: used bytes, utilized bytes, and number of objects
- Per-user metrics: used bytes and number of objects
- Cache performance metrics: cache hits, misses, updates, and evictions
These metrics are disabled by default. To enable them, configure the appropriate settings in your Ceph Object Gateway configuration.
For more information, see Viewing Ceph Object Gateway per-user and per-bucket usage statistics (Technology Preview).
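For example, a hedged sketch of turning the counters on; the option names (rgw_user_counters_cache and rgw_bucket_counters_cache) are assumptions based on the upstream counter cache settings:

# Enable per-user and per-bucket usage counters (option names assumed), then restart the gateways.
ceph config set client.rgw rgw_user_counters_cache true
ceph config set client.rgw rgw_bucket_counters_cache true
ceph orch restart rgw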
6.3. RADOS
Get to know the Technology Preview features introduced or updated for RADOS in this release.
Balanced primary placement groups can now be observed in a cluster
Previously, users could only balance primaries with the offline osdmaptool.
With this enhancement, autobalancing is available with the upmap balancer. Users can now choose between the upmap-read and read modes. The upmap-read mode offers simultaneous upmap and read optimization. The read mode can only be used to optimize reads.
For more information, see Using the Ceph Manager balancer module.
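For example:

# Enable the balancer and select combined upmap and read optimization.
ceph balancer on
ceph balancer mode upmap-read

# Alternatively, optimize reads only.
ceph balancer mode read

# Verify the active mode.
ceph balancer status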
Now supports tracking data availability score of a cluster
This release introduces a feature that tracks the data availability score of a Ceph cluster over time. The score represents how accessible your data is at any given moment, based on factors such as OSD health, placement group states, and redundancy policies.
By monitoring this metric, administrators gain a fact-based view of cluster reliability and can validate availability percentages (for example, 99.99%) against service-level objectives. This capability provides actionable insight into operational resilience and helps ensure confidence in Ceph as a storage platform for critical workloads.
Chapter 7. Known issues
This section documents known issues found in this release of Red Hat Ceph Storage.
7.1. cephadm utility
Get to know the known issues for cephadm utility found in this release.
NFS daemon fails to start for NFSv3
The NFS daemon fails to start when the rpcbind and rpc.statd services are missing or not running. These services are required for NFSv3, and by default, cephadm creates the NFS service for both NFSv3 and NFSv4 protocols. When these services are unavailable, the NFS daemon does not come online and emits the Cannot register NFS V3 on TCP error.
As a workaround, install the rpcbind and rpc.statd packages and start the services. After these services are running, the NFS daemon starts successfully.
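For example, on Red Hat Enterprise Linux hosts (package names may vary):

# Install the packages that provide rpcbind and rpc.statd.
dnf install -y rpcbind nfs-utils

# Start and enable both services, then redeploy or restart the NFS daemon.
systemctl enable --now rpcbind rpc-statd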
Grafana certificate does not migrate during upgrade
When you upgrade from Red Hat Ceph Storage 8.1 to 9.0, the existing user-signed Grafana certificate is not migrated. Instead, Grafana switches to a cephadm-signed certificate. As a result, duplicate certificate entries may appear, and certificate-related health warnings can persist. Manual reconfiguration is required if you want to use custom TLS certificates.
Data services remain unaffected.
To work without custom TLS certificates, you can continue using the cephadm-signed certificate.
As a workaround to use custom TLS certificates, complete the following steps:
1. Change the Grafana specification to use certificate_source: reference.
2. Use certmgr to upload a valid user-signed certificate and key for each host.
3. Run the ceph orch reconfig grafana command.
Management gateway does not open HTTPS port during deployment
When the management gateway (mgmt-gateway) is deployed with default settings and firewalld is active, the default HTTPS port (443) is not opened in firewalld. The gateway listens on port 443 and is reachable locally, but remote access to the dashboard fails until the firewall is manually adjusted.
As a workaround, use one of the following options:
- Explicitly configure a port for mgmt-gateway by using the --port option or setting spec.port. This ensures that cephadm opens the correct port in firewalld.
- Manually open HTTPS (443) in firewalld. For example:
firewall-cmd --add-service=https
firewall-cmd --add-port=443/tcp
Cephadm operations may fail when interactive shell aliases are present
In Red Hat Ceph Storage 7, cephadm uses the shell mv command on remote hosts. If the cephadm SSH user has interactive aliases such as mv='mv -i' (and similar for rm or cp), these aliases trigger prompts and block cephadm operations. As a result, commands like ceph orch upgrade, cephadm bootstrap, or adding hosts may hang or fail because mv waits for user confirmation instead of running non-interactively.
Currently there is no workaround. To avoid this issue, remove or disable interactive aliases for mv, rm, and cp for the cephadm SSH user. For example, comment them out in .bashrc or define them only for interactive shells, then rerun the cephadm operation.
Promtail image remains visible after migration to Alloy
During the transition from Promtail to Alloy, cephadm continues to register the Promtail container image to maintain backward compatibility and ensure a smooth migration path. As a result, Promtail still appears in the cephadm list-images output after upgrading, even though Alloy is the new default. The behavior is intentional to prevent breaking log collection on clusters that have not fully migrated.
No workaround is required. Ignore the Promtail image entry during the supported transition phase. If log collection has fully migrated to Alloy and is verified, you can optionally remove legacy Promtail daemons and images manually. This cleanup is not required for cluster operation.
Bugzilla:2418617
Ceph build
Get to know the known issues for Ceph build found in this release.
HAProxy deployment fails when QAT is enabled with ingress
Deploying HAProxy with the QAT feature enabled fails on Red Hat Ceph Storage 9.0 container images when using the ingress feature.
This occurs because HAProxy no longer supports ssl_engine in default builds. In addition, newer OpenSSL versions have removed the legacy engine used by QAT, making them incompatible. Attempts to use older OpenSSL versions or build a QAT provider for newer versions also lead to compatibility issues.
As a result, HAProxy cannot run with QAT enabled, and deployment fails.
There is no way to enable QAT with HAProxy. To continue using HAProxy without QAT, update the HAProxy configuration file (typically located at /var/lib/haproxy/haproxy.cfg) as follows:
haproxy_qat_support: false
ssl: true
QAT cannot be used for TLS offload or acceleration mode together with SSL set
Enabling QAT on HAProxy with SSL enabled injects legacy OpenSSL engine directives. The legacy OpenSSL engine path breaks the TLS handshake, emitting a tlsv1 alert internal error. With the TLS handshake broken, TLS termination fails.
As a workaround, disable QAT in HAProxy to keep the TLS handshake working.
Set the configuration file specifications as follows:
- haproxy_qat_support: false
- ssl: true
As a result, QAT is disabled and the HAProxy TLS works as expected.
Under heavy connection rates, higher CPU usage may be seen compared to QAT-offloaded handshakes.
7.2. Ceph Dashboard
Get to know the known issues for Ceph Dashboard found in this release.
Active alert displays even when Prometheus module is active
In some cases, the Ceph Dashboard shows an active alert for CephMgrPrometheusModuleInactive even though the Prometheus module is enabled. This can happen due to a cluster misconfiguration that causes the Ceph target to go down, falsely triggering the alert.
The alert remains visible unless silenced, even when the Prometheus module is functioning correctly.
As a workaround, suppress the alert from the Ceph Dashboard by selecting the CephMgrPrometheusModuleInactive alert and creating a silence: Observability → Alerts → CephMgrPrometheusModuleInactive → Create Silence.
For more information, see Managing alerts on the Ceph dashboard.
Dashboard cannot delete non-default zone groups or zones
Users cannot delete non-default zone groups or zones from the Ceph Dashboard. Attempts to delete them fail.
As a workaround, delete non-default zone groups and zones through the command-line interface by using the appropriate radosgw-admin commands.
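For example, a hedged sketch of removing a non-default zone and zone group from the command line; the names are illustrative, and a period commit is required in multi-site setups:

# Remove the zone from its zone group, then delete the zone and the zone group (names illustrative).
radosgw-admin zonegroup remove --rgw-zonegroup=myzonegroup --rgw-zone=myzone
radosgw-admin zone delete --rgw-zone=myzone
radosgw-admin zonegroup delete --rgw-zonegroup=myzonegroup

# Commit the change to the period.
radosgw-admin period update --commit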
7.3. Ceph File System (CephFS)
Get to know the known issues for Ceph File System (CephFS) found in this release.
Subvolume operations delayed due to GIL contention during asynchronous cloning
When the asynchronous cloner in the volumes module (mgr/volumes) uses the CephFS Python binding, it invokes the Ceph client library API while holding the Python Global Interpreter Lock (GIL). During asynchronous clone operations, the GIL remains locked for an extended period, which prevents other CephFS subvolume operations such as create and delete from acquiring the GIL in time. As a result, customers may experience delayed responses when performing subvolume operations.
As a workaround, temporarily pause cloning to allow other subvolume operations to proceed.
This workaround is not practical in most production environments and should be used only in exceptional cases.
7.4. Ceph Object Gateway
Get to know the known issues for Ceph Object Gateway found in this release.
Lifecycle processing stuck in PROCESSING state for a given bucket
If a Ceph Object Gateway server is unexpectedly restarted while lifecycle processing is in progress for a given bucket, that bucket does not resume lifecycle processing for at least two scheduling cycles and is stuck in the PROCESSING state. This is expected behavior; it is intended to prevent multiple Ceph Object Gateway instances or threads from processing the same bucket simultaneously, especially when debugging is in progress in production.
Currently there is no workaround.
Ceph Object Gateway services down after upgrade
After upgrading, Ceph Object Gateway services may fail to start. The service fails to start because the rgw service now enforces the rgw_realm configuration but no realm exists in the Ceph Object Gateway configuration. As a result, the following symptoms occur:
- The Ceph Object Gateway logs show the following error: rgw main: failed to load zone: (2) No such file or directory
- The ceph orch ps | grep rgw output displays Ceph Object Gateway in an error state.
- Ceph Object Gateways are missing from ceph versions.
As a workaround, remove the rgw_realm entry and restart all Ceph Object Gateway services.
1. Verify if the Ceph Object Gateways are configured with no realm indicated while the Ceph configuration database specifies a realm.
a. Check the Ceph Object Gateway realm list:
radosgw-admin realm list
The following is an example with an empty realm list:
[ceph: root@host01 /]# radosgw-admin realm list
{
    "default_info": "",
    "realms": []
}
b. Check the Ceph configuration database:
ceph config dump | egrep "^WHO|rgw_realm"
Example:
[ceph: root@host01 /]# ceph config dump | egrep "^WHO|rgw_realm"
WHO          MASK  LEVEL     OPTION     VALUE
xxxxx.yyyyy        advanced  rgw_realm  default
If step 1a matches step 1b, continue to step 2 to remove the rgw_realm from the Ceph configuration database. If the two steps do not match, contact Support.
2. Remove the rgw_realm from the Ceph configuration database:
ceph config rm xxxxx.yyyyy rgw_realm
3. Restart all Ceph Object Gateway services:
ceph orch restart rgw
7.5. Ceph Object Gateway multi-site
Get to know the known issues for Ceph Object Gateway multi-site found in this release.
Sync failure occurs after renaming a zone or zone group
Renaming a zone or zone group in the Primary zone in the master_zonegroup can cause sync failures. When sync failures occur, the following sync status error may be emitted and further sync operations are affected:
failed to retrieve sync info: (2200) Unknown error 2200
As a workaround, before renaming a zone or zone group in the master_zonegroup, remove the old zone or zone group name from the Ceph configuration file. For more information, see Renaming a zone group and Removing a zone from a zone group.
Secondary site continues to display old zone group name after rename
In some cases, when a zone group is renamed on the Primary site, the Secondary site may still display the old zone group name. This occurs because the old name is not removed from the .rgw.root pool after the rename operation.
As a result, both the old and new zone groups appear under the radosgw-admin zonegroup list command, and sync operations may be impacted.
As a workaround, complete the following steps.
1. Verify that the new zone group name exists:
radosgw-admin zonegroup list
2. List the .rgw.root pool and locate the old zone group name:
rados -p .rgw.root ls
The old name appears in the format: zonegroups_names.OLD_ZONEGROUP_NAME
3. Remove the old zone group name from the pool:
rados -p .rgw.root rm zonegroups_names.OLD_ZONEGROUP_NAME
Removing the old zone group name restores normal sync operations.
Multi-site lifecycle expiration does not clean OLH entries in versioned buckets
Multi-site lifecycle expiration may fail to remove object log header (OLH) entries in versioned buckets. The system leaves stale data in the bucket index. This issue occurs when lifecycle expiration runs on multi-site deployments for versioned buckets. As a result, stale OLH entries remain after object deletion. This causes bucket index bloat and may impact bucket operations for customers.
As a workaround, administrators can manually detect and repair the affected buckets.
1. Detect stale entries:
radosgw-admin bucket check olh --dump-keys --bucket=BUCKET_NAME --hide-progress
2. Repair the bucket index:
radosgw-admin bucket check olh --fix --bucket=BUCKET_NAME
After repair, stale entries are purged.
7.6. Ceph Block Device (RBD)
Get to know the known issues for Ceph Block Device found in this release.
Kernel client does not support pg-upmap-primary
The kernel client currently does not support the pg-upmap-primary feature. As a result, users may encounter issues when attempting to mount images or filesystems using the kernel client in environments where pg-upmap-primary is configured.
If issues occur during mounting with the kernel client, verify that this missing feature support is the cause.
1. Confirm that your cluster contains pg-upmap-primary mappings:
ceph osd dump | grep "pg_upmap_primary"
2. Check the kernel log for the following error message:
dmesg | tail
[73393.901029] libceph: mon2 (1)10.64.24.186:6789 feature set mismatch, my 2f018fb87aa4aafe < server's 2f018fb8faa4aafe, missing 80000000
[73393.901037] libceph: mon2 (1)10.64.24.186:6789 missing required protocol features
These error details confirm that the cluster is using features that the kernel client does not currently support.
- If this error message is not emitted, contact Support.
- If this error message is emitted, continue by removing the related mappings.
As a workaround, remove the related pg-upmap-primary mappings.
1. If using the balancer module, change the mode back to one that does not use pg-upmap-primary. This prevents additional mappings from being made:
ceph balancer mode upmap
2. Remove all pg-upmap-primary mappings:
ceph osd rm-pg-upmap-primary-all
7.7. RADOS
Get to know the known issues for RADOS found in this release.
Placement groups are not scaled down in upmap-read and read balancer modes
Currently, pg-upmap-primary entries are not properly removed for placement groups (PGs) that are pending merge. For example, when the bulk flag is removed on a pool, or in any case where the number of PGs in a pool decreases. As a result, the PG scale-down process gets stuck and the number of PGs in the affected pool does not decrease as expected.
As a workaround, remove the pg_upmap_primary entries in the OSD map of the affected pool. To view the entries, run the ceph osd dump command and then run ceph osd rm-pg-upmap-primary PG_ID for each PG in the affected pool.
After using the workaround, the PG scale-down process resumes as expected.
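For example, a hedged sketch that removes the entries for a single pool; it assumes the pg_upmap_primary lines in the ceph osd dump output list the PG ID in the second column, and the pool ID (here 5) is illustrative:

# Remove every pg_upmap_primary entry belonging to pool 5 (pool ID and output format assumed).
for pg in $(ceph osd dump | awk '/^pg_upmap_primary/ && $2 ~ /^5\./ {print $2}'); do
    ceph osd rm-pg-upmap-primary "$pg"
done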
Chapter 8. Sources
The updated Red Hat Ceph Storage source code packages are available at the following location:
- For Red Hat Enterprise Linux 9: https://ftp.redhat.com/redhat/linux/enterprise/9Base/en/RHCEPH/SRPMS/