Chapter 8. Asynchronous errata updates
This section describes the bug fixes, known issues, and enhancements of the z-stream releases.
8.1. Red Hat Ceph Storage 5.3z7
Red Hat Ceph Storage release 5.3z7 is now available. The bug fixes and security updates that are included in the update are listed in the RHSA-2024:4118 and RHSA-2024:4119 advisories.
8.1.1. Enhancements
8.1.1.1. RBD
Improved rbd_diff_iterate2() API performance
Previously, RBD diff-iterate was not guaranteed to execute locally if an exclusive lock was available when diffing against the beginning of time (fromsnapname == NULL) in fast-diff mode (whole_object == true with the fast-diff image feature enabled and valid).
With this enhancement, rbd_diff_iterate2() API performance is improved, thereby increasing the performance of QEMU live disk synchronization and backup use cases, where the fast-diff image feature is enabled.
8.1.2. Known issues
8.1.2.1. Ceph Upgrade
Cluster keys and certain configuration directories are removed during RHEL 8 to RHEL 9 upgrade
Because the libunwind package is deprecated in RHEL 8, it is removed when upgrading to RHEL 9. The ceph-common package depends on the libunwind package and is therefore removed as well. Removing the ceph-common package results in the removal of the cluster keys and certain configurations in the /etc/ceph and /var/log/ceph directories.
As a result, various node failures can occur. Ceph operations may not work on some nodes because the /etc/ceph directory is removed. systemd and Podman cannot start Ceph services on the node because the /var/log/ceph directory is removed.
As a workaround, configure LEAPP to not remove the libunwind package. For full instructions, see Upgrading RHCS 5 hosts from RHEL 8 to RHEL 9 removes ceph-common package. Services fail to start on the Red Hat Customer Portal.
8.2. Red Hat Ceph Storage 5.3z6
Red Hat Ceph Storage release 5.3z6 is now available. The bug fixes and security updates that are included in the update are listed in the RHSA-2024:0745 advisory.
8.2.1. Enhancements
8.2.1.1. Ceph Object Gateway
The rgw-restore-bucket-index experimental tool restores bucket indices for versioned and un-versioned buckets
With this enhancement, you can restore the bucket indices of versioned buckets with the rgw-restore-bucket-index experimental tool, in addition to its existing ability to work with un-versioned buckets.
Enhanced ordered bucket listing
Previously, in some cases, an ordered bucket listing of buckets with a larger number of shards and several pseudo-subdirectories could take an unnecessarily long time to complete.
With this enhancement, such buckets perform an ordered bucket listing more quickly.
The radosgw-admin bucket stats command prints bucket versioning
With this enhancement, the radosgw-admin bucket stats command prints the versioning status for buckets as one of three values, enabled, off, or suspended, because versioning can be enabled or disabled after creation.
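For example, the versioning status of a single bucket can be checked with the command below; BUCKET_NAME is a placeholder, and the exact layout of the output may vary slightly between releases.
Example
# Print statistics for one bucket; the output now includes the bucket's versioning status.
radosgw-admin bucket stats --bucket=BUCKET_NAME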
8.2.1.2. Ceph File System
The MDS default balancer is now disabled by default
With this release, the MDS default balancer, also known as the automatic dynamic subtree balancer, is disabled by default. This prevents accidental subtree migrations, which can be expensive to undo when the operator increases the file system max_mds setting without planning subtree delegations, for example, with pinning.
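With the automatic balancer disabled, operators who raise max_mds are expected to plan subtree delegations themselves, for example with static pinning. A minimal sketch, assuming a CephFS mount at /mnt/cephfs and a hypothetical directory name:
Example
# Pin a directory subtree to MDS rank 1 so that rank serves it (static pinning).
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project_a
# Remove the pin by setting the rank to -1.
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/project_a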
8.2.1.3. Ceph Manager plugins
Each Ceph Manager module has a separate thread to run commands
Previously, all ceph-mgr module commands ran through a single thread. If one module's command was stuck, all the other modules' commands would hang, waiting on the same thread.
With this update, one finisher thread is added for each Ceph Manager module, so each module runs its commands on a separate thread. Even if one module's command hangs, the other modules can still run their commands.
8.2.1.4. RADOS
Improved protection against running BlueStore twice
Previously, advisory locking was used to protect against running BlueStore twice. This works well on bare-metal deployments. However, when used in containers, unrelated inodes targeting the same block device node (mknod b) could be created. As a result, two containers might assume that they had exclusive access, which led to severe errors.
With this release, protection against running two OSDs at the same time on one block device is improved by reinforcing advisory locking with the O_EXCL open flag dedicated to block devices. It is no longer possible to open one BlueStore instance twice, so the resulting overwrite and corruption no longer occur.
New reports available for sub-events for delayed operations
Previously, slow operations were marked as delayed but without a detailed description.
With this enhancement, you can view the detailed descriptions of delayed sub-events for operations.
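These per-event timings can be inspected through the OSD's admin socket; osd.0 below is only an illustrative daemon name.
Example
# Dump recent slow operations, including the timestamped sub-events recorded for each operation.
ceph daemon osd.0 dump_historic_ops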
8.2.2. Known issues
8.2.2.1. Ceph Dashboard
Some metrics are displayed as null leading to blank spaces in graphs
Some metrics on the Ceph Dashboard are displayed as null, which leads to blank spaces in the graphs, because a metric is not initialized until it has a value.
As a workaround, edit the Grafana panel in which the issue is present. From the Edit menu, click Migrate, select Connect Nulls, and choose Always to resolve the issue.
8.3. Red Hat Ceph Storage 5.3z5
Red Hat Ceph Storage release 5.3z5 is now available. The bug fixes that are included in the update are listed in the RHBA-2023:4760 advisory.
8.4. Red Hat Ceph Storage 5.3z4
Red Hat Ceph Storage release 5.3z4 is now available. The bug fixes that are included in the update are listed in the RHBA-2023:4213 advisory.
8.4.1. Known issues
8.4.1.1. Multi-site Ceph Object Gateway
MD5 mismatch of replicated objects when testing the Ceph Object Gateway's server-side encryption in multi-site
Currently, an MD5 mismatch of replicated objects is observed when testing the Ceph Object Gateway's server-side encryption in multi-site. The data corruption is specific to S3 multipart uploads with SSE encryption enabled. The corruption affects only the replicated copy; the original object remains intact.
Encryption of multipart uploads requires special handling around the part boundaries because each part is uploaded and encrypted separately. In multi-site, objects are encrypted and multipart uploads are replicated as a single part. As a result, the replicated copy loses its knowledge of the original part boundaries required to decrypt the data correctly, which causes this corruption.
As a workaround, multi-site users should not use server-side encryption for multipart uploads. For more detailed information, see the KCS article Server-side encryption with RGW multisite configuration might lead to data corruption of multipart objects.
8.5. Red Hat Ceph Storage 5.3z3
Red Hat Ceph Storage release 5.3z3 is now available. The bug fixes that are included in the update are listed in the RHBA-2023:3259 advisory.
8.5.1. Enhancements
8.5.1.1. The Cephadm utility
Users can now set crush_device_class in the OSD specification
Previously, users had to manually set the crush_device_class after the OSDs were created.
With this release, users can set the crush_device_class in an OSD specification, which causes cephadm to mark all OSDs created from that specification with that CRUSH device class.
Syntax
service_type: osd
service_id: SERVICE_ID_OF_OSD
placement:
  hosts:
  - HOSTNAME_01
  - HOSTNAME_02
crush_device_class: CRUSH_DEVICE_CLASS(SSD/HDD)
spec:
  data_devices:
    paths:
    - DATA_DEVICES
  db_devices:
    paths:
    - DB_DEVICES
  wal_devices:
    paths:
    - WAL_DEVICES
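As an illustration, the following specification (with hypothetical host and device names) marks every OSD it creates with the ssd device class; apply it with the ceph orch apply -i FILE_NAME command.
Example
service_type: osd
service_id: osd_ssd_spec
placement:
  hosts:
  - host01
  - host02
crush_device_class: ssd
spec:
  data_devices:
    paths:
    - /dev/sdb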
Users can now set retention time in Prometheus specification
Previously, setting the retention time required manually editing the unit.run file, and the edit would be overwritten whenever the Prometheus daemon was redeployed.
With this release, you can set the retention time in the Prometheus specification file as follows:
Example
service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "1y"
In this example, the retention time is set to one year instead of the default 15 days.
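Assuming the specification is saved to a file such as prometheus.yaml (a hypothetical file name), it can be applied with cephadm in the usual way:
Example
# Apply the Prometheus specification; cephadm redeploys the daemon with the new retention time.
ceph orch apply -i prometheus.yaml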
8.5.2. Known issues
Documentation for users to manage Ceph File system snapshots on the Red Hat Ceph Storage Dashboard
Details for this feature will be included in the next version of the Red Hat Ceph Storage Dashboard Guide.
Documentation for users to manage hosts on the Red Hat Ceph Storage Dashboard
Details for this feature will be included in the next version of the Red Hat Ceph Storage Dashboard Guide.
Documentation for users to import RBD images instantaneously
Details for the rbd import command will be included in the next version of the Red Hat Ceph Storage Block Device Guide.
8.6. Red Hat Ceph Storage 5.3z2
Red Hat Ceph Storage release 5.3z2 is now available. The bug fixes that are included in the update are listed in the RHBA-2023:1732 advisory.
8.6.1. Enhancements
8.6.1.1. Ceph File System
Client request counters are converted from _u8 type to _u32 type and the limit is set to 256 times
Previously, in cases with multiple active MDSs, if a single request failed in the current MDS, the client would forward the request to another MDS. If no MDS could successfully handle the request, the request would bounce between MDSs indefinitely. The old num_fwd/num_retry counters are of _u8 type, which would overflow after bouncing 256 times.
With this enhancement, the counters are converted from _u8 type to _u32 type and the limit for forwarding and retrying is set to 256 times. Client requests stop being forwarded and retried after 256 attempts and fail directly instead of being forwarded and retried indefinitely.
8.6.1.2. Ceph Object Gateway
Administrators can now reuse output from rados ls to complete bucket reindexing quickly
Previously, running the rados ls command for each bucket was very time-consuming and therefore slowed down the reindexing of buckets.
With this enhancement, the rgw-restore-bucket-index tool can reuse the pre-existing output of a rados ls command, allowing administrators to reuse the output from one rados ls command. This allows bucket index recovery of multiple non-versioned buckets to complete more quickly.
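As a sketch, the listing can be generated once and saved to a file; the pool name below is an assumed default data pool, and the exact option for passing the saved listing to rgw-restore-bucket-index should be taken from the tool's help output.
Example
# Generate the object listing once and reuse it for multiple bucket index recoveries.
# default.rgw.buckets.data is an assumed pool name; adjust it for your zone.
rados -p default.rgw.buckets.data ls > /tmp/rados-ls.out
# Consult the tool's help for the option that accepts a pre-generated listing.
rgw-restore-bucket-index --help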
8.6.2. Known issues
8.6.2.1. The Cephadm utility
Adding or expanding iSCSI gateways in gwcli across the iSCSI daemons works as expected
Previously, because iSCSI daemons were not reconfigured automatically when a trusted IP list was updated in the specification file, adding or expanding iSCSI gateways in gwcli would fail due to the iscsi-gateway.cfg not matching across the iSCSI daemons.
With this fix, you can expand the gateways and add them to the existing gateways with the gwcli command.
ceph orch ps does not display a version for monitoring stack daemons
In cephadm, because the version-grabbing code is currently incompatible with the downstream monitoring stack containers, version grabbing fails for monitoring stack daemons, such as node-exporter, prometheus, and alertmanager.
As a workaround, if you need to find the version, note that the daemons' container names include the version.
8.7. Red Hat Ceph Storage 5.3z1
Red Hat Ceph Storage release 5.3z1 is now available. The bug fixes that are included in the update are listed in the RHBA-2023:0981 advisory.
8.7.1. Enhancements
8.7.1.1. The Cephadm utility
cephadm automatically updates the dashboard Grafana password if it is set in the Grafana service specification
Previously, users had to manually set the Grafana password after applying the specification.
With this enhancement, if initial_admin_password is set in an applied Grafana specification, cephadm automatically updates the dashboard Grafana password, which is equivalent to running the ceph dashboard set-grafana-api-password command, streamlining the process of fully setting up Grafana. Users no longer have to manually set the dashboard Grafana password after applying a specification that includes the password.
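A minimal sketch of such a specification, with a placeholder password, is shown below; apply it with the ceph orch apply -i FILE_NAME command.
Example
service_type: grafana
placement:
  count: 1
spec:
  initial_admin_password: PASSWORD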
OSDs automatically update their Ceph configuration files with the new mon locations
With this enhancement, whenever a monmap change is detected, cephadm automatically updates the Ceph configuration files of each OSD with the new mon locations.
Note that if you have a large number of OSDs, this update may take some time to complete on all of them.
8.7.1.2. Ceph Dashboard
The Block Device images table is paginated
With this enhancement, the Block Device images table is paginated for use with storage clusters that contain 10000+ images, because retrieving the information for a block device image is an expensive operation.
Newly added cross_origin_url option allows cross-origin resource sharing
Previously, IBM developers faced issues with their Storage Insights product when they tried to ping the REST API from their front end because of the tight Cross Origin Resource Sharing (CORS) policies set up in Red Hat's REST API.
With this enhancement, CORS is allowed by adding the cross_origin_url option, which can be set to a particular URL, for example, ceph config set mgr mgr/dashboard/cross_origin_url localhost, so that the REST API allows communication with only that URL.
8.7.1.3. Ceph File System
Users can store arbitrary metadata of CephFS subvolume snapshots
With this enhancement, Ceph File System (CephFS) volume users can store arbitrary metadata in the form of key-value pairs for CephFS subvolume snapshots with a set of command-line interface (CLI) commands.
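For example, the metadata commands take the volume, subvolume, and snapshot names plus a key-value pair; the volume, subvolume, snapshot, and key names used here are placeholders.
Example
# Set, get, list, and remove arbitrary key-value metadata on a subvolume snapshot.
ceph fs subvolume snapshot metadata set cephfs subvol01 snap01 owner backup-team
ceph fs subvolume snapshot metadata get cephfs subvol01 snap01 owner
ceph fs subvolume snapshot metadata ls cephfs subvol01 snap01
ceph fs subvolume snapshot metadata rm cephfs subvol01 snap01 owner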
8.7.1.4. Ceph Object Gateway
STS max_session_duration for a role can now be updated
With this enhancement, the STS max_session_duration for a role can be updated using the radosgw-admin command-line interface.
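A sketch of the update, assuming a role named S3Access1 and a two-hour session limit; check radosgw-admin role update --help for the exact option spelling in your release.
Example
# Update the maximum session duration (in seconds) for an existing role.
radosgw-admin role update --role-name=S3Access1 --max-session-duration=7200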
ListBucket S3 operation now generates JSON output
With this enhancement, at customers' request to facilitate integrations, the ListBucket S3 operation generates JSON-formatted output, instead of the default XML, if the request contains an Accept: application/json header.
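For illustration, a raw request with the header looks as follows; the endpoint and bucket name are placeholders and the required S3 authentication headers are omitted.
Example
# Request a JSON-formatted bucket listing instead of the default XML (authentication omitted).
curl -H "Accept: application/json" "http://rgw.example.com:8080/BUCKET_NAME"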
The option to enable TCP keepalive managed by libcurl is added
With this enhancement, the option to enable TCP keepalive on the HTTP client sockets managed by libcurl is added to make sync and other operations initiated by the Ceph Object Gateway more resilient to network instability. This does not apply to connections received by the HTTP frontend, but only to HTTP requests sent by the Ceph Object Gateway, such as Keystone requests for authentication, sync requests from multi-site, and requests to key management servers for SSE.
Result code 2002 of radosgw-admin commands is explicitly translated to 2
Previously, a change in the S3 error translation of the internal NoSuchBucket result inadvertently changed the error code of the radosgw-admin bucket stats command, causing programs that check the shell result code of those radosgw-admin commands to see a different result code.
With this enhancement, the result code 2002 is explicitly translated to 2 and users see the original behavior.
You can now use bucket policies with useful errors
Previously, bucket policies were difficult to use because the error indication was wrong. Additionally, silently dropping principals would cause problems during an upgrade. With this update, useful errors from the policy parser and a flag to reject invalid principals with the rgw_policy_reject_invalid_principals=true parameter are introduced.
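A sketch of enabling the stricter behavior through the Ceph configuration database; using client.rgw as the target section is an assumption and may need to match your gateway's configuration section.
Example
# Reject bucket policies that contain invalid principals instead of silently dropping them.
ceph config set client.rgw rgw_policy_reject_invalid_principals true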
8.7.1.5. Multi-site Ceph Object Gateway
The bucket sync run command provides more details
With this enhancement, user-friendly progress reports are added to the bucket sync run command to give users easier visibility into the progress of the operation. When the user runs the radosgw-admin bucket sync run command with the --extra-info flag, a message is printed at the start of the sync for each generation and for each object that is synced.
It is not recommended to use the bucket sync run command without contacting Red Hat support.
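If Red Hat support advises running the command, the invocation with the extra progress output looks like the following; BUCKET_NAME is a placeholder.
Example
# Run a bucket sync and print a message at the start of each generation and for every synced object.
radosgw-admin bucket sync run --bucket=BUCKET_NAME --extra-info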
Multi-site configuration supports dynamic bucket index resharding
Previously, only manual resharding of buckets was supported for multi-site configurations.
With this enhancement, dynamic bucket resharding is supported in multi-site configurations. After the storage clusters are upgraded, enable the resharding feature at the zone and zone group level. You can then either manually reshard the buckets with the radosgw-admin bucket reshard command or reshard them automatically with dynamic resharding, independently of other zones in the storage cluster.
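A sketch of enabling the feature after the upgrade, assuming a hypothetical zone group named us and a zone named us-east; verify the feature-enabling options against your release's documentation.
Example
# Enable the resharding feature on the zone group and the zone, then commit the period.
radosgw-admin zonegroup modify --rgw-zonegroup=us --enable-feature=resharding
radosgw-admin zone modify --rgw-zone=us-east --enable-feature=resharding
radosgw-admin period update --commit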
Users can now reshard bucket index dynamically with multi-site archive zones
With this enhancement, the bucket index of a multi-site archive zone can be resharded dynamically when dynamic resharding is enabled for that zone.
8.7.1.6. RADOS
Low-level log messages are introduced to warn user about hitting throttle limits
Previously, there was a lack of low-level logging indication that throttle limits were hit, causing these occurrences to incorrectly have the appearance of a networking issue.
With this enhancement, the introduction of low-level log messages makes it much clearer that the throttle limits are hit.
8.7.1.7. RADOS Block Devices (RBD)
Cloned images can now be encrypted with their own encryption format and passphrase
With this enhancement, layered client-side encryption is now supported that enables each cloned image to be encrypted with its own encryption format and passphrase, potentially different from that of the parent image. The efficient copy-on-write semantics intrinsic to unformatted regular cloned images are retained.
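A sketch of giving a clone its own LUKS2 passphrase; the pool, image, snapshot, and passphrase-file names are placeholders, and the parent snapshot is assumed to be clonable in your configuration.
Example
# Clone an image from a snapshot, then give the clone its own encryption format and passphrase.
rbd clone pool01/image01@snap01 pool01/clone01
rbd encryption format pool01/clone01 luks2 clone-passphrase.bin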
8.7.2. Known issues
8.7.2.1. The Cephadm utility
Adding or expanding iSCSI gateways in gwcli across the iSCSI daemons works as expected
Previously, because iSCSI daemons were not reconfigured automatically when a trusted IP list was updated in the specification file, adding or expanding iSCSI gateways in gwcli would fail due to the iscsi-gateway.cfg not matching across the iSCSI daemons.
With this fix, you can expand the gateways and add them to the existing gateways with the gwcli command.
ceph orch ps does not display a version for monitoring stack daemons
In cephadm, because the version-grabbing code is currently incompatible with the downstream monitoring stack containers, version grabbing fails for monitoring stack daemons, such as node-exporter, prometheus, and alertmanager.
As a workaround, if you need to find the version, note that the daemons' container names include the version.
8.7.2.2. Ceph Object Gateway
Resharding a bucket that has num_shards = 0 results in the bucket's metadata being lost
Upgrading to Red Hat Ceph Storage 5.3 from older releases with buckets that have num_shards = 0 can result in the loss of the bucket's metadata, making the bucket unavailable when you try to access it. This is a known issue that will be fixed in an upcoming release. The Upgrade guide contains the workaround, which is to disable dynamic bucket resharding and set num_shards to a non-zero value before the upgrade. For any help with the upgrade or to know more about the issue, contact Red Hat Support.
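The workaround amounts to commands of the following kind before the upgrade; the bucket name and shard count are placeholders, and the authoritative procedure is the one in the Upgrade guide or from Red Hat Support.
Example
# Disable dynamic bucket resharding cluster-wide before the upgrade.
ceph config set client.rgw rgw_dynamic_resharding false
# Reshard any bucket that still has num_shards = 0 to a non-zero shard count.
radosgw-admin bucket reshard --bucket=BUCKET_NAME --num-shards=11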