Release Notes
Release notes for Red Hat Ceph Storage 3.2
Abstract
Chapter 1. Introduction
Red Hat Ceph Storage is a massively scalable, open, software-defined storage platform that combines the most stable version of the Ceph storage system with a Ceph management platform, deployment utilities, and support services.
The Red Hat Ceph Storage documentation is available at https://access.redhat.com/documentation/en/red-hat-ceph-storage/.
Chapter 2. Acknowledgments
Red Hat Ceph Storage version 3.2 contains many contributions from the Red Hat Ceph Storage team. Additionally, the Ceph project is seeing amazing growth in the quality and quantity of contributions from individuals and organizations in the Ceph community. We would like to thank all members of the Red Hat Ceph Storage team, all of the individual contributors in the Ceph community, and additionally (but not limited to) the contributions from organizations such as:
- Intel
- Fujitsu
- UnitedStack
- Yahoo
- UbuntuKylin
- Mellanox
- CERN
- Deutsche Telekom
- Mirantis
- SanDisk
- SUSE
Chapter 3. New features
This section lists all major updates, enhancements, and new features introduced in this release of Red Hat Ceph Storage.
The main features added by this release are:
3.1. The ceph-ansible Utility
Ansible now configures firewalld by default
The ceph-ansible utility now configures the firewalld service by default when creating a new cluster. Previously, it only checked if required ports were opened or closed, but it did not configure any firewall rules.
Pool size can now be customized when deploying clusters with ceph-ansible
Previously, the ceph-ansible utility set the pool size to 3 by default and did not allow the user to change it. However, in Red Hat OpenStack deployments, setting the size of each pool is sometimes required. With this update, the pool size can be customized. To do so, change the size setting in the all.yml file. Each time the value of size is changed, the new size is applied.
Ansible now validates CHAP settings before running playbooks
Previously, when the Challenge Handshake Authentication Protocol (CHAP) settings were set incorrectly, the ceph-ansible utility returned an unclear error message while deploying the Ceph iSCSI gateway. With this update, ceph-ansible validates the CHAP settings before deploying Ceph iSCSI gateways.
The noup flag is now set before creating OSDs to distribute PGs properly
The ceph-ansible utility now sets the noup flag before creating OSDs to prevent them from changing their status to up before all OSDs are created. Previously, if the flag was not set, placement groups (PGs) were created on only one OSD and got stuck in creation or activation. With this update, the noup flag is set before creating OSDs and unset after the creation is complete. As a result, PGs are distributed properly among all OSDs.
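The set-then-unset ordering described above can be sketched as a small Python context manager. This is only an illustration, not how ceph-ansible implements it: the noup_flag helper and its injectable run parameter are hypothetical, and the only real commands assumed are ceph osd set noup and ceph osd unset noup.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def noup_flag(run=subprocess.check_call):
    """Hold the cluster-wide noup flag while OSDs are being created.

    `run` is a hypothetical injection point so the sketch can be exercised
    without a cluster; by default it would execute the real commands.
    """
    run(["ceph", "osd", "set", "noup"])
    try:
        yield
    finally:
        # Always clear the flag, even if OSD creation fails part-way.
        run(["ceph", "osd", "unset", "noup"])


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    recorded = []
    with noup_flag(run=recorded.append):
        recorded.append(["<create all OSDs here>"])
    for command in recorded:
        print(" ".join(command))
```

Clearing the flag in a finally block reflects the idea that OSDs should never be left unable to come up if creation aborts.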
Variables are now validated at the beginning of an invocation of ceph-ansible playbooks
The ceph-ansible utility now validates variables specified in configuration files located in the group_vars or host_vars directories at the beginning of a playbook invocation. This change makes it easier to discover misconfigured variables.
Ceph Ansible supports a multi-site Ceph Object Gateway configuration
With previous versions of ceph-ansible, only one Object Gateway endpoint was configurable. With this release, ceph-ansible supports a multi-site Ceph Object Gateway for multiple endpoints. Zones can be configured with multiple Object Gateways, and gateways can be added to a zone automatically by appending their endpoint information to a list. Set the rgw_multisite_proto option to http or https, depending on whether the endpoint is configured to use SSL.
When more than one Ceph Object Gateway is in the master zone or in the secondary zone, the rgw_multisite_endpoints option needs to be set. The rgw_multisite_endpoints option is a comma-separated list with no spaces. For example:
rgw_multisite_endpoints: http://foo.example.com:8080,http://bar.example.com:8080,http://baz.example.com:8080
When adding a new Object Gateway, append the endpoint URL of the new Object Gateway to the end of the rgw_multisite_endpoints list before running the Ansible playbook.
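As a minimal sketch of the list format described above, the following Python helper appends a new endpoint URL and keeps the value comma-separated with no spaces. The append_endpoint function is hypothetical and is not part of ceph-ansible; the host names are the example ones used in these notes.

```python
def append_endpoint(endpoints: str, new_url: str) -> str:
    """Return the rgw_multisite_endpoints value with new_url appended at the end."""
    # Split on commas and strip stray whitespace so the result never contains spaces.
    items = [item.strip() for item in endpoints.split(",") if item.strip()]
    new_url = new_url.strip()
    if new_url not in items:
        items.append(new_url)
    return ",".join(items)


if __name__ == "__main__":
    current = "http://foo.example.com:8080,http://bar.example.com:8080"
    print("rgw_multisite_endpoints:", append_endpoint(current, "http://baz.example.com:8080"))
```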
Ansible now has the ability to start OSD containers using numactl
With this update, the ceph-ansible utility has the ability to start OSD containers using the numactl utility. numactl allows use of the --preferred option, which means the program can allocate memory outside of the NUMA socket, so running out of memory causes fewer problems.
3.2. Ceph File System
A new subcommand: drop_cache
The ceph tell command now supports the drop_cache subcommand. Use this subcommand to drop the Metadata Server (MDS) cache without restarting, trim its journal, and ask clients to drop all capabilities that are not in use.
New option: mds_cap_revoke_eviction_timeout
This update adds a new configurable timeout for evicting clients that have not responded to a capability revoke request by the Metadata Server (MDS). The MDS can request clients to release their capabilities under certain conditions, such as another client requesting a capability that is currently held by a client. The client then releases its capabilities and acknowledges the MDS, which can hand over the capability to other clients. However, a misbehaving client might not acknowledge, or could totally ignore, the capability revoke request by the MDS, causing other clients to wait and thereby stalling requested I/O operations. Now, the MDS can evict clients that have not responded to capability revoke requests within a configured timeout. This is disabled by default and can be enabled by setting the mds_cap_revoke_eviction_timeout configuration parameter.
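The eviction rule described above can be modeled in a few lines of Python. This is a simplified sketch and not the MDS implementation: the clients_to_evict function and the pending_revokes mapping are hypothetical, and treating a timeout of zero as "disabled" is an assumption made for the example.

```python
def clients_to_evict(pending_revokes: dict, now: float, timeout: float) -> list:
    """Return the clients whose oldest unacknowledged revoke exceeds the timeout.

    pending_revokes maps a client id to the time (in seconds) at which the
    oldest still-unacknowledged capability revoke request was sent.
    """
    if timeout <= 0:
        # Assumption for this sketch: a non-positive timeout disables eviction.
        return []
    return sorted(
        client for client, sent_at in pending_revokes.items() if now - sent_at > timeout
    )


if __name__ == "__main__":
    pending = {"client.4305": 0.0, "client.4410": 90.0}
    print(clients_to_evict(pending, now=100.0, timeout=60.0))
```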
SELinux support for CephFS
This update adds the SELinux policy for the Metadata Server (MDS) and ceph-fuse daemons so that users can use Ceph File System (CephFS) with SELinux in enforcing mode.
MDS now records the IP address and source port for evicted clients
The Red Hat Ceph Storage Metadata Server (MDS) now logs the IP address and source port for evicted clients. If you want to correlate client evictions with machines, review the cluster log for this information.
Improved logging for Ceph MDS
The Ceph Metadata Server (MDS) now outputs more metrics concerning client sessions to the debug log by default. This includes the creation of the client session and other metadata. This information is useful for storage administrators to see when a new client session is created and how long it took to establish a connection.
session_timeout and session_autoclose are now configurable by ceph fs set
You can now configure the session_timeout and session_autoclose options by using the ceph fs set command instead of setting them in the Ceph configuration file.
3.3. The ceph-volume Utility
Specifying more than one OSD per device is now possible
With this version, a new batch subcommand has been added. The batch subcommand includes the --osds-per-device option that allows specifying multiple OSDs per device. This is especially useful when using high-speed devices, such as Non-volatile Memory Express (NVMe).
New subcommand: ceph-volume lvm batch
This update adds the ceph-volume lvm batch subcommand that allows creation of volume groups and logical volumes for OSD provisioning from raw disks. The batch subcommand makes creating logical volumes easier for users who are not familiar with the Logical Volume Manager (LVM). With batch, one or many OSDs can be created by passing an array of devices and an OSD count per device to the ceph-volume lvm batch command.
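To illustrate how the device list and the per-device OSD count fit together, the following Python sketch assembles a ceph-volume lvm batch command line. The batch_command helper and the device paths are hypothetical; only the subcommand and the --osds-per-device option come from the text above.

```python
def batch_command(devices, osds_per_device=1):
    """Build the argument list for a `ceph-volume lvm batch` invocation."""
    if not devices:
        raise ValueError("at least one device is required")
    if osds_per_device < 1:
        raise ValueError("osds_per_device must be 1 or greater")
    return [
        "ceph-volume", "lvm", "batch",
        "--osds-per-device", str(osds_per_device),
        *devices,
    ]


if __name__ == "__main__":
    # Example device paths only; substitute the raw disks on the OSD node.
    print(" ".join(batch_command(["/dev/nvme0n1", "/dev/nvme1n1"], osds_per_device=2)))
```

The sketch only builds the command; running it against real devices is destructive and is left to the operator.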
3.4. Containers
Support the iSCSI gateway in containers
Previously, the iSCSI gateway could not be run in a container. With this update to Red Hat Ceph Storage, a containerized version of the Ceph iSCSI gateway can be deployed with a containerized Ceph cluster.
3.5. Distribution
nfs-ganesha rebased to 2.7
The nfs-ganesha package has been upgraded to upstream version 2.7, which provides a number of bug fixes and enhancements over the previous version.
3.6. iSCSI Gateway
Target-level control parameters can now be overridden
Only if instructed to by Red Hat Support, the following configuration settings can now be overridden by using the gwcli reconfigure subcommand:
- cmdsn_depth
- immediate_data
- initial_r2t
- max_outstanding_r2t
- first_burst_length
- max_burst_length
- max_recv_data_segment_length
- max_xmit_data_segment_length
Tuning these variables might be useful for high IOPS/throughput environments. Only use these variables if instructed to by Red Hat Support.
Automatic rotation of iSCSI logs
This update implements automatic log rotation for the rbd-target-gw, rbd-target-api, and tcmu-runner daemons that are used by Ceph iSCSI gateways.
3.7. Object Gateway
Changed the reshard_status output
Previously, the radosgw-admin reshard status --bucket <bucket_name> command displayed a numerical value for the reshard_status output. These numerical values corresponded with an actual status, as follows:
CLS_RGW_RESHARD_NONE = 0
CLS_RGW_RESHARD_IN_PROGRESS = 1
CLS_RGW_RESHARD_DONE = 2
In this release, these numerical values were replaced by the actual status.
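Scripts that parsed the old numeric output may need to translate stored values into the names now shown. A minimal Python sketch of that mapping follows; the ReshardStatus class and status_name helper are hypothetical names, while the three constants and their values are taken from the list above.

```python
from enum import IntEnum


class ReshardStatus(IntEnum):
    """Numeric reshard_status values as listed in these release notes."""
    CLS_RGW_RESHARD_NONE = 0
    CLS_RGW_RESHARD_IN_PROGRESS = 1
    CLS_RGW_RESHARD_DONE = 2


def status_name(value: int) -> str:
    """Translate an old numeric reshard_status into its status name."""
    return ReshardStatus(value).name


if __name__ == "__main__":
    for number in (0, 1, 2):
        print(number, "->", status_name(number))
```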
3.8. Object Gateway Multisite
New performance counters added
This update adds the following performance counters to multi-site configuration of the Ceph Object Gateway to measure data sync:
- poll_latency measures the latency of requests for remote replication logs.
- fetch_bytes measures the number of objects and bytes fetched by data sync.
3.9. Packages
ceph rebased to 12.2.8
The ceph package has been upgraded to upstream version 12.2.8, which provides a number of bug fixes and enhancements over the previous version.
3.10. RADOS
OSD BlueStore is now fully supported
BlueStore is a new back end for the OSD daemons that allows for storing objects directly on the block devices. Because BlueStore does not need any file system interface, it improves performance of Ceph Storage Clusters.
To learn more about the BlueStore OSD back end, see the OSD BlueStore chapter in the Administration Guide for Red Hat Ceph Storage 3.
New option: osd_scrub_max_preemptions
With this release, a new osd_scrub_max_preemptions option has been added. This option sets the maximum number of times Ceph preempts a deep scrub due to a client operation before blocking the client I/O to complete the scrubbing process. The option is set to 5 by default.
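The preemption budget described above can be pictured with a toy Python model. This is a deliberately simplified sketch, not the OSD scrub code: the scrub_step function and its return values are hypothetical, and only the default of 5 comes from the text.

```python
def scrub_step(client_op_waiting: bool, preemptions: int, max_preemptions: int = 5):
    """Decide what a deep scrub does next in this simplified model.

    Returns a tuple of (action, updated preemption count), where action is
    "preempt" (the scrub yields to client I/O) or "scrub" (the scrub
    proceeds and client I/O has to wait).
    """
    if client_op_waiting and preemptions < max_preemptions:
        return "preempt", preemptions + 1
    return "scrub", preemptions


if __name__ == "__main__":
    count = 0
    for attempt in range(7):
        action, count = scrub_step(client_op_waiting=True, preemptions=count)
        print(attempt, action, count)
```

With a client operation always waiting, the model yields five times and then lets the scrub run to completion.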
Offline splitting FileStore directories to a target hash level is now supported
The ceph-objectstore-tool utility now supports splitting FileStore directories to a target hash level.
New option: osd_memory_target
A new option, osd_memory_target, has been added with the release. This option sets a target memory size for OSDs. The BlueStore back end adjusts its cache size and attempts to stay close to this target. The ceph-ansible utility automatically adjusts osd_memory_target based on host memory. The default value is 4 GiB. The osd_memory_target option is set differently for hyper-converged infrastructure (HCI) and non-HCI setups. To differentiate between them, use the is_hci configuration parameter. This parameter is set to false by default. To change the default values of osd_memory_target and is_hci, set them in the all.yml file.
New options: osd_delete_sleep, osd_remove_threads, and osd_recovery_threads
This update adds a new configuration option, osd_delete_sleep, to throttle object delete operations. In addition, the osd_disk_threads option has been replaced with the osd_remove_threads and osd_recovery_threads options so that users can separately configure the threads for these tasks. These changes help to throttle the rate of object delete operations to reduce the impact on client operations. This is especially important when migrating placement groups (PGs). When using these options, every removal thread sleeps for the specified number of seconds between small batches of removal operations.
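The "sleep between small batches" behavior can be sketched in Python as follows. This is an illustrative model of the throttling idea only, not the OSD implementation: the remove_objects function, the batch_size value, and the injectable sleep parameter are all hypothetical.

```python
import time


def remove_objects(objects, delete, batch_size=4, delete_sleep=0.0, sleep=time.sleep):
    """Delete objects in small batches, pausing delete_sleep seconds between batches."""
    for start in range(0, len(objects), batch_size):
        for obj in objects[start:start + batch_size]:
            delete(obj)
        # Pause only when another batch follows, so client I/O gets a window.
        if delete_sleep > 0 and start + batch_size < len(objects):
            sleep(delete_sleep)


if __name__ == "__main__":
    removed = []
    remove_objects(list(range(10)), removed.append, batch_size=4, delete_sleep=0.01)
    print(len(removed), "objects removed")
```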
Upgrading to the latest version no longer causes cluster data movement
Previously, when upgrading a Red Hat Ceph Storage cluster to the latest version with CRUSH device classes enabled, the crushtool utility rebalanced data in the cluster because of changes in the CRUSH map. This data movement should not have occurred. With this update, a reclassify functionality is available to help transition from older CRUSH maps that maintain parallel hierarchies for OSDs of different types to a modern CRUSH map that makes use of the device class feature, without triggering data movement.
3.11. Block Devices (RBD)
Support for RBD mirroring to multiple secondary clusters
Mirroring RADOS Block Devices (RBD) from one primary cluster to multiple secondary clusters is now fully supported.
rbd ls now uses IEC units
The rbd ls command now uses International Electrotechnical Commission (IEC) units to display image sizes.
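IEC units are binary multiples, where each step is a factor of 1024 (KiB, MiB, GiB, and so on). The following Python sketch shows the conversion; the format_iec helper is hypothetical and does not attempt to reproduce the exact column formatting of rbd ls.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_iec(size_bytes: int) -> str:
    """Format a byte count using IEC binary units (powers of 1024)."""
    value = float(size_bytes)
    unit_index = 0
    while value >= 1024 and unit_index < len(IEC_UNITS) - 1:
        value /= 1024
        unit_index += 1
    # The "g" format drops a trailing ".0" so whole numbers print cleanly.
    return f"{value:g} {IEC_UNITS[unit_index]}"


if __name__ == "__main__":
    for size in (512, 1536, 10 * 1024**3):
        print(size, "->", format_iec(size))
```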
Chapter 4. Bug fixes
This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The ceph-ansible Utility
osd_scenario: lvm now works when deploying Ceph in containers
Previously, the lvm installation scenario did not work when deploying a Ceph cluster in containers. With this update, the osd_scenario: lvm installation method is supported as expected in this situation.
The --limit mdss option now creates CephFS pools as expected
Previously, when deploying the Metadata Server (MDS) nodes by using Ansible and the --limit mdss option, Ansible did not create the Ceph File System (CephFS) pools. This bug has been fixed, and Ansible creates the CephFS pools as expected.
Ceph Ansible no longer fails if network interface names include dashes
When ceph-ansible makes an inventory of network interfaces that have a dash (-) in the name, the inventory must convert the dashes to underscores (_) in order to use them. In some cases, the conversion did not occur and the Ceph installation failed. With this update to Red Hat Ceph Storage, all dashes in the names of network interfaces are converted in the facts, and installation completes successfully.
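The conversion described above amounts to a simple string substitution. The following Python sketch shows the idea, assuming the usual Ansible convention that per-interface facts are keyed as ansible_ followed by the interface name; the fact_key helper itself is hypothetical and is not ceph-ansible code.

```python
def fact_key(interface: str) -> str:
    """Return the fact name for a network interface, with dashes replaced by underscores."""
    return "ansible_" + interface.replace("-", "_")


if __name__ == "__main__":
    for name in ("eth0", "br-ex", "bond-storage-1"):
        print(name, "->", fact_key(name))
```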
Ansible now sets container and service names that correspond with OSD numbers
When containerized Ceph OSDs were deployed with the ceph-ansible utility, the resulting container names and service names of the OSDs did not correspond in any way to the OSD number and were thus difficult to find and use. With this update, ceph-ansible has been improved to set container and service names that correspond with OSD numbers. Note that this change does not affect existing deployed OSDs.
Expanding clusters deployed with osd_scenario: lvm works
Previously, the ceph-ansible utility could not expand a cluster that was deployed by using the osd_scenario: lvm option. The underlying source code has been modified, and clusters deployed with osd_scenario: lvm can be expanded as expected.
Ansible now stops and disables the iSCSI gateway services when purging the Ceph iSCSI gateway
Previously, the ceph-ansible utility did not stop and disable the Ceph iSCSI gateway services when using the purge-iscsi-gateways.yml playbook. Consequently, the services had to be stopped manually. The playbook has been improved, and the iSCSI services are now stopped and disabled as expected when purging the iSCSI gateway.
The values passed into devices in osds.yml are now validated
Previously, in the osds.yml file of the Ansible playbook, the values passed into the devices parameter were not validated. This caused errors when ceph-disk, parted, or other device preparation tools failed to operate on devices that did not exist. It also caused errors if the number of values passed into the dedicated_devices parameter was not equal to the number of values passed into devices. With this update, the values are validated as expected, and none of the above-mentioned errors occur.
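The two checks mentioned above (devices must exist, and dedicated_devices must match devices in length) can be sketched in Python as follows. This is an illustration of the kind of validation described, not the ceph-ansible implementation; the validate_devices function and its exists parameter are hypothetical.

```python
import os


def validate_devices(devices, dedicated_devices=None, exists=os.path.exists):
    """Return a list of human-readable problems; an empty list means the input is valid."""
    errors = []
    for device in devices:
        if not exists(device):
            errors.append(f"device does not exist: {device}")
    # dedicated_devices is optional, but when given it must pair up one-to-one.
    if dedicated_devices and len(dedicated_devices) != len(devices):
        errors.append(
            f"dedicated_devices has {len(dedicated_devices)} entries, "
            f"but devices has {len(devices)}"
        )
    return errors


if __name__ == "__main__":
    # Example paths only; on a real node these come from osds.yml.
    for problem in validate_devices(["/dev/sdb", "/dev/sdc"], ["/dev/nvme0n1"]):
        print(problem)
```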
Purging clusters using ceph-ansible deletes logical volumes as expected
When using the ceph-ansible utility to purge a cluster that deployed OSDs with the ceph-volume utility, the logical volumes were not deleted. This behavior caused logical volumes to remain in the system after the purge process completed. This bug has been fixed, and purging clusters using ceph-ansible deletes logical volumes as expected.
The --limit osds option now works as expected
Previously, an attempt to add OSDs by using the --limit osds option failed on container setup. The underlying source code has been modified, and adding OSDs with --limit osds works as expected.
Increased CPU CGroup limit for containerized Ceph Object Gateway
The default CPU CGroup limit for containerized Ceph Object Gateway (RGW) was very low and has been increased with this update to be more reasonable for typical Hard Disk Drive (HDD) production environments. However, consider evaluating what limit to set for the site’s configuration and workload. To customize the limit, adjust the ceph_rgw_docker_cpu_limit parameter in the Ansible group_vars/rgws.yml file.
SSL works as expected with containerized Ceph Object Gateways
Previously, the SSL configuration in containerized Ceph Object Gateways did not work because the Certificate Authority (CA) certificate was only added to the TLS bundle on the hypervisor and was not propagated to the Ceph Object Gateway container due to missing container bind mounts on the /etc/pki/ca-trusted/ directory. This bug has been fixed, and SSL works as expected with containerized Ceph Object Gateways.
The rolling-upgrade.yml playbook now restarts all OSDs as expected
Due to a bug in a regular expression, the rolling-upgrade.yml playbook did not restart OSDs that used Non-volatile Memory Express devices. The regular expression has been fixed, and rolling-upgrade.yml now restarts all OSDs as expected.
4.2. Ceph Management Dashboard
The OSD node details are now displayed in the Host OSD Breakdown panel as expected
Previously, in the Red Hat Ceph Storage Dashboard, the Host OSD Breakdown information was not displayed on the OSD Node Detail panel under the All OSD Overview section. With this update, the underlying issue has been fixed, and the OSD node details are displayed as expected.
4.3. Ceph File System
The Ceph Metadata Server no longer allows recursive stat rctime to go backwards
Previously, the Ceph Metadata Server used the client’s time to update rctime. But because client time may not be synchronized with the MDS, the inode rctime could go backwards. The underlying source code has been modified, and the Ceph Metadata Server no longer allows recursive stat rctime to go backwards.
The ceph-fuse client no longer indicates incorrect recursive change time
Previously, the ceph-fuse client did not update change time when file content was modified. Consequently, incorrect recursive change time was indicated. With this update, the bug has been fixed, and the client now indicates the correct change time.
The Ceph MDS no longer allows dumping of cache larger than 1 GB
Previously, if you attempted to dump a Ceph Metadata Server (MDS) cache with a size of around 1 GB or larger, the MDS could terminate unexpectedly. With this update, MDS no longer allows dumping of cache that size so the MDS no longer terminates in the described situation.
When Monitors cannot reach an MDS, they no longer incorrectly mark its rank as damaged
Previously, Monitors were evicting and fencing an unreachable Metadata Server (MDS), then MDS was signaling that its rank was damaged due to improper handling of blacklist errors. Consequently, Monitors were incorrectly marking the rank as damaged, and the file system became unavailable because of one or more damaged ranks. In this release, the Monitors are setting the correct rank.
The reconnect timeout for MDS clients has been extended
When the Metadata Server (MDS) daemon was handling a large number of reconnecting clients with a huge number of capabilities to aggregate, the reconnect timeout was reached. Consequently, the MDS rejected clients that attempted to reconnect. With this update, the reconnect timeout has been extended, and MDS now handles reconnecting clients as expected in the described situation.
Shrinking large MDS cache no longer causes the MDS daemon to appear to hang
Previously, an attempt to shrink a large Metadata Server (MDS) cache caused the primary MDS daemon to become unresponsive. Consequently, Monitors removed the unresponsive MDS and a standby MDS became the primary MDS. With this update, shrinking large MDS cache no longer causes the primary MDS daemon to hang.
4.4. Ceph Manager Plugins
HDD and SSD devices can now be mixed when accessing the /osd endpoint
Previously, the Red Hat Ceph Storage RESTful API did not handle a mix of HDD and SSD devices when accessing the /osd endpoint and returned an error. With this update, the OSD traversal algorithm has been improved to handle this scenario as expected.
4.5. The ceph-volume Utility
ceph-volume does not break custom named clusters
When using a custom storage cluster name other than ceph, the OSDs could not start after a reboot. With this update, ceph-volume provisions OSDs in a way that allows them to boot properly when a custom name is used.
Despite this fix, Red Hat does not support clusters with custom names. This is because the upstream Ceph project removed support for custom names in the Ceph OSD, Monitor, Manager, and Metadata server daemons. The Ceph project removed this support because it added complexities to systemd unit files. This fix was created before the decision to remove support for custom cluster names was made.
4.6. Containers
Deploying encrypted OSDs in containers by using ceph-disk works as expected
When attempting to deploy a containerized OSD by using the ceph-disk and dmcrypt utilities, the container process failed to start because the OSD ID could not be found by the mounts table. With this update, the OSD ID is correctly determined, and the container process no longer fails.
4.7. Object Gateway
CivetWeb was rebased to upstream version 1.10 and the enable_keep_alive CivetWeb option works as expected
When using the Ceph Object Gateway with the CivetWeb front end, the CivetWeb connections timed out even though the enable_keep_alive option was enabled. Consequently, S3 clients that did not reconnect or retry were not reliable. With this update, CivetWeb has been updated, and the enable_keep_alive option works as expected. As a result, CivetWeb connections no longer time out in this case.
In addition, the new CivetWeb version introduces stricter header checks. This new behavior can cause certain return codes to change because invalid requests are detected sooner. For example, in the previous version, CivetWeb returned the 403 Forbidden error on an invalid HTTP request, but in the new version it returns the 400 Bad Request error instead.
Red Hat Ceph Storage passes the Swift Tempest test in the RefStack 15.0 toolset
Various improvements have been made to the Ceph Object Gateway Swift service. As a result, when configured correctly, Red Hat Ceph Storage 3.2, which includes the ceph-12.2.8 package, passes the Swift Tempest tempest.api.object_storage test suite with the exception of the test_container_synchronization test case. Red Hat Ceph Storage includes a different synchronization model, multisite operations, for users who require that feature.
Mounting the NFS Ganesha file server in a containerized IPv6 cluster no longer fails
When a containerized IPv6 Red Hat Ceph Storage cluster with an nfs-ganesha-rgw daemon was deployed by using the ceph-ansible utility, an attempt to mount the NFS Ganesha file server on a client failed with the Connection Refused error. Consequently, I/O requests were unable to run. This update fixes the default configuration for IPv6 connections, and mounting the NFS Ganesha server works as expected in this case.
Stale lifecycle configuration data of deleted buckets no longer persists in OMAP consuming space
Previously, in the Ceph Object Gateway (RGW), incorrect key formatting in the RGWDeleteLC::execute() function caused bucket lifecycle configuration metadata to persist after the deletion of the corresponding bucket. This caused stale lifecycle configuration data to persist in OMAP, consuming space. With this update, the correct name for the lifecycle object is now used in RGWDeleteLC::execute(), and the lifecycle configuration is removed as expected on removal of the corresponding bucket.
The Keystone credentials were moved to an external file
When using the Keystone identity service to authenticate a Ceph Object Gateway user, the Keystone credentials were set as plain text in the Ceph configuration file. With this update, the Keystone credentials are configured in an external file that only the Ceph user can read.
Wildcard policies match objects with colons in the name
Previously, an error in a matching function prevented wildcards from matching beyond colons in object names. In this release, wildcard policies match objects whose names contain colons.
Lifecycle rules with multiple tag filters are no longer rejected
Due to a bug in lifecycle rule processing, an attempt to install the lifecycle rules with multiple tag filters was rejected and the InvalidRequest error message was returned. With this update, other rule forms are used, and lifecycle rules with multiple tag filters are no longer rejected.
An object can no longer be deleted when a bucket or user policy with DENY s3:DeleteObject exists
Previously, a method that evaluates policies returned an incorrect value, so an object could be deleted even if a bucket or user policy with DENY s3:DeleteObject existed. In this release, the correct value is returned.
The Ubuntu nfs_ganesha package did not install the systemd unit file properly
When running systemctl enable nfs-ganesha, the following error would be printed: Failed to execute operation: No such file or directory. This was because the nfs-ganesha-lock.service file was not created properly. With this release, the file is created properly and the nfs-ganesha service can be enabled successfully.
(BZ#1660063)
The Ceph Object Gateway supports a string as a delimiter
Invalid logic was used to find and project a delimiter sequence greater than one character. This was causing the Ceph Object Gateway to fail any request with a string as the delimiter, returning an invalid utf-8 character message. The logic handling the delimiter has been replaced by an 8-bit shift-carry equivalent. As a result, a string delimiter now works correctly. Red Hat has only tested this against the US-ASCII character set.
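For readers unfamiliar with multi-character delimiters, the following Python sketch models how an S3-style listing groups keys under common prefixes when the delimiter is a string rather than a single character. It illustrates the listing semantics only and is unrelated to the gateway's internal shift-carry logic; the list_with_delimiter function and the sample keys are hypothetical.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Split keys into (contents, common_prefixes) the way an S3-style listing does."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        position = rest.find(delimiter) if delimiter else -1
        if position < 0:
            # No delimiter after the prefix: the key is listed directly.
            contents.append(key)
        else:
            # Roll the key up to the first delimiter occurrence, inclusive.
            rolled_up = prefix + rest[:position + len(delimiter)]
            if rolled_up not in common_prefixes:
                common_prefixes.append(rolled_up)
    return contents, common_prefixes


if __name__ == "__main__":
    sample = ["a--b--1", "a--b--2", "a--c", "d"]
    print(list_with_delimiter(sample, delimiter="--"))
    print(list_with_delimiter(sample, prefix="a--", delimiter="--"))
```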
Mapping NFS exports to Object Gateway tenant user IDs works as expected
Previously, the NFS server for the Ceph Object Gateway (nfs-ganesha) did not correctly map Object Gateway tenants into their correct namespace. As a consequence, an attempt to map an NFS export onto Ceph Object Gateway with a tenanted user ID silently failed; the account could authenticate and NFS mounts could succeed, but the namespace did not contain buckets and objects. This bug has been fixed, and tenanted mappings are now set correctly. As a result, NFS exports can now be mapped to Object Gateway tenant user IDs and buckets and objects are visible as expected in the described situation.
An attempt to get bucket ACL for non-existing bucket returns an error as expected
Previously, an attempt to get bucket Access Control Lists (ACL) for a non-existent bucket by calling the GetBucketAcl() function returned a result instead of returning a NoSuchBucket error. This bug has been fixed, and the NoSuchBucket error is returned in the aforementioned situation.
(BZ#1667142)
The log level for gc_iterate_entries has been changed to 10
Previously, the log level for the gc_iterate_entries log message was set to 0. As a consequence, OSD log files included unnecessary information and could grow significantly. With this update, the log level for gc_iterate_entries has been changed to 10.
Garbage collection no longer consumes bandwidth without making forward progress
Previously, some underlying bugs prevented garbage collection (GC) from making forward progress. Specifically, the marker was not always being advanced, GC was unable to process entries with zero-length chains, and the truncated flag was not always being set correctly. This caused GC to consume bandwidth without making any forward progress, thereby not freeing up disk space, slowing down other cluster work, and allowing OMAP entries related to GC to continue to increase. With this update, the underlying bugs have been fixed, and GC is able to make progress as expected freeing up disk space and OMAP entries.
The radosgw-admin utility no longer gets stuck and creates high read operations when creating greater than 999 buckets per user
An issue with a limit check caused the radosgw-admin utility to never finish when creating 1,000 or more buckets per user. This problem has been fixed and radosgw-admin no longer gets stuck or creates high read operations.
LDAP authentication is available again
Previously, a logic error caused LDAP authentication checks to be skipped. Consequently, the LDAP authentication was not available. With this update, the checks for a valid LDAP authentication setup and credentials have been fixed, and LDAP authentication is available again.
(BZ#1687800)
NFS Ganesha no longer aborts when an S3 object name contains a // sequence
Previously, the NFS server for the Ceph Object Gateway (RGW NFS) would abort when an S3 object name contained a // sequence. With this update, RGW NFS ignores such sequences as expected and no longer aborts.
(BZ#1687970)
Expiration time is calculated the same as S3
Previously, the Ceph Object Gateway computed relative lifecycle expiration times for objects from the time of creation, rather than rounding them to midnight UTC as AWS does. This could cause the following error: botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the PutBucketLifecycleConfiguration operation: 'Date' must be at midnight GMT. Expiration is now rounded to midnight UTC for greater AWS compatibility.
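The rounding can be illustrated with a short Python sketch. It assumes the AWS-style rule of adding the configured number of days to the creation time and then rounding up to the next midnight UTC; the direction of rounding and the expiration_time helper are assumptions made for this example, not a description of the gateway's code.

```python
from datetime import datetime, timedelta, timezone


def expiration_time(created: datetime, days: int) -> datetime:
    """Add `days` to the creation time, then round up to midnight UTC.

    `created` should be a timezone-aware datetime.
    """
    due = created.astimezone(timezone.utc) + timedelta(days=days)
    midnight = due.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: a time already at midnight is kept; anything later moves to the next day.
    return midnight if due == midnight else midnight + timedelta(days=1)


if __name__ == "__main__":
    created = datetime(2019, 1, 1, 15, 30, tzinfo=timezone.utc)
    print(expiration_time(created, days=1).isoformat())
```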
Operations waiting for resharding to complete are able to complete after resharding
Previously, when using dynamic resharding, some operations that were waiting to complete after resharding failed to complete. This was due to code changes to the Ceph Object Gateway when automatically cleaning up no longer used bucket index shards. While this reduced storage demands and eliminated the need for manual clean up, the process removed one source of an identifier needed for operations to complete after resharding. The code has been updated so that identifier is retrieved from a different source after resharding and operations requiring it can now complete.
radosgw-admin bi put now sets the correct mtime time stamp
Previously, the radosgw-admin bi put command did not set the mtime time stamp correctly. This bug has been fixed.
Ceph Object Gateway lifecycle works properly after a bucket is resharded
Previously, after a bucket was resharded using the dynamic resharding feature, if a lifecycle policy was applied to the bucket, it did not complete and the policy failed to update the bucket. With this update to Red Hat Ceph Storage, a lifecycle policy is properly applied after resharding of a bucket.
The RGW server no longer returns an incorrect S3 error code NoSuchKey when asked to return non-existent CORS rules
Previously, the Ceph Object Gateway (RGW) server would return an incorrect S3 error code NoSuchKey when asked to return non-existent CORS rules. This caused the s3cmd tool and other programs to misbehave. With this update, the RGW server now returns NoSuchCORSConfiguration for this case, and the s3cmd tool and other programs that expect this error behave correctly.
Decrypting multipart uploads was corrupting data
When doing multipart uploads with SSE-C where the part size was not a multiple of the 4k encryption block size, the uploads were encrypted correctly, but the decryption process failed to account for the part boundaries and returned corrupted data. With this release, the decryption process correctly handles the part boundaries when using SSE-C. As a result, all encrypted multipart uploads can be successfully decrypted.
4.8. Object Gateway Multisite
Redundant multi-site replication sync errors were moved to debug level 10
A few multi-site replication sync errors were logged multiple times at log level 0 and consumed extra space in logs. This update moves the redundant messages to debug level 10 to hide them from the log.
Buckets with false entries can now be deleted as expected
Previously, bucket indices could include "false entries" that did not represent actual objects and that resulted from a prior bug. Consequently, during the process of deleting such buckets, encountering a false entry caused the process to stop and return an error code. With this update, when a false entry is encountered, Ceph ignores it, and deleting buckets with false entries works as expected.
Datalogs are now trimmed regularly as expected
Due to a regression in decoding of the JSON format of data sync status objects, automated datalog trimming logic was unable to query the sync status of its peer zones. Consequently, the datalog trimming process did not progress. This update fixes the JSON decoding and adds more regression test coverage for log trimming. As a result, datalogs are now trimmed regularly as expected.
Objects are now synced correctly in versioning-suspended buckets
Due to a bug in multi-site sync of versioning-suspended buckets, certain object versioning attributes were overwritten with incorrect values. Consequently, the objects failed to sync and attempted to retry endlessly, blocking further sync progress. With this update, the sync process no longer overwrites versioning attributes. In addition, any broken attributes are now detected and repaired. As a result, objects are synced correctly in versioning-suspended buckets.
radosgw-admin sync status
now shows timestamps for master zone
Previously in Ceph Object Gateway multisite, running radosgw-admin sync status
on the master zone did not show timestamps, which made it difficult to tell if data sync was making progress. This bug has been fixed, and timestamps are shown as expected.
Synchronizing a multi-site Ceph Object Gateway was getting stuck
When recovering versioned objects, other operations were unable to finish. These stuck operations were caused by removing all expired user.rgw.olh.pending
extended attributes (xattrs) on those versioned objects at once. Another bug was causing too many of the user.rgw.olh.pending
xattrs to be written to those recovering versioned objects. With this release, expired xattrs are removed in batches instead of all at once. As a result, versioned objects recover correctly and other operations can proceed normally.
A multi-site Ceph Object Gateway is not trimming the data and bucket index logs
Configuring zones for a multi-site Ceph Object Gateway without setting the sync_from_all
option was causing the data and bucket index logs not to be trimmed. With this release, the automated trimming process only consults the synchronization status of peer zones that are configured to synchronize. As a result, the data and bucket index logs are trimmed properly.
4.9. RADOS
A PG repair no longer sets the storage cluster to a warning state
Previously, a placement group (PG) that was being repaired was considered a damaged PG, which placed the storage cluster into a warning state. With this release, repairing a PG does not place the storage cluster into a warning state.
The ceph-mgr
daemon no longer crashes after starting balancer module in automatic mode
Previously, due to a CRUSH bug, invalid mappings were created. When an invalid mapping was encountered in the _apply_upmap
function, the code caused a segmentation fault. With this release, the code has been updated to check that the values are within an expected range. If not, the invalid values are ignored.
RocksDB compaction no longer exhausts free space of BlueFS
Previously, the balancing of free space between main storage and storage for RocksDB, managed by BlueFS, happened only when write operations were underway. This caused an ENOSPC
error for BlueFS to be returned when RocksDB compaction was triggered right before a long interval without write operations. With this update, the code has been modified to periodically check the free space balance even if no write operations are ongoing, so that compaction no longer exhausts the free space of BlueFS.
PGs per OSD limits have been increased
In some situations, such as widely varying disk sizes, the default limit on placement groups (PGs) per OSD could prevent PGs from going active. These limits have been increased by default to make this situation less likely.
Ceph installation no longer fails when FIPS mode is enabled
Previously, installing Red Hat Ceph Storage using the ceph-ansible
utility failed at TASK [ceph-mon : create monitor initial keyring]
when FIPS mode was enabled. To resolve this bug, the symmetric cipher cryptographic key is now wrapped with a one-shot wrapping key before it is used to instantiate the cipher. This allows Red Hat Ceph Storage to install normally when FIPS mode is enabled.
Slow request messages have been re-added to the OSD logs
Previously, slow request messages were removed from the OSD logs, which made debugging harder. This update re-adds these warnings to the OSD logs.
Force backfill and recovery preempt a lower priority backfill or recovery
Previously, force backfill or force recovery did not preempt an already running recovery or backfill process. As a consequence, although force backfill or force recovery set the priority to the maximum value, the recovery process for placement groups (PGs) already running at a lower priority finished first. With this update, force backfill and force recovery preempt a lower-priority backfill or recovery process.
Ceph Manager no longer crashes when two or more Ceph Object Gateway daemons use the same name
Previously, when two or more Ceph Object Gateway daemons used the same name in a cluster, Ceph Manager terminated unexpectedly. The underlying source code has been modified, and Ceph Manager no longer crashes in the described scenario.
A race condition was causing threads to deadlock with the standby ceph-mgr
daemon
Some threads could cause a race condition when acquiring a local lock and the Python global interpreter lock, which resulted in a deadlock: each thread held one of the locks and tried to acquire the other, but could not. In this release, the code was fixed to close the window of opportunity for the race condition by changing the location of the lock acquisition and releasing the appropriate locks. As a result, the threads no longer deadlock and progress can be made.
An OSD daemon no longer crashes when a block device has read errors
Previously, an OSD daemon would crash when a block device had read errors, because the daemon expected only a general EIO error code, not the more specific errors the kernel generates. With this release, low-level errors are mapped to EIO, resulting in an OSD daemon not crashing because of an unrecognized error code.
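The mapping described above can be sketched as follows. The set of specific error codes is a hypothetical example chosen for illustration; the release note does not list which codes the kernel returns or which ones the OSD daemon now maps, and the function name is not part of Ceph.

```python
# Illustrative sketch only; this is not Ceph code.
import errno

# Assumption: example error codes that are more specific than EIO.
# The actual list handled by the OSD daemon is not given in this document.
LOW_LEVEL_READ_ERRORS = {errno.ENODATA, errno.EILSEQ, errno.ETIMEDOUT}


def normalize_read_error(code):
    """Map a specific low-level read error to the general EIO code and
    leave every other code unchanged."""
    if code in LOW_LEVEL_READ_ERRORS:
        return errno.EIO
    return code


print(normalize_read_error(errno.ENODATA) == errno.EIO)
```

With a normalization step like this, code that handles only EIO no longer encounters an unrecognized error code.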
Read retries no longer cause the client to hang after a failed sync read
Previously, when an OSD daemon failed to sync read an object, the length of the object to be read was set to 0. This caused the read retry to incorrectly read the entire object. The underlying code has been fixed, and the read retry uses the correct length and does not cause the client to hang.
(BZ#1682966)
4.10. Block Devices (RBD)
The python-rbd list_snaps() method no longer segfaults after an error
This issue was discovered with OpenStack Cinder Backup when rados_connect_timeout
was set. Normally the timeout is not enabled. If the cluster was highly loaded, the timeout could be reached, causing the segfault. With this update to Red Hat Ceph Storage, a segfault no longer occurs if the timeout is reached.
Chapter 5. Technology previews
This section provides an overview of Technology Preview features introduced or updated in this release of Red Hat Ceph Storage.
Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information on Red Hat Technology Preview features support scope, see https://access.redhat.com/support/offerings/techpreview/.
5.1. Block Devices (RBD)
Erasure Coding for Ceph Block Devices
Erasure coding for Ceph Block Devices is supported as a Technology Preview. For details, see the Erasure Coding with Overwrites (Technology Preview) section in the Storage Strategies Guide for Red Hat Ceph Storage 3.
5.2. Ceph File System
Erasure Coding for Ceph File System
Erasure coding for Ceph File System is now supported as a Technology Preview. For details, see the Creating Ceph File Systems with erasure coding section in the Ceph File System Guide for Red Hat Ceph Storage 3.
5.3. Object Gateway
Improved interoperability with S3 and Swift by using a unified tenant namespace
This enhancement allows buckets to be moved between tenants. It also allows buckets to be renamed.
In Red Hat Ceph Storage 2 the rgw_keystone_implicit_tenants
option only applied to Swift. As of Red Hat Ceph Storage 3, this option also applies to S3. Sites that used this feature with Red Hat Ceph Storage 2 now have outstanding data that depends on the old behavior. To accommodate such sites, this enhancement also expands rgw_keystone_implicit_tenants
so it can be set to any of "none", "all", "s3", or "swift".
For more information, see Bucket management in the Object Gateway Guide for Red Hat Enterprise Linux or Object Gateway Guide for Ubuntu depending on your distribution. The rgw_keystone_implicit_tenants
setting is documented in the Using Keystone to Authenticate Ceph Object Gateway Users guide.
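The four values of rgw_keystone_implicit_tenants listed above can be summarized in a short sketch. The interpretation that "all" covers both S3 and Swift, and that "s3" and "swift" each restrict the behavior to one protocol, is an assumption inferred from the value names. The function name is hypothetical and is not part of Ceph.

```python
# Illustrative sketch only; this is not Ceph code.
VALID_VALUES = ("none", "all", "s3", "swift")  # values listed in this release note


def implicit_tenant_protocols(value):
    """Return the set of protocols that would get implicit tenants
    for a given setting (assumed semantics)."""
    if value not in VALID_VALUES:
        raise ValueError(f"unsupported value: {value!r}")
    if value == "none":
        return set()
    if value == "all":
        return {"s3", "swift"}
    return {value}


print(sorted(implicit_tenant_protocols("all")))  # prints ['s3', 'swift']
```

Under this reading, a site that relied on the Red Hat Ceph Storage 2 behavior would choose "swift" to keep implicit tenants limited to Swift.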
AWS4 signature support in S3 authentication for Ceph Object Gateway when using Keystone
With this update, S3 user authentication using the new AWS4 signatures as a part of the Keystone service is supported as a Technology Preview.
The Ceph Object Gateway supports a subset of the Amazon Secure Token Service (STS) REST APIs. STS Lite is one supported API. It provides access to a set of temporary credentials for identity and access management. For more information, see Authentication using the STS Lite API (Technology Preview) in the Developer Guide.
The Beast HTTP front end
This update adds a new Ceph Object Gateway HTTP front end called Beast as a Technology Preview. The Beast front end uses the Boost.Beast library for HTTP parsing and the Boost.Asio library for asynchronous I/O.
Experimental support for delegated authorization using the Open Policy Agent (OPA)
The Open Policy Agent is a distributed policy-based authorization framework being incubated in the Cloud Native Computing Foundation (CNCF). This feature is in development and is not to be used in a production environment.
Chapter 6. Known issues
This section documents known issues found in this release of Red Hat Ceph Storage.
6.1. The ceph-ansible
Utility
The shrink-osd.yml
playbook currently has no support for removing OSDs created by ceph-volume
The shrink-osd.yml
playbook assumes all OSDs are created by the ceph-disk
utility. Consequently, OSDs deployed by using the ceph-volume
utility cannot be shrunk.
To work around this issue, remove OSDs deployed by using ceph-volume
manually.
Partitions are not removed from NVMe devices by shrink-osd.yml in certain situations
The Ansible playbook infrastructure-playbooks/shrink-osd.yml
does not properly remove partitions on NVMe devices when used with osd_scenario: non-collocated
in containerized environments.
To work around this issue, manually remove the partitions.
When putting a dedicated journal on an NVMe device, installation can fail
When the dedicated_devices
setting contains an NVMe device that has partitions or signatures on it, the Ansible installation might fail with an error like the following:
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
To work around this issue, ensure there are no partitions or signatures on the NVMe device.
When deploying Ceph NFS Ganesha gateways on Ubuntu IPv6 systems ceph-ansible may fail to start the nfs-ganesha services
This issue causes Ceph NFS Ganesha gateways to fail to deploy.
To work around this issue, rerun the ceph-ansible playbook site.yml
to deploy only the Ceph NFS Ganesha gateways:
[root@ansible ~]# ansible-playbook /usr/share/ceph-ansible/site.yml --limit nfss
When using dedicated devices for BlueStore the default sizes for block.db and block.wal might be too small
By default ceph-ansible
does not override the default values bluestore block db size
and bluestore block wal size
. The default sizes are 1 GB and 576 MB respectively. These sizes might be too small when using dedicated devices with BlueStore.
To work around this issue, set bluestore_block_db_size
or bluestore_block_wal_size
, or both, using ceph_conf_overrides
in ceph.conf
to override the default values.
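As a worked example of the arithmetic involved, the following sketch converts sizes into plain byte counts. It assumes that both options take a value in bytes and that the 1 GB and 576 MB figures above are binary units. The 30 GiB and 2 GiB targets are hypothetical values chosen only for illustration, not sizing recommendations.

```python
# Illustrative sketch only; the target sizes are hypothetical.
GIB = 1024 ** 3
MIB = 1024 ** 2

# Default sizes stated in this release note, expressed in bytes.
DEFAULT_SIZES = {
    "bluestore_block_db_size": 1 * GIB,     # 1073741824
    "bluestore_block_wal_size": 576 * MIB,  # 603979776
}


def render_overrides(sizes):
    """Format option names and byte counts as 'option = value' lines."""
    return [f"{option} = {size}" for option, size in sizes.items()]


# Hypothetical larger sizes for dedicated devices.
example_sizes = {
    "bluestore_block_db_size": 30 * GIB,
    "bluestore_block_wal_size": 2 * GIB,
}
for line in render_overrides(example_sizes):
    print(line)
```

The printed values are the numbers that would be supplied for the two options when overriding the defaults.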
6.2. Ceph Management Dashboard
Ceph OSD encryption summary is not displayed in the Red Hat Ceph Storage Dashboard
On the Ceph OSD Information dashboard, under the OSD Summary panel, the OSD Encryption Summary information is not displayed.
There is no workaround at this time.
The Prometheus node-exporter
service is not removed after purging the Dashboard
When purging the Red Hat Ceph Storage Dashboard, the node-exporter
service is not removed, and is still running.
To work around this issue, manually stop and remove the node-exporter
service.
Perform the following commands as root
:
# systemctl stop prometheus-node-exporter
# systemctl disable prometheus-node-exporter
# rpm -e prometheus-node-exporter
# reboot
For Ceph Monitor, OSD, Object Gateway, MDS, and Dashboard nodes, reboot these one at a time.
The OSD down
tab shows an incorrect value
When rebooting OSDs, the OSD down
tab in the CEPH Backend storage
dashboard shows the correct number of OSDs that are down
. However, when all OSDs are up
again after the reboot, the tab continues showing the number of down
OSDs.
There is no workaround at this time.
The Top 5 pools by Throughput graph lists all pools
The Top 5 pools by Throughput graph in the Ceph Pools tab lists all pools in the cluster instead of listing only the top five pools with the highest throughput.
There is no workaround at this time.
The MDS Performance dashboard displays the wrong value for Clients after increasing and decreasing the number of active MDS servers and clients multiple times.
This issue causes the Red Hat Ceph Storage dashboard to display the wrong number of CephFS clients. This can be verified by comparing the value in the Red Hat Ceph Storage dashboard with the value printed by the ceph fs status $FILESYSTEM_NAME
command.
There is no workaround at this time.
Request Queue Length
displays an incorrect value
In the Ceph RGW Workload
dashboard, the Request Queue Length
parameter always displays 0
even when running Ceph Object Gateway I/O from different clients.
There is no workaround at this time.
Capacity Utilization in Ceph - At Glance dashboard shows the wrong value when an OSD is down
This issue causes the Red Hat Ceph Dashboard to show capacity utilization which is less than what ceph df
shows.
There is no workaround at this time.
Some links on the Ceph - At Glance page do not work after installing ceph-metrics
After installing ceph-metrics
, some of the panel links on the Ceph - At Glance page in the Ceph Dashboard do not work.
To work around this issue, clear the browser cache and reload the Ceph Dashboard site.
The iSCSI Overview dashboard does not display graphs if the [iscsigws] role is included in the Ansible inventory file.
When deploying the Red Hat Ceph Storage Dashboard, the iSCSI Overview dashboard does not display any graphs or values if the Ansible inventory file has the [iscsigws] role included for iSCSI gateways.
To work around this issue, add [iscsis]
as a role in the Ansible inventory file and run the Ansible playbook for cephmetrics-ansible
. The iSCSI Overview dashboard then displays the graphs and values.
In the Ceph Cluster dashboard the Pool Capacity graphs display values higher than actual capacity
This issue causes the Pool Capacity graph to display values around one percent higher than what df --cluster
shows.
There is no workaround at this time.
Graphs on the OSD Node Detail dashboard might appear incorrect when used with All
Graphs generated under OSD Node Detail > OSD Host Name > All do not show all OSDs in the cluster. A graph with data for hundreds or thousands of OSDs would not be usable. The ability to set All is intended to show cluster-wide values. For some dashboards it does not make sense and should not be used.
There is no workaround at this time.
6.3. Ceph File System
The Ceph Metadata Server might crash during scrub with multiple MDS
This issue is triggered when the scrub_path
command is run in an environment with multiple Ceph Metadata Servers.
There is no workaround at this time.
6.4. The ceph-volume
Utility
Deploying an OSD on devices with GPT headers fails
Drives with GPT headers will cause an error to be returned by LVM when deploying an OSD on them. The error says the device has been excluded by a filter.
To work around this issue, ensure there is no GPT header present on devices to be used by OSDs.
6.5. iSCSI Gateway
Using ceph-ansible
to deploy the iSCSI gateway does not allow the user to adjust the max_data_area_mb
option
Using the max_data_area_mb
option with the ceph-ansible
utility sets a default value of 8 MB. To adjust this value, set it manually using the gwcli
command. See the Red Hat Ceph Storage Block Device Guide for details on setting the max_data_area_mb
option.
Ansible fails to purge RBD images with snapshots
The purge-iscsi-gateways.yml
Ansible playbook does not purge RBD images with snapshots. To purge the images and their snapshots, use the rbd
command-line utility:
To purge a snapshot:
rbd snap purge pool-name/image-name
For example:
# rbd snap purge data/image1
To delete an image:
rbd rm image-name
For example:
# rbd rm image1
6.6. Object Gateway
Ceph Object Gateway garbage collection decreases client performance by up to 50% during mixed workload
In testing during a mixed workload of 60% read operations, 16% write operations, 14% delete operations, and 10% list operations, at 18 hours into the testing run, client throughput and bandwidth drop to half their earlier levels.
Pushing a docker image to the Ceph Object Gateway over s3 does not complete
In certain situations when configuring docker-distribution
to use Ceph Object Gateway with the s3 interface the docker push
command does not complete. Instead the command fails with an HTTP 500 error.
There is no workaround at this time.
Delete markers are not removed with a lifecycle configuration
In certain situations, after a file is deleted and a lifecycle rule triggers, delete markers are not removed.
There is no workaround at this time.
The Ceph Object Gateway’s S3 does not always work in FIPS mode
If a secret key of a Ceph Object Gateway user or sub-user is less than 112 bits in length, it can cause the radosgw
daemon to exit unexpectedly when a user attempts to authenticate using S3.
This is because the FIPS mode Red Hat Enterprise Linux security policy forbids construction of a cryptographic HMAC based on a key of less than 112 bits, and violation of this constraint yields an exception that is not correctly handled in Ceph Object Gateway.
To work around this issue, ensure that the secret keys of Ceph Object Gateway users and sub-users are at least 112 bits in length.
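The length requirement in the workaround can be checked with a short sketch. It assumes that the key length is counted as 8 bits per byte of the UTF-8 encoded secret key, so 112 bits corresponds to 14 bytes. The function names and the sample keys are hypothetical.

```python
# Illustrative sketch only; this is not Ceph code.
MIN_KEY_BITS = 112  # minimum HMAC key length stated in this release note


def key_bits(secret_key):
    """Return the key length in bits, counting 8 bits per encoded byte."""
    return len(secret_key.encode("utf-8")) * 8


def meets_fips_minimum(secret_key):
    """Return True if the key is at least MIN_KEY_BITS long."""
    return key_bits(secret_key) >= MIN_KEY_BITS


print(meets_fips_minimum("short"))           # prints False (40 bits)
print(meets_fips_minimum("abcdefghijklmn"))  # prints True (112 bits)
```

A check like this could be run against existing user and sub-user secret keys before enabling FIPS mode.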
6.7. RADOS
Performing I/O in CephFS erasure-coded pools can cause a failure on assertion
This issue is being investigated as a possible latent bug in the messenger layer which could be causing out of order operations on the OSD.
The issue causes the following error:
FAILED assert(repop_queue.front() == repop)
There is no workaround at this time. CephFS with erasure-coded pools is a Technology Preview. For more information, see Creating Ceph File Systems with erasure coding in the Ceph File System Guide.
Chapter 7. Deprecated functionality
This section provides an overview of functionality that has been deprecated in all minor releases up to this release of Red Hat Ceph Storage.
7.1. The ceph-ansible
Utility
The rgw_dns_name
parameter
The rgw_dns_name
parameter is deprecated. Instead, configure the RADOS Gateway (RGW) zonegroup with the RGW DNS name. For more information, see: Ceph - How to add hostnames in RGW zonegroup in the Red Hat Customer Portal.
Chapter 8. Sources
The updated Red Hat Ceph Storage source code packages are available at the following locations:
- For Red Hat Enterprise Linux: http://ftp.redhat.com/redhat/linux/enterprise/7Server/en/RHCEPH/SRPMS/
- For Ubuntu: https://rhcs.download.redhat.com/ubuntu/