Chapter 3. New features
This section lists all major updates, enhancements, and new features introduced in this release of Red Hat Ceph Storage.
Compression on-wire with msgr2 protocol is now available
With this release, in addition to encryption on wire, compression on wire is also supported to secure network operations within the storage cluster.
See the Encryption and key management section in the Red Hat Ceph Storage Data Security and Hardening Guide for more details.
Python notifications are more efficient
Previously, some notifications were issued that no modules needed at the moment, which caused inefficiency.
With this release, the NotifyType parameter is introduced. Modules now annotate which events they consume, for example NotifyType.mon_map, NotifyType.osd_map, and the like. As a consequence, only events that modules ask for are queued, and events that no module consumes are no longer issued. Because of these changes, Python notifications are now more efficient.
The changes to pg_num are limited
Previously, if drastic changes were made to pg_num that outpaced pgp_num, the user could hit the per-osd placement group limits and cause errors.
With this release, the changes to pg_num are limited to avoid the issue with per-osd placement group limits.
New pg_progress item is created to avoid dumping all placement group statistics for progress updates
Previously, the pg_dump item included unnecessary fields that wasted CPU when copied to python-land. This tended to cause long ClusterState::lock hold times, leading to long ms_dispatch delays and generally slowing down processing.
With this release, a new pg_progress item is created to dump only the fields that mgr tasks or progress needs.
The mgr_ip is no longer re-fetched
Previously, the mgr_ip had to be re-fetched during the lifetime of an active Ceph Manager module.
With this release, the mgr_ip does not change for the lifetime of an active Ceph Manager module; therefore, there is no need to call back into Ceph Manager to re-fetch it.
WORM compliance is now supported
Red Hat now supports WORM compliance.
See the Enabling object lock for S3 for more details.
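As an illustration only, object lock can be enabled with any S3-compatible client, such as the AWS CLI. The endpoint URL, bucket name, and retention period below are placeholders, not values taken from this release; see the referenced section for the supported procedure.
Example
aws --endpoint-url=http://rgw.example.com:8080 s3api create-bucket \
    --bucket worm-bucket --object-lock-enabled-for-bucket
aws --endpoint-url=http://rgw.example.com:8080 s3api put-object-lock-configuration \
    --bucket worm-bucket \
    --object-lock-configuration '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}}}'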
Set rate limits on users and buckets
With this release, you can set rate limits on users and buckets based on the operations in a Red Hat Ceph Storage cluster. See the Rate limits for ingesting data for more details.
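As a rough sketch, rate limits are managed with radosgw-admin. The user ID, bucket name, and limit values below are placeholders; see the referenced section for the authoritative syntax.
Example
# Limit a user to 1024 read and 512 write operations per time window, then enable the limit
radosgw-admin ratelimit set --ratelimit-scope=user --uid=test-user \
    --max-read-ops=1024 --max-write-ops=512
radosgw-admin ratelimit enable --ratelimit-scope=user --uid=test-user
# Limit a bucket by read bandwidth, then enable the limit
radosgw-admin ratelimit set --ratelimit-scope=bucket --bucket=test-bucket \
    --max-read-bytes=10485760
radosgw-admin ratelimit enable --ratelimit-scope=bucket --bucket=test-bucket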
librbd plugin named persistent write log cache to reduce latency
With this release, the new librbd plugin named Persistent Write Log Cache (PWL) provides a persistent, fault-tolerant write-back cache targeted at SSD devices. It greatly reduces latency and also improves performance at low io_depths. This cache uses a log-ordered write-back design which maintains checkpoints internally, so that writes that get flushed back to the cluster are always crash consistent. Even if the client cache is lost entirely, the disk image is still consistent, but the data will appear to be stale.
Ceph File System (CephFS) now supports high availability asynchronous replication for snapshots
Previously, only one cephfs-mirror daemon could be deployed per storage cluster, so CephFS supported asynchronous replication of snapshot directories without high availability.
With this release, multiple cephfs-mirror daemons can be deployed on two or more nodes to achieve concurrency in snapshot synchronization, thereby providing high availability.
See the Ceph File System mirroring section in the Red Hat Ceph Storage File System Guide for more details.
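A minimal deployment sketch with the orchestrator, assuming host01 and host02 are placeholder host names and that the cephfs-mirror service is scheduled through ceph orch:
Example
ceph orch apply cephfs-mirror --placement="2 host01 host02"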
BlueStore is upgraded to V3
With this release, the BlueStore object store is upgraded to V3. The following are the two new features:
- The allocation metadata is removed from RocksDB, and a full destage of the allocator object is now performed with the OSD allocation.
- With cache age binning, older onodes might be assigned a lower priority than hot workload data. See the Ceph BlueStore section for more details.
Use cephadm to manage operating system tuning profiles
With this release, you can use cephadm to create and manage operating system tuning profiles for better performance of the Red Hat Ceph Storage cluster. See the Managing operating system tuning profiles with `cephadm` for more details.
The new cephfs-shell option is introduced to mount a filesystem by name
Previously, cephfs-shell could only mount the default filesystem.
With this release, a CLI option is added in cephfs-shell that allows the mounting of a different filesystem by name, that is, something analogous to the mds_namespace= or fs= options for kclient and ceph-fuse.
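For comparison, selecting a non-default file system with the kernel client and ceph-fuse looks roughly as follows; the monitor address, file system name, credentials, and mount points are placeholders, and the ceph-fuse --client_fs option is assumed here rather than quoted from this release:
Example
mount -t ceph 192.168.0.10:6789:/ /mnt/cephfs2 -o name=admin,fs=mycephfs2
ceph-fuse /mnt/cephfs2 --client_fs mycephfs2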
Day-2 tasks can now be performed through the Ceph Dashboard
With this release, on the Ceph Dashboard, users can perform day-2 tasks, that is, tasks that require daily or weekly action. This enhancement improves the Dashboard's capabilities and customer experience, and strengthens its usability and maturity. In addition, new on-screen elements are included to help and guide users in retrieving the additional information needed to complete a task.
3.1. The Cephadm utility
OS tuning profiles added to manage kernel parameters using cephadm
With this release, to achieve feature parity with ceph-ansible, users can apply tuned profile specifications that cause cephadm to set OS tuning parameters on the hosts matching the specifications.
See the Managing operating system tuning profiles with `cephadm` for more details.
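A minimal sketch of a tuned profile specification and how it might be applied; the profile name, hosts, and sysctl settings are placeholders, and the exact spec fields should be confirmed in the referenced section.
Example
# latency-profile.yaml
profile_name: latency-profile
placement:
  hosts:
    - host01
    - host02
settings:
  fs.file-max: "1000000"
  vm.swappiness: "10"
Apply the specification:
ceph orch tuned-profile apply -i latency-profile.yaml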
Users can now easily set the Prometheus TSDB retention size and time in the Prometheus specification
Previously, users could not modify the default 15-day retention period and disk consumption settings of Prometheus.
With this release, users can customize these settings through cephadm so that they are persistently applied, making it easier for users to specify how much data, and for how long, their Prometheus instances should retain.
The format for achieving this is as follows:
Example
service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "1y"
  retention_size: "1GB"
New Ansible playbook is added to define an insecure registry
Previously, when deploying a Red Hat Ceph Storage cluster with a large number of hosts in a disconnected installation environment, it was tedious to populate the /etc/containers/registries.conf file on each host.
With this release, a new Ansible playbook is added to define an insecure registry in the /etc/containers/registries.conf file. Therefore, the deployment of such a Ceph cluster in a disconnected installation environment is now easier, as the user can populate /etc/containers/registries.conf with this new playbook.
3.2. Ceph Dashboard
Improved Ceph Dashboard features for rbd-mirroring are now available
Previously, there was no Ceph Block Device Snapshot mirroring support from the user interface.
With this release, the Ceph Block Device Mirroring tab on the Ceph Dashboard is enhanced with the following features that were previously present only in the command-line interface (CLI):
- Support for enabling or disabling mirroring in images.
- Support for promoting and demoting actions.
- Support for resyncing images.
- Improved visibility for editing site names and creating bootstrap keys.
- A blank page with a button to automatically create an rbd-mirror daemon if none exists.
A new logging functionality is added to the Ceph dashboard
With this release, a centralized logging functionality for a single cluster, named Daemon Logs, is implemented on the dashboard under the Cluster menu.
A new TTL cache is added between the Ceph Manager and its modules
Big Ceph clusters generate a lot of data, which might overload the cluster and render modules unsuitable.
With this release, a new TTL cache is added between the Ceph Manager and its modules to help alleviate loads and prevent the cluster from overloading.
A new information message is provided on Ceph Dashboard to troubleshoot issues with Grafana
When Grafana is deployed with self-signed TLS certificates instead of certificates signed by a Certificate Authority, most browsers, such as Chrome or Firefox, do not allow the embedded Grafana iframe to be displayed within the Ceph Dashboard.
This is a security limitation imposed by browsers themselves. Some browsers, like Firefox, display a security warning (Your connection is not secure) but still allow users to accept the exception and load the embedded Grafana iframe. However, other browsers, for example Chrome, silently fail and do not display any kind of error message, so users were not aware of the failure.
With this release, a new notification is displayed on the Ceph Dashboard:
If no embedded Grafana Dashboard appeared below, please follow this link to check if Grafana is reachable and there are no HTTPS certificate issues. You may need to reload this page after accepting any Browser certificate exceptions.
The number of repaired objects in pools is exposed under Prometheus metrics
Previously, data regarding auto-repaired objects was gathered through log parsing which was inefficient.
With this release, the number of repaired objects per pool is now exposed as Prometheus metrics on the Ceph Dashboard.
Ceph Dashboard now clearly indicates errors on certain CephFS operations
Previously, when a user tried to perform an operation on a file system directory but did not have permission, the Ceph Dashboard reported a generic 500 internal server-side error. However, these errors are actually attributable to the user, since the permissions exist precisely to prevent certain actions for given users.
With this release, when the user tries to perform an unauthorized operation, they receive a clear explanation on the permission error.
Users can now see new metrics for different storage classes in Prometheus
With this release, three new metrics, ceph_cluster_by_class_total_bytes, ceph_cluster_by_class_total_used_bytes, and ceph_cluster_by_class_total_used_raw_bytes, are added for the different storage classes in Prometheus, filtered by device class, which helps to track the performance and capacity of the infrastructure.
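As an illustrative PromQL query built on these metrics, the following computes the used-capacity ratio for one device class; the device_class label name is an assumption and should be checked against the exported metrics.
Example
ceph_cluster_by_class_total_used_bytes{device_class="ssd"} / ceph_cluster_by_class_total_bytes{device_class="ssd"}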
The WAL and DB devices now get filters pre-selected automatically
Previously, the user had to manually apply filters to the selected WAL or DB devices, which was a repetitive task.
With this release, when the user selects devices in the primary devices table, the appropriate filters are pre-selected for WAL and DB devices.
A new shortcut button for silencing alerts is added
With this release, users can create a silence for every alert in the notification bar on the Ceph Dashboard using the newly created silence shortcut.
Users can now add server side encryption to the Ceph Object Gateway bucket from the Dashboard
Previously, there was no option on the Ceph Dashboard to add server side encryption (SSE) to the Ceph Object Gateway buckets.
With this release, it is now possible to add SSE while creating the Ceph Object Gateway bucket through the Ceph Dashboard.
Cross origin resource sharing is now allowed
Previously, IBM developers were facing issues with their Storage Insights product when they tried to ping the REST API from their front end, because of the tight cross origin resource sharing (CORS) policies set up in the REST API.
With this release, the cross_origin_url option is added, which can be set to a particular URL. The REST API then allows communication only with that URL.
Example
[ceph: root@host01 /]# ceph config set mgr mgr/dashboard/cross_origin_url http://localhost:4200
3.3. Ceph File System
Users can now set and manage quotas on subvolume group
Previously, the user could only apply quotas to individual subvolumes.
With this release, the user can now set, apply, and manage quotas for a given subvolume group, which is especially useful when working in a multi-tenant environment.
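A minimal sketch of setting and then changing a subvolume group quota; the volume name, group name, and byte sizes are placeholders.
Example
# Create a subvolume group with a 10 GiB quota, then grow it to 20 GiB
ceph fs subvolumegroup create cephfs group01 --size 10737418240
ceph fs subvolumegroup resize cephfs group01 21474836480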
The Ceph File System client can now track average read, write, and metadata latencies
Previously, the Ceph File System client would track only the cumulative read, write, and metadata latencies. However, average read, write, and metadata latencies are more useful to the user.
With this feature, the client can track average latencies and forward them to the metadata server to be displayed in the perf stats command output and the cephfs-top utility.
The cephfs-top utility is improved with support for multiple file systems
Previously, the cephfs-top utility was not reliable with multiple file systems. Moreover, there was no option to display the metrics for only a selected file system.
With this feature, the cephfs-top utility supports multiple file systems and it is now possible to select an option to see the metrics related to a particular file system.
Users can now use the fs volume info command to display basic details about a volume
Previously, there was no command in Ceph File System to list only the basic details about a volume.
With this release, the user can list the basic details about a volume by running the fs volume info command.
See Viewing information about a Ceph file system volume in the Red Hat Ceph Storage File System Guide.
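For example, assuming a volume named cephfs:
Example
ceph fs volume info cephfs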
Users can list the in-progress or pending clones for a subvolume snapshot
Previously, there was no way of knowing the set of clone operations in progress or pending for a subvolume snapshot, unless the user knew the clone subvolume's name and used the clone status command to infer the details.
With this release, for a given subvolume snapshot name, the in-progress or pending clones can be listed.
Users can now use the --human-readable flag with the fs volume info command
Previously, all the sizes were displayed only in bytes when running the fs volume info command.
With this release, users can see the sizes along with the units when running the fs volume info command.
3.4. Ceph Object Gateway
New S3 bucket lifecycle notifications are now generated
With this release, S3 bucket notifications are generated for current-version expiration, non-current-version expiration, and delete-marker expiration performed by lifecycle processing. This capability is potentially useful for application workflows, among other potential uses.
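A hypothetical notification configuration subscribing to lifecycle expiration events with the AWS CLI; the endpoint, bucket, topic ARN, and especially the event name are assumptions that should be verified against the bucket notification documentation.
Example
aws --endpoint-url=http://rgw.example.com:8080 s3api put-bucket-notification-configuration \
    --bucket test-bucket \
    --notification-configuration '{"TopicConfigurations": [{"Id": "lc-events", "TopicArn": "arn:aws:sns:default::lc-topic", "Events": ["s3:ObjectLifecycle:Expiration:*"]}]}'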
The objects are transitioned to the S3 cloud endpoint as per the set lifecycle rules
In Red Hat Ceph Storage, a special storage class of tier type cloud-s3 is used to configure the remote cloud S3 object store service to which the data needs to be transitioned. These classes are defined in terms of zone group placement targets and, unlike regular storage classes, do not need a data pool.
With this release, users can transition Ceph Object Gateway objects from a Ceph Object Gateway server to a remote S3 cloud endpoint through storage classes. However, the transition is unidirectional; data cannot be transitioned back from the remote server.
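A rough configuration sketch for such a storage class; the zone group, placement target, storage class name, endpoint, and credentials are placeholders.
Example
# Add a cloud-s3 storage class to the placement target
radosgw-admin zonegroup placement add --rgw-zonegroup=default \
    --placement-id=default-placement --storage-class=CLOUDTIER --tier-type=cloud-s3
# Point the storage class at the remote S3 endpoint
radosgw-admin zonegroup placement modify --rgw-zonegroup=default \
    --placement-id=default-placement --storage-class=CLOUDTIER \
    --tier-config=endpoint=http://s3.example.com:80,access_key=ACCESS_KEY,secret=SECRET_KEY,target_path=rgw-tier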
The Ceph Object Gateway S3 policy errors are now more useful
Previously, Ceph Object Gateway S3 policy error messages were opaque and not very useful. The initial issue with not being able to access data in the buckets after upgrading versions seemed to be the result of an accepted but invalid principal being ignored silently on ingest but rejected on use later due to a code change.
With this release, the policy now prints detailed and useful error messages. There is also a new rgw-policy-check command that lets policy documents be tested on the command line, and a new option, rgw policy reject invalid principals, that is false by default and that rejects, with an error message, invalid principals on ingest only, rather than ignoring them without error.
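As an illustration only, a policy document might be checked and the new behavior enabled as follows; the tenant name, file name, and the underscore-separated form of the configuration option are assumptions, not quoted from this release.
Example
rgw-policy-check -t tenant1 bucket-policy.json
ceph config set client.rgw rgw_policy_reject_invalid_principals true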
Level 20 Ceph Object Gateway log messages are reduced when updating bucket indices
With this release, Ceph Object Gateway level 20 log messages emitted when updating bucket indices are reduced, removing messages that do not add value and reducing the size of the logs.
3.5. Multi-site Ceph Object Gateway
Lifecycle policy now runs on all zones in multi-site configurations
With this release, lifecycle policy runs on all zones in multi-site Red Hat Ceph Storage configurations, which makes lifecycle processing more resilient in these configurations. The changes are made to also permit new features, such as conditional processing in archive zones.
Multi-site configuration supports dynamic bucket index resharding
Previously, only manual resharding of the buckets for multi-site configurations was supported.
With this release, dynamic bucket resharding is supported in multi-site configurations. Once the storage clusters are upgraded, enable the resharding feature and reshard the buckets either manually with the radosgw-admin bucket reshard command or automatically with dynamic resharding, independently of other zones in the storage cluster.
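A rough sketch of enabling the feature after the upgrade; the zone group and zone names are placeholders.
Example
# Enable the resharding feature on the zone group and commit the period
radosgw-admin zonegroup modify --rgw-zonegroup=us --enable-feature=resharding
radosgw-admin period update --commit
# Enable the feature per zone as needed
radosgw-admin zone modify --rgw-zone=us-east --enable-feature=resharding
radosgw-admin period update --commit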
Sites can now customize the STS max-session-duration parameter with the role interface
Previously, the max-session-duration parameter controlling the duration of STS sessions could not be configured because it was not exposed on the interface.
With this release, it is possible to customize the STS max-session-duration parameter through the role interface.
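For illustration, the parameter could be set with any IAM-compatible client, such as the AWS CLI; the endpoint, role name, trust policy file, and duration are placeholders.
Example
aws --endpoint-url=http://rgw.example.com:8080 iam create-role \
    --role-name S3Access \
    --assume-role-policy-document file://trust-policy.json \
    --max-session-duration 43200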
3.6. RADOS
Kafka SASL/SCRAM security mechanism is added to bucket notifications
With this release, Kafka SASL/SCRAM security mechanism is added to bucket notifications.
To know how to use the feature, refer to the "kafka" section in Creating a topic. Note that the end-to-end configuration for the feature, in the case of testing, is outside the scope of Ceph.
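A hypothetical topic creation against a SASL/SCRAM-protected Kafka broker; the endpoint, credentials, broker address, and attribute names (in particular mechanism) are assumptions that should be verified against the Creating a topic documentation.
Example
aws --endpoint-url=http://rgw.example.com:8080 sns create-topic --name=kafka-scram-topic \
    --attributes='{"push-endpoint": "kafka://myuser:mypassword@kafka.example.com:9093", "use-ssl": "true", "mechanism": "SCRAM-SHA-256"}'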
Low-level log messages are introduced to warn user about hitting throttle limits
Previously, there was a lack of low-level logging indication that throttle limits were hit, causing these occurrences to incorrectly have the appearance of a networking issue.
With this release, the introduction of low-level log messages makes it much clearer that the throttle limits are hit.
The user is warned about Filestore deprecation through the ceph status and ceph health detail commands
BlueStore is the default and widely used object store.
With this release, if there are any OSDs that are on Filestore, the storage cluster goes into the HEALTH_WARN status due to the OSD_FILESTORE health check. The end user has to migrate the OSDs which are on Filestore to BlueStore to clear this warning.
Users can now take advantage of tunable KernelDevice buffers in BlueStore
With this release, users can configure custom alignment for read buffers using the bdev_read_buffer_alignment option in BlueStore. This removes the limitation imposed by the default 4 KiB alignment when buffers are intended to be backed by huge pages.
Additionally, BlueStore, through KernelDevice, gets a configurable pool of MAP_HUGETLB-based read buffers, controlled by the bdev_read_preallocated_huge_buffer_num parameter, for workloads with cache-unfriendly access patterns whose buffers undergo recycling and are not cacheable.
Taken together, these features allow shortening the scatter-gather list that is passed by the storage component to NICs, thereby improving the handling of huge page-based read buffers in BlueStore.
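A minimal tuning sketch using the options named above; the 2 MiB alignment and buffer count are illustrative values only.
Example
ceph config set osd bdev_read_buffer_alignment 2097152
ceph config set osd bdev_read_preallocated_huge_buffer_num 128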
OSDs report the slow operation details in an aggregated format to the Ceph cluster log
Previously, slow requests would overwhelm a cluster log with too many details, filling up the monitor database.
With this release, slow requests, broken down by operation type and by pool information, are logged to the cluster log in an aggregated format.
Users can now blocklist a CIDR range
With this release, you can blocklist a CIDR range, in addition to individual client instances and IPs. In certain circumstances, you would want to blocklist all clients in an entire data center or rack instead of specifying individual clients to blocklist.
For example, failing over a workload to a different set of machines and wanting to prevent the old workload instance from continuing to partially operate.
This is now possible using a "blocklist range" analogous to the existing "blocklist" command.
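For example, assuming the 192.168.61.0/24 network holds the old workload instances:
Example
ceph osd blocklist range add 192.168.61.0/24
ceph osd blocklist ls
ceph osd blocklist range rm 192.168.61.0/24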
3.7. RADOS Block Devices (RBD)
librbd SSD-based persistent write-back cache to reduce latency is now fully supported
With this release, the pwl_cache librbd plugin provides a log-structured write-back cache targeted at SSD devices. Just as with the already provided log-structured write-back cache targeted at PMEM devices, updates to the image are batched and flushed in order, retaining the actual image in a crash-consistent state. The benefits and use cases remain the same, but users no longer need to procure more expensive PMEM devices to take advantage of them.
The librbd compare-and-write operation is improved and a new rbd_aio_compare_and_writev API method is introduced
- The semantics of the compare-and-write C++ API now match those of the C API. Previously, the compare-and-write C++ API, that is, the Image::compare_and_write and Image::aio_compare_and_write methods, would compare up to the size of the compare buffer. This would cause breakage after straddling a stripe unit boundary. With this release, the compare-and-write C++ API matches the semantics of the C API and both the compare and write steps operate only on len bytes even if the respective buffers are larger.
- The compare-and-write operation is no longer limited to 512-byte sectors. With this release, compare-and-write can operate on stripe units, if the access is aligned properly. The stripe units are 4 MB by default.
- The new rbd_aio_compare_and_writev API method is now available. With this release, the rbd_aio_compare_and_writev API method is included to support scatter/gather on both compare and write buffers, which complements the existing rbd_aio_readv and rbd_aio_writev methods.
Layered client-side encryption is now supported
With this release, cloned images can be encrypted, each with its own encryption format and passphrase, potentially different from that of the parent image. The efficient copy-on-write semantics used for unformatted regular cloned images are retained.
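A rough sketch of encrypting a clone with its own format and passphrase; the pool, image, snapshot, and passphrase file names are placeholders.
Example
# Clone a parent image, then format the clone with its own encryption
rbd snap create pool1/parent@snap1
rbd snap protect pool1/parent@snap1
rbd clone pool1/parent@snap1 pool1/child
rbd encryption format pool1/child luks2 child-passphrase.bin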