Chapter 3. New features


This section lists all major updates, enhancements, and new features introduced in this release of Red Hat Ceph Storage.

Compression on-wire with msgr2 protocol is now available

With this release, in addition to encryption on wire, compression on wire is also supported to secure network operations within the storage cluster.

See the Encryption and key management section in the Red Hat Ceph Storage Data Security and Hardening Guide for more details.

Python notifications are more efficient

Previously, some notifications were generated even though no modules consumed them, which was inefficient.

With this release, the NotifyType parameter is introduced. Modules now annotate which events they consume, for example NotifyType.mon_map, NotifyType.osd_map, and the like. As a consequence, only events that modules ask for are queued, and events that no module consumes are not issued. Because of these changes, Python notifications are now more efficient.

The changes to pg_num are limited

Previously, if drastic changes were made to pg_num that outpaced pgp_num, the user could hit the per-osd placement group limits and cause errors.

With this release, the changes to pg_num are limited to avoid the issue with per-osd placement group limits.
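
For example, a large increase can still be requested in a single step, and the cluster applies it gradually (the pool name and value below are illustrative):

Example

[ceph: root@host01 /]# ceph osd pool set test_pool pg_num 256
[ceph: root@host01 /]# ceph osd pool get test_pool pg_num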

New pg_progress item is created to avoid dumping all placement group statistics for progress updates

Previously, the pg_dump item included unnecessary fields that wasted CPU when copied to the Python side. This tended to cause long ClusterState::lock hold times, leading to long ms_dispatch delays and generally slowing the process.

With this release, a new pg_progress item is created to dump only the fields that mgr tasks or progress needs.

The mgr_ip is no longer re-fetched

Previously, the mgr_ip had to be re-fetched during the lifetime of an active Ceph manager module.

With this release, the mgr_ip does not change during the lifetime of an active Ceph Manager module, therefore there is no need to call back into the Ceph Manager to re-fetch it.

WORM compliance is now supported

Red Hat now supports WORM compliance.

See the Enabling object lock for S3 section for more details.
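
As an illustration, object lock can be enabled at bucket creation and a default retention rule applied with an S3 client such as the AWS CLI (the endpoint, bucket name, and retention values below are illustrative):

Example

aws --endpoint-url http://host01:80 s3api create-bucket --bucket worm-bucket --object-lock-enabled-for-bucket
aws --endpoint-url http://host01:80 s3api put-object-lock-configuration --bucket worm-bucket --object-lock-configuration '{"ObjectLockEnabled": "Enabled", "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}}}'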

Set rate limits on users and buckets

With this release, you can set rate limits on users and buckets based on the operations in a Red Hat Ceph Storage cluster. See the Rate limits for ingesting data section for more details.
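
For example, limits might be set and enabled for a user and a bucket as follows (the user ID, bucket name, and limit values below are illustrative):

Example

[ceph: root@host01 /]# radosgw-admin ratelimit set --ratelimit-scope=user --uid=testuser --max-read-ops=1024 --max-write-ops=256
[ceph: root@host01 /]# radosgw-admin ratelimit enable --ratelimit-scope=user --uid=testuser
[ceph: root@host01 /]# radosgw-admin ratelimit set --ratelimit-scope=bucket --bucket=testbucket --max-read-bytes=10485760
[ceph: root@host01 /]# radosgw-admin ratelimit enable --ratelimit-scope=bucket --bucket=testbucket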

librbd plugin named persistent write log cache to reduce latency

With this release, the new librbd plugin named Persistent Write Log Cache (PWL) provides a persistent, fault-tolerant write-back cache targeted at SSD devices. It greatly reduces latency and also improves performance at low io_depths. This cache uses a log-ordered write-back design that maintains checkpoints internally, so that writes flushed back to the cluster are always crash consistent. Even if the client cache is lost entirely, the disk image is still consistent, but the data will appear to be stale.

Ceph File System (CephFS) now supports high availability asynchronous replication for snapshots

Previously, only one cephfs-mirror daemon could be deployed per storage cluster, so a Ceph File System (CephFS) supported only asynchronous replication of snapshot directories without high availability.

With this release, multiple cephfs-mirror daemons can be deployed on two or more nodes to achieve concurrency in snapshot synchronization, thereby providing high availability.

See the Ceph File System mirroring section in the Red Hat Ceph Storage File System Guide for more details.
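
For example, two cephfs-mirror daemons might be deployed with the orchestrator (the host names below are illustrative):

Example

[ceph: root@host01 /]# ceph orch apply cephfs-mirror --placement="2 host01 host02"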

BlueStore is upgraded to V3

With this release, the BlueStore object store is upgraded to V3. The following are the two main changes:

  • The allocation metadata is removed from RocksDB, and a full destage of the allocator state is now performed on OSD umount.
  • With cache age binning, older onodes might be assigned a lower priority than the hot workload data.

See the Ceph BlueStore section for more details.

Use cephadm to manage operating system tuning profiles

With this release, you can use cephadm to create and manage operating system tuning profiles for better performance of the Red Hat Ceph Storage cluster. See the Managing operating system tuning profiles with `cephadm` section for more details.

The new cephfs-shell option is introduced to mount a filesystem by name

Previously, cephfs-shell could only mount the default filesystem.

With this release, a CLI option is added to cephfs-shell that allows mounting a different file system by name, analogous to the mds_namespace= or fs= options for kclient and ceph-fuse.

Day-2 tasks can now be performed through the Ceph Dashboard

With this release, users can perform day-2 tasks, that is, tasks that require daily or weekly action, on the Ceph Dashboard. This enhancement improves the Dashboard’s assessment capabilities and customer experience, and strengthens its usability and maturity. In addition, new on-screen elements are included to help guide users in retrieving the additional information needed to complete a task.

3.1. The Cephadm utility

OS tuning profiles added to manage kernel parameters using cephadm

With this release, to achieve feature parity with ceph-ansible, users can apply tuned profile specifications that cause cephadm to set OS tuning parameters on the hosts matching the specifications.

See the Managing operating system tuning profiles with `cephadm` section for more details.
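
A minimal tuning profile specification might look like the following and be applied with the ceph orch tuned-profile apply command (the profile name, hosts, and sysctl settings below are illustrative):

Example

profile_name: latency-profile
placement:
  hosts:
    - host01
    - host02
settings:
  vm.swappiness: "10"
  fs.file-max: "1000000"

[ceph: root@host01 /]# ceph orch tuned-profile apply -i latency-profile.yaml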

Users can now easily set the Prometheus TSDB retention size and time in the Prometheus specification

Previously, users could not modify the default 15d retention period and disk consumption from Prometheus.

With this release, users can customize these settings through cephadm so that they are persistently applied, thereby making it easier for users to specify how much data, and for how long, they would like their Prometheus instances to retain.

The format for achieving this is as follows:

Example

service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "1y"
  retention_size: "1GB"
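
The specification can then be applied with the orchestrator, assuming it is saved to a file such as prometheus.yaml:

Example

[ceph: root@host01 /]# ceph orch apply -i prometheus.yaml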

New Ansible playbook is added to define an insecure registry

Previously, when deploying a Red Hat Ceph Storage cluster with a large number of hosts in a disconnected installation environment, it was tedious to populate the /etc/containers/registries.conf file on each host.

With this release, a new Ansible playbook is added to define an insecure registry in the /etc/containers/registries.conf file. Therefore, the deployment of such a Ceph cluster in a disconnected installation environment is now easier as the user can populate /etc/containers/registries.conf with this new playbook.

3.2. Ceph Dashboard

Improved Ceph Dashboard features for rbd-mirroring is now available

Previously, there was no Ceph Block Device Snapshot mirroring support from the user interface.

With this release, the Ceph Block Device Mirroring tab on the Ceph Dashboard is enhanced with the following features that were previously present only in the command-line interface (CLI):

  • Support for enabling or disabling mirroring in images.
  • Support for promoting and demoting actions.
  • Support for resyncing images.
  • Improved visibility for editing site names and creating bootstrap keys.
  • An empty page with a button to automatically create an rbd-mirror daemon if none exists.

A new logging functionality is added to the Ceph dashboard

With this release, a centralized logging functionality for a single cluster, named Daemon Logs, is implemented on the dashboard under the Cluster Logs section. This makes it easier for users to monitor logs in an efficient manner.

A new TTL cache is added between the Ceph Manager and its modules

Big Ceph clusters generate a lot of data, which might overload the Ceph Manager and render its modules unresponsive.

With this release, a new TTL cache is added between the Ceph Manager and its modules to help alleviate loads and prevent the cluster from overloading.

A new information message is provided on Ceph Dashboard to troubleshoot issues with Grafana

When Grafana is deployed with self-signed TLS certificates instead of certificates signed by a Certificate Authority, most browsers, such as Chrome or Firefox, do not allow the embedded Grafana iframe to be displayed within the Ceph Dashboard.

This is a security limitation imposed by the browsers themselves. Some browsers, like Firefox, still display a security warning, Your connection is not secure, but allow users to accept the exception and load the embedded Grafana iframe. However, other browsers, for example Chrome, fail silently and do not display any kind of error message, so users are not aware of the failure.

With this release, a new notification is displayed on the Ceph Dashboard:

If no embedded Grafana Dashboard appeared below, please follow this link to check if Grafana is reachable and there are no HTTPS certificate issues. You may need to reload this page after accepting any Browser certificate exceptions.

The number of repaired objects in pools is exposed under Prometheus metrics

Previously, data regarding auto-repaired objects was gathered through log parsing which was inefficient.

With this release, the number of repaired objects per pool is now exposed as Prometheus metrics on the Ceph Dashboard.

Ceph Dashboard now clearly indicates errors on certain CephFS operations

Previously, when a user tried to perform an operation on a file system directory without having the required permission, the Ceph Dashboard reported a generic 500 Internal Server Error. However, these errors are actually attributable to users, since the permissions exist precisely to prevent certain actions by certain users.

With this release, when the user tries to perform an unauthorized operation, they receive a clear explanation on the permission error.

Users can now see new metrics for different storage classes in Prometheus

With this release, three new metrics, ceph_cluster_by_class_total_bytes, ceph_cluster_by_class_total_used_bytes, and ceph_cluster_by_class_total_used_raw_bytes, are added for the different storage classes in Prometheus, filtered by device class, which helps track the performance and the capacity of the infrastructure.

The WAL and DB devices now get filters preselected automatically

Previously, the user had to manually apply filters to the selected WAL or DB devices, which was a repetitive task.

With this release, when the user selects devices in the primary devices table, the appropriate filters are preselected for WAL and DB devices.

A new shortcut button for silencing alerts is added

With this release, users can create a silence for every alert in the notification bar on the Ceph Dashboard using the newly added silence shortcut.

Users can now add server side encryption to the Ceph Object Gateway bucket from the Dashboard

Previously, there was no option on the Ceph Dashboard to add server side encryption (SSE) to the Ceph Object Gateway buckets.

With this release, it is now possible to add SSE while creating the Ceph Object Gateway bucket through the Ceph Dashboard.

Cross origin resource sharing is now allowed

Previously, IBM developers were facing issues with their Storage Insights product when they tried to ping the REST API from their front end because of the tight cross origin resource sharing (CORS) policies set up in the REST API.

With this release, the cross_origin_url option is added, which can be set to a particular URL. The REST API now allows communicating with only that URL.

Example

[ceph: root@host01 /]# ceph config set mgr mgr/dashboard/cross_origin_url http://localhost:4200

3.3. Ceph File System

Users can now set and manage quotas on subvolume group

Previously, the user could only apply quotas to individual subvolumes.

With this release, the user can now set, apply, and manage quotas for a given subvolume group, which is especially useful in a multi-tenant environment.
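
For example, a quota might be set when creating a subvolume group and changed later by resizing it (the volume name, group name, and sizes in bytes below are illustrative):

Example

[ceph: root@host01 /]# ceph fs subvolumegroup create cephfs group01 --size 10737418240
[ceph: root@host01 /]# ceph fs subvolumegroup resize cephfs group01 21474836480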

The Ceph File System client can now track average read, write, and metadata latencies

Previously, the Ceph File System client would track only the cumulative read, write, and metadata latencies. However, average read, write, and metadata latencies are more useful to the user.

With this feature, the client tracks average latencies and forwards them to the metadata server, to be displayed in the perf stats command output and in the cephfs-top utility.

The cephfs-top utility is improved with the support of multiple file systems

Previously, the cephfs-top utility with multiple file systems was not reliable. Moreover, there was no option to display the metrics for only the selected file system.

With this feature, the cephfs-top utility now supports multiple file systems and it is now possible to select an option to see the metrics related to a particular file system.

Users can now use the fs volume info command to display basic details about a volume

Previously, there was no command in Ceph File System to list only the basic details about a volume.

With this release, the user can list the basic details about a volume by running the fs volume info command.

See Viewing information about a Ceph file system volume in the Red Hat Ceph Storage File System Guide.
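
For example (the volume name below is illustrative):

Example

[ceph: root@host01 /]# ceph fs volume info cephfs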

Users can list the in-progress or pending clones for a subvolume snapshot

Previously, there was no way of knowing the set of clone operations in progress or pending for a subvolume snapshot, unless the user knew the clone subvolume’s name and used the clone status command to infer the details.

With this release, for a given subvolume snapshot name, the in-progress or pending clones can be listed.

Users can now use the --human-readable flag with the fs volume info command

Previously, all the sizes were displayed only in bytes on running the fs volume info command.

With this release, users can now see the sizes along with the units on running fs volume info command.
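
For example (the volume name below is illustrative):

Example

[ceph: root@host01 /]# ceph fs volume info cephfs --human-readable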

3.4. Ceph Object Gateway

New S3 bucket lifecycle notifications are now generated

With this release, S3 bucket notifications are generated for lifecycle processing events, such as the expiration of current and non-current object versions and delete-marker expiration. This capability is potentially useful for application workflows, among other uses.

The objects are transitioned to the S3 cloud endpoint as per the set lifecycle rules

In Red Hat Ceph Storage, a special storage class of tier type cloud-s3 is used to configure the remote cloud S3 object store service to which the data needs to be transitioned. These classes are defined in terms of zonegroup placement targets and, unlike regular storage classes, do not need a data pool.

With this release, users can transition Ceph Object Gateway objects from a Ceph Object Gateway server to a remote S3 cloud endpoint through storage classes. However, the transition is unidirectional; data cannot be transitioned back from the remote server.
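
As an illustrative sketch, a cloud-s3 tier storage class might be added to a zonegroup placement target and pointed at a remote endpoint as follows (the zonegroup, storage class name, endpoint, credentials, and target path below are illustrative):

Example

[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default --placement-id=default-placement --storage-class=CLOUDTIER --tier-type=cloud-s3
[ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup=default --placement-id=default-placement --storage-class=CLOUDTIER --tier-config=endpoint=http://s3.example.com:80,access_key=ACCESSKEY,secret=SECRETKEY,target_path=target-bucket

A bucket lifecycle rule that transitions objects to the CLOUDTIER storage class then moves matching objects to the remote endpoint.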

The Ceph Object Gateway S3 policy errors are now more useful

Previously, Ceph Object Gateway S3 policy error messages were opaque and not very useful. The initial issue of not being able to access data in buckets after upgrading versions turned out to be the result of an accepted but invalid principal being silently ignored on ingest, but rejected on use later due to a code change.

With this release, the policy code prints detailed and useful error messages. There is also a new rgw-policy-check command that lets policy documents be tested on the command line, and a new option, rgw policy reject invalid principals, which is false by default and, when enabled, rejects invalid principals on ingest with an error message rather than ignoring them silently.
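
For example, strict validation might be enabled as follows, assuming the option is set with its underscored configuration name (the client.rgw target applies it to all Ceph Object Gateway daemons):

Example

[ceph: root@host01 /]# ceph config set client.rgw rgw_policy_reject_invalid_principals true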

Level 20 Ceph Object Gateway log messages are reduced when updating bucket indices

With this release, the level 20 Ceph Object Gateway log messages emitted when updating bucket indices are reduced, removing messages that do not add value and reducing the size of the logs.

3.5. Multi-site Ceph Object Gateway

Lifecycle policy now runs on all zones in multi-site configurations

With this release, lifecycle policy runs on all zones in multi-site Red Hat Ceph Storage configurations, which makes lifecycle processing more resilient in these configurations. The changes are made to also permit new features, such as conditional processing in archive zones.

Multi-site configuration supports dynamic bucket index resharding

Previously, only manual resharding of the buckets for multi-site configurations was supported.

With this release, dynamic bucket resharding is supported in multi-site configurations. Once the storage clusters are upgraded, enable the resharding feature and reshard the buckets either manually with the radosgw-admin bucket reshard command or automatically with dynamic resharding, independently of other zones in the storage cluster.
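
As an illustrative sketch, the feature might be enabled for the zonegroup and a bucket then resharded manually (the bucket name and shard count below are illustrative; the exact enablement steps are described in the multi-site documentation):

Example

[ceph: root@host01 /]# radosgw-admin zonegroup modify --enable-feature=resharding
[ceph: root@host01 /]# radosgw-admin period update --commit
[ceph: root@host01 /]# radosgw-admin bucket reshard --bucket=data-bucket --num-shards=101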

Sites can now customize STS max-session-duration parameter with the role interface

Previously, the max-session-duration parameter controlling the duration of STS sessions could not be configured because it was not exposed on the interface.

With this release, it is possible to customize the STS max-session-duration parameter through the role interface.

3.6. RADOS

Kafka SASL/SCRAM security mechanism is added to bucket notifications

With this release, Kafka SASL/SCRAM security mechanism is added to bucket notifications.

To learn how to use the feature, refer to the "kafka" section in Creating a topic. Note that end-to-end configuration of the feature, for example for testing, is outside the scope of Ceph.
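
As an illustrative sketch, a topic with a Kafka push endpoint using SASL/SCRAM might be created through the SNS-compatible API (the endpoint, broker address, credentials, and the mechanism attribute value below are illustrative assumptions):

Example

aws --endpoint-url http://host01:80 sns create-topic --name=mytopic --attributes='{"push-endpoint": "kafka://user:password@kafka-broker:9093", "use-ssl": "true", "mechanism": "SCRAM-SHA-256"}'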

Low-level log messages are introduced to warn user about hitting throttle limits

Previously, there was a lack of low-level logging indication that throttle limits were hit, causing these occurrences to incorrectly have the appearance of a networking issue.

With this release, the introduction of low-level log messages makes it much clearer that the throttle limits are hit.

The user is warned on Filestore deprecation through ceph status and ceph health detail commands

BlueStore is the default and most widely used object store.

With this release, if there are any OSDs that are on Filestore, the storage cluster goes into the HEALTH_WARN status due to the OSD_FILESTORE health check. To clear this warning, the end user has to migrate the OSDs that are on Filestore to BlueStore.

User can now take advantage of a tunable KernelDevice buffer in BlueStore

With this release, users can configure custom alignment for read buffers using the bdev_read_buffer_alignment option in BlueStore. This removes the limitation imposed by the default 4 KiB alignment when buffers are intended to be backed by huge pages.

Additionally, BlueStore, through KernelDevice, gets a configurable pool with bdev_read_preallocated_huge_buffer_num parameter of MAP_HUGETLB-based read buffers for workloads with cache-unfriendly access patterns, which undergo recycling and are not cacheable.

Taken together, these features make it possible to shorten the scatter-gather list that is passed by the storage component to NICs, thereby improving the handling of huge page-based read buffers in BlueStore.
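
As an illustration, the new tunables might be set as follows (the values below are illustrative and should be matched to the system huge page size and the workload):

Example

[ceph: root@host01 /]# ceph config set osd bdev_read_buffer_alignment 2097152
[ceph: root@host01 /]# ceph config set osd bdev_read_preallocated_huge_buffer_num 128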

OSDs report the slow operation details in an aggregated format to the Ceph cluster log

Previously, slow requests would overwhelm a cluster log with too many details, filling up the monitor database.

With this release, slow requests are logged to the cluster log in an aggregated format, grouped by operation type and by pool.

Users can now blocklist a CIDR range

With this release, you can blocklist a CIDR range, in addition to individual client instances and IPs. In certain circumstances, you would want to blocklist all clients in an entire data center or rack instead of specifying individual clients to blocklist.

For example, when failing over a workload to a different set of machines, you might want to prevent the old workload instance from continuing to partially operate.

This is now possible using a "blocklist range" analogous to the existing "blocklist" command.
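
For example (the CIDR range below is illustrative; an optional expiration time can be supplied, as with the existing blocklist add command):

Example

[ceph: root@host01 /]# ceph osd blocklist range add 192.168.1.0/24
[ceph: root@host01 /]# ceph osd blocklist ls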

3.7. RADOS Block Devices (RBD)

librbd SSD-based persistent write-back cache to reduce latency is now fully supported

With this release, the pwl_cache librbd plugin provides a log-structured write-back cache targeted at SSD devices. Just as with the already provided log-structured write-back cache targeted at PMEM devices, the updates to the image are batched and flushed in-order, retaining the actual image in a crash-consistent state. The benefits and use cases remain the same, but users no longer need to procure more expensive PMEM devices to take advantage of them.
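
As an illustrative sketch, the cache might be enabled for clients as follows (the cache path and size below are illustrative, and the option names assume the upstream persistent write-back cache configuration):

Example

[ceph: root@host01 /]# ceph config set client rbd_plugins pwl_cache
[ceph: root@host01 /]# ceph config set client rbd_persistent_cache_mode ssd
[ceph: root@host01 /]# ceph config set client rbd_persistent_cache_path /mnt/pwl_cache
[ceph: root@host01 /]# ceph config set client rbd_persistent_cache_size 10737418240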

The librbd compare-and-write operation is improved and new rbd_aio_compare_and_writev API method is introduced

  • The semantics of the compare-and-write C++ API now match those of the C API.

    Previously, the compare-and-write C++ API, that is, the Image::compare_and_write and Image::aio_compare_and_write methods, would compare up to the size of the compare buffer. This would cause breakage when straddling a stripe unit boundary.

    With this release, the compare-and-write C++ API matches the semantics of the C API, and both the compare and write steps operate only on len bytes even if the respective buffers are larger.

  • The compare-and-write operation is no longer limited to 512-byte sectors.

    With this release, the compare-and-write can operate on stripe units, if the access is aligned properly. The stripe units are 4 MB by default.

  • New rbd_aio_compare_and_writev API method is now available.

    With this release, the rbd_aio_compare_and_writev API method is included to support scatter/gather on both compare and write buffers, which complements existing rbd_aio_readv and rbd_aio_writev methods.

Layered client-side encryption is now supported

With this release, cloned images can be encrypted, each with its own encryption format and passphrase, potentially different from that of the parent image. The efficient copy-on-write semantics used for unformatted regular cloned images are retained.
