Chapter 3. New features and enhancements
This section lists all major updates, enhancements, and new features introduced in this release of Red Hat Ceph Storage.
The main features added by this release are:
Compression on-wire with msgr2 protocol is now available
With this release, in addition to encryption on wire, compression on wire is also supported to secure network operations within the storage cluster.
See the Encryption and key management section in the Red Hat Ceph Storage Data Security and Hardening Guide for more details.
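For illustration, on-wire compression is controlled through messenger configuration options. The option names and values below are assumptions drawn from upstream Ceph documentation, not confirmed by this release note; verify them with ceph config help before use.
# Hedged sketch: enable on-wire compression for OSD messenger traffic.
# Option names are assumptions; verify with `ceph config help ms_osd_compress_mode`.
ceph config set osd ms_osd_compress_mode force
ceph config set osd ms_osd_compress_min_size 1024   # assumed option; minimum message size to compress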
Python notifications are more efficient
Previously, notifications were issued for events that no module consumed, which was inefficient.
With this release, the NotifyType parameter is introduced. It annotates which events modules consume, for example NotifyType.mon_map, NotifyType.osd_map, and the like. As a consequence, only events that modules ask for are queued, and events that no modules consume are not issued. Because of these changes, Python notifications are now more efficient.
The changes to pg_num are limited
Previously, if drastic changes were made to pg_num that outpaced pgp_num, the user could hit the per-OSD placement group limits and cause errors.
With this release, the changes to pg_num are limited to avoid the issue with per-OSD placement group limits.
New pg_progress item is created to avoid dumping all placement group statistics for progress updates
Previously, the pg_dump item included unnecessary fields that wasted CPU when it was copied to python-land. This tended to lead to long ClusterState::lock hold times, which in turn led to long ms_dispatch delays and generally slowed processing.
With this release, a new pg_progress item is created to dump only the fields that mgr tasks or progress needs.
The mgr_ip is no longer re-fetched
Previously, the mgr_ip had to be re-fetched during the lifetime of an active Ceph Manager module.
With this release, the mgr_ip does not change for the lifetime of an active Ceph Manager module, so there is no need to call back into the Ceph Manager to re-fetch it.
QoS in the Ceph OSD is based on the mClock algorithm, by default
Previously, the scheduler defaulted to the Weighted Priority Queue (WPQ). Quality of service (QoS) based on the mClock algorithm was in an experimental phase and was not yet recommended for production.
With this release, the mClock-based operation queue enables QoS controls to be applied to Ceph OSD-specific operations, such as client input and output (I/O) and recovery or backfill, as well as other background operations, such as pg scrub, snap trim, and pg deletion. The allocation of resources to each of the services is based on the input and output operations per second (IOPS) capacity of each Ceph OSD and is achieved using built-in mClock profiles.
Also, this release includes the following enhancements:
- Hands-off automated baseline performance measurements for the OSDs determine Ceph OSD IOPS capacity, with safeguards to fall back to the default capacity when an unrealistic measurement is detected.
- The need to set sleep throttles for background tasks is eliminated.
- Higher default values for the recovery and max backfill options, with the ability to override them using an override flag.
- Configuration sets using mClock profiles hide the complexity of tuning mClock and Ceph parameters.
See The mClock OSD scheduler section in the Red Hat Ceph Storage Administration Guide for more details.
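For a quick, hedged illustration, the scheduler and active profile can be inspected and changed with the ceph config command; high_recovery_ops is one of the built-in mClock profiles referenced above.
ceph config get osd osd_op_queue                          # shows mclock_scheduler by default in this release
ceph config get osd osd_mclock_profile                    # shows the active built-in profile
ceph config set osd osd_mclock_profile high_recovery_ops  # example: temporarily favor recovery and backfill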
WORM compliance certification is now supported
Red Hat now supports WORM compliance certification.
See the Enabling object lock for S3 for more details.
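For illustration only, an S3 object lock setup against a Ceph Object Gateway endpoint might look like the following AWS CLI sketch; the endpoint URL, bucket name, and retention period are placeholders.
# Hedged sketch: create a bucket with object lock enabled, then set a default retention rule.
aws --endpoint-url http://rgw.example.com:8080 s3api create-bucket \
    --bucket worm-bucket --object-lock-enabled-for-bucket
aws --endpoint-url http://rgw.example.com:8080 s3api put-object-lock-configuration \
    --bucket worm-bucket \
    --object-lock-configuration 'ObjectLockEnabled=Enabled,Rule={DefaultRetention={Mode=COMPLIANCE,Days=365}}'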
Set rate limits on users and buckets
With this release, you can set rate limits on users and buckets based on the operations in a Red Hat Ceph Storage cluster. See the Rate limits for ingesting data for more details.
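As a hedged example, per-user rate limits might be set and enabled with radosgw-admin; the user ID and limit values are placeholders, and the exact flags should be checked against the linked documentation.
# Hedged sketch: cap a user's read and write operations per rate-limit window, then enable the limit.
radosgw-admin ratelimit set --ratelimit-scope=user --uid=testuser \
    --max-read-ops=1024 --max-write-ops=256
radosgw-admin ratelimit enable --ratelimit-scope=user --uid=testuser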
librbd plugin named persistent write log cache to reduce latency
With this release, the new librbd plugin named Persistent Write Log Cache (PWL) provides a persistent, fault-tolerant write-back cache targeted at SSD devices. It greatly reduces latency and also improves performance at low io_depths. This cache uses a log-ordered write-back design which maintains checkpoints internally, so that writes that get flushed back to the cluster are always crash consistent. Even if the client cache is lost entirely, the disk image is still consistent, but the data will appear to be stale.
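A minimal, hedged sketch of enabling the PWL cache in SSD mode at the pool level follows; the pool name, cache path, and size are placeholders, and the option names should be verified against the Block Device Guide.
rbd config pool set rbd rbd_plugins pwl_cache              # load the persistent write log cache plugin
rbd config pool set rbd rbd_persistent_cache_mode ssd      # assumed option: SSD-backed cache mode
rbd config pool set rbd rbd_persistent_cache_path /mnt/pwl # placeholder path on an SSD-backed filesystem
rbd config pool set rbd rbd_persistent_cache_size 1G       # placeholder cache size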
Ceph File System (CephFS) now supports high availability asynchronous replication for snapshots
Previously, only one cephfs-mirror daemon could be deployed per storage cluster, so asynchronous replication of snapshot directories for a CephFS had no high availability.
With this release, multiple cephfs-mirror daemons can be deployed on two or more nodes to achieve concurrency in snapshot synchronization, thereby providing high availability.
See the Ceph File System mirroring section in the Red Hat Ceph Storage File System Guide for more details.
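For example, more than one cephfs-mirror daemon could be requested through the orchestrator with a placement spanning multiple hosts; this is a hedged sketch and the host names are placeholders.
ceph orch apply cephfs-mirror --placement="2 host01 host02"   # deploy two cephfs-mirror daemons for HA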
BlueStore is upgraded to V3
With this release, the BlueStore object store is upgraded to V3. The following are the two features:
- The allocation metadata is removed from RocksDB and now performs a full destage of the allocator object with the OSD allocation.
- With cache age binning, older onodes might be assigned a lower priority than the hot workload data.
See the Ceph BlueStore section for more details.
Use cephadm to manage operating system tuning profiles
With this release, you can use cephadm to create and manage operating system tuning profiles for better performance of the Red Hat Ceph Storage cluster. See the Managing operating system tuning profiles with `cephadm` section for more details.
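As a hedged sketch, a tuning profile specification might be created and applied as follows; the profile name, placement label, and sysctl setting are placeholders.
cat <<EOF > tuned-profile.yaml
profile_name: osd-host-profile   # placeholder profile name
placement:
  label: osd                     # placeholder placement label
settings:
  vm.swappiness: "10"            # placeholder sysctl setting
EOF
ceph orch tuned-profile apply -i tuned-profile.yaml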
A direct upgrade from Red Hat Ceph Storage 5 to Red Hat Ceph Storage 7 will be available
For upgrade planning awareness, directly upgrading Red Hat Ceph Storage 5 to Red Hat Ceph Storage 7 (N=2) will be available.
The new cephfs-shell option is introduced to mount a filesystem by name
Previously, cephfs-shell could only mount the default filesystem.
With this release, a CLI option is added in cephfs-shell that allows the mounting of a different filesystem by name, that is, something analogous to the mds_namespace= or fs= options for kclient and ceph-fuse.
Day-2 tasks can now be performed through the Ceph Dashboard
With this release, in the Ceph Dashboard, a user can perform day-2 tasks that require daily or weekly actions. This enhancement improves the Dashboard’s assessment capabilities and customer experience, and strengthens its usability and maturity. In addition, new on-screen elements are included to help and guide the user in retrieving additional information to complete a task.
3.1. The Cephadm utility
Users can now rotate the authentication key for Ceph daemons
For security reasons, some users might desire to occasionally rotate the authentication key used for daemons in the storage cluster.
With this release, the ability to rotate the authentication key for Ceph daemons using the ceph orch daemon rotate-key DAEMON_NAME command is introduced. For MDS, OSD, and MGR daemons, this does not require a daemon restart. However, other daemons, such as Ceph Object Gateway daemons, might require a restart to switch to the new key.
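For example, rotating the key of a single OSD daemon would look like the following; osd.0 is a placeholder daemon name.
ceph orch daemon rotate-key osd.0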
Bootstrap logs are now logged to STDOUT
With this release, to reduce potential errors, bootstrap logs are now logged to STDOUT instead of STDERR in successful bootstrap scenarios.
Ceph Object Gateway zonegroup can now be specified in the specification used by the orchestrator
Previously, the orchestrator could handle setting the realm and zone for the Ceph Object Gateway. However, setting the zonegroup was not supported.
With this release, users can specify an rgw_zonegroup parameter in the specification that is used by the orchestrator. Cephadm sets the zonegroup for Ceph Object Gateway daemons deployed from the specification.
ceph orch daemon add osd now reports if the hostname specified for deploying the OSD is unknown
Previously, since the ceph orch daemon add osd command gave no output, users would not notice if the hostname was incorrect, and Cephadm would discard the command.
With this release, the ceph orch daemon add osd command reports to the user if the hostname specified for deploying the OSD is unknown.
cephadm shell command now reports the image being used for the shell on startup
Previously, users would not always know which image was being used for the shell. This would affect the packages that were used for commands being run within the shell.
With this release, the cephadm shell command reports the image used for the shell on startup. Users can now see which container image is being used, and when that image was created, as the shell starts up, and therefore know which packages are available within the shell.
Cluster logs under `/var/log/ceph` are now deleted
With this release, to better clean up the node as part of removing the Ceph cluster from that node, cluster logs under /var/log/ceph are deleted when the cephadm rm-cluster command is run. The cluster logs are removed as long as --keep-logs is not passed to the rm-cluster command.
If the cephadm rm-cluster command is run on a host that is part of a still existent cluster, the host is managed by Cephadm, and the Cephadm mgr module is still enabled and running, then Cephadm might immediately start deploying new daemons, and more logs could appear.
Bugzilla:2036063
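As a hedged example, the cluster logs can be preserved during removal by passing --keep-logs; the FSID shown is a placeholder.
cephadm rm-cluster --force --fsid 11111111-2222-3333-4444-555555555555 --keep-logs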
Better error handling when daemon names are passed to the ceph orch restart command
Previously, in cases where the daemon passed to the ceph orch restart command was a haproxy or keepalived daemon, it would return a traceback. This made it unclear to users whether they had made a mistake or Cephadm had failed in some other way.
With this release, better error handling is introduced to identify when users pass a daemon name to the ceph orch restart command instead of the expected service name. Upon encountering a daemon name, Cephadm reports it and requests the user to check ceph orch ls for valid services to pass.
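For example, the intended workflow is to restart by service name as listed by ceph orch ls, not by daemon name; the service name shown here is a placeholder.
ceph orch ls                        # list valid service names
ceph orch restart ingress.rgw.test  # restart by service name, not by individual daemon name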
Users can now create a Ceph Object Gateway realm, zone, and zonegroup using the ceph rgw realm bootstrap -i rgw_spec.yaml command
With this release, to streamline the process of setting up Ceph Object Gateway on a Red Hat Ceph Storage cluster, users can create a Ceph Object Gateway realm, zone, and zonegroup using the ceph rgw realm bootstrap -i rgw_spec.yaml command. The specification file should be modeled similar to the one that is used to deploy Ceph Object Gateway daemons using the orchestrator. The command then creates the realm, zone, and zonegroup, and passes the specification on to the orchestrator, which then deploys the Ceph Object Gateway daemons.
Example
rgw_realm: myrealm
rgw_zonegroup: myzonegroup
rgw_zone: myzone
placement:
  hosts:
    - rgw-host1
    - rgw-host2
spec:
  rgw_frontend_port: 5500
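The specification could then be passed to the bootstrap command; note that the Ceph Object Gateway manager module must be enabled first, as described later in this section.
ceph mgr module enable rgw                 # enable the Ceph Object Gateway manager module
ceph rgw realm bootstrap -i rgw_spec.yaml  # create the realm, zone, and zonegroup, and deploy the daemons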
crush_device_class and location fields are added to OSD specifications and host specifications respectively
With this release, the crush_device_class field is added to OSD specifications, and the location field, referring to the initial crush location of the host, is added to host specifications. If a user sets the location field in a host specification, cephadm runs ceph osd crush add-bucket with the hostname and the given location to add it as a bucket in the crush map. For OSDs, they are set with the given crush_device_class in the crush map upon creation.
This is only for OSDs that were created based on the specification with the field set. It does not affect the already deployed OSDs.
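A hedged sketch of a host specification that sets the initial crush location is shown below; the hostname, address, and crush bucket are placeholders.
cat <<EOF > host-spec.yaml
service_type: host
hostname: host01        # placeholder hostname
addr: 192.168.0.11      # placeholder address
location:
  datacenter: DC1       # placeholder crush bucket
EOF
ceph orch apply -i host-spec.yaml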
Users can enable the Ceph Object Gateway manager module
With this release, the Ceph Object Gateway manager module is now available and can be turned on with the ceph mgr module enable rgw command, giving users access to the functionality of the module, such as the ceph rgw realm bootstrap and ceph rgw realm tokens commands.
Users can enable additional metrics for node-exporter daemons
With this release, to enable users to have more customization of their node-exporter deployments without requiring explicit support for each individual option, additional metrics are introduced that can now be enabled for node-exporter daemons deployed by Cephadm, using the extra_entrypoint_args field.
Example
service_type: node-exporter
service_name: node-exporter
placement:
  label: "node-exporter"
extra_entrypoint_args:
  - "--collector.textfile.directory=/var/lib/node_exporter/textfile_collector2"
---
Bugzilla:2142431
Users can set the crush location for a Ceph Monitor to replace tiebreaker monitors
With this release, users can set the crush location for a monitor deployed on a host. It should be assigned in the mon specification file.
Example
service_type: mon
service_name: mon
placement:
  hosts:
    - host1
    - host2
    - host3
spec:
  crush_locations:
    host1:
      - datacenter=a
    host2:
      - datacenter=b
      - rack=2
    host3:
      - datacenter=a
This is primarily added to make replacing a tiebreaker monitor daemon in stretch clusters deployed by Cephadm more feasible. Without this change, users would have to manually edit the files written by Cephadm to deploy the tiebreaker monitor, as the tiebreaker monitor is not allowed to join without declaring its crush location.
crush_device_class can now be specified per path in an OSD specification
With this release, to allow users more flexibility with crush_device_class settings when deploying OSDs through Cephadm, a crush_device_class can be specified per path inside an OSD specification. It is also supported to provide these per-path crush_device_class settings along with a service-wide crush_device_class for the OSD service. In the case of a service-wide crush_device_class, that setting is considered the default, and the path-specified settings take priority.
Example
service_type: osd
service_id: osd_using_paths
placement:
  hosts:
    - Node01
    - Node02
crush_device_class: hdd
spec:
  data_devices:
    paths:
      - path: /dev/sdb
        crush_device_class: ssd
      - path: /dev/sdc
        crush_device_class: nvme
      - /dev/sdd
  db_devices:
    paths:
      - /dev/sde
  wal_devices:
    paths:
      - /dev/sdf
Cephadm now raises a specific health warning UPGRADE_OFFLINE_HOST when the host goes offline during upgrade
Previously, when upgrades failed due to a host going offline, a generic UPGRADE_EXCEPTION health warning would be raised that was too ambiguous for users to understand.
With this release, when an upgrade fails due to a host being offline, Cephadm raises a specific health warning, UPGRADE_OFFLINE_HOST, and the issue is now made transparent to the user.
All the Cephadm logs are no longer logged to cephadm.log when --verbose is not passed
Previously, some Cephadm commands, such as gather-facts, would spam the log with massive amounts of command output every time they were run, in some cases once per minute.
With this release, in Cephadm, all the logs are no longer logged to cephadm.log when --verbose is not passed. The cephadm.log is now easier to read since most of the spam previously written is no longer present.
3.2. Ceph Dashboard
A new metric is added for OSD blocklist count
With this release, to configure a corresponding alert, a new metric, ceph_cluster_osd_blocklist_count, is added on the Ceph Dashboard.
Introduction of the ceph-exporter daemon
With this release, the ceph-exporter daemon is introduced to collect and expose the performance counters of all Ceph daemons as Prometheus metrics. It is deployed on each node of the cluster to remain performant in large-scale clusters.
Support force promote for RBD mirroring through Dashboard
Previously, although RBD mirror promote/demote was implemented on the Ceph Dashboard, there was no option to force promote.
With this release, support for force promoting RBD mirroring through Ceph Dashboard is added. If the promotion fails on the Ceph Dashboard, the user is given the option to force the promotion.
Support for collecting and exposing the labeled performance counters
With this release, support for collecting and exposing the labeled performance counters of Ceph daemons as Prometheus metrics with labels is introduced.
3.3. Ceph File System
cephfs-top limitation is increased to load more clients
Previously, due to a limitation in the cephfs-top utility, fewer than 100 clients could be loaded at a time, the client list could not be scrolled, and the utility hung if more clients were loaded.
With this release, cephfs-top users can scroll vertically as well as horizontally, which enables cephfs-top to load nearly 10,000 clients. Users can scroll through the loaded clients and view them on the screen.
Users now have the option to sort clients based on the fields of their choice in cephfs-top
With this release, users have the option to sort the clients based on the fields of their choice in cephfs-top and also to limit the number of clients to be displayed. This enables the user to analyze the metrics based on the order of fields as per requirement.
Non-head omap entries are now included in the omap entries count
Previously, non-head snapshotted entries were not taken into account when deciding whether to merge or split a directory fragment, so a fragment would not split when it should have. Due to this, the number of omap entries in a directory object could exceed a certain limit and result in cluster warnings.
With this release, non-head omap entries are included in the number of omap entries when deciding whether to merge or split a directory fragment, so that the limit is never exceeded.
3.4. Ceph Object Gateway
Objects replicated from another zone now return the header
With this release, in a multi-site configuration, objects that have been replicated from another zone return the header x-amz-replication-status=REPLICA, to allow multi-site users to identify whether an object was replicated locally or not.
Bugzilla:1467648
Support for AWS PublicAccessBlock
With this release, Ceph Object Storage supports the AWS public access block S3 APIs, such as PutPublicAccessBlock.
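As a hedged illustration, a public access block configuration could be applied with the AWS CLI against a Ceph Object Gateway endpoint; the endpoint URL and bucket name are placeholders.
aws --endpoint-url http://rgw.example.com:8080 s3api put-public-access-block \
    --bucket testbucket \
    --public-access-block-configuration \
    "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"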
Swift object storage dialect now includes support for SHA-256 and SHA-512 digest algorithms
Previously, support for these digest algorithms was added by OpenStack Swift in 2022, but Ceph Object Gateway had not implemented them.
With this release, Ceph Object Gateway’s Swift object storage dialect includes support for the SHA-256 and SHA-512 digest methods in tempurl operations. Ceph Object Gateway can now correctly handle tempurl operations from recent OpenStack Swift clients.
3.5. Multi-site Ceph Object Gateway
Bucket notifications are sent when an object is synced to a zone
With this release, bucket notifications are sent when an object is synced to a zone, to allow external systems to receive object-level information about the zone syncing status. The following bucket notification event types are added: s3:ObjectSynced:* and s3:ObjectSynced:Created. When configured with the bucket notification mechanism, a notification event is sent from the synced Ceph Object Gateway upon the successful sync of an object.
Both the topics and the notification configuration should be created separately in each zone from which you would like the notification events to be sent.
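For illustration, a notification configuration using the new event type might be applied with the AWS CLI in the zone of interest; the endpoint, bucket name, and topic ARN are placeholders, and the topic is assumed to have been created beforehand.
aws --endpoint-url http://rgw.zone-b.example.com:8080 s3api put-bucket-notification-configuration \
    --bucket testbucket \
    --notification-configuration '{
      "TopicConfigurations": [
        { "Id": "sync-notify",
          "TopicArn": "arn:aws:sns:default::sync-topic",
          "Events": ["s3:ObjectSynced:*"] }
      ]
    }'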
Disable per-bucket replication when zones replicate by default
With this release, the ability to disable per-bucket replication when the zones replicate by default, using multisite sync policy, is introduced to ensure that selective buckets can opt out.