Chapter 4. Bug fixes
This section describes bugs with significant impact on users that were fixed in this release of Red Hat Ceph Storage. In addition, the section includes descriptions of fixed known issues found in previous versions.
4.1. The Cephadm utility
Bootstrap no longer fails if a comma-separated list of quoted IPs is passed in as the public network in the initial Ceph configuration
Previously, cephadm bootstrap would improperly parse comma-delimited lists of IP addresses if the list was quoted. Due to this, bootstrap would fail if a comma-separated list of quoted IP addresses, for example '172.120.3.0/24,172.117.3.0/24,172.118.3.0/24,172.119.3.0/24', was provided as the public_network in the initial Ceph configuration passed to bootstrap with the --config parameter.
With this fix, you can enter a comma-separated list of quoted IPs into the initial Ceph configuration passed to bootstrap for the public_network or cluster_network, and it works as expected.
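The following is a minimal sketch of such a bootstrap run. The configuration file name and monitor IP are hypothetical; the subnets are the ones from the example above.

# write a hypothetical initial configuration with a quoted, comma-separated public_network
cat > initial-ceph.conf <<'EOF'
[global]
public_network = '172.120.3.0/24,172.117.3.0/24,172.118.3.0/24,172.119.3.0/24'
EOF

# bootstrap with that initial configuration; the monitor IP is an example value
cephadm bootstrap --mon-ip 172.120.3.10 --config initial-ceph.conf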
cephadm no longer attempts to parse the provided YAML files more than necessary
Previously, cephadm bootstrap would attempt to manually parse the provided YAML files more than necessary. Due to this, even if the user had provided a valid YAML file to cephadm bootstrap, the manual parsing could fail, depending on the individual specification, causing the entire specification to be discarded.
With this fix, cephadm no longer attempts to parse the YAML more than necessary. The host specification is searched only for the purpose of spreading SSH keys; otherwise, the specification is passed directly to the manager module. The cephadm bootstrap --apply-spec command now works as expected with any valid specification.
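A minimal sketch of passing a specification at bootstrap time; the hostnames, addresses, and file name are hypothetical.

# write a hypothetical specification containing two additional hosts
cat > cluster-spec.yaml <<'EOF'
service_type: host
hostname: host02
addr: 192.168.0.12
---
service_type: host
hostname: host03
addr: 192.168.0.13
EOF

# bootstrap and apply the specification in one step; the monitor IP is an example value
cephadm bootstrap --mon-ip 192.168.0.11 --apply-spec cluster-spec.yaml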
host.containers.internal entry is no longer added to the /etc/hosts file of deployed containers
Previously, certain podman versions would, by default, add a host.containers.internal entry to the /etc/hosts file of deployed containers. Due to this, issues arose in some services with respect to this entry, as it was misunderstood to represent the FQDN of a real node.
With this fix, Cephadm mounts the host's /etc/hosts file when deploying containers. The host.containers.internal entry is no longer present in the /etc/hosts file of the containers, avoiding all bugs related to the entry, although users can still use the host's /etc/hosts entries for name resolution within the container.
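To confirm the behavior on a deployed daemon, the container's /etc/hosts file can be inspected directly with podman; the container name below is a placeholder.

# list the Ceph daemon containers running on the host
podman ps --format '{{.Names}}'

# inspect the /etc/hosts file inside one of them
podman exec <container-name> cat /etc/hosts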
Cephadm now logs device information only when an actual change occurs
Previously, cephadm would compare all fields reported for OSDs to check for new or changed devices. However, one of these fields included a timestamp that differed every time. Due to this, cephadm would log that it 'Detected new or changed devices' every time it refreshed a host's devices, regardless of whether anything had actually changed.
With this fix, the comparison of device information against the previous information no longer takes into account the timestamp fields that are expected to constantly change. Cephadm now logs only when there is an actual change in the devices.
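To trigger a device refresh and review what Cephadm logged, a sketch using standard orchestrator commands:

# force a fresh inventory of the devices on all hosts
ceph orch device ls --refresh

# review recent cephadm log messages, for example for 'Detected new or changed devices'
ceph log last cephadm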
The generated Prometheus URL is now accessible
Previously, if a host did not have an FQDN, the generated Prometheus URL would be http://host-shortname:9095, which was inaccessible.
With this fix, if no FQDN is available, the host IP is used instead of the shortname. The generated Prometheus URL is now in an accessible format, even if the host that Prometheus is deployed on has no FQDN available.
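One way to check the Prometheus API URL that was generated for the dashboard is shown below; this assumes the dashboard module is enabled.

# show the Prometheus API host configured for the dashboard
ceph dashboard get-prometheus-api-host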
cephadm no longer has permission issues while writing files to the host
Previously, cephadm would first create files within the /tmp directory and then move them to their final location. Due to this, in certain setups, a permission issue would arise when writing files, making cephadm effectively unable to operate until the permissions were modified.
With this fix, cephadm uses a subdirectory within /tmp to write files to the host, which does not have the same permission issues.
4.2. Ceph Dashboard
The default option in the OSD creation step of Expand Cluster wizard works as expected
Previously, the default option in the OSD creation step of the Expand Cluster wizard did not work on the dashboard, misleading the user by showing the option as “selected”.
With this fix, the default option works as expected. Additionally, a “Skip” button is added if the user decides to skip the step.
Users can create normal or mirror snapshots
Previously, even though users were offered both normal image snapshots and mirror image snapshots, it was not possible to create a normal image snapshot.
With this fix, the user can choose either the normal or the mirror image snapshot mode.
Flicker no longer occurs on the Host page
Previously, the host page would flicker after 5 seconds if there was more than one host, causing a bad user experience.
With this fix, the API is optimized to load the page normally and the flicker no longer occurs.
4.3. Ceph Metrics
The metric names produced by the Ceph exporter and the Prometheus manager module are the same
Previously, the metrics coming from the Ceph daemons (performance counters) were produced by the Prometheus manager module. The new Ceph exporter, which replaces the Prometheus manager module, did not follow the same naming rules applied in the Prometheus manager module. Due to this, the name of the metric for the same performance counter differed depending on the provider of the metric (Prometheus manager module or Ceph exporter).
With this fix, the Ceph exporter uses the same rules as the ones in the Prometheus manager module to generate metric names from Ceph performance counters. The metrics produced by Ceph exporter and Prometheus manager module are exactly the same.
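As a quick check, the metric names exposed by the two providers can be compared directly. The hostnames below are hypothetical, and the ports (9283 for the Prometheus manager module, 9926 for ceph-exporter) are common defaults that may differ in your deployment; the comparison is only illustrative.

# metric names exposed by the Prometheus manager module
curl -s http://mgr-host.example.com:9283/metrics | awk '/^# TYPE ceph_/ {print $3}' | sort -u > mgr-metrics.txt

# metric names exposed by the Ceph exporter on an OSD host
curl -s http://osd-host.example.com:9926/metrics | awk '/^# TYPE ceph_/ {print $3}' | sort -u > exporter-metrics.txt

# the performance counter metrics now follow the same naming rules in both lists
diff mgr-metrics.txt exporter-metrics.txt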
4.4. Ceph File System
mtime and change_attr are now updated for snapshot directory when snapshots are created
Previously, libcephfs clients would not update mtime and change_attr when snaps were created or deleted. Due to this, NFS clients could not correctly list CephFS snapshots within a CephFS NFS-Ganesha export.
With this fix, mtime and change_attr are updated for the snapshot directory, .snap, when snapshots are created, deleted, and renamed. Correct mtime and change_attr values ensure that listing snapshots does not return stale snapshot entries.
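For example, from an NFS client that has mounted a CephFS NFS-Ganesha export, the snapshot directory can be checked after creating or deleting a snapshot; the mount point is hypothetical.

# list the snapshots visible through the export and inspect the snapshot directory metadata
ls /mnt/cephfs-export/.snap
stat /mnt/cephfs-export/.snap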
cephfs-top -d [--delay] option accepts only integer values ranging from 1 to 25
Previously, the cephfs-top -d [--delay] option would not work properly due to the addition of a few new curses methods. The new curses methods accept only integer values, due to which an exception was thrown when a helper function returned float values.
With this fix, the cephfs-top -d [--delay] option accepts only integer values ranging from 1 to 25, and the cephfs-top utility works as expected.
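For example, the following refreshes the display every 5 seconds; any integer from 1 to 25 is accepted, while fractional values are rejected.

# run cephfs-top with a 5-second refresh interval
cephfs-top -d 5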
Creating the same dentries after the unlink finishes no longer crashes the MDS daemons
Previously, there was a race condition between the unlink and create operations. Due to this, if the previous unlink request was delayed for any reason and creating the same dentries was attempted during this time, the operation would fail by crashing the MDS daemons, or the new creation would succeed but the written content would be lost.
With this fix, users need to wait until the unlink finishes to avoid conflicts when creating the same dentries.
Non-existent cluster no longer shows up when running the ceph nfs cluster info CLUSTER_ID command
Previously, the existence of a cluster was not checked when the ceph nfs cluster info CLUSTER_ID command was run. Due to this, information about the non-existent cluster would be shown, such as a null virtual_ip and an empty backend.
With this fix, the ceph nfs cluster info CLUSTER_ID command checks for cluster existence, and an Error ENOENT: cluster does not exist error is returned when a non-existent cluster is queried.
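For example, with hypothetical cluster IDs:

# query an existing NFS cluster
ceph nfs cluster info mycluster

# querying a cluster that does not exist now returns:
# Error ENOENT: cluster does not exist
ceph nfs cluster info no-such-cluster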
The snap-schedule module no longer incorrectly refers to the volumes module
Previously, the snap-schedule module would incorrectly refer to the volumes module when attempting to fetch the subvolume path. Due to the use of an incorrect volumes module name and remote method name, an ImportError traceback would be seen.
With this fix, the untested and incorrect code is rectified, and the method is implemented and correctly invoked from the snap-schedule CLI interface methods. The snap-schedule module now correctly resolves the subvolume path when adding a subvolume-level schedule.
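A sketch of adding a subvolume-level schedule; the volume name, subvolume name, and interval are hypothetical.

# resolve the subvolume path, then add an hourly snapshot schedule at that path
SUBVOL_PATH=$(ceph fs subvolume getpath cephfs subvol1)
ceph fs snap-schedule add "$SUBVOL_PATH" 1h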
Integer overflow and ops_in_flight value overflow no longer happen
Previously, _calculate_ops would rely on the configuration option filer_max_purge_ops, which can also be modified on the fly. Due to this, if the value of ops_in_flight was set to more than what uint64 can hold, an integer overflow occurred, making ops_in_flight far greater than max_purge_ops and unable to go back to a reasonable value.
With this fix, the usage of filer_max_purge_ops in ops_in_flight is ignored, since it is already used in Filer::_do_purge_range(). Integer overflow and ops_in_flight value overflow no longer happen.
Invalid OSD requests are no longer submitted to RADOS
Previously, when the first dentry had enough metadata and the size was larger than max_write_size, an invalid OSD request would be submitted to RADOS. Due to this, RADOS would fail the invalid request, causing CephFS to become read-only.
With this fix, all OSD requests are filled with validated information before being sent to RADOS, and no invalid OSD requests cause CephFS to become read-only.
MDS now processes all stray directory entries
Previously, a bug in the MDS stray directory processing logic caused the MDS to skip processing a few stray directory entries. Due to this, the MDS would not process all stray directory entries, causing deleted files to not free up space.
With this fix, the stray index pointer is corrected, so that the MDS processes all stray directories.
Pool-level snaps for pools attached to a Ceph File System are disabled
Previously, pool-level snaps and mon-managed snaps had their own snap ID namespaces, which caused ID clashes, and the Ceph Monitor was unable to uniquely determine whether a given snap was a pool-level snap or a mon-managed snap. Due to this, the wrong snap could be deleted when referring to an ID that was present in both the set of pool-level snaps and the set of mon-managed snaps.
With this fix, pool-level snaps are disabled for pools attached to a Ceph File System, and no clash of snap IDs occurs. Hence, no unintentional data loss happens when a CephFS snap is removed.
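As a result, a pool-level snapshot request against a pool attached to CephFS is now rejected; for example, with hypothetical pool and snapshot names:

# this pool-level snapshot request fails for a pool that is attached to a CephFS file system
ceph osd pool mksnap cephfs.cephfs.data my-pool-snap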
Client requests no longer bounce indefinitely between MDS and clients
Previously, there was a mismatch between the Ceph protocols for client requests between the CephFS client and the MDS. Due to this, the corresponding information would be truncated or lost when communicating between CephFS clients and the MDS, and client requests would bounce indefinitely between the MDS and clients.
With this fix, the corresponding members in the client request protocol are made the same type, and the new code is made compatible with older Ceph versions. Client requests no longer bounce between the MDS and clients indefinitely, and stop after a bounded number of retries.
A code assert is added to the Ceph Manager daemon service to detect metadata corruption
Previously, a type of snapshot-related metadata corruption would be introduced by the manager daemon service for workloads running Postgres, and possibly others.
With this fix, a code assert is added to the manager daemon service which is triggered if a new corruption is detected. This reduces the proliferation of the damage, and allows the collection of logs to ascertain the cause.
If daemons crash after the cluster is upgraded to Red Hat Ceph Storage 6.1, contact Red Hat support for analysis and corrective action.
MDS daemons no longer crash due to sessionmap version mismatch issue
Previously, the MDS sessionmap journal log would not be correctly persisted when an MDS failover occurred. Due to this, when a new MDS tried to replay the journal logs, the sessionmap journal logs would mismatch with the information in the MDCache or the information from other journal logs, causing the MDS daemons to trigger an assert and crash.
With this fix, the sessionmap version is force replayed instead of crashing the MDS daemons, and MDS daemons no longer crash due to the sessionmap version mismatch issue.
MDS no longer gets indefinitely stuck while waiting for the cap revocation acknowledgement
Previously, if __setattrx() failed, _write() would retain the CEPH_CAP_FILE_WR caps reference, and the MDS would be indefinitely stuck waiting for the cap revocation acknowledgment. It would also cause other clients' requests to be stuck indefinitely.
With this fix, the CEPH_CAP_FILE_WR caps reference is released if __setattrx() fails, and the MDS caps revoke request no longer gets stuck.
4.5. The Ceph Volume utility
The correct size is calculated for each database device in ceph-volume
Previously, as of RHCS 4.3, ceph-volume would not make a single VG containing all database devices; instead, each database device had its own VG. Due to this, the database size was calculated differently for each LV.
With this release, the logic is updated to take the new LVM layout of database devices into account. The correct size is calculated for each database device.
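For example, a batch deployment that places OSD data on HDDs and shared database volumes on an NVMe device; the device paths are hypothetical.

# each data device receives a correctly sized DB logical volume on the NVMe device
ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/nvme0n1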
4.6. Ceph Object Gateway
Topic creation is now allowed with or without trailing slash
Previously, topic creation failed for HTTP endpoints with a trailing slash in the push-endpoint URL.
With this fix, topic creation is allowed with or without a trailing slash, and the topic is created successfully.
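A sketch of creating such a topic with the AWS CLI against the Ceph Object Gateway; the endpoint URLs and topic name are hypothetical.

# create a topic whose HTTP push-endpoint URL ends with a trailing slash
aws --endpoint-url http://rgw.example.com:8080 sns create-topic \
    --name mytopic \
    --attributes '{"push-endpoint": "http://notification-server.example.com:8080/"}'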
Blocksize is changed to 4K
Previously, Ceph Object Gateway GC processing would consume excessive time because a 1K blocksize was used to consume the GC queue. This caused slower processing of large GC queues.
With this fix, blocksize is changed to 4K, which has accelerated the processing of large GC queues.
Timestamp is sent in the multipart upload bucket notification event to the receiver
Previously, no timestamp was sent in the multipart upload bucket notification event. Due to this, the receiver of the event could not know when the multipart upload ended.
With this fix, the timestamp when the multipart upload ends is sent in the notification event to the receiver.
Object size and etag values are no longer sent as 0/empty
Previously, some object metadata would not be decoded before dispatching bucket notifications from the lifecycle. Due to this, object size and etag values were sent as 0/empty in notifications from lifecycle events.
With this fix, object metadata is fetched and values are now correctly sent with notifications.
Ceph Object Gateway recovers from Kafka broker disconnections
Previously, if the Kafka broker was down for more than 30 seconds, no reconnect was attempted after the broker came back up. Due to this, bucket notifications would not be sent, and eventually, after the queue filled up, S3 operations that required notifications would be rejected.
With this fix, the broker reconnect happens regardless of how long the broker was down, and the Ceph Object Gateway is able to recover from Kafka broker disconnects.
S3 PUT requests with chunked Transfer-Encoding do not require content-length
Previously, S3 clients that PUT objects with Transfer-Encoding: chunked, without providing the x-amz-decoded-content-length field, would fail. As a result, the S3 PUT requests would fail with a 411 Length Required HTTP status code.
With this fix, S3 PUT requests with chunked Transfer-Encoding need not specify a content-length, and S3 clients can perform S3 PUT requests as expected.
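A rough sketch of such a request with curl; authentication is omitted for brevity (a real request must be signed, for example through an SDK or a presigned URL), and the host, bucket, and object names are hypothetical.

# stream an object with chunked Transfer-Encoding and no x-amz-decoded-content-length header
cat large-object.bin | curl -T - \
    -H "Transfer-Encoding: chunked" \
    http://rgw.example.com:8080/mybucket/large-object.bin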
Users can now configure the remote S3 service with the right credentials
Previously, while configuring a remote cloud S3 object store service to transition objects, access keys starting with a digit were incorrectly parsed. Due to this, the object transition could fail.
With this fix, the keys are parsed correctly. Users can now configure the remote S3 service with the right credentials for transition.
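A sketch of configuring the cloud-s3 tier for transition with an access key that starts with a digit; the endpoint, credentials, and storage class name are hypothetical.

# configure the remote S3 endpoint and credentials for the cloud storage class
radosgw-admin zonegroup placement modify \
    --rgw-zonegroup default \
    --placement-id default-placement \
    --storage-class CLOUDTIER \
    --tier-config=endpoint=http://remote-s3.example.com:8080,access_key=1ABCDEFGHIJKLMNOPQRS,secret=SECRETKEYEXAMPLE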
4.7. Multi-site Ceph Object Gateway
Bucket attributes are no longer overwritten in the archive sync module
Previously, bucket attributes were overwritten in the archive sync module. Due to this, bucket policy or any other attributes would be reset when the archive zone sync_object() was executed.
With this fix, bucket attributes are no longer reset. Any bucket attribute set on the source replicates to the archive zone without being reset.
Bugzilla:1937618
Zonegroup is added to the bucket ARN in the notification event
Previously, the zonegroup was missing from the bucket ARN in the notification event. Due to this, when the notification events handler received events from multiple zonegroups, it was difficult to identify the source bucket of the event.
With this fix, the zonegroup is added to the bucket ARN, and a notification events handler receiving events from multiple zonegroups has all the required information.
bucket read_sync_status() command no longer returns a negative ret value
Previously, bucket read_sync_status() would always return a negative ret value. Due to this, the bucket sync marker command would fail with ERROR: sync.read_sync_status() returned error=0.
With this fix, the actual ret value from the bucket read_sync_status() operation is returned and the bucket sync marker command runs successfully.
New bucket instance information is stored on the newly created bucket
Previously, in the archive zone, a new bucket would be created when a source bucket was deleted, in order to preserve the archived versions of objects. However, the new bucket instance information would be stored in the old instance, rendering the new bucket on the archive zone inaccessible.
With this fix, the bucket instance information is stored in the newly created bucket. Deleted buckets on source are still accessible in the archive zone.
Segmentation fault no longer occurs when a bucket has a num_shards value of 0
Previously, multi-site sync would result in segmentation faults when a bucket had a num_shards value of 0. This resulted in inconsistent sync behavior and segmentation faults.
With this fix, num_shards=0 is properly represented in data sync, and buckets with a shard value of 0 no longer have any issues with syncing.
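To check the shard count of a bucket that is being synced, for example (the bucket name is hypothetical):

# the num_shards field in the output shows the bucket's shard count
radosgw-admin bucket stats --bucket=mybucket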
4.8. RADOS
Upon querying the IOPS capacity for an OSD, only the configuration option that matches the underlying device type shows the measured/default value
Previously, the osd_mclock_max_capacity_iops_[ssd|hdd] values were set depending on the OSD's underlying device type. The configuration options also had default values that were displayed when queried. For example, if the underlying device type for an OSD was SSD, the default value for the HDD option, osd_mclock_max_capacity_iops_hdd, was also displayed with a non-zero value. Due to this, displaying values for both the HDD and SSD options of an OSD when queried caused confusion regarding which option to interpret.
With this fix, the IOPS capacity-related configuration option of the OSD that matches the underlying device type is set, and the alternate/inactive configuration option is set to 0. When a user queries the IOPS capacity for an OSD, only the configuration option that matches the underlying device type shows the measured/default value. The alternate/inactive option is set to 0 to clearly indicate that it is disabled.
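For example, for an OSD backed by an SSD (the OSD ID is hypothetical), only the SSD option reports a measured or default value, while the HDD option reports 0:

# shows the measured or default IOPS capacity for the matching device type
ceph config show osd.0 osd_mclock_max_capacity_iops_ssd

# shows 0 for the non-matching device type, indicating that it is disabled
ceph config show osd.0 osd_mclock_max_capacity_iops_hdd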
4.9. RBD Mirroring
Error message when enabling image mirroring within a namespace now provides more insight
Previously, attempting to enable image mirroring within a namespace would fail with a "cannot enable mirroring in current pool mirroring mode" error. The error neither provided insight into the problem nor suggested a solution.
With this fix, to provide more insight, the error handling is improved and the error now states "cannot enable mirroring: mirroring is not enabled on a namespace".
Snapshot mirroring no longer halts permanently
Previously, if a primary snapshot creation request was forwarded to the rbd-mirror daemon and the rbd-mirror daemon was terminated for some reason before marking the snapshot as complete, the primary snapshot would remain permanently incomplete. This is because, upon retrying the primary snapshot creation request, librbd would notice that such a snapshot already existed, without checking whether this "pre-existing" snapshot was complete. Due to this, the mirroring of snapshots was permanently halted.
With this fix, as part of the next mirror snapshot creation, including one triggered by the scheduler, checks are made to ensure that any incomplete snapshots are deleted so that mirroring resumes.