Chapter 6. Notable Bug Fixes
This section describes bugs fixed in this release of Red Hat Ceph Storage that have significant impact on users.
OSD operations are no longer blocked when the osd_scrub_sleep option is set
Previously, the scrubbing operation was moved into the unified operations queue, but the osd_scrub_sleep option was not updated accordingly. Consequently, setting the osd_scrub_sleep option blocked OSD operations. With this update, the scrub sleep operation is performed asynchronously, so setting the osd_scrub_sleep option no longer blocks OSD operations.
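For example, the scrub sleep interval can be adjusted at runtime with a command of the following form; the 0.1 second value is only illustrative:
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'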
Some OSDs fail to come up after reboot
Previously, on a machine with more than five OSDs, some OSDs failed to come up after a reboot because the systemd unit for the ceph-disk utility timed out after 120 seconds. With this release, the ceph-disk code no longer fails if the OSD udev rule triggers before the /var/ directory is mounted.
Swift POST operations no longer generate random 500 errors
Previously, when making changes to the same bucket through multiple Ceph Object Gateways under heavy load, the Ceph Object Gateway could return a 500 error in certain circumstances. With this release, the chances of triggering the underlying race condition have been reduced.
In containerized deployments, ceph-disk now retries when it cannot find a partition
Due to a possible race condition between the kernel and the udev utility, the device file for a partition was sometimes not yet created when the ceph-disk utility searched for it. Consequently, the ceph-disk utility could not create the partition. With this release, when ceph-disk cannot find the partition, it retries the search, thus working around this possible race condition.
Stale bucket index entries are no longer left over after object deletions
Previously, under certain circumstances, deleted objects were incorrectly interpreted as incomplete delete transactions because of an incorrect time comparison. As a consequence, the delete operations were reported as successful in the Ceph Object Gateway logs, but the deleted objects were not correctly removed from bucket indexes. The incorrect time comparison has been fixed, and deleting objects works correctly.
The Ceph Object Gateway no longer refuses an S3 upload when the Content-Type field is missing
When doing a Simple Storage Service (S3) upload, if the Content-Type field was missing from the policy part of the upload, then the Ceph Object Gateway refused the upload with a 403 error:
Policy missing condition: Content-Type
With this update, the S3 POST policy does not require the Content-Type field.
The OSD no longer times out when backfilling
Previously, backfilling objects with hundreds of thousands of omap entries could cause the OSD to time out. With this release, the backfilling process now reads a maximum of 8096 omap entries at a time, allowing the OSDs to continue running.
Swift buckets can now be accessed anonymously
Previously, it was not possible to access a Swift bucket anonymously, even if the permissions allowed it, because the logic for anonymous access to Swift buckets was missing. With this release, that logic has been added. As a result, Swift buckets can now be accessed anonymously if the permissions allow it.
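For example, assuming the python-swiftclient command-line tool, a container can be opened to anonymous reads by setting a read ACL of the following form; the container name is illustrative:
swift post -r '.r:*' mycontainer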
Unreconstructable object errors are now handled properly
During a backfill or recovery operation of an erasure coded pool, the unreconstructable object errors were not handled properly. As a consequence, the OSDs terminated unexpectedly with the following message:
osd/ReplicatedPG.cc: recover_replicas: object added to missing set for backfill, but is not in recovering, error!
With this update, the error handling has been corrected and two new placement group (PG) states have been added: recovery_unfound and backfill_unfound. As a result, the OSD does not terminate unexpectedly, and the PG state indicates that the PG contains unfound objects.
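For example, placement groups with unfound objects can be inspected with commands of the following form; the PG ID is illustrative:
ceph health detail
ceph pg 2.5 list_unfound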
The Ceph Object Gateway successfully starts after upgrading Red Hat OpenStack Platform 11 to 12
Previously, when upgrading Red Hat OpenStack Platform 11 to 12, the Ceph Object Gateway would fail to start because port 8080 was already in use by haproxy. With this release, you can specify the IP address and port bindings for the Ceph Object Gateway. As a result, the Ceph Object Gateway will start properly.
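For example, assuming the default civetweb frontend, the bind address and port can be set in the Ceph configuration file; the section name and address are illustrative:
[client.rgw.gateway-node1]
rgw_frontends = "civetweb port=192.168.0.1:8080"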
Objects eligible for expiration are no longer infrequently passed over
Previously, due to an off-by-one error in expiration processing in the Ceph Object Gateway, objects eligible for expiration could infrequently be passed over, and consequently were not removed. The underlying source code has been modified, and the objects are no longer passed over.
Folders starting with an underscore (_) are not in the bucket index
Previously, a server-side copy mishandled object names starting with an underscore, which led to objects being created with two leading underscores. The Ceph Object Gateway code has been fixed to properly handle leading underscores. As a result, object names with leading underscores behave correctly.
Fixed a memory leak with the Ceph Object Gateway
A buffer used to transfer incoming PUT data was incorrectly sized at the maximum chunk value of 4 MB. This led to a leak of unused buffer space when PUT requests for smaller objects were processed, so the Ceph Object Gateway could leak significant amounts of memory when handling large numbers of PUT requests smaller than 4 MB. With this update, the buffer sizing logic has been fixed, and the leak no longer occurs.
ceph-ansible now disables the Ceph Object Gateway service as expected when upgrading the OpenStack container
When upgrading the OpenStack container from version 11 to 12, the ceph-ansible utility did not properly disable the Ceph Object Gateway service provided by the overcloud image. Consequently, the containerized Ceph Object Gateway service entered a failed state because the port it used was already bound. The ceph-ansible utility has been updated to properly disable the system Ceph Object Gateway service. As a result, the containerized Ceph Object Gateway service starts as expected after upgrading the OpenStack container from version 11 to 12.
The ceph-ansible utility no longer fails when upgrading from Red Hat OpenStack Platform 11 to 12
Ceph containers set their socket name based on either the short host name or the fully qualified domain name (FQDN). Previously, the ceph-ansible utility could not predict which host name type would be used. With this release, ceph-ansible checks for both socket names.
Relocated some OSD options in the rolling_update.yml Ceph Ansible playbook
Previously, when doing a minor Ceph upgrade, for example from version 10.2.9 to 10.2.10, the noout, noscrub, and nodeep-scrub OSD options did not get applied. They were set in the mgr section of the rolling_update.yml file, which was skipped because the Ceph Manager daemon does not exist in these versions. With this release, the OSD options are set properly after all the Ceph Monitors have been upgraded.
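For reference, these options can also be set and cleared manually with the following commands:
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
After the upgrade, each flag is removed with the corresponding ceph osd unset command.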
Slow OSD startup after upgrading to Red Hat Ceph Storage 2.5
Ceph Storage Clusters that have large omap databases experience slow OSD startup due to scanning and repairing during the upgrade from Red Hat Ceph Storage 2.4 to 2.5. The rolling update may take longer than the specified timeout of 5 minutes. Before running the Ansible rolling_update.yml playbook, set the handler_health_osd_check_delay option to 180 in the group_vars/all.yml file.
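For example, add the following line to the group_vars/all.yml file before starting the rolling update:
handler_health_osd_check_delay: 180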
An inconsistent PG state can reappear long after the PG was repaired
In rare circumstances after a PG has been repaired and the primary changes, the inconsistent state can falsely reappear, even without a scrub being performed. This bug fix cleans up stray scrub error counts to prevent this.
(BZ#1550892)
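For reference, an inconsistent placement group is typically repaired with a command of the following form; the PG ID is illustrative:
ceph pg repair 2.5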
The expected_num_objects option was not working as expected
Previously, when using the ceph osd pool create command with the expected_num_objects option, placement group (PG) directories were not pre-created at pool creation time as expected, resulting in performance drops when filestore splitting occurred. With this update, the expected_num_objects parameter is now passed through to filestore correctly, and PG directories for the expected number of objects are pre-created at pool creation time.
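For example, a replicated pool expected to hold roughly one million objects can be created with a command of the following form; the pool name, PG counts, and CRUSH rule name are illustrative:
ceph osd pool create mypool 128 128 replicated replicated_rule 1000000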
The Ceph Object Gateway handles requests for negative byte-range objects correctly
The Ceph Object Gateway was treating negative byte-range object requests as invalid, whereas such requests succeed and return the whole object in AWS S3. As a consequence, applications that expected the AWS behavior for negative or other invalid range requests saw unexpected errors and possible failure. With this update, a new option, rgw_ignore_get_invalid_range, was added to the Ceph Object Gateway. When the rgw_ignore_get_invalid_range option is set to true (non-default), the Ceph Object Gateway behavior for invalid range requests is backward compatible with AWS.
(BZ#1576487)
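For example, the AWS-compatible behavior can be enabled in the Ceph configuration file under the gateway section; the section name is illustrative:
[client.rgw.gateway-node1]
rgw_ignore_get_invalid_range = true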
Fixed the Ceph Object Gateway multiple site sync for versioned buckets and objects
Previously, internal Ceph Object Gateway multi-site sync logic behaved incorrectly in some scenarios when attempting to sync containers with S3 object versioning enabled, in particular when a new object upload was followed immediately by an attribute or ACL setting operation. Objects in versioning-enabled containers would fail to sync in some scenarios, for example when using the s3cmd sync command to mirror a filesystem directory. With this update, the Ceph Object Gateway multi-site replication logic has been corrected for the known failure cases.
(BZ#1578401)(BZ#1584763)
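For example, mirroring a local directory to a bucket with a command of the following form exercises this code path; the path and bucket name are illustrative:
s3cmd sync ./localdir/ s3://mybucket/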
Improved performance of the Ceph Object Gateway when using SSL
Previously, the Ceph Object Gateway did not reuse libcurl data structures, which made requests slower and increased the memory footprint. With this update, the Ceph Object Gateway reuses the libcurl data structures, which improves SSL efficiency when authenticating with OpenStack Keystone over SSL.
(BZ#1578670)
Update to the ceph-disk unit files
The transition to containerized Ceph left some ceph-disk unit files behind. The files were harmless, but appeared as failing, which could be distressing to the operator. With this update, executing the switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook disables the ceph-disk unit files too.
The listing of versioned-bucket objects was returning an additional entry
When listing large versioning-enabled buckets with a marker, that marker was also listed, and listings included an extra duplicate entry for every 1000 objects in the bucket. With this update, the marker is no longer included in bucket listings, and the correct number of entries is returned.
Cache entries were not refreshing as expected
The new time-based metadata cache entry expiration logic did not include logic to update the expiration time on already-cached entries being updated in place. Cache entries became permanently stale after expiration, leading to a performance regression because metadata objects were effectively not cached and were always read from the cluster. With this update, logic was added to update the expiration time of cached entries when they are updated.
The Ceph Object Gateway was using large amounts of CPU
Because of the linkage between the Ceph Object Gateway and tcmalloc, a bug was found that caused high CPU utilization when tcmalloc tried to reclaim space for specific workloads. This bug appears to be specific to the tcmalloc version. The linkage between the Ceph Object Gateway and tcmalloc has been reverted for Red Hat Ceph Storage 2.
(BZ#1591455)
The Ceph Object Gateway retry logic no longer causes high CPU utilization
A bug in the Ceph Object Gateway retry logic could cause a non-terminating condition in operations processing, which could lead to high CPU utilization and other, less noticeable, side effects. This high CPU utilization triggered a "busy loop" in various workloads. With this update, the error-handling logic has been fixed to handle this condition.
Reduced OSD memory usage for Ceph Object Gateway workloads
OSD memory usage has been tuned to reduce unnecessary consumption, especially for Ceph Object Gateway workloads.
The Ceph Monitor now marks an OSD as down when its cluster network is down
In some cases, when a new OSD was added to an existing storage cluster, the OSD heartbeat peers were not updated. This happened if the new OSD did not get any PGs mapped to it. With this fix, the existing OSDs refresh their heartbeat peers when a new OSD joins the storage cluster. As a result, the Ceph Monitor marks the OSD daemon as down if the OSD node’s cluster network is down.
(BZ#1488389)
Better recovery from cache inconsistency for the Ceph Object Gateway nodes
When the Ceph Object Gateway nodes were experiencing a heavy load, cache transmission updates sent through the watch or notify requests were being lost. Consequently, certain Ceph Object Gateway nodes were left with stale cache data, which caused several problems, notably that the bucket index contained old data. With this update, secondary cache coherency mechanisms have been added:
- Bucket operations now look for errors that indicate cache inconsistency.
- Ability to forcibly refresh has been added.
- A timeout that bounds the age of any cache entry has been added, forcing an eventual refresh.
As a result, the Ceph Object Gateway nodes recover from cache inconsistency rather than entering a persistent, user-visible error state.
(BZ#1491723)