Chapter 5. Notable Bug Fixes
This section describes bugs fixed in this release of Red Hat Ceph Storage that have significant impact on users.
Calamari now correctly handles manually added OSDs that do not have "ceph-osd" running
Previously, when OSD nodes were added manually to the Calamari server but the ceph-osd daemon was not started on the nodes, the Calamari server returned error messages and stopped updating statuses for the rest of the OSD nodes. The underlying source code has been modified, and Calamari now handles such OSDs properly. (BZ#1360467)
OSDs no longer reboot when corrupted snapsets are found during scrubbing
Previously, Ceph incorrectly handled corrupted snapsets found during scrubbing. This behavior caused the OSD nodes to terminate unexpectedly every time such snapsets were detected. As a consequence, the OSDs rebooted every few minutes. With this update, the underlying source code has been modified, and OSDs no longer reboot in the described situation. (BZ#1273127)
OSD now deletes old OSD maps as expected
When new OSD maps are received, the OSD daemon marks the unused OSD maps as stale and deletes them to keep up with the changes. Previously, an attempt to delete stale OSD maps could fail for various reasons. As a consequence, certain OSD nodes were sometimes marked as down if it took too long to clean their OSD map caches when booting. With this update, the OSD daemon deletes old OSD maps as expected, thus fixing this bug. (BZ#1291632)
%USED now shows correct value
Previously, the %USED column in the output of the ceph df command erroneously showed the size of a pool divided by the raw space available on the OSD nodes. With this update, the column correctly shows the space used by all replicas divided by the raw space available on the OSD nodes. (BZ#1330643)
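The difference between the two calculations can be illustrated with a minimal sketch; the function names and the 3x replication figure below are hypothetical, since the real computation happens inside the Ceph monitor code:

```python
# Illustrative sketch of the %USED change described above; names and the
# replication factor are hypothetical, not taken from Ceph source code.

def percent_used_old(pool_bytes: int, raw_bytes_available: int) -> float:
    """Pre-fix behavior: pool size (a single copy) divided by raw space."""
    return 100.0 * pool_bytes / raw_bytes_available

def percent_used(replica_bytes: int, raw_bytes_available: int) -> float:
    """Post-fix behavior: space used by all replicas divided by raw space."""
    return 100.0 * replica_bytes / raw_bytes_available

# With 3x replication, a 10 GiB pool actually consumes 30 GiB of raw space:
pool = 10 * 2**30
raw_available = 300 * 2**30
print(percent_used_old(pool, raw_available))   # understates real usage
print(percent_used(3 * pool, raw_available))   # counts all three replicas
```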
SELinux no longer prevents "ceph-mon" and "ceph-osd" from accessing /var/lock/ and /run/lock/
Due to insufficient SELinux policy rules, SELinux denied the ceph-mon and ceph-osd daemons access to files in the /var/lock/ and /run/lock/ directories. With this update, SELinux no longer prevents ceph-mon and ceph-osd from accessing /var/lock/ and /run/lock/. (BZ#1330279)
The QEMU process no longer hangs when creating snapshots on images
When the RADOS Block Device (RBD) cache was enabled, creating a snapshot on an image with active I/O operations could cause the QEMU process to become unresponsive. With this update, the QEMU process no longer hangs in the described scenario. (BZ#1316287)
"ceph-deploy" now correctly removes directories of manually added monitors
Previously, an attempt to remove a manually added monitor node by using the ceph-deploy mon destroy command failed with the following error:
UnboundLocalError: local variable 'status_args' referenced before assignment
The monitor was removed despite the error; however, ceph-deploy failed to remove the monitor configuration directory located in the /var/lib/ceph/mon/ directory. With this update, ceph-deploy removes the monitor directory as expected. (BZ#1278524)
The least used OSDs are selected for increasing the weight
With this update, the least used OSD nodes are now selected for increasing the weight during the reweight-by-utilization process. (BZ#1333907)
OSDs are now selected properly during "reweight-by-utilization"
During the reweight-by-utilization process, some of the OSD nodes that met the criteria for reweighting were not selected. The underlying algorithm has been modified, and OSDs are now selected properly during reweight-by-utilization. (BZ#1331764)
OSDs no longer receive unreasonably large weight during "reweight-by-utilization"
When the value of the max_change parameter was greater than an OSD weight, an underflow occurred. Consequently, the OSD node could receive an unreasonably large weight during the reweight-by-utilization process. This bug has been fixed, and OSDs no longer receive an unreasonably large weight in the described situation. (BZ#1331523)
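The underflow can be sketched as follows. This is an illustrative model, assuming 32-bit unsigned arithmetic on 16.16 fixed-point weights (the representation Ceph uses for CRUSH weights); the function names are hypothetical, not the actual patched code:

```python
# Illustrative model of the reweight underflow described above; assumes
# 32-bit unsigned arithmetic on 16.16 fixed-point weights. Hypothetical names.

FIXED_ONE = 0x10000   # 1.0 as a 16.16 fixed-point weight
U32_MASK = 0xFFFFFFFF

def reweight_buggy(weight: int, max_change: int) -> int:
    # Unsigned subtraction wraps around when max_change > weight,
    # yielding an enormous weight instead of a small one.
    return (weight - max_change) & U32_MASK

def reweight_fixed(weight: int, max_change: int) -> int:
    # The fix: never subtract more than the current weight.
    return weight - min(max_change, weight)

weight = FIXED_ONE // 2        # current weight 0.5
max_change = FIXED_ONE         # max_change 1.0, larger than the weight
print(hex(reweight_buggy(weight, max_change)))  # wraps to a huge value
print(reweight_fixed(weight, max_change))       # clamped to 0
```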
OSDs no longer crash when using "rados cppool" to copy an "omap" object
omap objects cannot be stored in an erasure-coded pool. Previously, copying omap objects from a replicated pool to an erasure-coded pool by using the rados cppool command caused the OSD nodes to terminate unexpectedly. With this update, the OSD nodes return an error message instead of crashing in the described situation. (BZ#1368402)
Listing versioned buckets no longer hangs
Due to a bug in the bucket listing logic, the radosgw-admin bucket list and radosgw-admin bucket stats commands could become unresponsive while attempting to list versioned buckets or get their statistics. This bug has been fixed, and listing versioned buckets no longer hangs in the described situation. (BZ#1322239)
Ceph Object Gateway now properly uploads files to erasure-coded pools
Under certain conditions, Ceph Object Gateway did not properly upload files to an erasure-coded pool by using the SWIFT API. Consequently, such files were broken and an attempt to download them failed with the following error message:
ERROR: got unexpected error when trying to read object: -2
The underlying source code has been modified, and Ceph Object Gateway now properly uploads files to erasure-coded pools. (BZ#1369013)
The "ceph osd tell" command now prints a correct error message
When the deprecated ceph osd tell command was executed, the command returned a misleading error message. With this update, the error message is correct. (BZ#1193710)
"filestore_merge_threshold" can be set to a negative value as expected
If the filestore_merge_threshold parameter is set to a negative value, merging of subdirectories is disabled. Previously, an attempt to set filestore_merge_threshold to a negative value by using the command line failed, and an error message similar to the following one was returned:
"error": "error setting 'filestore_merge_threshold' to '-40': (22) Invalid argument"
As a consequence, it was not possible to disable merging of subdirectories. This bug has been fixed, and filestore_merge_threshold can now be set to a negative value as expected. (BZ#1284696)
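For example, disabling subdirectory merging persistently might look like the following ceph.conf fragment; the value -40 is illustrative, taken from the error message above:

```ini
[osd]
filestore_merge_threshold = -40
```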
"radosgw-admin region-map set" output includes the bucket quota
Previously, the output of the radosgw-admin region-map set command did not include the bucket quota, which led to confusion about whether the quota was properly set. With this update, the radosgw-admin region-map set output includes the bucket quota as expected. (BZ#1349484)
The form of the "by-parttypeuuid" term is now correct
The ceph-disk(8) manual page and the ceph-disk Python script now include the correct form of the by-parttypeuuid term. Previously, they included by-parttype-uuid instead. (BZ#1335564)
Index files are removed as expected after deleting buckets
Previously, when deleting buckets, the buckets' index files remained in the .rgw.buckets.index pool. With this update, the index files are removed as expected. (BZ#1340496)
"ceph df" now shows proper value of "MAX AVAIL"
When adding a new OSD node to the cluster by using the ceph-deploy utility with the osd_crush_initial_weight option set to 0, the value of the MAX AVAIL field in the output of the ceph df command was 0 for each pool instead of the proper numerical value. As a consequence, other applications using Ceph, such as OpenStack Cinder, assumed that there was no space available to provision new volumes. This bug has been fixed, and ceph df now shows the proper value of MAX AVAIL as expected. (BZ#1306842)
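For reference, the configuration that triggered this behavior is typically set in ceph.conf; a minimal illustration:

```ini
[global]
osd_crush_initial_weight = 0
```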
The columns in the "rados bench" command output are now separated correctly
This update ensures that the columns in the rados bench command output are separated correctly. (BZ#1332470)
OSDs now obtain PID files properly during an upgrade
After upgrading from Red Hat Ceph Storage 1.2 to 1.3, some of the OSD daemons did not obtain PID files properly. As a consequence, such OSDs could not be restarted or stopped by using SysVinit commands and therefore could not be upgraded to the newer version. This update ensures that OSDs obtain PID files properly during an upgrade. As a result, OSDs are upgraded to newer versions as expected. (BZ#1299409)
The default value of "osd_scrub_thread_suicide_timeout" is now 300
The osd_scrub_thread_suicide_timeout configuration option ensures that poorly behaving OSD nodes self-terminate instead of running in degraded states and slowing traffic. Previously, the default value of osd_scrub_thread_suicide_timeout was set to 60 seconds. This value was not sufficient when scanning data for objects on extremely large buckets. This update increases the default value of osd_scrub_thread_suicide_timeout to 300 seconds. (BZ#1300539)
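The new default is equivalent to the following ceph.conf fragment; setting it explicitly is only needed when overriding the default:

```ini
[osd]
osd_scrub_thread_suicide_timeout = 300
```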
PG collection split no longer produces any orphaned files
Due to a bug in the underlying source code, a placement group (PG) collection split could produce orphaned files. Consequently, the PG could be incorrectly marked as inconsistent during scrubbing, or the OSD nodes could terminate unexpectedly. The bug has been fixed, and PG collection split no longer produces any orphaned files. (BZ#1334534)
The bucket owner is now properly changed
Previously, the bucket owner was not properly changed by using the radosgw-admin bucket unlink and radosgw-admin bucket link commands. As a consequence, the new owner was not able to access the bucket. The underlying source code has been modified, and the bucket owner is now properly changed as expected. (BZ#1324497)
The monitor nodes exit gracefully after authenticating with an incorrect keyring
When a new cluster included monitor nodes that were previously part of another cluster, the monitor nodes terminated with a segmentation fault when attempting to authenticate with an incorrect keyring. With this update, the monitor nodes exit gracefully instead of crashing in the described scenario. (BZ#1312587)