Chapter 5. Known Issues
This section documents known issues found in this release of Red Hat Ceph Storage.
Realm names must be updated on each cluster separately
In a multi-site configuration, the name of a realm is only stored locally and is not shared as part of the period. As a consequence, when the name is changed on one cluster, the name is not updated on the other cluster. To rename the realm, execute the radosgw-admin realm rename command separately on each cluster. (BZ#1423886)
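The per-cluster rename can be scripted. The following is a minimal sketch that only builds the commands and does not run them. It assumes one administrative host per cluster that is reachable over SSH; the host names and realm names are hypothetical placeholders, and the --rgw-realm and --realm-new-name options should be checked against the radosgw-admin help output of the installed version.

```python
import shlex


def realm_rename_commands(admin_hosts, old_name, new_name):
    """Build one `radosgw-admin realm rename` invocation per cluster.

    admin_hosts is a hypothetical list with one admin host per cluster.
    """
    rename = [
        "radosgw-admin", "realm", "rename",
        f"--rgw-realm={old_name}",
        f"--realm-new-name={new_name}",
    ]
    # Quote each argument so the command survives the remote shell.
    remote = " ".join(shlex.quote(arg) for arg in rename)
    return [["ssh", host, remote] for host in admin_hosts]


# Hypothetical hosts and realm names, for illustration only.
for argv in realm_rename_commands(["site-a-admin", "site-b-admin"], "gold", "platinum"):
    print(" ".join(argv))
```

Each returned list can be passed to subprocess.run once the host names have been replaced with real ones.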
Calamari sometimes fails to discover some cluster nodes
The Calamari API sometimes fails to discover some cluster nodes. To work around this problem, restart the Calamari service:
# systemctl restart calamari.service
Multi-site configuration of the Ceph Object Gateway sometimes fails when options are changed at runtime
When the rgw md log max shards and rgw data log num shards options are changed at runtime in a multi-site configuration of the Ceph Object Gateway, the radosgw process terminates unexpectedly with a segmentation fault.
To avoid this issue, do not change the aforementioned options at runtime, but set them during the initial configuration of the Ceph Object Gateway. (BZ#1330952)
Dynamic feature updates are not replicated
When a feature is disabled or enabled on an already existing image and the image is mirrored to a peer cluster, the feature is not disabled or enabled on the replicated image. (BZ#1344262)
Unable to write data on a promoted image after a non-orderly shutdown
In an RBD mirroring configuration, after a non-orderly shutdown of the local cluster, images are demoted to non-primary on the local cluster and promoted to primary on the remote cluster. If this happens and the rbd-mirror daemon is not restarted on the remote cluster, it is not possible to write data on the promoted image because the rbd-mirror daemon considers the demoted image on the local cluster to be the primary one. To avoid this issue, restart the rbd-mirror daemon on the remote cluster to gain read/write access to the promoted image. (BZ#1365648)
Mirroring image metadata is not supported
Image metadata are not currently replicated to a peer cluster. (BZ#1344212)
Disabling image features is incorrectly allowed on non-primary images
With RADOS Block Device (RBD) mirroring enabled, non-primary images are expected to be read-only. An attempt to disable image features on non-primary images could cause an indefinite wait. This operation should be disallowed on non-primary images.
To avoid this issue, make sure to disable image features only on the primary image. (BZ#1353877)
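Before disabling a feature, a script can check whether the image is the primary one. The sketch below parses text in the style of the rbd info output. It assumes that the output contains a "mirroring primary: true" or "mirroring primary: false" line for mirrored images; that line format and the sample text are assumptions, so verify them against the installed rbd version.

```python
def is_primary(rbd_info_text):
    """Return True or False from a 'mirroring primary' line, or None if absent."""
    for line in rbd_info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "mirroring primary":
            return value.strip() == "true"
    return None  # no mirroring information found


# Hypothetical sample; real output contains more fields.
sample = """rbd image 'image1':
    mirroring state: enabled
    mirroring primary: false
"""

if is_primary(sample) is not True:
    print("image is not primary: do not disable features here")
```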
Users created by using the Calamari API do not have permissions to run the API commands
When a user is created by using the Calamari REST API (api/v2/user), the user does not have permissions to run most of the Calamari API commands. Consequently, an attempt to run the commands fails with the following error message:
"You do not have permission to perform this action"
To work around this issue, use the calamari-ctl add_user command from the command line when creating new users. (BZ#1356872)
The GNU tar utility currently cannot extract archives directly into the Ceph Object Gateway NFS mounted file systems
The current version of the GNU tar utility makes overlapping write operations when extracting files. This behavior breaks the strict sequential write restriction in the current version of the Ceph Object Gateway NFS. GNU tar reports the resulting errors in the usual way, but by default it continues extracting files after reporting them. As a result, the extracted files can contain incorrect data.
To work around this problem, use alternate programs to copy file hierarchies into the Ceph Object Gateway NFS. Recursive copying by using the cp -r command works correctly. Non-GNU archive utilities might be able to correctly extract the tar archives, but none have been verified. (BZ#1418606)
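One way to apply the cp -r workaround to an existing archive is to extract it to local storage first and then copy the result into the NFS mount. The following is a minimal sketch under that assumption. The function name and both paths are hypothetical, and only the final cp -r step touches the Ceph Object Gateway NFS file system.

```python
import subprocess
import tarfile
import tempfile


def extract_via_staging(archive_path, nfs_target_dir):
    """Extract an archive locally, then copy it to the NFS mount with `cp -r`."""
    with tempfile.TemporaryDirectory() as staging:
        with tarfile.open(archive_path) as archive:
            # Only use with trusted archives: members are extracted as-is.
            archive.extractall(staging)
        # "staging/." copies the directory contents, not the directory itself.
        subprocess.run(["cp", "-r", staging + "/.", nfs_target_dir], check=True)


# Hypothetical usage:
# extract_via_staging("/tmp/data.tar", "/mnt/rgw-nfs/mybucket")
```

The staging directory needs enough local space to hold the extracted archive.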
Ansible fails to install OSDs if they point to directories
Ansible does not support installation of OSDs that point to directories and not to partitions. As a consequence, an attempt to install such OSDs fails. (BZ#1361228)
Results from deep scrubbing are overwritten by shallow scrubbing
When performing shallow scrubbing after deep scrubbing, results from deep scrubbing are overwritten by results from shallow scrubbing. As a consequence, the deep scrubbing results are lost. (BZ#1330023)
The NFS interface for the Ceph Object Gateway does not show bucket size or number of blocks
The NFS interface of the Ceph Object Gateway lists buckets as directories. However, the interface always shows that the directory size and the number of blocks are 0, even if some data is written to the buckets. (BZ#1359408)
Certain image features are not supported with the RBD kernel module
The following image features are not supported with the current version of the RADOS Block Device (RBD) kernel module (krbd) that is included in Red Hat Enterprise Linux 7.3:
- object-map
- deep-flatten
- journaling
- fast-diff
However, by default the ceph-installer utility creates RBDs with the aforementioned features enabled. As a consequence, an attempt to map the kernel RBDs by running the rbd map command fails.
To work around this issue, disable the unsupported features by setting the rbd_default_features = 1 option in the Ceph configuration file for kernel RBDs or dynamically disable them by running the following command:
rbd feature disable <image> <feature>
This issue is a limitation only in kernel RBDs, and the features work as expected with user-space RBDs. (BZ#1340080)
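The rbd_default_features value is a bit mask, which explains why a value of 1 avoids the problem: it enables only the layering feature. The sketch below decodes a mask by using the feature bit values defined by Ceph and reports which enabled features are on the unsupported list above. The helper names are illustrative, and 61 is used only as an example of a mask with several features enabled.

```python
# RBD feature bits as defined by Ceph.
RBD_FEATURE_BITS = {
    "layering": 1,
    "striping": 2,
    "exclusive-lock": 4,
    "object-map": 8,
    "fast-diff": 16,
    "deep-flatten": 32,
    "journaling": 64,
}

# Features listed above as unsupported by krbd in Red Hat Enterprise Linux 7.3.
KRBD_UNSUPPORTED = {"object-map", "deep-flatten", "journaling", "fast-diff"}


def decode_features(mask):
    """Return the names of the features enabled in a feature bit mask."""
    return [name for name, bit in RBD_FEATURE_BITS.items() if mask & bit]


def features_blocking_krbd(mask):
    """Return the enabled features that prevent mapping with krbd."""
    return sorted(KRBD_UNSUPPORTED.intersection(decode_features(mask)))


print(decode_features(1))          # ['layering']
print(features_blocking_krbd(61))  # ['deep-flatten', 'fast-diff', 'object-map']
```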
Swift SLOs cannot be read from any other zones
The Ceph Object Gateway fails to fetch manifest files of Swift Static Large Objects (SLO). As a consequence, an attempt to read these objects from any other zone than the zone where the object was originally uploaded fails. (BZ#1423858)
The Calamari REST-based API fails to edit user details
An attempt to use the Calamari REST-based API to edit user details fails with an error. To change user details, use the calamari-ctl command-line utility. (BZ#1338649)
The rbd bench-write command fails when --io-size is equal to the image size
The rbd bench-write --io-size <size> <image> command fails with a segmentation fault if the size specified by the --io-size option is equal to the image size.
To avoid this problem, make sure that the value of --io-size is smaller than the image size. (BZ#1362014)
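A wrapper script can enforce this restriction before invoking the benchmark. The following is a minimal sketch that builds the command only if the I/O size is smaller than the image size. The function name is hypothetical, and both sizes are assumed to be given in bytes.

```python
def bench_write_command(image, io_size, image_size):
    """Build an `rbd bench-write` command, refusing unsafe I/O sizes."""
    if io_size >= image_size:
        raise ValueError(
            f"--io-size ({io_size}) must be smaller than the image size ({image_size})"
        )
    return ["rbd", "bench-write", "--io-size", str(io_size), image]


# Hypothetical image name and sizes: 4 KiB writes against a 1 GiB image.
print(bench_write_command("rbd/test-image", 4096, 1024 ** 3))
```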
Calamari sometimes does not respond when sending a PATCH request
The Calamari API does not respond to PATCH requests to /api/v2/cluster/FSID/osd/OSD_ID if the request does not change any fields on the OSD from their present values. (BZ#1338688)
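A client can avoid sending such a request by first comparing the desired values with the current OSD state. The sketch below computes only the fields that differ, so the caller can skip the PATCH request when nothing would change. The field names in the example are illustrative assumptions, not a statement of the Calamari schema.

```python
def changed_fields(current_osd, desired):
    """Return only the desired fields whose values differ from the current state."""
    return {
        key: value
        for key, value in desired.items()
        if current_osd.get(key) != value
    }


# Hypothetical OSD state, as it might be returned by a GET request.
current = {"id": 3, "up": True, "in": True, "reweight": 1.0}

payload = changed_fields(current, {"in": True, "reweight": 1.0})
if not payload:
    print("nothing to change: skip the PATCH request")
```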
The rados list-inconsistent-obj command does not highlight inconsistent shards when it could have
The output of the rados list-inconsistent-obj command does not explicitly show which shard is inconsistent when it could have. (BZ#1363949)
An LDAP user can access buckets created by a local RGW user with the same name
The RADOS Object Gateway (RGW) does not differentiate between a local RGW user and an LDAP user with the same name. As a consequence, the LDAP user can access the buckets created by the local RGW user.
To work around this issue, use different names for RGW and LDAP users. (BZ#1361754)
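To find existing conflicts, compare the two sets of user names. The following is a minimal sketch that assumes both lists have already been collected, for example from the gateway user metadata and from an LDAP directory export. The sample names are hypothetical, and the comparison is an exact, case-sensitive match.

```python
def colliding_user_names(rgw_users, ldap_users):
    """Return the user names that exist both as local RGW users and as LDAP users."""
    return sorted(set(rgw_users) & set(ldap_users))


# Hypothetical user lists, for illustration only.
rgw_users = ["alice", "backup-svc", "carol"]
ldap_users = ["carol", "dave", "alice"]

print(colliding_user_names(rgw_users, ldap_users))  # ['alice', 'carol']
```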
Simultaneous upload operations to the same file cause I/O errors
Simultaneous upload operations to the same file location by different NFS clients cause I/O errors on both clients. Consequently, no data is updated in the Ceph Object Gateway cluster; if an object already existed in the cluster in the same location, it is unchanged.
To work around this problem, do not simultaneously upload to the same file location. (BZ#1420328)
Ansible and "ceph-disk" fail to create encrypted OSDs if the cluster name is different than "ceph"
The ceph-disk utility does not support configuring the dmcrypt utility if the cluster name is different than "ceph". Consequently, it is not possible to use the ceph-ansible utility to create encrypted OSDs if you use a custom cluster name.
To avoid this problem, use the default cluster name, which is "ceph". (BZ#1391920)
Ansible fails to add a monitor to an upgraded cluster
An attempt to add a monitor to a cluster by using the Ansible automation application after upgrading the cluster from Red Hat Ceph Storage 1.3 to 2 fails on the following task:
TASK: [ceph-mon | collect admin and bootstrap keys]
This happens because the original monitor keyring was created with the mds "allow" capability while the newly added monitor requires a keyring with the mds "allow *" capability.
To work around this issue, after installing the ceph-mon package, manually copy the administration keyring from an already existing monitor node to the new monitor node:
scp /etc/ceph/<cluster_name>.client.admin.keyring <target_host_name>:/etc/ceph
For example:
# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph
Then use Ansible to add the monitor as described in the Adding a Monitor with Ansible section of the Administration Guide for Red Hat Ceph Storage 2. (BZ#1357292)
Ansible does not support removing monitor or OSD nodes
The current version of the ceph-ansible utility does not support removing monitor or OSD nodes. To remove monitor or OSD nodes from a cluster, use the manual procedure. For more information, see the Administration Guide for Red Hat Ceph Storage 2. (BZ#1366807)
ceph-radosgw does not start after upgrading from 1.3 to 2 if a non-default value is used for rgw_region_root_pool and rgw_zone_root_pool
The ceph-radosgw service does not start after upgrading the Ceph Object Gateway from 1.3 to 2 if the gateway uses non-default values for the rgw_region_root_pool and rgw_zone_root_pool parameters.
See the Inconsistent zonegroup/zone state in Rados GW after upgrade of multizone site to Ceph 2 solution on the Red Hat Customer Portal for details on how to work around this issue. (BZ#1396956)
Old zone group name is sometimes displayed alongside the new one
In a multi-site configuration, when a zone group is renamed, other zones can in some cases continue to display the old zone group name in the output of the radosgw-admin zonegroup list command.
To work around this issue:
- Verify that the new zone group name is present on each cluster.
- Remove the old zone group name:
$ rados -p .rgw.root rm zonegroups_names.<old-name>
Calamari sometimes incorrectly outputs "null" as a value
When the Calamari REST-based API is used to get details of a CRUSH rule in the Ceph cluster, the output contains "null" as a value for certain fields in the steps section of the CRUSH rule. The fields containing null values can be safely ignored for the respective steps in the CRUSH rule. However, do not use "null" as a value for any field when doing a PATCH operation. Using null values in such a case causes the operation to fail. (BZ#1342504)
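When a rule that was fetched from the API is modified and sent back, the null fields can be removed first. The sketch below recursively drops dictionary keys whose value is None, which is how a JSON null appears after parsing in Python. The rule structure and field names in the example are illustrative assumptions.

```python
def strip_nulls(value):
    """Recursively drop dictionary entries whose value is None (JSON null)."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(item) for item in value]
    return value


# Hypothetical CRUSH rule as parsed from a GET response.
rule = {
    "name": "replicated_ruleset",
    "steps": [
        {"op": "take", "item_name": "default", "num": None, "type": None},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host", "item_name": None},
        {"op": "emit", "num": None, "type": None, "item_name": None},
    ],
}

print(strip_nulls(rule)["steps"][0])  # {'op': 'take', 'item_name': 'default'}
```

Values such as 0 and empty strings are kept; only None is removed.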
The Calamari API returns the "server error (500)" error when changing the take step
When changing a CRUSH rule, modifying the take step type to any other value than take causes the Calamari API to return the "server error (500)" error.
To avoid this issue, do not change the take step to any other value. (BZ#1329216)
Ansible does not properly handle unresponsive tasks
Certain tasks, for example adding monitors with the same host name, cause the ceph-ansible utility to become unresponsive. Currently, there is no timeout after which an unresponsive task is marked as failed. (BZ#1313935)
Object sync requests are sometimes skipped
In multi-site configurations of the Ceph Object Gateway, a non-master zone can be promoted to the master zone. In most cases, the master zone’s gateway or gateways are still running when this happens. However, if the gateways are down, it can take up to 30 seconds after their restart until the gateways notice that another zone was promoted. During this time, the gateways can miss changes to buckets that occur on other zones. Consequently, object sync requests are skipped.
To work around this issue, pull the new master’s period to the old master zone before restarting the old master zone:
$ radosgw-admin period pull --remote=<new-master-zone-id>
For details on pulling the period, see the Ceph Object Gateway Guide for Red Hat Enterprise Linux or the Ceph Object Gateway Guide for Ubuntu. (BZ#1362639)