Chapter 6. Known issues
This section documents known issues found in this release of Red Hat Ceph Storage.
6.1. Ceph Ansible
The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume
The shrink-osd.yml playbook assumes all OSDs are created by the ceph-disk utility. Consequently, OSDs deployed by using the ceph-volume utility cannot be shrunk.
To work around this issue, remove OSDs deployed by using ceph-volume manually.
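The exact commands depend on the deployment, but a manual removal generally follows a sketch like the one below, run as root on the appropriate nodes. The OSD ID (1) and device path (/dev/sdb) are examples only; substitute your own values, and wait for the cluster to finish rebalancing after marking the OSD out:
# ceph osd out 1
# systemctl stop ceph-osd@1
# ceph osd purge 1 --yes-i-really-mean-it
# ceph-volume lvm zap --destroy /dev/sdb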
When putting a dedicated journal on an NVMe device, installation can fail
When the dedicated_devices setting contains an NVMe device that has partitions or signatures on it, Ansible installation might fail with an error like the following:
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
To work around this issue, ensure there are no partitions or signatures on the NVMe device.
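For example, assuming the NVMe device is /dev/nvme0n1 (an example path; verify the device before wiping it), existing signatures and partitions can be cleared as root before rerunning the playbook:
# wipefs --all /dev/nvme0n1
# sgdisk --zap-all /dev/nvme0n1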
When the mon_use_fqdn option is set to true, the rolling upgrade fails
For Red Hat Ceph Storage 2 container deployments using the mon_use_fqdn = true option, upgrading to Red Hat Ceph Storage 3.1z1 using the Ceph Ansible rolling upgrade playbook fails. Currently, there are no known workarounds.
(BZ#1646882)
6.2. Ceph Dashboard
The 'iSCSI Overview' page does not display correctly
When using the Red Hat Ceph Storage Dashboard, the 'iSCSI Overview' page does not display any graphs or values as expected.
Ceph OSD encryption summary is not displayed in the Red Hat Ceph Storage Dashboard
On the Ceph OSD Information dashboard, under the OSD Summary panel, the OSD Encryption Summary information is not displayed. Currently, there is no workaround for this issue.
The Prometheus node-exporter service is not removed after purging the Dashboard
When purging the Red Hat Ceph Storage Dashboard, the node-exporter service is not removed and continues to run. To work around this issue, manually stop and remove the node-exporter service.
Run the following commands as root:
# systemctl stop prometheus-node-exporter
# systemctl disable prometheus-node-exporter
# rpm -e prometheus-node-exporter
# reboot
Reboot the Ceph Monitor, OSD, Object Gateway, MDS, and Dashboard nodes one at a time.
The OSD node details are not displayed in the Host OSD Breakdown panel
In the Red Hat Ceph Storage Dashboard, the Host OSD Breakdown information is not displayed on the OSD Node Detail panel under All.
6.3. iSCSI Gateway
Using ceph-ansible to deploy the iSCSI gateway does not allow the user to adjust the max_data_area_mb option
Using the max_data_area_mb option with the ceph-ansible utility sets a default value of 8 MB. To adjust this value, set it manually by using the gwcli command. See the Red Hat Ceph Storage Block Device Guide for details on setting the max_data_area_mb option.
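As an illustrative sketch only, assuming the gwcli reconfigure subcommand and an example image named rbd/disk_1, raising the value to 128 MB might look like the following; see the Block Device Guide for the authoritative syntax and supported values:
# gwcli
/> cd /disks/
/disks> reconfigure rbd/disk_1 max_data_area_mb 128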
6.4. Object Gateway
The Ceph Object Gateway requires applications to write sequentially
The Ceph Object Gateway requires applications to write sequentially from offset 0 to the end of a file. Attempting to write out of order causes the upload operation to fail. To work around this issue, use utilities like cp, cat, or rsync when copying files into NFS space. Always mount with the sync option.
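For example, assuming the gateway's NFS export is nfs-host:/export and the local mount point is /mnt/nfs (both names are examples), mount with the sync option and copy whole files into the share:
# mount -t nfs -o sync nfs-host:/export /mnt/nfs
# cp /path/to/file /mnt/nfs/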
RGW garbage collection fails to keep pace during evenly balanced delete-write workloads
In testing with an evenly balanced delete-write (50%/50%) workload, Object Gateway garbage collection fails to keep pace, the cluster fills completely in eleven hours, and the cluster status switches to the HEALTH_ERR state. Aggressive settings for the new parallel/async garbage collection tunables significantly delayed the onset of cluster fill in testing and can be helpful for many workloads. Typical real-world cluster workloads are not likely to experience a cluster fill caused primarily by garbage collection.
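This release note does not list the tunables. As an assumption for illustration only, the parallel/async garbage collection options in question are likely rgw_gc_max_concurrent_io and rgw_gc_max_trim_chunk, which can be raised in the Ceph Object Gateway section of ceph.conf; the values below are examples, not recommendations:
rgw_gc_max_concurrent_io = 20
rgw_gc_max_trim_chunk = 32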
RGW garbage collection decreases client performance by up to 50% during mixed workload
In testing with a mixed workload of 60% reads, 16% writes, 14% deletes, and 10% lists, client throughput and bandwidth drop to half their earlier levels at 18 hours into the testing run.
6.5. RADOS
High object counts can degrade IO performance
The overhead of directory merging on FileStore can degrade client I/O performance for pools with high object counts.
To work around this issue, use the expected_num_objects option during pool creation. Creating pools is described in the Red Hat Ceph Storage Object Gateway for Production Guide.
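For example, assuming a replicated pool; the pool name, placement group counts, CRUSH rule name, and expected object count below are illustrative only:
# ceph osd pool create default.rgw.buckets.data 128 128 replicated replicated_rule 1000000
The final argument is the expected object count, which lets FileStore pre-split placement group directories at pool creation time instead of splitting them while the pool is in use.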
When two or more Ceph Gateway daemons have the same name in a cluster, Ceph Manager can crash
Currently, Ceph Manager can terminate unexpectedly if some Ceph Gateway daemons have the same name. The following assert is generated in this case:
DaemonPerfCounters::update(MMgrReport*)
To work around this issue, rename the Ceph Gateway daemons that share a name so that each daemon has a unique name.
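The exact renaming procedure depends on how the gateways were deployed. As an illustration of the end state only, each gateway instance needs its own unique name, for example distinct [client.rgw.*] sections in ceph.conf; the host names below are examples:
[client.rgw.gateway-node1]
host = gateway-node1
[client.rgw.gateway-node2]
host = gateway-node2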