Chapter 6. Known issues
This section documents known issues found in this release of Red Hat Ceph Storage.
6.1. The ceph-ansible Utility
The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume
The shrink-osd.yml playbook assumes all OSDs are created by the ceph-disk utility. Consequently, OSDs deployed by using the ceph-volume utility cannot be shrunk.
To work around this issue, remove OSDs deployed by using ceph-volume manually.
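A minimal sketch of removing one such OSD by hand, assuming a hypothetical OSD ID of 1 backed by the device /dev/sdb on a non-containerized node; adjust the ID, device, and service name for your deployment:
# ceph osd out 1
# systemctl stop ceph-osd@1
# ceph osd purge 1 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/sdb --destroy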
Partitions are not removed from NVMe devices by shrink-osd.yml in certain situations
The Ansible playbook infrastructure-playbooks/shrink-osd.yml does not properly remove partitions on NVMe devices when used with osd_scenario: non-collocated in containerized environments.
To work around this issue, manually remove the partitions.
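As a sketch, assuming the affected device is /dev/nvme0n1 (a hypothetical name), the leftover partitions and partition table can be cleared with sgdisk; the command is destructive, so verify the device name first:
# sgdisk --zap-all /dev/nvme0n1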
When putting a dedicated journal on an NVMe device, installation can fail
When the dedicated_devices setting contains an NVMe device that has partitions or signatures on it, Ansible installation might fail with an error like the following:
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
To work around this issue, ensure there are no partitions or signatures on the NVMe device.
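For example, assuming the journal device is /dev/nvme0n1 (hypothetical), existing signatures can be listed and then erased with wipefs before rerunning the installation:
# wipefs /dev/nvme0n1
# wipefs --all /dev/nvme0n1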
When deploying Ceph NFS Ganesha gateways on Ubuntu IPv6 systems, ceph-ansible may fail to start the nfs-ganesha services
This issue causes Ceph NFS Ganesha gateways to fail to deploy.
To work around this issue, rerun the ceph-ansible site.yml playbook to deploy only the Ceph NFS Ganesha gateways:
[root@ansible ~]# ansible-playbook /usr/share/ceph-ansible/site.yml --limit nfss
When using dedicated devices for BlueStore, the default sizes for block.db and block.wal might be too small
By default, ceph-ansible does not override the default values of bluestore block db size and bluestore block wal size. The default sizes are 1 GB and 576 MB, respectively. These sizes might be too small when using dedicated devices with BlueStore.
To work around this issue, set bluestore_block_db_size or bluestore_block_wal_size, or both, using ceph_conf_overrides in ceph.conf to override the default values.
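A sketch of such an override in the ceph-ansible group_vars/all.yml file, using illustrative sizes of 10 GB for block.db and 1 GB for block.wal; the values are in bytes and are examples only, so size them for your workload:
ceph_conf_overrides:
  osd:
    bluestore_block_db_size: 10737418240
    bluestore_block_wal_size: 1073741824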
6.2. Ceph Management Dashboard
Ceph OSD encryption summary is not displayed in the Red Hat Ceph Storage Dashboard
On the Ceph OSD Information dashboard, under the OSD Summary panel, the OSD Encryption Summary information is not displayed.
There is no workaround at this time.
The Prometheus node-exporter service is not removed after purging the Dashboard
When purging the Red Hat Ceph Storage Dashboard, the node-exporter service is not removed and is still running.
To work around this issue, manually stop and remove the node-exporter service.
Run the following commands as root:
# systemctl stop prometheus-node-exporter
# systemctl disable prometheus-node-exporter
# rpm -e prometheus-node-exporter
# reboot
For Ceph Monitor, OSD, Object Gateway, MDS, and Dashboard nodes, reboot them one at a time.
The OSD down tab shows an incorrect value
When rebooting OSDs, the OSD down tab in the CEPH Backend storage dashboard shows the correct number of OSDs that are down. However, when all OSDs are up again after the reboot, the tab continues showing the number of down OSDs.
There is no workaround at this time.
The Top 5 pools by Throughput graph lists all pools
The Top 5 pools by Throughput graph in the Ceph Pools tab lists all pools in the cluster instead of listing only the top five pools with the highest throughput.
There is no workaround at this time.
The MDS Performance dashboard displays the wrong value for Clients after increasing and decreasing the number of active MDS servers and clients multiple times
This issue causes the Red Hat Ceph Storage dashboard to display the wrong number of CephFS clients. This can be verified by comparing the value in the Red Hat Ceph Storage dashboard with the value printed by the ceph fs status $FILESYSTEM_NAME command.
There is no workaround at this time.
Request Queue Length displays an incorrect value
In the Ceph RGW Workload dashboard, the Request Queue Length parameter always displays 0, even when running Ceph Object Gateway I/Os from different clients.
There is no workaround at this time.
Capacity Utilization in Ceph - At Glance dashboard shows the wrong value when an OSD is down
This issue causes the Red Hat Ceph Dashboard to show capacity utilization that is less than what ceph df shows.
There is no workaround at this time.
Some links on the Ceph - At Glance page do not work after installing ceph-metrics
After installing ceph-metrics, some of the panel links on the Ceph - At Glance page in the Ceph Dashboard do not work.
To work around this issue, clear the browser cache and reload the Ceph Dashboard site.
The iSCSI Overview dashboard does not display graphs if the [iscsigws] role is included in the Ansible inventory file
When deploying the Red Hat Ceph Storage Dashboard, the iSCSI Overview dashboard does not display any graphs or values if the Ansible inventory file has the [iscsigws] role included for iSCSI gateways.
To work around this issue, add [iscsis] as a role in the Ansible inventory file and run the Ansible playbook for cephmetrics-ansible. The iSCSI Overview dashboard then displays the graphs and values.
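A sketch of the relevant inventory entries, using hypothetical gateway host names; the existing [iscsigws] group can remain in place alongside the new group:
[iscsis]
iscsi-gw01
iscsi-gw02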
In the Ceph Cluster dashboard, the Pool Capacity graphs display values higher than actual capacity
This issue causes the Pool Capacity graph to display values around one percent higher than what df --cluster shows.
There is no workaround at this time.
Graphs on the OSD Node Detail dashboard might appear incorrect when used with All
Graphs generated under OSD Node Detail > OSD Host Name > All do not show all OSDs in the cluster. A graph with data for hundreds or thousands of OSDs would not be usable. The All option is intended to show cluster-wide values; for some dashboards it does not make sense and should not be used.
There is no workaround at this time.
6.3. Ceph File System
The Ceph Metadata Server might crash during scrub with multiple MDS
This issue is triggered when the scrub_path command is run in an environment with multiple Ceph Metadata Servers.
There is no workaround at this time.
6.4. The ceph-volume Utility
Deploying an OSD on devices with GPT headers fails
When deploying an OSD on a drive with a GPT header, LVM returns an error stating that the device has been excluded by a filter.
To work around this issue, ensure there is no GPT header present on devices to be used by OSDs.
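As a sketch, assuming the target device is /dev/sdb (hypothetical), the partition table type can be checked with parted and any GPT structures removed with sgdisk before handing the device to ceph-volume; the second command destroys all data on the device:
# parted /dev/sdb print
# sgdisk --zap-all /dev/sdb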
6.5. iSCSI Gateway
Using ceph-ansible to deploy the iSCSI gateway does not allow the user to adjust the max_data_area_mb option
Using the max_data_area_mb option with the ceph-ansible utility sets a default value of 8 MB. To adjust this value, set it manually using the gwcli command. See the Red Hat Ceph Storage Block Device Guide for details on setting the max_data_area_mb option.
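A rough sketch of the gwcli adjustment, assuming a hypothetical image rbd/disk_1 and a value of 128 MB; the exact command path and syntax are an assumption here and may differ by release, so confirm them against the Block Device Guide:
# gwcli
/> cd /disks
/disks> reconfigure rbd/disk_1 max_data_area_mb 128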
Ansible fails to purge RBD images with snapshots
The purge-iscsi-gateways.yml Ansible playbook does not purge RBD images with snapshots. To purge the images and their snapshots, use the rbd command-line utility:
To purge a snapshot:
rbd snap purge pool-name/image-name
For example:
# rbd snap purge data/image1
To delete an image:
rbd rm image-name
For example:
# rbd rm image1
6.6. Object Gateway
Ceph Object Gateway garbage collection decreases client performance by up to 50% during mixed workload
In testing with a mixed workload of 60% read, 16% write, 14% delete, and 10% list operations, client throughput and bandwidth drop to half their earlier levels at 18 hours into the test run.
Pushing a docker image to the Ceph Object Gateway over s3 does not complete
In certain situations, when configuring docker-distribution to use the Ceph Object Gateway with the S3 interface, the docker push command does not complete. Instead, the command fails with an HTTP 500 error.
There is no workaround at this time.
Delete markers are not removed with a lifecycle configuration
In certain situations, after a file is deleted and a lifecycle rule triggers, delete markers are not removed.
There is no workaround at this time.
The Ceph Object Gateway’s S3 does not always work in FIPS mode
If a secret key of a Ceph Object Gateway user or sub-user is less than 112 bits in length, it can cause the radosgw daemon to exit unexpectedly when a user attempts to authenticate using S3.
This is because the FIPS mode Red Hat Enterprise Linux security policy forbids construction of a cryptographic HMAC based on a key of less than 112 bits, and violation of this constraint yields an exception that is not correctly handled in Ceph Object Gateway.
To work around this issue, ensure that the secret keys of Ceph Object Gateway users and sub-users are at least 112 bits in length.
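For example, a new secret of sufficient length can be generated for an affected user with radosgw-admin; the user ID shown is hypothetical:
# radosgw-admin key create --uid=exampleuser --key-type=s3 --gen-secret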
6.7. RADOS
Performing I/O in CephFS erasure-coded pools can cause an assertion failure
This issue is being investigated as a possible latent bug in the messenger layer, which could be causing out-of-order operations on the OSD.
The issue causes the following error:
FAILED assert(repop_queue.front() == repop)
There is no workaround at this time. CephFS with erasure-coded pools is a Technology Preview. For more information, see Creating Ceph File Systems with erasure coding in the Ceph File System Guide.