Appendix A. Troubleshooting
A.1. Ansible stops installation because it detects fewer devices than expected
The Ansible automation application stops the installation process and returns the following error:
- name: fix partitions gpt header or labels of the osd disks (autodiscover disks)
  shell: "sgdisk --zap-all --clear --mbrtogpt -- '/dev/{{ item.0.item.key }}' || sgdisk --zap-all --clear --mbrtogpt -- '/dev/{{ item.0.item.key }}'"
  with_together:
    - "{{ osd_partition_status_results.results }}"
    - "{{ ansible_devices }}"
  changed_when: false
  when:
    - ansible_devices is defined
    - item.0.item.value.removable == "0"
    - item.0.item.value.partitions|count == 0
    - item.0.rc != 0
What this means:
When the osd_auto_discovery parameter is set to true in the /usr/share/ceph-ansible/group_vars/osds.yml file, Ansible automatically detects and configures all the available devices. During this process, Ansible expects that all OSDs use the same devices. The devices get their names in the same order in which Ansible detects them. If one of the devices fails on one of the OSDs, Ansible fails to detect the failed device and stops the whole installation process.
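For reference, automatic discovery is enabled by a single line in the /usr/share/ceph-ansible/group_vars/osds.yml file. A minimal sketch, omitting any other OSD settings the file may contain:

osd_auto_discovery: true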
Example situation:
- Three OSD nodes (host1, host2, host3) use the /dev/sdb, /dev/sdc, and /dev/sdd disks.
- On host2, the /dev/sdc disk fails and is removed.
- Upon the next reboot, Ansible fails to detect the removed /dev/sdc disk and expects that only two disks will be used for host2, /dev/sdb and /dev/sdc (formerly /dev/sdd).
- Ansible stops the installation process and returns the above error message.
To fix the problem:
In the /etc/ansible/hosts file, specify the devices used by the OSD node with the failed disk (host2 in the Example situation above):
[osds]
host1
host2 devices="[ '/dev/sdb', '/dev/sdc' ]"
host3
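After updating the inventory file, re-run the deployment playbook from the /usr/share/ceph-ansible directory so that the host-specific devices setting takes effect. A sketch, assuming the site.yml playbook used in the installation procedure:

cd /usr/share/ceph-ansible
ansible-playbook site.yml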
See Chapter 5, Installing Red Hat Ceph Storage using Ansible for details.