Chapter 11. Cephadm troubleshooting
As a storage administrator, you can troubleshoot the Red Hat Ceph Storage cluster. Sometimes you need to investigate why a Cephadm command failed or why a specific service does not run properly.
11.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
11.2. Pause or disable cephadm
If Cephadm does not behave as expected, you can pause most of the background activity with the following command:
Example
[ceph: root@host01 /]# ceph orch pause
This stops any changes, but Cephadm still periodically checks hosts to refresh its inventory of daemons and devices.
If you want to disable Cephadm completely, run the following commands:
Example
[ceph: root@host01 /]# ceph orch set backend ''
[ceph: root@host01 /]# ceph mgr module disable cephadm
Note that previously deployed daemon containers continue to exist and start as they did before.
To re-enable Cephadm in the cluster, run the following commands:
Example
[ceph: root@host01 /]# ceph mgr module enable cephadm
[ceph: root@host01 /]# ceph orch set backend cephadm
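The pause and disable steps above can be wrapped in a small shell helper so that the commands always run in the order shown: the backend is cleared before the module is disabled, and the module is enabled before it is selected as the backend again. This is a minimal sketch to run inside the Cephadm shell; the function name cephadm_control is hypothetical, and the resume mode assumes the ceph orch resume command, which undoes ceph orch pause.

```shell
# Hypothetical helper: run inside the Cephadm shell, where the ceph CLI is available.
cephadm_control() {
  case "$1" in
    pause)  ceph orch pause ;;
    resume) ceph orch resume ;;   # counterpart of "ceph orch pause"
    disable)
      # Clear the orchestrator backend first, then disable the cephadm module.
      ceph orch set backend '' && ceph mgr module disable cephadm ;;
    enable)
      # Enable the module first so that it can be selected as the backend.
      ceph mgr module enable cephadm && ceph orch set backend cephadm ;;
    *)
      echo "usage: cephadm_control pause|resume|disable|enable" >&2
      return 1 ;;
  esac
}
```

For example, cephadm_control disable runs the two disable commands in sequence and stops if the first one fails.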
11.3. Per service and per daemon events
Cephadm stores events per service and per daemon to aid in debugging failed daemon deployments. These events often contain relevant information:
Per service
Syntax
ceph orch ls --service_name SERVICE_NAME --format yaml
Example
[ceph: root@host01 /]# ceph orch ls --service_name alertmanager --format yaml
service_type: alertmanager
service_name: alertmanager
placement:
  hosts:
  - unknown_host
status:
  ...
  running: 1
  size: 1
events:
- 2021-02-01T08:58:02.741162 service:alertmanager [INFO] "service was created"
- '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot
  place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'
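When a service has a long event history, it can help to filter the YAML output down to the error events. A minimal sketch, assuming the event format shown above, where each event starts with "- " and carries an [INFO] or [ERROR] level tag; the function name orch_error_events is hypothetical. Because a long event can wrap onto an indented continuation line, the sketch prints the matching event together with its wrapped lines.

```shell
# Hypothetical helper: print only the [ERROR] events from
# "ceph orch ls ... --format yaml" output read on stdin.
orch_error_events() {
  awk '
    /^- /    { keep = (index($0, "[ERROR]") > 0) }  # each event starts with "- "
    /^[^ -]/ { keep = 0 }                            # a top-level key ends the list
    keep                                             # print matching events and wrapped lines
  '
}
```

Usage: ceph orch ls --service_name alertmanager --format yaml | orch_error_events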
Per daemon
Syntax
ceph orch ps --service-name SERVICE_NAME --daemon-id DAEMON_ID --format yaml
Example
[ceph: root@host01 /]# ceph orch ps --service-name mds --daemon-id cephfs.hostname.ppdhsz --format yaml
daemon_type: mds
daemon_id: cephfs.hostname.ppdhsz
hostname: hostname
status_desc: running
...
events:
- 2021-02-01T08:59:43.845866 daemon:mds.cephfs.hostname.ppdhsz [INFO] "Reconfigured
  mds.cephfs.hostname.ppdhsz on host 'hostname'"
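Because the per-daemon YAML output can be long, it can help to print only its events list. The following minimal sketch assumes the layout shown above, where events: is a top-level key, and also assumes that, when several daemons match, each daemon appears as a separate YAML document starting with a --- line. The function name orch_events is hypothetical.

```shell
# Hypothetical helper: print only the "events:" list from
# "ceph orch ps ... --format yaml" output read on stdin.
orch_events() {
  awk '
    /^---/     { in_events = 0 }   # start of the next daemon document
    /^[^ -]/   { in_events = 0 }   # any other top-level key ends the list
    /^events:/ { in_events = 1 }   # the events list begins here
    in_events
  '
}
```

Usage: ceph orch ps --service-name mds --daemon-id cephfs.hostname.ppdhsz --format yaml | orch_events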
11.4. Check cephadm logs
You can monitor the Cephadm log in real time with the following command:
Example
[ceph: root@host01 /]# ceph -W cephadm
You can see the last few messages with the following command:
Example
[ceph: root@host01 /]# ceph log last cephadm
If you have enabled logging to files, you can see a Cephadm log file called ceph.cephadm.log on the monitor hosts.
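On hosts deployed by Cephadm, per-cluster log files are written under /var/log/ceph/FSID on the host, so the ceph.cephadm.log file can be located from the cluster FSID. A minimal sketch, assuming the default Cephadm log directory layout; the function name cephadm_log_file is hypothetical.

```shell
# Hypothetical helper: print the expected path of the Cephadm log file on a monitor host.
# $1: cluster FSID, as printed by "ceph fsid" or "ceph status".
# Assumes the default layout, where /var/log/ceph/FSID on the host holds the cluster logs.
cephadm_log_file() {
  printf '/var/log/ceph/%s/ceph.cephadm.log\n' "$1"
}
```

Usage on a monitor host: tail -f "$(cephadm_log_file 2d2fd136-6df1-11ea-ae74-002590e526e8)"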
11.5. Gather log files
You can use the journalctl command, or the cephadm logs command that wraps it, to gather the log files for all the daemons.
Run all of these commands outside the Cephadm shell.
By default, Cephadm stores logs in journald, which means that daemon logs are no longer available in /var/log/ceph.
To read the log file of a specific daemon, run the following command:
Syntax
cephadm logs --name DAEMON_NAME
Example
[root@host01 ~]# cephadm logs --name mds.cephfs.hostname.ppdhsz
This command works when you run it on the same host where the daemon is running.
To read the log file of a specific daemon running on a different host, run the following command:
Syntax
cephadm logs --fsid FSID --name DAEMON_NAME
Example
[root@host01 ~]# cephadm logs --fsid 2d2fd136-6df1-11ea-ae74-002590e526e8 --name mds.cephfs.hostname.ppdhsz
where FSID is the cluster ID provided by the ceph status command.
To fetch all log files of all the daemons on a given host, run the following command:
Syntax
for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name']) for i in json.load(sys.stdin)]") ; do cephadm logs --fsid FSID_OF_CLUSTER --name "$name" > "$name"; done
Example
[root@host01 ~]# for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name']) for i in json.load(sys.stdin)]") ; do cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672 --name "$name" > "$name"; done
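The one-line loop above can also be written as a reusable function that writes each daemon's log into its own file inside an output directory, so that the collected files do not clutter the current directory. This is a sketch under the same assumptions as the loop: it runs as root on the host, outside the Cephadm shell, and parses the JSON printed by cephadm ls with python3. The function name gather_daemon_logs and the NAME.log file naming are hypothetical.

```shell
# Hypothetical helper: collect the journald logs of every daemon on this host.
# Usage: gather_daemon_logs FSID OUTPUT_DIR   (run as root, outside the Cephadm shell)
gather_daemon_logs() {
  local fsid outdir names name
  fsid=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  # "cephadm ls" prints a JSON list of daemons; print the "name" field of each one.
  names=$(cephadm ls | python3 -c 'import sys, json; [print(d["name"]) for d in json.load(sys.stdin)]') || return 1
  for name in $names; do
    cephadm logs --fsid "$fsid" --name "$name" > "$outdir/$name.log"
  done
}
```

Usage: gather_daemon_logs 57bddb48-ee04-11eb-9962-001a4a000672 /tmp/ceph-logs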
11.6. Collect systemd status
To print the state of a systemd unit, run the following command:
Example
[root@host01 ~]# systemctl status ceph-a538d494-fb2a-48e4-82c8-b91c37bb0684@mon.host01.service
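The unit name in the example follows the pattern ceph-FSID@DAEMON_NAME.service, where DAEMON_NAME is the daemon type followed by its ID, such as mon.host01. A minimal sketch that builds the unit name from those two values; the function name ceph_unit_name is hypothetical.

```shell
# Hypothetical helper: build the systemd unit name of a Cephadm-managed daemon.
# $1: cluster FSID, $2: daemon name in TYPE.ID form (for example, mon.host01)
ceph_unit_name() {
  printf 'ceph-%s@%s.service\n' "$1" "$2"
}
```

Usage: systemctl status "$(ceph_unit_name a538d494-fb2a-48e4-82c8-b91c37bb0684 mon.host01)"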
11.7. List all downloaded container images
To list the container images used by all containers on a host, including stopped containers, run the following command on the host, outside the Cephadm shell:
Example
[root@host01 ~]# podman ps -a --format json | jq '.[].Image'
"docker.io/library/rhel8"
"registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155"
11.8. Manually run containers
Cephadm writes small wrappers that run containers. Refer to /var/lib/ceph/CLUSTER_FSID/DAEMON_NAME/unit.run for the container execution command.
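A minimal sketch for locating that wrapper, where the directory is named after a single daemon in TYPE.ID form, for example mon.host01; the function name unit_run_path is hypothetical.

```shell
# Hypothetical helper: print the path of the unit.run wrapper for one daemon.
# $1: cluster FSID, $2: daemon name in TYPE.ID form (for example, mon.host01)
unit_run_path() {
  printf '/var/lib/ceph/%s/%s/unit.run\n' "$1" "$2"
}
```

Usage on the host: cat "$(unit_run_path a538d494-fb2a-48e4-82c8-b91c37bb0684 mon.host01)" shows the container command that Cephadm uses to start that daemon.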
Analysing SSH errors
If you get the following error:
Example
execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-73z09u6g -i /tmp/cephadm-identity-ky7ahp_5 root@10.10.1.2
...
raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2).
Please make sure that the host is reachable and accepts connections using the cephadm SSH key
Try the following options to troubleshoot the issue:
To ensure Cephadm has an SSH identity key, run the following commands:
Example
[ceph: root@host01 /]# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
obtained 'mgr/cephadm/ssh_identity_key'
[ceph: root@host01 /]# chmod 0600 ~/cephadm_private_key
If the first command fails, Cephadm does not have a key. To generate an SSH key, run the following command:
Example
[ceph: root@host01 /]# ceph cephadm generate-key
Or, to provide an existing private key, run the following command:
Example
[ceph: root@host01 /]# cat ~/cephadm_private_key | ceph cephadm set-priv-key -i -
To ensure that the SSH configuration is correct, save it to a file and review it:
Example
[ceph: root@host01 /]# ceph cephadm get-ssh-config > config
To verify the connection to the host, run the following command:
Example
[ceph: root@host01 /]# ssh -F config -i ~/cephadm_private_key root@host01
Verifying that the public key is in the authorized_keys file
To verify that the public key is in the authorized_keys file, run the following commands:
Example
[ceph: root@host01 /]# ceph cephadm get-pub-key > ~/ceph.pub
[ceph: root@host01 /]# grep "`cat ~/ceph.pub`" /root/.ssh/authorized_keys
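The last manual checks in this procedure can be scripted on the host being checked. A minimal sketch: key_perms_ok confirms that the private key file has the 0600 mode set by the chmod command above (ssh refuses private keys that other users can read), and pubkey_authorized repeats the grep check but matches the key as a fixed string and rejects an empty key file. Both function names are hypothetical, and stat -c assumes GNU coreutils, as on Red Hat Enterprise Linux.

```shell
# Hypothetical helper: succeed only if the private key file has mode 600.
key_perms_ok() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# Hypothetical helper: $1 is the public key file, $2 the authorized_keys file.
# -F matches the key as a fixed string rather than as a regular expression;
# an empty public key file is rejected, because an empty pattern matches every line.
pubkey_authorized() {
  [ -s "$1" ] && grep -qF -- "$(cat "$1")" "$2"
}
```

Usage: key_perms_ok ~/cephadm_private_key && pubkey_authorized ~/ceph.pub /root/.ssh/authorized_keys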
11.9. CIDR network error
Classless Inter-Domain Routing (CIDR), also known as supernetting, is a method of assigning Internet Protocol (IP) addresses that improves the efficiency of address distribution and replaces the previous system based on Class A, Class B, and Class C networks. If you see one of the following errors:
ERROR: Failed to infer CIDR network for mon ip *; pass --skip-mon-network to configure it later
Or
Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP
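You need to set the public_network option of the monitors to the monitor network in CIDR notation:
Syntax
ceph config set mon public_network NETWORK_CIDR
Example
[ceph: root@host01 /]# ceph config set mon public_network 10.10.200.0/24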
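The value passed to public_network is the network address of the monitor subnet in CIDR notation, not the address of an individual host. A minimal sketch that derives it from a monitor IP address and prefix length by masking off the host bits with shell arithmetic; the function name network_cidr is hypothetical, and it handles IPv4 only.

```shell
# Hypothetical helper: print the CIDR network that contains an IPv4 address.
# $1: IPv4 address of a monitor host, $2: prefix length (0-32)
network_cidr() {
  local ip prefix old_ifs addr mask net
  ip=$1
  prefix=$2
  old_ifs=$IFS
  IFS=.
  set -- $ip              # split the address into its four octets
  IFS=$old_ifs
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))   # prefix bits set to 1
  net=$(( addr & mask ))                                  # clear the host bits
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$prefix"
}
```

Usage: ceph config set mon public_network "$(network_cidr 10.10.200.10 24)" sets the option to 10.10.200.0/24.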
11.10. Access the admin socket
Each Ceph daemon provides an admin socket that bypasses the MONs.
To access the admin socket, enter the daemon container on the host, and then query the socket of that daemon:
Example
[root@host01 ~]# cephadm enter --name mds.cephfs.hostname.ppdhsz
[ceph: root@host01 /]# ceph --admin-daemon /var/run/ceph/ceph-mds.cephfs.hostname.ppdhsz.asok config show
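Inside the daemon container, the socket file is named after the cluster and the daemon, following the pattern /var/run/ceph/ceph-DAEMON_NAME.asok for the default cluster name ceph. A minimal sketch that builds the path from a daemon name; the function name asok_path is hypothetical.

```shell
# Hypothetical helper: print the admin socket path of a daemon.
# $1: daemon name in TYPE.ID form; assumes the default cluster name "ceph".
asok_path() {
  printf '/var/run/ceph/ceph-%s.asok\n' "$1"
}
```

Usage inside the daemon container: ceph --admin-daemon "$(asok_path mds.cephfs.hostname.ppdhsz)" config show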
11.11. Manually deploying a mgr daemon
Cephadm requires a mgr daemon in order to manage the Red Hat Ceph Storage cluster. If the last mgr daemon of a Red Hat Ceph Storage cluster was removed, you can manually deploy a mgr daemon on any host of the cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to all the nodes.
- Hosts are added to the cluster.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Disable the Cephadm scheduler to prevent Cephadm from removing the new MGR daemon, with the following command:
Example
[ceph: root@host01 /]# ceph config-key set mgr/cephadm/pause true
Get or create the auth entry for the new MGR daemon:
Example
[ceph: root@host01 /]# ceph auth get-or-create mgr.host01.smfvfd1 mon "profile mgr" osd "allow *" mds "allow *"
[mgr.host01.smfvfd1]
        key = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==
Generate a minimal ceph.conf file:
Example
[ceph: root@host01 /]# ceph config generate-minimal-conf
# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0
[global]
        fsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0
        mon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0] [v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]
Get the container image:
Example
[ceph: root@host01 /]# ceph config get "mgr.host01.smfvfd1" container_image
Exit from the Cephadm shell:
Example
[ceph: root@host01 /]# exit
On the host, create a config-json.json file and add the following content:
Note
Use the values from the output of the ceph config generate-minimal-conf and ceph auth get-or-create commands.
Example
{
  "config": "# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0\n[global]\n\tfsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0\n\tmon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0] [v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]\n",
  "keyring": "[mgr.host01.smfvfd1]\n\tkey = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==\n"
}
Deploy the MGR daemon, using the container image returned by the ceph config get command:
Example
[root@host01 ~]# cephadm --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest deploy --fsid 8c9b0072-67ca-11eb-af06-001a4a0002a0 --name mgr.host01.smfvfd1 --config-json config-json.json
Verification
In the Cephadm shell, run the following command:
Example
[ceph: root@host01 /]# ceph -s
The output shows that the new mgr daemon has been added.
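Writing config-json.json by hand is error prone, because the configuration and the keyring must be embedded as JSON strings with escaped newlines and tabs. A minimal sketch that builds the file from two saved outputs; it assumes that you have saved the output of ceph config generate-minimal-conf and the keyring printed by ceph auth get-or-create to files on the host. The function name make_config_json and the file names minimal.conf and mgr.keyring are hypothetical.

```shell
# Hypothetical helper: build the config-json.json content for "cephadm deploy".
# $1: file with the minimal ceph.conf, $2: file with the daemon keyring.
# json.dumps escapes newlines, tabs, and quotes, so the result is valid JSON.
make_config_json() {
  python3 - "$1" "$2" <<'EOF'
import json
import sys

with open(sys.argv[1]) as conf, open(sys.argv[2]) as keyring:
    print(json.dumps({"config": conf.read(), "keyring": keyring.read()}, indent=2))
EOF
}
```

Usage on the host: make_config_json minimal.conf mgr.keyring > config-json.json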