Chapter 14. Cephadm troubleshooting

14.1. Pause or disable cephadm

If Cephadm does not behave as expected, you can pause most of the background activity with the following command:

Example

[ceph: root@host01 /]# ceph orch pause


This stops any changes, but Cephadm still periodically checks hosts to refresh its inventory of daemons and devices.

If you want to disable Cephadm completely, run the following commands:

Example

[ceph: root@host01 /]# ceph orch set backend ''
[ceph: root@host01 /]# ceph mgr module disable cephadm


Note that previously deployed daemon containers continue to exist and start as they did before.

To re-enable Cephadm in the cluster, run the following commands:

Example

[ceph: root@host01 /]# ceph mgr module enable cephadm
[ceph: root@host01 /]# ceph orch set backend cephadm


14.2. Per service and per daemon event

Cephadm stores events per service and per daemon in order to aid in debugging failed daemon deployments. These events often contain relevant information:

Per service

Syntax

ceph orch ls --service_name SERVICE_NAME --format yaml


Example

[ceph: root@host01 /]# ceph orch ls --service_name alertmanager --format yaml
service_type: alertmanager
service_name: alertmanager
placement:
  hosts:
  - unknown_host
status:
  ...
  running: 1
  size: 1
events:
- 2021-02-01T08:58:02.741162 service:alertmanager [INFO] "service was created"
- '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot
  place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"'


Per daemon

Syntax

ceph orch ps --service-name SERVICE_NAME --daemon-id DAEMON_ID --format yaml


Example

[ceph: root@host01 /]# ceph orch ps --service-name mds --daemon-id cephfs.hostname.ppdhsz --format yaml
daemon_type: mds
daemon_id: cephfs.hostname.ppdhsz
hostname: hostname
status_desc: running
...
events:
- 2021-02-01T08:59:43.845866 daemon:mds.cephfs.hostname.ppdhsz [INFO] "Reconfigured
  mds.cephfs.hostname.ppdhsz on host 'hostname'"
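
The event strings in both examples share one shape: a timestamp, a `service:` or `daemon:` subject, a bracketed level, and a quoted message. The following is a minimal sketch for filtering out the error events. It assumes that shape holds for your output and that the `events` list has already been loaded from the YAML as plain strings. The pattern and the names `parse_event` and `errors_only` are illustrative, not part of Cephadm.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the example output above; an assumption, not a documented format.
EVENT_RE = re.compile(
    r'^(?P<timestamp>\S+)\s+(?P<kind>service|daemon):(?P<name>\S+)\s+'
    r'\[(?P<level>[A-Z]+)\]\s+"(?P<message>.*)"$'
)


class Event(NamedTuple):
    timestamp: str
    kind: str
    name: str
    level: str
    message: str


def parse_event(line: str) -> Optional[Event]:
    """Parse one event string; return None if it does not match the assumed shape."""
    # YAML folds long events across lines, so collapse runs of whitespace first.
    match = EVENT_RE.match(" ".join(line.split()))
    return Event(**match.groupdict()) if match else None


def errors_only(lines):
    """Return the parsed events whose level is ERROR."""
    events = (parse_event(line) for line in lines)
    return [e for e in events if e is not None and e.level == "ERROR"]


if __name__ == "__main__":
    sample = [
        '2021-02-01T08:58:02.741162 service:alertmanager [INFO] "service was created"',
        '2021-02-01T12:09:25.264584 service:alertmanager [ERROR] "Failed to apply: Cannot\n'
        '  place <AlertManagerSpec for service_name=alertmanager> on unknown_host: Unknown hosts"',
    ]
    for event in errors_only(sample):
        print(event.timestamp, event.name, event.message)
```

Lines that do not match are skipped rather than raising, so an unexpected event format results in missing entries instead of a crash.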


14.3. Check cephadm logs

You can monitor the Cephadm log in real time with the following command:

Example

[ceph: root@host01 /]# ceph -W cephadm


You can see the last few messages with the following command:

Example

[ceph: root@host01 /]# ceph log last cephadm


If you have enabled logging to files, you can see a Cephadm log file called ceph.cephadm.log on the monitor hosts.

14.4. Gather log files

You can use the journalctl command to gather the log files for all the daemons.

Note

You have to run all these commands outside the cephadm shell.

Note

By default, Cephadm stores logs in journald, which means that daemon logs are no longer available in /var/log/ceph.

To read the log file of a specific daemon, run the following command:
Syntax
```
cephadm logs --name DAEMON_NAME
```
Example
```
[root@host01 ~]# cephadm logs --name cephfs.hostname.ppdhsz
```

Note

This command works only when run on the same host where the daemon is running.

To read the log file of a specific daemon running on a different host, run the following command:
Syntax
```
cephadm logs --fsid FSID --name DAEMON_NAME
```
Example
```
[root@host01 ~]# cephadm logs --fsid 2d2fd136-6df1-11ea-ae74-002590e526e8 --name cephfs.hostname.ppdhsz
```
where fsid is the cluster ID provided by the ceph status command.

To fetch all log files of all the daemons on a given host, run the following command:

Syntax

for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name']) for i in json.load(sys.stdin)]") ; do cephadm logs --fsid FSID_OF_CLUSTER --name "$name" > $name; done


Example

[root@host01 ~]# for name in $(cephadm ls | python3 -c "import sys, json; [print(i['name']) for i in json.load(sys.stdin)]") ; do cephadm logs --fsid 57bddb48-ee04-11eb-9962-001a4a000672 --name "$name" > $name; done
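
The same loop can be written as a small Python script, which avoids shell word splitting and makes it easier to skip a daemon whose logs cannot be read. This is a sketch, not a supported tool. It assumes, as the one-liner above does, that `cephadm ls` prints a JSON array of objects with a `name` key. The function names are illustrative, and unlike the shell loop it writes each log to `NAME.log` rather than to a file called `NAME`.

```python
import json
import subprocess
from pathlib import Path


def daemon_names(cephadm_ls_output: str) -> list:
    """Extract daemon names from the JSON that `cephadm ls` prints."""
    return [entry["name"] for entry in json.loads(cephadm_ls_output)]


def logs_command(fsid: str, name: str) -> list:
    """Build the argument list for `cephadm logs`, as in the loop above."""
    return ["cephadm", "logs", "--fsid", fsid, "--name", name]


def collect_logs(fsid: str, out_dir: str = ".") -> None:
    """Write one log file per daemon on this host. Run as root, outside the cephadm shell."""
    listing = subprocess.run(
        ["cephadm", "ls"], check=True, capture_output=True, text=True
    ).stdout
    for name in daemon_names(listing):
        with open(Path(out_dir) / f"{name}.log", "w") as handle:
            # check=False: keep going even if one daemon's logs cannot be read.
            subprocess.run(logs_command(fsid, name), stdout=handle, check=False)


if __name__ == "__main__":
    # Offline demonstration with made-up `cephadm ls` output; no commands are run.
    sample = '[{"name": "mon.host01"}, {"name": "mgr.host01.smfvfd1"}]'
    for name in daemon_names(sample):
        print(" ".join(logs_command("57bddb48-ee04-11eb-9962-001a4a000672", name)))
```

To gather the logs on a real host, call `collect_logs("FSID_OF_CLUSTER")` as root.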


14.5. Collect systemd status

To print the state of a systemd unit, run the following command:

Example

[root@host01 ~]# systemctl status ceph-a538d494-fb2a-48e4-82c8-b91c37bb0684@mon.host01.service
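
The unit name in this example has the shape `ceph-FSID@DAEMON_NAME.service`. As a small sketch, assuming that shape applies to the other daemons on the host as well, the following hypothetical helper builds the unit name from a cluster FSID and a daemon name so that it can be passed to `systemctl status`.

```python
import uuid


def unit_name(fsid: str, daemon_name: str) -> str:
    """Build a systemd unit name shaped like the example: ceph-FSID@DAEMON_NAME.service."""
    # Validate the FSID early; uuid.UUID raises ValueError on malformed input.
    uuid.UUID(fsid)
    if "." not in daemon_name:
        raise ValueError("expected a daemon name such as 'mon.host01'")
    return f"ceph-{fsid}@{daemon_name}.service"


if __name__ == "__main__":
    print(unit_name("a538d494-fb2a-48e4-82c8-b91c37bb0684", "mon.host01"))
```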


14.6. List all downloaded container images

To list all the container images that are downloaded on a host, run the following command:

Example

[ceph: root@host01 /]# podman ps -a --format json | jq '.[].Image'
"docker.io/library/rhel9"
"registry.redhat.io/rhceph-alpha/rhceph-6-rhel9@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155"


14.7. Manually run containers

Cephadm writes small wrappers that run a container. Refer to /var/lib/ceph/CLUSTER_FSID/SERVICE_NAME/unit for the container execution command.

Analysing SSH errors

If you get the following error:

Example

execnet.gateway_bootstrap.HostNotFound: -F /tmp/cephadm-conf-73z09u6g -i /tmp/cephadm-identity-ky7ahp_5 root@10.10.1.2
...
raise OrchestratorError(msg) from e
orchestrator._interface.OrchestratorError: Failed to connect to 10.10.1.2 (10.10.1.2).
Please make sure that the host is reachable and accepts connections using the cephadm SSH key


Try the following options to troubleshoot the issue:

To ensure that Cephadm has an SSH identity key, run the following command:

Example

[ceph: root@host01 /]# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
INFO:cephadm:Inferring fsid f8edc08a-7f17-11ea-8707-000c2915dd98
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15 obtained 'mgr/cephadm/ssh_identity_key'
[root@mon1 ~] # chmod 0600 ~/cephadm_private_key


If the above command fails, Cephadm does not have a key. To generate an SSH key, run the following command:

Example

[ceph: root@host01 /]# ceph cephadm generate-key

Or

Example

[ceph: root@host01 /]# cat ~/cephadm_private_key | ceph cephadm set-priv-key -i -

To ensure that the SSH configuration is correct, run the following command:
Example
```
[ceph: root@host01 /]# ceph cephadm get-ssh-config
```

To verify the connection to the host, run the following command:

Example

[ceph: root@host01 /]# ssh -F config -i ~/cephadm_private_key root@host01


Verify public key is in authorized_keys.

To verify that the public key is in the authorized_keys file, run the following commands:

Example

[ceph: root@host01 /]# ceph cephadm get-pub-key > ~/ceph.pub
[ceph: root@host01 /]# grep "`cat ~/ceph.pub`" /root/.ssh/authorized_keys

14.8. CIDR network error

Classless inter-domain routing (CIDR), also known as supernetting, is a method of assigning Internet Protocol (IP) addresses that improves the efficiency of address distribution and replaces the previous system based on Class A, Class B, and Class C networks. If you see one of the following errors:

ERROR: Failed to infer CIDR network for mon ip *; pass --skip-mon-network to configure it later

Or

Must set public_network config option or specify a CIDR network, ceph addrvec, or plain IP

You need to run the following command:

Example

[ceph: root@host01 /]# ceph config set host public_network hostnetwork
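
The value that replaces the network placeholder must be a network in CIDR notation, for example 10.10.200.0/24 (an illustrative value, not taken from this guide). As a hedged sketch, Python's standard `ipaddress` module can confirm that a candidate value is a well-formed network before you set it. The helper name `check_cidr` is hypothetical.

```python
import ipaddress


def check_cidr(value: str) -> str:
    """Return the normalized network, or raise ValueError if `value` is not a CIDR network."""
    if "/" not in value:
        raise ValueError(f"{value!r} has no prefix length, for example /24")
    # strict=True rejects addresses with host bits set, such as 10.10.200.10/24.
    return str(ipaddress.ip_network(value, strict=True))


if __name__ == "__main__":
    for candidate in ("10.10.200.0/24", "10.10.200.10/24", "10.10.200.10"):
        try:
            print(candidate, "->", check_cidr(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

The check is purely syntactic. It does not confirm that the hosts actually have addresses in that network.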


14.9. Access the admin socket

Each Ceph daemon provides an admin socket that bypasses the MONs.

To access the admin socket, enter the daemon container on the host:

Example

[ceph: root@host01 /]# cephadm enter --name cephfs.hostname.ppdhsz
[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-cephfs.hostname.ppdhsz.asok config show


14.10. Manually deploying a mgr daemon

Cephadm requires a mgr daemon to manage the Red Hat Ceph Storage cluster. If the last mgr daemon of a Red Hat Ceph Storage cluster is removed, you can manually deploy a mgr daemon on any host of the cluster.

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to all the nodes.
Hosts are added to the cluster.

Procedure

Log into the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```
Disable the Cephadm scheduler to prevent Cephadm from removing the new MGR daemon, with the following command:
Example
```
[ceph: root@host01 /]# ceph config-key set mgr/cephadm/pause true
```

Get or create the auth entry for the new MGR daemon:

Example

[ceph: root@host01 /]# ceph auth get-or-create mgr.host01.smfvfd1 mon "profile mgr" osd "allow *" mds "allow *"
[mgr.host01.smfvfd1]
key = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==


Generate a minimal ceph.conf file:

Example

[ceph: root@host01 /]# ceph config generate-minimal-conf
# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0
[global]
fsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0
mon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0] [v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]


Get the container image:

Example

[ceph: root@host01 /]# ceph config get "mgr.host01.smfvfd1" container_image


Create a config-json.json file and add the following:

Note

Use the values from the output of the ceph config generate-minimal-conf command.

Example

{
  "config": "# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0\n[global]\n\tfsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0\n\tmon_host =  [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0] [v2:10.10.10.100:3300/0,v1:10.10.200.100:6789/0]\n",
  "keyring": "[mgr.host01.smfvfd1]\n\tkey = AQDhcORgW8toCRAAlMzlqWXnh3cGRjqYEa9ikw==\n"
}
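
Writing this file by hand is error-prone because every newline and tab inside the two values must be escaped. As a sketch under stated assumptions, the following Python snippet builds the same structure from the raw text of the minimal configuration and the keyring and lets `json.dumps` handle the escaping. The helper name `build_config_json` and the placeholder key are illustrative. Substitute the output of the earlier `ceph config generate-minimal-conf` and `ceph auth get-or-create` commands.

```python
import json


def build_config_json(minimal_conf: str, keyring: str) -> str:
    """Return the text of config-json.json with `config` and `keyring` keys."""
    # json.dumps escapes newlines and tabs, so the inputs can be pasted verbatim.
    return json.dumps({"config": minimal_conf, "keyring": keyring}, indent=2)


if __name__ == "__main__":
    conf = (
        "# minimal ceph.conf for 8c9b0072-67ca-11eb-af06-001a4a0002a0\n"
        "[global]\n"
        "\tfsid = 8c9b0072-67ca-11eb-af06-001a4a0002a0\n"
        "\tmon_host = [v2:10.10.200.10:3300/0,v1:10.10.200.10:6789/0]\n"
    )
    # PLACEHOLDER_KEY stands in for the key printed by `ceph auth get-or-create`.
    keyring = "[mgr.host01.smfvfd1]\n\tkey = PLACEHOLDER_KEY\n"
    print(build_config_json(conf, keyring))
```

Redirect the output to config-json.json, and load the file once with a JSON parser to confirm that it is valid before deploying.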


Exit from the Cephadm shell:
Example
```
[ceph: root@host01 /]# exit
```

Deploy the MGR daemon:

Example

[root@host01 ~]# cephadm --image registry.redhat.io/rhceph-alpha/rhceph-6-rhel9:latest  deploy --fsid  8c9b0072-67ca-11eb-af06-001a4a0002a0 --name mgr.host01.smfvfd1 --config-json config-json.json


Verification

In the Cephadm shell, run the following command:

Example

[ceph: root@host01 /]# ceph -s


You can see a new mgr daemon has been added.
