Chapter 2. Understanding process management for Ceph
As a storage administrator, you can manipulate the various Ceph daemons by type or instance in a Red Hat Ceph Storage cluster. Manipulating these daemons allows you to start, stop, and restart all of the Ceph services as needed.
2.1. Prerequisites
- Installation of the Red Hat Ceph Storage software.
2.2. Ceph process management
In Red Hat Ceph Storage, all process management is done through the systemd service. Each time you want to start, restart, or stop a Ceph daemon, you must specify the daemon type or the daemon instance.
Additional Resources
- For more information on using systemd, see the Introduction to systemd chapter and the Managing system services with systemctl chapter in the Configuring basic system settings guide for Red Hat Enterprise Linux 8.
2.3. Starting, stopping, and restarting all Ceph daemons
You can start, stop, and restart all Ceph daemons as the root user from the host where you want to manage the daemons.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
On the host where you want to start, stop, and restart the daemons, run the systemctl command to get the SERVICE_ID of the service.
Example
[root@host01 ~]# systemctl --type=service
ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Starting all Ceph daemons:
Syntax
systemctl start SERVICE_ID
Example
[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Stopping all Ceph daemons:
Syntax
systemctl stop SERVICE_ID
Example
[root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Restarting all Ceph daemons:
Syntax
systemctl restart SERVICE_ID
Example
[root@host01 ~]# systemctl restart ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
2.4. Starting, stopping, and restarting all Ceph services
Ceph services are logical groups of Ceph daemons of the same type, configured to run in the same Red Hat Ceph Storage cluster. The orchestration layer in Ceph allows the user to manage these services in a centralized way, making it easy to execute operations that affect all the Ceph daemons that belong to the same logical service. The Ceph daemons running in each host are managed through the systemd service. You can start, stop, and restart all Ceph services from the host where you want to manage the Ceph services.
If you want to start, stop, or restart a specific Ceph daemon in a specific host, you need to use the systemd service. To obtain a list of the systemd services running in a specific host, connect to the host, and run the following command:
Example
[root@host01 ~]# systemctl list-units "ceph*"
The output gives you a list of the service names that you can use to manage each Ceph daemon.
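On a cephadm-deployed cluster, the unit names follow the pattern ceph-FSID@DAEMON_TYPE.ID.service. For example, the monitor and OSD daemons used in the examples in this chapter appear as units similar to the following:
ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service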
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Run the ceph orch ls command to get a list of Ceph services configured in the Red Hat Ceph Storage cluster and to get the specific service ID.
Example
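[ceph: root@host01 /]# ceph orch ls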
To start a specific service, run the following command:
Syntax
ceph orch start SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch start node-exporter
To stop a specific service, run the following command:
Important
Running the ceph orch stop SERVICE_ID command on the MON and MGR services makes the Red Hat Ceph Storage cluster inaccessible. It is recommended to use the systemctl stop SERVICE_ID command to stop a specific daemon on the host.
Syntax
ceph orch stop SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch stop node-exporter
In the example, the ceph orch stop node-exporter command stops all the daemons of the node-exporter service.
To restart a specific service, run the following command:
Syntax
ceph orch restart SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch restart node-exporter
2.5. Viewing log files of Ceph daemons that run in containers
Use the journald daemon from the container host to view a log file of a Ceph daemon from a container.
Prerequisites
- Installation of the Red Hat Ceph Storage software.
- Root-level access to the node.
Procedure
To view the entire Ceph log file, run a journalctl command as root composed in the following format:
Syntax
journalctl -u SERVICE_ID
Example
[root@host01 ~]# journalctl -u ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service
In the above example, you can view the entire log for the OSD with ID osd.8.
To show only the recent journal entries and continue to follow new entries as they are written, use the -f option.
Syntax
journalctl -fu SERVICE_ID
Example
[root@host01 ~]# journalctl -fu ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service
You can also use the sosreport utility to view the journald logs. For more details about SOS reports, see the What is an sosreport and how to create one in Red Hat Enterprise Linux? solution on the Red Hat Customer Portal.
Additional Resources
- The journalctl manual page.
2.6. Powering down and rebooting Red Hat Ceph Storage cluster
You can power down and reboot the Red Hat Ceph Storage cluster using two different approaches: systemctl commands and the Ceph Orchestrator. You can choose either approach to power down and reboot the cluster.
2.6.1. Powering down and rebooting the cluster using the systemctl commands
You can use the systemctl commands approach to power down and reboot the Red Hat Ceph Storage cluster. This approach follows the Linux way of stopping the services.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access.
Procedure
Powering down the Red Hat Ceph Storage cluster
- Stop the clients from using the Block Device images and the Ceph Object Gateway (RADOS Gateway) on this cluster, as well as any other clients.
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
The cluster must be in a healthy state (Health_OK and all PGs active+clean) before proceeding. Run ceph status on the host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
If you use the Ceph File System (CephFS), bring down the CephFS cluster:
Syntax
ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
Example
[ceph: root@host01 /]# ceph fs set cephfs max_mds 1
[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false
Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
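Each flag is set with its own ceph osd set command:
Example
[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause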
Important
The above example is only for stopping the service and each OSD on the OSD node, and it needs to be repeated on each OSD node.
- If the MDS and Ceph Object Gateway nodes are on their own dedicated nodes, power them off.
Get the systemd target of the daemons:
Example
[root@host01 ~]# systemctl list-units --type target | grep ceph
ceph-0b007564-ec48-11ee-b736-525400fd02f8.target loaded active active Ceph cluster 0b007564-ec48-11ee-b736-525400fd02f8
ceph.target loaded active active All Ceph clusters and services
Disable the target that includes the cluster FSID:
Example
[root@host01 ~]# systemctl disable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
Removed "/etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".
Removed "/etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".
Stop the target:
Example
[root@host01 ~]# systemctl stop ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
This stops all the daemons on the host that need to be stopped.
Shut down the node:
Example
[root@host01 ~]# shutdown
Shutdown scheduled for Wed 2024-03-27 11:47:19 EDT, use 'shutdown -c' to cancel.
- Repeat the above steps for all the nodes of the cluster.
Rebooting the Red Hat Ceph Storage cluster
- If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.
- Power ON the administration node.
Enable the systemd target to get all the daemons running:
Example
[root@host01 ~]# systemctl enable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
Created symlink /etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.
Created symlink /etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.
Start the systemd target:
Example
[root@host01 ~]# systemctl start ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
- Wait for all the nodes to come up. Verify all the services are up and there are no connectivity issues between the nodes.
Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:
Syntax
ceph fs set FS_NAME joinable true
Example
[ceph: root@host01 /]# ceph fs set cephfs joinable true
Verification
- Verify the cluster is in a healthy state (Health_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
2.6.2. Powering down and rebooting the cluster using the Ceph Orchestrator
You can also use the capabilities of the Ceph Orchestrator to power down and reboot the Red Hat Ceph Storage cluster. In most cases, powering off the cluster requires only a single system login.
The Ceph Orchestrator supports several operations, such as start, stop, and restart. You can use these commands with systemctl, in some cases, when powering down or rebooting the cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
Powering down the Red Hat Ceph Storage cluster
- Stop the clients from using the Block Device images and the Ceph Object Gateway on this cluster, as well as any other clients.
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
The cluster must be in a healthy state (Health_OK and all PGs active+clean) before proceeding. Run ceph status on the host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
If you use the Ceph File System (CephFS), bring down the CephFS cluster:
Syntax
ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
ceph mds fail FS_NAME:N
Example
[ceph: root@host01 /]# ceph fs set cephfs max_mds 1
[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false
[ceph: root@host01 /]# ceph mds fail cephfs:1
Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
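Each flag is set with its own ceph osd set command:
Example
[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause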
Stop the MDS service.
Fetch the MDS service name:
Example
[ceph: root@host01 /]# ceph orch ls --service-type mds
Stop the MDS service using the fetched name in the previous step:
Syntax
ceph orch stop SERVICE-NAME
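For example, assuming the fetched MDS service name is mds.cephfs:
[ceph: root@host01 /]# ceph orch stop mds.cephfs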
Stop the Ceph Object Gateway services. Repeat for each deployed service.
Fetch the Ceph Object Gateway service names:
Example
[ceph: root@host01 /]# ceph orch ls --service-type rgw
Stop the Ceph Object Gateway service using the fetched name:
Syntax
ceph orch stop SERVICE-NAME
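For example, assuming the fetched Ceph Object Gateway service name is rgw.default:
[ceph: root@host01 /]# ceph orch stop rgw.default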
Stop the Alertmanager service:
Example
[ceph: root@host01 /]# ceph orch stop alertmanager
Stop the node-exporter service, which is part of the monitoring stack:
Example
[ceph: root@host01 /]# ceph orch stop node-exporter
Stop the Prometheus service:
Example
[ceph: root@host01 /]# ceph orch stop prometheus
Stop the Grafana dashboard service:
Example
[ceph: root@host01 /]# ceph orch stop grafana
Stop the crash service:
Example
[ceph: root@host01 /]# ceph orch stop crash
Stop the OSD daemons from the cephadm node, one by one. Repeat this step for all the OSDs in the cluster.
Fetch the OSD ID:
Example
[ceph: root@host01 /]# ceph orch ps --daemon-type=osd
Stop the OSD daemon using the OSD ID you fetched:
Example
[ceph: root@host01 /]# ceph orch daemon stop osd.1
Scheduled to stop osd.1 on host 'host02'
Stop the monitors one by one.
Identify the hosts hosting the monitors:
Example
[ceph: root@host01 /]# ceph orch ps --daemon-type mon
On each host, stop the monitor.
Identify the systemctl unit name:
Example
[ceph: root@host01 /]# systemctl list-units ceph-* | grep mon
Stop the service:
Syntax
systemctl stop SERVICE-NAME
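For example, assuming the unit identified in the previous step is the monitor unit shown earlier in this chapter:
[ceph: root@host01 /]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service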
- Shut down all the hosts.
Rebooting the Red Hat Ceph Storage cluster
- If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.
- Power ON all the Ceph hosts.
Log into the Cephadm shell on the administration node:
Example
[root@host01 ~]# cephadm shell
Verify all the services are in running state:
Example
[ceph: root@host01 /]# ceph orch ls
Ensure the cluster health status is Health_OK:
Example
[ceph: root@host01 /]# ceph -s
Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
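Each flag is cleared with its own ceph osd unset command:
Example
[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset norecover
[ceph: root@host01 /]# ceph osd unset norebalance
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset nodown
[ceph: root@host01 /]# ceph osd unset pause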
If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:
Syntax
ceph fs set FS_NAME joinable true
Example
[ceph: root@host01 /]# ceph fs set cephfs joinable true
Verification
- Verify the cluster is in a healthy state (Health_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s