Chapter 2. Understanding process management for Ceph
As a storage administrator, you can manipulate the various Ceph daemons by type or instance in a Red Hat Ceph Storage cluster. Manipulating these daemons allows you to start, stop and restart all of the Ceph services as needed.
2.1. Prerequisites
- Installation of the Red Hat Ceph Storage software.
2.2. Ceph process management
In Red Hat Ceph Storage, all process management is done through the systemd service. Each time you want to start, restart, or stop the Ceph daemons, you must specify the daemon type or the daemon instance.
Additional Resources
- For more information on using systemd, see the Introduction to systemd chapter and the Managing system services with systemctl chapter in the Configuring basic system settings guide for Red Hat Enterprise Linux 8.
2.3. Starting, stopping, and restarting all Ceph daemons
You can start, stop, and restart all Ceph daemons as the root user from the host where the daemons run.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
On the host where you want to start, stop, or restart the daemons, run the systemctl command to get the SERVICE_ID of the service.
Example
[root@host01 ~]# systemctl --type=service ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Starting all Ceph daemons:
Syntax
systemctl start SERVICE_ID
Example
[root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Stopping all Ceph daemons:
Syntax
systemctl stop SERVICE_ID
Example
[root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
Restarting all Ceph daemons:
Syntax
systemctl restart SERVICE_ID
Example
[root@host01 ~]# systemctl restart ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
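The SERVICE_ID values in the examples above share one pattern: the cluster FSID, then the daemon type, then the daemon instance. The following is a minimal sketch of a helper that assembles such a unit name. The pattern is inferred from the examples in this chapter rather than from a formal specification, and ceph_unit_name is a hypothetical function name, not part of Ceph.

```shell
# Build a systemd unit name of the form seen in the examples:
#   ceph-FSID@DAEMON_TYPE.DAEMON_INSTANCE.service
# ceph_unit_name is a hypothetical helper, not a Ceph command.
ceph_unit_name() {
    fsid=$1             # cluster FSID, for example 499829b4-832f-11eb-8d6d-001a4a000635
    daemon_type=$2      # daemon type, for example mon or osd
    daemon_instance=$3  # daemon instance, for example host01 or 8
    printf 'ceph-%s@%s.%s.service\n' "$fsid" "$daemon_type" "$daemon_instance"
}
```

For example, systemctl restart "$(ceph_unit_name 499829b4-832f-11eb-8d6d-001a4a000635 mon host01)" would restart the same monitor unit as the restart example above.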
2.4. Starting, stopping, and restarting all Ceph services
Ceph services are logical groups of Ceph daemons of the same type, configured to run in the same Red Hat Ceph Storage cluster. The orchestration layer in Ceph allows the user to manage these services in a centralized way, making it easy to execute operations that affect all the Ceph daemons that belong to the same logical service. The Ceph daemons running on each host are managed through the systemd service. You can start, stop, and restart all Ceph services from the host where you want to manage the Ceph services.
If you want to start, stop, or restart a specific Ceph daemon on a specific host, you need to use the systemd service. To obtain a list of the systemd services running on a specific host, connect to the host, and run the following command:
Example
[root@host01 ~]# systemctl list-units "ceph*"
The output lists the service names that you can use to manage each Ceph daemon.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
Run the ceph orch ls command to get a list of Ceph services configured in the Red Hat Ceph Storage cluster and to get the specific service ID.
Example
[ceph: root@host01 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                                       IMAGE ID
alertmanager               1/1      4m ago     4M   count:1    registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5   b7bae610cd46
crash                      3/3      4m ago     4M   *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
grafana                    1/1      4m ago     4M   count:1    registry.redhat.io/rhceph-alpha/rhceph-5-dashboard-rhel8:latest  bd3d7748747b
mgr                        2/2      4m ago     4M   count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
mon                        2/2      4m ago     10w  count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
nfs.foo                    0/1      -          -    count:1    <unknown>                                                        <unknown>
node-exporter              1/3      4m ago     4M   *          registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  mix
osd.all-available-devices  5/5      4m ago     3M   *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
prometheus                 1/1      4m ago     4M   count:1    registry.redhat.io/openshift4/ose-prometheus:v4.6                bebb0ddef7f0
rgw.test_realm.test_zone   2/2      4m ago     3M   count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
To start a specific service, run the following command:
Syntax
ceph orch start SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch start node-exporter
To stop a specific service, run the following command:
Important
The ceph orch stop SERVICE_ID command makes the Red Hat Ceph Storage cluster inaccessible, but only when it is used on the MON and MGR services. To stop a specific daemon on a host, use the systemctl stop SERVICE_ID command instead.
Syntax
ceph orch stop SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch stop node-exporter
In the example, the ceph orch stop node-exporter command stops all the daemons of the node-exporter service.
To restart a specific service, run the following command:
Syntax
ceph orch restart SERVICE_ID
Example
[ceph: root@host01 /]# ceph orch restart node-exporter
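Because stopping the MON or MGR service through the orchestrator makes the cluster inaccessible, it can help to guard the stop command. The following is a minimal sketch of such a guard, meant to be defined inside the Cephadm shell. safe_orch_stop is a hypothetical wrapper name, and the sketch assumes those services are named exactly mon and mgr, as in the ceph orch ls output above.

```shell
# Hypothetical wrapper around "ceph orch stop" that refuses to stop the
# MON and MGR services, following the Important note in this section.
safe_orch_stop() {
    service_id=$1
    case "$service_id" in
        mon|mgr)
            echo "refusing to stop $service_id with ceph orch; use systemctl stop on the host instead" >&2
            return 1
            ;;
    esac
    # Any other service is passed through unchanged.
    ceph orch stop "$service_id"
}
```

With this in place, safe_orch_stop node-exporter behaves like the stop example above, while safe_orch_stop mon prints a message and returns a non-zero status without contacting the cluster.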
2.5. Viewing log files of Ceph daemons that run in containers
Use the journald daemon from the container host to view a log file of a Ceph daemon from a container.
Prerequisites
- Installation of the Red Hat Ceph Storage software.
- Root-level access to the node.
Procedure
To view the entire Ceph log file, run a journalctl command as root, composed in the following format:
Syntax
journalctl -u SERVICE_ID
Example
[root@host01 ~]# journalctl -u ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service
In the above example, you can view the entire log for the OSD with ID osd.8.
To show only the recent journal entries, use the -f option.
Syntax
journalctl -fu SERVICE_ID
Example
[root@host01 ~]# journalctl -fu ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service
You can also use the sosreport utility to view the journald logs. For more details about SOS reports, see the What is an sosreport and how to create one in Red Hat Enterprise Linux? solution on the Red Hat Customer Portal.
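The two journalctl forms above can be wrapped in one small helper. This is a sketch only: ceph_daemon_log is a hypothetical function name, and the unit name is built from the pattern seen in the examples (ceph-FSID@DAEMON.service), which is an assumption drawn from this chapter.

```shell
# Hypothetical helper: show the journal for one Ceph daemon on this host.
# Usage: ceph_daemon_log FSID DAEMON [follow]
#   DAEMON is the daemon name as it appears in the unit, for example osd.8
ceph_daemon_log() {
    fsid=$1
    daemon=$2
    unit="ceph-${fsid}@${daemon}.service"
    if [ "${3:-}" = "follow" ]; then
        journalctl -fu "$unit"   # recent entries, then keep following
    else
        journalctl -u "$unit"    # entire log for the unit
    fi
}
```

Run it as root on the container host, for example ceph_daemon_log 499829b4-832f-11eb-8d6d-001a4a000635 osd.8 follow.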
Additional Resources
- The journalctl manual page.
2.6. Powering down and rebooting Red Hat Ceph Storage cluster
You can power down and reboot the Red Hat Ceph Storage cluster using two different approaches: systemctl commands and the Ceph Orchestrator. You can choose either approach to power down and reboot the cluster.
2.6.1. Powering down and rebooting the cluster using the systemctl commands
You can use the systemctl commands approach to power down and reboot the Red Hat Ceph Storage cluster. This approach follows the Linux way of stopping the services.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access.
Procedure
Powering down the Red Hat Ceph Storage cluster
- Stop the clients from using the Block Device images and the Ceph Object Gateway (RADOS Gateway) on this cluster, and stop any other clients.
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before proceeding. Run ceph status on the host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
If you use the Ceph File System (CephFS), bring down the CephFS cluster:
Syntax
ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
Example
[ceph: root@host01 /]# ceph fs set cephfs max_mds 1
[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false
Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
Example
[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause
Important
The above example only stops the service and each OSD on one OSD node. Repeat it on each OSD node.
- If the MDS and Ceph Object Gateway nodes are on their own dedicated nodes, power them off.
Get the systemd target of the daemons:
Example
[root@host01 ~]# systemctl list-units --type target | grep ceph
ceph-0b007564-ec48-11ee-b736-525400fd02f8.target loaded active active Ceph cluster 0b007564-ec48-11ee-b736-525400fd02f8
ceph.target                                      loaded active active All Ceph clusters and services
Disable the target that includes the cluster FSID:
Example
[root@host01 ~]# systemctl disable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
Removed "/etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".
Removed "/etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".
Stop the target:
Example
[root@host01 ~]# systemctl stop ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
This stops all the Ceph daemons on the host.
Shut down the node:
Example
[root@host01 ~]# shutdown
Shutdown scheduled for Wed 2024-03-27 11:47:19 EDT, use 'shutdown -c' to cancel.
- Repeat the above steps for all the nodes of the cluster.
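The six flags in this procedure are set before shutdown and unset again after reboot, so the two steps can share one loop. The following is a minimal sketch, meant to be defined inside the Cephadm shell. cluster_flags is a hypothetical helper name, and the sketch stops at the first failing command.

```shell
# Set or unset the flags named in the procedure, one "ceph osd" call per flag.
# Usage: cluster_flags set | cluster_flags unset   (hypothetical helper)
cluster_flags() {
    action=$1
    case "$action" in
        set|unset) ;;
        *) echo "usage: cluster_flags set|unset" >&2; return 2 ;;
    esac
    for flag in noout norecover norebalance nobackfill nodown pause; do
        # Stop at the first failure so a partial result is noticed.
        ceph osd "$action" "$flag" || return 1
    done
}
```

Call cluster_flags set when powering down and cluster_flags unset after all the nodes are back up.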
Rebooting the Red Hat Ceph Storage cluster
- If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.
- Power ON the administration node.
Enable the systemd target to get all the daemons running:
Example
[root@host01 ~]# systemctl enable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
Created symlink /etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.
Created symlink /etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.
Start the systemd target:
Example
[root@host01 ~]# systemctl start ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
- Wait for all the nodes to come up. Verify all the services are up and there are no connectivity issues between the nodes.
Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
Example
[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset norecover
[ceph: root@host01 /]# ceph osd unset norebalance
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset nodown
[ceph: root@host01 /]# ceph osd unset pause
If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:
Syntax
ceph fs set FS_NAME joinable true
Example
[ceph: root@host01 /]# ceph fs set cephfs joinable true
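The enable and start steps of the reboot procedure need the name of the per-cluster target, which contains the FSID. The following is a sketch of one way to look it up and act on it, run as root on each host. It assumes the target unit file is visible to systemctl list-unit-files, which also lists disabled units. ceph_cluster_targets and start_cluster_targets are hypothetical helper names.

```shell
# Print the per-cluster targets (ceph-FSID.target) known on this host.
# The pattern requires a hyphen, so the generic ceph.target is skipped.
# Hypothetical helper; it reads the first column of the listing.
ceph_cluster_targets() {
    systemctl list-unit-files --no-legend 'ceph-*.target' |
        awk '{ print $1 }'
}

# Enable and then start every per-cluster target found on this host.
start_cluster_targets() {
    for target in $(ceph_cluster_targets); do
        systemctl enable "$target" && systemctl start "$target" || return 1
    done
}
```

On a host with a single cluster, start_cluster_targets performs the same two commands as the enable and start examples above.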
Verification
- Verify the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
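After a reboot the cluster can take a while to settle, so the health check is often repeated. The following is a minimal sketch that polls ceph health until it reports HEALTH_OK. wait_for_health_ok is a hypothetical helper name, and the default attempt count and delay are arbitrary assumptions. The sketch checks only the health string, so the PG states still need to be confirmed with ceph -s.

```shell
# Poll "ceph health" until it reports HEALTH_OK or the attempts run out.
# Usage: wait_for_health_ok [ATTEMPTS] [DELAY_SECONDS]   (hypothetical helper)
wait_for_health_ok() {
    attempts=${1:-30}   # assumed default: 30 tries
    delay=${2:-10}      # assumed default: 10 seconds between tries
    status=""
    while [ "$attempts" -gt 0 ]; do
        status=$(ceph health 2>/dev/null)
        case "$status" in
            HEALTH_OK*) echo "$status"; return 0 ;;
        esac
        attempts=$((attempts - 1))
        # Sleep only if another attempt follows.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    echo "cluster did not reach HEALTH_OK: ${status:-no output}" >&2
    return 1
}
```

Inside the Cephadm shell, wait_for_health_ok 60 5 would check every five seconds for up to five minutes.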
Additional Resources
- For more information on installing Ceph, see the Red Hat Ceph Storage Installation Guide.
2.6.2. Powering down and rebooting the cluster using the Ceph Orchestrator
You can also use the capabilities of the Ceph Orchestrator to power down and reboot the Red Hat Ceph Storage cluster. In most cases, a single system login is enough to power off the cluster.
The Ceph Orchestrator supports several operations, such as start, stop, and restart. In some cases, you can combine these commands with systemctl when powering down or rebooting the cluster.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the node.
Procedure
Powering down the Red Hat Ceph Storage cluster
- Stop the clients from using the Block Device images and the Ceph Object Gateway on this cluster, and stop any other clients.
Log into the Cephadm shell:
Example
[root@host01 ~]# cephadm shell
The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before proceeding. Run ceph status on the host with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
If you use the Ceph File System (CephFS), bring down the CephFS cluster:
Syntax
ceph fs set FS_NAME max_mds 1
ceph fs fail FS_NAME
ceph status
ceph fs set FS_NAME joinable false
ceph mds fail FS_NAME:N
Example
[ceph: root@host01 /]# ceph fs set cephfs max_mds 1
[ceph: root@host01 /]# ceph fs fail cephfs
[ceph: root@host01 /]# ceph status
[ceph: root@host01 /]# ceph fs set cephfs joinable false
[ceph: root@host01 /]# ceph mds fail cephfs:1
Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
Example
[ceph: root@host01 /]# ceph osd set noout
[ceph: root@host01 /]# ceph osd set norecover
[ceph: root@host01 /]# ceph osd set norebalance
[ceph: root@host01 /]# ceph osd set nobackfill
[ceph: root@host01 /]# ceph osd set nodown
[ceph: root@host01 /]# ceph osd set pause
Stop the MDS service.
Fetch the MDS service name:
Example
[ceph: root@host01 /]# ceph orch ls --service-type mds
Stop the MDS service using the name fetched in the previous step:
Syntax
ceph orch stop SERVICE-NAME
Stop the Ceph Object Gateway services. Repeat for each deployed service.
Fetch the Ceph Object Gateway service names:
Example
[ceph: root@host01 /]# ceph orch ls --service-type rgw
Stop the Ceph Object Gateway service using the fetched name:
Syntax
ceph orch stop SERVICE-NAME
Stop the Alertmanager service:
Example
[ceph: root@host01 /]# ceph orch stop alertmanager
Stop the node-exporter service which is a part of the monitoring stack:
Example
[ceph: root@host01 /]# ceph orch stop node-exporter
Stop the Prometheus service:
Example
[ceph: root@host01 /]# ceph orch stop prometheus
Stop the Grafana dashboard service:
Example
[ceph: root@host01 /]# ceph orch stop grafana
Stop the crash service:
Example
[ceph: root@host01 /]# ceph orch stop crash
Stop the OSD daemons from the cephadm node, one by one. Repeat this step for all the OSDs in the cluster.
Fetch the OSD ID:
Example
[ceph: root@host01 /]# ceph orch ps --daemon-type=osd
Stop the OSD daemon using the OSD ID you fetched:
Example
[ceph: root@host01 /]# ceph orch daemon stop osd.1
Scheduled to stop osd.1 on host 'host02'
Stop the monitors one by one.
Identify the hosts hosting the monitors:
Example
[ceph: root@host01 /]# ceph orch ps --daemon-type mon
On each host, stop the monitor.
Identify the systemctl unit name:
Example
[ceph: root@host01 /]# systemctl list-units ceph-* | grep mon
Stop the service:
Syntax
systemctl stop SERVICE-NAME
- Shut down all the hosts.
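Several of the stop steps above repeat the same command for a fixed list of services or for every OSD, so they lend themselves to loops. The following is a sketch, meant to be defined inside the Cephadm shell. stop_aux_services and stop_all_osds are hypothetical helper names. The OSD helper assumes that the first column of the ceph orch ps output is the daemon name (osd.N) and that the first row is a header. The MDS and Ceph Object Gateway services are left out because their names must be looked up first.

```shell
# Stop the monitoring and crash services in the order used in the procedure.
# Hypothetical helper; it stops at the first failing command.
stop_aux_services() {
    for service in alertmanager node-exporter prometheus grafana crash; do
        ceph orch stop "$service" || return 1
    done
}

# Stop every OSD daemon, one by one.
# Assumption: column 1 of "ceph orch ps" is the daemon name, row 1 a header.
stop_all_osds() {
    for osd in $(ceph orch ps --daemon-type=osd |
                 awk 'NR > 1 && $1 ~ /^osd\./ { print $1 }'); do
        ceph orch daemon stop "$osd" || return 1
    done
}
```

The monitors are not handled here because the procedure stops them with systemctl on each host.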
Rebooting the Red Hat Ceph Storage cluster
- If network equipment was involved, ensure it is powered ON and stable prior to powering ON any Ceph hosts or nodes.
- Power ON all the Ceph hosts.
Log into the Cephadm shell on the administration node:
Example
[root@host01 ~]# cephadm shell
Verify all the services are in running state:
Example
[ceph: root@host01 /]# ceph orch ls
Ensure the cluster health status is HEALTH_OK:
Example
[ceph: root@host01 /]# ceph -s
Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:
Example
[ceph: root@host01 /]# ceph osd unset noout
[ceph: root@host01 /]# ceph osd unset norecover
[ceph: root@host01 /]# ceph osd unset norebalance
[ceph: root@host01 /]# ceph osd unset nobackfill
[ceph: root@host01 /]# ceph osd unset nodown
[ceph: root@host01 /]# ceph osd unset pause
If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:
Syntax
ceph fs set FS_NAME joinable true
Example
[ceph: root@host01 /]# ceph fs set cephfs joinable true
Verification
- Verify the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller nodes, to ensure the cluster is healthy.
Example
[ceph: root@host01 /]# ceph -s
Additional Resources
- For more information on installing Ceph, see the Red Hat Ceph Storage Installation Guide.