Chapter 2. Understanding process management for Ceph


As a storage administrator, you can manipulate the various Ceph daemons by type or instance in a Red Hat Ceph Storage cluster. Manipulating these daemons allows you to start, stop, and restart all the Ceph services as needed.

2.1. Prerequisites

  • Installation of the Red Hat Ceph Storage software.

2.2. Ceph process management

In Red Hat Ceph Storage, all process management is done through systemd. Each time you start, stop, or restart Ceph daemons, you must specify the daemon type or the daemon instance.
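
The unit names in this chapter follow the pattern ceph-FSID@DAEMON_TYPE.DAEMON_ID.service, for example ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service. The following is a minimal sketch, assuming that pattern holds on your cluster, of composing a unit name from its parts. The function name ceph_unit_name is hypothetical and is not part of any Ceph tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose the systemd unit name of one Ceph daemon,
# assuming the ceph-FSID@TYPE.ID.service pattern shown in this chapter.
ceph_unit_name() {
    local fsid="$1" daemon_type="$2" daemon_id="$3"
    if [ -z "$fsid" ] || [ -z "$daemon_type" ] || [ -z "$daemon_id" ]; then
        echo "usage: ceph_unit_name FSID DAEMON_TYPE DAEMON_ID" >&2
        return 1
    fi
    printf 'ceph-%s@%s.%s.service\n' "$fsid" "$daemon_type" "$daemon_id"
}
```

For example, systemctl status "$(ceph_unit_name 499829b4-832f-11eb-8d6d-001a4a000635 mon host01)" checks the Monitor on host01. The ceph fsid command, run from the Cephadm shell, prints the cluster FSID.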

2.3. Starting, stopping, and restarting all Ceph daemons

You can start, stop, and restart all Ceph daemons as the root user on the host where the daemons run.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  1. On the host where you want to start, stop, or restart the daemons, list the systemd services to get the SERVICE_ID of each Ceph daemon:

    Example

    [root@host01 ~]# systemctl --type=service
    ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

  2. Start a Ceph daemon:

    Syntax

    systemctl start SERVICE_ID

    Example

    [root@host01 ~]# systemctl start ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

  3. Stop a Ceph daemon:

    Syntax

    systemctl stop SERVICE_ID

    Example

    [root@host01 ~]# systemctl stop ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service

  4. Restart a Ceph daemon:

    Syntax

    systemctl restart SERVICE_ID

    Example

    [root@host01 ~]# systemctl restart ceph-499829b4-832f-11eb-8d6d-001a4a000635@mon.host01.service
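
The procedure above acts on one SERVICE_ID at a time. The following is a minimal sketch, assuming a bash shell on the host and the unit naming pattern shown in the examples, of collecting every Ceph unit name from the systemctl listing and applying one action to each. The names list_ceph_units and ceph_units_action are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read systemctl output on stdin and print only the
# cephadm-managed unit names, assuming the ceph-FSID@TYPE.ID.service pattern.
list_ceph_units() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ceph-[0-9a-f-]+@.+\.service$/)
                print $i
    }'
}

# Hypothetical wrapper: run one systemctl ACTION (start, stop, or restart)
# against every Ceph service unit known to systemd on this host.
ceph_units_action() {
    local action="$1" unit
    case "$action" in
        start|stop|restart) ;;
        *) echo "usage: ceph_units_action start|stop|restart" >&2; return 1 ;;
    esac
    # --all also lists inactive units, so "start" reaches stopped daemons;
    # --no-legend drops the header and footer lines.
    for unit in $(systemctl --type=service --all --no-legend | list_ceph_units); do
        systemctl "$action" "$unit"
    done
}
```

For example, ceph_units_action restart restarts each Ceph daemon on the host in turn. The per-cluster systemd target used in the power-down procedure later in this chapter is an alternative when you want a single command for the whole host.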

2.4. Starting, stopping, and restarting all Ceph services

Ceph services are logical groups of Ceph daemons of the same type, configured to run in the same Red Hat Ceph Storage cluster. The orchestration layer in Ceph lets you manage these services centrally, so that a single operation affects all the Ceph daemons that belong to the same logical service. The Ceph daemons running on each host are managed through systemd. You can start, stop, and restart all Ceph services from the host where you want to manage the Ceph services.

Important

To start, stop, or restart a specific Ceph daemon on a specific host, use systemd. To obtain a list of the systemd services running on a host, connect to the host and run the following command:

Example

[root@host01 ~]# systemctl list-units "ceph*"

The output lists the service names that you can use to manage each Ceph daemon.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  1. Log in to the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Run the ceph orch ls command to get a list of Ceph services configured in the Red Hat Ceph Storage cluster and to get the specific service ID.

    Example

    [ceph: root@host01 /]# ceph orch ls
    NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                                       IMAGE ID
    alertmanager                   1/1  4m ago     4M   count:1    registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5   b7bae610cd46
    crash                          3/3  4m ago     4M   *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
    grafana                        1/1  4m ago     4M   count:1    registry.redhat.io/rhceph-alpha/rhceph-5-dashboard-rhel8:latest  bd3d7748747b
    mgr                            2/2  4m ago     4M   count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
    mon                            2/2  4m ago     10w  count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
    nfs.foo                        0/1  -          -    count:1    <unknown>                                                        <unknown>
    node-exporter                  1/3  4m ago     4M   *          registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  mix
    osd.all-available-devices      5/5  4m ago     3M   *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510
    prometheus                     1/1  4m ago     4M   count:1    registry.redhat.io/openshift4/ose-prometheus:v4.6                bebb0ddef7f0
    rgw.test_realm.test_zone       2/2  4m ago     3M   count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510

  3. To start a specific service, run the following command:

    Syntax

    ceph orch start SERVICE_ID

    Example

    [ceph: root@host01 /]# ceph orch start node-exporter

  4. To stop a specific service, run the following command:

    Important

    Running ceph orch stop SERVICE_ID on the MON or MGR service makes the Red Hat Ceph Storage cluster inaccessible. To stop a specific daemon on a host, use the systemctl stop SERVICE_ID command instead.

    Syntax

    ceph orch stop SERVICE_ID

    Example

    [ceph: root@host01 /]# ceph orch stop node-exporter

    In this example, the ceph orch stop node-exporter command stops all the daemons of the node-exporter service.

  5. To restart a specific service, run the following command:

    Syntax

    ceph orch restart SERVICE_ID

    Example

    [ceph: root@host01 /]# ceph orch restart node-exporter
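
The RUNNING column of ceph orch ls shows the number of running daemons against the number expected. The following is a minimal sketch, assuming the column layout of the example output above (NAME first, RUNNING second), that lists services with fewer running daemons than expected, such as nfs.foo (0/1) and node-exporter (1/3) in that output. The function name services_not_fully_running is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph orch ls` output on stdin and print every
# service whose RUNNING value (running/expected) is below the expected count.
# Assumes NAME is column 1 and RUNNING is column 2, as in the example output.
services_not_fully_running() {
    awk 'NR > 1 {
        split($2, count, "/")
        if (count[1] + 0 < count[2] + 0)
            printf "%s %s\n", $1, $2
    }'
}
```

Run it in the Cephadm shell as ceph orch ls | services_not_fully_running, then decide per service whether ceph orch start or ceph orch restart is appropriate. Parsing the table breaks if the column layout changes; ceph orch ls --format json gives structured output, but processing it needs a JSON tool, which might not be installed in the Cephadm shell.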

2.5. Viewing log files of Ceph daemons that run in containers

Use journald on the container host to view the log of a Ceph daemon that runs in a container.

Prerequisites

  • Installation of the Red Hat Ceph Storage software.
  • Root-level access to the node.

Procedure

  1. To view the entire log of a Ceph daemon, run the journalctl command as root:

    Syntax

    journalctl -u SERVICE_ID

    Example

    [root@host01 ~]# journalctl -u ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service

    This example displays the entire log of the OSD daemon osd.8.

  2. To follow the log and display new entries as they are written, use the -f option:

    Syntax

    journalctl -fu SERVICE_ID

    Example

    [root@host01 ~]# journalctl -fu ceph-499829b4-832f-11eb-8d6d-001a4a000635@osd.8.service
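
Both journalctl commands above differ only in the unit name and the -f option. The following is a minimal sketch, assuming the unit name pattern ceph-FSID@DAEMON.service used in these examples, of a wrapper that builds the command. The function name ceph_daemon_log and the DRY_RUN variable are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the journald log of one containerized Ceph daemon.
# Pass "follow" as the third argument to stream new entries (journalctl -f).
# Set DRY_RUN=1 to print the command instead of running it.
ceph_daemon_log() {
    local fsid="$1" daemon="$2" mode="${3:-}"
    if [ -z "$fsid" ] || [ -z "$daemon" ]; then
        echo "usage: ceph_daemon_log FSID DAEMON [follow]" >&2
        return 1
    fi
    local -a cmd=(journalctl -u "ceph-${fsid}@${daemon}.service")
    if [ "$mode" = "follow" ]; then
        cmd+=(-f)
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, ceph_daemon_log 499829b4-832f-11eb-8d6d-001a4a000635 osd.8 follow runs the same command as the second example above.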

Note

You can also use the sosreport utility to collect the journald logs. For more details about SOS reports, see the What is an sosreport and how to create one in Red Hat Enterprise Linux? solution on the Red Hat Customer Portal.

Additional Resources

  • The journalctl manual page.

2.6. Powering down and rebooting Red Hat Ceph Storage cluster

You can power down and reboot the Red Hat Ceph Storage cluster by using either of two approaches: systemctl commands or the Ceph Orchestrator.

2.6.1. Powering down and rebooting the cluster using the systemctl commands

You can use systemctl commands to power down and reboot the Red Hat Ceph Storage cluster. This approach follows the standard Linux method of stopping services through systemd.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the nodes.

Procedure

Powering down the Red Hat Ceph Storage cluster

  1. Stop the clients from using the Block Device images and the Ceph Object Gateway on this cluster, and stop any other clients.
  2. Log in to the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  3. The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before you proceed. Run ceph status on a host with the client keyrings, for example, a Ceph Monitor or OpenStack controller node, to ensure that the cluster is healthy.

    Example

    [ceph: root@host01 /]# ceph -s

  4. If you use the Ceph File System (CephFS), bring down the CephFS cluster:

    Syntax

    ceph fs set FS_NAME max_mds 1
    ceph fs fail FS_NAME
    ceph status
    ceph fs set FS_NAME joinable false

    Example

    [ceph: root@host01 /]# ceph fs set cephfs max_mds 1
    [ceph: root@host01 /]# ceph fs fail cephfs
    [ceph: root@host01 /]# ceph status
    [ceph: root@host01 /]# ceph fs set cephfs joinable false

  5. Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:

    Example

    [ceph: root@host01 /]# ceph osd set noout
    [ceph: root@host01 /]# ceph osd set norecover
    [ceph: root@host01 /]# ceph osd set norebalance
    [ceph: root@host01 /]# ceph osd set nobackfill
    [ceph: root@host01 /]# ceph osd set nodown
    [ceph: root@host01 /]# ceph osd set pause

    Important

    The flags apply to the whole cluster, so you set them only once. Steps 7 to 10, which stop the Ceph daemons and shut down the host, must be repeated on each node.

  6. If the MDS and Ceph Object Gateway daemons run on their own dedicated nodes, power those nodes off.
  7. On the host, get the systemd target of the Ceph daemons:

    Example

    [root@host01 ~]# systemctl list-units --type target | grep ceph
    ceph-0b007564-ec48-11ee-b736-525400fd02f8.target loaded active active Ceph cluster 0b007564-ec48-11ee-b736-525400fd02f8
    ceph.target                                      loaded active active All Ceph clusters and services

  8. Disable the target that includes the cluster FSID:

    Example

    [root@host01 ~]# systemctl disable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
    
    Removed "/etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".
    Removed "/etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target".

  9. Stop the target:

    Example

    [root@host01 ~]# systemctl stop ceph-0b007564-ec48-11ee-b736-525400fd02f8.target

    This stops all the Ceph daemons on the host.

  10. Shut down the node:

    Example

    [root@host01 ~]# shutdown
    Shutdown scheduled for Wed 2024-03-27 11:47:19 EDT, use 'shutdown -c' to cancel.

  11. Repeat steps 7 to 10 on all the nodes of the cluster.
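
Steps 7 to 9 select the target whose name contains the cluster FSID, not the generic ceph.target. The following is a minimal sketch, assuming a bash shell on the host and the ceph-FSID.target naming shown in the step 7 output, of extracting that target name. The function name ceph_cluster_target is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl list-units --type target` output on
# stdin and print the per-cluster target (ceph-FSID.target), skipping the
# generic ceph.target that groups all clusters on the host.
ceph_cluster_target() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^ceph-[0-9a-f-]+\.target$/)
                print $i
    }'
}

# Example use on a host (assumes exactly one Ceph cluster on the host):
#   target="$(systemctl list-units --type target --no-legend | ceph_cluster_target)"
#   systemctl disable "$target" && systemctl stop "$target"
```

If a host carries daemons of more than one cluster, the function prints one line per cluster target, so check the output before you disable anything.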

Rebooting the Red Hat Ceph Storage cluster

  1. If network equipment was powered down, ensure that it is powered on and stable before you power on any Ceph hosts or nodes.
  2. Power on the administration node.
  3. Enable the systemd target to get all the daemons running:

    Example

    [root@host01 ~]# systemctl enable ceph-0b007564-ec48-11ee-b736-525400fd02f8.target
    Created symlink /etc/systemd/system/multi-user.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.
    Created symlink /etc/systemd/system/ceph.target.wants/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target → /etc/systemd/system/ceph-0b007564-ec48-11ee-b736-525400fd02f8.target.

  4. Start the systemd target:

    Example

    [root@host01 ~]# systemctl start ceph-0b007564-ec48-11ee-b736-525400fd02f8.target

  5. Power on the remaining nodes, repeating steps 3 and 4 on each of them, and wait for all the nodes to come up. Verify that all the services are up and that there are no connectivity issues between the nodes.
  6. Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following commands from the Cephadm shell on a node with the client keyrings, for example, a Ceph Monitor or OpenStack controller node:

    Example

    [ceph: root@host01 /]# ceph osd unset noout
    [ceph: root@host01 /]# ceph osd unset norecover
    [ceph: root@host01 /]# ceph osd unset norebalance
    [ceph: root@host01 /]# ceph osd unset nobackfill
    [ceph: root@host01 /]# ceph osd unset nodown
    [ceph: root@host01 /]# ceph osd unset pause

  7. If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:

    Syntax

    ceph fs set FS_NAME joinable true

    Example

    [ceph: root@host01 /]# ceph fs set cephfs joinable true
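
The power-down and reboot procedures set and then unset the same six OSD flags. The following is a minimal sketch of a loop over those flags, assuming bash is available wherever you run the ceph CLI (for example, inside the Cephadm shell). The names ceph_maint_flags, CEPH_MAINT_FLAGS, and DRY_RUN are hypothetical:

```shell
#!/usr/bin/env bash
# The six OSD flags from the procedures, in the order the procedures list them.
CEPH_MAINT_FLAGS=(noout norecover norebalance nobackfill nodown pause)

# Hypothetical helper: run `ceph osd set` or `ceph osd unset` for every flag,
# stopping at the first failure. Set DRY_RUN=1 to print the commands instead.
ceph_maint_flags() {
    local action="$1" flag
    case "$action" in
        set|unset) ;;
        *) echo "usage: ceph_maint_flags set|unset" >&2; return 1 ;;
    esac
    for flag in "${CEPH_MAINT_FLAGS[@]}"; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "ceph osd $action $flag"
        else
            ceph osd "$action" "$flag" || return 1
        fi
    done
}
```

Run ceph_maint_flags set before powering down and ceph_maint_flags unset after the reboot. The output of ceph osd dump | grep flags shows which flags are currently set.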

Verification

  • Verify that the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, a Ceph Monitor or OpenStack controller node, to ensure that the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s
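
After a reboot, the cluster can take some time to return to HEALTH_OK. The following is a minimal sketch, assuming bash and the ceph CLI (for example, in the Cephadm shell), that polls ceph health until it reports HEALTH_OK or a retry limit is reached. The function name and the default limits (60 attempts, 10 seconds apart) are arbitrary assumptions. Because ceph health does not list PG states, confirm active+clean with ceph -s afterwards. While OSD flags such as noout are set, the cluster reports HEALTH_WARN, so run the loop only after you unset the flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `ceph health` until the first word of its output
# is HEALTH_OK. Arguments: maximum attempts (default 60) and seconds between
# attempts (default 10). Returns 0 on HEALTH_OK, 1 if the limit is reached.
wait_for_health_ok() {
    local attempts="${1:-60}" interval="${2:-10}" status i
    for ((i = 1; i <= attempts; i++)); do
        status="$(ceph health 2>/dev/null)"
        if [ "${status%% *}" = "HEALTH_OK" ]; then
            return 0
        fi
        echo "attempt $i/$attempts: ${status:-no response from ceph health}" >&2
        sleep "$interval"
    done
    return 1
}
```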

2.6.2. Powering down and rebooting the cluster using the Ceph Orchestrator

You can also use the capabilities of the Ceph Orchestrator to power down and reboot the Red Hat Ceph Storage cluster. In most cases, you can power off the cluster from a single system login.

The Ceph Orchestrator supports several operations, such as start, stop, and restart. In some cases, you combine these commands with systemctl commands to power down or reboot the cluster.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

Powering down the Red Hat Ceph Storage cluster

  1. Stop the clients from using the Block Device images and the Ceph Object Gateway on this cluster, and stop any other clients.
  2. Log in to the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  3. The cluster must be in a healthy state (HEALTH_OK and all PGs active+clean) before you proceed. Run ceph status on a host with the client keyrings, for example, a Ceph Monitor or OpenStack controller node, to ensure that the cluster is healthy.

    Example

    [ceph: root@host01 /]# ceph -s

  4. If you use the Ceph File System (CephFS), bring down the CephFS cluster. In the ceph mds fail command, N is the rank of the MDS to fail:

    Syntax

    ceph fs set FS_NAME max_mds 1
    ceph fs fail FS_NAME
    ceph status
    ceph fs set FS_NAME joinable false
    ceph mds fail FS_NAME:N

    Example

    [ceph: root@host01 /]# ceph fs set cephfs max_mds 1
    [ceph: root@host01 /]# ceph fs fail cephfs
    [ceph: root@host01 /]# ceph status
    [ceph: root@host01 /]# ceph fs set cephfs joinable false
    [ceph: root@host01 /]# ceph mds fail cephfs:1

  5. Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following on a node with the client keyrings, for example, the Ceph Monitor or OpenStack controller node:

    Example

    [ceph: root@host01 /]# ceph osd set noout
    [ceph: root@host01 /]# ceph osd set norecover
    [ceph: root@host01 /]# ceph osd set norebalance
    [ceph: root@host01 /]# ceph osd set nobackfill
    [ceph: root@host01 /]# ceph osd set nodown
    [ceph: root@host01 /]# ceph osd set pause

  6. Stop the MDS service.

    1. Fetch the MDS service name:

      Example

      [ceph: root@host01 /]# ceph orch ls --service-type mds

    2. Stop the MDS service using the name fetched in the previous step:

      Syntax

      ceph orch stop SERVICE-NAME

  7. Stop the Ceph Object Gateway services. Repeat for each deployed service.

    1. Fetch the Ceph Object Gateway service names:

      Example

      [ceph: root@host01 /]# ceph orch ls --service-type rgw

    2. Stop the Ceph Object Gateway service using the fetched name:

      Syntax

      ceph orch stop SERVICE-NAME

  8. Stop the Alertmanager service:

    Example

    [ceph: root@host01 /]# ceph orch stop alertmanager

  9. Stop the node-exporter service, which is part of the monitoring stack:

    Example

    [ceph: root@host01 /]# ceph orch stop node-exporter

  10. Stop the Prometheus service:

    Example

    [ceph: root@host01 /]# ceph orch stop prometheus

  11. Stop the Grafana dashboard service:

    Example

    [ceph: root@host01 /]# ceph orch stop grafana

  12. Stop the crash service:

    Example

    [ceph: root@host01 /]# ceph orch stop crash

  13. Stop the OSD daemons from the Cephadm shell, one by one. Repeat this step for all the OSDs in the cluster.

    1. Fetch the OSD ID:

      Example

      [ceph: root@host01 /]# ceph orch ps --daemon-type=osd

    2. Stop the OSD daemon using the OSD ID that you fetched:

      Example

      [ceph: root@host01 /]# ceph orch daemon stop osd.1
      Scheduled to stop osd.1 on host 'host02'

  14. Stop the monitors one by one.

    1. Identify the hosts hosting the monitors:

      Example

      [ceph: root@host01 /]# ceph orch ps --daemon-type mon

    2. On each host, stop the monitor. Run the following commands on the host itself, outside the Cephadm shell.

      1. Identify the systemctl unit name:

        Example

        [root@host01 ~]# systemctl list-units 'ceph-*' | grep mon

      2. Stop the service:

        Syntax

        systemctl stop SERVICE-NAME

  15. Shut down all the hosts.
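
Steps 6 to 13 repeat the same fetch-then-stop pattern. The following is a minimal sketch, assuming bash in the Cephadm shell and that NAME is the first column of ceph orch ls and ceph orch ps output, of running those steps in order. The names orch_power_down, first_column_names, MONITORING_AND_CRASH, and DRY_RUN are hypothetical. The sketch stops after the OSDs, because the monitors in step 14 are stopped with systemctl on each host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first column (NAME) of `ceph orch ls` or
# `ceph orch ps` output read on stdin, skipping the header line.
first_column_names() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Services stopped in steps 8 to 12, in the order that the procedure lists them.
MONITORING_AND_CRASH=(alertmanager node-exporter prometheus grafana crash)

# Hypothetical wrapper for steps 6 to 13: stop the MDS and Ceph Object Gateway
# services, then the monitoring and crash services, then each OSD daemon in turn.
# Set DRY_RUN=1 to print the stop commands instead of running them.
orch_power_down() {
    local -a run=()
    local svc osd
    if [ -n "${DRY_RUN:-}" ]; then
        run=(echo)
    fi
    for svc in $(ceph orch ls --service-type mds | first_column_names) \
               $(ceph orch ls --service-type rgw | first_column_names) \
               "${MONITORING_AND_CRASH[@]}"; do
        "${run[@]}" ceph orch stop "$svc" || return 1
    done
    for osd in $(ceph orch ps --daemon-type osd | first_column_names); do
        "${run[@]}" ceph orch daemon stop "$osd" || return 1
    done
}
```

Running DRY_RUN=1 orch_power_down first prints the full list of stop commands, which you can review against the ceph orch ls output before running the function for real.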

Rebooting the Red Hat Ceph Storage cluster

  1. If network equipment was powered down, ensure that it is powered on and stable before you power on any Ceph hosts or nodes.
  2. Power on all the Ceph hosts.
  3. Log in to the Cephadm shell on the administration node:

    Example

    [root@host01 ~]# cephadm shell

  4. Verify that all the services are in the running state:

    Example

    [ceph: root@host01 /]# ceph orch ls

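  5. Check the cluster health status. While the OSD flags set during power-down are still in place, the cluster reports HEALTH_WARN for those flags; apart from that warning, the status should be HEALTH_OK: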

    Example

    [ceph: root@host01 /]# ceph -s

  6. Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags. Run the following commands from the Cephadm shell on a node with the client keyrings, for example, a Ceph Monitor or OpenStack controller node:

    Example

    [ceph: root@host01 /]# ceph osd unset noout
    [ceph: root@host01 /]# ceph osd unset norecover
    [ceph: root@host01 /]# ceph osd unset norebalance
    [ceph: root@host01 /]# ceph osd unset nobackfill
    [ceph: root@host01 /]# ceph osd unset nodown
    [ceph: root@host01 /]# ceph osd unset pause

  7. If you use the Ceph File System (CephFS), bring the CephFS cluster back up by setting the joinable flag to true:

    Syntax

    ceph fs set FS_NAME joinable true

    Example

    [ceph: root@host01 /]# ceph fs set cephfs joinable true
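
The CephFS steps in both procedures are fixed command sequences. The following is a minimal sketch, assuming bash and the ceph CLI in the Cephadm shell, that runs them for one file system. The names cephfs_maintenance, run_ceph, and DRY_RUN are hypothetical. The sketch omits the interactive ceph status check and the ceph mds fail FS_NAME:N command from the power-down steps, so run those manually where the procedure calls for them.

```shell
#!/usr/bin/env bash
# Hypothetical runner: execute a ceph command, or print it when DRY_RUN is set.
run_ceph() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "ceph $*"
    else
        ceph "$@"
    fi
}

# Hypothetical helper for the CephFS steps. "down" runs the bring-down
# commands in the order the procedures list them, stopping at the first
# failure; "up" sets the joinable flag back to true.
cephfs_maintenance() {
    local mode="$1" fs="$2"
    if [ -z "$fs" ]; then
        echo "usage: cephfs_maintenance down|up FS_NAME" >&2
        return 1
    fi
    case "$mode" in
        down)
            run_ceph fs set "$fs" max_mds 1 &&
                run_ceph fs fail "$fs" &&
                run_ceph fs set "$fs" joinable false
            ;;
        up)
            run_ceph fs set "$fs" joinable true
            ;;
        *)
            echo "usage: cephfs_maintenance down|up FS_NAME" >&2
            return 1
            ;;
    esac
}
```

For example, cephfs_maintenance up cephfs runs the same command as the example in the last step.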

Verification

  • Verify that the cluster is in a healthy state (HEALTH_OK and all PGs active+clean). Run ceph status on a node with the client keyrings, for example, a Ceph Monitor or OpenStack controller node, to ensure that the cluster is healthy.

Example

[ceph: root@host01 /]# ceph -s

© 2024 Red Hat, Inc.