

Chapter 5. Ceph File System administration


As a storage administrator, you can perform common Ceph File System (CephFS) administrative tasks. The procedures in this chapter require the following:


  • A running and healthy Red Hat Ceph Storage cluster.
  • Installation and configuration of the Ceph Metadata Server daemons (ceph-mds).
  • A created and mounted Ceph File System.

5.1. Using the cephfs-top utility

The Ceph File System (CephFS) provides a top-like utility to display metrics on Ceph File Systems in realtime. The cephfs-top utility is a curses-based Python script that uses the Ceph Manager stats module to fetch and display client performance metrics.

Currently, the cephfs-top utility supports nearly 10k clients.


Currently, not all of the performance stats are available in the Red Hat Enterprise Linux 9.2 kernel. cephfs-top is supported on Red Hat Enterprise Linux 9 and above and uses one of the standard terminals in Red Hat Enterprise Linux.


The minimum compatible Python version for the cephfs-top utility is 3.6.0.


  • A healthy and running Red Hat Ceph Storage cluster.
  • Deployment of a Ceph File System.
  • Root-level access to a Ceph client node.
  • Installation of the cephfs-top package.


  1. Enable the Red Hat Ceph Storage 7 tools repository, if it is not already enabled:

    Red Hat Enterprise Linux 9

    [root@client ~]# subscription-manager repos --enable=rhceph-7-tools-for-rhel-9-x86_64-rpms

  2. Install the cephfs-top package:


    [root@client ~]# dnf install cephfs-top

  3. Enable the Ceph Manager stats plugin:


    [root@client ~]# ceph mgr module enable stats

  4. Create the client.fstop Ceph user:


    [root@client ~]# ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring


    Optionally, use the --id argument to specify a different Ceph user, other than client.fstop.

  5. Start the cephfs-top utility:


    [root@client ~]# cephfs-top
    cephfs-top - Wed Nov 30 15:26:05 2022
    All Filesystem Info
    Total Client(s): 4 - 3 FUSE, 1 kclient, 0 libcephfs
    COMMANDS: m - select a filesystem | s - sort menu | l - limit number of clients | r - reset to default | q - quit
      client_id  mount_root  chit(%)  dlease(%)  ofiles  oicaps  oinodes  rtio(MB)  raio(MB)  rsp(MB/s)  wtio(MB)  waio(MB)  wsp(MB/s)  rlatavg(ms)  rlatsd(ms)  wlatavg(ms)  wlatsd(ms)  mlatavg(ms)  mlatsd(ms)  mount_point@host/addr
    Filesystem: cephfs1 - 2 client(s)
      4500       /           100.0    100.0      0       751     0        0.0       0.0       0.0        578.13    0.03      0.0        N/A          N/A         N/A          N/A         N/A          N/A         N/A@example/
      4501       /           100.0    0.0        0       1       0        0.0       0.0       0.0        0.0       0.0       0.0        0.0          0.0         0.0          0.0         0.41         0.0         /mnt/cephfs2@example/
    Filesystem: cephfs2 - 2 client(s)
      4512       /           100.0    0.0        0       1       0        0.0       0.0       0.0        0.0       0.0       0.0        0.0          0.0         0.0          0.0         0.4          0.0         /mnt/cephfs3@example/
      4518       /           100.0    0.0        0       1       0        0.0       0.0       0.0        0.0       0.0       0.0        0.0          0.0         0.0          0.0         0.52         0.0         /mnt/cephfs4@example/

5.1.1. The cephfs-top utility interactive commands

Select a particular file system and view the metrics related to that file system with the cephfs-top utility interactive commands.

m: Filesystem selection. Displays a menu of file systems for selection.
q: Quit. Exits the utility if you are at the home screen with all file system information. If you are not at the home screen, it returns you to the home screen.
s: Sort field selection. Designates the sort field. ‘cap_hit’ is the default.
l: Client limit. Sets the limit on the number of clients to be displayed.
r: Reset. Resets the sort field and limit value to the default.

You can scroll the metrics display by using the arrow keys, PgUp/PgDn, Home/End, and the mouse.

Example of entering and exiting the file system selection menu

[root@client ~]# m

Press "q" to go back to home (all filesystem info) screen

[root@client ~]# q

cephfs-top - Thu Oct 20 07:29:35 2022
Total Client(s): 3 - 2 FUSE, 1 kclient, 0 libcephfs

5.1.2. The cephfs-top utility options

You can use the cephfs-top utility command with various options.


[root@client ~]# cephfs-top --selftest
selftest ok

--cluster NAME_OF_THE_CLUSTER
With this option, you can connect to a cluster with a non-default name. The default name is ceph.
--id USER
This is the client that connects to the Ceph cluster. The default is fstop.
--selftest
With this option, you can perform a selftest. This mode performs a sanity check of the stats module.
--conffile PATH_TO_THE_CONFIGURATION_FILE
With this option, you can provide a path to the Ceph cluster configuration file.
-d/--delay INTERVAL_IN_SECONDS
The cephfs-top utility refreshes statistics every second by default. With this option, you can change the refresh interval.


The interval must be greater than or equal to 1 second. Fractional seconds are honored.

--dump
With this option, you can dump the metrics to stdout without creating a curses display.
--dumpfs FILESYSTEM_NAME
With this option, you can dump the metrics of the given file system to stdout without creating a curses display.
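
As a rough illustration of the refresh interval rule, the following shell sketch checks a value before it is handed to cephfs-top. The valid_delay function name is hypothetical, and the sketch assumes the interval is passed with the utility's -d option.

```shell
# Hypothetical helper: succeeds when the interval is a number >= 1.
# Fractional values such as 1.5 are accepted.
valid_delay() {
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;   # reject empty, non-numeric, or malformed input
  esac
  # awk handles the fractional comparison that plain shell arithmetic cannot.
  awk -v d="$1" 'BEGIN { exit !(d >= 1) }'
}
```

A caller could then guard the invocation, for example: valid_delay "$interval" && cephfs-top -d "$interval".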

5.2. Using the MDS autoscaler module

The MDS Autoscaler Module monitors the Ceph File System (CephFS) to ensure sufficient MDS daemons are available. It works by adjusting the placement specification for the Orchestrator backend of the MDS service.

The module monitors the following file system settings to inform placement count adjustments:

  • max_mds file system setting
  • standby_count_wanted file system setting

The Ceph Monitor daemons are still responsible for promoting or stopping MDS daemons according to these settings. The mds_autoscaler module only adjusts the number of MDS daemons that the orchestrator spawns.


  • A healthy and running Red Hat Ceph Storage cluster.
  • Deployment of a Ceph File System.
  • Root-level access to a Ceph Monitor node.


  • Enable the MDS autoscaler module:


    [ceph: root@host01 /]# ceph mgr module enable mds_autoscaler
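
The relationship between the two monitored settings and the MDS daemon count can be sketched as follows. This is an assumption about the arithmetic, not a statement of the module's implementation: the sketch assumes the orchestrator needs enough daemons to cover max_mds active ranks plus standby_count_wanted standbys. The mds_target_count name is hypothetical.

```shell
# Hypothetical helper: estimates how many MDS daemons a file system needs,
# assuming the target is max_mds active daemons plus standby_count_wanted standbys.
mds_target_count() {
  max_mds="$1"
  standby_count_wanted="$2"
  echo $((max_mds + standby_count_wanted))
}

# Example: 2 active ranks and 1 standby give a target of 3 daemons.
mds_target_count 2 1
```

The two settings themselves are changed with ceph fs set FS_NAME max_mds NUMBER and ceph fs set FS_NAME standby_count_wanted NUMBER.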

5.3. Unmounting Ceph File Systems mounted as kernel clients

You can unmount a Ceph File System that is mounted as a kernel client.


  • Root-level access to the node doing the mounting.


  • To unmount a Ceph File System mounted as a kernel client:


    umount MOUNT_POINT


    [root@client ~]# umount /mnt/cephfs

Additional Resources

  • The umount(8) manual page

5.4. Unmounting Ceph File Systems mounted as FUSE clients

You can unmount a Ceph File System that is mounted as a File System in User Space (FUSE) client.


  • Root-level access to the FUSE client node.


  • To unmount a Ceph File System mounted in FUSE:


    fusermount -u MOUNT_POINT


    [root@client ~]# fusermount -u /mnt/cephfs

Additional Resources

  • The ceph-fuse(8) manual page
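
The choice between the two unmount commands can be sketched as a small helper that only prints the command to run. The unmount_cmd_for name is hypothetical, and the sketch assumes the file system type strings reported in /proc/mounts are ceph for kernel mounts and fuse.ceph-fuse for FUSE mounts.

```shell
# Hypothetical helper: prints the unmount command that matches the mount type.
# Assumes "ceph" marks a kernel mount and "fuse.ceph-fuse" marks a FUSE mount.
unmount_cmd_for() {
  fstype="$1"
  mount_point="$2"
  case "$fstype" in
    ceph)           echo "umount $mount_point" ;;
    fuse.ceph-fuse) echo "fusermount -u $mount_point" ;;
    *)              echo "not a CephFS mount: $mount_point" >&2; return 1 ;;
  esac
}
```

Printing the command instead of running it leaves the final decision with the administrator.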

5.5. Mapping directory trees to Metadata Server daemon ranks

You can map a directory and its subdirectories to a particular active Metadata Server (MDS) rank so that its metadata is managed only by the MDS daemon holding that rank. This approach enables you to evenly spread application load or to limit the impact of users' metadata requests on the entire storage cluster.


An internal balancer already dynamically spreads the application load. Therefore, only map directory trees to ranks for certain carefully chosen applications.

In addition, when a directory is mapped to a rank, the balancer cannot split it. Consequently, a large number of operations within the mapped directory can overload the rank and the MDS daemon that manages it.


  • At least two active MDS daemons.
  • User access to the CephFS client node.
  • Verify that the attr package is installed on the CephFS client node with a mounted Ceph File System.


  1. Add the p flag to the Ceph user’s capabilities:




    [user@client ~]$ ceph fs authorize cephfs_a client.1 /temp rwp
      key: AQBSdFhcGZFUDRAAcKhG9Cl2HPiDMMRv4DC43A==
      caps: [mds] allow r, allow rwp path=/temp
      caps: [mon] allow r
      caps: [osd] allow rw tag cephfs data=cephfs_a

  2. Set the extended attribute on a directory:


    setfattr -n ceph.dir.pin -v RANK DIRECTORY


    [user@client ~]$ setfattr -n ceph.dir.pin -v 2 /temp

    This example assigns the /temp directory and all of its subdirectories to rank 2.


5.6. Disassociating directory trees from Metadata Server daemon ranks

Disassociate a directory from a particular active Metadata Server (MDS) rank.


  • User access to the Ceph File System (CephFS) client node.
  • Ensure that the attr package is installed on the client node with a mounted CephFS.


  • Set the extended attribute to -1 on a directory:


    setfattr -n ceph.dir.pin -v -1 DIRECTORY


    [user@client ~]$ setfattr -n ceph.dir.pin -v -1 /home/ceph-user


    Any separately mapped subdirectories of /home/ceph-user/ are not affected.
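
Mapping and disassociating a directory differ only in the rank value, which the following sketch makes explicit. The pin_cmd name is hypothetical, and the sketch assumes ceph.dir.pin is the extended attribute being set. It prints the command rather than running it.

```shell
# Hypothetical helper: prints the setfattr command for a rank and a directory.
# A non-negative integer maps the directory tree to that rank; -1 removes the mapping.
pin_cmd() {
  rank="$1"
  dir="$2"
  case "$rank" in
    -1) ;;                                                    # disassociate
    ''|*[!0-9]*) echo "invalid rank: $rank" >&2; return 1 ;;  # not a rank number
  esac
  printf 'setfattr -n ceph.dir.pin -v %s %s\n' "$rank" "$dir"
}
```

Validating the rank first avoids writing a malformed value to the directory's extended attribute.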


5.7. Adding data pools

The Ceph File System (CephFS) supports adding more than one pool to be used for storing data. This can be useful for:

  • Storing log data on reduced redundancy pools.
  • Storing user home directories on an SSD or NVMe pool.
  • Basic data segregation.

Before using another data pool in the Ceph File System, you must add it as described in this section.

By default, for storing file data, CephFS uses the initial data pool that was specified during its creation. To use a secondary data pool, you must also configure a part of the file system hierarchy to store file data in that pool or optionally within a namespace of that pool, using file and directory layouts.


  • Root-level access to the Ceph Monitor node.


  1. Create a new data pool:


    ceph osd pool create POOL_NAME


    • Replace POOL_NAME with the name of the pool.


    [ceph: root@host01 /]# ceph osd pool create cephfs_data_ssd
    pool 'cephfs_data_ssd' created

  2. Add the newly created pool under the control of the Metadata Servers:


    ceph fs add_data_pool FS_NAME POOL_NAME


    • Replace FS_NAME with the name of the file system.
    • Replace POOL_NAME with the name of the pool.


    [ceph: root@host01 /]# ceph fs add_data_pool cephfs cephfs_data_ssd
    added data pool 6 to fsmap

  3. Verify that the pool was successfully added:


    [ceph: root@host01 /]# ceph fs ls
    name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_ssd]

  4. Optional: Remove a data pool from the file system:


    ceph fs rm_data_pool FS_NAME POOL_NAME


    [ceph: root@host01 /]# ceph fs rm_data_pool cephfs cephfs_data_ssd
    removed data pool 6 from fsmap

    1. Verify that the pool was successfully removed:


      [ceph: root@host01 /]# ceph fs ls
      name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data]

  5. If you use cephx authentication, make sure that clients can access the new pool.
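
The verification steps above can be sketched as a check against one line of ceph fs ls output. The fs_has_data_pool name is hypothetical, and the sketch assumes the line has the "data pools: [...]" shape shown in the examples.

```shell
# Hypothetical helper: succeeds when POOL appears in the "data pools: [...]" list
# of a single line of `ceph fs ls` output.
fs_has_data_pool() {
  line="$1"
  pool="$2"
  pools=${line##*data pools: \[}   # keep the text after "data pools: ["
  pools=${pools%%\]*}              # drop the closing "]" and anything after it
  for p in $pools; do
    [ "$p" = "$pool" ] && return 0
  done
  return 1
}
```

For example, fs_has_data_pool "$(ceph fs ls)" cephfs_data_ssd could follow the add_data_pool step on a cluster with a single file system. To actually place file data in the new pool, a directory layout such as the ceph.dir.layout.pool extended attribute is typically set with setfattr.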


5.8. Taking down a Ceph File System cluster

You can take down a Ceph File System (CephFS) cluster by setting the down flag to true. Doing this gracefully shuts down the Metadata Server (MDS) daemons by flushing journals to the metadata pool and stopping all client I/O.

You can also take the CephFS cluster down quickly to test the deletion of a file system and bring the Metadata Server (MDS) daemons down, for example, when practicing a disaster recovery scenario. Doing this sets the joinable flag to false, which prevents the MDS standby daemons from activating the file system.


  • Root-level access to a Ceph Monitor node.


  1. To mark the CephFS cluster down:


    ceph fs set FS_NAME down true


    [ceph: root@host01 /]# ceph fs set cephfs down true

    1. To bring the CephFS cluster back up:


      ceph fs set FS_NAME down false


      [ceph: root@host01 /]# ceph fs set cephfs down false


  2. To quickly take down a CephFS cluster:


    ceph fs fail FS_NAME


    [ceph: root@host01 /]# ceph fs fail cephfs


    To get the CephFS cluster back up, set cephfs to joinable:


    ceph fs set FS_NAME joinable true


    [ceph: root@host01 /]# ceph fs set cephfs joinable true
    cephfs marked joinable; MDS may join as newly active.
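
The four commands in this procedure pair up as two ways down and their matching ways back up. The following sketch makes the pairing explicit by printing the command for a named action. The fs_state_cmd name and the action keywords are hypothetical.

```shell
# Hypothetical helper: prints the ceph command for a take-down or bring-up action.
#   down / up   : graceful shutdown and its reversal
#   fail / join : quick take-down and its reversal
fs_state_cmd() {
  action="$1"
  fs="$2"
  case "$action" in
    down) echo "ceph fs set $fs down true" ;;
    up)   echo "ceph fs set $fs down false" ;;
    fail) echo "ceph fs fail $fs" ;;
    join) echo "ceph fs set $fs joinable true" ;;
    *)    echo "unknown action: $action" >&2; return 1 ;;
  esac
}
```

A file system taken down with fail is brought back with join, while one taken down with down is brought back with up.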

5.9. Removing a Ceph File System

You can remove a Ceph File System (CephFS). Before doing so, consider backing up all the data and verifying that all clients have unmounted the file system locally.


This operation is destructive and will make the data stored on the Ceph File System permanently inaccessible.


  • Back up your data.
  • Root-level access to a Ceph Monitor node.


  1. Mark the CephFS cluster as down:


    ceph fs set FS_NAME down true

    • Replace FS_NAME with the name of the Ceph File System you want to remove.


    [ceph: root@host01 /]# ceph fs set cephfs down true
    cephfs marked down.

  2. Display the status of the Ceph File System:

    ceph fs status


    [ceph: root@host01 /]# ceph fs status
    cephfs - 0 clients
    |        POOL        |   TYPE   |  USED | AVAIL |
    | cephfs.cephfs.meta | metadata | 31.5M | 52.6G |
    |                    |   data   |    0  | 52.6G |
                   STANDBY MDS

  3. Remove the Ceph File System:


    ceph fs rm FS_NAME --yes-i-really-mean-it

    • Replace FS_NAME with the name of the Ceph File System you want to remove.


    [ceph: root@host01 /]# ceph fs rm cephfs --yes-i-really-mean-it

  4. Verify that the file system has been successfully removed:


    [ceph: root@host01 /]# ceph fs ls

  5. Optional: Remove data and metadata pools associated with the removed file system.

Additional Resources

  • See the Delete a Pool section in the Red Hat Ceph Storage Storage Strategies Guide.
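
Because removal is destructive, it can help to confirm that no clients remain before running the rm step. The following sketch reads ceph fs status output and prints the client count from its first line. The status_client_count name is hypothetical, and the sketch assumes the first line has the "NAME - N clients" shape shown in the example above.

```shell
# Hypothetical helper: reads `ceph fs status` output on stdin and prints the
# client count taken from the first line, for example "cephfs - 0 clients".
status_client_count() {
  sed -n '1s/^.* - \([0-9][0-9]*\) client.*$/\1/p'
}
```

A guard such as [ "$(ceph fs status | status_client_count)" = "0" ] could then precede the removal command. If the first line does not match the assumed shape, the helper prints nothing and the guard fails.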

5.10. Using the ceph mds fail command

Use the ceph mds fail command to:

  • Mark an MDS daemon as failed. If the daemon was active, a suitable standby daemon was available, and that standby daemon was active after the standby-replay configuration was disabled, this command forces a failover to the standby daemon. Disabling the standby-replay configuration prevents new standby-replay daemons from being assigned.
  • Restart a running MDS daemon. If the daemon was active and a suitable standby daemon was available, the "failed" daemon becomes a standby daemon.


  • Installation and configuration of the Ceph MDS daemons.


  • To fail a daemon:


    ceph mds fail MDS_NAME

    Where MDS_NAME is the name of the standby-replay MDS node.


    [ceph: root@host01 /]# ceph mds fail example01


    You can find the Ceph MDS name from the ceph fs status command.


5.11. Client features

At times you might want to set Ceph File System (CephFS) features that clients must support to enable them to use Ceph File Systems. Clients without these features might disrupt other CephFS clients, or behave in unexpected ways. Also, you might want to require new features to prevent older, and possibly buggy clients from connecting to a Ceph File System.


CephFS clients missing newly added features are evicted automatically.

You can list all the CephFS features by using the fs features ls command. You can add or remove requirements by using the fs required_client_features command.


fs required_client_features FILE_SYSTEM_NAME add FEATURE_NAME
fs required_client_features FILE_SYSTEM_NAME rm FEATURE_NAME

Feature Descriptions

reply_encoding: The Ceph Metadata Server (MDS) encodes request replies in an extensible format, if the client supports this feature.
reclaim_client: The Ceph MDS allows a new client to reclaim the state of another client, which might be dead. This feature is used by NFS Ganesha.
lazy_caps_wanted: When a stale client resumes, the Ceph MDS only needs to re-issue the capabilities that are explicitly wanted, if the client supports this feature.
multi_reconnect: After a Ceph MDS failover event, the client sends a reconnect message to the MDS to reestablish cache states. A client can split large reconnect messages into multiple messages.
deleg_ino: A Ceph MDS delegates inode numbers to a client, if the client supports this feature. Delegating inode numbers is a prerequisite for a client to do async file creation.
metric_collect: CephFS clients can send performance metrics to a Ceph MDS.
alternate_name: CephFS clients can set and understand alternate names for directory entries. This feature allows for encrypted file names.
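
As an illustration of the add and rm syntax, the following sketch builds the full command line for a feature requirement and rejects names it does not recognize. The required_feature_cmd name is hypothetical, and the list of feature names is an assumption that should be checked against the output of the features listing on your cluster.

```shell
# Hypothetical helper: prints the command that adds or removes a required feature.
# The accepted feature names are assumed; verify them against the cluster's own list.
required_feature_cmd() {
  fs="$1"
  op="$2"
  feature="$3"
  case "$op" in
    add|rm) ;;
    *) echo "operation must be add or rm" >&2; return 1 ;;
  esac
  case "$feature" in
    reply_encoding|reclaim_client|lazy_caps_wanted|multi_reconnect|deleg_ino|metric_collect|alternate_name) ;;
    *) echo "unknown feature: $feature" >&2; return 1 ;;
  esac
  echo "ceph fs required_client_features $fs $op $feature"
}
```

Keep in mind that clients missing a newly required feature are evicted, so review the printed command before running it.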

5.12. Ceph File System client evictions

When a Ceph File System (CephFS) client is unresponsive or misbehaving, it might be necessary to forcibly terminate, or evict it from accessing the CephFS. Evicting a CephFS client prevents it from communicating further with Metadata Server (MDS) daemons and Ceph OSD daemons. If a CephFS client is buffering I/O to the CephFS at the time of eviction, then any un-flushed data will be lost. The CephFS client eviction process applies to all client types: FUSE mounts, kernel mounts, NFS gateways, and any process using libcephfs API library.

You can evict CephFS clients automatically, if they fail to communicate promptly with the MDS daemon, or manually.

Automatic Client Eviction

These scenarios cause an automatic CephFS client eviction:

  • If a CephFS client has not communicated with the active MDS daemon for over the default of 300 seconds, or as set by the session_autoclose option.
  • If the mds_cap_revoke_eviction_timeout option is set, and a CephFS client has not responded to the cap revoke messages for over the set amount of seconds. The mds_cap_revoke_eviction_timeout option is disabled by default.
  • During MDS startup or failover, the MDS daemon goes through a reconnect phase waiting for all the CephFS clients to connect to the new MDS daemon. Any CephFS clients that fail to reconnect within the default time window of 45 seconds, or the window set by the mds_reconnect_timeout option, are evicted.
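
The timeout scenarios above share one comparison: a client is evicted once its silence exceeds a limit. A minimal sketch of that rule follows. The session_timed_out name is hypothetical, and the 300 second default is the session_autoclose value described above.

```shell
# Hypothetical helper: succeeds when the idle time exceeds the limit.
# The limit defaults to 300 seconds, the default described for session_autoclose.
session_timed_out() {
  idle_seconds="$1"
  limit_seconds="${2:-300}"
  [ "$idle_seconds" -gt "$limit_seconds" ]
}
```

Passing 45 as the second argument models the reconnect window instead, for example session_timed_out 50 45.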


5.13. Blocklist Ceph File System clients

Ceph File System (CephFS) client blocklisting is enabled by default. When you send an eviction command to a single Metadata Server (MDS) daemon, it propagates the blocklist to the other MDS daemons. This is to prevent the CephFS client from accessing any data objects, so it is necessary to update the other CephFS clients, and MDS daemons with the latest Ceph OSD map, which includes the blocklisted client entries.

An internal “osdmap epoch barrier” mechanism is used when updating the Ceph OSD map. The barrier verifies that the CephFS clients receiving capabilities have a sufficiently recent Ceph OSD map before they are assigned any capabilities that might allow access to the same RADOS objects. This prevents races with canceled operations, such as those resulting from ENOSPC errors or from clients blocklisted by evictions.

If you are experiencing frequent CephFS client evictions due to slow nodes or an unreliable network, and you cannot fix the underlying issue, then you can ask the MDS to be less strict. It is possible to respond to slow CephFS clients by simply dropping their MDS sessions, while permitting the CephFS client to re-open sessions and to continue talking to Ceph OSDs. Setting the mds_session_blocklist_on_timeout and mds_session_blocklist_on_evict options to false enables this mode.


When blocklisting is disabled, evicting a CephFS client affects only the MDS daemon you send the command to. On a system with multiple active MDS daemons, you need to send an eviction command to each active daemon.
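
The less strict mode involves two options that are normally changed together. The following sketch prints the commands for both. The blocklist_mode_cmds name is hypothetical, and the sketch assumes the options are managed with ceph config set at the mds level.

```shell
# Hypothetical helper: prints the commands that turn blocklisting on or off
# for timed-out and evicted sessions.
# Assumes the options are set through "ceph config set mds ...".
blocklist_mode_cmds() {
  value="$1"    # true or false
  case "$value" in
    true|false) ;;
    *) echo "value must be true or false" >&2; return 1 ;;
  esac
  for opt in mds_session_blocklist_on_timeout mds_session_blocklist_on_evict; do
    echo "ceph config set mds $opt $value"
  done
}
```

Running blocklist_mode_cmds true prints the commands that restore the default behavior.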

5.14. Manually evicting a Ceph File System client

You might want to manually evict a Ceph File System (CephFS) client, if the client is misbehaving and you do not have access to the client node, or if a client dies, and you do not want to wait for the client session to time out.


  • Root-level access to the Ceph Monitor node.


  1. Review the client list:


    ceph tell DAEMON_NAME client ls


    [ceph: root@host01 /]# ceph tell mds.0 client ls
            "id": 4305,
            "num_leases": 0,
            "num_caps": 3,
            "state": "open",
            "replay_requests": 0,
            "completed_requests": 0,
            "reconnecting": false,
            "inst": "client.4305",
            "client_metadata": {
                "ceph_sha1": "79f0367338897c8c6d9805eb8c9ad24af0dcd9c7",
                "ceph_version": "ceph version 16.2.8-65.el8cp (79f0367338897c8c6d9805eb8c9ad24af0dcd9c7)",
                "entity_id": "0",
                "hostname": "senta04",
                "mount_point": "/tmp/tmpcMpF1b/mnt.0",
                "pid": "29377",
                "root": "/"

  2. Evict the specified CephFS client:


    ceph tell DAEMON_NAME client evict id=ID_NUMBER


    [ceph: root@host01 /]# ceph tell mds.0 client evict id=4305
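
The two steps can be linked by extracting the id values from the client list and printing one evict command per client for review. The evict_cmds name is hypothetical, and the sketch assumes each client id appears on its own line in the "id": NUMBER, form shown in the example output. Nothing is evicted by the sketch itself.

```shell
# Hypothetical helper: reads `ceph tell DAEMON_NAME client ls` output on stdin
# and prints an evict command for every client id it finds.
evict_cmds() {
  daemon="$1"
  # Match only lines of the form:   "id": 4305,
  sed -n 's/^[[:space:]]*"id": \([0-9][0-9]*\),*[[:space:]]*$/\1/p' |
  while read -r id; do
    echo "ceph tell $daemon client evict id=$id"
  done
}
```

For example, ceph tell mds.0 client ls | evict_cmds mds.0 lists candidate commands, and only the lines for misbehaving clients should be run.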

5.15. Removing a Ceph File System client from the blocklist

In some situations, it can be useful to allow a previously blocklisted Ceph File System (CephFS) client to reconnect to the storage cluster.


Removing a CephFS client from the blocklist puts data integrity at risk, and does not guarantee a fully healthy, and functional CephFS client as a result. The best way to get a fully healthy CephFS client back after an eviction, is to unmount the CephFS client and do a fresh mount. If other CephFS clients are accessing files that the blocklisted CephFS client was buffering I/O to, it can result in data corruption.


  • Root-level access to the Ceph Monitor node.


  1. Review the blocklist:


    [ceph: root@host01 /]# ceph osd blocklist ls
    listed 1 entries 2022-05-09 11:32:24.716146

  2. Remove the CephFS client from the blocklist:


    ceph osd blocklist rm CLIENT_NAME_OR_IP_ADDR


    [ceph: root@host01 /]# ceph osd blocklist rm

  3. Optionally, you can have kernel-based CephFS clients automatically reconnect when removing them from the blocklist. On the kernel-based CephFS client, set the following option to clean either when doing a manual mount, or when automatically mounting with an entry in the /etc/fstab file:

    recover_session=clean

  4. Optionally, you can have FUSE-based CephFS clients automatically reconnect when removing them from the blocklist. On the FUSE client, set the following option to true either when doing a manual mount, or when automatically mounting with an entry in the /etc/fstab file:

    client_reconnect_stale=true


© 2024 Red Hat, Inc.