Chapter 5. Ceph File System administration
As a storage administrator, you can perform common Ceph File System (CephFS) administrative tasks, such as:
- Monitoring CephFS metrics in real time, see Section 5.1, “Using the cephfs-top utility”.
- Mapping a directory to a particular MDS rank, see Section 5.5, “Mapping directory trees to Metadata Server daemon ranks”.
- Disassociating a directory from an MDS rank, see Section 5.6, “Disassociating directory trees from Metadata Server daemon ranks”.
- Adding a new data pool, see Section 5.7, “Adding data pools”.
- Working with quotas, see Chapter 7, Ceph File System quotas.
- Working with files and directory layouts, see Chapter 8, File and directory layouts.
- Removing a Ceph File System, see Section 5.9, “Removing a Ceph File System”.
- Client features, see Section 5.11, “Client features”.
- Using the ceph mds fail command, see Section 5.10, “Using the ceph mds fail command”.
- Manually evicting a CephFS client, see Section 5.14, “Manually evicting a Ceph File System client”.
Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Installation and configuration of the Ceph Metadata Server daemons (ceph-mds).
- Create and mount a Ceph File System.
5.1. Using the cephfs-top utility
The Ceph File System (CephFS) provides a top-like utility to display metrics on Ceph File Systems in real time. The cephfs-top utility is a curses-based Python script that uses the Ceph Manager stats module to fetch and display client performance metrics.
Currently, the cephfs-top utility supports nearly 10k clients.
Currently, not all of the performance stats are available in the Red Hat Enterprise Linux 9.2 kernel. The cephfs-top utility is supported on Red Hat Enterprise Linux 9 and above and uses one of the standard terminals in Red Hat Enterprise Linux.
The minimum compatible Python version for the cephfs-top utility is 3.6.0.
Prerequisites
- A healthy and running Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Root-level access to a Ceph client node.
- Installation of the cephfs-top package.
Procedure
Enable the Red Hat Ceph Storage 8 tools repository, if it is not already enabled:
Red Hat Enterprise Linux 9
[root@client ~]# subscription-manager repos --enable=rhceph-8-tools-for-rhel-9-x86_64-rpms
Install the cephfs-top package:
Example
[root@client ~]# dnf install cephfs-top
Enable the Ceph Manager stats plugin:
Example
[root@client ~]# ceph mgr module enable stats
Create the client.fstop Ceph user:
Example
[root@client ~]# ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r' > /etc/ceph/ceph.client.fstop.keyring
Note
Optionally, use the --id argument to specify a different Ceph user, other than client.fstop.
Start the cephfs-top utility:
Example
[root@client ~]# cephfs-top
cephfs-top - Wed Nov 30 15:26:05 2022
All Filesystem Info
Total Client(s): 4 - 3 FUSE, 1 kclient, 0 libcephfs
COMMANDS: m - select a filesystem | s - sort menu | l - limit number of clients | r - reset to default | q - quit
client_id mount_root chit(%) dlease(%) ofiles oicaps oinodes rtio(MB) raio(MB) rsp(MB/s) wtio(MB) waio(MB) wsp(MB/s) rlatavg(ms) rlatsd(ms) wlatavg(ms) wlatsd(ms) mlatavg(ms) mlatsd(ms) mount_point@host/addr
Filesystem: cephfs1 - 2 client(s)
4500 / 100.0 100.0 0 751 0 0.0 0.0 0.0 578.13 0.03 0.0 N/A N/A N/A N/A N/A N/A N/A@example/192.168.1.4
4501 / 100.0 0.0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.41 0.0 /mnt/cephfs2@example/192.168.1.4
Filesystem: cephfs2 - 2 client(s)
4512 / 100.0 0.0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.0 /mnt/cephfs3@example/192.168.1.4
4518 / 100.0 0.0 0 1 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.52 0.0 /mnt/cephfs4@example/192.168.1.4
5.1.1. The cephfs-top utility interactive commands
Select a particular file system and view the metrics related to that file system with the cephfs-top utility interactive commands.
m
- Description
- Filesystem selection: Displays a menu of file systems for selection.
q
- Description
- Quit: Exits the utility if you are at the home screen with all file system information. If you are not at the home screen, it redirects you back to the home screen.
s
- Description
- Sort field selection: Designates the sort field. ‘cap_hit’ is the default.
l
- Description
- Client limit: Sets the limit on the number of clients to be displayed.
r
- Description
- Reset: Resets the sort field and limit value to the default.
The metrics display can be scrolled using the Arrow Keys, PgUp/PgDn, Home/End and mouse.
Example of entering and exiting the file system selection menu
[root@client ~]# m
Filesystems
Press "q" to go back to home (all filesystem info) screen
cephfs01
cephfs02
[root@client ~]# q
cephfs-top - Thu Oct 20 07:29:35 2022
Total Client(s): 3 - 2 FUSE, 1 kclient, 0 libcephfs
5.1.2. The cephfs-top utility options
You can use the cephfs-top utility command with various options.
Example
[root@client ~]# cephfs-top --selftest
selftest ok
--cluster NAME_OF_THE_CLUSTER
- Description
- With this option, you can connect to a cluster with a non-default name. The default name is ceph.
--id USER
- Description
- The client that connects to the Ceph cluster. The default is fstop.
--selftest
- Description
- With this option, you can perform a selftest. This mode performs a sanity check of the stats module.
--conffile PATH_TO_THE_CONFIGURATION_FILE
- Description
- With this option, you can provide a path to the Ceph cluster configuration file.
-d/--delay INTERVAL_IN_SECONDS
- Description
- The cephfs-top utility refreshes statistics every second by default. With this option, you can change the refresh interval.
Note
The interval must be greater than or equal to 1 second. Fractional seconds are honored.
--dump
- Description
- With this option, you can dump the metrics to stdout without creating a curses display.
--dumpfs FILE_SYSTEM_NAME
- Description
- With this option, you can dump the metrics of the given file system to stdout without creating a curses display.
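The options above can be combined on one command line. The following is a minimal sketch, not part of the cephfs-top package, that assembles an argument list from those options and enforces the documented minimum refresh interval. The function name cephfs_top_argv and the choice to treat --dump and --dumpfs as mutually exclusive are assumptions made for illustration.

```python
from typing import List, Optional


def cephfs_top_argv(
    cluster: Optional[str] = None,
    user: Optional[str] = None,
    conffile: Optional[str] = None,
    delay: Optional[float] = None,
    dump: bool = False,
    dumpfs: Optional[str] = None,
) -> List[str]:
    """Build a cephfs-top argument list from the options described above."""
    # The documentation states the interval must be at least 1 second.
    if delay is not None and delay < 1:
        raise ValueError("delay must be greater than or equal to 1 second")
    # Assumption of this sketch: only one dump mode is requested at a time.
    if dump and dumpfs is not None:
        raise ValueError("choose either dump or dumpfs, not both")

    argv = ["cephfs-top"]
    if cluster is not None:
        argv += ["--cluster", cluster]
    if user is not None:
        argv += ["--id", user]
    if conffile is not None:
        argv += ["--conffile", conffile]
    if delay is not None:
        argv += ["--delay", str(delay)]
    if dump:
        argv.append("--dump")
    if dumpfs is not None:
        argv += ["--dumpfs", dumpfs]
    return argv
```

The returned list can be passed to subprocess.run on a client node where the cephfs-top package is installed.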
5.2. Using the MDS autoscaler module
The MDS Autoscaler Module monitors the Ceph File System (CephFS) to ensure sufficient MDS daemons are available. It works by adjusting the placement specification for the Orchestrator backend of the MDS service.
The module monitors the following file system settings to inform placement count adjustments:
- max_mds file system setting
- standby_count_wanted file system setting
The Ceph Monitor daemons are still responsible for promoting or stopping MDS daemons according to these settings. The mds_autoscaler module simply adjusts the number of MDS daemons that the orchestrator spawns.
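As a rough illustration of how the two settings could inform the placement count, the sketch below assumes that the module aims for one daemon per active rank plus the wanted standbys, that is, max_mds + standby_count_wanted. That formula and the function name desired_mds_count are assumptions of this sketch, not a statement of the module's exact logic.

```python
def desired_mds_count(max_mds: int, standby_count_wanted: int) -> int:
    """Return the number of MDS daemons a file system would need.

    Assumption: one daemon per active rank (max_mds) plus the number of
    standby daemons requested by standby_count_wanted.
    """
    if max_mds < 1:
        raise ValueError("max_mds must be at least 1")
    if standby_count_wanted < 0:
        raise ValueError("standby_count_wanted cannot be negative")
    return max_mds + standby_count_wanted
```

For example, under this assumption a file system with max_mds set to 2 and standby_count_wanted set to 1 would need three MDS daemons in its placement specification.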
Prerequisites
- A healthy and running Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Root-level access to a Ceph Monitor node.
Procedure
Enable the MDS autoscaler module:
Example
[ceph: root@host01 /]# ceph mgr module enable mds_autoscaler
5.3. Unmounting Ceph File Systems mounted as kernel clients
You can unmount a Ceph File System that is mounted as a kernel client.
Prerequisites
- Root-level access to the node doing the mounting.
Procedure
To unmount a Ceph File System mounted as a kernel client:
Syntax
umount MOUNT_POINT
Example
[root@client ~]# umount /mnt/cephfs
Additional Resources
- The umount(8) manual page
5.4. Unmounting Ceph File Systems mounted as FUSE clients
You can unmount a Ceph File System that is mounted as a File System in User Space (FUSE) client.
Prerequisites
- Root-level access to the FUSE client node.
Procedure
To unmount a Ceph File System mounted in FUSE:
Syntax
fusermount -u MOUNT_POINT
Example
[root@client ~]# fusermount -u /mnt/cephfs
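The kernel client and FUSE client procedures differ only in the command used. The sketch below picks the command from the file system type listed in /proc/mounts-style text. It assumes that kernel client mounts are listed with the type ceph and FUSE mounts with a type beginning with fuse (for example fuse.ceph-fuse); verify this on your own client before relying on it. The function names are hypothetical.

```python
from typing import List


def find_fstype(mounts_text: str, mount_point: str) -> str:
    """Look up the file system type of a mount point in /proc/mounts text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[2]
    raise LookupError(f"{mount_point} is not mounted")


def unmount_argv(fstype: str, mount_point: str) -> List[str]:
    """Choose the unmount command documented for each client type."""
    if fstype == "ceph":  # assumption: kernel client mount type
        return ["umount", mount_point]
    if fstype.startswith("fuse"):  # assumption: FUSE client mount type
        return ["fusermount", "-u", mount_point]
    raise ValueError(f"{fstype} does not look like a CephFS mount")
```

On a client node, the text passed to find_fstype could be read from /proc/mounts.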
Additional Resources
- The ceph-fuse(8) manual page
5.5. Mapping directory trees to Metadata Server daemon ranks
You can map a directory and its subdirectories to a particular active Metadata Server (MDS) rank so that its metadata is only managed by the MDS daemon holding that rank. This approach enables you to evenly spread application load or limit the impact of users' metadata requests on the entire storage cluster.
An internal balancer already dynamically spreads the application load. Therefore, only map directory trees to ranks for certain carefully chosen applications.
In addition, when a directory is mapped to a rank, the balancer cannot split it. Consequently, a large number of operations within the mapped directory can overload the rank and the MDS daemon that manages it.
Prerequisites
- At least two active MDS daemons.
- User access to the CephFS client node.
- Verify that the attr package is installed on the CephFS client node with a mounted Ceph File System.
Procedure
Add the p flag to the Ceph user’s capabilities:
Syntax
ceph fs authorize FILE_SYSTEM_NAME client.CLIENT_NAME /DIRECTORY CAPABILITY [/DIRECTORY CAPABILITY] ...
Example
[user@client ~]$ ceph fs authorize cephfs_a client.1 /temp rwp
client.1
key: AQBSdFhcGZFUDRAAcKhG9Cl2HPiDMMRv4DC43A==
caps: [mds] allow r, allow rwp path=/temp
caps: [mon] allow r
caps: [osd] allow rw tag cephfs data=cephfs_a
Set the ceph.dir.pin extended attribute on a directory:
Syntax
setfattr -n ceph.dir.pin -v RANK DIRECTORY
Example
[user@client ~]$ setfattr -n ceph.dir.pin -v 2 /temp
This example assigns the /temp directory and all of its subdirectories to rank 2.
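The same extended attribute can be set from a script. The sketch below is a minimal Python equivalent of the setfattr call, using the standard os.setxattr function, which is available on Linux. The function name pin_directory and the injectable setxattr parameter (used so the logic can be exercised without a mounted CephFS) are assumptions of this sketch.

```python
import os
from typing import Callable, Optional

XATTR_PIN = "ceph.dir.pin"


def pin_directory(
    directory: str,
    rank: int,
    setxattr: Optional[Callable[[str, str, bytes], None]] = None,
) -> bytes:
    """Set ceph.dir.pin on a directory, like `setfattr -n ceph.dir.pin -v RANK`.

    A rank of -1 removes the pin, as described for disassociating a directory.
    """
    if rank < -1:
        raise ValueError("rank must be -1 (unpinned) or a non-negative rank")
    # setfattr passes the value as text, so encode the rank as ASCII digits.
    value = str(rank).encode("ascii")
    # Resolve os.setxattr lazily because it only exists on Linux.
    setter = setxattr if setxattr is not None else os.setxattr
    setter(directory, XATTR_PIN, value)
    return value
```

Calling pin_directory("/temp", 2) on a client with the file system mounted and the p capability granted mirrors the example above.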
Additional Resources
- See the Layout, quota, snapshot, and network restrictions section in the Red Hat Ceph Storage File System Guide for more details about the p flag.
- See the Manually pinning directory trees to a particular rank section in the Red Hat Ceph Storage File System Guide for more details.
- See the Configuring multiple active Metadata Server daemons section in the Red Hat Ceph Storage File System Guide for more details.
5.6. Disassociating directory trees from Metadata Server daemon ranks
Disassociate a directory from a particular active Metadata Server (MDS) rank.
Prerequisites
- User access to the Ceph File System (CephFS) client node.
- Ensure that the attr package is installed on the client node with a mounted CephFS.
Procedure
Set the ceph.dir.pin extended attribute to -1 on a directory:
Syntax
setfattr -n ceph.dir.pin -v -1 DIRECTORY
Example
[user@client ~]$ setfattr -n ceph.dir.pin -v -1 /home/ceph-user
Note
Any separately mapped subdirectories of /home/ceph-user/ are not affected.
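To see why separately mapped subdirectories keep their rank, it can help to model pins as a lookup that walks up the tree. The sketch below assumes that a directory uses the pin of its closest ancestor whose ceph.dir.pin is not -1, which matches the behavior described in this chapter. It is a simplified model, not MDS code, and the function name effective_rank is hypothetical.

```python
import posixpath
from typing import Dict, Optional


def effective_rank(path: str, pins: Dict[str, int]) -> Optional[int]:
    """Return the rank that applies to a path, or None if it is unpinned.

    pins maps directory paths to their ceph.dir.pin value; -1 means no pin.
    """
    current = posixpath.normpath(path)
    while True:
        rank = pins.get(current, -1)
        if rank != -1:
            return rank
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without finding a pin
            return None
        current = parent


pins = {"/home/ceph-user": 1, "/home/ceph-user/projects": 2}
pins["/home/ceph-user"] = -1  # disassociate the parent directory
print(effective_rank("/home/ceph-user/docs", pins))        # None
print(effective_rank("/home/ceph-user/projects/x", pins))  # 2
```

In this model, clearing the pin on /home/ceph-user returns its unpinned subdirectories to the balancer, while /home/ceph-user/projects stays on rank 2.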
Additional Resources
- See the Mapping directory trees to Metadata Server daemon ranks section in Red Hat Ceph Storage File System Guide for more details.
5.7. Adding data pools
The Ceph File System (CephFS) supports adding more than one pool to be used for storing data. This can be useful for:
- Storing log data on reduced redundancy pools.
- Storing user home directories on an SSD or NVMe pool.
- Basic data segregation.
Before using another data pool in the Ceph File System, you must add it as described in this section.
By default, for storing file data, CephFS uses the initial data pool that was specified during its creation. To use a secondary data pool, you must also configure a part of the file system hierarchy to store file data in that pool or optionally within a namespace of that pool, using file and directory layouts.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Create a new data pool:
Syntax
ceph osd pool create POOL_NAME
Replace:
- POOL_NAME with the name of the pool.
Example
[ceph: root@host01 /]# ceph osd pool create cephfs_data_ssd
pool 'cephfs_data_ssd' created
Add the newly created pool under the control of the Metadata Servers:
Syntax
ceph fs add_data_pool FS_NAME POOL_NAME
Replace:
- FS_NAME with the name of the file system.
- POOL_NAME with the name of the pool.
Example
[ceph: root@host01 /]# ceph fs add_data_pool cephfs cephfs_data_ssd
added data pool 6 to fsmap
Verify that the pool was successfully added:
Example
[ceph: root@host01 /]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_ssd]
Optional: Remove a data pool from the file system:
Syntax
ceph fs rm_data_pool FS_NAME POOL_NAME
Example
[ceph: root@host01 /]# ceph fs rm_data_pool cephfs cephfs_data_ssd
removed data pool 6 from fsmap
Verify that the pool was successfully removed:
Example
[ceph: root@host01 /]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs.cephfs.data]
If you use cephx authentication, make sure that clients can access the new pool.
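The verification steps in this procedure can be scripted by parsing the ceph fs ls output. The sketch below assumes the plain-text output has the shape shown in the examples above; the function names parse_fs_ls and pool_attached are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as:
# name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_ssd]
_FS_LS_LINE = re.compile(
    r"name: (?P<name>[^,]+), metadata pool: (?P<meta>[^,]+), "
    r"data pools: \[(?P<data>[^\]]*)\]"
)


def parse_fs_ls(output: str) -> Dict[str, List[str]]:
    """Map each file system name to its list of data pools."""
    pools: Dict[str, List[str]] = {}
    for match in _FS_LS_LINE.finditer(output):
        pools[match.group("name")] = match.group("data").split()
    return pools


def pool_attached(output: str, fs_name: str, pool_name: str) -> bool:
    """Check whether a pool is listed as a data pool of a file system."""
    return pool_name in parse_fs_ls(output).get(fs_name, [])
```

Running pool_attached on the output captured after ceph fs add_data_pool should return True for the new pool, and False again after ceph fs rm_data_pool.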
Additional Resources
- See the File and directory layouts section in the Red Hat Ceph Storage File System Guide for details.
- See the Creating client users for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details.
5.8. Taking down a Ceph File System cluster
You can take down a Ceph File System (CephFS) cluster by setting the down flag to true. Doing this gracefully shuts down the Metadata Server (MDS) daemons by flushing journals to the metadata pool and stopping all client I/O.
You can also take the CephFS cluster down quickly to test the deletion of a file system and bring the Metadata Server (MDS) daemons down, for example, when practicing a disaster recovery scenario. Doing this sets the joinable flag to false, which prevents the MDS standby daemons from activating the file system.
Prerequisites
- Root-level access to a Ceph Monitor node.
Procedure
To mark the CephFS cluster down:
Syntax
ceph fs set FS_NAME down true
Example
[ceph: root@host01 /]# ceph fs set cephfs down true
To bring the CephFS cluster back up:
Syntax
ceph fs set FS_NAME down false
Example
[ceph: root@host01 /]# ceph fs set cephfs down false
or
To quickly take down a CephFS cluster:
Syntax
ceph fs fail FS_NAME
Example
[ceph: root@host01 /]# ceph fs fail cephfs
Note
To get the CephFS cluster back up, set cephfs to joinable:
Syntax
ceph fs set FS_NAME joinable true
Example
[ceph: root@host01 /]# ceph fs set cephfs joinable true
cephfs marked joinable; MDS may join as newly active.
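Each way of taking the file system down has its own matching command to bring it back up. The sketch below pairs them using only the commands shown in this section; the function names take_down and bring_up are hypothetical.

```python
from typing import List


def take_down(fs_name: str, fast: bool = False) -> List[str]:
    """Return the command that takes a CephFS cluster down."""
    if fast:
        # Quick take down: marks the file system as not joinable.
        return ["ceph", "fs", "fail", fs_name]
    # Graceful take down: flushes journals and stops client I/O.
    return ["ceph", "fs", "set", fs_name, "down", "true"]


def bring_up(fs_name: str, was_failed: bool = False) -> List[str]:
    """Return the command that reverses the matching take_down call."""
    if was_failed:
        return ["ceph", "fs", "set", fs_name, "joinable", "true"]
    return ["ceph", "fs", "set", fs_name, "down", "false"]
```

Keeping the pair together makes it harder to use down false after ceph fs fail, or joinable true after a graceful shutdown.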
5.9. Removing a Ceph File System
You can remove a Ceph File System (CephFS). Before doing so, consider backing up all the data and verifying that all clients have unmounted the file system locally.
This operation is destructive and will make the data stored on the Ceph File System permanently inaccessible.
Prerequisites
- Back up your data.
- Root-level access to a Ceph Monitor node.
Procedure
Mark the file system as down:
Syntax
ceph fs set FS_NAME down true
Replace:
- FS_NAME with the name of the Ceph File System you want to remove.
Example
[ceph: root@host01 /]# ceph fs set cephfs down true
cephfs marked down.
Display the status of the Ceph File System:
Syntax
ceph fs status
Example
[ceph: root@host01 /]# ceph fs status
cephfs - 0 clients
======
+--------------------+----------+-------+-------+
|        POOL        |   TYPE   |  USED | AVAIL |
+--------------------+----------+-------+-------+
| cephfs.cephfs.meta | metadata | 31.5M | 52.6G |
| cephfs.cephfs.data |   data   |   0   | 52.6G |
+--------------------+----------+-------+-------+
STANDBY MDS
cephfs.ceph-host01
cephfs.ceph-host02
cephfs.ceph-host03
Remove the Ceph File System:
Syntax
ceph fs rm FS_NAME --yes-i-really-mean-it
Replace:
- FS_NAME with the name of the Ceph File System you want to remove.
Example
[ceph: root@host01 /]# ceph fs rm cephfs --yes-i-really-mean-it
Verify that the file system has been successfully removed:
Example
[ceph: root@host01 /]# ceph fs ls
Optional: Remove the data and metadata pools associated with the removed file system.
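Because the removal is destructive, a script that automates it should refuse to proceed without an explicit acknowledgment. The sketch below lists the commands from this procedure in order and only returns them when the caller confirms that a backup exists. The function name removal_plan and the backup_confirmed guard are assumptions of this sketch.

```python
from typing import List


def removal_plan(fs_name: str, backup_confirmed: bool = False) -> List[List[str]]:
    """Return the ordered commands for removing a Ceph File System."""
    if not backup_confirmed:
        # The procedure lists a backup as a prerequisite; enforce it here.
        raise RuntimeError("back up your data before removing the file system")
    return [
        ["ceph", "fs", "set", fs_name, "down", "true"],        # mark down
        ["ceph", "fs", "status"],                              # check status
        ["ceph", "fs", "rm", fs_name, "--yes-i-really-mean-it"],  # remove
        ["ceph", "fs", "ls"],                                  # verify removal
    ]
```

The plan deliberately stops before pool deletion, which the procedure treats as an optional, separate step.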
Additional Resources
- See the Delete a Pool section in the Red Hat Ceph Storage Storage Strategies Guide.
5.10. Using the ceph mds fail command
Use the ceph mds fail command to:
- Mark an MDS daemon as failed. If the daemon was active and a suitable standby daemon was available, and if the standby daemon was active after disabling the standby-replay configuration, using this command forces a failover to the standby daemon. Disabling the standby-replay configuration prevents new standby-replay daemons from being assigned.
- Restart a running MDS daemon. If the daemon was active and a suitable standby daemon was available, the "failed" daemon becomes a standby daemon.
Prerequisites
- Installation and configuration of the Ceph MDS daemons.
Procedure
To fail a daemon:
Syntax
ceph mds fail MDS_NAME
Where MDS_NAME is the name of the standby-replay MDS node.
Example
[ceph: root@host01 /]# ceph mds fail example01
Note
You can find the Ceph MDS name from the ceph fs status command.
Additional Resources
- See the Decreasing the number of active Metadata Server daemons section in the Red Hat Ceph Storage File System Guide.
- See the Configuring the number of standby daemons section in the Red Hat Ceph Storage File System Guide.
- See the Metadata Server ranks section in the Red Hat Ceph Storage File System Guide.
5.11. Client features
At times you might want to set Ceph File System (CephFS) features that clients must support to enable them to use Ceph File Systems. Clients without these features might disrupt other CephFS clients, or behave in unexpected ways. Also, you might want to require new features to prevent older, and possibly buggy clients from connecting to a Ceph File System.
CephFS clients missing newly added features are evicted automatically.
You can list all the CephFS features by using the fs features ls command. You can add or remove requirements by using the fs required_client_features command.
Syntax
fs required_client_features FILE_SYSTEM_NAME add FEATURE_NAME
fs required_client_features FILE_SYSTEM_NAME rm FEATURE_NAME
Feature Descriptions
reply_encoding
- Description
- The Ceph Metadata Server (MDS) encodes reply requests in extensible format, if the client supports this feature.
reclaim_client
- Description
- The Ceph MDS allows a new client to reclaim another, perhaps a dead, client’s state. This feature is used by NFS Ganesha.
lazy_caps_wanted
- Description
- When a stale client resumes, the Ceph MDS only needs to re-issue the capabilities that are explicitly wanted, if the client supports this feature.
multi_reconnect
- Description
- After a Ceph MDS failover event, the client sends a reconnect message to the MDS to reestablish cache states. A client can split large reconnect messages into multiple messages.
deleg_ino
- Description
- A Ceph MDS delegates inode numbers to a client, if the client supports this feature. Delegating inode numbers is a prerequisite for a client to do async file creation.
metric_collect
- Description
- CephFS clients can send performance metrics to a Ceph MDS.
alternate_name
- Description
- CephFS clients can set and understand alternate names for directory entries. This feature allows for encrypted file names.
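When scripting feature requirements, validating the feature name against the list above catches typos before they reach the cluster. The sketch below assumes the command is invoked through the ceph CLI as ceph fs required_client_features; the function name required_feature_argv and the hard-coded feature tuple are illustrative, and a real cluster may list more features than those described here.

```python
from typing import List

# Feature names taken from the descriptions in this section.
KNOWN_FEATURES = (
    "reply_encoding",
    "reclaim_client",
    "lazy_caps_wanted",
    "multi_reconnect",
    "deleg_ino",
    "metric_collect",
    "alternate_name",
)


def required_feature_argv(fs_name: str, action: str, feature: str) -> List[str]:
    """Build the command that adds or removes a required client feature."""
    if action not in ("add", "rm"):
        raise ValueError("action must be 'add' or 'rm'")
    if feature not in KNOWN_FEATURES:
        raise ValueError(f"unknown feature: {feature}")
    return ["ceph", "fs", "required_client_features", fs_name, action, feature]
```

Remember that adding a requirement evicts clients that lack the feature, so check connected clients before running the resulting command.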
5.12. Ceph File System client evictions
When a Ceph File System (CephFS) client is unresponsive or misbehaving, it might be necessary to forcibly terminate, or evict, its access to the CephFS. Evicting a CephFS client prevents it from communicating further with Metadata Server (MDS) daemons and Ceph OSD daemons. If a CephFS client is buffering I/O to the CephFS at the time of eviction, then any unflushed data is lost. The CephFS client eviction process applies to all client types: FUSE mounts, kernel mounts, NFS gateways, and any process using the libcephfs API library.
You can evict CephFS clients automatically, if they fail to communicate promptly with the MDS daemon, or manually.
Automatic Client Eviction
These scenarios cause an automatic CephFS client eviction:
- If a CephFS client has not communicated with the active MDS daemon for over the default of 300 seconds, or as set by the session_autoclose option.
- If the mds_cap_revoke_eviction_timeout option is set, and a CephFS client has not responded to the cap revoke messages for over the set number of seconds. The mds_cap_revoke_eviction_timeout option is disabled by default.
- During MDS startup or failover, the MDS daemon goes through a reconnect phase waiting for all the CephFS clients to connect to the new MDS daemon. If any CephFS clients fail to reconnect within the default time window of 45 seconds, or as set by the mds_reconnect_timeout option, they are evicted.
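The three scenarios above can be summarized as a small decision function. The sketch below is a simplified model for reasoning about the timeouts, not the MDS implementation. The ClientState fields are hypothetical, and it assumes that a mds_cap_revoke_eviction_timeout value of 0 means the option is disabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientState:
    seconds_since_last_contact: float
    seconds_cap_revoke_pending: float = 0.0
    reconnecting: bool = False
    seconds_in_reconnect: float = 0.0


def eviction_reason(
    client: ClientState,
    session_autoclose: float = 300.0,
    mds_cap_revoke_eviction_timeout: float = 0.0,  # assumption: 0 = disabled
    mds_reconnect_timeout: float = 45.0,
) -> Optional[str]:
    """Return why a client would be evicted automatically, or None."""
    if client.reconnecting and client.seconds_in_reconnect > mds_reconnect_timeout:
        return "reconnect timeout"
    if (
        mds_cap_revoke_eviction_timeout > 0
        and client.seconds_cap_revoke_pending > mds_cap_revoke_eviction_timeout
    ):
        return "cap revoke timeout"
    if client.seconds_since_last_contact > session_autoclose:
        return "session autoclose"
    return None
```

With the defaults, a client that ignores cap revoke messages is not evicted for that reason alone; it is only evicted once it also stops communicating for longer than session_autoclose.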
Additional Resources
- See the Manually evicting a Ceph File System client section in the Red Hat Ceph Storage File System Guide for more details.
5.13. Blocklist Ceph File System clients
Ceph File System (CephFS) client blocklisting is enabled by default. When you send an eviction command to a single Metadata Server (MDS) daemon, it propagates the blocklist to the other MDS daemons. This is to prevent the CephFS client from accessing any data objects, so it is necessary to update the other CephFS clients, and MDS daemons with the latest Ceph OSD map, which includes the blocklisted client entries.
An internal “osdmap epoch barrier” mechanism is used when updating the Ceph OSD map. The purpose of the barrier is to verify that the CephFS clients receiving capabilities have a sufficiently recent Ceph OSD map before any capabilities are assigned that might allow access to the same RADOS objects, so as not to race with canceled operations, such as those from ENOSPC or from clients blocklisted by evictions.
If you are experiencing frequent CephFS client evictions due to slow nodes or an unreliable network, and you cannot fix the underlying issue, then you can ask the MDS to be less strict. It is possible to respond to slow CephFS clients by simply dropping their MDS sessions, while permitting the CephFS client to re-open sessions and to continue talking to Ceph OSDs. Setting the mds_session_blocklist_on_timeout and mds_session_blocklist_on_evict options to false enables this mode.
When blocklisting is disabled, evicting a CephFS client affects only the MDS daemon you send the command to. On a system with multiple active MDS daemons, you need to send an eviction command to each active daemon.
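The practical consequence of disabling blocklisting is that one eviction command is no longer enough on a multi-MDS system. The sketch below models which daemons need an eviction command in each mode; the function name eviction_targets and the daemon names in the usage are hypothetical.

```python
from typing import List, Sequence


def eviction_targets(
    active_mds: Sequence[str],
    contacted: str,
    blocklisting_enabled: bool = True,
) -> List[str]:
    """Return the MDS daemons that must receive an eviction command."""
    if contacted not in active_mds:
        raise ValueError(f"{contacted} is not an active MDS daemon")
    if blocklisting_enabled:
        # The blocklist propagates, so a single daemon is sufficient.
        return [contacted]
    # Without blocklisting, every active daemon has to be told separately.
    return list(active_mds)
```

For example, with active daemons mds.0 and mds.1 and blocklisting disabled, both daemons must be sent the eviction command.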
5.14. Manually evicting a Ceph File System client
You might want to manually evict a Ceph File System (CephFS) client, if the client is misbehaving and you do not have access to the client node, or if a client dies, and you do not want to wait for the client session to time out.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Review the client list:
Syntax
ceph tell DAEMON_NAME client ls
Example
[ceph: root@host01 /]# ceph tell mds.0 client ls
[
    {
        "id": 4305,
        "num_leases": 0,
        "num_caps": 3,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.4305 172.21.9.34:0/422650892",
        "client_metadata": {
            "ceph_sha1": "79f0367338897c8c6d9805eb8c9ad24af0dcd9c7",
            "ceph_version": "ceph version 16.2.8-65.el8cp (79f0367338897c8c6d9805eb8c9ad24af0dcd9c7)",
            "entity_id": "0",
            "hostname": "senta04",
            "mount_point": "/tmp/tmpcMpF1b/mnt.0",
            "pid": "29377",
            "root": "/"
        }
    }
]
Evict the specified CephFS client:
Syntax
ceph tell DAEMON_NAME client evict id=ID_NUMBER
Example
[ceph: root@host01 /]# ceph tell mds.0 client evict id=4305
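Because client ls returns JSON, the ID to evict can be selected programmatically. The sketch below assumes the JSON has the shape shown in the example above and selects sessions by the hostname field in client_metadata. The function names client_ids_on_host and evict_argv are hypothetical.

```python
import json
from typing import List


def client_ids_on_host(client_ls_json: str, hostname: str) -> List[int]:
    """Return the IDs of client sessions that report the given hostname."""
    sessions = json.loads(client_ls_json)
    return [
        session["id"]
        for session in sessions
        if session.get("client_metadata", {}).get("hostname") == hostname
    ]


def evict_argv(daemon_name: str, client_id: int) -> List[str]:
    """Build the eviction command for one client session."""
    return ["ceph", "tell", daemon_name, "client", "evict", f"id={client_id}"]
```

Review the selected IDs before running the eviction commands, since any unflushed data on an evicted client is lost.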
5.15. Removing a Ceph File System client from the blocklist
In some situations, it can be useful to allow a previously blocklisted Ceph File System (CephFS) client to reconnect to the storage cluster.
Removing a CephFS client from the blocklist puts data integrity at risk, and does not guarantee a fully healthy, and functional CephFS client as a result. The best way to get a fully healthy CephFS client back after an eviction, is to unmount the CephFS client and do a fresh mount. If other CephFS clients are accessing files that the blocklisted CephFS client was buffering I/O to, it can result in data corruption.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Review the blocklist:
Example
[ceph: root@host01 /]# ceph osd blocklist ls
listed 1 entries
127.0.0.1:0/3710147553 2022-05-09 11:32:24.716146
Remove the CephFS client from the blocklist:
Syntax
ceph osd blocklist rm CLIENT_NAME_OR_IP_ADDR
Example
[ceph: root@host01 /]# ceph osd blocklist rm 127.0.0.1:0/3710147553
un-blocklisting 127.0.0.1:0/3710147553
Optionally, you can have kernel-based CephFS clients automatically reconnect when removing them from the blocklist. On the kernel-based CephFS client, set the following option to clean either when doing a manual mount, or automatically mounting with an entry in the /etc/fstab file:
recover_session=clean
Optionally, you can have FUSE-based CephFS clients automatically reconnect when removing them from the blocklist. On the FUSE client, set the following option to true either when doing a manual mount, or automatically mounting with an entry in the /etc/fstab file:
client_reconnect_stale=true
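Reviewing and clearing blocklist entries can be scripted by parsing the ceph osd blocklist ls output. The sketch below assumes the plain-text output has the shape shown in the example above, with one address and one timestamp per entry line. The function names parse_blocklist and unblock_argv are hypothetical.

```python
from typing import List, Tuple


def parse_blocklist(output: str) -> List[Tuple[str, str]]:
    """Return (address, timestamp) pairs from `ceph osd blocklist ls` output."""
    entries = []
    for line in output.splitlines():
        fields = line.split(None, 1)
        # Entry lines start with an address such as 127.0.0.1:0/3710147553;
        # the "listed N entries" summary line does not.
        if len(fields) == 2 and ":" in fields[0] and "/" in fields[0]:
            entries.append((fields[0], fields[1].strip()))
    return entries


def unblock_argv(address: str) -> List[str]:
    """Build the command that removes one address from the blocklist."""
    return ["ceph", "osd", "blocklist", "rm", address]
```

Given the data integrity risk described in this section, treat the generated commands as candidates for manual review rather than something to run unattended.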
Additional Resources
- See the Mounting the Ceph File System as a FUSE client section in the Red Hat Ceph Storage File System Guide for more information.