Chapter 4. Ceph File System administration
As a storage administrator, you can perform common Ceph File System (CephFS) administrative tasks, such as:
- To map a directory to a particular MDS rank, see Section 4.4, “Mapping directory trees to Metadata Server daemon ranks”.
- To disassociate a directory from a MDS rank, see Section 4.5, “Disassociating directory trees from Metadata Server daemon ranks”.
- To work with files and directory layouts, see Section 4.8, “Working with File and Directory Layouts”.
- To add a new data pool, see Section 4.6, “Adding data pools”.
- To work with quotas, see Section 4.7, “Working with Ceph File System quotas”.
- To remove a Ceph File System using the command-line interface, see Section 4.12, “Removing a Ceph File System using the command-line interface”.
- To remove a Ceph File System using Ansible, see Section 4.13, “Removing a Ceph File System using Ansible”.
- To set a minimum client version, see Section 4.14, “Setting a minimum client version”.
- To use the ceph mds fail command, see Section 4.15, “Using the ceph mds fail command”.
4.1. Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Installation and configuration of the Ceph Metadata Server daemons (ceph-mds).
- Create and mount the Ceph File System.
4.2. Unmounting Ceph File Systems mounted as kernel clients
This section describes how to unmount a Ceph File System that is mounted as a kernel client.
Prerequisites
- Root-level access to the node doing the mounting.
Procedure
To unmount a Ceph File System mounted as a kernel client:
Syntax
umount MOUNT_POINT
Example
[root@client ~]# umount /mnt/cephfs
Additional Resources
- The umount(8) manual page
4.3. Unmounting Ceph File Systems mounted as FUSE clients
This section describes how to unmount a Ceph File System that is mounted as a File System in User Space (FUSE) client.
Prerequisites
- Root-level access to the FUSE client node.
Procedure
To unmount a Ceph File System mounted in FUSE:
Syntax
fusermount -u MOUNT_POINT
Example
[root@client ~]# fusermount -u /mnt/cephfs
Additional Resources
- The ceph-fuse(8) manual page
4.4. Mapping directory trees to Metadata Server daemon ranks
You can map a directory and its subdirectories to a particular active Metadata Server (MDS) rank so that its metadata is managed only by the MDS daemon holding that rank. This approach enables you to evenly spread application load, or to limit the impact of users' metadata requests on the entire storage cluster.
An internal balancer already dynamically spreads the application load. Therefore, only map directory trees to ranks for certain carefully chosen applications.
In addition, when a directory is mapped to a rank, the balancer cannot split it. Consequently, a large number of operations within the mapped directory can overload the rank and the MDS daemon that manages it.
Prerequisites
- At least two active MDS daemons.
- User access to the CephFS client node.
- Verify that the attr package is installed on the CephFS client node with a mounted Ceph File System.
Procedure
Add the p flag to the Ceph user’s capabilities:
Syntax
ceph fs authorize FILE_SYSTEM_NAME client.CLIENT_NAME /DIRECTORY CAPABILITY [/DIRECTORY CAPABILITY] ...
Example
[user@client ~]$ ceph fs authorize cephfs_a client.1 /temp rwp
client.1
  key: AQBSdFhcGZFUDRAAcKhG9Cl2HPiDMMRv4DC43A==
  caps: [mds] allow r, allow rwp path=/temp
  caps: [mon] allow r
  caps: [osd] allow rw tag cephfs data=cephfs_a
Set the ceph.dir.pin extended attribute on a directory:
Syntax
setfattr -n ceph.dir.pin -v RANK DIRECTORY
Example
[user@client ~]$ setfattr -n ceph.dir.pin -v 2 /temp
This example assigns the /temp directory and all of its subdirectories to rank 2.
Additional Resources
- See the Layout, quota, snapshot, and network restrictions section in the Red Hat Ceph Storage File System Guide for more details about the p flag.
- See the Disassociating directory trees from Metadata Server daemon ranks section in the Red Hat Ceph Storage File System Guide for more details.
- See the Configuring multiple active Metadata Server daemons section in the Red Hat Ceph Storage File System Guide for more details.
4.5. Disassociating directory trees from Metadata Server daemon ranks
You can disassociate a directory from a particular active Metadata Server (MDS) rank.
Prerequisites
- User access to the Ceph File System (CephFS) client node.
- Ensure that the attr package is installed on the client node with a mounted CephFS.
Procedure
Set the ceph.dir.pin extended attribute to -1 on a directory:
Syntax
setfattr -n ceph.dir.pin -v -1 DIRECTORY
Example
[user@client ~]$ setfattr -n ceph.dir.pin -v -1 /home/ceph-user
Note: Any separately mapped subdirectories of /home/ceph-user/ are not affected.
Additional Resources
- See the Mapping Directory Trees to MDS Ranks section in Red Hat Ceph Storage File System Guide for more details.
4.6. Adding data pools
The Ceph File System (CephFS) supports adding more than one pool to be used for storing data. This can be useful for:
- Storing log data on reduced redundancy pools
- Storing user home directories on an SSD or NVMe pool
- Basic data segregation.
Before using another data pool in the Ceph File System, you must add it as described in this section.
By default, for storing file data, CephFS uses the initial data pool that was specified during its creation. To use a secondary data pool, you must also configure a part of the file system hierarchy to store file data in that pool or optionally within a namespace of that pool, using file and directory layouts.
Prerequisites
- Root-level access to the Ceph Monitor node.
Procedure
Create a new data pool:
Syntax
ceph osd pool create POOL_NAME PG_NUMBER
Replace:
- POOL_NAME with the name of the pool.
- PG_NUMBER with the number of placement groups (PGs).
Example
[root@mon ~]# ceph osd pool create cephfs_data_ssd 64
pool 'cephfs_data_ssd' created
Add the newly created pool under the control of the Metadata Servers:
Syntax
ceph fs add_data_pool FS_NAME POOL_NAME
Replace:
- FS_NAME with the name of the file system.
- POOL_NAME with the name of the pool.
Example
[root@mon ~]# ceph fs add_data_pool cephfs cephfs_data_ssd
added data pool 6 to fsmap
Verify that the pool was successfully added:
Example
[root@mon ~]# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_ssd]
If you use the cephx authentication, make sure that clients can access the new pool.
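After the pool is added, you typically point part of the file system hierarchy at it by setting a directory layout, as described in the Working with File and Directory Layouts section. The following is a minimal sketch; the /mnt/cephfs/ssd-dir mount path and directory name are assumptions used only for illustration:
Example
[root@client ~]# setfattr -n ceph.dir.layout.pool -v cephfs_data_ssd /mnt/cephfs/ssd-dir
New files created under that directory inherit the layout and store their file data in the cephfs_data_ssd pool; existing files keep their original layout.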
Additional Resources
- See the Working with File and Directory Layouts section for details.
- See the Creating Ceph File System Client Users section for details.
4.7. Working with Ceph File System quotas
As a storage administrator, you can view, set, and remove quotas on any directory in the file system. You can place quota restrictions on the number of bytes or the number of files within the directory.
4.7.1. Prerequisites
- Make sure that the attr package is installed.
4.7.2. Ceph File System quotas
The Ceph File System (CephFS) quotas allow you to restrict the number of bytes or the number of files stored in the directory structure.
Limitations
- CephFS quotas rely on the cooperation of the client mounting the file system to stop writing data when it reaches the configured limit. However, quotas alone cannot prevent an adversarial, untrusted client from filling the file system.
- Once processes that write data to the file system reach the configured limit, a short period of time elapses between when the amount of data reaches the quota limit, and when the processes stop writing data. The time period generally measures in the tenths of seconds. However, processes continue to write data during that time. The amount of additional data that the processes write depends on the amount of time elapsed before they stop.
- Previously, quotas were only supported with the userspace FUSE client. With Linux kernel version 4.17 or newer, the CephFS kernel client supports quotas against Ceph mimic or newer clusters. Those version requirements are met by Red Hat Enterprise Linux 8 and Red Hat Ceph Storage 4, respectively. The userspace FUSE client can be used on older and newer OS and cluster versions. The FUSE client is provided by the ceph-fuse package.
- When using path-based access restrictions, be sure to configure the quota on the directory to which the client is restricted, or on a directory nested beneath it. If the client has restricted access to a specific path based on the MDS capability, and the quota is configured on an ancestor directory that the client cannot access, the client will not enforce the quota. For example, if the client cannot access the /home/ directory and the quota is configured on /home/, the client cannot enforce that quota on the directory /home/user/. See the sketch after this list for an illustration.
- Snapshot file data that has been deleted or changed does not count towards the quota.
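For instance, if a client's MDS capability restricts it to the /home/user/ path, the quota would be set on that directory, or below it, rather than on /home/. This is a minimal sketch; the mount path and the 10 GB limit are assumptions used only for illustration:
Example
[root@fs ~]# setfattr -n ceph.quota.max_bytes -v 10000000000 /mnt/cephfs/home/user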
4.7.3. Viewing quotas
Use the getfattr command and the ceph.quota extended attributes to view the quota settings for a directory.
If the attributes appear on a directory inode, then that directory has a configured quota. If the attributes do not appear on the inode, then the directory does not have a quota set, although its parent directory might have a quota configured. If the value of the extended attribute is 0, the quota is not set.
Prerequisites
- Make sure that the attr package is installed.
Procedure
To view CephFS quotas:
Using a byte-limit quota:
Syntax
getfattr -n ceph.quota.max_bytes DIRECTORY
Example
[root@fs ~]# getfattr -n ceph.quota.max_bytes /cephfs/
Using a file-limit quota:
Syntax
getfattr -n ceph.quota.max_files DIRECTORY
Example
[root@fs ~]# getfattr -n ceph.quota.max_files /cephfs/
Additional Resources
- See the getfattr(1) manual page for more information.
4.7.4. Setting quotas
This section describes how to use the setfattr command and the ceph.quota extended attributes to set the quota for a directory.
Prerequisites
- Make sure that the attr package is installed.
Procedure
To set CephFS quotas:
Using a byte-limit quota:
Syntax
setfattr -n ceph.quota.max_bytes -v LIMIT_VALUE DIRECTORY
Example
[root@fs ~]# setfattr -n ceph.quota.max_bytes -v 100000000 /cephfs/
In this example, 100000000 bytes equals 100 MB.
Using a file-limit quota:
Syntax
setfattr -n ceph.quota.max_files -v LIMIT_VALUE DIRECTORY
Example
[root@fs ~]# setfattr -n ceph.quota.max_files -v 10000 /cephfs/
In this example, 10000 equals 10,000 files.
Additional Resources
- See the setfattr(1) manual page for more information.
4.7.5. Removing quotas
This section describes how to use the setfattr command and the ceph.quota extended attributes to remove a quota from a directory.
Prerequisites
- Make sure that the attr package is installed.
Procedure
To remove CephFS quotas:
Using a byte-limit quota:
Syntax
setfattr -n ceph.quota.max_bytes -v 0 DIRECTORY
Example
[root@fs ~]# setfattr -n ceph.quota.max_bytes -v 0 /cephfs/
Using a file-limit quota:
Syntax
setfattr -n ceph.quota.max_files -v 0 DIRECTORY
Example
[root@fs ~]# setfattr -n ceph.quota.max_files -v 0 /cephfs/
Additional Resources
-
See the
setfattr(1)
manual page for more information.
4.7.6. Additional Resources
- See the getfattr(1) manual page for more information.
- See the setfattr(1) manual page for more information.
4.8. Working with File and Directory Layouts
As a storage administrator, you can control how file or directory data is mapped to objects.
This section describes how to view, set, and remove file and directory layouts.
4.8.1. Prerequisites
- The installation of the attr package.
4.8.2. Overview of file and directory layouts
This section explains what file and directory layouts are in the context of the Ceph File System.
The layout of a file or directory controls how its content is mapped to Ceph RADOS objects. The directory layout serves primarily for setting an inherited layout for new files in that directory.
To view and set a file or directory layout, use virtual extended attributes or extended file attributes (xattrs). The name of the layout attributes depends on whether a file is a regular file or a directory:
- Regular files layout attributes are called ceph.file.layout.
- Directories layout attributes are called ceph.dir.layout.
The File and Directory Layout Fields table lists available layout fields that you can set on files and directories.
Layouts Inheritance
Files inherit the layout of their parent directory when you create them. However, subsequent changes to the parent directory layout do not affect children. If a directory does not have any layouts set, files inherit the layout from the closest directory with a layout in the directory structure.
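As a minimal illustration of this inheritance, the following sketch sets the pool field on a directory layout and then checks the layout of a file created afterwards; the /mnt/cephfs/projects path and the cephfs_data pool are assumptions used only for illustration:
Example
[root@fs ~]# setfattr -n ceph.dir.layout.pool -v cephfs_data /mnt/cephfs/projects
[root@fs ~]# touch /mnt/cephfs/projects/newfile
[root@fs ~]# getfattr -n ceph.file.layout.pool /mnt/cephfs/projects/newfile
ceph.file.layout.pool="cephfs_data"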
Additional Resources
- See the Layouts Inheritance section for more details.
4.8.3. Setting file and directory layout fields
Use the setfattr command to set layout fields on a file or directory.
When you modify the layout fields of a file, the file must be empty, otherwise an error occurs.
Prerequisites
- Root-level access to the node.
Procedure
To modify layout fields on a file or directory:
Syntax
setfattr -n ceph.TYPE.layout.FIELD -v VALUE PATH
Replace:
- TYPE with file or dir.
- FIELD with the name of the field.
- VALUE with the new value of the field.
- PATH with the path to the file or directory.
Example
[root@fs ~]# setfattr -n ceph.file.layout.stripe_unit -v 1048576 test
Additional Resources
- See the table in the Overview of the file and directory layouts section of the Red Hat Ceph Storage File System Guide for more details.
- See the setfattr(1) manual page.
4.8.4. Viewing file and directory layout fields
Use the getfattr command to view layout fields on a file or directory.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to all nodes in the storage cluster.
Procedure
To view layout fields on a file or directory as a single string:
Syntax
getfattr -n ceph.TYPE.layout PATH
Replace:
- PATH with the path to the file or directory.
- TYPE with file or dir.
Example
[root@mon ~] getfattr -n ceph.dir.layout /home/test
ceph.dir.layout="stripe_unit=4194304 stripe_count=2 object_size=4194304 pool=cephfs_data"
A directory does not have an explicit layout until you set it. Consequently, attempting to view the layout without first setting it fails because there is no layout to display.
Additional Resources
- The getfattr(1) manual page.
- For more information, see the Setting file and directory layout fields section in the Red Hat Ceph Storage File System Guide.
4.8.5. Viewing individual layout fields
Use the getfattr command to view individual layout fields for a file or directory.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to all nodes in the storage cluster.
Procedure
To view individual layout fields on a file or directory:
Syntax
getfattr -n ceph.TYPE.layout.FIELD PATH
Replace:
- TYPE with file or dir.
- FIELD with the name of the field.
- PATH with the path to the file or directory.
Example
[root@mon ~] getfattr -n ceph.file.layout.pool test
ceph.file.layout.pool="cephfs_data"
Note: Pools in the pool field are indicated by name. However, newly created pools can be indicated by ID.
Additional Resources
- The getfattr(1) manual page.
- For more information, see File and directory layout fields.
4.8.6. Removing directory layouts
Use the setfattr command to remove layouts from a directory.
When you set a file layout, you cannot change or remove it.
Prerequisites
- A directory with a layout.
Procedure
To remove a layout from a directory:
Syntax
setfattr -x ceph.dir.layout DIRECTORY_PATH
Example
[user@client ~]$ setfattr -x ceph.dir.layout /home/cephfs
To remove the pool_namespace field:
Syntax
setfattr -x ceph.dir.layout.pool_namespace DIRECTORY_PATH
Example
[user@client ~]$ setfattr -x ceph.dir.layout.pool_namespace /home/cephfs
Note: The pool_namespace field is the only field you can remove separately.
Additional Resources
- The setfattr(1) manual page
4.9. Ceph File System snapshot considerations
As a storage administrator, you can gain an understanding of the data structures, system components, and considerations to manage Ceph File System (CephFS) snapshots.
Snapshots create an immutable view of a file system at the point in time of creation. You can create a snapshot within any directory, and all data in the file system under that directory is covered.
4.9.1. Storing snapshot metadata for a Ceph File System
Storage of snapshot directory entries and their inodes occurs in-line as part of the directory they were in at the time of the snapshot. All directory entries include a first and last snapid for which they are valid.
4.9.2. Ceph File System snapshot writeback
Ceph snapshots rely on clients to help determine which operations apply to a snapshot and flush snapshot data and metadata back to the OSD and MDS clusters. Handling snapshot writeback is an involved process because snapshots apply to subtrees of the file hierarchy, and the creation of snapshots can occur anytime.
Parts of the file hierarchy that belong to the same set of snapshots are referred to by a single SnapRealm. Each snapshot applies to the subdirectory nested beneath a directory and divides the file hierarchy into multiple "realms" where all of the files contained by a realm share the same set of snapshots.
The Ceph Metadata Server (MDS) controls client access to inode metadata and file data by issuing capabilities (caps) for each inode. During snapshot creation, clients acquire dirty metadata on inodes with capabilities to describe the file state at that time. When a client receives a ClientSnap message, it updates the local SnapRealm and its links to specific inodes and generates a CapSnap for the inode. Capability writeback flushes out the CapSnap and, if dirty data exists, the CapSnap is used to block new data writes until the snapshot flushes to the OSDs.
The MDS generates snapshot-representing directory entries as part of the routine process for flushing them. The MDS keeps directory entries with outstanding CapSnap data pinned in memory and the journal until the writeback process flushes them.
Additional Resources
- See the Creating client users for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on setting the Ceph user capabilities.
4.9.3. Ceph File System snapshots and hard links
Ceph moves an inode with multiple hard links to a dummy global SnapRealm. This dummy SnapRealm covers all snapshots in the file system. Any new snapshots preserve the inode’s data. This preserved data covers snapshots on any linkage of the inode.
4.9.4. Updating a snapshot for a Ceph File System
The process of updating a snapshot is similar to the process of deleting a snapshot.
If you rename an inode out of its parent SnapRealm, Ceph generates a new SnapRealm for the renamed inode if the SnapRealm does not already exist. Ceph saves the IDs of snapshots that are effective on the original parent SnapRealm into the past_parent_snaps data structure of the new SnapRealm and then follows a process similar to creating a snapshot.
Additional Resources
- For details about snapshot data structures, see Ceph File System snapshot data structures in Red Hat Ceph Storage File System Guide.
4.9.5. Ceph File System snapshots and multiple file systems
Snapshots are known to not function properly with multiple file systems.
If you have multiple file systems sharing a single Ceph pool with namespaces, their snapshots will collide, and deleting one snapshot results in missing file data for other snapshots sharing the same Ceph pool.
4.9.6. Ceph File System snapshot data structures
The Ceph File System (CephFS) uses the following snapshot data structures to store data efficiently:
SnapRealm
- A SnapRealm is created whenever you create a snapshot at a new point in the file hierarchy or when you move a snapshotted inode outside its parent snapshot. A single SnapRealm represents the parts of the file hierarchy that belong to the same set of snapshots. A SnapRealm contains an sr_t_srnode and inodes_with_caps that are part of the snapshot.
sr_t
- An sr_t is the on-disk snapshot metadata. It contains sequence counters, time-stamps, and a list of associated snapshot IDs and the past_parent_snaps.
SnapServer
- A SnapServer manages snapshot ID allocation, snapshot deletion, and maintaining a list of cumulative snapshots in the file system. A file system only has one instance of a SnapServer.
SnapContext
- A SnapContext consists of a snapshot sequence ID (snapid) and all the snapshot IDs currently defined for an object. When a write operation occurs, a Ceph client provides a SnapContext to specify the set of snapshots that exist for an object. To generate a SnapContext list, Ceph combines snapids associated with the SnapRealm and all valid snapids in the past_parent_snaps data structure.
  File data is stored using RADOS self-managed snapshots. In a self-managed snapshot, the client must provide the current SnapContext on each write. Clients are careful to use the correct SnapContext when writing file data to the Ceph OSDs. SnapClient cached effective snapshots filter out stale snapids.
SnapClient
- A SnapClient is used to communicate with a SnapServer and cache cumulative snapshots locally. Each Metadata Server (MDS) rank has a SnapClient instance.
4.10. Managing Ceph File System snapshots
As a storage administrator, you can take a point-in-time snapshot of a Ceph File System (CephFS) directory. CephFS snapshots are asynchronous, and you can choose the directory in which snapshots are created.
4.10.1. Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
4.10.2. Ceph File System snapshots
A Ceph File System (CephFS) snapshot creates an immutable, point-in-time view of a Ceph File System. CephFS snapshots are asynchronous and are kept in a special hidden directory in the CephFS directory, named .snap. You can specify snapshot creation for any directory within a Ceph File System. When specifying a directory, the snapshot also includes all the subdirectories beneath it.
Each Ceph Metadata Server (MDS) cluster allocates the snap identifiers independently. Using snapshots for multiple Ceph File Systems that are sharing a single pool causes snapshot collisions and results in missing file data.
Additional Resources
- See the Creating a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for more details.
4.10.3. Enabling a snapshot for a Ceph File System
New Ceph File Systems enable the snapshotting feature by default, but you must manually enable the feature on existing Ceph File Systems.
Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Root-level access to a Ceph Metadata Server (MDS) node.
Procedure
For existing Ceph File Systems, enable the snapshotting feature:
Syntax
ceph fs set FILE_SYSTEM_NAME allow_new_snaps true
Example
[root@mds ~]# ceph fs set cephfs allow_new_snaps true
enabled new snapshots
Additional Resources
- See the Creating a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on creating a snapshot.
- See the Deleting a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on deleting a snapshot.
- See the Restoring a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on restoring a snapshot.
4.10.4. Creating a snapshot for a Ceph File System
You can create an immutable, point-in-time view of a Ceph File System by creating a snapshot. A snapshot uses a hidden directory located in the directory to snapshot. The name of this directory is .snap by default.
Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Root-level access to a Ceph Metadata Server (MDS) node.
Procedure
To create a snapshot, create a new subdirectory inside the .snap directory. The snapshot name is the new subdirectory name.
Syntax
mkdir NEW_DIRECTORY_PATH
Example
[root@mds cephfs]# mkdir .snap/new-snaps
This example creates the new-snaps subdirectory on a Ceph File System that is mounted on /mnt/cephfs and informs the Ceph Metadata Server (MDS) to start making snapshots.
Verification
List the new snapshot directory:
Syntax
ls -l .snap/
The new-snaps subdirectory displays under the .snap directory.
Additional Resources
- See the Deleting a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on deleting a snapshot.
- See the Restoring a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on restoring a snapshot.
4.10.5. Deleting a snapshot for a Ceph File System
You can delete a snapshot by removing the corresponding directory in a .snap directory.
Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Creation of snapshots on a Ceph File System.
- Root-level access to a Ceph Metadata Server (MDS) node.
Procedure
To delete a snapshot, remove the corresponding directory:
Syntax
rmdir DIRECTORY_PATH
Example
[root@mds cephfs]# rmdir .snap/new-snaps
This example deletes the new-snaps subdirectory on a Ceph File System that is mounted on /mnt/cephfs.
Contrary to a regular directory, the rmdir command succeeds even if the directory is not empty, so you do not need to use a recursive rm command.
Attempting to delete root-level snapshots, which might contain underlying snapshots, will fail.
Additional Resources
- See the Restoring a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on restoring a snapshot.
- See the Creating a snapshot for a Ceph File System section in the Red Hat Ceph Storage File System Guide for details on creating a snapshot.
4.10.6. Restoring a snapshot for a Ceph File System
You can restore a file from a snapshot or fully restore a complete snapshot for a Ceph File System (CephFS).
Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Deployment of a Ceph File System.
- Root-level access to a Ceph Metadata Server (MDS) node.
Procedure
To restore a file from a snapshot, copy it from the snapshot directory to the regular tree:
Syntax
cp -a .snap/SNAP_DIRECTORY/FILENAME .
Example
[root@mds dir1]# cp .snap/new-snaps/file1 .
This example restores file1 to the current directory.
You can also fully restore a snapshot from the .snap directory tree. Replace the current entries with copies from the desired snapshot:
Syntax
[root@mds dir1]# rm -rf *
[root@mds dir1]# cp -a .snap/SNAP_DIRECTORY/* .
Example
[root@mds dir1]# rm -rf *
[root@mds dir1]# cp -a .snap/new-snaps/* .
This example removes all files and directories under dir1 and restores the files from the new-snaps snapshot to the current directory, dir1.
4.10.7. Additional Resources
- See the Deployment of the Ceph File System section in the Red Hat Ceph Storage File System Guide.
4.11. Taking down a Ceph File System cluster
You can take down a Ceph File System (CephFS) cluster by setting the down flag to true. Doing this gracefully shuts down the Metadata Server (MDS) daemons by flushing journals to the metadata pool, and all client I/O is stopped.
You can also take the CephFS cluster down quickly, for example to test the deletion of a file system or to practice a disaster recovery scenario, and bring the Metadata Server (MDS) daemons down. Doing this sets the joinable flag to prevent the MDS standby daemons from activating the file system.
Prerequisites
- User access to the Ceph Monitor node.
Procedure
To mark the CephFS cluster down:
Syntax
ceph fs set FS_NAME down true
Example
[root@mon]# ceph fs set cephfs down true
To bring the CephFS cluster back up:
Syntax
ceph fs set FS_NAME down false
Example
[root@mon]# ceph fs set cephfs down false
Alternatively, to quickly take down a CephFS cluster:
Syntax
ceph fs fail FS_NAME
Example
[root@mon]# ceph fs fail cephfs
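After ceph fs fail, the file system is left with the joinable flag cleared, so MDS daemons cannot activate it. To bring it back, you typically set the flag again. The following is a minimal sketch based on that assumption:
Syntax
ceph fs set FS_NAME joinable true
Example
[root@mon]# ceph fs set cephfs joinable true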
4.12. Removing a Ceph File System using the command-line interface
You can remove a Ceph File System (CephFS) using the command-line interface. Before doing so, consider backing up all the data and verifying that all clients have unmounted the file system locally.
This operation is destructive and will make the data stored on the Ceph File System permanently inaccessible.
Prerequisites
- Back up the data.
- All clients have unmounted the Ceph File System (CephFS).
- Root-level access to a Ceph Monitor node.
Procedure
Display the CephFS status to determine the MDS ranks.
Syntax
ceph fs status
Example
[root@mon ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+----------------+---------------+-------+-------+
| Rank | State  | MDS            | Activity      | dns   | inos  |
+------+--------+----------------+---------------+-------+-------+
| 0    | active | cluster1-node6 | Reqs: 0 /s    | 10    | 13    |
+------+--------+----------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2688k | 15.0G |
| cephfs_data     | data     | 0     | 15.0G |
+-----------------+----------+-------+-------+
+----------------+
| Standby MDS    |
+----------------+
| cluster1-node5 |
+----------------+
In the example above, the rank is 0.
Mark the CephFS as down:
Syntax
ceph fs set FS_NAME down true
Replace FS_NAME with the name of the CephFS you want to remove.
Example
[root@mon]# ceph fs set cephfs down true
marked down
Display the status of the CephFS to determine that it has stopped:
Syntax
ceph fs status
Example
[root@mon ~]# ceph fs status
cephfs - 0 clients
======
+------+----------+----------------+----------+-------+-------+
| Rank | State    | MDS            | Activity | dns   | inos  |
+------+----------+----------------+----------+-------+-------+
| 0    | stopping | cluster1-node6 |          | 10    | 12    |
+------+----------+----------------+----------+-------+-------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2688k | 15.0G |
| cephfs_data     | data     | 0     | 15.0G |
+-----------------+----------+-------+-------+
+----------------+
| Standby MDS    |
+----------------+
| cluster1-node5 |
+----------------+
After some time, the MDS is no longer listed:
Example
[root@mon ~]# ceph fs status
cephfs - 0 clients
======
+------+-------+-----+----------+-----+------+
| Rank | State | MDS | Activity | dns | inos |
+------+-------+-----+----------+-----+------+
+------+-------+-----+----------+-----+------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2688k | 15.0G |
| cephfs_data     | data     | 0     | 15.0G |
+-----------------+----------+-------+-------+
+----------------+
| Standby MDS    |
+----------------+
| cluster1-node5 |
+----------------+
Fail all MDS ranks shown in the status of step one:
Syntax
ceph mds fail RANK
Replace RANK with the rank of the MDS daemons to fail.
Example
[root@mon]# ceph mds fail 0
Remove the CephFS:
Syntax
ceph fs rm FS_NAME --yes-i-really-mean-it
Replace FS_NAME with the name of the Ceph File System you want to remove.
Example
[root@mon]# ceph fs rm cephfs --yes-i-really-mean-it
Verify that the file system is removed:
Syntax
ceph fs ls
Example
[root@mon ~]# ceph fs ls
No filesystems enabled
Optional: Remove the pools that were used by CephFS.
On a Ceph Monitor node, list the pools:
Syntax
ceph osd pool ls
Example
[root@mon ~]# ceph osd pool ls
rbd
cephfs_data
cephfs_metadata
In the example output, cephfs_metadata and cephfs_data are the pools that were used by CephFS.
Remove the metadata pool:
Syntax
ceph osd pool delete CEPH_METADATA_POOL CEPH_METADATA_POOL --yes-i-really-really-mean-it
Replace CEPH_METADATA_POOL with the pool CephFS used for metadata storage by including the pool name twice.
Example
[root@mon ~]# ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
pool 'cephfs_metadata' removed
Remove the data pool:
Syntax
ceph osd pool delete CEPH_DATA_POOL CEPH_DATA_POOL --yes-i-really-really-mean-it
Replace CEPH_DATA_POOL with the pool CephFS used for data storage by including the pool name twice.
Example
[root@mon ~]# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
pool 'cephfs_data' removed
Additional Resources
- See Removing a Ceph File System Using Ansible in the Red Hat Ceph Storage File System Guide.
- See the Delete a pool section in the Red Hat Ceph Storage Storage Strategies Guide.
4.13. Removing a Ceph File System using Ansible
You can remove a Ceph File System (CephFS) using ceph-ansible. Before doing so, consider backing up all the data and verifying that all clients have unmounted the file system locally.
This operation is destructive and will make the data stored on the Ceph File System permanently inaccessible.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- A good backup of the data.
- All clients have unmounted the Ceph File System.
- Access to the Ansible administration node.
- Root-level access to a Ceph Monitor node.
Procedure
Navigate to the /usr/share/ceph-ansible/ directory:
[admin@admin ~]$ cd /usr/share/ceph-ansible
Identify the Ceph Metadata Server (MDS) nodes by reviewing the [mdss] section in the Ansible inventory file. On the Ansible administration node, open /usr/share/ceph-ansible/hosts:
Example
[mdss]
cluster1-node5
cluster1-node6
In the example, cluster1-node5 and cluster1-node6 are the MDS nodes.
Set the max_mds parameter to 1:
Syntax
ceph fs set NAME max_mds NUMBER
Example
[root@mon ~]# ceph fs set cephfs max_mds 1
Run the shrink-mds.yml playbook, specifying the Metadata Server (MDS) to remove:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=MDS_NODE -i hosts
Replace MDS_NODE with the Metadata Server node you want to remove. The Ansible playbook will ask you if you want to shrink the cluster. Type yes and press the Enter key.
Example
[admin@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=cluster1-node6 -i hosts
Optional: Repeat the process for any additional MDS nodes:
Syntax
ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=MDS_NODE -i hosts
Replace MDS_NODE with the Metadata Server node you want to remove. The Ansible playbook will ask you if you want to shrink the cluster. Type yes and press the Enter key.
Example
[admin@admin ceph-ansible]$ ansible-playbook infrastructure-playbooks/shrink-mds.yml -e mds_to_kill=cluster1-node5 -i hosts
Check the status of the CephFS:
Syntax
ceph fs status
Example
[root@mon ~]# ceph fs status
cephfs - 0 clients
======
+------+--------+----------------+---------------+-------+-------+
| Rank | State  | MDS            | Activity      | dns   | inos  |
+------+--------+----------------+---------------+-------+-------+
| 0    | failed | cluster1-node6 | Reqs: 0 /s    | 10    | 13    |
+------+--------+----------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2688k | 15.0G |
| cephfs_data     | data     | 0     | 15.0G |
+-----------------+----------+-------+-------+
+----------------+
| Standby MDS    |
+----------------+
| cluster1-node5 |
+----------------+
Remove the [mdss] section and the nodes in it from the Ansible inventory file so they will not be reprovisioned as metadata servers on future runs of the site.yml or site-container.yml playbooks. Open the Ansible inventory file, /usr/share/ceph-ansible/hosts, for editing:
Example
[mdss]
cluster1-node5
cluster1-node6
Remove the [mdss] section and all nodes under it.
Remove the CephFS:
Syntax
ceph fs rm FS_NAME --yes-i-really-mean-it
Replace FS_NAME with the name of the Ceph File System you want to remove.
Example
[root@mon]# ceph fs rm cephfs --yes-i-really-mean-it
Optional: Remove the pools that were used by CephFS.
On a Ceph Monitor node, list the pools:
Syntax
ceph osd pool ls
Find the pools that were used by CephFS.
Example
[root@mon ~]# ceph osd pool ls
rbd
cephfs_data
cephfs_metadata
In the example output, cephfs_metadata and cephfs_data are the pools that were used by CephFS.
Remove the metadata pool:
Syntax
ceph osd pool delete CEPH_METADATA_POOL CEPH_METADATA_POOL --yes-i-really-really-mean-it
Replace CEPH_METADATA_POOL with the pool CephFS used for metadata storage by including the pool name twice.
Example
[root@mon ~]# ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
pool 'cephfs_metadata' removed
Remove the data pool:
Syntax
ceph osd pool delete CEPH_DATA_POOL CEPH_DATA_POOL --yes-i-really-really-mean-it
Replace CEPH_DATA_POOL with the pool CephFS used for data storage by including the pool name twice.
Example
[root@mon ~]# ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
pool 'cephfs_data' removed
Verify the pools no longer exist:
Example
[root@mon ~]# ceph osd pool ls
rbd
The cephfs_metadata and cephfs_data pools are no longer listed.
Additional Resources
- See Removing a Ceph File System Manually in the Red Hat Ceph Storage File System Guide.
- See the Delete a pool section in the Red Hat Ceph Storage Storage Strategies Guide.
4.14. Setting a minimum client version
You can set a minimum version of Ceph that a third-party client must be running to connect to a Red Hat Ceph Storage Ceph File System (CephFS). Set the min_compat_client parameter to prevent older clients from mounting the file system. CephFS will also automatically evict currently connected clients that use an older version than the version set with min_compat_client.
The rationale for this setting is to prevent older clients, which might include bugs or have incomplete feature compatibility, from connecting to the cluster and disrupting other clients. For example, some older versions of CephFS clients might not release capabilities properly and cause other client requests to be handled slowly.
The values of min_compat_client are based on the upstream Ceph versions. Red Hat recommends that the third-party clients use the same major upstream version as the Red Hat Ceph Storage cluster is based on. See the following table to see the upstream versions and corresponding Red Hat Ceph Storage versions.
Table 4.1. min_compat_client values
Value | Upstream Ceph version | Red Hat Ceph Storage version
---|---|---
luminous | 12.2 | Red Hat Ceph Storage 3
mimic | 13.2 | not applicable
nautilus | 14.2 | Red Hat Ceph Storage 4
If you use Red Hat Enterprise Linux 7, do not set min_compat_client to a later version than luminous, because Red Hat Enterprise Linux 7 is considered a luminous client and, if you use a later version, CephFS does not allow it to access the mount point.
Prerequisites
- A working Red Hat Ceph Storage cluster with Ceph File System deployed
Procedure
Set the minimum client version:
Syntax
ceph fs set FILE_SYSTEM_NAME min_compat_client RELEASE
Replace FILE_SYSTEM_NAME with the name of the Ceph File System and RELEASE with the minimum client version. For example, to restrict clients to use the nautilus upstream version at minimum on the cephfs Ceph File System:
Example
$ ceph fs set cephfs min_compat_client nautilus
See Table 4.1, “min_compat_client values” for the full list of available values and how they correspond with Red Hat Ceph Storage versions.
4.15. Using the ceph mds fail command
Use the ceph mds fail command to:
- Mark an MDS daemon as failed. If the daemon was active and a suitable standby daemon was available, and if the standby daemon was active after disabling the standby-replay configuration, using this command forces a failover to the standby daemon. Disabling the standby-replay daemon prevents new standby-replay daemons from being assigned.
- Restart a running MDS daemon. If the daemon was active and a suitable standby daemon was available, the "failed" daemon becomes a standby daemon.
Prerequisites
- Installation and configuration of the Ceph MDS daemons.
Procedure
To fail a daemon:
Syntax
ceph mds fail MDS_NAME
Where MDS_NAME is the name of the standby-replay MDS node.
Example
[root@mds ~]# ceph mds fail example01
Note: You can find the Ceph MDS name from the ceph fs status command.
Additional Resources
- See the Decreasing the Number of Active MDS Daemons in the Red Hat Ceph Storage File System Guide.
- See the Configuring Standby Metadata Server Daemons in the Red Hat Ceph Storage File System Guide.
- See the Explanation of Ranks in Metadata Server Configuration in the Red Hat Ceph Storage File System Guide.
4.16. Ceph File System client evictions
When a Ceph File System (CephFS) client is unresponsive or misbehaving, it might be necessary to forcibly terminate it, or evict it from accessing the CephFS. Evicting a CephFS client prevents it from communicating further with Metadata Server (MDS) daemons and Ceph OSD daemons. If a CephFS client is buffering I/O to the CephFS at the time of eviction, then any un-flushed data will be lost. The CephFS client eviction process applies to all client types: FUSE mounts, kernel mounts, NFS gateways, and any process using the libcephfs API library.
You can evict CephFS clients automatically, if they fail to communicate promptly with the MDS daemon, or manually.
Automatic Client Eviction
These scenarios cause an automatic CephFS client eviction:
- If a CephFS client has not communicated with the active MDS daemon for over the default 300 seconds, or as set by the session_autoclose option.
- If the mds_cap_revoke_eviction_timeout option is set, and a CephFS client has not responded to the cap revoke messages for over the set amount of seconds. The mds_cap_revoke_eviction_timeout option is disabled by default. See the sketch after this list for how these options might be adjusted.
- During MDS startup or failover, the MDS daemon goes through a reconnect phase waiting for all the CephFS clients to connect to the new MDS daemon. If any CephFS client fails to reconnect within the default time window of 45 seconds, or as set by the mds_reconnect_timeout option.
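These thresholds are ordinary MDS configuration options, so they could be tuned with the ceph config command. The following is a minimal sketch; the 600-second and 30-second values are arbitrary values chosen only for illustration:
Example
[root@mon ~]# ceph config set mds session_autoclose 600
[root@mon ~]# ceph config set mds mds_cap_revoke_eviction_timeout 30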
Additional Resources
- See the Manually evicting a Ceph File System client section in the Red Hat Ceph Storage File System Guide for more details.
4.17. Blacklist Ceph File System clients
Ceph File System client blacklisting is enabled by default. When you send an eviction command to a single Metadata Server (MDS) daemon, it propagates the blacklist to the other MDS daemons. This is done to prevent the CephFS client from accessing any data objects, so it is necessary to update the other CephFS clients and MDS daemons with the latest Ceph OSD map, which includes the blacklisted client entries.
An internal “osdmap epoch barrier” mechanism is used when updating the Ceph OSD map. The purpose of the barrier is to verify that the CephFS clients receiving the capabilities have a sufficiently recent Ceph OSD map before any capabilities are assigned that might allow access to the same RADOS objects, so as not to race with cancelled operations, such as those resulting from ENOSPC conditions or from clients blacklisted by evictions.
If you are experiencing frequent CephFS client evictions due to slow nodes or an unreliable network, and you cannot fix the underlying issue, then you can ask the MDS to be less strict. It is possible to respond to slow CephFS clients by simply dropping their MDS sessions, but permit the CephFS client to re-open sessions and to continue talking to Ceph OSDs. Setting the mds_session_blacklist_on_timeout and mds_session_blacklist_on_evict options to false enables this mode.
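As a minimal sketch, the two options named above could be set cluster-wide with the ceph config command; verify the effect in a test environment before relying on it:
Example
[root@mon ~]# ceph config set mds mds_session_blacklist_on_timeout false
[root@mon ~]# ceph config set mds mds_session_blacklist_on_evict false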
When blacklisting is disabled, evicting a CephFS client has an effect only on the MDS daemon you send the command to. On a system with multiple active MDS daemons, you would need to send an eviction command to each active daemon.
4.18. Manually evicting a Ceph File System client
You might want to manually evict a Ceph File System (CephFS) client, if the client is misbehaving and you do not have access to the client node, or if a client dies, and you do not want to wait for the client session to time out.
Prerequisites
- User access to the Ceph Monitor node.
Procedure
Review the client list:
Syntax
ceph tell DAEMON_NAME client ls
Example
[root@mon]# ceph tell mds.0 client ls
[
    {
        "id": 4305,
        "num_leases": 0,
        "num_caps": 3,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.4305 172.21.9.34:0/422650892",
        "client_metadata": {
            "ceph_sha1": "ae81e49d369875ac8b569ff3e3c456a31b8f3af5",
            "ceph_version": "ceph version 12.0.0-1934-gae81e49 (ae81e49d369875ac8b569ff3e3c456a31b8f3af5)",
            "entity_id": "0",
            "hostname": "senta04",
            "mount_point": "/tmp/tmpcMpF1b/mnt.0",
            "pid": "29377",
            "root": "/"
        }
    }
]
Evict the specified CephFS client:
Syntax
ceph tell DAEMON_NAME client evict id=ID_NUMBER
Example
[root@mon]# ceph tell mds.0 client evict id=4305
4.19. Removing a Ceph File System client from the blacklist
In some situations, it can be useful to allow a previously blacklisted Ceph File System (CephFS) client to reconnect to the storage cluster.
Removing a CephFS client from the blacklist puts data integrity at risk, and does not guarantee a fully healthy and functional CephFS client as a result. The best way to get a fully healthy CephFS client back after an eviction is to unmount the CephFS client and do a fresh mount. If other CephFS clients are accessing files that the blacklisted CephFS client was doing buffered I/O to, data corruption can result.
Prerequisites
- User access to the Ceph Monitor node.
Procedure
Review the blacklist:
Example
[root@mon]# ceph osd blacklist ls
listed 1 entries
127.0.0.1:0/3710147553 2020-03-19 11:32:24.716146
Remove the CephFS client from the blacklist:
Syntax
ceph osd blacklist rm CLIENT_NAME_OR_IP_ADDR
Example
[root@mon]# ceph osd blacklist rm 127.0.0.1:0/3710147553
un-blacklisting 127.0.0.1:0/3710147553
Optionally, you can have FUSE-based CephFS clients automatically try to reconnect when they are removed from the blacklist. On the FUSE client, set the following option to true:
client_reconnect_stale = true
4.20. Additional Resources
- For details, see Chapter 3, Deployment of the Ceph File System.
- For details, see the Red Hat Ceph Storage Installation Guide.
- For details, see the Configuring Metadata Server Daemons in the Red Hat Ceph Storage File System Guide.