Chapter 2. The Ceph File System Metadata Server
As a storage administrator, you can learn about the different states of the Ceph File System (CephFS) Metadata Server (MDS), the CephFS MDS ranking mechanism, how to configure the MDS standby daemon, and cache size limits. Knowing these concepts helps you configure the MDS daemons for a storage environment.
2.1. Prerequisites
- A running and healthy Red Hat Ceph Storage cluster.
- Installation of the Ceph Metadata Server daemons (ceph-mds).
2.2. Metadata Server daemon states
The Metadata Server (MDS) daemons operate in two states:
- Active - manages metadata for files and directories stored on the Ceph File System.
- Standby - serves as a backup, and becomes active when an active MDS daemon becomes unresponsive.
By default, a Ceph File System uses only one active MDS daemon. However, systems with many clients benefit from multiple active MDS daemons.
You can configure the file system to use multiple active MDS daemons so that you can scale metadata performance for larger workloads. The active MDS daemons dynamically share the metadata workload when metadata load patterns change. Note that systems with multiple active MDS daemons still require standby MDS daemons to remain highly available.
What Happens When the Active MDS Daemon Fails
When the active MDS becomes unresponsive, a Ceph Monitor daemon waits a number of seconds equal to the value specified in the mds_beacon_grace option. If the active MDS is still unresponsive after the specified time period has passed, the Ceph Monitor marks the MDS daemon as laggy. One of the standby daemons becomes active, depending on the configuration.
To change the value of mds_beacon_grace, add this option to the Ceph configuration file and specify the new value.
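As a rough illustration, the configuration file change could look like the fragment printed by the sketch below. The helper name beacon_grace_fragment, the placement under [global], and the value 30 are assumptions made for illustration, not values taken from this guide; confirm the appropriate section for your deployment before applying anything similar.

```shell
# Hypothetical helper: prints a Ceph configuration file fragment that sets
# mds_beacon_grace to the number of seconds passed as the first argument.
# Placing the option under [global] is an assumption.
beacon_grace_fragment() {
    printf '[global]\nmds_beacon_grace = %s\n' "$1"
}

# Print a fragment with an illustrative 30-second grace period.
beacon_grace_fragment 30
```

The sketch only prints text; it does not modify any configuration file.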
2.3. Metadata Server ranks
Each Ceph File System (CephFS) has a number of ranks, one by default. Rank numbering starts at zero.
Ranks define how the metadata workload is shared between multiple Metadata Server (MDS) daemons. The number of ranks is the maximum number of MDS daemons that can be active at one time. Each active MDS daemon handles the subset of the CephFS metadata that is assigned to its rank.
Each MDS daemon initially starts without a rank. The Ceph Monitor assigns a rank to the daemon. The MDS daemon can only hold one rank at a time. Daemons only lose ranks when they are stopped.
The max_mds setting controls how many ranks will be created.
The actual number of ranks in the CephFS is only increased if a spare daemon is available to accept the new rank.
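The relationship between max_mds and the number of ranks that actually exist can be sketched as simple arithmetic. This is a simplified model, not Ceph's implementation: the function name effective_ranks and its inputs are hypothetical, and it assumes every running MDS daemon is eligible to take a rank.

```shell
# Simplified model: the number of ranks that can be up is the smaller of
# max_mds and the number of MDS daemons available to hold a rank.
effective_ranks() {
    max_mds=$1      # value of the max_mds setting
    daemons=$2      # running MDS daemons, active plus standby
    if [ "$max_mds" -lt "$daemons" ]; then
        echo "$max_mds"
    else
        echo "$daemons"
    fi
}

effective_ranks 2 3   # three daemons, max_mds 2: two ranks, one daemon stays standby
effective_ranks 4 3   # only three daemons: the rank count stops at three
```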
Rank States
Ranks can be:
- Up - A rank that is assigned to the MDS daemon.
- Failed - A rank that is not associated with any MDS daemon.
- Damaged - A rank that is damaged; its metadata is corrupted or missing. Damaged ranks are not assigned to any MDS daemons until the operator fixes the problem and uses the ceph mds repaired command on the damaged rank.
2.4. Metadata Server cache size limits
You can limit the size of the Ceph File System (CephFS) Metadata Server (MDS) cache by:
- A memory limit: Use the mds_cache_memory_limit option. Red Hat recommends a value between 8 GB and 64 GB for mds_cache_memory_limit. Setting more cache can cause issues with recovery. This limit is approximately 66% of the desired maximum memory use of the MDS.
  Important: Red Hat recommends using memory limits instead of inode count limits.
- Inode count: Use the mds_cache_size option. By default, limiting the MDS cache by inode count is disabled.
In addition, you can specify a cache reservation for MDS operations by using the mds_cache_reservation option. The cache reservation is expressed as a percentage of the memory or inode limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory in its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit, because it recalls old state from clients in order to drop unused metadata from its cache.
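To make the arithmetic concrete, the sketch below derives a cache limit from a desired maximum MDS memory use with the approximate 66% ratio mentioned above, then computes the default 5% reservation. The function names and the 24 GiB input are illustrative assumptions; the resulting byte count is only a starting point for mds_cache_memory_limit, not a recommendation.

```shell
# Approximate cache limit in bytes for a desired maximum MDS memory use.
cache_limit_bytes() {
    desired_gib=$1   # desired maximum MDS memory use in GiB (illustrative)
    echo $(( desired_gib * 1024 * 1024 * 1024 * 66 / 100 ))
}

# Memory held in reserve for new metadata operations, at the 5% default.
cache_reserve_bytes() {
    limit_bytes=$1   # cache limit in bytes
    echo $(( limit_bytes * 5 / 100 ))
}

limit=$(cache_limit_bytes 24)
reserve=$(cache_reserve_bytes "$limit")
echo "candidate mds_cache_memory_limit: $limit bytes"
echo "reserved at 5%: $reserve bytes"
```

Integer division rounds down, which is harmless here because the 66% figure is itself approximate.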
The mds_cache_reservation option replaces the mds_health_cache_threshold option in all situations, except when MDS nodes send a health alert to the Ceph Monitors indicating the cache is too large. By default, mds_health_cache_threshold is 150% of the maximum cache size.
Be aware that the cache limit is not a hard limit. Potential bugs in the CephFS client or MDS, or misbehaving applications, might cause the MDS to exceed its cache size. The mds_health_cache_threshold option configures the storage cluster health warning message, so that operators can investigate why the MDS cannot shrink its cache.
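A simplified model of the warning condition, assuming the default threshold of 150%: the hypothetical function below compares a cache size against a limit and reports whether the cache is large enough to raise the health warning. The real check runs inside the MDS; this only illustrates the ratio.

```shell
# Report "warn" when the cache exceeds 150% of its limit, otherwise "ok".
cache_health_state() {
    cache_size=$1    # current cache size
    cache_limit=$2   # configured limit, in the same unit as cache_size
    # Integer comparison equivalent to cache_size > 1.5 * cache_limit.
    if [ $(( cache_size * 100 )) -gt $(( cache_limit * 150 )) ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

cache_health_state 12 10   # 120% of the limit: over the limit, no warning yet
cache_health_state 16 10   # 160% of the limit: warning
```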
Additional Resources
- See the Metadata Server daemon configuration reference section in the Red Hat Ceph Storage File System Guide for more information.
2.5. Configuring multiple active Metadata Server daemons
Configure multiple active Metadata Server (MDS) daemons to scale metadata performance for large systems.
Do not convert all standby MDS daemons to active ones. A Ceph File System (CephFS) requires at least one standby MDS daemon to remain highly available.
The scrubbing process is not currently supported when multiple active MDS daemons are configured.
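Before raising max_mds, it can help to confirm that at least one standby daemon would remain. A minimal sketch, assuming you already know the total number of running MDS daemons; the function name check_max_mds is hypothetical and the check does not query the cluster.

```shell
# Refuse a max_mds value that would leave no standby MDS daemon.
check_max_mds() {
    total_daemons=$1    # running MDS daemons, active plus standby
    wanted_max_mds=$2   # value you intend to pass to "ceph fs set NAME max_mds"
    if [ "$wanted_max_mds" -ge "$total_daemons" ]; then
        echo "refuse: max_mds=$wanted_max_mds leaves no standby MDS daemon"
        return 1
    fi
    echo "ok: $(( total_daemons - wanted_max_mds )) standby MDS daemon(s) remain"
}

check_max_mds 3 2
```

With three daemons, a max_mds of 2 passes the check, while a value of 3 would be refused with a nonzero return status.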
Prerequisites
- Ceph administration capabilities on the MDS node.
Procedure
Set the max_mds parameter to the desired number of active MDS daemons:
Syntax
ceph fs set NAME max_mds NUMBER
Example
[root@mon ~]# ceph fs set cephfs max_mds 2
This example increases the number of active MDS daemons to two in the CephFS called cephfs.
Note: Ceph only increases the actual number of ranks in the CephFS if a spare MDS daemon is available to take the new rank.
Verify the number of active MDS daemons:
Syntax
ceph fs status NAME
Example
[root@mon ~]# ceph fs status cephfs
cephfs - 0 clients
======
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
|  1   | active | node2 | Reqs:    0 /s |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  4638 | 26.7G |
|   cephfs_data   |   data   |    0  | 26.7G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    node3    |
+-------------+
Additional Resources
- See the Metadata Server daemon states section in the Red Hat Ceph Storage File System Guide for more details.
- See the Decreasing the Number of Active MDS Daemons section in the Red Hat Ceph Storage File System Guide for more details.
- See the Managing Ceph users section in the Red Hat Ceph Storage Administration Guide for more details.
2.6. Configuring the number of standby daemons
Each Ceph File System (CephFS) can specify the required number of standby daemons to be considered healthy. This number also includes the standby-replay daemon waiting for a rank failure.
Prerequisites
- User access to the Ceph Monitor node.
Procedure
Set the expected number of standby daemons for a particular CephFS:
Syntax
ceph fs set FS_NAME standby_count_wanted NUMBER
Note: Setting NUMBER to zero disables the daemon health check.
Example
[root@mon]# ceph fs set cephfs standby_count_wanted 2
This example sets the expected standby daemon count to two.
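The effect of standby_count_wanted can be modeled as a comparison between the available and the wanted standby count. The sketch below is a simplified illustration with a hypothetical function name, not the monitor's actual logic; it shows in particular that a wanted count of zero disables the check.

```shell
# Model of the standby health check for one file system.
standby_check() {
    available=$1   # standby daemons currently available
    wanted=$2      # value of standby_count_wanted
    if [ "$wanted" -eq 0 ]; then
        echo "check disabled"
    elif [ "$available" -lt "$wanted" ]; then
        echo "warning: insufficient standby daemons"
    else
        echo "healthy"
    fi
}

standby_check 1 2   # fewer standbys than wanted
standby_check 2 2   # requirement met
standby_check 0 0   # zero disables the check
```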
2.7. Configuring the standby-replay Metadata Server
Configure each Ceph File System (CephFS) by adding a standby-replay Metadata Server (MDS) daemon. Doing this reduces failover time if the active MDS becomes unavailable.
This specific standby-replay daemon follows the active MDS’s metadata journal. The standby-replay daemon is only used by the active MDS of the same rank, and is not available to other ranks.
If using standby-replay, then every active MDS must have a standby-replay daemon.
Prerequisites
- User access to the Ceph Monitor node.
Procedure
Enable standby-replay for a particular CephFS:
Syntax
ceph fs set FS_NAME allow_standby_replay 1
Example
[root@mon]# ceph fs set cephfs allow_standby_replay 1
In this example, the Boolean value is 1, which enables the standby-replay daemons to be assigned to the active Ceph MDS daemons.
Note: Setting the allow_standby_replay Boolean value back to 0 only prevents new standby-replay daemons from being assigned. To also stop the running daemons, mark them as failed with the ceph mds fail command.
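Put together, turning standby-replay off might involve the two kinds of commands printed by this dry-run sketch. It only echoes the commands for review and does not contact the cluster; the helper name disable_standby_replay_cmds and the daemon name node3 are hypothetical.

```shell
# Dry run: print the commands that would disable standby-replay for a file
# system and fail the given standby-replay daemons. Nothing is executed.
disable_standby_replay_cmds() {
    fs_name=$1
    shift
    echo "ceph fs set $fs_name allow_standby_replay 0"
    for daemon in "$@"; do
        echo "ceph mds fail $daemon"
    done
}

# Illustrative call: file system "cephfs", one standby-replay daemon "node3".
disable_standby_replay_cmds cephfs node3
```

Review the printed commands and run them manually only after confirming which daemons are currently in standby-replay.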
Additional Resources
- See the Using the ceph mds fail command section in the Red Hat Ceph Storage File System Guide for details.
2.8. Decreasing the number of active Metadata Server daemons
This section describes how to decrease the number of active Ceph File System (CephFS) Metadata Server (MDS) daemons.
Prerequisites
- The rank that you will remove must be active first, meaning that you must have the same number of MDS daemons as specified by the max_mds parameter.
Procedure
Verify that the number of active MDS daemons matches the value specified by the max_mds parameter:
Syntax
ceph fs status NAME
Example
[root@mon ~]# ceph fs status cephfs
cephfs - 0 clients
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
|  1   | active | node2 | Reqs:    0 /s |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  4638 | 26.7G |
|   cephfs_data   |   data   |    0  | 26.7G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    node3    |
+-------------+
On a node with administration capabilities, change the max_mds parameter to the desired number of active MDS daemons:
Syntax
ceph fs set NAME max_mds NUMBER
Example
[root@mon ~]# ceph fs set cephfs max_mds 1
Wait for the storage cluster to stabilize to the new max_mds value by watching the Ceph File System status. Verify the number of active MDS daemons:
Syntax
ceph fs status NAME
Example
[root@mon ~]# ceph fs status cephfs
cephfs - 0 clients
+------+--------+-------+---------------+-------+-------+
| Rank | State  |  MDS  |    Activity   |  dns  |  inos |
+------+--------+-------+---------------+-------+-------+
|  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
+------+--------+-------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  4638 | 26.7G |
|   cephfs_data   |   data   |    0  | 26.7G |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    node3    |
|    node2    |
+-------------+
Additional Resources
- See the Metadata Server daemon states section in the Red Hat Ceph Storage File System Guide.
- See the Configuring multiple active Metadata Server daemons section in the Red Hat Ceph Storage File System Guide.
2.9. Additional Resources
- See the Installing Metadata servers section of the Red Hat Ceph Storage Installation Guide for details.
- See the Red Hat Ceph Storage Installation Guide for details on installing a Red Hat Ceph Storage cluster.