Chapter 2. Configuring Metadata Server Daemons
This chapter explains how to configure Ceph Metadata Server (MDS) daemons.
- To understand different states of MDS daemons, see Section 2.2, “States of Metadata Server Daemons”.
- To understand what a "rank" means in MDS configuration, see Section 2.3, “Explanation of Ranks in Metadata Server Configuration”.
- To learn about various configuration types of standby MDS daemons, see Section 2.4, “Types of Standby Configuration”.
- To configure standby MDS daemons, see Section 2.5, “Configuring Standby Metadata Server Daemons”.
- To configure multiple active MDS daemons, see Section 2.6, “Configuring Multiple Active Metadata Server Daemons”.
- To decrease the number of active MDS daemons, see Section 2.7, “Decreasing the Number of Active MDS Daemons”.
- To learn about MDS cache size limits, see Section 2.8, “Understanding MDS Cache Size Limits”.
Starting with Red Hat Ceph Storage 3.2, the ceph-mds and ceph-fuse daemons can run with SELinux in enforcing mode.
2.1. Prerequisites
- Deploy a Ceph Storage Cluster if you do not have one. For details, see the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- Install Ceph Metadata Server daemons (ceph-mds). For details, see the Installation Guide for Red Hat Enterprise Linux or Ubuntu.
2.2. States of Metadata Server Daemons
This section explains two different modes of Metadata Server (MDS) daemons and how a daemon in one mode starts operating in the other mode.
The MDS daemons can be:
- Active
- Standby
The active MDS daemon manages the metadata for files and directories stored on the Ceph File System. The standby MDS daemons serve as backup daemons and become active when an active MDS daemon becomes unresponsive.
By default, a Ceph File System uses only one active MDS daemon. However, you can configure the file system to use multiple active MDS daemons to scale metadata performance for larger workloads. The active MDS daemons will share the metadata workload with one another dynamically when metadata load patterns change. Typically, systems with many clients benefit from multiple active MDS daemons. Note that systems with multiple active MDS daemons still require standby MDS daemons to remain highly available.
What Happens When the Active MDS Daemon Fails
When the active MDS becomes unresponsive, a Monitor waits the number of seconds specified by the mds_beacon_grace option. Then the Monitor marks the MDS daemon as laggy, and one of the standby daemons becomes active, depending on the configuration.
To change the value of mds_beacon_grace, add this option to the Ceph configuration file and specify the new value.
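As a minimal sketch of such a change, the fragment below sets the option in the Ceph configuration file. Placing it in the [global] section, so that both Monitors and MDS daemons read the same value, is an assumption of this example, and the value 30 is purely illustrative.

```ini
# Hypothetical excerpt from the Ceph configuration file.
[global]
# Seconds without a beacon before a Monitor marks an MDS daemon as laggy.
# 30 is an illustrative value only, not a recommendation.
mds_beacon_grace = 30
```

A larger value makes the Monitors more tolerant of a slow MDS daemon but delays failover to a standby.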
2.3. Explanation of Ranks in Metadata Server Configuration
Each Ceph File System has a number of ranks, one by default. Rank numbering starts at zero.
Ranks define how the metadata workload is shared between multiple Metadata Server (MDS) daemons. The number of ranks is the maximum number of MDS daemons that can be active at one time. Each MDS daemon handles the subset of the Ceph File System metadata that is assigned to its rank.
Each MDS daemon initially starts without a rank. The Monitor assigns a rank to the daemon. An MDS daemon can only hold one rank at a time. Daemons only lose ranks when they are stopped.
The max_mds setting controls how many ranks will be created.
The actual number of ranks in the Ceph File System is only increased if a spare daemon is available to accept the new rank.
Rank States
Ranks can be:
- Up - A rank that is assigned to an MDS daemon.
- Failed - A rank that is not associated with any MDS daemon.
- Damaged - A rank that is damaged; its metadata is corrupted or missing. Damaged ranks will not be assigned to any MDS daemons until an operator fixes the problem and uses the ceph mds repaired command on the damaged rank.
2.4. Types of Standby Configuration
This section describes the types of standby daemon configuration.
Prerequisites
- Familiarize yourself with the meaning of rank in Ceph File System context. See Section 2.3, “Explanation of Ranks in Metadata Server Configuration” for details.
Configuration Parameters
By default, all Metadata Server daemons that do not hold a rank are standby daemons for any active daemon. However, you can configure how the MDS daemons behave in standby mode by using the following parameters in the Ceph configuration file.
- mds_standby_replay (Standby Replay)
- mds_standby_for_name (Standby for Name)
- mds_standby_for_rank (Standby for Rank)
- mds_standby_for_fscid (Standby for FSCID)
You can set these parameters in the Ceph configuration file on the host where the MDS daemon runs as opposed to the one on the Monitor node. The MDS daemon loads these settings when it starts and sends them to the Monitor node.
Standby Replay
When the mds_standby_replay option is set to true for a daemon, this daemon will continuously read the metadata journal of a rank associated with another MDS daemon (the up rank). This behavior gives the standby replay daemon a more recent metadata cache and makes the failover process faster if the daemon serving the rank fails.
An up rank can only have one standby replay daemon assigned to it. If two daemons are both set to be standby replay, then one of them becomes a normal non-replay standby daemon.
If the mon_force_standby_active option is set to false, a standby replay daemon is only used as a standby for the rank that it is following. If another rank fails, the standby replay daemon will not be used as a replacement, even if no other standby daemons are available. By default, mon_force_standby_active is set to true.
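The two options described above can be combined as in the following sketch. The daemon name a is hypothetical, and placing mon_force_standby_active under a [mon] section of the configuration file read by the Monitors is an assumption of this example.

```ini
# In the configuration file on the MDS host. "a" is a hypothetical daemon name.
[mds.a]
# Continuously replay the journal of an up rank to keep a warm cache.
mds_standby_replay = true

# In the configuration file read by the Monitors (section choice is an assumption).
[mon]
# Use a standby replay daemon only for the rank it is following.
mon_force_standby_active = false
```

With this combination, the daemon a fails over faster for the rank it follows, but it is not used to replace any other failed rank.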
Standby for Name
Each daemon has a static name that is set by the administrator when configuring the daemon for the first time. Usually, the host name of the host where the daemon runs is used as the daemon name.
When the mds_standby_for_name option is set, the standby daemon takes over a failed rank only if the name of the daemon that previously held the rank matches the given name.
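A minimal sketch of this setting follows. The daemon names a and b are hypothetical and stand for whatever names were assigned when the daemons were first configured.

```ini
# "a" and "b" are hypothetical daemon names.
[mds.b]
# mds.b takes over a failed rank only if the daemon named "a" previously held it.
mds_standby_for_name = a
```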
Standby for Rank
Set the mds_standby_for_rank option to configure the standby daemon to only take over the specified rank. If another rank fails, this daemon will not replace it.
If you have multiple file systems, use this option in conjunction with the mds_standby_for_fscid option to specify which file system rank you target.
Standby for FSCID
The File System Cluster ID (FSCID) is an integer ID specific to a Ceph File System.
If the mds_standby_for_fscid option is used in conjunction with mds_standby_for_rank, it only specifies which file system's rank is referred to.
If mds_standby_for_rank is not set, then setting mds_standby_for_fscid causes the standby daemon to target any rank in the specified FSCID. Use mds_standby_for_fscid if you want to use the standby daemon for any rank, but only within a particular file system.
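The two ways of using mds_standby_for_fscid can be contrasted in a short sketch. The daemon names c and d and the FSCID value 1 are hypothetical.

```ini
# "c" and "d" are hypothetical daemon names; 1 is a hypothetical FSCID.
[mds.c]
# Rank and FSCID together: stand by only for rank 0 of the file system with FSCID 1.
mds_standby_for_rank = 0
mds_standby_for_fscid = 1

[mds.d]
# FSCID alone: stand by for any rank, but only within the file system with FSCID 1.
mds_standby_for_fscid = 1
```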
2.5. Configuring Standby Metadata Server Daemons
This section describes how to configure Metadata Server (MDS) daemons in standby mode to better manage a failure of the active MDS daemon.
Procedure
Edit the Ceph configuration file. You can edit the main Ceph configuration file present on all nodes, or you can use different configuration files on each MDS node that contain just configuration related to that node. Use parameters described in Section 2.4, “Types of Standby Configuration”.
For example, to configure two MDS daemons a and b acting as a pair, where whichever one is not currently assigned a rank will be the standby replay follower of the other:
    [mds.a]
    mds_standby_replay = true
    mds_standby_for_rank = 0

    [mds.b]
    mds_standby_replay = true
    mds_standby_for_rank = 0
For example, to configure four MDS daemons (a, b, c, and d) on two Ceph File Systems, where each File System has a pair of daemons:
    [mds.a]
    mds_standby_for_fscid = 1

    [mds.b]
    mds_standby_for_fscid = 1

    [mds.c]
    mds_standby_for_fscid = 2

    [mds.d]
    mds_standby_for_fscid = 2
2.6. Configuring Multiple Active Metadata Server Daemons
This section describes how to configure multiple active Metadata Server (MDS) daemons to scale metadata performance for large systems.
Do not convert all standby MDS daemons to active ones. A Ceph File System requires at least one standby MDS daemon to remain highly available.
The scrubbing process is not currently supported when multiple active MDS daemons are configured.
Procedure
On a node with administration capabilities, set the max_mds parameter to the desired number of active MDS daemons. Note that Ceph only increases the actual number of ranks in the Ceph File System if a spare MDS daemon is available to take the new rank.
    ceph fs set <name> max_mds <number>
For example, to increase the number of active MDS daemons to two in the Ceph File System called cephfs:
    [root@monitor ~]# ceph fs set cephfs max_mds 2
Verify the number of active MDS daemons.
    ceph fs status <name>
Specify the name of the Ceph File System, for example:
    [root@monitor ~]# ceph fs status cephfs
    cephfs - 0 clients
    ======
    +------+--------+-------+---------------+-------+-------+
    | Rank | State  |  MDS  |    Activity   |  dns  |  inos |
    +------+--------+-------+---------------+-------+-------+
    |  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
    |  1   | active | node2 | Reqs:    0 /s |   10  |   12  |
    +------+--------+-------+---------------+-------+-------+
    +-----------------+----------+-------+-------+
    |       Pool      |   type   |  used | avail |
    +-----------------+----------+-------+-------+
    | cephfs_metadata | metadata |  4638 | 26.7G |
    |   cephfs_data   |   data   |    0  | 26.7G |
    +-----------------+----------+-------+-------+
    +-------------+
    | Standby MDS |
    +-------------+
    |    node3    |
    +-------------+
2.7. Decreasing the Number of Active MDS Daemons
This section describes how to decrease the number of active MDS daemons.
Prerequisites
The rank that you will remove must be active first, meaning that you must have the same number of MDS daemons as specified by the max_mds parameter.
    ceph fs status <name>
Specify the name of the Ceph File System, for example:
    [root@monitor ~]# ceph fs status cephfs
    cephfs - 0 clients
    ======
    +------+--------+-------+---------------+-------+-------+
    | Rank | State  |  MDS  |    Activity   |  dns  |  inos |
    +------+--------+-------+---------------+-------+-------+
    |  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
    |  1   | active | node2 | Reqs:    0 /s |   10  |   12  |
    +------+--------+-------+---------------+-------+-------+
    +-----------------+----------+-------+-------+
    |       Pool      |   type   |  used | avail |
    +-----------------+----------+-------+-------+
    | cephfs_metadata | metadata |  4638 | 26.7G |
    |   cephfs_data   |   data   |    0  | 26.7G |
    +-----------------+----------+-------+-------+
    +-------------+
    | Standby MDS |
    +-------------+
    |    node3    |
    +-------------+
Procedure
On a node with administration capabilities, change the max_mds parameter to the desired number of active MDS daemons.
    ceph fs set <name> max_mds <number>
For example, to decrease the number of active MDS daemons to one in the Ceph File System called cephfs:
    [root@monitor ~]# ceph fs set cephfs max_mds 1
Deactivate the active MDS daemon:
    ceph mds deactivate <role>
Replace <role> with "name of the Ceph File System:rank", "FSID:rank", or just rank. For example, to deactivate the MDS daemon with rank 1 on the Ceph File System named cephfs:
    [root@monitor ~]# ceph mds deactivate cephfs:1
    telling mds.1:1 127.0.0.1:6800/3187061458 to deactivate
Verify the number of active MDS daemons.
    ceph fs status <name>
Specify the name of the Ceph File System, for example:
    [root@monitor ~]# ceph fs status cephfs
    cephfs - 0 clients
    ======
    +------+--------+-------+---------------+-------+-------+
    | Rank | State  |  MDS  |    Activity   |  dns  |  inos |
    +------+--------+-------+---------------+-------+-------+
    |  0   | active | node1 | Reqs:    0 /s |   10  |   12  |
    +------+--------+-------+---------------+-------+-------+
    +-----------------+----------+-------+-------+
    |       Pool      |   type   |  used | avail |
    +-----------------+----------+-------+-------+
    | cephfs_metadata | metadata |  4638 | 26.7G |
    |   cephfs_data   |   data   |    0  | 26.7G |
    +-----------------+----------+-------+-------+
    +-------------+
    | Standby MDS |
    +-------------+
    |    node3    |
    |    node2    |
    +-------------+
2.8. Understanding MDS Cache Size Limits
This section describes ways to limit MDS cache size.
You can limit the size of the Metadata Server (MDS) cache by:
- A memory limit: A new behavior introduced in Red Hat Ceph Storage 3. Use the mds_cache_memory_limit parameter. Red Hat recommends using memory limits instead of inode count limits.
- Inode count: Use the mds_cache_size parameter. By default, limiting the MDS cache by inode count is disabled.
In addition, you can specify a cache reservation for MDS operations by using the mds_cache_reservation parameter. The cache reservation is expressed as a percentage of the memory or inode limit and is set to 5% by default. The intent of this parameter is to have the MDS maintain an extra reserve of memory in its cache for new metadata operations to use. As a consequence, the MDS should in general operate below its memory limit because it will recall old state from clients in order to drop unused metadata in its cache.
The mds_cache_reservation parameter replaces the mds_health_cache_threshold parameter in all situations except when MDS nodes send a health alert to the Monitors indicating the cache is too large. By default, mds_health_cache_threshold is 150% of the maximum cache size.
Be aware that the cache limit is not a hard limit. Potential bugs in the CephFS client or MDS, or misbehaving applications, might cause the MDS to exceed its cache size. The mds_health_cache_threshold parameter configures the cluster health warning message so that operators can investigate why the MDS cannot shrink its cache.
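To make the relationship between the three parameters concrete, the following sketch sets a memory limit and spells out the derived reservation and warning threshold in comments. The values are illustrative, not recommendations, and the sketch assumes that the memory limit is given in bytes and that the reservation and threshold are given as fractions of the limit (0.05 for 5%, 1.5 for 150%).

```ini
# Hypothetical [mds] section; all values are illustrative only.
[mds]
# Cache memory limit in bytes: 2 GiB = 2 * 1024^3 = 2147483648.
mds_cache_memory_limit = 2147483648
# Reservation as a fraction of the limit: 5% of 2 GiB is roughly 102 MiB kept free.
mds_cache_reservation = 0.05
# Health warning threshold as a multiple of the limit: 150% of 2 GiB is 3 GiB.
mds_health_cache_threshold = 1.5
```

Under these assumptions, the MDS tries to keep its cache near 1.9 GiB, and the cluster raises a health warning only if the cache grows past 3 GiB.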