Chapter 5. Troubleshooting


5.1. CephFS Health Messages

Cluster health checks

The Ceph monitor daemons generate health messages in response to certain states of the MDS cluster. The following list describes the cluster health messages.

mds rank(s) <ranks> have failed
One or more MDS ranks are not currently assigned to any MDS daemon. The cluster will not recover until a suitable replacement daemon starts.
mds rank(s) <ranks> are damaged
One or more MDS ranks have encountered severe damage to their stored metadata and cannot start again until the metadata is repaired.
mds cluster is degraded
One or more MDS ranks are not currently up and running. Clients might pause metadata I/O until this situation is resolved. This state includes ranks that are failed or damaged, as well as ranks that are running on an MDS but are not yet in the active state, for example ranks in the replay state.
mds <names> are laggy
The MDS daemons are expected to send beacon messages to the monitor at the interval specified by the mds_beacon_interval option (default is 4 seconds). If an MDS daemon fails to send a message within the time specified by the mds_beacon_grace option (default is 15 seconds), the Ceph monitor marks the MDS daemon as laggy and automatically replaces it with a standby daemon, if one is available.
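The laggy check described above can be sketched as a simple comparison. This is an illustrative model only, not the monitor's actual code: the function name is_laggy is hypothetical, and treating a gap of exactly the grace period as still healthy is an assumption.

```python
# Defaults quoted in the text above.
MDS_BEACON_INTERVAL = 4.0  # seconds between beacons
MDS_BEACON_GRACE = 15.0    # seconds of silence the monitor tolerates


def is_laggy(last_beacon, now, grace=MDS_BEACON_GRACE):
    """Return True if no beacon has arrived within the grace period.

    Both timestamps are in seconds. The strict comparison at the
    boundary is an assumption made for this sketch.
    """
    return (now - last_beacon) > grace
```

With the defaults, a daemon can miss three consecutive beacons (12 seconds of silence) without being marked laggy, because 12 is still below the 15-second grace period.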
Daemon-reported health checks

The MDS daemons can identify a variety of unwanted conditions and report them in the output of the ceph status command. These conditions have human-readable messages and, additionally, a unique code starting with MDS_HEALTH that appears in JSON output. The following list describes the daemon messages, their codes, and their meaning.
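Because the codes share the MDS_HEALTH prefix, a script can pick them out of JSON output without depending on the exact layout. The following is a minimal sketch under stated assumptions: the sample document and the code name MDS_HEALTH_EXAMPLE are invented for illustration and do not reflect the real structure of ceph status output.

```python
import json


def find_mds_health_codes(node, prefix="MDS_HEALTH"):
    """Recursively collect every string key or value that starts with prefix."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and key.startswith(prefix):
                found.add(key)
            found |= find_mds_health_codes(value, prefix)
    elif isinstance(node, list):
        for item in node:
            found |= find_mds_health_codes(item, prefix)
    elif isinstance(node, str) and node.startswith(prefix):
        found.add(node)
    return found


# Hypothetical JSON; the layout and the code name are assumptions.
sample = json.loads(
    '{"health": {"summary": [{"code": "MDS_HEALTH_EXAMPLE", "message": "..."}]}}'
)
codes = find_mds_health_codes(sample)
```

Searching the whole tree trades precision for robustness: it keeps working if the codes move to a different position in the document.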

"Behind on trimming…​"


CephFS maintains a metadata journal that is divided into log segments. The length of the journal (in number of segments) is controlled by the mds_log_max_segments setting. When the number of segments exceeds that setting, the MDS starts writing back metadata so that it can remove (trim) the oldest segments. If this process is too slow, or a software bug is preventing trimming, this health message appears. The message appears when the number of segments reaches double the value of mds_log_max_segments.
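The two thresholds in this description can be sketched as follows. The function name is hypothetical, the value 30 used in the usage note is an arbitrary example rather than a documented default, and whether exactly double the limit already triggers the warning is an assumption.

```python
def journal_state(num_segments, mds_log_max_segments):
    """Return (trimming_needed, warning_raised) for a journal length.

    Trimming starts once the journal exceeds the limit. The warning is
    modeled as firing at double the limit; the inclusive boundary is an
    assumption of this sketch.
    """
    trimming_needed = num_segments > mds_log_max_segments
    warning_raised = num_segments >= 2 * mds_log_max_segments
    return trimming_needed, warning_raised
```

For example, with a limit of 30 segments, a journal of 45 segments is being trimmed but does not yet raise the warning, while a journal of 60 segments does.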

"Client <name> failing to respond to capability release"


CephFS clients are issued capabilities by the MDS. The capabilities work like locks. Sometimes, for example when another client needs access, the MDS requests clients to release their capabilities. If the client is unresponsive, it might fail to do so promptly or fail to do so at all. This message appears if a client has taken a longer time to comply than the time specified by the mds_revoke_cap_timeout option (default is 60 seconds).
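As a rough illustration of the timeout rule, the following sketch lists the clients that have been holding a revoked capability for longer than the timeout. The function name, the input mapping, and the client names are hypothetical; the MDS tracks this state internally.

```python
def slow_to_release(seconds_since_revoke, timeout=60.0):
    """Return the sorted names of clients that exceeded the timeout.

    seconds_since_revoke maps a client name to the number of seconds
    since the MDS asked that client to release a capability.
    The default of 60 seconds comes from the text above.
    """
    return sorted(
        name
        for name, waited in seconds_since_revoke.items()
        if waited > timeout
    )
```

Calling slow_to_release({"client.a": 75.0, "client.b": 10.0}) returns ["client.a"], because only that client has waited longer than 60 seconds.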

"Client <name> failing to respond to cache pressure"


Clients maintain a metadata cache. Items in the client cache, such as inodes, are also pinned in the MDS cache. When the MDS needs to shrink its cache to stay within the size specified by the mds_cache_size option, it sends messages to clients asking them to shrink their caches too. If a client is unresponsive, it can prevent the MDS from staying within its cache size, and the MDS might eventually run out of memory and terminate unexpectedly. This message appears if a client has taken longer to comply than the time specified by the mds_recall_state_timeout option (default is 60 seconds).

"Client <name> failing to advance its oldest client/flush tid"


The CephFS protocol for communicating between clients and MDS servers uses a field called oldest tid to inform the MDS of which client requests are fully complete so that the MDS can forget about them. If an unresponsive client fails to advance this field, the MDS might be prevented from properly cleaning up resources used by client requests. This message appears if a client has more requests than the number specified by the max_completed_requests option (default is 100000) that are complete on the MDS side but have not yet been accounted for in the client's oldest tid value.
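The bookkeeping described above can be pictured with a toy model. This is a sketch, not the MDS implementation: the class and method names are hypothetical, and the rule that the MDS forgets every completed request with a tid below the client's oldest tid is an assumption made for the illustration.

```python
class CompletedRequests:
    """Toy model of the completed requests the MDS remembers for one client."""

    def __init__(self, max_completed_requests=100_000):
        self.limit = max_completed_requests
        self.completed = set()

    def complete(self, tid):
        # The MDS finished this request but must remember it for now.
        self.completed.add(tid)

    def advance_oldest_tid(self, oldest_tid):
        # Assumption: requests older than the client's oldest tid
        # can be forgotten.
        self.completed = {t for t in self.completed if t >= oldest_tid}

    def should_warn(self):
        # "More requests than" the limit, so the comparison is strict.
        return len(self.completed) > self.limit
```

A client that keeps completing requests without advancing its oldest tid makes the set grow without bound, which is the condition the health message reports.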

"Metadata damage detected"


Corrupt or missing metadata was encountered when reading from the metadata pool. This message indicates that the damage was sufficiently isolated for the MDS to continue operating, although client accesses to the damaged subtree return I/O errors. Use the damage ls administration socket command to view details on the damage. This message appears as soon as any damage is encountered.
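As a sketch of how a script might issue the damage ls command, the helper below builds the command line for an administration socket call. It assumes the general ceph daemon <daemon> <command> form of the admin socket interface and must run on the host where the MDS daemon is running; the MDS name "a" in the usage note and both function names are hypothetical.

```python
import subprocess


def admin_socket_argv(mds_name, *command):
    """Build the argument list for an MDS administration socket command."""
    return ["ceph", "daemon", f"mds.{mds_name}", *command]


def run_admin_socket(mds_name, *command):
    """Run the command and return its standard output as text.

    Raises subprocess.CalledProcessError if the command fails.
    """
    result = subprocess.run(
        admin_socket_argv(mds_name, *command),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For an MDS named "a", run_admin_socket("a", "damage", "ls") would execute ceph daemon mds.a damage ls and return whatever the daemon prints.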

"MDS in read-only mode"


The MDS has entered read-only mode and returns the EROFS error code to client operations that attempt to modify any metadata. The MDS enters read-only mode in either of the following cases:

  • If it encounters a write error while writing to the metadata pool.
  • If the administrator forces the MDS to enter into read-only mode by using the force_readonly administration socket command.
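On the client side, an application can recognize this condition by checking for EROFS on a failed operation. The following is a minimal sketch; the helper name is hypothetical, and it applies to any file system that returns EROFS, not only CephFS.

```python
import errno


def is_read_only_error(exc):
    """Return True if exc is an OSError carrying the EROFS error code."""
    return isinstance(exc, OSError) and exc.errno == errno.EROFS
```

An application would typically call this helper in an except OSError handler around a write, create, or rename, and then report the read-only state instead of retrying.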
"<N> slow requests are blocked"


One or more client requests have not completed promptly, which indicates that the MDS is either running very slowly or encountering a bug. Use the ops administration socket command to list outstanding metadata operations. This message appears if any client request has taken longer than the time specified by the mds_op_complaint_time option (default is 30 seconds).

"Too many inodes in cache"


The MDS has failed to trim its cache to comply with the limit set by the administrator. If the MDS cache becomes too large, the daemon might exhaust available memory and terminate unexpectedly. This message appears if the actual cache size in inodes is at least 50% greater than the value specified by the mds_cache_size option (default is 100000).
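The 50% rule can be written out as a short check. This is a sketch with a hypothetical function name; it uses integer arithmetic to avoid rounding at the boundary.

```python
def too_many_inodes(inodes_in_cache, mds_cache_size=100_000):
    """Return True if the cache is at least 50% larger than the limit.

    inodes >= 1.5 * limit is rewritten as 2 * inodes >= 3 * limit
    so that only integers are compared.
    """
    return 2 * inodes_in_cache >= 3 * mds_cache_size
```

With the default limit of 100000 inodes, the message would therefore appear once the cache holds 150000 inodes or more.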

© 2024 Red Hat, Inc.