Appendix A. Troubleshooting
A.1. CephFS Health Messages
- Cluster health checks
The Ceph monitor daemons generate health messages in response to certain states of the MDS cluster. Below is the list of the cluster health messages and their explanations.
- mds rank(s) <ranks> have failed
- One or more MDS ranks are not currently assigned to any MDS daemon. The cluster will not recover until a suitable replacement daemon starts.
- mds rank(s) <ranks> are damaged
- One or more MDS ranks have encountered severe damage to their stored metadata and cannot start again until the metadata is repaired.
- mds cluster is degraded
- One or more MDS ranks are not currently up and running. Clients might pause metadata I/O until this situation is resolved. This includes ranks that are failed or damaged, and additionally includes ranks that are running on an MDS but are not in the active state yet, for example ranks in the replay state.
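To see which rank is in which state, you can check the file system status. A minimal sketch (output columns vary by release):
    # ceph fs status
    # ceph health detail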
- mds <names> are laggy
- The MDS daemons are supposed to send beacon messages to the monitor at an interval specified by the mds_beacon_interval option (default is 4 seconds). If an MDS daemon fails to send a message within the time specified by the mds_beacon_grace option (default is 15 seconds), the Ceph monitor marks the MDS daemon as laggy and automatically replaces it with a standby daemon if one is available.
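As a quick check, these options can be inspected or adjusted through the centralized configuration database. A minimal sketch, assuming a release that supports the ceph config commands (the value 30 is an example only):
    # ceph config get mds mds_beacon_grace
    # ceph config set mds mds_beacon_grace 30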
- Daemon-reported health checks
The MDS daemons can identify a variety of unwanted conditions and return them in the output of the ceph status command. These conditions have human-readable messages, and additionally a unique code starting with MDS_HEALTH, which appears in JSON output. Below is the list of the daemon messages, their codes, and explanations.
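To view these checks together with their MDS_HEALTH_* codes, you can request JSON output. A minimal sketch:
    # ceph status --format json-pretty
    # ceph health detail --format json-pretty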
- "Behind on trimming…"
Code: MDS_HEALTH_TRIM
CephFS maintains a metadata journal that is divided into log segments. The length of the journal (in number of segments) is controlled by the mds_log_max_segments setting. When the number of segments exceeds that setting, the MDS starts writing back metadata so that it can remove (trim) the oldest segments. If this process is too slow, or a software bug is preventing trimming, then this health message appears. The threshold for this message to appear is for the number of segments to be double mds_log_max_segments.
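To check the configured limit and the journal counters, something like the following can help. This is only a sketch; mds.a is a placeholder daemon name and the mds_log filter to perf dump is optional:
    # ceph config get mds mds_log_max_segments
    # ceph daemon mds.a perf dump mds_log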
.- "Client <name> failing to respond to capability release"
Code: MDS_HEALTH_CLIENT_LATE_RELEASE, MDS_HEALTH_CLIENT_LATE_RELEASE_MANY
CephFS clients are issued capabilities by the MDS. The capabilities work like locks. Sometimes, for example when another client needs access, the MDS requests that clients release their capabilities. If a client is unresponsive, it might fail to do so promptly or fail to do so at all. This message appears if a client has taken longer to comply than the time specified by the mds_revoke_cap_timeout option (default is 60 seconds).
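To identify which client is holding capabilities, listing the MDS sessions is usually the first step. A minimal sketch (mds.a is a placeholder):
    # ceph daemon mds.a session ls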
- "Client <name> failing to respond to cache pressure"
Code: MDS_HEALTH_CLIENT_RECALL, MDS_HEALTH_CLIENT_RECALL_MANY
Clients maintain a metadata cache. Items, such as inodes, in the client cache are also pinned in the MDS cache. When the MDS needs to shrink its cache to stay within its own cache size limits, the MDS sends messages to clients to shrink their caches too. If a client is unresponsive, it can prevent the MDS from properly staying within its cache size and the MDS might eventually run out of memory and terminate unexpectedly. This message appears if a client has taken more time to comply than the time specified by the mds_recall_state_timeout option (default is 60 seconds). See Section 2.8, “Understanding MDS Cache Size Limits” for details.
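To see how large the MDS cache currently is relative to its limit, the cache status admin socket command can be used. A minimal sketch (mds.a is a placeholder):
    # ceph daemon mds.a cache status
    # ceph config get mds mds_cache_memory_limit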
- "Client <name> failing to advance its oldest client/flush tid"
Code: MDS_HEALTH_CLIENT_OLDEST_TID, MDS_HEALTH_CLIENT_OLDEST_TID_MANY
The CephFS protocol for communicating between clients and MDS servers uses a field called oldest tid to inform the MDS which client requests are fully complete, so that the MDS can forget about them. If an unresponsive client is failing to advance this field, the MDS might be prevented from properly cleaning up the resources used by client requests. This message appears if a client has more requests than the number specified by the max_completed_requests option (default is 100000) that are complete on the MDS side but have not yet been accounted for in the client’s oldest tid value.
- "Metadata damage detected"
Code: MDS_HEALTH_DAMAGE
Corrupt or missing metadata was encountered when reading from the metadata pool. This message indicates that the damage was sufficiently isolated for the MDS to continue operating, although client accesses to the damaged subtree return I/O errors. Use the damage ls administration socket command to view details on the damage. This message appears as soon as any damage is encountered.
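For example, the damage entries can be listed on the affected daemon's admin socket (mds.a is a placeholder):
    # ceph daemon mds.a damage ls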
- "MDS in read-only mode"
Code: MDS_HEALTH_READ_ONLY
The MDS has entered read-only mode and will return the EROFS error code to client operations that attempt to modify any metadata. The MDS enters read-only mode:
- If it encounters a write error while writing to the metadata pool.
- If the administrator forces the MDS to enter read-only mode by using the force_readonly administration socket command.
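For reference, the administrative command mentioned above is invoked through the daemon's admin socket, for example (mds.a is a placeholder):
    # ceph daemon mds.a force_readonly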
- "<N> slow requests are blocked"
Code: MDS_HEALTH_SLOW_REQUEST
One or more client requests have not been completed promptly, indicating that the MDS is either running very slowly or encountering a bug. Use the ops administration socket command to list outstanding metadata operations. This message appears if any client requests have taken longer than the value specified by the mds_op_complaint_time option (default is 30 seconds).
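For example, outstanding operations can be listed on the daemon's admin socket (mds.a is a placeholder):
    # ceph daemon mds.a ops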
- "Too many inodes in cache"
Code: MDS_HEALTH_CACHE_OVERSIZED
The MDS has failed to trim its cache to comply with the limit set by the administrator. If the MDS cache becomes too large, the daemon might exhaust available memory and terminate unexpectedly. This message appears if the MDS cache size is 50% greater than its limit (by default). See Section 2.8, “Understanding MDS Cache Size Limits” for details.
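To verify the current limit and, if appropriate, raise it, a sketch along these lines can be used (the 8 GiB value is an example only):
    # ceph config get mds mds_cache_memory_limit
    # ceph config set mds mds_cache_memory_limit 8589934592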