Chapter 4. Troubleshooting Monitors
This chapter contains information on how to fix the most common errors related to the Ceph Monitors.
Before You Start
- Verify your network connection. See Chapter 3, Troubleshooting Networking Issues for details.
4.1. The Most Common Error Messages Related to Monitors
The following tables list the most common error messages that are returned by the ceph health detail command, or included in the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the problems.
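Error message | See |
---|---|
is down (out of quorum) | Section 4.1.1, “A Monitor Is Out of Quorum” |
clock skew | Section 4.1.2, “Clock Skew” |
store is getting too big! | Section 4.1.3, “The Monitor Store is Getting Too Big” |

Error message | Log file | See |
---|---|---|
clock skew | Main cluster log | Section 4.1.2, “Clock Skew” |
clocks not synchronized | Main cluster log | Section 4.1.2, “Clock Skew” |
Corruption: error in middle of record | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum” and Section 4.3, “Recovering the Monitor Store” |
Corruption: 1 missing files | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum” and Section 4.3, “Recovering the Monitor Store” |
Caught signal (Bus error) | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum” |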
4.1.1. A Monitor Is Out of Quorum
One or more Monitors are marked as down but the other Monitors are still able to form a quorum. In addition, the ceph health detail command returns an error message similar to the following one:
HEALTH_WARN 1 mons down, quorum 1,2 mon.b,mon.c
mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
What This Means
Ceph can mark a Monitor as down for various reasons.
If the ceph-mon daemon is not running, it might have a corrupted store, or some other error might be preventing the daemon from starting. Also, the /var/ partition might be full. As a consequence, ceph-mon is not able to perform any operations on the store, located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db, and terminates.
If the ceph-mon daemon is running but the Monitor is out of quorum and marked as down, the cause of the problem depends on the Monitor state:
- If the Monitor is in the probing state longer than expected, it cannot find the other Monitors. This problem can be caused by networking issues, or the Monitor can have an outdated Monitor map (monmap) and be trying to reach the other Monitors on incorrect IP addresses. Alternatively, if the monmap is up-to-date, the Monitor’s clock might not be synchronized.
- If the Monitor is in the electing state longer than expected, the Monitor’s clock might not be synchronized.
- If the Monitor changes its state from synchronizing to electing and back, the cluster state is advancing. This means that it is generating new maps faster than the synchronization process can handle.
- If the Monitor marks itself as the leader or a peon, then it believes that it is in a quorum, while the remaining cluster is sure that it is not. This problem can be caused by failed clock synchronization.
To Troubleshoot This Problem
Verify that the ceph-mon daemon is running. If not, start it:
systemctl status ceph-mon@<host-name>
systemctl start ceph-mon@<host-name>
Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.
- If you are not able to start ceph-mon, follow the steps in The ceph-mon Daemon Cannot Start.
- If you are able to start the ceph-mon daemon but it is marked as down, follow the steps in The ceph-mon Daemon Is Running, but Still Marked as down.
The ceph-mon Daemon Cannot Start
Check the corresponding Monitor log, by default located at /var/log/ceph/ceph-mon.<host-name>.log.
- If the log contains error messages similar to the following ones, the Monitor might have a corrupted store:
Corruption: error in middle of record
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb
To fix this problem, replace the Monitor. See Section 4.4, “Replacing a Failed Monitor”.
- If the log contains an error message similar to the following one, the /var/ partition might be full. Delete any unnecessary data from /var/.
Caught signal (Bus error)
Important: Do not delete any data from the Monitor directory manually. Instead, use the ceph-monstore-tool to compact it. See Section 4.5, “Compacting the Monitor Store” for details.
- If you see any other error messages, open a support ticket. See Chapter 9, Contacting Red Hat Support Service for details.
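The log checks above can be sketched as a small shell helper. This is an illustrative sketch only: classify_mon_log_line is a hypothetical function name, not a Ceph command, and it matches only the three messages quoted in this section.

```shell
# Classify one line from the Monitor log using the messages quoted above.
# classify_mon_log_line is a hypothetical helper, not part of Ceph.
classify_mon_log_line() {
    case "$1" in
        *"Corruption: error in middle of record"*|*"Corruption: "*" missing files"*)
            # Points at a corrupted store: replace the Monitor.
            echo "corrupted-store" ;;
        *"Caught signal (Bus error)"*)
            # Points at a full /var/ partition.
            echo "var-partition-full" ;;
        *)
            # Anything else is outside the scope of this section.
            echo "other" ;;
    esac
}
```

To summarize a whole log, each line can be fed through the helper, for example with a `while IFS= read -r line` loop over /var/log/ceph/ceph-mon.<host-name>.log piped into `sort | uniq -c`.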
The ceph-mon Daemon Is Running, but Still Marked as down
From the Monitor host that is out of the quorum, use the mon_status command to check its state:
ceph daemon <id> mon_status
Replace <id> with the ID of the Monitor, for example:
# ceph daemon mon.a mon_status
- If the status is probing, verify the locations of the other Monitors in the mon_status output.
  - If the addresses are incorrect, the Monitor has an incorrect Monitor map (monmap). To fix this problem, see Section 4.2, “Injecting a Monitor Map”.
  - If the addresses are correct, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew” for details. In addition, troubleshoot any networking issues. See Chapter 3, Troubleshooting Networking Issues.
- If the status is electing, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”.
- If the status changes from electing to synchronizing, open a support ticket. See Chapter 9, Contacting Red Hat Support Service for details.
- If the Monitor is the leader or a peon, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”. Open a support ticket if synchronizing the clocks does not solve the problem. See Chapter 9, Contacting Red Hat Support Service for details.
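The decision list above can be condensed into a lookup. The following is a minimal sketch, assuming the state string has already been read from the mon_status output. The function name suggest_action and the wording of its messages are illustrative, not part of any Ceph tool.

```shell
# Map a Monitor state to the next step recommended in this section.
# suggest_action is a hypothetical helper name.
suggest_action() {
    case "$1" in
        probing)
            echo "check monmap addresses, then clocks and network" ;;
        electing)
            echo "check clock synchronization" ;;
        synchronizing)
            echo "open a support ticket if the state keeps flipping to electing" ;;
        leader|peon)
            echo "check clock synchronization, then open a support ticket" ;;
        *)
            # Unknown input: report it and signal failure to the caller.
            echo "unknown state: $1"
            return 1 ;;
    esac
}
```

For example, `suggest_action electing` prints the clock synchronization hint, and an unrecognized state makes the function return a non-zero status.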
See Also
- Section 4.1.4, “Understanding Monitor Status”
- The Starting, Stopping, Restarting a Daemon by Instances section in the Administration Guide for Red Hat Ceph Storage 3
- The Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 3
4.1.2. Clock Skew
A Ceph Monitor is out of quorum, and the ceph health detail command output contains error messages similar to these:
mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
mon.a addr 127.0.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)
In addition, Ceph logs contain error messages similar to these:
2015-06-04 07:28:32.035795 7f806062e700 0 log [WRN] : mon.a 127.0.0.1:6789/0 clock skew 0.14s > max 0.05s
2015-06-04 04:31:25.773235 7f4997663700 0 log [WRN] : message from mon.1 was stamped 0.186257s in the future, clocks not synchronized
What This Means
The clock skew error message indicates that Monitors' clocks are not synchronized. Clock synchronization is important because Monitors depend on time precision and behave unpredictably if their clocks are not synchronized.
The mon_clock_drift_allowed parameter determines what disparity between the clocks is tolerated. By default, this parameter is set to 0.05 seconds.
Do not change the default value of mon_clock_drift_allowed without previous testing. Changing this value might affect the stability of the Monitors and the Ceph Storage Cluster in general.
Possible causes of the clock skew error include network problems or problems with Network Time Protocol (NTP) synchronization if that is configured. In addition, time synchronization does not work properly on Monitors deployed on virtual machines.
To Troubleshoot This Problem
- Verify that your network works correctly. For details, see Chapter 3, Troubleshooting Networking Issues. In particular, troubleshoot any problems with NTP clients if you use NTP. See the Basic NTP Troubleshooting section for more information.
- If you use a remote NTP server, consider deploying your own NTP server on your network. For details, see the Configuring NTP Using ntpd chapter in the System Administrator’s Guide for Red Hat Enterprise Linux 7.
- If you do not use an NTP client, set one up. For details, see the Configuring the Network Time Protocol for Red Hat Ceph Storage section in the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- If you use virtual machines for hosting the Monitors, move them to bare metal hosts. Using virtual machines for hosting Monitors is not supported. For details, see the Red Hat Ceph Storage: Supported configurations article on the Red Hat Customer Portal.
Ceph evaluates time synchronization only every five minutes, so there will be a delay between fixing the problem and clearing the clock skew messages.
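When many such messages accumulate, it can help to pull the reported skew out of each line and compare it with the tolerated drift. The following is a minimal sketch, assuming the message format shown in the examples above. The helper names extract_skew and skew_exceeds are hypothetical, and the 0.05 default only mirrors the documented default of mon_clock_drift_allowed.

```shell
# Print the skew (in seconds) from a "clock skew" health or log line.
extract_skew() {
    printf '%s\n' "$1" | sed -n 's/.*clock skew \([0-9.][0-9.]*\)s.*/\1/p'
}

# Succeed (exit status 0) when the skew in $1 exceeds the limit in $2.
# The limit defaults to 0.05, the documented mon_clock_drift_allowed default.
skew_exceeds() {
    awk -v s="$1" -v m="${2:-0.05}" 'BEGIN { exit !(s + 0 > m + 0) }'
}
```

For example, `skew_exceeds "$(extract_skew "$line")"` succeeds for the health message shown above, because 0.08235 is greater than 0.05.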
4.1.3. The Monitor Store is Getting Too Big
The ceph health command returns an error message similar to the following one:
mon.ceph1 store is getting too big! 48031 MB >= 15360 MB -- 62% avail
What This Means
The Ceph Monitor store is a LevelDB database that stores entries as key-value pairs. The database includes a cluster map and is located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db.
Querying a large Monitor store can take time. As a consequence, the Monitor can be delayed in responding to client queries.
In addition, if the /var/ partition is full, the Monitor cannot perform any write operations to the store and terminates. See Section 4.1.1, “A Monitor Is Out of Quorum” for details on troubleshooting this issue.
To Troubleshoot This Problem
Check the size of the database:
du -sch /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db
Specify the name of the cluster and the short host name of the host where the ceph-mon daemon is running, for example:
# du -sch /var/lib/ceph/mon/ceph-host1/store.db
47G /var/lib/ceph/mon/ceph-host1/store.db/
47G total
Compact the Monitor store. For details, see Section 4.5, “Compacting the Monitor Store”.
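To track the store size over time, the numbers in the warning and the on-disk size can be extracted with standard tools. The following is a minimal sketch, assuming the warning format shown above. The names parse_store_warning and store_size_mb are hypothetical helpers.

```shell
# Print "<current-MB> <limit-MB>" from a "store is getting too big!" line.
parse_store_warning() {
    printf '%s\n' "$1" |
        sed -n 's/.*store is getting too big! \([0-9][0-9]*\) MB >= \([0-9][0-9]*\) MB.*/\1 \2/p'
}

# Print the size of a directory in megabytes, as reported by du.
# Intended for the store.db path of the Monitor.
store_size_mb() {
    du -sm "$1" | awk '{ print $1 }'
}
```

For example, `store_size_mb /var/lib/ceph/mon/ceph-host1/store.db` prints a single number that can be logged before and after compaction.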
4.1.4. Understanding Monitor Status
The mon_status command returns information about a Monitor, such as:
- State
- Rank
- Elections epoch
- Monitor map (monmap)
If Monitors are able to form a quorum, use mon_status with the ceph command-line utility.
If Monitors are not able to form a quorum, but the ceph-mon daemon is running, use the administration socket to execute mon_status. For details, see the Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 3.
An example output of mon_status:
{
    "name": "mon.3",
    "rank": 2,
    "state": "peon",
    "election_epoch": 96,
    "quorum": [
        1,
        2
    ],
    "outside_quorum": [],
    "extra_probe_peers": [],
    "sync_provider": [],
    "monmap": {
        "epoch": 1,
        "fsid": "d5552d32-9d1d-436c-8db1-ab5fc2c63cd0",
        "modified": "0.000000",
        "created": "0.000000",
        "mons": [
            {
                "rank": 0,
                "name": "mon.1",
                "addr": "172.25.1.10:6789\/0"
            },
            {
                "rank": 1,
                "name": "mon.2",
                "addr": "172.25.1.12:6789\/0"
            },
            {
                "rank": 2,
                "name": "mon.3",
                "addr": "172.25.1.13:6789\/0"
            }
        ]
    }
}
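For scripting, the state field of output like the example above can be pulled out with sed. This is a text-matching sketch rather than a JSON parser: it relies on "state" appearing once in the output, as it does in the example, and a real JSON tool such as jq is more robust. The function names mon_state and state_in_quorum are hypothetical.

```shell
# Print the value of the "state" field from mon_status output on stdin.
# Works for both single-line and pretty-printed output.
mon_state() {
    sed -n 's/.*"state": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when the state means the Monitor considers itself in a quorum.
state_in_quorum() {
    case "$1" in
        leader|peon) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `ceph daemon mon.a mon_status | mon_state`, followed by `state_in_quorum` on the result to decide whether further troubleshooting is needed.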
Monitor States
- Leader: During the electing phase, Monitors are electing a leader. The leader is the Monitor with the highest rank, that is, the rank with the lowest value. In the example above, the leader is mon.1.
- Peon: Peons are the Monitors in the quorum that are not leaders. If the leader fails, the peon with the highest rank becomes the new leader.
- Probing: A Monitor is in the probing state if it is looking for other Monitors. For example, after you start the Monitors, they are probing until they find enough Monitors specified in the Monitor map (monmap) to form a quorum.
- Electing: A Monitor is in the electing state if it is in the process of electing the leader. Usually, this status changes quickly.
- Synchronizing: A Monitor is in the synchronizing state if it is synchronizing with the other Monitors to join the quorum. The smaller the Monitor store is, the faster the synchronization process. Therefore, if you have a large store, synchronization takes longer.
4.2. Injecting a Monitor Map
If a Monitor has an outdated or corrupted Monitor map (monmap), it cannot join a quorum because it is trying to reach the other Monitors on incorrect IP addresses.
The safest way to fix this problem is to obtain and inject the current Monitor map from other Monitors. Note that this action overwrites the existing Monitor map kept by the Monitor.
This procedure shows how to inject the Monitor map when the other Monitors are able to form a quorum, or when at least one Monitor has a correct Monitor map. If all Monitors have a corrupted store and therefore also a corrupted Monitor map, see Section 4.3, “Recovering the Monitor Store”.
Procedure: Injecting a Monitor Map
If the remaining Monitors are able to form a quorum, get the Monitor map by using the ceph mon getmap command:
# ceph mon getmap -o /tmp/monmap
If the remaining Monitors are not able to form the quorum and you have at least one Monitor with a correct Monitor map, copy it from that Monitor:
- Stop the Monitor which you want to copy the Monitor map from:
systemctl stop ceph-mon@<host-name>
For example, to stop the Monitor running on a host with the host1 short host name:
# systemctl stop ceph-mon@host1
- Copy the Monitor map:
ceph-mon -i <id> --extract-monmap /tmp/monmap
Replace <id> with the ID of the Monitor which you want to copy the Monitor map from, for example:
# ceph-mon -i mon.a --extract-monmap /tmp/monmap
Stop the Monitor with the corrupted or outdated Monitor map:
systemctl stop ceph-mon@<host-name>
For example, to stop a Monitor running on a host with the host2 short host name:
# systemctl stop ceph-mon@host2
Inject the Monitor map:
ceph-mon -i <id> --inject-monmap /tmp/monmap
Replace <id> with the ID of the Monitor with the corrupted or outdated Monitor map, for example:
# ceph-mon -i mon.c --inject-monmap /tmp/monmap
Start the Monitor, for example:
# systemctl start ceph-mon@host2
If you copied the Monitor map from another Monitor, start that Monitor, too, for example:
# systemctl start ceph-mon@host1
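The stop, inject, and start steps above can be wrapped in a function that prints the commands before anything is executed. The following is a minimal sketch. The run and inject_monmap helpers and the DRY_RUN variable are hypothetical conventions for this example, and the Ceph commands are the ones shown in the procedure.

```shell
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# inject_monmap <host-name> <id> <monmap-file>
# Stop the Monitor, inject the map, and start the Monitor again.
# The chain stops at the first command that fails.
inject_monmap() {
    run systemctl stop "ceph-mon@$1" &&
    run ceph-mon -i "$2" --inject-monmap "$3" &&
    run systemctl start "ceph-mon@$1"
}
```

By default, `inject_monmap host2 mon.c /tmp/monmap` only prints the three commands, which allows a review of the host name and Monitor ID. Setting DRY_RUN=0 executes them.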
4.3. Recovering the Monitor Store
Ceph Monitors store the cluster map in a key-value store such as LevelDB. If the store is corrupted on a Monitor, the Monitor terminates unexpectedly and fails to start again. The Ceph logs might include the following errors:
Corruption: error in middle of record
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb
Production clusters must use at least three Monitors so that if one fails, it can be replaced with another one. However, under certain circumstances, all Monitors can have corrupted stores. For example, when the Monitor nodes have incorrectly configured disk or file system settings, a power outage can corrupt the underlying file system.
If the store is corrupted on all Monitors, you can recover it with information stored on the OSD nodes by using utilities called ceph-monstore-tool and ceph-objectstore-tool.
This procedure cannot recover the following information:
- Metadata Server (MDS) keyrings and maps
- Placement Group settings:
  - full ratio set by using the ceph pg set_full_ratio command
  - nearfull ratio set by using the ceph pg set_nearfull_ratio command
Never restore the monitor store from an old backup. Rebuild the monitor store from the current cluster state using the following steps and restore from that.
Before You Start
- Ensure that you have the rsync utility and the ceph-test package installed.
Procedure: Recovering the Monitor Store
Use the following commands from the Monitor node with the corrupted store.
Collect the cluster map from all OSD nodes. The host_list variable must contain the names of the OSD hosts:
ms=<directory>
mkdir $ms
for host in $host_list; do
  rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms"
  ssh root@$host <<EOF
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
  done
EOF
  rsync -avz root@$host:$ms $ms; done
Replace <directory> with a temporary directory to store the collected cluster map, for example:
$ ms=/tmp/mon-store/
$ mkdir $ms
$ for host in $host_list; do
  rsync -avz "$ms" root@$host:"$ms"; rm -rf "$ms"
  ssh root@$host <<EOF
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
  done
EOF
  rsync -avz root@$host:$ms $ms; done
Set appropriate capabilities:
ceph-authtool <keyring> -n mon. --cap mon 'allow *'
ceph-authtool <keyring> -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
Replace <keyring> with the path to the client administration keyring, for example:
$ ceph-authtool /etc/ceph/ceph.client.admin.keyring -n mon. --cap mon 'allow *'
$ ceph-authtool /etc/ceph/ceph.client.admin.keyring -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
Rebuild the Monitor store from the collected map:
ceph-monstore-tool <directory> rebuild -- --keyring <keyring>
Replace <directory> with the temporary directory from the first step and <keyring> with the path to the client administration keyring, for example:
$ ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
Note: If you do not use the cephx authentication, omit the --keyring option:
$ ceph-monstore-tool /tmp/mon-store rebuild
Back up the corrupted store:
mv /var/lib/ceph/mon/<mon-ID>/store.db \
   /var/lib/ceph/mon/<mon-ID>/store.db.corrupted
Replace <mon-ID> with the Monitor ID, for example mon.0:
# mv /var/lib/ceph/mon/mon.0/store.db \
     /var/lib/ceph/mon/mon.0/store.db.corrupted
Replace the corrupted store:
mv /tmp/mon-store/store.db /var/lib/ceph/mon/<mon-ID>/store.db
Replace <mon-ID> with the Monitor ID, for example mon.0:
# mv /tmp/mon-store/store.db /var/lib/ceph/mon/mon.0/store.db
Repeat this step for all Monitors with a corrupted store.
Change the owner of the new store:
chown -R ceph:ceph /var/lib/ceph/mon/<mon-ID>/store.db
Replace <mon-ID> with the Monitor ID, for example mon.0:
# chown -R ceph:ceph /var/lib/ceph/mon/mon.0/store.db
Repeat this step for all Monitors with a corrupted store.
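The last three steps (back up, replace, change owner) can be combined into one guarded function so that nothing is moved unless the rebuilt store actually exists. The following is a minimal sketch under stated assumptions: swap_store is a hypothetical helper, the MON_ROOT variable and the optional owner argument exist only to make the function testable outside a Monitor host, and the defaults match the paths and owner used in the procedure.

```shell
# swap_store <mon-ID> <rebuilt-dir> [owner]
# Runs in a subshell, so "exit" only leaves the function.
swap_store() (
    mon_dir="${MON_ROOT:-/var/lib/ceph/mon}/$1"
    new_store="$2/store.db"
    owner="${3:-ceph:ceph}"

    # Refuse to touch anything unless both directories are really there.
    [ -d "$new_store" ] || { echo "missing $new_store" >&2; exit 1; }
    [ -d "$mon_dir" ] || { echo "missing $mon_dir" >&2; exit 1; }
    # Never overwrite an earlier backup of the corrupted store.
    [ ! -e "$mon_dir/store.db.corrupted" ] ||
        { echo "backup already exists in $mon_dir" >&2; exit 1; }

    # Back up the corrupted store, if one is present.
    if [ -e "$mon_dir/store.db" ]; then
        mv "$mon_dir/store.db" "$mon_dir/store.db.corrupted" || exit 1
    fi
    # Move the rebuilt store into place and fix its ownership.
    mv "$new_store" "$mon_dir/store.db" || exit 1
    chown -R "$owner" "$mon_dir/store.db"
)
```

On a Monitor host, `swap_store mon.0 /tmp/mon-store` would perform the same three commands as the procedure for the Monitor mon.0.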
4.4. Replacing a Failed Monitor
When a Monitor has a corrupted store, the recommended way to fix this problem is to replace the Monitor by using the Ansible automation application.
Before You Start
- Before removing a Monitor, ensure that the other Monitors are running and able to form a quorum.
Procedure: Replacing a Failed Monitor
From the Monitor host, remove the Monitor store by default located at /var/lib/ceph/mon/<cluster-name>-<short-host-name>:
rm -rf /var/lib/ceph/mon/<cluster-name>-<short-host-name>
Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor store of a Monitor running on host1 from a cluster called remote:
# rm -rf /var/lib/ceph/mon/remote-host1
Remove the Monitor from the Monitor map (monmap):
ceph mon remove <short-host-name> --cluster <cluster-name>
Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor running on host1 from a cluster called remote:
# ceph mon remove host1 --cluster remote
Troubleshoot and fix any problems related to the underlying file system or hardware of the Monitor host.
From the Ansible administration node, redeploy the Monitor by running the ceph-ansible playbook:
$ cd /usr/share/ceph-ansible
$ ansible-playbook site.yml
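Because the first step runs rm -rf on a path built from two names, an empty or mistyped variable could point the command at the wrong directory. The following is a minimal sketch of a guard, assuming the default path layout described above. The function name mon_data_dir is hypothetical.

```shell
# mon_data_dir <cluster-name> <short-host-name>
# Print the Monitor data directory, or fail when either part is empty or
# contains a slash, so a later "rm -rf" never receives a truncated path.
mon_data_dir() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: mon_data_dir <cluster-name> <short-host-name>" >&2
        return 1
    fi
    case "$1$2" in
        */*)
            echo "names must not contain '/'" >&2
            return 1 ;;
    esac
    echo "/var/lib/ceph/mon/$1-$2"
}
```

A possible use is `dir=$(mon_data_dir remote host1) && rm -rf "$dir"`, which skips the removal entirely when the path cannot be built.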
See Also
- Section 4.1.1, “A Monitor Is Out of Quorum”
- The Managing Cluster Size chapter in the Administration Guide for Red Hat Ceph Storage 3
- The Deploying Red Hat Ceph Storage chapter in the Red Hat Ceph Storage 3 Installation Guide for Red Hat Enterprise Linux
4.5. Compacting the Monitor Store
When the Monitor store has grown large, you can compact it:
- Dynamically by using the ceph tell command. See the Compacting the Monitor Store Dynamically procedure for details.
- Upon the start of the ceph-mon daemon. See the Compacting the Monitor Store at Startup procedure for details.
- By using the ceph-monstore-tool when the ceph-mon daemon is not running. Use this method when the previously mentioned methods fail to compact the Monitor store or when the Monitor is out of quorum and its log contains the Caught signal (Bus error) error message. See the Compacting the Monitor Store with ceph-monstore-tool procedure for details.
The Monitor store size changes when the cluster is not in the active+clean state or during the rebalancing process. For this reason, compact the Monitor store when rebalancing is completed. Also, ensure that the placement groups are in the active+clean state.
Procedure: Compacting the Monitor Store Dynamically
To compact the Monitor store when the ceph-mon daemon is running:
ceph tell mon.<host-name> compact
Replace <host-name> with the short host name of the host where the ceph-mon daemon is running. Use the hostname -s command when unsure.
# ceph tell mon.host1 compact
Procedure: Compacting the Monitor Store at Startup
Add the following parameter to the Ceph configuration under the [mon] section:
[mon]
mon_compact_on_start = true
Restart the ceph-mon daemon:
systemctl restart ceph-mon@<host-name>
Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.
# systemctl restart ceph-mon@host1
Ensure that Monitors have formed a quorum:
# ceph mon stat
Repeat these steps on other Monitors if needed.
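Before restarting the daemon, it can be useful to confirm that the option really sits inside the [mon] section rather than under another heading. The following is a minimal sketch that checks a configuration file with awk. The function name has_compact_on_start is hypothetical, and the check only recognizes the exact value true with the option name written with underscores or spaces.

```shell
# Succeed when the file in $1 sets mon_compact_on_start to true inside
# its [mon] section.
has_compact_on_start() {
    awk '
        # A section header: remember whether it is exactly [mon].
        /^[ \t]*\[/ { in_mon = ($0 ~ /^[ \t]*\[mon\][ \t]*$/); next }
        # Inside [mon], look for the option set to true.
        in_mon && /^[ \t]*mon[_ ]compact[_ ]on[_ ]start[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

For example, `has_compact_on_start /etc/ceph/ceph.conf && systemctl restart ceph-mon@host1` restarts the daemon only when the setting is in place. The /etc/ceph/ceph.conf path is an assumption about where the configuration file lives.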
Procedure: Compacting the Monitor Store with ceph-monstore-tool
Before you start, ensure that you have the ceph-test package installed.
Verify that the ceph-mon daemon with the large store is not running. Stop the daemon if needed.
systemctl status ceph-mon@<host-name>
systemctl stop ceph-mon@<host-name>
Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.
# systemctl status ceph-mon@host1
# systemctl stop ceph-mon@host1
Compact the Monitor store:
ceph-monstore-tool /var/lib/ceph/mon/mon.<host-name> compact
Replace <host-name> with the short host name of the Monitor host.
# ceph-monstore-tool /var/lib/ceph/mon/mon.host1 compact
Start ceph-mon again:
systemctl start ceph-mon@<host-name>
For example:
# systemctl start ceph-mon@host1
4.6. Opening Ports for Ceph Manager
The ceph-mgr daemons receive placement group information from OSDs on the same range of ports as the ceph-osd daemons. If these ports are not open, a cluster will degrade from HEALTH_OK to HEALTH_WARN and will indicate that PGs are unknown with a percentage count of the PGs unknown.
To resolve this situation, for each host running ceph-mgr daemons, open TCP ports 6800 through 7300. For example:
[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp
[root@ceph-mgr] # firewall-cmd --add-port 6800-7300/tcp --permanent
Then, restart the ceph-mgr daemons.
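To confirm that the range is open, the list of ports that firewalld reports can be checked for an entry that covers it. The following is a minimal sketch, assuming the list is space-separated and writes ranges as `<low>-<high>/tcp`, which is how firewall-cmd --list-ports is expected to print them. The function name covers_mgr_ports is hypothetical, and single ports or several smaller ranges that together cover the span are not considered.

```shell
# covers_mgr_ports "<port list>"
# Succeed when one TCP range in the list spans ports 6800 through 7300.
# Runs in a subshell, so "exit" only leaves the function.
covers_mgr_ports() (
    for entry in $1; do
        case "$entry" in
            *-*/tcp)
                range=${entry%/tcp}   # strip the protocol suffix
                low=${range%%-*}      # part before the hyphen
                high=${range#*-}      # part after the hyphen
                if [ "$low" -le 6800 ] 2>/dev/null &&
                   [ "$high" -ge 7300 ] 2>/dev/null; then
                    exit 0
                fi ;;
        esac
    done
    exit 1
)
```

A possible use is `covers_mgr_ports "$(firewall-cmd --list-ports)" || echo "ceph-mgr ports are not open"`.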