Chapter 4. Troubleshooting Monitors
This chapter contains information on how to fix the most common errors related to the Ceph Monitors.
Before You Start
- Verify your network connection. See Chapter 3, Troubleshooting Networking Issues for details.
4.1. The Most Common Error Messages Related to Monitors
The following tables list the most common error messages that are returned by the ceph health detail
command, or included in the Ceph logs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix the problems.
Error message | See |
---|---|
HEALTH_WARN mon.X is down (out of quorum) | Section 4.1.1, “A Monitor Is Out of Quorum” |
HEALTH_WARN clock skew | Section 4.1.2, “Clock Skew” |
HEALTH_WARN store is getting too big! | Section 4.1.3, “The Monitor Store is Getting Too Big” |
Error message | Log file | See |
---|---|---|
clock skew | Main cluster log | Section 4.1.2, “Clock Skew” |
clocks not synchronized | Main cluster log | Section 4.1.2, “Clock Skew” |
Corruption: error in middle of record | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum”, Section 4.3, “Recovering the Monitor Store” |
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum”, Section 4.3, “Recovering the Monitor Store” |
Caught signal (Bus error) | Monitor log | Section 4.1.1, “A Monitor Is Out of Quorum” |
4.1.1. A Monitor Is Out of Quorum
One or more Monitors are marked as down
but the other Monitors are still able to form a quorum. In addition, the ceph health detail
command returns an error message similar to the following one:
HEALTH_WARN 1 mons down, quorum 1,2 mon.b,mon.c
mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
What This Means
Ceph marks a Monitor as down
due to various reasons.
If the ceph-mon daemon is not running, it might have a corrupted store, or some other error is preventing the daemon from starting. Also, the /var/ partition might be full. As a consequence, ceph-mon cannot perform any operations on the store located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db and terminates.
If the ceph-mon
daemon is running but the Monitor is out of quorum and marked as down
, the cause of the problem depends on the Monitor state:
- If the Monitor is in the probing state longer than expected, it cannot find the other Monitors. This problem can be caused by networking issues, or the Monitor can have an outdated Monitor map (monmap) and be trying to reach the other Monitors on incorrect IP addresses. Alternatively, if the monmap is up-to-date, the Monitor's clock might not be synchronized.
- If the Monitor is in the electing state longer than expected, the Monitor's clock might not be synchronized.
- If the Monitor changes its state from synchronizing to electing and back, the cluster state is advancing. This means that it is generating new maps faster than the synchronization process can handle.
- If the Monitor marks itself as the leader or a peon, then it believes it is in a quorum, while the remaining cluster is sure that it is not. This problem can be caused by failed clock synchronization.
To Troubleshoot This Problem
- Verify that the ceph-mon daemon is running. If not, start it:

  systemctl status ceph-mon@<host-name>
  systemctl start ceph-mon@<host-name>

  Replace <host-name> with the short name of the host where the daemon is running. Use the hostname -s command when unsure.
- If you are not able to start ceph-mon, follow the steps in The ceph-mon Daemon Cannot Start.
- If you are able to start the ceph-mon daemon but it is marked as down, follow the steps in The ceph-mon Daemon Is Running, but Still Marked as down.
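In addition, from a host that still has a Monitor in quorum, you can list the current quorum members before troubleshooting a specific daemon. A minimal check, assuming the administration keyring is available on that host:

# ceph quorum_status --format json-pretty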
The ceph-mon Daemon Cannot Start
- Check the corresponding Monitor log, by default located at /var/log/ceph/ceph-mon.<host-name>.log.
- If the log contains error messages similar to the following ones, the Monitor might have a corrupted store:

  Corruption: error in middle of record
  Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb

  To fix this problem, replace the Monitor. See Section 4.4, “Replacing a Failed Monitor”.
- If the log contains an error message similar to the following one, the /var/ partition might be full. Delete any unnecessary data from /var/.

  Caught signal (Bus error)

  Important: Do not delete any data from the Monitor directory manually. Instead, use the ceph-monstore-tool to compact it. See Section 4.5, “Compacting the Monitor Store” for details.
- If you see any other error messages, open a support ticket. See Chapter 7, Contacting Red Hat Support Service for details.
The ceph-mon Daemon Is Running, but Still Marked as down
- From the Monitor host that is out of the quorum, use the mon_status command to check its state:

  ceph daemon <id> mon_status

  Replace <id> with the ID of the Monitor, for example:

  # ceph daemon mon.a mon_status

- If the status is probing, verify the locations of the other Monitors in the mon_status output.
  - If the addresses are incorrect, the Monitor has an incorrect Monitor map (monmap). To fix this problem, see Section 4.2, “Injecting a Monitor Map”.
  - If the addresses are correct, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew” for details. In addition, troubleshoot any networking issues, see Chapter 3, Troubleshooting Networking Issues.
- If the status is electing, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”.
- If the status changes from electing to synchronizing, open a support ticket. See Chapter 7, Contacting Red Hat Support Service for details.
- If the Monitor is the leader or a peon, verify that the Monitor clocks are synchronized. See Section 4.1.2, “Clock Skew”. Open a support ticket if synchronizing the clocks does not solve the problem. See Chapter 7, Contacting Red Hat Support Service for details.
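When the Monitor is stuck in the probing state, it can also help to compare the addresses in its mon_status output with the Monitor map used by the rest of the cluster. A minimal check, run from a host that still has a Monitor in quorum:

# ceph mon dump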
See Also
- Section 4.1.4, “Understanding Monitor Status”
- The Starting, Stopping, Restarting a Daemon by Instances section in the Administration Guide for Red Hat Ceph Storage 2
- The Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 2
4.1.2. Clock Skew
A Ceph Monitor is out of quorum, and the ceph health detail
command output contains error messages similar to these:
mon.a (rank 0) addr 127.0.0.1:6789/0 is down (out of quorum)
mon.a addr 127.0.0.1:6789/0 clock skew 0.08235s > max 0.05s (latency 0.0045s)
In addition, Ceph logs contain error messages similar to these:
2015-06-04 07:28:32.035795 7f806062e700 0 log [WRN] : mon.a 127.0.0.1:6789/0 clock skew 0.14s > max 0.05s
2015-06-04 04:31:25.773235 7f4997663700 0 log [WRN] : message from mon.1 was stamped 0.186257s in the future, clocks not synchronized
What This Means
The clock skew
error message indicates that Monitors' clocks are not synchronized. Clock synchronization is important because Monitors depend on time precision and behave unpredictably if their clocks are not synchronized.
The mon_clock_drift_allowed parameter determines how much disparity between the clocks is tolerated. By default, this parameter is set to 0.05 seconds.
Do not change the default value of mon_clock_drift_allowed without prior testing. Changing this value might affect the stability of the Monitors and the Ceph Storage Cluster in general.
Possible causes of the clock skew
error include network problems or problems with Network Time Protocol (NTP) synchronization if that is configured. In addition, time synchronization does not work properly on Monitors deployed on virtual machines.
To Troubleshoot This Problem
- Verify that your network works correctly. For details, see Chapter 3, Troubleshooting Networking Issues. In particular, troubleshoot any problems with NTP clients if you use NTP. See Section 3.2, “Basic NTP Troubleshooting” for more information.
- If you use a remote NTP server, consider deploying your own NTP server on your network. For details, see the Configuring NTP Using ntpd chapter in the System Administrator’s Guide for Red Hat Enterprise Linux 7.
- If you do not use an NTP client, set one up. For details, see the Configuring Network Time Protocol section in the Red Hat Ceph Storage 2 Installation Guide for Red Hat Enterprise Linux or Ubuntu.
- If you use virtual machines for hosting the Monitors, move them to bare metal hosts. Using virtual machines for hosting Monitors is not supported. For details, see the Red Hat Ceph Storage: Supported configurations article on the Red Hat Customer Portal.
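As a quick check of the time sources, you can inspect the NTP peers and the synchronization state on each Monitor host. A minimal sketch, assuming ntpd is used on Red Hat Enterprise Linux 7:

# ntpq -p
# timedatectl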
Ceph evaluates time synchronization only every five minutes, so there will be a delay between fixing the problem and the clock skew messages being cleared.
4.1.3. The Monitor Store is Getting Too Big
The ceph health
command returns an error message similar to the following one:
mon.ceph1 store is getting too big! 48031 MB >= 15360 MB -- 62% avail
What This Means
The Ceph Monitor store is in fact a LevelDB database that stores entries as key–value pairs. The database includes a cluster map and is located by default at /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db.
Querying a large Monitor store can take time. As a consequence, the Monitor can be delayed in responding to client queries.
In addition, if the /var/
partition is full, the Monitor cannot perform any write operations to the store and terminates. See Section 4.1.1, “A Monitor Is Out of Quorum” for details on troubleshooting this issue.
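To check whether the /var/ partition is running out of space, inspect its usage, for example:

# df -h /var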
To Troubleshoot This Problem
Check the size of the database:
du -sch /var/lib/ceph/mon/<cluster-name>-<short-host-name>/store.db
Specify the name of the cluster and the short host name of the host where the ceph-mon is running, for example:

# du -sch /var/lib/ceph/mon/ceph-ceph1/store.db
47G  /var/lib/ceph/mon/ceph-ceph1/store.db/
47G  total
- Compact the Monitor store. For details, see Section 4.5, “Compacting the Monitor Store”.
4.1.4. Understanding Monitor Status
The mon_status
command returns information about a Monitor, such as:
- State
- Rank
- Election epoch
- Monitor map (monmap)
If Monitors are able to form a quorum, use mon_status
with the ceph
command-line utility.
If Monitors are not able to form a quorum, but the ceph-mon
daemon is running, use the administration socket to execute mon_status
. For details, see the Using the Administration Socket section in the Administration Guide for Red Hat Ceph Storage 2.
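For example, assuming the default administration socket path and a Monitor running on a host named host1 (the socket name is illustrative), the call might look like this:

# ceph --admin-daemon /var/run/ceph/ceph-mon.host1.asok mon_status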
An example output of mon_status
{ "name": "mon.3", "rank": 2, "state": "peon", "election_epoch": 96, "quorum": [ 1, 2 ], "outside_quorum": [], "extra_probe_peers": [], "sync_provider": [], "monmap": { "epoch": 1, "fsid": "d5552d32-9d1d-436c-8db1-ab5fc2c63cd0", "modified": "0.000000", "created": "0.000000", "mons": [ { "rank": 0, "name": "mon.1", "addr": "172.25.1.10:6789\/0" }, { "rank": 1, "name": "mon.2", "addr": "172.25.1.12:6789\/0" }, { "rank": 2, "name": "mon.3", "addr": "172.25.1.13:6789\/0" } ] } }
Monitor States
- Leader
- During the electing phase, Monitors are electing a leader. The leader is the Monitor with the highest rank, that is, the rank with the lowest value. In the example above, the leader is mon.1.
. - Peon
- Peons are the Monitors in the quorum that are not leaders. If the leader fails, the peon with the highest rank becomes a new leader.
- Probing
- A Monitor is in the probing state if it is looking for other Monitors. For example, after you start the Monitors, they are probing until they find enough Monitors specified in the Monitor map (monmap) to form a quorum.
- A Monitor is in the electing state if it is in the process of electing the leader. Usually, this status changes quickly.
- Synchronizing
- A Monitor is in the synchronizing state if it is synchronizing with the other Monitors to join the quorum. The smaller the Monitor store is, the faster the synchronization process completes. Therefore, if you have a large store, synchronization takes longer.
4.2. Injecting a Monitor Map
If a Monitor has an outdated or corrupted Monitor map (monmap
), it cannot join a quorum because it is trying to reach the other Monitors on incorrect IP addresses.
The safest way to fix this problem is to obtain and inject the actual Monitor map from other Monitors. Note that this action overwrites the existing Monitor map kept by the Monitor.
This procedure shows how to inject the Monitor map when the other Monitors are able to form a quorum, or when at least one Monitor has a correct Monitor map. If all Monitors have a corrupted store, and therefore also a corrupted Monitor map, see Section 4.3, “Recovering the Monitor Store”.
Procedure: Injecting a Monitor Map
If the remaining Monitors are able to form a quorum, get the Monitor map by using the
ceph mon getmap
command:

# ceph mon getmap -o /tmp/monmap
If the remaining Monitors are not able to form the quorum and you have at least one Monitor with a correct Monitor map, copy it from that Monitor:
Stop the Monitor which you want to copy the Monitor map from:
systemctl stop ceph-mon@<host-name>
For example, to stop the Monitor running on a host with the
host1
short host name:

# systemctl stop ceph-mon@host1
Copy the Monitor map:
ceph-mon -i <id> --extract-monmap /tmp/monmap
Replace
<id>
with the ID of the Monitor which you want to copy the Monitor map from, for example:

# ceph-mon -i mon.a --extract-monmap /tmp/monmap
Stop the Monitor with the corrupted or outdated Monitor map:
systemctl stop ceph-mon@<host-name>
For example, to stop a Monitor running on a host with the
host2
short host name:

# systemctl stop ceph-mon@host2
Inject the Monitor map:
ceph-mon -i <id> --inject-monmap /tmp/monmap
Replace
<id>
with the ID of the Monitor with the corrupted or outdated Monitor map, for example:

# ceph-mon -i mon.c --inject-monmap /tmp/monmap
Start the Monitor, for example:
# systemctl start ceph-mon@host2
If you copied the Monitor map from another Monitor, start that Monitor, too, for example:
# systemctl start ceph-mon@host1
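After both Monitors are running again, one way to confirm that they rejoined the quorum is to check the Monitor status, for example:

# ceph mon stat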
4.3. Recovering the Monitor Store
Ceph Monitors store the cluster map in a key–value store such as LevelDB. If the store is corrupted on a Monitor, the Monitor terminates unexpectedly and fails to start again. The Ceph logs might include the following errors:
Corruption: error in middle of record
Corruption: 1 missing files; e.g.: /var/lib/ceph/mon/mon.0/store.db/1234567.ldb
Production clusters must use at least three Monitors so that if one fails, it can be replaced with another one. However, under certain circumstances, all Monitors can have corrupted stores. For example, when the Monitor nodes have incorrectly configured disk or file system settings, a power outage can corrupt the underlying file system.
If the store is corrupted on all Monitors, you can recover it with information stored on the OSD nodes by using utilities called ceph-monstore-tool
and ceph-objectstore-tool
.
This procedure cannot recover the following information:

- Metadata Server (MDS) keyrings and maps
- Placement Group settings:
  - full ratio set by using the ceph pg set_full_ratio command
  - nearfull ratio set by using the ceph pg set_nearfull_ratio command
Before You Start
- Ensure that you have the rsync utility and the ceph-test package installed.
Procedure: Recovering the Monitor Store
Use the following commands from the Monitor node with the corrupted store.
Collect the cluster map from all OSD nodes:
ms=<directory>
mkdir $ms

for host in $host_list; do
  rsync -avz "$ms" root@$host:"$ms"
  rm -rf "$ms"
  ssh root@$host <<EOF
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
  done
EOF
  rsync -avz root@$host:$ms $ms
done
Replace <directory> with a temporary directory to store the collected cluster map, and set $host_list to the host names of the OSD nodes, for example:

$ ms=/tmp/mon-store/
$ mkdir $ms
$ for host in $host_list; do
  rsync -avz "$ms" root@$host:"$ms"
  rm -rf "$ms"
  ssh root@$host <<EOF
  for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path \$osd --op update-mon-db --mon-store-path $ms
  done
EOF
  rsync -avz root@$host:$ms $ms
done
Set appropriate capabilities:
ceph-authtool <keyring> -n mon. --cap mon 'allow *'
ceph-authtool <keyring> -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
Replace
<keyring>
with the path to the client administration keyring, for example:

$ ceph-authtool /etc/ceph/ceph.client.admin.keyring -n mon. --cap mon 'allow *'
$ ceph-authtool /etc/ceph/ceph.client.admin.keyring -n client.admin --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
Rebuild the Monitor store from the collected map:
ceph-monstore-tool <directory> rebuild -- --keyring <keyring>
Replace
<directory>
with the temporary directory from the first step and<keyring>
with the path to the client administration keyring, for example:

$ ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /etc/ceph/ceph.client.admin.keyring
Note: If you do not use the cephx authentication, omit the --keyring option:

$ ceph-monstore-tool /tmp/mon-store rebuild
Back up the corrupted store:
mv /var/lib/ceph/mon/<mon-ID>/store.db \
   /var/lib/ceph/mon/<mon-ID>/store.db.corrupted

Replace <mon-ID> with the Monitor ID, for example mon.0:

# mv /var/lib/ceph/mon/mon.0/store.db \
     /var/lib/ceph/mon/mon.0/store.db.corrupted
Replace the corrupted store:
mv /tmp/mon-store/store.db /var/lib/ceph/mon/<mon-ID>/store.db
Replace <mon-ID> with the Monitor ID, for example mon.0:

# mv /tmp/mon-store/store.db /var/lib/ceph/mon/mon.0/store.db
Repeat this step for all Monitors with corrupted store.
Change the owner of the new store:
chown -R ceph:ceph /var/lib/ceph/mon/<mon-ID>/store.db
Replace <mon-ID> with the Monitor ID, for example mon.0:

# chown -R ceph:ceph /var/lib/ceph/mon/mon.0/store.db
Repeat this step for all Monitors with corrupted store.
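After the store has been replaced on every affected Monitor, you would typically start the ceph-mon daemons again and verify that they form a quorum. A minimal sketch, assuming a Monitor host named host1:

# systemctl start ceph-mon@host1
# ceph -s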
4.4. Replacing a Failed Monitor
When a Monitor has a corrupted store, the recommended way to fix this problem is to replace the Monitor by using the Ansible automation application.
Before You Start
- Before removing a Monitor, ensure that the other Monitors are running and able to form a quorum.
Procedure: Replacing a Failed Monitor
From the Monitor host, remove the Monitor store by default located at
/var/lib/ceph/mon/<cluster-name>-<short-host-name>
:rm -rf /var/lib/ceph/mon/<cluster-name>-<short-host-name>
Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor store of a Monitor running on
host1
from a cluster called remote:

# rm -rf /var/lib/ceph/mon/remote-host1
Remove the Monitor from the Monitor map (
monmap
):ceph mon remove <short-host-name> --cluster <cluster-name>
Specify the short host name of the Monitor host and the cluster name. For example, to remove the Monitor running on
host1
from a cluster called remote:

# ceph mon remove host1 --cluster remote
- Troubleshoot and fix any problems related to the underlying file system or hardware of the Monitor host.
From the Ansible administration node, redeploy the Monitor by running the
ceph-ansible
playbook:

# cd /usr/share/ceph-ansible
# ansible-playbook site.yml
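Once the playbook finishes, you can verify that the redeployed Monitor joined the quorum, for example, assuming the cluster name remote from the earlier example:

# ceph mon stat --cluster remote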
See Also
- Section 4.1.1, “A Monitor Is Out of Quorum”
- The Managing Cluster Size chapter in the Administration Guide for Red Hat Ceph Storage 2
- The Deploying a Ceph Cluster chapter in the Red Hat Ceph Storage 2 Installation Guide for Red Hat Enterprise Linux
4.5. Compacting the Monitor Store
When the Monitor store has grown too big, you can compact it:

- Dynamically by using the ceph tell command. See the Compacting the Monitor Store Dynamically procedure for details.
- Upon the start of the ceph-mon daemon. See the Compacting the Monitor Store at Startup procedure for details.
- By using the ceph-monstore-tool when the ceph-mon daemon is not running. Use this method when the previously mentioned methods fail to compact the Monitor store or when the Monitor is out of quorum and its log contains the Caught signal (Bus error) error message. See the Compacting the Monitor Store with ceph-monstore-tool procedure for details.
Monitor store size changes when the cluster is not in the active+clean
state or during the rebalancing process. For this reason, compact the Monitor store when rebalancing is completed. Also, ensure that the placement groups are in the active+clean
state.
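To confirm that the placement groups are in the active+clean state before you compact the store, check the placement group summary, for example:

# ceph pg stat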
Procedure: Compacting the Monitor Store Dynamically
To compact the Monitor store when the ceph-mon
daemon is running:
ceph tell mon.<host-name> compact
Replace <host-name>
with the short host name of the host where the ceph-mon
is running. Use the hostname -s
command when unsure.
# ceph tell mon.host1 compact
Procedure: Compacting the Monitor Store at Startup
Add the following parameter to the Ceph configuration under the
[mon]
section:

[mon]
mon_compact_on_start = true
Restart the
ceph-mon
daemon:

systemctl restart ceph-mon@<host-name>
Replace
<host-name>
with the short name of the host where the daemon is running. Use thehostname -s
command when unsure.

# systemctl restart ceph-mon@host1
Ensure that Monitors have formed a quorum:
# ceph mon stat
- Repeat these steps on other Monitors if needed.
Procedure: Compacting the Monitor Store with ceph-monstore-tool
Before you start, ensure that you have the ceph-test
package installed.
Verify that the
ceph-mon
daemon with the large store is not running. Stop the daemon if needed:

systemctl status ceph-mon@<host-name>
systemctl stop ceph-mon@<host-name>
Replace
<host-name>
with the short name of the host where the daemon is running. Use thehostname -s
command when unsure.

# systemctl status ceph-mon@host1
# systemctl stop ceph-mon@host1
Compact the Monitor store:
ceph-monstore-tool /var/lib/ceph/mon/mon.<host-name> compact
Replace
<host-name>
with a short host name of the Monitor host, for example:

# ceph-monstore-tool /var/lib/ceph/mon/mon.node1 compact
Start
ceph-mon
again:

systemctl start ceph-mon@<host-name>
For example:
# systemctl start ceph-mon@host1
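To confirm that the compaction freed space and that the Monitor rejoined the quorum, you can check the store size and the Monitor status again; the path and host name here are illustrative:

# du -sch /var/lib/ceph/mon/mon.node1/store.db
# ceph mon stat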