Chapter 8. Troubleshooting Ceph placement groups


This section contains information about fixing the most common errors related to the Ceph Placement Groups (PGs).

8.1. Prerequisites

  • Verify your network connection.
  • Ensure that Monitors are able to form a quorum.
  • Ensure that all healthy OSDs are up and in, and the backfilling and recovery processes are finished.
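
A quick way to confirm these prerequisites is to check the overall cluster status before you start troubleshooting. The following is a minimal example using standard commands; the host name is a placeholder:

Example

[ceph: root@host01 /]# ceph -s
[ceph: root@host01 /]# ceph osd stat

The ceph -s output shows the Monitor quorum and any ongoing backfill or recovery activity, and ceph osd stat summarizes how many OSDs are up and in.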

8.2. Most common Ceph placement groups errors

The following table lists the most common error messages that are returned by the ceph health detail command. The table provides links to corresponding sections that explain the errors and point to specific procedures to fix the problems.

In addition, you can list placement groups that are stuck in a state that is not optimal. See Section 8.3, “Listing placement groups stuck in stale, inactive, or unclean state” for details.

8.2.1. Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A running Ceph Object Gateway.

8.2.2. Placement group error messages

The following table lists common placement group error messages and links to their potential fixes.

Error message                    See

HEALTH_ERR

  pgs down                       Placement groups are down
  pgs inconsistent               Inconsistent placement groups
  scrub errors                   Inconsistent placement groups

HEALTH_WARN

  pgs stale                      Stale placement groups
  unfound                        Unfound objects

8.2.3. Stale placement groups

The ceph health command lists some Placement Groups (PGs) as stale:

HEALTH_WARN 24 pgs stale; 3/300 in osds are down

What This Means

The Monitor marks a placement group as stale when it does not receive any status update from the primary OSD of the placement group’s acting set, or when other OSDs report that the primary OSD is down.

Usually, PGs enter the stale state after you start the storage cluster and until the peering process completes. However, when the PGs remain stale for longer than expected, it might indicate that the primary OSD for those PGs is down or not reporting PG statistics to the Monitor. When the primary OSD storing stale PGs is back up, Ceph starts to recover the PGs.

The mon_osd_report_timeout setting determines how long the Monitors wait for PG statistics reports from the OSDs before considering an OSD unresponsive and marking it down. By default, this parameter is set to 900 seconds (15 minutes).
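
You can confirm the value that is currently in effect from the centralized configuration database. The following example assumes a release that supports the ceph config command:

Example

[ceph: root@host01 /]# ceph config get mon mon_osd_report_timeout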

To Troubleshoot This Problem

  1. Identify which PGs are stale and on what OSDs they are stored. The error message includes information similar to the following example:

    Example

    [ceph: root@host01 /]# ceph health detail
    HEALTH_WARN 24 pgs stale; 3/300 in osds are down
    ...
    pg 2.5 is stuck stale+active+remapped, last acting [2,0]
    ...
    osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080
    osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539
    osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861

  2. Troubleshoot any problems with the OSDs that are marked as down. For details, see Down OSDs.

8.2.4. Inconsistent placement groups

Some placement groups are marked as active + clean + inconsistent, and the ceph health detail command returns an error message similar to the following one:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

What This Means

When Ceph detects inconsistencies in one or more replicas of an object in a placement group, it marks the placement group as inconsistent. The most common inconsistencies are:

  • Objects have an incorrect size.
  • Objects are missing from one replica after a recovery finished.

In most cases, errors during scrubbing cause inconsistency within placement groups.
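
If ceph health detail reports several scrub errors, you can list all inconsistent placement groups in one step. The following example uses the standard state filter of the ceph pg ls command:

Example

[ceph: root@host01 /]# ceph pg ls inconsistent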

To Troubleshoot This Problem

  1. Log in to the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Determine which placement group is in the inconsistent state:

    [ceph: root@host01 /]# ceph health detail
    HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
    pg 0.6 is active+clean+inconsistent, acting [0,1,2]
    2 scrub errors
  3. Determine why the placement group is inconsistent.

    1. Start the deep scrubbing process on the placement group:

      Syntax

      ceph pg deep-scrub ID

      Replace ID with the ID of the inconsistent placement group, for example:

      [ceph: root@host01 /]# ceph pg deep-scrub 0.6
      instructing pg 0.6 on osd.0 to deep-scrub
    2. Search the output of the ceph -w command for any messages related to that placement group:

      Syntax

      ceph -w | grep ID

      Replace ID with the ID of the inconsistent placement group, for example:

      [ceph: root@host01 /]# ceph -w | grep 0.6
      2022-05-26 01:35:36.778215 osd.106 [ERR] 0.6 deep-scrub stat mismatch, got 636/635 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 1855455/1854371 bytes.
      2022-05-26 01:35:36.788334 osd.106 [ERR] 0.6 deep-scrub 1 errors
  4. If the output includes any error messages similar to the following ones, you can repair the inconsistent placement group. See Repairing inconsistent placement groups for details.

    Syntax

    PG.ID shard OSD: soid OBJECT missing attr , missing attr _ATTRIBUTE_TYPE
    PG.ID shard OSD: soid OBJECT digest 0 != known digest DIGEST, size 0 != known size SIZE
    PG.ID shard OSD: soid OBJECT size 0 != known size SIZE
    PG.ID deep-scrub stat mismatch, got MISMATCH
    PG.ID shard OSD: soid OBJECT candidate had a read error, digest 0 != known digest DIGEST

  5. If the output includes any error messages similar to the following ones, it is not safe to repair the inconsistent placement group because you can lose data. Open a support ticket in this situation. See Contacting Red Hat support for details.

    PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
    PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST

8.2.5. Unclean placement groups

The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck unclean

What This Means

Ceph marks a placement group as unclean if it has not achieved the active+clean state for the number of seconds specified in the mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300 seconds.

If a placement group is unclean, it contains objects that are not replicated the number of times specified in the osd_pool_default_size parameter. The default value of osd_pool_default_size is 3, which means that Ceph creates three replicas.

Usually, unclean placement groups indicate that some OSDs might be down.
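
Before you look at the OSDs, you can verify the replication factor that the affected pool expects. The pool name data is only an example:

Example

[ceph: root@host01 /]# ceph osd pool get data size
size: 3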

To Troubleshoot This Problem

  1. Determine which OSDs are down:

    [ceph: root@host01 /]# ceph osd tree
  2. Troubleshoot and fix any problems with the OSDs. See Down OSDs for details.

8.2.6. Inactive placement groups

The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck inactive

What This Means

Ceph marks a placement group as inactive if it has not been active for the number of seconds specified in the mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300 seconds.

Usually, inactive placement groups indicate that some OSDs might be down.

To Troubleshoot This Problem

  1. Determine which OSDs are down:

    # ceph osd tree
  2. Troubleshoot and fix any problems with the OSDs.

8.2.7. Placement groups are down

The ceph health detail command reports that some placement groups are down:

HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down
...
pg 0.5 is down+peering
pg 1.4 is down+peering
...
osd.1 is down since epoch 69, last address 192.168.106.220:6801/8651

What This Means

In certain cases, the peering process can be blocked, which prevents a placement group from becoming active and usable. Usually, an OSD failure causes these peering failures.

To Troubleshoot This Problem

Determine what blocks the peering process:

Syntax

ceph pg ID query

Replace ID with the ID of the placement group that is down:

Example

[ceph: root@host01 /]#  ceph pg 0.5 query

{ "state": "down+peering",
  ...
  "recovery_state": [
       { "name": "Started\/Primary\/Peering\/GetInfo",
         "enter_time": "2021-08-06 14:40:16.169679",
         "requested_info_from": []},
       { "name": "Started\/Primary\/Peering",
         "enter_time": "2021-08-06 14:40:16.169659",
         "probing_osds": [
               0,
               1],
         "blocked": "peering is blocked due to down osds",
         "down_osds_we_would_probe": [
               1],
         "peering_blocked_by": [
               { "osd": 1,
                 "current_lost_at": 0,
                 "comment": "starting or marking this osd lost may let us proceed"}]},
       { "name": "Started",
         "enter_time": "2021-08-06 14:40:16.169513"}
   ]
}

The recovery_state section includes information on why the peering process is blocked.
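
In this example, peering is blocked because osd.1 is down. If the OSD only needs to be restarted, doing so is often enough for peering to proceed. The following example assumes a cephadm-managed cluster; the daemon name osd.1 is taken from the example output above:

Example

[ceph: root@host01 /]# ceph orch daemon restart osd.1

If the OSD cannot be brought back, follow the Down OSDs troubleshooting procedures before considering more invasive actions, such as marking the OSD as lost.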

Additional Resources

  • The Ceph OSD peering section in the Red Hat Ceph Storage Administration Guide.

8.2.8. Unfound objects

The ceph health command returns an error message similar to the following one, containing the unfound keyword:

HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)

What This Means

Ceph marks objects as unfound when it knows that these objects or their newer copies exist but it is unable to find them. As a consequence, Ceph cannot recover such objects, and the recovery process cannot proceed.

An Example Situation

A placement group stores data on osd.1 and osd.2.

  1. osd.1 goes down.
  2. osd.2 handles some write operations.
  3. osd.1 comes up.
  4. A peering process between osd.1 and osd.2 starts, and the objects missing on osd.1 are queued for recovery.
  5. Before Ceph copies new objects, osd.2 goes down.

As a result, osd.1 knows that these objects exist, but there is no OSD that has a copy of the objects.

In this scenario, Ceph is waiting for the failed node to be accessible again, and the unfound objects block the recovery process.

To Troubleshoot This Problem

  1. Log in to the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. Determine which placement group contains unfound objects:

    [ceph: root@host01 /]# ceph health detail
    HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)
    pg 3.8a5 is stuck unclean for 803946.712780, current state active+recovering, last acting [320,248,0]
    pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound
    recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)
  3. List more information about the placement group:

    Syntax

    ceph pg ID query

    Replace ID with the ID of the placement group containing the unfound objects:

    Example

    [ceph: root@host01 /]# ceph pg 3.8a5 query
    { "state": "active+recovering",
      "epoch": 10741,
      "up": [
            320,
            248,
            0],
      "acting": [
            320,
            248,
            0],
    <snip>
      "recovery_state": [
            { "name": "Started\/Primary\/Active",
              "enter_time": "2021-08-28 19:30:12.058136",
              "might_have_unfound": [
                    { "osd": "0",
                      "status": "already probed"},
                    { "osd": "248",
                      "status": "already probed"},
                    { "osd": "301",
                      "status": "already probed"},
                    { "osd": "362",
                      "status": "already probed"},
                    { "osd": "395",
                      "status": "already probed"},
                    { "osd": "429",
                      "status": "osd is down"}],
              "recovery_progress": { "backfill_targets": [],
                  "waiting_on_backfill": [],
                  "last_backfill_started": "0\/\/0\/\/-1",
                  "backfill_info": { "begin": "0\/\/0\/\/-1",
                      "end": "0\/\/0\/\/-1",
                      "objects": []},
                  "peer_backfill_info": [],
                  "backfills_in_flight": [],
                  "recovering": [],
                  "pg_backend": { "pull_from_peer": [],
                      "pushing": []}},
              "scrub": { "scrubber.epoch_start": "0",
                  "scrubber.active": 0,
                  "scrubber.block_writes": 0,
                  "scrubber.finalizing": 0,
                  "scrubber.waiting_on": 0,
                  "scrubber.waiting_on_whom": []}},
            { "name": "Started",
              "enter_time": "2021-08-28 19:30:11.044020"}],

    The might_have_unfound section includes OSDs where Ceph tried to locate the unfound objects:

    • The already probed status indicates that Ceph cannot locate the unfound objects in that OSD.
    • The osd is down status indicates that Ceph cannot contact that OSD.
  4. Troubleshoot the OSDs that are marked as down. See Down OSDs for details.
  5. If you are unable to fix the problem that causes the OSD to be down, open a support ticket. See Contacting Red Hat Support for service for details.
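
Before you open the support ticket, it can be useful to record exactly which objects are unfound. The following example reuses the placement group ID from the example above:

Example

[ceph: root@host01 /]# ceph pg 3.8a5 list_unfound

The command returns a JSON document that lists the unfound objects in the placement group.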

8.3. Listing placement groups stuck in stale, inactive, or unclean state

After a failure, placement groups enter states such as degraded or peering. These states indicate normal progression through the failure recovery process.

However, if a placement group stays in one of these states for a longer time than expected, it can be an indication of a larger problem. The Monitors report when placement groups get stuck in a state that is not optimal.

The mon_pg_stuck_threshold option in the Ceph configuration file determines the number of seconds after which placement groups are considered inactive, unclean, or stale.

The following table lists these states together with a short explanation.

State: inactive
  What it means: The PG has not been able to service read/write requests.
  Most common causes: Peering problems.
  See: Inactive placement groups

State: unclean
  What it means: The PG contains objects that are not replicated the desired number of times. Something is preventing the PG from recovering.
  Most common causes: Unfound objects; OSDs are down; incorrect configuration.
  See: Unclean placement groups

State: stale
  What it means: The status of the PG has not been updated by a ceph-osd daemon.
  Most common causes: OSDs are down.
  See: Stale placement groups

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • Root-level access to the node.

Procedure

  1. Log into the Cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  2. List the stuck PGs:

    Example

    [ceph: root@host01 /]# ceph pg dump_stuck inactive
    [ceph: root@host01 /]# ceph pg dump_stuck unclean
    [ceph: root@host01 /]# ceph pg dump_stuck stale
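
    If you want to process the list in a script, you can add the --format=json-pretty option that the ceph command-line utility accepts for most subcommands, for example:

    [ceph: root@host01 /]# ceph pg dump_stuck stale --format=json-pretty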

8.4. Listing placement group inconsistencies

Use the rados utility to list inconsistencies in various replicas of objects. Use the --format=json-pretty option to list a more detailed output.

This section covers the listing of:

  • Inconsistent placement group in a pool
  • Inconsistent objects in a placement group
  • Inconsistent snapshot sets in a placement group

Prerequisites

  • A running Red Hat Ceph Storage cluster in a healthy state.
  • Root-level access to the node.

Procedure

  • List all the inconsistent placement groups in a pool:

    Syntax

    rados list-inconsistent-pg POOL --format=json-pretty

    Example

    [ceph: root@host01 /]# rados list-inconsistent-pg data --format=json-pretty
    [0.6]

  • List inconsistent objects in a placement group with ID:

    Syntax

    rados list-inconsistent-obj PLACEMENT_GROUP_ID

    Example

    [ceph: root@host01 /]# rados list-inconsistent-obj 0.6 --format=json-pretty
    {
        "epoch": 14,
        "inconsistents": [
            {
                "object": {
                    "name": "image1",
                    "nspace": "",
                    "locator": "",
                    "snap": "head",
                    "version": 1
                },
                "errors": [
                    "data_digest_mismatch",
                    "size_mismatch"
                ],
                "union_shard_errors": [
                    "data_digest_mismatch_oi",
                    "size_mismatch_oi"
                ],
                "selected_object_info": "0:602f83fe:::foo:head(16'1 client.4110.0:1 dirty|data_digest|omap_digest s 968 uv 1 dd e978e67f od ffffffff alloc_hint [0 0 0])",
                "shards": [
                    {
                        "osd": 0,
                        "errors": [],
                        "size": 968,
                        "omap_digest": "0xffffffff",
                        "data_digest": "0xe978e67f"
                    },
                    {
                        "osd": 1,
                        "errors": [],
                        "size": 968,
                        "omap_digest": "0xffffffff",
                        "data_digest": "0xe978e67f"
                    },
                    {
                        "osd": 2,
                        "errors": [
                            "data_digest_mismatch_oi",
                            "size_mismatch_oi"
                        ],
                        "size": 0,
                        "omap_digest": "0xffffffff",
                        "data_digest": "0xffffffff"
                    }
                ]
            }
        ]
    }

    The following fields are important to determine what causes the inconsistency:

    • name: The name of the object with inconsistent replicas.
    • nspace: The namespace that is a logical separation of a pool. It’s empty by default.
    • locator: The key that is used as an alternative to the object name when placing the object.
    • snap: The snapshot ID of the object. The only writable version of the object is called head. If an object is a clone, this field includes its sequential ID.
    • version: The version ID of the object with inconsistent replicas. Each write operation to an object increments it.
    • errors: A list of errors that indicate inconsistencies between shards, without determining which shard or shards are incorrect. See the shards array to further investigate the errors.

      • data_digest_mismatch: The digest of the replica read from one OSD is different from the other OSDs.
      • size_mismatch: The size of a clone or the head object does not match the expectation.
      • read_error: This error indicates inconsistencies most likely caused by disk errors.
    • union_shard_errors: The union of all errors specific to shards. These errors are connected to a faulty shard. The errors that end with oi indicate that you have to compare the information from the faulty shard with the selected object information (selected_object_info). See the shards array to further investigate the errors.

      In the above example, the object replica stored on osd.2 has a different digest from the replicas stored on osd.0 and osd.1. Specifically, the digest of the replica read from osd.2 is 0xffffffff instead of the expected 0xe978e67f. In addition, the size of the replica read from osd.2 is 0, while the size reported by osd.0 and osd.1 is 968.

  • List inconsistent sets of snapshots:

    Syntax

    rados list-inconsistent-snapset PLACEMENT_GROUP_ID

    Example

    [ceph: root@host01 /]# rados list-inconsistent-snapset 0.23 --format=json-pretty
    {
        "epoch": 64,
        "inconsistents": [
            {
                "name": "obj5",
                "nspace": "",
                "locator": "",
                "snap": "0x00000001",
                "headless": true
            },
            {
                "name": "obj5",
                "nspace": "",
                "locator": "",
                "snap": "0x00000002",
                "headless": true
            },
            {
                "name": "obj5",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "ss_attr_missing": true,
                "extra_clones": true,
                "extra clones": [
                    2,
                    1
                ]
            }
        ]
    }

    The command returns the following errors:

    • ss_attr_missing: One or more attributes are missing. Attributes are information about snapshots encoded into a snapshot set as a list of key-value pairs.
    • ss_attr_corrupted: One or more attributes fail to decode.
    • clone_missing: A clone is missing.
    • snapset_mismatch: The snapshot set is inconsistent by itself.
    • head_mismatch: The snapshot set indicates that the head object exists, or does not exist, but the scrub results report otherwise.
    • headless: The head of the snapshot set is missing.
    • size_mismatch: The size of a clone or the head object does not match the expectation.

8.5. Repairing inconsistent placement groups

Due to an error during deep scrubbing, some placement groups can include inconsistencies. Ceph reports such placement groups as inconsistent:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors
Warning

You can repair only certain inconsistencies.

Do not repair the placement groups if the Ceph logs include the following errors:

PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST

Open a support ticket instead. See Contacting Red Hat Support for service for details.

Prerequisites

  • Root-level access to the Ceph Monitor node.

Procedure

  • Repair the inconsistent placement groups:

    Syntax

    ceph pg repair ID

    Replace ID with the ID of the inconsistent placement group.
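
    For example, to repair the placement group from the ceph health detail output shown above:

    Example

    [ceph: root@host01 /]# ceph pg repair 0.6

    The repair is asynchronous. Monitor the output of ceph -w or ceph health detail until the placement group returns to the active+clean state.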

8.6. Increasing the placement group count

Insufficient Placement Group (PG) count impacts the performance of the Ceph cluster and data distribution. It is one of the main causes of the nearfull osds error messages.

The recommended ratio is between 100 and 300 PGs per OSD. This ratio can decrease when you add more OSDs to the cluster.

The pg_num and pgp_num parameters determine the PG count. These parameters are configured per pool, and therefore, you must adjust each pool with a low PG count separately.
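
As a rough illustration of this sizing guidance, consider a hypothetical cluster with 40 OSDs and replicated pools that use a size of 3. A commonly used rule of thumb, which the PG calculator is also based on, targets approximately (OSDs × 100) / replica count PGs in total across all pools; the result is then rounded to a nearby power of two and divided among the pools according to the share of data each pool is expected to hold. The numbers below are hypothetical; you can do the arithmetic directly in the Cephadm shell:

Example

[ceph: root@host01 /]# echo $(( 40 * 100 / 3 ))
1333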

Important

Increasing the PG count is the most intensive process that you can perform on a Ceph cluster. This process might have a serious performance impact if not done in a slow and methodical way. Once you increase pgp_num, you will not be able to stop or reverse the process and you must complete it. Consider increasing the PG count outside of business critical processing time allocation, and alert all clients about the potential performance impact. Do not change the PG count if the cluster is in the HEALTH_ERR state.

Prerequisites

  • A running Red Hat Ceph Storage cluster in a healthy state.
  • Root-level access to the node.

Procedure

  1. Reduce the impact of data redistribution and recovery on individual OSDs and OSD hosts:

    1. Lower the value of the osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority parameters:

      [ceph: root@host01 /]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'
    2. Disable the shallow and deep scrubbing:

      [ceph: root@host01 /]# ceph osd set noscrub
      [ceph: root@host01 /]# ceph osd set nodeep-scrub
  2. Use the Ceph Placement Groups (PGs) per Pool Calculator to calculate the optimal value of the pg_num and pgp_num parameters.
  3. Increase the pg_num value in small increments until you reach the desired value.

    1. Determine the starting increment value. Use a very low value that is a power of two, and increase it when you determine the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.
    2. Increment the pg_num value:

      Syntax

      ceph osd pool set POOL pg_num VALUE

      Specify the pool name and the new value, for example:

      Example

      [ceph: root@host01 /]# ceph osd pool set data pg_num 4

    3. Monitor the status of the cluster:

      Example

      [ceph: root@host01 /]# ceph -s

      The state of the PGs will change from creating to active+clean. Wait until all PGs are in the active+clean state.

  4. Increase the pgp_num value in small increments until you reach the desired value:

    1. Determine the starting increment value. Use a very low value that is a power of two, and increase it when you determine the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.
    2. Increment the pgp_num value:

      Syntax

      ceph osd pool set POOL pgp_num VALUE

      Specify the pool name and the new value, for example:

      [ceph: root@host01 /]# ceph osd pool set data pgp_num 4
    3. Monitor the status of the cluster:

      [ceph: root@host01 /]# ceph -s

      The state of the PGs will change through peering, wait_backfill, backfilling, recover, and other states. Wait until all PGs are in the active+clean state.

  5. Repeat the previous steps for all pools with insufficient PG count.
  6. Set the osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority parameters to their default values:

    [ceph: root@host01 /]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 3 --osd_recovery_op_priority 3'
  7. Enable the shallow and deep scrubbing:

    [ceph: root@host01 /]# ceph osd unset noscrub
    [ceph: root@host01 /]# ceph osd unset nodeep-scrub
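
To confirm that the new values took effect, query the pool again. The pool name data matches the earlier examples:

Example

[ceph: root@host01 /]# ceph osd pool get data pg_num
[ceph: root@host01 /]# ceph osd pool get data pgp_num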
