Chapter 9. Troubleshooting Ceph placement groups

This section contains information about fixing the most common errors related to the Ceph Placement Groups (PGs).

9.1. Prerequisites

Verify your network connection.
Ensure that Monitors are able to form a quorum.
Ensure that all healthy OSDs are up and in, and the backfilling and recovery processes are finished.

9.2. Most common Ceph placement groups errors

The following table lists the most common error messages that are returned by the ceph health detail command. The table provides links to corresponding sections that explain the errors and point to specific procedures to fix the problems.

In addition, you can list placement groups that are stuck in a state that is not optimal. See Section 9.3, “Listing placement groups stuck in stale, inactive, or unclean state” for details.

9.2.1. Prerequisites

A running Red Hat Ceph Storage cluster.
A running Ceph Object Gateway.

9.2.2. Placement group error messages

A table of common placement group error messages, and a potential fix.

Error message	See
`HEALTH_ERR`
`pgs down`	Placement groups are `down`
`pgs inconsistent`	Inconsistent placement groups
`scrub errors`	Inconsistent placement groups
`HEALTH_WARN`
`pgs stale`	Stale placement groups
`unfound`	Unfound objects

9.2.3. Stale placement groups

The ceph health command lists some Placement Groups (PGs) as stale:

HEALTH_WARN 24 pgs stale; 3/300 in osds are down

What This Means

The Monitor marks a placement group as stale when it does not receive any status update from the primary OSD of the placement group’s acting set or when other OSDs reported that the primary OSD is down.

Usually, PGs enter the stale state after you start the storage cluster and until the peering process completes. However, when the PGs remain stale for longer than expected, it might indicate that the primary OSD for those PGs is down or not reporting PG statistics to the Monitor. When the primary OSD storing stale PGs is back up, Ceph starts to recover the PGs.

The mon_osd_report_timeout setting determines how often OSDs report PG statistics to Monitors. By default, this parameter is set to 0.5, which means that OSDs report the statistics every half a second.

To Troubleshoot This Problem

Identify which PGs are stale and on which OSDs they are stored. The error message includes information similar to the following example:

Example

ceph health detail
HEALTH_WARN 24 pgs stale; 3/300 in osds are down
...
pg 2.5 is stuck stale+active+remapped, last acting [2,0]
...
osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080
osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539
osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861

Troubleshoot any problems with the OSDs that are marked as down. For details, see Down OSDs.
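
When many PGs are stale, it can help to extract the PG IDs and the down OSDs from the output programmatically. The following is a minimal sketch that parses text in the format of the example above. The function name summarize_stale and the regular expressions are illustrative assumptions, and the exact wording of health messages can differ between releases.

```python
import re

# Sample text copied from the example output above.
SAMPLE = """\
HEALTH_WARN 24 pgs stale; 3/300 in osds are down
...
pg 2.5 is stuck stale+active+remapped, last acting [2,0]
...
osd.10 is down since epoch 23, last address 192.168.106.220:6800/11080
osd.11 is down since epoch 13, last address 192.168.106.220:6803/11539
osd.12 is down since epoch 24, last address 192.168.106.220:6806/11861
"""

# Assumed line formats, modelled on the sample only.
PG_LINE = re.compile(r"^pg (\S+) is stuck (\S+), last acting \[([\d,]*)\]")
OSD_LINE = re.compile(r"^osd\.(\d+) is down\b")


def summarize_stale(health_detail):
    """Return (stale_pgs, down_osds) parsed from `ceph health detail` text.

    stale_pgs maps a PG ID to its last acting set; down_osds is a list of OSD IDs.
    """
    stale_pgs = {}
    down_osds = []
    for raw in health_detail.splitlines():
        line = raw.strip()
        pg = PG_LINE.match(line)
        if pg and "stale" in pg.group(2).split("+"):
            stale_pgs[pg.group(1)] = [int(x) for x in pg.group(3).split(",") if x]
            continue
        osd = OSD_LINE.match(line)
        if osd:
            down_osds.append(int(osd.group(1)))
    return stale_pgs, down_osds


if __name__ == "__main__":
    pgs, osds = summarize_stale(SAMPLE)
    print("stale PGs and last acting sets:", pgs)
    print("down OSDs:", osds)
```

Comparing the last acting sets with the list of down OSDs shows which OSDs to investigate first.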

Additional Resources

The Monitoring Placement Group sets section in the Administration Guide for Red Hat Ceph Storage 4

9.2.4. Inconsistent placement groups

Some placement groups are marked as active+clean+inconsistent and the ceph health detail command returns an error message similar to the following one:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

What This Means

When Ceph detects inconsistencies in one or more replicas of an object in a placement group, it marks the placement group as inconsistent. The most common inconsistencies are:

Objects have an incorrect size.
Objects are missing from one replica after a recovery finished.

In most cases, errors during scrubbing cause inconsistency within placement groups.

To Troubleshoot This Problem

Determine which placement group is in the inconsistent state:

ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

Determine why the placement group is inconsistent.

Start the deep scrubbing process on the placement group:
```
ceph pg deep-scrub ID
```
Replace ID with the ID of the inconsistent placement group, for example:
```
ceph pg deep-scrub 0.6
instructing pg 0.6 on osd.0 to deep-scrub
```

Search the output of the ceph -w command for any messages related to that placement group:

ceph -w | grep ID

Replace ID with the ID of the inconsistent placement group, for example:

ceph -w | grep 0.6
2015-02-26 01:35:36.778215 osd.106 [ERR] 0.6 deep-scrub stat mismatch, got 636/635 objects, 0/0 clones, 0/0 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 1855455/1854371 bytes.
2015-02-26 01:35:36.788334 osd.106 [ERR] 0.6 deep-scrub 1 errors

If the output includes any error messages similar to the following ones, you can repair the inconsistent placement group. See Repairing inconsistent placement groups for details.

PG.ID shard OSD: soid OBJECT missing attr _, missing attr ATTRIBUTE_TYPE
PG.ID shard OSD: soid OBJECT digest 0 != known digest DIGEST, size 0 != known size SIZE
PG.ID shard OSD: soid OBJECT size 0 != known size SIZE
PG.ID deep-scrub stat mismatch, got MISMATCH
PG.ID shard OSD: soid OBJECT candidate had a read error, digest 0 != known digest DIGEST

If the output includes any error messages similar to the following ones, it is not safe to repair the inconsistent placement group because you can lose data. Open a support ticket in this situation. See Contacting Red Hat support for details.

PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST
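
The distinction between the two groups of messages can be expressed as a small classifier. The following is a sketch only: the patterns are derived from the message templates listed in this section, the function name classify_scrub_message is hypothetical, and real log lines might not match these patterns exactly. Treat any "unknown" result as a reason not to repair until the message is understood.

```python
import re

# Messages for which this section says a repair is NOT safe.
UNSAFE_PATTERNS = [
    # "digest DIGEST != known digest DIGEST" with a non-zero first digest.
    re.compile(r"\bdigest (?!0 )\S+ != known digest"),
    re.compile(r"\bomap_digest \S+ != known omap_digest"),
]

# Messages for which this section says a repair is possible.
REPAIRABLE_PATTERNS = [
    re.compile(r"missing attr"),
    re.compile(r"\bdigest 0 != known digest"),
    re.compile(r"\bsize 0 != known size"),
    re.compile(r"deep-scrub stat mismatch"),
    re.compile(r"candidate had a read error"),
]


def classify_scrub_message(line):
    """Return "unsafe", "repairable", or "unknown" for one log line."""
    # Unsafe patterns are checked first so that a mixed line is never
    # reported as repairable.
    if any(p.search(line) for p in UNSAFE_PATTERNS):
        return "unsafe"
    if any(p.search(line) for p in REPAIRABLE_PATTERNS):
        return "repairable"
    return "unknown"


if __name__ == "__main__":
    # The first line is from the example above; the second is a made-up
    # instance of an unsafe template, for illustration only.
    for text in (
        "osd.106 [ERR] 0.6 deep-scrub stat mismatch, got 636/635 objects",
        "0.6 shard 2: soid obj1 digest 0x12345678 != known digest 0xe978e67f",
    ):
        print(classify_scrub_message(text), "<-", text)
```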

Additional Resources

Listing placement group inconsistencies in the Red Hat Ceph Storage Troubleshooting Guide.
The Ceph Data integrity section in the Red Hat Ceph Storage Architecture Guide.
The Scrubbing the OSD section in the Red Hat Ceph Storage Configuration Guide.

9.2.5. Unclean placement groups

The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck unclean

What This Means

Ceph marks a placement group as unclean if it has not achieved the active+clean state for the number of seconds specified in the mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300 seconds.

If a placement group is unclean, it contains objects that are not replicated the number of times specified in the osd_pool_default_size parameter. The default value of osd_pool_default_size is 3, which means that Ceph creates three replicas.

Usually, unclean placement groups indicate that some OSDs might be down.

To Troubleshoot This Problem

Determine which OSDs are down:
```
ceph osd tree
```
Troubleshoot and fix any problems with the OSDs. See Down OSDs for details.

Additional Resources

Listing placement groups stuck in stale, inactive, or unclean state.

9.2.6. Inactive placement groups

The ceph health command returns an error message similar to the following one:

HEALTH_WARN 197 pgs stuck inactive

What This Means

Ceph marks a placement group as inactive if it has not been active for the number of seconds specified in the mon_pg_stuck_threshold parameter in the Ceph configuration file. The default value of mon_pg_stuck_threshold is 300 seconds.

Usually, inactive placement groups indicate that some OSDs might be down.

To Troubleshoot This Problem

Determine which OSDs are down:
```
ceph osd tree
```
Troubleshoot and fix any problems with the OSDs. See Down OSDs for details.

Additional Resources

Listing placement groups stuck in stale, inactive, or unclean state.

9.2.7. Placement groups are down

The ceph health detail command reports that some placement groups are down:

HEALTH_ERR 7 pgs degraded; 12 pgs down; 12 pgs peering; 1 pgs recovering; 6 pgs stuck unclean; 114/3300 degraded (3.455%); 1/3 in osds are down
...
pg 0.5 is down+peering
pg 1.4 is down+peering
...
osd.1 is down since epoch 69, last address 192.168.106.220:6801/8651

What This Means

In certain cases, the peering process can be blocked, which prevents a placement group from becoming active and usable. Usually, a failure of an OSD causes the peering failures.

To Troubleshoot This Problem

Determine what blocks the peering process:

ceph pg ID query

Replace ID with the ID of the placement group that is down, for example:

ceph pg 0.5 query

{ "state": "down+peering",
  ...
  "recovery_state": [
       { "name": "Started\/Primary\/Peering\/GetInfo",
         "enter_time": "2012-03-06 14:40:16.169679",
         "requested_info_from": []},
       { "name": "Started\/Primary\/Peering",
         "enter_time": "2012-03-06 14:40:16.169659",
         "probing_osds": [
               0,
               1],
         "blocked": "peering is blocked due to down osds",
         "down_osds_we_would_probe": [
               1],
         "peering_blocked_by": [
               { "osd": 1,
                 "current_lost_at": 0,
                 "comment": "starting or marking this osd lost may let us proceed"}]},
       { "name": "Started",
         "enter_time": "2012-03-06 14:40:16.169513"}
   ]
}

The recovery_state section includes information about why the peering process is blocked.

If the output includes the peering is blocked due to down osds error message, see Down OSDs.
If you see any other error message, open a support ticket. See Contacting Red Hat Support for service for details.
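
Because the query output is JSON, the blocking OSDs can also be extracted with a short script. The following sketch assumes that the output has the same shape as the example above (a recovery_state list whose entries may contain a peering_blocked_by list). The function name peering_blockers is hypothetical, and the sample is a trimmed copy of the example output.

```python
import json

# Trimmed copy of the example output above ("..." lines and timestamps removed).
SAMPLE = json.loads(r"""
{ "state": "down+peering",
  "recovery_state": [
    { "name": "Started\/Primary\/Peering\/GetInfo",
      "requested_info_from": []},
    { "name": "Started\/Primary\/Peering",
      "probing_osds": [0, 1],
      "blocked": "peering is blocked due to down osds",
      "down_osds_we_would_probe": [1],
      "peering_blocked_by": [
        { "osd": 1,
          "current_lost_at": 0,
          "comment": "starting or marking this osd lost may let us proceed"}]},
    { "name": "Started"}
  ]
}
""")


def peering_blockers(pg_query):
    """Return (osd_id, comment) pairs from every peering_blocked_by entry."""
    blockers = []
    for state in pg_query.get("recovery_state", []):
        for entry in state.get("peering_blocked_by", []):
            blockers.append((entry["osd"], entry.get("comment", "")))
    return blockers


if __name__ == "__main__":
    for osd, comment in peering_blockers(SAMPLE):
        print("osd.%d: %s" % (osd, comment))
```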

Additional Resources

The Ceph OSD peering section in the Red Hat Ceph Storage Administration Guide.

9.2.8. Unfound objects

The ceph health command returns an error message similar to the following one, containing the unfound keyword:

HEALTH_WARN 1 pgs degraded; 78/3778 unfound (2.065%)

What This Means

Ceph marks objects as unfound when it knows these objects or their newer copies exist but it is unable to find them. As a consequence, Ceph cannot recover such objects and proceed with the recovery process.

An Example Situation

A placement group stores data on osd.1 and osd.2.

osd.1 goes down.
osd.2 handles some write operations.
osd.1 comes up.
A peering process between osd.1 and osd.2 starts, and the objects missing on osd.1 are queued for recovery.
Before Ceph copies new objects, osd.2 goes down.

As a result, osd.1 knows that these objects exist, but there is no OSD that has a copy of the objects.

In this scenario, Ceph is waiting for the failed node to be accessible again, and the unfound objects block the recovery process.

To Troubleshoot This Problem

Determine which placement groups contain unfound objects:

ceph health detail
HEALTH_WARN 1 pgs recovering; 1 pgs stuck unclean; recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)
pg 3.8a5 is stuck unclean for 803946.712780, current state active+recovering, last acting [320,248,0]
pg 3.8a5 is active+recovering, acting [320,248,0], 1 unfound
recovery 5/937611 objects degraded (0.001%); 1/312537 unfound (0.000%)

List more information about the placement group:

ceph pg ID query

Replace ID with the ID of the placement group containing the unfound objects, for example:

ceph pg 3.8a5 query
{ "state": "active+recovering",
  "epoch": 10741,
  "up": [
        320,
        248,
        0],
  "acting": [
        320,
        248,
        0],
<snip>
  "recovery_state": [
        { "name": "Started\/Primary\/Active",
          "enter_time": "2015-01-28 19:30:12.058136",
          "might_have_unfound": [
                { "osd": "0",
                  "status": "already probed"},
                { "osd": "248",
                  "status": "already probed"},
                { "osd": "301",
                  "status": "already probed"},
                { "osd": "362",
                  "status": "already probed"},
                { "osd": "395",
                  "status": "already probed"},
                { "osd": "429",
                  "status": "osd is down"}],
          "recovery_progress": { "backfill_targets": [],
              "waiting_on_backfill": [],
              "last_backfill_started": "0\/\/0\/\/-1",
              "backfill_info": { "begin": "0\/\/0\/\/-1",
                  "end": "0\/\/0\/\/-1",
                  "objects": []},
              "peer_backfill_info": [],
              "backfills_in_flight": [],
              "recovering": [],
              "pg_backend": { "pull_from_peer": [],
                  "pushing": []}},
          "scrub": { "scrubber.epoch_start": "0",
              "scrubber.active": 0,
              "scrubber.block_writes": 0,
              "scrubber.finalizing": 0,
              "scrubber.waiting_on": 0,
              "scrubber.waiting_on_whom": []}},
        { "name": "Started",
          "enter_time": "2015-01-28 19:30:11.044020"}],

The might_have_unfound section includes OSDs where Ceph tried to locate the unfound objects:

The already probed status indicates that Ceph cannot locate the unfound objects in that OSD.
The osd is down status indicates that Ceph cannot contact that OSD.

Troubleshoot the OSDs that are marked as down. See Down OSDs for details.
If you are unable to fix the problem that causes the OSD to be down, open a support ticket. See Contacting Red Hat Support for service for details.
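
In a large cluster, the might_have_unfound list can be long. The following sketch groups the listed OSDs by status so that the down OSDs stand out. It assumes the JSON shape shown in the example above, where OSD IDs are strings. The function name unfound_candidates is hypothetical, and the sample is a trimmed copy of the example output.

```python
# Trimmed copy of the example query output above; only the fields used here.
SAMPLE = {
    "state": "active+recovering",
    "recovery_state": [
        {
            "name": "Started/Primary/Active",
            "might_have_unfound": [
                {"osd": "0", "status": "already probed"},
                {"osd": "248", "status": "already probed"},
                {"osd": "301", "status": "already probed"},
                {"osd": "362", "status": "already probed"},
                {"osd": "395", "status": "already probed"},
                {"osd": "429", "status": "osd is down"},
            ],
        },
        {"name": "Started"},
    ],
}


def unfound_candidates(pg_query):
    """Group the OSD IDs in might_have_unfound by their reported status."""
    by_status = {}
    for state in pg_query.get("recovery_state", []):
        for entry in state.get("might_have_unfound", []):
            by_status.setdefault(entry["status"], []).append(int(entry["osd"]))
    return by_status


if __name__ == "__main__":
    groups = unfound_candidates(SAMPLE)
    # OSDs that Ceph could not contact are the ones to troubleshoot.
    print("down:", groups.get("osd is down", []))
    print("already probed:", groups.get("already probed", []))
```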

9.3. Listing placement groups stuck in stale, inactive, or unclean state

After a failure, placement groups enter states like degraded or peering. These states indicate normal progression through the failure recovery process.

However, if a placement group stays in one of these states for a longer time than expected, it can be an indication of a larger problem. The Monitors report when placement groups get stuck in a state that is not optimal.

The mon_pg_stuck_threshold option in the Ceph configuration file determines the number of seconds after which placement groups are considered inactive, unclean, or stale.

The following table lists these states together with a short explanation.

State	What it means	Most common causes	See
`inactive`	The PG has not been able to service read/write requests.	Peering problems	Inactive placement groups
`unclean`	The PG contains objects that are not replicated the desired number of times. Something is preventing the PG from recovering.	`unfound` objects; OSDs are `down`; incorrect configuration	Unclean placement groups
`stale`	The status of the PG has not been updated by a `ceph-osd` daemon.	OSDs are `down`	Stale placement groups
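
The ceph health detail examples in this chapter report how long a PG has been stuck, for example "stuck unclean for 803946.712780". The following sketch uses that duration to list the PGs that exceed a threshold, longest first, which can help to decide where to start. The line format is taken from that example only, the function name stuck_longest_first is hypothetical, and the default of 300 seconds mirrors the mon_pg_stuck_threshold value stated in this chapter.

```python
import re

# Assumed format, modelled on the health detail example in this chapter.
STUCK_LINE = re.compile(r"^pg (\S+) is stuck (\w+) for ([\d.]+),")


def stuck_longest_first(lines, threshold=300.0):
    """Return (pg_id, stuck_state, seconds) tuples at or above the threshold,
    sorted by duration in descending order."""
    found = []
    for raw in lines:
        match = STUCK_LINE.match(raw.strip())
        if not match:
            continue
        seconds = float(match.group(3))
        if seconds >= threshold:
            found.append((match.group(1), match.group(2), seconds))
    return sorted(found, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    sample = [
        # From the example output in this chapter.
        "pg 3.8a5 is stuck unclean for 803946.712780, current state "
        "active+recovering, last acting [320,248,0]",
        # Made-up line, below the threshold, for illustration only.
        "pg 1.2 is stuck inactive for 120.5, current state peering, "
        "last acting [1,2]",
    ]
    for pg_id, state, seconds in stuck_longest_first(sample):
        print(pg_id, state, seconds)
```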

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to the node.

Procedure

List the stuck PGs:

ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
ceph pg dump_stuck stale

Additional Resources

See the Placement group states section in the Red Hat Ceph Storage Administration Guide.

9.4. Listing placement group inconsistencies

Use the rados utility to list inconsistencies in the various replicas of an object. Use the --format=json-pretty option to produce more detailed output.

This section covers the listing of:

Inconsistent placement groups in a pool
Inconsistent objects in a placement group
Inconsistent snapshot sets in a placement group

Prerequisites

A running Red Hat Ceph Storage cluster in a healthy state.
Root-level access to the node.

Procedure

List all inconsistent placement groups in a pool:

rados list-inconsistent-pg POOL --format=json-pretty

For example, list all inconsistent placement groups in a pool named data:

rados list-inconsistent-pg data --format=json-pretty
[0.6]

List inconsistent objects in a placement group:

rados list-inconsistent-obj PLACEMENT_GROUP_ID

For example, list inconsistent objects in a placement group with ID 0.6:

rados list-inconsistent-obj 0.6
{
    "epoch": 14,
    "inconsistents": [
        {
            "object": {
                "name": "image1",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 1
            },
            "errors": [
                "data_digest_mismatch",
                "size_mismatch"
            ],
            "union_shard_errors": [
                "data_digest_mismatch_oi",
                "size_mismatch_oi"
            ],
            "selected_object_info": "0:602f83fe:::foo:head(16'1 client.4110.0:1 dirty|data_digest|omap_digest s 968 uv 1 dd e978e67f od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 0,
                    "errors": [],
                    "size": 968,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xe978e67f"
                },
                {
                    "osd": 1,
                    "errors": [],
                    "size": 968,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xe978e67f"
                },
                {
                    "osd": 2,
                    "errors": [
                        "data_digest_mismatch_oi",
                        "size_mismatch_oi"
                    ],
                    "size": 0,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xffffffff"
                }
            ]
        }
    ]
}

The following fields are important to determine what causes the inconsistency:

name: The name of the object with inconsistent replicas.
nspace: The namespace that is a logical separation of a pool. It’s empty by default.
locator: The key that is used as the alternative of the object name for placement.
snap: The snapshot ID of the object. The only writable version of the object is called head. If an object is a clone, this field includes its sequential ID.
version: The version ID of the object with inconsistent replicas. Each write operation to an object increments it.
errors: A list of errors that indicate inconsistencies between shards without determining which shard or shards are incorrect. See the shards array to further investigate the errors.
- data_digest_mismatch: The digest of the replica read from one OSD is different from the other OSDs.
- size_mismatch: The size of a clone or the head object does not match the expectation.
- read_error: This error indicates inconsistencies caused most likely by disk errors.
union_shard_errors: The union of all errors specific to shards. These errors are connected to a faulty shard. The errors that end with oi indicate that you have to compare the information from the faulty object with the information in selected_object_info. See the shards array to further investigate the errors.

In the above example, the object replica stored on osd.2 has a different digest than the replicas stored on osd.0 and osd.1. Specifically, the digest calculated from the shard read from osd.2 is 0xffffffff instead of the expected 0xe978e67f. In addition, the size of the replica read from osd.2 is 0, while the size reported by osd.0 and osd.1 is 968.
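
The comparison described above can be automated for reports that contain many objects. The following sketch flags the shards whose size and digests differ from the most common combination among the shards of the same object. This is a heuristic: the majority is not guaranteed to hold the correct data, so use the result only to decide which shards to inspect. The function name divergent_osds is hypothetical, and the sample is a trimmed copy of the example output.

```python
from collections import Counter

# Trimmed copy of the example output above; only the fields used here.
SAMPLE = {
    "epoch": 14,
    "inconsistents": [
        {
            "object": {"name": "image1", "snap": "head", "version": 1},
            "errors": ["data_digest_mismatch", "size_mismatch"],
            "shards": [
                {"osd": 0, "errors": [], "size": 968,
                 "omap_digest": "0xffffffff", "data_digest": "0xe978e67f"},
                {"osd": 1, "errors": [], "size": 968,
                 "omap_digest": "0xffffffff", "data_digest": "0xe978e67f"},
                {"osd": 2,
                 "errors": ["data_digest_mismatch_oi", "size_mismatch_oi"],
                 "size": 0,
                 "omap_digest": "0xffffffff", "data_digest": "0xffffffff"},
            ],
        }
    ],
}


def divergent_osds(report):
    """Map each object name to the OSDs whose shard differs from the majority."""
    result = {}
    for item in report.get("inconsistents", []):
        shards = item.get("shards", [])
        if not shards:
            continue
        # One fingerprint per shard: (size, data digest, omap digest).
        prints = [
            (s.get("size"), s.get("data_digest"), s.get("omap_digest"))
            for s in shards
        ]
        majority = Counter(prints).most_common(1)[0][0]
        result[item["object"]["name"]] = [
            s["osd"] for s, p in zip(shards, prints) if p != majority
        ]
    return result


if __name__ == "__main__":
    print(divergent_osds(SAMPLE))
```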

List inconsistent snapshot sets in a placement group:

rados list-inconsistent-snapset PLACEMENT_GROUP_ID

For example, list inconsistent sets of snapshots (snapsets) in a placement group with ID 0.23:

rados list-inconsistent-snapset 0.23 --format=json-pretty
{
    "epoch": 64,
    "inconsistents": [
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "0x00000001",
            "headless": true
        },
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "0x00000002",
            "headless": true
        },
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "ss_attr_missing": true,
            "extra_clones": true,
            "extra clones": [
                2,
                1
            ]
        }
    ]
}

# rados list-inconsistent-snapset 0.23 --format=json-pretty
{
    "epoch": 64,
    "inconsistents": [
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "0x00000001",
            "headless": true
        },
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "0x00000002",
            "headless": true
        },
        {
            "name": "obj5",
            "nspace": "",
            "locator": "",
            "snap": "head",
            "ss_attr_missing": true,
            "extra_clones": true,
            "extra clones": [
                2,
                1
            ]
        }
    ]

Copy to Clipboard

Toggle word wrap

The command can return the following errors:

ss_attr_missing: One or more attributes are missing. Attributes are information about snapshots encoded into a snapshot set as a list of key-value pairs.
ss_attr_corrupted: One or more attributes fail to decode.
clone_missing: A clone is missing.
snapset_mismatch: The snapshot set is inconsistent by itself.
head_mismatch: The snapshot set indicates that head exists or not, but the scrub results report otherwise.
headless: The head of the snapshot set is missing.
size_mismatch: The size of a clone or the head object does not match the expectation.
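
In the example output above, each error appears as a key whose value is true. The following sketch collects those keys for every entry so that the errors can be reviewed per object and snapshot. It assumes the JSON shape of the example, and the function name snapset_flags is hypothetical.

```python
# Copy of the entries from the example output above.
SAMPLE = {
    "epoch": 64,
    "inconsistents": [
        {"name": "obj5", "nspace": "", "locator": "",
         "snap": "0x00000001", "headless": True},
        {"name": "obj5", "nspace": "", "locator": "",
         "snap": "0x00000002", "headless": True},
        {"name": "obj5", "nspace": "", "locator": "", "snap": "head",
         "ss_attr_missing": True, "extra_clones": True,
         "extra clones": [2, 1]},
    ],
}


def snapset_flags(report):
    """Return (name, snap, flags) for each entry, where flags is the sorted
    list of keys whose value is the boolean true."""
    rows = []
    for item in report.get("inconsistents", []):
        flags = sorted(key for key, value in item.items() if value is True)
        rows.append((item["name"], item["snap"], flags))
    return rows


if __name__ == "__main__":
    for name, snap, flags in snapset_flags(SAMPLE):
        print(name, snap, ", ".join(flags))
```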

Additional Resources

Inconsistent placement groups section in the Red Hat Ceph Storage Troubleshooting Guide.
Repairing inconsistent placement groups section in the Red Hat Ceph Storage Troubleshooting Guide.

9.5. Repairing inconsistent placement groups

Due to an error during deep scrubbing, some placement groups can include inconsistencies. Ceph reports such placement groups as inconsistent:

HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
pg 0.6 is active+clean+inconsistent, acting [0,1,2]
2 scrub errors

Warning

You can repair only certain inconsistencies.

Do not repair the placement groups if the Ceph logs include the following errors:

PG.ID shard OSD: soid OBJECT digest DIGEST != known digest DIGEST
PG.ID shard OSD: soid OBJECT omap_digest DIGEST != known omap_digest DIGEST

Open a support ticket instead. See Contacting Red Hat Support for service for details.

Prerequisites

Root-level access to the Ceph Monitor node.

Procedure

Repair the inconsistent placement groups:

ceph pg repair ID

Replace ID with the ID of the inconsistent placement group.

Additional Resources

Inconsistent placement groups section in the Red Hat Ceph Storage Troubleshooting Guide.
Listing placement group inconsistencies section in the Red Hat Ceph Storage Troubleshooting Guide.

9.6. Increasing the placement group

An insufficient placement group (PG) count impacts the performance of the Ceph cluster and data distribution. It is one of the main causes of the nearfull osds error messages.

The recommended ratio is between 100 and 300 PGs per OSD. This ratio can decrease when you add more OSDs to the cluster.

The pg_num and pgp_num parameters determine the PG count. These parameters are configured per pool; therefore, you must adjust each pool with a low PG count separately.
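
To see whether a cluster falls inside the recommended range, the ratio can be estimated from the per-pool settings. The following sketch assumes that every replica of a PG counts toward the ratio, so the estimate is the sum of pg_num multiplied by the replica count over all pools, divided by the number of OSDs. That formula, the function names, and the pool values in the demo are illustrative assumptions; the PG calculator remains the reference for choosing the final values.

```python
def pgs_per_osd(pools, osd_count):
    """Estimate the PG-to-OSD ratio.

    pools maps a pool name to a (pg_num, replica_size) tuple.
    """
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    total = sum(pg_num * size for pg_num, size in pools.values())
    return total / osd_count


def assess_ratio(ratio, low=100, high=300):
    """Compare a ratio with the recommended range of 100 to 300 PGs per OSD."""
    if ratio < low:
        return "low"
    if ratio > high:
        return "high"
    return "ok"


if __name__ == "__main__":
    # Hypothetical cluster: two replicated pools on 12 OSDs.
    pools = {"data": (128, 3), "images": (64, 3)}
    ratio = pgs_per_osd(pools, osd_count=12)
    print("%.1f PGs per OSD: %s" % (ratio, assess_ratio(ratio)))
```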

Important

Increasing the PG count is the most intensive process that you can perform on a Ceph cluster. This process might have a serious performance impact if not done in a slow and methodical way. Once you increase pgp_num, you cannot stop or reverse the process, and you must complete it. Consider increasing the PG count outside of business-critical processing hours, and alert all clients about the potential performance impact. Do not change the PG count if the cluster is in the HEALTH_ERR state.

Prerequisites

A running Red Hat Ceph Storage cluster in a healthy state.
Root-level access to the node.

Procedure

Reduce the impact of data redistribution and recovery on individual OSDs and OSD hosts:

Lower the value of the osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority parameters:

[root@mon ~]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'

Disable the shallow and deep scrubbing:

[root@mon ~]# ceph osd set noscrub
[root@mon ~]# ceph osd set nodeep-scrub

Use the Ceph Placement Groups (PGs) per Pool Calculator to calculate the optimal value of the pg_num and pgp_num parameters.
Increase the pg_num value in small increments until you reach the desired value:
1. Determine the starting increment value. Use a very low value that is a power of two, and increase it once you have determined the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.
2. Increment the pg_num value:
  ceph osd pool set POOL pg_num VALUE
  Specify the pool name and the new value, for example:
  [root@mon ~]# ceph osd pool set data pg_num 4
3. Monitor the status of the cluster:
  [root@mon ~]# ceph -s
  The PG states change from creating to active+clean. Wait until all PGs are in the active+clean state.
Increase the pgp_num value in small increments until you reach the desired value:
1. Determine the starting increment value. Use a very low value that is a power of two, and increase it once you have determined the impact on the cluster. The optimal value depends on the pool size, OSD count, and client I/O load.
2. Increment the pgp_num value:
  ceph osd pool set POOL pgp_num VALUE
  Specify the pool name and the new value, for example:
  [root@mon ~]# ceph osd pool set data pgp_num 4
3. Monitor the status of the cluster:
  [root@mon ~]# ceph -s
  The PG states change through peering, wait_backfill, backfilling, recovering, and others. Wait until all PGs are in the active+clean state.
Repeat the previous steps for all pools with insufficient PG count.
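The incremental approach above can be planned ahead of time. The following is a minimal sketch, assuming a fixed increment that you have already chosen. The function name pg_steps, the pool name data, and the numbers are hypothetical. The sketch only prints the commands for review and does not change the cluster.

```shell
# Print each intermediate value from CURRENT up to TARGET in fixed STEP
# increments, never exceeding TARGET.
# Usage: pg_steps CURRENT TARGET STEP
pg_steps() {
    current=$1
    target=$2
    step=$3
    # Refuse a zero or negative step, which would loop forever.
    [ "$step" -ge 1 ] || return 1
    while [ "$current" -lt "$target" ]; do
        current=$((current + step))
        if [ "$current" -gt "$target" ]; then
            current=$target
        fi
        echo "$current"
    done
}

# Hypothetical plan: grow the pool "data" from 64 to 96 PGs, 8 at a time.
for value in $(pg_steps 64 96 8); do
    echo "ceph osd pool set data pg_num $value"
done
```

Run each printed command by hand, and check ceph -s until all PGs are active+clean before moving to the next value. The same plan applies to pgp_num.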

Set osd_max_backfills, osd_recovery_max_active, and osd_recovery_op_priority to their default values:

[root@mon ~]# ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 3 --osd_recovery_op_priority 3'

Enable the shallow and deep scrubbing:

[root@mon ~]# ceph osd unset noscrub
[root@mon ~]# ceph osd unset nodeep-scrub

Additional Resources

Nearfull OSDs
The Monitoring Placement Group Sets section in the Administration Guide for Red Hat Ceph Storage 4

9.7. Additional Resources

See Chapter 3, Troubleshooting networking issues for details.
See Chapter 4, Troubleshooting Ceph Monitors for details about troubleshooting the most common errors related to Ceph Monitors.
See Chapter 5, Troubleshooting Ceph OSDs for details about troubleshooting the most common errors related to Ceph OSDs.
