Chapter 6. Troubleshooting a multisite Ceph Object Gateway
This chapter contains information on how to fix the most common errors related to the configuration and operation of multisite Ceph Object Gateways.
6.1. Prerequisites
- A running Red Hat Ceph Storage 3 environment.
- A running Ceph Object Gateway.
6.2. Error code definitions for the Ceph Object Gateway
The Ceph Object Gateway logs contain error and warning messages to assist in troubleshooting conditions in your environment. Some common ones are listed below with suggested resolutions. Contact Red Hat Support for any additional assistance.
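To locate these messages, you can search the Ceph Object Gateway log on the gateway node. The log path below is an assumption based on the default log location and naming; adjust it for your environment.
# grep -E 'ERROR|WARNING' /var/log/ceph/ceph-rgw-*.log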
Common error messages
data_sync: ERROR: a sync operation returned error
- This is the high-level data sync process complaining that a lower-level bucket sync process returned an error. This message is redundant; the bucket sync error appears above it in the log.
data sync: ERROR: failed to sync object: <bucket name>:<object name>
- Either the process failed to fetch the required object over HTTP from a remote gateway or the process failed to write that object to RADOS and it will be tried again.
data sync: ERROR: failure in sync, backing out (sync_status=-2)
- A low-level message reflecting one of the above conditions, specifically that the data was deleted before it could sync and thus showing a -2 ENOENT status.
data sync: ERROR: failure in sync, backing out (sync_status=-5)
- A low-level message reflecting one of the above conditions, specifically that the process failed to write the object to RADOS and thus showing a -5 EIO status.
ERROR: failed to fetch remote data log info: ret=11
- This is the EAGAIN generic error code from libcurl reflecting an error condition from another gateway. It will try again by default.
meta sync: ERROR: failed to read mdlog info with (2) No such file or directory
- The shard of the mdlog was never created so there is nothing to sync.
Syncing error messages
failed to sync object
- Either the process failed to fetch this object over HTTP from a remote gateway or it failed to write that object to RADOS and it will be tried again.
failed to sync bucket instance: (11) Resource temporarily unavailable
- A connection issue between primary and secondary zones.
failed to sync bucket instance: (125) Operation canceled
- A racing condition exists between writes to the same RADOS object.
6.3. Syncing a multisite Ceph Object Gateway
A multisite sync reads the change log from other zones. To get a high-level view of the sync progress from the metadata and the data logs, use the following command:
radosgw-admin sync status
This command lists which log shards, if any, are behind their source zone.
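A healthy multisite configuration produces output similar to the following. The exact layout can vary between releases, and the realm, zone group, and zone names and IDs shown here are placeholders for illustration:
[root@rgw ~]# radosgw-admin sync status
          realm a5e44ecd-7aae-4e39-b743-3a709acb60c5 (movies)
      zonegroup 1416e1e2-3fa0-4d72-b3a5-4a66a44f61fc (us)
           zone 6f3a1f47-88d1-4f3c-9d62-3c1f1b5a9c1e (us-east)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 7f6b1d35-7d43-4c2f-9a3c-3c7a8d8e1a2b (us-west)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
If sync is lagging, the metadata sync or data sync lines instead report the number of shards that are behind.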
If the output of the sync status command reports that log shards are behind, run the following command, substituting the shard ID for X.
radosgw-admin data sync status --shard-id=X
Replace X with the ID number of the shard.
Example
[root@rgw ~]# radosgw-admin data sync status --shard-id=27
{
    "shard_id": 27,
    "marker": {
        "status": "incremental-sync",
        "marker": "1_1534494893.816775_131867195.1",
        "next_step_marker": "",
        "total_entries": 1,
        "pos": 0,
        "timestamp": "0.000000"
    },
    "pending_buckets": [],
    "recovering_buckets": [
        "pro-registry:4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2.314303.1:26"
    ]
}
The output lists which buckets are next to sync and which buckets, if any, are going to be retried due to previous errors.
Inspect the status of individual buckets with the following command, substituting the bucket ID for X.
radosgw-admin bucket sync status --bucket=X
Replace X with the ID number of the bucket.
The result shows which bucket index log shards are behind their source zone.
A common sync error is EBUSY, which means that a sync is already in progress, often on another gateway. Errors are written to the sync error log, which can be read with the following command:
radosgw-admin sync error list
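The error log is returned as JSON, grouped by log shard. The trimmed example below shows the general shape of the output; the entry ID, timestamp, zone ID, and bucket name are illustrative:
[root@rgw ~]# radosgw-admin sync error list
[
    {
        "shard_id": 22,
        "entries": [
            {
                "id": "1_1526895796.627719_5498.1",
                "section": "data",
                "name": "pro-registry:4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2.314303.1",
                "timestamp": "2018-05-21 10:23:16.627719Z",
                "info": {
                    "source_zone": "4ed07bb2-a80b-4c69-aa15-fdc17ae6f5f2",
                    "error_code": 5,
                    "message": "failed to sync bucket instance: (5) Input/output error"
                }
            }
        ]
    }
]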
The syncing process tries again until it is successful. Errors can still occur that may require intervention.
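After the underlying condition has been resolved, the accumulated entries can be cleared from the sync error log with the trim subcommand. Depending on the release, the command may require marker or date options to bound what is removed:
radosgw-admin sync error trim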
6.3.1. Performance counters for multi-site Ceph Object Gateway data sync
The following performance counters are available for multi-site configurations of the Ceph Object Gateway to measure data sync:
- poll_latency measures the latency of requests for remote replication logs.
- fetch_bytes measures the number of objects and bytes fetched by data sync.
Use the ceph daemon .. perf dump command to view the current metric data for the performance counters:
# ceph daemon /var/run/ceph/{rgw}.asok perf dump
Example output:
{ "data-sync-from-us-west": { "fetch bytes": { "avgcount": 54, "sum": 54526039885 }, "fetch not modified": 7, "fetch errors": 0, "poll latency": { "avgcount": 41, "sum": 2.533653367, "avgtime": 0.061796423 }, "poll errors": 0 } }
You must run the ceph daemon command from the node running the daemon.
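To identify the exact administration socket name to use on that node, list the socket directory. The client name in the example output below is a placeholder:
[root@rgw ~]# ls /var/run/ceph/
ceph-client.rgw.rgw1.asok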
Additional Resources
- For more information about performance counters, see the Performance Counters section in the Administration Guide for Red Hat Ceph Storage 3