Chapter 1. Initial Troubleshooting

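As a storage administrator, you can perform the initial troubleshooting of a Red Hat Ceph Storage cluster before contacting Red Hat support. This chapter explains how to identify problems, diagnose the health of a storage cluster, interpret Ceph health states, mute health alerts, read Ceph logs, and generate an sos report.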

Prerequisites

A running Red Hat Ceph Storage cluster.

1.1. Identifying problems

To determine possible causes of the error with the Red Hat Ceph Storage cluster, answer the questions in the Procedure section.

Prerequisites

A running Red Hat Ceph Storage cluster.

Procedure

Certain problems can arise when using unsupported configurations. Ensure that your configuration is supported.
Do you know which Ceph component causes the problem?
1. No. Follow the Diagnosing the health of a Ceph storage cluster procedure in the Red Hat Ceph Storage Troubleshooting Guide.
2. Ceph Monitors. See the Troubleshooting Ceph Monitors section in the Red Hat Ceph Storage Troubleshooting Guide.
3. Ceph OSDs. See the Troubleshooting Ceph OSDs section in the Red Hat Ceph Storage Troubleshooting Guide.
4. Ceph placement groups. See the Troubleshooting Ceph placement groups section in the Red Hat Ceph Storage Troubleshooting Guide.
5. Multi-site Ceph Object Gateway. See the Troubleshooting a multi-site Ceph Object Gateway section in the Red Hat Ceph Storage Troubleshooting Guide.

Additional Resources

See the Red Hat Ceph Storage: Supported configurations article for details.

1.2. Diagnosing the health of a storage cluster

This procedure lists basic steps to diagnose the health of a Red Hat Ceph Storage cluster.

Prerequisites

A running Red Hat Ceph Storage cluster.

Procedure

Log in to the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```
Check the overall status of the storage cluster:
Example
```
[ceph: root@host01 /]# ceph health detail
```
If the command returns HEALTH_WARN or HEALTH_ERR, see Understanding Ceph health for details.
Monitor the logs of the storage cluster:
Example
```
[ceph: root@host01 /]# ceph -W cephadm
```
To capture the logs of the cluster to a file, run the following commands:
Example
```
[ceph: root@host01 /]# ceph config set global log_to_file true
[ceph: root@host01 /]# ceph config set global mon_cluster_log_to_file true
```
The logs are located by default in the /var/log/ceph/CLUSTER_FSID/ directory. Check the Ceph logs for any error messages listed in Understanding Ceph logs.
If the logs do not include a sufficient amount of information, increase the debugging level and try to reproduce the action that failed. See Configuring logging for details.
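The decision made after the `ceph health detail` step can be sketched as a small shell function. This is only an illustration: `suggest_next_step` is a hypothetical helper, not part of Ceph, and the wording of each suggestion is an assumption that loosely follows the procedure above.

```shell
# Hypothetical helper, not part of Ceph: map the first line of
# `ceph health detail` output to a suggested next step.
suggest_next_step() {
    # $1: first line of the health output, for example "HEALTH_WARN 1 osds down"
    case "$1" in
        HEALTH_OK*)   echo "ok: no action needed" ;;
        HEALTH_WARN*) echo "warn: read the health checks, then the logs" ;;
        HEALTH_ERR*)  echo "err: investigate immediately" ;;
        *)            echo "unknown: confirm that the cluster is reachable" ;;
    esac
}

# Demonstration on a captured line, so the sketch runs without a cluster.
suggest_next_step "HEALTH_WARN 1 osds down"
```

Inside the Cephadm shell, you might pass `"$(ceph health detail | head -n 1)"` as the argument instead of the captured line.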

1.3. Understanding Ceph health

The ceph health command returns information about the status of the Red Hat Ceph Storage cluster:

HEALTH_OK indicates that the cluster is healthy.
HEALTH_WARN indicates a warning. In some cases, the Ceph status returns to HEALTH_OK automatically, for example when the Red Hat Ceph Storage cluster finishes the rebalancing process. However, consider further troubleshooting if a cluster remains in the HEALTH_WARN state for an extended period.
HEALTH_ERR indicates a more serious problem that requires your immediate attention.

Use the ceph health detail and ceph -s commands to get a more detailed output.
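When the detailed output is long, it can help to reduce it to one line per health check. The following sketch assumes the `ceph health detail` text layout in which each check starts with `[WRN]` or `[ERR]`, followed by the check code and a colon. `list_health_checks` and `sample_output` are hypothetical names used only for illustration.

```shell
# Hypothetical helper: read `ceph health detail` text on stdin and print
# one "SEVERITY CODE" pair per health check.
list_health_checks() {
    awk '$1 == "[WRN]" || $1 == "[ERR]" {
        sev = substr($1, 2, 3)   # "[WRN]" -> "WRN"
        code = $2
        sub(/:$/, "", code)      # "OSD_DOWN:" -> "OSD_DOWN"
        print sev, code
    }'
}

# Captured sample output, so the sketch runs without a cluster.
sample_output() {
    cat <<'EOF'
HEALTH_WARN 1 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_DOWN: 1 osds down
    osd.1 (root=default,host=host01) is down
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
    osd.1 has flags noup
EOF
}

sample_output | list_health_checks
```

On a live cluster, the equivalent call would be `ceph health detail | list_health_checks`.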

Note

A health warning is displayed if there is no mgr daemon running. If the last mgr daemon of a Red Hat Ceph Storage cluster was removed, you can manually deploy a mgr daemon on any host of the Red Hat Ceph Storage cluster. See Manually deploying a mgr daemon in the Red Hat Ceph Storage 8 Administration Guide.

Additional Resources

See the Ceph Monitor error messages table in the Red Hat Ceph Storage Troubleshooting Guide.
See the Ceph OSD error messages table in the Red Hat Ceph Storage Troubleshooting Guide.
See the Placement group error messages table in the Red Hat Ceph Storage Troubleshooting Guide.

1.4. Muting health alerts of a Ceph cluster

In certain scenarios, you might want to temporarily mute some warnings because you are already aware of them and cannot act on them right away. You can mute health checks so that they do not affect the overall reported status of the Ceph cluster.

Alerts are specified using the health check codes. For example, when an OSD is brought down for maintenance, OSD_DOWN warnings are expected. You can mute the warning until the maintenance is over, because otherwise it keeps the cluster in HEALTH_WARN instead of HEALTH_OK for the entire duration of the maintenance.

Most health mutes also disappear if the extent of an alert gets worse. For example, if one OSD is down and the alert is muted, the mute disappears if one or more additional OSDs go down. This is true for any health alert that involves a count indicating how much or how many of something is triggering the warning or error.
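The count rule can be modeled in a few lines of shell. This is a sketch of the behavior described above, not Ceph's implementation: `mute_persists` is a hypothetical function, and it deliberately ignores expiry and what happens when the alert clears.

```shell
# Model of the count rule only (an assumption-based sketch, not Ceph code).
# $1: count reported when the mute was set, for example 1 OSD down
# $2: count reported now
# Succeeds (exit status 0) while the alert has not grown.
mute_persists() {
    [ "$2" -le "$1" ]
}

# One OSD was down when the alert was muted; now two are down.
if mute_persists 1 2; then
    echo "mute kept"
else
    echo "mute dropped"
fi
```

With the values shown, the sketch prints `mute dropped`, which matches the example of an additional OSD going down.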

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to the nodes.
A health warning message.

Procedure

Log in to the Cephadm shell:
Example
```
[root@host01 ~]# cephadm shell
```

Check the health of the Red Hat Ceph Storage cluster by running the ceph health detail command:

Example
```
[ceph: root@host01 /]# ceph health detail
HEALTH_WARN 1 osds down; 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
[WRN] OSD_DOWN: 1 osds down
    osd.1 (root=default,host=host01) is down
[WRN] OSD_FLAGS: 1 OSDs or CRUSH {nodes, device-classes} have {NOUP,NODOWN,NOIN,NOOUT} flags set
    osd.1 has flags noup
```

You can see that the storage cluster is in HEALTH_WARN status as one of the OSDs is down.

Mute the alert:

Syntax
```
ceph health mute HEALTH_MESSAGE
```
Example
```
[ceph: root@host01 /]# ceph health mute OSD_DOWN
```

Optional: A health check mute can have a time to live (TTL) associated with it, such that the mute automatically expires after the specified period of time has elapsed. Specify the TTL as an optional duration argument in the command:
Syntax
```
ceph health mute HEALTH_MESSAGE DURATION
```
DURATION can be specified in s, sec, m, min, h, or hour.
Example
```
[ceph: root@host01 /]# ceph health mute OSD_DOWN 10m
```
In this example, the alert OSD_DOWN is muted for 10 minutes.

Verify that the Red Hat Ceph Storage cluster status has changed to HEALTH_OK:

Example
```
[ceph: root@host01 /]# ceph -s
  cluster:
    id:     81a4597a-b711-11eb-8cb8-001a4a000740
    health: HEALTH_OK
            (muted: OSD_DOWN(9m) OSD_FLAGS(9m))

  services:
    mon: 3 daemons, quorum host01,host02,host03 (age 33h)
    mgr: host01.pzhfuh(active, since 33h), standbys: host02.wsnngf, host03.xwzphg
    osd: 11 osds: 10 up (since 4m), 11 in (since 5d)

  data:
    pools:   1 pools, 1 pgs
    objects: 13 objects, 0 B
    usage:   85 MiB used, 165 GiB / 165 GiB avail
    pgs:     1 active+clean
```

In this example, you can see that the OSD_DOWN and OSD_FLAGS alerts are muted and that the mutes remain active for nine more minutes.

Optional: You can retain the mute even after the alert is cleared by making it sticky.

Syntax
```
ceph health mute HEALTH_MESSAGE DURATION --sticky
```
Example
```
[ceph: root@host01 /]# ceph health mute OSD_DOWN 1h --sticky
```

You can remove the mute by running the following command:

Syntax
```
ceph health unmute HEALTH_MESSAGE
```
Example
```
[ceph: root@host01 /]# ceph health unmute OSD_DOWN
```
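Before scripting a mute, you might want to check the DURATION value locally. The following sketch converts a duration to seconds and accepts only the units listed in this procedure (s, sec, m, min, h, hour). `duration_to_seconds` is a hypothetical helper; whether Ceph accepts other units or formats is not covered here.

```shell
# Hypothetical helper: convert a DURATION such as 10m or 1h to seconds.
# Only the units named in this procedure are accepted.
duration_to_seconds() {
    local value unit factor
    value="${1%%[!0-9]*}"    # leading digits, for example "10" from "10m"
    unit="${1#"$value"}"     # remainder, for example "m"
    if [ -z "$value" ]; then
        echo "unsupported duration: $1" >&2
        return 1
    fi
    case "$unit" in
        s|sec)  factor=1 ;;
        m|min)  factor=60 ;;
        h|hour) factor=3600 ;;
        *)      echo "unsupported duration: $1" >&2; return 1 ;;
    esac
    # Strip leading zeros so the shell does not read the number as octal.
    while [ "${#value}" -gt 1 ] && [ "${value#0}" != "$value" ]; do
        value="${value#0}"
    done
    echo $(( $value * $factor ))
}

duration_to_seconds 10m   # prints 600
duration_to_seconds 1h    # prints 3600
```

A wrapper script could call this function first and run `ceph health mute` only when it succeeds.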

Additional Resources

See the Health messages of a Ceph cluster section in the Red Hat Ceph Storage Troubleshooting Guide for details.

1.5. Understanding Ceph logs

Ceph stores its logs in the /var/log/ceph/CLUSTER_FSID/ directory after logging to files is enabled.

The CLUSTER_NAME.log is the main storage cluster log file that includes global events. By default, the log file name is ceph.log. Only the Ceph Monitor nodes include the main storage cluster log.

Each Ceph OSD and Monitor has its own log file, named CLUSTER_NAME-osd.NUMBER.log and CLUSTER_NAME-mon.HOSTNAME.log.

When you increase the debugging level for Ceph subsystems, Ceph generates new log files for those subsystems as well.
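The naming scheme above can be expressed as a few path-building functions. This is a sketch under stated assumptions: the function names are hypothetical, the FSID is the sample value from the `ceph -s` example in this chapter, and `ceph` is the default cluster name mentioned above.

```shell
# Hypothetical helpers that build log file paths from the naming scheme.
# Arguments: cluster FSID, cluster name, then the OSD number or Monitor host.
cluster_log_path() { echo "/var/log/ceph/$1/$2.log"; }
osd_log_path()     { echo "/var/log/ceph/$1/$2-osd.$3.log"; }
mon_log_path()     { echo "/var/log/ceph/$1/$2-mon.$3.log"; }

fsid="81a4597a-b711-11eb-8cb8-001a4a000740"   # sample FSID, replace with your own

cluster_log_path "$fsid" ceph
osd_log_path "$fsid" ceph 1
mon_log_path "$fsid" ceph host01
```

The printed paths can then be passed to tools such as `less` or `grep` on the node that hosts the daemon.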

Additional Resources

For details about logging, see Configuring logging in the Red Hat Ceph Storage Troubleshooting Guide.
See the Common Ceph Monitor error messages in the Ceph logs table in the Red Hat Ceph Storage Troubleshooting Guide.
See the Common Ceph OSD error messages in the Ceph logs table in the Red Hat Ceph Storage Troubleshooting Guide.
To enable logging to files, see Ceph daemon logs.

1.6. Generating an `sos report`

You can run the sos report command to collect the configuration details, system information, and diagnostic information of a Red Hat Ceph Storage cluster from a Red Hat Enterprise Linux host. The Red Hat Support team uses this information for further troubleshooting of the storage cluster.

Prerequisites

A running Red Hat Ceph Storage cluster.
Root-level access to the nodes.

Procedure

Install the sos package:
Example
```
[root@host01 ~]# dnf install sos
```
Run the sos report command to collect the system information of the storage cluster:
Example
```
[root@host01 ~]# sosreport -a --all-logs
```
The report is saved in the /var/tmp directory.
Run the following command for specific Ceph daemon information:
Example
```
[root@host01 ~]# sos report --all-logs -e ceph_mgr,ceph_common,ceph_mon,ceph_osd,ceph_ansible,ceph_mds,ceph_rgw
```
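If you need only some of the Ceph plugins, the comma-separated list for `-e` can be assembled from individual names. The following sketch only prints the command and does not run it. `build_sos_command` is a hypothetical helper, and it assumes that the plugin names you pass exist in your installed sos version.

```shell
# Hypothetical helper: print an sos report command for the given plugins.
# It prints the command only; review it before running it as root.
build_sos_command() {
    if [ "$#" -eq 0 ]; then
        echo "at least one plugin name is required" >&2
        return 1
    fi
    local IFS=,              # join the arguments with commas in "$*"
    echo "sos report --all-logs -e $*"
}

build_sos_command ceph_mon ceph_osd
# prints: sos report --all-logs -e ceph_mon,ceph_osd
```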

Additional Resources

See the What is an sosreport and how to create one in Red Hat Enterprise Linux? KnowledgeBase article for more information.
