Chapter 1. Initial Troubleshooting

This chapter includes information on:

How to start troubleshooting Ceph errors (Section 1.1, “Identifying Problems”)
Most common ceph health error messages (Section 1.2, “Understanding the Output of the ceph health Command”)
Most common Ceph log error messages (Section 1.3, “Understanding Ceph Logs”)

1.1. Identifying Problems

Certain problems can arise when using unsupported configurations. Ensure that your configuration is supported. See the Red Hat Ceph Storage: Supported configurations article for details.

To determine possible causes of an error you encounter with Red Hat Ceph Storage, answer the following question:

Do you know which Ceph component causes the problem?
1. No. Follow Section 1.1.1, “Diagnosing the Health of a Ceph Storage Cluster”.
2. Monitors. See Chapter 4, Troubleshooting Monitors.
3. OSDs. See Chapter 5, Troubleshooting OSDs.
4. Placement groups. See Chapter 6, Troubleshooting Placement Groups.

1.1.1. Diagnosing the Health of a Ceph Storage Cluster

This procedure lists basic steps to diagnose the health of a Ceph Storage Cluster.

Check the overall status of the cluster:
```
ceph health detail
```
If the command returns HEALTH_WARN or HEALTH_ERR, see Section 1.2, “Understanding the Output of the ceph health Command” for details.
Check the Ceph logs for any error messages listed in Section 1.3, “Understanding Ceph Logs”. The logs are located by default in the /var/log/ceph/ directory.
If the logs do not include enough information, increase the debugging level and try to reproduce the action that failed. See Chapter 2, Configuring Logging for details.
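
The first steps of this procedure could be scripted roughly as follows. This is a minimal sketch, not part of the product: the function names `run_health_detail`, `first_status`, and `next_step` are hypothetical, and it assumes the `ceph` command is on the PATH and that the status appears as a standalone `HEALTH_*` token in the output.

```python
import subprocess
from typing import Optional

# The three states described in Section 1.2.
KNOWN_STATES = ("HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR")


def run_health_detail() -> str:
    """Run `ceph health detail` and return its standard output as text."""
    result = subprocess.run(
        ["ceph", "health", "detail"], capture_output=True, text=True
    )
    return result.stdout


def first_status(output: str) -> Optional[str]:
    """Return the first HEALTH_* token found in the output, or None."""
    for token in output.split():
        if token in KNOWN_STATES:
            return token
    return None


def next_step(status: Optional[str]) -> str:
    """Map a status to the next step of the procedure (hypothetical wording)."""
    if status in ("HEALTH_WARN", "HEALTH_ERR"):
        return "See Section 1.2, then check the logs in /var/log/ceph/ (Section 1.3)."
    if status == "HEALTH_OK":
        return "Cluster reports healthy; check the logs in /var/log/ceph/ if the problem persists."
    return "No health status found; verify that the ceph command ran successfully."
```

On a node with cluster access, `next_step(first_status(run_health_detail()))` would return the suggested next step as text.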

1.2. Understanding the Output of the ceph health Command

The ceph health command returns information about the status of the Ceph Storage Cluster:

HEALTH_OK indicates that the cluster is healthy.
HEALTH_WARN indicates a warning. In some cases, the Ceph status returns to HEALTH_OK automatically, for example when Ceph finishes the rebalancing process. However, consider further troubleshooting if a cluster remains in the HEALTH_WARN state for an extended period.
HEALTH_ERR indicates a more serious problem that requires your immediate attention.

Use the ceph health detail and ceph -s commands to get more detailed output.
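
One way to decide when a HEALTH_WARN state has lasted long enough to investigate is to record the status periodically and measure the latest unbroken run of warnings. The sketch below assumes you collect `(timestamp_in_seconds, status)` samples yourself; the function names and the 1800-second threshold are illustrative assumptions, not values recommended by Ceph or Red Hat.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, str]  # (timestamp in seconds, HEALTH_* status)


def trailing_warn_seconds(samples: Sequence[Sample]) -> float:
    """Return how long the most recent unbroken run of HEALTH_WARN has lasted."""
    if not samples or samples[-1][1] != "HEALTH_WARN":
        return 0.0
    start = samples[-1][0]
    # Walk backwards until the status stops being HEALTH_WARN.
    for timestamp, status in reversed(samples):
        if status != "HEALTH_WARN":
            break
        start = timestamp
    return samples[-1][0] - start


def needs_attention(samples: Sequence[Sample],
                    max_warn_seconds: float = 1800.0) -> bool:
    """HEALTH_ERR always needs attention; HEALTH_WARN only once it persists.

    max_warn_seconds is a hypothetical threshold; tune it to your cluster.
    """
    if samples and samples[-1][1] == "HEALTH_ERR":
        return True
    return trailing_warn_seconds(samples) >= max_warn_seconds
```

A warning that clears on its own, for example after rebalancing finishes, resets the run to zero, so only a persistent HEALTH_WARN crosses the threshold.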

The following tables list the most common HEALTH_ERR and HEALTH_WARN error messages related to Monitors, OSDs, and placement groups. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix problems.

Table 1.1. Error Messages Related to Monitors
Error message	See
`HEALTH_WARN`
`mon.X is down (out of quorum)`	Section 4.1.1, “A Monitor Is Out of Quorum”
`clock skew`	Section 4.1.2, “Clock Skew”
`store is getting too big!`	Section 4.1.3, “The Monitor Store is Getting Too Big”

Table 1.2. Error Messages Related to OSDs
Error message	See
`HEALTH_ERR`
`full osds`	Section 5.1.1, “Full OSDs”
`HEALTH_WARN`
`nearfull osds`	Section 5.1.2, “Nearfull OSDs”
`osds are down`	Section 5.1.3, “One or More OSDs Are Down”; Section 5.1.4, “Flapping OSDs”
`requests are blocked`	Section 5.1.5, “Slow Requests, and Requests are Blocked”
`slow requests`	Section 5.1.5, “Slow Requests, and Requests are Blocked”

Table 1.3. Error Messages Related to Placement Groups
Error message	See
`HEALTH_ERR`
`pgs down`	Section 6.1.5, “Placement Groups Are `down`”
`pgs inconsistent`	Section 6.1.2, “Inconsistent Placement Groups”
`scrub errors`	Section 6.1.2, “Inconsistent Placement Groups”
`HEALTH_WARN`
`pgs stale`	Section 6.1.1, “Stale Placement Groups”
`unfound`	Section 6.1.6, “Unfound Objects”
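
The three tables above can be treated as a lookup from message text to the section that explains it. The following sketch transcribes them into regular expressions; the function name `sections_for` is hypothetical, the `mon.X` placeholder is assumed to stand for any Monitor name, and the exact wording of real health output may differ between Ceph releases. A word boundary keeps `full osds` from matching inside `nearfull osds`.

```python
import re
from typing import Dict, List

# Patterns transcribed from Tables 1.1-1.3; values are the sections they cite.
HEALTH_MESSAGES: Dict[str, List[str]] = {
    r"mon\.\S+ is down \(out of quorum\)": ["Section 4.1.1"],
    r"\bclock skew": ["Section 4.1.2"],
    r"\bstore is getting too big!": ["Section 4.1.3"],
    r"\bfull osds": ["Section 5.1.1"],
    r"\bnearfull osds": ["Section 5.1.2"],
    r"\bosds are down": ["Section 5.1.3", "Section 5.1.4"],
    r"\brequests are blocked": ["Section 5.1.5"],
    r"\bslow requests": ["Section 5.1.5"],
    r"\bpgs down": ["Section 6.1.5"],
    r"\bpgs inconsistent": ["Section 6.1.2"],
    r"\bscrub errors": ["Section 6.1.2"],
    r"\bpgs stale": ["Section 6.1.1"],
    r"\bunfound": ["Section 6.1.6"],
}


def sections_for(health_text: str) -> List[str]:
    """Return the sections to read for the messages found, without duplicates."""
    found: List[str] = []
    for pattern, sections in HEALTH_MESSAGES.items():
        if re.search(pattern, health_text):
            for section in sections:
                if section not in found:
                    found.append(section)
    return found
```

Passing the text returned by `ceph health detail` to `sections_for` yields the section references in table order.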

1.3. Understanding Ceph Logs

By default, Ceph stores its logs in the /var/log/ceph/ directory.

The <cluster-name>.log is the main cluster log file that includes the global cluster events. By default, this log is named ceph.log. Only the Monitor hosts include the main cluster log.

Each OSD and Monitor has its own log file, named <cluster-name>-osd.<number>.log and <cluster-name>-mon.<hostname>.log.
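
The naming scheme can be expressed as a few path helpers. This is a small sketch that assumes the default log directory and the default cluster name `ceph`; the function names and the example host name `host01` are hypothetical.

```python
from pathlib import PurePosixPath

# Default log directory named in this section.
LOG_DIR = PurePosixPath("/var/log/ceph")


def cluster_log(cluster: str = "ceph") -> PurePosixPath:
    """Main cluster log: <cluster-name>.log (present on Monitor hosts only)."""
    return LOG_DIR / f"{cluster}.log"


def osd_log(number: int, cluster: str = "ceph") -> PurePosixPath:
    """Per-OSD log: <cluster-name>-osd.<number>.log."""
    return LOG_DIR / f"{cluster}-osd.{number}.log"


def mon_log(hostname: str, cluster: str = "ceph") -> PurePosixPath:
    """Per-Monitor log: <cluster-name>-mon.<hostname>.log."""
    return LOG_DIR / f"{cluster}-mon.{hostname}.log"
```

For example, `osd_log(3)` gives `/var/log/ceph/ceph-osd.3.log`, and `mon_log("host01")` gives `/var/log/ceph/ceph-mon.host01.log`.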

When you increase the debugging level for Ceph subsystems, Ceph generates new log files for those subsystems as well. For details about logging, see Chapter 2, Configuring Logging.

The following tables list the most common Ceph log error messages related to Monitors and OSDs. The tables provide links to corresponding sections that explain the errors and point to specific procedures to fix them.

Table 1.4. Common Error Messages in Ceph Logs Related to Monitors
Error message	Log file	See
`clock skew`	Main cluster log	Section 4.1.2, “Clock Skew”
`clocks not synchronized`	Main cluster log	Section 4.1.2, “Clock Skew”
`Corruption: error in middle of record`	Monitor log	Section 4.1.1, “A Monitor Is Out of Quorum”; Section 4.3, “Recovering the Monitor Store”
`Corruption: 1 missing files`	Monitor log	Section 4.1.1, “A Monitor Is Out of Quorum”; Section 4.3, “Recovering the Monitor Store”
`Caught signal (Bus error)`	Monitor log	Section 4.1.1, “A Monitor Is Out of Quorum”

Table 1.5. Common Error Messages in Ceph Logs Related to OSDs
Error message	Log file	See
`heartbeat_check: no reply from osd.X`	Main cluster log	Section 5.1.4, “Flapping OSDs”
`wrongly marked me down`	Main cluster log	Section 5.1.4, “Flapping OSDs”
`osds have slow requests`	Main cluster log	Section 5.1.5, “Slow Requests, and Requests are Blocked”
`FAILED assert(!m_filestore_fail_eio)`	OSD log	Section 5.1.3, “One or More OSDs Are Down”
`FAILED assert(0 == "hit suicide timeout")`	OSD log	Section 5.1.3, “One or More OSDs Are Down”
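
Tables 1.4 and 1.5 can also drive a simple scan of a log file. The sketch below checks each line for the messages listed for a given kind of log and reports the line number, the message, and the sections to read. The function name `scan_log` is hypothetical, matching is by plain substring, and the literal message text (for example the file count in `Corruption: 1 missing files`) is assumed to appear exactly as in the tables.

```python
from typing import Dict, Iterable, List, Tuple

# message -> (log file kind, sections), transcribed from Tables 1.4 and 1.5.
LOG_MESSAGES: Dict[str, Tuple[str, List[str]]] = {
    "clock skew": ("Main cluster log", ["Section 4.1.2"]),
    "clocks not synchronized": ("Main cluster log", ["Section 4.1.2"]),
    "Corruption: error in middle of record": ("Monitor log", ["Section 4.1.1", "Section 4.3"]),
    "Corruption: 1 missing files": ("Monitor log", ["Section 4.1.1", "Section 4.3"]),
    "Caught signal (Bus error)": ("Monitor log", ["Section 4.1.1"]),
    "heartbeat_check: no reply from osd.": ("Main cluster log", ["Section 5.1.4"]),
    "wrongly marked me down": ("Main cluster log", ["Section 5.1.4"]),
    "osds have slow requests": ("Main cluster log", ["Section 5.1.5"]),
    "FAILED assert(!m_filestore_fail_eio)": ("OSD log", ["Section 5.1.3"]),
    'FAILED assert(0 == "hit suicide timeout")': ("OSD log", ["Section 5.1.3"]),
}


def scan_log(lines: Iterable[str], log_kind: str) -> List[Tuple[int, str, List[str]]]:
    """Return (line number, message, sections) for each known message found.

    Only messages that the tables list for `log_kind` are considered.
    """
    hits: List[Tuple[int, str, List[str]]] = []
    for number, line in enumerate(lines, start=1):
        for message, (kind, sections) in LOG_MESSAGES.items():
            if kind == log_kind and message in line:
                hits.append((number, message, sections))
    return hits
```

An open file object can be passed directly as `lines`, for instance an OSD log opened from the /var/log/ceph/ directory with `log_kind="OSD log"`.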
