
Chapter 7. Changing an OSD Drive


Ceph is designed for fault tolerance, which means Ceph can operate in a degraded state without losing data. For example, Ceph can operate even if a data storage drive fails. In the context of a failed drive, the degraded state means that the extra copies of the data stored on other OSDs will backfill automatically to other OSDs in the cluster. However, if an OSD drive fails, you will have to replace the failed OSD drive and recreate the OSD manually.

When a drive fails, the OSD status will initially be down but still in the cluster. Ceph health warnings will indicate that an OSD is down. However, an OSD getting marked down does not necessarily mean its drive has failed; for example, heartbeat or other networking issues can get an OSD marked down even though it is still up.
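
For example, you can list exactly which OSDs are reported down with the standard health and status commands (the output will vary with your cluster):

    # ceph health detail
    # ceph osd stat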

Modern servers typically deploy with hot-swappable drives, so you can pull a failed drive and replace it with a new one without bringing down the node. However, with Ceph Storage you also have to address the software-defined part of the OSD. The general procedure for replacing an OSD involves removing the OSD from the Ceph cluster, replacing the drive, and then re-creating the OSD.

  1. Check cluster health.

    # ceph health
  2. If an OSD is down, identify its location in the CRUSH hierarchy.

    # ceph osd tree | grep -i down
  3. If an OSD is down and in, log in to the OSD node and try to restart it.

    # ssh {osd-node}
    # systemctl start ceph-osd@{osd-id}

    If the command indicates that the OSD is already running, it may be a heartbeat or networking issue. If you cannot restart the OSD, the drive may have failed.
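
    To see why an OSD will not stay up, you can also check the daemon status and its journal; this assumes systemd-managed OSD units as in the command above:

    # systemctl status ceph-osd@{osd-id}
    # journalctl -u ceph-osd@{osd-id}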

    Note

    If the OSD is down, it will eventually get marked out. This is normal behavior for Ceph Storage. When the OSD gets marked out, other OSDs with copies of the failed OSD’s data will begin backfilling to ensure that the required number of copies exist within the cluster. While the cluster is backfilling, the cluster will be in a degraded state.
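
    To check whether the OSD has been marked out yet (by default this happens roughly 10 minutes after it goes down, controlled by the mon_osd_down_out_interval setting), inspect its entry in the OSD map. The OSD ID below is a hypothetical example:

    # ceph osd dump | grep osd.12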

  4. Check the failed OSD’s mount point.

    If you cannot restart the OSD, check the mount point. If the mount point no longer appears, try re-mounting the OSD drive and restarting the OSD. For example, if the server restarted but the drive did not remount because its mount point is missing from fstab, remount the drive.

    # df -h
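
    If the mount point is simply missing, you can try re-mounting the OSD's data partition at the standard OSD path and restarting the daemon. The device name and OSD ID below are hypothetical and assume the default cluster name ceph:

    # mount /dev/sdb1 /var/lib/ceph/osd/ceph-12
    # systemctl start ceph-osd@12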

    If you cannot restore the mount point, you may have a failed OSD drive. Use your drive utilities to determine if the drive is healthy. For example:

    # yum install smartmontools
    # smartctl -H /dev/{drive}

    If the drive has failed, you will need to replace it.

  5. Ensure the OSD is out of the cluster.

    # ceph osd out osd.<num>
  6. Ensure the OSD process is stopped.

    # systemctl stop ceph-osd@<osd-id>
  7. Ensure the data from the failed OSD is backfilling to other OSDs.

    # ceph -w
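
    In addition to watching ceph -w, you can spot-check recovery and backfill progress with the cluster status and placement group summaries:

    # ceph -s
    # ceph pg stat
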
  8. Remove the OSD from the CRUSH Map.

    # ceph osd crush remove osd.<num>
  9. Remove the OSD’s authentication keys.

    # ceph auth del osd.<num>
  10. Remove the OSD from the Ceph Cluster.

    # ceph osd rm osd.<num>
  11. Unmount the failed drive path.

    # umount /var/lib/ceph/{daemon}/{cluster}-{daemon-id}
  12. Replace the physical drive. Refer to the documentation for your hardware node. If the drive is hot swappable, simply replace the failed drive with a new drive. If the drive is NOT hot swappable and the node contains multiple OSDs, you MAY need to bring the node down to replace the physical drive. If you need to bring the node down temporarily, you may set the cluster to noout to prevent backfilling.

    # ceph osd set noout

    Once you have replaced the drive and brought the node and its OSDs back online, remove the noout setting.

    # ceph osd unset noout

    Allow the new drive to appear under /dev and make a note of the drive path before proceeding further.

  13. Find the OSD drive and format the disk.
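
    A minimal sketch of this step, assuming the new drive appears as /dev/sdb (a hypothetical device name): identify the drive, then clear any existing partitions and signatures. Use ceph-disk zap on older releases or ceph-volume lvm zap on newer ones.

    # lsblk
    # ceph-disk zap /dev/sdb
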
  14. Recreate the OSD. See Adding an OSD for details.
  15. Check your CRUSH hierarchy to ensure it is accurate.

    # ceph osd tree

    If you are not satisfied with the location of the OSD in your CRUSH hierarchy, you may move it with the move command.

    # ceph osd crush move <bucket-to-move> <bucket-type>=<parent-bucket>
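
    For example, to move a hypothetical host bucket named node2 under a rack bucket named rack1:

    # ceph osd crush move node2 rack=rack1
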
  16. Ensure the OSD is online.
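
    For example, confirm that the OSD reports up and in; the OSD ID below is hypothetical:

    # ceph osd tree | grep osd.12
    # ceph osd stat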