2.3. 模拟磁盘失败
磁盘故障有两种:硬和软。硬故障意味着需要替换磁盘。软故障可能是设备驱动程序或其它一些软件组件的问题。
如果是软故障,可能不需要替换磁盘。如果替换磁盘,则需要遵循步骤来删除失败的磁盘,并将替换磁盘添加到 Ceph。为了模拟软磁盘失败,最好的操作是删除该设备。选择一个设备并从系统中删除设备。
先决条件
- 一个正常运行的 Red Hat Ceph Storage 集群。
- 对 Ceph OSD 节点的 root 级别访问权限。
流程
从
sysfs
中删除块设备:语法
echo 1 > /sys/block/BLOCK_DEVICE/device/delete
示例
[root@osd ~]# echo 1 > /sys/block/sdb/device/delete
在 Ceph OSD 日志中,Ceph 会检测到故障,并且自动启动恢复过程。
示例
[root@osd ~]# tail -50 /var/log/ceph/ceph-osd.1.log 2020-09-02 15:50:50.187067 7ff1ce9a8d80 1 bdev(0x563d263d4600 /var/lib/ceph/osd/ceph-2/block) close 2020-09-02 15:50:50.440398 7ff1ce9a8d80 -1 osd.2 0 OSD:init: unable to mount object store 2020-09-02 15:50:50.440416 7ff1ce9a8d80 -1 ^[[0;31m ** ERROR: osd init failed: (5) Input/output error^[[0m 2020-09-02 15:51:10.633738 7f495c44bd80 0 set uid:gid to 167:167 (ceph:ceph) 2020-09-02 15:51:10.633752 7f495c44bd80 0 ceph version 12.2.12-124.el7cp (e8948288b90d312c206301a9fcf80788fbc3b1f8) luminous (stable), process ceph-osd, pid 36209 2020-09-02 15:51:10.634703 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.635749 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.636642 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.637535 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.641256 7f495c44bd80 0 pidfile_write: ignore empty --pid-file 2020-09-02 15:51:10.669317 7f495c44bd80 0 load: jerasure load: lrc load: isa 2020-09-02 15:51:10.669387 7f495c44bd80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel 2020-09-02 15:51:10.669395 7f495c44bd80 1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block 2020-09-02 15:51:10.669611 7f495c44bd80 1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational 2020-09-02 15:51:10.670320 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.670328 7f495c44bd80 1 bdev(0x55a423da9200 /var/lib/ceph/osd/ceph-2/block) close 2020-09-02 15:51:10.924727 7f495c44bd80 1 bluestore(/var/lib/ceph/osd/ceph-2) _mount path /var/lib/ceph/osd/ceph-2 2020-09-02 15:51:10.925582 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error 2020-09-02 15:51:10.925628 7f495c44bd80 1 bdev create path /var/lib/ceph/osd/ceph-2/block type kernel 2020-09-02 15:51:10.925630 7f495c44bd80 1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block 2020-09-02 15:51:10.925784 7f495c44bd80 1 bdev(0x55a423da8600 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466GiB) block_size 4096 (4KiB) rotational 2020-09-02 15:51:10.926549 7f495c44bd80 -1 bluestore(/var/lib/ceph/osd/ceph-2/block) _read_bdev_label failed to read from /var/lib/ceph/osd/ceph-2/block: (5) Input/output error
查看 Ceph OSD 磁盘树,我们也会看到磁盘离线。
示例
[root@osd ~]# ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 0.28976 root default -2 0.09659 host ceph3 1 0.09659 osd.1 down 1.00000 1.00000 -3 0.09659 host ceph1 2 0.09659 osd.2 up 1.00000 1.00000 -4 0.09659 host ceph2 0 0.09659 osd.0 up 1.00000 1.00000