2.3. Simulating a Disk Failure
There are two disk failure scenarios: hard and soft. A hard failure means the disk must be replaced. A soft failure might be an issue with the device driver or some other software component.
In the case of a soft failure, replacing the disk might not be necessary. If a disk is replaced, steps must be followed to remove the failed disk and add the replacement disk to Ceph. To simulate a soft disk failure, the simplest approach is to delete the device. Choose a device and delete it from the system:
echo 1 > /sys/block/$DEVICE/device/delete
Example
[root@ceph1 ~]# echo 1 > /sys/block/sdb/device/delete
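The delete is irreversible until the device is rescanned, so it is worth checking the device first. The following is a minimal sketch (not part of the original procedure); the function name is hypothetical, and the sysfs root is parameterized only so the logic can be exercised safely:

```shell
# Hypothetical helper: sanity-check a device before simulating its failure.
# The second argument overrides the sysfs root (defaults to /sys/block) so
# the check can be tested against a scratch directory instead of real disks.
simulate_disk_failure() {
    local dev="$1"
    local sysfs="${2:-/sys/block}"

    # Refuse to act on a device the kernel does not expose a delete hook for.
    if [ ! -e "${sysfs}/${dev}/device/delete" ]; then
        echo "error: ${dev} has no delete hook under ${sysfs}" >&2
        return 1
    fi

    # Ask the SCSI layer to drop the device, as in the example above.
    echo 1 > "${sysfs}/${dev}/device/delete"
}
```

On a live host this would be invoked as `simulate_disk_failure sdb`, matching the `sdb` example above.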
In the Ceph OSD log, the OSD detects the failure, and the cluster automatically starts the recovery process.
Example
[root@ceph1 ~]# tail -50 /var/log/ceph/ceph-osd.1.log
2017-02-02 12:15:27.490889 7f3e1fa3d800 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-1: (5) Input/output error
2017-02-02 12:34:17.777898 7fb7df1e7800  0 set uid:gid to 167:167 (ceph:ceph)
2017-02-02 12:34:17.777933 7fb7df1e7800  0 ceph version 10.2.3-17.el7cp (ca9d57c0b140eb5cea9de7f7133260271e57490e), process ceph-osd, pid 1752
2017-02-02 12:34:17.788885 7fb7df1e7800  0 pidfile_write: ignore empty --pid-file
2017-02-02 12:34:17.870322 7fb7df1e7800  0 filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
2017-02-02 12:34:17.871028 7fb7df1e7800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-02-02 12:34:17.871035 7fb7df1e7800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-02-02 12:34:17.871059 7fb7df1e7800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: splice is supported
2017-02-02 12:34:17.897839 7fb7df1e7800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-02-02 12:34:17.897985 7fb7df1e7800  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize is disabled by conf
2017-02-02 12:34:17.921162 7fb7df1e7800  1 leveldb: Recovering log #22
2017-02-02 12:34:17.947335 7fb7df1e7800  1 leveldb: Level-0 table #24: started
2017-02-02 12:34:18.001952 7fb7df1e7800  1 leveldb: Level-0 table #24: 810464 bytes OK
2017-02-02 12:34:18.044554 7fb7df1e7800  1 leveldb: Delete type=0 #22
2017-02-02 12:34:18.045383 7fb7df1e7800  1 leveldb: Delete type=3 #20
2017-02-02 12:34:18.058061 7fb7df1e7800  0 filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2017-02-02 12:34:18.105482 7fb7df1e7800  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 18: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-02-02 12:34:18.130293 7fb7df1e7800  1 journal _open /var/lib/ceph/osd/ceph-1/journal fd 18: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 1
2017-02-02 12:34:18.130992 7fb7df1e7800  1 filestore(/var/lib/ceph/osd/ceph-1) upgrade
2017-02-02 12:34:18.136547 7fb7df1e7800  0 <cls> cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
2017-02-02 12:34:18.142863 7fb7df1e7800  0 <cls> cls/hello/cls_hello.cc:305: loading cls_hello
2017-02-02 12:34:18.255019 7fb7df1e7800  0 osd.1 51 crush map has features 2200130813952, adjusting msgr requires for clients
2017-02-02 12:34:18.255041 7fb7df1e7800  0 osd.1 51 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2017-02-02 12:34:18.255048 7fb7df1e7800  0 osd.1 51 crush map has features 2200130813952, adjusting msgr requires for osds
2017-02-02 12:34:18.296256 7fb7df1e7800  0 osd.1 51 load_pgs
2017-02-02 12:34:18.561604 7fb7df1e7800  0 osd.1 51 load_pgs opened 152 pgs
2017-02-02 12:34:18.561648 7fb7df1e7800  0 osd.1 51 using 0 op queue with priority op cut off at 64.
2017-02-02 12:34:18.562603 7fb7df1e7800 -1 osd.1 51 log_to_monitors {default=true}
2017-02-02 12:34:18.650204 7fb7df1e7800  0 osd.1 51 done with init, starting boot process
2017-02-02 12:34:19.274937 7fb7b78ba700  0 -- 192.168.122.83:6801/1752 >> 192.168.122.81:6801/2620 pipe(0x7fb7ec4d1400 sd=127 :6801 s=0 pgs=0 cs=0 l=0 c=0x7fb7ec42e480).accept connect_seq 0 vs existing 0 state connecting
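The telltale line is the superblock I/O error at the top of the log. A quick way to confirm the simulated failure is to count occurrences of that message; this is a small sketch (not part of the original procedure), with the function name being hypothetical and the log path taken from the example above:

```shell
# Hypothetical helper: count superblock I/O errors in an OSD log file.
# Usage: osd_superblock_error /var/log/ceph/ceph-osd.1.log
osd_superblock_error() {
    grep -c 'ERROR: unable to open OSD superblock' "$1"
}
```

A non-zero count means the OSD can no longer read its backing disk, matching the failure simulated above.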
Looking at the OSD tree, we also see that the disk is offline:
[root@ceph1 ~]# ceph osd tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.28976 root default
-2 0.09659     host ceph3
 1 0.09659         osd.1      down  1.00000          1.00000
-3 0.09659     host ceph1
 2 0.09659         osd.2        up  1.00000          1.00000
-4 0.09659     host ceph2
 0 0.09659         osd.0        up  1.00000          1.00000
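On a larger cluster, scanning the tree by eye is tedious. The `ceph osd tree` output can be filtered down to just the down OSDs with a short awk pipeline; this is a sketch (not part of the original procedure), and the function name is hypothetical:

```shell
# Hypothetical helper: print the name of each OSD reported as "down".
# Expects `ceph osd tree` plain-text output on stdin; in the tree output an
# OSD row has the name (osd.N) in column 3 and its status in column 4.
list_down_osds() {
    awk '$3 ~ /^osd\./ && $4 == "down" { print $3 }'
}

# Usage on a live cluster:
#   ceph osd tree | list_down_osds
```

Against the example output above, this would print only `osd.1`, the OSD whose backing disk was deleted.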