4.7. Recovering the Ceph Monitor for bare-metal deployments


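If all the monitors in your Red Hat Ceph Storage cluster are down and the ceph -s command does not run as expected, you can recover the monitors with the monmaptool and ceph-monstore-tool commands. monmaptool recreates the monitor map, and ceph-monstore-tool rebuilds the Ceph Monitor store from the cluster maps collected from the OSDs and from the keyring files of the daemons.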

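Note

This procedure applies only to bare-metal Red Hat Ceph Storage deployments. For containerized Red Hat Ceph Storage deployments, see the Knowledgebase article MON recovery procedure for Red Hat Ceph Storage containerized deployment.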

Prerequisites

  • A bare-metal deployed Red Hat Ceph Storage cluster.
  • Root-level access to all the nodes.
  • All the Ceph Monitors are down.

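Procedure

  1. Log in to the monitor node.
  2. From the monitor node, if you cannot access the OSD nodes without being the root user, copy the SSH public key to the OSD nodes:

    1. Generate the SSH key pair, accept the default file name, and leave the passphrase empty:

      Example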

      [root@mons-1 ~]# ssh-keygen

    2. Copy the public key to all the OSD nodes in the storage cluster:

      Example

      [root@mons-1 ~]# ssh-copy-id root@osds-1
      [root@mons-1 ~]# ssh-copy-id root@osds-2
      [root@mons-1 ~]# ssh-copy-id root@osds-3

  3. Stop the OSD daemon service on all the OSD nodes:

    Example

    [root@osds-1 ~]# sudo systemctl stop ceph-osd\*.service ceph-osd.target

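  4. To collect the cluster map from all the OSD nodes, create the recovery file and run the script:

    1. Create the recovery file:

      Example

      [root@mons-1 ~]# touch recover.sh

    2. Add the following content to the file and replace OSD_NODES with either the IP addresses or the hostnames of all the OSD nodes in the Red Hat Ceph Storage cluster:

      Syntax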

      ## --------------------------------------------------------------------------
      ## NOTE: The directory names specified by 'ms', 'db', and 'db_slow' must end
      ## with a trailing / otherwise rsync will not operate properly.
      ## --------------------------------------------------------------------------
      ms=/tmp/monstore/
      db=/root/db/
      db_slow=/root/db.slow/
      
      mkdir -p $ms $db $db_slow
      
      ## --------------------------------------------------------------------------
      ## NOTE: Replace the contents inside double quotes for 'osd_nodes' below with
      ## the list of OSD nodes in the environment.
      ## --------------------------------------------------------------------------
      osd_nodes="OSD_NODES_1 OSD_NODES_2 OSD_NODES_3..."
      
      for osd_node in $osd_nodes; do
      echo "Operating on $osd_node"
      rsync -avz --delete $ms $osd_node:$ms
      rsync -avz --delete $db $osd_node:$db
      rsync -avz --delete $db_slow $osd_node:$db_slow
      
      ssh -t $osd_node <<EOF
      for osd in /var/lib/ceph/osd/ceph-*; do
          ceph-objectstore-tool --type bluestore --data-path \$osd --op update-mon-db --no-mon-config --mon-store-path $ms
          if [ -e \$osd/keyring ]; then
              cat \$osd/keyring >> $ms/keyring
              echo '    caps mgr = "allow profile osd"' >> $ms/keyring
              echo '    caps mon = "allow profile osd"' >> $ms/keyring
              echo '    caps osd = "allow *"' >> $ms/keyring
          else
              echo WARNING: \$osd on $osd_node does not have a local keyring.
          fi
      done
      EOF
      
      rsync -avz --delete --remove-source-files $osd_node:$ms $ms
      rsync -avz --delete --remove-source-files $osd_node:$db $db
      rsync -avz --delete --remove-source-files $osd_node:$db_slow $db_slow
      done
      ## --------------------------------------------------------------------------
      ## End of script
      ## --------------------------------------------------------------------------

    3. Make the file executable:

      Example

      [root@mons-1 ~]# chmod 755 recover.sh

    4. Run the file to gather the keyrings of all the OSDs from all the OSD nodes in the storage cluster:

      Example

      [root@mons-1 ~]# ./recover.sh
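
Before moving on, it can help to confirm that the script actually collected something. The following is a minimal sketch and not part of the documented procedure: check_monstore is a hypothetical helper name, and it assumes only the layout used above, where the collected store lands in store.db under the monstore directory and the OSD keys are appended to keyring in the same directory.

```shell
# Hypothetical helper (not a Ceph tool): sanity-check the directory that
# recover.sh filled before using it to rebuild the monitor store.
check_monstore() {
    local ms="${1:-/tmp/monstore}"
    local rc=0

    # The collected cluster maps are expected in a non-empty store.db directory.
    if [ ! -d "$ms/store.db" ] || [ -z "$(ls -A "$ms/store.db" 2>/dev/null)" ]; then
        echo "ERROR: $ms/store.db is missing or empty" >&2
        rc=1
    fi

    # recover.sh appends each OSD keyring to this file.
    if [ ! -s "$ms/keyring" ]; then
        echo "ERROR: $ms/keyring is missing or empty" >&2
        rc=1
    fi

    return "$rc"
}
```

Calling check_monstore /tmp/monstore returns a non-zero status and prints the reason if either piece is missing.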

  5. Fetch the keyrings of the other daemons from their respective nodes:

    1. For Ceph Monitor, the keyring is the same for all the Ceph Monitors.

      Syntax

      cat /var/lib/ceph/mon/ceph-MONITOR_NODE/keyring

      Example

      [root@mons-1 ~]# cat /var/lib/ceph/mon/ceph-mons-1/keyring

    2. For Ceph Manager, get the keyring from all the manager nodes:

      Syntax

      cat /var/lib/ceph/mgr/ceph-MANAGER_NODE/keyring

      Example

      [root@mons-1 ~]# cat /var/lib/ceph/mgr/ceph-mons-1/keyring

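    3. For Ceph OSDs, the keyrings are generated by the script above and stored in a temporary path:

      In this example, the OSD keyrings are stored in the /tmp/monstore/keyring file.

    4. For clients, get the keyring from all the client nodes:

      Syntax

      cat /etc/ceph/CLIENT_KEYRING

      Example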

      [root@client ~]# cat /etc/ceph/ceph.client.admin.keyring

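    5. For the Metadata Server (MDS), get the keyring from all the Ceph MDS nodes:

      Syntax

      cat /var/lib/ceph/mds/ceph-MDS_NODE/keyring

      Example

      [root@mons-2 ~]# cat /var/lib/ceph/mds/ceph-mds-1/keyring

      If the following caps are not already present in this keyring, append them:

      caps mds = "allow"
      caps mon = "allow profile mds"
      caps osd = "allow *"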
      caps mon = "allow profile mds"
      caps osd = "allow *"
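
    6. For the Ceph Object Gateway, get the keyring from all the Ceph Object Gateway nodes:

      Syntax

      cat /var/lib/ceph/radosgw/ceph-CEPH_OBJECT_GATEWAY_NODE/keyring

      Example

      [root@mons-3 ~]# cat /var/lib/ceph/radosgw/ceph-rgw-1/keyring

      If the following caps are not already present in this keyring, append them: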

      caps mon = "allow rw"
      caps osd = "allow *"
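
Appending a cap only when it is missing can be scripted. The following is a minimal sketch under stated assumptions: ensure_cap is a hypothetical helper name, and it assumes that the file holds a single daemon section (as the per-daemon keyring files fetched above do), because the new line is appended to the end of the file.

```shell
# Hypothetical helper: add 'caps TYPE = "VALUE"' to a single-section keyring
# file unless a caps line of that type already exists.
ensure_cap() {
    local file="$1" type="$2" value="$3"

    # A cap of this type is already present; leave it untouched.
    if grep -Eq "^[[:space:]]*caps[[:space:]]+$type[[:space:]]*=" "$file"; then
        return 0
    fi

    # Make sure the file ends with a newline before appending.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi

    printf '\tcaps %s = "%s"\n' "$type" "$value" >> "$file"
}
```

For example, running ensure_cap on a local copy of an MDS keyring with the arguments mon and "allow profile mds" adds that line once; running it a second time changes nothing.
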
  6. On the Ansible administration node, create a file with all the keyrings fetched in the previous step:

    Example

    [root@admin ~]# cat /tmp/monstore/keyring
    
    [mon.]
    	key = AQAa+RxhAAAAABAApmwud0GQHX0raMBc9zTAYQ==
    	caps mon = "allow *"
    [client.admin]
    	key = AQAo+RxhcYWtGBAAiY4kKltMGnAXqPLM2A+B8w==
    	caps mds = "allow *"
    	caps mgr = "allow *"
    	caps mon = "allow *"
    	caps osd = "allow *"
    [mgr.mons-1]
    	key = AQA++RxhAAAAABAAKdG1ETTEMR8KPa9ZpfcIzw==
    	caps mds = "allow *"
    	caps mon = "allow profile mgr"
    	caps osd = "allow *"
    [mgr.mons-2]
    	key = AQA9+RxhAAAAABAAcCBxsoaIl0sdHTz3dqX4SQ==
    	caps mds = "allow *"
    	caps mon = "allow profile mgr"
    	caps osd = "allow *"
    [mgr.mons-3]
    	key = AQA/+RxhAAAAABAAwe/mwv0hS79fWP+00W6ypQ==
    	caps mds = "allow *"
    	caps mon = "allow profile mgr"
    	caps osd = "allow *"
    [osd.1]
    key = AQB/+RxhlH8rFxAAL3mb8Kdb+QuWWdJi+RvwGw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.5]
    key = AQCE+RxhKSNsHRAAIyLO5g75tqFVsl6MEEzwXw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.8]
    key = AQCJ+Rxhc0wHJhAA5Bb2kU9Nadpm3UCLASnCfw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.2]
    key = AQB/+RxhhrQCGRAAUhh77gIVhN8zsTbaKMJuHw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.4]
    key = AQCE+Rxh0mDxDRAApAeqKOJycW5bpP3IuAhSMw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.7]
    key = AQCJ+Rxhn+RAIhAAp1ImK1jiazBsDpmTQvVEVw==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.0]
    key = AQB/+RxhPhh+FRAAc5b0nwiuK6o1AIbjVc6tQg==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.3]
    key = AQCE+RxhJv8PARAAqCzH2br1xJmMTNnqH3I9mA==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [osd.6]
    key = AQCI+RxhAt4eIhAAYQEJqSNRT7l2WNl/rYQcKQ==
        caps mgr = "allow profile osd"
        caps mon = "allow profile osd"
        caps osd = "allow *"
    [mds.mds-1]
    	key = AQDs+RxhAF9vERAAdn6ArdUJ31RLr2sBVkzp3A==
            caps mds = "allow"
            caps mon = "allow profile mds"
            caps osd = "allow *"
    [mds.mds-2]
    	key = AQDs+RxhROoAFxAALAgMfM45wC5ht/vSFN2EzQ==
            caps mds = "allow"
            caps mon = "allow profile mds"
            caps osd = "allow *"
    [mds.mds-3]
    	key = AQDs+Rxhd092FRAArXLIHAhMp2z9zcWDCSoIDQ==
            caps mds = "allow"
            caps mon = "allow profile mds"
            caps osd = "allow *"
    [client.rgw.rgws-1.rgw0]
    	key = AQD9+Rxh0iP2MxAAYY76Js1AaZhzFG44cvcyOw==
            caps mon = "allow rw"
            caps osd = "allow *"
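
Because the combined file is built by appending, a section can end up in it more than once if the same keyring is appended twice. The following is a minimal sketch for assembling and checking the file; append_keyring and duplicate_sections are hypothetical helper names, not Ceph commands.

```shell
# Hypothetical helpers for assembling the combined keyring file.

# Append one keyring file to the combined file. If the combined file does not
# end with a newline, add one first so that sections do not run together.
append_keyring() {
    local src="$1" dest="$2"
    if [ -s "$dest" ] && [ -n "$(tail -c 1 "$dest")" ]; then
        echo >> "$dest"
    fi
    cat "$src" >> "$dest"
}

# Print every section header, such as [osd.1], that occurs more than once.
duplicate_sections() {
    awk '/^[ \t]*\[.*\][ \t]*$/ { gsub(/^[ \t]+|[ \t]+$/, ""); print }' "$1" |
        sort | uniq -d
}
```

An empty result from duplicate_sections /tmp/monstore/keyring means that every section header in the file is unique.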

  7. Optional: On each Ceph Monitor node, confirm that the monitor map is not available:

    Example

    [root@mons-1 ~]# ceph-monstore-tool /tmp/monstore get monmap -- --out /tmp/monmap
    [root@mons-1 ~]# monmaptool /tmp/monmap --print
    
    monmaptool: monmap file /tmp/monmap
    monmaptool: couldn't open /tmp/monmap: (2) No such file or directory
    
    Notice the "No such file or directory" error message if the monmap is missing.

  8. On each Ceph Monitor node, get the MONITOR_ID, IP_ADDRESS_OF_MONITOR, and FSID from the /etc/ceph/ceph.conf file:

    Example

    [global]
    cluster network = 10.0.208.0/22
    fsid = 9877bde8-ccb2-4758-89c3-90ca9550ffea
    mon host = [v2:10.0.211.00:3300,v1:10.0.211.00:6789],[v2:10.0.211.13:3300,v1:10.0.211.13:6789],[v2:10.0.210.13:3300,v1:10.0.210.13:6789]
    mon initial members = ceph-mons-1, ceph-mons-2, ceph-mons-3
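
Reading these values by eye is error-prone when the file is long. As a minimal sketch, the hypothetical conf_get helper below prints the value of one key from a ceph.conf-style file. It assumes simple "key = value" lines, ignores section headers, prints only the first match, and treats underscores and spaces in key names as interchangeable.

```shell
# Hypothetical helper: print the value of KEY from a ceph.conf-style file.
conf_get() {
    local file="$1" key="$2"
    awk -v want="$key" '
        BEGIN { gsub(/_/, " ", want) }
        index($0, "=") {
            # Split on the first "=" only, then trim both sides.
            name  = substr($0, 1, index($0, "=") - 1)
            value = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            gsub(/_/, " ", name)
            if (name == want) { print value; exit }
        }
    ' "$file"
}
```

For example, conf_get /etc/ceph/ceph.conf fsid prints the FSID, and conf_get /etc/ceph/ceph.conf "mon host" prints the monitor address vectors.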

  9. On the Ceph Monitor node, rebuild the monitor map:

    Syntax

    monmaptool --create --addv MONITOR_ID IP_ADDRESS_OF_MONITOR --enable-all-features --clobber PATH_OF_MONITOR_MAP --fsid FSID

    Example

    [root@mons-1 ~]# monmaptool --create --addv mons-1 [v2:10.74.177.30:3300,v1:10.74.177.30:6789] --addv mons-2 [v2:10.74.179.197:3300,v1:10.74.179.197:6789] --addv mons-3 [v2:10.74.182.123:3300,v1:10.74.182.123:6789] --enable-all-features --clobber /root/monmap.mons-1 --fsid 6c01cb34-33bf-44d0-9aec-3432276f6be8
    
    monmaptool: monmap file /root/monmap.mons-1
    monmaptool: set fsid to 6c01cb34-33bf-44d0-9aec-3432276f6be8
    monmaptool: writing epoch 0 to /root/monmap.mons-1 (3 monitors)
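
With several monitors the command line gets long, and the bracketed address vectors are easy to mistype. The sketch below only assembles and prints the command so that it can be reviewed before it is run. build_monmap_cmd is a hypothetical helper; it uses only the monmaptool options shown above and single-quotes each address vector so that the shell does not treat the brackets as a glob pattern.

```shell
# Hypothetical helper: print a monmaptool command line for review.
# Usage: build_monmap_cmd FSID PATH_OF_MONITOR_MAP MONITOR_ID=ADDRESS_VECTOR...
build_monmap_cmd() {
    local fsid="$1" out="$2" pair id addr cmd
    shift 2
    cmd="monmaptool --create"
    for pair in "$@"; do
        id=${pair%%=*}      # text before the first "="
        addr=${pair#*=}     # text after the first "="
        cmd="$cmd --addv $id '$addr'"
    done
    printf '%s\n' "$cmd --enable-all-features --clobber $out --fsid $fsid"
}
```

The printed line can be pasted into the shell on the monitor node once the monitor IDs, addresses, and FSID have been checked against ceph.conf.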

  10. On the Ceph Monitor node, check the generated monitor map:

    Syntax

    monmaptool PATH_OF_MONITOR_MAP --print

    Example

    [root@mons-1 ~]# monmaptool /root/monmap.mons-1 --print
    
    monmaptool: monmap file /root/monmap.mons-1
    epoch 0
    fsid 6c01cb34-33bf-44d0-9aec-3432276f6be8
    last_changed 2021-11-23 02:57:23.235505
    created 2021-11-23 02:57:23.235505
    min_mon_release 0 (unknown)
    election_strategy: 1
    0: [v2:10.74.177.30:3300/0,v1:10.74.177.30:6789/0] mon.mons-1
    1: [v2:10.74.179.197:3300/0,v1:10.74.179.197:6789/0] mon.mons-2
    2: [v2:10.74.182.123:3300/0,v1:10.74.182.123:6789/0] mon.mons-3
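
The two things worth checking in this output are the fsid and the number of monitor entries. As a minimal sketch, the hypothetical monmap_summary helper below condenses the printed map into one line. It assumes the output layout shown in the example above, with an "fsid" line and one numbered line per monitor.

```shell
# Hypothetical helper: summarize `monmaptool --print` output read from stdin.
monmap_summary() {
    awk '
        $1 == "fsid"     { fsid = $2 }   # the cluster FSID line
        $1 ~ /^[0-9]+:$/ { mons++ }      # one "N: [addresses] mon.ID" line per monitor
        END { printf "fsid=%s monitors=%d\n", fsid, mons }
    '
}
```

It is meant to be used as a filter, for example monmaptool /root/monmap.mons-1 --print | monmap_summary, and the result can be compared with the FSID and monitor count taken from ceph.conf.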

  11. On the Ceph Monitor node where you are recovering the monitors, rebuild the Ceph Monitor store from the collected maps:

    Syntax

    ceph-monstore-tool /tmp/monstore rebuild -- --keyring KEYRING_PATH --monmap PATH_OF_MONITOR_MAP

    In this example, the recovery is run on the mons-1 node.

    Example

    [root@mons-1 ~]# ceph-monstore-tool /tmp/monstore rebuild -- --keyring /tmp/monstore/keyring --monmap /root/monmap.mons-1

  12. Change the ownership of the monstore directory to ceph:

    Example

    [root@mons-1 ~]# chown -R ceph:ceph /tmp/monstore

  13. On all the Ceph Monitor nodes, take a backup of the corrupted store:

    Example

    [root@mons-1 ~]# mv /var/lib/ceph/mon/ceph-mons-1/store.db /var/lib/ceph/mon/ceph-mons-1/store.db.corrupted

  14. On all the Ceph Monitor nodes, replace the corrupted store:

    Example

    [root@mons-1 ~]# scp -r /tmp/monstore/store.db mons-1:/var/lib/ceph/mon/ceph-mons-1/

  15. On all the Ceph Monitor nodes, change the owner of the new store:

    Example

    [root@mons-1 ~]# chown -R ceph:ceph /var/lib/ceph/mon/ceph-mons-1/store.db
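
The backup, copy, and ownership steps have to be repeated for every monitor, with the host name appearing in several paths each time. The following dry-run sketch prints the commands for review instead of running them. print_store_swap_cmds is a hypothetical helper, and it makes two assumptions that the procedure does not state: each monitor ID is the same as the host name passed in, and the recovery node can reach each monitor over SSH as root.

```shell
# Hypothetical dry-run helper: print, but do not run, the backup, copy, and
# chown commands for each monitor host.
# Usage: print_store_swap_cmds MONSTORE_DIR MONITOR_HOST...
print_store_swap_cmds() {
    local ms="$1" host store
    shift
    for host in "$@"; do
        store="/var/lib/ceph/mon/ceph-$host/store.db"
        echo "ssh root@$host mv $store $store.corrupted"
        echo "scp -r $ms/store.db root@$host:/var/lib/ceph/mon/ceph-$host/"
        echo "ssh root@$host chown -R ceph:ceph $store"
    done
}
```

For example, print_store_swap_cmds /tmp/monstore mons-1 mons-2 mons-3 prints nine commands, three per monitor, in the order in which the steps above apply them.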

  16. On all the Ceph OSD nodes, start the OSDs:

    Example

    [root@osds-1 ~]# sudo systemctl start ceph-osd.target

  17. On all the Ceph Monitor nodes, start the monitors:

    Example

    [root@mons-1 ~]# sudo systemctl start ceph-mon.target
