9.14. Transitioning data to the Amazon S3 cloud service


You can transition data to a remote cloud service as part of the lifecycle configuration using storage classes, to reduce cost and improve manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon (S3).

Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data needs to be transitioned. Tier targets do not need a data pool and are defined in terms of the zonegroup placement targets.
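
Before adding a cloud tier, you can inspect the placement targets and storage classes already defined in the zonegroup; a minimal sketch, assuming the default zonegroup:

    radosgw-admin zonegroup placement list --rgw-zonegroup default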

Prerequisites

  • A Red Hat Ceph Storage cluster with the Ceph Object Gateway installed.
  • User credentials for the remote cloud service, Amazon S3.
  • A target path created on Amazon S3.
  • s3cmd installed on the bootstrapped node (an install sketch follows this list).
  • Amazon AWS configured locally to download data.
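
  If s3cmd is not already present, it can usually be installed from an enabled repository; a sketch, assuming the package is available (for example, through EPEL):

    dnf install s3cmd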

Procedure

  1. Create a user with an access key and a secret key:

    Syntax

    radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]

    Example

    [ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
    {
        "user_id": "test-user",
        "display_name": "test-user",
        "email": "",
        "suspended": 0,
        "max_buckets": 1000,
        "subusers": [],
        "keys": [
            {
                "user": "test-user",
                "access_key": "a21e86bce636c3aa1",
                "secret_key": "cf764951f1fdde5e"
            }
        ],
        "swift_keys": [],
        "caps": [],
        "op_mask": "read, write, delete",
        "default_placement": "",
        "default_storage_class": "",
        "placement_tags": [],
        "bucket_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "check_on_raw": false,
            "max_size": -1,
            "max_size_kb": 0,
            "max_objects": -1
        },
        "temp_url_keys": [],
        "type": "rgw",
        "mfa_ids": []
    }
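
    If you need to display these credentials again later, the same record can be printed with the user info subcommand, for example:

    radosgw-admin user info --uid=test-user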

  2. On the bootstrapped node, add a storage class with the tier type cloud-s3:

    Note

    After a storage class is created with the --tier-type=cloud-s3 option, it cannot be modified later into any other storage class type.

    Syntax

    radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
                                --placement-id=PLACEMENT_ID \
                                --storage-class=STORAGE_CLASS_NAME \
                                --tier-type=cloud-s3

    Example

    [ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
                                                     --placement-id=default-placement \
                                                     --storage-class=CLOUDTIER \
                                                     --tier-type=cloud-s3
    [
        {
            "key": "default-placement",
            "val": {
                "name": "default-placement",
                "tags": [],
                "storage_classes": [
                    "CLOUDTIER",
                    "STANDARD"
                ],
                "tier_targets": [
                    {
                        "key": "CLOUDTIER",
                        "val": {
                            "tier_type": "cloud-s3",
                            "storage_class": "CLOUDTIER",
                            "retain_head_object": "false",
                            "s3": {
                                "endpoint": "",
                                "access_key": "",
                                "secret": "",
                                "host_style": "path",
                                "target_storage_class": "",
                                "target_path": "",
                                "acl_mappings": [],
                                "multipart_sync_threshold": 33554432,
                                "multipart_min_part_size": 33554432
                            }
                        }
                    }
                ]
            }
        }
    ]

  3. Update the storage class:

    Note

    If the cluster is part of a multi-site setup, run period update --commit so that the zonegroup changes are propagated to all the zones in the multi-site configuration, as sketched below.
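
    A minimal sketch of committing the period after the placement change, assuming a multi-site deployment:

    radosgw-admin period update --commit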

    Note

    Ensure that access_key and secret do not start with a digit.

    The required parameters are:

    • access_key is the remote cloud S3 access key used for the specific connection.
    • secret is the secret key for the remote cloud S3 service.
    • endpoint is the URL of the remote cloud S3 service endpoint.
    • region (for AWS) is the remote cloud S3 service region name.

    Optional parameters:

    • target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target_path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.
    • target_storage_class defines the target storage class to which an object transitions. If not specified, the object is transitioned to the STANDARD storage class.
    • retain_head_object, if true, retains the metadata of the object that is transitioned to the cloud. If false (default), the object is deleted after the transition. This option is ignored for current versioned objects.
    • multipart_sync_threshold specifies that objects of this size or larger are transitioned to the cloud using multipart upload.
    • multipart_min_part_size specifies the minimum part size to use when transitioning objects with multipart upload.

      Syntax

      radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
                                               --placement-id PLACEMENT_ID \
                                               --storage-class STORAGE_CLASS_NAME \
                                               --tier-config=endpoint=AWS_ENDPOINT_URL,\
                                               access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
                                               target_path="TARGET_BUCKET_ON_AWS",\
                                               multipart_sync_threshold=44432,\
                                               multipart_min_part_size=44432,\
                                               retain_head_object=true,\
                                               region=REGION_NAME

      Example

      [ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                                                      --placement-id default-placement \
                                                                      --storage-class CLOUDTIER \
                                                                      --tier-config=endpoint=http://10.0.210.010:8080,\
                                                                      access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
                                                                      target_path="dfqe-bucket-01",\
                                                                      multipart_sync_threshold=44432,\
                                                                      multipart_min_part_size=44432,\
                                                                      retain_head_object=true,\
                                                                      region=us-east-1
      
      [
          {
              "key": "default-placement",
              "val": {
                  "name": "default-placement",
                  "tags": [],
                  "storage_classes": [
                      "CLOUDTIER",
                      "STANDARD",
                      "cold.test",
                      "hot.test"
                  ],
                  "tier_targets": [
                      {
                          "key": "CLOUDTIER",
                          "val": {
                              "tier_type": "cloud-s3",
                              "storage_class": "CLOUDTIER",
                              "retain_head_object": "true",
                              "s3": {
                                  "endpoint": "http://10.0.210.010:8080",
                                  "access_key": "a21e86bce636c3aa2",
                                  "secret": "cf764951f1fdde5f",
                                  "region": "",
                                  "host_style": "path",
                                  "target_storage_class": "",
                                  "target_path": "dfqe-bucket-01",
                                  "acl_mappings": [],
                                  "multipart_sync_threshold": 44432,
                                  "multipart_min_part_size": 44432
                              }
                          }
                      }
                  ]
              }
          }
      ]

  4. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1

    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
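
    To confirm that the gateway daemons came back up, you can query the orchestrator, for example:

    ceph orch ps --daemon-type rgw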

  5. Exit the shell and, as the root user, configure Amazon S3 on the bootstrapped node:

    Example

    [root@host01 ~]# s3cmd --configure
    
    Enter new values or accept defaults in brackets with Enter.
    Refer to user manual for detailed description of all options.
    
    Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
    Access Key: a21e86bce636c3aa2
    Secret Key: cf764951f1fdde5f
    Default Region [US]:
    
    Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
    S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80
    
    Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
    if the target S3 system supports dns based buckets.
    DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80
    
    Encryption password is used to protect your files from reading
    by unauthorized persons while in transfer to S3
    Encryption password:
    Path to GPG program [/usr/bin/gpg]:
    
    When using secure HTTPS protocol all communication with Amazon S3
    servers is protected from 3rd party eavesdropping. This method is
    slower than plain HTTP, and can only be proxied with Python 2.7 or newer
    Use HTTPS protocol [Yes]: No
    
    On some networks all internet access must go through a HTTP proxy.
    Try setting it here if you can't connect to S3 directly
    HTTP Proxy server name:
    
    New settings:
      Access Key: a21e86bce636c3aa2
      Secret Key: cf764951f1fdde5f
      Default Region: US
      S3 Endpoint: 10.0.210.78:80
      DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
      Encryption password:
      Path to GPG program: /usr/bin/gpg
      Use HTTPS protocol: False
      HTTP Proxy server name:
      HTTP Proxy server port: 0
    
    Test access with supplied credentials? [Y/n] Y
    Please wait, attempting to list all buckets...
    Success. Your access key and secret key worked fine :-)
    
    Now verifying that encryption works...
    Not configured. Never mind.
    
    Save settings? [y/N] y
    Configuration saved to '/root/.s3cfg'

  6. Create the S3 bucket:

    Syntax

    s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd mb s3://awstestbucket
    Bucket 's3://awstestbucket/' created

  7. Create the file, enter all the data, and move it to the S3 service:

    Syntax

    s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3

    Example

    [root@host01 ~]# s3cmd put test.txt s3://awstestbucket
    
    upload: 'test.txt' -> 's3://awstestbucket/test.txt'  [1 of 1]
     21 of 21   100% in    1s    16.75 B/s  done

  8. Create the lifecycle configuration transition policy:

    Syntax

    <LifecycleConfiguration>
      <Rule>
        <ID>RULE_NAME</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>DAYS</Days>
          <StorageClass>STORAGE_CLASS_NAME</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>

    Example

    [root@host01 ~]# cat lc_cloud.xml
    <LifecycleConfiguration>
      <Rule>
        <ID>Archive all objects</ID>
        <Filter>
          <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
          <Days>2</Days>
          <StorageClass>CLOUDTIER</StorageClass>
        </Transition>
      </Rule>
    </LifecycleConfiguration>

  9. Set the lifecycle configuration transition policy:

    Syntax

    s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3

    Example

    [root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
    
    s3://awstestbucket/: Lifecycle Policy updated
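
    To confirm that the policy was applied, it can be read back, for example:

    s3cmd getlifecycle s3://awstestbucket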

  10. Log in to the cephadm shell:

    Example

    [root@host01 ~]# cephadm shell

  11. Restart the Ceph Object Gateway:

    Syntax

    ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME

    Example

    [ceph: root@host01 /]# ceph orch restart rgw.rgw.1

    Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'

Verification

  1. On the source cluster, verify that the data has moved to S3 with the radosgw-admin lc list command:

    Example

    [ceph: root@host01 /]# radosgw-admin lc list
    [
        {
            "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
            "started": "Mon, 26 Sep 2022 18:32:07 GMT",
            "status": "COMPLETE"
        }
    ]
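
    Lifecycle processing normally runs on its own schedule; for testing, a pass can be triggered manually from within the cephadm shell, for example:

    radosgw-admin lc process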

  2. Verify the transition of the object at the cloud endpoint:

    Example

    [root@client ~]$ radosgw-admin bucket list
    [
        "awstestbucket"
    ]

  3. List the objects in the bucket:

    Example

    [root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint=http://10.0.209.002:8080
    {
        "Contents": [
            {
                "Key": "awstestbucket/test",
                "LastModified": "2022-08-25T16:14:23.118Z",
                "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
                "Size": 29,
                "StorageClass": "STANDARD",
                "Owner": {
                    "DisplayName": "test-user",
                    "ID": "test-user"
                }
            }
        ]
    }

  4. List the contents of the S3 bucket:

    Example

    [root@host01 ~]# s3cmd ls s3://awstestbucket
    2022-08-25 09:57            0  s3://awstestbucket/test.txt

  5. Check the file's information. A file size of 0 together with the CLOUDTIER storage class indicates that only the head object remains locally, while the data itself has been transitioned to the cloud:

    Example

    [root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
    s3://awstestbucket/test.txt (object):
       File size: 0
       Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
       MIME type: text/plain
       Storage:   CLOUDTIER
       MD5 sum:   991d2528bb41bb839d1a9ed74b710794
       SSE:       none
       Policy:    none
       CORS:      none
       ACL:       test-user: FULL_CONTROL
       x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root

  6. Download the data locally from Amazon S3:

    1. Configure AWS:

      Example

      [client@client01 ~]$ aws configure
      
      AWS Access Key ID [****************6VVP]:
      AWS Secret Access Key [****************pXqy]:
      Default region name [us-east-1]:
      Default output format [json]:

    2. List the contents of the AWS bucket:

      Example

      [client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
      PRE awstestbucket/

    3. Download the data from S3:

      Example

      [client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
      
      download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
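
      As a final check, the downloaded file's checksum can be compared with the MD5 sum reported earlier by s3cmd info, for example:

      md5sum test.txt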
