9.14. Transitioning data to the Amazon S3 cloud service
You can transition data to a remote cloud service as part of the lifecycle configuration using storage classes to reduce cost and improve manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon (S3).
Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data is transitioned. These tier targets do not need a data pool and are defined in terms of the zonegroup placement targets.
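Before adding a new storage class, it can be useful to inspect the placement targets and storage classes already defined in the zonegroup. A minimal pre-check, assuming the default zonegroup name:

[ceph: root@host01 /]# radosgw-admin zonegroup placement list --rgw-zonegroup=default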
Prerequisites
- A running Red Hat Ceph Storage cluster with the Ceph Object Gateway installed.
- User credentials for the remote cloud service, Amazon S3.
- A target path created on Amazon S3.
- s3cmd installed on the bootstrapped node.
- Amazon AWS configured locally for downloading data.
Procedure
Create a user with an access key and a secret key:
Syntax
radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]
Example
[ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
{
    "user_id": "test-user",
    "display_name": "test-user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "test-user",
            "access_key": "a21e86bce636c3aa1",
            "secret_key": "cf764951f1fdde5e"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
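If you need to retrieve the user's details or keys again later, you can query the user. A quick check for the user created above:

[ceph: root@host01 /]# radosgw-admin user info --uid=test-user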
On the bootstrapped node, add a storage class with the tier type cloud-s3:
Note: Once a storage class is created with the --tier-type=cloud-s3 option, it cannot later be modified to any other storage class type.
Syntax
radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
                                      --placement-id=PLACEMENT_ID \
                                      --storage-class=STORAGE_CLASS_NAME \
                                      --tier-type=cloud-s3
Example
[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
                                                             --placement-id=default-placement \
                                                             --storage-class=CLOUDTIER \
                                                             --tier-type=cloud-s3
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "false",
                        "s3": {
                            "endpoint": "",
                            "access_key": "",
                            "secret": "",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 33554432,
                            "multipart_min_part_size": 33554432
                        }
                    }
                }
            ]
        }
    }
]
Update the storage_class:
Note: If the cluster is part of a multi-site setup, run period update --commit so that the zonegroup changes are propagated to all the zones in the multi-site.
Note: Ensure that access_key and secret do not start with a digit.
The required parameters are:
- access_key is the remote cloud S3 access key used for the specific connection.
- secret is the secret key for the remote cloud S3 service.
- endpoint is the URL of the remote cloud S3 service endpoint.
- region (for AWS) is the name of the remote cloud S3 service region.
The optional parameters are:
- target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target_path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.
- target_storage_class defines the target storage class to which an object is transitioned. If not specified, the object is transitioned to the STANDARD storage class.
- retain_head_object, if true, retains the metadata of the object transitioned to the cloud. If false (default), the object is deleted after the transition. This option is ignored for current versioned objects.
- multipart_sync_threshold specifies that objects of this size or larger are transitioned to the cloud using multipart upload.
- multipart_min_part_size specifies the minimum part size to use when transitioning objects using multipart upload.
Syntax
radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
                                         --placement-id PLACEMENT_ID \
                                         --storage-class STORAGE_CLASS_NAME \
                                         --tier-config=endpoint=AWS_ENDPOINT_URL,\
                                         access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
                                         target_path="TARGET_BUCKET_ON_AWS",\
                                         multipart_sync_threshold=44432,\
                                         multipart_min_part_size=44432,\
                                         retain_head_object=true,\
                                         region=REGION_NAME
Example
[ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default \
                                                                --placement-id default-placement \
                                                                --storage-class CLOUDTIER \
                                                                --tier-config=endpoint=http://10.0.210.010:8080,\
                                                                access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
                                                                target_path="dfqe-bucket-01",\
                                                                multipart_sync_threshold=44432,\
                                                                multipart_min_part_size=44432,\
                                                                retain_head_object=true,\
                                                                region=us-east-1
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD",
                "cold.test",
                "hot.test"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "true",
                        "s3": {
                            "endpoint": "http://10.0.210.010:8080",
                            "access_key": "a21e86bce636c3aa2",
                            "secret": "cf764951f1fdde5f",
                            "region": "",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "dfqe-bucket-01",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 44432,
                            "multipart_min_part_size": 44432
                        }
                    }
                }
            ]
        }
    }
]
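To confirm the tier configuration persisted, you can dump the zonegroup and inspect its tier_targets section. A minimal check, assuming the default zonegroup:

[ceph: root@host01 /]# radosgw-admin zonegroup get --rgw-zonegroup=default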
Restart the Ceph Object Gateway:
Syntax
ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
Exit the shell and, as the root user, configure Amazon S3 on the bootstrapped node:
Example
[root@host01 ~]# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: a21e86bce636c3aa2
Secret Key: cf764951f1fdde5f
Default Region [US]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: a21e86bce636c3aa2
  Secret Key: cf764951f1fdde5f
  Default Region: US
  S3 Endpoint: 10.0.210.78:80
  DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
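s3cmd stores these answers in /root/.s3cfg, where they can also be edited directly. A minimal sketch of the relevant keys, assuming the values entered above:

[default]
access_key = a21e86bce636c3aa2
secret_key = cf764951f1fdde5f
host_base = 10.0.210.78:80
host_bucket = 10.0.210.78:80
use_https = False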
Create the S3 bucket:
Syntax
s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3
Example
[root@host01 ~]# s3cmd mb s3://awstestbucket
Bucket 's3://awstestbucket/' created
Create a file, enter all the data, and move it to the S3 service:
Syntax
s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3
Example
[root@host01 ~]# s3cmd put test.txt s3://awstestbucket
upload: 'test.txt' -> 's3://awstestbucket/test.txt' [1 of 1]
 21 of 21   100% in    1s    16.75 B/s  done
Create the lifecycle configuration transition policy:
Syntax
<LifecycleConfiguration>
  <Rule>
    <ID>RULE_NAME</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>DAYS</Days>
      <StorageClass>STORAGE_CLASS_NAME</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
Example
[root@host01 ~]# cat lc_cloud.xml
<LifecycleConfiguration>
  <Rule>
    <ID>Archive all objects</ID>
    <Filter>
      <Prefix></Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>2</Days>
      <StorageClass>CLOUDTIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>
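In this example the empty <Prefix> element makes the rule match every object in the bucket. If only a subset should be archived, the S3 lifecycle filter can be scoped by prefix; a hypothetical variant that transitions only objects stored under logs/:

<LifecycleConfiguration>
  <Rule>
    <ID>Archive logs only</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>2</Days>
      <StorageClass>CLOUDTIER</StorageClass>
    </Transition>
  </Rule>
</LifecycleConfiguration>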
Set the lifecycle configuration transition policy:
Syntax
s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3
Example
[root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
s3://awstestbucket/: Lifecycle Policy updated
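To confirm the policy was stored on the bucket, you can read it back. A quick check with s3cmd's getlifecycle command:

[root@host01 ~]# s3cmd getlifecycle s3://awstestbucket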
Log in to the cephadm shell:
Example
[root@host01 ~]# cephadm shell
Restart the Ceph Object Gateway:
Syntax
ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
Verification
On the source cluster, verify that the data has moved to S3 with the radosgw-admin lc list command:
Example
[ceph: root@host01 /]# radosgw-admin lc list
[
    {
        "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
        "started": "Mon, 26 Sep 2022 18:32:07 GMT",
        "status": "COMPLETE"
    }
]
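Lifecycle transitions run on the gateway's own schedule, so the rule may not take effect immediately after it is set. For testing, a lifecycle run can be triggered by hand; a minimal sketch:

[ceph: root@host01 /]# radosgw-admin lc process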
Verify the transition of the objects at the cloud endpoint:
Example
[root@client ~]$ radosgw-admin bucket list
[
    "awstestbucket"
]
List the objects in the bucket:
Example
[root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint=http://10.0.209.002:8080
{
    "Contents": [
        {
            "Key": "awstestbucket/test",
            "LastModified": "2022-08-25T16:14:23.118Z",
            "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
            "Size": 29,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "test-user",
                "ID": "test-user"
            }
        }
    ]
}
List the contents of the S3 bucket:
Example
[root@host01 ~]# s3cmd ls s3://awstestbucket
2022-08-25 09:57            0  s3://awstestbucket/test.txt
Check the information of the file:
Example
[root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
s3://awstestbucket/test.txt (object):
   File size: 0
   Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
   MIME type: text/plain
   Storage:   CLOUDTIER
   MD5 sum:   991d2528bb41bb839d1a9ed74b710794
   SSE:       none
   Policy:    none
   CORS:      none
   ACL:       test-user: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root
Download the data locally from Amazon S3:
Configure AWS:
Example
[client@client01 ~]$ aws configure
AWS Access Key ID [****************6VVP]:
AWS Secret Access Key [****************pXqy]:
Default region name [us-east-1]:
Default output format [json]:
List the contents of the AWS bucket:
Example
[client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
                           PRE awstestbucket/
Download the data from S3:
Example
[client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
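As a final sanity check, you can hash the downloaded file and compare it against the MD5 sum that s3cmd info reported earlier; a minimal sketch, where the expected hash is the one from the s3cmd info output above:

[client@client01 ~]$ md5sum test.txt
991d2528bb41bb839d1a9ed74b710794  test.txt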