9.14. Transitioning data to the Amazon S3 cloud service
You can transition data to a remote cloud service by using storage classes, to reduce cost and improve manageability. The transition is unidirectional; data cannot be transitioned back from the remote zone. This feature enables data transition to multiple cloud providers, such as Amazon (S3).
Use cloud-s3 as the tier type to configure the remote cloud S3 object store service to which the data is transitioned. Cloud-tier storage classes do not need a data pool and are defined in terms of the zonegroup placement targets.
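Before adding a cloud tier, you can inspect the placement targets and storage classes already defined in the zonegroup by dumping its configuration. A minimal check, assuming the default zonegroup:
Example
[ceph: root@host01 /]# radosgw-admin zonegroup get --rgw-zonegroup=default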
Prerequisites
- A Red Hat Ceph Storage cluster with the Ceph Object Gateway installed.
- User credentials for the remote cloud service, Amazon S3.
- A target path created on Amazon S3.
- s3cmd installed on the bootstrapped node.
- Amazon AWS configured locally for downloading data.
Procedure
Create a user with an access key and a secret key:
Syntax
radosgw-admin user create --uid=USER_NAME --display-name="DISPLAY_NAME" [--access-key ACCESS_KEY --secret-key SECRET_KEY]
Example
[ceph: root@host01 /]# radosgw-admin user create --uid=test-user --display-name="test-user" --access-key a21e86bce636c3aa1 --secret-key cf764951f1fdde5e
{
    "user_id": "test-user",
    "display_name": "test-user",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "test-user",
            "access_key": "a21e86bce636c3aa1",
            "secret_key": "cf764951f1fdde5e"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
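You can confirm the user and its generated keys at any time. A quick check, assuming the test-user created above:
Example
[ceph: root@host01 /]# radosgw-admin user info --uid=test-user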
On the bootstrapped node, add a storage class with the tier type cloud-s3:
Note: Once a storage class is created with the --tier-type=cloud-s3 option, it cannot be modified to any other storage class type later.
Syntax
radosgw-admin zonegroup placement add --rgw-zonegroup=ZONE_GROUP_NAME \
  --placement-id=PLACEMENT_ID \
  --storage-class=STORAGE_CLASS_NAME \
  --tier-type=cloud-s3
Example
[ceph: root@host01 /]# radosgw-admin zonegroup placement add --rgw-zonegroup=default \
  --placement-id=default-placement \
  --storage-class=CLOUDTIER \
  --tier-type=cloud-s3
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "false",
                        "s3": {
                            "endpoint": "",
                            "access_key": "",
                            "secret": "",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 33554432,
                            "multipart_min_part_size": 33554432
                        }
                    }
                }
            ]
        }
    }
]
Update the storage class:
Note: If the cluster is part of a multi-site setup, run period update --commit so that the zonegroup changes are propagated to all the zones in the multi-site configuration; a sketch is shown after the example below.
Note: Make sure the access_key and secret values do not start with a digit.
The mandatory parameters are:
- access_key is the remote cloud S3 access key used for the specific connection.
- secret is the secret key for the remote cloud S3 service.
- endpoint is the URL of the remote cloud S3 service endpoint.
- region (for AWS) is the remote cloud S3 service region name.
The optional parameters are:
- target_path defines how the target path is created. The target path specifies a prefix to which the source bucket-name/object-name is appended. If not specified, the target_path created is rgwx-ZONE_GROUP_NAME-STORAGE_CLASS_NAME-cloud-bucket.
- target_storage_class defines the target storage class to which an object is transitioned. If not specified, the object is transitioned to the STANDARD storage class.
- retain_head_object, if true, retains the metadata of the object that is transitioned to the cloud. If false (default), the object is deleted after the transition. This option is ignored for current versioned objects.
- multipart_sync_threshold specifies that objects of this size or larger are transitioned to the cloud using multipart upload.
- multipart_min_part_size specifies the minimum part size to use when transitioning objects using multipart upload.
Syntax
radosgw-admin zonegroup placement modify --rgw-zonegroup ZONE_GROUP_NAME \
  --placement-id PLACEMENT_ID \
  --storage-class STORAGE_CLASS_NAME \
  --tier-config=endpoint=AWS_ENDPOINT_URL,\
access_key=AWS_ACCESS_KEY,secret=AWS_SECRET_KEY,\
target_path="TARGET_BUCKET_ON_AWS",\
multipart_sync_threshold=44432,\
multipart_min_part_size=44432,\
retain_head_object=true,\
region=REGION_NAME
Example
[ceph: root@host01 /]# radosgw-admin zonegroup placement modify --rgw-zonegroup default \
  --placement-id default-placement \
  --storage-class CLOUDTIER \
  --tier-config=endpoint=http://10.0.210.010:8080,\
access_key=a21e86bce636c3aa2,secret=cf764951f1fdde5f,\
target_path="dfqe-bucket-01",\
multipart_sync_threshold=44432,\
multipart_min_part_size=44432,\
retain_head_object=true,\
region=us-east-1
[
    {
        "key": "default-placement",
        "val": {
            "name": "default-placement",
            "tags": [],
            "storage_classes": [
                "CLOUDTIER",
                "STANDARD",
                "cold.test",
                "hot.test"
            ],
            "tier_targets": [
                {
                    "key": "CLOUDTIER",
                    "val": {
                        "tier_type": "cloud-s3",
                        "storage_class": "CLOUDTIER",
                        "retain_head_object": "true",
                        "s3": {
                            "endpoint": "http://10.0.210.010:8080",
                            "access_key": "a21e86bce636c3aa2",
                            "secret": "cf764951f1fdde5f",
                            "region": "",
                            "host_style": "path",
                            "target_storage_class": "",
                            "target_path": "dfqe-bucket-01",
                            "acl_mappings": [],
                            "multipart_sync_threshold": 44432,
                            "multipart_min_part_size": 44432
                        }
                    }
                }
            ]
        }
    }
]
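If the cluster is part of a multi-site setup, commit the updated period at this point so that the placement change propagates to all zones, as noted above:
Example
[ceph: root@host01 /]# radosgw-admin period update --commit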
Restart the Ceph Object Gateway:
Syntax
ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
Exit the shell, and as a root user, configure Amazon S3 on the bootstrapped node:
Example
[root@host01 ~]# s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3. Leave them empty for using the env variables.
Access Key: a21e86bce636c3aa2
Secret Key: cf764951f1fdde5f
Default Region [US]:

Use "s3.amazonaws.com" for S3 Endpoint and not modify it to the target Amazon S3.
S3 Endpoint [s3.amazonaws.com]: 10.0.210.78:80

Use "%(bucket)s.s3.amazonaws.com" to the target Amazon S3. "%(bucket)s" and "%(location)s" vars can be used
if the target S3 system supports dns based buckets.
DNS-style bucket+hostname:port template for accessing a bucket [%(bucket)s.s3.amazonaws.com]: 10.0.210.78:80

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP, and can only be proxied with Python 2.7 or newer
Use HTTPS protocol [Yes]: No

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: a21e86bce636c3aa2
  Secret Key: cf764951f1fdde5f
  Default Region: US
  S3 Endpoint: 10.0.210.78:80
  DNS-style bucket+hostname:port template for accessing a bucket: 10.0.210.78:80
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Not configured. Never mind.

Save settings? [y/N] y
Configuration saved to '/root/.s3cfg'
Create the S3 bucket:
Syntax
s3cmd mb s3://NAME_OF_THE_BUCKET_FOR_S3
Example
[root@host01 ~]# s3cmd mb s3://awstestbucket
Bucket 's3://awstestbucket/' created
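To confirm that the new bucket is visible through the gateway, you can list all buckets using the configuration saved above:
Example
[root@host01 ~]# s3cmd ls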
Create a file, enter all the data into it, and move it to the S3 service:
Syntax
s3cmd put FILE_NAME s3://NAME_OF_THE_BUCKET_ON_S3
Example
[root@host01 ~]# s3cmd put test.txt s3://awstestbucket
upload: 'test.txt' -> 's3://awstestbucket/test.txt' [1 of 1]
 21 of 21   100% in    1s    16.75 B/s  done
Create the lifecycle configuration transition policy:
Syntax
<LifecycleConfiguration>
    <Rule>
        <ID>RULE_NAME</ID>
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
            <Days>DAYS</Days>
            <StorageClass>STORAGE_CLASS_NAME</StorageClass>
        </Transition>
    </Rule>
</LifecycleConfiguration>
Example
[root@host01 ~]# cat lc_cloud.xml
<LifecycleConfiguration>
    <Rule>
        <ID>Archive all objects</ID>
        <Filter>
            <Prefix></Prefix>
        </Filter>
        <Status>Enabled</Status>
        <Transition>
            <Days>2</Days>
            <StorageClass>CLOUDTIER</StorageClass>
        </Transition>
    </Rule>
</LifecycleConfiguration>
Set the lifecycle configuration transition policy:
Syntax
s3cmd setlifecycle FILE_NAME s3://NAME_OF_THE_BUCKET_FOR_S3
Example
[root@host01 ~]# s3cmd setlifecycle lc_cloud.xml s3://awstestbucket
s3://awstestbucket/: Lifecycle Policy updated
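To verify that the policy was applied to the bucket, you can read it back; a quick check using s3cmd:
Example
[root@host01 ~]# s3cmd getlifecycle s3://awstestbucket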
Log in to the cephadm shell:
Example
[root@host01 ~]# cephadm shell
Restart the Ceph Object Gateway:
Syntax
ceph orch restart CEPH_OBJECT_GATEWAY_SERVICE_NAME
Example
[ceph: root@host01 /]# ceph orch restart rgw.rgw.1
Scheduled to restart rgw.rgw.1.host03.vkfldf on host 'host03'
Verification
On the source cluster, verify that the data has moved to S3 with the radosgw-admin lc list command:
Example
[ceph: root@host01 /]# radosgw-admin lc list
[
    {
        "bucket": ":awstestbucket:552a3adb-39e0-40f6-8c84-00590ed70097.54639.1",
        "started": "Mon, 26 Sep 2022 18:32:07 GMT",
        "status": "COMPLETE"
    }
]
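If the lifecycle run has not started yet and the status is not COMPLETE, you can trigger lifecycle processing manually in a test environment. A sketch, run from the cephadm shell:
Example
[ceph: root@host01 /]# radosgw-admin lc process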
Verify the object transition at the cloud endpoint:
Example
[root@client ~]$ radosgw-admin bucket list
[
    "awstestbucket"
]
List the objects in the bucket:
Example
[root@host01 ~]$ aws s3api list-objects --bucket awstestbucket --endpoint=http://10.0.209.002:8080
{
    "Contents": [
        {
            "Key": "awstestbucket/test",
            "LastModified": "2022-08-25T16:14:23.118Z",
            "ETag": "\"378c905939cc4459d249662dfae9fd6f\"",
            "Size": 29,
            "StorageClass": "STANDARD",
            "Owner": {
                "DisplayName": "test-user",
                "ID": "test-user"
            }
        }
    ]
}
List the contents of the S3 bucket:
Example
[root@host01 ~]# s3cmd ls s3://awstestbucket
2022-08-25 09:57            0  s3://awstestbucket/test.txt
Check the information of the file:
Example
[root@host01 ~]# s3cmd info s3://awstestbucket/test.txt
s3://awstestbucket/test.txt (object):
   File size: 0
   Last mod:  Mon, 03 Aug 2022 09:57:49 GMT
   MIME type: text/plain
   Storage:   CLOUDTIER
   MD5 sum:   991d2528bb41bb839d1a9ed74b710794
   SSE:       none
   Policy:    none
   CORS:      none
   ACL:       test-user: FULL_CONTROL
   x-amz-meta-s3cmd-attrs: atime:1664790668/ctime:1664790668/gid:0/gname:root/md5:991d2528bb41bb839d1a9ed74b710794/mode:33188/mtime:1664790668/uid:0/uname:root
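Because retain_head_object was set to true in the tier configuration, a zero-size head object with the metadata remains in the source bucket (note the File size: 0 above). You can inspect it from the cephadm shell; a sketch:
Example
[ceph: root@host01 /]# radosgw-admin object stat --bucket=awstestbucket --object=test.txt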
Download the data locally from Amazon S3:
Configure AWS:
Example
[client@client01 ~]$ aws configure
AWS Access Key ID [****************6VVP]:
AWS Secret Access Key [****************pXqy]:
Default region name [us-east-1]:
Default output format [json]:
List the contents of the AWS bucket:
Example
[client@client01 ~]$ aws s3 ls s3://dfqe-bucket-01/awstest
                           PRE awstestbucket/
Download the data from S3:
Example
[client@client01 ~]$ aws s3 cp s3://dfqe-bucket-01/awstestbucket/test.txt .
download: s3://dfqe-bucket-01/awstestbucket/test.txt to ./test.txt
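As a final check, you can compare the checksum of the downloaded file against the MD5 sum reported by s3cmd info above (991d2528bb41bb839d1a9ed74b710794); a sketch, assuming the standard md5sum utility is available:
Example
[client@client01 ~]$ md5sum test.txt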