Chapter 3. Ceph Object Gateway and the S3 API


As a developer, you can use a RESTful application programming interface (API) that is compatible with the Amazon S3 data access model. You can manage the buckets and objects stored in a Red Hat Ceph Storage cluster through the Ceph Object Gateway.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A RESTful client.

3.1. S3 limitations

Important

Treat the following limitations with caution. They have implications for your hardware selections, so always discuss these requirements with your Red Hat account team.

  • Maximum object size when using Amazon S3: Individual Amazon S3 objects can range in size from a minimum of 0B to a maximum of 5TB. The largest object that can be uploaded in a single PUT is 5GB. For objects larger than 100MB, you should consider using the Multipart Upload capability.
  • Maximum metadata size when using Amazon S3: There is no defined limit on the total size of user metadata that can be applied to an object, but a single HTTP request is limited to 16,000 bytes.
  • The amount of data overhead Red Hat Ceph Storage cluster produces to store S3 objects and metadata: The estimate here is 200-300 bytes plus the length of the object name. Versioned objects consume additional space proportional to the number of versions. Also, transient overhead is produced during multi-part upload and other transactional updates, but these overheads are recovered during garbage collection.
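
The size limits and the overhead estimate above can be turned into a small planning helper. The following is a minimal sketch, not part of the Ceph Object Gateway API: the method names and constants are hypothetical, and the use of binary multiples (1 MB = 1024 * 1024 bytes) is an assumption.

```ruby
# Hypothetical helper constants; binary multiples are an assumption.
MIB = 1024**2
GIB = 1024**3
TIB = 1024**4

MULTIPART_HINT = 100 * MIB # above this size, multipart upload is worth considering
OBJECT_MAX     = 5 * TIB   # largest object size

# Returns :single_put, :multipart, or :too_large for a given object size.
def upload_strategy(size_bytes)
  raise ArgumentError, 'size must be non-negative' if size_bytes.negative?
  return :too_large if size_bytes > OBJECT_MAX
  return :multipart if size_bytes > MULTIPART_HINT

  :single_put
end

# Rough per-object overhead in bytes: 200-300 plus the length of the object name.
def overhead_estimate(object_name)
  (200 + object_name.bytesize)..(300 + object_name.bytesize)
end

puts upload_strategy(12)
puts upload_strategy(6 * GIB)
puts overhead_estimate('hello.txt')
```

The helper follows the recommendation rather than the hard limit: an object between 100MB and 5GB can still be sent in a single PUT, but the sketch suggests multipart upload for it.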

3.2. Accessing the Ceph Object Gateway with the S3 API

As a developer, you must configure access to the Ceph Object Gateway and the Secure Token Service (STS) before you can start using the Amazon S3 API.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A running Ceph Object Gateway.
  • A RESTful client.

3.2.1. S3 authentication

Requests to the Ceph Object Gateway can be either authenticated or unauthenticated. Ceph Object Gateway assumes unauthenticated requests are sent by an anonymous user. Ceph Object Gateway supports canned ACLs.

For most use cases, clients use existing open source libraries like the Amazon SDK’s AmazonS3Client for Java, and Python Boto. With open source libraries you simply pass in the access key and secret key and the library builds the request header and authentication signature for you. However, you can create requests and sign them too.

Authenticating a request requires including an access key and a base 64-encoded hash-based Message Authentication Code (HMAC) in the request before it is sent to the Ceph Object Gateway server. Ceph Object Gateway uses an S3-compatible authentication approach.

Example

PUT /buckets/bucket/object.mpeg HTTP/1.1
Host: cname.domain.com
Date: Mon, 2 Jan 2012 00:01:01 +0000
Content-Encoding: mpeg
Content-Length: 9999999
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

In the above example, replace ACCESS_KEY with the value for the access key ID followed by a colon (:). Replace HASH_OF_HEADER_AND_SECRET with a hash of a canonicalized header string and the secret corresponding to the access key ID.

Generate hash of header string and secret

To generate the hash of the header string and secret:

  1. Get the value of the header string.
  2. Normalize the request header string into canonical form.
  3. Generate an HMAC using a SHA-1 hashing algorithm.
  4. Encode the HMAC result as base-64.

Normalize header

To normalize the header into canonical form:

  1. Get all content- headers.
  2. Remove all content- headers except for content-type and content-md5.
  3. Ensure the content- header names are lowercase.
  4. Sort the content- headers lexicographically.
  5. Ensure you have a Date header and that the specified date uses GMT, not an offset.
  6. Get all headers beginning with x-amz-.
  7. Ensure that the x-amz- headers are all lowercase.
  8. Sort the x-amz- headers lexicographically.
  9. Combine multiple instances of the same field name into a single field and separate the field values with a comma.
  10. Replace white space and line breaks in header values with a single space.
  11. Remove white space before and after colons.
  12. Append a new line after each header.
  13. Merge the headers back into the request header.

Replace the HASH_OF_HEADER_AND_SECRET with the base-64 encoded HMAC string.
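
The signing steps can be sketched in Ruby with the standard OpenSSL bindings. This is a minimal illustration under stated assumptions, not the gateway's own implementation: the method names are hypothetical, the credentials are placeholders, and the layout of the string that is signed (verb, content-md5, content-type, date, x-amz- headers, resource) follows the AWS signature version 2 convention, which this section does not spell out.

```ruby
require 'openssl'

# Canonicalizes the x-amz- headers: lowercase names, sort, merge repeated
# names with commas, collapse white space, and end each header with "\n".
# `headers` is an array of [name, value] pairs so that duplicates are possible.
def canonical_amz_headers(headers)
  grouped = Hash.new { |hash, key| hash[key] = [] }
  headers.each do |name, value|
    key = name.to_s.strip.downcase
    next unless key.start_with?('x-amz-')

    grouped[key] << value.to_s.gsub(/\s+/, ' ').strip
  end
  grouped.sort.map { |key, values| "#{key}:#{values.join(',')}\n" }.join
end

# HMAC with SHA-1 over the canonical string, then base-64 without line breaks.
def sign(secret_key, canonical_string)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA1'), secret_key, canonical_string)
  [digest].pack('m0')
end

# Placeholder credentials and request values, for illustration only.
access_key = 'ACCESS_KEY'
secret_key = 'SECRET_KEY'
amz = canonical_amz_headers([['X-Amz-Meta-Title', 'My   video'], ['x-amz-acl', 'private']])
canonical = "PUT\n\nvideo/mpeg\nMon, 02 Jan 2012 00:01:01 GMT\n#{amz}/bucket/object.mpeg"
puts "Authorization: AWS #{access_key}:#{sign(secret_key, canonical)}"
```

In practice an S3 client library performs these steps for you; the sketch is only meant to make the individual steps concrete.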

3.2.2. S3-server-side encryption

The Ceph Object Gateway supports server-side encryption of uploaded objects for the S3 application programming interface (API). Server-side encryption means that the S3 client sends data over HTTP in its unencrypted form, and the Ceph Object Gateway stores that data in the Red Hat Ceph Storage cluster in encrypted form.

Note

Red Hat does NOT support S3 object encryption of Static Large Object (SLO) or Dynamic Large Object (DLO).

Important

To use encryption, clients MUST send requests over an SSL connection. Red Hat does not support S3 encryption from a client unless the Ceph Object Gateway uses SSL. However, for testing purposes, administrators can disable the SSL requirement by setting the rgw_crypt_require_ssl configuration setting to false at runtime, using the ceph config set client.rgw command, and then restarting the Ceph Object Gateway instance.

In a production environment, it might not be possible to send encrypted requests over SSL. In such a case, send requests using HTTP with server-side encryption.

For information about how to configure HTTP with server-side encryption, see the Additional Resources section below.

There are two options for the management of encryption keys:

Customer-provided Keys

When using customer-provided keys, the S3 client passes an encryption key along with each request to read or write encrypted data. It is the customer’s responsibility to manage those keys. Customers must remember which key the Ceph Object Gateway used to encrypt each object.

Ceph Object Gateway implements the customer-provided key behavior in the S3 API according to the Amazon SSE-C specification.

Since the customer handles the key management and the S3 client passes keys to the Ceph Object Gateway, the Ceph Object Gateway requires no special configuration to support this encryption mode.
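
As an illustration of what the client passes with each request in this mode, the following sketch builds the three request headers defined by the Amazon SSE-C specification. The method name is hypothetical and the key is generated on the spot only for demonstration; in real use you must store the key yourself, because it is needed again to read the object.

```ruby
require 'digest/md5'
require 'securerandom'

# Builds the SSE-C request headers for a 256-bit (32-byte) key.
def sse_c_headers(raw_key)
  raise ArgumentError, 'key must be 32 bytes' unless raw_key.bytesize == 32

  {
    'x-amz-server-side-encryption-customer-algorithm' => 'AES256',
    # The key itself, base-64 encoded without line breaks.
    'x-amz-server-side-encryption-customer-key' => [raw_key].pack('m0'),
    # Base-64 encoded MD5 digest of the key, used as an integrity check.
    'x-amz-server-side-encryption-customer-key-MD5' => Digest::MD5.base64digest(raw_key)
  }
end

key = SecureRandom.random_bytes(32)
sse_c_headers(key).each { |name, value| puts "#{name}: #{value.length} characters" }
```

The same three headers have to accompany every later request that reads the encrypted object.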

Key Management Service

When using a key management service, the secure key management service stores the keys and the Ceph Object Gateway retrieves them on demand to serve requests to encrypt or decrypt data.

Ceph Object Gateway implements the key management service behavior in the S3 API according to the Amazon SSE-KMS specification.

Important

Currently, the only tested key management implementations are HashiCorp Vault and OpenStack Barbican. However, OpenStack Barbican is a Technology Preview and is not supported for use in production systems.

3.2.3. S3 access control lists

Ceph Object Gateway supports S3-compatible Access Control Lists (ACL) functionality. An ACL is a list of access grants that specify which operations a user can perform on a bucket or on an object. Each grant has a different meaning when applied to a bucket versus applied to an object:

Table 3.1. User Operations

Permission     Bucket                                                    Object
READ           Grantee can list the objects in the bucket.               Grantee can read the object.
WRITE          Grantee can write or delete objects in the bucket.        N/A
READ_ACP       Grantee can read bucket ACL.                              Grantee can read the object ACL.
WRITE_ACP      Grantee can write bucket ACL.                             Grantee can write to the object ACL.
FULL_CONTROL   Grantee has full permissions for objects in the bucket.   Grantee can read or write to the object ACL.
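
The grants in Table 3.1 can also be expressed as a lookup structure, which is handy when an application needs to explain a grant to a user. This is only a sketch that transcribes the table; the constant and method names are hypothetical.

```ruby
# Meaning of each permission per scope, transcribed from Table 3.1.
ACL_MEANINGS = {
  'READ' => { bucket: 'Grantee can list the objects in the bucket.',
              object: 'Grantee can read the object.' },
  'WRITE' => { bucket: 'Grantee can write or delete objects in the bucket.',
               object: 'N/A' },
  'READ_ACP' => { bucket: 'Grantee can read bucket ACL.',
                  object: 'Grantee can read the object ACL.' },
  'WRITE_ACP' => { bucket: 'Grantee can write bucket ACL.',
                   object: 'Grantee can write to the object ACL.' },
  'FULL_CONTROL' => { bucket: 'Grantee has full permissions for objects in the bucket.',
                      object: 'Grantee can read or write to the object ACL.' }
}.freeze

# Looks up what a permission means for :bucket or :object.
# Raises KeyError for an unknown permission or scope.
def grant_meaning(permission, scope)
  ACL_MEANINGS.fetch(permission).fetch(scope)
end

puts grant_meaning('WRITE', :bucket)
puts grant_meaning('WRITE', :object)
```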

3.2.4. Preparing access to the Ceph Object Gateway using S3

Complete the following steps on the Ceph Object Gateway node before you attempt to access the gateway server.

Prerequisites

  • Installation of the Ceph Object Gateway software.
  • Root-level access to the Ceph Object Gateway node.

Procedure

  1. As root, open port 8080 on the firewall:

    [root@rgw ~]# firewall-cmd --zone=public --add-port=8080/tcp --permanent
    [root@rgw ~]# firewall-cmd --reload
  2. Add a wildcard to the DNS server that you are using for the gateway as mentioned in the Object Gateway Configuration and Administration Guide.

    You can also set up the gateway node for local DNS caching. To do so, execute the following steps:

    1. As root, install and set up dnsmasq:

      [root@rgw ~]# yum install dnsmasq
      [root@rgw ~]# echo "address=/.FQDN_OF_GATEWAY_NODE/IP_OF_GATEWAY_NODE" | tee --append /etc/dnsmasq.conf
      [root@rgw ~]# systemctl start dnsmasq
      [root@rgw ~]# systemctl enable dnsmasq

      Replace IP_OF_GATEWAY_NODE and FQDN_OF_GATEWAY_NODE with the IP address and FQDN of the gateway node.

    2. As root, stop NetworkManager:

      [root@rgw ~]# systemctl stop NetworkManager
      [root@rgw ~]# systemctl disable NetworkManager
    3. As root, set the gateway server’s IP as the nameserver:

      [root@rgw ~]# echo "DNS1=IP_OF_GATEWAY_NODE" | tee --append /etc/sysconfig/network-scripts/ifcfg-eth0
      [root@rgw ~]# echo "IP_OF_GATEWAY_NODE FQDN_OF_GATEWAY_NODE" | tee --append /etc/hosts
      [root@rgw ~]# systemctl restart network
      [root@rgw ~]# systemctl enable network
      [root@rgw ~]# systemctl restart dnsmasq

      Replace IP_OF_GATEWAY_NODE and FQDN_OF_GATEWAY_NODE with the IP address and FQDN of the gateway node.

    4. Verify subdomain requests:

      [user@rgw ~]$ ping mybucket.FQDN_OF_GATEWAY_NODE

      Replace FQDN_OF_GATEWAY_NODE with the FQDN of the gateway node.

      Warning

      Setting up the gateway server for local DNS caching is for testing purposes only. You won’t be able to access the outside network after doing this. It is strongly recommended to use a proper DNS server for the Red Hat Ceph Storage cluster and gateway node.

  3. Create the radosgw user for S3 access carefully as mentioned in the Object Gateway Configuration and Administration Guide and copy the generated access_key and secret_key. You will need these keys for S3 access and subsequent bucket management tasks.
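
The wildcard DNS entry matters because S3 clients can address a bucket either as a path segment or as a subdomain of the gateway, as in the mybucket.FQDN_OF_GATEWAY_NODE check above. The following sketch contrasts the two URL forms; the method names and the host name rgw.example.com are hypothetical.

```ruby
# Path-style: the bucket is the first path segment, so no wildcard DNS is needed.
def path_style_url(host, port, bucket, key)
  "http://#{host}:#{port}/#{bucket}/#{key}"
end

# Virtual-hosted-style: the bucket is a subdomain of the gateway host,
# which only resolves when the wildcard DNS entry is in place.
def virtual_hosted_url(host, port, bucket, key)
  "http://#{bucket}.#{host}:#{port}/#{key}"
end

puts path_style_url('rgw.example.com', 8080, 'mybucket', 'hello.txt')
puts virtual_hosted_url('rgw.example.com', 8080, 'mybucket', 'hello.txt')
```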

3.2.5. Accessing the Ceph Object Gateway using Ruby AWS S3

You can use the Ruby programming language along with the aws-s3 gem for S3 access. Perform the following steps on the node used for accessing the Ceph Object Gateway server with Ruby AWS::S3.

Prerequisites

  • User-level access to Ceph Object Gateway.
  • Root-level access to the node accessing the Ceph Object Gateway.
  • Internet access.

Procedure

  1. Install the ruby package:

    [root@dev ~]# yum install ruby
    Note

    The above command installs ruby and its essential dependencies, such as rubygems and ruby-libs. If the command does not install all the dependencies, install them separately.

  2. Install the aws-s3 Ruby package:

    [root@dev ~]# gem install aws-s3
  3. Create a project directory:

    [user@dev ~]$ mkdir ruby_aws_s3
    [user@dev ~]$ cd ruby_aws_s3
  4. Create the connection file:

    [user@dev ~]$ vim conn.rb
  5. Paste the following contents into the conn.rb file:

    Syntax

    #!/usr/bin/env ruby
    
    require 'aws/s3'
    require 'resolv-replace'
    
    AWS::S3::Base.establish_connection!(
            :server            => 'FQDN_OF_GATEWAY_NODE',
            :port           => '8080',
            :access_key_id     => 'MY_ACCESS_KEY',
            :secret_access_key => 'MY_SECRET_KEY'
    )

    Replace FQDN_OF_GATEWAY_NODE with the FQDN of the Ceph Object Gateway node. Replace MY_ACCESS_KEY and MY_SECRET_KEY with the access_key and secret_key that were generated when you created the radosgw user for S3 access as mentioned in the Red Hat Ceph Storage Object Gateway Configuration and Administration Guide.

    Example

    #!/usr/bin/env ruby
    
    require 'aws/s3'
    require 'resolv-replace'
    
    AWS::S3::Base.establish_connection!(
            :server            => 'testclient.englab.pnq.redhat.com',
            :port           => '8080',
            :access_key_id     => '98J4R9P22P5CDL65HKP8',
            :secret_access_key => '6C+jcaP0dp0+FZfrRNgyGA9EzRy25pURldwje049'
    )

    Save the file and exit the editor.

  6. Make the file executable:

    [user@dev ~]$ chmod +x conn.rb
  7. Run the file:

    [user@dev ~]$ ./conn.rb; echo $?

    If you have provided the values correctly in the file, the output of the command will be 0.

  8. Create a new file for creating a bucket:

    [user@dev ~]$ vim create_bucket.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::Bucket.create('my-new-bucket1')

    Save the file and exit the editor.

  9. Make the file executable:

    [user@dev ~]$ chmod +x create_bucket.rb
  10. Run the file:

    [user@dev ~]$ ./create_bucket.rb

    If the output of the command is true, the bucket my-new-bucket1 was created successfully.

  11. Create a new file for listing owned buckets:

    [user@dev ~]$ vim list_owned_buckets.rb

    Paste the following content into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::Service.buckets.each do |bucket|
            puts "#{bucket.name}\t#{bucket.creation_date}"
    end

    Save the file and exit the editor.

  12. Make the file executable:

    [user@dev ~]$ chmod +x list_owned_buckets.rb
  13. Run the file:

    [user@dev ~]$ ./list_owned_buckets.rb

    The output should look something like this:

    my-new-bucket1 2020-01-21 10:33:19 UTC
  14. Create a new file for creating an object:

    [user@dev ~]$ vim create_object.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::S3Object.store(
            'hello.txt',
            'Hello World!',
            'my-new-bucket1',
            :content_type => 'text/plain'
    )

    Save the file and exit the editor.

  15. Make the file executable:

    [user@dev ~]$ chmod +x create_object.rb
  16. Run the file:

    [user@dev ~]$ ./create_object.rb

    This creates the object hello.txt with the content "Hello World!".

  17. Create a new file for listing a bucket’s content:

    [user@dev ~]$ vim list_bucket_content.rb

    Paste the following content into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    new_bucket = AWS::S3::Bucket.find('my-new-bucket1')
    new_bucket.each do |object|
            puts "#{object.key}\t#{object.about['content-length']}\t#{object.about['last-modified']}"
    end

    Save the file and exit the editor.

  18. Make the file executable:

    [user@dev ~]$ chmod +x list_bucket_content.rb
  19. Run the file:

    [user@dev ~]$ ./list_bucket_content.rb

    The output will look something like this:

    hello.txt    12    Fri, 22 Jan 2020 15:54:52 GMT
  20. Create a new file for deleting an empty bucket:

    [user@dev ~]$ vim del_empty_bucket.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::Bucket.delete('my-new-bucket1')

    Save the file and exit the editor.

  21. Make the file executable:

    [user@dev ~]$ chmod +x del_empty_bucket.rb
  22. Run the file:

    [user@dev ~]$ ./del_empty_bucket.rb; echo $?

    If the bucket is successfully deleted, the command will return 0 as output.

    Note

    Edit the create_bucket.rb file to create empty buckets, for example, my-new-bucket4, my-new-bucket5. Next, edit the above-mentioned del_empty_bucket.rb file accordingly before trying to delete empty buckets.

  23. Create a new file for deleting non-empty buckets:

    [user@dev ~]$ vim del_non_empty_bucket.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::Bucket.delete('my-new-bucket1', :force => true)

    Save the file and exit the editor.

  24. Make the file executable:

    [user@dev ~]$ chmod +x del_non_empty_bucket.rb
  25. Run the file:

    [user@dev ~]$ ./del_non_empty_bucket.rb; echo $?

    If the bucket is successfully deleted, the command will return 0 as output.

  26. Create a new file for deleting an object:

    [user@dev ~]$ vim delete_object.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    AWS::S3::S3Object.delete('hello.txt', 'my-new-bucket1')

    Save the file and exit the editor.

  27. Make the file executable:

    [user@dev ~]$ chmod +x delete_object.rb
  28. Run the file:

    [user@dev ~]$ ./delete_object.rb

    This will delete the object hello.txt.

3.2.6. Accessing the Ceph Object Gateway using Ruby AWS SDK

You can use the Ruby programming language along with the aws-sdk gem for S3 access. Perform the following steps on the node used for accessing the Ceph Object Gateway server with Ruby AWS::SDK.

Prerequisites

  • User-level access to Ceph Object Gateway.
  • Root-level access to the node accessing the Ceph Object Gateway.
  • Internet access.

Procedure

  1. Install the ruby package:

    [root@dev ~]# yum install ruby
    Note

    The above command installs ruby and its essential dependencies, such as rubygems and ruby-libs. If the command does not install all the dependencies, install them separately.

  2. Install the aws-sdk Ruby package:

    [root@dev ~]# gem install aws-sdk
  3. Create a project directory:

    [user@dev ~]$ mkdir ruby_aws_sdk
    [user@dev ~]$ cd ruby_aws_sdk
  4. Create the connection file:

    [user@dev ~]$ vim conn.rb
  5. Paste the following contents into the conn.rb file:

    Syntax

    #!/usr/bin/env ruby
    
    require 'aws-sdk'
    require 'resolv-replace'
    
    Aws.config.update(
            endpoint: 'http://FQDN_OF_GATEWAY_NODE:8080',
            access_key_id: 'MY_ACCESS_KEY',
            secret_access_key: 'MY_SECRET_KEY',
            force_path_style: true,
            region: 'us-east-1'
    )

    Replace FQDN_OF_GATEWAY_NODE with the FQDN of the Ceph Object Gateway node. Replace MY_ACCESS_KEY and MY_SECRET_KEY with the access_key and secret_key that were generated when you created the radosgw user for S3 access as mentioned in the Red Hat Ceph Storage Object Gateway Configuration and Administration Guide.

    Example

    #!/usr/bin/env ruby
    
    require 'aws-sdk'
    require 'resolv-replace'
    
    Aws.config.update(
            endpoint: 'http://testclient.englab.pnq.redhat.com:8080',
            access_key_id: '98J4R9P22P5CDL65HKP8',
            secret_access_key: '6C+jcaP0dp0+FZfrRNgyGA9EzRy25pURldwje049',
            force_path_style: true,
            region: 'us-east-1'
    )

    Save the file and exit the editor.

  6. Make the file executable:

    [user@dev ~]$ chmod +x conn.rb
  7. Run the file:

    [user@dev ~]$ ./conn.rb; echo $?

    If you have provided the values correctly in the file, the output of the command will be 0.

  8. Create a new file for creating a bucket:

    [user@dev ~]$ vim create_bucket.rb

    Paste the following contents into the file:

    Syntax

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.create_bucket(bucket: 'my-new-bucket2')

    Save the file and exit the editor.

  9. Make the file executable:

    [user@dev ~]$ chmod +x create_bucket.rb
  10. Run the file:

    [user@dev ~]$ ./create_bucket.rb

    If the output of the command is true, this means that bucket my-new-bucket2 was created successfully.

  11. Create a new file for listing owned buckets:

    [user@dev ~]$ vim list_owned_buckets.rb

    Paste the following content into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.list_buckets.buckets.each do |bucket|
            puts "#{bucket.name}\t#{bucket.creation_date}"
    end

    Save the file and exit the editor.

  12. Make the file executable:

    [user@dev ~]$ chmod +x list_owned_buckets.rb
  13. Run the file:

    [user@dev ~]$ ./list_owned_buckets.rb

    The output should look something like this:

    my-new-bucket2 2020-01-21 10:33:19 UTC
  14. Create a new file for creating an object:

    [user@dev ~]$ vim create_object.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.put_object(
            key: 'hello.txt',
            body: 'Hello World!',
            bucket: 'my-new-bucket2',
            content_type: 'text/plain'
    )

    Save the file and exit the editor.

  15. Make the file executable:

    [user@dev ~]$ chmod +x create_object.rb
  16. Run the file:

    [user@dev ~]$ ./create_object.rb

    This creates the object hello.txt with the content "Hello World!".

  17. Create a new file for listing a bucket’s content:

    [user@dev ~]$ vim list_bucket_content.rb

    Paste the following content into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.list_objects(bucket: 'my-new-bucket2').contents.each do |object|
            puts "#{object.key}\t#{object.size}"
    end

    Save the file and exit the editor.

  18. Make the file executable:

    [user@dev ~]$ chmod +x list_bucket_content.rb
  19. Run the file:

    [user@dev ~]$ ./list_bucket_content.rb

    The output will look something like this:

    hello.txt    12
  20. Create a new file for deleting an empty bucket:

    [user@dev ~]$ vim del_empty_bucket.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.delete_bucket(bucket: 'my-new-bucket2')

    Save the file and exit the editor.

  21. Make the file executable:

    [user@dev ~]$ chmod +x del_empty_bucket.rb
  22. Run the file:

    [user@dev ~]$ ./del_empty_bucket.rb; echo $?

    If the bucket is successfully deleted, the command will return 0 as output.

    Note

    Edit the create_bucket.rb file to create empty buckets, for example, my-new-bucket6, my-new-bucket7. Next, edit the above-mentioned del_empty_bucket.rb file accordingly before trying to delete empty buckets.

  23. Create a new file for deleting a non-empty bucket:

    [user@dev ~]$ vim del_non_empty_bucket.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    Aws::S3::Bucket.new('my-new-bucket2', client: s3_client).clear!
    s3_client.delete_bucket(bucket: 'my-new-bucket2')

    Save the file and exit the editor.

  24. Make the file executable:

    [user@dev ~]$ chmod +x del_non_empty_bucket.rb
  25. Run the file:

    [user@dev ~]$ ./del_non_empty_bucket.rb; echo $?

    If the bucket is successfully deleted, the command will return 0 as output.

  26. Create a new file for deleting an object:

    [user@dev ~]$ vim delete_object.rb

    Paste the following contents into the file:

    #!/usr/bin/env ruby
    
    load 'conn.rb'
    
    s3_client = Aws::S3::Client.new
    s3_client.delete_object(key: 'hello.txt', bucket: 'my-new-bucket2')

    Save the file and exit the editor.

  27. Make the file executable:

    [user@dev ~]$ chmod +x delete_object.rb
  28. Run the file:

    [user@dev ~]$ ./delete_object.rb

    This will delete the object hello.txt.

3.2.7. Accessing the Ceph Object Gateway using PHP

You can use PHP scripts for S3 access. This procedure provides some example PHP scripts to do various tasks, such as deleting a bucket or an object.

Important

The examples given below are tested against php v5.4.16 and aws-sdk v2.8.24.

Prerequisites

  • Root-level access to a development workstation.
  • Internet access.

Procedure

  1. Install the php package:

    [root@dev ~]# yum install php
  2. Download the zip archive of aws-sdk for PHP and extract it.
  3. Create a project directory:

    [user@dev ~]$ mkdir php_s3
    [user@dev ~]$ cd php_s3
  4. Copy the extracted aws directory to the project directory. For example:

    [user@dev ~]$ cp -r ~/Downloads/aws/ ~/php_s3/
  5. Create the connection file:

    [user@dev ~]$ vim conn.php
  6. Paste the following contents in the conn.php file:

    Syntax

    <?php
    define('AWS_KEY', 'MY_ACCESS_KEY');
    define('AWS_SECRET_KEY', 'MY_SECRET_KEY');
    define('HOST', 'FQDN_OF_GATEWAY_NODE');
    define('PORT', '8080');
    
    // require the AWS SDK for php library
    require '/PATH_TO_AWS/aws-autoloader.php';
    
    use Aws\S3\S3Client;
    
    // Establish connection with host using S3 Client
    $client = S3Client::factory(array(
        'base_url' => HOST,
        'port' => PORT,
        'key'      => AWS_KEY,
        'secret'   => AWS_SECRET_KEY
    ));
    ?>

    Replace FQDN_OF_GATEWAY_NODE with the FQDN of the gateway node. Replace MY_ACCESS_KEY and MY_SECRET_KEY with the access_key and secret_key that were generated when creating the radosgw user for S3 access as mentioned in the Red Hat Ceph Storage Object Gateway Configuration and Administration Guide. Replace PATH_TO_AWS with the absolute path to the extracted aws directory that you copied to the php project directory.

    Save the file and exit the editor.

  7. Run the file:

    [user@dev ~]$ php -f conn.php; echo $?

    If you have provided the values correctly in the file, the output of the command will be 0.

  8. Create a new file for creating a bucket:

    [user@dev ~]$ vim create_bucket.php

    Paste the following contents into the new file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $client->createBucket(array('Bucket' => 'my-new-bucket3'));
    
    ?>

    Save the file and exit the editor.

  9. Run the file:

    [user@dev ~]$ php -f create_bucket.php
  10. Create a new file for listing owned buckets:

    [user@dev ~]$ vim list_owned_buckets.php

    Paste the following content into the file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $blist = $client->listBuckets();
    echo "Buckets belonging to " . $blist['Owner']['ID'] . ":\n";
    foreach ($blist['Buckets'] as $b) {
        echo "{$b['Name']}\t{$b['CreationDate']}\n";
    }
    
    ?>

    Save the file and exit the editor.

  11. Run the file:

    [user@dev ~]$ php -f list_owned_buckets.php

    The output should look similar to this:

    my-new-bucket3 2020-01-21 10:33:19 UTC
  12. Create an object by first creating a source file named hello.txt:

    [user@dev ~]$ echo "Hello World!" > hello.txt
  13. Create a new php file:

    [user@dev ~]$ vim create_object.php

    Paste the following contents into the file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $key         = 'hello.txt';
    $source_file = './hello.txt';
    $acl         = 'private';
    $bucket      = 'my-new-bucket3';
    $client->upload($bucket, $key, fopen($source_file, 'r'), $acl);
    
    ?>

    Save the file and exit the editor.

  14. Run the file:

    [user@dev ~]$ php -f create_object.php

    This will create the object hello.txt in bucket my-new-bucket3.

  15. Create a new file for listing a bucket’s content:

    [user@dev ~]$ vim list_bucket_content.php

    Paste the following content into the file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $o_iter = $client->getIterator('ListObjects', array(
        'Bucket' => 'my-new-bucket3'
    ));
    foreach ($o_iter as $o) {
        echo "{$o['Key']}\t{$o['Size']}\t{$o['LastModified']}\n";
    }
    ?>

    Save the file and exit the editor.

  16. Run the file:

    [user@dev ~]$ php -f list_bucket_content.php

    The output will look similar to this:

    hello.txt    12    Fri, 22 Jan 2020 15:54:52 GMT
  17. Create a new file for deleting an empty bucket:

    [user@dev ~]$ vim del_empty_bucket.php

    Paste the following contents into the file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $client->deleteBucket(array('Bucket' => 'my-new-bucket3'));
    ?>

    Save the file and exit the editor.

  18. Run the file:

    [user@dev ~]$ php -f del_empty_bucket.php; echo $?

    If the bucket is successfully deleted, the command will return 0 as output.

    Note

    Edit the create_bucket.php file to create empty buckets, for example, my-new-bucket4, my-new-bucket5. Next, edit the above-mentioned del_empty_bucket.php file accordingly before trying to delete empty buckets.

    Important

    Deleting a non-empty bucket is currently not supported in PHP 2 and newer versions of aws-sdk.

  19. Create a new file for deleting an object:

    [user@dev ~]$ vim delete_object.php

    Paste the following contents into the file:

    Syntax

    <?php
    
    include 'conn.php';
    
    $client->deleteObject(array(
        'Bucket' => 'my-new-bucket3',
        'Key'    => 'hello.txt',
    ));
    ?>

    Save the file and exit the editor.

  20. Run the file:

    [user@dev ~]$ php -f delete_object.php

    This will delete the object hello.txt.

3.2.8. Secure Token Service

The Amazon Web Services' Secure Token Service (STS) returns a set of temporary security credentials for authenticating users.

Red Hat Ceph Storage Object Gateway supports a subset of Amazon STS application programming interfaces (APIs) for identity and access management (IAM).

Users first authenticate against STS and receive a short-lived S3 access key and secret key that can be used in subsequent requests.

Red Hat Ceph Storage can authenticate S3 users by integrating with a Single Sign-On by configuring an OIDC provider. This feature enables Object Storage users to authenticate against an enterprise identity provider rather than the local Ceph Object Gateway database. For instance, if the SSO is connected to an enterprise IDP in the backend, Object Storage users can use their enterprise credentials to authenticate and get access to the Ceph Object Gateway S3 endpoint.

By using STS along with the IAM role policy feature, you can create finely tuned authorization policies to control access to your data. This enables you to implement either a Role-Based Access Control (RBAC) or Attribute-Based Access Control (ABAC) authorization model for your object storage data, giving you complete control over who can access the data.

Simplified workflow to access S3 resources with STS

  1. The user wants to access S3 resources in Red Hat Ceph Storage.
  2. The user needs to authenticate against the SSO provider.
  3. The SSO provider is federated with an IDP and checks whether the user credentials are valid. If they are, the user is authenticated and the SSO provides a token to the user.
  4. Using the token provided by the SSO, the user accesses the Ceph Object Gateway STS endpoint and asks to assume an IAM role that provides access to S3 resources.
  5. The Red Hat Ceph Storage gateway receives the user token and asks the SSO to validate it.
  6. Once the SSO validates the token, the user is allowed to assume the role. Through STS, the user is provided with temporary access and secret keys that give access to the S3 resources.
  7. Depending on the policies attached to the IAM role the user has assumed, the user can access a set of S3 resources. For example, read from bucket A and write to bucket B.
[Figure: STS workflow]
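
The request that the user sends to the STS endpoint to exchange the SSO token for temporary keys can be sketched as a form-encoded body. This is an illustration under assumptions, not a tested client: the method name is hypothetical, the role ARN and token values are placeholders, and the parameter names follow the Amazon STS AssumeRoleWithWebIdentity API, including WebIdentityToken, which this section does not list.

```ruby
require 'uri'

# Builds the form-encoded body for an AssumeRoleWithWebIdentity call.
def assume_role_with_web_identity_body(role_arn:, session_name:, token:, duration: 3600)
  URI.encode_www_form(
    {
      'Action' => 'AssumeRoleWithWebIdentity',
      'RoleArn' => role_arn,
      'RoleSessionName' => session_name,
      'WebIdentityToken' => token,
      'DurationSeconds' => duration
    }
  )
end

# Placeholder values, for illustration only.
body = assume_role_with_web_identity_body(
  role_arn: 'arn:aws:iam:::role/S3Access',
  session_name: 'demo-session',
  token: 'TOKEN_FROM_SSO'
)
puts body
```

The body would be sent in a POST request to the Ceph Object Gateway endpoint; the response carries the temporary access key, secret key, and session token.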

3.2.8.1. The Secure Token Service application programming interfaces

The Ceph Object Gateway implements the following Secure Token Service (STS) application programming interfaces (APIs):

AssumeRole

This API returns a set of temporary credentials for cross-account access. These temporary credentials allow for both the permission policies attached to the role and the policies passed with the AssumeRole API call. The RoleArn and the RoleSessionName request parameters are required, but the other request parameters are optional.

RoleArn
Description
The Amazon Resource Name (ARN) of the role to assume, with a length of 20 to 2048 characters.
Type
String
Required
Yes
RoleSessionName
Description
The name that identifies the role session. The role session name can uniquely identify a session when the same role is assumed by different principals or for different reasons. This parameter’s value has a length of 2 to 64 characters. The =, ,, ., @, and - characters are allowed, but spaces are not.
Type
String
Required
Yes
Policy
Description
An identity and access management (IAM) policy in JSON format for use as an inline session policy. This parameter’s value has a length of 1 to 2048 characters.
Type
String
Required
No
DurationSeconds
Description
The duration of the session in seconds, with a minimum value of 900 seconds to a maximum value of 43200 seconds. The default value is 3600 seconds.
Type
Integer
Required
No
ExternalId
Description
When assuming a role for another account, provide the unique external identifier if available. This parameter’s value has a length of 2 to 1224 characters.
Type
String
Required
No
SerialNumber
Description
A user’s identification number from their associated multi-factor authentication (MFA) device. The parameter’s value can be the serial number of a hardware device or a virtual device, with a length of 9 to 256 characters.
Type
String
Required
No
TokenCode
Description
The value generated from the multi-factor authentication (MFA) device, if the trust policy requires MFA. If an MFA device is required, and if this parameter’s value is empty or expired, then AssumeRole call returns an "access denied" error message. This parameter’s value has a fixed length of 6 characters.
Type
String
Required
No
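
The length and range limits listed above can be checked on the client before a request is sent. The following Python sketch is illustrative only: the validate_assume_role_params function is a hypothetical helper, not part of any SDK, and it assumes that ASCII letters, digits, and underscores are valid in RoleSessionName in addition to the characters listed above.

```python
import re

# Assumption: ASCII letters, digits, and underscores are accepted in addition
# to the =, ,, ., @, and - characters named in the parameter description.
SESSION_NAME_RE = re.compile(r"[\w=,.@-]{2,64}", re.ASCII)


def validate_assume_role_params(role_arn, role_session_name, duration_seconds=3600):
    """Return a list of problems; an empty list means the values fit the documented limits."""
    problems = []
    if not 20 <= len(role_arn) <= 2048:
        problems.append("RoleArn must be 20 to 2048 characters long")
    if not SESSION_NAME_RE.fullmatch(role_session_name):
        problems.append("RoleSessionName must be 2 to 64 characters with no spaces")
    if not 900 <= duration_seconds <= 43200:
        problems.append("DurationSeconds must be between 900 and 43200")
    return problems


print(validate_assume_role_params(
    "arn:aws:iam:::role/application_abc/component_xyz/S3Access", "Bob"))
```

An empty list means that the three checked values are within the documented limits. The gateway remains the authority on whether a request is accepted.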

AssumeRoleWithWebIdentity

This API returns a set of temporary credentials for users who have been authenticated by an application, such as OpenID Connect or OAuth 2.0 Identity Provider. The RoleArn and the RoleSessionName request parameters are required, but the other request parameters are optional.

RoleArn
Description
The Amazon Resource Name (ARN) of the role to assume, with a length of 20 to 2048 characters.
Type
String
Required
Yes
RoleSessionName
Description
The name that identifies the role session. The role session name can uniquely identify a session when different principals assume a role, or when a role is assumed for different reasons. This parameter’s value has a length of 2 to 64 characters. The =, ,, ., @, and - characters are allowed, but spaces are not allowed.
Type
String
Required
Yes
Policy
Description
An identity and access management (IAM) policy in JSON format for use as an inline session policy. This parameter’s value has a length of 1 to 2048 characters.
Type
String
Required
No
DurationSeconds
Description
The duration of the session in seconds, with a minimum value of 900 seconds to a maximum value of 43200 seconds. The default value is 3600 seconds.
Type
Integer
Required
No
ProviderId
Description
The fully qualified host component of the domain name from the identity provider. This parameter’s value is only valid for OAuth 2.0 access tokens, with a length of 4 to 2048 characters.
Type
String
Required
No
WebIdentityToken
Description
The OpenID Connect identity token or OAuth 2.0 access token provided from an identity provider. This parameter’s value has a length of 4 to 2048 characters.
Type
String
Required
No

3.2.8.2. Configuring the Secure Token Service

Configure the Secure Token Service (STS) for use with the Ceph Object Gateway by setting the rgw_sts_key and rgw_s3_auth_use_sts options.

Note

The S3 and STS APIs co-exist in the same namespace, and both can be accessed from the same endpoint in the Ceph Object Gateway.

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A running Ceph Object Gateway.
  • Root-level access to a Ceph Manager node.

Procedure

  1. Set the following configuration options for the Ceph Object Gateway client:

    Syntax

    ceph config set RGW_CLIENT_NAME rgw_sts_key STS_KEY
    ceph config set RGW_CLIENT_NAME rgw_s3_auth_use_sts true

    The rgw_sts_key is the STS key for encrypting or decrypting the session token and is exactly 16 hex characters.

    Important

    The STS key needs to be alphanumeric.

    Example

    [root@mgr ~]# ceph config set client.rgw rgw_sts_key 7f8fd8dd4700mnop
    [root@mgr ~]# ceph config set client.rgw rgw_s3_auth_use_sts true

  2. Restart the Ceph Object Gateway for the added key to take effect.

    Note

    Use the output from the ceph orch ps command, under the NAME column, to get the SERVICE_TYPE.ID information.

    1. To restart the Ceph Object Gateway on an individual node in the storage cluster:

      Syntax

      systemctl restart ceph-CLUSTER_ID@SERVICE_TYPE.ID.service

      Example

      [root@host01 ~]# systemctl restart ceph-c4b34c6f-8365-11ba-dc31-529020a7702d@rgw.realm.zone.host01.gwasto.service

    2. To restart the Ceph Object Gateways on all nodes in the storage cluster:

      Syntax

      ceph orch restart SERVICE_TYPE

      Example

      [ceph: root@host01 /]# ceph orch restart rgw
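
A value for STS_KEY can be generated rather than typed by hand. The following Python sketch is one possible approach, not a Red Hat tool: generate_sts_key is a hypothetical helper that returns 16 random hexadecimal characters. Hexadecimal characters are also alphanumeric, so the result fits both constraints stated in the procedure.

```python
import secrets


def generate_sts_key():
    """Return 16 random hexadecimal characters (8 random bytes)."""
    return secrets.token_hex(8)


key = generate_sts_key()
print(key)  # pass this value as STS_KEY to "ceph config set"
```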

3.2.8.3. Creating a user for an OpenID Connect provider

To establish trust between the Ceph Object Gateway and the OpenID Connect provider, create a user entity and a role trust policy.

Prerequisites

  • User-level access to the Ceph Object Gateway node.
  • Secure Token Service configured.

Procedure

  1. Create a new Ceph user:

    Syntax

    radosgw-admin --uid USER_NAME --display-name "DISPLAY_NAME" --access_key USER_NAME --secret SECRET user create

    Example

    [user@rgw ~]$ radosgw-admin --uid TESTER --display-name "TestUser" --access_key TESTER --secret test123 user create

  2. Configure the Ceph user capabilities:

    Syntax

    radosgw-admin caps add --uid="USER_NAME" --caps="oidc-provider=*"

    Example

    [user@rgw ~]$ radosgw-admin caps add --uid="TESTER" --caps="oidc-provider=*"

  3. Add a condition to the role trust policy using the Secure Token Service (STS) API:

    Syntax

    "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\",\"Principal\":{\"Federated\":[\"arn:aws:iam:::oidc-provider/IDP_URL\"]},\"Action\":[\"sts:AssumeRoleWithWebIdentity\"],\"Condition\":{\"StringEquals\":{\"IDP_URL:app_id\":\"AUD_FIELD\"}}}]}"

    Important

    The app_id in the syntax example above must match the AUD_FIELD field of the incoming token.
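
Escaping the trust policy by hand is error-prone, so it can help to build the document from a data structure and serialize it. The following Python sketch is a minimal illustration: build_trust_policy is a hypothetical helper, the idp_url argument is assumed to be the provider URL without its scheme (as it appears in the provider ARN), and the policy version is the 2012-10-17 value that the other role examples in this chapter use.

```python
import json


def build_trust_policy(idp_url, aud):
    """Return the role trust policy as a compact JSON string."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": ["arn:aws:iam:::oidc-provider/" + idp_url]
                },
                "Action": ["sts:AssumeRoleWithWebIdentity"],
                # The app_id condition value must match the aud field of the token.
                "Condition": {"StringEquals": {idp_url + ":app_id": aud}},
            }
        ],
    }
    return json.dumps(policy, separators=(",", ":"))


print(build_trust_policy("IDP_URL", "AUD_FIELD"))
```

The printed string contains no shell escaping. Quote it appropriately for the shell or pass it through an SDK when creating the role.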

3.2.8.4. Obtaining a thumbprint of an OpenID Connect provider

Get the OpenID Connect provider’s (IDP) configuration document.

Any SSO that follows the OIDC protocol standards is expected to work with the Ceph Object Gateway. Red Hat has tested with the following SSO providers:

  • Red Hat Single Sign-On
  • Keycloak

Prerequisites

  • Installation of the openssl and curl packages.

Procedure

  1. Get the configuration document from the IDP’s URL:

    Syntax

    curl -k -v \
         -X GET \
         -H "Content-Type: application/x-www-form-urlencoded" \
         "IDP_URL:8000/CONTEXT/realms/REALM/.well-known/openid-configuration" \
       | jq .

    Example

    [user@client ~]$ curl -k -v \
         -X GET \
         -H "Content-Type: application/x-www-form-urlencoded" \
         "http://www.example.com:8000/auth/realms/quickstart/.well-known/openid-configuration" \
       | jq .

  2. Get the IDP certificate:

    Syntax

    curl -k -v \
         -X GET \
         -H "Content-Type: application/x-www-form-urlencoded" \
         "IDP_URL/CONTEXT/realms/REALM/protocol/openid-connect/certs" \
         | jq .

    Example

    [user@client ~]$ curl -k -v \
         -X GET \
         -H "Content-Type: application/x-www-form-urlencoded" \
         "http://www.example.com/auth/realms/quickstart/protocol/openid-connect/certs" \
         | jq .

    Note

    The x5c cert can be available on the /certs path or in the /jwks path depending on the SSO provider.

  3. Copy the result of the "x5c" response from the previous command and paste it into the certificate.crt file. Include -----BEGIN CERTIFICATE----- at the beginning and -----END CERTIFICATE----- at the end.

    Example

    -----BEGIN CERTIFICATE-----
    
    MIIDYjCCAkqgAwIBAgIEEEd2CDANBgkqhkiG9w0BAQsFADBzMQkwBwYDVQQGEwAxCTAHBgNVBAgTADEJMAcGA1UEBxMAMQkwBwYDVQQKEwAxCTAHBgNVBAsTADE6MDgGA1UEAxMxYXV0aHN2Yy1pbmxpbmVtZmEuZGV2LnZlcmlmeS5pYm1jbG91ZHNlY3VyaXR5LmNvbTAeFw0yMTA3MDUxMzU2MzZaFw0zMTA3MDMxMzU2MzZaMHMxCTAHBgNVBAYTADEJMAcGA1UECBMAMQkwBwYDVQQHEwAxCTAHBgNVBAoTADEJMAcGA1UECxMAMTowOAYDVQQDEzFhdXRoc3ZjLWlubGluZW1mYS5kZXYudmVyaWZ5LmlibWNsb3Vkc2VjdXJpdHkuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAphyu3HaAZ14JH/EXetZxtNnerNuqcnfxcmLhBz9SsTlFD59ta+BOVlRnK5SdYEqO3ws2iGEzTvC55rczF+hDVHFZEBJLVLQe8ABmi22RAtG1P0dA/Bq8ReFxpOFVWJUBc31QM+ummW0T4yw44wQJI51LZTMz7PznB0ScpObxKe+frFKd1TCMXPlWOSzmTeFYKzR83Fg9hsnz7Y8SKGxi+RoBbTLT+ektfWpR7O+oWZIf4INe1VYJRxZvn+qWcwI5uMRCtQkiMknc3Rj6Eupiqq6FlAjDs0p//EzsHAlW244jMYnHCGq0UP3oE7vViLJyiOmZw7J3rvs3m9mOQiPLoQIDAQABMA0GCSqGSIb3DQEBCwUAA4IBAQCeVqAzSh7Tp8LgaTIFUuRbdjBAKXC9Nw3+pRBHoiUTdhqO3ualyGih9m/js/clb8Vq/39zl0VPeaslWl2NNX9zaK7xo+ckVIOY3ucCaTC04ZUn1KzZu/7azlN0C5XSWg/CfXgU2P3BeMNzc1UNY1BASGyWn2lEplIVWKLaDZpNdSyyGyaoQAIBdzxeNCyzDfPCa2oSO8WH1czmFiNPqR5kdknHI96CmsQdi+DT4jwzVsYgrLfcHXmiWyIAb883hR3Pobp+Bsw7LUnxebQ5ewccjYmrJzOk5Wb5FpXBhaJH1B3AEd6RGalRUyc/zUKdvEy0nIRMDS9x2BP3NVvZSADD
    
    -----END CERTIFICATE-----

  4. Get the certificate thumbprint:

    Syntax

    openssl x509 -in CERT_FILE -fingerprint -noout

    Example

    [user@client ~]$ openssl x509 -in certificate.crt -fingerprint -noout
    SHA1 Fingerprint=F7:D7:B3:51:5D:D0:D3:19:DD:21:9A:43:A9:EA:72:7A:D6:06:52:87

  5. Remove all the colons from the SHA1 fingerprint and use this as the input for creating the IDP entity in the IAM request.
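
The last step can be scripted. The following Python sketch takes the line printed by openssl and returns the thumbprint without colons. The fingerprint_to_thumbprint function name is hypothetical, and the input is the example fingerprint shown in the previous step.

```python
def fingerprint_to_thumbprint(openssl_output):
    """Drop the 'SHA1 Fingerprint=' label and the colons from openssl output."""
    digest = openssl_output.strip().rpartition("=")[2]
    return digest.replace(":", "")


line = "SHA1 Fingerprint=F7:D7:B3:51:5D:D0:D3:19:DD:21:9A:43:A9:EA:72:7A:D6:06:52:87"
print(fingerprint_to_thumbprint(line))
# prints F7D7B3515DD0D319DD219A43A9EA727AD6065287
```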

3.2.8.5. Registering the OpenID Connect provider

Register the OpenID Connect provider’s (IDP) configuration document.

Prerequisites

  • Installation of the openssl and curl packages.
  • Secure Token Service configured.
  • User created for an OIDC provider.
  • Thumbprint of an OIDC obtained.

Procedure

  1. Extract the issuer URL from the token.

    Example

    [root@host01 ~]# bash check_token_isv.sh | jq .iss
    
    "https://keycloak-sso.apps.ocp.example.com/auth/realms/ceph"

  2. Register the OIDC provider with Ceph Object Gateway.

    Example

    [root@host01 ~]# aws --endpoint https://cephproxy1.example.com:8443 iam create-open-id-connect-provider --url https://keycloak-sso.apps.ocp.example.com/auth/realms/ceph --thumbprint-list 00E9CFD697E0B16DD13C86B0FFDC29957E5D24DF

  3. Verify that the OIDC provider is added to the Ceph Object Gateway.

    Example

    [root@host01 ~]# aws --endpoint https://cephproxy1.example.com:8443 iam list-open-id-connect-providers

    {
      "OpenIDConnectProviderList": [
        {
          "Arn": "arn:aws:iam:::oidc-provider/keycloak-sso.apps.ocp.example.com/auth/realms/ceph"
        }
      ]
    }
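
In the examples above, the ARN of the registered provider is the issuer URL without its scheme, prefixed with arn:aws:iam:::oidc-provider/. The following Python sketch reproduces that pattern so that the expected ARN can be compared with the listing output. The oidc_provider_arn helper is hypothetical and the pattern is inferred from the example output only.

```python
from urllib.parse import urlsplit


def oidc_provider_arn(issuer_url):
    """Derive the provider ARN from the issuer URL, following the example output."""
    parts = urlsplit(issuer_url)
    return "arn:aws:iam:::oidc-provider/" + parts.netloc + parts.path


print(oidc_provider_arn("https://keycloak-sso.apps.ocp.example.com/auth/realms/ceph"))
```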

3.2.8.6. Creating IAM roles and policies

Create IAM roles and policies.

Prerequisites

  • Installation of the openssl and curl packages.
  • Secure Token Service configured.
  • User created for an OIDC provider.
  • Thumbprint of an OIDC obtained.
  • The OIDC provider in Ceph Object Gateway registered.

Procedure

  1. Retrieve and validate JWT token.

    Example

    [root@host01 ~]# curl -k -q -L -X POST \
    "https://keycloak-sso.apps.example.com/auth/realms/ceph/protocol/openid-connect/token" \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'client_id=ceph' \
    --data-urlencode 'grant_type=password' \
    --data-urlencode 'client_secret=XXXXXXXXXXXXXXXXXXXXXXX' \
    --data-urlencode 'scope=openid' \
    --data-urlencode "username=SSOUSERNAME" \
    --data-urlencode "password=SSOPASSWORD"

  2. Verify the token.

    Example

    [root@host01 ~]# cat check_token.sh
    USERNAME=$1
    PASSWORD=$2
    KC_CLIENT="ceph"
    KC_CLIENT_SECRET="7sQXqyMSzHIeMcSALoKaljB6sNIBDRjU"
    KC_ACCESS_TOKEN="$(./get_web_token.sh $USERNAME $PASSWORD | jq -r '.access_token')"
    KC_SERVER="https://keycloak-sso.apps.ocp.stg.local"
    KC_CONTEXT="auth"
    KC_REALM="ceph"
    curl -k -s -q \
    -X POST \
    -u "$KC_CLIENT:$KC_CLIENT_SECRET" \
    -d "token=$KC_ACCESS_TOKEN" \
    "$KC_SERVER/$KC_CONTEXT/realms/$KC_REALM/protocol/openid-connect/token/introspect" | jq .
    
    
    [root@host01 ~]# ./check_token.sh s3admin passw0rd | jq .sub
    "ceph"

    In this example, the jq filter extracts the sub field from the token, which is set to ceph.

  3. Create a JSON file with role properties. Set Effect to Allow and Action to sts:AssumeRoleWithWebIdentity. The condition allows access to any user whose JWT token contains a sub field that matches ceph.

    Example

    [root@host01 ~]# cat role-rgwadmins.json
    {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": [
             "arn:aws:iam:::oidc-provider/keycloak-sso.apps.example.com/auth/realms/ceph"
           ]
         },
         "Action": [
           "sts:AssumeRoleWithWebIdentity"
         ],
         "Condition": {
           "StringLike": { "keycloak-sso.apps.example.com/auth/realms/ceph:sub":"ceph" }
         }
       }
     ]
    }

  4. Create a Ceph Object Gateway role using the JSON file.

    Example

    [root@host01 ~]# radosgw-admin role create --role-name rgwadmins \
    --assume-role-policy-doc="$(jq -rc . /root/role-rgwadmins.json)"
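
When writing the role condition, it can be useful to look at the claims in a token locally, for example to confirm the sub value that the condition must match. The following Python sketch decodes the payload segment of a JWT. It does not verify the signature or the expiry, so it is only an inspection aid and does not replace the introspection call shown in the procedure. The jwt_claims function name is hypothetical.

```python
import base64
import json


def jwt_claims(token):
    """Decode the payload (second segment) of a JWT without verifying it."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the base64 padding that JWTs omit
    return json.loads(base64.urlsafe_b64decode(payload))


def make_unsigned_token(claims):
    """Build a dummy, unsigned token so that the example runs without an SSO server."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
    return "e30." + body + ".signature"


demo_token = make_unsigned_token({"sub": "ceph", "aud": "account"})
print(jwt_claims(demo_token)["sub"])
```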

3.2.8.7. Accessing S3 resources

Verify that you can assume the role with STS credentials and access S3 resources.

Prerequisites

  • Installation of the openssl and curl packages.
  • Secure Token Service configured.
  • User created for an OIDC provider.
  • Thumbprint of an OIDC obtained.
  • The OIDC provider in Ceph Object Gateway registered.
  • IAM roles and policies created.

Procedure

  1. Create a script that assumes the role with STS to get a temporary access key and secret key for accessing S3 resources.

    Example

    [root@host01 ~]# cat test-assume-role.sh
    #!/bin/bash
    # Usage: source ./test-assume-role.sh SSO_USERNAME SSO_PASSWORD ROLE_NAME
    export AWS_CA_BUNDLE="/etc/pki/ca-trust/source/anchors/cert.pem"
    unset AWS_ACCESS_KEY_ID
    unset AWS_SECRET_ACCESS_KEY
    unset AWS_SESSION_TOKEN
    # Request an access token from the SSO provider.
    KC_ACCESS_TOKEN=$(curl -k -q -L -X POST \
    "https://keycloak-sso.apps.ocp.example.com/auth/realms/ceph/protocol/openid-connect/token" \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'client_id=ceph' \
    --data-urlencode 'grant_type=password' \
    --data-urlencode 'client_secret=XXXXXXXXXXXXXXXXXXXXXXX' \
    --data-urlencode 'scope=openid' \
    --data-urlencode "username=$1" \
    --data-urlencode "password=$2" | jq -r .access_token)
    echo ${KC_ACCESS_TOKEN}
    # Exchange the token for temporary credentials by assuming the role named in $3.
    IDM_ASSUME_ROLE_CREDS=$(aws sts assume-role-with-web-identity \
    --role-arn "arn:aws:iam:::role/$3" --role-session-name testbr \
    --endpoint=https://cephproxy1.example.com:8443 \
    --web-identity-token="$KC_ACCESS_TOKEN")
    echo "aws sts assume-role-with-web-identity --role-arn arn:aws:iam:::role/$3 --role-session-name testbr --endpoint=https://cephproxy1.example.com:8443 --web-identity-token=$KC_ACCESS_TOKEN"
    echo $IDM_ASSUME_ROLE_CREDS
    # Export the temporary credentials for subsequent aws commands.
    export AWS_ACCESS_KEY_ID=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.AccessKeyId)
    export AWS_SECRET_ACCESS_KEY=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.SecretAccessKey)
    export AWS_SESSION_TOKEN=$(echo $IDM_ASSUME_ROLE_CREDS | jq -r .Credentials.SessionToken)

  2. Run the script.

    Example

    [root@host01 ~]# source ./test-assume-role.sh s3admin passw0rd rgwadmins
    [root@host01 ~]# aws s3 mb s3://testbucket
    [root@host01 ~]# aws s3 ls
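
The three jq filters in the script pick fields out of the Credentials block of the STS response. The same mapping can be sketched in Python, which may be convenient when the credentials are consumed by a program rather than a shell. The credentials_to_env function is a hypothetical helper, and the sample values are placeholders, not real keys.

```python
import json


def credentials_to_env(response_json):
    """Map the Credentials block of an STS response to AWS environment variables."""
    creds = json.loads(response_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Placeholder response with the same field names that the script reads with jq.
sample = json.dumps({
    "Credentials": {
        "AccessKeyId": "EXAMPLEKEY",
        "SecretAccessKey": "EXAMPLESECRET",
        "SessionToken": "EXAMPLETOKEN",
    }
})
for name, value in credentials_to_env(sample).items():
    print(f"export {name}={value}")
```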

3.2.9. Configuring and using STS Lite with Keystone (Technology Preview)

The Amazon Secure Token Service (STS) and S3 APIs co-exist in the same namespace. The STS options can be configured in conjunction with the Keystone options.

Note

Both S3 and STS APIs can be accessed using the same endpoint in Ceph Object Gateway.

Prerequisites

  • Red Hat Ceph Storage 5.0 or higher.
  • A running Ceph Object Gateway.
  • Installation of the Boto Python module, version 3 or higher.
  • Root-level access to a Ceph Manager node.
  • User-level access to an OpenStack node.

Procedure

  1. Set the following configuration options for the Ceph Object Gateway client:

    Syntax

    ceph config set RGW_CLIENT_NAME rgw_sts_key STS_KEY
    ceph config set RGW_CLIENT_NAME rgw_s3_auth_use_sts true

    The rgw_sts_key is the STS key for encrypting or decrypting the session token and is exactly 16 hex characters.

    Important

    The STS key needs to be alphanumeric.

    Example

    [root@mgr ~]# ceph config set client.rgw rgw_sts_key 7f8fd8dd4700mnop
    [root@mgr ~]# ceph config set client.rgw rgw_s3_auth_use_sts true

  2. Generate the EC2 credentials on the OpenStack node:

    Example

    [user@osp ~]$ openstack ec2 credentials create
    
    +------------+--------------------------------------------------------+
    | Field      | Value                                                  |
    +------------+--------------------------------------------------------+
    | access     | b924dfc87d454d15896691182fdeb0ef                       |
    | links      | {u'self': u'http://192.168.0.15/identity/v3/users/     |
    |            | 40a7140e424f493d8165abc652dc731c/credentials/          |
    |            | OS-EC2/b924dfc87d454d15896691182fdeb0ef'}              |
    | project_id | c703801dccaf4a0aaa39bec8c481e25a                       |
    | secret     | 6a2142613c504c42a94ba2b82147dc28                       |
    | trust_id   | None                                                   |
    | user_id    | 40a7140e424f493d8165abc652dc731c                       |
    +------------+--------------------------------------------------------+

  3. Use the generated credentials to get back a set of temporary security credentials using the GetSessionToken API:

    Example

    import boto3

    access_key = 'b924dfc87d454d15896691182fdeb0ef'
    secret_key = '6a2142613c504c42a94ba2b82147dc28'

    # Create an STS client that points at the Ceph Object Gateway endpoint.
    client = boto3.client('sts',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        endpoint_url='https://www.example.com/rgw',
        region_name='',
    )

    # Request temporary credentials for the maximum duration of 43200 seconds.
    response = client.get_session_token(
        DurationSeconds=43200
    )

  4. Use the temporary credentials to make S3 calls:

    Example

    # Create an S3 client that authenticates with the temporary credentials.
    s3client = boto3.client('s3',
      aws_access_key_id=response['Credentials']['AccessKeyId'],
      aws_secret_access_key=response['Credentials']['SecretAccessKey'],
      aws_session_token=response['Credentials']['SessionToken'],
      endpoint_url='https://www.example.com/s3',
      region_name='')

    bucket = s3client.create_bucket(Bucket='my-new-shiny-bucket')
    response = s3client.list_buckets()
    for bucket in response["Buckets"]:
      print("{name}\t{created}".format(
        name=bucket['Name'],
        created=bucket['CreationDate'],
      ))

  5. Create a new S3Access role and configure a policy.

    1. Assign a user with administrative CAPS:

      Syntax

      radosgw-admin caps add --uid="USER" --caps="roles=*"

      Example

      [root@mgr ~]# radosgw-admin caps add --uid="gwadmin" --caps="roles=*"

    2. Create the S3Access role:

      Syntax

      radosgw-admin role create --role-name=ROLE_NAME --path=PATH --assume-role-policy-doc=TRUST_POLICY_DOC

      Example

      [root@mgr ~]# radosgw-admin role create --role-name=S3Access --path=/application_abc/component_xyz/ --assume-role-policy-doc=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}

    3. Attach a permission policy to the S3Access role:

      Syntax

      radosgw-admin role-policy put --role-name=ROLE_NAME --policy-name=POLICY_NAME --policy-doc=PERMISSION_POLICY_DOC

      Example

      [root@mgr ~]# radosgw-admin role-policy put --role-name=S3Access --policy-name=Policy --policy-doc=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Action\":\[\"s3:*\"\],\"Resource\":\"arn:aws:s3:::example_bucket\"\}\]\}

    4. Now another user can assume the role of the gwadmin user. For example, the gwuser user can assume the permissions of the gwadmin user.
    5. Make a note of the assuming user’s access_key and secret_key values.

      Example

      [root@mgr ~]# radosgw-admin user info --uid=gwuser | grep -A1 access_key

  6. Use the AssumeRole API call, providing the access_key and secret_key values from the assuming user:

    Example

    import boto3

    access_key = '11BS02LGFB6AL6H1ADMW'
    secret_key = 'vzCEkuryfn060dfee4fgQPqFrncKEIkh3ZcdOANY'

    # Create an STS client with the assuming user's credentials.
    client = boto3.client('sts',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        endpoint_url='https://www.example.com/rgw',
        region_name='',
    )

    # Assume the S3Access role for one hour.
    response = client.assume_role(
        RoleArn='arn:aws:iam:::role/application_abc/component_xyz/S3Access',
        RoleSessionName='Bob',
        DurationSeconds=3600
    )

    Important

    The AssumeRole API requires the S3Access role.
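
The response of the AssumeRole call carries the same Credentials fields as the GetSessionToken response, so it can be turned into S3 client arguments in the same way. The following Python sketch shows that mapping as a plain function so that it can be read and tested without a gateway. The s3_client_kwargs helper is hypothetical, and the sample response contains placeholder values.

```python
def s3_client_kwargs(sts_response, endpoint_url):
    """Build keyword arguments for boto3.client('s3', ...) from an STS response."""
    creds = sts_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        "endpoint_url": endpoint_url,
        "region_name": "",
    }


# Placeholder response; a real one is returned by client.assume_role(...).
sample_response = {
    "Credentials": {
        "AccessKeyId": "EXAMPLEKEY",
        "SecretAccessKey": "EXAMPLESECRET",
        "SessionToken": "EXAMPLETOKEN",
    }
}
kwargs = s3_client_kwargs(sample_response, "https://www.example.com/s3")
print(sorted(kwargs))
```

With the Boto module installed, the result can be passed on as boto3.client('s3', **kwargs).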

Additional Resources

  • See the Test S3 Access section in the Red Hat Ceph Storage Object Gateway Guide for more information on installing the Boto Python module.
  • See the Create a User section in the Red Hat Ceph Storage Object Gateway Guide for more information.

3.2.10. Working around the limitations of using STS Lite with Keystone (Technology Preview)

A limitation with Keystone is that it does not support Secure Token Service (STS) requests. Another limitation is that the payload hash is not included with the request. To work around these two limitations, the Boto authentication code must be modified.

Prerequisites

  • A running Red Hat Ceph Storage cluster, version 5.0 or higher.
  • A running Ceph Object Gateway.
  • Installation of Boto Python module, version 3 or higher.

Procedure

  1. Open and edit Boto’s auth.py file.

    1. Add the following four lines to the code block:

      class SigV4Auth(BaseSigner):
        """
        Sign a request with Signature V4.
        """
        REQUIRES_REGION = True
      
        def __init__(self, credentials, service_name, region_name):
            self.credentials = credentials
            # We initialize these value here so the unit tests can have
            # valid values.  But these will get overriden in ``add_auth``
            # later for real requests.
            self._region_name = region_name
            if service_name == 'sts':  # added line 1
                self._service_name = 's3'  # added line 2
            else:  # added line 3
                self._service_name = service_name  # added line 4

    2. Add the following two lines to the code block:

      def _modify_request_before_signing(self, request):
              if 'Authorization' in request.headers:
                  del request.headers['Authorization']
              self._set_necessary_date_headers(request)
              if self.credentials.token:
                  if 'X-Amz-Security-Token' in request.headers:
                      del request.headers['X-Amz-Security-Token']
                  request.headers['X-Amz-Security-Token'] = self.credentials.token
      
              if not request.context.get('payload_signing_enabled', True):
                  if 'X-Amz-Content-SHA256' in request.headers:
                      del request.headers['X-Amz-Content-SHA256']
                  request.headers['X-Amz-Content-SHA256'] = UNSIGNED_PAYLOAD
              else:  # added line 1
                  request.headers['X-Amz-Content-SHA256'] = self.payload(request)  # added line 2
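
For orientation, the X-Amz-Content-SHA256 header in Signature Version 4 carries the hexadecimal SHA-256 digest of the request body. The following Python sketch computes that digest with the standard library only. It illustrates the kind of value that the added lines place in the header and is not Boto's own implementation. The payload_sha256 function name is hypothetical.

```python
import hashlib


def payload_sha256(body=b""):
    """Return the hex SHA-256 digest of a request body."""
    return hashlib.sha256(body).hexdigest()


# Digest of an empty body, as sent with requests that have no payload.
print(payload_sha256(b""))
# prints e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```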

Additional Resources

  • See the Test S3 Access section in the Red Hat Ceph Storage Object Gateway Guide for more information on installing the Boto Python module.

3.3. S3 bucket operations

As a developer, you can perform bucket operations with the Amazon S3 application programming interface (API) through the Ceph Object Gateway.

The following table lists the Amazon S3 functional operations for buckets, along with the function’s support status.

Table 3.2. Bucket operations
Feature                          Status                Notes
-------------------------------  --------------------  ---------------------------------------------
List Buckets                     Supported
Create a Bucket                  Supported             Different set of canned ACLs.
Put Bucket Website               Supported
Get Bucket Website               Supported
Delete Bucket Website            Supported
Put Bucket replication           Supported
Get Bucket replication           Supported
Delete Bucket replication        Supported
Bucket Lifecycle                 Partially Supported   Expiration, NoncurrentVersionExpiration and
                                                       AbortIncompleteMultipartUpload supported.
Put Bucket Lifecycle             Partially Supported   Expiration, NoncurrentVersionExpiration and
                                                       AbortIncompleteMultipartUpload supported.
Delete Bucket Lifecycle          Supported
Get Bucket Objects               Supported
Bucket Location                  Supported
Get Bucket Version               Supported
Put Bucket Version               Supported
Delete Bucket                    Supported
Get Bucket ACLs                  Supported             Different set of canned ACLs.
Put Bucket ACLs                  Supported             Different set of canned ACLs.
Get Bucket cors                  Supported
Put Bucket cors                  Supported
Delete Bucket cors               Supported
List Bucket Object Versions      Supported
Head Bucket                      Supported
List Bucket Multipart Uploads    Supported
Bucket Policies                  Partially Supported
Get a Bucket Request Payment     Supported
Put a Bucket Request Payment     Supported
Multi-tenant Bucket Operations   Supported
GET PublicAccessBlock            Supported
PUT PublicAccessBlock            Supported
Delete PublicAccessBlock         Supported

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A RESTful client.

3.3.1. S3 create bucket notifications

Create bucket notifications at the bucket level. The notification configuration specifies which Ceph Object Gateway S3 events to publish (ObjectCreated, ObjectRemoved, and ObjectLifecycle:Expiration) and the destination to which the bucket notifications are sent. Bucket notifications are S3 operations.

To create a bucket notification for s3:ObjectCreated, s3:ObjectRemoved, and s3:ObjectLifecycle:Expiration events, use PUT:

Example

client.put_bucket_notification_configuration(
   Bucket=bucket_name,
   NotificationConfiguration={
       'TopicConfigurations': [
           {
               'Id': notification_name,
               'TopicArn': topic_arn,
               'Events': ['s3:ObjectCreated:*', 's3:ObjectRemoved:*', 's3:ObjectLifecycle:Expiration:*']
           }]})

Important

Red Hat supports ObjectCreated events, such as put, post, multipartUpload, and copy. Red Hat also supports ObjectRemoved events, such as object_delete and s3_multi_object_delete.

Request Entities

NotificationConfiguration
Description
List of TopicConfiguration entities.
Type
Container
Required
Yes
TopicConfiguration
Description
Id, Topic, and list of Event entities.
Type
Container
Required
Yes
Id
Description
Name of the notification.
Type
String
Required
Yes
Topic
Description

Topic Amazon Resource Name (ARN).

Note

The topic must be created beforehand.

Type
String
Required
Yes
Event
Description
List of supported events. Multiple event entities can be used. If omitted, all events are handled.
Type
String
Required
No
Filter
Description
S3Key, S3Metadata and S3Tags entities.
Type
Container
Required
No
S3Key
Description
A list of FilterRule entities, for filtering based on the object key. The list can contain at most 3 entities, one for each possible Name value: prefix, suffix, or regex. All filter rules in the list must match for the filter to match.
Type
Container
Required
No
S3Metadata
Description
A list of FilterRule entities, for filtering based on object metadata. All filter rules in the list must match the metadata defined on the object. However, the object still matches if it has other metadata entries not listed in the filter.
Type
Container
Required
No
S3Tags
Description
A list of FilterRule entities, for filtering based on object tags. All filter rules in the list must match the tags defined on the object. However, the object still matches if it has other tags not listed in the filter.
Type
Container
Required
No
S3Key.FilterRule
Description
Name and Value entities. Name is prefix, suffix, or regex. Value holds the key prefix, key suffix, or a regular expression for matching the key, accordingly.
Type
Container
Required
Yes
S3Metadata.FilterRule
Description
Name and Value entities. Name is the name of the metadata attribute, for example, x-amz-meta-xxx. Value is the expected value for this attribute.
Type
Container
Required
Yes
S3Tags.FilterRule
Description
Name and Value entities. Name is the tag key, and the value is the tag value.
Type
Container
Required
Yes
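
The request entities above nest as NotificationConfiguration, TopicConfiguration, and Filter. The following Python sketch assembles such a document with the standard library to make the nesting concrete. It is an illustration, not the gateway's reference format: build_notification_xml is a hypothetical helper, only S3Key filter rules are covered, the topic ARN is a placeholder, and the XML namespace is the one used by the example response in the "S3 get bucket notifications" section.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_notification_xml(notification_id, topic_arn, events, key_rules=()):
    """Return a NotificationConfiguration document with one TopicConfiguration."""
    if len(key_rules) > 3:
        raise ValueError("S3Key accepts at most 3 FilterRule entities")
    root = ET.Element("NotificationConfiguration", xmlns=S3_NS)
    topic = ET.SubElement(root, "TopicConfiguration")
    ET.SubElement(topic, "Id").text = notification_id
    ET.SubElement(topic, "Topic").text = topic_arn
    for event in events:  # one Event element per handled event
        ET.SubElement(topic, "Event").text = event
    if key_rules:
        s3key = ET.SubElement(ET.SubElement(topic, "Filter"), "S3Key")
        for name, value in key_rules:  # name is prefix, suffix, or regex
            rule = ET.SubElement(s3key, "FilterRule")
            ET.SubElement(rule, "Name").text = name
            ET.SubElement(rule, "Value").text = value
    return ET.tostring(root, encoding="unicode")


print(build_notification_xml(
    "testnotificationID",
    "arn:aws:sns:default::mytopic",  # placeholder topic ARN
    ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
    [("prefix", "logs/")],
))
```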

HTTP response

400
Status Code
MalformedXML
Description
The XML is not well-formed.
400
Status Code
InvalidArgument
Description
Missing Id or missing or invalid topic ARN or invalid event.
404
Status Code
NoSuchBucket
Description
The bucket does not exist.
404
Status Code
NoSuchKey
Description
The topic does not exist.

3.3.2. S3 get bucket notifications

Get a specific notification or list all the notifications configured on a bucket.

Syntax

GET /BUCKET?notification=NOTIFICATION_ID HTTP/1.1
Host: cname.domain.com
Date: date
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Example

GET /testbucket?notification=testnotificationID HTTP/1.1
Host: cname.domain.com
Date: date
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Example Response

<NotificationConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <TopicConfiguration>
        <Id></Id>
        <Topic></Topic>
        <Event></Event>
        <Filter>
            <S3Key>
                <FilterRule>
                    <Name></Name>
                    <Value></Value>
                </FilterRule>
            </S3Key>
            <S3Metadata>
                <FilterRule>
                    <Name></Name>
                    <Value></Value>
                </FilterRule>
            </S3Metadata>
            <S3Tags>
                <FilterRule>
                    <Name></Name>
                    <Value></Value>
                </FilterRule>
            </S3Tags>
        </Filter>
    </TopicConfiguration>
</NotificationConfiguration>

Note

The notification subresource returns the bucket notification configuration or an empty NotificationConfiguration element. The caller must be the bucket owner.
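
A client typically needs the Id, Topic, and Event values out of this response. The following Python sketch parses a response of the shape shown above with the standard library. The parse_notifications helper is hypothetical, and the sample document is made up for the example, including the placeholder topic ARN.

```python
import xml.etree.ElementTree as ET

NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_notifications(xml_text):
    """Return one dict per TopicConfiguration element in the response."""
    root = ET.fromstring(xml_text)
    result = []
    for topic in root.findall("s3:TopicConfiguration", NS):
        result.append({
            "Id": topic.findtext("s3:Id", default="", namespaces=NS),
            "Topic": topic.findtext("s3:Topic", default="", namespaces=NS),
            "Events": [event.text for event in topic.findall("s3:Event", NS)],
        })
    return result


sample_xml = """
<NotificationConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <TopicConfiguration>
        <Id>testnotificationID</Id>
        <Topic>arn:aws:sns:default::mytopic</Topic>
        <Event>s3:ObjectCreated:*</Event>
        <Event>s3:ObjectRemoved:*</Event>
    </TopicConfiguration>
</NotificationConfiguration>
"""
print(parse_notifications(sample_xml))
```

An empty NotificationConfiguration element, which the note above describes, yields an empty list.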

Request Entities

notification-id
Description
Name of the notification. All notifications are listed if the ID is not provided.
Type
String
NotificationConfiguration
Description
List of TopicConfiguration entities.
Type
Container
Required
Yes
TopicConfiguration
Description
Id, Topic, and list of Event entities.
Type
Container
Required
Yes
Id
Description
Name of the notification.
Type
String
Required
Yes
Topic
Description

Topic Amazon Resource Name (ARN).

Note

The topic must be created beforehand.

Type
String
Required
Yes
Event
Description
Handled event. Multiple event entities may exist.
Type
String
Required
Yes
Filter
Description
The filters for the specified configuration.
Type
Container
Required
No

HTTP response

404
Status Code
NoSuchBucket
Description
The bucket does not exist.
404
Status Code
NoSuchKey
Description
The notification does not exist. This applies only if a notification ID was provided.

3.3.3. S3 delete bucket notifications

Delete a specific notification or all notifications from a bucket.

Note

Notification deletion is an extension to the S3 notification API. Any notifications defined on a bucket are deleted when the bucket is deleted. Deleting an unknown notification, for example, deleting the same notification twice, is not considered an error.

To delete a specific notification or all notifications, use DELETE:

Syntax

DELETE /BUCKET?notification=NOTIFICATION_ID HTTP/1.1

Example

DELETE /testbucket?notification=testnotificationID HTTP/1.1

Request Entities

notification-id
Description
Name of the notification. All notifications on the bucket are deleted if the notification ID is not provided.
Type
String

HTTP response

404
Status Code
NoSuchBucket
Description
The bucket does not exist.

3.3.4. Accessing bucket host names

There are two different modes of accessing the buckets. The first, and preferred, method identifies the bucket as the top-level directory in the URI.

Example

GET /mybucket HTTP/1.1
Host: cname.domain.com

The second method identifies the bucket via a virtual bucket host name.

Example

GET / HTTP/1.1
Host: mybucket.cname.domain.com

Tip

Red Hat prefers the first method, because the second method requires expensive domain certification and DNS wildcards.
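
The difference between the two access modes can be sketched as follows. The function names path_style and virtual_host_style are illustrative and not part of any library; each returns the request line and the Host header value that the corresponding mode would use.

```python
from typing import Tuple


def path_style(endpoint: str, bucket: str) -> Tuple[str, str]:
    """Preferred mode: the bucket is the top-level directory in the URI."""
    return f"GET /{bucket} HTTP/1.1", endpoint


def virtual_host_style(endpoint: str, bucket: str) -> Tuple[str, str]:
    """Second mode: the bucket is part of the host name (needs wildcard DNS)."""
    return "GET / HTTP/1.1", f"{bucket}.{endpoint}"


for build in (path_style, virtual_host_style):
    request_line, host = build("cname.domain.com", "mybucket")
    print(request_line)
    print("Host: " + host)
```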

3.3.5. S3 list buckets

GET / returns a list of buckets created by the user making the request. GET / only returns buckets created by an authenticated user. You cannot make an anonymous request.

Syntax

GET / HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET
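
The Authorization header in the syntax has the form AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET. The following is a minimal sketch of how such a value could be computed, assuming the header corresponds to the AWS Signature Version 2 scheme (an HMAC-SHA1 over a canonical string, Base64-encoded). The function name authorization_header and the sample credentials are illustrative, and x-amz-* headers are left out of the canonical string for brevity.

```python
import base64
import hashlib
import hmac


def authorization_header(access_key: str, secret_key: str, method: str,
                         resource: str, date: str,
                         content_md5: str = "", content_type: str = "") -> str:
    """Return an 'AWS ACCESS_KEY:SIGNATURE' header value (Signature Version 2 style)."""
    # Canonical string: verb, Content-MD5, Content-Type, Date, then the resource.
    # Assumption: the request carries no x-amz-* headers.
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return f"AWS {access_key}:{signature}"


# Sample credentials and date are placeholders, not real values.
print(authorization_header("TESTER", "test123", "GET", "/",
                           "Tue, 27 Mar 2007 19:36:42 +0000"))
```

In practice an S3 client library computes this value, so the sketch is mainly useful for understanding what the header contains.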

Response Entities

Buckets
Description
Container for list of buckets.
Type
Container
Bucket
Description
Container for bucket information.
Type
Container
Name
Description
Bucket name.
Type
String
CreationDate
Description
UTC time when the bucket was created.
Type
Date
ListAllMyBucketsResult
Description
A container for the result.
Type
Container
Owner
Description
A container for the bucket owner’s ID and DisplayName.
Type
Container
ID
Description
The bucket owner’s ID.
Type
String
DisplayName
Description
The bucket owner’s display name.
Type
String
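
As an illustration of how the response entities nest, the following sketch parses a sample ListAllMyBucketsResult document with the Python standard library. The sample XML, including the namespace URI and all values in it, is an assumption made for this example and is not output captured from a gateway.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the S3 response document.
NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

SAMPLE_RESPONSE = """
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>testuser</ID>
    <DisplayName>Test User</DisplayName>
  </Owner>
  <Buckets>
    <Bucket>
      <Name>mybucket</Name>
      <CreationDate>2024-01-15T10:30:00.000Z</CreationDate>
    </Bucket>
    <Bucket>
      <Name>testbucket</Name>
      <CreationDate>2024-02-01T08:00:00.000Z</CreationDate>
    </Bucket>
  </Buckets>
</ListAllMyBucketsResult>
"""


def parse_bucket_list(xml_text: str) -> dict:
    """Extract the owner and the (name, creation date) pairs from the response."""
    root = ET.fromstring(xml_text.strip())
    buckets = [
        (b.findtext("s3:Name", namespaces=NS), b.findtext("s3:CreationDate", namespaces=NS))
        for b in root.findall("s3:Buckets/s3:Bucket", NS)
    ]
    return {
        "owner_id": root.findtext("s3:Owner/s3:ID", namespaces=NS),
        "owner_name": root.findtext("s3:Owner/s3:DisplayName", namespaces=NS),
        "buckets": buckets,
    }


print(parse_bucket_list(SAMPLE_RESPONSE))
```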

3.3.6. S3 return a list of bucket objects

Returns a list of bucket objects.

Syntax

GET /BUCKET?max-keys=25 HTTP/1.1
Host: cname.domain.com

Parameters

prefix
Description
Only returns objects whose keys begin with the specified prefix.
Type
String
delimiter
Description
The delimiter between the prefix and the rest of the object name.
Type
String
marker
Description
A beginning index for the list of objects returned.
Type
String
max-keys
Description
The maximum number of keys to return. Default is 1000.
Type
Integer

HTTP Response

200
Status Code
OK
Description
Buckets retrieved.

GET /BUCKET returns a container for buckets with the following fields:

Bucket Response Entities

ListBucketResult
Description
The container for the list of objects.
Type
Entity
Name
Description
The name of the bucket whose contents will be returned.
Type
String
Prefix
Description
A prefix for the object keys.
Type
String
Marker
Description
A beginning index for the list of objects returned.
Type
String
MaxKeys
Description
The maximum number of keys returned.
Type
Integer
Delimiter
Description
If set, objects with the same prefix will appear in the CommonPrefixes list.
Type
String
IsTruncated
Description
If true, only a subset of the bucket’s contents were returned.
Type
Boolean
CommonPrefixes
Description
If multiple objects contain the same prefix, they will appear in this list.
Type
Container

The ListBucketResult contains objects, where each object is within a Contents container.

Object Response Entities

Contents
Description
A container for the object.
Type
Object
Key
Description
The object’s key.
Type
String
LastModified
Description
The object’s last-modified date and time.
Type
Date
ETag
Description
An MD5 hash of the object. ETag is an entity tag.
Type
String
Size
Description
The object’s size.
Type
Integer
StorageClass
Description
Should always return STANDARD.
Type
String
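
When IsTruncated is true, a client typically repeats the request with marker set to the last key it received. The following sketch shows that loop against an in-memory stand-in for the gateway. Both list_all_keys and make_fake_bucket are hypothetical helpers written for this example, and the page dictionary is a simplified stand-in for the parsed ListBucketResult.

```python
from typing import Callable, Dict, List, Optional


def list_all_keys(fetch_page: Callable[[Optional[str]], Dict]) -> List[str]:
    """Collect every key by following the marker until IsTruncated is false."""
    keys: List[str] = []
    marker: Optional[str] = None
    while True:
        page = fetch_page(marker)
        keys.extend(page["keys"])
        if not page["is_truncated"] or not page["keys"]:
            return keys
        # The next request starts after the last key of this page.
        marker = page["keys"][-1]


def make_fake_bucket(all_keys: List[str], max_keys: int) -> Callable[[Optional[str]], Dict]:
    """Simulate GET /BUCKET?max-keys=N&marker=M over an in-memory key list."""
    ordered = sorted(all_keys)

    def fetch_page(marker: Optional[str]) -> Dict:
        remaining = [k for k in ordered if marker is None or k > marker]
        page = remaining[:max_keys]
        return {"keys": page, "is_truncated": len(remaining) > len(page)}

    return fetch_page


sample_keys = [f"obj-{i:03d}" for i in range(60)]
print(len(list_all_keys(make_fake_bucket(sample_keys, 25))))
```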

3.3.7. S3 create a new bucket

Creates a new bucket. To create a bucket, you must have a user ID and a valid AWS Access Key ID to authenticate requests. You cannot create buckets as an anonymous user.

Constraints

In general, bucket names should follow domain name constraints.

  • Bucket names must be unique.
  • Bucket names cannot be formatted as an IP address.
  • Bucket names can be between 3 and 63 characters long.
  • Bucket names must not contain uppercase characters or underscores.
  • Bucket names must start with a lowercase letter or number.
  • Bucket names can contain a dash (-).
  • Bucket names must be a series of one or more labels. Adjacent labels are separated by a single period (.). Bucket names can contain lowercase letters, numbers, and hyphens. Each label must start and end with a lowercase letter or a number.
Note

The above constraints are relaxed if rgw_relaxed_s3_bucket_names is set to true. The bucket names must still be unique, cannot be formatted as an IP address, and can contain letters, numbers, periods, dashes, and underscores, up to 255 characters long.
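
The strict naming constraints (with rgw_relaxed_s3_bucket_names unset) can be checked on the client side before sending the request. The following is a minimal sketch; the function name is_valid_bucket_name is illustrative, uniqueness cannot be checked locally, and the gateway remains the authority on whether a name is accepted.

```python
import ipaddress
import re

# A label starts and ends with a lowercase letter or digit; hyphens are allowed inside.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_valid_bucket_name(name: str) -> bool:
    """Check a name against the strict constraints listed above."""
    if not 3 <= len(name) <= 63:
        return False
    try:
        ipaddress.IPv4Address(name)
        return False  # formatted as an IP address
    except ValueError:
        pass
    # Labels are separated by single periods; an empty label means "..",
    # a leading period, or a trailing period.
    return all(_LABEL.match(label) for label in name.split("."))


for candidate in ("my-bucket", "my.bucket-1", "My_Bucket", "192.168.5.4", "ab"):
    print(candidate, is_valid_bucket_name(candidate))
```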

Syntax

PUT /BUCKET HTTP/1.1
Host: cname.domain.com
x-amz-acl: public-read-write

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Parameters

x-amz-acl
Description
Canned ACLs.
Valid Values
private, public-read, public-read-write, authenticated-read
Required
No

HTTP Response

If the bucket name is unique, within constraints, and unused, the operation will succeed. If a bucket with the same name already exists and the user is the bucket owner, the operation will succeed. If the bucket name is already in use by a different user, the operation will fail.

409
Status Code
BucketAlreadyExists
Description
Bucket already exists under different user’s ownership.

3.3.8. S3 put bucket website

The put bucket website API sets the configuration of the website that is specified in the website subresource. To configure a bucket as a website, the website subresource can be added on the bucket.

Note

The PUT operation requires the s3:PutBucketWebsite permission. By default, only the bucket owner can configure the website attached to a bucket.

Syntax

PUT /BUCKET?website-configuration= HTTP/1.1

Example

PUT /testbucket?website-configuration= HTTP/1.1

Additional Resources

  • For more information about this API call, see S3 API.

3.3.9. S3 get bucket website

The get bucket website API retrieves the configuration of the website that is specified in the website subresource.

Note

The GET operation requires the s3:GetBucketWebsite permission. By default, only the bucket owner can read the bucket website configuration.

Syntax

GET /BUCKET?website-configuration= HTTP/1.1

Example

GET /testbucket?website-configuration= HTTP/1.1

Additional Resources

  • For more information about this API call, see S3 API.

3.3.10. S3 delete bucket website

The delete bucket website API removes the website configuration for a bucket.

Syntax

DELETE /BUCKET?website-configuration= HTTP/1.1

Example

DELETE /testbucket?website-configuration= HTTP/1.1

Additional Resources

  • For more information about this API call, see S3 API.

3.3.11. S3 put bucket replication

The put bucket replication API sets the replication configuration for a bucket or replaces an existing one.

Syntax

PUT /BUCKET?replication HTTP/1.1

Example

PUT /testbucket?replication HTTP/1.1

3.3.12. S3 get bucket replication

The get bucket replication API returns the replication configuration of a bucket.

Syntax

GET /BUCKET?replication HTTP/1.1

Example

GET /testbucket?replication HTTP/1.1

3.3.13. S3 delete bucket replication

The delete bucket replication API deletes the replication configuration from a bucket.

Syntax

DELETE /BUCKET?replication HTTP/1.1

Example

DELETE /testbucket?replication HTTP/1.1

3.3.14. S3 delete a bucket

Deletes a bucket. You can reuse bucket names following a successful bucket removal.

Syntax

DELETE /BUCKET HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

HTTP Response

204
Status Code
No Content
Description
Bucket removed.

3.3.15. S3 bucket lifecycle

You can use a bucket lifecycle configuration to manage your objects so they are stored effectively throughout their lifetime. The S3 API in the Ceph Object Gateway supports a subset of the AWS bucket lifecycle actions:

  • Expiration: This defines the lifespan of objects within a bucket. It takes the number of days the object should live or an expiration date, at which point Ceph Object Gateway will delete the object. If versioning is not enabled on the bucket, Ceph Object Gateway will delete the object permanently. If versioning is enabled on the bucket, Ceph Object Gateway will create a delete marker for the current version, and then delete the current version.
  • NoncurrentVersionExpiration: This defines the lifespan of noncurrent object versions within a bucket. To use this feature, you must enable bucket versioning. It takes the number of days a noncurrent object should live, at which point Ceph Object Gateway will delete the noncurrent object.
  • NewerNoncurrentVersions: Specifies how many noncurrent object versions to retain. You can specify up to 100 noncurrent versions to retain. If the specified number to retain is more than 100, additional noncurrent versions are deleted.
  • AbortIncompleteMultipartUpload: This defines the number of days an incomplete multipart upload should live before it is aborted.
  • BlockPublicPolicy: This action is for public access block. It rejects calls to PUT access point policy and PUT bucket policy that are made through the access point if the specified policy (for either the access point or the underlying bucket) allows public access. The Amazon S3 Block Public Access feature is available in Red Hat Ceph Storage 5.x/Ceph Pacific versions. It provides settings for access points, buckets, and accounts to help you manage public access to Amazon S3 resources. By default, new buckets, access points, and objects do not allow public access. However, you can modify bucket policies, access point policies, or object permissions to allow public access. S3 Block Public Access settings override these policies and permissions so that you can limit public access to these resources.

The lifecycle configuration contains one or more rules using the <Rule> element.

Example

<LifecycleConfiguration>
    <Rule>
      <Prefix/>
      <Status>Enabled</Status>
      <Expiration>
        <Days>10</Days>
      </Expiration>
    </Rule>
</LifecycleConfiguration>

A lifecycle rule can apply to all or a subset of objects in a bucket based on the <Filter> element that you specify in the lifecycle rule. You can specify a filter in several ways:

  • Key prefixes
  • Object tags
  • Both key prefix and one or more object tags

Key prefixes

You can apply a lifecycle rule to a subset of objects based on the key name prefix. For example, specifying <Prefix>keypre/</Prefix> applies the rule to objects whose keys begin with keypre/:

<LifecycleConfiguration>
    <Rule>
        <Status>Enabled</Status>
        <Filter>
           <Prefix>keypre/</Prefix>
        </Filter>
    </Rule>
</LifecycleConfiguration>

You can also apply different lifecycle rules to objects with different key prefixes:

<LifecycleConfiguration>
    <Rule>
        <Status>Enabled</Status>
        <Filter>
           <Prefix>keypre/</Prefix>
        </Filter>
    </Rule>
    <Rule>
        <Status>Enabled</Status>
        <Filter>
           <Prefix>mypre/</Prefix>
        </Filter>
    </Rule>
</LifecycleConfiguration>

Object tags

You can apply a lifecycle rule to only objects with a specific tag using the <Key> and <Value> elements:

<LifecycleConfiguration>
    <Rule>
        <Status>Enabled</Status>
        <Filter>
           <Tag>
              <Key>key</Key>
              <Value>value</Value>
           </Tag>
        </Filter>
    </Rule>
</LifecycleConfiguration>

Both prefix and one or more tags

In a lifecycle rule, you can specify a filter based on both the key prefix and one or more tags. They must be wrapped in the <And> element. A filter can have only one prefix, and zero or more tags:

<LifecycleConfiguration>
    <Rule>
        <Status>Enabled</Status>
        <Filter>
          <And>
             <Prefix>key-prefix</Prefix>
             <Tag>
                <Key>key1</Key>
                <Value>value1</Value>
             </Tag>
             <Tag>
                <Key>key2</Key>
                <Value>value2</Value>
             </Tag>
              ...
          </And>
        </Filter>
    </Rule>
</LifecycleConfiguration>
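
The filter forms above can also be generated programmatically. The following sketch builds a lifecycle configuration with the Python standard library. The helper names lifecycle_rule and lifecycle_configuration are hypothetical, every rule is given an Expiration action in days for illustration, and the <And> wrapper is added whenever more than one filter condition is present, which is an assumption that extends the prefix-plus-tags case described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def lifecycle_rule(days: int, prefix: Optional[str] = None,
                   tags: Optional[Dict[str, str]] = None) -> ET.Element:
    """Build one enabled <Rule> that expires matching objects after `days` days."""
    tags = tags or {}
    rule = ET.Element("Rule")
    ET.SubElement(rule, "Status").text = "Enabled"
    flt = ET.SubElement(rule, "Filter")
    # Wrap the conditions in <And> only when there is more than one of them.
    conditions = (1 if prefix is not None else 0) + len(tags)
    parent = ET.SubElement(flt, "And") if conditions > 1 else flt
    if prefix is not None:
        ET.SubElement(parent, "Prefix").text = prefix
    for key, value in tags.items():
        tag = ET.SubElement(parent, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(days)
    return rule


def lifecycle_configuration(*rules: ET.Element) -> str:
    """Serialize the rules into a <LifecycleConfiguration> document."""
    root = ET.Element("LifecycleConfiguration")
    root.extend(rules)
    return ET.tostring(root, encoding="unicode")


print(lifecycle_configuration(
    lifecycle_rule(10, prefix="keypre/"),
    lifecycle_rule(30, prefix="key-prefix", tags={"key1": "value1", "key2": "value2"}),
))
```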

3.3.16. S3 GET bucket lifecycle

To get a bucket lifecycle, use GET and specify a destination bucket.

Syntax

GET /BUCKET?lifecycle HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Request Headers

See the S3 common request headers in Appendix B for more information about common request headers.

Response

The response contains the bucket lifecycle and its elements.

3.3.17. S3 create or replace a bucket lifecycle

To create or replace a bucket lifecycle, use PUT and specify a destination bucket and a lifecycle configuration. The Ceph Object Gateway only supports a subset of the S3 lifecycle functionality.

Syntax

PUT /BUCKET?lifecycle HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET
<LifecycleConfiguration>
  <Rule>
    <Expiration>
      <Days>10</Days>
    </Expiration>
  </Rule>
    ...
  <Rule>
  </Rule>
</LifecycleConfiguration>

Request Headers

content-md5
Description
A Base64-encoded MD5 hash of the message.
Valid Values
String. No defaults or constraints.
Required
No
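
A content-md5 value can be computed with the Python standard library as sketched below. The function name content_md5 and the sample body are illustrative; the hash is taken over the raw MD5 digest bytes, not over the hexadecimal string.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Base64-encoded MD5 digest of a request body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


# Sample body only; in a real request this is the exact byte sequence sent.
lifecycle_body = b"<LifecycleConfiguration></LifecycleConfiguration>"
print("content-md5: " + content_md5(lifecycle_body))
```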

Additional Resources

  • See the S3 common request headers section in Appendix B of the Red Hat Ceph Storage Developer Guide for more information on Amazon S3 common request headers.
  • See the S3 bucket lifecycles section of the Red Hat Ceph Storage Developer Guide for more information on Amazon S3 bucket lifecycles.

3.3.18. S3 delete a bucket lifecycle

To delete a bucket lifecycle, use DELETE and specify a destination bucket.

Syntax

DELETE /BUCKET?lifecycle HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Request Headers

The request does not contain any special elements.

Response

The response returns common response status.

Additional Resources

  • See the S3 common request headers section in Appendix B of the Red Hat Ceph Storage Developer Guide for more information on Amazon S3 common request headers.
  • See the S3 common response status codes section in Appendix C of Red Hat Ceph Storage Developer Guide for more information on Amazon S3 common response status codes.

3.3.19. S3 get bucket location

Retrieves the bucket’s zone group. The user needs to be the bucket owner to call this. A bucket can be constrained to a zone group by providing LocationConstraint during a PUT request.

Add the location subresource to the bucket resource as shown below.

Syntax

GET /BUCKET?location HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Response Entities

LocationConstraint
Description
The zone group where the bucket resides, or an empty string for the default zone group.
Type
String

3.3.20. S3 get bucket versioning

Retrieves the versioning state of a bucket. The user needs to be the bucket owner to call this.

Add the versioning subresource to the bucket resource as shown below.

Syntax

GET /BUCKET?versioning HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

3.3.21. S3 put bucket versioning

This subresource sets the versioning state of an existing bucket. The user needs to be the bucket owner to set the versioning state. If the versioning state has never been set on a bucket, then it has no versioning state, and a GET versioning request does not return a versioning state value.

Setting the bucket versioning state:

  • Enabled: Enables versioning for the objects in the bucket. All objects added to the bucket receive a unique version ID.
  • Suspended: Disables versioning for the objects in the bucket. All objects added to the bucket receive the version ID null.

Syntax

PUT /BUCKET?versioning HTTP/1.1

Example

PUT /testbucket?versioning HTTP/1.1

Bucket Request Entities

VersioningConfiguration
Description
A container for the request.
Type
Container
Status
Description
Sets the versioning state of the bucket. Valid Values: Suspended/Enabled
Type
String
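
The request body for this call can be produced as sketched below. The function name versioning_configuration is hypothetical, and the XML namespace attribute that some clients add to the root element is omitted here for brevity.

```python
import xml.etree.ElementTree as ET

VALID_STATES = ("Enabled", "Suspended")


def versioning_configuration(status: str) -> str:
    """Build the VersioningConfiguration body for PUT /BUCKET?versioning."""
    if status not in VALID_STATES:
        raise ValueError(f"Status must be one of {VALID_STATES}, got {status!r}")
    root = ET.Element("VersioningConfiguration")
    ET.SubElement(root, "Status").text = status
    return ET.tostring(root, encoding="unicode")


print(versioning_configuration("Enabled"))
```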

3.3.22. S3 get bucket access control lists

Retrieves the bucket access control list. The user needs to be the bucket owner or to have been granted READ_ACP permission on the bucket.

Add the acl subresource to the bucket request as shown below.

Syntax

GET /BUCKET?acl HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Response Entities

AccessControlPolicy
Description
A container for the response.
Type
Container
AccessControlList
Description
A container for the ACL information.
Type
Container
Owner
Description
A container for the bucket owner’s ID and DisplayName.
Type
Container
ID
Description
The bucket owner’s ID.
Type
String
DisplayName
Description
The bucket owner’s display name.
Type
String
Grant
Description
A container for Grantee and Permission.
Type
Container
Grantee
Description
A container for the DisplayName and ID of the user receiving a grant of permission.
Type
Container
Permission
Description
The permission given to the Grantee on the bucket.
Type
String

3.3.23. S3 put bucket Access Control Lists

Sets an access control to an existing bucket. The user needs to be the bucket owner or to have been granted WRITE_ACP permission on the bucket.

Add the acl subresource to the bucket request as shown below.

Syntax

PUT /BUCKET?acl HTTP/1.1

Request Entities

AccessControlList
Description
A container for the ACL information.
Type
Container
Owner
Description
A container for the bucket owner’s ID and DisplayName.
Type
Container
ID
Description
The bucket owner’s ID.
Type
String
DisplayName
Description
The bucket owner’s display name.
Type
String
Grant
Description
A container for Grantee and Permission.
Type
Container
Grantee
Description
A container for the DisplayName and ID of the user receiving a grant of permission.
Type
Container
Permission
Description
The permission given to the Grantee on the bucket.
Type
String

3.3.24. S3 get bucket cors

Retrieves the cors configuration information set for the bucket. The user needs to be the bucket owner or to have been granted READ_ACP permission on the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

GET /BUCKET?cors HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

3.3.25. S3 put bucket cors

Sets the cors configuration for the bucket. The user needs to be the bucket owner or to have been granted READ_ACP permission on the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

PUT /BUCKET?cors HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

3.3.26. S3 delete a bucket cors

Deletes the cors configuration information set for the bucket. The user needs to be the bucket owner or to have been granted READ_ACP permission on the bucket.

Add the cors subresource to the bucket request as shown below.

Syntax

DELETE /BUCKET?cors HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

3.3.27. S3 list bucket object versions

Returns a list of metadata about all versions of the objects within a bucket. Requires READ access to the bucket.

Add the versions subresource to the bucket request as shown below.

Syntax

GET /BUCKET?versions HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

You can specify parameters for GET /BUCKET?versions, but none of them are required.

Parameters

prefix
Description
Returns object versions whose keys begin with the specified prefix.
Type
String
delimiter
Description
The delimiter between the prefix and the rest of the object name.
Type
String
key-marker
Description
The beginning marker for the list of object versions.
Type
String
max-keys
Description
The maximum number of keys to return. The default is 1000.
Type
Integer
version-id-marker
Description
Specifies the object version to begin the list.
Type
String

Response Entities

KeyMarker
Description
The key marker specified by the key-marker request parameter, if any.
Type
String
NextKeyMarker
Description
The key marker to use in a subsequent request if IsTruncated is true.
Type
String
NextUploadIdMarker
Description
The upload ID marker to use in a subsequent request if IsTruncated is true.
Type
String
IsTruncated
Description
If true, only a subset of the bucket’s upload contents were returned.
Type
Boolean
Size
Description
The size of the uploaded part.
Type
Integer
DisplayName
Description
The owner’s display name.
Type
String
ID
Description
The owner’s ID.
Type
String
Owner
Description
A container for the ID and DisplayName of the user who owns the object.
Type
Container
StorageClass
Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY
Type
String
Version
Description
Container for the version information.
Type
Container
versionId
Description
Version ID of an object.
Type
String
versionIdMarker
Description
The last version of the key in a truncated response.
Type
String

3.3.28. S3 head bucket

Calls HEAD on a bucket to determine if it exists and if the caller has access permissions. Returns 200 OK if the bucket exists and the caller has permissions, 404 Not Found if the bucket does not exist, and 403 Forbidden if the bucket exists but the caller does not have access permissions.

Syntax

HEAD /BUCKET HTTP/1.1
Host: cname.domain.com
Date: date
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET
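
A client can translate the three documented status codes into an outcome as sketched below. The function name describe_head_bucket and the wording of the returned strings are illustrative; any other status code is reported as unexpected rather than interpreted.

```python
def describe_head_bucket(status_code: int) -> str:
    """Map the status code of HEAD /BUCKET to the outcome described above."""
    outcomes = {
        200: "bucket exists and the caller has access",
        403: "bucket exists but the caller does not have access",
        404: "bucket does not exist",
    }
    return outcomes.get(status_code, f"unexpected status {status_code}")


for code in (200, 403, 404, 500):
    print(code, describe_head_bucket(code))
```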

3.3.29. S3 list multipart uploads

GET /BUCKET?uploads returns a list of the current in-progress multipart uploads, that is, uploads that the application has initiated but that the service has not yet completed.

Syntax

GET /BUCKET?uploads HTTP/1.1

You can specify parameters for GET /BUCKET?uploads, but none of them are required.

Parameters

prefix
Description
Returns in-progress uploads whose keys contain the specified prefix.
Type
String
delimiter
Description
The delimiter between the prefix and the rest of the object name.
Type
String
key-marker
Description
The beginning marker for the list of uploads.
Type
String
max-keys
Description
The maximum number of in-progress uploads. The default is 1000.
Type
Integer
max-uploads
Description
The maximum number of multipart uploads. The range is from 1-1000. The default is 1000.
Type
Integer
upload-id-marker
Description
Ignored if key-marker isn’t specified. Specifies the ID of the first upload to list in lexicographical order at or following the ID.
Type
String

Response Entities

ListMultipartUploadsResult
Description
A container for the results.
Type
Container
ListMultipartUploadsResult.Prefix
Description
The prefix specified by the prefix request parameter, if any.
Type
String
Bucket
Description
The bucket that will receive the bucket contents.
Type
String
KeyMarker
Description
The key marker specified by the key-marker request parameter, if any.
Type
String
UploadIdMarker
Description
The marker specified by the upload-id-marker request parameter, if any.
Type
String
NextKeyMarker
Description
The key marker to use in a subsequent request if IsTruncated is true.
Type
String
NextUploadIdMarker
Description
The upload ID marker to use in a subsequent request if IsTruncated is true.
Type
String
MaxUploads
Description
The max uploads specified by the max-uploads request parameter.
Type
Integer
Delimiter
Description
If set, objects with the same prefix will appear in the CommonPrefixes list.
Type
String
IsTruncated
Description
If true, only a subset of the bucket’s upload contents were returned.
Type
Boolean
Upload
Description
A container for Key, UploadId, Initiator, Owner, StorageClass, and Initiated elements.
Type
Container
Key
Description
The key of the object once the multipart upload is complete.
Type
String
UploadId
Description
The ID that identifies the multipart upload.
Type
String
Initiator
Description
Contains the ID and DisplayName of the user who initiated the upload.
Type
Container
DisplayName
Description
The initiator’s display name.
Type
String
ID
Description
The initiator’s ID.
Type
String
Owner
Description
A container for the ID and DisplayName of the user who owns the uploaded object.
Type
Container
StorageClass
Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY
Type
String
Initiated
Description
The date and time the user initiated the upload.
Type
Date
CommonPrefixes
Description
If multiple objects contain the same prefix, they will appear in this list.
Type
Container
CommonPrefixes.Prefix
Description
The substring of the key after the prefix as defined by the prefix request parameter.
Type
String

3.3.30. S3 bucket policies

The Ceph Object Gateway supports a subset of the Amazon S3 policy language applied to buckets.

Creation and Removal

Ceph Object Gateway manages S3 Bucket policies through standard S3 operations rather than using the radosgw-admin CLI tool.

Administrators may use the s3cmd command to set or delete a policy.

Example

$ cat > examplepol
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::usfolks:user/fred"]},
    "Action": "s3:PutObjectAcl",
    "Resource": [
      "arn:aws:s3:::happybucket/*"
    ]
  }]
}

$ s3cmd setpolicy examplepol s3://happybucket
$ s3cmd delpolicy s3://happybucket
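
The same policy document can be generated in Python, which is convenient when the principal or bucket varies. This is a minimal sketch: the helper name allow_policy is hypothetical, and it produces only the single-statement form used in the example above.

```python
import json


def allow_policy(principal_arn: str, action: str, bucket: str) -> dict:
    """Build a one-statement bucket policy that allows one action on all objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": [principal_arn]},
            "Action": action,
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        }],
    }


policy = allow_policy("arn:aws:iam::usfolks:user/fred", "s3:PutObjectAcl", "happybucket")
# The printed JSON can be saved to a file and passed to s3cmd setpolicy.
print(json.dumps(policy, indent=2))
```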

Limitations

Ceph Object Gateway only supports the following S3 actions:

  • s3:AbortMultipartUpload
  • s3:CreateBucket
  • s3:DeleteBucketPolicy
  • s3:DeleteBucket
  • s3:DeleteBucketWebsite
  • s3:DeleteBucketReplication
  • s3:DeleteReplicationConfiguration
  • s3:DeleteObject
  • s3:DeleteObjectVersion
  • s3:GetBucketAcl
  • s3:GetBucketCORS
  • s3:GetBucketLocation
  • s3:GetBucketPolicy
  • s3:GetBucketRequestPayment
  • s3:GetBucketVersioning
  • s3:GetBucketWebsite
  • s3:GetBucketReplication
  • s3:GetReplicationConfiguration
  • s3:GetLifecycleConfiguration
  • s3:GetObjectAcl
  • s3:GetObject
  • s3:GetObjectTorrent
  • s3:GetObjectVersionAcl
  • s3:GetObjectVersion
  • s3:GetObjectVersionTorrent
  • s3:ListAllMyBuckets
  • s3:ListBucketMultiPartUploads
  • s3:ListBucket
  • s3:ListBucketVersions
  • s3:ListMultipartUploadParts
  • s3:PutBucketAcl
  • s3:PutBucketCORS
  • s3:PutBucketPolicy
  • s3:PutBucketRequestPayment
  • s3:PutBucketVersioning
  • s3:PutBucketWebsite
  • s3:PutBucketReplication
  • s3:PutReplicationConfiguration
  • s3:PutLifecycleConfiguration
  • s3:PutObjectAcl
  • s3:PutObject
  • s3:PutObjectVersionAcl
Note

Ceph Object Gateway does not support setting policies on users, groups, or roles.

The Ceph Object Gateway uses the RGW tenant identifier in place of the Amazon twelve-digit account ID. Ceph Object Gateway administrators who want to use policies between Amazon Web Service (AWS) S3 and Ceph Object Gateway S3 will have to use the Amazon account ID as the tenant ID when creating users.

With AWS S3, all tenants share a single namespace. By contrast, Ceph Object Gateway gives every tenant its own namespace of buckets. At present, Ceph Object Gateway clients trying to access a bucket belonging to another tenant MUST address it as tenant:bucket in the S3 request.

In AWS, a bucket policy can grant access to another account, and that account owner can then grant access to individual users with user permissions. Since Ceph Object Gateway does not yet support user, role, and group permissions, account owners will need to grant access directly to individual users.

Important

Granting an entire account access to a bucket grants access to ALL users in that account.

Bucket policies do NOT support string interpolation.

Ceph Object Gateway supports the following condition keys:

  • aws:CurrentTime
  • aws:EpochTime
  • aws:PrincipalType
  • aws:Referer
  • aws:SecureTransport
  • aws:SourceIp
  • aws:UserAgent
  • aws:username

Ceph Object Gateway ONLY supports the following condition keys for the ListBucket action:

  • s3:prefix
  • s3:delimiter
  • s3:max-keys

Impact on Swift

Ceph Object Gateway provides no functionality to set bucket policies under the Swift API. However, bucket policies that are set with the S3 API govern Swift and S3 operations.

Ceph Object Gateway matches Swift credentials against principals that are specified in a policy.

3.3.31. S3 get the request payment configuration on a bucket

Uses the requestPayment subresource to return the request payment configuration of a bucket. The user needs to be the bucket owner or to have been granted READ_ACP permission on the bucket.

Add the requestPayment subresource to the bucket request as shown below.

Syntax

GET /BUCKET?requestPayment HTTP/1.1
Host: cname.domain.com

Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

3.3.32. S3 set the request payment configuration on a bucket

Uses the requestPayment subresource to set the request payment configuration of a bucket. By default, the bucket owner pays for downloads from the bucket. This configuration parameter enables the bucket owner to specify that the person requesting the download will be charged for the request and the data download from the bucket.

Add the requestPayment subresource to the bucket request as shown below.

Syntax

PUT /BUCKET?requestPayment HTTP/1.1
Host: cname.domain.com

Request Entities

Payer
Description
Specifies who pays for the download and request fees.
Type
Enum
RequestPaymentConfiguration
Description
A container for Payer.
Type
Container

3.3.33. Multi-tenant bucket operations

When a client application accesses buckets, it always operates with the credentials of a particular user. In a Red Hat Ceph Storage cluster, every user belongs to a tenant. Consequently, every bucket operation has an implicit tenant in its context if no tenant is specified explicitly. Thus, multi-tenancy is completely backward compatible with previous releases, as long as the referred buckets and referring user belong to the same tenant.

Extensions employed to specify an explicit tenant differ according to the protocol and authentication system used.

In the following example, a colon character separates tenant and bucket. Thus a sample URL would be:

https://rgw.domain.com/tenant:bucket

By contrast, a simple Python example separates the tenant and bucket in the bucket method itself:

Example

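from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# OrdinaryCallingFormat selects path-style addressing (bucket in the URL path).
c = S3Connection(
    aws_access_key_id="TESTER",
    aws_secret_access_key="test123",
    host="rgw.domain.com",
    calling_format=OrdinaryCallingFormat(),
)
# The tenant and the bucket are separated by a colon in the bucket name.
bucket = c.get_bucket("tenant:bucket")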

Note

It is not possible to use S3-style subdomains with multi-tenancy, since host names cannot contain colons or any other separators that are not already valid in bucket names. Using a period creates an ambiguous syntax. Therefore, the bucket-in-URL-path format has to be used with multi-tenancy.
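
The path-style URL for a bucket in another tenant can be assembled as sketched below. The function name bucket_url is hypothetical; it only illustrates where the tenant:bucket pair is placed and does not perform any request.

```python
from typing import Optional


def bucket_url(endpoint: str, bucket: str, tenant: Optional[str] = None) -> str:
    """Return the path-style URL, prefixing the bucket with its tenant if given."""
    # Without an explicit tenant, the tenant of the requesting user is implied.
    name = f"{tenant}:{bucket}" if tenant else bucket
    return f"https://{endpoint}/{name}"


print(bucket_url("rgw.domain.com", "bucket", tenant="tenant"))
print(bucket_url("rgw.domain.com", "bucket"))
```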

Additional Resources

  • See the Multi Tenancy section under User Management in the Red Hat Ceph Storage Object Gateway Guide for additional details.

3.3.34. S3 Block Public Access

You can use the S3 Block Public Access feature to configure settings on buckets and users that help you manage public access to Red Hat Ceph Storage object storage S3 resources.

Using this feature, you can override bucket policies, access point policies, and object permissions that allow public access. By default, new buckets, access points, and objects do not allow public access.

The S3 API in the Ceph Object Gateway supports a subset of the AWS public access settings:

  • BlockPublicPolicy: This defines the setting to allow users to manage access point and bucket policies. This setting does not allow the users to publicly share the bucket or the objects it contains. Existing access point and bucket policies are not affected by enabling this setting. Setting this option to TRUE causes S3 to:

    • Reject calls to PUT Bucket policy.
    • Reject calls to PUT access point policy for all of the bucket’s same-account access points.
Important

Apply this setting at the user level so that users cannot alter a specific bucket’s block public access setting.

Note

The TRUE setting only works if the specified policy allows public access.

  • RestrictPublicBuckets: This defines the setting to restrict access to a bucket or access point with public policy. The restriction applies to only AWS service principals and authorized users within the bucket owner’s account and access point owner’s account. This blocks cross-account access to the access point or bucket, except for the cases specified, while still allowing users within the account to manage the access points or buckets. Enabling this setting does not affect existing access point or bucket policies. It only defines that Amazon S3 blocks public and cross-account access derived from any public access point or bucket policy, including non-public delegation to specific accounts.
Note

Access control lists (ACLs) are not currently supported by Red Hat Ceph Storage.

Bucket policies are assumed to be public unless defined otherwise. To block public access, a bucket policy must grant access only to fixed values for one or more of the following:

Note

A fixed value does not contain a wildcard (*) or an AWS Identity and Access Management Policy Variable.

  • An AWS principal, user, role, or service principal
  • A set of Classless Inter-Domain Routings (CIDRs), using aws:SourceIp
  • aws:SourceArn
  • aws:SourceVpc
  • aws:SourceVpce
  • aws:SourceOwner
  • aws:SourceAccount
  • s3:x-amz-server-side-encryption-aws-kms-key-id
  • aws:userid, outside the pattern AROLEID:*
  • s3:DataAccessPointArn

    Note

    When used in a bucket policy, this value can contain a wildcard for the access point name without rendering the policy public, as long as the account ID is fixed.

  • s3:DataAccessPointAccount

The following example policy is considered public.

Example

{
    "Principal": "*",
    "Resource": "*",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Condition": {"StringLike": {"aws:SourceVpc": "vpc-*"}}
}

To make a policy non-public, include any of the condition keys with a fixed value.

Example

{
    "Principal": "*",
    "Resource": "*",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Condition": {"StringEquals": {"aws:SourceVpc": "vpc-91237329"}}
}
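
The difference between the two example policies can be sketched as a simplified check. This is only an illustration of the fixed-value rule, not the evaluation logic used by the Ceph Object Gateway: the function names are hypothetical, principals are not inspected, and the special cases for aws:userid and s3:DataAccessPointArn are ignored.

```python
import re

# IAM policy variables are written as ${...}; a wildcard is a literal "*".
_POLICY_VARIABLE = re.compile(r"\$\{[^}]+\}")


def is_fixed_value(value: str) -> bool:
    """Return True if the value has no wildcard and no policy variable."""
    return "*" not in value and _POLICY_VARIABLE.search(value) is None


def has_fixed_condition(statement: dict) -> bool:
    """Return True if at least one condition key is limited to fixed values."""
    for keys in statement.get("Condition", {}).values():
        for values in keys.values():
            if isinstance(values, str):
                values = [values]
            if values and all(is_fixed_value(v) for v in values):
                return True
    return False


public_statement = {
    "Principal": "*",
    "Resource": "*",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Condition": {"StringLike": {"aws:SourceVpc": "vpc-*"}},
}
fixed_statement = {
    "Principal": "*",
    "Resource": "*",
    "Action": "s3:PutObject",
    "Effect": "Allow",
    "Condition": {"StringEquals": {"aws:SourceVpc": "vpc-91237329"}},
}

print(has_fixed_condition(public_statement))  # False: "vpc-*" is a wildcard
print(has_fixed_condition(fixed_statement))   # True: the VPC ID is fixed
```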

3.3.35. S3 GET PublicAccessBlock

To retrieve the configured S3 Block Public Access settings, use GET and specify the target AWS account.

Syntax

GET /v20180820/configuration/publicAccessBlock HTTP/1.1
Host: cname.domain.com
x-amz-account-id: _ACCOUNTID_

Request Headers

See the S3 common request headers in Appendix B for more information about common request headers.

Response

The response is an HTTP 200 response and is returned in XML format.

3.3.36. S3 PUT PublicAccessBlock

Use this to create or modify the PublicAccessBlock configuration for an S3 bucket.

To use this operation, you must have the s3:PutBucketPublicAccessBlock permission.

Important

If the PublicAccessBlock configuration is different between the bucket and the account, Amazon S3 uses the most restrictive combination of the bucket-level and account-level settings.

Syntax

PUT /?publicAccessBlock HTTP/1.1
Host: Bucket.s3.amazonaws.com
Content-MD5: ContentMD5
x-amz-sdk-checksum-algorithm: ChecksumAlgorithm
x-amz-expected-bucket-owner: ExpectedBucketOwner
<?xml version="1.0" encoding="UTF-8"?>
<PublicAccessBlockConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
   <BlockPublicAcls>boolean</BlockPublicAcls>
   <IgnorePublicAcls>boolean</IgnorePublicAcls>
   <BlockPublicPolicy>boolean</BlockPublicPolicy>
   <RestrictPublicBuckets>boolean</RestrictPublicBuckets>
</PublicAccessBlockConfiguration>

Request Headers

See the S3 common request headers in Appendix B for more information about common request headers.

Response

The response is an HTTP 200 response and is returned with an empty HTTP body.
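
As a minimal sketch, the XML body shown in the syntax above can be generated with the Python standard library. The function name is hypothetical, and the sketch emits only the BlockPublicPolicy and RestrictPublicBuckets elements described in this chapter; signing and sending the request are out of scope.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the PUT PublicAccessBlock syntax above.
S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def public_access_block_xml(block_public_policy: bool,
                            restrict_public_buckets: bool) -> bytes:
    """Build a PublicAccessBlockConfiguration request body."""
    root = ET.Element("PublicAccessBlockConfiguration", xmlns=S3_NS)
    settings = (
        ("BlockPublicPolicy", block_public_policy),
        ("RestrictPublicBuckets", restrict_public_buckets),
    )
    for name, enabled in settings:
        # S3 expects lowercase XML booleans.
        ET.SubElement(root, name).text = "true" if enabled else "false"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


print(public_access_block_xml(True, True).decode("utf-8"))
```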

3.3.37. S3 DELETE PublicAccessBlock

Use this to delete the PublicAccessBlock configuration for an S3 bucket.

Syntax

DELETE /v20180820/configuration/publicAccessBlock HTTP/1.1
Host: s3-control.amazonaws.com
x-amz-account-id: AccountId

Request Headers

See the S3 common request headers in Appendix B for more information about common request headers.

Response

The response is an HTTP 200 response and is returned with an empty HTTP body.

3.4. S3 object operations

As a developer, you can perform object operations with the Amazon S3 application programming interface (API) through the Ceph Object Gateway.

The following table lists the Amazon S3 functional operations for objects, along with the function’s support status.

Table 3.3. Object operations
Feature                               Status
Get Object                            Supported
Head Object                           Supported
Put Object Lock                       Supported
Get Object Lock                       Supported
Put Object Legal Hold                 Supported
Get Object Legal Hold                 Supported
Put Object Retention                  Supported
Get Object Retention                  Supported
Put Object Tagging                    Supported
Get Object Tagging                    Supported
Delete Object Tagging                 Supported
Put Object                            Supported
Delete Object                         Supported
Delete Multiple Objects               Supported
Get Object ACLs                       Supported
Put Object ACLs                       Supported
Copy Object                           Supported
Post Object                           Supported
Options Object                        Supported
Initiate Multipart Upload             Supported
Add a Part to a Multipart Upload      Supported
List Parts of a Multipart Upload      Supported
Assemble Multipart Upload             Supported
Copy Multipart Upload                 Supported
Abort Multipart Upload                Supported
Multi-Tenancy                         Supported

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A RESTful client.

3.4.1. S3 get an object from a bucket

Retrieves an object from a bucket:

Syntax

GET /BUCKET/OBJECT HTTP/1.1

Add the versionId subresource to retrieve a particular version of the object:

Syntax

GET /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

Request Headers

partNumber
Description
Part number of the object being read. This enables a ranged GET request for the specified part. Using this request is useful for downloading just a part of an object.
Valid Values
A positive integer between 1 and 10,000.
Required
No
range
Description

The range of the object to retrieve.

Note

Multiple ranges of data per GET request are not supported.

Valid Values
Range:bytes=beginbyte-endbyte
Required
No
if-modified-since
Description
Gets only if modified since the timestamp.
Valid Values
Timestamp
Required
No
if-unmodified-since
Description
Gets only if not modified since the timestamp.
Valid Values
Timestamp
Required
No
if-match
Description
Gets only if object ETag matches ETag.
Valid Values
Entity Tag
Required
No
if-none-match
Description
Gets only if object ETag does not match ETag.
Valid Values
Entity Tag
Required
No

Syntax with request headers

GET /BUCKET/OBJECT?partNumber=PARTNUMBER&versionId=VersionId HTTP/1.1
Host: Bucket.s3.amazonaws.com
If-Match: IfMatch
If-Modified-Since: IfModifiedSince
If-None-Match: IfNoneMatch
If-Unmodified-Since: IfUnmodifiedSince
Range: Range
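
The request line and headers above can be assembled programmatically. The following is a minimal sketch with a hypothetical helper name; it only builds the method, path, and optional headers, and it omits authentication and the actual HTTP call.

```python
from urllib.parse import quote, urlencode


def build_get_object(bucket, key, *, version_id=None, byte_range=None,
                     if_match=None, if_none_match=None):
    """Return (method, path, headers) for a GET Object request."""
    path = "/" + quote(bucket) + "/" + quote(key)
    if version_id is not None:
        path += "?" + urlencode({"versionId": version_id})

    headers = {}
    if byte_range is not None:
        first, last = byte_range
        if first < 0 or last < first:
            raise ValueError("byte_range must satisfy 0 <= first <= last")
        # Only a single range is built, because multiple ranges
        # per GET request are not supported.
        headers["Range"] = f"bytes={first}-{last}"
    if if_match is not None:
        headers["If-Match"] = if_match
    if if_none_match is not None:
        headers["If-None-Match"] = if_none_match
    return "GET", path, headers


method, path, headers = build_get_object(
    "testbucket", "dir/test object", version_id="v1", byte_range=(0, 1023))
print(method, path)  # GET /testbucket/dir/test%20object?versionId=v1
print(headers)       # {'Range': 'bytes=0-1023'}
```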

Response Headers

Content-Range
Description
The data range. Returned only if the Range header field was specified in the request.
x-amz-version-id
Description
Returns the version ID or null.
x-rgw-replicated-from
Description
Returns the source zone and any intermediate zones involved in an object’s replication path within a Ceph multi-zone environment. This header is included in GetObject and HeadObject responses.
x-rgw-replicated-at
Description
Returns a timestamp indicating when the object was replicated to its current location. You can calculate the duration for replication to complete by using this header with Last-Modified header.
Note

Currently, you can verify the x-rgw-replicated-from and x-rgw-replicated-at headers at the replicated zone by using client tools such as s3cmd or curl. You can use these tools in addition to the radosgw-admin command for verification. Because of a known issue (BZ-2312552), the x-rgw-replicated-from header key is missing from the radosgw-admin object stat output.

3.4.2. S3 get object attributes

Use the S3 GetObjectAttributes API to retrieve the metadata of an object without returning the object’s data. The GetObjectAttributes API combines the functionality of HeadObject and ListParts. It returns the information from both calls in a single request, which reduces the number of API calls needed.

Syntax

GET /BUCKET/OBJECT?attributes&versionId=VersionId

Example

GET /testbucket/testobject?attributes&versionId=testversionid
Host: Bucket.s3.amazonaws.com
x-amz-max-parts: MaxParts
x-amz-part-number-marker: PartNumberMarker
x-amz-server-side-encryption-customer-algorithm: SSECustomerAlgorithm
x-amz-server-side-encryption-customer-key: SSECustomerKey
x-amz-server-side-encryption-customer-key-MD5: SSECustomerKeyMD5
x-amz-request-payer: RequestPayer
x-amz-expected-bucket-owner: ExpectedBucketOwner
x-amz-object-attributes: ObjectAttributes

The versionId subresource retrieves a particular version of the object.

3.4.2.1. Request entities

Example

GET /{Key+}?attributes&versionId=VersionId HTTP/1.1
Host: Bucket.s3.amazonaws.com
x-amz-max-parts: MaxParts
x-amz-part-number-marker: PartNumberMarker
x-amz-server-side-encryption-customer-algorithm: SSECustomerAlgorithm
x-amz-server-side-encryption-customer-key: SSECustomerKey
x-amz-server-side-encryption-customer-key-MD5: SSECustomerKeyMD5
x-amz-request-payer: RequestPayer
x-amz-expected-bucket-owner: ExpectedBucketOwner
x-amz-object-attributes: ObjectAttributes

3.4.2.2. Get request headers

Bucket
Description
The name of the bucket that contains the object.
Type
String
Required
Yes
Key
Description
The object key.
Type
String
Required
Yes
versionId
Description
The version ID used to reference a specific version of the object.
Type
String
Required
No
x-amz-max-parts
Description
Sets the maximum number of parts to return.
Type
String
Required
No
x-amz-object-attributes
Description
Specifies the fields at the root level that you want returned in the response. Fields that you do not specify are not returned.
Valid Values
ETag, Checksum, ObjectParts, StorageClass, ObjectSize
Required
Yes
x-amz-part-number-marker
Description
Specifies the part after which listing should begin. Only parts with higher part numbers will be listed.
Type
String
Required
No

3.4.2.3. Response entities

Example

HTTP/1.1 200
x-amz-delete-marker: DeleteMarker
Last-Modified: LastModified
x-amz-version-id: VersionId
x-amz-request-charged: RequestCharged
<?xml version="1.0" encoding="UTF-8"?>
<GetObjectAttributesOutput>
   <ETag>string</ETag>
   <Checksum>
      <ChecksumCRC32>string</ChecksumCRC32>
      <ChecksumCRC32C>string</ChecksumCRC32C>
      <ChecksumSHA1>string</ChecksumSHA1>
      <ChecksumSHA256>string</ChecksumSHA256>
   </Checksum>
   <ObjectParts>
      <IsTruncated>boolean</IsTruncated>
      <MaxParts>integer</MaxParts>
      <NextPartNumberMarker>integer</NextPartNumberMarker>
      <PartNumberMarker>integer</PartNumberMarker>
      <Part>
         <ChecksumCRC32>string</ChecksumCRC32>
         <ChecksumCRC32C>string</ChecksumCRC32C>
         <ChecksumSHA1>string</ChecksumSHA1>
         <ChecksumSHA256>string</ChecksumSHA256>
         <PartNumber>integer</PartNumber>
         <Size>long</Size>
      </Part>
      ...
      <PartsCount>integer</PartsCount>
   </ObjectParts>
   <StorageClass>string</StorageClass>
   <ObjectSize>long</ObjectSize>
</GetObjectAttributesOutput>
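
A response body of this shape can be read with the Python standard library. The sketch below is illustrative only: the function name, the returned dictionary layout, and the sample values are assumptions, and namespaces are stripped so that the same code works whether or not the gateway qualifies the elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample body following the response layout above.
SAMPLE = """<GetObjectAttributesOutput>
   <ETag>example-etag</ETag>
   <ObjectParts>
      <IsTruncated>false</IsTruncated>
      <Part><PartNumber>1</PartNumber><Size>5242880</Size></Part>
      <Part><PartNumber>2</PartNumber><Size>1024</Size></Part>
      <PartsCount>2</PartsCount>
   </ObjectParts>
   <ObjectSize>5243904</ObjectSize>
</GetObjectAttributesOutput>"""


def parse_object_attributes(xml_text: str) -> dict:
    """Extract the ETag, object size, and part list from the response."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop any "{namespace}" prefix so plain tag names can be used.
        element.tag = element.tag.rsplit("}", 1)[-1]

    size = root.findtext("ObjectSize")
    return {
        "etag": root.findtext("ETag"),
        "object_size": int(size) if size is not None else None,
        "is_truncated": root.findtext("ObjectParts/IsTruncated") == "true",
        "parts": [
            {"number": int(part.findtext("PartNumber")),
             "size": int(part.findtext("Size"))}
            for part in root.iter("Part")
        ],
    }


attributes = parse_object_attributes(SAMPLE)
print(attributes["object_size"])                     # 5243904
print(sum(p["size"] for p in attributes["parts"]))   # 5243904
```

Fields that were not requested through x-amz-object-attributes are absent from the response, so the sketch returns None for a missing ObjectSize instead of failing.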

3.4.2.4. Get response headers

Name

Description

Last-Modified

The creation date of the object.

x-amz-delete-marker

Specifies whether the object retrieved was (true) or was not (false) a delete marker. If false, this response header does not appear in the response.

x-amz-request-charged

If present, indicates that the requester was successfully charged for the request.

x-amz-version-id

The version ID of the object.

GetObjectAttributesOutput

Root level tag for the GetObjectAttributesOutput parameters.

Checksum

The checksum or digest of the object.

ChecksumCRC32 (string)

The base64-encoded, 32-bit CRC-32 checksum of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ChecksumCRC32C (string)

The base64-encoded, 32-bit CRC-32C checksum of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ChecksumSHA1 (string)

The base64-encoded, 160-bit SHA-1 digest of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ChecksumSHA256 (string)

The base64-encoded, 256-bit SHA-256 digest of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ObjectParts

A collection of parts associated with a multipart upload.

TotalPartsCount (integer)

The total number of parts.

PartNumberMarker (integer)

The marker for the current part.

NextPartNumberMarker (integer)

When a list is truncated, this element specifies the last part in the list, as well as the value to use for the PartNumberMarker request parameter in a subsequent request.

MaxParts (integer)

The maximum number of parts allowed in the response.

IsTruncated (boolean)

Indicates whether the returned list of parts is truncated. A value of true indicates that the list was truncated. A list can be truncated if the number of parts exceeds the limit returned in the MaxParts element.

Parts (list)

A container for elements related to a particular part. A response can contain zero or more Parts elements.

Note

General purpose buckets: For GetObjectAttributes, if an additional checksum (including x-amz-checksum-crc32, x-amz-checksum-crc32c, x-amz-checksum-sha1, or x-amz-checksum-sha256) is not applied to the object specified in the request, the response does not return Part.

Directory buckets: For GetObjectAttributes, regardless of whether an additional checksum is applied to the object specified in the request, the response returns Part.

(structure)

A container for elements related to an individual part.

PartNumber (integer)

The part number identifying the part. This value is a positive integer between 1 and 10,000.

Size (long)

The size of the uploaded part in bytes.

ChecksumCRC32 (string)

This header can be used as a data integrity check to verify that the data received is the same data that was originally sent. This header specifies the base64-encoded, 32-bit CRC-32 checksum of the object. For more information, see Checking object integrity in the Amazon S3 User Guide.

ChecksumCRC32C (string)

The base64-encoded, 32-bit CRC-32C checksum of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ChecksumSHA1 (string)

The base64-encoded, 160-bit SHA-1 digest of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ChecksumSHA256 (string)

The base64-encoded, 256-bit SHA-256 digest of the object. This will only be present if it was uploaded with the object. When you use an API operation on an object that was uploaded using multipart uploads, this value may not be a direct checksum value of the full object. Instead, it is a calculation based on the checksum values of each individual part. For more information about how checksums are calculated with multipart uploads, see Checking object integrity in the Amazon S3 User Guide.

ObjectSize

The size of the object in bytes.

StorageClass

Provides the storage class information of the object. Amazon S3 returns this header for all objects except for S3 Standard storage class objects.

3.4.3. Retrieve sync replication Headers of object

Returns information about an object. This request returns the same header information as the Get Object request, but includes only the metadata and not the object data payload.

Retrieves the current version of the object:

Syntax

HEAD /BUCKET/OBJECT HTTP/1.1

Add the versionId subresource to retrieve info for a particular version:

Syntax

HEAD /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

Request Headers

range
Description
The range of the object to retrieve.
Valid Values
Range:bytes=beginbyte-endbyte
Required
No
if-modified-since
Description
Gets only if modified since the timestamp.
Valid Values
Timestamp
Required
No
if-match
Description
Gets only if object ETag matches ETag.
Valid Values
Entity Tag
Required
No
if-none-match
Description
Gets only if object ETag does not match ETag.
Valid Values
Entity Tag
Required
No

Response Headers

x-amz-version-id
Description
Returns the version ID or null.
x-rgw-replicated-from
Description
Returns the source zone and any intermediate zones involved in an object’s replication path within a Ceph multi-zone environment. This header is included in GetObject and HeadObject responses.
x-rgw-replicated-at
Description
Returns a timestamp indicating when the object was replicated to its current location. You can calculate the duration for replication to complete by using this header with Last-Modified header.
Note

Currently, you can verify the x-rgw-replicated-from and x-rgw-replicated-at headers at the replicated zone by using client tools such as s3cmd or curl. You can use these tools in addition to the radosgw-admin command for verification. Because of a known issue (BZ-2312552), the x-rgw-replicated-from header key is missing from the radosgw-admin object stat output.

3.4.4. S3 put object lock

The put object lock API places a lock configuration on the selected bucket. With object lock, you can store objects using a Write-Once-Read-Many (WORM) model. Object lock ensures an object is not deleted or overwritten, for a fixed amount of time or indefinitely. The rule specified in the object lock configuration is applied by default to every new object placed in the selected bucket.

Important

Enable object lock when you create the bucket; otherwise, the operation fails.

Syntax

PUT /BUCKET?object-lock HTTP/1.1

Example

PUT /testbucket?object-lock HTTP/1.1

Request Entities

ObjectLockConfiguration
Description
A container for the request.
Type
Container
Required
Yes
ObjectLockEnabled
Description
Indicates whether this bucket has an object lock configuration enabled.
Type
String
Required
Yes
Rule
Description
The object lock rule in place for the specified bucket.
Type
Container
Required
No
DefaultRetention
Description
The default retention period applied to new objects placed in the specified bucket.
Type
Container
Required
No
Mode
Description
The default object lock retention mode. Valid values: GOVERNANCE/COMPLIANCE.
Type
String
Required
Yes
Days
Description
The number of days specified for the default retention period.
Type
Integer
Required
No
Years
Description
The number of years specified for the default retention period.
Type
Integer
Required
No
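
The nesting of these request entities can be sketched as follows. The function name is hypothetical. Two details are assumptions based on the Amazon S3 API rather than on the table above: the value Enabled for ObjectLockEnabled, and the rule that a default retention period uses either Days or Years but not both.

```python
import xml.etree.ElementTree as ET


def object_lock_configuration_xml(mode: str, *, days=None, years=None) -> str:
    """Build an ObjectLockConfiguration body with a default retention rule."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    # Assumption from the Amazon S3 API: exactly one period unit is given.
    if (days is None) == (years is None):
        raise ValueError("specify exactly one of days or years")

    root = ET.Element("ObjectLockConfiguration")
    ET.SubElement(root, "ObjectLockEnabled").text = "Enabled"
    rule = ET.SubElement(root, "Rule")
    retention = ET.SubElement(rule, "DefaultRetention")
    ET.SubElement(retention, "Mode").text = mode
    if days is not None:
        ET.SubElement(retention, "Days").text = str(days)
    else:
        ET.SubElement(retention, "Years").text = str(years)
    return ET.tostring(root, encoding="unicode")


print(object_lock_configuration_xml("GOVERNANCE", days=30))
```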

HTTP Response

400
Status Code
MalformedXML
Description
The XML is not well-formed.
409
Status Code
InvalidBucketState
Description
The bucket object lock is not enabled.

Additional Resources

  • For more information about this API call, see S3 API.

3.4.5. S3 get object lock

The get object lock API retrieves the lock configuration for a bucket.

Syntax

GET /BUCKET?object-lock HTTP/1.1

Example

GET /testbucket?object-lock HTTP/1.1

Response Entities

ObjectLockConfiguration
Description
A container for the response.
Type
Container
Required
Yes
ObjectLockEnabled
Description
Indicates whether this bucket has an object lock configuration enabled.
Type
String
Required
Yes
Rule
Description
The object lock rule in place for the specified bucket.
Type
Container
Required
No
DefaultRetention
Description
The default retention period applied to new objects placed in the specified bucket.
Type
Container
Required
No
Mode
Description
The default object lock retention mode. Valid values: GOVERNANCE/COMPLIANCE.
Type
String
Required
Yes
Days
Description
The number of days specified for the default retention period.
Type
Integer
Required
No
Years
Description
The number of years specified for the default retention period.
Type
Integer
Required
No

Additional Resources

  • For more information about this API call, see S3 API.

3.4.8. S3 put object retention

The put object retention API places an object retention configuration on an object. A retention period protects an object version for a fixed amount of time. There are two modes: GOVERNANCE and COMPLIANCE. These two retention modes apply different levels of protection to your objects.

Note

During this period, your object is Write-Once-Read-Many-protected (WORM-protected) and cannot be overwritten or deleted.

Syntax

PUT /BUCKET/OBJECT?retention&versionId= HTTP/1.1

Example

PUT /testbucket/testobject?retention&versionId= HTTP/1.1

The versionId subresource specifies the particular version of the object to which the retention configuration applies.

Request Entities

Retention
Description
A container for the request.
Type
Container
Required
Yes
Mode
Description
Retention mode for the specified object. Valid values: GOVERNANCE, COMPLIANCE.
Type
String
Required
Yes
RetainUntilDate
Description
Retention date.
Format
2020-01-05T00:00:00.000Z
Type
Timestamp
Required
Yes
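
A minimal sketch of a Retention body, including the timestamp format shown above, might look like this. The function name is hypothetical, and the sketch assumes the date is supplied as a timezone-aware datetime that is converted to UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def retention_xml(mode: str, retain_until: datetime) -> str:
    """Build a Retention request body for put object retention."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if retain_until.tzinfo is None:
        raise ValueError("retain_until must be timezone-aware")

    utc = retain_until.astimezone(timezone.utc)
    # Millisecond precision with a trailing Z, as in 2020-01-05T00:00:00.000Z.
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"

    root = ET.Element("Retention")
    ET.SubElement(root, "Mode").text = mode
    ET.SubElement(root, "RetainUntilDate").text = stamp
    return ET.tostring(root, encoding="unicode")


print(retention_xml("GOVERNANCE", datetime(2020, 1, 5, tzinfo=timezone.utc)))
```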

Additional Resources

  • For more information about this API call, see S3 API.

3.4.9. S3 get object retention

The get object retention API retrieves an object retention configuration on an object.

Syntax

GET /BUCKET/OBJECT?retention&versionId= HTTP/1.1

Example

GET /testbucket/testobject?retention&versionId= HTTP/1.1

The versionId subresource retrieves a particular version of the object.

Response Entities

Retention
Description
A container for the response.
Type
Container
Required
Yes
Mode
Description
Retention mode for the specified object. Valid values: GOVERNANCE/COMPLIANCE
Type
String
Required
Yes
RetainUntilDate
Description
Retention date. Format: 2020-01-05T00:00:00.000Z
Type
Timestamp
Required
Yes

Additional Resources

  • For more information about this API call, see S3 API.

3.4.10. S3 put object tagging

The put object tagging API associates tags with an object. A tag is a key-value pair. By default, tags are applied to the current version of the object. To tag any other version, use the versionId query parameter. You must have permission to perform the s3:PutObjectTagging action. By default, the bucket owner has this permission and can grant this permission to others.

Syntax

PUT /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

PUT /testbucket/testobject?tagging&versionId= HTTP/1.1

Request Entities

Tagging
Description
A container for the request.
Type
Container
Required
Yes
TagSet
Description
A collection of tags.
Type
String
Required
Yes
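
The request body can be sketched as follows. The Tag, Key, and Value child elements follow the Amazon S3 API and are not listed in the table above; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def tagging_xml(tags: dict) -> str:
    """Build a Tagging request body from a mapping of tag keys to values."""
    root = ET.Element("Tagging")
    tag_set = ET.SubElement(root, "TagSet")
    for key, value in tags.items():
        tag = ET.SubElement(tag_set, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    return ET.tostring(root, encoding="unicode")


print(tagging_xml({"project": "alpha"}))
# <Tagging><TagSet><Tag><Key>project</Key><Value>alpha</Value></Tag></TagSet></Tagging>
```

Building the body with an XML library rather than string formatting means that characters such as & and < in tag values are escaped automatically.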

Additional Resources

  • For more information about this API call, see S3 API.

3.4.11. S3 get object tagging

The get object tagging API returns the tag set of an object. By default, the GET operation returns information on the current version of an object.

Note

For a versioned bucket, you can have multiple versions of an object in your bucket. To retrieve tags of any other version, add the versionId query parameter in the request.

Syntax

GET /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

GET /testbucket/testobject?tagging&versionId= HTTP/1.1

Additional Resources

  • For more information about this API call, see S3 API.

3.4.12. S3 delete object tagging

The delete object tagging API removes the entire tag set from the specified object. To use this operation, you must have permission to perform the s3:DeleteObjectTagging action.

Note

To delete tags of a specific object version, add the versionId query parameter in the request.

Syntax

DELETE /BUCKET/OBJECT?tagging&versionId= HTTP/1.1

Example

DELETE /testbucket/testobject?tagging&versionId= HTTP/1.1

Additional Resources

  • For more information about this API call, see S3 API.

3.4.13. S3 add an object to a bucket

Adds an object to a bucket. You must have write permissions on the bucket to perform this operation.

Syntax

PUT /BUCKET/OBJECT HTTP/1.1

Request Headers

content-md5
Description
A base64-encoded MD5 hash of the message.
Valid Values
A string. No defaults or constraints.
Required
No
content-type
Description
A standard MIME type.
Valid Values
Any MIME type. Default: binary/octet-stream.
Required
No
x-amz-meta-<…​>
Description
User metadata. Stored with the object.
Valid Values
A string up to 8 KB. No defaults.
Required
No
x-amz-acl
Description
A canned ACL.
Valid Values
private, public-read, public-read-write, authenticated-read
Required
No

Response Headers

x-amz-version-id
Description
Returns the version ID or null.
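
The request headers above can be prepared as in the following sketch. The helper name is hypothetical, and the 8 KB limit is interpreted here as a limit on each metadata value, which is an assumption. Authentication headers are not included.

```python
import base64
import hashlib

CANNED_ACLS = {"private", "public-read", "public-read-write", "authenticated-read"}
MAX_METADATA_VALUE_BYTES = 8 * 1024  # assumed to apply per metadata value


def put_object_headers(body: bytes, *, content_type="binary/octet-stream",
                       metadata=None, acl=None) -> dict:
    """Build optional request headers for adding an object to a bucket."""
    headers = {
        # Content-MD5 is the base64 encoding of the raw MD5 digest,
        # not of its hexadecimal representation.
        "Content-MD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        "Content-Type": content_type,
    }
    for name, value in (metadata or {}).items():
        if len(value.encode("utf-8")) > MAX_METADATA_VALUE_BYTES:
            raise ValueError(f"metadata value for {name!r} exceeds 8 KB")
        headers[f"x-amz-meta-{name.lower()}"] = value
    if acl is not None:
        if acl not in CANNED_ACLS:
            raise ValueError(f"unknown canned ACL: {acl}")
        headers["x-amz-acl"] = acl
    return headers


print(put_object_headers(b"", metadata={"Owner": "alice"}, acl="private"))
```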

3.4.14. S3 delete an object

Removes an object. Requires WRITE permission on the containing bucket. If object versioning is on, the operation creates a delete marker instead of removing the object.

Syntax

DELETE /BUCKET/OBJECT HTTP/1.1

To delete an object when versioning is on, you must specify the versionId subresource and the version of the object to delete.

DELETE /BUCKET/OBJECT?versionId=VERSION_ID HTTP/1.1

3.4.15. S3 delete multiple objects

This API call deletes multiple objects from a bucket.

Syntax

POST /BUCKET?delete HTTP/1.1
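
The request carries an XML body that lists the keys to delete. This section does not describe the body, so the following sketch relies on the element names from the Amazon S3 API (Delete, Object, Key, and the optional Quiet flag); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def delete_objects_xml(keys, quiet: bool = False) -> str:
    """Build the request body for deleting multiple objects."""
    root = ET.Element("Delete")
    if quiet:
        # In quiet mode, the response reports only the keys that failed.
        ET.SubElement(root, "Quiet").text = "true"
    for key in keys:
        obj = ET.SubElement(root, "Object")
        ET.SubElement(obj, "Key").text = key
    return ET.tostring(root, encoding="unicode")


print(delete_objects_xml(["logs/a.txt", "logs/b.txt"]))
```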

3.4.16. S3 get an object’s Access Control List (ACL)

Returns the ACL for the current version of the object:

Syntax

GET /BUCKET/OBJECT?acl HTTP/1.1

Add the versionId subresource to retrieve the ACL for a particular version:

Syntax

GET /BUCKET/OBJECT?versionId=VERSION_ID&acl HTTP/1.1

Response Headers

x-amz-version-id
Description
Returns the version ID or null.

Response Entities

AccessControlPolicy
Description
A container for the response.
Type
Container
AccessControlList
Description
A container for the ACL information.
Type
Container
Owner
Description
A container for the bucket owner’s ID and DisplayName.
Type
Container
ID
Description
The bucket owner’s ID.
Type
String
DisplayName
Description
The bucket owner’s display name.
Type
String
Grant
Description
A container for Grantee and Permission.
Type
Container
Grantee
Description
A container for the DisplayName and ID of the user receiving a grant of permission.
Type
Container
Permission
Description
The permission given to the Grantee bucket.
Type
String

3.4.17. S3 set an object’s Access Control List (ACL)

Sets an object ACL for the current version of the object.

Syntax

PUT /BUCKET/OBJECT?acl

Request Entities

AccessControlPolicy
Description
A container for the request.
Type
Container
AccessControlList
Description
A container for the ACL information.
Type
Container
Owner
Description
A container for the bucket owner’s ID and DisplayName.
Type
Container
ID
Description
The bucket owner’s ID.
Type
String
DisplayName
Description
The bucket owner’s display name.
Type
String
Grant
Description
A container for Grantee and Permission.
Type
Container
Grantee
Description
A container for the DisplayName and ID of the user receiving a grant of permission.
Type
Container
Permission
Description
The permission given to the Grantee bucket.
Type
String

3.4.18. S3 copy an object

To copy an object, use PUT and specify a destination bucket and the object name.

Syntax

PUT /DEST_BUCKET/DEST_OBJECT HTTP/1.1
x-amz-copy-source: SOURCE_BUCKET/SOURCE_OBJECT

Request Headers

x-amz-copy-source
Description
The source bucket name + object name.
Valid Values
BUCKET/OBJECT
Required
Yes
x-amz-acl
Description
A canned ACL.
Valid Values
private, public-read, public-read-write, authenticated-read
Required
No
x-amz-copy-if-modified-since
Description
Copies only if modified since the timestamp.
Valid Values
Timestamp
Required
No
x-amz-copy-if-unmodified-since
Description
Copies only if unmodified since the timestamp.
Valid Values
Timestamp
Required
No
x-amz-copy-if-match
Description
Copies only if object ETag matches ETag.
Valid Values
Entity Tag
Required
No
x-amz-copy-if-none-match
Description
Copies only if object ETag does not match ETag.
Valid Values
Entity Tag
Required
No

Response Entities

CopyObjectResult
Description
A container for the response elements.
Type
Container
LastModified
Description
The last modified date of the source object.
Type
Date
ETag
Description
The ETag of the new object.
Type
String

3.4.19. S3 add an object to a bucket using HTML forms

Adds an object to a bucket using HTML forms. You must have write permissions on the bucket to perform this operation.

Syntax

POST /BUCKET/OBJECT HTTP/1.1

3.4.20. S3 determine options for a request

A preflight request to determine if an actual request can be sent with the specific origin, HTTP method, and headers.

Syntax

OPTIONS /OBJECT HTTP/1.1

3.4.21. S3 initiate a multipart upload

Initiates a multi-part upload process. Returns an UploadId, which you can specify when adding additional parts, listing parts, and completing or abandoning a multi-part upload.

Syntax

POST /BUCKET/OBJECT?uploads

Request Headers

content-md5
Description
A base64-encoded MD5 hash of the message.
Valid Values
A string. No defaults or constraints.
Required
No
content-type
Description
A standard MIME type.
Valid Values
Any MIME type. Default: binary/octet-stream
Required
No
x-amz-meta-<…​>
Description
User metadata. Stored with the object.
Valid Values
A string up to 8 KB. No defaults.
Required
No
x-amz-acl
Description
A canned ACL.
Valid Values
private, public-read, public-read-write, authenticated-read
Required
No

Response Entities

InitiatedMultipartUploadsResult
Description
A container for the results.
Type
Container
Bucket
Description
The bucket that will receive the object contents.
Type
String
Key
Description
The key specified by the key request parameter, if any.
Type
String
UploadId
Description
The ID specified by the upload-id request parameter identifying the multipart upload, if any.
Type
String

3.4.22. S3 add a part to a multipart upload

Adds a part to a multi-part upload.

Specify the uploadId subresource and the upload ID to add a part to a multi-part upload:

Syntax

PUT /BUCKET/OBJECT?partNumber=&uploadId=UPLOAD_ID HTTP/1.1

The following HTTP response might be returned:

HTTP Response

404
Status Code
NoSuchUpload
Description
Specified upload-id does not match any initiated upload on this object.

3.4.23. S3 list the parts of a multipart upload

Specify the uploadId subresource and the upload ID to list the parts of a multi-part upload:

Syntax

GET /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

Response Entities

InitiatedMultipartUploadsResult
Description
A container for the results.
Type
Container
Bucket
Description
The bucket that will receive the object contents.
Type
String
Key
Description
The key specified by the key request parameter, if any.
Type
String
UploadId
Description
The ID specified by the upload-id request parameter identifying the multipart upload, if any.
Type
String
Initiator
Description
Contains the ID and DisplayName of the user who initiated the upload.
Type
Container
ID
Description
The initiator’s ID.
Type
String
DisplayName
Description
The initiator’s display name.
Type
String
Owner
Description
A container for the ID and DisplayName of the user who owns the uploaded object.
Type
Container
StorageClass
Description
The method used to store the resulting object. STANDARD or REDUCED_REDUNDANCY
Type
String
PartNumberMarker
Description
The part marker to use in a subsequent request if IsTruncated is true. Precedes the list.
Type
String
NextPartNumberMarker
Description
The next part marker to use in a subsequent request if IsTruncated is true. The end of the list.
Type
String
IsTruncated
Description
If true, only a subset of the object’s upload contents were returned.
Type
Boolean
Part
Description
A container for the PartNumber, ETag, and Size elements.
Type
Container
PartNumber
Description
The identification number of the part.
Type
Integer
ETag
Description
The part’s entity tag.
Type
String
Size
Description
The size of the uploaded part.
Type
Integer
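
The IsTruncated and NextPartNumberMarker elements described above drive pagination of the part list. The following is a minimal sketch of that loop, not a definitive implementation. The `fetch_page` callable and the dictionary shape it returns (`Parts`, `IsTruncated`, `NextPartNumberMarker`) are assumptions made for illustration; in practice the callable would wrap whichever client performs the list-parts request.

```python
def iter_parts(fetch_page):
    """Yield every part of a multipart upload, following truncated listings.

    fetch_page is a hypothetical callable: it takes a part-number marker
    (None for the first page) and returns a dict with the keys
    'Parts', 'IsTruncated', and 'NextPartNumberMarker'.
    """
    marker = None
    while True:
        page = fetch_page(marker)
        yield from page["Parts"]
        if not page["IsTruncated"]:
            break
        # Continue the listing after the last part that was returned.
        marker = page["NextPartNumberMarker"]


# Usage with an in-memory stand-in for the gateway (illustrative data only).
_pages = {
    None: {"Parts": [{"PartNumber": 1}, {"PartNumber": 2}],
           "IsTruncated": True, "NextPartNumberMarker": 2},
    2: {"Parts": [{"PartNumber": 3}],
        "IsTruncated": False, "NextPartNumberMarker": 3},
}
part_numbers = [p["PartNumber"] for p in iter_parts(lambda m: _pages[m])]
```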

3.4.24. S3 assemble the uploaded parts

Assembles uploaded parts and creates a new object, thereby completing a multipart upload.

Specify the uploadId subresource and the upload ID to complete a multi-part upload:

Syntax

POST /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

Request Entities

CompleteMultipartUpload
Description
A container consisting of one or more parts.
Type
Container
Required
Yes
Part
Description
A container for the PartNumber and ETag.
Type
Container
Required
Yes
PartNumber
Description
The identifier of the part.
Type
Integer
Required
Yes
ETag
Description
The part’s entity tag.
Type
String
Required
Yes

Response Entities

CompleteMultipartUploadResult
Description
A container for the response.
Type
Container
Location
Description
The resource identifier (path) of the new object.
Type
URI
Bucket
Description
The name of the bucket that contains the new object.
Type
String
Key
Description
The object’s key.
Type
String
ETag
Description
The entity tag of the new object.
Type
String
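
The request entities above map onto a small XML document. As a hedged sketch, the body could be assembled with the Python standard library as shown below. The helper name `build_complete_multipart_body` is hypothetical, and the XML namespace attribute that some clients send is omitted here.

```python
import xml.etree.ElementTree as ET


def build_complete_multipart_body(parts):
    """Build a CompleteMultipartUpload XML body.

    parts is an iterable of (part_number, etag) tuples, using the ETag
    values returned when each part was uploaded.
    """
    root = ET.Element("CompleteMultipartUpload")
    # Parts are listed in ascending part-number order.
    for part_number, etag in sorted(parts):
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(part_number)
        ET.SubElement(part, "ETag").text = etag
    return ET.tostring(root, encoding="unicode")


body = build_complete_multipart_body([(2, "etag-of-part-2"), (1, "etag-of-part-1")])
```

The resulting string is what would be sent as the body of the POST request shown in the syntax above.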

3.4.25. S3 copy a multipart upload

Uploads a part by copying data from an existing object as data source.

Specify the uploadId subresource and the upload ID to perform a multi-part upload copy:

Syntax

PUT /BUCKET/OBJECT?partNumber=PartNumber&uploadId=UPLOAD_ID HTTP/1.1
Host: cname.domain.com
Authorization: AWS ACCESS_KEY:HASH_OF_HEADER_AND_SECRET

Request Headers

x-amz-copy-source
Description
The source bucket name and object name.
Valid Values
BUCKET/OBJECT
Required
Yes
x-amz-copy-source-range
Description
The range of bytes to copy from the source object.
Valid Values
Range: bytes=first-last, where first and last are the zero-based byte offsets to copy. For example, bytes=0-9 indicates that you want to copy the first ten bytes of the source.
Required
No

Response Entities

CopyPartResult
Description
A container for all response elements.
Type
Container
ETag
Description
Returns the ETag of the new part.
Type
String
LastModified
Description
Returns the date the part was last modified.
Type
String
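
When a large source object is copied part by part, each request carries its own x-amz-copy-source-range value. The following sketch shows one way to compute those values using the zero-based, inclusive offsets described above. The function name and the choice of a fixed part size are assumptions for illustration.

```python
def copy_source_ranges(object_size, part_size):
    """Return the x-amz-copy-source-range values that cover an object.

    Offsets are zero-based and inclusive, so the first ten bytes are
    'bytes=0-9'.
    """
    if object_size <= 0 or part_size <= 0:
        raise ValueError("object_size and part_size must be positive")
    ranges = []
    first = 0
    while first < object_size:
        # The last part may be shorter than part_size.
        last = min(first + part_size, object_size) - 1
        ranges.append(f"bytes={first}-{last}")
        first = last + 1
    return ranges


ranges = copy_source_ranges(25, 10)
```

Each returned string corresponds to one PUT request with an increasing partNumber.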

3.4.26. S3 abort a multipart upload

Aborts a multipart upload.

Specify the uploadId subresource and the upload ID to abort a multi-part upload:

Syntax

DELETE /BUCKET/OBJECT?uploadId=UPLOAD_ID HTTP/1.1

3.4.27. S3 Hadoop interoperability

For data analytics applications that require Hadoop Distributed File System (HDFS) access, the Ceph Object Gateway can be accessed using the Apache S3A connector for Hadoop. The S3A connector is an open-source tool that presents S3 compatible object storage as an HDFS file system with HDFS file system read and write semantics to the applications while data is stored in the Ceph Object Gateway.

Ceph Object Gateway is fully compatible with the S3A connector that ships with Hadoop 2.7.3.

3.5. S3 select operations

As a developer, you can run S3 select to accelerate throughput. Users can run S3 select queries directly without a mediator.

S3 select has three workflows, CSV, Apache Parquet (Parquet), and JSON, which provide S3 select operations on CSV, Parquet, and JSON objects:

  • A CSV file stores tabular data in plain text format. Each line of the file is a data record.
  • Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides highly efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet enables the S3 select-engine to skip columns and chunks, thereby reducing IOPS dramatically (contrary to CSV and JSON format).
  • JSON is a structured text format. The S3 select engine enables the use of SQL statements on top of JSON input data by using the JSON reader, which makes it possible to scan highly nested and complex JSON data.

For example, a CSV, Parquet, or JSON S3 object with several gigabytes of data allows the user to extract a single column which is filtered by another column using the following query:

Example

select customerid from s3Object where age>30 and age<65;

Currently, the object data must be retrieved from the Ceph OSD through the Ceph Object Gateway before it is filtered and extracted. The performance gain is greatest when the object is large and the query is more specific. The Parquet format can be processed more efficiently than CSV.
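
To make the semantics of the example query concrete, the following sketch applies the same filter and projection to CSV text on the client side. It is only an illustration of what the query returns, not how the S3 select engine is implemented, and the column names `customerid` and `age` are taken from the query above while the sample rows are invented.

```python
import csv
import io


def select_customerid(csv_text):
    """Local equivalent of:
    select customerid from s3Object where age>30 and age<65;
    The first CSV line is treated as the header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["customerid"] for row in reader if 30 < int(row["age"]) < 65]


sample = "customerid,age\nc1,25\nc2,40\nc3,64\nc4,65\n"
matches = select_customerid(sample)
```

With S3 select, this filtering happens before the result is returned to the client, so only the matching column values are transferred.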

Prerequisites

  • A running Red Hat Ceph Storage cluster.
  • A RESTful client.
  • An S3 user created with user access.

3.5.1. S3 select content from an object

The select object content API filters the content of an object through the structured query language (SQL). See the Metadata collected by inventory section in the AWS Systems Manager User Guide for an example of the description of what should reside in the inventory object. The inventory content impacts the type of queries that should be run against that inventory. The number of SQL statements that potentially could provide essential information is large, but S3 select is an SQL-like utility and therefore, some operators are not supported, such as group-by and join.

For CSV only, you must specify the data serialization format as comma-separated values of the object to retrieve the specified content. Parquet has no delimiter because it is in binary format. Amazon Web Services (AWS) command-line interface (CLI) select object content uses the CSV or Parquet format to parse object data into records and returns only the records specified in the query.

You must specify the data serialization format for the response. You must have s3:GetObject permission for this operation.

Note
  • The InputSerialization element describes the format of the data in the object that is being queried. Objects can be in CSV or Parquet format.
  • The OutputSerialization element is part of the AWS CLI client and describes how the output data is formatted. Ceph implements the server side of this AWS CLI operation and therefore returns output according to OutputSerialization, which is currently CSV only.
  • The format of the InputSerialization does not need to match the format of the OutputSerialization. So, for example, you can specify Parquet in the InputSerialization and CSV in the OutputSerialization.

Syntax

POST /BUCKET/KEY?select&select-type=2 HTTP/1.1\r\n

Example

POST /testbucket/sample1csv?select&select-type=2 HTTP/1.1\r\n
POST /testbucket/sample1parquet?select&select-type=2 HTTP/1.1\r\n

Request entities

Bucket
Description
The bucket to select object content from.
Type
String
Required
Yes
Key
Description
The object key.
Length Constraints
Minimum length of 1.
Type
String
Required
Yes
SelectObjectContentRequest
Description
Root level tag for the select object content request parameters.
Type
String
Required
Yes
Expression
Description
The expression that is used to query the object.
Type
String
Required
Yes
ExpressionType
Description
The type of the provided expression, for example, SQL.
Type
String
Valid Values
SQL
Required
Yes
InputSerialization
Description
Describes the format of the data in the object that is being queried.
Type
String
Required
Yes
OutputSerialization
Description
Describes the format of the returned data, for example, comma-separated fields and newline-delimited records.
Type
String
Required
Yes
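
The request entities above are sent as an XML body. The following is a rough sketch of how such a body could be built for a CSV object with the Python standard library. The helper name `build_select_request` is hypothetical, only a subset of the CSV options is set, and the XML namespace attribute is omitted.

```python
import xml.etree.ElementTree as ET


def build_select_request(expression, file_header_info="USE"):
    """Build a SelectObjectContentRequest body for CSV input and CSV output."""
    root = ET.Element("SelectObjectContentRequest")
    ET.SubElement(root, "Expression").text = expression
    ET.SubElement(root, "ExpressionType").text = "SQL"

    input_ser = ET.SubElement(root, "InputSerialization")
    ET.SubElement(input_ser, "CompressionType").text = "NONE"
    csv_in = ET.SubElement(input_ser, "CSV")
    ET.SubElement(csv_in, "FileHeaderInfo").text = file_header_info

    output_ser = ET.SubElement(root, "OutputSerialization")
    ET.SubElement(output_ser, "CSV")
    # ElementTree escapes characters such as '<' in the expression text.
    return ET.tostring(root, encoding="unicode")


request_body = build_select_request("select count(0) from s3object where int(_1)<10;")
```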

Response entities

If the action is successful, the service sends back an HTTP 200 response. Data is returned in XML format by the service:

Payload
Description
Root level tag for the payload parameters.
Type
String
Required
Yes
Records
Description
The records event.
Type
Base64-encoded binary data object
Required
No
Stats
Description
The stats event.
Type
Long
Required
No

The Ceph Object Gateway supports the following response:

Example

{:event-type,records} {:content-type,application/octet-stream} {:message-type,event}

Syntax (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket BUCKET_NAME \
 --expression-type 'SQL' \
 --input-serialization \
 '{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" , "QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key OBJECT_NAME.csv \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket testbucket \
 --expression-type 'SQL' \
 --input-serialization \
 '{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" , "QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key testobject.csv \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Syntax (for Parquet)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket BUCKET_NAME \
 --expression-type 'SQL' \
 --input-serialization \
 '{"Parquet": {}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key OBJECT_NAME.parquet \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for Parquet)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket testbucket \
 --expression-type 'SQL' \
 --input-serialization \
 '{"Parquet": {}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key testobject.parquet \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Syntax (for JSON)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket BUCKET_NAME \
 --expression-type 'SQL' \
 --input-serialization \
 '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key OBJECT_NAME.json \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for JSON)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket testbucket \
 --expression-type 'SQL' \
 --input-serialization \
 '{"JSON": {"Type": "DOCUMENT"}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key testobject.json \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for BOTO3)

import pprint
import boto3
from botocore.exceptions import ClientError

# Placeholder connection settings; replace them with the values for your cluster.
endpoint = "http://localhost:80"
access_key = "ACCESS_KEY"
secret_key = "SECRET_KEY"
region_name = "REGION_NAME"

def run_s3select(bucket, key, query, column_delim=",", row_delim="\n", quot_char='"', esc_char='\\', csv_header_info="NONE", progress=False):

    s3 = boto3.client('s3',
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        region_name=region_name,
        aws_secret_access_key=secret_key)

    result = ""
    try:
        r = s3.select_object_content(
            Bucket=bucket,
            Key=key,
            ExpressionType='SQL',
            InputSerialization={"CSV": {"RecordDelimiter": row_delim, "FieldDelimiter": column_delim, "QuoteEscapeCharacter": esc_char, "QuoteCharacter": quot_char, "FileHeaderInfo": csv_header_info}, "CompressionType": "NONE"},
            OutputSerialization={"CSV": {}},
            Expression=query,
            RequestProgress={"Enabled": progress})

    except ClientError as c:
        # Return the error text instead of raising, as in the original example.
        result += str(c)
        return result

    for event in r['Payload']:
        if 'Records' in event:
            # Append each chunk; a large result can arrive as several Records events.
            result += event['Records']['Payload'].decode('utf-8')
        if 'Progress' in event:
            print("progress")
            pprint.pprint(event['Progress'], width=1)
        if 'Stats' in event:
            print("Stats")
            pprint.pprint(event['Stats'], width=1)
        if 'End' in event:
            print("End")
            pprint.pprint(event['End'], width=1)

    return result


result = run_s3select(
    "my_bucket",
    "my_csv_object",
    "select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;")
print(result)

Supported features

Currently, only part of the AWS s3 select command is supported:

| Features | Details | Description | Example |
|---|---|---|---|
| Arithmetic operators | ^ * % / + - ( ) | | select (int(_1)+int(_2))*int(_9) from s3object; |
| Arithmetic operators | % modulo | | select count(*) from s3object where cast(_1 as int)%2 = 0; |
| Arithmetic operators | ^ power-of | | select cast(2^10 as int) from s3object; |
| Compare operators | > < >= <= == != | | select _1,_2 from s3object where (int(_1)+int(_3))>int(_5); |
| logical operator | AND OR NOT | | select count(*) from s3object where not (int(_1)>123 and int(_5)<200); |
| logical operator | is null | Returns true/false for null indication in expression | |
| logical operator and NULL | is not null | Returns true/false for null indication in expression | |
| logical operator and NULL | unknown state | Review null-handle and observe the results of logical operations with NULL. The query returns 0. | select count(*) from s3object where null and (3>2); |
| Arithmetic operator with NULL | unknown state | Review null-handle and observe the results of binary operations with NULL. The query returns 0. | select count(*) from s3object where (null+1) and (3>2); |
| Compare with NULL | unknown state | Review null-handle and observe the results of compare operations with NULL. The query returns 0. | select count(*) from s3object where (null*1.5) != 3; |
| missing column | unknown state | | select count(*) from s3object where _1 is null; |
| projection column | Similar to if or then or else | | select case when (1+1==(2+1)*3) then 'case_1' when ((4*3)==(12)) then 'case_2' else 'case_else' end, age*2 from s3object; |
| projection column | Similar to switch/case default | | select case cast(_1 as int) + 1 when 2 then "a" when 3 then "b" else "c" end from s3object; |
| logical operator | | coalesce returns the first non-null argument | select coalesce(nullif(5,5),nullif(1,1.0),age+12) from s3object; |
| logical operator | | nullif returns null if both arguments are equal, or else the first one: nullif(1,1)=NULL, nullif(null,1)=NULL, nullif(2,1)=2 | select nullif(cast(_1 as int),cast(_2 as int)) from s3object; |
| logical operator | | {expression} in ( .. {expression} ..) | select count(*) from s3object where 'ben' in (trim(_5),substring(_1,char_length(_1)-3,3),last_name); |
| logical operator | | {expression} between {expression} and {expression} | select _1 from s3object where cast(_1 as int) between 800 and 900; select count(*) from stdin where substring(_3,char_length(_3),1) between "x" and trim(_1) and substring(_3,char_length(_3)-1,1) = ":"; |
| logical operator | | {expression} like {match-pattern} | select count(*) from s3object where first_name like '%de_'; select count(*) from s3object where _1 like "%a[r-s]"; |
| casting operator | | | select cast(123 as int)%2 from s3object; |
| casting operator | | | select cast(123.456 as float)%2 from s3object; |
| casting operator | | | select cast('ABC0-9' as string),cast(substr('ab12cd',3,2) as int)*4 from s3object; |
| casting operator | | | select cast(substring('publish on 2007-01-01',12,10) as timestamp) from s3object; |
| non AWS casting operator | | | select int(_1),int( 1.2 + 3.4) from s3object; |
| non AWS casting operator | | | select float(1.2) from s3object; |
| non AWS casting operator | | | select to_timestamp('1999-10-10T12:23:44Z') from s3object; |
| Aggregation Function | sum | | select sum(int(_1)) from s3object; |
| Aggregation Function | avg | | select avg(cast(_1 as float) + cast(_2 as int)) from s3object; |
| Aggregation Function | min | | select min(cast(_1 as float) + cast(_2 as int)) from s3object; |
| Aggregation Function | max | | select max(float(_1)),min(int(_5)) from s3object; |
| Aggregation Function | count | | select count(*) from s3object where (int(_1)+int(_3))>int(_5); |
| Timestamp Functions | extract | | select count(*) from s3object where extract(year from to_timestamp(_2)) > 1950 and extract(year from to_timestamp(_1)) < 1960; |
| Timestamp Functions | date_add | | select count(0) from s3object where date_diff(year,to_timestamp(_1),date_add(day,366,to_timestamp(_1))) = 1; |
| Timestamp Functions | date_diff | | select count(0) from s3object where date_diff(month,to_timestamp(_1),to_timestamp(_2)) = 2; |
| Timestamp Functions | utcnow | | select count(0) from s3object where date_diff(hour,utcnow(),date_add(day,1,utcnow())) = 24; |
| Timestamp Functions | to_string | | select to_string( to_timestamp("2009-09-17T17:56:06.234567Z"), "yyyyMMdd-H:m:s") from s3object; |
| String Functions | substring | | select count(0) from s3object where int(substring(_1,1,4))>1950 and int(substring(_1,1,4))<1960; |
| String Functions | substring | A substring with a negative from value is valid; the value is treated as the first position | select substring("123456789" from -4) from s3object; |
| String Functions | substring | A substring with from zero and an out-of-bound for value is valid, just as (first,last) | select substring("123456789" from 0 for 100) from s3object; |
| String Functions | trim | | select trim(' foobar ') from s3object; |
| String Functions | trim | | select trim(trailing from ' foobar ') from s3object; |
| String Functions | trim | | select trim(leading from ' foobar ') from s3object; |
| String Functions | trim | | select trim(both '12' from '1112211foobar22211122') from s3object; |
| String Functions | lower or upper | | select lower('ABcD12#$e') from s3object; |
| String Functions | char_length, character_length | | select count(*) from s3object where char_length(_3)=3; |
| Complex queries | | | select sum(cast(_1 as int)),max(cast(_3 as int)), substring('abcdefghijklm', (2-1)*3+sum(cast(_1 as int))/sum(cast(_1 as int))+1, (count() + count(0))/count(0)) from s3object; |
| alias support | | | select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300; |

3.5.2. S3 supported select functions

S3 select supports the following functions:

Timestamp

to_timestamp(string)
Description
Converts a string to the timestamp basic type. In the string format, any missing 'time' value is populated with zero; for a missing month or day value, 1 is the default value. 'Timezone' is in the format +/-HH:mm or Z, where the letter 'Z' indicates Coordinated Universal Time (UTC). The timezone value can range between -12:00 and +14:00.
Supported

Currently it can convert the following string formats into timestamp:

  • YYYY-MM-DDTHH:mm:ss.SSSSSS+/-HH:mm
  • YYYY-MM-DDTHH:mm:ss.SSSSSSZ
  • YYYY-MM-DDTHH:mm:ss+/-HH:mm
  • YYYY-MM-DDTHH:mm:ssZ
  • YYYY-MM-DDTHH:mm+/-HH:mm
  • YYYY-MM-DDTHH:mmZ
  • YYYY-MM-DDT
  • YYYYT
to_string(timestamp, format_pattern)
Description
Returns a string representation of the input timestamp in the given input string format.
Parameters
| Format | Example | Description |
|---|---|---|
| yy | 69 | 2-digit year. |
| y | 1969 | 4-digit year. |
| yyyy | 1969 | Zero-padded 4-digit year. |
| M | 1 | Month of the year. |
| MM | 01 | Zero-padded month of the year. |
| MMM | Jan | Abbreviated month of the year name. |
| MMMM | January | Full month of the year name. |
| MMMMM | J | Month of the year first letter. Not valid for use with the to_timestamp function. |
| d | 2 | Day of the month (1-31). |
| dd | 02 | Zero-padded day of the month (01-31). |
| a | AM | AM or PM of day. |
| h | 3 | Hour of the day (1-12). |
| hh | 03 | Zero-padded hour of the day (01-12). |
| H | 3 | Hour of the day (0-23). |
| HH | 03 | Zero-padded hour of the day (00-23). |
| m | 4 | Minute of the hour (0-59). |
| mm | 04 | Zero-padded minute of the hour (00-59). |
| s | 5 | Second of the minute (0-59). |
| ss | 05 | Zero-padded second of the minute (00-59). |
| S | 1 | Fraction of the second (precision: 0.1, range: 0.0-0.9). |
| SS | 12 | Fraction of the second (precision: 0.01, range: 0.0-0.99). |
| SSS | 123 | Fraction of the second (precision: 0.001, range: 0.0-0.999). |
| SSSS | 1234 | Fraction of the second (precision: 0.0001, range: 0.0-0.9999). |
| SSSSSS | 123456 | Fraction of the second (maximum precision: 1 nanosecond, range: 0.0-0.999999). |
| n | 60000000 | Nano of second. |
| X | +07 or Z | Offset in hours or "Z" if the offset is 0. |
| XX or XXXX | +0700 or Z | Offset in hours and minutes or "Z" if the offset is 0. |
| XXX or XXXXX | +07:00 or Z | Offset in hours and minutes or "Z" if the offset is 0. |
| x | 7 | Offset in hours. |
| xx or xxxx | 700 | Offset in hours and minutes. |
| xxx or xxxxx | +07:00 | Offset in hours and minutes. |

extract(date-part from timestamp)
Description
Returns an integer: the date-part extracted from the input timestamp.
Supported
year, month, week, day, hour, minute, second, timezone_hour, timezone_minute.
date_add(date-part ,integer,timestamp)
Description
Returns timestamp, a calculation based on the results of input timestamp and date-part.
Supported
year, month, day, hour, minute, second.
date_diff(date-part,timestamp,timestamp)
Description
Returns an integer, the calculated difference between two timestamps according to date-part.
Supported
year, month, day, hour, minute, second.
utcnow()
Description
Returns the timestamp of the current time.
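
To illustrate how the to_string format patterns expand, the following sketch implements a small subset of them (yyyy, MM, dd, HH, H, mm, m, ss, s) in Python. It is an approximation for illustration only, not the S3 select engine's implementation, and the Python function name `to_string` is simply borrowed from the SQL function.

```python
import re
from datetime import datetime

# Subset of the format patterns from the table above.
_TOKENS = {
    "yyyy": lambda t: f"{t.year:04d}",
    "MM": lambda t: f"{t.month:02d}",
    "dd": lambda t: f"{t.day:02d}",
    "HH": lambda t: f"{t.hour:02d}",
    "H": lambda t: str(t.hour),
    "mm": lambda t: f"{t.minute:02d}",
    "m": lambda t: str(t.minute),
    "ss": lambda t: f"{t.second:02d}",
    "s": lambda t: str(t.second),
}
# Longer patterns are listed first so that 'MM' is not read as two 'M's.
_PATTERN = re.compile("|".join(sorted(_TOKENS, key=len, reverse=True)))


def to_string(timestamp, format_pattern):
    """Format a datetime using the subset of patterns defined in _TOKENS."""
    return _PATTERN.sub(lambda m: _TOKENS[m.group(0)](timestamp), format_pattern)


formatted = to_string(datetime(2009, 9, 17, 17, 56, 6, 234567), "yyyyMMdd-H:m:s")
```

In this sketch, characters that are not format patterns, such as the hyphen and colons, pass through unchanged.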

Aggregation

count()
Description
Returns an integer: the number of rows that match the condition, if there is one.
sum(expression)
Description
Returns the sum of the expression over each row that matches the condition, if there is one.
avg(expression)
Description
Returns the average of the expression over each row that matches the condition, if there is one.
max(expression)
Description
Returns the maximal result of the expression over all rows that match the condition, if there is one.
min(expression)
Description
Returns the minimal result of the expression over all rows that match the condition, if there is one.

String

substring(string,from,for)
Description
Returns a string extracted from the input string according to the from and for inputs.
char_length
Description
Returns the number of characters in the string. character_length does the same.
trim([[leading | trailing | both remove_chars] from] string )
Description
Trims leading/trailing (or both) characters from the target string. The default value is a blank character.
upper/lower
Description
Converts characters into uppercase or lowercase.

NULL

A NULL value is missing or unknown; that is, NULL cannot produce a value in any arithmetic operation. The same applies to arithmetic comparison: any comparison to NULL yields NULL, that is, unknown.

Table 3.4. The NULL use case
| A is NULL | Result (NULL=UNKNOWN) |
|---|---|
| Not A | NULL |
| A or False | NULL |
| A or True | True |
| A or A | NULL |
| A and False | False |
| A and True | NULL |
| A and A | NULL |
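
The table above follows standard three-valued logic. As a small sketch, the same rules can be written in Python with `None` standing in for NULL. The function names `not3`, `and3`, and `or3` are hypothetical and chosen only for this illustration.

```python
def not3(a):
    """NOT with NULL (None) propagation."""
    return None if a is None else (not a)


def and3(a, b):
    """AND: False dominates, otherwise NULL makes the result unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """OR: True dominates, otherwise NULL makes the result unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


A = None  # A is NULL
results = {
    "Not A": not3(A),
    "A or False": or3(A, False),
    "A or True": or3(A, True),
    "A and False": and3(A, False),
    "A and True": and3(A, True),
}
```

A where clause keeps a row only when its condition is true, which is why the NULL examples in the supported features table return 0.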

3.5.3. S3 alias programming construct

The alias programming construct is an essential part of the S3 select language because it enables better programming with objects that contain many columns or with complex queries. When a statement with an alias construct is parsed, the alias is replaced with a reference to the corresponding projection column and, on query execution, the reference is evaluated like any other expression. An alias maintains a result cache: if an alias is used more than once, the expression is not evaluated again, and the cached result is returned. Currently, Red Hat supports the column alias.

Example

select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;

3.5.4. S3 parsing explained

The S3 select engine has parsers for all three file formats: CSV, Parquet, and JSON. The parsers separate the commands into more processable components, which are then attached to tags that define each component.

3.5.4.1. S3 CSV parsing

The CSV definitions with input serialization use these default values:

  • Use {\n} for row-delimiter.
  • Use {"} for quote.
  • Use {\} for escape characters.

The csv-header-info is parsed when USE is specified in the AWS CLI; the header is the first row in the input object and contains the schema. Currently, output serialization and compression-type are not supported. The S3 select engine has a CSV parser which parses S3 objects:

  • Each row ends with a row-delimiter.
  • The field-separator separates the adjacent columns.
  • The successive field separator defines the NULL column.
  • The quote-character overrides the field-separator; that is, the field separator is any character between the quotes.
  • The escape character disables any special character except the row delimiter.

The following are examples of CSV parsing rules:

Table 3.5. CSV parsing
| Feature | Description | Input (Tokens) |
|---|---|---|
| NULL | Successive field delimiter | ,,1,,2, ==> {null}{null}{1}{null}{2}{null} |
| QUOTE | The quote character overrides the field delimiter. | 11,22,"a,b,c,d",last ==> {11}{22}{"a,b,c,d"}{last} |
| Escape | The escape character overrides the meta-character. | |
| row delimiter | There is no closed quote; row delimiter is the closing line. | 11,22,a="str,44,55,66 ==> {11}{22}{a="str,44,55,66} |
| csv header info | FileHeaderInfo tag | USE value means each token on the first line is the column-name; IGNORE value means to skip the first line. |
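
The parsing rules above can be approximated with a short tokenizer. This is a simplified sketch for a single row, assuming the default quote and escape characters; it is not the engine's actual parser, and the function name `tokenize_row` is hypothetical. Empty fields are returned as `None` to stand for NULL.

```python
def tokenize_row(row, field_sep=",", quote='"', escape="\\"):
    """Split one CSV row into tokens following the rules described above."""
    tokens, current, in_quote = [], [], False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            # The escape character disables the special meaning of the next character.
            current.append(row[i + 1])
            i += 2
            continue
        if ch == quote:
            # Quotes are kept in the token and toggle the quoted state.
            in_quote = not in_quote
            current.append(ch)
        elif ch == field_sep and not in_quote:
            # Successive separators produce an empty token, reported as NULL.
            tokens.append("".join(current) or None)
            current = []
        else:
            current.append(ch)
        i += 1
    # The row delimiter closes the last token, even inside an unclosed quote.
    tokens.append("".join(current) or None)
    return tokens


null_tokens = tokenize_row(",,1,,2,")
quoted_tokens = tokenize_row('11,22,"a,b,c,d",last')
unclosed_tokens = tokenize_row('11,22,a="str,44,55,66')
```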

3.5.4.2. S3 Parquet parsing

Apache Parquet is an open-source, columnar data file format designed for efficient data storage and retrieval.

The S3 select engine’s Parquet parser parses S3-objects as follows:

Example

4-byte magic number "PAR1"
<Column 1 Chunk 1 + Column Metadata>
<Column 2 Chunk 1 + Column Metadata>
...
<Column N Chunk 1 + Column Metadata>
<Column 1 Chunk 2 + Column Metadata>
<Column 2 Chunk 2 + Column Metadata>
...
<Column N Chunk 2 + Column Metadata>
...
<Column 1 Chunk M + Column Metadata>
<Column 2 Chunk M + Column Metadata>
...
<Column N Chunk M + Column Metadata>
File Metadata
4-byte length in bytes of file metadata
4-byte magic number "PAR1"

  • In the above example, there are N columns in this table, split into M row groups. The file metadata contains the locations of all the column metadata start locations.
  • Metadata is written after the data to allow for single pass writing.
  • All the column chunks can be found in the file metadata which should later be read sequentially.
  • The format is explicitly designed to separate the metadata from the data. This allows splitting columns into multiple files, as well as having a single metadata file reference multiple parquet files.
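
Because the metadata is written after the data, a reader starts from the end of the file. The following sketch locates the file metadata from the trailing length field and magic number shown in the layout above. It assumes the length is stored as a little-endian 32-bit integer, as in the Apache Parquet format specification, and it does not decode the metadata itself. The function name is hypothetical.

```python
import struct

MAGIC = b"PAR1"


def locate_file_metadata(data):
    """Return (offset, length) of the file metadata inside Parquet file bytes."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: magic number missing")
    # The 4 bytes before the trailing magic hold the metadata length.
    (meta_len,) = struct.unpack("<I", data[-8:-4])
    meta_start = len(data) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("metadata length exceeds file size")
    return meta_start, meta_len


# Synthetic bytes that only mimic the layout; not a real Parquet file.
fake = MAGIC + b"x" * 10 + b"META!" + struct.pack("<I", 5) + MAGIC
offset, length = locate_file_metadata(fake)
```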

3.5.4.3. S3 JSON parsing

A JSON document allows values to be nested within objects or arrays without limitation. When querying a specific value in a JSON document in the S3 select engine, the location of the value is specified through a path in the SELECT statement.

The generic structure of a JSON document does not have a row and column structure like CSV and Parquet. Instead, it is the SQL statement itself that defines the rows and columns when querying a JSON document.

The S3 select engine’s JSON parser parses S3-objects as follows:

  • The FROM clause in the SELECT statement defines the row boundaries.
  • A row in a JSON document is similar to how the row delimiter is used to define rows for CSV objects, and how row groups are used to define rows for Parquet objects.
  • Consider the following example:

    Example

    {
        "firstName": "Joe",
        "lastName": "Jackson",
        "gender": "male",
        "age": "twenty"
    },
    
    {
        "firstName": "Joe_2",
        "lastName": "Jackson_2",
        "gender": "male",
        "age": 21
    },
    
    "phoneNumbers":
    [
        { "type": "home1", "number": "734928_1","addr": 11 },
        { "type": "home2", "number": "734928_2","addr": 22 }
    ],
    
    "key_after_array": "XXX",
    
    "description" :
    {
        "main_desc" : "value_1",
        "second_desc" : "value_2"
    }
    
    # the from-clause defines a single row.
    # _1 points to root object level.
    # _1.age appears twice in the Document-row, the last value is used for the operation.
    query = "select _1.firstname,_1.key_after_array,_1.age+4,_1.description.main_desc,_1.description.second_desc from s3object[*].aa.bb.cc;";
    
    expected_result = Joe_2,XXX,25,value_1,value_2

    • The statement instructs the reader to search for the path aa.bb.cc and defines the row boundaries based on the occurrence of this path.
    • A row begins when the reader encounters the path, and it ends when the reader exits the innermost part of the path, which in this case is the object cc.
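
The rule that the last value is used when a key appears twice in a row has a close analogue in Python's standard `json` module, which also keeps the last value for a repeated key. The following sketch uses that behaviour to illustrate how `_1.age+4` can evaluate to 25 in the example above. It demonstrates Python's parser only and is not a description of the S3 select JSON reader; the flattened document is an assumption made for this illustration.

```python
import json

# A single object in which "firstName" and "age" each appear twice.
document = (
    '{"firstName": "Joe", "age": "twenty", '
    '"firstName": "Joe_2", "age": 21, '
    '"key_after_array": "XXX", '
    '"description": {"main_desc": "value_1", "second_desc": "value_2"}}'
)

row = json.loads(document)  # For repeated keys, the last value wins.

projection = [
    row["firstName"],
    row["key_after_array"],
    row["age"] + 4,
    row["description"]["main_desc"],
    row["description"]["second_desc"],
]
```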

3.5.5. Integrating Ceph Object Gateway with Trino

Integrate the Ceph Object Gateway with Trino, an important utility that enables the user to run SQL queries 9x faster on S3 objects.

Following are some benefits of using Trino:

  • Trino is a complete SQL engine.
  • Pushes down S3 select requests wherein the Trino engine identifies part of the SQL statement that is cost effective to run on the server-side.
  • Uses the optimization rules of Ceph/S3select to enhance performance.
  • Leverages Red Hat Ceph Storage scalability and divides the original object into multiple equal parts, performs S3 select requests, and merges the request.
Important

If the s3select syntax does not work while querying through Trino, use the SQL syntax.

Prerequisites

  • A running Red Hat Ceph Storage cluster with Ceph Object Gateway installed.
  • Docker or Podman installed.
  • Buckets created.
  • Objects are uploaded.

Procedure

  1. Deploy Trino and Hive.

    Example

    [cephuser@host01 ~]$ git clone https://github.com/ceph/s3select.git
    [cephuser@host01 ~]$ cd s3select

  2. Modify the hms_trino.yaml file with S3 endpoint, access key, and secret key.

    Example

    [cephuser@host01 s3select]$ cat container/trino/hms_trino.yaml
    version: '3'
    services:
      hms:
        image: galsl/hms:dev
        container_name: hms
        environment:
          # S3_ENDPOINT the CEPH/RGW end-point-url
          - S3_ENDPOINT=http://rgw_ip:port
          - S3_ACCESS_KEY=abc
          - S3_SECRET_KEY=abc
        # the container starts with booting the hive metastore
        command: sh -c '. ~/.bashrc; start_hive_metastore'
        ports:
          - 9083:9083
        networks:
          - trino_hms
    
      trino:
        image: trinodb/trino:405
        container_name: trino
        volumes:
          # the trino directory contains the necessary configuration
          - ./trino:/etc/trino
        ports:
          - 8080:8080
        networks:
          - trino_hms
    
    networks:
      trino_hm

  3. Modify the hive.properties file with S3 endpoint, access key, and secret key.

    Example

    [cephuser@host01 s3select]$ cat container/trino/trino/catalog/hive.properties
    connector.name=hive
    hive.metastore.uri=thrift://hms:9083
    
    #hive.metastore.warehouse.dir=s3a://hive/
    
    hive.allow-drop-table=true
    hive.allow-rename-table=true
    hive.allow-add-column=true
    hive.allow-drop-column=true
    hive.allow-rename-column=true
    
    hive.non-managed-table-writes-enabled=true
    hive.s3select-pushdown.enabled=true
    hive.s3.aws-access-key=abc
    hive.s3.aws-secret-key=abc
    
    # should modify per s3-endpoint-url
    hive.s3.endpoint=http://rgw_ip:port
    #hive.s3.max-connections=1
    #hive.s3select-pushdown.max-connections=1
    
    hive.s3.connect-timeout=100s
    hive.s3.socket-timeout=100s
    hive.max-splits-per-second=10000
    hive.max-split-size=128MB

  4. Start a Trino container to integrate Ceph Object Gateway.

    Example

    [cephuser@host01 s3select]$ sudo docker compose -f ./container/trino/hms_trino.yaml up -d

  5. Verify integration.

    Example

    [cephuser@host01 s3select]$ sudo docker exec -it trino /bin/bash
    trino@66f753905e82:/$ trino
    trino> create schema hive.csvbkt1schema;
    trino> create table hive.csvbkt1schema.polariondatacsv(c1 varchar,c2 varchar, c3 varchar, c4 varchar, c5 varchar, c6 varchar, c7 varchar, c8 varchar, c9 varchar) WITH ( external_location = 's3a://csvbkt1/',format = 'CSV');
    trino> select * from hive.csvbkt1schema.polariondatacsv;

    Note

    The external location must point to the bucket name or a directory, and not the end of a file.


© 2024 Red Hat, Inc.