Red Hat Ceph Storage 6 Developer Guide

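3.5. S3 select operations (Technology Preview)

Important

S3 select operations are a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs), might not be functionally complete, and Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process. See the support scope for Red Hat Technology Preview features for more details.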

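As a developer, you can run S3 select to accelerate throughput: the filtering is done by the Ceph Object Gateway, so only the requested data is returned to the client. Users can run S3 select queries directly, without a mediator.

There are two S3 select workflows, one for CSV and one for Apache Parquet (Parquet), that provide S3 select operations on CSV and Parquet objects.

CSV files store tabular data in plain-text format. Each line of the file is a data record.
Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides highly efficient data compression and encoding schemes, with enhanced performance for handling complex data in bulk.

For example, for a CSV or Parquet S3 object with several gigabytes of data, the user can extract a single column that is filtered by another column with the following query:

Example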

select customerid from s3Object where age>30 and age<65;
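To make the semantics of this query concrete, the following pure-Python sketch applies the same projection and filter to a small CSV object held in memory. It only models what the query returns, not how the Ceph Object Gateway evaluates it; the sample rows and the function name select_customerids are hypothetical.

```python
import csv
import io

# Hypothetical CSV object contents; the first row is the header (FileHeaderInfo "USE").
SAMPLE_OBJECT = """customerid,age
c1,25
c2,31
c3,64
c4,65
"""


def select_customerids(csv_text: str) -> list[str]:
    """Model of: select customerid from s3Object where age>30 and age<65;"""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = []
    for row in reader:
        age = int(row["age"])
        # Filter on one column (age), then project another column (customerid).
        if 30 < age < 65:
            result.append(row["customerid"])
    return result


if __name__ == "__main__":
    print(select_customerids(SAMPLE_OBJECT))  # ['c2', 'c3']
```

Only c2 and c3 satisfy both bounds, because both comparisons in the query are strict.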

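Currently, the Ceph Object Gateway must retrieve the S3 object's data from the Ceph OSDs before it can filter and extract the data. Performance improves most when the object is large and the query is specific. The Parquet format can be processed more efficiently than CSV.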

Prerequisites

A running Red Hat Ceph Storage cluster.
A RESTful client.
An S3 user created with user access.

3.5.1. S3 select content from an object

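The select object content API filters the contents of an object through the structured query language (SQL). For an example of the kind of description that an inventory object should contain, see the Metadata collected by inventory section of the AWS Systems Manager User Guide. The inventory content affects the type of queries that should be run against that inventory. Although many SQL statements can provide essential information, S3 select is an SQL-like utility, so some operators, such as group-by and join, are not supported.

For CSV only, you must specify the data serialization format of the object, comma-separated values, to retrieve the specified content. Parquet has no delimiter because it is a binary format. The Amazon Web Services (AWS) command-line interface (CLI) select object content command uses the CSV or Parquet format to parse object data into records and returns only the records specified in the query.

You must specify the data serialization format for the response. You must have the s3:GetObject permission for this operation.

Note

The InputSerialization element describes the format of the data in the object that is being queried. Objects can be in CSV or Parquet format.
The OutputSerialization element is part of the AWS-CLI user client and describes how the output data is formatted. Ceph implements the server side for the AWS-CLI client and therefore provides the output according to OutputSerialization, which is currently CSV only.
The format of InputSerialization does not need to match the format of OutputSerialization. For example, you can specify Parquet in InputSerialization and CSV in OutputSerialization.

Syntax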

POST /BUCKET/KEY?select&select-type=2 HTTP/1.1\r\n

Example

POST /testbucket/sample1csv?select&select-type=2 HTTP/1.1\r\n
POST /testbucket/sample1parquet?select&select-type=2 HTTP/1.1\r\n
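The request target in these examples follows a fixed pattern. As a minimal sketch, assuming path-style addressing (the bucket name in the path, as above) and the usual percent-encoding of object keys, it can be built as follows. The helper name select_request_target is hypothetical, and a real request also needs an authentication signature, which is not shown.

```python
from urllib.parse import quote


def select_request_target(bucket: str, key: str) -> str:
    """Build the path-style request target for an S3 select POST request.

    The object key is percent-encoded; "/" is left as-is so that keys with
    prefixes such as "logs/2023.csv" keep their path structure.
    """
    return f"/{bucket}/{quote(key, safe='/')}?select&select-type=2"


if __name__ == "__main__":
    # Prints: POST /testbucket/sample1csv?select&select-type=2 HTTP/1.1
    print("POST " + select_request_target("testbucket", "sample1csv") + " HTTP/1.1")
```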

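Request entities

Bucket

Description: The bucket to select object content from.
Type: String
Required: Yes

Key

Description: The object key.
Length constraints: Minimum length of 1.
Type: String
Required: Yes

SelectObjectContentRequest

Description: Root level tag for the select object content request parameters.
Type: String
Required: Yes

Expression

Description: The expression that is used to query the object.
Type: String
Required: Yes

ExpressionType

Description: The type of the provided expression, for example SQL.
Type: String
Valid values: SQL
Required: Yes

InputSerialization

Description: Describes the format of the data in the object that is being queried.
Type: String
Required: Yes

OutputSerialization

Description: The format of the data that is returned, using a comma separator and a newline.
Type: String
Required: Yes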
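The request entities above are sent as an XML request body. The following sketch builds such a body with the Python standard library. It assumes the element layout and namespace of the AWS SelectObjectContent API, which the Ceph Object Gateway emulates; the function name build_select_request is hypothetical, and request signing is omitted.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def build_select_request(expression: str, input_format: str = "CSV") -> bytes:
    """Build a SelectObjectContentRequest XML body (sketch)."""
    root = ET.Element("SelectObjectContentRequest", xmlns=S3_NS)
    ET.SubElement(root, "Expression").text = expression
    ET.SubElement(root, "ExpressionType").text = "SQL"  # the only valid value

    input_ser = ET.SubElement(root, "InputSerialization")
    ET.SubElement(input_ser, "CompressionType").text = "NONE"
    if input_format == "CSV":
        csv_el = ET.SubElement(input_ser, "CSV")
        ET.SubElement(csv_el, "FileHeaderInfo").text = "USE"
        ET.SubElement(csv_el, "FieldDelimiter").text = ","
        ET.SubElement(csv_el, "RecordDelimiter").text = "\n"
        ET.SubElement(csv_el, "QuoteCharacter").text = '"'
        ET.SubElement(csv_el, "QuoteEscapeCharacter").text = "\\"
    elif input_format == "Parquet":
        # Parquet is a binary format, so no delimiters are configured.
        ET.SubElement(input_ser, "Parquet")
    else:
        raise ValueError(f"unsupported input format: {input_format}")

    output_ser = ET.SubElement(root, "OutputSerialization")
    ET.SubElement(output_ser, "CSV")  # CSV is currently the only output format
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_select_request("select count(0) from s3object where int(_1)<10;").decode())
```

Because InputSerialization and OutputSerialization are built independently, a Parquet input can be combined with CSV output, as the note above describes.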

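Response entities

If the action is successful, the service sends back an HTTP 200 response. Data is returned in XML format by the service.

Payload

Description: Root level tag for the payload parameters.
Type: String
Required: Yes

Records

Description: The records event.
Type: Base64-encoded binary data object
Required: No

Stats

Description: The stats event.
Type: Long
Required: No

The Ceph Object Gateway supports the following response:

Example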

{:event-type,records} {:content-type,application/octet-stream} {:message-type,event}
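The records in a select response are carried in binary event-stream messages that have headers like the ones shown above. The following sketch decodes such messages with the Python standard library. It assumes the AWS event-stream framing (a 12-byte prelude holding the total length, the header length, and a CRC32 of the first 8 bytes; then the headers, the payload, and a trailing CRC32 of the whole message) and handles only string-typed headers. encode_message exists only to produce test input, and both function names are hypothetical.

```python
import struct
import zlib

STRING_HEADER_TYPE = 7  # event-stream header value type for UTF-8 strings


def encode_message(headers: dict[str, str], payload: bytes) -> bytes:
    """Encode one event-stream message (used here to produce test input)."""
    header_bytes = b""
    for name, value in headers.items():
        name_b, value_b = name.encode(), value.encode()
        header_bytes += struct.pack("!B", len(name_b)) + name_b
        header_bytes += struct.pack("!BH", STRING_HEADER_TYPE, len(value_b)) + value_b
    total_len = 12 + len(header_bytes) + len(payload) + 4
    prelude = struct.pack("!II", total_len, len(header_bytes))
    prelude += struct.pack("!I", zlib.crc32(prelude))
    body = prelude + header_bytes + payload
    return body + struct.pack("!I", zlib.crc32(body))


def decode_messages(stream: bytes):
    """Yield (headers, payload) for each message in a byte stream."""
    offset = 0
    while offset < len(stream):
        total_len, headers_len, prelude_crc = struct.unpack_from("!III", stream, offset)
        message = stream[offset:offset + total_len]
        if zlib.crc32(message[:8]) != prelude_crc:
            raise ValueError("prelude CRC mismatch")
        (message_crc,) = struct.unpack_from("!I", message, total_len - 4)
        if zlib.crc32(message[:-4]) != message_crc:
            raise ValueError("message CRC mismatch")
        headers, pos = {}, 12
        end_of_headers = 12 + headers_len
        while pos < end_of_headers:
            name_len = message[pos]
            name = message[pos + 1:pos + 1 + name_len].decode()
            pos += 1 + name_len
            value_type, value_len = struct.unpack_from("!BH", message, pos)
            if value_type != STRING_HEADER_TYPE:
                raise ValueError(f"unsupported header type {value_type}")
            pos += 3
            headers[name] = message[pos:pos + value_len].decode()
            pos += value_len
        # The payload sits between the headers and the trailing message CRC.
        yield headers, message[end_of_headers:total_len - 4]
        offset += total_len
```

A client typically concatenates the payloads of all records events to obtain the CSV output.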

Syntax (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket BUCKET_NAME \
 --expression-type 'SQL' \
 --input-serialization \
 '{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" , "QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key OBJECT_NAME \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for CSV)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket testbucket \
 --expression-type 'SQL' \
 --input-serialization \
 '{"CSV": {"FieldDelimiter": "," , "QuoteCharacter": "\"" , "RecordDelimiter" : "\n" , "QuoteEscapeCharacter" : "\\" , "FileHeaderInfo": "USE" }, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key testobject \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Syntax (for Parquet)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket BUCKET_NAME \
 --expression-type 'SQL' \
 --input-serialization \
 '{"Parquet": {}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key OBJECT_NAME.parquet \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv

Example (for Parquet)

aws --endpoint-url http://localhost:80 s3api select-object-content \
 --bucket testbucket \
 --expression-type 'SQL' \
 --input-serialization \
 '{"Parquet": {}, "CompressionType": "NONE"}' \
 --output-serialization '{"CSV": {}}' \
 --key testobject.parquet \
 --expression "select count(0) from s3object where int(_1)<10;" output.csv
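The same request can also be sent from Python with the boto3 SDK, whose S3 client provides select_object_content with the same parameters as the CLI command. This is a sketch that assumes boto3 is installed and that credentials are configured as for the AWS CLI; the helper names run_select and collect_records are hypothetical, and the endpoint, bucket, and key are the placeholder values from the examples above.

```python
"""Sketch: run the CLI examples above through boto3 instead of the AWS CLI."""

# Same input serializations as the CLI examples.
CSV_INPUT = {
    "CSV": {
        "FieldDelimiter": ",",
        "QuoteCharacter": '"',
        "RecordDelimiter": "\n",
        "QuoteEscapeCharacter": "\\",
        "FileHeaderInfo": "USE",
    },
    "CompressionType": "NONE",
}
PARQUET_INPUT = {"Parquet": {}, "CompressionType": "NONE"}


def collect_records(event_stream) -> bytes:
    """Concatenate the Records payloads from a select_object_content event stream."""
    chunks = []
    for event in event_stream:
        if "Records" in event:  # Stats and End events carry no record data
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)


def run_select(endpoint: str, bucket: str, key: str, expression: str,
               input_serialization: dict) -> bytes:
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3", endpoint_url=endpoint)
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        Expression=expression,
        ExpressionType="SQL",
        InputSerialization=input_serialization,
        OutputSerialization={"CSV": {}},
    )
    return collect_records(response["Payload"])


if __name__ == "__main__":
    out = run_select("http://localhost:80", "testbucket", "testobject",
                     "select count(0) from s3object where int(_1)<10;", CSV_INPUT)
    print(out.decode())
```

For a Parquet object, pass PARQUET_INPUT and a key such as testobject.parquet instead.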

Supported features

Currently, only a subset of the AWS S3 select commands is supported.

Feature	Details	Description	Example
Arithmetic operators	^ * % / + - ( )		select (int(_1)+int(_2))*int(_9) from s3object;
Arithmetic operators	% modulo		select count(*) from s3object where cast(_1 as int)%2 == 0;
Arithmetic operators	^ power-of		select cast(2^10 as int) from s3object;
Compare operators	> < >= <= == !=		select _1,_2 from s3object where (int(_1)+int(_3))>int(_5);
Logical operator	AND OR NOT		select count(*) from s3object where not (int(_1)>123 and int(_5)<200);
Logical operator	is null	Returns true or false for a null indication in the expression.
Logical operator and NULL	is not null	Returns true or false for a null indication in the expression.
Logical operator and NULL	Unknown state	Review null handling and observe the results of logical operations with NULL. The query returns `0`.	`select count(*) from s3object where null and (3>2);`
Arithmetic operator with NULL	Unknown state	Review null handling and observe the results of binary operations with NULL. The query returns `0`.	`select count(*) from s3object where (null+1) and (3>2);`
Compare with NULL	Unknown state	Review null handling and observe the results of comparison operations with NULL. The query returns `0`.	`select count(*) from s3object where (null*1.5) != 3;`
Missing column	Unknown state		`select count(*) from s3object where _1 is null;`
Projection column	Similar to if, then, or else	case when ... then ... else ... end	`select case when (1+1==(2+1)*3) then 'case_1' when ((4*3)==(12)) then 'case_2' else 'case_else' end, age*2 from s3object;`
Logical operator		`coalesce` returns the first non-null argument.	`select coalesce(nullif(5,5),nullif(1,1.0),age+12) from s3object;`
Logical operator		`nullif` returns null if both arguments are equal; otherwise, it returns the first argument: `nullif(1,1)=NULL nullif(null,1)=NULL nullif(2,1)=2`.	`select nullif(cast(_1 as int),cast(_2 as int)) from s3object;`
Logical operator		`{expression} in ( .. {expression} ..)`	`select count(*) from s3object where 'ben' in (trim(_5),substring(_1,char_length(_1)-3,3),last_name);`
Logical operator		`{expression} between {expression} and {expression}`	`select count(*) from s3object where substring(_3,char_length(_3),1) between "x" and trim(_1) and substring(_3,char_length(_3)-1,1) == ":";`
Logical operator		`{expression} like {match-pattern}`	`select count(*) from s3object where first_name like '%de_'; select count(*) from s3object where _1 like "%a[r-s]";`
Cast operator			`select cast(123 as int)%2 from s3object;`
Cast operator			`select cast(123.456 as float)%2 from s3object;`
Cast operator			`select cast('ABC0-9' as string),cast(substr('ab12cd',3,2) as int)*4 from s3object;`
Cast operator			`select cast(substring('publish on 2007-01-01',12,10) as timestamp) from s3object;`
Non-AWS cast operator			`select int(_1),int( 1.2 + 3.4) from s3object;`
Non-AWS cast operator			`select float(1.2) from s3object;`
Non-AWS cast operator			`select timestamp('1999:10:10-12:23:44') from s3object;`
Aggregation function	sum		`select sum(int(_1)) from s3object;`
Aggregation function	avg		`select avg(cast(_1 as float) + cast(_2 as int)) from s3object;`
Aggregation function	min		`select min(cast(_1 as float) + cast(_2 as int)) from s3object;`
Aggregation function	max		`select max(float(_1)),min(int(_5)) from s3object;`
Aggregation function	count		`select count(*) from s3object where (int(_1)+int(_3))>int(_5);`
Timestamp function	extract		`select count(*) from s3object where extract('year',timestamp(_2)) > 1950 and extract('year',timestamp(_1)) < 1960;`
Timestamp function	dateadd		`select count(0) from s3object where datediff('year',timestamp(_1),dateadd('day',366,timestamp(_1))) == 1;`
Timestamp function	datediff		`select count(0) from s3object where datediff('month',timestamp(_1),timestamp(_2)) == 2;`
Timestamp function	utcnow		`select count(0) from s3object where datediff('hours',utcnow(),dateadd('day',1,utcnow())) == 24;`
String function	substring		`select count(0) from s3object where int(substring(_1,1,4))>1950 and int(substring(_1,1,4))<1960;`
String function	trim		`select trim(' foobar ') from s3object;`
String function	trim		`select trim(trailing from ' foobar ') from s3object;`
String function	trim		`select trim(leading from ' foobar ') from s3object;`
String function	trim		`select trim(both '12' from '1112211foobar22211122') from s3object;`
String function	lower or upper		`select lower('ABcD12#$e') from s3object;`
String function	char_length, character_length		`select count(*) from s3object where char_length(_3)==3;`
Complex queries			`select sum(cast(_1 as int)),max(cast(_3 as int)), substring('abcdefghijklm', (2-1)*3+sum(cast(_1 as int))/sum(cast(_1 as int))+1, (count() + count(0))/count(0)) from s3object;`
Alias support			`select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;`

3.5.2. S3 supported select functions

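S3 select supports the following functions.

Timestamp

timestamp(string)

Description: Converts a string to the basic timestamp type.
Supported: Currently converts the pattern yyyy:mm:dd hh:mi:ss.

extract(date-part,timestamp)

Description: Returns an integer, the date-part extracted from the input timestamp.
Supported: date-part: year, month, week, day.

dateadd(date-part,integer,timestamp)

Description: Returns a timestamp, calculated by adding the integer number of date-part units to the input timestamp.
Supported: date-part: year, month, day.

datediff(date-part,timestamp,timestamp)

Description: Returns an integer, the difference between the two timestamps calculated according to the date-part.
Supported: date-part: year, month, day, hours.

utcnow()

Description: Returns the timestamp of the current time.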
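To make the date-part arithmetic concrete, the following sketch models dateadd and datediff with Python's datetime module. It is an illustration under stated assumptions, not the s3select implementation: it assumes that datediff counts whole elapsed units from an earlier to a later timestamp and that dateadd clamps the day of the month when adding months or years. Ceph's exact rounding rules might differ.

```python
import calendar
from datetime import datetime, timedelta


def dateadd(part: str, n: int, ts: datetime) -> datetime:
    """Add n date-part units (year, month, or day) to ts."""
    if part == "day":
        return ts + timedelta(days=n)
    if part in ("month", "year"):
        months = n if part == "month" else 12 * n
        year, month0 = divmod(ts.year * 12 + (ts.month - 1) + months, 12)
        # Assumption: clamp the day, e.g. January 31 + 1 month -> February 28.
        day = min(ts.day, calendar.monthrange(year, month0 + 1)[1])
        return ts.replace(year=year, month=month0 + 1, day=day)
    raise ValueError(f"unsupported date-part: {part}")


def datediff(part: str, start: datetime, end: datetime) -> int:
    """Count whole date-part units from start to end (assumes end >= start)."""
    if part == "hours":
        return int((end - start).total_seconds() // 3600)
    if part == "day":
        return (end - start).days
    if part in ("month", "year"):
        months = (end.year - start.year) * 12 + (end.month - start.month)
        # Do not count the last month if it is not complete yet.
        if (end.day, end.time()) < (start.day, start.time()):
            months -= 1
        return months if part == "month" else months // 12
    raise ValueError(f"unsupported date-part: {part}")


if __name__ == "__main__":
    t = datetime(2020, 3, 1)
    # Mirrors the dateadd and utcnow rows of the supported-features table.
    print(datediff("year", t, dateadd("day", 366, t)))
    print(datediff("hours", t, dateadd("day", 1, t)))
```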

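Aggregation

count()

Description: Returns an integer based on the number of rows that match a condition, if one is given.

sum(expression)

Description: Returns the sum of the expression over each row that matches a condition, if one is given.

avg(expression)

Description: Returns the average of the expression over each row that matches a condition, if one is given.

max(expression)

Description: Returns the maximal result of the expression over all rows that match a condition, if one is given.

min(expression)

Description: Returns the minimal result of the expression over all rows that match a condition, if one is given.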
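To make the "if a condition is given" wording concrete, this sketch models the aggregation functions in Python: the condition (the where clause) first selects rows, and each aggregate is then computed over the expression values of the remaining rows. The function aggregate is hypothetical, the sketch does not model NULL values, and returning None for the non-count aggregates over zero rows is an assumption taken from standard SQL.

```python
from typing import Callable, Iterable, Optional

Row = dict[str, float]


def aggregate(rows: Iterable[Row], expr: Callable[[Row], float],
              where: Optional[Callable[[Row], bool]] = None) -> dict[str, Optional[float]]:
    """Compute count, sum, avg, max, and min of expr over the rows matching where."""
    # The condition selects rows first; the aggregates only see matching rows.
    values = [expr(row) for row in rows if where is None or where(row)]
    if not values:
        # Assumed standard SQL behavior: aggregates other than count are NULL (None).
        return {"count": 0, "sum": None, "avg": None, "max": None, "min": None}
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }


if __name__ == "__main__":
    rows = [{"_1": 5}, {"_1": 12}, {"_1": 7}]
    # Roughly: select count(*), sum(int(_1)), ... from s3object where int(_1)<10;
    print(aggregate(rows, lambda r: r["_1"], lambda r: r["_1"] < 10))
```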

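String

substring(string,from,to)

Description: Returns a string extracted from the input string, according to the from and to arguments. In the examples in this section, from is a 1-based start position and the third argument is the number of characters to extract; for example, substring('publish on 2007-01-01',12,10) returns 2007-01-01.

Char_length

Description: Returns the number of characters in a string. Character_length does the same.

Trim

Description: Trims leading or trailing characters from the target string. The default character is a blank space.

Upper/Lower

Description: Converts characters to uppercase or lowercase.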
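The following Python sketch models substring, trim, and char_length so that their arguments are easier to see. The semantics are inferred from the examples in the supported-features table: trim(both '12' from ...) removes any of the listed characters from the chosen ends, and substring takes a length as its third argument. The function names mirror the SQL functions, but the code is a hypothetical model, not the s3select implementation.

```python
def substring(s: str, start: int, length: int) -> str:
    """Return length characters of s, starting at the 1-based position start."""
    if start < 1:
        raise ValueError("this sketch only handles start >= 1")
    return s[start - 1:start - 1 + length]


def trim(s: str, mode: str = "both", chars: str = " ") -> str:
    """Remove any of chars from the leading end, trailing end, or both ends of s."""
    if mode == "leading":
        return s.lstrip(chars)
    if mode == "trailing":
        return s.rstrip(chars)
    if mode == "both":
        return s.strip(chars)
    raise ValueError(f"unknown trim mode: {mode}")


def char_length(s: str) -> int:
    """Return the number of characters in s."""
    return len(s)


# character_length behaves the same as char_length.
character_length = char_length


if __name__ == "__main__":
    print(substring("publish on 2007-01-01", 12, 10))
    print(trim("1112211foobar22211122", "both", "12"))
```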

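NULL

A NULL value is missing or unknown, and NULL cannot produce a value in any arithmetic operation. The same applies to arithmetic comparison: any comparison with NULL is NULL, which is unknown.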

Table 3.4. NULL use cases
A is NULL	Result(NULL=UNKNOWN)
Not A	`NULL`
A or False	`NULL`
A or True	`True`
A or A	`NULL`
A and False	`False`
A and True	`NULL`
A and A	`NULL`
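The rules in the NULL use-case table above are standard three-valued (Kleene) logic. This Python sketch encodes them with None standing for NULL, which is an illustrative choice and not how s3select represents NULL internally. where_matches shows why a query such as select count(*) from s3object where null and (3>2); counts no rows: a row passes only when the condition is true, not unknown.

```python
from typing import Optional

# None stands for NULL (UNKNOWN) in this sketch.
Tri = Optional[bool]


def not_(a: Tri) -> Tri:
    return None if a is None else not a


def and_(a: Tri, b: Tri) -> Tri:
    # False wins over NULL: NULL and False is False.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or_(a: Tri, b: Tri) -> Tri:
    # True wins over NULL: NULL or True is True.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def where_matches(condition: Tri) -> bool:
    """A row passes a where clause only if the condition is True, not NULL."""
    return condition is True


if __name__ == "__main__":
    A = None  # A is NULL
    for label, value in [("Not A", not_(A)), ("A or True", or_(A, True)),
                         ("A and False", and_(A, False)), ("A and True", and_(A, True))]:
        print(label, value)
```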

3.5.3. S3 alias programming construct

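The alias programming construct is an essential part of the s3 select language because it makes programming easier for objects that have many columns and for complex queries. When a statement that contains an alias construct is parsed, the alias is replaced with a reference to the corresponding projection column, and at query execution the reference is evaluated like any other expression. Aliases maintain a result cache: when an alias is used more than once, the same expression is not evaluated again; the cached result is used instead, so the same result is returned. Currently, Red Hat supports the column alias.

Example

select int(_1) as a1, int(_2) as a2 , (a1+a2) as a3 from s3object where a3>100 and a3<300;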
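The caching behavior described above can be modeled in a few lines of Python. In this sketch, which is illustrative only and not the s3select engine's code, each alias is evaluated at most once per row and later references reuse the cached value, so a3 in the example query is computed once even though the where clause references it twice. The class AliasRow, the ALIASES table, and the evaluation counter are hypothetical.

```python
from typing import Callable


class AliasRow:
    """Evaluate aliased projection expressions for one row, caching each result."""

    def __init__(self, row: list[str], aliases: dict[str, Callable[["AliasRow"], int]]):
        self.row = row
        self.aliases = aliases
        self.cache: dict[str, int] = {}
        self.evaluations = 0  # how many alias expressions were actually computed

    def col(self, i: int) -> str:
        # _1 is the first column, as in the s3select positional syntax.
        return self.row[i - 1]

    def get(self, name: str) -> int:
        if name not in self.cache:
            self.evaluations += 1
            self.cache[name] = self.aliases[name](self)
        return self.cache[name]


# Model of: select int(_1) as a1, int(_2) as a2, (a1+a2) as a3 ... where a3>100 and a3<300
ALIASES = {
    "a1": lambda r: int(r.col(1)),
    "a2": lambda r: int(r.col(2)),
    "a3": lambda r: r.get("a1") + r.get("a2"),
}


def matches(row: list[str]) -> tuple[bool, int]:
    """Return whether the row passes the where clause and how many evaluations ran."""
    r = AliasRow(row, ALIASES)
    ok = r.get("a3") > 100 and r.get("a3") < 300  # second a3 comes from the cache
    return ok, r.evaluations


if __name__ == "__main__":
    print(matches(["50", "60"]))
```

Each row triggers exactly three evaluations (a3, a1, a2), regardless of how often a3 is referenced.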

3.5.4. S3 CSV parsing explained

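The following default values are used for CSV definitions with input serialization:

Use {\n} for the row delimiter.
Use {"} for quote.
Use {\} for the escape character.

The csv-header-info is parsed when USE appears in the AWS-CLI command; it refers to the first row of the input object, which contains the schema. Currently, output serialization and compression type are not supported. The S3 select engine has a CSV parser that parses S3 objects according to the following rules:

Each row ends with a row delimiter.
The field delimiter separates adjacent columns.
Successive field delimiters define a NULL column.
A quote character overrides the field delimiter: a field delimiter between quotes is treated as an ordinary character.
An escape character disables any special character except the row delimiter.

The following are examples of CSV parsing rules: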

Table 3.5. CSV parsing
Feature	Description	Input => tokens
`NULL`	Successive field delimiters	`,,1,,2, => {null}{null}{1}{null}{2}{null}`
`QUOTE`	A quote character overrides the field delimiter.	`11,22,"a,b,c,d",last => {11}{22}{"a,b,c,d"}{last}`
`Escape`	An escape character overrides a meta-character.	`11,22,str=\"abcd\"\,str2=\"123\",last => {11}{22}{str="abcd",str2="123"}{last}`
`row delimiter`	No closing quote; the row delimiter ends the line.	`11,22,a="str,44,55,66 => {11}{22}{a="str,44,55,66}`
`csv header info`	FileHeaderInfo tag	A USE value means that each token on the first line is a column name. An IGNORE value means that the first line is skipped.
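The parsing rules above can be modeled with a small Python tokenizer that reproduces the NULL, QUOTE, and row delimiter examples in the CSV parsing table. It is a sketch of the documented rules, not the Ceph S3 select CSV parser: it assumes a comma field delimiter plus the defaults listed earlier, keeps quote characters in the token (as the QUOTE example shows), and, following the escape rule, drops the escape character while keeping the character it escapes. tokenize_csv_row and tokenize_csv are hypothetical names.

```python
from typing import Optional


def tokenize_csv_row(line: str, field_delim: str = ",", quote: str = '"',
                     escape: str = "\\") -> list[Optional[str]]:
    """Split one row into tokens; an empty token becomes None (NULL)."""
    tokens: list[Optional[str]] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == escape and i + 1 < len(line):
            # Escape: keep the next character literally and drop the escape itself.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == quote:
            # Quotes toggle quoted mode; the quote character stays in the token.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == field_delim and not in_quotes:
            tokens.append("".join(current) or None)  # successive delimiters -> NULL
            current = []
        else:
            current.append(ch)
        i += 1
    tokens.append("".join(current) or None)
    return tokens


def tokenize_csv(text: str, row_delim: str = "\n") -> list[list[Optional[str]]]:
    """Split on the row delimiter first: it ends a row even inside an open quote."""
    return [tokenize_csv_row(line) for line in text.split(row_delim) if line]


if __name__ == "__main__":
    print(tokenize_csv_row('11,22,"a,b,c,d",last'))
```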
