Chapter 39. HDFS

HDFS Component

The hdfs component is now part of the core Apache Camel product.

URI format

hdfs://hostname[:port][/path][?options]

You can append query options to the URI in the format ?option=value&option=value&... The path is treated as follows:

As a consumer: if the path is a file, the component reads just that file; if it is a directory, the component scans all files under the path that match the configured pattern. All files under that directory must be of the same type.
As a producer: if at least one split strategy is defined, the path is treated as a directory, and under that directory the producer creates a separate file per split, named seg0, seg1, seg2, etc.
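
The URI layout above can be assembled mechanically. The following is a minimal sketch using only the JDK; the class HdfsUriSketch and its build method are hypothetical helpers for illustration and are not part of Camel. Option values are appended as-is, without URL encoding.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Camel class): assembles
// hdfs://hostname[:port][/path][?options] from its parts.
final class HdfsUriSketch {

    /** A port <= 0 or a null/empty path is simply left out of the result. */
    static String build(String host, int port, String path, Map<String, String> options) {
        StringBuilder uri = new StringBuilder("hdfs://").append(host);
        if (port > 0) {
            uri.append(':').append(port);
        }
        if (path != null && !path.isEmpty()) {
            // Make sure the path part starts with a slash.
            uri.append(path.startsWith("/") ? path : "/" + path);
        }
        if (options != null && !options.isEmpty()) {
            // Join the options as ?option=value&option=value&...
            StringJoiner query = new StringJoiner("&", "?", "");
            for (Map.Entry<String, String> option : options.entrySet()) {
                query.add(option.getKey() + "=" + option.getValue());
            }
            uri.append(query);
        }
        return uri.toString();
    }
}
```

Passing an insertion-ordered map such as LinkedHashMap keeps the options in the order in which they were added, which makes the resulting URI easier to read and compare.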

Options

Name	Default Value	Description
`overwrite`	`true`	The file can be overwritten
`bufferSize`	`4096`	The buffer size used by HDFS
`replication`	`3`	The HDFS replication factor
`blockSize`	`67108864`	The size of the HDFS blocks
`fileType`	`NORMAL_FILE`	The file type. Other values are SEQUENCE_FILE, MAP_FILE, ARRAY_FILE, and BLOOMMAP_FILE; see the Hadoop documentation.
`fileSystemType`	`HDFS`	The file system type. Set it to LOCAL to use the local filesystem.
`keyType`	`NULL`	The type of the key for sequence or map files. See below.
`valueType`	`TEXT`	The type of the value for sequence or map files. See below.
`splitStrategy`		A string describing the strategy on how to split the file based on different criteria. See below.
`openedSuffix`	`opened`	When a file is opened for reading or writing, it is renamed with this suffix so that it is not read while it is still being written.
`readSuffix`	`read`	Once a file has been read, it is renamed with this suffix so that it is not read again.
`initialDelay`	`0`	For the consumer, how long to wait (in milliseconds) before starting to scan the directory.
`delay`	`0`	The interval (in milliseconds) between directory scans.
`pattern`	`*`	The pattern used for scanning the directory
`chunkSize`	`4096`	When reading a normal file, the file is split into chunks and one message is produced per chunk.
`connectOnStartup`	`true`	Camel 2.9.3/2.10.1: Whether to connect to the HDFS file system when the producer/consumer starts. If `false`, the connection is created on demand. Note that HDFS may take up to 15 minutes to establish a connection, because it has a hardcoded 45 x 20 second redelivery. Setting this option to `false` lets your application start up without blocking for up to 15 minutes.
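
To picture what the chunkSize option does for a normal file, the sketch below splits a byte array into consecutive chunks, one per message. It is an illustration using only the JDK, not the component's actual implementation; ChunkSketch and chunk are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of chunked reading (not Camel code).
final class ChunkSketch {

    /** Splits data into consecutive chunks of at most chunkSize bytes. */
    static List<byte[]> chunk(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int to = Math.min(from + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }
}
```

With the default chunkSize of 4096, a 10,000-byte input yields three chunks of 4096, 4096, and 1808 bytes under this sketch.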

KeyType and ValueType

NULL means that the key or the value is absent
BYTE for writing a byte; the Java Byte class is mapped to BYTE
BYTES for writing a sequence of bytes; it maps to the Java ByteBuffer class
INT for writing a Java integer
FLOAT for writing a Java float
LONG for writing a Java long
DOUBLE for writing a Java double
TEXT for writing Java strings

BYTES is also used for everything else. For example, in Camel a file is passed around as an InputStream; in this case it is written to a sequence file or a map file as a sequence of bytes.
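
The mapping above can be summarized as a small lookup. The sketch below is a hypothetical helper for choosing a keyType or valueType setting for a given Java object; it is not a Camel class, and the component itself relies on the options configured on the endpoint.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a Camel class) mirroring the list above.
final class KeyValueTypeSketch {

    enum Type { NULL, BYTE, BYTES, INT, FLOAT, LONG, DOUBLE, TEXT }

    /** Suggests the type name that matches the given Java object. */
    static Type typeFor(Object value) {
        if (value == null) return Type.NULL;
        if (value instanceof Byte) return Type.BYTE;
        if (value instanceof ByteBuffer) return Type.BYTES;
        if (value instanceof Integer) return Type.INT;
        if (value instanceof Float) return Type.FLOAT;
        if (value instanceof Long) return Type.LONG;
        if (value instanceof Double) return Type.DOUBLE;
        if (value instanceof String) return Type.TEXT;
        // Everything else, for example an InputStream, is written as bytes.
        return Type.BYTES;
    }
}
```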

Splitting Strategy

In the current version of Hadoop, opening a file in append mode is disabled because it is not reliable enough. So, for the moment, it is only possible to create new files. The Camel HDFS endpoint works around this limitation as follows:

If the split strategy option has been defined, the actual file name becomes a directory name and a <file name>/seg0 is created initially.
Every time a splitting condition is met, a new file is created with the name <original file name>/segN, where N is 1, 2, 3, etc.

The splitStrategy option is defined as a string with the following syntax:

splitStrategy=<ST>:<value>,<ST>:<value>,*

Where <ST> can be:

BYTES a new file is created, and the old one is closed, when the number of written bytes is more than <value>
MESSAGES a new file is created, and the old one is closed, when the number of written messages is more than <value>
IDLE a new file is created, and the old one is closed, when no writing has happened in the last <value> milliseconds

For example:

hdfs://localhost/tmp/simple-file?splitStrategy=IDLE:1000,BYTES:5

This means that a new file is created either when the current file has been idle for more than 1 second or when more than 5 bytes have been written. So, running hadoop fs -ls /tmp/simple-file, you'll find the files seg0, seg1, seg2, etc.
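
The splitStrategy syntax and its rollover conditions can be sketched as follows. This is an illustration using only the JDK, under the assumption that every condition is a strict "more than" comparison, as in the example above. SplitStrategySketch and its methods are hypothetical names and do not reflect the component's internal implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitStrategy handling (not Camel code).
final class SplitStrategySketch {

    enum Kind { BYTES, MESSAGES, IDLE }

    static final class Rule {
        final Kind kind;
        final long value;

        Rule(Kind kind, long value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Parses a string such as "IDLE:1000,BYTES:5" into rules. */
    static List<Rule> parse(String spec) {
        List<Rule> rules = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] pair = part.trim().split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected <ST>:<value> but got: " + part);
            }
            rules.add(new Rule(Kind.valueOf(pair[0].trim()), Long.parseLong(pair[1].trim())));
        }
        return rules;
    }

    /** Returns true as soon as any one rule is exceeded. */
    static boolean shouldSplit(List<Rule> rules, long bytes, long messages, long idleMillis) {
        for (Rule rule : rules) {
            long observed;
            switch (rule.kind) {
                case BYTES: observed = bytes; break;
                case MESSAGES: observed = messages; break;
                default: observed = idleMillis; break; // IDLE
            }
            if (observed > rule.value) {
                return true;
            }
        }
        return false;
    }

    /** Name of the N-th segment under the directory derived from the file name. */
    static String segmentPath(String basePath, int n) {
        return basePath + "/seg" + n;
    }
}
```

For the example URI, parse("IDLE:1000,BYTES:5") produces two rules, and shouldSplit reports a rollover once 6 bytes have been written or once the file has been idle for more than 1000 milliseconds.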

Using this component in OSGi

This component is fully functional in an OSGi environment; however, it requires some action from the user. Hadoop uses the thread context class loader to load resources. Usually, the thread context class loader is the bundle class loader of the bundle that contains the routes, so the default configuration files need to be visible from the bundle class loader. A typical way to deal with this is to keep a copy of core-default.xml in your bundle root. That file can be found in hadoop-common.jar.
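
One way to check whether a resource such as core-default.xml is visible is to ask the thread context class loader for it directly. The sketch below uses only the JDK; HadoopDefaultsCheck is a hypothetical helper, and the fallback to the system class loader is an assumption made so that the check also runs outside a container.

```java
// Hypothetical diagnostic helper (not part of Camel or Hadoop).
final class HadoopDefaultsCheck {

    /** Returns true if the thread context class loader can see the named resource. */
    static boolean visibleFromContextClassLoader(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Assumption: fall back to the system class loader when no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }
}
```

Calling visibleFromContextClassLoader("core-default.xml") from the thread that runs the routes would indicate whether the copy in the bundle root is being picked up.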
