Chapter 336. Tika Component
Available as of Camel version 2.19
The Tika component provides the ability to detect the MIME type of documents and to parse them, using Apache Tika as the underlying library.
To use the Tika component, Maven users need to add the following dependency to their pom.xml:
pom.xml
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-tika</artifactId>
    <version>x.x.x</version>
    <!-- use the same version as your Camel core version -->
</dependency>
The Tika component only supports producer endpoints.
336.1. Options
The Tika component has no options.
The Tika endpoint is configured using URI syntax:
tika:operation
with the following path and query parameters:
336.1.1. Path Parameters (1 parameter):
Name | Description | Default | Type |
---|---|---|---|
operation | Required. The Tika operation to perform: parse or detect | | TikaOperation |
336.1.2. Query Parameters (5 parameters):
Name | Description | Default | Type |
---|---|---|---|
tikaConfig (producer) | Tika Config | | TikaConfig |
tikaConfigUri (producer) | Tika Config Uri: The URI of tika-config.xml | | String |
tikaParseOutputEncoding (producer) | Tika Parse Output Encoding. Used to specify the character encoding of the parsed output. Defaults to Charset.defaultCharset(). | | String |
tikaParseOutputFormat (producer) | Tika Output Format. Supported output formats. xml: Returns Parsed Content as XML. html: Returns Parsed Content as HTML. text: Returns Parsed Content as Text. textMain: Uses the boilerpipe library to automatically extract the main content from a web page. | xml | TikaParseOutputFormat |
synchronous (advanced) | Sets whether synchronous processing should be strictly used, or Camel is allowed to use asynchronous processing (if supported). | false | boolean |
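The path and query parameters above combine into a single endpoint URI string. The following is a minimal sketch of how such a URI might be assembled in plain Java. The class `TikaUriSketch` and its method `tikaUri` are hypothetical helpers introduced here for illustration and are not part of Camel. The sketch also assumes the option values need no URI escaping.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of Camel): assembles a Tika endpoint URI string.
class TikaUriSketch {

    // Builds "tika:<operation>?name=value&..." from the given options.
    // Values are appended as-is, so this only suits simple values that
    // need no URI escaping, such as "text" or "UTF-8".
    static String tikaUri(String operation, Map<String, String> options) {
        // The path parameter accepts only the two operations listed in the table.
        if (!operation.equals("parse") && !operation.equals("detect")) {
            throw new IllegalArgumentException("operation must be parse or detect: " + operation);
        }
        if (options.isEmpty()) {
            return "tika:" + operation;
        }
        StringJoiner query = new StringJoiner("&", "tika:" + operation + "?", "");
        options.forEach((name, value) -> query.add(name + "=" + value));
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("tikaParseOutputFormat", "text");
        // Pin the encoding instead of relying on Charset.defaultCharset().
        options.put("tikaParseOutputEncoding", StandardCharsets.UTF_8.name());

        // prints tika:parse?tikaParseOutputFormat=text&tikaParseOutputEncoding=UTF-8
        System.out.println(tikaUri("parse", options));
        // prints tika:detect
        System.out.println(tikaUri("detect", new LinkedHashMap<>()));
    }
}
```

The resulting string can be passed to `.to(...)` in a route, or the same URI can simply be written as a literal. Setting tikaParseOutputEncoding explicitly may be worthwhile because the default depends on the platform the route runs on.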
336.2. To Detect a File’s MIME Type
Place the file to inspect in the message body.
from("direct:start")
    .to("tika:detect");
336.3. To Parse a File
Place the file to parse in the message body.
from("direct:start")
    .to("tika:parse");