Chapter 89. Lucene
Lucene (Indexer and Search) Component
Available as of Apache Camel 2.2
The lucene component is based on the Apache Lucene project. Apache Lucene is a powerful high-performance, full-featured text search engine library written entirely in Java. For more details about Lucene, please see the following links:
The lucene component in camel facilitates integration and utilization of Lucene endpoints in enterprise integration patterns and scenarios. The lucene component does the following
- builds a searchable index of documents when payloads are sent to the Lucene Endpoint
- facilitates performing of indexed searches in Apache Camel
This component only supports producer endpoints.
URI format
lucene:searcherName:insert[?options] lucene:searcherName:query[?options]
You can append query options to the URI in the following format,
?option=value&option=value&...
Insert Options
Name | Default Value | Description |
---|---|---|
analyzer
|
StandardAnalyzer
|
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text. The value for analyzer can be any class that extends the abstract class org.apache.lucene.analysis.Analyzer. Lucene also offers a rich set of analyzers out of the box |
indexDir
|
./indexDirectory
|
A file system directory in which index files are created upon analysis of the document by the specified analyzer |
srcDir
|
null
|
An optional directory containing files to be used to be analyzed and added to the index at producer startup. |
Query Options
Name | Default Value | Description |
---|---|---|
analyzer
|
StandardAnalyzer
|
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text. The value for analyzer can be any class that extends the abstract class org.apache.lucene.analysis.Analyzer. Lucene also offers a rich set of analyzers out of the box |
indexDir
|
./indexDirectory
|
A file system directory in which index files are created upon analysis of the document by the specified analyzer |
maxHits
|
10
|
An integer value that limits the result set of the search operation |
Message Headers
Header | Description |
---|---|
QUERY
|
The Lucene Query to performed on the index. The query may include wildcards and phrases. |
RETURN_LUCENE_DOCS
|
Camel 2.15: Set this header to true to include the actual Lucene documentation when returning hit information. |
Lucene Producers
This component supports 2 producer endpoints.
- insert - The insert producer builds a searchable index by analyzing the body in incoming exchanges and associating it with a token ("content").
- query - The query producer performs searches on a pre-created index. The query uses the searchable index to perform score & relevance based searches. Queries are sent via the incoming exchange contains a header property name called 'QUERY'. The value of the header property 'QUERY' is a Lucene Query. For more details on how to create Lucene Queries check out http://lucene.apache.org/java/3_0_0/queryparsersyntax.html
Lucene Processor
There is a processor called LuceneQueryProcessor available to perform queries against lucene without the need to create a producer.
Example 1: Creating a Lucene index
RouteBuilder builder = new RouteBuilder() { public void configure() { from("direct:start"). to("lucene:whitespaceQuotesIndex:insert?analyzer=#whitespaceAnalyzer&indexDir=#whitespace&srcDir=#load_dir"). to("mock:result"); } };
Example 2: Loading properties into the JNDI registry in the Camel Context
@Override protected JndiRegistry createRegistry() throws Exception { JndiRegistry registry = new JndiRegistry(createJndiContext()); registry.bind("whitespace", new File("./whitespaceIndexDir")); registry.bind("load_dir", new File("src/test/resources/sources")); registry.bind("whitespaceAnalyzer", new WhitespaceAnalyzer()); return registry; } ... CamelContext context = new DefaultCamelContext(createRegistry());
Example 2: Performing searches using a Query Producer
RouteBuilder builder = new RouteBuilder() { public void configure() { from("direct:start"). setHeader("QUERY", constant("Seinfeld")). to("lucene:searchIndex:query?analyzer=#whitespaceAnalyzer&indexDir=#whitespace&maxHits=20"). to("direct:next"); from("direct:next").process(new Processor() { public void process(Exchange exchange) throws Exception { Hits hits = exchange.getIn().getBody(Hits.class); printResults(hits); } private void printResults(Hits hits) { LOG.debug("Number of hits: " + hits.getNumberOfHits()); for (int i = 0; i < hits.getNumberOfHits(); i++) { LOG.debug("Hit " + i + " Index Location:" + hits.getHit().get(i).getHitLocation()); LOG.debug("Hit " + i + " Score:" + hits.getHit().get(i).getScore()); LOG.debug("Hit " + i + " Data:" + hits.getHit().get(i).getData()); } } }).to("mock:searchResult"); } };
Example 3: Performing searches using a Query Processor
RouteBuilder builder = new RouteBuilder() { public void configure() { try { from("direct:start"). setHeader("QUERY", constant("Rodney Dangerfield")). process(new LuceneQueryProcessor("target/stdindexDir", analyzer, null, 20)). to("direct:next"); } catch (Exception e) { e.printStackTrace(); } from("direct:next").process(new Processor() { public void process(Exchange exchange) throws Exception { Hits hits = exchange.getIn().getBody(Hits.class); printResults(hits); } private void printResults(Hits hits) { LOG.debug("Number of hits: " + hits.getNumberOfHits()); for (int i = 0; i < hits.getNumberOfHits(); i++) { LOG.debug("Hit " + i + " Index Location:" + hits.getHit().get(i).getHitLocation()); LOG.debug("Hit " + i + " Score:" + hits.getHit().get(i).getScore()); LOG.debug("Hit " + i + " Data:" + hits.getHit().get(i).getData()); } } }).to("mock:searchResult"); } };