Chapter 14. Hibernate Search
14.1. Getting Started with Hibernate Search
14.1.1. About Hibernate Search
14.1.2. First Steps with Hibernate Search
- See Configuration in the JBoss EAP Administration and Configuration Guide to configure Hibernate Search.
14.1.3. Enable Hibernate Search using Maven
Use the following configuration in your Maven project to add the hibernate-search-orm dependency:
<dependencyManagement>
   <dependencies>
      <dependency>
         <groupId>org.hibernate</groupId>
         <artifactId>hibernate-search-orm</artifactId>
         <version>4.6.0.Final-redhat-2</version>
      </dependency>
   </dependencies>
</dependencyManagement>

<dependencies>
   <dependency>
      <groupId>org.hibernate</groupId>
      <artifactId>hibernate-search-orm</artifactId>
      <scope>provided</scope>
   </dependency>
</dependencies>
14.1.4. Add Annotations
Assume you have an application containing the entities example.Book and example.Author, and you want to add free text search capabilities to enable searching for books.
Example 14.1. Entities Book and Author Before Adding Hibernate Search Specific Annotations
package example;
...
@Entity
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    private String title;

    private String subtitle;

    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();

    private Date publicationDate;

    public Book() {
    }

    // standard getters/setters follow here
    ...
}
package example;
...
@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    private String name;

    public Author() {
    }

    // standard getters/setters follow here
    ...
}
To achieve this, add a few annotations to the Book and Author classes. The first annotation, @Indexed, marks Book as indexable. By design, Hibernate Search stores an untokenized ID in the index to ensure index uniqueness for a given entity. @DocumentId marks the property to use for this purpose; in most cases it is the same as the database primary key. The @DocumentId annotation is optional when an @Id annotation exists.
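If you want to make the identity mapping explicit, @DocumentId can be placed on the identifier property. The following minimal sketch (not part of the example entities above) shows it alongside @Id; because @Id is present, @DocumentId could be omitted:

package example;

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed            // marks Book as indexable
public class Book {

    @Id
    @GeneratedValue
    @DocumentId     // optional here: the @Id property is used by default
    private Integer id;

    // remaining fields and getters/setters omitted
}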
Next, mark the fields you want to make searchable. Start with title and subtitle and annotate both with @Field. The parameter index=Index.YES ensures that the text is indexed, while analyze=Analyze.YES ensures that the text is analyzed using the default Lucene analyzer. Usually, analyzing means chunking a sentence into individual words and potentially excluding common words like 'a' or 'the'. We will talk more about analyzers later. The third parameter specified within @Field, store=Store.NO, ensures that the actual data is not stored in the index. Whether this data is stored in the index or not has nothing to do with the ability to search for it. From Lucene's perspective it is not necessary to keep the data once the index is created. The benefit of storing it is the ability to retrieve it via projections (see Section 14.3.1.10.5, “Projection”).
index=Index.YES, analyze=Analyze.YES, and store=Store.NO are the default values for these parameters and can be omitted.
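For illustration, the following sketch shows two equivalent declarations: the explicit form and the shorthand that relies on the defaults:

// explicit form
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
private String title;

// shorthand relying on the default values
@Field
private String subtitle;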
The date field is annotated with @DateBridge. This annotation is one of the built-in field bridges in Hibernate Search. The Lucene index is purely string based, so Hibernate Search must convert the data types of the indexed fields to strings and vice versa. A range of predefined bridges is provided, including the DateBridge, which converts a java.util.Date into a String with the specified resolution. For more details see Section 14.2.4, “Bridges”.
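The following sketch illustrates the resolution parameter. The Resolution values listed in the comment are those provided by Hibernate Search; choosing the coarsest resolution your queries need keeps the number of distinct index terms small:

import java.util.Date;

import org.hibernate.search.annotations.Analyze;
import org.hibernate.search.annotations.DateBridge;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Resolution;
import org.hibernate.search.annotations.Store;

public class DateFieldSketch {

    // index the date only down to the day; hours, minutes, and seconds are discarded
    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
    @DateBridge(resolution = Resolution.DAY)
    private Date publicationDate;

    // other resolutions: Resolution.YEAR, Resolution.MONTH, Resolution.HOUR,
    // Resolution.MINUTE, Resolution.SECOND, Resolution.MILLISECOND
}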
Finally, the authors association is annotated with @IndexedEmbedded. This annotation is used to index associated entities (@ManyToMany, @*ToOne, @Embedded, and @ElementCollection) as part of the owning entity. This is needed because a Lucene index document is a flat data structure that knows nothing about object relations. To make the authors' names searchable, they must be indexed as part of the book itself. In addition to @IndexedEmbedded, mark all fields of the associated entity that you want included in the index with @Field. For more details see Section 14.2.1.3, “Embedded and Associated Objects”.
Example 14.2. Entities After Adding Hibernate Search Annotations
package example;
...
@Entity
@Indexed
public class Book {

    @Id
    @GeneratedValue
    private Integer id;

    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    private String title;

    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    private String subtitle;

    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
    @DateBridge(resolution = Resolution.DAY)
    private Date publicationDate;

    @IndexedEmbedded
    @ManyToMany
    private Set<Author> authors = new HashSet<Author>();

    public Book() {
    }

    // standard getters/setters follow here
    ...
}
package example;
...
@Entity
public class Author {

    @Id
    @GeneratedValue
    private Integer id;

    @Field
    private String name;

    public Author() {
    }

    // standard getters/setters follow here
    ...
}
14.1.5. Indexing
Example 14.3. Using the Hibernate Session to Index Data
FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
fullTextSession.createIndexer().startAndWait();
Example 14.4. Using JPA to Index Data
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
    org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
fullTextEntityManager.createIndexer().startAndWait();
After the above code completes, you can find the Lucene index in the configured index directory, for example /var/lucene/indexes/example.Book. Go ahead and inspect this index with Luke; it will help you understand how Hibernate Search works.
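The mass indexer rebuilds the entire index. If you only need to (re)index a single object, for example while experimenting, the FullTextSession.index() method can be used instead. The following sketch assumes an open Session and a hypothetical bookId variable holding an existing identifier:

FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();

// load the entity to (re)index; bookId is a placeholder for an existing identifier
Book book = (Book) fullTextSession.get(Book.class, bookId);

// force the (re)indexing of this single managed entity
fullTextSession.index(book);

// the queued index work is applied when the transaction commits
tx.commit();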
14.1.6. Searching
To execute a search, create a Lucene query and wrap it in an org.hibernate.Query to get the required functionality from the Hibernate API. The following code prepares a query against the indexed fields. Executing the code returns a list of Books.
Example 14.5. Using a Hibernate Search Session to Create and Execute a Search
FullTextSession fullTextSession = Search.getFullTextSession(session);
Transaction tx = fullTextSession.beginTransaction();

// create native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextSession.getSearchFactory()
    .buildQueryBuilder().forEntity(Book.class).get();
org.apache.lucene.search.Query query = qb
    .keyword()
    .onFields("title", "subtitle", "authors.name", "publicationDate")
    .matching("Java rocks!")
    .createQuery();

// wrap Lucene query in an org.hibernate.Query
org.hibernate.Query hibQuery =
    fullTextSession.createFullTextQuery(query, Book.class);

// execute search
List result = hibQuery.list();

tx.commit();
session.close();
Example 14.6. Using JPA to Create and Execute a Search
EntityManager em = entityManagerFactory.createEntityManager();
FullTextEntityManager fullTextEntityManager =
    org.hibernate.search.jpa.Search.getFullTextEntityManager(em);
em.getTransaction().begin();

// create native Lucene query using the query DSL
// alternatively you can write the Lucene query using the Lucene query parser
// or the Lucene programmatic API. The Hibernate Search DSL is recommended though
QueryBuilder qb = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(Book.class).get();
org.apache.lucene.search.Query query = qb
    .keyword()
    .onFields("title", "subtitle", "authors.name", "publicationDate")
    .matching("Java rocks!")
    .createQuery();

// wrap Lucene query in a javax.persistence.Query
javax.persistence.Query persistenceQuery =
    fullTextEntityManager.createFullTextQuery(query, Book.class);

// execute search
List result = persistenceQuery.getResultList();

em.getTransaction().commit();
em.close();
14.1.7. Analyzer
Assume that the title of an indexed book entity is Refactoring: Improving the Design of Existing Code and that hits are required for the queries refactor, refactors, refactored, and refactoring. Select an analyzer class in Lucene that applies word stemming when indexing and searching. Hibernate Search offers several ways to configure the analyzer (see Section 14.2.3.1, “Default Analyzer and Analyzer by Class” for more information):
- Set the analyzer property in the configuration file. The specified class becomes the default analyzer.
- Set the @Analyzer annotation at the entity level.
- Set the @Analyzer annotation at the field level (see the sketch following this list).
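As a minimal sketch of the annotation-based options, the following assumes that the Lucene analyzer class org.apache.lucene.analysis.en.EnglishAnalyzer, which applies English stemming, is available in the Lucene version used by Hibernate Search; any other Analyzer implementation can be referenced in the same way through the impl attribute:

import javax.persistence.Entity;

import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;

@Entity
@Indexed
@Analyzer(impl = EnglishAnalyzer.class)        // entity level: default for all fields of Book
public class Book {

    @Field
    private String title;

    @Field
    @Analyzer(impl = EnglishAnalyzer.class)    // field level: overrides the entity-level analyzer
    private String subtitle;

    // remaining fields and getters/setters omitted
}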
Alternatively, define an analyzer with the @AnalyzerDef annotation and apply it with the @Analyzer annotation. The Solr analyzer framework with its factories is utilized for this option. For more information about factory classes, see the Solr JavaDoc or read the corresponding section on the Solr Wiki (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).
In the example below, a StandardTokenizerFactory is used, followed by two filter factories: LowerCaseFilterFactory and SnowballPorterFilterFactory. The tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact, which makes it a good choice for this and other general uses. The lowercase filter converts all letters in a token to lowercase, and the Snowball filter applies language-specific stemming.
Example 14.7. Using @AnalyzerDef and the Solr Framework to Define and Use an Analyzer
@Indexed
@AnalyzerDef(
    name = "customanalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = { @Parameter(name = "language", value = "English") })
    })
public class Book implements Serializable {

    @Field
    @Analyzer(definition = "customanalyzer")
    private String title;

    @Field
    @Analyzer(definition = "customanalyzer")
    private String subtitle;

    @IndexedEmbedded
    private Set authors = new HashSet();

    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.YES)
    @DateBridge(resolution = Resolution.DAY)
    private Date publicationDate;

    public Book() {
    }

    // standard getters/setters follow here
    ...
}
Use @AnalyzerDef to define an analyzer, then apply it to entities and properties with @Analyzer. In the example, the customanalyzer is defined but not applied to the entity itself; it is applied only to the title and subtitle properties. An analyzer definition is global, so define it on one entity and reuse the definition for other entities as required.