此内容没有您所选择的语言版本。

1.6. Analyzer


Let us make things a little more interesting now. Assume that one of your indexed book entities has the title "Refactoring: Improving the Design of Existing Code" and you want to get hits for all of the following queries: "refactor", "refactors", "refactored" and "refactoring". In Lucene this can be achieved by choosing an analyzer class which applies word stemming during the indexing as well as search process. Hibernate Search offers several ways to configure the analyzer to use (see Section 4.1.6, “Analyzer”):
  • Setting the hibernate.search.analyzer property in the configuration file. The specified class will then be the default analyzer.
  • Setting the @Analyzer annotation at the entity level.
  • Setting the @Analyzer annotation at the field level.
When using the @Analyzer annotation one can either specify the fully qualified classname of the analyzer to use or one can refer to an analyzer definition defined by the @AnalyzerDef annotation. In the latter case the Solr analyzer framework with its factories approach is utilized. To find out more about the factory classes available you can either browse the Solr JavaDoc or read the corresponding section on the Solr Wiki. Note that depending on the chosen factory class additional libraries on top of the Solr dependencies might be required. For example, the PhoneticFilterFactory depends on commons-codec.
In the example below a StandardTokenizerFactory is used followed by two filter factories, LowerCaseFilterFactory and SnowballPorterFilterFactory. The standard tokenizer splits words at punctuation characters and hyphens while keeping email addresses and internet hostnames intact. It is a good general purpose tokenizer. The lowercase filter lowercases the letters in each token whereas the snowball filter finally applies language specific stemming.
Generally, when using the Solr framework you have to start with a tokenizer followed by an arbitrary number of filters.

Example 1.10. Using @AnalyzerDef and the Solr framework to define and use an analyzer

package example;
...
@Entity
@Indexed
@AnalyzerDef(name = "customanalyzer", tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class), filters = { @TokenFilterDef(factory = LowerCaseFilterFactory.class), @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = { @Parameter(name = "language", value = "English") }) })
public class Book {

  @Id
  @GeneratedValue
  @DocumentId
  private Integer id;
  
  @Field(index=Index.TOKENIZED, store=Store.NO)
  @Analyzer(definition = "customanalyzer")
  private String title;
  
  @Field(index=Index.TOKENIZED, store=Store.NO)
  @Analyzer(definition = "customanalyzer")
  private String subtitle; 

  @IndexedEmbedded
  @ManyToMany 
  private Set<Author> authors = new HashSet<Author>();

  @Field(index = Index.UN_TOKENIZED, store = Store.YES)
  @DateBridge(resolution = Resolution.DAY)
  private Date publicationDate;
  
  public Book() {
  } 
  
  // standard getters/setters follow here
  ... 
}

Copy to Clipboard Toggle word wrap
返回顶部
Red Hat logoGithubredditYoutubeTwitter

学习

尝试、购买和销售

社区

关于红帽文档

通过我们的产品和服务,以及可以信赖的内容,帮助红帽用户创新并实现他们的目标。 了解我们当前的更新.

让开源更具包容性

红帽致力于替换我们的代码、文档和 Web 属性中存在问题的语言。欲了解更多详情,请参阅红帽博客.

關於紅帽

我们提供强化的解决方案,使企业能够更轻松地跨平台和环境(从核心数据中心到网络边缘)工作。

Theme

© 2025 Red Hat