4.5.2. Named Analyzers
The Query Module uses analyzer definitions to deal with the complexity of the Analyzer function. Analyzer definitions are reusable by multiple
@Analyzer
declarations and includes the following:
- a name: the unique string used to refer to the definition.
- a list of
CharFilters
: eachCharFilter
is responsible to pre-process input characters before the tokenization.CharFilters
can add, change, or remove characters. One common usage is for character normalization. - a
Tokenizer
: responsible for tokenizing the input stream into individual words. - a list of filters: each filter is responsible to remove, modify, or sometimes add words into the stream provided by the
Tokenizer
.
The
Analyzer
separates these components into multiple tasks, allowing individual components to be reused and components to be built with flexibility using the following procedure:
Procedure 4.1. The Analyzer Process
- The
CharFilters
process the character input. Tokenizer
converts the character input into tokens.- The tokens are the processed by the
TokenFilters
.
The Lucene-based Query API supports this infrastructure by utilizing the Solr analyzer framework.