4.5.2. Named Analyzers


The Query Module uses analyzer definitions to deal with the complexity of the Analyzer function. Analyzer definitions are reusable by multiple @Analyzer declarations and includes the following:
  • a name: the unique string used to refer to the definition.
  • a list of CharFilters: each CharFilter is responsible to pre-process input characters before the tokenization. CharFilters can add, change, or remove characters. One common usage is for character normalization.
  • a Tokenizer: responsible for tokenizing the input stream into individual words.
  • a list of filters: each filter is responsible to remove, modify, or sometimes add words into the stream provided by the Tokenizer.
The Analyzer separates these components into multiple tasks, allowing individual components to be reused and components to be built with flexibility using the following procedure:

Procedure 4.1. The Analyzer Process

  1. The CharFilters process the character input.
  2. Tokenizer converts the character input into tokens.
  3. The tokens are the processed by the TokenFilters.
The Lucene-based Query API supports this infrastructure by utilizing the Solr analyzer framework.
Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.