Este conteúdo não está disponível no idioma selecionado.

4.5.8. Available Analyzers


Apache Solr and Lucene ship with a number of default CharFilters, tokenizers, and filters. A complete list of CharFilter, tokenizer, and filter factories is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. The following tables provide some example CharFilters, tokenizers, and filters.
Expand
Table 4.1. Example of available CharFilters
Factory Description Parameters Additional dependencies
MappingCharFilterFactory Replaces one or more characters with one or more characters, based on mappings specified in the resource file
mapping: points to a resource file containing the mappings using the format:


                    "á" => "a"
                    "ñ" => "n"
                    "ø" => "o"

none
HTMLStripCharFilterFactory Remove HTML standard tags, keeping the text none none
Expand
Table 4.2. Example of available tokenizers
Factory Description Parameters Additional dependencies
StandardTokenizerFactory Use the Lucene StandardTokenizer none none
HTMLStripCharFilterFactory Remove HTML tags, keep the text and pass it to a StandardTokenizer. none solr-core
PatternTokenizerFactory Breaks text at the specified regular expression pattern.
pattern: the regular expression to use for tokenizing
group: says which pattern group to extract into tokens
solr-core
Expand
Table 4.3. Examples of available filters
Factory Description Parameters Additional dependencies
StandardFilterFactory Remove dots from acronyms and 's from words none solr-core
LowerCaseFilterFactory Lowercases all words none solr-core
StopFilterFactory Remove words (tokens) matching a list of stop words
words: points to a resource file containing the stop words
ignoreCase: true if case should be ignored when comparing stop words, false otherwise
solr-core
SnowballPorterFilterFactory Reduces a word to it's root in a given language. (example: protect, protects, protection share the same root). Using such a filter allows searches matching related words. language: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish and a few more solr-core
ISOLatin1AccentFilterFactory Remove accents for languages like French none solr-core
PhoneticFilterFactory Inserts phonetically similar tokens into the token stream
encoder: One of DoubleMetaphone, Metaphone, Soundex or RefinedSoundex
inject: true will add tokens to the stream, false will replace the existing token
maxCodeLength: sets the maximum length of the code to be generated. Supported only for Metaphone and DoubleMetaphone encodings
solr-core and commons-codec
CollationKeyFilterFactory Converts each token into its java.text.CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. custom, language, country, variant, strength, decompositionsee Lucene's CollationKeyFilter javadocs for more info solr-core and commons-io
It is recommended that all implementations of org.apache.solr.analysis.TokenizerFactory and org.apache.solr.analysis.TokenFilterFactory are checked in your IDE to see available implementations.
Red Hat logoGithubredditYoutubeTwitter

Aprender

Experimente, compre e venda

Comunidades

Sobre a documentação da Red Hat

Ajudamos os usuários da Red Hat a inovar e atingir seus objetivos com nossos produtos e serviços com conteúdo em que podem confiar. Explore nossas atualizações recentes.

Tornando o open source mais inclusivo

A Red Hat está comprometida em substituir a linguagem problemática em nosso código, documentação e propriedades da web. Para mais detalhes veja o Blog da Red Hat.

Sobre a Red Hat

Fornecemos soluções robustas que facilitam o trabalho das empresas em plataformas e ambientes, desde o data center principal até a borda da rede.

Theme

© 2026 Red Hat
Voltar ao topo