4.5.8. Available Analyzers

Apache Solr and Lucene ship with a number of default CharFilters, tokenizers, and filters. A complete list of CharFilter, tokenizer, and filter factories is available at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters. The following tables provide some example CharFilters, tokenizers, and filters.

Table 4.1. Example of available CharFilters
Factory	Description	Parameters	Additional dependencies
`MappingCharFilterFactory`	Replaces one or more characters with one or more characters, based on mappings specified in the resource file	`mapping`: points to a resource file containing the mappings, using the format `"á" => "a"`, `"ñ" => "n"`, `"ø" => "o"`	none
`HTMLStripCharFilterFactory`	Removes standard HTML tags, keeping the text	none	none
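
As a rough illustration of what a character mapping does to the input before tokenization, the following plain-Java sketch applies the three example mappings from Table 4.1. The class name `CharMappingSketch`, the method `applyMappings`, and the in-memory map are assumptions made for this example. The sketch replaces sequences one mapping at a time and is not how `MappingCharFilterFactory` is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: mimics the effect of a character mapping.
// It is not the Lucene/Solr implementation.
final class CharMappingSketch {

    // Hypothetical in-memory stand-in for the mapping resource file.
    // Unicode escapes keep the source independent of the file encoding.
    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put("\u00e1", "a"); // "á" => "a"
        MAPPINGS.put("\u00f1", "n"); // "ñ" => "n"
        MAPPINGS.put("\u00f8", "o"); // "ø" => "o"
    }

    // Replaces every occurrence of each source sequence with its target.
    static String applyMappings(String input) {
        String result = input;
        for (Map.Entry<String, String> entry : MAPPINGS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Input is "señor Søren", written with escapes.
        System.out.println(applyMappings("se\u00f1or S\u00f8ren"));
    }
}
```

Because the mapping runs before the tokenizer, a search for "senor" can then match text that was indexed as "señor".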

Table 4.2. Example of available tokenizers
Factory	Description	Parameters	Additional dependencies
`StandardTokenizerFactory`	Uses the Lucene StandardTokenizer	none	none
`HTMLStripCharFilterFactory`	Removes HTML tags, keeps the text, and passes it to a StandardTokenizer	none	`solr-core`
`PatternTokenizerFactory`	Breaks text at the specified regular expression pattern	`pattern`: the regular expression to use for tokenizing; `group`: specifies which pattern group to extract into tokens	`solr-core`
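
The interplay of `pattern` and `group` can be sketched in plain Java with `java.util.regex`. The class `PatternTokenizerSketch` and its `tokenize` method are hypothetical names for this example. The sketch assumes the Solr convention that a negative `group` splits the text at each match, while a non-negative `group` emits that capture group of each match as a token. It is an approximation, not the `PatternTokenizerFactory` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only, not the Solr PatternTokenizer implementation.
final class PatternTokenizerSketch {

    // group < 0: split the text at every match of the pattern.
    // group >= 0: emit the given capture group of every match.
    static List<String> tokenize(String text, String pattern, int group) {
        List<String> tokens = new ArrayList<>();
        Pattern compiled = Pattern.compile(pattern);
        if (group < 0) {
            for (String part : compiled.split(text)) {
                if (!part.isEmpty()) {
                    tokens.add(part);
                }
            }
        } else {
            Matcher matcher = compiled.matcher(text);
            while (matcher.find()) {
                String token = matcher.group(group);
                if (token != null && !token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Split a comma-separated value on the separator.
        System.out.println(tokenize("a, b,c", "\\s*,\\s*", -1));
        // Extract only the digits that follow "key=".
        System.out.println(tokenize("key=1 key=22", "key=(\\d+)", 1));
    }
}
```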

Table 4.3. Examples of available filters
Factory	Description	Parameters	Additional dependencies
`StandardFilterFactory`	Removes dots from acronyms and 's from words	none	`solr-core`
`LowerCaseFilterFactory`	Lowercases all words	none	`solr-core`
`StopFilterFactory`	Removes words (tokens) matching a list of stop words	`words`: points to a resource file containing the stop words; `ignoreCase`: `true` if case should be ignored when comparing stop words, `false` otherwise	`solr-core`
`SnowballPorterFilterFactory`	Reduces a word to its root in a given language (for example, protect, protects, and protection share the same root). Using such a filter allows searches to match related words.	`language`: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, Swedish and a few more	`solr-core`
`ISOLatin1AccentFilterFactory`	Removes accents for languages like French	none	`solr-core`
`PhoneticFilterFactory`	Inserts phonetically similar tokens into the token stream	`encoder`: one of DoubleMetaphone, Metaphone, Soundex or RefinedSoundex; `inject`: `true` adds tokens to the stream, `false` replaces the existing token; `maxCodeLength`: sets the maximum length of the code to be generated (supported only for Metaphone and DoubleMetaphone encodings)	`solr-core` and `commons-codec`
`CollationKeyFilterFactory`	Converts each token into its `java.text.CollationKey`, and then encodes the `CollationKey` with `IndexableBinaryStringTools`, to allow it to be stored as an index term.	`custom`, `language`, `country`, `variant`, `strength`, `decomposition`; see Lucene's `CollationKeyFilter` javadocs for more information	`solr-core` and `commons-io`
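
Filters are applied to the tokens one after another. The plain-Java sketch below imitates two of the filters from Table 4.3, lowercasing and stop-word removal, to show how the `ignoreCase` parameter changes the outcome. The class `FilterChainSketch`, its methods, and the in-memory stop-word set are assumptions for illustration. The real factories operate on Lucene token streams and read stop words from a resource file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only, not the Lucene/Solr filter implementations.
final class FilterChainSketch {

    // Imitates the effect of a lowercase filter on a list of tokens.
    static List<String> lowerCase(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    // Imitates the effect of a stop filter. With ignoreCase set, both the
    // stop words and the tokens are compared in lowercase form.
    static List<String> removeStopWords(List<String> tokens,
                                        Set<String> stopWords,
                                        boolean ignoreCase) {
        Set<String> lookup = new HashSet<>();
        for (String word : stopWords) {
            lookup.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
            if (!lookup.contains(key)) {
                kept.add(token); // keep the token in its original form
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Quick", "Fox");
        Set<String> stopWords = new HashSet<>(Arrays.asList("the"));
        // Lowercase first, then drop stop words, as a filter chain would.
        System.out.println(removeStopWords(lowerCase(tokens), stopWords, false));
    }
}
```

The order of the filters matters: if the lowercase step runs first, a case-sensitive stop list written in lowercase is sufficient.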

To see everything that is available, browse the implementations of `org.apache.solr.analysis.TokenizerFactory` and `org.apache.solr.analysis.TokenFilterFactory` in your IDE.