|
Class Summary |
| ArabicLetterTokenizerFactory |
|
| ArabicNormalizationFilterFactory |
|
| ArabicStemFilterFactory |
|
| ASCIIFoldingFilterFactory |
|
| BaseCharFilterFactory |
|
| BaseTokenFilterFactory |
Simple abstract implementation that handles init arg processing. |
| BaseTokenizerFactory |
Simple abstract implementation that handles init arg processing. |
| BrazilianStemFilterFactory |
|
| BufferedTokenStream |
Deprecated. This class does not support custom attributes. |
| CapitalizationFilterFactory |
A filter to apply normal capitalization rules to Tokens. |
| ChineseFilterFactory |
|
| ChineseTokenizerFactory |
|
| CJKTokenizerFactory |
|
| CollationKeyFilterFactory |
Factory for CollationKeyFilter. |
| CommonGramsFilter |
Construct bigrams for frequently occurring terms while indexing. |
| CommonGramsFilterFactory |
Constructs a CommonGramsFilter |
| CommonGramsQueryFilter |
Wraps a CommonGramsFilter, optimizing phrase queries by returning single
words only when they are not a member of a bigram. |
| CommonGramsQueryFilterFactory |
Constructs a CommonGramsQueryFilter.
The implementation is close to a straight copy of StopFilterFactory. |
| DelimitedPayloadTokenFilterFactory |
|
| DictionaryCompoundWordTokenFilterFactory |
|
| DoubleMetaphoneFilter |
|
| DoubleMetaphoneFilterFactory |
|
| DutchStemFilterFactory |
|
| EdgeNGramFilterFactory |
Creates new instances of EdgeNGramTokenFilter. |
| EdgeNGramTokenizerFactory |
Creates new instances of EdgeNGramTokenizer. |
| ElisionFilterFactory |
|
| EnglishPorterFilterFactory |
Deprecated. Use SnowballPorterFilterFactory with language="English" instead. |
| FrenchStemFilterFactory |
|
| GermanStemFilterFactory |
|
| GreekLowerCaseFilterFactory |
Factory for GreekLowerCaseFilter |
| HTMLStripCharFilter |
A CharFilter that wraps another Reader and attempts to strip out HTML constructs. |
| HTMLStripCharFilterFactory |
|
| HTMLStripReader |
Deprecated. Use HTMLStripCharFilter |
| HTMLStripWhitespaceTokenizerFactory |
Deprecated. Use HTMLStripCharFilterFactory and WhitespaceTokenizerFactory |
| HyphenatedWordsFilter |
When plain text is extracted from documents, many words are often hyphenated and broken across
two lines. |
| HyphenatedWordsFilterFactory |
Factory for HyphenatedWordsFilter |
| ISOLatin1AccentFilterFactory |
Factory for ISOLatin1AccentFilter. |
| KeepWordFilter |
A TokenFilter that only keeps tokens with text contained in the
required words. |
| KeepWordFilterFactory |
|
| KeywordTokenizerFactory |
|
| LengthFilterFactory |
|
| LetterTokenizerFactory |
|
| LowerCaseFilterFactory |
|
| LowerCaseTokenizerFactory |
|
| MappingCharFilterFactory |
|
| NGramFilterFactory |
Creates new instances of NGramTokenFilter. |
| NGramTokenizerFactory |
Creates new instances of NGramTokenizer. |
| NumericPayloadTokenFilterFactory |
|
| PatternReplaceCharFilter |
CharFilter that uses a regular expression for the target of replace string. |
| PatternReplaceCharFilterFactory |
|
| PatternReplaceFilter |
A TokenFilter which applies a Pattern to each token in the stream,
replacing match occurrences with the specified replacement string. |
| PatternReplaceFilterFactory |
|
| PatternTokenizer |
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. |
| PatternTokenizerFactory |
This tokenizer uses regex pattern matching to construct distinct tokens
for the input stream. |
| PersianNormalizationFilterFactory |
|
| PhoneticFilter |
Create tokens for phonetic matches. |
| PhoneticFilterFactory |
Creates tokens based on the phonetic encoders from Commons Codec:
http://jakarta.apache.org/commons/codec/api-release/org/apache/commons/codec/language/package-summary.html
Takes two arguments:
"encoder" (required): one of "DoubleMetaphone", "Metaphone", "Soundex", "RefinedSoundex".
"inject" (default=true): add tokens to the stream with the offset=0. |
| PorterStemFilterFactory |
|
| PositionFilterFactory |
Sets the positionIncrement of all tokens to the configured "positionIncrement" value, except the first returned token, which retains its
original positionIncrement value. |
| RemoveDuplicatesTokenFilter |
A TokenFilter which filters out Tokens at the same position and Term
text as the previous token in the stream. |
| RemoveDuplicatesTokenFilterFactory |
|
| ReversedWildcardFilter |
This class produces a special form of reversed tokens, suitable for
better handling of leading wildcards. |
| ReversedWildcardFilterFactory |
Factory for ReversedWildcardFilter instances. |
| ReverseStringFilterFactory |
A FilterFactory which reverses the input. |
| RussianLetterTokenizerFactory |
Deprecated. Use StandardTokenizerFactory instead. |
| RussianLowerCaseFilterFactory |
Deprecated. Use LowerCaseFilterFactory instead, which has the
same functionality. |
| RussianStemFilterFactory |
Deprecated. Use SnowballPorterFilterFactory with "Russian" instead,
which has the same functionality. |
| ShingleFilterFactory |
|
| SnowballPorterFilterFactory |
Factory for SnowballFilters, with configurable language.
Browsing the code, SnowballFilter uses reflection to adapt to Lucene... |
| SolrAnalyzer |
|
| SolrAnalyzer.TokenStreamInfo |
|
| StandardFilterFactory |
|
| StandardTokenizerFactory |
|
| StopFilterFactory |
|
| SynonymFilter |
SynonymFilter handles multi-token synonyms with variable position increment offsets. |
| SynonymFilterFactory |
|
| SynonymMap |
Mapping rules for use with SynonymFilter |
| ThaiWordFilterFactory |
|
| TokenizerChain |
|
| TokenOffsetPayloadTokenFilterFactory |
|
| TrimFilter |
Trims leading and trailing whitespace from Tokens in the stream. |
| TrimFilterFactory |
|
| TypeAsPayloadTokenFilterFactory |
|
| WhitespaceTokenizerFactory |
|
| WordDelimiterFilterFactory |
|
| WordDelimiterIterator |
A BreakIterator-like API for iterating over subwords in text, according to WordDelimiterFilter rules. |
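
The CommonGramsFilter entry above says only that it constructs bigrams for frequently occurring terms while indexing. The following is a minimal plain-Java sketch of that idea, not the Solr implementation: it works on a list of strings rather than a Lucene TokenStream, and the class name CommonGramsSketch, the method commonGrams, and the "_" separator are assumptions made for illustration. It keeps every original token and additionally emits a bigram whenever either word of an adjacent pair is in the common-word set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a plain-Java sketch of the common-grams idea,
// not the source of the Solr CommonGramsFilter class.
class CommonGramsSketch {

    // Assumed separator between the two words of a bigram.
    static final String SEPARATOR = "_";

    // Returns the original tokens, with a bigram inserted after each token
    // whenever that token or its successor is a common word.
    static List<String> commonGrams(List<String> tokens, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String current = tokens.get(i);
            out.add(current); // the unigram is always kept
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (commonWords.contains(current) || commonWords.contains(next)) {
                    out.add(current + SEPARATOR + next);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the", "of"));
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        // prints [the, the_quick, quick, brown, fox]
        System.out.println(commonGrams(tokens, common));
    }
}
```

Indexing the bigram alongside the unigrams lets a phrase query containing a very frequent word match on the much rarer bigram term. Per the summary above, CommonGramsQueryFilter handles the query-side counterpart by returning single words only when they are not part of a bigram.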