org.apache.uima.conceptMapper.support.tokens
Class TokenNormalizer

java.lang.Object
  extended by org.apache.uima.conceptMapper.support.tokens.TokenNormalizer

public class TokenNormalizer
extends java.lang.Object


Field Summary
static java.lang.String PARAM_CASE_MATCH
          Configuration parameter key/label for the case matching string.
static java.lang.String PARAM_STEMMER_CLASS
          Configuration parameter key/label for the stemmer class spec.
static java.lang.String PARAM_STEMMER_DICT
          Configuration parameter key/label for the stemmer dictionary, which is passed into the stemmer's initialization method.
 
Constructor Summary
TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext, Logger logger)
           
 
Method Summary
 java.lang.String foldCase(java.lang.String token)
          If one of the case folding flags is true and the input string matches the character pattern corresponding to that flag, then convert all letters to lowercase.
 Stemmer getStemmer()
           
 boolean isCaseFoldAll()
           
 boolean isCaseFoldDigit()
           
 boolean isCaseFoldInitCap()
           
 java.lang.String normalize(java.lang.String token)
           
 void setCaseFoldAll(boolean caseFoldAll)
           
 void setCaseFoldDigit(boolean caseFoldDigit)
           
 void setCaseFoldInitCap(boolean caseFoldInitCap)
           
 void setStemmer(Stemmer stemmer)
           
 boolean shouldFoldCase(java.lang.String token)
           
 boolean shouldStem()
           
 java.lang.String stem(java.lang.String token)
          If the stemming flag is true, then return the stemmed form of the supplied word using the configured stemmer.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PARAM_CASE_MATCH

public static final java.lang.String PARAM_CASE_MATCH
Configuration parameter key/label for the case matching string.

See Also:
Constant Field Values

PARAM_STEMMER_CLASS

public static final java.lang.String PARAM_STEMMER_CLASS
Configuration parameter key/label for the stemmer class spec. If left out, no stemmer is used.

See Also:
Constant Field Values

PARAM_STEMMER_DICT

public static final java.lang.String PARAM_STEMMER_DICT
Configuration parameter key/label for the stemmer dictionary, which is passed into the stemmer's initialization method.

See Also:
Constant Field Values
Constructor Detail

TokenNormalizer

public TokenNormalizer(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext,
                       Logger logger)
                throws org.apache.uima.analysis_engine.annotator.AnnotatorContextException
Parameters:
annotatorContext - the annotator context from which the configuration parameters are read
logger - the logger to use
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorContextException
Method Detail

getStemmer

public Stemmer getStemmer()
Returns:
The stemmer.

setStemmer

public void setStemmer(Stemmer stemmer)
Parameters:
stemmer - The stemmer to set.

shouldStem

public boolean shouldStem()

isCaseFoldAll

public boolean isCaseFoldAll()
Returns:
The value of the caseFoldAll flag.

setCaseFoldAll

public void setCaseFoldAll(boolean caseFoldAll)
Parameters:
caseFoldAll - The new value of the caseFoldAll flag.

isCaseFoldDigit

public boolean isCaseFoldDigit()
Returns:
The value of the caseFoldDigit flag.

setCaseFoldDigit

public void setCaseFoldDigit(boolean caseFoldDigit)
Parameters:
caseFoldDigit - The new value of the caseFoldDigit flag.

isCaseFoldInitCap

public boolean isCaseFoldInitCap()
Returns:
The value of the caseFoldInitCap flag.

setCaseFoldInitCap

public void setCaseFoldInitCap(boolean caseFoldInitCap)
Parameters:
caseFoldInitCap - The new value of the caseFoldInitCap flag.

shouldFoldCase

public boolean shouldFoldCase(java.lang.String token)

foldCase

public java.lang.String foldCase(java.lang.String token)
If one of the case folding flags is true and the input string matches the character pattern corresponding to that flag, then convert all letters to lowercase.

Parameters:
token - The string to case fold
Returns:
The case folded string
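
Example (illustrative sketch only):
The flag-driven behavior described for foldCase can be approximated with a small standalone class. FoldCaseSketch is a hypothetical stand-in and not the ConceptMapper source. The character patterns are assumptions, since this page does not define them: the digit flag is taken to match tokens containing at least one digit, and the initial-cap flag is taken to match tokens whose first character is uppercase and whose remaining characters contain no uppercase letters.

```java
import java.util.Locale;

// Hypothetical stand-in for illustration; not the actual TokenNormalizer code.
final class FoldCaseSketch {
    boolean caseFoldAll;
    boolean caseFoldDigit;
    boolean caseFoldInitCap;

    // ASSUMPTION: the "digit" pattern means the token contains at least one digit.
    static boolean containsDigit(String token) {
        return token.chars().anyMatch(Character::isDigit);
    }

    // ASSUMPTION: the "initial cap" pattern means the first character is
    // uppercase and no later character is uppercase.
    static boolean isInitCap(String token) {
        if (token.isEmpty() || !Character.isUpperCase(token.charAt(0))) {
            return false;
        }
        return token.chars().skip(1).noneMatch(Character::isUpperCase);
    }

    // True if any enabled flag's pattern matches the token.
    boolean shouldFoldCase(String token) {
        return caseFoldAll
                || (caseFoldDigit && containsDigit(token))
                || (caseFoldInitCap && isInitCap(token));
    }

    // Lowercase the token when a flag applies; otherwise return it unchanged.
    String foldCase(String token) {
        return shouldFoldCase(token) ? token.toLowerCase(Locale.ROOT) : token;
    }

    public static void main(String[] args) {
        FoldCaseSketch n = new FoldCaseSketch();
        n.caseFoldInitCap = true;
        System.out.println(n.foldCase("Protein")); // initial-cap pattern matches, token is lowercased
        System.out.println(n.foldCase("DNA"));     // no enabled pattern matches, token is unchanged
    }
}
```

In this sketch, a token that matches none of the enabled flags is returned as-is, which mirrors the conditional wording of the description above.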

stem

public java.lang.String stem(java.lang.String token)
If the stemming flag is true, then return the stemmed form of the supplied word using the configured stemmer.

Parameters:
token - the word to stem
Returns:
the original word if the stemming flag is false, otherwise the stemmed form of the word
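
Example (illustrative sketch only):
The pass-through contract of stem can be shown with a standalone class. StemSketch and its nested SimpleStemmer interface are hypothetical stand-ins, not the ConceptMapper types. The sketch assumes that the stemming flag is simply "a stemmer has been set", which is consistent with the note that no stemmer is used when the stemmer class parameter is left out, but is not confirmed by this page.

```java
// Hypothetical stand-in for illustration; not the actual TokenNormalizer code.
final class StemSketch {
    // Hypothetical single-method stand-in for the Stemmer type referenced on this page.
    interface SimpleStemmer {
        String stem(String token);
    }

    private SimpleStemmer stemmer; // null means no stemmer has been configured

    void setStemmer(SimpleStemmer stemmer) {
        this.stemmer = stemmer;
    }

    // ASSUMPTION: the stemming flag is true exactly when a stemmer is present.
    boolean shouldStem() {
        return stemmer != null;
    }

    // Return the stemmed form when stemming is enabled, else the original word.
    String stem(String token) {
        return shouldStem() ? stemmer.stem(token) : token;
    }

    public static void main(String[] args) {
        StemSketch n = new StemSketch();
        System.out.println(n.stem("genes")); // no stemmer set, word is returned unchanged

        // Toy stemmer that strips one trailing "s"; a placeholder, not a real stemming algorithm.
        n.setStemmer(t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
        System.out.println(n.stem("genes")); // toy stemmer removes the trailing "s"
    }
}
```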

normalize

public java.lang.String normalize(java.lang.String token)


Copyright © 2011. All Rights Reserved.