org.apache.uima.conceptMapper
Class ConceptMapper

java.lang.Object
  extended by org.apache.uima.analysis_engine.annotator.Annotator_ImplBase
      extended by org.apache.uima.conceptMapper.ConceptMapper
All Implemented Interfaces:
org.apache.uima.analysis_engine.annotator.BaseAnnotator, org.apache.uima.analysis_engine.annotator.TextAnnotator

public class ConceptMapper
extends org.apache.uima.analysis_engine.annotator.Annotator_ImplBase
implements org.apache.uima.analysis_engine.annotator.TextAnnotator


Field Summary
protected  java.lang.String[] attributeNames
          Array of attribute names for the XML dictionary token element, obtained as a configuration parameter.
protected  java.lang.String[] featureNames
          Array of feature names, obtained as a configuration parameter.
protected  org.apache.uima.cas.Feature[] features
          Array of Feature objects associated with annotationType
static java.lang.String PARAM_ANNOTATION_NAME
          Configuration parameter key/label for the annotation name
static java.lang.String PARAM_ATTRIBUTE_LIST
          Configuration parameter key/label for the attribute list
static java.lang.String PARAM_DICT_FILE
          Configuration parameter key/label for the dictionary file to load
static java.lang.String PARAM_ENCLOSINGSPAN
          Configuration parameter key/label for the name of the feature that contains the resulting term's span, e.g. a sentence.
static java.lang.String PARAM_FEATURE_LIST
          Configuration parameter key/label for the feature list
static java.lang.String PARAM_FINDALLMATCHES
           
static java.lang.String PARAM_MATCHEDFEATURE
          Configuration parameter naming the feature in the resulting annotation that stores the text matched in a successful dictionary lookup
static java.lang.String PARAM_MATCHEDTOKENSFEATURENAME
          Configuration parameter for the name of the feature in result annotations that contains the list of matched tokens
static java.lang.String PARAM_ORDERINDEPENDENTLOOKUP
          Configuration parameter key/label to indicate if order-independent lookup is to be performed.
static java.lang.String PARAM_SEARCHSTRATEGY
          Configuration parameter to indicate the search strategy, e.g. LongestMatch: longest match of contiguous tokens within the enclosing span (taking into account included/excluded items).
static java.lang.String PARAM_TOKENANNOTATION
          Configuration parameter giving type of tokens
static java.lang.String PARAM_TOKENCLASSFEATURENAME
          Configuration parameter for name of token class feature of token annotations, to distinguish classes of tokens to skip during lookups.
static java.lang.String PARAM_TOKENCLASSWRITEBACKFEATURENAMES
          Array of features of the token annotation which should be written back to the token from the resulting entry.
static java.lang.String PARAM_TOKENTEXTFEATURENAME
          Configuration parameter specifying name of token's feature containing text.
static java.lang.String PARAM_TOKENTYPEFEATURENAME
          Configuration parameter for name of token type feature of token annotations, to distinguish types of tokens to skip during lookups.
static java.lang.String PARAMVALUE_CONTIGUOUSMATCH
           
static java.lang.String PARAMVALUE_SKIPANYMATCH
           
static java.lang.String PARAMVALUE_SKIPANYMATCHALLOWOVERLAP
           
protected  java.lang.String resultAnnotationName
          The name of the annotation type posted to the CAS by this TAE
protected  org.apache.uima.cas.Type resultAnnotationType
          The type of annotation posted to the CAS by this TAE
protected  org.apache.uima.cas.Type tokenType
          The type of token annotations to consider
 
Constructor Summary
ConceptMapper()
           
 
Method Summary
 void initialize(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext)
          Initialize the annotator, which includes compilation of regular expressions, fetching configuration parameters from XML descriptor file, and loading of the dictionary file.
protected  void makeAnnotation(org.apache.uima.cas.CAS tcas, int start, int end, EntryProperties properties, org.apache.uima.jcas.tcas.Annotation spanAnnotation, java.lang.String matchedText, java.util.Collection<org.apache.uima.cas.text.AnnotationFS> matched, Logger log)
           
 void process(org.apache.uima.cas.CAS tcas, org.apache.uima.analysis_engine.ResultSpecification aResultSpec)
          Perform the actual analysis.
protected  void processTokenList(int searchStrategy, boolean findAllMatches, org.apache.uima.cas.CAS tcas, java.util.ArrayList<org.apache.uima.cas.text.AnnotationFS> tokens, org.apache.uima.jcas.tcas.Annotation spanAnnotation)
           
 void typeSystemInit(org.apache.uima.cas.TypeSystem typeSystem)
          Perform local type system initialization.
 
Methods inherited from class org.apache.uima.analysis_engine.annotator.Annotator_ImplBase
destroy, finalize, getContext, getTypeSystem, reconfigure
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.uima.analysis_engine.annotator.BaseAnnotator
destroy, reconfigure
 

Field Detail

PARAM_DICT_FILE

public static final java.lang.String PARAM_DICT_FILE
Configuration parameter key/label for the dictionary file to load

See Also:
Constant Field Values

PARAM_TOKENCLASSFEATURENAME

public static final java.lang.String PARAM_TOKENCLASSFEATURENAME
Configuration parameter for name of token class feature of token annotations, to distinguish classes of tokens to skip during lookups. Token class features are Strings.

See Also:
Constant Field Values

PARAM_TOKENTYPEFEATURENAME

public static final java.lang.String PARAM_TOKENTYPEFEATURENAME
Configuration parameter for name of token type feature of token annotations, to distinguish types of tokens to skip during lookups. Token type features are Integers.

See Also:
Constant Field Values
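
As an illustration of how token class (String) and token type (Integer) features might be used to skip tokens during lookups, here is a minimal sketch. It is not ConceptMapper's implementation: the Token class, the lookupCandidates method, and the idea of passing excluded classes and types as sets are all hypothetical stand-ins introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class TokenSkipSketch {
    /** Hypothetical stand-in for a token annotation; not a UIMA type. */
    static final class Token {
        final String text;
        final String tokenClass;  // token class features are Strings
        final Integer tokenType;  // token type features are Integers

        Token(String text, String tokenClass, Integer tokenType) {
            this.text = text;
            this.tokenClass = tokenClass;
            this.tokenType = tokenType;
        }
    }

    /** Keeps only the tokens whose class and type are not in the excluded sets (assumed semantics). */
    static List<Token> lookupCandidates(List<Token> tokens,
                                        Set<String> excludedClasses,
                                        Set<Integer> excludedTypes) {
        List<Token> kept = new ArrayList<>();
        for (Token t : tokens) {
            // Null checks matter: immutable sets from Set.of reject contains(null).
            boolean skipByClass = t.tokenClass != null && excludedClasses.contains(t.tokenClass);
            boolean skipByType = t.tokenType != null && excludedTypes.contains(t.tokenType);
            if (!skipByClass && !skipByType) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                new Token("the", "stopword", 0),
                new Token("heart", "word", 0),
                new Token(",", "punct", 1),
                new Token("attack", "word", 0));
        // Skip the "stopword" class and token type 1 (both values are made up for this example).
        for (Token t : lookupCandidates(tokens, Set.of("stopword"), Set.of(1))) {
            System.out.println(t.text);
        }
    }
}
```

In this sketch a token is skipped if either its class or its type is excluded; the real annotator may combine the two features differently.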

PARAM_ANNOTATION_NAME

public static final java.lang.String PARAM_ANNOTATION_NAME
Configuration parameter key/label for the annotation name

See Also:
Constant Field Values

PARAM_ENCLOSINGSPAN

public static final java.lang.String PARAM_ENCLOSINGSPAN
Configuration parameter key/label for the name of the feature that contains the resulting term's span, e.g. a sentence.

See Also:
Constant Field Values

PARAM_MATCHEDFEATURE

public static final java.lang.String PARAM_MATCHEDFEATURE
Configuration parameter naming the feature in the resulting annotation that stores the text matched in a successful dictionary lookup.

See Also:
Constant Field Values

PARAM_ATTRIBUTE_LIST

public static final java.lang.String PARAM_ATTRIBUTE_LIST
Configuration parameter key/label for the attribute list

See Also:
Constant Field Values

PARAM_FEATURE_LIST

public static final java.lang.String PARAM_FEATURE_LIST
Configuration parameter key/label for the feature list

See Also:
Constant Field Values

PARAM_TOKENANNOTATION

public static final java.lang.String PARAM_TOKENANNOTATION
Configuration parameter giving type of tokens

See Also:
Constant Field Values

PARAM_TOKENTEXTFEATURENAME

public static final java.lang.String PARAM_TOKENTEXTFEATURENAME
Configuration parameter specifying name of token's feature containing text. If not specified, the token annotation's covered text is used.

See Also:
Constant Field Values
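
The fallback described above can be pictured with a small sketch. It models a token as its covered text plus a map of String features, which is an assumption made for this example only; lookupText and the "lemma" feature name are hypothetical and not part of the ConceptMapper API.

```java
import java.util.Map;

final class TokenTextSketch {
    /**
     * Returns the text to use for dictionary lookup: the value of the configured
     * text feature if a feature name was given, otherwise the token's covered text.
     */
    static String lookupText(String coveredText, Map<String, String> features, String textFeatureName) {
        if (textFeatureName == null) {
            return coveredText; // parameter not specified: use covered text
        }
        return features.get(textFeatureName);
    }

    public static void main(String[] args) {
        Map<String, String> features = Map.of("lemma", "heart");
        System.out.println(lookupText("Hearts", features, "lemma")); // prints "heart"
        System.out.println(lookupText("Hearts", features, null));    // prints "Hearts"
    }
}
```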

PARAM_TOKENCLASSWRITEBACKFEATURENAMES

public static final java.lang.String PARAM_TOKENCLASSWRITEBACKFEATURENAMES
Array of features of the token annotation which should be written back to the token from the resulting entry. For example, if a part of speech is specified as part of a dictionary entry, it could be written back to the token so that a subsequent POS tagger would be able to use it as a preannotated item.

See Also:
Constant Field Values

PARAM_MATCHEDTOKENSFEATURENAME

public static final java.lang.String PARAM_MATCHEDTOKENSFEATURENAME
Configuration parameter for the name of the feature in result annotations that contains the list of matched tokens.

See Also:
Constant Field Values

PARAM_ORDERINDEPENDENTLOOKUP

public static final java.lang.String PARAM_ORDERINDEPENDENTLOOKUP
Configuration parameter key/label to indicate if order-independent lookup is to be performed. If true, words in a phrase are sorted alphabetically before lookup. This implies that a phrase "C D A" would be considered equivalent to "A C D" and "D A C", etc.

See Also:
Constant Field Values
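
The alphabetical sorting described above can be sketched as follows. This is an illustration of the idea only, not ConceptMapper's code: the key method and the whitespace-based word splitting are assumptions made for the example.

```java
import java.util.Arrays;

final class OrderIndependentKeySketch {
    /** Builds a lookup key by sorting the words of a phrase alphabetically. */
    static String key(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        Arrays.sort(words); // natural String ordering
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // All three phrases map to the same key, so they are treated as equivalent.
        System.out.println(key("C D A")); // prints "A C D"
        System.out.println(key("A C D")); // prints "A C D"
        System.out.println(key("D A C")); // prints "A C D"
    }
}
```

If both the dictionary entries and the candidate phrases are normalized with the same key function, a plain map or set lookup is enough to make matching order-independent.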

PARAMVALUE_CONTIGUOUSMATCH

public static final java.lang.String PARAMVALUE_CONTIGUOUSMATCH
See Also:
Constant Field Values

PARAMVALUE_SKIPANYMATCH

public static final java.lang.String PARAMVALUE_SKIPANYMATCH
See Also:
Constant Field Values

PARAMVALUE_SKIPANYMATCHALLOWOVERLAP

public static final java.lang.String PARAMVALUE_SKIPANYMATCHALLOWOVERLAP
See Also:
Constant Field Values

PARAM_SEARCHSTRATEGY

public static final java.lang.String PARAM_SEARCHSTRATEGY
Configuration parameter to indicate the search strategy, either:
LongestMatch - longest match of contiguous tokens within the enclosing span (taking into account included/excluded items). DEFAULT strategy.
SkipAnyMatch - longest match of noncontiguous tokens within the enclosing span (taking into account included/excluded items). IMPLIES order-independent lookup.

See Also:
Constant Field Values
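
To make the contiguous longest-match strategy concrete, here is a minimal sketch over plain strings. It is not the annotator's implementation: the dictionary is modeled as a set of space-joined phrases, and longestContiguousMatches and maxLen are hypothetical names introduced for this example. The noncontiguous SkipAnyMatch strategy is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ContiguousMatchSketch {
    /**
     * Scans the tokens of one enclosing span left to right. At each position it
     * tries the longest contiguous window first and shrinks it until a dictionary
     * entry is found, then resumes after the matched tokens.
     */
    static List<String> longestContiguousMatches(List<String> tokens, Set<String> dictionary, int maxLen) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matchedLen = 0;
            int longest = Math.min(maxLen, tokens.size() - i);
            for (int len = longest; len >= 1; len--) {
                String candidate = String.join(" ", tokens.subList(i, i + len));
                if (dictionary.contains(candidate)) {
                    matches.add(candidate);
                    matchedLen = len;
                    break; // longest match at this position wins
                }
            }
            // Advance past the match, or by one token if nothing matched here.
            i += (matchedLen > 0) ? matchedLen : 1;
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("acute", "heart", "attack", "patient");
        Set<String> dictionary = Set.of("acute", "heart", "heart attack");
        // "heart attack" is preferred over the shorter entry "heart".
        System.out.println(longestContiguousMatches(tokens, dictionary, 3)); // prints "[acute, heart attack]"
    }
}
```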

PARAM_FINDALLMATCHES

public static final java.lang.String PARAM_FINDALLMATCHES
See Also:
Constant Field Values

resultAnnotationName

protected java.lang.String resultAnnotationName
The name of the annotation type posted to the CAS by this TAE


resultAnnotationType

protected org.apache.uima.cas.Type resultAnnotationType
The type of annotation posted to the CAS by this TAE


tokenType

protected org.apache.uima.cas.Type tokenType
The type of token annotations to consider


features

protected org.apache.uima.cas.Feature[] features
Array of Feature objects associated with annotationType


featureNames

protected java.lang.String[] featureNames
Array of feature names, obtained as a configuration parameter.


attributeNames

protected java.lang.String[] attributeNames
Array of attribute names for the XML dictionary token element, obtained as a configuration parameter.

Constructor Detail

ConceptMapper

public ConceptMapper()
Method Detail

initialize

public void initialize(org.apache.uima.analysis_engine.annotator.AnnotatorContext annotatorContext)
                throws org.apache.uima.analysis_engine.annotator.AnnotatorConfigurationException,
                       org.apache.uima.analysis_engine.annotator.AnnotatorInitializationException
Initialize the annotator, which includes compilation of regular expressions, fetching configuration parameters from XML descriptor file, and loading of the dictionary file.

Specified by:
initialize in interface org.apache.uima.analysis_engine.annotator.BaseAnnotator
Overrides:
initialize in class org.apache.uima.analysis_engine.annotator.Annotator_ImplBase
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorConfigurationException
org.apache.uima.analysis_engine.annotator.AnnotatorInitializationException

typeSystemInit

public void typeSystemInit(org.apache.uima.cas.TypeSystem typeSystem)
                    throws org.apache.uima.analysis_engine.annotator.AnnotatorConfigurationException,
                           org.apache.uima.analysis_engine.annotator.AnnotatorInitializationException
Perform local type system initialization.

Specified by:
typeSystemInit in interface org.apache.uima.analysis_engine.annotator.BaseAnnotator
Overrides:
typeSystemInit in class org.apache.uima.analysis_engine.annotator.Annotator_ImplBase
Parameters:
typeSystem - the current type system.
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorConfigurationException
org.apache.uima.analysis_engine.annotator.AnnotatorInitializationException
See Also:
BaseAnnotator.typeSystemInit(TypeSystem)

process

public void process(org.apache.uima.cas.CAS tcas,
                    org.apache.uima.analysis_engine.ResultSpecification aResultSpec)
             throws org.apache.uima.analysis_engine.annotator.AnnotatorProcessException
Perform the actual analysis. Iterate over the document content looking for any matching words or phrases in the loaded dictionary and post an annotation for each match found.

Specified by:
process in interface org.apache.uima.analysis_engine.annotator.TextAnnotator
Parameters:
tcas - the current CAS to process.
aResultSpec - a specification of the result annotation that should be created by this annotator
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorProcessException
See Also:
TextAnnotator.process(CAS,ResultSpecification)

processTokenList

protected void processTokenList(int searchStrategy,
                                boolean findAllMatches,
                                org.apache.uima.cas.CAS tcas,
                                java.util.ArrayList<org.apache.uima.cas.text.AnnotationFS> tokens,
                                org.apache.uima.jcas.tcas.Annotation spanAnnotation)
Parameters:
searchStrategy -
tcas -
tokens -
spanAnnotation -

makeAnnotation

protected void makeAnnotation(org.apache.uima.cas.CAS tcas,
                              int start,
                              int end,
                              EntryProperties properties,
                              org.apache.uima.jcas.tcas.Annotation spanAnnotation,
                              java.lang.String matchedText,
                              java.util.Collection<org.apache.uima.cas.text.AnnotationFS> matched,
                              Logger log)
Parameters:
start -
end -
properties -
matched -


Copyright © 2011. All Rights Reserved.