org.apache.uima.examples.tagger.trainAndTest
Class ModelGeneration

java.lang.Object
  extended by org.apache.uima.examples.tagger.trainAndTest.ModelGeneration
All Implemented Interfaces:
java.io.Serializable

public class ModelGeneration
extends java.lang.Object
implements java.io.Serializable

Trains an N-gram model for the tagger by iterating over the files in a predefined training directory, and writes the resulting model to a binary file. Note: at present, both bigram and trigram statistics are saved in a single model file.
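The following is a minimal sketch of the counting step. It is not the ModelGeneration implementation. It assumes the corpus has already been reduced to its sequence of tags, and it uses space-joined strings as keys in place of the NGram class. The class name NGramCountSketch and its methods count, save and load are hypothetical. Because the model object is Serializable, standard Java serialization can write both the bigram and the trigram tables to one binary file.

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch: count tag bigrams and trigrams, then serialize both into one file. */
public class NGramCountSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts keyed by space-joined tag sequences, e.g. "DT NN" or "DT JJ NN".
    final Map<String, Integer> bigramCounts = new HashMap<>();
    final Map<String, Integer> trigramCounts = new HashMap<>();

    /** tags: the tag sequence of the training corpus, in corpus order. */
    void count(List<String> tags) {
        for (int i = 1; i < tags.size(); i++) {
            bigramCounts.merge(tags.get(i - 1) + " " + tags.get(i), 1, Integer::sum);
            if (i >= 2) {
                String trigram = tags.get(i - 2) + " " + tags.get(i - 1) + " " + tags.get(i);
                trigramCounts.merge(trigram, 1, Integer::sum);
            }
        }
    }

    /** Writes both count tables to a single binary file. */
    void save(File out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(out))) {
            oos.writeObject(this);
        }
    }

    /** Reads a model previously written by save(). */
    static NGramCountSketch load(File in) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(in))) {
            return (NGramCountSketch) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        NGramCountSketch model = new NGramCountSketch();
        model.count(Arrays.asList("DT", "NN", "VBZ", "DT", "NN"));
        File f = File.createTempFile("model", ".bin");
        f.deleteOnExit();
        model.save(f);
        NGramCountSketch restored = load(f);
        System.out.println(restored.bigramCounts.get("DT NN")); // 2
        System.out.println(restored.trigramCounts.size());      // 3
    }
}
```

Keeping both tables in one serialized object is what allows a single-file model: loading the file restores the bigram and trigram maps together.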

See Also:
Serialized Form

Field Summary
 double[] lambdas2
           
 double[] lambdas3
           
 java.util.Map suffix_tree
           
 java.util.Map suffix_tree_capitalized
           
 double theta
           
 java.util.Map<NGram,java.lang.Double> transition_probs
          Map containing N-gram probabilities
 java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.Double>> word_probs
          Map containing <word,tag> probabilities, that is, the probability of a word given its tag at time t: P(word_t | tag_t).
 
Constructor Summary
ModelGeneration(java.util.List<Token> corpus, java.lang.String OutputFile)
           
 
Method Summary
static boolean capitalized(java.lang.String word)
          Checks whether the token is capitalized.
 void init()
           
static void main(java.lang.String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

suffix_tree

public java.util.Map suffix_tree

suffix_tree_capitalized

public java.util.Map suffix_tree_capitalized

word_probs

public java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.Double>> word_probs
Map containing <word,tag> probabilities, that is, the probability of a word given its tag at time t: P(word_t | tag_t).
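A minimal sketch of how such a table could be filled, using the maximum-likelihood estimate P(w | t) = C(w, t) / C(t). The Javadoc only says the map holds <word,tag> probabilities, so this sketch assumes the outer key is the word and the inner key is the tag. The class EmissionSketch and its method estimate are hypothetical, and no smoothing is applied.

```java
import java.util.*;

/** Hypothetical sketch: unsmoothed MLE of P(word | tag), stored as word -> (tag -> probability). */
public class EmissionSketch {

    /** words and tags are parallel lists of equal length (token i has tag i). */
    static Map<String, Map<String, Double>> estimate(List<String> words, List<String> tags) {
        Map<String, Integer> tagCounts = new HashMap<>();
        Map<String, Map<String, Integer>> pairCounts = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            String t = tags.get(i);
            tagCounts.merge(t, 1, Integer::sum);
            pairCounts.computeIfAbsent(w, k -> new HashMap<>()).merge(t, 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probs = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> byWord : pairCounts.entrySet()) {
            Map<String, Double> byTag = new HashMap<>();
            for (Map.Entry<String, Integer> pair : byWord.getValue().entrySet()) {
                // P(w | t) = C(w, t) / C(t)
                double tagTotal = tagCounts.get(pair.getKey()).doubleValue();
                byTag.put(pair.getKey(), pair.getValue() / tagTotal);
            }
            probs.put(byWord.getKey(), byTag);
        }
        return probs;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dog", "runs", "the", "cat");
        List<String> tags = Arrays.asList("DT", "NN", "VBZ", "DT", "NN");
        Map<String, Map<String, Double>> p = estimate(words, tags);
        System.out.println(p.get("dog").get("NN")); // 0.5
        System.out.println(p.get("the").get("DT")); // 1.0
    }
}
```

Note that the probabilities sum to 1 over words for a fixed tag, not over tags for a fixed word, because the denominator is the tag count.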


transition_probs

public java.util.Map<NGram,java.lang.Double> transition_probs
Map containing N-gram probabilities
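As an illustration of what an N-gram probability table can hold, the following sketch computes unsmoothed trigram transition probabilities P(t3 | t1, t2) = C(t1, t2, t3) / C(t1, t2). It is a hypothetical stand-in. It keys the map by space-joined tag strings rather than by the NGram class, whose API is not shown here, and the names TransitionSketch and trigramProbs are invented for the example.

```java
import java.util.*;

/** Hypothetical sketch: MLE trigram transition probabilities keyed by "t1 t2 t3". */
public class TransitionSketch {

    static Map<String, Double> trigramProbs(List<String> tags) {
        Map<String, Integer> historyCounts = new HashMap<>();
        Map<String, Integer> trigramCounts = new HashMap<>();
        // Count a history (t1 t2) only when a third tag follows it,
        // so that P(. | t1, t2) sums to 1 for every observed history.
        for (int i = 2; i < tags.size(); i++) {
            String history = tags.get(i - 2) + " " + tags.get(i - 1);
            historyCounts.merge(history, 1, Integer::sum);
            trigramCounts.merge(history + " " + tags.get(i), 1, Integer::sum);
        }
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : trigramCounts.entrySet()) {
            String key = e.getKey();
            String history = key.substring(0, key.lastIndexOf(' '));
            probs.put(key, e.getValue() / historyCounts.get(history).doubleValue());
        }
        return probs;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("DT", "NN", "VBZ", "DT", "NN", "VBD");
        Map<String, Double> p = trigramProbs(tags);
        System.out.println(p.get("DT NN VBZ")); // 0.5
        System.out.println(p.get("NN VBZ DT")); // 1.0
    }
}
```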


lambdas2

public double[] lambdas2

lambdas3

public double[] lambdas3

theta

public double theta
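The fields lambdas2, lambdas3 and theta are not documented on this page. In TnT-style HMM taggers, weights of this kind are typically used for linear interpolation of lower- and higher-order estimates, for example P(t3 | t1, t2) ≈ λ1·P(t3) + λ2·P(t3 | t2) + λ3·P(t3 | t1, t2). The sketch below assumes that reading: lambdas2 weights a unigram/bigram mix and lambdas3 weights a unigram/bigram/trigram mix. It does not model theta. The class InterpolationSketch and its method are hypothetical.

```java
/** Hypothetical sketch: linear interpolation of N-gram estimates with fixed weights. */
public class InterpolationSketch {

    /**
     * Returns sum_i lambdas[i] * estimates[i].
     * Assumed order: estimates[0] = unigram, [1] = bigram, [2] = trigram (if present).
     */
    static double interpolate(double[] lambdas, double... estimates) {
        if (lambdas.length != estimates.length) {
            throw new IllegalArgumentException("need one weight per estimate");
        }
        double p = 0.0;
        for (int i = 0; i < lambdas.length; i++) {
            p += lambdas[i] * estimates[i];
        }
        return p;
    }

    public static void main(String[] args) {
        double[] lambdas3 = {0.1, 0.3, 0.6}; // illustrative weights that sum to 1
        double[] lambdas2 = {0.4, 0.6};
        // An unseen trigram (estimate 0.0) still gets probability mass from lower orders.
        System.out.println(interpolate(lambdas3, 0.2, 0.5, 0.0));
        System.out.println(interpolate(lambdas2, 0.5, 1.0));
    }
}
```

If the weights sum to 1 and each estimate is a probability distribution over t3, the interpolated value is also a valid distribution. That is why interpolation is a common way to avoid zero probabilities for unseen trigrams.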
Constructor Detail

ModelGeneration

public ModelGeneration(java.util.List<Token> corpus,
                       java.lang.String OutputFile)
Method Detail

init

public void init()

capitalized

public static boolean capitalized(java.lang.String word)
Checks whether the token is capitalized.
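This page does not state what counts as "capitalized". The following is a minimal sketch that assumes the criterion is an uppercase first character, checked by code point so that letters outside the Basic Multilingual Plane are handled. CapitalizedSketch is a hypothetical name, and the real method may use a different rule.

```java
/** Hypothetical sketch: a token is "capitalized" if its first code point is uppercase. */
public class CapitalizedSketch {

    static boolean capitalized(String word) {
        if (word == null || word.isEmpty()) {
            return false; // nothing to inspect
        }
        return Character.isUpperCase(word.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(capitalized("Berlin"));     // true
        System.out.println(capitalized("berlin"));     // false
        System.out.println(capitalized("Österreich")); // true
        System.out.println(capitalized(""));           // false
    }
}
```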


main

public static void main(java.lang.String[] args)


Copyright © 2011. All Rights Reserved.