org.apache.uima.examples.tagger.trainAndTest
Class ModelGeneration
java.lang.Object
org.apache.uima.examples.tagger.trainAndTest.ModelGeneration
- All Implemented Interfaces:
- java.io.Serializable
public class ModelGeneration
- extends java.lang.Object
- implements java.io.Serializable
Trains an N-gram model for the tagger, iterating over the files from some predefined training directory.
Writes the resulting model to a binary file.
NB: At the moment, both bigram and trigram statistics are saved in one model file.
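The description above can be illustrated with a minimal sketch that counts tag bigrams and trigrams over a corpus and stores both sets of statistics in a single binary file. It assumes the corpus has already been reduced to a flat list of tags. It uses standard Java object serialization, which fits the fact that this class implements java.io.Serializable. The class TagNGramCounts, its fields, and the space-joined key format are hypothetical and are not part of this API.

```java
import java.io.*;
import java.util.*;

// Hypothetical container: bigram and trigram tag counts kept in ONE serializable object,
// so that both end up in a single model file.
class TagNGramCounts implements Serializable {
    private static final long serialVersionUID = 1L;

    // Keys are tags joined by a single space, e.g. "DT NN" or "DT JJ NN" (an assumption of this sketch).
    final Map<String, Integer> bigrams = new HashMap<>();
    final Map<String, Integer> trigrams = new HashMap<>();

    // Counts every tag bigram and tag trigram in the sequence.
    static TagNGramCounts train(List<String> tags) {
        TagNGramCounts counts = new TagNGramCounts();
        for (int i = 0; i + 1 < tags.size(); i++) {
            counts.bigrams.merge(tags.get(i) + " " + tags.get(i + 1), 1, Integer::sum);
            if (i + 2 < tags.size()) {
                String key = tags.get(i) + " " + tags.get(i + 1) + " " + tags.get(i + 2);
                counts.trigrams.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Writes the whole object (both maps) to one binary file.
    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    // Reads a previously saved object back from the file.
    static TagNGramCounts load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TagNGramCounts) in.readObject();
        }
    }
}
```

Because both maps live in one object, a single writeObject call produces one file that holds the bigram and trigram statistics together.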
- See Also:
- Serialized Form
Field Summary
| Type | Field and Description |
|---|---|
| double[] | lambdas2 |
| double[] | lambdas3 |
| java.util.Map | suffix_tree |
| java.util.Map | suffix_tree_capitalized |
| double | theta |
| java.util.Map<NGram,java.lang.Double> | transition_probs: Map containing N-gram probabilities |
| java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.Double>> | word_probs: Map containing <word,tag> probabilities, that is, the probability of a certain word given a certain tag at time t: $P(\text{word}_t \mid \text{tag}_t)$ |
Method Summary
| Type | Method and Description |
|---|---|
| static boolean | capitalized(java.lang.String word): Checks whether the token is capitalized |
| void | init() |
| static void | main(java.lang.String[] args) |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
suffix_tree
public java.util.Map suffix_tree
suffix_tree_capitalized
public java.util.Map suffix_tree_capitalized
word_probs
public java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.Double>> word_probs
- Map containing <word,tag> probabilities, that is, the probability of a certain word given a certain tag at time t: $P(\text{word}_t \mid \text{tag}_t)$.
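The word_probs field holds emission probabilities $P(\text{word}_t \mid \text{tag}_t)$. One common way to estimate them is maximum likelihood: count how often each word occurs with each tag and divide by the total count of that tag. The sketch below assumes that estimator. It also assumes that the outer key of the nested map is the word and the inner key is the tag; the field comment says "<word,tag>" but does not spell out the nesting. The class EmissionEstimator and its method are hypothetical.

```java
import java.util.*;

class EmissionEstimator {
    // Estimates P(word | tag) = count(word, tag) / count(tag) from parallel word and tag lists.
    // Result shape mirrors word_probs: word -> (tag -> probability). The nesting order is an assumption.
    static Map<String, Map<String, Double>> estimate(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        Map<String, Integer> tagCounts = new HashMap<>();
        Map<String, Map<String, Integer>> pairCounts = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i), t = tags.get(i);
            tagCounts.merge(t, 1, Integer::sum);
            pairCounts.computeIfAbsent(w, k -> new HashMap<>()).merge(t, 1, Integer::sum);
        }
        Map<String, Map<String, Double>> probs = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> byWord : pairCounts.entrySet()) {
            Map<String, Double> byTag = new HashMap<>();
            for (Map.Entry<String, Integer> e : byWord.getValue().entrySet()) {
                // Divide by how often the TAG occurs, not the word: this yields P(word | tag).
                byTag.put(e.getKey(), e.getValue() / (double) tagCounts.get(e.getKey()));
            }
            probs.put(byWord.getKey(), byTag);
        }
        return probs;
    }
}
```

Note that the probabilities sum to 1 over all words for a fixed tag, not over all tags for a fixed word.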
transition_probs
public java.util.Map<NGram,java.lang.Double> transition_probs
- Map containing N-gram probabilities
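The transition_probs field maps an N-gram to its probability. For tag trigrams, a standard maximum-likelihood estimate is $P(t_3 \mid t_1, t_2) = C(t_1, t_2, t_3) / C(t_1, t_2)$. The sketch below assumes that estimator. Because the NGram class is not documented on this page, it uses a List<String> of tags as a stand-in key; the class TrigramTransitions is hypothetical.

```java
import java.util.*;

class TrigramTransitions {
    // Maximum-likelihood trigram transitions: P(t3 | t1, t2) = C(t1, t2, t3) / C(t1, t2).
    // A List<String> key stands in for the NGram class (an assumption of this sketch).
    static Map<List<String>, Double> estimate(List<String> tags) {
        Map<List<String>, Integer> historyCounts = new HashMap<>();
        Map<List<String>, Integer> trigramCounts = new HashMap<>();
        for (int i = 0; i + 2 < tags.size(); i++) {
            // Count the history bigram only where a following tag exists,
            // so that each conditional distribution sums to 1.
            historyCounts.merge(Arrays.asList(tags.get(i), tags.get(i + 1)), 1, Integer::sum);
            trigramCounts.merge(Arrays.asList(tags.get(i), tags.get(i + 1), tags.get(i + 2)), 1, Integer::sum);
        }
        Map<List<String>, Double> probs = new HashMap<>();
        for (Map.Entry<List<String>, Integer> e : trigramCounts.entrySet()) {
            List<String> history = Arrays.asList(e.getKey().get(0), e.getKey().get(1));
            probs.put(e.getKey(), e.getValue() / (double) historyCounts.get(history));
        }
        return probs;
    }
}
```

Unseen trigrams get no entry at all here; a practical tagger would need some form of smoothing for them, which this sketch does not attempt.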
lambdas2
public double[] lambdas2
lambdas3
public double[] lambdas3
theta
public double theta
ModelGeneration
public ModelGeneration(java.util.List<Token> corpus,
java.lang.String OutputFile)
init
public void init()
capitalized
public static boolean capitalized(java.lang.String word)
- Checks whether the token is capitalized.
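This page does not document the exact rule that capitalized applies. A minimal sketch follows. It assumes that "capitalized" means the first character is an uppercase letter, which is a plausible reading but is not confirmed here. The class CapitalizationCheck is hypothetical.

```java
class CapitalizationCheck {
    // Returns true if the word starts with an uppercase letter.
    // Reads the first code point so that supplementary characters are handled correctly.
    static boolean capitalized(String word) {
        if (word == null || word.isEmpty()) {
            return false;
        }
        return Character.isUpperCase(word.codePointAt(0));
    }
}
```

With this rule, capitalized("Apache") returns true and capitalized("uima") returns false. A token such as "iPhone" would count as not capitalized, which may or may not match the original behavior.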
main
public static void main(java.lang.String[] args)
Copyright © 2011. All Rights Reserved.