java.lang.Object
  weka.core.Instances
public class Instances
Class for handling an ordered set of weighted instances.
Typical usage:

    import weka.core.converters.ConverterUtils.DataSource;
    ...

    // Read all the instances in the file (ARFF, CSV, XRFF, ...)
    DataSource source = new DataSource(filename);
    Instances instances = source.getDataSet();

    // Make the last attribute be the class
    instances.setClassIndex(instances.numAttributes() - 1);

    // Print header and instances.
    System.out.println("\nDataset:\n");
    System.out.println(instances);

    ...

All methods that change a set of instances are safe, i.e. a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
Field Summary | |
---|---|
static java.lang.String | ARFF_DATA: The keyword used to denote the start of the ARFF data section. |
static java.lang.String | ARFF_RELATION: The keyword used to denote the start of an ARFF header. |
static java.lang.String | FILE_EXTENSION: The filename extension that should be used for ARFF files. |
static java.lang.String | SERIALIZED_OBJ_FILE_EXTENSION: The filename extension that should be used for binary serialized instances files. |
Constructor Summary | |
---|---|
Instances(Instances dataset) | Constructor copying all instances and references to the header information from the given set of instances. |
Instances(Instances dataset, int capacity) | Constructor creating an empty set of instances. |
Instances(Instances source, int first, int toCopy) | Creates a new set of instances by copying a subset of another set. |
Instances(java.io.Reader reader) | Reads an ARFF file from a reader, and assigns a weight of one to each instance. |
Instances(java.io.Reader reader, int capacity) | Deprecated. Instead of using this constructor in conjunction with the readInstance(Reader) method, use the ArffLoader or DataSource class. |
Instances(java.lang.String name, FastVector attInfo, int capacity) | Creates an empty set of instances. |
Method Summary | |
---|---|
void | add(Instance instance): Adds one instance to the end of the set. |
Attribute | attribute(int index): Returns an attribute. |
Attribute | attribute(java.lang.String name): Returns an attribute given its name. |
AttributeStats | attributeStats(int index): Calculates summary statistics on the values that appear in this set of instances for a specified attribute. |
double[] | attributeToDoubleArray(int index): Gets the value of all instances in this dataset for a particular attribute. |
boolean | checkForAttributeType(int attType): Checks for attributes of the given type in the dataset. |
boolean | checkForStringAttributes(): Checks for string attributes in the dataset. |
boolean | checkInstance(Instance instance): Checks if the given instance is compatible with this dataset. |
Attribute | classAttribute(): Returns the class attribute. |
int | classIndex(): Returns the class attribute's index. |
void | compactify(): Compactifies the set of instances. |
void | delete(): Removes all instances from the set. |
void | delete(int index): Removes an instance at the given position from the set. |
void | deleteAttributeAt(int position): Deletes an attribute at the given position (0 to numAttributes() - 1). |
void | deleteAttributeType(int attType): Deletes all attributes of the given type in the dataset. |
void | deleteStringAttributes(): Deletes all string attributes in the dataset. |
void | deleteWithMissing(Attribute att): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissing(int attIndex): Removes all instances with missing values for a particular attribute from the dataset. |
void | deleteWithMissingClass(): Removes all instances with a missing class value from the dataset. |
java.util.Enumeration | enumerateAttributes(): Returns an enumeration of all the attributes. |
java.util.Enumeration | enumerateInstances(): Returns an enumeration of all instances in the dataset. |
boolean | equalHeaders(Instances dataset): Checks if two headers are equivalent. |
Instance | firstInstance(): Returns the first instance in the set. |
java.util.Random | getRandomNumberGenerator(long seed): Returns a random number generator. |
java.lang.String | getRevision(): Returns the revision string. |
void | insertAttributeAt(Attribute att, int position): Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. |
Instance | instance(int index): Returns the instance at the given position. |
double | kthSmallestValue(Attribute att, int k): Returns the kth-smallest attribute value of a numeric attribute. |
double | kthSmallestValue(int attIndex, int k): Returns the kth-smallest attribute value of a numeric attribute. |
Instance | lastInstance(): Returns the last instance in the set. |
static void | main(java.lang.String[] args): Main method for this class. |
double | meanOrMode(Attribute att): Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
double | meanOrMode(int attIndex): Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. |
static Instances | mergeInstances(Instances first, Instances second): Merges two sets of Instances together. |
int | numAttributes(): Returns the number of attributes. |
int | numClasses(): Returns the number of class labels. |
int | numDistinctValues(Attribute att): Returns the number of distinct values of a given attribute. |
int | numDistinctValues(int attIndex): Returns the number of distinct values of a given attribute. |
int | numInstances(): Returns the number of instances in the dataset. |
void | randomize(java.util.Random random): Shuffles the instances in the set so that they are ordered randomly. |
void | randomizeAttribute(int attIdx, java.util.Random random, int rounds): Shuffles the values of a given attribute in all instances. |
boolean | readInstance(java.io.Reader reader): Deprecated. Instead of using this method, use the ArffLoader or DataSource class. |
java.lang.String | relationName(): Returns the relation's name. |
void | renameAttribute(Attribute att, java.lang.String name): Renames an attribute. |
void | renameAttribute(int att, java.lang.String name): Renames an attribute. |
void | renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
void | renameAttributeValue(int att, int val, java.lang.String name): Renames a value of a nominal (or string) attribute. |
Instances | resample(java.util.Random random): Creates a new dataset of the same size using random sampling with replacement. |
Instances | resampleWithWeights(java.util.Random random): Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. |
Instances | resampleWithWeights(java.util.Random random, double[] weights): Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. |
void | setClass(Attribute att): Sets the class attribute. |
void | setClassIndex(int classIndex): Sets the class index of the set. |
void | setRelationName(java.lang.String newName): Sets the relation's name. |
void | sort(Attribute att): Sorts the instances based on an attribute. |
void | sort(int attIndex): Sorts the instances based on an attribute. |
void | stratify(int numFolds): Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed). |
Instances | stringFreeStructure(): Create a copy of the structure if the data has string or relational attributes, "cleanses" string types (i.e. |
double | sumOfWeights(): Computes the sum of all the instances' weights. |
void | swap(int i, int j): Swaps two instances in the set. |
static void | test(java.lang.String[] argv): Method for testing this class. |
Instances | testCV(int numFolds, int numFold): Creates the test set for one fold of a cross-validation on the dataset. |
java.lang.String | toString(): Returns the dataset as a string in ARFF format. |
java.lang.String | toSummaryString(): Generates a string summarizing the set of instances. |
Instances | trainCV(int numFolds, int numFold): Creates the training set for one fold of a cross-validation on the dataset. |
Instances | trainCV(int numFolds, int numFold, java.util.Random random): Creates the training set for one fold of a cross-validation on the dataset. |
void | undoRandomizeAttribute(): Undoes a previous call to randomizeAttribute, so that the original values of the attribute are restored. |
double | variance(Attribute att): Computes the variance for a numeric attribute. |
double | variance(int attIndex): Computes the variance for a numeric attribute. |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String FILE_EXTENSION
public static final java.lang.String SERIALIZED_OBJ_FILE_EXTENSION
public static final java.lang.String ARFF_RELATION
public static final java.lang.String ARFF_DATA
Constructor Detail |
---|
public Instances(java.io.Reader reader) throws java.io.IOException
reader - the reader
java.io.IOException - if the ARFF file is not read successfully

@Deprecated public Instances(java.io.Reader reader, int capacity) throws java.io.IOException
Deprecated. Instead of using this constructor in conjunction with the readInstance(Reader) method, use the ArffLoader or DataSource class.
reader - the reader
capacity - the capacity
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative
java.io.IOException - if there is a problem with the reader
See Also: ArffLoader, ConverterUtils.DataSource

public Instances(Instances dataset)
dataset - the set to be copied

public Instances(Instances dataset, int capacity)
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset

public Instances(Instances source, int first, int toCopy)
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
java.lang.IllegalArgumentException - if first and toCopy are out of range

public Instances(java.lang.String name, FastVector attInfo, int capacity)
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set

Method Detail |
---|
public Instances stringFreeStructure()

public void add(Instance instance)
instance - the instance to be added

public Attribute attribute(int index)
index - the attribute's index (index starts with 0)

public Attribute attribute(java.lang.String name)
name - the attribute's name

public boolean checkForAttributeType(int attType)
attType - the attribute type to look for

public boolean checkForStringAttributes()

public boolean checkInstance(Instance instance)
instance - the instance to check

public Attribute classAttribute()
UnassignedClassException - if the class is not set

public int classIndex()

public void compactify()

public void delete()

public void delete(int index)
index - the instance's position (index starts with 0)

public void deleteAttributeAt(int position)
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range or the class attribute is being deleted

public void deleteAttributeType(int attType)
attType - the attribute type to delete
java.lang.IllegalArgumentException - if attribute couldn't be successfully deleted (probably because it is the class attribute)

public void deleteStringAttributes()
java.lang.IllegalArgumentException - if string attribute couldn't be successfully deleted (probably because it is the class attribute)
See Also: deleteAttributeType(int)

public void deleteWithMissing(int attIndex)
attIndex - the attribute's index (index starts with 0)

public void deleteWithMissing(Attribute att)
att - the attribute

public void deleteWithMissingClass()
UnassignedClassException - if class is not set

public java.util.Enumeration enumerateAttributes()

public java.util.Enumeration enumerateInstances()

public boolean equalHeaders(Instances dataset)
dataset - another dataset

public Instance firstInstance()

public java.util.Random getRandomNumberGenerator(long seed)
seed - the given seed

public void insertAttributeAt(Attribute att, int position)
att - the attribute to be inserted
position - the attribute's position (position starts with 0)
java.lang.IllegalArgumentException - if the given index is out of range

public Instance instance(int index)
index - the instance's index (index starts with 0)

public double kthSmallestValue(Attribute att, int k)
att - the Attribute object
k - the value of k

public double kthSmallestValue(int attIndex, int k)
attIndex - the attribute's index
k - the value of k
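
As an illustration of what kthSmallestValue computes, the following minimal sketch selects the k-th smallest entry of a plain double array by sorting a copy. It is not Weka's implementation: the class KthSmallestSketch, the helper kthSmallest, and the choice to count k from 1 are all assumptions made for this example.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of weka.core. */
final class KthSmallestSketch {

    /**
     * Returns the k-th smallest entry of values.
     * Assumption of this sketch: k is counted from 1.
     */
    static double kthSmallest(double[] values, int k) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k must be between 1 and " + values.length);
        }
        double[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        return sorted[k - 1];
    }

    public static void main(String[] args) {
        double[] column = {4.0, 1.5, 3.0, 1.5, 9.0};
        System.out.println(kthSmallest(column, 1)); // prints 1.5
        System.out.println(kthSmallest(column, 3)); // prints 3.0
    }
}
```

Sorting a copy costs O(n log n) but keeps the input order intact, which matches the idea that a query should not reorder the data it inspects.
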
public Instance lastInstance()

public double meanOrMode(int attIndex)
attIndex - the attribute's index (index starts with 0)

public double meanOrMode(Attribute att)
att - the attribute
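
The sketch below illustrates the two cases named in the meanOrMode summary: a mean for numeric values, and a mode reported as a floating-point index for nominal values. It works on plain arrays rather than on Instances. The class MeanOrModeSketch, its method names, the use of per-instance weights, and the tie-breaking rule (lowest index wins) are assumptions of this example, not documented Weka behavior.

```java
/** Hypothetical helper for illustration; not part of weka.core. */
final class MeanOrModeSketch {

    /** Weighted mean of a numeric column. */
    static double mean(double[] values, double[] weights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += weights[i] * values[i];
            totalWeight += weights[i];
        }
        if (totalWeight <= 0.0) {
            throw new IllegalArgumentException("total weight must be positive");
        }
        return weightedSum / totalWeight;
    }

    /**
     * Index of the nominal label carrying the most weight, returned as a double.
     * Assumption of this sketch: ties go to the lowest label index.
     */
    static double mode(int[] labels, double[] weights, int numLabels) {
        double[] weightPerLabel = new double[numLabels];
        for (int i = 0; i < labels.length; i++) {
            weightPerLabel[labels[i]] += weights[i];
        }
        int best = 0;
        for (int label = 1; label < numLabels; label++) {
            if (weightPerLabel[label] > weightPerLabel[best]) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0, 2.0};
        System.out.println(mean(new double[] {1.0, 2.0, 3.0}, weights)); // prints 2.25
        System.out.println(mode(new int[] {0, 1, 1}, weights, 2));       // prints 1.0
    }
}
```
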
public int numAttributes()

public int numClasses()
UnassignedClassException - if the class is not set

public int numDistinctValues(int attIndex)
attIndex - the attribute (index starts with 0)

public int numDistinctValues(Attribute att)
att - the attribute

public int numInstances()

public void randomize(java.util.Random random)
random - a random number generator

public void undoRandomizeAttribute() throws java.lang.Exception
java.lang.Exception - if there was no call to randomizeAttribute or if attributes were added or removed since the last call to randomizeAttribute
See Also: randomizeAttribute

public void randomizeAttribute(int attIdx, java.util.Random random, int rounds)
randomizeAttribute and undoRandomizeAttribute.
attIdx - the index of the attribute to shuffle
random - a random number generator
rounds - the number of rounds of shuffling (minimum 1). More rounds make the attribute value distribution more random (e.g. choose 3), but the time needed for shuffling is proportional to the number of rounds.
See Also: undoRandomizeAttribute
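
To make the rounds parameter concrete, here is a minimal sketch that shuffles one column of values with a Fisher-Yates pass repeated rounds times, and undoes the shuffle from a saved copy. It operates on a plain double array; the class ShuffleSketch and its shuffle method are hypothetical, and the choice of Fisher-Yates is an assumption rather than a statement about how Weka shuffles.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper for illustration; not part of weka.core. */
final class ShuffleSketch {

    /** Shuffles column in place; each round is one Fisher-Yates pass. */
    static void shuffle(double[] column, Random random, int rounds) {
        if (rounds < 1) {
            throw new IllegalArgumentException("rounds must be at least 1");
        }
        for (int round = 0; round < rounds; round++) {
            for (int j = column.length - 1; j > 0; j--) {
                int i = random.nextInt(j + 1); // uniform index in [0, j]
                double tmp = column[i];
                column[i] = column[j];
                column[j] = tmp;
            }
        }
    }

    public static void main(String[] args) {
        double[] column = {1.0, 2.0, 3.0, 4.0, 5.0};
        double[] backup = column.clone(); // kept so the shuffle can be undone
        shuffle(column, new Random(42), 3);
        System.out.println(Arrays.toString(column)); // some permutation of the five values
        System.arraycopy(backup, 0, column, 0, column.length); // restore the original order
        System.out.println(Arrays.toString(column)); // prints [1.0, 2.0, 3.0, 4.0, 5.0]
    }
}
```

The running time grows linearly with rounds, in line with the note on the rounds parameter.
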
@Deprecated public boolean readInstance(java.io.Reader reader) throws java.io.IOException
Deprecated. Instead of using this method, use the ArffLoader or DataSource class.
reader - the reader
java.io.IOException - if the information is not read successfully
See Also: ArffLoader, ConverterUtils.DataSource

public java.lang.String relationName()

public void renameAttribute(int att, java.lang.String name)
att - the attribute's index (index starts with 0)
name - the new name

public void renameAttribute(Attribute att, java.lang.String name)
att - the attribute
name - the new name

public void renameAttributeValue(int att, int val, java.lang.String name)
att - the attribute's index (index starts with 0)
val - the value's index (index starts with 0)
name - the new name

public void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name)
att - the attribute
val - the value
name - the new name

public Instances resample(java.util.Random random)
random - a random number generator

public Instances resampleWithWeights(java.util.Random random)
random - a random number generator
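
Sampling with replacement according to weights, as described for resampleWithWeights, can be sketched by drawing indices in proportion to a weight vector. The example below uses a simple cumulative-sum search over plain arrays. It is an illustration only: the class WeightedResampleSketch and the method resampleIndices are hypothetical, and the cumulative-sum technique is a choice made for this sketch, not a claim about Weka's algorithm.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of weka.core. */
final class WeightedResampleSketch {

    /**
     * Draws weights.length indices with replacement; index i is chosen
     * with probability weights[i] / (sum of all weights).
     */
    static int[] resampleIndices(double[] weights, Random random) {
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0.0) {
                throw new IllegalArgumentException("weights must not be negative");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        int[] picks = new int[weights.length];
        for (int n = 0; n < picks.length; n++) {
            double target = random.nextDouble() * total; // in [0, total)
            int index = 0;
            // Advance to the first index whose cumulative weight exceeds the target.
            while (index < weights.length - 1 && cumulative[index] <= target) {
                index++;
            }
            picks[n] = index;
        }
        return picks;
    }

    public static void main(String[] args) {
        // Only the middle entry has weight, so every draw selects index 1.
        int[] picks = resampleIndices(new double[] {0.0, 1.0, 0.0}, new Random(1));
        for (int pick : picks) {
            System.out.println(pick); // prints 1 three times
        }
    }
}
```
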
public Instances resampleWithWeights(java.util.Random random, double[] weights)
random - a random number generator
weights - the weight vector
java.lang.IllegalArgumentException - if the weights array is of the wrong length or contains negative weights

public void setClass(Attribute att)
att - attribute to be the class

public void setClassIndex(int classIndex)
classIndex - the new class index (index starts with 0)
java.lang.IllegalArgumentException - if the class index is too big or < 0

public void setRelationName(java.lang.String newName)
newName - the new relation name

public void sort(int attIndex)
attIndex - the attribute's index (index starts with 0)

public void sort(Attribute att)
att - the attribute

public void stratify(int numFolds)
numFolds - the number of folds in the cross-validation
UnassignedClassException - if the class is not set

public double sumOfWeights()

public Instances testCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public java.lang.String toString()
Overrides: toString in class java.lang.Object
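
The numFolds and numFold parameters of testCV and trainCV can be pictured with a small sketch that computes which contiguous block of instance indices forms the test set for a given fold; the training set is then every index outside that block. The class FoldSketch and the method testRange are hypothetical, and the layout shown (near-equal contiguous folds, with the earliest folds taking any leftover instances) is an assumption of this sketch rather than a documented guarantee.

```java
/** Hypothetical helper for illustration; not part of weka.core. */
final class FoldSketch {

    /**
     * Returns {first, count}: the start index and size of the test block
     * for fold numFold (0-based) out of numFolds.
     */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2 || numFolds > numInstances) {
            throw new IllegalArgumentException(
                "number of folds must be at least 2 and at most the number of instances");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("fold index out of range");
        }
        int base = numInstances / numFolds;  // minimum size of every fold
        int extra = numInstances % numFolds; // the first `extra` folds get one more
        int count = base + (numFold < extra ? 1 : 0);
        int first = numFold * base + Math.min(numFold, extra);
        return new int[] {first, count};
    }

    public static void main(String[] args) {
        // 10 instances in 3 folds: sizes 4, 3, 3 starting at 0, 4, 7.
        for (int fold = 0; fold < 3; fold++) {
            int[] range = testRange(10, 3, fold);
            System.out.println("fold " + fold + ": first=" + range[0] + " count=" + range[1]);
        }
    }
}
```
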
public Instances trainCV(int numFolds, int numFold)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public Instances trainCV(int numFolds, int numFold, java.util.Random random)
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
java.lang.IllegalArgumentException - if the number of folds is less than 2 or greater than the number of instances

public double variance(int attIndex)
attIndex - the numeric attribute (index starts with 0)
java.lang.IllegalArgumentException - if the attribute is not numeric

public double variance(Attribute att)
att - the numeric attribute
java.lang.IllegalArgumentException - if the attribute is not numeric

public AttributeStats attributeStats(int index)
index - the index of the attribute to summarize (index starts with 0)

public double[] attributeToDoubleArray(int index)
index - the index of the attribute
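
A double array such as the one returned by attributeToDoubleArray is a convenient input for simple statistics. The sketch below computes a sample variance over such an array to illustrate what variance reports for a numeric attribute. The class VarianceSketch is hypothetical, and the sketch assumes an unweighted estimate with an n - 1 denominator and no missing values; it does not claim to reproduce how Weka treats instance weights or missing values.

```java
/** Hypothetical helper for illustration; not part of weka.core. */
final class VarianceSketch {

    /**
     * Unweighted sample variance (divides by n - 1).
     * Assumption of this sketch: fewer than two values yields 0.
     */
    static double variance(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        double mean = sum / values.length;
        double squaredDeviations = 0.0;
        for (double value : values) {
            double deviation = value - mean;
            squaredDeviations += deviation * deviation;
        }
        return squaredDeviations / (values.length - 1);
    }

    public static void main(String[] args) {
        System.out.println(variance(new double[] {2.0, 4.0, 6.0})); // prints 4.0
    }
}
```
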
public java.lang.String toSummaryString()

public void swap(int i, int j)
i - the first instance's index (index starts with 0)
j - the second instance's index (index starts with 0)

public static Instances mergeInstances(Instances first, Instances second)
first - the first set of Instances
second - the second set of Instances
java.lang.IllegalArgumentException - if the datasets are not the same size

public static void test(java.lang.String[] argv)
argv - should contain one element: the name of an ARFF file

public static void main(java.lang.String[] args)
weka.core.Instances help
weka.core.Instances <filename>
weka.core.Instances merge <filename1> <filename2>
weka.core.Instances append <filename1> <filename2>
weka.core.Instances randomize <seed> <filename>
args - the commandline parameters

public java.lang.String getRevision()
Specified by: getRevision in interface RevisionHandler