
CStringFeatures< ST > Class Template Reference


Detailed Description

template<class ST>
class shogun::CStringFeatures< ST >

Template class StringFeatures implements a list of strings.

Because this class is a template, the underlying storage type is arbitrary and not limited to character strings; it could also be, for example, a sequence of floating-point numbers. Strings differ from matrices (cf. CSimpleFeatures) in that the dimensionality of the feature vectors (i.e. the strings) is not fixed; it may vary between strings.

Most string kernels require StringFeatures, and some of them additionally require all strings to have the same length.

When preprocessors are attached to string features, they may shorten the strings but must not return strings longer than max_string_length, because some algorithms depend on this bound.

Also note that string features cannot currently be computed on-the-fly.

Definition at line 127 of file StringFeatures.h.
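The storage model described above (a list of strings whose lengths may differ, plus a bound on the longest string) can be sketched without Shogun. The `StringList` struct below is a hypothetical stand-in, not part of the library. It mimics the variable-length storage, recomputes the maximum length in the spirit of determine_maximum_string_length, and rejects a preprocessed string that would exceed max_string_length.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for string features (not Shogun's API): each inner
// vector is one feature vector, and the lengths may differ.
template <class ST>
struct StringList {
    std::vector<std::vector<ST>> strings;
    int32_t max_string_length = 0;

    // Recompute the length of the longest string.
    void update_max_length() {
        max_string_length = 0;
        for (const auto& s : strings)
            max_string_length =
                std::max(max_string_length, static_cast<int32_t>(s.size()));
    }

    // Store a preprocessed version of string i only if it is not longer than
    // max_string_length: preprocessing may shorten strings, never lengthen them.
    bool replace_with_preprocessed(std::size_t i, std::vector<ST> s) {
        if (static_cast<int32_t>(s.size()) > max_string_length)
            return false;
        strings[i] = std::move(s);
        return true;
    }
};
```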

Inheritance graph for CStringFeatures< ST >


Public Member Functions

 CStringFeatures ()
 CStringFeatures (EAlphabet alpha)
 CStringFeatures (T_STRING< ST > *p_features, int32_t p_num_vectors, int32_t p_max_string_length, EAlphabet alpha)
 CStringFeatures (T_STRING< ST > *p_features, int32_t p_num_vectors, int32_t p_max_string_length, CAlphabet *alpha)
 CStringFeatures (CAlphabet *alpha)
 CStringFeatures (const CStringFeatures &orig)
 CStringFeatures (CFile *loader, EAlphabet alpha=DNA)
virtual ~CStringFeatures ()
virtual void cleanup ()
virtual void cleanup_feature_vector (int32_t num)
virtual EFeatureClass get_feature_class ()
virtual EFeatureType get_feature_type ()
CAlphabet * get_alphabet ()
virtual CFeatures * duplicate () const
void get_feature_vector (ST **dst, int32_t *len, int32_t num)
void set_feature_vector (ST *src, int32_t len, int32_t num)
void enable_on_the_fly_preprocessing ()
void disable_on_the_fly_preprocessing ()
ST * get_feature_vector (int32_t num, int32_t &len, bool &dofree)
CStringFeatures< ST > * get_transposed ()
T_STRING< ST > * get_transposed (int32_t &num_feat, int32_t &num_vec)
void free_feature_vector (ST *feat_vec, int32_t num, bool dofree)
virtual ST get_feature (int32_t vec_num, int32_t feat_num)
virtual int32_t get_vector_length (int32_t vec_num)
virtual int32_t get_max_vector_length ()
virtual int32_t get_num_vectors ()
floatmax_t get_num_symbols ()
floatmax_t get_max_num_symbols ()
floatmax_t get_original_num_symbols ()
int32_t get_order ()
ST get_masked_symbols (ST symbol, uint8_t mask)
ST shift_offset (ST offset, int32_t amount)
ST shift_symbol (ST symbol, int32_t amount)
virtual void load (CFile *loader)
void load_ascii_file (char *fname, bool remap_to_bin=true, EAlphabet ascii_alphabet=DNA, EAlphabet binary_alphabet=RAWDNA)
bool load_fasta_file (const char *fname, bool ignore_invalid=false)
bool load_fastq_file (const char *fname, bool ignore_invalid=false, bool bitremap_in_single_string=false)
bool load_from_directory (char *dirname)
bool set_features (T_STRING< ST > *p_features, int32_t p_num_vectors, int32_t p_max_string_length)
bool append_features (CStringFeatures< ST > *sf)
bool append_features (T_STRING< ST > *p_features, int32_t p_num_vectors, int32_t p_max_string_length)
virtual T_STRING< ST > * get_features (int32_t &num_str, int32_t &max_str_len)
virtual T_STRING< ST > * copy_features (int32_t &num_str, int32_t &max_str_len)
virtual void get_features (T_STRING< ST > **dst, int32_t *num_str)
virtual void save (CFile *writer)
virtual bool load_compressed (char *src, bool decompress)
virtual bool save_compressed (char *dest, E_COMPRESSION_TYPE compression, int level)
virtual int32_t get_size ()
virtual bool apply_preproc (bool force_preprocessing=false)
int32_t obtain_by_sliding_window (int32_t window_size, int32_t step_size, int32_t skip=0)
int32_t obtain_by_position_list (int32_t window_size, CDynamicArray< int32_t > *positions, int32_t skip=0)
bool obtain_from_char (CStringFeatures< char > *sf, int32_t start, int32_t p_order, int32_t gap, bool rev)
template<class CT >
bool obtain_from_char_features (CStringFeatures< CT > *sf, int32_t start, int32_t p_order, int32_t gap, bool rev)
bool have_same_length (int32_t len=-1)
void embed_features (int32_t p_order)
void compute_symbol_mask_table (int64_t max_val)
void unembed_word (ST word, uint8_t *seq, int32_t len)
ST embed_word (ST *seq, int32_t len)
void determine_maximum_string_length ()
virtual void set_feature_vector (int32_t num, ST *string, int32_t len)
virtual void get_histogram (float64_t **hist, int32_t *rows, int32_t *cols, bool normalize=true)
virtual void create_random (float64_t *hist, int32_t rows, int32_t cols, int32_t num_vec)
virtual const char * get_name () const

Static Public Member Functions

static ST * get_zero_terminated_string_copy (T_STRING< ST > str)

Protected Member Functions

virtual ST * compute_feature_vector (int32_t num, int32_t &len)

Protected Attributes

CAlphabet * alphabet
 alphabet
int32_t num_vectors
 number of string vectors
T_STRING< ST > * features
 this contains the array of features.
ST * single_string
 non-NULL when the features are windows into a single string (e.g. created by sliding window)
int32_t length_of_single_string
 length of prior single string
int32_t max_string_length
 length of longest string
floatmax_t num_symbols
 number of used symbols
floatmax_t original_num_symbols
 original number of used symbols (before higher order mapping)
int32_t order
 order used in higher order mapping
ST * symbol_mask_table
 symbol mask table, required to access bit-based symbols
bool preprocess_on_get
 preprocess on-the-fly?
CCache< ST > * feature_cache
 feature cache

Constructor & Destructor Documentation

CStringFeatures (  )

default constructor

Definition at line 133 of file StringFeatures.h.

CStringFeatures ( EAlphabet  alpha )

constructor

Parameters:
alpha: alphabet (type) to use for string features

Definition at line 144 of file StringFeatures.h.

CStringFeatures ( T_STRING< ST > *  p_features,
int32_t  p_num_vectors,
int32_t  p_max_string_length,
EAlphabet  alpha 
)

constructor

Parameters:
p_features: new features
p_num_vectors: number of vectors
p_max_string_length: maximum string length
alpha: alphabet (type) to use for string features

Definition at line 163 of file StringFeatures.h.

CStringFeatures ( T_STRING< ST > *  p_features,
int32_t  p_num_vectors,
int32_t  p_max_string_length,
CAlphabet *  alpha 
)

constructor

Parameters:
p_features: new features
p_num_vectors: number of vectors
p_max_string_length: maximum string length
alpha: an actual alphabet object

Definition at line 184 of file StringFeatures.h.

CStringFeatures ( CAlphabet *  alpha )

constructor

Parameters:
alpha: alphabet to use for string features

Definition at line 202 of file StringFeatures.h.

CStringFeatures ( const CStringFeatures< ST > &  orig )

copy constructor

Definition at line 216 of file StringFeatures.h.

CStringFeatures ( CFile *  loader,
EAlphabet  alpha = DNA 
)

constructor

Parameters:
loader: File object via which to load data
alpha: alphabet (type) to use for string features

Definition at line 256 of file StringFeatures.h.

virtual ~CStringFeatures (  ) [virtual]

Definition at line 268 of file StringFeatures.h.


Member Function Documentation

bool append_features ( CStringFeatures< ST > *  sf )

append features

Parameters:
sf: features to append
Returns:
whether appending was successful

Definition at line 1113 of file StringFeatures.h.

bool append_features ( T_STRING< ST > *  p_features,
int32_t  p_num_vectors,
int32_t  p_max_string_length 
)

append features

Parameters:
p_features: features to append
p_num_vectors: number of vectors
p_max_string_length: maximum string length

Note that p_features will be delete[]'d on success.

Returns:
whether appending was successful

Definition at line 1139 of file StringFeatures.h.
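Appending implies some bookkeeping: the number of vectors grows and the maximum string length becomes the larger of the two maxima. The `StringSet` type below is a hypothetical illustration of that bookkeeping, not Shogun code. It copies its input rather than taking ownership, so the delete[] behaviour noted above is not modeled.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical illustration (not Shogun code): append strings and keep the
// number of vectors and the maximum string length consistent.
template <class ST>
struct StringSet {
    std::vector<std::vector<ST>> strings;
    int32_t max_string_length = 0;

    void append(const std::vector<std::vector<ST>>& more) {
        for (const auto& s : more) {
            strings.push_back(s);  // copied; the caller keeps its data
            max_string_length =
                std::max(max_string_length, static_cast<int32_t>(s.size()));
        }
    }

    int32_t num_vectors() const { return static_cast<int32_t>(strings.size()); }
};
```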

virtual bool apply_preproc ( bool  force_preprocessing = false ) [virtual]

apply preprocessor

Parameters:
force_preprocessing: whether preprocessing shall be forced
Returns:
whether applying was successful

Definition at line 1411 of file StringFeatures.h.

virtual void cleanup (  ) [virtual]

cleanup string features

Reimplemented in CStringFileFeatures< ST >.

Definition at line 276 of file StringFeatures.h.

virtual void cleanup_feature_vector ( int32_t  num ) [virtual]

cleanup a single feature vector

Reimplemented in CStringFileFeatures< ST >.

Definition at line 306 of file StringFeatures.h.

virtual ST* compute_feature_vector ( int32_t  num,
int32_t &  len 
) [protected, virtual]

Compute the feature vector for sample num. If a target is set, the vector is written to the target; len is returned by reference.

default implementation returns

Parameters:
num: which vector
len: length of vector
Returns:
feature vector

Definition at line 1926 of file StringFeatures.h.

void compute_symbol_mask_table ( int64_t  max_val )

compute symbol mask table

required to access bit-based symbols

Definition at line 1724 of file StringFeatures.h.

virtual T_STRING<ST>* copy_features ( int32_t &  num_str,
int32_t &  max_str_len 
) [virtual]

return a copy of the string features

Parameters:
num_str: number of strings (returned)
max_str_len: maximal string length (returned)
Returns:
copy of the string features

Definition at line 1208 of file StringFeatures.h.

virtual void create_random ( float64_t *  hist,
int32_t  rows,
int32_t  cols,
int32_t  num_vec 
) [virtual]

create random strings based on a normalized histogram

Definition at line 1879 of file StringFeatures.h.

void determine_maximum_string_length (  )

determine new maximum string length

Definition at line 1790 of file StringFeatures.h.

void disable_on_the_fly_preprocessing (  )

Call this to disable on-the-fly preprocessing of features in get_feature_vector. This is useful when you apply preprocessors manually.

Definition at line 410 of file StringFeatures.h.

virtual CFeatures* duplicate (  ) const [virtual]

duplicate feature object

返回:
feature object

Implements CFeatures.

Definition at line 343 of file StringFeatures.h.

void embed_features ( int32_t  p_order )

embed string features in bit representation in-place

Definition at line 1669 of file StringFeatures.h.

ST embed_word ( ST *  seq,
int32_t  len 
)

embed a single word

Parameters:
seq: sequence of size len in a bitfield
len: length of seq
Returns:
embedded word

Definition at line 1775 of file StringFeatures.h.
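Here is a hypothetical DNA sketch that makes the bit-based embedding concrete. It assumes a 2-bit code (A=0, C=1, G=2, T=3) and places the first symbol in the most significant position. Shogun's actual alphabet mapping and bit order are not specified in this reference. `embed_dna_word` is an illustrative name, and unlike embed_word it maps characters to codes itself.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: pack a DNA word into an integer, 2 bits per symbol.
// Assumed code: A=0, C=1, G=2, T=3; the first symbol ends up highest.
uint64_t embed_dna_word(const std::string& seq) {
    uint64_t word = 0;
    for (char c : seq) {
        uint64_t code = 0;
        switch (c) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  code = 0; break;  // sketch: unknown symbols become A
        }
        word = (word << 2) | code;  // make room for the next 2-bit symbol
    }
    return word;
}
```

For example, "ACGT" packs to binary 00 01 10 11, i.e. 27.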

void enable_on_the_fly_preprocessing (  )

Call this to preprocess string features on the fly in get_feature_vector.

Definition at line 402 of file StringFeatures.h.

void free_feature_vector ( ST *  feat_vec,
int32_t  num,
bool  dofree 
)

free feature vector

Parameters:
feat_vec: feature vector to free
num: index in feature cache
dofree: whether the vector should really be deleted

Definition at line 519 of file StringFeatures.h.

CAlphabet* get_alphabet (  )

get alphabet used in string features

Returns:
alphabet

Definition at line 333 of file StringFeatures.h.

virtual ST get_feature ( int32_t  vec_num,
int32_t  feat_num 
) [virtual]

get feature

Parameters:
vec_num: which vector
feat_num: which feature
Returns:
feature

Definition at line 534 of file StringFeatures.h.

virtual EFeatureClass get_feature_class (  ) [virtual]

get feature class

Returns:
feature class STRING

Implements CFeatures.

Definition at line 321 of file StringFeatures.h.

virtual EFeatureType get_feature_type (  ) [virtual]

get feature type

Returns:
templated feature type

Implements CFeatures.

Definition at line 327 of file StringFeatures.h.

ST* get_feature_vector ( int32_t  num,
int32_t &  len,
bool &  dofree 
)

get feature vector for sample num

Parameters:
num: index of feature vector
len: length, returned by reference
dofree: whether the returned vector must be freed by the caller via free_feature_vector
Returns:
feature vector for sample num

Definition at line 423 of file StringFeatures.h.
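A small hypothetical class (not Shogun's implementation) can illustrate the dofree protocol. When no on-the-fly preprocessing is active, the getter hands out a pointer into internal storage and sets dofree to false. When a preprocessed copy has to be computed, it returns a new buffer and sets dofree to true, so the caller must pass the pointer back to free_feature_vector. The stand-in "preprocessor" simply drops the last symbol.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical model of the get/free protocol around dofree.
class MiniStringStore {
public:
    std::vector<std::vector<char>> strings;
    bool preprocess_on_get = false;

    char* get_feature_vector(int32_t num, int32_t& len, bool& dofree) {
        std::vector<char>& s = strings[num];
        len = static_cast<int32_t>(s.size());
        if (!preprocess_on_get) {
            dofree = false;
            return s.data();  // borrowed pointer: the caller must not delete it
        }
        // Stand-in preprocessor: drop the last symbol (shortening is allowed).
        if (len > 0)
            --len;
        char* copy = new char[len > 0 ? len : 1];
        std::copy(s.begin(), s.begin() + len, copy);
        dofree = true;  // the caller owns the copy
        return copy;
    }

    void free_feature_vector(char* vec, int32_t /*num*/, bool dofree) {
        if (dofree)
            delete[] vec;  // only computed copies are released
    }
};
```

The flag travels with the pointer, so callers can always use the same get/free pair regardless of whether preprocessing is enabled.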

void get_feature_vector ( ST **  dst,
int32_t *  len,
int32_t  num 
)

get string for selected example num

Parameters:
dst: destination where the vector will be stored
len: number of features in vector
num: index of the string

Definition at line 354 of file StringFeatures.h.

virtual T_STRING<ST>* get_features ( int32_t &  num_str,
int32_t &  max_str_len 
) [virtual]

get the string features

Parameters:
num_str: number of strings (returned)
max_str_len: maximal string length (returned)
Returns:
string features

Definition at line 1195 of file StringFeatures.h.

virtual void get_features ( T_STRING< ST > **  dst,
int32_t *  num_str 
) [virtual]

get_features (SWIG compatible)

Parameters:
dst: string features (returned)
num_str: number of strings (returned)

Definition at line 1235 of file StringFeatures.h.

virtual void get_histogram ( float64_t **  hist,
int32_t *  rows,
int32_t *  cols,
bool  normalize = true 
) [virtual]

compute a histogram over the strings

Definition at line 1834 of file StringFeatures.h.
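This reference does not document the layout of hist, rows and cols. As a hedged illustration, the sketch below computes one common form of string histogram: a position-wise symbol count, normalized per position. The function name, the DNA code A=0, C=1, G=2, T=3, and the row/column layout (one row per position, one column per symbol) are all assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical position-wise histogram: hist[j][k] counts symbol k at
// position j. With normalize, each position is divided by the number of
// strings long enough to reach it.
std::vector<std::vector<double>> positional_histogram(
    const std::vector<std::string>& strs, bool normalize = true) {
    auto code = [](char c) -> std::size_t {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3;  // 'T'; this sketch also counts any other character as T
        }
    };
    std::size_t max_len = 0;
    for (const auto& s : strs)
        if (s.size() > max_len) max_len = s.size();

    std::vector<std::vector<double>> hist(max_len, std::vector<double>(4, 0.0));
    std::vector<double> covered(max_len, 0.0);
    for (const auto& s : strs)
        for (std::size_t j = 0; j < s.size(); ++j) {
            hist[j][code(s[j])] += 1.0;
            covered[j] += 1.0;
        }
    if (normalize)
        for (std::size_t j = 0; j < max_len; ++j)
            for (double& h : hist[j])
                h /= covered[j];  // covered[j] > 0: the longest string reaches every j
    return hist;
}
```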

ST get_masked_symbols ( ST  symbol,
uint8_t  mask 
)

A higher-order mapped symbol is masked so that only the symbols selected by the bits in mask are returned.

Parameters:
symbol: symbol to mask
mask: mask to apply
Returns:
masked symbol

Definition at line 613 of file StringFeatures.h.
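One way to implement masking is a precomputed table indexed by the mask. compute_symbol_mask_table suggests a table of this kind, but the sketch below is hypothetical. It assumes each symbol occupies bits_per_symbol bits (with bits_per_symbol * 8 at most 64) and that bit i of the mask selects the i-th symbol counted from the least significant end.

```cpp
#include <array>
#include <cstdint>

// Hypothetical sketch: for every 8-bit mask, precompute a bit pattern that
// keeps exactly the selected symbols of a higher-order word.
std::array<uint64_t, 256> make_symbol_mask_table(int bits_per_symbol) {
    const uint64_t one_symbol = (uint64_t{1} << bits_per_symbol) - 1;
    std::array<uint64_t, 256> table{};
    for (int mask = 0; mask < 256; ++mask)
        for (int i = 0; i < 8; ++i)
            if (mask & (1 << i))
                table[mask] |= one_symbol << (i * bits_per_symbol);
    return table;
}

// Keep only the symbols of `word` that are selected by `mask`.
uint64_t masked_symbols(const std::array<uint64_t, 256>& table,
                        uint64_t word, uint8_t mask) {
    return table[mask] & word;
}
```

With 2-bit symbols, the word 27 (binary 00 01 10 11) masked with 0x5 keeps symbols 0 and 2 and yields 19.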

floatmax_t get_max_num_symbols (  )

get maximum number of symbols

Note: floatmax_t may look odd, but int64_t is not wide enough (and there is no int128_t type).

Returns:
maximum number of symbols

Definition at line 590 of file StringFeatures.h.

virtual int32_t get_max_vector_length (  ) [virtual]

get maximum vector length

Returns:
maximum vector/string length

Definition at line 564 of file StringFeatures.h.

virtual const char* get_name (  ) const [virtual]
Returns:
object name

Implements CSGObject.

Definition at line 1912 of file StringFeatures.h.

floatmax_t get_num_symbols (  )

get number of symbols

Note: floatmax_t may look odd, but int64_t is not wide enough.

Returns:
number of symbols

Definition at line 581 of file StringFeatures.h.

virtual int32_t get_num_vectors (  ) [virtual]

get number of vectors

Returns:
number of vectors

Implements CFeatures.

Definition at line 573 of file StringFeatures.h.

int32_t get_order (  )

order used for higher order mapping

Returns:
order

Definition at line 604 of file StringFeatures.h.

floatmax_t get_original_num_symbols (  )

number of symbols before higher order mapping

Returns:
original number of symbols

Definition at line 598 of file StringFeatures.h.

virtual int32_t get_size (  ) [virtual]

get memory footprint of one feature

Returns:
memory footprint of one feature

Implements CFeatures.

Definition at line 1404 of file StringFeatures.h.

T_STRING<ST>* get_transposed ( int32_t &  num_feat,
int32_t &  num_vec 
)

Compute and return the transpose of the string features matrix, which will be preprocessed. num_feat and num_vec are returned by reference; the caller has to clean up the result.

Note that all strings must have the same length.

Parameters:
num_feat: number of features in matrix
num_vec: number of vectors in matrix
Returns:
transposed string features

Definition at line 482 of file StringFeatures.h.
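For equal-length strings, the transpose is simple to state: string j of the result collects symbol j of every input string. Below is a minimal hypothetical sketch on std::string. Preprocessing and the raw-pointer ownership of the real method are not modeled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: transpose a list of equal-length strings.
std::vector<std::string> transpose_strings(const std::vector<std::string>& in) {
    if (in.empty())
        return {};
    const std::size_t len = in[0].size();
    for (const auto& s : in)
        if (s.size() != len)
            throw std::invalid_argument("all strings must have the same length");
    // Result has len strings, each with one symbol per input string.
    std::vector<std::string> out(len, std::string(in.size(), '\0'));
    for (std::size_t i = 0; i < in.size(); ++i)
        for (std::size_t j = 0; j < len; ++j)
            out[j][i] = in[i][j];
    return out;
}
```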

CStringFeatures<ST>* get_transposed (  )

get a transposed copy of the features

Returns:
transposed copy

Definition at line 462 of file StringFeatures.h.

virtual int32_t get_vector_length ( int32_t  vec_num ) [virtual]

get vector length

Parameters:
vec_num: which vector
Returns:
length of vector

Definition at line 551 of file StringFeatures.h.

static ST* get_zero_terminated_string_copy ( T_STRING< ST >  str ) [static]

get a zero terminated copy of the string

Parameters:
str: the string to copy
Returns:
zero-terminated copy of str

Note that this function only makes sense for character strings.

Definition at line 1805 of file StringFeatures.h.
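Strings stored as a pointer plus a length are not necessarily zero-terminated. A copy with a trailing '\0' is therefore needed before handing them to C string functions. Here is a hypothetical sketch for char data, using std::unique_ptr so the caller does not have to free the copy manually.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical sketch: copy `length` characters and append a terminator.
std::unique_ptr<char[]> zero_terminated_copy(const char* str, int32_t length) {
    std::unique_ptr<char[]> copy(new char[length + 1]);
    for (int32_t i = 0; i < length; ++i)
        copy[i] = str[i];
    copy[length] = '\0';  // the source itself may lack a terminator
    return copy;
}
```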

bool have_same_length ( int32_t  len = -1 )

Check whether the length of each vector in this feature object equals the given length.

Parameters:
len: vector length to check against
Returns:
whether the length of each vector in this feature object equals the given length

Definition at line 1647 of file StringFeatures.h.
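A hypothetical sketch of the length check follows. The meaning of the default len = -1 is not documented above. The sketch assumes it means "compare against the longest string", i.e. check that all strings are equally long.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch. Assumption: len == -1 selects the length of the
// longest string as the reference length.
bool have_same_length(const std::vector<std::string>& strs, int32_t len = -1) {
    int32_t max_len = 0;
    for (const auto& s : strs)
        if (static_cast<int32_t>(s.size()) > max_len)
            max_len = static_cast<int32_t>(s.size());
    if (len == -1)
        len = max_len;
    for (const auto& s : strs)
        if (static_cast<int32_t>(s.size()) != len)
            return false;
    return true;
}
```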

virtual void load ( CFile *  loader ) [virtual]

load features from file

Parameters:
loader: File object via which to load data

Reimplemented from CFeatures.

void load_ascii_file ( char *  fname,
bool  remap_to_bin = true,
EAlphabet  ascii_alphabet = DNA,
EAlphabet  binary_alphabet = RAWDNA 
)

load ASCII line-based string features from file

Parameters:
fname: filename to load from
remap_to_bin: whether translation to another (binary) alphabet should be performed
ascii_alphabet: source alphabet
binary_alphabet: alphabet to translate to

Definition at line 657 of file StringFeatures.h.

virtual bool load_compressed ( char *  src,
bool  decompress 
) [virtual]

load compressed features from file

Parameters:
src: filename to load from
decompress: whether to decompress on loading
Returns:
whether loading was successful

Definition at line 1255 of file StringFeatures.h.

bool load_fasta_file ( const char *  fname,
bool  ignore_invalid = false 
)

load fasta file as string features

Parameters:
fname: filename to load from
ignore_invalid: if set to true, characters other than A, C, G, T are converted to A
Returns:
whether loading was successful

Definition at line 796 of file StringFeatures.h.
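Below is a minimal hypothetical FASTA reader that illustrates the format and the ignore_invalid rule. Lines starting with '>' are headers, and the sequence lines that follow are concatenated into one string. The reader works on an in-memory string rather than a file. It does not reproduce Shogun's handling of invalid characters when ignore_invalid is false; in that case the sketch keeps them unchanged.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal FASTA reader (not Shogun's implementation).
std::vector<std::string> parse_fasta(const std::string& text, bool ignore_invalid) {
    std::vector<std::string> seqs;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty())
            continue;
        if (line[0] == '>') {
            seqs.emplace_back();  // a header starts a new record
            continue;
        }
        if (seqs.empty())
            continue;  // sequence data before any header is ignored here
        for (char c : line) {
            const bool valid = c == 'A' || c == 'C' || c == 'G' || c == 'T';
            seqs.back() += (valid || !ignore_invalid) ? c : 'A';
        }
    }
    return seqs;
}
```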

bool load_fastq_file ( const char *  fname,
bool  ignore_invalid = false,
bool  bitremap_in_single_string = false 
)

load fastq file as string features

Parameters:
fname: filename to load from
ignore_invalid: if set to true, characters other than A, C, G, T are converted to A
bitremap_in_single_string: if set to true, perform a binary embedding of the symbols
Returns:
whether loading was successful

Definition at line 895 of file StringFeatures.h.
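FASTQ records normally consist of four lines: an '@' header, the sequence, a '+' separator and a quality string of the same length. The hypothetical sketch below keeps only the sequences and applies the same ignore_invalid rule. It does not model bitremap_in_single_string or multi-line records, and it stops at the first malformed record.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical minimal FASTQ reader: four lines per record, sequence kept.
std::vector<std::string> parse_fastq(const std::string& text, bool ignore_invalid) {
    std::vector<std::string> seqs;
    std::istringstream in(text);
    std::string header, seq, plus, qual;
    while (std::getline(in, header) && std::getline(in, seq) &&
           std::getline(in, plus) && std::getline(in, qual)) {
        if (header.empty() || header[0] != '@' || plus.empty() || plus[0] != '+')
            break;  // malformed record: stop reading
        if (ignore_invalid)
            for (char& c : seq)
                if (c != 'A' && c != 'C' && c != 'G' && c != 'T')
                    c = 'A';
        seqs.push_back(seq);
    }
    return seqs;
}
```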

bool load_from_directory ( char *  dirname )

load features from directory

Parameters:
dirname: directory name to load from
Returns:
whether loading was successful

Definition at line 1000 of file StringFeatures.h.

int32_t obtain_by_position_list ( int32_t  window_size,
CDynamicArray< int32_t > *  positions,
int32_t  skip = 0 
)

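Extracts windows of size window_size from the first string, starting at the positions in the list.

Parameters:
window_size: window size
positions: list of window start positions
skip: number of leading characters to skip in each extracted string
Returns:
number of feature vectors (windows) obtained

Definition at line 1486 of file StringFeatures.h.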
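Here is a hypothetical sketch of position-list extraction on a single std::string. It assumes that a window whose end would run past the string is ignored, and that skip drops the first skip symbols of every window (with skip not exceeding window_size). This reference confirms neither detail.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: one window of window_size symbols per listed start
// position; the first `skip` symbols of each window are dropped.
std::vector<std::string> windows_at_positions(const std::string& s, int32_t window_size,
                                              const std::vector<int32_t>& positions,
                                              int32_t skip = 0) {
    std::vector<std::string> out;
    for (int32_t p : positions) {
        if (p < 0 || p + window_size > static_cast<int32_t>(s.size()))
            continue;  // window would leave the string: ignored in this sketch
        out.push_back(s.substr(p + skip, window_size - skip));
    }
    return out;
}
```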

int32_t obtain_by_sliding_window ( int32_t  window_size,
int32_t  step_size,
int32_t  skip = 0 
)

Slides a window of size window_size over the current single string; step_size is the amount by which the window is shifted. This creates (string_len - window_size)/step_size + 1 feature objects (integer division). If skip is nonzero, the first skip characters of each string are skipped.

Parameters:
window_size: window size
step_size: step size
skip: number of leading characters to skip in each string
Returns:
number of feature vectors (windows) obtained

Definition at line 1444 of file StringFeatures.h.
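A small hypothetical sketch can check the window arithmetic. Windows start at 0, step_size, 2*step_size, and so on, so a string of length L yields (L - window_size)/step_size + 1 full windows (integer division). The skip handling, dropping the first skip symbols of each window, is an assumption.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of sliding-window extraction from one string.
std::vector<std::string> sliding_windows(const std::string& s, int32_t window_size,
                                         int32_t step_size, int32_t skip = 0) {
    std::vector<std::string> out;
    const int32_t len = static_cast<int32_t>(s.size());
    if (window_size <= 0 || step_size <= 0 || len < window_size)
        return out;
    const int32_t count = (len - window_size) / step_size + 1;  // full windows only
    for (int32_t i = 0; i < count; ++i) {
        const int32_t offs = i * step_size;
        out.push_back(s.substr(offs + skip, window_size - skip));
    }
    return out;
}
```

For a string of length 10 with window_size 4 and step_size 2, this gives (10 - 4)/2 + 1 = 4 windows.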

bool obtain_from_char ( CStringFeatures< char > *  sf,
int32_t  start,
int32_t  p_order,
int32_t  gap,
bool  rev 
)

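obtain string features from char features

wrapper for the template method obtain_from_char_features

Parameters:
sf: string features
start: start
p_order: order of the higher-order mapping
gap: gap
rev: whether to reverse
Returns:
whether obtaining was successful

Definition at line 1551 of file StringFeatures.h.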

bool obtain_from_char_features ( CStringFeatures< CT > *  sf,
int32_t  start,
int32_t  p_order,
int32_t  gap,
bool  rev 
)

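template method for obtaining string features from char features

Parameters:
sf: string features
start: start
p_order: order of the higher-order mapping
gap: gap
rev: whether to reverse
Returns:
whether obtaining was successful

Definition at line 1566 of file StringFeatures.h.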

virtual void save ( CFile *  writer ) [virtual]

save features to file

Parameters:
writer: File object via which to save data

Reimplemented from CFeatures.

virtual bool save_compressed ( char *  dest,
E_COMPRESSION_TYPE  compression,
int  level 
) [virtual]

save compressed features to file

Parameters:
dest: filename to save to
compression: compressor to use
level: compression level to use (1-9)
Returns:
whether saving was successful

Definition at line 1342 of file StringFeatures.h.

void set_feature_vector ( ST *  src,
int32_t  len,
int32_t  num 
)

set string for selected example num

Parameters:
src: source vector whose content will be stored
len: number of features in vector
num: index of the string

Definition at line 379 of file StringFeatures.h.

virtual void set_feature_vector ( int32_t  num,
ST *  string,
int32_t  len 
) [virtual]

set feature vector for sample num

Parameters:
num: index of feature vector
string: string with the feature vector's content
len: length of the string

Definition at line 1820 of file StringFeatures.h.

bool set_features ( T_STRING< ST > *  p_features,
int32_t  p_num_vectors,
int32_t  p_max_string_length 
)

set features

Parameters:
p_features: new features
p_num_vectors: number of vectors
p_max_string_length: maximum string length
Returns:
whether setting was successful

Definition at line 1074 of file StringFeatures.h.

ST shift_offset ( ST  offset,
int32_t  amount 
)

shift offset to the left by amount

Parameters:
offset: offset to shift
amount: amount to shift the offset
Returns:
shifted offset

Definition at line 625 of file StringFeatures.h.

ST shift_symbol ( ST  symbol,
int32_t  amount 
)

shift symbol to the right by amount (taking care of custom symbol sizes)

Parameters:
symbol: symbol to shift
amount: amount to shift the symbol
Returns:
shifted symbol

Definition at line 637 of file StringFeatures.h.
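Suppose each symbol occupies a fixed number of bits. Here bits_per_symbol is a hypothetical parameter standing in for the custom symbol size mentioned above. Then shifting by amount symbols is a bit shift by amount * bits_per_symbol: to the left for shift_offset and to the right for shift_symbol. The function names below are illustrative only.

```cpp
#include <cstdint>

// Hypothetical sketch: shift by whole symbols of bits_per_symbol bits each.
uint64_t shift_offset_sketch(uint64_t offset, int amount, int bits_per_symbol) {
    return offset << (amount * bits_per_symbol);  // left: towards higher symbol positions
}

uint64_t shift_symbol_sketch(uint64_t symbol, int amount, int bits_per_symbol) {
    return symbol >> (amount * bits_per_symbol);  // right: drops the lowest symbols
}
```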

void unembed_word ( ST  word,
uint8_t *  seq,
int32_t  len 
)

remap bit-based word to character sequence

Parameters:
word: word to remap
seq: sequence of size len that remapped characters are written to
len: length of sequence and word

Definition at line 1754 of file StringFeatures.h.
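As a hypothetical illustration of remapping, assume a 2-bit code A=0, C=1, G=2, T=3, with the first symbol stored in the most significant position. The last symbol then sits in the lowest two bits, so the word can be decoded from its low end while the sequence is filled from the back. `unembed_dna_word` is an illustrative name, not Shogun's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch: decode a 2-bit-per-symbol DNA word of len symbols.
std::string unembed_dna_word(uint64_t word, int32_t len) {
    static const char symbols[] = {'A', 'C', 'G', 'T'};
    std::string seq(static_cast<std::size_t>(len), 'A');
    for (int32_t i = len - 1; i >= 0; --i) {
        seq[static_cast<std::size_t>(i)] = symbols[word & 3u];  // lowest 2 bits
        word >>= 2;  // move on to the previous symbol
    }
    return seq;
}
```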


Member Data Documentation

CAlphabet* alphabet [protected]

alphabet

Definition at line 2026 of file StringFeatures.h.

CCache<ST>* feature_cache [protected]

feature cache

Definition at line 2059 of file StringFeatures.h.

T_STRING<ST>* features [protected]

this contains the array of features.

Definition at line 2032 of file StringFeatures.h.

int32_t length_of_single_string [protected]

length of prior single string

Definition at line 2038 of file StringFeatures.h.

int32_t max_string_length [protected]

length of longest string

Definition at line 2041 of file StringFeatures.h.

floatmax_t num_symbols [protected]

number of used symbols

Definition at line 2044 of file StringFeatures.h.

int32_t num_vectors [protected]

number of string vectors

Definition at line 2029 of file StringFeatures.h.

int32_t order [protected]

order used in higher order mapping

Definition at line 2050 of file StringFeatures.h.

floatmax_t original_num_symbols [protected]

original number of used symbols (before higher order mapping)

Definition at line 2047 of file StringFeatures.h.

bool preprocess_on_get [protected]

preprocess on-the-fly?

Definition at line 2056 of file StringFeatures.h.

ST* single_string [protected]

non-NULL when the features are windows into a single string (e.g. created by sliding window)

Definition at line 2035 of file StringFeatures.h.

ST* symbol_mask_table [protected]

symbol mask table, required to access bit-based symbols

Definition at line 2053 of file StringFeatures.h.


The documentation for this class was generated from the following file:
StringFeatures.h

SHOGUN Machine Learning Toolbox - Documentation