The CommWordString kernel may be used to compute the spectrum kernel from strings that have been mapped into unsigned 16-bit integers.
These 16-bit integers correspond to k-mers. To be applicable in this kernel, they need to be sorted (e.g. via the SortWordString preprocessor).
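A minimal sketch of such a mapping, assuming DNA input, overlapping k-mers, and a 2-bit letter encoding A=0, C=1, G=2, T=3. The helper encode_sorted_kmers is hypothetical and not part of Shogun, and the letter ordering used by the library's own alphabet may differ. The final sort plays the role that a sorting preprocessor such as SortWordString has in the original pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Shogun API): map a DNA string to the sorted list
// of its overlapping k-mers, each packed into a 16-bit word with 2 bits per
// letter (assumed encoding A=0, C=1, G=2, T=3).
std::vector<uint16_t> encode_sorted_kmers(const std::string& seq, int k) {
    // 2 bits per letter times k letters must fit into 16 bits, so k <= 8.
    if (k < 1 || 2 * k > 16)
        throw std::invalid_argument("k must be in [1, 8] for 16-bit words");

    std::vector<uint16_t> kmers;
    for (std::size_t start = 0; start + k <= seq.size(); ++start) {
        uint32_t code = 0;
        for (int j = 0; j < k; ++j) {
            uint32_t sym;
            switch (seq[start + j]) {
                case 'A': sym = 0; break;
                case 'C': sym = 1; break;
                case 'G': sym = 2; break;
                case 'T': sym = 3; break;
                default: throw std::invalid_argument("unexpected letter");
            }
            code = (code << 2) | sym;  // append 2 bits for this letter
        }
        kmers.push_back(static_cast<uint16_t>(code));
    }
    // The kernel requires sorted words.
    std::sort(kmers.begin(), kmers.end());
    return kmers;
}
```

For example, "ACGT" with k = 2 yields the words AC, CG, GT, i.e. 1, 6, 11.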
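The kernel basically uses the algorithm of the Unix "comm" command (hence the name) to compute:

$$k(\mathbf{x}, \mathbf{x}') = \Phi_k(\mathbf{x}) \cdot \Phi_k(\mathbf{x}')$$

where $\Phi_k$ maps a sequence $\mathbf{x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector, each entry denotes how often the corresponding k-mer appears in $\mathbf{x}$.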
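The comm-style evaluation of this dot product can be sketched as a single merge over two sorted k-mer lists: like `comm`, it advances whichever list holds the smaller word, and for every word present in both it multiplies the two run lengths, which is exactly that word's term in $\Phi_k(\mathbf{x}) \cdot \Phi_k(\mathbf{x}')$. This is an illustrative sketch, not Shogun's implementation; the name comm_spectrum_kernel is hypothetical and the use_sign variant is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a comm-style spectrum kernel over two SORTED lists of 16-bit
// k-mer codes (repeats allowed). Each k-mer shared by both lists contributes
// count_a * count_b, the product of its occurrence counts.
double comm_spectrum_kernel(const std::vector<uint16_t>& a,
                            const std::vector<uint16_t>& b) {
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;  // word only in a
        } else if (a[i] > b[j]) {
            ++j;  // word only in b
        } else {
            // Common word: measure its run length in each list.
            const uint16_t word = a[i];
            std::size_t count_a = 0, count_b = 0;
            while (i < a.size() && a[i] == word) { ++i; ++count_a; }
            while (j < b.size() && b[j] == word) { ++j; ++count_b; }
            result += static_cast<double>(count_a) * static_cast<double>(count_b);
        }
    }
    return result;
}
```

Both indices only move forward, so the cost is O(|a| + |b|); unsorted input would silently produce wrong values, which is why the words must be sorted beforehand.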
Note that this representation is especially tuned to small alphabets (such as the 2-bit DNA alphabet), for which it enables spectrum kernels of order up to 8: with 2 bits per letter, at most 8 letters fit into a 16-bit word.
For this kernel, the linadd speedups are implemented efficiently using direct maps, i.e. arrays indexed directly by the 16-bit k-mer value.
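To illustrate the direct-map idea: with linadd, a kernel expansion $f(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{x}_i, \mathbf{x})$ is rewritten as $\mathbf{w} \cdot \Phi_k(\mathbf{x})$ with $\mathbf{w} = \sum_i \alpha_i \Phi_k(\mathbf{x}_i)$. Because every k-mer is a 16-bit word, $\mathbf{w}$ can be stored as a dense array of $2^{16}$ weights. The class below, DirectMapNormal, is hypothetical (not Shogun's API); it only mirrors the roles of clear_normal, add_to_normal, and compute_optimized under that assumption and ignores use_sign.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical illustration of a linadd normal vector stored as a direct map:
// one weight slot per possible 16-bit word.
class DirectMapNormal {
public:
    DirectMapNormal() : weights_(1u << 16, 0.0) {}  // 65536 possible words

    // Role of clear_normal(): reset all weights to zero.
    void clear() { std::fill(weights_.begin(), weights_.end(), 0.0); }

    // Role of add_to_normal(): w += weight * Phi_k(x). Each occurrence adds
    // weight once, so a word seen c times receives c * weight.
    void add(const std::vector<uint16_t>& kmers, double weight) {
        for (uint16_t w : kmers) weights_[w] += weight;
    }

    // Role of compute_optimized(): return w . Phi_k(x), one lookup per k-mer.
    double score(const std::vector<uint16_t>& kmers) const {
        double sum = 0.0;
        for (uint16_t w : kmers) sum += weights_[w];
        return sum;
    }

private:
    std::vector<double> weights_;
};
```

Scoring a sequence then costs one array lookup per k-mer, independent of the number of support vectors, at the price of $2^{16}$ doubles (512 KiB) of memory; neither adding nor scoring requires sorted input.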
Definition at line 44 of file CommWordStringKernel.h.
Public Member Functions
  CCommWordStringKernel (int32_t size, bool use_sign)
  CCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
  virtual ~CCommWordStringKernel ()
  virtual bool init (CFeatures *l, CFeatures *r)
  virtual void cleanup ()
  bool load_init (FILE *src)
  bool save_init (FILE *dest)
  virtual EKernelType get_kernel_type ()
  virtual const char * get_name () const
  virtual bool init_dictionary (int32_t size)
  virtual bool init_optimization (int32_t count, int32_t *IDX, float64_t *weights)
  virtual bool delete_optimization ()
  virtual float64_t compute_optimized (int32_t idx)
  virtual void add_to_normal (int32_t idx, float64_t weight)
  virtual void clear_normal ()
  virtual EFeatureType get_feature_type ()
  void get_dictionary (int32_t &dsize, float64_t *&dweights)
  virtual float64_t * compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init=true)
  char * compute_consensus (int32_t &num_feat, int32_t num_suppvec, int32_t *IDX, float64_t *alphas)
  void set_use_dict_diagonal_optimization (bool flag)
  bool get_use_dict_diagonal_optimization ()
Protected Member Functions
  virtual float64_t compute (int32_t idx_a, int32_t idx_b)
  virtual float64_t compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort)
  virtual float64_t compute_diag (int32_t idx_a)
Protected Attributes
  int32_t dictionary_size
  float64_t * dictionary_weights
  bool use_sign
  bool use_dict_diagonal_optimization
  int32_t * dict_diagonal_optimization
Friends
  class CSqrtDiagKernelNormalizer
  class CAvgDiagKernelNormalizer
  class CFirstElementKernelNormalizer
  class CTanimotoKernelNormalizer
  class CDiceKernelNormalizer
CCommWordStringKernel::CCommWordStringKernel (int32_t size, bool use_sign)
Constructor.
Parameters:
  size: cache size
  use_sign: whether the sign shall be used
Definition at line 17 of file CommWordStringKernel.cpp.
CCommWordStringKernel::CCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign = false, int32_t size = 10)
Constructor.
Parameters:
  l: features of the left-hand side
  r: features of the right-hand side
  use_sign: whether the sign shall be used
  size: cache size
Definition at line 26 of file CommWordStringKernel.cpp.
CCommWordStringKernel::~CCommWordStringKernel () [virtual]
Destructor.
Definition at line 51 of file CommWordStringKernel.cpp.
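void CCommWordStringKernel::add_to_normal (int32_t idx, float64_t weight) [virtual]
Add the k-mer spectrum of feature vector idx, scaled by weight, to the normal vector used by the linadd optimization.
Parameters:
  idx: index of the feature vector to add
  weight: factor by which the vector is scaled before it is added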
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 238 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::cleanup () [virtual]
Clean up the kernel.
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 73 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::clear_normal () [virtual]
Clear the normal vector.
virtual float64_t CCommWordStringKernel::compute (int32_t idx_a, int32_t idx_b) [protected, virtual]
Compute the kernel function for features a and b, where idx_a and idx_b are the indices of the feature vectors in the corresponding feature objects.
Parameters:
  idx_a: index a
  idx_b: index b
Implements CKernel.
Definition at line 222 of file CommWordStringKernel.h.
char * CCommWordStringKernel::compute_consensus (int32_t &num_feat, int32_t num_suppvec, int32_t *IDX, float64_t *alphas)
Compute the consensus string from the support vectors and their weights.
Parameters:
  num_feat: number of features (passed by reference)
  num_suppvec: number of support vectors
  IDX: indices of the support vectors
  alphas: weights (alphas) of the support vectors
Definition at line 490 of file CommWordStringKernel.cpp.
float64_t CCommWordStringKernel::compute_diag (int32_t idx_a) [protected, virtual]
Helper that computes only the diagonal entry k(x_a, x_a), which is needed for normalization during training.
Parameters:
  idx_a: index a
Definition at line 89 of file CommWordStringKernel.cpp.
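For the diagonal alone, no second sequence has to be merged: $k(\mathbf{x}, \mathbf{x}) = \sum_w c_w(\mathbf{x})^2$, where $c_w(\mathbf{x})$ counts word $w$ in $\mathbf{x}$. The sketch below computes this with a scratch array of per-word counters, which is one plausible reading of the dict_diagonal_optimization attribute ("array to hold counters for all strings"); the function diag_with_counters and its two-pass scheme are assumptions, not Shogun's code. Unlike the merge, it does not require sorted input.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: k(x, x) = sum over distinct words w of count_w(x)^2.
// `scratch` must have one zero-initialized slot per 16-bit word (65536
// entries); it is returned to all zeros so it can be reused across calls.
double diag_with_counters(const std::vector<uint16_t>& kmers,
                          std::vector<int32_t>& scratch) {
    // Pass 1: count every occurrence of each word.
    for (uint16_t w : kmers) ++scratch[w];

    // Pass 2: add count^2 once per distinct word, then zero the slot so
    // later occurrences of the same word are skipped.
    double result = 0.0;
    for (uint16_t w : kmers) {
        if (scratch[w] != 0) {
            result += static_cast<double>(scratch[w]) * scratch[w];
            scratch[w] = 0;
        }
    }
    return result;
}
```

Both passes touch only the words that occur in the sequence, so the scratch array never has to be cleared in full between calls.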
float64_t CCommWordStringKernel::compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort) [protected, virtual]
Helper for compute.
Parameters:
  idx_a: index a
  idx_b: index b
  do_sort: whether sorting shall be performed
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 131 of file CommWordStringKernel.cpp.
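float64_t CCommWordStringKernel::compute_optimized (int32_t idx) [virtual]
Compute the kernel output for feature vector idx using the precomputed normal vector (linadd optimization).
Parameters:
  idx: index of the feature vector to compute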
Reimplemented from CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 320 of file CommWordStringKernel.cpp.
float64_t * CCommWordStringKernel::compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init = true) [virtual]
Compute scoring.
Parameters:
  max_degree: maximum degree
  num_feat: number of features
  num_sym: number of symbols
  target: target
  num_suppvec: number of support vectors
  IDX: indices of the support vectors
  alphas: weights (alphas) of the support vectors
  do_init: whether initialization shall be performed
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 367 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::delete_optimization () [virtual]
Delete the optimization.
Reimplemented from CKernel.
Definition at line 312 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::get_dictionary (int32_t &dsize, float64_t *&dweights)
Get the dictionary.
Parameters:
  dsize: receives the dictionary size
  dweights: receives the dictionary weights
Definition at line 160 of file CommWordStringKernel.h.
virtual EFeatureType CCommWordStringKernel::get_feature_type () [virtual]
Return the feature type the kernel can deal with.
Reimplemented from CStringKernel< uint16_t >.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 153 of file CommWordStringKernel.h.
virtual EKernelType CCommWordStringKernel::get_kernel_type () [virtual]
Return the kernel type.
Implements CKernel.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 102 of file CommWordStringKernel.h.
virtual const char* CCommWordStringKernel::get_name () const [virtual]
Return the kernel's name.
Implements CSGObject.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 108 of file CommWordStringKernel.h.
bool CCommWordStringKernel::get_use_dict_diagonal_optimization ()
Return whether the dictionary diagonal optimization is used.
Definition at line 208 of file CommWordStringKernel.h.
bool CCommWordStringKernel::init (CFeatures *l, CFeatures *r) [virtual]
Initialize the kernel.
Parameters:
  l: features of the left-hand side
  r: features of the right-hand side
Reimplemented from CStringKernel< uint16_t >.
Reimplemented in CWeightedCommWordStringKernel.
Definition at line 59 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::init_dictionary (int32_t size) [virtual]
bool CCommWordStringKernel::init_optimization (int32_t count, int32_t *IDX, float64_t *weights) [virtual]
Initialize the linadd optimization.
Parameters:
  count: number of support vectors
  IDX: indices of the support vectors
  weights: weights of the support vectors
Reimplemented from CKernel.
Definition at line 286 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::load_init (FILE *src) [virtual]
Load the kernel's init_data.
Parameters:
  src: file to load from
Implements CKernel.
Definition at line 79 of file CommWordStringKernel.cpp.
bool CCommWordStringKernel::save_init (FILE *dest) [virtual]
Save the kernel's init_data.
Parameters:
  dest: file to save to
Implements CKernel.
Definition at line 84 of file CommWordStringKernel.cpp.
void CCommWordStringKernel::set_use_dict_diagonal_optimization (bool flag)
Enable or disable the dictionary diagonal optimization.
Parameters:
  flag: whether to enable the diagonal optimization
Definition at line 199 of file CommWordStringKernel.h.
friend class CAvgDiagKernelNormalizer [friend]
Reimplemented from CKernel.
Definition at line 47 of file CommWordStringKernel.h.
friend class CDiceKernelNormalizer [friend]
Reimplemented from CKernel.
Definition at line 50 of file CommWordStringKernel.h.
friend class CFirstElementKernelNormalizer [friend]
Reimplemented from CKernel.
Definition at line 48 of file CommWordStringKernel.h.
friend class CSqrtDiagKernelNormalizer [friend]
Reimplemented from CKernel.
Definition at line 46 of file CommWordStringKernel.h.
friend class CTanimotoKernelNormalizer [friend]
Reimplemented from CKernel.
Definition at line 49 of file CommWordStringKernel.h.
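The friend classes above are kernel normalizers, which rescale kernel values using diagonal entries $k(\mathbf{x}, \mathbf{x})$ (presumably why they are granted access to compute_diag). The standard formulas that the square-root-diagonal, Tanimoto, and Dice names refer to are sketched below; the function names are hypothetical, and Shogun's classes may differ in details such as the handling of zero diagonals. The average-diagonal and first-element normalizers are not shown.

```cpp
#include <cassert>
#include <cmath>

// Each function takes the raw kernel value k(x, y) and the diagonal values
// k(x, x) and k(y, y). No guard against zero diagonals (empty sequences).

// Square-root diagonal: k(x, y) / sqrt(k(x, x) * k(y, y)); yields k~(x, x) = 1.
double sqrt_diag_normalize(double kxy, double kxx, double kyy) {
    return kxy / std::sqrt(kxx * kyy);
}

// Tanimoto: k(x, y) / (k(x, x) + k(y, y) - k(x, y)).
double tanimoto_normalize(double kxy, double kxx, double kyy) {
    return kxy / (kxx + kyy - kxy);
}

// Dice: 2 k(x, y) / (k(x, x) + k(y, y)).
double dice_normalize(double kxy, double kxx, double kyy) {
    return 2.0 * kxy / (kxx + kyy);
}
```

All three map identical sequences to 1, so spectrum kernel values become comparable across sequences of different lengths.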
int32_t* CCommWordStringKernel::dict_diagonal_optimization [protected]
Array to hold counters for all strings.
Definition at line 257 of file CommWordStringKernel.h.
int32_t CCommWordStringKernel::dictionary_size [protected]
Size of the dictionary (number of possible strings).
Definition at line 246 of file CommWordStringKernel.h.
float64_t* CCommWordStringKernel::dictionary_weights [protected]
Dictionary weights: array to hold counters for all possible strings.
Definition at line 249 of file CommWordStringKernel.h.
bool CCommWordStringKernel::use_dict_diagonal_optimization [protected]
Whether the diagonal optimization shall be used.
Definition at line 255 of file CommWordStringKernel.h.
bool CCommWordStringKernel::use_sign [protected]
Whether the sign shall be used.
Definition at line 252 of file CommWordStringKernel.h.