CCommWordStringKernel Class Reference


Detailed Description

The CommWordString kernel may be used to compute the spectrum kernel from strings that have been mapped into unsigned 16-bit integers.

These 16-bit integers correspond to k-mers. To be applicable in this kernel, they need to be sorted (e.g. via the SortWordString pre-processor).
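To illustrate the expected input, the following sketch packs every DNA k-mer into one 16-bit code (2 bits per nucleotide) and then sorts the codes, which is the ordering the kernel requires. The nucleotide-to-code assignment (A=0, C=1, G=2, T=3) and the helper names nucleotide_code and sorted_kmer_codes are hypothetical and chosen only for this example; they are not part of Shogun.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: map a nucleotide to a 2-bit code.
// The assignment A=0, C=1, G=2, T=3 is an assumption for illustration.
inline uint16_t nucleotide_code(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default: throw std::invalid_argument("unexpected nucleotide");
    }
}

// Hypothetical helper: slide a window of length k over the sequence, pack each
// k-mer into one uint16_t (k <= 8, so 2*k <= 16 bits), then sort the codes.
std::vector<uint16_t> sorted_kmer_codes(const std::string& seq, int k)
{
    if (k < 1 || k > 8)
        throw std::invalid_argument("k must be in [1, 8]");

    std::vector<uint16_t> codes;
    for (std::size_t i = 0; i + k <= seq.size(); ++i)
    {
        uint16_t code = 0;
        for (int j = 0; j < k; ++j)
            code = static_cast<uint16_t>((code << 2) | nucleotide_code(seq[i + j]));
        codes.push_back(code);
    }
    std::sort(codes.begin(), codes.end()); // the kernel expects sorted k-mer codes
    return codes;
}
```

For example, "ACGTA" with k = 2 yields the k-mers AC, CG, GT, TA, i.e. the codes 1, 6, 11, 12.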

It uses the algorithm of the Unix "comm" command (hence the name) to compute:

\[ k({\bf x},{\bf x'})= \Phi_k({\bf x})\cdot \Phi_k({\bf x'}) \]

where $\Phi_k$ maps a sequence ${\bf x}$ that consists of letters in $\Sigma$ to a feature vector of size $|\Sigma|^k$. In this feature vector, each entry denotes how often the corresponding k-mer appears in ${\bf x}$.
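The comm-style computation can be sketched as a single merge over the two sorted code arrays: runs of equal codes are matched, and each shared k-mer contributes the product of its two counts, which is exactly its term in $\Phi_k({\bf x})\cdot \Phi_k({\bf x'})$. This is a standalone sketch rather than the library's code; comm_word_kernel is a hypothetical name, and the use_sign branch assumes that use_sign replaces each count by its sign (presence or absence).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a comm-style merge over two *sorted* k-mer code arrays.
// Each run of equal codes is one k-mer; the run length is the k-mer's count.
//   use_sign == false: add count_a * count_b (spectrum kernel term)
//   use_sign == true : add 1 per shared k-mer (assumed meaning of use_sign)
double comm_word_kernel(const std::vector<uint16_t>& a,
                        const std::vector<uint16_t>& b,
                        bool use_sign)
{
    double result = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (a[i] < b[j])
            ++i;                  // k-mer only in a: contributes nothing
        else if (b[j] < a[i])
            ++j;                  // k-mer only in b: contributes nothing
        else
        {
            const uint16_t sym = a[i];
            const std::size_t start_i = i, start_j = j;
            while (i < a.size() && a[i] == sym) ++i;   // skip the run in a
            while (j < b.size() && b[j] == sym) ++j;   // skip the run in b
            if (use_sign)
                result += 1.0;
            else
                result += static_cast<double>(i - start_i) *
                          static_cast<double>(j - start_j);
        }
    }
    return result;
}
```

Because both arrays are sorted, every element is visited once, so the cost is linear in the two sequence lengths and independent of the $|\Sigma|^k$ size of the feature space.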

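Note that this representation is especially tuned to small alphabets (such as the 2-bit DNA alphabet), for which it enables spectrum kernels of order up to 8: eight 2-bit symbols fill exactly one 16-bit word, giving $|\Sigma|^k = 4^8 = 65536$ possible k-mers.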

For this kernel, the linadd speedups are implemented efficiently using direct maps.
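A direct map here can be read as a dense array with one weight per possible 16-bit k-mer code, so building the normal vector ${\bf w}=\sum_i \alpha_i \Phi_k({\bf x}_i)$ and evaluating ${\bf w}\cdot\Phi_k({\bf x})$ each take one array access per k-mer. The class below is a hypothetical standalone sketch: its method names mirror add_to_normal, clear_normal and compute_optimized from this page, but its signatures are assumptions and not the Shogun API, and the use_sign handling again assumes presence/absence semantics.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of linadd with a direct map: one weight slot per
// possible 16-bit k-mer code, so no hashing or searching is needed.
class DirectMapNormal
{
public:
    explicit DirectMapNormal(bool use_sign)
        : weights_(std::size_t(1) << 16, 0.0), use_sign_(use_sign) {}

    // Fold one sorted example into w: w[c] += weight for each k-mer c.
    void add_to_normal(const std::vector<uint16_t>& sorted_codes, double weight)
    {
        for (std::size_t i = 0; i < sorted_codes.size(); ++i)
        {
            // With use_sign (assumed: presence only), count each distinct k-mer once.
            if (use_sign_ && i > 0 && sorted_codes[i] == sorted_codes[i - 1])
                continue;
            weights_[sorted_codes[i]] += weight;
        }
    }

    // Reset every weight to zero.
    void clear_normal() { std::fill(weights_.begin(), weights_.end(), 0.0); }

    // f(x) = w . Phi_k(x): one table lookup per k-mer of x.
    double compute_optimized(const std::vector<uint16_t>& sorted_codes) const
    {
        double result = 0.0;
        for (std::size_t i = 0; i < sorted_codes.size(); ++i)
        {
            if (use_sign_ && i > 0 && sorted_codes[i] == sorted_codes[i - 1])
                continue;
            result += weights_[sorted_codes[i]];
        }
        return result;
    }

private:
    std::vector<double> weights_;
    bool use_sign_;
};
```

The 65536-entry table of doubles occupies 512 KiB, which is the memory cost of replacing per-evaluation kernel sums over all support vectors with a single pass over the k-mers of ${\bf x}$.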

Definition at line 44 of file CommWordStringKernel.h.



Public Member Functions

 CCommWordStringKernel (int32_t size, bool use_sign)
 CCommWordStringKernel (CStringFeatures< uint16_t > *l, CStringFeatures< uint16_t > *r, bool use_sign=false, int32_t size=10)
virtual ~CCommWordStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual void cleanup ()
virtual bool load_init (FILE *src)
virtual bool save_init (FILE *dest)
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual bool init_dictionary (int32_t size)
virtual bool init_optimization (int32_t count, int32_t *IDX, float64_t *weights)
virtual bool delete_optimization ()
virtual float64_t compute_optimized (int32_t idx)
virtual void add_to_normal (int32_t idx, float64_t weight)
virtual void clear_normal ()
virtual EFeatureType get_feature_type ()
void get_dictionary (int32_t &dsize, float64_t *&dweights)
virtual float64_t * compute_scoring (int32_t max_degree, int32_t &num_feat, int32_t &num_sym, float64_t *target, int32_t num_suppvec, int32_t *IDX, float64_t *alphas, bool do_init=true)
char * compute_consensus (int32_t &num_feat, int32_t num_suppvec, int32_t *IDX, float64_t *alphas)
void set_use_dict_diagonal_optimization (bool flag)
bool get_use_dict_diagonal_optimization ()

Protected Member Functions

virtual float64_t compute (int32_t idx_a, int32_t idx_b)
virtual float64_t compute_helper (int32_t idx_a, int32_t idx_b, bool do_sort)
virtual float64_t compute_diag (int32_t idx_a)

Protected Attributes

int32_t dictionary_size
float64_t * dictionary_weights
bool use_sign
bool use_dict_diagonal_optimization
int32_t * dict_diagonal_optimization

Friends

class CSqrtDiagKernelNormalizer
class CAvgDiagKernelNormalizer
class CFirstElementKernelNormalizer
class CTanimotoKernelNormalizer
class CDiceKernelNormalizer

Constructor & Destructor Documentation

CCommWordStringKernel::CCommWordStringKernel ( int32_t  size,
bool  use_sign 
)

constructor

Parameters:
size cache size
use_sign whether the sign shall be used

Definition at line 17 of file CommWordStringKernel.cpp.

CCommWordStringKernel::CCommWordStringKernel ( CStringFeatures< uint16_t > *  l,
CStringFeatures< uint16_t > *  r,
bool  use_sign = false,
int32_t  size = 10 
)

constructor

Parameters:
l features of left-hand side
r features of right-hand side
use_sign whether the sign shall be used
size cache size

Definition at line 26 of file CommWordStringKernel.cpp.

CCommWordStringKernel::~CCommWordStringKernel (  )  [virtual]

Definition at line 51 of file CommWordStringKernel.cpp.


Member Function Documentation

void CCommWordStringKernel::add_to_normal ( int32_t  idx,
float64_t  weight 
) [virtual]

add to normal vector

Parameters:
idx index of the feature vector to add
weight weight with which to add it

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 238 of file CommWordStringKernel.cpp.

void CCommWordStringKernel::cleanup (  )  [virtual]

clean up kernel

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 73 of file CommWordStringKernel.cpp.

void CCommWordStringKernel::clear_normal (  )  [virtual]

clear normal

Reimplemented from CKernel.

Definition at line 280 of file CommWordStringKernel.cpp.

virtual float64_t CCommWordStringKernel::compute ( int32_t  idx_a,
int32_t  idx_b 
) [protected, virtual]

compute kernel function for features a and b; idx_a and idx_b denote the indices of the feature vectors in the corresponding feature objects

Parameters:
idx_a index a
idx_b index b
Returns:
computed kernel function at indices a,b

Implements CKernel.

Definition at line 222 of file CommWordStringKernel.h.

char * CCommWordStringKernel::compute_consensus ( int32_t &  num_feat,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas 
)

compute consensus

Parameters:
num_feat number of features
num_suppvec number of support vectors
IDX indices of the support vectors
alphas weights (alphas) of the support vectors
Returns:
computed consensus

Definition at line 490 of file CommWordStringKernel.cpp.

float64_t CCommWordStringKernel::compute_diag ( int32_t  idx_a  )  [protected, virtual]

helper to compute only the diagonal entry k(x_a, x_a), used for normalization during training

Parameters:
idx_a index a
Returns:
unnormalized diagonal value

Definition at line 89 of file CommWordStringKernel.cpp.
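A diagonal entry $k({\bf x},{\bf x})$ can also be computed without sorting, using a dense histogram indexed by k-mer code. This sketch assumes, without confirmation from this page, that this is the role of the dict_diagonal_optimization counter array ("array to hold counters for all strings") when use_dict_diagonal_optimization is enabled; diag_with_counters is a hypothetical name, and the use_sign branch again assumes presence/absence semantics.

```cpp
#include <cstdint>
#include <vector>

// Sketch: k(x, x) from a dense counter array indexed by k-mer code.
// counters must have at least 65536 entries, all zero on entry;
// it is left all zero on return so the buffer can be reused.
double diag_with_counters(const std::vector<uint16_t>& codes,
                          std::vector<int32_t>& counters,
                          bool use_sign)
{
    for (uint16_t c : codes)
        ++counters[c];               // histogram of k-mer occurrences (input need not be sorted)

    double result = 0.0;
    for (uint16_t c : codes)
    {
        const int32_t n = counters[c];
        if (n == 0)
            continue;                // this k-mer was already accounted for
        result += use_sign ? 1.0 : static_cast<double>(n) * n;
        counters[c] = 0;             // reset only the slots that were touched
    }
    return result;
}
```

Resetting only the touched slots keeps each call proportional to the length of ${\bf x}$ rather than to the size of the 65536-entry table.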

float64_t CCommWordStringKernel::compute_helper ( int32_t  idx_a,
int32_t  idx_b,
bool  do_sort 
) [protected, virtual]

helper for compute

Parameters:
idx_a index a
idx_b index b
do_sort whether the feature vectors shall be sorted first
Returns:
computed value

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 131 of file CommWordStringKernel.cpp.

float64_t CCommWordStringKernel::compute_optimized ( int32_t  idx  )  [virtual]

compute the output for example idx using the optimized (linadd) normal vector

Parameters:
idx index to compute
Returns:
optimized value at given index

Reimplemented from CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 320 of file CommWordStringKernel.cpp.

float64_t * CCommWordStringKernel::compute_scoring ( int32_t  max_degree,
int32_t &  num_feat,
int32_t &  num_sym,
float64_t *  target,
int32_t  num_suppvec,
int32_t *  IDX,
float64_t *  alphas,
bool  do_init = true 
) [virtual]

compute scoring

Parameters:
max_degree maximum degree
num_feat number of features
num_sym number of symbols
target target
num_suppvec number of support vectors
IDX indices of the support vectors
alphas weights (alphas) of the support vectors
do_init whether initialization shall be performed
Returns:
computed scores

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 367 of file CommWordStringKernel.cpp.

bool CCommWordStringKernel::delete_optimization (  )  [virtual]

delete optimization

Returns:
true if deleting was successful

Reimplemented from CKernel.

Definition at line 312 of file CommWordStringKernel.cpp.

void CCommWordStringKernel::get_dictionary ( int32_t &  dsize,
float64_t *&  dweights 
)

get dictionary

Parameters:
dsize dictionary size will be stored in here
dweights dictionary weights will be stored in here

Definition at line 160 of file CommWordStringKernel.h.

virtual EFeatureType CCommWordStringKernel::get_feature_type (  )  [virtual]

return the feature type the kernel can deal with

Returns:
feature type WORD

Reimplemented from CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 153 of file CommWordStringKernel.h.

virtual EKernelType CCommWordStringKernel::get_kernel_type (  )  [virtual]

return the type of this kernel

Returns:
kernel type COMMWORDSTRING

Implements CKernel.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 102 of file CommWordStringKernel.h.

virtual const char* CCommWordStringKernel::get_name (  )  const [virtual]

return the kernel's name

Returns:
name CommWordString

Implements CSGObject.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 108 of file CommWordStringKernel.h.

bool CCommWordStringKernel::get_use_dict_diagonal_optimization (  ) 

get whether the dictionary diagonal optimization is enabled

Returns:
true if diagonal optimization is on

Definition at line 208 of file CommWordStringKernel.h.

bool CCommWordStringKernel::init ( CFeatures *  l,
CFeatures *  r 
) [virtual]

initialize kernel

Parameters:
l features of left-hand side
r features of right-hand side
Returns:
true if initializing was successful

Reimplemented from CStringKernel< uint16_t >.

Reimplemented in CWeightedCommWordStringKernel.

Definition at line 59 of file CommWordStringKernel.cpp.

bool CCommWordStringKernel::init_dictionary ( int32_t  size  )  [virtual]

initialize dictionary

Parameters:
size dictionary size (number of possible k-mers)

Definition at line 40 of file CommWordStringKernel.cpp.

bool CCommWordStringKernel::init_optimization ( int32_t  count,
int32_t *  IDX,
float64_t *  weights 
) [virtual]

initialize optimization

Parameters:
count number of support vectors
IDX indices of the support vectors
weights weights of the support vectors
Returns:
true if initializing was successful

Reimplemented from CKernel.

Definition at line 286 of file CommWordStringKernel.cpp.

bool CCommWordStringKernel::load_init ( FILE *  src  )  [virtual]

load kernel init_data

Parameters:
src file to load from
Returns:
true if loading was successful

Implements CKernel.

Definition at line 79 of file CommWordStringKernel.cpp.

bool CCommWordStringKernel::save_init ( FILE *  dest  )  [virtual]

save kernel init_data

Parameters:
dest file to save to
Returns:
true if saving was successful

Implements CKernel.

Definition at line 84 of file CommWordStringKernel.cpp.

void CCommWordStringKernel::set_use_dict_diagonal_optimization ( bool  flag  ) 

enable or disable the dictionary diagonal optimization

Parameters:
flag true to enable diagonal optimization

Definition at line 199 of file CommWordStringKernel.h.


Friends And Related Function Documentation

friend class CAvgDiagKernelNormalizer [friend]

Reimplemented from CKernel.

Definition at line 47 of file CommWordStringKernel.h.

friend class CDiceKernelNormalizer [friend]

Reimplemented from CKernel.

Definition at line 50 of file CommWordStringKernel.h.

friend class CFirstElementKernelNormalizer [friend]

Reimplemented from CKernel.

Definition at line 48 of file CommWordStringKernel.h.

friend class CSqrtDiagKernelNormalizer [friend]

Reimplemented from CKernel.

Definition at line 46 of file CommWordStringKernel.h.

friend class CTanimotoKernelNormalizer [friend]

Reimplemented from CKernel.

Definition at line 49 of file CommWordStringKernel.h.


Member Data Documentation

int32_t * CCommWordStringKernel::dict_diagonal_optimization [protected]

array to hold counters for all strings

Definition at line 257 of file CommWordStringKernel.h.

int32_t CCommWordStringKernel::dictionary_size [protected]

size of dictionary (number of possible strings)

Definition at line 246 of file CommWordStringKernel.h.

float64_t * CCommWordStringKernel::dictionary_weights [protected]

dictionary weights - array to hold counters for all possible strings

Definition at line 249 of file CommWordStringKernel.h.

bool CCommWordStringKernel::use_dict_diagonal_optimization [protected]

whether diagonal optimization shall be used

Definition at line 255 of file CommWordStringKernel.h.

bool CCommWordStringKernel::use_sign [protected]

whether the sign shall be used

Definition at line 252 of file CommWordStringKernel.h.


The documentation for this class was generated from the following files:

CommWordStringKernel.h
CommWordStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation