COligoStringKernel Class Reference


Detailed Description

This class offers access to the Oligo Kernel introduced by Meinicke et al. in 2004.

The class provides functions to preprocess the data so that the kernel can be computed faster. The kernel function itself is then kernelOligoFast or kernelOligo.

The implementation still requires a significant speedup. It should work, but as it stands it is probably applicable only to small-scale academic problems.

Uses CSqrtDiagKernelNormalizer, as the vanilla kernel seems to be very diagonally dominant.

Definition at line 39 of file OligoStringKernel.h.

[Figure: inheritance diagram for COligoStringKernel]

Public Member Functions

 COligoStringKernel (int32_t cache_size, int32_t k, float64_t width)
virtual ~COligoStringKernel ()
virtual bool init (CFeatures *l, CFeatures *r)
virtual bool load_init (FILE *)
virtual bool save_init (FILE *)
virtual EKernelType get_kernel_type ()
virtual const char * get_name () const
virtual float64_t compute (int32_t x, int32_t y)
virtual void cleanup ()

Protected Member Functions

float64_t kernelOligoFast (const std::vector< std::pair< int32_t, float64_t > > &x, const std::vector< std::pair< int32_t, float64_t > > &y, int32_t max_distance=-1)
 returns the value of the oligo kernel for sequences 'x' and 'y'

Static Protected Member Functions

static void encodeOligo (const std::string &sequence, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::pair< int32_t, float64_t > > &values)
 encodes the signals of the sequence
static void getSequences (const std::vector< std::string > &sequences, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::vector< std::pair< int32_t, float64_t > > > &encoded_sequences)
 encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences'

Protected Attributes

int32_t k
float64_t width
float64_t * gauss_table

Constructor & Destructor Documentation

COligoStringKernel::COligoStringKernel ( int32_t  cache_size,
int32_t  k,
float64_t  width 
)

Constructor

Parameters:
cache_size cache size for kernel
k k-mer length
width kernel width, equivalent to 2*sigma^2

Definition at line 22 of file OligoStringKernel.cpp.

COligoStringKernel::~COligoStringKernel (  )  [virtual]

Destructor

Definition at line 28 of file OligoStringKernel.cpp.


Member Function Documentation

void COligoStringKernel::cleanup (  )  [virtual]

clean up your kernel

Reimplemented from CKernel.

Definition at line 33 of file OligoStringKernel.cpp.

float64_t COligoStringKernel::compute ( int32_t  x,
int32_t  y 
) [virtual]

Computes the kernel function for features a and b. idx_{a,b} denote the indices of the feature vectors in the corresponding feature objects.

abstract base method

Parameters:
x index a
y index b
Returns:
computed kernel function at indices a,b

Implements CKernel.

Definition at line 221 of file OligoStringKernel.cpp.

void COligoStringKernel::encodeOligo ( const std::string &  sequence,
uint32_t  k_mer_length,
const std::string &  allowed_characters,
std::vector< std::pair< int32_t, float64_t > > &  values 
) [static, protected]

encodes the signals of the sequence

This function stores the oligo function signals in 'values'.

The 'k_mer_length' and the 'allowed_characters' determine which signals are used. Every pair contains the position of the signal and a numerical value reflecting the signal. The numerical value represents the k_mer as a number in base n = |allowed_characters|. Example: The value of the k_mer CG for the allowed characters ACGT would be 1 * n^1 + 2 * n^0 = 6 (with n = 4).
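The encoding described above can be illustrated with a minimal stand-alone sketch. This is not the Shogun implementation: the function name encode_oligo_sketch, the 0-based positions, and the choice to skip k_mers containing characters outside the alphabet are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the Shogun code.
// Returns (position, value) pairs, where value is the k_mer read as a
// base-n number and n = alphabet.size().
std::vector<std::pair<int32_t, double>> encode_oligo_sketch(
    const std::string& sequence, std::size_t k, const std::string& alphabet)
{
    assert(k > 0 && !alphabet.empty());
    std::vector<std::pair<int32_t, double>> values;
    const double n = static_cast<double>(alphabet.size());

    // Slide a window of length k over the sequence.
    for (std::size_t pos = 0; pos + k <= sequence.size(); ++pos) {
        double value = 0.0;
        bool valid = true;
        for (std::size_t j = 0; j < k; ++j) {
            // The digit of a character is its index in the alphabet.
            const std::size_t digit = alphabet.find(sequence[pos + j]);
            if (digit == std::string::npos) { valid = false; break; }
            value = value * n + static_cast<double>(digit);
        }
        // Assumption: k_mers with characters outside the alphabet are skipped.
        if (valid)
            values.emplace_back(static_cast<int32_t>(pos), value);
    }
    return values;
}
```

Under these assumptions, encoding the sequence ACGT with k = 2 and the alphabet ACGT yields three pairs, and the pair for CG at position 1 carries the value 6, matching the example above.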

Definition at line 55 of file OligoStringKernel.cpp.

virtual EKernelType COligoStringKernel::get_kernel_type (  )  [virtual]

return what type of kernel we are

Returns:
kernel type OLIGO

Implements CKernel.

Definition at line 82 of file OligoStringKernel.h.

virtual const char* COligoStringKernel::get_name (  )  const [virtual]

return the kernel's name

Returns:
name Oligo

Implements CSGObject.

Definition at line 88 of file OligoStringKernel.h.

void COligoStringKernel::getSequences ( const std::vector< std::string > &  sequences,
uint32_t  k_mer_length,
const std::string &  allowed_characters,
std::vector< std::vector< std::pair< int32_t, float64_t > > > &  encoded_sequences 
) [static, protected]

encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences'

This function encodes the sequences of 'sequences' via the function encodeOligo.

Definition at line 113 of file OligoStringKernel.cpp.

bool COligoStringKernel::init ( CFeatures *  l,
CFeatures *  r 
) [virtual]

initialize kernel

Parameters:
l features of left-hand side
r features of right-hand side
Returns:
if initializing was successful

Reimplemented from CStringKernel< char >.

Definition at line 41 of file OligoStringKernel.cpp.

float64_t COligoStringKernel::kernelOligoFast ( const std::vector< std::pair< int32_t, float64_t > > &  x,
const std::vector< std::pair< int32_t, float64_t > > &  y,
int32_t  max_distance = -1 
) [protected]

returns the value of the oligo kernel for sequences 'x' and 'y'

This function computes the kernel value of the oligo kernel, which was introduced by Meinicke et al. in 2004. 'x' and 'y' are encoded by encodeOligo and 'exp_cache' has to be constructed by getExpFunctionCache.

'max_distance' can be used to speed up the computation even further by restricting the maximum distance between a k_mer at position i in sequence 'x' and a k_mer at position j in sequence 'y'. If |i - j| > 'max_distance', the value is not added to the kernel value. This approximation is switched off by default (max_distance < 0).
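The role of 'max_distance' can be seen in a naive quadratic sketch of the computation. This is an illustration only, not the code in OligoStringKernel.cpp: the name oligo_kernel_sketch and the Encoded alias are hypothetical, and it is assumed that every pair of identical k_mers contributes a Gaussian term exp(-(i - j)^2 / width), with width equivalent to 2*sigma^2 as stated for the constructor. No exp cache is used here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// (position, k_mer value) pairs, as produced by an oligo encoding.
using Encoded = std::vector<std::pair<int32_t, double>>;

// Hypothetical sketch: sums a Gaussian of the position difference over
// all pairs of identical k_mers in x and y.
double oligo_kernel_sketch(const Encoded& x, const Encoded& y,
                           double width, int32_t max_distance = -1)
{
    assert(width > 0.0);
    double result = 0.0;
    for (const auto& a : x) {
        for (const auto& b : y) {
            if (a.second != b.second)
                continue;  // different k_mers do not contribute
            const int32_t d = std::abs(a.first - b.first);
            // A negative max_distance switches the cutoff off.
            if (max_distance >= 0 && d > max_distance)
                continue;
            result += std::exp(-static_cast<double>(d) * d / width);
        }
    }
    return result;
}
```

With a cutoff, pairs farther apart than 'max_distance' are dropped entirely, so the result is a lower bound on the value obtained without the cutoff.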

Definition at line 139 of file OligoStringKernel.cpp.

virtual bool COligoStringKernel::load_init ( FILE *   )  [virtual]

load kernel init_data

Returns:
if loading was successful

Implements CKernel.

Definition at line 64 of file OligoStringKernel.h.

virtual bool COligoStringKernel::save_init ( FILE *   )  [virtual]

save kernel init_data

Returns:
if saving was successful

Implements CKernel.

Definition at line 73 of file OligoStringKernel.h.


Member Data Documentation

float64_t * COligoStringKernel::gauss_table [protected]

cache for exp (see getExpFunctionCache above)

Definition at line 173 of file OligoStringKernel.h.

int32_t COligoStringKernel::k [protected]

member variable k

Definition at line 169 of file OligoStringKernel.h.

float64_t COligoStringKernel::width [protected]

width of kernel

Definition at line 171 of file OligoStringKernel.h.


The documentation for this class was generated from the following files:

OligoStringKernel.h
OligoStringKernel.cpp

SHOGUN Machine Learning Toolbox - Documentation