This class offers access to the Oligo Kernel introduced by Meinicke et al. in 2004.
The class provides functions to preprocess the data so that the kernel can be computed faster. The kernel function itself is then kernelOligoFast or kernelOligo.
The implementation still needs a significant speedup: it should work, but as it stands it is probably applicable only to small-scale academic problems.
Uses CSqrtDiagKernelNormalizer, as the vanilla kernel seems to be very diagonally dominant.
Definition at line 41 of file OligoStringKernel.h.
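To illustrate why a square-root-diagonal normalizer helps with a diagonally dominant kernel, the following is a minimal sketch of that normalization applied to a precomputed kernel matrix. It assumes the usual definition K'(i,j) = K(i,j) / sqrt(K(i,i) * K(j,j)); the function `normalize_sqrt_diag` is a hypothetical helper written for this example and is not part of the library API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): rescales a square kernel
// matrix so that every diagonal entry becomes 1.
// Assumes K'(i,j) = K(i,j) / sqrt(K(i,i) * K(j,j)).
std::vector<std::vector<double>> normalize_sqrt_diag(
    const std::vector<std::vector<double>>& K)
{
    const std::size_t n = K.size();

    // The matrix must be square with a strictly positive diagonal.
    for (std::size_t i = 0; i < n; ++i)
        assert(K[i].size() == n && K[i][i] > 0.0);

    std::vector<std::vector<double>> out(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[i][j] = K[i][j] / std::sqrt(K[i][i] * K[j][j]);

    return out;
}
```

After this rescaling the diagonal no longer dwarfs the off-diagonal entries, which is the motivation given above for using the normalizer.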
Public Member Functions | |
COligoStringKernel (int32_t cache_size, int32_t k, float64_t width) | |
virtual | ~COligoStringKernel () |
virtual bool | init (CFeatures *l, CFeatures *r) |
virtual EKernelType | get_kernel_type () |
virtual const char * | get_name () const |
virtual float64_t | compute (int32_t x, int32_t y) |
virtual void | cleanup () |
Protected Member Functions | |
float64_t | kernelOligoFast (const std::vector< std::pair< int32_t, float64_t > > &x, const std::vector< std::pair< int32_t, float64_t > > &y, int32_t max_distance=-1) |
returns the value of the oligo kernel for sequences 'x' and 'y' | |
Static Protected Member Functions | |
static void | encodeOligo (const std::string &sequence, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::pair< int32_t, float64_t > > &values) |
encodes the signals of the sequence | |
static void | getSequences (const std::vector< std::string > &sequences, uint32_t k_mer_length, const std::string &allowed_characters, std::vector< std::vector< std::pair< int32_t, float64_t > > > &encoded_sequences) |
encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences' | |
Protected Attributes | |
int32_t | k |
float64_t | width |
float64_t * | gauss_table |
COligoStringKernel | ( | int32_t | cache_size, |
int32_t | k, | ||
float64_t | width | ||
) |
Constructor
cache_size | cache size for kernel |
k | k-mer length |
width | kernel width, equivalent to 2*sigma^2 |
Definition at line 24 of file OligoStringKernel.cpp.
~COligoStringKernel | ( | ) | [virtual] |
Destructor
Definition at line 30 of file OligoStringKernel.cpp.
void cleanup | ( | ) | [virtual] |
float64_t compute | ( | int32_t | x, |
int32_t | y | ||
) | [virtual] |
Computes the kernel function for features a and b; idx_{a,b} denote the indices of the feature vectors in the corresponding feature object.
abstract base method
x | index a |
y | index b |
Implements CKernel.
Definition at line 223 of file OligoStringKernel.cpp.
void encodeOligo | ( | const std::string & | sequence, |
uint32_t | k_mer_length, | ||
const std::string & | allowed_characters, | ||
std::vector< std::pair< int32_t, float64_t > > & | values | ||
) | [static, protected] |
encodes the signals of the sequence
This function stores the oligo function signals in 'values'.
The 'k_mer_length' and the 'allowed_characters' determine which signals are used. Every pair contains the position of the signal and a numerical value reflecting the signal. The numerical value represents the k_mer as a number in base n = |allowed_characters|, where each character is mapped to its index in 'allowed_characters'. Example: for the allowed characters ACGT (n = 4, with A = 0, C = 1, G = 2, T = 3), the value of the k_mer CG would be 1 * n^1 + 2 * n^0 = 6.
Definition at line 57 of file OligoStringKernel.cpp.
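The encoding described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the library source: the name `encode_oligo_sketch` is hypothetical, positions are taken to be zero-based, and k-mers containing a character outside 'allowed_characters' are simply skipped, which is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the encoding: one (position, value) pair per k-mer,
// where the value is the k-mer read as a number in base n = |alphabet|.
std::vector<std::pair<int32_t, double>> encode_oligo_sketch(
    const std::string& sequence, uint32_t k, const std::string& alphabet)
{
    assert(!alphabet.empty());
    std::vector<std::pair<int32_t, double>> values;
    if (k == 0 || sequence.size() < k)
        return values;

    const double n = static_cast<double>(alphabet.size());
    for (std::size_t pos = 0; pos + k <= sequence.size(); ++pos)
    {
        double code = 0.0;
        bool valid = true;
        for (uint32_t j = 0; j < k; ++j)
        {
            // The digit of a character is its index in the alphabet.
            const std::size_t digit = alphabet.find(sequence[pos + j]);
            if (digit == std::string::npos)
            {
                valid = false;  // assumption: skip k-mers with unknown characters
                break;
            }
            code = code * n + static_cast<double>(digit);
        }
        if (valid)
            values.emplace_back(static_cast<int32_t>(pos), code);
    }
    return values;
}
```

Calling `encode_oligo_sketch("CG", 2, "ACGT")` yields a single pair with position 0 and value 6, matching the example in the description.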
virtual EKernelType get_kernel_type | ( | ) | [virtual] |
virtual const char* get_name | ( | ) | const [virtual] |
void getSequences | ( | const std::vector< std::string > & | sequences, |
uint32_t | k_mer_length, | ||
const std::string & | allowed_characters, | ||
std::vector< std::vector< std::pair< int32_t, float64_t > > > & | encoded_sequences | ||
) | [static, protected] |
encodes all sequences with the encodeOligo function and stores them in 'encoded_sequences'
This function encodes the sequences of 'sequences' via the function encodeOligo.
Definition at line 115 of file OligoStringKernel.cpp.
bool init | ( | CFeatures * | l, |
CFeatures * | r | ||
) | [virtual] |
initialize kernel
l | features of left-hand side |
r | features of right-hand side |
Definition at line 43 of file OligoStringKernel.cpp.
float64_t kernelOligoFast | ( | const std::vector< std::pair< int32_t, float64_t > > & | x, |
const std::vector< std::pair< int32_t, float64_t > > & | y, | ||
int32_t | max_distance = -1 |
||
) | [protected] |
returns the value of the oligo kernel for sequences 'x' and 'y'
This function computes the kernel value of the oligo kernel, which was introduced by Meinicke et al. in 2004. 'x' and 'y' are encoded by encodeOligo and 'exp_cache' has to be constructed by getExpFunctionCache.
'max_distance' can be used to speed up the computation even further by restricting the maximum distance between a k_mer at position i in sequence 'x' and a k_mer at position j in sequence 'y'. If |i - j| > 'max_distance', the value is not added to the kernel value. This approximation is switched off by default (max_distance < 0).
Definition at line 141 of file OligoStringKernel.cpp.
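The role of 'max_distance' can be illustrated with a deliberately naive sketch that compares every k_mer of 'x' with every k_mer of 'y'. It is not the fast implementation documented here. The name `oligo_kernel_naive`, the explicit `width` parameter, and the weighting exp(-(i - j)^2 / width) for each pair of matching k_mers are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

using Encoded = std::vector<std::pair<int32_t, double>>;

// Hypothetical quadratic-time sketch: sums a Gaussian of the position
// difference over all pairs of identical k-mers.
// Assumes each match contributes exp(-(i - j)^2 / width).
double oligo_kernel_naive(const Encoded& x, const Encoded& y,
                          double width, int32_t max_distance = -1)
{
    assert(width > 0.0);
    double result = 0.0;
    for (const auto& a : x)
    {
        for (const auto& b : y)
        {
            if (a.second != b.second)
                continue;  // different k-mers never contribute

            const int32_t d = std::abs(a.first - b.first);
            if (max_distance >= 0 && d > max_distance)
                continue;  // approximation: ignore distant matches

            const double dd = static_cast<double>(d);
            result += std::exp(-dd * dd / width);
        }
    }
    return result;
}
```

With a negative 'max_distance' every matching pair is counted; a non-negative value drops pairs whose positions are further apart, which trades a small loss of accuracy for less work.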
float64_t* gauss_table [protected] |
cache for exp (see getExpFunctionCache above)
Definition at line 157 of file OligoStringKernel.h.
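A cache of this kind could be built roughly as in the sketch below. It assumes that entry d holds exp(-d^2 / width), with width equivalent to 2*sigma^2 as stated for the constructor; the function name `build_gauss_table` and the use of std::vector instead of a raw float64_t array are choices made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of an exp cache indexed by position difference d.
// Assumes table[d] = exp(-d^2 / width), with width = 2 * sigma^2.
std::vector<double> build_gauss_table(std::size_t max_len, double width)
{
    assert(width > 0.0);
    std::vector<double> table(max_len);
    for (std::size_t d = 0; d < max_len; ++d)
    {
        const double dd = static_cast<double>(d);
        table[d] = std::exp(-dd * dd / width);
    }
    return table;
}
```

Because position differences are small non-negative integers bounded by the sequence length, a lookup in such a table can replace a call to exp for every matching pair of k_mers.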
int32_t k [protected] |
member variable k (the k-mer length)
Definition at line 153 of file OligoStringKernel.h.
float64_t width [protected] |
width of kernel
Definition at line 155 of file OligoStringKernel.h.