The following is a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if \hat{y} is the predicted value:
\hat{y}(x, w) = w_0 + w_1 x_1 + ... + w_D x_D
Across the module, we designate the vector w = (w_1, ..., w_D) as coef_ and w_0 as intercept_.
Ordinary least squares fits such a model by minimizing the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation.
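A minimal sketch of fitting such a model with scikit-learn's LinearRegression and reading back the coef_ and intercept_ attributes; the toy data are invented for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy data: y is roughly 1 * x with an intercept near zero
    X = np.array([[0.0], [1.0], [2.0]])  # shape (n_samples, n_features)
    y = np.array([0.1, 1.1, 1.9])

    reg = LinearRegression().fit(X, y)
    print(reg.coef_)       # the estimated w = (w_1, ..., w_D)
    print(reg.intercept_)  # the estimated w_0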
In the methods that follow, we add a regularization term to the error function in order to control over-fitting.
Coefficient estimates for multiple linear regression models rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix X^T X becomes close to singular. As a result, the least-squares estimate:
\hat{\beta} = (X^T X)^{-1} X^T y
becomes highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
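The effect is easy to see numerically. A small sketch, using synthetic data, in which a near-duplicate column drives up the condition number of X^T X:

    import numpy as np

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=50)
    x2 = x1 + 1e-6 * rng.normal(size=50)  # x2 is almost a copy of x1
    X = np.column_stack([x1, x2])

    # An enormous condition number means X^T X is close to singular,
    # so (X^T X)^{-1} amplifies small perturbations in y.
    print(np.linalg.cond(X.T @ X))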
Ridge regression addresses the problem by estimating the regression coefficients as:
\hat{\beta} = (X^T X + \alpha I)^{-1} X^T y
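As a sanity check, this closed-form estimate can be compared against scikit-learn's Ridge (with fit_intercept=False so both solve the same problem); alpha and the data below are arbitrary illustrative choices:

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
    alpha = 0.5

    # Closed-form ridge estimate: (X^T X + alpha I)^{-1} X^T y
    beta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)
    beta_ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    print(np.allclose(beta_closed, beta_ridge))  # True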
The Lasso is a linear model trained with an L1 prior as regularizer. The objective function to minimize is:
\frac{1}{2} ||y - Xw||_2^2 + \alpha ||w||_1
The lasso estimate thus solves the minimization of the least-squares penalty with \alpha ||w||_1 added, where \alpha is a constant and ||w||_1 is the L1-norm of the parameter vector.
This formulation is useful in some contexts due to its tendency to prefer solutions with fewer non-zero parameters, effectively reducing the number of variables upon which the given solution is dependent. For this reason, the LASSO and its variants are fundamental to the field of compressed sensing.
This implementation uses coordinate descent as the algorithm to fit the coefficients.
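A minimal sketch of a lasso fit with scikit-learn's Lasso class, which uses this coordinate-descent solver; alpha and the synthetic data are illustrative choices only (note that, in current scikit-learn, the squared-error term of the implemented objective is additionally scaled by 1/n_samples, so the class's alpha is not numerically identical to the \alpha above):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    w_true = np.zeros(10)
    w_true[:3] = [2.0, -1.0, 0.5]  # only three informative features
    y = X @ w_true + 0.01 * rng.normal(size=50)

    lasso = Lasso(alpha=0.1).fit(X, y)
    print(lasso.coef_)  # most entries are exactly zero: a sparse solution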
The function lasso_path computes the coefficients along the full path of possible values of the regularization parameter \alpha.
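For instance, a sketch of what lasso_path returns on synthetic data (the output shapes, rather than the particular values, are the point here):

    import numpy as np
    from sklearn.linear_model import lasso_path

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=50)

    alphas, coefs, _ = lasso_path(X, y, n_alphas=50)
    print(alphas.shape)  # (50,): grid of alpha values, largest to smallest
    print(coefs.shape)   # (10, 50): one coefficient vector per alpha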
Elastic Net is a linear model trained with both L1 and L2 priors as regularizers. The objective function to minimize is in this case:
\frac{1}{2} ||y - Xw||_2^2 + \alpha \rho ||w||_1 + \frac{\alpha (1 - \rho)}{2} ||w||_2^2
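A minimal sketch with scikit-learn's ElasticNet, whose l1_ratio parameter plays the role of \rho above; alpha and the data are illustrative, and the same n_samples scaling caveat as for the lasso applies:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    y = X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=50)

    # l1_ratio = rho: 1.0 recovers the lasso objective, 0.0 a pure L2 penalty
    enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X, y)
    print(enet.coef_)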
Example: example_plot_lasso_coordinate_descent_path.py