Lasso regression is a popular linear regression method in machine learning. It adds an L1 penalty that shrinks the coefficients of irrelevant features, often to exactly zero, so that the fitted model effectively ignores them. This article shows how to implement Lasso regression in Python and demonstrates it on a real data set.
Introduction to Lasso Regression
Lasso regression modifies the ordinary least squares problem by adding a penalty term to the objective function. The penalty uses L1 regularization (also called the Lasso penalty), and the objective takes the following form:
$J(\beta)=\frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\sum_{j=1}^{p}X_{ij}\beta_j\right)^2+\alpha\sum_{j=1}^{p}|\beta_j|$
where $y$ is the response variable, $X$ is the matrix of independent variables, $\beta$ is the vector of model coefficients, $n$ is the number of samples, $p$ is the number of features, and $\alpha$ is the penalty parameter. The objective is convex, but the penalty term $|\beta_j|$ is not differentiable at zero, so plain gradient-based methods do not apply directly. This non-smoothness is the difficult part of Lasso regression.
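As a small illustration of the objective, the following sketch evaluates $J(\beta)$ with NumPy. The function name `lasso_objective` and the tiny data set are hypothetical and chosen only so the two terms are easy to check by hand.

```python
import numpy as np

def lasso_objective(X, y, beta, alpha):
    """Evaluate J(beta) = 1/(2n) * ||y - X beta||^2 + alpha * sum(|beta_j|)."""
    n = X.shape[0]
    residual = y - X @ beta
    return residual @ residual / (2 * n) + alpha * np.sum(np.abs(beta))

# Hypothetical data: 3 samples, 2 features, chosen so that beta fits y exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

j_no_penalty = lasso_objective(X, y, beta, alpha=0.0)    # squared-error term only
j_with_penalty = lasso_objective(X, y, beta, alpha=0.5)  # adds 0.5 * (|1| + |2|)

print(j_no_penalty)
print(j_with_penalty)
```

Because the fit is exact here, the first value is the squared-error term alone (zero) and the difference between the two values is the penalty $\alpha\sum_j|\beta_j|$.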
One way to implement Lasso regression is the coordinate descent (CD) algorithm. The basic idea is that each step updates only one coefficient while holding the others fixed. Each of these one-dimensional subproblems has a closed-form solution (soft-thresholding), so the CD algorithm sidesteps the non-differentiability of the penalty term.
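The idea can be sketched in a few lines of NumPy. This is a minimal illustration rather than the solver Scikit-learn uses internally: the names `soft_threshold` and `lasso_cd` are hypothetical, there is no intercept or convergence check, and the data are assumed to be centered with no all-zero columns.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Closed-form minimizer of the one-dimensional lasso subproblem."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Cyclic coordinate descent for 1/(2n)*||y - X b||^2 + alpha*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j / n
            # Update only coefficient j; all others stay fixed.
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta

# Synthetic data (an assumption for illustration): only the first two
# of five features actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

beta_hat = lasso_cd(X, y, alpha=0.1)
print(beta_hat)
```

On data like this, the coefficients of the three irrelevant features should come out as zero, while the two relevant ones are shrunk slightly toward zero by the penalty.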
Python Lasso Regression Implementation
Python provides many machine learning libraries, such as Scikit-learn, that can easily implement Lasso regression.
First, import the required libraries as follows:
# Note: load_boston was removed in scikit-learn 1.2, so this import requires an older release.
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler
Next, we load the Boston housing price data set and standardize its features:
boston = load_boston()
X = boston.data
y = boston.target
X = StandardScaler().fit_transform(X)
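On scikit-learn 1.2 and later, `load_boston` is no longer available, so the loading step fails there. As a stand-in, the sketch below applies the same standardization to the bundled diabetes data set via `load_diabetes`. This swap is an assumption made for illustration, and results on it will differ from the Boston output shown in this article.

```python
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler

# Stand-in data set (an assumption, not the article's Boston data).
X_raw, y = load_diabetes(return_X_y=True)

# Standardize each feature to zero mean and unit variance so that the
# L1 penalty treats all coefficients on the same scale.
X = StandardScaler().fit_transform(X_raw)

print(X.shape)
print(X.mean(axis=0).round(6))  # per-feature means after scaling
print(X.std(axis=0).round(6))   # per-feature standard deviations after scaling
```

Standardization matters for Lasso because the penalty acts on the raw size of each coefficient, which depends on the units of the corresponding feature.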
Then, we use Scikit-learn's LassoCV to fit the Lasso regression. The model automatically performs cross-validation over the candidate values and selects the optimal $\alpha$ value.
lasso_reg = LassoCV(alphas=np.logspace(-3, 3, 100), cv=5, max_iter=100000)
lasso_reg.fit(X, y)
Finally, we print the selected $\alpha$ value and the model coefficients:
print('Best alpha:', lasso_reg.alpha_)
print('Model coefficients:', lasso_reg.coef_)
Full code example:
# Note: load_boston was removed in scikit-learn 1.2, so this script requires an older release.
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import StandardScaler

# Load the data and standardize the features
boston = load_boston()
X = boston.data
y = boston.target
X = StandardScaler().fit_transform(X)

# Fit Lasso with 5-fold cross-validation over 100 candidate alphas
lasso_reg = LassoCV(alphas=np.logspace(-3, 3, 100), cv=5, max_iter=100000)
lasso_reg.fit(X, y)

print('Best alpha:', lasso_reg.alpha_)
print('Model coefficients:', lasso_reg.coef_)
The output results are as follows:
Best alpha: 0.10000000000000002
Model coefficients: [-0.89521162 1.08556604 0.14359222 0.68736347 -2.04113155 2.67946138 0.01939491 -3.08179223 2.63754058 -2.05806301 -2.05202597 0.89812875 -3.73066641]
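This shows that Lasso regression, with $\alpha$ chosen by cross-validation, yields a fitted model for predicting Boston house prices. Because the features were standardized, the magnitude of each coefficient indicates how strongly that feature relates to the response variable. Note that none of the coefficients in this output is exactly zero, so at this $\alpha$ no feature is dropped entirely.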
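To see how fitted coefficients can be read as a feature selection, here is a minimal sketch on synthetic data generated with `make_regression`. The data set, its sizes, and the variable names are assumptions for illustration and are unrelated to the Boston results above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 5 of which influence y.
X, y, true_coef = make_regression(
    n_samples=200, n_features=20, n_informative=5,
    noise=5.0, coef=True, random_state=0,
)
X = StandardScaler().fit_transform(X)

# Let LassoCV pick alpha from its default grid with 5-fold cross-validation.
model = LassoCV(cv=5).fit(X, y)

selected = np.flatnonzero(model.coef_)     # indices with a non-zero coefficient
ranked = np.argsort(-np.abs(model.coef_))  # strongest features first
informative = np.flatnonzero(true_coef)    # ground truth from the generator

print('Chosen alpha:', model.alpha_)
print('Non-zero coefficients:', selected.size, 'of', model.coef_.size)
print('Top 5 features by |coefficient|:', ranked[:5])
print('Truly informative features:', informative)
```

Comparing the last two printed lines gives a rough check of whether the largest coefficients correspond to the features that actually generated the response.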
Conclusion
This article introduces how to implement Lasso regression in Python and demonstrates the application of this method through an actual data set. Lasso regression is a very useful linear regression technique, especially suitable for processing high-dimensional data. In actual problems, techniques such as cross-validation and standardization can be used to optimize model performance and extract the most relevant features.