Polynomial regression is a method commonly used in regression problems. It builds a model by fitting a polynomial function to the data, so that the model can predict the target value more accurately. Python provides a wealth of data-processing and machine-learning libraries that make it easy to implement a polynomial regression model. This article introduces how to implement polynomial regression in Python and gives an example based on polynomial regression.
1. The principle of polynomial regression
The principle of polynomial regression is relatively simple: the dependent variable is expressed as a polynomial function of the independent variable. That is:
$y = b_0 + b_1x + b_2x^2 + \dots + b_nx^n$
Where $y$ is the dependent variable, $b_0, b_1, b_2, \dots, b_n$ are the regression coefficients, and $x$ is the independent variable. Because polynomial regression increases the flexibility of the model, it is often used in problems that require a close fit to the data.
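To make the formula concrete, the following is a minimal sketch using only numpy. The coefficients here are made up purely for illustration; they are not taken from any fitted model:

import numpy as np

# Hypothetical coefficients b_0 .. b_3 for a degree-3 polynomial model
b = np.array([1.0, 0.5, -0.2, 0.03])

def predict(x):
    # y = b_0 + b_1*x + b_2*x^2 + b_3*x^3
    powers = np.array([x**i for i in range(len(b))])
    return np.dot(b, powers)

print(predict(2.0))  # 1.0 + 0.5*2 - 0.2*4 + 0.03*8 = 1.44

In practice these coefficients are not set by hand; they are estimated from the data, which is what the scikit-learn code below does.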
2. Python implements polynomial regression
In Python, polynomial regression can be implemented through the scikit-learn library. The scikit-learn library is a commonly used machine learning library in Python, providing various models and tools to process data and build models.
The following are the steps of a simple polynomial regression implementation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
The numpy and matplotlib libraries are imported here, along with the LinearRegression and PolynomialFeatures classes from the sklearn library.
# Create the data
x = np.linspace(-10, 10, num=50).reshape(-1, 1)   # independent variable data
y = np.sin(x) + np.random.randn(50, 1) * 0.2      # dependent variable data
The linspace function in the numpy library is used here to generate 50 equally spaced independent variable values from -10 to 10, and the sin function is used to generate the dependent variable data. To make the results more realistic, some random noise is also added.
# Fit the data with a polynomial model
poly_reg = PolynomialFeatures(degree=5)   # degree is the degree of the polynomial
x_poly = poly_reg.fit_transform(x)
lin_reg = LinearRegression()
lin_reg.fit(x_poly, y)
The PolynomialFeatures class is used to expand the independent variable x into polynomial features, and then the LinearRegression class fits a linear model on those features; the fit method trains the model.
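As an illustration of what this expansion looks like (a small standalone example, not part of the fit above), PolynomialFeatures turns each input value into its powers up to the chosen degree, including the constant term:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_demo = np.array([[1.0], [2.0], [3.0]])
poly = PolynomialFeatures(degree=3)
print(poly.fit_transform(x_demo))
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]

LinearRegression then fits an ordinary linear model on these expanded columns, which is why the combination behaves like polynomial regression.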
# Visualize the results
plt.scatter(x, y)
plt.plot(x, lin_reg.predict(poly_reg.fit_transform(x)), color='red')
plt.show()
The matplotlib library is used here to visualize the results. The raw data are displayed as a scatter plot, and the fitted polynomial regression curve is drawn on the same graph.
3. Example of Polynomial Regression
Consider an example: predicting the driving distance based on the car’s speed and braking time. We use the dataset provided by Udacity to solve this problem. The data set includes the car's speed, braking time and corresponding driving distance.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Read in the data
data = pd.read_csv('data/car.csv')
Here the car data set stored in a csv file is read.
# Extract the features and the target
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Reshape the feature data into a 2-dimensional array
X = X.reshape(-1, 1)

# Split into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
The pandas library is used here to read the data into a DataFrame. The iloc method then extracts the feature and target columns, the reshape function converts the feature array into the 2-dimensional shape that scikit-learn expects, and the train_test_split function divides the dataset into a training set and a test set in the given proportion.
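For readers unfamiliar with train_test_split, here is a tiny self-contained sketch with made-up data (unrelated to the car dataset) showing how the test_size proportion controls the split:

import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(10).reshape(-1, 1)   # 10 samples, 1 feature
y_demo = np.arange(10)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)   # (8, 1) (2, 1): 80% training, 20% test

Fixing random_state makes the split reproducible, which is why the car example above also passes random_state=0.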
# Train the model
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X_train)
lin_reg = LinearRegression()
model = lin_reg.fit(X_poly, y_train)

# Visualize the results
plt.scatter(X_train, y_train, color='red')
order = X_train[:, 0].argsort()   # sort the points so the fitted curve is drawn cleanly
plt.plot(X_train[order], lin_reg.predict(poly_reg.fit_transform(X_train))[order], color='blue')
plt.title('Car distance prediction')
plt.xlabel('Speed + Brake Time')
plt.ylabel('Distance')
plt.show()

# Test the model
y_pred = model.predict(poly_reg.fit_transform(X_test))
The PolynomialFeatures class converts the data into quadratic polynomial features, and the LinearRegression class fits a linear model on those features. The fit method trains the model, and finally the predict method generates predictions on the test set.
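One optional refinement, sketched here rather than taken from the original example, is to wrap PolynomialFeatures and LinearRegression in a scikit-learn Pipeline so the feature expansion and the regression are applied together and fit_transform does not have to be repeated by hand. This assumes X_train, y_train and X_test from the split above:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Equivalent to the manual steps above: expand to degree-2 features, then fit a linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)   # the pipeline applies the same feature expansion automatically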
# Compute the evaluation metrics
from sklearn.metrics import mean_squared_error, r2_score
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print('Root Mean Squared Error: ', rmse)
print('R2 Score: ', r2)
The mean_squared_error and r2_score functions in the sklearn library are used to calculate the evaluation metrics: the root mean squared error (RMSE) and the coefficient of determination (R²), respectively.
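For reference, the two metrics can also be computed by hand with numpy, which makes their definitions explicit. This is a small illustrative check, assuming y_test and y_pred are the numpy arrays from the code above:

import numpy as np

# RMSE: square root of the mean squared difference between truth and prediction
rmse_manual = np.sqrt(np.mean((y_test - y_pred) ** 2))

# R^2: 1 minus the ratio of the residual sum of squares to the total sum of squares
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(rmse_manual, r2_manual)   # should match the sklearn values above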
Through the above steps, we can use the polynomial regression model to predict the driving distance of the car.
Summary
This article has introduced the principle of polynomial regression and how to implement it in Python. Through a prediction example on automobile data, we can see the advantages of polynomial regression in building models and predicting results. Of course, polynomial regression also has shortcomings, such as a tendency to overfit when the polynomial degree is too high. Therefore, in practical applications, it is necessary to select appropriate regression methods and parameters according to the actual situation.
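As one hedge against the overfitting just mentioned, a common approach is to compare several polynomial degrees with cross-validation and keep the degree that generalizes best. The sketch below uses synthetic data for illustration, not the car dataset:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic noisy sine data, similar in spirit to the first example above
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for degree in (1, 2, 3, 5, 9):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Mean 5-fold cross-validated R^2; higher is better
    score = cross_val_score(model, x, y, cv=cv).mean()
    print(degree, round(score, 3))

A degree that scores well on held-out folds is less likely to overfit than one chosen by looking only at the training fit.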