Multiple linear regression is a statistical model widely used in data analysis and machine learning. It predicts the value of a single dependent variable from two or more independent variables. Python offers several libraries that help implement multiple linear regression models, such as NumPy, Pandas, and Scikit-Learn.
Below, we will use the Scikit-Learn library to build a multiple linear regression model that predicts house prices. The example uses the Boston house prices dataset, which contains 506 samples and 13 independent variables, including the per-capita crime rate of the town, the average number of rooms per dwelling, and the proportion of owner-occupied homes built before 1940. The target is the median home value (MEDV), expressed in thousands of dollars.
First, we import the required libraries and load the dataset:
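To make the model concrete: it has the form y = b0 + b1*x1 + ... + bp*xp, and fitting it means choosing the intercept b0 and the coefficients b1..bp that minimize the squared prediction error. The following is a minimal sketch using only NumPy. The data is synthetic and the names (X_design, true_coef, true_intercept) are illustrative assumptions, not part of the house price example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 100 samples, 3 features
X = rng.normal(size=(100, 3))
true_coef = np.array([2.0, -1.0, 0.5])
true_intercept = 4.0
y = true_intercept + X @ true_coef  # noiseless target

# Prepend a column of ones so the intercept is estimated too
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimize ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coef = beta[0], beta[1:]
print("intercept:", intercept)
print("coefficients:", coef)
```

Because the synthetic target contains no noise, the estimated values should match the ones used to generate the data up to floating-point error.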
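import numpy as np
import pandas as pd
# Note: load_boston was removed in scikit-learn 1.2, so this import needs an older release
from sklearn.datasets import load_boston
boston = load_boston()
# Features go into X, the target (median home value) into y
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.DataFrame(boston.target, columns=['MEDV'])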
Here, we use the Pandas library to wrap the data in DataFrame objects. The independent variables are stored in X and the dependent variable in y.
Next, we need to split the data set into a training set and a test set. The training set is used to fit the model, while the test set is used to evaluate the performance of the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
Here, the train_test_split function from the Scikit-Learn library performs the split. The test_size parameter sets the share of samples held out for testing (0.2 means 20%), and the random_state parameter fixes the random seed so that the split is reproducible.
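As a quick check of what these two parameters do, the sketch below splits a synthetic array with the same shape as the dataset (506 rows, 13 columns). The arrays X_demo and y_demo are random placeholders assumed for illustration, not the real house price data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Random placeholder data with the same shape as the dataset (assumption)
X_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(X_tr.shape, X_te.shape)  # (404, 13) (102, 13)

# Repeating the call with the same random_state yields the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print(np.array_equal(X_te, X_te2))  # True
```

With 506 samples, a test_size of 0.2 leaves 404 rows for training and 102 for testing.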
Next, we can fit a linear regression model to the training set.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
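Here, we use the LinearRegression class in the Scikit-Learn library to create a linear regression model and call its fit method on the training data. Fitting estimates an intercept and one coefficient per independent variable by ordinary least squares.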
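After fitting, the estimated parameters are available through the coef_ and intercept_ attributes. The sketch below shows one way to read them, using synthetic data and hypothetical feature names (x1, x2, x3) rather than the house price columns. Note that when y is passed as a one-column DataFrame, as in this article, coef_ has shape (1, n_features) instead of (n_features,).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical feature names and synthetic data (assumptions for illustration)
feature_names = ["x1", "x2", "x3"]
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=feature_names)

# Target built from known parameters plus a little noise; x3 has no effect
y_demo = (
    10.0
    + 3.0 * X_demo["x1"]
    - 2.0 * X_demo["x2"]
    + rng.normal(scale=0.1, size=200)
)

model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its feature name for readability
coef_by_feature = pd.Series(model.coef_, index=feature_names)
print(coef_by_feature)
print("intercept:", model.intercept_)
```

A coefficient is the expected change in the target when its feature increases by one unit while the other features stay fixed, so the value for x3 should come out close to zero here.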
Now we can use the model to predict the house prices in the test set.
y_pred = regressor.predict(X_test)
Here, the predict method returns one predicted house price for each row of X_test.
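It often helps to look at predictions next to the actual values. The sketch below does this on synthetic data. The column names a and b and the variable comparison are hypothetical, and the target is stored as a one-column DataFrame named MEDV only to mirror the layout used in this article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, noiseless data (assumption for illustration)
X_demo = pd.DataFrame(rng.normal(size=(100, 2)), columns=["a", "b"])
y_demo = pd.DataFrame({"MEDV": 20.0 + 5.0 * X_demo["a"] - 3.0 * X_demo["b"]})

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_tr, y_tr)

# The prediction is 2-D, shape (20, 1), because y_tr is a one-column DataFrame
pred = model.predict(X_te)

# Flatten the prediction and align it with the actual values
comparison = pd.DataFrame(
    {"actual": y_te["MEDV"].to_numpy(), "predicted": pred.ravel()},
    index=y_te.index,
)
comparison["residual"] = comparison["actual"] - comparison["predicted"]
print(comparison.head())
```

The residual column is the per-sample error. It is essentially zero here because the synthetic target has no noise, whereas real data will show a spread of positive and negative residuals.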
Finally, we can use some evaluation metrics from the Scikit-Learn library to evaluate the performance of the model.
from sklearn.metrics import mean_squared_error, r2_score
print('Mean squared error: %.2f' % mean_squared_error(y_test, y_pred))
print('Coefficient of determination: %.2f' % r2_score(y_test, y_pred))
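Here, the mean_squared_error function computes the mean squared error, the average of the squared differences between actual and predicted prices (lower is better). The r2_score function computes the coefficient of determination, the proportion of variance in the target that the model explains (closer to 1 is better). Together, these metrics indicate how accurate the model is.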
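To show what the two metrics measure, the sketch below recomputes them by hand with NumPy and compares the results with the Scikit-Learn functions. The four values in y_true and y_hat are made-up numbers chosen for easy arithmetic, not results from the house price model.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

# Mean squared error: average of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))  # both 0.375
print(r2_manual, r2_score(y_true, y_hat))             # both 0.925
```

Taking the square root of the mean squared error gives an error in the same units as the target, which is often easier to interpret.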
In short, building a multiple linear regression model in Python can be very simple: load the data, split it into training and test sets, fit the model, and evaluate it with a few metrics. In practical applications, exploratory data analysis, feature engineering, and model optimization are also needed to obtain better prediction results.
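As one possible starting point for that further work, the sketch below evaluates a model with 5-fold cross-validation instead of a single train/test split. This is a technique the walkthrough above does not use. The choice of a StandardScaler step, the fold count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (assumption for illustration): 200 samples, 3 features
X_demo = rng.normal(size=(200, 3))
y_demo = (
    1.0
    + X_demo @ np.array([2.0, -1.0, 0.5])
    + rng.normal(scale=0.1, size=200)
)

# Scale the features, then fit ordinary least squares
pipeline = make_pipeline(StandardScaler(), LinearRegression())

# One R^2 score per fold; each fold serves as the test set exactly once
scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="r2")
print(scores)
print("mean R^2:", scores.mean())
```

Averaging over several folds gives a performance estimate that depends less on one particular random split.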