In the field of machine learning, linear regression is a commonly used method. Multivariable linear regression is a method that can predict the linear relationship between one or more independent variables and a dependent variable. It is usually used to predict market trends such as real estate and stock prices.
Python is a very popular programming language that is easy to learn, write and debug. In Python, multivariable linear regression models can be easily implemented using the Scikit-learn library.
In this article, we will introduce multivariable linear regression in Python through an example of housing price prediction.
Import libraries and data
In order to implement the multivariable linear regression model, we need to import some Python libraries:
1 2 3 4 5 6 |
|
The data used here comes from Boston on the Kaggle website Real estate data set. We can use the read_csv function in the Pandas library to read data from the data file:
1 |
|
Exploration and visualization of data
Before building the model, we should explore the data and understand each Distribution of features and relationships between features.
We can use the describe function and corr function in the Pandas library to understand the feature distribution of the data and the correlation between features. Among them, the corr function returns the correlation coefficient matrix between each feature and other features.
1 2 |
|
We can also use the Matplotlib library to visualize the data. For example, draw a scatter plot between two features:
1 2 3 4 5 |
|
Data preprocessing and model training
In order to train a multivariate linear regression model, we need to split the data into two parts: training set and test set. We can use the train_test_split function in the Scikit-learn library to randomly divide the data set into a training set and a test set:
1 2 3 4 |
|
Next, we can use the LinearRegression function in the Scikit-learn library to initialize multivariable linear regression model and use the training set to fit the model:
1 2 |
|
Evaluation and prediction of the model
To evaluate the performance of the model, we can use the test set to predict house prices and use the average variance ( Mean Squared Error (MSE) to measure the difference between predicted results and actual results. We can use the mean_squared_error function in the Scikit-learn library to calculate the average variance of the prediction results:
1 2 3 |
|
If the value of the average variance is smaller, it means that the model prediction results are more accurate.
Finally, we can use the model to predict new house prices. For example, we want to predict the price of a house that has 6 rooms, 2 bathrooms, is 4.5 miles from a business district, etc. We can input the values of these features into the model and use the model’s predict function to predict the price of the house:
1 2 3 |
|
The predicted price of this house is approximately $239,000.
Summary
In this article, we introduced how to implement a multivariable linear regression model in Python using the LinearRegression function of the Scikit-learn library. We use the Boston real estate data set as an example to illustrate the steps of data import, exploration, visualization, preprocessing, and model training, evaluation, and prediction. I hope this article can help readers better master the multivariable linear regression method in Python.
The above is the detailed content of Multivariable linear regression example in Python. For more information, please follow other related articles on the PHP Chinese website!