
Implement a simple linear regression method in R and explain its concepts

WBOY
Release: 2024-01-22 23:09:11

Simple linear regression is a statistical method for studying the relationship between two continuous variables. One is called the independent variable (x) and the other the dependent variable (y). We assume that the relationship between the two is linear and look for a linear function that predicts the response value (y) as accurately as possible from the value of the independent variable (x). Fitting a straight line to the data gives us this function. The resulting model can be used to understand and predict how the dependent variable changes as the independent variable changes.

To illustrate the concept, we can use a salary data set that records the value of the dependent variable (salary) for each value of the independent variable (years of experience).

Salary Data Set

Years of experience    Annual salary
1.1                    39343.00
1.3                    46205.00
1.5                    37731.00
2.0                    43525.00
2.2                    39891.00
2.9                    56642.00
3.0                    60150.00
3.2                    54445.00
3.2                    64445.00
3.7                    57189.00

In general, we define:

x as the feature vector, that is, x=[x_1,x_2,...,x_n],

y as the response vector, that is, y=[y_1,y_2,...,y_n],

for n observations (in the example above, n=10).
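The definitions above can be written out directly in R. The following is a minimal sketch that types in the ten observations from the salary table by hand and draws them with base graphics. The name salary_data is hypothetical, and its column names are an assumption chosen to match the YearsExperience and Salary columns used in the lm() call later in the article.

```r
# Feature vector x: years of experience for the ten observations in the table.
x <- c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)

# Response vector y: the corresponding salaries.
y <- c(39343, 46205, 37731, 43525, 39891,
       56642, 60150, 54445, 64445, 57189)

# Number of observations.
n <- length(x)

# Hypothetical data frame holding both vectors (column names are assumed).
salary_data <- data.frame(YearsExperience = x, Salary = y)

# Scatterplot of the response against the feature, using base graphics.
plot(salary_data$YearsExperience, salary_data$Salary,
     xlab = "Years of experience", ylab = "Salary",
     main = "Salary vs Experience")
```

Each point in the resulting plot is one (x_i, y_i) pair.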

Scatterplot of the given data set


Now we have to find a line that fits the scatterplot above, so that we can predict the response (y) for any value of x.

The line of best fit is called the regression line.
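For a single predictor, the regression line can also be computed by hand with the standard least-squares formulas: the slope is the sum of products of the deviations of x and y from their means divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The sketch below applies these formulas to all ten observations from the table (the article's own listing fits only a training subset, so its coefficients will differ) and cross-checks the result against lm(). The names b0, b1 and fit are illustrative.

```r
# The ten observations from the salary table.
x <- c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(39343, 46205, 37731, 43525, 39891,
       56642, 60150, 54445, 64445, 57189)

# Least-squares slope: co-variation of x and y divided by the variation of x.
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

# Cross-check against R's built-in linear model fit.
fit <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(fit))

# Predicted salary for a hypothetical employee with 2.5 years of experience.
print(b0 + b1 * 2.5)
```

The two printed coefficient pairs should agree up to floating-point rounding, since lm() solves the same least-squares problem.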

The following R code implements simple linear regression:

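# Load the data; salary.csv must contain YearsExperience and Salary columns
dataset = read.csv('salary.csv')

# Install caTools only if it is missing, then load it
if (!requireNamespace('caTools', quietly = TRUE)) install.packages('caTools')
library(caTools)

# Split into 70% training and 30% test data
# (the seed value is arbitrary; it only makes the random split repeatable)
set.seed(123)
split = sample.split(dataset$Salary, SplitRatio = 0.7)
trainingset = subset(dataset, split == TRUE)
testset = subset(dataset, split == FALSE)

# Fit the simple linear regression on the training set
lm.r = lm(formula = Salary ~ YearsExperience, data = trainingset)
coef(lm.r)  # intercept and slope of the regression line

# Predict salaries for the test set
ypred = predict(lm.r, newdata = testset)

# Install ggplot2 only if it is missing, then load it
if (!requireNamespace('ggplot2', quietly = TRUE)) install.packages('ggplot2')
library(ggplot2)

# Training set: observed points (red) and fitted regression line (blue)
ggplot() +
  geom_point(aes(x = trainingset$YearsExperience, y = trainingset$Salary),
             colour = 'red') +
  geom_line(aes(x = trainingset$YearsExperience,
                y = predict(lm.r, newdata = trainingset)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('Salary')

# Test set: observed test points (red) against the same fitted line (blue)
ggplot() +
  geom_point(aes(x = testset$YearsExperience, y = testset$Salary),
             colour = 'red') +
  geom_line(aes(x = trainingset$YearsExperience,
                y = predict(lm.r, newdata = trainingset)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('Salary')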
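The listing computes ypred for the test set but never compares it with the observed salaries. One common way to do that is the root mean squared error (RMSE). The sketch below is self-contained and makes two assumptions that are not in the original listing: it uses the ten rows from the salary table instead of salary.csv, and it replaces caTools::sample.split with a base R sample() split so that no extra package is needed. The seed value and the names train_idx, results and rmse are illustrative.

```r
# The ten observations from the salary table (stand-in for salary.csv).
salary_data <- data.frame(
  YearsExperience = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343, 46205, 37731, 43525, 39891,
             56642, 60150, 54445, 64445, 57189)
)

# Reproducible 70/30 split with base R (assumption: replaces sample.split).
set.seed(123)
n <- nrow(salary_data)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
trainingset <- salary_data[train_idx, ]
testset <- salary_data[-train_idx, ]

# Fit on the training rows and predict the held-out rows.
lm.r <- lm(Salary ~ YearsExperience, data = trainingset)
ypred <- predict(lm.r, newdata = testset)

# Observed vs predicted salary for each test row, with the residual.
results <- data.frame(testset,
                      Predicted = ypred,
                      Residual = testset$Salary - ypred)
print(results)

# Root mean squared error on the test set, in the same units as Salary.
rmse <- sqrt(mean((testset$Salary - ypred)^2))
print(rmse)
```

With only three test rows the RMSE is a very rough estimate; a different seed can change it noticeably.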

Visualize the training set results

