Simple linear regression is a statistical method used to study the relationship between two continuous variables. Among them, one variable is called the independent variable (x) and the other variable is called the dependent variable (y). We assume that there is a linear relationship between these two variables and try to find a linear function that accurately predicts the response value (y) of the dependent variable based on the characteristics of the independent variable. By fitting a straight line, we can get the predicted results. This predictive model can be used to understand and predict how the dependent variable changes as the independent variables change.
In order to understand this concept, we can use a salary data set, which contains the value of the dependent variable (salary) corresponding to each independent variable (years of experience).
Annual Salary and Experience
1.1 39343.00
1.3 46205.00
1.5 37731.00
2.0 43525.00
2.2 39891.00
2.9 56642.00
3.0 60150.00
3.2 54445.00
3.2 64445.00
3.7 57189.00
For general purposes, we define:
x as the feature vector, that is, x=[x_1,x_2,....,x_n],
y as the response vector, That is, y=[y_1,y_2,....,y_n]
for n observations (in the above example, n=10).
Now, we have to find a line that fits the above scatterplot by It allows us to predict the response for any y value or any x value.
The line of best fit is called the regression line.
dataset=read.csv('salary.csv') install.packages('caTools') library(caTools) split=sample.split(dataset$Salary,SplitRatio=0.7) trainingset=subset(dataset,split==TRUE) testset=subset(dataset,split==FALSE) lm.r=lm(formula=Salary~YearsExperience, data=trainingset) coef(lm.r) ypred=predict(lm.r,newdata=testset) install.packages("ggplot2") library(ggplot2) ggplot()+geom_point(aes(x=trainingset$YearsExperience, y=trainingset$Salary),colour='red')+ geom_line(aes(x=trainingset$YearsExperience, y=predict(lm.r,newdata=trainingset)),colour='blue')+ ggtitle('Salary vs Experience(Training set)')+ xlab('Years of experience')+ ylab('Salary') ggplot()+ geom_point(aes(x=testset$YearsExperience,y=testset$Salary), colour='red')+ geom_line(aes(x=trainingset$YearsExperience, y=predict(lm.r,newdata=trainingset)), colour='blue')+ ggtitle('Salary vs Experience(Test set)')+ xlab('Years of experience')+ ylab('Salary')
The above is the detailed content of Implement a simple linear regression method in R and explain its concepts. For more information, please follow other related articles on the PHP Chinese website!