About
Multiple linear regression in R with functions such as lm().
Steps
Linear Model
Unstandardized
Unstandardized Multiple Regression
myFit = lm(response ~ predictor1 + predictor2, data = myDataFrame)
myFit
Call:
lm(formula = response ~ predictor1 + predictor2, data = myDataFrame)
Coefficients:
(Intercept) predictor1 predictor2
33.22276 -1.03207 0.03454
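As a runnable illustration of the same call (a sketch using the built-in mtcars dataset; the variables mpg, wt and hp belong to that dataset, not to the placeholder example above):
# Regress miles per gallon on weight and horsepower
carFit = lm(mpg ~ wt + hp, data = mtcars)
carFit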
Standardized
Standardized regression: all variables are converted to z-scores with scale() before fitting.
modelz <- lm(scale(data$OutcomeVariable) ~ scale(data$PredictorVariable1) + scale(data$PredictorVariable2))
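Equivalently (a sketch, assuming the unstandardized model myFit and the placeholder data frame myDataFrame from above, with numeric variables), a standardized coefficient can be recovered by rescaling the unstandardized one by the ratio of standard deviations:
# Standardized slope for predictor1: b * sd(x) / sd(y)
b1 = coef(myFit)["predictor1"]
beta1 = b1 * sd(myDataFrame$predictor1) / sd(myDataFrame$response)
beta1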
All variables are predictors
myFit = lm(outcome ~ ., data = DataFrame)
The dot (.) is a shortcut that selects all the other variables in the data frame as predictors.
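For instance (a sketch using the built-in mtcars dataset, so mpg and the other column names come from that dataset):
# mpg regressed on every other column of mtcars
allFit = lm(mpg ~ ., data = mtcars)
summary(allFit)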
Updating a model
Update a model to remove non-significant predictors.
myModel1 = update(myModel1, ~ . - NonSignificantPredictor1 - NonSignificantPredictor2)
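For example (a self-contained sketch with the built-in mtcars dataset; the dropped terms cyl and vs are chosen purely for illustration), update() refits the model with the chosen terms removed and anova() compares the two fits:
# Full model, reduced model, and an F-test comparing them
fullFit = lm(mpg ~ ., data = mtcars)
reducedFit = update(fullFit, ~ . - cyl - vs)
anova(reducedFit, fullFit)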
Model Attributes
attributes(myFit)
See R - Names
$names
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
where:
- coefficients are the estimated regression coefficients
- residuals are the model residuals (observed minus fitted values)
- effects are the orthogonal single-degree-of-freedom effects from the fit
- …
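In practice these components are usually read with the generic extractor functions rather than from the list directly (a short sketch reusing the myFit object above):
coef(myFit)        # regression coefficients
residuals(myFit)   # residuals
fitted(myFit)      # fitted values
myFit$call         # a component can also be accessed directly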
Summary
summary(myFit)
Call:
lm(formula = response ~ predictor1 + predictor2, data = myDataFrame)
Residuals:
Min 1Q Median 3Q Max
-15.981 -3.978 -1.283 1.968 23.158
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.22276 0.73085 45.458 < 2e-16 ***
predictor1 -1.03207 0.04819 -21.416 < 2e-16 ***
predictor2 0.03454 0.01223 2.826 0.00491 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.173 on 503 degrees of freedom
Multiple R-squared: 0.5513, Adjusted R-squared: 0.5495
F-statistic: 309 on 2 and 503 DF, p-value: < 2.2e-16
where:
- Multiple R-squared: the coefficient of determination (R-squared), a measure of model accuracy
- F-statistic: the F-test (F-ratio) for the overall significance of the model
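The values printed above can also be extracted programmatically from the summary object (a sketch, assuming the myFit model above):
s = summary(myFit)
s$r.squared        # Multiple R-squared
s$adj.r.squared    # Adjusted R-squared
s$fstatistic       # F-statistic and its degrees of freedom
s$coefficients     # estimates, standard errors, t values and p-values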
Confidence Interval
confint(myFit)
2.5 % 97.5 %
(Intercept) 31.78687150 34.65864956
predictor1 -1.12674848 -0.93738865
predictor2 0.01052507 0.05856361
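confint() returns 95% intervals by default; the level argument changes this (a small sketch):
# 99% confidence intervals for the coefficients
confint(myFit, level = 0.99)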
Prediction
predict(myFit, data.frame(predictor1 = c(5, 10, 15), predictor2 = c(20, 30, 40)), interval = "confidence")
fit lwr upr
1 28.75330 27.67694 29.82967
2 23.93841 22.97305 24.90376
3 19.12351 18.12607 20.12094
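interval = "confidence" gives an interval for the mean response at the new predictor values; interval = "prediction" gives the wider interval for an individual new observation (a sketch with the same hypothetical new data):
newData = data.frame(predictor1 = c(5, 10, 15), predictor2 = c(20, 30, 40))
predict(myFit, newData, interval = "prediction")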
Plot
# A two-by-two grid of panels to receive the diagnostic plots
par(mfrow=c(2,2))
# Plot the fitted model
plot(myFit)
plot() gives various diagnostic views of the linear model:
- Residuals vs Fitted: residuals against the fitted values.
The goal is to detect non-linearity: a visible curve in the residuals means the model is not capturing some non-linear effect in the data.
- Normal QQ
- Scale Location
- Residuals vs Leverage
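A single diagnostic panel can also be requested with the which argument (a sketch; which = 1 is the residuals-vs-fitted plot):
# Only the residuals-vs-fitted diagnostic
plot(myFit, which = 1)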