R - Multiple Linear Regression

Card Puncher Data Processing

About

Multiple linear regression in R with functions such as lm().

Steps

Linear Model

Unstandardized

Unstandardized Multiple Regression

# Fit the response on two predictors; myData is a placeholder for your data frame
myFit <- lm(response ~ predictor1 + predictor2, data = myData)
myFit
Call:
lm(formula = response ~ predictor1 + predictor2, data = myData)

Coefficients:
(Intercept)   predictor1   predictor2
   33.22276     -1.03207      0.03454  
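As a sketch of what lm() estimates, the example below fits a two-predictor model on R's built-in mtcars data. mtcars, with mpg as the response and wt and hp as predictors, is only an assumed stand-in for the placeholder data frame. The sketch checks that the coefficients match the ordinary least squares solution of the normal equations, (X'X)β = X'y.

```r
# Stand-in data (assumption): built-in mtcars, mpg as response, wt and hp as predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))

# Design matrix with a leading column of 1s for the intercept
X <- model.matrix(~ wt + hp, data = mtcars)
y <- mtcars$mpg

# Ordinary least squares: solve (X'X) beta = X'y
betaHat <- drop(solve(crossprod(X), crossprod(X, y)))
print(betaHat)
```

lm() itself uses a QR decomposition rather than the normal equations, which is numerically more stable, but both give the same estimates here.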

Standardized

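Standardized regression: every variable is converted to z-scores (mean 0, standard deviation 1) with scale() before fitting. Each coefficient then gives the change in the outcome, in standard deviations, for a one-standard-deviation change in the predictor.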

modelz <- lm(scale(data$OutcomeVariable) ~ scale(data$PredictorVariable1) + scale(data$PredictorVariable2) )
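The sketch below, again using the built-in mtcars data as an assumed stand-in, shows how standardized and unstandardized slopes relate. A standardized slope equals the raw slope multiplied by sd(predictor) / sd(outcome), and the intercept of the z-scale model is zero.

```r
# Stand-in data (assumption): mtcars, mpg as outcome, wt and hp as predictors
fitRaw <- lm(mpg ~ wt + hp, data = mtcars)
fitZ   <- lm(scale(mpg) ~ scale(wt) + scale(hp), data = mtcars)

# Convert the raw slopes to the z scale: slope * sd(predictor) / sd(outcome)
betaFromRaw <- coef(fitRaw)[c("wt", "hp")] *
  sapply(mtcars[c("wt", "hp")], sd) / sd(mtcars$mpg)
betaZ <- coef(fitZ)[c("scale(wt)", "scale(hp)")]

print(round(cbind(fromRaw = betaFromRaw, z = betaZ), 4))
```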

All variables are predictors

myFit <- lm(outcome ~ ., data = DataFrame)

The dot (.) is a shortcut for all columns of the data frame except the response.
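A minimal sketch, assuming the built-in mtcars data with mpg as the response. It checks that the dot gives the same fit as an explicit formula listing every other column, and shows that "- name" removes one column from the expansion.

```r
# Stand-in data (assumption): mtcars, mpg as the response
fitDot <- lm(mpg ~ ., data = mtcars)

# Equivalent explicit formula built from the remaining column names
explicitFormula <- reformulate(setdiff(names(mtcars), "mpg"), response = "mpg")
fitExplicit <- lm(explicitFormula, data = mtcars)
print(explicitFormula)

# "- hp" keeps every other column but drops hp
fitNoHp <- lm(mpg ~ . - hp, data = mtcars)
print(names(coef(fitNoHp)))
```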

Updating a model

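update() refits an existing model with a modified formula. Here it removes the non-significant predictors: in the formula, . ~ . keeps the current response and predictors, and each - term drops one of them.

myModel1 <- update(myModel1, . ~ . - NonSignificantPredictor1 - NonSignificantPredictor2)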
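A sketch of the same pattern on the built-in mtcars data (an assumed stand-in). The predictor qsec is dropped purely for illustration, not because it is known to be non-significant; on real data, check summary() first. anova() then compares the nested models with a partial F-test.

```r
# Stand-in data (assumption): mtcars
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# qsec is removed only for illustration; inspect summary(full) on real data
reduced <- update(full, . ~ . - qsec)
print(formula(reduced))

# Partial F-test: does dropping qsec significantly worsen the fit?
print(anova(reduced, full))
```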

Model Attributes

attributes(myFit)

See R - Names

$names
 [1] "coefficients"  "residuals"     "effects"       "rank"         
 [5] "fitted.values" "assign"        "qr"            "df.residual"  
 [9] "xlevels"       "call"          "terms"         "model"        

$class
[1] "lm"

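where the main components are:

  • coefficients: the estimated intercept and slopes (also returned by coef(myFit))
  • residuals: the observed minus the fitted response values (residuals(myFit))
  • fitted.values: the values predicted for the observations used in the fit (fitted(myFit))
  • df.residual: the residual degrees of freedom, that is, the number of observations minus the number of estimated coefficients
  • call: the lm() call that created the model
  • model: the model frame, i.e. the data actually used in the fit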
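A small sketch, assuming the built-in mtcars data as a stand-in. It reads the components through their accessor functions, which return the same values as the $ fields, and checks how residuals, fitted values and df.residual relate.

```r
# Stand-in data (assumption): mtcars
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(names(fit))

b   <- coef(fit)          # same as fit$coefficients
res <- residuals(fit)     # same as fit$residuals
fv  <- fitted(fit)        # same as fit$fitted.values
print(df.residual(fit))   # 29 for mtcars: 32 observations - 3 coefficients
```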

Summary

Summary Statistics

summary(myFit)
Call:
lm(formula = response ~ predictor1 + predictor2, data = myData)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.981  -3.978  -1.283   1.968  23.158 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.22276    0.73085  45.458  < 2e-16 ***
predictor1  -1.03207    0.04819 -21.416  < 2e-16 ***
predictor2   0.03454    0.01223   2.826  0.00491 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.173 on 503 degrees of freedom
Multiple R-squared:  0.5513,	Adjusted R-squared:  0.5495 
F-statistic:   309 on 2 and 503 DF,  p-value: < 2.2e-16

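where:

  • Residuals: the minimum, quartiles and maximum of the residuals.
  • Estimate: the estimated coefficient. Std. Error: its standard error.
  • t value: Estimate divided by Std. Error, e.g. -1.03207 / 0.04819 ≈ -21.42 for predictor1.
  • Pr(>|t|): the two-sided p-value for the null hypothesis that the coefficient is zero. The stars follow the Signif. codes legend.
  • Residual standard error: the estimated standard deviation of the error term, on n - p degrees of freedom (here 503 = 506 observations - 3 coefficients).
  • Multiple R-squared: the share of the response variance explained by the model. Adjusted R-squared penalizes it for the number of predictors.
  • F-statistic: tests the null hypothesis that all slope coefficients are zero.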
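To make these definitions concrete, the sketch below recomputes the main summary() figures by hand on the built-in mtcars data (an assumed stand-in for the placeholder data). It covers the t values, the p-values, R-squared, adjusted R-squared and the residual standard error.

```r
# Stand-in data (assumption): mtcars
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)
tab <- coef(s)   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t value = Estimate / Std. Error
tManual <- tab[, "Estimate"] / tab[, "Std. Error"]
# Two-sided p-value from a t distribution with the residual degrees of freedom
pManual <- 2 * pt(abs(tManual), df = df.residual(fit), lower.tail = FALSE)

# R-squared = 1 - RSS / TSS; the adjusted version corrects for model size
n     <- nrow(mtcars)
rss   <- sum(residuals(fit)^2)
tss   <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
r2    <- 1 - rss / tss
adjR2 <- 1 - (1 - r2) * (n - 1) / df.residual(fit)

# Residual standard error = sqrt(RSS / residual degrees of freedom)
sigmaManual <- sqrt(rss / df.residual(fit))

print(c(r2 = r2, adjR2 = adjR2, sigma = sigmaManual))
```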

Confidence Interval

confint(myFit)
                  2.5 %      97.5 %
(Intercept) 31.78687150 34.65864956
predictor1  -1.12674848 -0.93738865
predictor2   0.01052507  0.05856361
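A sketch of where these bounds come from, assuming the built-in mtcars data as a stand-in. Each 95% interval is estimate ± t(0.975, residual df) × standard error, where the standard errors are the square roots of the diagonal of the coefficient covariance matrix.

```r
# Stand-in data (assumption): mtcars
fit <- lm(mpg ~ wt + hp, data = mtcars)

est   <- coef(fit)
se    <- sqrt(diag(vcov(fit)))              # standard errors of the coefficients
tCrit <- qt(0.975, df = df.residual(fit))   # critical value for a 95% interval

manualCI <- cbind(lower = est - tCrit * se, upper = est + tCrit * se)
print(manualCI)
print(confint(fit, level = 0.95))
```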

Prediction

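predict() evaluates the fitted model at new predictor values. With interval = "confidence", it returns the fitted value (fit) and the lower (lwr) and upper (upr) bounds of the 95% confidence interval for the mean response.

predict(myFit, newdata = data.frame(predictor1 = c(5, 10, 15), predictor2 = c(20, 30, 40)), interval = "confidence")
       fit      lwr      upr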
1 28.75330 27.67694 29.82967
2 23.93841 22.97305 24.90376
3 19.12351 18.12607 20.12094
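The sketch below uses the built-in mtcars data and arbitrary new predictor values (both assumptions) to contrast the two interval types of predict(). "confidence" covers the mean response, while "prediction" covers a single new observation and is therefore wider. Both share the same point estimate, the linear combination of the coefficients.

```r
# Stand-in data (assumption): mtcars; the new predictor values are arbitrary
fit <- lm(mpg ~ wt + hp, data = mtcars)
newData <- data.frame(wt = c(2.5, 3.0, 3.5), hp = c(100, 150, 200))

# Interval for the mean response at these predictor values
confBand <- predict(fit, newdata = newData, interval = "confidence")
# Interval for one new observation: includes the error variance, so it is wider
predBand <- predict(fit, newdata = newData, interval = "prediction")

print(confBand)
print(predBand)
```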

Plot

# Split the graphics device into a 2 x 2 grid to hold the four diagnostic plots
par(mfrow = c(2, 2))
# Draw the diagnostic plots of the fitted model
plot(myFit)

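For an lm object, plot() draws four diagnostic plots:

  • Residuals vs Fitted: the residuals against the fitted values. Its goal is to detect non-linearity: a curve in the residuals means the model is not capturing a non-linear relationship in the data.
  • Normal Q-Q: the standardized residuals against theoretical normal quantiles. Points far from the line suggest non-normal errors.
  • Scale-Location: the square root of the absolute standardized residuals against the fitted values. A trend suggests non-constant error variance.
  • Residuals vs Leverage: the standardized residuals against leverage, with Cook's distance contours to flag influential observations.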
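A sketch, assuming the built-in mtcars data as a stand-in. The which argument of plot() selects individual diagnostic plots, and hatvalues() and cooks.distance() return the numbers behind the leverage plot. The pdf(NULL) line is only there so the sketch runs without a screen; drop it in an interactive session.

```r
# Stand-in data (assumption): mtcars
fit <- lm(mpg ~ wt + hp, data = mtcars)

pdf(NULL)               # null graphics device so no window or file is needed
plot(fit, which = 1)    # Residuals vs Fitted only
plot(fit, which = 5)    # Residuals vs Leverage only
invisible(dev.off())

lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # influence of each observation on the fit
print(head(sort(cook, decreasing = TRUE), 3))
```

The leverages sum to the number of estimated coefficients, which is why an observation's leverage is often compared with the average p / n.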




