# R - Feature selection - Model Generation (Best Subset and Stepwise)

## 1 - About

This article covers the first step of feature selection in R: generating the candidate models.

Once the models are generated, you can select the best one by comparing them on a criterion such as the adjusted R², Cp or BIC.

## 2 - Articles Related

## 3 - Steps

### 3.1 - Function regsubsets

#### 3.1.1 - Best subset

Best subset regression searches all possible regression models of every subset size and keeps the best one of each size. It therefore produces a sequence of models, one per size, each being the best subset for that size.

Best subset is quite aggressive, since it looks at all possible subsets: with p candidate variables there are 2^p of them.

The regsubsets function in the leaps package performs best subset modeling.

```
# Load the leaps package, which provides regsubsets()
library(leaps)
# Find the best model of each size.
# nvmax caps the largest subset size considered (8 by default).
myPathOfModel <- regsubsets(Response ~ ., data = myDataFrame, nvmax = 8)
```
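To make the idea of an exhaustive search concrete, here is a minimal brute-force sketch in base R. It is not how regsubsets is implemented; it only illustrates the definition. The built-in mtcars data, the response mpg and the four candidate predictors are assumptions chosen for illustration.

```r
# Brute-force best subset search (illustration only, base R).
# Assumption: mtcars with response mpg and a small, hand-picked candidate set.
predictors <- c("wt", "hp", "qsec", "drat")

best_by_size <- lapply(seq_along(predictors), function(k) {
  # Every subset of exactly k predictors
  subsets <- combn(predictors, k, simplify = FALSE)
  # Residual sum of squares of the linear model fitted on each subset
  rss <- sapply(subsets, function(vars) {
    fit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
    sum(resid(fit)^2)
  })
  # Keep the subset with the smallest RSS for this size
  subsets[[which.min(rss)]]
})

best_by_size
```

Within one size, every model has the same number of variables, so comparing them by RSS is enough. Comparing models of different sizes needs a criterion that penalizes size.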

#### 3.1.2 - Forward Stepwise

See also: Statistics - Forward and Backward Stepwise (Selection|Regression)

Forward stepwise is a greedy algorithm. At each step it adds the single variable that most improves the current model, so it produces a nested sequence of models.

The models are nested because each new model contains all the variables of the previous model plus one new one.

To run it, call regsubsets with the option method="forward":

```
library(leaps)
# Forward stepwise selection, considering models with up to 10 variables
myPathOfModel <- regsubsets(Response ~ ., data = myDataFrame, nvmax = 10, method = "forward")
```
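The greedy step described above can be sketched in base R as follows. This is a simplified illustration, not the algorithm used inside leaps. The mtcars data, the response mpg and the limit of three steps are assumptions.

```r
# Greedy forward selection (illustration only, base R).
# Assumption: mtcars with response mpg; stop after three variables.
predictors <- setdiff(names(mtcars), "mpg")
selected <- character(0)
path <- list()

for (k in 1:3) {
  candidates <- setdiff(predictors, selected)
  # RSS obtained by adding each remaining candidate to the current model
  rss <- sapply(candidates, function(v) {
    fit <- lm(reformulate(c(selected, v), response = "mpg"), data = mtcars)
    sum(resid(fit)^2)
  })
  # Add the candidate that lowers the RSS the most
  selected <- c(selected, names(which.min(rss)))
  path[[k]] <- selected
}

path
```

Each element of path extends the previous one by exactly one variable, which is what makes the sequence nested.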

### 3.2 - Summary

```
# Store the summary so that its components can be reused later
myPathOfModel.summary <- summary(myPathOfModel)
# Print the selection table
myPathOfModel.summary
```

The summary lists the best model of each size, one row per size. In each row, an asterisk marks every variable included in that model:

- the row for size 1 has one asterisk, under the variable that forms the best subset of size 1
- the row for size 2 has two asterisks, under the two variables that form the best subset of size 2
- …
- the row for size n has n asterisks, under the n variables that form the best subset of size n

```
Subset selection object
Call: regsubsets.formula(Response ~ ., data = myDataFrame)
3 Variables  (and intercept)
     Forced in Forced out
Var1     FALSE      FALSE
Var2     FALSE      FALSE
Var3     FALSE      FALSE
1 subsets of each size up to 8
Selection Algorithm: exhaustive
         Var1 Var2 Var3
1  ( 1 ) " "  " "  "*"
2  ( 1 ) "*"  "*"  " "
```
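The same information is available programmatically in the which component of the summary, a logical matrix with one row per model size. The sketch below assumes that the leaps package is installed and uses the built-in mtcars data with mpg as the response, purely for illustration.

```r
library(leaps)

# Assumption: mtcars with response mpg, best subsets up to size 3
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 3)
s <- summary(fit)

# Logical matrix: rows are model sizes, columns are the intercept and the variables
s$which

# Names of the terms in the best model of size 2 (the intercept is included)
names(which(s$which[2, ]))
```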

### 3.3 - Variables

```
names(myPathOfModel.summary)
```

```
[1] "which" "rsq" "rss" "adjr2" "cp" "bic" "outmat" "obj"
```

For each model size, the summary stores fit statistics for the best model of that size, among them:

- the adjusted R squared (adjr2),
- the Cp statistic (cp),
- the BIC statistic (bic).

You use these statistics to choose among the models (see R - Feature Selection - Indirect Model Selection).
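As a minimal sketch of how these vectors can be used, the code below finds the model size preferred by each criterion. It assumes that the leaps package is installed; the mtcars data, the response mpg and nvmax = 5 are illustrative choices.

```r
library(leaps)

# Assumption: mtcars with response mpg, best subsets up to size 5
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 5)
s <- summary(fit)

# Each vector has one entry per model size
best_sizes <- c(
  adjr2 = which.max(s$adjr2),  # larger adjusted R squared is better
  cp    = which.min(s$cp),     # smaller Cp is better
  bic   = which.min(s$bic)     # smaller BIC is better
)
best_sizes
```

The three criteria do not have to agree on a size, since each penalizes model size differently.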

### 3.4 - Get the regression coefficient vector for a model

```
# Coefficients of the best model with 10 variables (requires nvmax >= 10)
coef(myPathOfModel, 10)
```

```
(Intercept) Var1 Var2 .... Varn
162.5354420 -2.1686501 6.9180175 5.7732246 -0.1300798
```
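One way to sanity-check the extracted coefficients is to refit the same variables with lm and compare. The sketch below assumes that the leaps package is installed and uses the built-in mtcars data with mpg as the response, for illustration only.

```r
library(leaps)

# Assumption: mtcars with response mpg, best subsets up to size 3
fit <- regsubsets(mpg ~ ., data = mtcars, nvmax = 3)

# Coefficients of the best two-variable model
cf <- coef(fit, 2)

# Refit the same variables with lm
vars <- setdiff(names(cf), "(Intercept)")
refit <- lm(reformulate(vars, response = "mpg"), data = mtcars)
cf_lm <- coef(refit)

# The two coefficient vectors should agree up to numerical tolerance
all.equal(cf[order(names(cf))], cf_lm[order(names(cf_lm))], tolerance = 1e-6)
```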