# R - Feature Selection - Indirect Model Selection

## About

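Once a feature selection procedure has generated the candidate models, you have to select the best one. This article covers the indirect methods, which adjust the training error with a penalty for model size (adjusted R squared, Cp, BIC) instead of estimating the test error directly.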

## Model Selection

### Adjustment formula

We will select the model using Cp but, as you can see below, the summary of the regsubsets object created in the model generation step also contains the adjusted R squared and the BIC.

`names(myPathOfModel.summary)`

`[1] "which" "rsq" "rss" "adjr2" "cp" "bic" "outmat" "obj" `

For each model size, the summary of the best subset models provides the following statistics:

- the adjusted r squared,
- the Cp statistic,
- the BIC statistic.

You can use these statistics to plot the models and compare them.
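As a minimal sketch, the three statistics can be collected in one data frame for side-by-side comparison. The built-in `mtcars` data set and the `mpg ~ .` formula are assumptions standing in for the data of the model generation step; `criteria` is a hypothetical name.

```r
library(leaps)  # provides regsubsets()

# Assumption: mtcars stands in for the article's data set.
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

# One row per model size, one column per adjusted criterion.
criteria <- data.frame(
  nvars = seq_along(myPathOfModel.summary$cp),
  adjr2 = myPathOfModel.summary$adjr2,
  cp    = myPathOfModel.summary$cp,
  bic   = myPathOfModel.summary$bic
)
print(criteria)
```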

### Function

The idea here is to pick the model with the lowest Cp. You can identify it on the Cp plot or with the following function:

`which.min(myPathOfModel.summary$cp)`

`[1] 10`

In this case, the model with 10 variables has the smallest Cp.
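The same lookup works for the other two criteria, keeping in mind that BIC is minimized while adjusted R squared is maximized. The sketch below assumes the built-in `mtcars` data set in place of the article's data, so the selected sizes will differ from the 10 shown above; the `best.*` names are hypothetical.

```r
library(leaps)  # provides regsubsets()

# Assumption: mtcars stands in for the article's data set.
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

best.cp    <- which.min(myPathOfModel.summary$cp)     # smaller Cp is better
best.bic   <- which.min(myPathOfModel.summary$bic)    # smaller BIC is better
best.adjr2 <- which.max(myPathOfModel.summary$adjr2)  # larger adjusted R squared is better

print(c(cp = best.cp, bic = best.bic, adjr2 = best.adjr2))

# Coefficients (intercept included) of the model selected by Cp.
print(coef(myPathOfModel, best.cp))
```

The three criteria do not always agree; BIC penalizes size more heavily and tends to select smaller models.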

### Plot

#### Cp

Cp is an estimate of prediction error.
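To make the statistic less of a black box, here is a rough sketch that computes Mallows' Cp by hand in its classical form, RSS / sigma^2 - n + 2p, where sigma^2 is estimated from the full model and p counts the parameters, intercept included. The `mtcars` data, the `wt + hp` candidate model, and the `mallows_cp` helper are assumptions for illustration; the exact scaling used by a given package may differ.

```r
# Assumption: mtcars stands in for the article's data;
# `wt + hp` is an arbitrary candidate model.
full <- lm(mpg ~ ., data = mtcars)
sub  <- lm(mpg ~ wt + hp, data = mtcars)

n      <- nrow(mtcars)
sigma2 <- sum(residuals(full)^2) / df.residual(full)  # error variance from the full model

# Hypothetical helper: classical Mallows' Cp of a fitted lm.
mallows_cp <- function(fit) {
  p <- length(coef(fit))  # number of parameters, intercept included
  sum(residuals(fit)^2) / sigma2 - n + 2 * p
}

cp.sub  <- mallows_cp(sub)
cp.full <- mallows_cp(full)  # equals the parameter count of the full model by construction
print(c(sub = cp.sub, full = cp.full))
```

A model with little bias has a Cp close to its number of parameters, which is why small values are preferred.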

`plot(myPathOfModel.summary$cp,xlab="Number of Variables",ylab="Cp")`

You can also plot the best point:

`points(10,myPathOfModel.summary$cp[10],pch=20,col="red")`

#### Cp Model by variables

`plot(myPathOfModel,scale="Cp")`

This plot summarizes every model by the variables it contains, rather than showing only the Cp statistic.

- each row is a model; its Cp value is on the y axis, and the models are ranked from best (smallest Cp, top row) to worst (bottom row)
- the variables are on the x axis
- a black square indicates that the variable is included in the model and a white square indicates that it is excluded
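The squares of this plot correspond to the logical `which` matrix of the summary, so the variables of a given model can also be read programmatically. A minimal sketch, again assuming `mtcars` in place of the article's data; `best` and `in.model` are hypothetical names.

```r
library(leaps)  # provides regsubsets()

# Assumption: mtcars stands in for the article's data set.
myPathOfModel <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10)
myPathOfModel.summary <- summary(myPathOfModel)

best <- which.min(myPathOfModel.summary$cp)

# One row of the `which` matrix: TRUE means the variable is in the model
# (a filled square on the plot), FALSE means it is out (a white square).
in.model <- myPathOfModel.summary$which[best, ]
print(names(in.model)[in.model])
```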