About
In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables (features) under consideration. It can be divided into:
- feature selection (returns a subset of the original features)
- feature extraction (creates new features that are functions of the original features)
These methods are sometimes called “model selection” methods.
They are an essential tool for data analysis, especially for big datasets involving many predictors.
In dimensionality reduction, the goal is to keep a subset of features (or derived features) while preserving as much of the variance in the dataset as possible.
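The contrast between the two families can be sketched on a toy dataset. This is a minimal illustration, assuming scikit-learn; `SelectKBest` stands in for feature selection (it keeps original columns) and `PCA` for feature extraction (it builds new features from all columns) — neither is named in this article.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep a subset of the original columns.
selector = SelectKBest(score_func=f_classif, k=2)
X_sel = selector.fit_transform(X, y)
print(X_sel.shape)  # two of the four original features survive

# Feature extraction: build new features as functions of all originals.
pca = PCA(n_components=2)
X_new = pca.fit_transform(X)
print(X_new.shape)  # two derived components, each mixing all four features
```

Both outputs have shape (150, 2), but the selected columns are still interpretable as the original measurements, while the PCA components are linear combinations of them.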
Benefits
- Improved model interpretability (with fewer features, the model becomes easier to interpret),
- Shorter training times (smaller models are faster to score, and Generalized Linear Models (GLM) become more meaningful),
- Enhanced generalisation by reducing overfitting.
Concept
- random projections
- feature hashing
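The two techniques above can be sketched as follows. This is a hedged example assuming scikit-learn, whose `GaussianRandomProjection` and `FeatureHasher` implement these ideas; the dimensions chosen are arbitrary.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection
from sklearn.feature_extraction import FeatureHasher

# Random projection: map 1000-dim data to 50 dims via a random matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1000))
proj = GaussianRandomProjection(n_components=50, random_state=0)
X_proj = proj.fit_transform(X)
print(X_proj.shape)  # 100 samples now described by 50 random features

# Feature hashing: map variable-length token lists to a fixed-size vector.
hasher = FeatureHasher(n_features=16, input_type="string")
X_hashed = hasher.transform([["cat", "dog"], ["dog", "fish", "fish"]])
print(X_hashed.shape)  # 2 documents, each hashed into 16 buckets
```

Both approaches trade some fidelity for a fixed, smaller feature space without inspecting the data first, which makes them cheap to apply to very wide datasets.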