Data Mining - (Dimension|Feature) (Reduction)

1 - About

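In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables (features) under consideration. It can be divided into:

  • feature selection: keeping a subset of the original features,
  • feature extraction: transforming the data into a smaller set of new features (for example, principal component analysis).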

These methods are sometimes called “model selection methods”.

They are an essential tool for data analysis, especially for big datasets involving many predictors.

In dimensionality reduction, the goal is to reduce the number of features, either by keeping a subset of them or by constructing fewer new ones, while retaining as much of the variance in the dataset as possible.
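As a minimal sketch of the selection side of this goal, assuming the dataset is a plain list of numeric rows: the total variance of a dataset is the sum of its per-feature variances, so one simple approach keeps the highest-variance features until a target fraction of that total is retained. The function names and the `target` parameter are hypothetical, not taken from any library.

```python
from statistics import pvariance


def select_by_variance(rows, target=0.9):
    """Keep the highest-variance columns until `target` of the total variance is retained.

    rows: list of equal-length numeric rows (samples x features).
    Returns (sorted indices of kept columns, fraction of total variance retained).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature
    variances = [pvariance(col) for col in columns]
    total = sum(variances)  # total variance = sum of the per-feature variances
    # Feature indices ordered from highest to lowest variance
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)

    kept, retained = [], 0.0
    for j in order:
        if total > 0 and retained / total >= target:
            break  # enough variance retained, drop the remaining features
        kept.append(j)
        retained += variances[j]
    fraction = retained / total if total > 0 else 1.0
    return sorted(kept), fraction


def reduce_columns(rows, kept):
    """Return the dataset restricted to the kept columns."""
    return [[row[j] for j in kept] for row in rows]


# Toy data: column 1 is column 0 in larger units, column 2 is almost constant
data = [
    [1.0, 10.0, 0.5],
    [2.0, 20.0, 0.5],
    [3.0, 30.0, 0.6],
    [4.0, 40.0, 0.5],
]
kept, fraction = select_by_variance(data, target=0.9)
print(kept, round(fraction, 3))  # [1] 0.99
print(reduce_columns(data, kept))
```

Ranking by raw variance favours features measured in large units and ignores correlation between features (in the toy data, column 1 is only a rescaled copy of column 0), which is why extraction methods such as principal component analysis, usually run on standardised data, are often preferred.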

3 - Benefits

  • Improved model interpretability: a model with fewer features is easier to interpret.
  • Shorter training times and faster scoring: for example, it allows for smaller, faster-scoring and more meaningful Generalized Linear Models (GLM).
  • Enhanced generalisation through reduced overfitting.

4 - Concept

  • random projections
  • feature hashing
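The two concepts listed above can be sketched as follows, assuming dense numeric vectors for random projection and string tokens for feature hashing. A Gaussian random projection multiplies each vector by a random k × d matrix (entries drawn from N(0, 1/k) here, one common choice among several), which approximately preserves pairwise distances (Johnson–Lindenstrauss lemma). Feature hashing maps each token to one of a fixed number of buckets through a hash function; MD5 with a sign bit is an arbitrary choice for this sketch. All function names are hypothetical.

```python
import hashlib
import math
import random


def random_projection_matrix(n_features, k, seed=0):
    """Gaussian random projection matrix (k x n_features) with N(0, 1/k) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)  # keeps squared lengths unbiased after projection
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)] for _ in range(k)]


def random_project(x, matrix):
    """Project one dense vector x (length n_features) down to k = len(matrix) dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def hash_features(tokens, n_buckets=16):
    """Feature hashing: map arbitrary string features into a fixed-length vector."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        # A stable hash; Python's built-in hash() of str is salted per process
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "little")  # 64-bit unsigned integer
        index = h % n_buckets  # bucket (column) for this token
        sign = -1.0 if h >> 63 else 1.0  # random sign: collision errors cancel in expectation
        vec[index] += sign
    return vec


# Random projection: 500 -> 100 dimensions, distances roughly preserved
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
y = [rng.uniform(-1.0, 1.0) for _ in range(500)]
M = random_projection_matrix(500, 100, seed=42)
print(math.dist(x, y), math.dist(random_project(x, M), random_project(y, M)))

# Feature hashing: tokens from any vocabulary size -> 16 columns
print(hash_features(["color=red", "size=large", "color=red"]))
```

Neither mapping is learned from the training data, which makes both cheap for very high-dimensional or streaming inputs; the price is that the new features have no direct interpretation.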

