# Data Mining - Dimensionality (number of variable, parameter) (P)

Not to confound with d: the model size. You may have 1000 attributes (p=1000) in your sample but after feature selection for instance, you model may use only a handful (d=5)

In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. (ie the number of variable to to be able to define an outcome)

## Curse

In high dimension, it's really difficult to stay local. See Data Mining - High Dimension (Curse of Dimensionality)

## Documentation / Reference

Discover More
Data Mining - (Anomaly|outlier) Detection

The goal of anomaly detection is to identify unusual or suspicious cases based on deviation from the norm within data that is seemingly homogeneous. Anomaly detection is an important tool: in data...
Data Mining - (Attribute|Feature) (Selection|Importance)

Feature selection is the second class of dimension reduction methods. They are used to reduce the number of predictors used by a model by selecting the best d predictors among the original p predictors....
Data Mining - (Dimension|Feature) (Reduction)

In machine learning and statistics, dimensionality reduction is the process of reducing the number of random variables (features) under consideration and can be divided into: feature selection (returns...
Data Mining - (Feature|Attribute) Extraction Function

Feature extraction is the second class of methods for dimension reduction. dimension reduction It creates new attributes (features) using linear combinations of the (original|existing) attributes. ...
Data Mining - Global vs Local

Global refers to calculation that are made over the whole data set whereas local refers to calculations that are made local to a point or a partition. In high dimension, it's really difficult to stay...
Data Mining - High Dimension (Curse of Dimensionality)

High dimension In high dimension, it's really difficult to stay local. edge cases See this interactive app in R Shiny on the Curse of Dimensionality. Circle...
Data Mining - Naive Bayes (NB)

Naive Bayes (NB) is a simple supervised function and is special form of discriminant analysis. It's a generative model and therefore returns probabilities. It's the opposite classification strategy...
Data Mining - Problem

A page the problem definition in data Type of target: nominal or quantitative Type of target class: binomial of multiclass Number of parameters: Type of (predictor|features): nominal or numeric....
Machine Learning - K-Nearest Neighbors (KNN) algorithm - Instance based learning

“Nearest‐neighbor” learning is also known as “Instance‐based” learning. K-Nearest Neighbors, or KNN, is a family of simple: classification and regression algorithms based on Similarity...
R - Feature selection - Model Generation (Best Subset and Stepwise)

This article talks the first step of feature selection in R that is the models generation. Once the models are generated, you can select the best model with one of this approach: Best...