Data Mining - Global vs Local



Global refers to calculations that are made over the whole data set, whereas local refers to calculations that are made around a single point or within a partition.
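The distinction can be illustrated with a minimal sketch. The data, the function names, and the choice of a nearest-neighbor average as the local calculation are assumptions made for illustration only; they do not come from the original page.

```python
def global_mean(ys):
    """Global calculation: one number computed from the whole data set."""
    return sum(ys) / len(ys)


def local_mean(xs, ys, x0, k=3):
    """Local calculation: average y over the k points whose x is nearest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


# Made-up data (assumption): y jumps from about 1 to about 10 around x = 5.
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [1.0, 1.2, 0.8, 1.0, 10.0, 9.8, 10.2, 10.0]

print(global_mean(ys))        # the same value wherever you ask
print(local_mean(xs, ys, 2))  # driven only by the points near x = 2
print(local_mean(xs, ys, 8))  # driven only by the points near x = 8
```

The global mean lands between the two groups and describes neither, while the local means follow the data around each query point.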

High dimension vs Local

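In high dimensions, it is really difficult to stay local: a neighborhood that captures a fixed share of the data has to stretch over most of the range of each variable.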

[Figure: curse of dimensionality, radius versus volume]

[Figure: neighborhoods in one and two dimensions]

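Example: in a unit hypercube with uniformly spread data, the outer shell outside a centered inner cube of side 0.9 holds 10% of the data in one dimension. The percentage of the volume (and of the data) in that same shell is: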

  • Two dimensions: <math>1^2 - 0.9^2 = 1 - 0.81 = 0.19 = 19\%</math>
  • Ten dimensions: <math>1^{10} - 0.9^{10} \approx 1 - 0.35 = 0.65 = 65\%</math>
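Both figures in the list above come from the same expression, 1 - 0.9^p, the share of a unit hypercube's volume left outside a centered inner cube of side 0.9. A small sketch makes it easy to try other dimensions; the function name `shell_fraction` and the parameter `inner_side` are hypothetical names chosen for this illustration.

```python
def shell_fraction(p, inner_side=0.9):
    """Share of a unit hypercube's volume outside a centered inner cube.

    p is the number of dimensions; inner_side is the edge length of the
    inner cube (0.9 reproduces the example in the text).
    """
    return 1.0 - inner_side ** p


for p in (1, 2, 10, 100):
    # Print the dimension and the shell's share of the volume as a percentage.
    print(p, f"{shell_fraction(p):.0%}")
```

As p grows, the inner cube's share 0.9^p shrinks toward zero, so almost all of the volume ends up in the shell near the boundary.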

To resolve this problem, structured (parametrized) models have been introduced. The simplest one is the linear model.
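As an illustration of such a structured model, here is a minimal sketch of a linear model with a single predictor, fitted by ordinary least squares. The choice of ordinary least squares, the function name `fit_line`, and the data are assumptions for this example, not something the page specifies.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Every observation contributes to the two parameters, so the fit is
    global: it does not rely on the neighbors of any particular point.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data (assumption) that lies exactly on y = 1 + 2x.
line_xs = [0.0, 1.0, 2.0, 3.0, 4.0]
line_ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(line_xs, line_ys)
print(intercept, slope)

# Predict at a new point using only the two fitted parameters.
print(intercept + slope * 10.0)
```

The model needs only as many parameters as there are predictors plus one, which is why it remains usable when neighborhoods become too sparse to average over.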
