Statistics vs (Machine Learning|Data Mining)


This article lists the differences between statistics, machine learning, and data mining.

The simplest view is to consider machine learning and data mining as applied statistics.


Statistics vs Machine Learning

Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science 16(3), 2001

  • One view (the algorithmic modeling culture, in Breiman's terms):
      • Find a function that predicts y from x: no model of nature is implied or needed (decision tree, neural net)
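
As an illustration of "find a function that predicts y from x", here is a minimal sketch of a one-split regression tree (a decision stump) in plain Python. It is not taken from Breiman's paper. The name `fit_stump` and the toy data are assumptions made for this example. The point is that the fit only searches for the split with the lowest squared error and never states how the data were generated.

```python
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising squared error.

    Hypothetical helper for illustration: returns a function x -> prediction.
    """
    pairs = sorted(zip(xs, ys))
    best = None  # (error, threshold, left_mean, right_mean)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split is possible between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best

    def predict(x):
        # The whole "model" is a threshold and two averages.
        return left_mean if x < threshold else right_mean

    return predict


# Toy data (assumed): y jumps from about 1 to about 5 somewhere between x=3 and x=10.
predict = fit_stump([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5])
print(predict(2), predict(11))
```

A real decision tree repeats this split search recursively on each side. The stump is the smallest case that still shows the idea.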

Data Mining vs Machine Learning

Data mining is the application; machine learning supplies the algorithms.

Machine learning has the upper hand in marketing.

Statistics vs Data Mining

There is a great deal of overlap between data mining and statistics. In fact most of the techniques used in data mining can be placed in a statistical framework.

However, data mining techniques are not the same as traditional statistical techniques.

Statistics type        | Automatic validation of the model | Automation | Large data sets
Data mining            | Yes                               | Yes        | Yes
Traditional statistics | No                                | No         | No


Data mining techniques differ from traditional statistical techniques on two points:

  • validation of the model
  • large data sets

Validation of the model

Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate.
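
To make "automatic validation" concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is one common way to score a model on held-out data without user interaction, not necessarily what any particular data mining tool does. The names `k_fold_mse`, `fit_mean`, and `fit_line`, and the toy data, are assumptions made for this example.

```python
from statistics import mean


def k_fold_mse(xs, ys, fit, k=5):
    """Average held-out mean squared error over k folds.

    `fit(train_xs, train_ys)` must return a function x -> prediction.
    """
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row is held out
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        predict = fit(train_x, train_y)
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in test_idx]
        fold_errors.append(mean(errors))
    return mean(fold_errors)


def fit_mean(train_xs, train_ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(train_ys)
    return lambda x: m


def fit_line(train_xs, train_ys):
    """Second candidate: least-squares straight line."""
    mx, my = mean(train_xs), mean(train_ys)
    sxx = sum((x - mx) ** 2 for x in train_xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_xs, train_ys)) / sxx
    return lambda x: my + slope * (x - mx)


# Toy data (assumed): an exact straight line, so the line model should win.
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: k_fold_mse(xs, ys, fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # model choice needs no manual inspection
print(best, scores)
```

The selection step is a single `min` over held-out scores, which is what makes this kind of validation easy to automate.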

Large data set

Moreover, statistical methods typically do not scale well to very large data sets. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population. Data mining methods are suitable for large data sets and can be more readily automated. In fact, data mining algorithms often require large data sets for the creation of quality models.
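
As a small illustration of what scaling to a large data set can mean in practice, here is a sketch of a one-pass mean and variance computation (Welford's algorithm) that reads the data as a stream and never holds it all in memory. It is an assumption that this is representative of how a given data mining tool works. The function name `streaming_mean_variance` and the sample values are made up for this example.

```python
def streaming_mean_variance(values):
    """Return (mean, sample variance) in a single pass over any iterable."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values")
    return mean, m2 / (n - 1)


# Works on a list ...
m, v = streaming_mean_variance([2, 4, 4, 4, 5, 5, 7, 9])
print(m, v)

# ... and on a generator, so the full data set is never materialised.
big_m, big_v = streaming_mean_variance(x for x in range(1, 1_000_001))
print(big_m)
```

The same loop serves a small sample and the full stream. Only the input changes, which is the property that matters once the data no longer fits in memory.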
