About
This article lists the differences between statistics and its related fields, machine learning and data mining.
The simplest view is to consider machine learning and data mining as applied statistics.
Vs
Statistics vs Machine Learning
Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science 16(3), 2001
- One view (data modeling): emphasis on stochastic models of nature (linear regression, logistic regression, Cox model)
- The other view (algorithmic modeling): find a function that predicts y from x; no model of nature is implied or needed (decision tree, neural net)
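The contrast between the two views can be illustrated with a minimal sketch. It assumes synthetic data generated from a straight line plus Gaussian noise, and it uses a k-nearest-neighbours predictor as a simple stand-in for the decision tree or neural net named above. The function names `fit_linear` and `knn_predict` are hypothetical and only serve this illustration.

```python
import random

random.seed(0)

# Synthetic data: y depends linearly on x plus Gaussian noise (an assumption for this demo).
xs = [i / 10 for i in range(100)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]


def fit_linear(xs, ys):
    """Data modeling view: assume y = a + b*x + noise and estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=5):
    """Algorithmic view: average the k nearest observations, with no model of nature."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


a, b = fit_linear(xs, ys)
print(f"linear model: y = {a:.2f} + {b:.2f} * x")  # the parameters can be interpreted
print(f"linear prediction at x=5: {a + b * 5:.2f}")
print(f"k-NN prediction at x=5:   {knn_predict(xs, ys, 5.0):.2f}")  # only a prediction
```

Both approaches give a similar prediction here, but only the first one yields parameters (an intercept and a slope) that claim to describe how the data were generated.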
Data mining vs Machine learning
Data mining is the application; machine learning provides the algorithms.
Machine learning has the upper hand in marketing.
Statistics vs Data Mining
There is a great deal of overlap between data mining and statistics. In fact most of the techniques used in data mining can be placed in a statistical framework.
However, data mining techniques are not the same as traditional statistical techniques.
Statistics Type | Automatic validation of the model | Automation | Large Set of Data |
---|---|---|---|
Data Mining | Yes | Yes | Yes |
Traditional statistics | No | No | No |
What distinguishes data science from statistics? Data science works bottom-up (data-driven), whereas statistics works top-down (hypothesis-driven).
Difference
Data mining techniques are not the same as traditional statistical techniques on two points:
- validation of the model
- large data set
Validation of the model
Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate.
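By contrast, a data mining tool can validate and select a model without user interaction. The following is a minimal sketch of one common way to do that, k-fold cross-validation. The helper `k_fold_mse`, the two candidate models, and the synthetic data are all assumptions made for this illustration; they are not taken from any particular product.

```python
import random


def k_fold_mse(xs, ys, fit, predict, k=5, seed=0):
    """Return the mean squared error averaged over k held-out folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds covering all rows
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train_x = [xs[i] for i in indices if i not in held]
        train_y = [ys[i] for i in indices if i not in held]
        model = fit(train_x, train_y)  # fit on the training rows only
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Two hypothetical candidates: a constant (mean) predictor and a least-squares line.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predict_line(model, x):
    intercept, slope = model
    return intercept + slope * x


# Synthetic data (an assumption): a noisy straight line.
rng = random.Random(1)
xs = [i / 10 for i in range(200)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

scores = {
    "mean": k_fold_mse(xs, ys, fit_mean, predict_mean),
    "line": k_fold_mse(xs, ys, fit_line, predict_line),
}
best = min(scores, key=scores.get)  # the model is chosen by held-out error alone
print(scores, "-> selected:", best)
```

The selection step needs no diagnostic plots or manual checks of assumptions, which is what makes this style of validation easy to automate.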
Large data set
Moreover, statistical methods typically do not scale well to very large data sets. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population. Data mining methods are suitable for large data sets and can be more readily automated. In fact, data mining algorithms often require large data sets for the creation of quality models.
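The sampling idea can be sketched as follows. The statistical route estimates a quantity from a small random sample and reports its uncertainty, while the full-data route processes every record. The population, its distribution, and the sizes used here are assumptions chosen only for this demonstration.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical "large" population; the distribution and sizes are assumptions.
population = [rng.gauss(50, 10) for _ in range(200_000)]

# Statistical route: draw a small random sample and quantify the uncertainty.
sample = rng.sample(population, 1_000)
sample_mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))
low = sample_mean - 1.96 * std_error
high = sample_mean + 1.96 * std_error

# Full-data route: one pass over every record.
full_mean = statistics.fmean(population)

print(f"sample mean: {sample_mean:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
print(f"full-data mean: {full_mean:.2f}")
```

The sample gives a close estimate at a fraction of the cost, but only if it is representative. The full pass avoids the sampling step and instead relies on tooling that can handle the volume.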