This article tries to list the differences between statistics and its related fields (data mining, machine learning and data science).
The best approach would be to consider machine learning and data mining as applied statistics.
Leo Breiman, Statistical Modeling: The Two Cultures, Statistical Science 16(3), 2001
Data mining is the application; machine learning provides the algorithms.
Machine learning has the upper hand in marketing.
There is a great deal of overlap between data mining and statistics. In fact, most of the techniques used in data mining can be placed in a statistical framework.
However, data mining techniques are not the same as traditional statistical techniques.
Approach | Automatic model validation | Automation | Large data sets |
---|---|---|---|
Data Mining | Yes | Yes | Yes |
Traditional statistics | No | No | No |
What distinguishes data science from statistics? Data science works bottom-up (data-driven), whereas statistics works top-down (hypothesis-driven).
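The two directions can be contrasted with a minimal sketch, assuming a synthetic data set with five hypothetical features (`f0` to `f4`) of which only `f2` actually drives the target. The top-down analyst tests a single hypothesis stated in advance; the bottom-up analyst scans every feature and lets the data rank them. The `pearson` helper and the Fisher z test are illustrative choices made for this example, not the only way to do either.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(1)
n = 2000

# Hypothetical data set: five candidate features; only f2 drives the target.
features = {f"f{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(5)}
target = [0.8 * v + rng.gauss(0, 1) for v in features["f2"]]

# Top-down (hypothesis-driven): state one hypothesis in advance
# ("f0 is associated with the target") and test only that one.
r0 = pearson(features["f0"], target)
z0 = math.atanh(r0) * math.sqrt(n - 3)  # approx. standard normal if rho = 0
print(f"H: f0 ~ target  r = {r0:.3f}  reject H0 at 5%: {abs(z0) > 1.96}")

# Bottom-up (data-driven): scan every feature and let the data rank them.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)
print("features ranked by |r|:", ranking)
```

The top-down test only answers the question that was asked, while the scan surfaces `f2` without anyone having guessed it in advance. The flip side of scanning many candidates is a higher risk of spurious findings.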
Data mining techniques are not the same as traditional statistical techniques on two points:
Traditional statistical methods, in general, require a great deal of user interaction in order to validate the correctness of a model. As a result, statistical methods can be difficult to automate.
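To illustrate what automated validation can look like, here is a minimal sketch of k-fold cross-validation in plain Python. It assumes a one-feature least-squares line as the model and mean squared error as the score; `fit_line`, `mse` and `k_fold_cv` are hypothetical helpers written for this example. Each fold is held out in turn, the model is refit on the remaining data, and the held-out errors are averaged, so nobody has to inspect the model by hand.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_cv(xs, ys, k=5, seed=0):
    """Mean held-out MSE over k folds, computed without manual inspection."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k

# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(1000)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

print(round(k_fold_cv(xs, ys), 2))
```

Because the synthetic noise has variance 1, the printed cross-validated error should land close to 1. The same loop can be rerun unchanged on new data, which is what makes this kind of validation straightforward to automate.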
Moreover, statistical methods typically do not scale well to very large data sets. Statistical methods rely on testing hypotheses or finding correlations based on smaller, representative samples of a larger population. Data mining methods are suitable for large data sets and can be more readily automated. In fact, data mining algorithms often require large data sets for the creation of quality models.
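The sample-based approach described above can be sketched as follows, assuming a synthetic "population" of 100,000 pairs and a simple random sample of 500. The sketch estimates the Pearson correlation from the sample, attaches an approximate 95% confidence interval via the Fisher z transformation, and compares the estimate with the value computed on the full population. All data and sizes are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(7)

# Hypothetical population: 100,000 pairs with a moderate linear relationship
# (true correlation 0.5 / sqrt(1.25), roughly 0.447).
pop_x = [rng.gauss(0, 1) for _ in range(100_000)]
pop_y = [0.5 * x + rng.gauss(0, 1) for x in pop_x]

# Traditional approach: work from a small simple random sample.
n = 500
idx = rng.sample(range(len(pop_x)), n)
sample_r = pearson([pop_x[i] for i in idx], [pop_y[i] for i in idx])

# Approximate 95% confidence interval using the Fisher z transformation.
z = math.atanh(sample_r)
se = 1 / math.sqrt(n - 3)
lo, hi = math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)

population_r = pearson(pop_x, pop_y)
print(f"sample r = {sample_r:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print(f"population r = {population_r:.3f}")
```

For most random seeds the interval covers the population value, which is the sense in which a small representative sample can stand in for the whole population.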