Before tackling a data mining problem, some considerations must be taken into account in order to interpret the results correctly.
Strong correlations in the data do not necessarily prove a cause-and-effect link.
It is important to remember that the predictive relationships discovered through data mining are not necessarily causes of an action or behavior.
- Data mining might determine that males with incomes between 50,000 and 65,000 who subscribe to certain magazines are likely to buy a given product. You can use this information to help you develop a marketing strategy. However, you should not assume that belonging to this population is what causes its members to buy the product.
- In the late 1940s, before there was a polio vaccine, public health experts in America noted that polio cases increased in step with the consumption of ice cream and soft drinks, according to David Alan Grier, a historian and statistician at George Washington University. Eliminating such treats was even recommended as part of an anti-polio diet. It turned out that polio outbreaks were most common in the hot months of summer, when people naturally ate more ice cream, showing only an association, Mr. Grier said.
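The ice cream example can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: the numbers are generated at random (they are not historical polio figures), the coefficients are arbitrary, and the names `temperature`, `ice_cream`, and `cases` are invented for the example. A hidden common cause (temperature) drives both variables, so they correlate strongly, yet the correlation largely disappears once temperature is controlled for with a partial correlation.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic observations (assumption: made-up coefficients, not real data).
# Temperature is the hidden common cause of both other variables.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
cases = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw = pearson(ice_cream, cases)
controlled = partial_corr(ice_cream, cases, temperature)
print(f"raw correlation: {raw:.2f}")
print(f"controlling for temperature: {controlled:.2f}")
```

A mining tool that sees only `ice_cream` and `cases` would report a strong predictive relationship. Whether that relationship is causal cannot be decided from those two columns alone.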
Business users may already be aware of important patterns as a result of working with your data over time.
Data mining can confirm or qualify such empirical observations in addition to finding new patterns that may not be immediately discernible through simple observation.
Data mining does not automatically discover solutions without guidance. The patterns you find through data mining will be very different depending on how you formulate the problem.
To obtain meaningful results, you must learn how to ask the right questions. For example, rather than trying to learn how to “improve the response to a direct mail solicitation,” you might try to find the characteristics of people who have responded to your solicitations in the past.
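As a rough sketch of what the narrower question looks like in practice, the snippet below computes the response rate for each value of an attribute in a small mailing history. The records, the attribute names (`region`, `subscriber`), and the helper `response_rate_by` are all hypothetical and invented for illustration. A real study would use far more data and a proper mining algorithm.

```python
from collections import defaultdict


def response_rate_by(records, attribute):
    """Share of responders for each value of one attribute."""
    counts = defaultdict(lambda: [0, 0])  # value -> [responders, total]
    for rec in records:
        tally = counts[rec[attribute]]
        tally[0] += rec["responded"]
        tally[1] += 1
    return {value: hit / total for value, (hit, total) in counts.items()}


# Hypothetical mailing history, invented for illustration.
mailings = [
    {"region": "north", "subscriber": True, "responded": 1},
    {"region": "north", "subscriber": False, "responded": 0},
    {"region": "south", "subscriber": True, "responded": 1},
    {"region": "south", "subscriber": False, "responded": 0},
    {"region": "south", "subscriber": True, "responded": 0},
    {"region": "north", "subscriber": True, "responded": 1},
]

for attribute in ("region", "subscriber"):
    print(attribute, response_rate_by(mailings, attribute))
```

Framing the problem as "which characteristics do past responders share" gives a concrete target (the `responded` flag) and concrete inputs (the other attributes), which the vaguer goal of "improving the response" does not.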
Characteristics of the Data
Data mining algorithms are often sensitive to specific characteristics of the data:
- outliers (data values that are very different from the typical values in your database),
- irrelevant columns,
- columns that vary together (such as age and date of birth),
- data coding,
- and data that you choose to include or exclude.
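Some of the characteristics listed above can be checked mechanically before any model is built. The following is a minimal sketch, not a definitive procedure: the table `customers` is invented, the helper names (`iqr_outliers`, `profile`) are hypothetical, and the 1.5 × IQR rule and the 0.95 correlation threshold are common conventions chosen here as assumptions. It flags outliers, columns that never vary (one simple kind of irrelevant column), and pairs of columns that vary together. Data coding and inclusion choices still require human judgment.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


def profile(table, corr_threshold=0.95):
    """Flag outliers, constant columns, and highly correlated column pairs."""
    report = {"outliers": {}, "constant": [], "correlated": []}
    for name, column in table.items():
        if len(set(column)) == 1:
            report["constant"].append(name)  # carries no information
            continue
        found = iqr_outliers(column)
        if found:
            report["outliers"][name] = found
    varying = [n for n in table if n not in report["constant"]]
    for i, a in enumerate(varying):
        for b in varying[i + 1:]:
            if abs(pearson(table[a], table[b])) >= corr_threshold:
                report["correlated"].append((a, b))
    return report


# Hypothetical customer table, invented for illustration.
customers = {
    "age": [25, 32, 41, 38, 29, 55, 47, 36],
    "birth_year": [1999, 1992, 1983, 1986, 1995, 1969, 1977, 1988],
    "income": [52000, 58000, 61000, 50000, 64000, 57000, 950000, 55000],
    "country_code": [1, 1, 1, 1, 1, 1, 1, 1],
}
report = profile(customers)
print(report)
```

On this invented table, the report flags the income of 950000 as an outlier, `country_code` as constant, and `age` with `birth_year` as a pair that varies together. Whether to drop, cap, or keep each flagged item remains a modeling decision.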
Understanding Your Data
To ensure meaningful data mining results, you must understand your data.
In particular, you need to understand the data that was used to build the model in order to interpret the results properly when the model is applied.