Data preparation is the step that gets your data ready for further analysis.
It's a key factor in any data project.
Preparation involves several steps, which are explained below.
For data mining, the data must exist within a single table or view where each case (record) is a row.
If you want to mine data stored in a star schema, you still need to create a single table with one record per case and the variables of interest.
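As a rough illustration of flattening a star schema into one record per case, here is a minimal sketch in plain Python. The tables, column names (customer_id, amount, order_count, total_amount) and the helper flatten_to_case_table are all hypothetical and only serve the example; a real project would more likely do this with a SQL view or a data-frame join.

```python
# Hypothetical star schema: one dimension table and one fact table.
customers = {  # dimension: one entry per customer (the "case")
    1: {"age": 34, "country": "FR"},
    2: {"age": 51, "country": "US"},
}
sales = [  # fact: possibly many rows per customer
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 10.0},
]


def flatten_to_case_table(customers, sales):
    """Return one record per customer, with the fact rows aggregated."""
    totals = {}
    counts = {}
    for row in sales:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + row["amount"]
        counts[cid] = counts.get(cid, 0) + 1

    case_table = []
    for cid, attrs in sorted(customers.items()):
        case_table.append({
            "customer_id": cid,
            **attrs,  # dimension attributes become columns
            "order_count": counts.get(cid, 0),
            "total_amount": totals.get(cid, 0.0),
        })
    return case_table


case_table = flatten_to_case_table(customers, sales)
```

Each customer ends up as exactly one row, and the many sales rows are reduced to aggregate columns, which is the shape most mining algorithms expect.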
The data mining development process may require several data sets.
A data set may be:
Data transformations may be required by algorithms.
The data must be properly cleansed to eliminate inconsistencies and support the needs of the mining application.
Put data into bins: binning (discretization)
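To make binning concrete, the following sketch shows equal-width binning, which is only one of several possible discretization schemes (equal-frequency binning is another). The function name equal_width_bins and the sample ages are assumptions made for this example.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would compute to index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 40, 47, 58, 63]
age_bins = equal_width_bins(ages, 3)
```

With three bins over the range 18 to 63, each bin is 15 years wide, so a continuous age becomes one of three categories.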
Data Correction - There is always some bad input
Normalize the data to be able to have:
Example:
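One common form of normalization is min-max scaling, which maps every value into the range 0 to 1 so that variables measured on different scales become comparable. The sketch below assumes that technique; the text does not say which normalization is intended, and the function name and sample incomes are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20000, 35000, 50000, 80000]
scaled = min_max_normalize(incomes)
```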
Outlier suppression is required so that extreme values do not skew the result.
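As a sketch of how outliers might be suppressed, the example below drops values that fall outside 1.5 times the interquartile range (IQR) from the quartiles. This rule is a widely used heuristic chosen here as an assumption, not a method prescribed by the text; the function name and sample data are hypothetical.

```python
import statistics


def drop_iqr_outliers(values, k=1.5):
    """Keep only the values within [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


readings = [10, 12, 11, 13, 12, 11, 95]
cleaned = drop_iqr_outliers(readings)
```

The single extreme reading is removed while the original order of the remaining values is preserved. Clipping values to the bounds instead of dropping them is an alternative when every case must be kept.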