Data for mining must exist within a single table or view. The information for each case (record) must be stored in a separate row.
Proper preparation of the data is a key factor in any data mining project.
Dimensioned data (for example, star schemas) are supported through nested table transformations.
The data must be properly cleansed to eliminate inconsistencies and support the needs of the mining application.
Additionally, most algorithms require some form of data transformation, such as binning or normalization.
DBMS_DATA_MINING_TRANSFORM is a flexible data transformation package that includes a variety of missing value and outlier treatments, as well as binning and normalization capabilities.
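As a rough illustration of what normalization and binning do to a numeric column, the following is a minimal sketch in plain Python. It does not use or reproduce the DBMS_DATA_MINING_TRANSFORM API; the function names, the sample `ages` column, and the choice of min-max scaling and equal-width bins are assumptions made only for this example.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bin(values: List[float], n_bins: int) -> List[int]:
    """Assign each value a bin number from 1 to n_bins using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width) + 1, n_bins) for v in values]


ages = [18.0, 25.0, 40.0, 62.0, 78.0]  # hypothetical sample column
normalized = min_max_normalize(ages)
binned = equal_width_bin(ages, n_bins=3)
```

Normalization keeps the relative spacing of the values but puts them on a common scale, while binning replaces each value with the number of the interval it falls in. Which of the two (if either) a given algorithm needs depends on the algorithm.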
The data mining development process may require several data sets.
A data set may be:
- needed for building (training) the model;
- used for scoring;
- used for testing.
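One common way to obtain separate build and test sets is to split a single prepared set of cases at random. The sketch below shows that idea in plain Python under stated assumptions: the `split_build_test` helper, the 30% test fraction, the fixed seed, and the `case_id` and `age` fields are all hypothetical and chosen only for illustration. A scoring set would typically be new data that the model has not seen.

```python
import random
from typing import Dict, List, Tuple

Case = Dict[str, object]


def split_build_test(
    cases: List[Case], test_fraction: float = 0.3, seed: int = 42
) -> Tuple[List[Case], List[Case]]:
    """Shuffle a copy of the cases and split it into (build, test) sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(cases)  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical cases: one dictionary per case, mirroring one row per case.
cases = [{"case_id": i, "age": 20 + i} for i in range(10)]
build_set, test_set = split_build_test(cases)
```

Because the shuffle is seeded, the same cases land in the same set on every run, which makes test results comparable across model builds.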