Data Mining - Data (Preparation | Wrangling | Munging)

About

Data preparation is the step that readies your data for further analysis.

It's a key factor in any data project.

Data preparation consists of several steps, which are explained below.

Output

Data Mining

For data mining, the data must reside in a single table or view in which each case (record) is a row.

If you want to mine data stored in a star schema, you still need to build a single table with one record per case and the variables of interest as columns.
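As a minimal sketch of this flattening step, the Python below joins a dimension table with a fact table and aggregates the facts so that each case becomes exactly one row. The tables, the column names and the `build_case_table` helper are hypothetical and chosen only for illustration; a real project would more likely do this in SQL or with a data frame library.

```python
# Hypothetical star schema: one dimension table and one fact table.
customers = {  # dimension: customer_id -> attributes
    1: {"age": 34, "region": "north"},
    2: {"age": 51, "region": "south"},
}
orders = [  # fact: one row per order, many rows per customer
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 12.5},
]


def build_case_table(customers, orders):
    """Return one flat record per customer (the case) with aggregated order facts."""
    totals = {cid: {"order_count": 0, "total_amount": 0.0} for cid in customers}
    for order in orders:
        agg = totals[order["customer_id"]]
        agg["order_count"] += 1
        agg["total_amount"] += order["amount"]
    # Merge the dimension attributes with the aggregated facts, one row per case.
    return [
        {"customer_id": cid, **attrs, **totals[cid]}
        for cid, attrs in sorted(customers.items())
    ]


case_table = build_case_table(customers, orders)
for row in case_table:
    print(row)
```

Aggregating the fact rows (here a count and a sum) is one design choice among several; which aggregates become variables depends on the question being mined.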

The data mining development process may require several data sets.

A data set may be:

Type Preparation / Data Transformation

Data transformations may be required by algorithms.
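One common case, given here as an assumption rather than something this page specifies, is an algorithm that accepts only numeric input, so categorical values have to be encoded first. The sketch below shows a simple one-hot encoding; the `one_hot` helper and the sample values are hypothetical.

```python
def one_hot(values):
    """Encode categorical values as 0/1 indicator columns.

    Returns the sorted list of categories (the column order) and one
    indicator row per input value.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
for row in encoded:
    print(row)
```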

Data Cleansing

The data must be properly cleansed to eliminate inconsistencies and support the needs of the mining application.
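To illustrate what eliminating inconsistencies can look like, the sketch below trims whitespace, maps spelling variants of the same value to one canonical code, and drops exact duplicates. The records, the `COUNTRY_ALIASES` mapping and the `cleanse` helper are made up for this example.

```python
# Hypothetical raw input with inconsistent spellings and a duplicate row.
raw_records = [
    {"id": 1, "country": " USA "},
    {"id": 2, "country": "usa"},
    {"id": 2, "country": "usa"},      # exact duplicate
    {"id": 3, "country": "U.S.A."},
    {"id": 4, "country": "France"},
]

# Hypothetical lookup from normalized spelling to a canonical code.
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "france": "FR"}


def cleanse(records):
    """Standardize the country value and drop duplicate (id, country) rows."""
    seen = set()
    cleaned = []
    for record in records:
        stripped = record["country"].strip()
        # Unknown spellings are kept (stripped) rather than silently discarded.
        country = COUNTRY_ALIASES.get(stripped.lower(), stripped)
        key = (record["id"], country)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": record["id"], "country": country})
    return cleaned


clean_records = cleanse(raw_records)
for record in clean_records:
    print(record)
```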

Data Discretization

Binning (discretization) puts continuous values into a small number of bins.
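A minimal sketch of one binning strategy, equal-width binning, is shown below. The choice of equal-width bins, the `equal_width_bins` helper and the sample ages are assumptions for illustration; equal-frequency or domain-specific bin edges are equally valid alternatives.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls in the first bin.
        return [0 for _ in values]
    # The maximum value would compute to index n_bins, so clamp it to the last bin.
    return [min(int((value - lo) / width), n_bins - 1) for value in values]


ages = [18, 22, 35, 41, 58, 67]
bins = equal_width_bins(ages, 3)
print(bins)  # [0, 0, 1, 1, 2, 2]
```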

Data Correction

There is always some bad input that has to be corrected.

Data Normalization

Normalize the data to be able to have:

Example:
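The following sketch shows two widely used normalization schemes, min-max scaling and z-score standardization, both of which put variables on a comparable scale. It is an illustrative example under assumed data, not necessarily the example this page originally had in mind; the helper names and the income values are hypothetical.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread, nothing to rescale
    return [(value - lo) / (hi - lo) for value in values]


def z_score(values):
    """Center values on mean 0 with (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


incomes = [20000, 40000, 60000, 100000]
print(min_max(incomes))  # [0.0, 0.25, 0.5, 1.0]
print(z_score(incomes))
```

Min-max scaling is sensitive to extreme values because the minimum and maximum define the range, whereas z-scores keep the relative distances but are not bounded.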

Outlier suppression

Outlier suppression is required so that extreme values do not skew the result.
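One possible way to suppress outliers, sketched below, is to drop values that fall outside the interquartile-range (IQR) fences. The 1.5 multiplier is the conventional default, and the `suppress_outliers` helper and the sample readings are assumptions for illustration; clipping values to the fences instead of removing them is a common alternative.

```python
from statistics import mean, quantiles


def suppress_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if low <= value <= high]


readings = [10, 12, 11, 13, 12, 11, 95]
kept = suppress_outliers(readings)
print(kept)  # [10, 12, 11, 13, 12, 11]

# The single extreme value pulls the mean far away from the typical reading.
print(mean(readings), mean(kept))
```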

Tools
