data_mining:data_preparation

About

Data preparation is the step that gets your data ready for further analysis.

It's a key factor in any data project:

Data preparation involves several steps, which are explained below.

If you want to mine data stored in a star schema, you still need to build a single table with one record per case and the variables of interest.
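As an illustration, flattening a star schema into a case table can be sketched with an in-memory SQLite database. The tables `customer` and `sales`, their columns, and the choice of the customer as the case are all hypothetical and only serve the example; a real schema will have more dimensions and different aggregates.

```python
import sqlite3

# Hypothetical star schema: one dimension (customer) and one fact table (sales).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, age INTEGER, region TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 34, 'north'), (2, 51, 'south'), (3, 27, 'north');
    INSERT INTO sales VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 100.0);
""")

# One row per case (customer): dimension attributes plus aggregated facts.
# LEFT JOIN keeps customers without sales; COALESCE turns their NULL total into 0.
case_rows = conn.execute("""
    SELECT c.customer_id,
           c.age,
           c.region,
           COUNT(s.sale_id)             AS n_sales,
           COALESCE(SUM(s.amount), 0.0) AS total_amount
    FROM customer AS c
    LEFT JOIN sales AS s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id, c.age, c.region
    ORDER BY c.customer_id
""").fetchall()

for row in case_rows:
    print(row)
```

The facts are aggregated rather than joined row by row, so each customer appears exactly once, which is the shape most mining algorithms expect.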

The data mining development process may require several data sets.

A data set may be:

Data transformations may be required by algorithms.
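One common case, given here only as an example, is an algorithm that accepts numeric input only, so categorical values have to be turned into indicator columns (one-hot encoding). The sketch below is a minimal version of that idea; the helper name `one_hot` and the sample values are assumptions made for illustration.

```python
def one_hot(values):
    """Return (categories, rows); each row is a list of 0/1 indicators."""
    categories = sorted(set(values))          # stable column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                 # mark the column of this category
        rows.append(row)
    return categories, rows


colors = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colors)
print(categories)
for row in encoded:
    print(row)
```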

The data must be properly cleansed to eliminate inconsistencies and support the needs of the mining application.

Data Correction - There is always some bad input

Normalize the data to be able to have:

Example:

Outlier suppression is required so that extreme values do not skew the result.

Google Cloud Dataprep is an intelligent, fully managed cloud service (built in collaboration with Trifacta) for visually exploring, cleaning, and preparing structured and unstructured data for analysis or for training machine-learning models.
In Oracle, DBMS_DATA_MINING_TRANSFORM is a data transformation package that includes a variety of missing value and outlier treatments, as well as binning and normalization capabilities.