Data Quality - Data Correction


About

Data correction is the second step of a data cleansing process. It follows the detection of values that do not meet the business rules (data rules).

For each data value that is not accepted, you must choose one of the following actions:

  • Ignore: The data rule is ignored and, therefore, no values are rejected based on this data rule.
  • Report: The data rule is run only after the data has been loaded, and for reporting purposes only. It works like the Ignore option, except that a report is created listing the values that did not adhere to the data rule.
  • Cleanse: The values rejected by this data rule are moved to an error table where cleansing strategies are applied. When you choose this option, you must specify a cleansing strategy (correction rule). See the following section for details about cleansing strategies.
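
The three actions can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the names `Action` and `apply_rule`, and the representation of a data rule as a Python predicate, are hypothetical and do not come from any particular data quality tool.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical names for the three actions described above."""
    IGNORE = "ignore"
    REPORT = "report"
    CLEANSE = "cleanse"


def apply_rule(rows, rule, action):
    """Return (loaded, reported, errors) for the chosen action.

    `rule` is a predicate that returns True when a row satisfies the data rule.
    """
    rows = list(rows)
    if action is Action.IGNORE:
        # The rule is not evaluated: every row is loaded, nothing is rejected.
        return rows, [], []

    violations = [row for row in rows if not rule(row)]
    if action is Action.REPORT:
        # Every row is still loaded; violations are only listed in a report.
        return rows, violations, []

    # CLEANSE: violations are moved to an error table, where a cleansing
    # strategy is applied later.
    accepted = [row for row in rows if rule(row)]
    return accepted, [], violations


rows = [{"age": 30}, {"age": -5}, {"age": 200}]
age_rule = lambda row: 0 <= row["age"] <= 120

loaded, reported, errors = apply_rule(rows, age_rule, Action.CLEANSE)
# loaded holds the single valid row; errors holds the two rejected rows.
```

Only the Cleanse action removes rows from the load; Report keeps them and records the violations separately.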

Cleansing Strategy (Data Cleansing Rule)

A cleansing or correction rule identifies a violation of some expectation and a way to modify the data so that it meets the business needs. For example, while there are many ways that people provide telephone numbers, an application may require that each telephone number be separated into its area code, exchange, and line components. This is a cleansing rule, as shown in the figure below, which can be implemented and tracked using data cleansing tools.

Figure: Data Quality Correction Rule
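
The telephone number example can be sketched as follows. The function name `split_phone_number` is hypothetical, and the sketch assumes ten-digit North American numbers (three-digit area code, three-digit exchange, four-digit line) with an optional leading country code 1; other numbering plans would need a different rule.

```python
import re


def split_phone_number(raw):
    """Split a free-form phone number into area code, exchange and line.

    Returns None when the value cannot be corrected, so that the caller
    can keep the record in the error table.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop the assumed country code
    if len(digits) != 10:
        return None
    return {
        "area_code": digits[:3],
        "exchange": digits[3:6],
        "line": digits[6:],
    }


for raw in ["(555) 123-4567", "555.123.4567", "+1 555 123 4567"]:
    print(raw, "->", split_phone_number(raw))
```

All three input spellings yield the same three components, which is the point of the rule: many accepted input forms, one stored form.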

Cleansing Strategy   Description
Remove               Does not populate the target table with error records.
Custom               Applies a custom function in the target table.
Set to Min           Sets the attribute value of the error record to the minimum value defined in the data rule.
Set to Max           Sets the attribute value of the error record to the maximum value defined in the data rule.
Similarity           Uses a similarity algorithm based on permitted domain values to find a value that is similar to the error record.
Soundex              Uses a soundex algorithm based on permitted domain values to find a value that is similar to the error record.
Merge                Merges duplicate records into a single row.
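
Some of the strategies listed above can be sketched in a few lines. This is a minimal illustration, not the implementation of any specific tool: the function names are hypothetical, the similarity strategy is approximated with the standard library's `difflib.get_close_matches`, and `soundex` is a hand-written version of the classic American Soundex code.

```python
import difflib


def set_to_min(value, minimum, maximum):
    """Set to Min: replace the rejected value with the rule's minimum."""
    return minimum


def set_to_max(value, minimum, maximum):
    """Set to Max: replace the rejected value with the rule's maximum."""
    return maximum


def correct_by_similarity(value, domain, cutoff=0.6):
    """Similarity: return the closest permitted value, or None if none is close."""
    matches = difflib.get_close_matches(value, domain, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Letter-to-digit table of the American Soundex code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the four-character Soundex code of a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # vowels reset the previous code
    return (result + "000")[:4]


def correct_by_soundex(value, domain):
    """Soundex: return the first permitted value that sounds like `value`."""
    target = soundex(value)
    for candidate in domain:
        if soundex(candidate) == target:
            return candidate
    return None


print(correct_by_similarity("Frence", ["France", "Germany", "Spain"]))
print(soundex("Robert"), soundex("Rupert"))
print(correct_by_soundex("Smyth", ["Jones", "Smith"]))
```

Both lookup strategies return `None` when no permitted value is close enough, which leaves the record in the error table instead of forcing a wrong correction.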




