About
Entity resolution, also called record linkage, is the process of joining (matching) records from one data source with records from another that describe the same entity.
Also known as:
- entity disambiguation or entity linking
- duplicate detection or deduplication
- record matching
- (reference) reconciliation
- object identification
- conflation
Entity resolution (ER) is the task of finding records that refer to the same entity across different data sources (for example, when the sources share no common identifier).
A data set that has undergone ER may be referred to as being cross-linked.
Entity resolution is a data cleaning and data integration problem.
Example
- Entity resolution across two data sets of commercial products.
Approach
- A simple approach to entity resolution is to treat all records as strings and compute their similarity with a string distance function.
- See also: Device fingerprint
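The simple string-based approach above can be sketched as follows. This is a minimal illustration, not a reference implementation: it uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as a stand-in for a string distance function, and the function names (`normalize`, `similarity`, `best_match`), the threshold of 0.7, and the product records are all assumptions made up for the example.

```python
from difflib import SequenceMatcher


def normalize(record: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(record.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(record, candidates, threshold=0.7):
    """Return (candidate, score) for the most similar candidate,
    or None when no candidate reaches the threshold."""
    scored = [(c, similarity(record, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best


# Hypothetical product records from two sources (illustrative data only).
source_a = [
    "Apple iPhone 12 64GB black",
    "Sony WH-1000XM4 headphones",
    "Canon EOS 90D camera",
]
source_b = [
    "apple iphone 12 (64 gb) black",
    "sony wh1000xm4 wireless headphones",
    "Dell XPS 13 laptop",
]

for record in source_a:
    match = best_match(record, source_b)
    if match is None:
        print(f"{record!r}: no match")
    else:
        candidate, score = match
        print(f"{record!r} -> {candidate!r} (score {score:.2f})")
```

Comparing every record with every candidate costs a number of comparisons proportional to the product of the two data set sizes, so this sketch only suits small inputs. The threshold trades false matches against missed matches and would need tuning on real data.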
Library
- https://github.com/dedupeio/dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication, and entity resolution. Dedupe is based on Mikhail Yuryevich Bilenko's Ph.D. dissertation, "Learnable Similarity Functions and their Application to Record Linkage and Clustering". See also: similarity.