Mistaken Identity Resolution Part V: Identity Resolution vs. Data Quality
By Robert Barker, Infoglide Senior Vice President & Chief Marketing Officer
In this series of posts on Mistaken Identity Resolution we have compared identity resolution with other market spaces that it’s sometimes confused with, such as Master Data Management (MDM), data integration, and data warehousing. With Informatica’s recent acquisition of Identity Systems, now’s a good time to address the confusion between identity resolution and data quality.
A Gartner study done several years ago estimated that poor-quality customer data costs U.S. businesses $611 billion a year [see correction]. So data quality is clearly a very important component of data management.
Data quality is defined by Whatis.com as “the reliability and effectiveness of data… maintaining data quality requires going through the data periodically and scrubbing it. Typically this involves updating it, standardizing it, and de-duplicating records to create a single view of the data, even if it is stored in multiple disparate systems.”
Identity resolution is defined by Wikipedia as the process that “analyzes all of the information relating to individuals and/or entities from multiple sources of data, and then applies likelihood and probability scoring to determine which identities are a match and what, if any, non-obvious relationships exist between those identities.”
So while both data quality and identity resolution seek to create a unified view of the data and determine which entities are the same, identity resolution takes the process further by also determining which entities are related.
While both technologies can de-duplicate data records, identity resolution adds powerful matching functions to data quality, so that numerous patterns that would otherwise go undetected are uncovered quickly and accurately. These patterns include multi-cultural name and address variations, character insertions and deletions, nicknames, abbreviations, transpositions, and repetitions. Perhaps the most valuable reason for incorporating identity resolution into data quality is the ability to automate the decision process and integrate it into existing business processes. Specific rules can be applied to the intelligence gathered from search, relationship, and identity results, and an explicit action can then be executed based on business requirements.
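To make the matching and decision ideas above a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any particular identity resolution product works: the nickname table, the function names, and the score thresholds are all hypothetical. It expands nicknames, measures an edit distance that tolerates insertions, deletions, and adjacent transpositions, and then applies a simple rule to choose an action.

```python
# Hypothetical nickname table; a real system would use a far larger one.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and expand known nicknames."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return [NICKNAMES.get(token, token) for token in cleaned.split()]


def osa_distance(a, b):
    """Optimal string alignment distance: counts insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or exact match
            )
            # Adjacent transposition, e.g. "ra" typed as "ar".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(name_a, name_b):
    """Score from 0.0 to 1.0 after normalization; 1.0 means identical."""
    a = " ".join(normalize(name_a))
    b = " ".join(normalize(name_b))
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def decide(score, match_at=0.85, review_at=0.6):
    """Assumed business rule: map a score to an explicit action."""
    if score >= match_at:
        return "merge"
    if score >= review_at:
        return "review"
    return "keep_separate"


print(decide(similarity("Bob Barker", "Robert Barker")))
print(decide(similarity("Robert Braker", "Robert Barker")))
```

In this sketch the thresholds stand in for the business requirements mentioned above; in practice they would be tuned against real data, and the matching would cover addresses and other attributes as well as names.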
Perhaps instead of asking how data quality and identity resolution differ, the better question to ask is, “Are you risking poor data quality if identity resolution is absent?”
