By Robert Barker, Infoglide Senior Vice President & Chief Marketing Officer
Our recent post contrasting identity resolution with data matching drew a comment that I can’t let go unanswered. Here’s what the respondent said:
“Interesting. So what Identity Resolution consists of is a bunch of data standardization tables and a matching tool? Seems like a name equivalence table and a color equivalence table and any of the off the shelf matching tools would solve your problem. That is a pretty trivial solution to a pretty complicated problem. Thanks for [sic] you insight.”
Wow, talk about missing the point! Data matching products are clearly useful for certain types of problems, e.g. cleansing data before insertion into a data warehouse. What we stated is that whole classes of problems demand a different approach, and the current tendency to re-brand data quality products as “identity resolution” is misleading. Here’s why.
First, identity resolution is far more than a name equivalence table and a color equivalence table. There is no such thing as a passport equivalence table. The technology has to “understand” the standard formats for passports from many different countries and also understand the possible ways those passport numbers may be manipulated. Also, an equivalence table and COTS matching tools wouldn’t be able to determine that two homes are right next to each other even though they have totally different street names.
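To make the neighboring-homes point concrete, here is a minimal sketch in Python. It is not how our engine works internally; it only illustrates why string comparison alone falls short. The two addresses and their coordinates are made up for illustration, the 50-meter threshold is an arbitrary assumption, and `haversine_m` is a hypothetical helper that implements the standard haversine great-circle formula.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Hypothetical, already-geocoded addresses with unrelated street names.
home_a = {"street": "12 Oak Lane", "lat": 30.26720, "lon": -97.74310}
home_b = {"street": "98 Birchwood Court", "lat": 30.26725, "lon": -97.74318}

# What a generic string matcher sees: two different street names.
name_similarity = SequenceMatcher(None, home_a["street"], home_b["street"]).ratio()

# What a location-aware comparison sees: two points a few meters apart.
distance_m = haversine_m(home_a["lat"], home_a["lon"], home_b["lat"], home_b["lon"])
are_neighbors = distance_m < 50  # assumed threshold, for illustration only

print(f"street-name similarity: {name_similarity:.2f}")
print(f"distance: {distance_m:.1f} m, neighbors: {are_neighbors}")
```

No equivalence table connects these two records, because nothing in the text of the street names relates them. The link only appears once the comparison understands what an address is.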
Additionally, a robust identity resolution technology needs to be able to search and analyze free text and compare different elements in an unstructured blob of text to find similarities. Those are just a few examples of the types of data comparisons that can be accomplished with identity resolution. Our Identity Resolution Engine™, for example, uses over 50 domain-specific Similarity Search algorithms, each with its own intellectual property, to compare many different types of attributes.
Second, data matching tools typically reduce the amount of available data by combining “like” entities. The goal is “de-duping” and standardization of the data. Typical responses are simply “yes it’s a match” or “no it’s not a match.” While fine for basic MDM and data warehousing efforts, it’s not so great for mission-critical applications, or if you’re trying to retain the data for future analysis. Losing data is not an option – the diversity of the data contains valuable forensic information about how (id)entities are matched or linked as relationships.
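The difference between a yes/no answer and a retained, scored link can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our product's method: the `Link` class, the `compare` function, and the sample records are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a real domain-specific similarity algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Link:
    """A scored relationship between two records; neither record is altered."""
    left_id: str
    right_id: str
    score: float
    evidence: dict  # per-field similarity, kept for later forensic review


def compare(left, right, fields):
    """Score two records field by field instead of collapsing them into one."""
    evidence = {
        f: round(SequenceMatcher(None, left[f].lower(), right[f].lower()).ratio(), 2)
        for f in fields
    }
    score = round(sum(evidence.values()) / len(evidence), 2)
    return Link(left["id"], right["id"], score, evidence)


# Hypothetical records: similar, but not identical.
rec_a = {"id": "A1", "name": "Jon Smith", "city": "Austin"}
rec_b = {"id": "B7", "name": "John Smyth", "city": "Austin"}

link = compare(rec_a, rec_b, ["name", "city"])
print(link)
```

Both source records survive untouched, and the link carries a score plus the per-field evidence behind it. A de-duping tool would have merged the two rows and discarded the spelling variation that an investigator might later care about.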
With ten years of R&D, we’ve perfected the combination of lexigraphic algorithms with over 50 domain-specific algorithms to deliver a high degree of precision which precludes false positives – something that can’t be approached using a single generic equation. For example, DHS’s Secure Flight program required true identity resolution, and that’s why we won that business over hundreds of others.
And finally, a comprehensive identity resolution technology, in addition to data matching, should have the ability to:
• Uncover non-obvious relationships between seemingly disparate identities/entities
• Apply rules and decisioning based on the specific industry, application, and organization
• Integrate that knowledge back into existing business applications
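As a rough illustration of the first item in that list, the Python sketch below links records that share an attribute value even though their names have nothing in common. It is a simplified assumption of how such a link might be surfaced, not a description of our engine: `shared_attribute_links`, the field names, and the sample records are all hypothetical, and it uses exact value equality where a real system would use similarity.

```python
from collections import defaultdict
from itertools import combinations


def shared_attribute_links(records, fields):
    """Return {(id_a, id_b): {field, ...}} for records sharing a value in any field."""
    # Index every (field, value) pair to the set of record ids that carry it.
    index = defaultdict(set)
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value:
                index[(field, value)].add(rec["id"])

    # Any two ids under the same (field, value) key are linked by that field.
    links = defaultdict(set)
    for (field, _value), ids in index.items():
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].add(field)
    return dict(links)


# Hypothetical records with unrelated names.
records = [
    {"id": "A", "name": "Maria Lopez", "phone": "555-0100", "address": "12 Oak Lane"},
    {"id": "B", "name": "Peter Chen", "phone": "555-0100", "address": "7 Elm Street"},
    {"id": "C", "name": "Dana White", "phone": "555-0199", "address": "7 Elm Street"},
]

links = shared_attribute_links(records, ["phone", "address"])
for pair, shared in sorted(links.items()):
    print(pair, sorted(shared))
```

Records A and C share nothing directly, yet both connect to B: one through a phone number, the other through an address. Name matching alone would never surface that chain.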
If you’re seriously interested in learning how identity resolution differs from data matching, we’ve written extensively about the subject. Check out our posts from early December, from a week later, and from right before the holidays.
If you’re not serious about identity resolution, then I’m not sure why you’re reading this! If you ARE serious, we’d like to hear your thoughts.