
The Other Half of Entity Resolution

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

In a recent post, Jonathan McDonald quotes one definition of entity resolution:

According to Gartner, entity resolution is “the capability to resolve multiple labels for individuals, products or other noun classes of data into a single resolved entity when pseudonyms, alias names or other synonym-style constructs exist. This is especially true in cases wherein there exists intentional falsification of information or the creation of false identities. While most prevalent in detecting perpetrators of criminal or illegal activity, more-commercial applications exist as well.”

While the definition nicely captures the value of “first-degree” entity resolution, it falls short by omitting non-obvious relationship detection.

Basic entity resolution determines “who’s who” by sifting through massive amounts of noun/attribute data in multiple disparate data sources. Cutting through ambiguity caused by missing attributes, pseudonyms, aliases, and obvious efforts to deceive, it mines and resolves the essential elements of identity to form an unambiguous picture that greatly enhances business decisions and reduces risk.
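To make the “who’s who” step concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of any vendor’s matching engine: the record fields, the `same_entity` and `resolve` helpers, and the 0.8 threshold are all hypothetical, and the standard library’s `difflib.SequenceMatcher` stands in for whatever similarity measures a production system would use. The idea it shows is that records are compared only on the attributes both actually carry, so a missing attribute does not block a match.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(rec_a, rec_b, threshold=0.8):
    """Average the similarity over attributes that are present in both records.

    The threshold is an illustrative assumption, not a recommended value.
    """
    shared = [k for k in rec_a if k in rec_b and rec_a[k] and rec_b[k]]
    if not shared:
        return False
    score = sum(similarity(rec_a[k], rec_b[k]) for k in shared) / len(shared)
    return score >= threshold


def resolve(records, threshold=0.8):
    """Greedy clustering: a record joins the first cluster whose first record it matches."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_entity(rec, cluster[0], threshold):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


# Hypothetical records: the second is an alias of the first with no address on file.
records = [
    {"name": "Jonathan Smith", "phone": "512-555-0100", "address": "12 Oak St"},
    {"name": "Jon Smith", "phone": "(512) 555-0100", "address": ""},
    {"name": "Maria Lopez", "phone": "512-555-0199", "address": "98 Elm Ave"},
]
clusters = resolve(records)
```

With these sample records, the two Smith records fall into one cluster and the Lopez record into another. A real system would weight attributes differently and handle far larger volumes, but the shape of the problem is the same.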

However, in many application domains, pinpointing “who knows whom” is equally valuable. In detecting insider trading, for example, it’s important to resolve identity information to achieve an unambiguous picture of a person of interest, but to expose fraudulent activity, it’s critical to identify second- and third-degree linkages between suspects and their friends, relatives, and business associates.

More examples abound. In insurance, fraudsters change roles each time they stage a car accident and also intentionally modify their identities in accident reports. Fraudulent employers who want to reduce their workers’ compensation premiums will close their company and start a new one with modified identities of corporate officers. In retail, non-receipted returns of merchandise are often linked to store employees and the customers they enlist to act as their confederates. The list goes on and on.

In each case, entity resolution finds hidden connections by evaluating multiple ambiguous attributes with the same algorithms used to resolve identities. A retail employee who takes a customer’s winning lottery ticket (while telling the customer he didn’t win!) can be traced through address and phone information to other suspiciously connected people, e.g. frequent lottery winners and lottery commission employees.
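The lottery example can be sketched as a small graph problem. The Python below is a simplified illustration, assuming entities have already been resolved: the entity names, the `build_links` and `degrees_of_separation` helpers, and the sample addresses and phone numbers are all hypothetical. For brevity it links entities on exact attribute values, whereas the approach described above would apply the same fuzzy matching used for identity resolution.

```python
from collections import defaultdict, deque


def build_links(entities, link_keys=("address", "phone")):
    """Connect two entities whenever they share an exact value for any link key."""
    by_value = defaultdict(set)
    for name, attrs in entities.items():
        for key in link_keys:
            value = attrs.get(key)
            if value:
                by_value[(key, value)].add(name)
    graph = defaultdict(set)
    for group in by_value.values():
        for member in group:
            graph[member] |= group - {member}
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search; returns the number of hops, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical resolved entities: the clerk shares an address with a frequent
# winner, who in turn shares a phone number with a lottery commission employee.
entities = {
    "clerk": {"address": "5 Pine Rd", "phone": "555-0101"},
    "winner": {"address": "5 Pine Rd", "phone": "555-0202"},
    "official": {"address": "9 Lake Dr", "phone": "555-0202"},
    "stranger": {"address": "1 Hill Ct", "phone": "555-0303"},
}
graph = build_links(entities)
hops = degrees_of_separation(graph, "clerk", "official")
```

Here the clerk reaches the commission employee in two hops, through the frequent winner, while the unrelated entity is not reachable at all. Breadth-first search is used because it finds the shortest chain of links first, which is what a "degrees of separation" count means.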

With apologies to the experts at Gartner, here’s a suggested addition to the definition that acknowledges the other half of entity resolution:

The capability to (a) resolve multiple labels for individuals, products or other noun classes of data into a single resolved entity when ambiguity from pseudonyms, alias names or other synonym-style constructs exists, and (b) expose hidden connections between entities that are two or more degrees of separation apart. This is especially true in cases where there exists intentional falsification of information or the creation of false identities. While most prevalent in detecting perpetrators of criminal or illegal activity, more-commercial applications exist as well.

2 Responses to “The Other Half of Entity Resolution”

  1. Jim Zaiss Says:

    I suggest two further tweaks to the definition of entity resolution — one to part (a) and one to part (b). Part (a) would be better stated as:

    The capability to (a) resolve multiple labels for individuals, products, or
    other _types_of_objects_ into a single resolved entity when ambiguity
    from pseudonyms, alias names or other synonym-style constructs exists

    The “multiple labels” in question are not typically labels for nouns or noun classes. They ARE nouns, and they are labels for (i.e. they denote) objects in the world. The original wording of (a) suffers from a use-mention confusion.

    Regarding part (b) of the definition, I think it’s misleading to describe the connections of interest here as “between entities that are _two_ or more degrees of separation apart.” I am separated by two degrees from friends of my friends; I am separated by one degree from my friends; and I am separated by *zero* degrees from myself. Since (b) is explicitly about connections between _entities_ (as opposed to, say, labels for entities), I would replace the word ‘two’ in part (b) with ‘one’.

    Regards,
    Jim Zaiss
    AWARE Software, Inc.

  2. John Talburt Says:

    Jim,
    I agree, but to quote William Munny in “Unforgiven”, I think there are 3 halves to ER. These ER activities are mining the entity references when they are embedded in unstructured data, resolving the references to the same entity (0 degrees of separation), and exploring the relationships among resolved entities. It is true that people often confuse the entity references (labels) with the entities themselves. Breaking ER down into these 3 activities is nice for discussion, but in practice they tend to overlap.
    Thanks,
    -jrt-
