Unobtrusive Measures and Identity Resolution

April 1st, 2010

By Mike Betron, Infoglide Director of Marketing

For decades, researchers in the social sciences have used “unobtrusive measures” as defined originally in a 1966 book by Webb, Campbell, Schwartz, and Sechrest. The idea is to collect and analyze data without disturbing the subjects of the study. For example, instead of surveying subjects to find out how many candy bars they eat each day, the subjects’ garbage is searched and the number of candy wrappers is tallied.

Social science researchers are driven to unobtrusive measures when they encounter or anticipate either intentional or unintentional bias in their subjects’ responses. For example, in the study above, a subject may be inclined to understate the number of candy bars consumed (either intentionally or unintentionally) to make a better impression. In the case of fraud, bad actors have an even stronger motivation to purposely bias any information because they don’t want to get caught.

Data analysis using unobtrusive measures can be extremely effective for discovering fraud and risk because bad actors often provide different versions of identifying data to avoid detection. For example, suppose you’re responsible for the placement of foster children in safe homes. A key requirement is to avoid placing a child in a home where a registered sex offender lives. However, what if no sex offender has the same official address as the foster home candidate, but one does in fact live in the foster home or has a relationship with the foster home owner? How can identity resolution be used to alert you to that fact?

By using sophisticated algorithms to measure and score similarity between data fields, non-obvious relationship analytics (NORAn) helps users discover relationships between people that would otherwise go undetected. In our foster home example, NORAn could be applied to uncover the fact that a resident of the candidate foster home shares a sequential phone number with another person who shares the same address as a registered sex offender. The information highlighted in red shows the matches, including partial ones (all information is fictitious).


When we see these results, it’s not hard to speculate that Sally may be Jane’s mother or daughter and that John is a boyfriend who is living with her. If Sally and John visit Jane, there could be significant risk to any foster child living with Jane. Although we can’t determine that there is a relationship here with 100% certainty, the statistical probability of a potential link to a known sex offender is high enough to warrant further investigation.
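As a rough illustration of what "scoring similarity between data fields" can look like, here is a minimal Python sketch. It is not NORAn or any Infoglide algorithm: the function names, the 0.8 score for sequential phone numbers, and the three records are all invented for this example, and the string comparison comes from the standard library's difflib.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def digits(phone: str) -> str:
    """Keep only the digits of a phone number."""
    return "".join(ch for ch in phone if ch.isdigit())


def phone_similarity(a: str, b: str) -> float:
    """Score 1.0 for identical numbers and 0.8 (an arbitrary, assumed
    weight) for numbers that differ by exactly one, else 0.0."""
    da, db = digits(a), digits(b)
    if not da or not db:
        return 0.0
    if da == db:
        return 1.0
    if len(da) == len(db) and abs(int(da) - int(db)) == 1:
        return 0.8
    return 0.0


def link_score(rec_a: dict, rec_b: dict) -> dict:
    """Score each field separately so a reviewer can see why two records link."""
    return {
        "address": text_similarity(rec_a["address"], rec_b["address"]),
        "phone": phone_similarity(rec_a["phone"], rec_b["phone"]),
    }


# Fictitious records, invented for this sketch.
resident = {"name": "Resident A", "address": "12 Elm Street", "phone": "512-555-0142"}
contact = {"name": "Contact B", "address": "98 Oak Avenue", "phone": "512-555-0143"}
offender = {"name": "Offender C", "address": "98 Oak Ave.", "phone": "512-555-0199"}

print("resident vs contact:", link_score(resident, contact))
print("contact vs offender:", link_score(contact, offender))
```

Neither pair matches on every field, but the sequential phone number links the first pair and the near-identical address links the second, which is the kind of two-step chain described above. A real system would weigh many more attributes and tune its thresholds against known cases.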

Another area of social services where this might be relevant would be when a social worker is making a home visit and needs to check the home to make sure that no one with a record of violent crime lives at the house. What if a violent offender has listed his official address as his mother’s house but spends about half his time at his girlfriend’s house? A social worker who is conducting a home visit at the girlfriend’s house would want to know that.

Fortunately, a major pizza delivery company sells its data, and that data turns out to be pretty accurate, because people who want their pizza delivered have to give the right address. By tapping into this data as well as other internally or externally available data, identity resolution technology can uncover the fact that the violent offender periodically uses that address as his residence.

Many health and human services organizations utilize data matching technology with varied levels of sophistication but, as evidenced by these examples, data matching alone is not enough. Social workers have to have the ability to understand and be made aware of non-obvious relationships as well.

Identity Resolution Daily Links 2010-03-29

March 29th, 2010

By the Infoglide Team

Forrester Blog: TIBCO jumps onto MDM M&A train with acquisition of data matching vendor Netrics

[Rob Karel] “Netrics seemed the most likely target for Oracle to replace Identity Systems with its small footprint and relatively low acquisition cost, but now with Netrics off the market, Oracle should consider other matching vendors such as S3 Matching Technologies, Syslore or identity resolution/matching vendor Infoglide Software.”

msnbc: What is TSA’s Secure Flight Program?

“Secure Flight launched in August, is currently in a phase-in stage, and is intended to be fully in place by November 2010 for all flights leaving from and/or arriving in the U.S. Essentially, the airlines and booking engines will collect your full name, gender and birth date when you book your flight and send that info to the TSA, which will then compare the information against the no-fly list. The name you give when you book must synch up with your full name as shown on the government-issued ID you use when checking in for your flight.”

Michael Power: Can Governments Force Patient Data into EHRs?

“As a brief and somewhat simplistic aside, ‘electronic health record’ is a term often incorrectly used to describe both EHRs and EMRs. There is a distinction between the two and it is an important one. Hospitals and physicians use EMRs. EMRs, along with other databases, are expected to feed into a longitudinal ‘virtual’ patient record which is to be accessible across providers and institutions and which is properly referred to as the EHR.”

Security Management: Terror Threat Tracking System Shares Thousands of Tips from Locals, FBI Says

“The eGuardian system is one of the core technological elements of the Information Sharing Environment (ISE) established by congressional mandate in response to the intelligence failures that preceded the 9-11 attacks. In a typical scenario, a law enforcement agency will either generate its own SAR or field one from the public.”

Identity Resolution Daily Links 2010-03-26

March 26th, 2010

[Post from Infoglide] Garbage In, Garbage Out? Not Necessarily.

“One of the oldest phrases in computer science seems to still be in vogue. ‘Garbage in, garbage out’ (GIGO) is a term coined during the early days of the computing industry. It pointed out that the value of computer systems of the day was entirely dependent upon their input data. No amount of processing power could produce a right answer from bad data. Fast forward many decades…”

Formtek: Technology: Data Consistency via Master Data Management

“The concept of MDM is a good one, and many companies have piloted MDM projects over the last few years.  Now research firm Baseline Consulting  says that many companies are beginning to move beyond their MDM pilot systems.  Baseline Consulting co-founder Jill Dyche said that ‘the fact that data quality, data governance, and data enrichment processes may accompany an MDM initiative make it all the more attractive as an enterprise solution.’”

HSToday: DHS Intelligence Needs More Oversight

“The success of the fusion center program,” said the report, “is dependent on the infrastructure that enables state and local fusion centers to have access to each other’s information as well as to the appropriate federal databases. The fusion center program and the Nationwide Suspicious Activity Report Initiative (NSI) rely on the concept of shared space architecture, where the fusion centers replicate data from their systems to an external server under their control, making the decision on what to share totally under their control.”

HealthITExchange: EHR implementation a foregone conclusion, ONC says

“No matter how the rules shake out, EHR implementation in the United States is a foregone conclusion, Blumenthal said. He sees the skills of collecting, using, searching and sharing health data electronically becoming part of the assumed professional skill set for health care providers, just as using a stethoscope is now. In the next five to 10 years, hospitals will use their robust EHR systems to recruit physicians; solo physicians who succeed in implementing EHR will sell their practices more easily when the time comes, but solo physicians still using paper will not be able to sell their practices at all.”

Garbage In, Garbage Out? Not Necessarily.

March 24th, 2010

By Douglas Wood, Infoglide Senior Vice President

One of the oldest phrases in computer science seems to still be in vogue. “Garbage in, garbage out” (GIGO) is a term coined during the early days of the computing industry. It pointed out that the value of computer systems of the day was entirely dependent upon their input data. No amount of processing power could produce a right answer from bad data.

Fast forward many decades. The same phrase is still used today to emphasize the importance of data quality in many application areas (e.g., healthcare). While high quality data remains important, two factors influence me to say that GIGO is not the absolute rule that it once was: (1) advancements in the evolution of software and hardware technology, and (2) the emergence of whole classes of new applications targeting fraud detection.

What happens when the quality of data is “enhanced”? Processes like data transformation, data cleansing, and de-duplication filter out information that is unnecessary and confusing. Names, addresses, and other attributes are standardized. Duplicate records are deleted. Links to “bad” data are broken. Master records, aka “golden records”, are created for use by multiple systems.

While this has great value for traditional systems, it can devastate fraud detection efforts. For example, discovering and evaluating multiple addresses during fraud analysis is crucial in finding and prosecuting perpetrators of fraud. Or conversely, standardizing multiple forms and instances of someone’s name held in multiple data sources may remove vital clues and break a forensic chain of evidence.  We sometimes refer to the result as data deterioration.
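To make the point concrete, the following Python sketch contrasts a "golden record" view with a view that keeps the raw variants. Everything in it is hypothetical: the abbreviation table, the function names, and the three claim records were made up for illustration and do not describe any particular cleansing product.

```python
import re
from collections import defaultdict

# A tiny, made-up abbreviation table; real cleansing tools use far larger ones.
ABBREVIATIONS = {
    "street": "st", "st.": "st",
    "apartment": "apt", "apt.": "apt",
}


def standardize(address: str) -> str:
    """Lower-case the address, drop commas, and normalize abbreviations."""
    tokens = re.sub(r",", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def golden_records(claims):
    """Traditional cleansing: keep only one standardized form per address."""
    return {standardize(address) for _, address in claims}


def forensic_index(claims):
    """Fraud-analysis view: standardize for grouping, but keep every raw
    variant together with the claim it came from."""
    index = defaultdict(list)
    for claim_id, address in claims:
        index[standardize(address)].append((claim_id, address))
    return dict(index)


# Fictitious claims, invented for this sketch.
claims = [
    ("claim-1", "14 Maple Street, Apt. 3"),
    ("claim-2", "14 Maple St Apartment 3"),
    ("claim-3", "14 MAPLE ST. APT 3"),
]

print(golden_records(claims))
for key, variants in forensic_index(claims).items():
    print(key, "->", variants)
```

The golden-record view collapses three differently written addresses into one, which suits a billing system. The forensic view reaches the same grouping while preserving the evidence that one address was written three different ways on three claims.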

So “garbage in, garbage out” is still an operative phrase for most software systems, but for entity resolution, we’ve found repeatedly that “one man’s garbage is another man’s treasure.”

Identity Resolution Daily Links 2010-03-23

March 23rd, 2010

By the Infoglide Team

Fraud Magazine: Suspicious Activity Reports

“Ultimately, the defendant admitted making numerous deposits of less than $10,000 each to avoid triggering bank filing of the Currency Transaction Reports (CTRs) required for all activity involving five figures or more. In a single month, he made nearly 30 such deposits at a number of banks, totaling more than $260,000. Later, in almost 20 transactions at various branches of a single bank, he deposited an additional $185,000. That bank promptly filed a SAR detailing how the mortgage broker had deposited into his personal and business accounts sums ranging from $9,000 to $9,800.”

GCN: State fusion centers look to expand beyond counterterrorism efforts

“The development of fusion centers has faced some significant challenges. First and foremost, the centers must overcome the practical challenge of integrating data. ‘Even in the same state, you can have 500 police departments using different software to manage their [computer-aided design] and intelligence needs,’ Serrao said. ‘And generally that data is saved in different formats.’”

Community of Experts: Future of MDM – Master Policy Management?

“However, as the thought processes for establishing a business case for MDM mature, we are starting to see where the desire for the unified view is not completely dependent on an instantiation of a single consolidated repository. Instead, in these situations the business needs are supported by the availability of master data services implementing consistent information policies across an extended enterprise… the consistent application of policies can be done both in the presence of a unified repository or as a federated collection of common repositories!”

Worker’s Compensation Law Center: IL: Chiropractor, Physician Among Three Defendants Indicted in Alleged $1 Million Health Care Fraud Scheme

“As part of the scheme, Minnis allegedly forged and caused others to forge physicians’ signatures on various documents falsely representing that services, treatment, physical therapy and/or testing had been provided, ordered or supervised by medical doctors. Minnis allegedly forged the doctors’ signatures, and caused them to sign reports without having done patient exams, knowing that Workers’ Comp would not accept a chiropractor’s opinions or reports as medical evidence to support patients’ claims.”

Identity Resolution Daily Links 2010-03-19

March 19th, 2010

[Post from Infoglide] Recession Driving Insurance Fraud

“A recent post on McClatchy’s blog attributes growing insurance fraud to the recession: A recent survey of 37 state insurance-fraud bureaus by the Coalition Against Insurance Fraud found that the recession “appears to have had a significant impact on the incidence of fraud” last year. On average, the bureaus reported increases in case referrals and new investigations in all 15 categories of fraud the survey covers.”

Liliendahl on Data Quality: What is Data Quality anyway?

“If we look at what data quality tools today actually do, they in fact mostly support you with automation of data profiling and data matching, which is probably only some of the data quality challenges you have.”

Voice of America: Murder of US Consulate Workers in Mexico Signals New Phase in Violence

“Scott Stewart, vice president of tactical intelligence for Austin, Texas-based analysis firm Stratfor, says the killings might have been related to a recently announced U.S. plan to increase cooperation with Mexican law enforcement agencies. ‘We believe that it is likely related to a decision last month to start working more closely with the Mexican government by the Americans,’ said Scott Stewart. ‘They were going to put some personnel into a joint fusion center in Juarez.’”

Coalition Against Insurance Fraud: False claims act for Maryland

“The Coalition issued a statement supporting the bill, saying it would serve as a deterrent and a powerful incentive for medical providers to have strong compliance programs and to “play by the rules.” False claims acts help detect fraudulent schemes that otherwise might not ever be known because they allow insiders to blow the whistle and initiate civil actions.”

Recession Driving Insurance Fraud

March 17th, 2010

By Infoglide Software CEO Mike Shultz

A recent post on McClatchy’s blog attributes growing insurance fraud to the recession:

A recent survey of 37 state insurance-fraud bureaus by the Coalition Against Insurance Fraud found that the recession “appears to have had a significant impact on the incidence of fraud” last year. On average, the bureaus reported increases in case referrals and new investigations in all 15 categories of fraud the survey covers.


The two largest sources of fraud listed in the CAIF study are phantom vehicle accidents and staged accidents. Perpetrators of staged accidents tend to be involved in multiple incidents. They create and leave a trail of information that remains captured in insurance company datasets. Unfortunately, many of these companies don’t take advantage of sophisticated tools that can find the crooks.

Let’s look at the example of staged vehicle crimes and how they can be stopped. A ring of people who successfully pull off a staged accident and are subsequently reimbursed by insurance companies usually decides to repeat its success. Since they fear being caught, each person takes a different role each time, changing his or her name and address slightly to evade the data matching algorithms employed by insurance companies in the claims process. One person acting as driver in one staged accident plays the role of witness in the next accident and passenger in the third. Each time an accident is reported, that person changes attributes of their identity, like name and address, to trip up existing software systems.

The state of entity resolution technology has been advancing rapidly. What used to be undetectable using “data matching” software can now be easily found using entity resolution. We’ve written before about the difference between simple data matching and entity resolution and how entity resolution enables hidden relationships to be uncovered.

Working with ambiguous data is a challenge, and it can overpower traditional data matching and fuzzy matching techniques. Entity resolution disambiguates insurance fraud data to find the hidden relationships between participants in fraud rings, allowing them to be stopped and prosecuted.
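The role-switching pattern described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not a description of any commercial entity resolution product: the scoring function (an average of name and address similarity from difflib), the 0.8 threshold, and the claim records are all invented. Similar records are grouped transitively with a small union-find structure, so two variants can be linked through a third even when they do not resemble each other closely.

```python
from difflib import SequenceMatcher
from itertools import combinations


def ratio(a: str, b: str) -> float:
    """Case-insensitive 0..1 string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def same_person_score(p: dict, q: dict) -> float:
    """Average of name and address similarity (an assumed, simplistic rule)."""
    return (ratio(p["name"], q["name"]) + ratio(p["address"], q["address"])) / 2


def find_repeat_participants(participants, threshold=0.8):
    """Group records that probably describe the same person across claims."""
    parent = list(range(len(participants)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(participants)), 2):
        if same_person_score(participants[i], participants[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, person in enumerate(participants):
        groups.setdefault(find(i), []).append(person)
    # Only groups with more than one record are interesting here.
    return [group for group in groups.values() if len(group) > 1]


# Fictitious claim participants, invented for this sketch.
participants = [
    {"claim": "claim-101", "role": "driver", "name": "Robert Kellerman", "address": "411 Pine Road"},
    {"claim": "claim-102", "role": "witness", "name": "Rob Kellermann", "address": "411 Pine Rd"},
    {"claim": "claim-103", "role": "passenger", "name": "Robert Kelerman", "address": "411 Pine Road Unit B"},
    {"claim": "claim-104", "role": "driver", "name": "Dana Whitfield", "address": "9 Harbor Lane"},
]

for group in find_repeat_participants(participants):
    print([(p["claim"], p["role"], p["name"]) for p in group])
```

An exact-match comparison would treat the first three records as three different people. Grouping them shows one probable individual appearing as driver, witness, and passenger in three separate claims, which is the signal an investigator would want surfaced.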

Identity Resolution Daily Links 2010-03-16

March 16th, 2010

By the Infoglide Team

McClatchy: Recession is fueling a boom in insurance fraud

“Whether it’s worthless health plans peddled by fax, staged auto accidents, arson or slip-and-fall accidents at the local mall, insurance fraud of all kinds is booming in the recession and consumers are paying the price in higher premiums.”

SC Magazine: Technology solutions can be the resolution to terrorist threats

“Poulter said that when it comes to areas such as fraud detection and anti-money laundering (AML), identity resolution technology can assist financial institutions in combating identity fraud and leverage name matching of hidden patterns and correlations to prevent attempts to disguise identity. A single view of this information plays its part in the fight against terror, giving authorities a greater ability to prevent money laundering, which may lead to the funding of terrorist campaigns.”

Computerworld: Cloud Computing Will Cause Three IT Revolutions

“Over the next two to five years, expect to see enormous conflict about the technical pros and cons of cloud computing that will, at bottom, be motivated by the perception on the part of the participants as to whether cloud computing represents a benefit to be embraced or a threat to be resisted. In particular, cloud computing’s three characteristics — the illusion of infinite scalability, lack of a long-term commitment, and pay-by-the-use — will result in three revolutions in the way IT is performed, and each of the revolutions will have its adherents and detractors.”

Initiate blog: An Economic Business Case for Entity Resolution

“Restated, intelligence information sharing, analysis and proactive action (such as that performed by Interpol) are the most efficient way to economically battle terrorism. Interpol’s mission is to coordinate information and operations among nations, to allow countries to track criminals across borders and share information of common interest. This also happens to be a very good way to describe the business function of entity resolution technology.”

Identity Resolution Daily Links 2010-03-13

March 13th, 2010

[Post from Infoglide] Architectures for Entity Resolution-Part 2

“In the last post we examined how entity resolution (ER) systems are actually implemented, starting with the most basic merge/purge process and heterogeneous join systems. Both of these approaches focus on collecting equivalent references from among the sources provided, either as a large batch of references in a single file, or through queries against a federation of databases…”

The Foundry: Thwarting the Next Terrorist Attack: Are We More Prepared?

“Knowing what we know now, would the U.S. be able to stop another attack like that of Christmas Day 2009? This is certainly the question on the minds of many Americans today.  It is also one that Jamie McIntyre, veteran journalist and blogger for Military.com, had the opportunity to ask of Rand Beers, Under Secretary for National Protection and Programs Directorate from DHS, at a Heritage Foundation National Security Bloggers Luncheon.”

Perceptive Information Strategies: Informatica and the Identity Opportunity

“In the middle of all of this are software providers, primarily IBM InfoSphere Identity Insight Solutions, Infoglide (which is providing software for the DHS) and Informatica… Identity recognition and resolution systems enable organizations to use data matches to gain a better understanding of identity across multiple systems. This could include not just individual identities but also networks and relationships: that is, who people know and how they are connected.”

Managing Automation: The MDM Supplier Market Gets a Little Smaller

“It’s been a heady couple of months in the IT infrastructure market, as any independent company that wasn’t tied down seemed to be swept up in a whirlwind of M&A activity. Independent data integration specialist Informatica, a 4,000-customer company in business since 1993, announced in January that it had acquired Siperian for $130 million.”

Architectures for Entity Resolution-Part 2

March 10th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post we examined how entity resolution (ER) systems are actually implemented, starting with the most basic merge/purge process and heterogeneous join systems. Both of these approaches focus on collecting equivalent references from among the sources provided, either as a large batch of references in a single file, or through queries against a federation of databases.  The entity identities found by these ER systems are transient in the sense that they depend upon the sources input into the process.  When different sources are provided, different identities will emerge.

On the other hand, there are ER systems that retain and manage identity information.  By doing this they are able to “recognize” the same identity over time and assign that identity the same entity identifier (sometimes called “persistent identifiers” or “persistent links”).  In Customer Data Integration (CDI) applications, these kinds of systems are sometimes called Customer Recognition Systems.

Two major types of ER systems perform identity management.  The first type is the “identity resolution” system.  It is most effective in situations where a fairly stable set of known identities of interest exists, such as the set of vendors or customers of a company, a set of products, or the students enrolled in a school.  The attributes of these identities are pre-loaded into the system and assigned identifiers.  When a reference is given to the system, it then decides whether the reference is to one of the known identities, and if so, returns the identifier of that identity.

Identity resolution systems can operate in either batch or transactional mode.  In cases where there are a large number of pre-stored identities, the performance of batch operations can be improved through distributed processing where the identities are partitioned over multiple processors and resolved in parallel.
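A minimal Python sketch of the identity resolution pattern follows. The class name, the field-averaged scoring rule, the 0.85 threshold, and the vendor data are assumptions made for illustration; they are not taken from any specific ER product. Known identities are pre-loaded with their identifiers, and each incoming reference either resolves to one of them or is reported as unknown.

```python
from difflib import SequenceMatcher


class IdentityResolver:
    """Resolve references against a pre-loaded set of known identities."""

    def __init__(self, known: dict, threshold: float = 0.85):
        self.known = dict(known)      # identifier -> attribute dict
        self.threshold = threshold    # assumed cut-off for a match

    @staticmethod
    def _score(reference: dict, identity: dict) -> float:
        """Average string similarity over the fields both records share."""
        fields = [f for f in reference if f in identity]
        if not fields:
            return 0.0
        total = sum(
            SequenceMatcher(None, reference[f].lower(), identity[f].lower()).ratio()
            for f in fields
        )
        return total / len(fields)

    def resolve(self, reference: dict):
        """Transactional mode: return the best identifier, or None if unknown."""
        best_id, best_score = None, 0.0
        for identifier, attributes in self.known.items():
            score = self._score(reference, attributes)
            if score > best_score:
                best_id, best_score = identifier, score
        return best_id if best_score >= self.threshold else None

    def resolve_batch(self, references):
        """Batch mode: resolve a list of references one after another."""
        return [self.resolve(reference) for reference in references]


# Fictitious vendor identities, invented for this sketch.
resolver = IdentityResolver({
    "V-001": {"name": "Acme Industrial Supply", "city": "Little Rock"},
    "V-002": {"name": "Delta Office Products", "city": "Conway"},
})

print(resolver.resolve({"name": "ACME Industrial Supply Inc", "city": "Little Rock"}))
print(resolver.resolve({"name": "Northside Catering", "city": "Benton"}))
```

The batch method here simply loops. The distributed variant mentioned above would split the known identities across several workers, resolve against each partition in parallel, and keep the best-scoring result.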

However, there are many situations where the identities are not necessarily known in advance, or in some cases  the entities are known but simply not organized in such a way that they can be easily pre-loaded.  For example, suppose two companies merge and each company has its own customer database. The customers are identified in different ways in each database, and furthermore, for the customers of one company, poor systems and practices prevent having any confidence that the master records are unduplicated across business lines or company locations.

The type of system often applied in these situations is an “identity capture” system.  The identity capture architecture can be seen as a hybrid of  merge/purge and identity resolution systems.  It supports identity management and persistent identifiers, but without starting with a preloaded set of identities.  In my next post, we’ll delve deeper into the identity capture process.
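The contrast with identity resolution can be sketched as follows. This is a guess at the general shape of an identity capture loop, written before the promised follow-up post and not drawn from it: the class name, the identifier format, the name-only comparison, and the 0.85 threshold are all hypothetical. The store starts empty, and each reference either joins an identity captured earlier or creates a new one with a fresh persistent identifier.

```python
from difflib import SequenceMatcher
from itertools import count


class IdentityCapture:
    """Build up managed identities from incoming references, starting empty."""

    def __init__(self, threshold: float = 0.85):
        self.identities = {}          # persistent identifier -> names seen so far
        self.threshold = threshold    # assumed cut-off for a match
        self._next_number = count(1)

    def capture(self, name: str) -> str:
        """Return the identifier for this reference, creating one if needed."""
        for identifier, known_names in self.identities.items():
            if any(
                SequenceMatcher(None, name.lower(), known.lower()).ratio() >= self.threshold
                for known in known_names
            ):
                known_names.append(name)  # remember the new variant
                return identifier
        identifier = f"ID-{next(self._next_number):04d}"
        self.identities[identifier] = [name]
        return identifier


# Fictitious customer references, invented for this sketch.
store = IdentityCapture()
first = store.capture("Margaret O'Neil")    # new identity
second = store.capture("Margaret ONeil")    # recognized as the same identity
third = store.capture("Thomas Reyes")       # another new identity

print(first, second, third)
print(store.identities)
```

Because the store keeps every variant it has seen, the same identifier can be returned for a reference that arrives later, which is what makes the identifier persistent rather than dependent on a single batch of input.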
