Archive for the ‘Entity Analytics’ Category

Architectures for Entity Resolution-Part 2

Wednesday, March 10th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post we examined how entity resolution (ER) systems are actually implemented, starting with the most basic merge/purge process and heterogeneous join systems. Both of these approaches focus on collecting equivalent references from among the sources provided, either as a large batch of references in a single file, or through queries against a federation of databases.  The entity identities found by these ER systems are transient in the sense that they depend upon the sources input into the process.  When different sources are provided, different identities will emerge.

On the other hand, there are ER systems that retain and manage identity information.  By doing this they are able to “recognize” the same identity over time and assign that identity the same entity identifier (sometimes called “persistent identifiers” or “persistent links”).  In Customer Data Integration (CDI) applications, these kinds of systems are sometimes called Customer Recognition Systems.

Two major types of ER systems perform identity management.  The first type is the “identity resolution” system.  It is most effective in situations where a fairly stable set of known identities of interest exists, such as the set of vendors or customers of a company, a set of products, or the students enrolled in a school.  The attributes of these identities are pre-loaded into the system and assigned identifiers.  When a reference is given to the system, it then decides whether the reference is to one of the known identities, and if so, returns the identifier of that identity.
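To make the lookup step concrete, here is a minimal sketch of an identity resolution check against a pre-loaded set of identities. The identity table, the attribute names, the averaging of name and city scores, and the 0.85 threshold are all assumptions made for illustration; a production system would use far richer matching rules than the standard-library string comparison shown here.

```python
from difflib import SequenceMatcher

# Hypothetical pre-loaded identities: identifier -> known attributes.
KNOWN_IDENTITIES = {
    "C001": {"name": "Acme Supply Co", "city": "Little Rock"},
    "C002": {"name": "Baker Tools Inc", "city": "Austin"},
}


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(reference, threshold=0.85):
    """Return the identifier of the best-matching known identity, or None.

    The score is an assumed, unweighted average of name and city similarity.
    """
    best_id, best_score = None, 0.0
    for identifier, attrs in KNOWN_IDENTITIES.items():
        score = (
            similarity(reference["name"], attrs["name"])
            + similarity(reference["city"], attrs["city"])
        ) / 2
        if score > best_score:
            best_id, best_score = identifier, score
    return best_id if best_score >= threshold else None
```

A reference such as {"name": "ACME Supply Co.", "city": "Little Rock"} would come back with the identifier of the matching pre-loaded identity, while a reference that resembles none of the known identities comes back as None.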

Identity resolution systems can operate in either batch or transactional mode.  In cases where there are a large number of pre-stored identities, the performance of batch operations can be improved through distributed processing where the identities are partitioned over multiple processors and resolved in parallel.
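One way to picture the partitioned batch mode is the sketch below. It splits a hypothetical identity table round-robin across workers, lets each worker find its best local match for every reference in the batch, and then keeps the highest-scoring candidate overall. The function names, the name-only matching, and the threshold are assumptions; threads stand in for the separate processors, and a real deployment would more likely use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def score(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def partition(identities, n_parts):
    """Split identifier -> name pairs round-robin into n_parts dictionaries."""
    parts = [dict() for _ in range(n_parts)]
    for i, (identifier, name) in enumerate(sorted(identities.items())):
        parts[i % n_parts][identifier] = name
    return parts


def best_in_partition(part, references):
    """For each reference, return (score, identifier) of the best local match."""
    results = []
    for ref in references:
        best = max(
            ((score(ref, name), identifier) for identifier, name in part.items()),
            default=(0.0, None),
        )
        results.append(best)
    return results


def resolve_batch(identities, references, n_parts=2, threshold=0.85):
    """Resolve a batch of references against partitioned identities in parallel."""
    parts = partition(identities, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        per_part = list(pool.map(lambda p: best_in_partition(p, references), parts))
    resolved = []
    # zip(*per_part) lines up every partition's candidate for the same reference.
    for candidates in zip(*per_part):
        best_score, identifier = max(candidates, key=lambda c: c[0])
        resolved.append(identifier if best_score >= threshold else None)
    return resolved
```

Because each partition only reports its single best candidate per reference, the merge step stays cheap no matter how many identities each worker holds.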

However, there are many situations where the identities are not known in advance, or where the entities are known but not organized in a way that allows them to be easily pre-loaded. For example, suppose two companies merge and each has its own customer database. The customers are identified differently in each database, and furthermore, for one company’s customers, poor systems and practices leave no confidence that the master records are free of duplicates across business lines or company locations.

The type of system often applied in these situations is an “identity capture” system.  The identity capture architecture can be seen as a hybrid of  merge/purge and identity resolution systems.  It supports identity management and persistent identifiers, but without starting with a preloaded set of identities.  In my next post, we’ll delve deeper into the identity capture process.
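As a rough illustration of the idea, the sketch below starts with an empty identity store, assigns a new persistent identifier whenever an incoming reference matches nothing already captured, and otherwise returns the existing identifier and keeps the new reference with that identity. The class name, the identifier format, the name-only matching, and the threshold are hypothetical choices for this sketch, not a description of any particular product.

```python
from difflib import SequenceMatcher
from itertools import count


class IdentityCaptureStore:
    """Toy identity store that grows as references are processed."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.identities = {}      # identifier -> list of captured reference names
        self._next = count(1)     # source of new persistent identifier numbers

    @staticmethod
    def _score(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def capture(self, name):
        """Return the persistent identifier for name, creating one if needed."""
        best_id, best_score = None, 0.0
        for identifier, names in self.identities.items():
            # Compare against every reference already captured for this identity.
            s = max(self._score(name, known) for known in names)
            if s > best_score:
                best_id, best_score = identifier, s
        if best_id is not None and best_score >= self.threshold:
            self.identities[best_id].append(name)
            return best_id
        new_id = f"E{next(self._next):04d}"
        self.identities[new_id] = [name]
        return new_id
```

Unlike a plain merge/purge run, the store persists between calls, so a reference seen next month gets the same identifier as its earlier variants.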

Identity Resolution Daily Links 2010-03-08

Monday, March 8th, 2010

By the Infoglide Team

tdwi: Informatica Ups the MDM Stakes

“Until now, Informatica’s MDM strategy has largely been peripheral. It had most of the tools (e.g., data integration, data quality, data profiling, and identity resolution) but tended to partner with bigger or best-of-breed players to promote MDM-oriented offerings or services… What’s risky about the acquisition of Siperian is that it imperils Informatica’s existing MDM partnerships (especially with Oracle Corp.) and compromises its neutrality pitch.”

GCN: Fusion centers to be assessed

“Fusion centers will conduct self-assessments, followed by a gap analysis and peer reviews, according to officials at the National Fusion Center Association, a new not-for-profit organization based in Alexandria, Va., that represents the 72 fusion centers. The assessments are meant to determine their progress in reaching baseline capabilities. Those capabilities were created by a federal advisory committee that also wrote the original guidelines for those centers.”

WorkersCompensation.com: NYSIF Announces 154 Arrests

“Recent significant cases resulting in millions of dollars in savings to NYSIF have included claimants who receive benefits while operating businesses or remain employed in other capacities, the most prevalent type of workers’ comp. fraud. Other cases involve premium fraud, the most costly type, in construction, asbestos abatement and other contracting, including investigations in conjunction with the U.S. Department of Labor, the U.S. Postal Inspector, and local labor racketeering bureaus. Still other cases involve fraudulent provider billing.”

SignalScape: Experts Ponder Both Sides of Border Security

“The DHS has also tested mobile identification systems and created an information sharing plan with the Department of Justice which allows officials to search for criminal records. Art Macius, chief of staff at the Transportation Security Administration (TSA) added that organizations such as his and the DHS must also share information with their international counterparts. This international cooperation includes efforts such as cargo screening for commercial aircraft though efforts such as the Secure Flight program. Macius said that by this spring, the program will work with U.S. airlines to screen baggage and air cargo, and that the coverage will extend to international carriers by the end of the year.”

Identity Resolution Daily Links 2010-03-06

Saturday, March 6th, 2010

[Post from Infoglide] Is MDM Dead?

“Andrew White of Gartner recently posed a question about whether master data management (MDM) is dead. He didn’t actually suggest that the demise of master data management is imminent. He was challenging whether our current terminology adequately clarifies the current reality about MDM and associated product areas.”

Inside the Biz: The Good News about MDM Market Consolidation

[Jill Dyche] “Last year, Informatica’s MDM story verged on the schizophrenic as the company simultaneously advocated a “roll your own” approach to MDM using various software components while at the same time making investments in both Siperian and rival Initiate Systems. Siperian fills in some significant voids in Informatica’s MDM capabilities, most notably hierarchy management and transaction integration—updating the golden record in real time.”

porter: FAQ Secure Flight

“What is Secure Flight and what does it do? Secure Flight is a behind the scenes program that streamlines the watch list matching process. It will improve the travel experience for all passengers, including those who have been misidentified in the past.”

Computerworld: Meeting an Olympic-size security challenge

“First is the classic ‘entity resolution’ challenge. Information about any individual is likely going to be scattered across a range of databases. While one database may contain a red-flag item — a pending drug charge or a secondary connection to a known terrorist — another database may not. The challenge is bringing this information together to create a single record — a ‘single version of the truth’ — about an individual or entity.”

Is MDM Dead?

Wednesday, March 3rd, 2010

By Mike Shultz, Infoglide Software CEO

Andrew White of Gartner recently posed a question about whether master data management (MDM) is dead. He didn’t actually suggest that the demise of master data management is imminent. He was challenging whether our current terminology adequately clarifies the current reality about MDM and associated product areas.

Certainly the terms describing many markets and types of products are being associated with MDM. Jackie Roberts of DATAForge pointed out that the definition of MDM now seems to include “data integrity, data quality, entity resolution, matching, data integration, governance, metrics and analysis.”

While entity resolution was mentioned in her list, our obsessive focus on entity resolution (aka identity resolution) leads to the conclusion that, rather than being subsumed, its role is growing. Wayne Eckerson at TDWI seems to agree that identity resolution is a critical component of the recent MDM acquisitions. In his post about the acquisitions by Informatica and IBM of Siperian and Initiate Systems, respectively, he described the two transactions this way:

“You could say that Siperian is mostly MDM, but with identity resolution and other capabilities, whereas Initiate is mostly about identity resolution, but with MDM and other capabilities.”

Identity resolution is becoming an integral part of many product areas. Within MDM itself, creating a single-entity view is best done with an identity resolution engine. Data mining is greatly enhanced by the addition of entity resolution. Dan Power of Hub Solution Designs wrote about how key identity resolution is to data matching. We’ve talked about how social CRM can resolve identities of individuals across multiple disparate data sources using identity resolution, as well as “rationalize multiple variations and errors and anomalies that block finding existing customers within their systems”.

Although identity resolution technology has been years in the making, it has only recently risen into the consciousness of most analysts and customers. Because of its ability to bring enhanced clarity to ambiguous data, advanced identity resolution is now beginning to have a significant impact across many data-centered disciplines.

Identity Resolution Daily Links 2010-03-01

Monday, March 1st, 2010

By the Infoglide Team

IT-Director.com: The Informatica Event

[Philip Howard] “To begin with, the company talked about its acquisition of Siperian. I have already commented on this but one point that emerged at the conference was the way that Informatica describes Siperian as infrastructure MDM as opposed to application MDM. This is a hitherto unrecognised distinction (with respect to terminology) in the MDM market. Informatica distinguishes the former from the latter by saying that infrastructure MDM is domain and data model independent.”

Workforce Management: Medical Clinic Owners Plead No Contest to $60 Million Workers’ Compensation Fraud

“Investigators alleged that the pair purchased thousands of workers’ compensation client referrals from an attorney television advertising service. Clients were then sent to doctors who had a relationship with Premier, which would handle billing and collection work in return for a 50 percent fee for money they collected. Clients were then sent to attorneys who had a business relationship with Fish and Bacino, investigators allege. ‘Getting kickbacks for referring medical payments is illegal and drives up the costs in the system,’ California Insurance Commissioner Steve Poizner said in a statement.”

SignalScape: DC Police Chief Cathy Lanier Describes How Technology Is Changing Police Work in the Capitol

“The MPD also established a fusion center, which is responsible for the national capitol region. From a homeland security perspective, Chief Lanier said that the center collects and stores crime and terror alerts into a data warehouse.”

Injured Workers’ Law Firm Blog: Insurance Fraud Is a Huge Crime

“The fraudulent claims that can be made through insurance companies are categorized as being soft or hard. Soft fraud is the most common type of fraud and usually takes place when someone exaggerates a claim being made. Hard fraud takes place when someone deliberately plans a deceptive act such as a collision or the theft of their vehicle.”

Identity Resolution Daily Links 2010-02-27

Saturday, February 27th, 2010

[Post from Infoglide] Attacking Subscription Fraud with Identity Resolution

“In March 2006, the Communications Fraud Control Association (CFCA) estimated that annual global fraud losses in the telecom sector were between $54 billion and $60 billion, and the losses continue to be substantial. Many types of fraud have been identified, but by far the most prevalent is subscription fraud.”

ITBusinessEdge: Analyst: SAP Missed Out During Recent MDM Acquisition Spree

“SAP, on the other hand, has had a lot of issues in the past couple of years. They haven’t made a direct MDM acquisition since they acquired A2i years and years ago, which was a PIM vendor and they’ve just been working off of that architecture and been trying to improve it.”

Liliendahl On Data Quality: Data Quality Tools Revealed

“Data matching is the ability to compare records that are not exactly the same but are so similar that we may conclude, that they represent the same real world object.”

BeyeNETWORK: Master Data Management: Moving Forward…

“So now that MDM has been around for a while, and the master data terminology has drifted into our standard vocabulary, it might be worth stepping back and asking a different question:  Is MDM the revolutionary approach to organizational data consolidation and enterprise information management or is it devolving into yet another  (of many) data management tools?”

Attacking Subscription Fraud with Identity Resolution

Friday, February 26th, 2010

By Mike Shultz, Infoglide Software CEO

In March 2006, the Communications Fraud Control Association (CFCA) estimated that annual global fraud losses in the telecom sector were between $54 billion and $60 billion, and the losses continue to be substantial. Many types of fraud have been identified, but by far the most prevalent is subscription fraud.

A new subscriber signs up for mobile service using false or stolen identification, with no intention of paying the bill. Since new subscribers are given a grace period of one to three months before the account is shut off, the criminal can make thousands of dollars’ worth of calls before being detected.

Subscription fraud can be difficult to differentiate from simple bad debt when genuine customers are unable to pay. It’s been estimated that 30% or more of all bad debt is actually subscription fraud.

Different solutions have been tried, yet fraud continues to be a problem. One common method is to look for patterns of use that suggest potential fraud, but criminals adapt and learn to probe the limits of these fraud detection systems fairly quickly.

Given the industry’s long history with fraudsters, it seems probable that enough is known about them that they could be spotted at the time they subscribe.  Using similarity searching technology, would-be fraudsters can be vetted against lists of known bad actors. Using multiple public and private data sources, non-obvious relationships can highlight risky individuals, and they can then be asked to submit to a more thorough qualification process.
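A very small sketch of vetting at sign-up time might look like the following. The watch list entries, the attribute names, the 0.8 name cutoff, and the rule of flagging on either a similar name or an identical phone number are invented for illustration, and the standard-library string comparison stands in for a real similarity search engine.

```python
from difflib import SequenceMatcher

# Hypothetical list of known bad actors; all values are made up.
WATCH_LIST = [
    {"name": "Jon Q. Public", "phone": "5015550101"},
    {"name": "Dana R. Example", "phone": "5125550199"},
]


def ratio(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def vet_applicant(applicant, name_cutoff=0.8):
    """Return watch list hits as (listed name, name score, phone matched)."""
    hits = []
    for entry in WATCH_LIST:
        name_score = ratio(applicant["name"], entry["name"])
        same_phone = applicant["phone"] == entry["phone"]
        # Flag on a near-identical name OR an exact match on another attribute.
        if name_score >= name_cutoff or same_phone:
            hits.append((entry["name"], round(name_score, 2), same_phone))
    return hits
```

An applicant with any hits would be routed to the more thorough qualification process rather than rejected outright, since a similar name alone is not proof of fraud.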

Identity resolution is already used across multiple industries to solve similar problems. By matching an individual’s attributes with common attributes associated with those committing fraud, the “bad guys” are being detected in areas like lottery fraud, fusion centers, insider trading, and workers’ compensation employer fraud. Part of finding the bad guys is finding hidden relationships, connections that often uncover rings of criminals.
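The hidden-relationship idea can be sketched as a grouping problem: link any two records that share an attribute value, then collect the connected groups. The example below does this with a simple union-find structure. The record layout and the choice of phone and address as linking attributes are assumptions for illustration; real systems would also link on fuzzy matches, not just exact shared values.

```python
from collections import defaultdict


def find_rings(records, keys=("phone", "address")):
    """Group record ids that are connected through shared attribute values."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (attribute, value) -> id of first record carrying it
    for r in records:
        for k in keys:
            value = r.get(k)
            if not value:
                continue
            marker = (k, value)
            if marker in first_seen:
                union(r["id"], first_seen[marker])
            else:
                first_seen[marker] = r["id"]

    groups = defaultdict(set)
    for r in records:
        groups[find(r["id"])].add(r["id"])
    # Only groups with more than one member represent a relationship.
    return [g for g in groups.values() if len(g) > 1]
```

Note that two records can land in the same group without sharing anything directly: if A shares an address with B, and B shares a phone number with C, all three are grouped, which is exactly the kind of non-obvious connection that can expose a ring.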

The “birds of a feather” axiom predicts that subscription fraud criminals often share the same types of social networks. Applying identity resolution to the subscription fraud problem may be the way to finally solve it.

Identity Resolution Daily Links 2010-02-23

Tuesday, February 23rd, 2010

By the Infoglide Team

WFAA.com: What is Texas doing to prevent terrorism?

“The Dallas police has a high tech fusion center that monitors potential threats in Dallas. They helped foil the plot when a man was planning on blowing up the Bank of America building… Four years ago, Dallas Police put alert on Kimberly Al-Homsi because she was scouting runways at Love Field. On Saturday, she was arrested allegedly with pipe bombs in her car.”

Liliendahl on Data Quality: Candidate Selection in Deduplication

“When a recruiter and/or a hiring manager finds someone for a job position it is basically done by getting in a number of candidates and then choose the best fit among them. This of course don’t make up for, that there may be someone better fit among all those people that were not among the candidates. We have the same problem in data matching when we are deduplicating, consolidating or matching for other purposes.”

Health Data Management: New Obama Health Plan Has I.T. Angles

“Proposals in Obama’s new proposal with a strong I.T. flavor include… Adopt real-time analysis of claims and payments data to identify waste, fraud and abuse in public health programs… Establish a CMS/IRS data-matching program to match information on entities that have evaded filing taxes against provider billing data to better detect fraudulent providers.”

Identity Resolution Daily Links 2010-02-20

Saturday, February 20th, 2010

[Post from Infoglide] Identity Resolution Still On the Rise

“We’ve noted several times over the past couple of years how the market visibility of entity resolution has been evolving. Now the consolidation of the master data management (MDM) market is causing even more conjecture about the crucial role of this technology.”

SIGNAL ONLINE: Good Guys Share, Bad Guys Lose

“Lindsey adds that personnel on Joint Terrorism Task Forces, in fusion centers or in other counterterrorism-related positions could benefit from the system by accessing the more complete data source and incorporating information found there into their own analyses and evaluations. ‘We’re out there for the crime fighters, but we’re also out there to prevent terrorism activities,’ he states.”

Claims Magazine: Fraud Triage Programs 

“The Federal Bureau of Investigation estimates that the total cost of insurance fraud (excluding health care) exceeds $40 billion per year. That means insurance fraud costs the average U.S. family between $400 and $700 annually in the form of increased premiums. In California alone, the Department of Insurance (CDOI) identified the potential loss from fraud in the 2007/2008 fiscal year at $1.2 billion, according to the 2008 Annual Report of the Insurance Commissioner.”

FoxNews.com: Flight Diverted to Florida Over Passenger’s Mistaken Identity

“Some airlines already have moved to a new identification program, called Secure Flight. All domestic carriers are expected to move to the new program by March. The government system will include more details about the passenger in question, including the passenger’s sex, birth date and full name as it appears on a government identification document.”

Precision Document Imaging: What is EMR?

“The American Recovery and Reinvestment Act of 2009 provides significant cash incentives to physicians who implement electronic health records. However, in order to qualify for these incentives the physician must not only have the proper software but must engage in “meaningful use” of the software. The government plans to publish the criteria for meaningful use in February 2010. ARRA incentive reimbursement to physicians will begin in 2011.”

Identity Resolution Still On the Rise

Wednesday, February 17th, 2010

By Mike Shultz, Infoglide Software CEO

We’ve noted several times over the past couple of years how the market visibility of entity resolution has been evolving. Now the consolidation of the master data management (MDM) market is causing even more conjecture about the crucial role of this technology.

We’re continually on the lookout for the trends and opportunities that affect the identity resolution space. We’ve written about entity resolution moving into cloud computing, the growing use of entity resolution by state agencies, the crucial role that identity resolution plays in fusion centers, how it’s related to “social CRM”, and how it might be used in e-discovery.

A few days after IBM’s recent announcement about buying Initiate Systems and a little over a week after Informatica’s acquisition of Siperian, Wayne Eckerson at TDWI wrote an insightful article in which he noted that these acquisitions are about MDM, yet they are also about identity resolution:

“Siperian is well-known for its master data management (MDM) solution… Initiate, on the other hand, is well-known for its identity resolution hub… At this point, I need to cycle back to Siperian and point out that it, too, provides identity resolution capabilities. And I forgot to mention that Initiate also has some MDM capabilities. You could say that Siperian is mostly MDM, but with identity resolution and other capabilities, whereas Initiate is mostly about identity resolution, but with MDM and other capabilities.”

Considering IBM’s acquisition of Initiate Systems, along with Informatica’s purchase of Siperian shortly before that and of Identity Systems in 2008, it’s clear that IdentityResolutionDaily is going to have even more to write about this year than before!
