
Archive for the ‘Identity Resolution’ Category

Identity Resolution Daily Links 2009-09-28

Monday, September 28th, 2009

[Post from Infoglide] Social CRM, CDI, and Identity Resolution

“In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing ‘CDI’ with ‘social CRM’: CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources.”

Concord Monitor: Don’t play games when giving your name

“What do they want? Your date of birth, your gender and your middle initial. This information will be relayed to the TSA, and the TSA will match the information against information maintained by the Terrorist Screening Center (an arm of the FBI that gathers and consolidates watch lists). The theory is that a 12-year-old boy named John X. Doe can more easily be separated from John Z. Doe, who happens to be a 37-year-old man with a history of making bombs, if additional information is collected during the booking process. Once TSA has cleared you, you’ll be issued a boarding pass.”

pressdemocrat.com: Achieving paperless health care

“Medical record-keeping, until recently, relied on rooms full of paper files that were easily misplaced and filled with hurried, handwritten entries that could be hard to read. Electronic records hold orderly, keyboard-entered data that never leaves a hard drive and have the potential to move seamlessly from a primary care provider’s office to an emergency room or specialist’s suite.”

ebizQ: MDM Becoming More Critical in Light of Cloud Computing

[David Linthicum] “We’re moving from complex federated on-premise systems, to complex federated on-premise and cloud-delivered systems.   Typically, we’re moving in these new directions without regard for an underlying strategy around MDM, or other data management issues for that matter.”

Homeland Security: I&A Reconceived: Defining a Homeland Security Intelligence Role

“There are currently 72 fusion centers up and running around the country (a substantial increase from 38 centers in 2006).  I&A has deployed 39 intelligence officers to fusion centers nationwide, with another five in pre-deployment training and nearly 20 in various stages of administrative processing.  I&A will deploy a total of 70 officers by the end of FY 2010, and will complete installation of the Homeland Secure Data Network (HSDN), which allows the federal government to share Secret-level intelligence and information with state and local partners, at all 72 fusion centers.”

Identity Resolution Daily Links 2009-09-25

Friday, September 25th, 2009

By the Infoglide Team

[Post from Infoglide] Social CRM, CDI, and Identity Resolution

“In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing ‘CDI’ with ‘social CRM’: CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources(1).”

Charleston Daily Mail: Former owner of WVa trucking company sentenced

“Leonard Cline formerly owned H & H Trucking. The insurance commissioner says he defrauded the old state workers’ compensation system of more than $500,000 in unpaid premiums, penalties and claims for benefits over about 10 years.”

WTVQ: Eight People Indicted for Insurance Fraud

“The US attorney’s office says the suspects intentionally damaged insured automobiles owned by other conspirators then filed claims.”

KansasCity.com: Push for electronic medical records picks up steam

“With or without health care reform this year, electronic medical records are picking up steam. Recent technological advances are easing the transition for doctors and hospitals, and there’s the little matter of the Health Information Technology for Economic and Clinical Health Act. The act, part of last spring’s stimulus package, included billions of dollars to ‘advance the use of health information technology.’ There’s plenty of advancing to do, with one group estimating that less than half the hospitals and only one in five physicians are equipped to fully use electronic records. ‘The United States is far more advanced in grocery store technology than in medical records technology,’ said Steve Lieber, president and chief executive officer of the Healthcare Information and Management Systems Society in Chicago.”

pnj.com: Man charged with workers’ comp fraud

“Florida Chief Financial Officer Alex Sink announced the arrest today in a news release. In the release, Sink said her Division of Insurance Fraud said Soto is charged with falsifying employment numbers with the intent of avoiding higher workers’ compensation premium payments.”

Federal News Radio: Update: Identity management in the Obama administration

“The alphabet soup of identity management programs from the Bush administration — HSPD-12, TWIC, Real ID, and many more — have gotten little attention publicly during the first nine months of the Obama presidency. But that doesn’t mean identity management has been ignored totally, says one senior administration official.”

London Evening Standard: Lloyd’s chief warns of more insurance fraud

“Lloyd’s of London’s chief executive Richard Ward today warned the deep recession would increase the number of fraudulent claims being made against the insurance market.”

Computerworld: Laptop searches at airports infrequent, DHS privacy report says

“The U.S. Department of Homeland Security’s annual privacy report card revealed more details on the agency’s controversial policy involving searches of electronic devices at U.S. borders. . . . For instance, numbers released in the report indicate that warrantless searches of electronic devices at U.S. borders are occurring less frequently than some privacy and civil rights advocates might have feared. Of the more than 144 million travelers that arrived at U.S. ports of entry between Oct. 1, 2008 and May 5, 2009, searches of electronic media were conducted on 1,947 of them, the DHS said. Of this number, 696 searches were performed on laptop computers, the DHS said. Even here, not all of the laptops received an ‘in-depth’ search of the device, the report states. A search sometimes may have been as simple as turning on a device to ensure that it was what it purported to be. U.S. Customs and Border Protection agents conducted ‘in-depth’ searches on 40 laptops, but the report did not describe what an in-depth search entailed. . . . The report chronicled similar efforts to monitor the privacy implications of a range of projects that privacy groups are also watching. Examples include Einstein 2.0 network monitoring technology that improves the ability of federal agencies to detect and respond to threats, and the Real ID identity credentialing program. The DHS’s terror watch list program, its numerous data mining projects and the secure flight initiative were also mentioned in the report.”

Social CRM, CDI, and Identity Resolution

Wednesday, September 23rd, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing “CDI” with “social CRM”:

CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources(1).

In fact, Ray Wang of A Software Insider’s Point of View suggests that social CRM initiatives could be more effective by leveraging MDM technology. In a recent post he listed key questions that social CRM and other relationship management initiatives like CDI have to answer:

1.    Do we know the identity of the individual?
2.    Can we tell if there are any apparent and potential relationships?
3.    Are they advocates or detractors?
4.    How do we know whether or not we have a false positive?
5.    What products and services have been purchased in the past?
6.    Have we assessed how much credit risk we can be exposed to?
7.    What pricing and entitlements are customers eligible for?

So how exactly can social CRM systems resolve the identities of individuals across multiple disparate data sources? How can they reconcile the variations, errors, and anomalies that keep them from finding existing customers in their own systems?

The obvious answer is identity resolution. As we highlighted in an earlier post, Dyché has stated that identity resolution supports and enhances five of the eight core MDM functions enumerated in her book with Evan Levy. Similarly, identity resolution is critical to accurately answering the key questions about identity in social CRM.
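To make the idea concrete, here is a minimal sketch of the kind of comparison identity resolution performs when the same person shows up in both a CRM system and a social media source. It is only an illustration under stated assumptions: the record fields (`name`, `email`), the normalization rules, and the 0.85 similarity threshold are all hypothetical, and the standard library's `difflib.SequenceMatcher` is a simple stand-in for the far more sophisticated matching that commercial identity resolution software performs.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def same_identity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: names must be similar, and emails must agree when both exist."""
    name_score = SequenceMatcher(
        None, normalize(rec_a["name"]), normalize(rec_b["name"])
    ).ratio()
    email_a, email_b = rec_a.get("email"), rec_b.get("email")
    emails_agree = email_a is None or email_b is None or email_a.lower() == email_b.lower()
    return name_score >= threshold and emails_agree


# Hypothetical records: one person captured by two systems, plus a different person.
crm_record = {"name": "Jonathan Q. Smith", "email": "JSmith@example.com"}
social_record = {"name": "jonathan q smith", "email": "jsmith@example.com"}
other_person = {"name": "Joan Smythe", "email": "joan@example.org"}

print(same_identity(crm_record, social_record))  # True
print(same_identity(crm_record, other_person))   # False
```

A rule like this also touches Ray's question 4: raising the threshold trades more missed matches for fewer false positives.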

Ray’s list of questions can be divided into two sets. Accurately answering the first set, related to identity and relationships (questions 1, 2, and 4), is critical to answering the rest (questions 3, 5, 6, and 7). If we blow it on identity, it is impossible to make sense of social CRM data.

Social media marketing and social CRM are becoming more and more mainstream. If you want to get more familiar with social media marketing and social CRM, Paul Gillin’s recent book is a great way to get started.

If you’re already familiar and want to comment or take issue with this post, let us hear from you.

(1) Dyché, Jill, and Evan Levy. Customer Data Integration: Reaching a Single Version of the Truth. John Wiley & Sons, Inc., 2006, p. 274.

Identity Resolution Daily Links 2009-09-18

Friday, September 18th, 2009

[Post from Infoglide] Metrics for Entity Resolution

“In the last post I discussed the concepts of internal and external views of identity.  The fact that we can have different views of the same identity then raises the question of how to go about comparing different views.  What complicates this issue is that, even though we can talk about resolving references in pairs (i.e. linking two records if they refer to the same entity), the total number of references can be quite large, and consequently, there are many possible pair-wise combinations to consider.”

FederalComputerWeek: DOD opens some classified information to non-federal officials

“The non-federal officials will get access via the Homeland Security department’s secret-level Homeland Security Data Network. That network is currently deployed at 27 of the more than 70 fusion centers located around the country, according to DHS.”

Gerson Lehrman Group: Stylish Master Data Management

“In my experience, one of these styles is nigh-on impractical.  ‘Centralized’ (also called ‘transaction’) implies a wrenching architectural shift whereby a master data hub becomes the one and only source of master data for an enterprise, replacing the functionality of generating master data in existing transaction systems, and serving up ‘golden copy’ data to other systems, perhaps via an enterprise service bus architecture. This sounds elegant, but is extremely invasive.”

INFORMATICA PERSPECTIVES: Get To “Meaningful Use” Faster

“Identity resolution’s goal is to find the right person at the right time, regardless of the potential for error and variation in what information is available at the time of request.  This could be during patient registration and admission, patient transfers or referrals, emergency room visits, and simply sharing information across providers or insurers.  The ability to do this effectively must become the most basic and core function.”


Metrics for Entity Resolution

Thursday, September 17th, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post I discussed the concepts of internal and external views of identity.  The fact that we can have different views of the same identity then raises the question of how to go about comparing different views.  What complicates this issue is that, even though we can talk about resolving references in pairs (i.e. linking two records if they refer to the same entity), the total number of references can be quite large, and consequently, there are many possible pair-wise combinations to consider.

The number of pair combinations grows quadratically as the size of the list grows linearly.  The number of pairs of distinct items in a list of N items is N*(N-1)/2.  So even in a list of 10 items, there are 45 possible pair combinations – not so bad.
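A quick way to see this growth is to evaluate N*(N-1)/2 for a few list sizes. This short sketch uses only Python's standard library; `math.comb(n, 2)` computes the same binomial coefficient and serves as a cross-check.

```python
from math import comb


def pair_count(n: int) -> int:
    """Number of distinct unordered pairs in a list of n references: N*(N-1)/2."""
    return n * (n - 1) // 2


for n in (10, 100, 1_000, 1_000_000):
    assert pair_count(n) == comb(n, 2)  # same value as the binomial coefficient
    print(f"{n:>9,} references -> {pair_count(n):>15,} candidate pairs")
```

Each tenfold increase in the list multiplies the number of pairs by roughly a hundred: 10 references give 45 pairs, while a million references give nearly 500 billion.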

But now consider how many different ways these 45 links (comparisons) among the 10 references could be labeled as true (the two references are to the same entity) or false (the two references are to different entities) while still making sense as links.  By making sense, I mean that if we label the link between references 1 and 2 as true, and the link between references 2 and 3 as true, then we must also label the link between references 1 and 3 as true.  Even with this constraint, it turns out that there are 115,975 ways that a set of 10 references could be linked together.
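The figure of 115,975 can be checked. Labelings that respect this transitivity rule correspond exactly to the ways of partitioning the references into groups, and the number of partitions of a 10-element set is the Bell number B(10) = 115,975. The sketch below computes Bell numbers with the Bell triangle and, for small lists only, confirms the correspondence by brute force: it tries every true/false labeling of the pairs and keeps the transitive ones. The function names are illustrative, not from any library, and the brute-force check is exponential, so it cannot be run for 10 references (2^45 labelings).

```python
from itertools import combinations, product


def bell_number(n: int) -> int:
    """Number of ways to partition n items, computed with the Bell triangle."""
    if n == 0:
        return 1
    row = [1]
    for _ in range(n - 1):
        next_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[-1]


def linked(labels: dict, i: int, j: int) -> bool:
    """A reference is always linked to itself; otherwise use the pair's label."""
    return i == j or labels[(min(i, j), max(i, j))]


def consistent_labelings(n: int) -> int:
    """Brute force: count true/false pair labelings that satisfy transitivity."""
    pairs = list(combinations(range(n), 2))
    count = 0
    for values in product((False, True), repeat=len(pairs)):
        labels = dict(zip(pairs, values))
        if all(
            linked(labels, i, k)
            for i, j, k in product(range(n), repeat=3)
            if linked(labels, i, j) and linked(labels, j, k)
        ):
            count += 1
    return count


print(bell_number(10))  # 115975
print([consistent_labelings(n) for n in range(1, 6)])  # [1, 2, 5, 15, 52]
```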

To follow our example, suppose that the 10 records are represented by the first 10 letters of the alphabet {a, b, c, d, e, f, g, h, i, j}.  One of the 115,975 ways they could be linked together would be {a, b, c} all linked as belonging to the same entity, {d, e} together, {f} by itself, and {g, h, i, j} together.  Another way is {a, c, d} together, {b, f} together, {e} by itself, {g, h, i} together, and {j} by itself.  So how similar is the first way of linking these 10 references to the second way of linking them?

This is not a new problem, and there are many ways to approach it.  The problem is easier to visualize if we build an “intersection matrix”.  The matrix is simply a table that lists one set of groupings as row labels and the other set as column labels.  The cell at a particular row and column is the size of the intersection between the groupings.  Here is the intersection matrix for the example just given:

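           | {a,c,d} | {b,f} | {e} | {g,h,i} | {j}
{a,b,c}    |    2    |   1   |  0  |    0    |  0
{d,e}      |    1    |   0   |  1  |    0    |  0
{f}        |    0    |   1   |  0  |    0    |  0
{g,h,i,j}  |    0    |   0   |  0  |    3    |  1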

The number 2 in the cell at the second row and second column of the table indicates that the first grouping in the first way the references were linked and the first grouping in the second way the references were linked share 2 elements in common, “a” and “c”.
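The intersection matrix is easy to build programmatically. In this sketch each linkage is represented as a Python list of sets (a representation chosen for illustration), and each cell is the size of the set intersection between a row grouping and a column grouping.

```python
# The two example linkages of the references a through j.
first = [{"a", "b", "c"}, {"d", "e"}, {"f"}, {"g", "h", "i", "j"}]
second = [{"a", "c", "d"}, {"b", "f"}, {"e"}, {"g", "h", "i"}, {"j"}]


def intersection_matrix(rows: list, cols: list) -> list:
    """Cell [r][c] holds the number of references shared by grouping r and grouping c."""
    return [[len(row & col) for col in cols] for row in rows]


matrix = intersection_matrix(first, second)
for group, counts in zip(first, matrix):
    print("".join(sorted(group)).ljust(5), counts)
```

Each row sums to the size of its grouping, and the whole matrix sums to 10, the number of references.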

In statistics, this falls under cluster analysis, and there are several methods for comparing two sets of clusters.  The best known is the Rand Index, which ranges from 0 to 1, with values closer to 0 indicating less similarity and values closer to 1 indicating more similarity.  The value equals 1 only when the two sets of groupings are identical.

The calculation of the Rand Index takes a bit of explaining, but basically it involves counting the pair-wise links in the various cells shown in the table above.  In this example, the value of the Rand Index turns out to be 0.800, or 80% similar.
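For readers who want the counting spelled out, here is one standard way to compute the Rand Index from an intersection matrix. It is a sketch of the usual pair-counting formulation, not necessarily the exact procedure used for the post: pairs placed together in both linkages come from the cells, pairs placed together in just one linkage come from the row and column totals, and the index is the fraction of all N*(N-1)/2 pairs on which the two linkages agree. The `example` literal encodes the intersection matrix for the example (rows are the first linkage's groupings, columns the second's), and the result matches the 0.800 reported here.

```python
from math import comb


def rand_index(matrix: list) -> float:
    """Fraction of reference pairs on which two linkages agree (together in both or apart in both)."""
    n = sum(map(sum, matrix))
    total_pairs = comb(n, 2)
    together_both = sum(comb(cell, 2) for row in matrix for cell in row)
    together_first = sum(comb(sum(row), 2) for row in matrix)
    together_second = sum(comb(sum(col), 2) for col in zip(*matrix))
    # Inclusion-exclusion: pairs split apart by both linkages.
    apart_both = total_pairs - together_first - together_second + together_both
    return (together_both + apart_both) / total_pairs


example = [
    [2, 1, 0, 0, 0],  # {a, b, c}
    [1, 0, 1, 0, 0],  # {d, e}
    [0, 1, 0, 0, 0],  # {f}
    [0, 0, 0, 3, 1],  # {g, h, i, j}
]
print(round(rand_index(example), 3))  # 0.8
```

Here 4 pairs are together in both linkages and 32 are apart in both, so the index is 36/45 = 0.8.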

A few years ago, Dr. Rich Wang, Director of the MIT Information Quality Program, and I wanted a simpler similarity index that could be used as a quick way to assess entity resolution results.  The method we developed is much simpler to calculate, in that it does not involve the formula for combinations.  The key values for calculating our index are just the number of groupings and the number of overlaps between those groupings.  The formula is as follows:

TW = SQRT(|A| x |B|) / |C|

Where
|A| represents the number of groupings in the first linkage (number of rows in the table)
|B| represents the number of groupings in the second linkage (number of columns)
|C| represents the number of overlaps between the groupings (number of cells > 0)

For the example given in the table, the value is TW = SQRT(4 x 5)/7 = 0.639.
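The index itself takes only a few lines to compute. This sketch counts the overlaps directly from the two linkages (again represented as lists of sets, an illustrative choice) and reproduces the 0.639 for the example. It also shows that two identical linkages score exactly 1, because then |A| = |B| = |C|.

```python
from math import sqrt

first = [{"a", "b", "c"}, {"d", "e"}, {"f"}, {"g", "h", "i", "j"}]
second = [{"a", "c", "d"}, {"b", "f"}, {"e"}, {"g", "h", "i"}, {"j"}]


def talburt_wang(linkage_a: list, linkage_b: list) -> float:
    """TW = SQRT(|A| x |B|) / |C|, where |C| counts overlapping pairs of groupings."""
    overlaps = sum(1 for ga in linkage_a for gb in linkage_b if ga & gb)
    return sqrt(len(linkage_a) * len(linkage_b)) / overlaps


print(round(talburt_wang(first, second), 3))  # 0.639
print(talburt_wang(first, first))             # 1.0
```

No pair counting is needed: the only inputs are the number of groupings on each side and the number of nonzero cells.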

According to our index, the two groupings are only about 64% similar.  In the next post I will discuss the application of our index and other metrics that can be used to assess entity resolution outcomes.

Identity Resolution Daily Links 2009-09-14

Monday, September 14th, 2009

By the Infoglide Team

MAINJUSTICE: Report Finds Flaws in DOJ Worker Comp Oversight

[easy registration required] “The Justice Department does not have effective measures in place to prevent fraud, abuse and waste in its program to provide compensation for employees with work-related injuries or illnesses, according to a DOJ Office of Inspector General report released today.”

Information Management: HP and Informatica’s Expanded Relationship: Portent of Bigger Deals to Come?

“So is the partnership with Informatica a ‘proof of concept’ for future acquisition or is it simply HP BIS’s answer: ‘We are a services business and we will leave software to our partners’?”

FederalComputerWeek: 5 decisions that will determine the fate of e-health records

“Under the economic stimulus law passed earlier this year, as much as $45 billion will be distributed to health care providers who buy and use approved electronic health record systems. The road ahead is still bumpy for EHRs, but experts say success hinges on the outcomes of five major decisions.”

Dalton’s Blog: Migrating Data into an MDM Repository - Case Study

“Notice that if you’re using Data Federation to implement your MDM solution, there is no data migration. Data Federation acts as a virtual central repository, and as such, does not require a physical copy of your source data. Data Federation “translates” the source information in real time according to required business rules and definitions. It is, so to speak, a real-time Extract-Transform process.”

Identity Resolution Daily Links 2009-09-12

Saturday, September 12th, 2009

[Post from Infoglide] False Positives versus Citizen Profiles

“A post from Steve Bennett in Australia refers to an announcement by the Dutch government about their intent to prevent crime by profiling their citizens. By creating a digital profile of each citizen using banking, flight, and internet usage information, their justice department plans to compare citizen profiles with those of convicted criminals, then let law enforcement authorities know when matches are found. Needless to say, the move has created quite a bit of discussion in the Netherlands.”

MAINJUSTICE: Rival Agencies Agree to Halt Turf Battles

[quick registration] “‘By bringing together the agencies and personnel with existing resources and expertise we can work more effectively as partners to shut down organized crime networks, seize assets and save taxpayer dollars in the process,’ said Deputy Attorney General David Ogden in [a] statement announcing the partnership.”

HealthData Management: Assessing Demand for EHRs

“On Aug. 20, David Blumenthal, M.D., national coordinator for health information technology, predicted that the final definition of the “meaningful use” of electronic health records that will be used to determine eligibility for incentive payments under the economic stimulus program will not be available until the middle or end of spring 2010.”

South Florida Business Journal: NICB: Suspicious insurance claims up

“The number of suspicious insurance claims rose to 41,619 in the first half of the year, up from 36,743 in the prior-year period, according to a review of insurance claims referred to the National Insurance Crime Bureau.”

False Positives versus Citizen Profiles

Wednesday, September 9th, 2009

By Mike Shultz, Infoglide Software CEO

A post from Steve Bennett in Australia refers to an announcement by the Dutch government about their intent to prevent crime by profiling their citizens. By creating a digital profile of each citizen using banking, flight, and internet usage information, their justice department plans to compare citizen profiles with those of convicted criminals, then let law enforcement authorities know when matches are found. Needless to say, the move has created quite a bit of discussion in the Netherlands.

In no way would such a move fly in the United States. Since our nation’s founding, its citizens have consistently shown a distrust of government that has limited the government’s control over basic freedoms. While some would argue that the U.S. government has gained too much control over the years, that healthy distrust has definitely limited government intrusion into our personal freedoms.

In contrast to the broad approach proposed by the Dutch Minister of Justice, systems using entity resolution can avoid the “boil the ocean” approach. You can target specific data sources that hold relevant information, and then compare the bare minimum of attributes needed to discover hidden relationships, all without creating and storing profiles on millions of non-criminal citizens.

With such a system, can false positives occur? Yes, but the technology has become so sophisticated that the chance of a false positive is minuscule. The judgment to be made is whether the cost of those false positives outweighs the increased level of security afforded the public.

No doubt, the lively discussion between those concerned about invasion of privacy and those focused on keeping the populace safe will continue. And that’s how it should be.

Identity Resolution Daily Links 2009-09-04

Friday, September 4th, 2009

[Post from Infoglide] Shell Games

“We’ve talked before about how some employers will dissolve a company, then re-form it with the same people but under a new name. The objective? Reduce payments to workers’ compensation programs, where premiums are based on the historical level of claims. Erase the history by forming a new company, and voila! Your premiums are now lower, but there’s a catch – doing that constitutes fraud and it’s illegal.”

OCDQ Blog: To Parse or Not To Parse

[Jim Harris] “Data matching often uses data standardization to prepare its input.  This allows for more direct and reliable comparisons of parsed sub-fields with standardized values, decreases the failure to match records because of data variations, and increases the probability of effective match results.”

kpvi.com: Idaho Falls Woman Arrested in Undercover Lottery Sting

“An undercover operative gave McKelley a decoy lottery ticket that McKelley thought was worth at least $100,000. She kept the ticket and took [it] to Boise to the Idaho Lottery Headquarters to collect. Police arrested her when she showed up and she now faces felony charges of presenting an illegally obtained lottery ticket.”

Initiate Blog: Data Hubs: Master Data Repository or Master Data Service?

“The current reality is that the concept of a data hub includes a much more active approach to data than just storage of a “golden record”. The data hub makes the best decisions on entity and relationship resolution by arbitrating the content of data in the source systems where the master data is created.”

Shell Games

Wednesday, September 2nd, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

We’ve talked before about how some employers will dissolve a company, then re-form it with the same people but under a new name. The objective? Reduce payments to workers’ compensation programs, where premiums are based on the historical level of claims. Erase the history by forming a new company, and voila! Your premiums are now lower, but there’s a catch – doing that constitutes fraud and it’s illegal.

Now I’ve just read a Computerworld article about what appears to be a similar scheme. In the largest H-1B fraud case ever brought by the federal government, prosecutors have filed charges against a New Jersey IT services company that recruits talent from overseas and obtains H-1B visas for them. On the surface, that sounds legal, right?

Yes, but you’re required to pay those imported workers at the prevailing wage rate of the state in which they’ll be working. The law won’t let you pay less than the prevailing rate because that would penalize U.S. citizens who could be hired to do the same job.

So this New Jersey firm allegedly decided to improve its profitability by creating shell firms in Iowa and obtaining the H-1B visas there, where the prevailing wage rate is significantly lower than in New Jersey. Oops – now there’s an 18-count indictment against them because they “have substantially deprived U.S. citizens of employment.”

So how would you automate the detection of similar situations? I confess I’m not familiar with exactly how this company was caught, but it seems like an obvious opportunity to apply entity resolution technology. In this case, you’d use identity resolution software (see IRE as an example) to connect to multiple data sources, such as a database of H-1B applicants in various states, a database of companies that are H-1B filers, income tax filings, and various corporate filings, and then let the software do the work of finding the hidden connections.
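As a rough illustration of “finding the hidden connections,” the sketch below groups company filings that share an officer or an address, either directly or through a chain of other filings, using a union-find structure. Everything here is hypothetical: the record layout, the field names `officers` and `addresses`, and the sample data are invented for illustration, and exact matching on lightly normalized strings is a crude stand-in for the fuzzy matching an identity resolution product would perform. It is not a description of how IRE works.

```python
from collections import defaultdict


def link_by_shared_attributes(records: list, keys: list) -> list:
    """Group records sharing any value under the given keys, transitively, via union-find."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        parent[find(x)] = find(y)

    first_seen = {}  # (field, normalized value) -> id of the first record holding it
    for r in records:
        for key in keys:
            for value in r[key]:
                token = (key, value.strip().lower())
                if token in first_seen:
                    union(r["id"], first_seen[token])
                else:
                    first_seen[token] = r["id"]

    groups = defaultdict(set)
    for rid in parent:
        groups[find(rid)].add(rid)
    return [g for g in groups.values() if len(g) > 1]  # only connected clusters


# Hypothetical filings: NJ-001 and IA-017 share an officer; IA-017 and IA-018 share an address.
records = [
    {"id": "NJ-001", "officers": ["Officer X"], "addresses": ["1 Elm St, Edison NJ"]},
    {"id": "IA-017", "officers": ["Officer X", "Officer Y"], "addresses": ["9 Oak Ave, Des Moines IA"]},
    {"id": "IA-018", "officers": ["Officer Z"], "addresses": ["9 Oak Ave, Des Moines IA"]},
    {"id": "TX-200", "officers": ["Officer W"], "addresses": ["5 Pine Rd, Austin TX"]},
]
print([sorted(g) for g in link_by_shared_attributes(records, ["officers", "addresses"])])
# [['IA-017', 'IA-018', 'NJ-001']]
```

The chain is the point: NJ-001 and IA-018 share nothing directly, yet they end up in the same cluster through IA-017.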

When this technique has been applied to workers’ comp fraud, where company officers have similarly formed shells or dissolved existing companies and started new ones, the results have been quite successful. It would be interesting to consider a similar solution with someone who has H-1B investigation experience.

If you’re out there, what are your thoughts?

