Archive for February, 2009

Identity Resolution Daily Links 2009-02-27

Friday, February 27th, 2009

By the Infoglide Team

[Post from Infoglide] Rules-Based and Probabilistic Entity Resolution

“If you’ve followed recent developments in the entity resolution market, including the recent re-positioning of existing vendors like Netrics and Initiate Systems, you may have heard discussion about the relative merits of rules-based entity resolution using attribute-specific analytics versus probabilistic entity resolution that uses mathematical analytics exclusively.”

[Post from Infoglide] Entity Extraction: The Flip Side of Entity Resolution

“Under our working definition of entity resolution as locating and merging references to the same entity, the last installment focused on the merge problem, and how matching is often used as a stand-in for ER.  Now let’s take a look at the locating problem.”

NY State Dept. of Labor: Monroe County Contractors Arrested on Charges of Fraudulently Misclassifying Employees

“‘I know times are tough, and employers are looking to cut costs wherever they can,’ said M. Patricia Smith, Commissioner of the Department of Labor. ‘But it is precisely because times are tough that employers must continue to obey the law. Workers misclassified as ‘independent contractors’ or paid off the books are not receiving protections they are entitled to - protections like unemployment insurance that are particularly critical, given today’s uncertain economy.’”

SearchSAP.com: Successful MDM strategy starts with finding broken processes, not technology

“MDM tends to come across like an infrastructure project or middleware — something that IT would sponsor, according to Dan Power, president of Hub Solution Designs Inc., an MDM consulting firm. But placing sponsorship with IT misses the point, he said.”

BeyeNetwork: The Impact of the Obama Healthcare Agenda on Business Intelligence

“There are several areas of the Obama-Biden plan that could have a significant impact on business intelligence if they come to fruition. The first is the intent to ‘invest in proven strategies to reduce preventable medical errors.’ First and foremost is wider adoption of electronic medical records (EMR).”

[Infoglide founder David Wheeler’s father Roger was an owner of World Jai Alai. Winter Hill Gang members James J. “Whitey” Bulger, Stephen Flemmi, and Johnny Martorano were indicted for his 1981 murder 20 years later in 2001.]  

Mercury News: Ex-FBI agent sentenced to 40 years in 1982 killing

“Former FBI agent John Connolly was sentenced Thursday to 40 years in prison for slipping information to Boston mobsters that led to the 1982 shooting death of a Miami gambling executive.”

Rules-Based and Probabilistic Entity Resolution

Thursday, February 26th, 2009

By Robert Barker, Infoglide Senior VP and Chief Marketing Officer

If you’ve followed recent developments in the entity resolution market, including the recent re-positioning of existing vendors like Netrics and Initiate Systems, you may have heard discussion about the relative merits of rules-based entity resolution using attribute-specific analytics versus probabilistic entity resolution that uses mathematical analytics exclusively. Facts can get distorted in the heat of discussion, so let’s examine a few of the arguments and then look at the facts:

  1. ARGUMENT: Rules-based systems can only yield binary answers; i.e., they require that attributes either match exactly or not.
     FACT: The best systems combine rules with similarity search technology, enabling them to compute the distance between attributes and make complex decisions based on those calculations. These hybrid solutions can actually run more efficiently while providing better results.

  2. ARGUMENT: Rules-based systems demand that all data sources be centralized and rationalized.
     FACT: Flexible systems do not require data sources to be centralized, warehoused, or have conforming schemas.

  3. ARGUMENT: Probabilistic-only entity resolution systems enable decision-making based on relative likelihood, while rules-based systems do not.
     FACT: It’s possible to combine the best of rules and probability to create effective identity resolution solutions that make decisions based on relative likelihood.
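To make the hybrid idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor’s product: the attribute names, the weights, the threshold, and the “identical SSN decides the case” rule are all assumptions, and the similarity measure is simply the standard library’s `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical per-attribute weights; a real deployment would tune these.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two attribute values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def weighted_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average similarity over attributes present in both records."""
    total = 0.0
    used = 0.0
    for attr, weight in WEIGHTS.items():
        if rec_a.get(attr) and rec_b.get(attr):
            total += weight * similarity(rec_a[attr], rec_b[attr])
            used += weight
    return total / used if used else 0.0


def resolve(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> tuple:
    """Apply a rule first, then fall back to a graded similarity decision."""
    # Assumed rule: an identical, non-empty SSN settles the question outright.
    if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
        return ("match", 1.0)
    # Otherwise decide on relative likelihood rather than exact equality.
    score = weighted_score(rec_a, rec_b)
    return ("match" if score >= threshold else "no match", score)
```

The rule short-circuits the comparison when a decisive attribute agrees, and everything else is scored on a sliding scale, so a rules-based system does not have to be binary.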

“One size fits all” doesn’t work well in many domains, and it unnecessarily constrains the development of entity resolution solutions. For example, suppose your solution needs to include automobile license plate numbers as one attribute to help resolve entities. Mathematical probability won’t detect that “13” is similar to “B” while an attribute-specific analytic quickly makes the connection.
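The license plate case can be sketched in a few lines of Python. The confusion table below is hypothetical (a real one would come from domain experts who know how plates get misread), and `plate_key` and `plates_similar` are names invented for this illustration.

```python
from difflib import SequenceMatcher

# Hypothetical table of character sequences that are easily mistaken for one
# another on a plate. Order matters: "13" is collapsed before single digits.
CONFUSABLE = [("13", "B"), ("0", "O"), ("1", "I"), ("5", "S")]


def plate_key(plate: str) -> str:
    """Reduce a plate to a canonical key that ignores look-alike characters."""
    key = "".join(ch for ch in plate.upper() if ch.isalnum())
    for seen, canonical in CONFUSABLE:
        key = key.replace(seen, canonical)
    return key


def plates_similar(a: str, b: str) -> bool:
    """Attribute-specific comparison: equal once look-alikes are collapsed."""
    return plate_key(a) == plate_key(b)


def generic_ratio(a: str, b: str) -> float:
    """A generic string similarity, shown only for contrast."""
    return SequenceMatcher(None, a, b).ratio()
```

With this table, `plates_similar("A13 C42", "ABC 42")` treats the two plates as the same, while `generic_ratio` on the same pair sees only partially overlapping strings. A collapse this aggressive would also merge plates that legitimately differ, which is exactly the kind of tradeoff a domain expert would need to tune.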

So what’s the takeaway? Be skeptical when you hear that rules and probability don’t mix. More importantly, question why you have to choose one or the other when you could have both.

Entity Extraction: The Flip Side of Entity Resolution

Wednesday, February 25th, 2009

By John Talburt, PhD, CDMP, Director of the Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ) at the University of Arkansas at Little Rock

Under our working definition of entity resolution as locating and merging references to the same entity, the last installment focused on the merge problem, and how matching is often used as a stand-in for ER.  Now let’s take a look at the locating problem.

First we should note that information comes to us in two forms, structured and unstructured.  The traditional world of IT has been built around structured information based on the discipline of relational database schemas.  In essence, data is structured if it is ready to be loaded into a relational database, i.e., all of the entities and their attributes are clearly delimited or tagged in a way that a computer can correctly read the entire data set by following one simple, repeating pattern.  In the good ole days, the flat-file format gave us this by requiring that every record have a fixed length and every attribute occupy a fixed position in the record.  Inspired by the spreadsheet paradigm, a friendlier version came along that required only that all of the attributes be presented together in a fixed order, each separated from the next by a specially designated character, the delimiter.  Now XML has brought us yet another discipline: explicitly tagging the start and end of records and attributes with a consistent naming convention.
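For readers who like to see it in code, here is a small Python sketch of the same record in all three structured forms. The field names, column positions, and sample values are made up for illustration; only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ("name", "city", "zip")

# Fixed-width: every attribute occupies a fixed position (hypothetical layout).
FIXED_LAYOUT = {"name": (0, 12), "city": (12, 24), "zip": (24, 29)}
fixed_record = "Jane Doe".ljust(12) + "Little Rock".ljust(12) + "72204"

# Delimited: attributes in a fixed order, separated by a designated character.
delimited_record = "Jane Doe|Little Rock|72204"

# XML: the start and end of the record and each attribute are tagged.
xml_record = (
    "<person><name>Jane Doe</name><city>Little Rock</city>"
    "<zip>72204</zip></person>"
)


def parse_fixed(line: str) -> dict:
    """Slice each attribute out of its fixed column range."""
    return {k: line[start:end].strip() for k, (start, end) in FIXED_LAYOUT.items()}


def parse_delimited(line: str, delimiter: str = "|") -> dict:
    """Split on the delimiter and pair values with the known field order."""
    row = next(csv.reader(io.StringIO(line), delimiter=delimiter))
    return dict(zip(FIELDS, row))


def parse_xml(text: str) -> dict:
    """Read each tagged child element of the record."""
    root = ET.fromstring(text)
    return {child.tag: (child.text or "").strip() for child in root}
```

All three parsers recover the same entity and attributes, and each does it by following one simple, repeating pattern, which is what makes the data structured.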

So in the structured world, locating is easy: you just follow the pattern.  The problem is that we are now beginning to realize that there is a tremendous amount of information in unstructured formats such as free-form documents, photos, videos, audio files, sensor data, and other formats that are not easily mapped into an entity-attribute schema.  Even if we just focus on information encoded in character (text) format, the total amount of unstructured information in most organizations often exceeds the amount of structured information by a considerable amount.  What’s more, we now realize that some of this information could be important, i.e., that processes like customer relationship management (CRM) could be transformed if the company only knew what its customers were saying in their emails to the company or in the comments they gave to telemarketers or technical support personnel who typed those comments into a free-form notes field.

So how did we end up with so much unstructured information? Did good information go bad?  No, the reason is that the information age operates on four channels – people to computers, computers to people, computers to computers, and people to people – and it is the last of these that generates the unstructured information.  Person-to-person communication is inherently complex and often carries a tremendous amount of implicit and explicit context that people understand, but computers don’t.

Early in my career, I worked with a professor on the problem of disambiguation of homographs using thesauri (a fancy way of asking if a computer can understand the difference in meaning between two words that are spelled the same, but mean different things, just by looking at the synonyms of the words around them, e.g. “I can open this can.”)  His favorite test was “Time flies like an arrow, but fruit flies like a banana.”

But getting back on topic, if you want to resolve whether references are to the same or different entities, you must first have the references.  So if the information sources are unstructured, the locating side of entity resolution is about finding the entity references.  This process is variously referred to as “named entity recognition”, “entity identification”, or “entity extraction”.  In the next installment we will discuss some of the strategies for entity extraction from unstructured text documents.

Identity Resolution Daily Links 2009-02-23

Monday, February 23rd, 2009

By the Infoglide Team

The Peterborough Examiner: Uncovering lottery fraud

“In 2007, two Parisian tobacconists admitted they had cheated a customer out of a C$42 million lottery win. The Paris shop keepers had told an accountant customer his tickets were not winners, but then handed in the tickets themselves, through a third party. It wasn’t until a bank noticed odd transactions by the fake winner, that investigators were called in.”

Gartner: Of Data Quality and its role within MDM; and defining master data (again)

[Andrew White] “On a lighter note, this week was focused on briefings from vendors on their MDM for Product Data offerings and strategies.  While all that was going on, Gartner analysts were merrily chatting along about what is the difference between MDM and metadata management; and master data and metadata; and finally entity resolution.”

WorkersCompensation.com: Texas Mutual Announces a Triple Scoop of Double-Dipping Convictions

“Double-dipping occurs when claimants collect workers’ comp benefits for being too injured to work when they are, in fact, gainfully employed. Texas law requires claimants to contact their workers’ comp carrier when they return to work. Left unchecked, double-dipping and other workers’ comp fraud can lead to higher premiums for all Texas employers.”

Chico Enterprise Record: Second Chico Lotto sting nets mini-mart owner

“Lotto officials allege Dosanjh told decoys posing as players they held worthless Scratcher tickets, when some of them should actually have scanned as $1,000 winners. Lotto law enforcement spokesman Bill Hertoghe said Dosanjh passed at least one of the tickets to a friend, who attempted to redeem it. Officials said the prize money would be forthcoming, but launched an investigation that eventually resulted in a $15,000 arrest warrant for Dosanjh.”

Identity Resolution Daily Links 2009-02-20

Friday, February 20th, 2009

[Post from Infoglide] The Human Element in Identity Resolution

We’ve written quite a few posts here on the subject of identity resolution’s application to a broad range of problems that include terrorism, insurance fraud, crime, lottery fraud, sexual predators, workers comp employer fraud, and retail returns fraud. What we haven’t discussed very much is the relationship between the technology and the human beings that employ it.

Boston Globe: Woman to be sentenced in asbestos case

“Deleon ‘cheated the system’ in two ways to enrich herself, according to the government’s case. Under her ownership, Environmental Compliance Training issued false asbestos removal training certificates and lied about it to the state. She also evaded payroll taxes and workers’ compensation insurance premiums by paying hundreds of employees of Methuen Abatement Staffing under the table. The company had a gross unreported payroll of $4.6 million from 2002 to 2006, according to a government document introduced at the trial.”

New Mexico Independent: Homeland security ‘fusion centers’ are working, but concerns abound

“The federal assessment of the nation’s fusion centers — which borrows heavily from earlier reports by such internal watchdogs as the Congressional Research Service (CRS) and General Accountability Office (GAO) — lists a few privacy, transparency and oversight concerns about the fusion centers.”

Wall Street Journal: Tips for TSA to Make Flying Safer, Easier

“Some experts suggest that the TSA cut back on the air marshal program, which puts law-enforcement agents on some flights, and shift spending to more effective security measures. Experts also want to see major changes in the current Registered Traveler program to further speed up security procedures for frequent travelers and focus resources on travelers who haven’t undergone background checks. They also want to see more variation to today’s predictable screening so bad guys don’t know exactly how to circumvent security.”

Gartner: Can “single view” of master data be achieved without an MDM technology?

[Andrew White] “Certainly users have been trying to achieve ‘single view’ for many years, before the phrase master data management was coined.  The problem of trying to maintain a semantically consistent definition of master data across the business has been a long standing desire for most firms.  It is because business (and to a great extent, IT also) has grown to be so complex, that since 2000 many firms have begun to look to specific tools to help.”

WorkersCompensation.com: Chenango Man Charged As Fraud In Fish Story

“Investigators from NYSIF’s Division of Confidential Investigations said Mr. Panus was receiving workers’ compensation payments for a work-related back injury that occurred in 1988. The investigation, conducted in cooperation with the New York State Insurance Department Frauds Bureau and the Office of the Workers’ Compensation Fraud Inspector General, found that Mr. Panus was allegedly self-employed as the owner of Ponderosa Fish Farm while receiving benefits totaling $66,100.”

The Human Element in Identity Resolution

Wednesday, February 18th, 2009

By Robert Barker, Infoglide Senior VP and Chief Marketing Officer

We’ve written quite a few posts here on the subject of identity resolution’s application to a broad range of problems that include terrorism, insurance fraud, crime, lottery fraud, sexual predators, workers comp employer fraud, and retail returns fraud. What we haven’t discussed very much is the relationship between the technology and the human beings that employ it.

We software marketers are sometimes tempted to make it sound as though our products solve problems automatically. The truth is that identity resolution software performs tasks that humans could do, but it does them at a level of speed and precision that significantly enhances the results accomplished through those tasks. In order for the software to achieve excellent results, however, human judgment is required both in implementing the software and in applying the results.

The specifics of a particular problem differ markedly, and every solution is different. A person of interest in airline passenger screening has very different characteristics from a person of interest in workers compensation fraud, for example. Solutions differ even within a single problem domain, e.g. Nordstrom and Walmart have very different philosophies for merchandise returns.

In simpler data quality applications, default configurations can address many problems, but in identity resolution, a little tuning by experts greatly increases the solution’s value.  A domain expert may not understand the technology, but they understand their problem, industry, application, and company. And because of their depth of understanding of their domain, they can tell great results from good results in a heartbeat.

For maximum benefit, human domain experts work with technology experts to tune the software during implementation so that it applies the same kind of “judgment” the experts themselves would use to resolve multiple identities, uncover hidden relationships, and minimize false positives and false negatives. Technology’s critical role is to automate the process of sifting through the data to find likely matches and non-obvious relationships and to prioritize the cases that require human intervention so that finite human resources can focus on the most important things first.
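One simple way to picture that division of labor is a triage step with two tunable thresholds. The Python sketch below is a hypothetical illustration: the `Candidate` type, the threshold values, and the choice to rank the review queue by score are assumptions that domain experts would adjust for their own problem.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    case_id: str
    score: float  # similarity score in 0..1 from some upstream matcher


# Hypothetical thresholds; these are the knobs domain experts would tune.
AUTO_MATCH = 0.95
AUTO_CLEAR = 0.40


def triage(candidates):
    """Split candidates into automatic matches, a review queue, and cleared cases."""
    auto, review, cleared = [], [], []
    for cand in candidates:
        if cand.score >= AUTO_MATCH:
            auto.append(cand)
        elif cand.score < AUTO_CLEAR:
            cleared.append(cand)
        else:
            review.append(cand)
    # Assumed priority: the strongest uncertain matches go to a person first.
    review.sort(key=lambda cand: cand.score, reverse=True)
    return auto, review, cleared
```

Raising `AUTO_MATCH` or lowering `AUTO_CLEAR` sends more cases to people and fewer to automation, so the thresholds express how much risk of false positives and false negatives a particular organization will accept.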

While it’s critical to have software that can produce results right “off the shelf,” it is the domain expertise coupled with the technology expertise that creates a solution that is perfectly matched to the needs of a particular industry, application, and company.

Identity Resolution Daily Links 2009-02-13

Friday, February 13th, 2009

[Post from Infoglide] Shocking Behavior

“This will rock you to your core: bad guys who are the targets of law enforcement investigations try very hard to hide their identities whenever possible. OK, so maybe that isn’t so shocking. There are very few of us who have not heard the acronym ‘AKA’ (Also Known As). We associate such terms – AKA, alias, assumed name, handle, etc. – as signaling devious intent. Formal studies have shown that nearly a third of criminals have used false names for the purpose of intentional deception.”

nextgov: ‘Fusion center’ privacy fears persist

“As the program matures, the DHS Privacy Office anticipates discovering new privacy challenges that need to be addressed and the PIA will be updated whenever necessary, the document said. Additionally, the Privacy Office called for ‘a regular and ongoing examination of privacy issues within the fusion centers.’”

Toronto City News: OLG Bans Staff From Buying Tickets

“The move follows scathing criticism by ombudsman Andre Marin, who also suggested including lottery retailers in the ban. ‘If retailers, insiders, prove they are an ungovernable lot, then we should ban them from playing the lottery,’ he noted.”

North Country Gazette: Charter Bus Operator Accused Of Workers Comp Fraud

“According to investigators, DiPaolo failed to provide workers’ compensation insurance for the 25 employees of his school and charter bus service between 2006 and 2008, thereby avoiding payment of an estimated $130,000 in premiums.”

CIO: Data-Management Danger: Less Than Half of MDM Plans Are Effective

“So while many companies acknowledge their data problems and resultant poor decision-making processes, less than one-third of businesses have taken steps to remedy the situation with a data-governance program, according to the results. Thirteen percent of respondents said they were unclear as to what data governance was.”

Destination CRM: Megavendors Look Smart in Gartner Magic Quadrant for Business Intelligence

“The report calls 2008 ‘a year of transition’ following ‘the vendor merger and acquisition turbulence of 2007’ — a reference to the major acquisitions of Business Objects by SAP, Cognos by IBM, and Hyperion Solutions by Oracle.”

NOTE: Our next post will be on Wednesday, February 18. Happy President’s Day!

Shocking Behavior

Wednesday, February 11th, 2009

By Charles Clendenen, Infoglide Director of Professional Services

This will rock you to your core: bad guys who are the targets of law enforcement investigations try very hard to hide their identities whenever possible.

OK, so maybe that isn’t so shocking. There are very few of us who have not heard the acronym “AKA” (Also Known As). We associate such terms – AKA, alias, assumed name, handle, etc. – with devious intent. Formal studies have shown that nearly a third of criminals have used false names for the purpose of intentional deception. You might assume criminals prefer to adopt an entirely new and different identity with a different value for every attribute, e.g., name, address, phone, SSN, and so forth. That happens more in the movies than in real life.

When you are living a lie, it can be hard to keep your stories straight. It’s difficult and time-consuming to establish a new identity completely from scratch. When you open a new account at a bank or apply for a credit card, for example, a lot of picky little questions get asked, and it takes a lot of attention to detail to change every attribute of your identity. Criminals can get lazy and may cut corners and look for the path of least resistance to get what they want.

So people using fraudulent identities, knowing that their behavior is dishonest and trying to hide their activities in various ways, do so by (1) making transactions look as normal as possible, (2) obfuscating identity information, and (3) hiding their relationships with associates. Rather than concoct an entirely new identity, some learn how to obscure their identity from the automated systems that track almost everyone now. They change small details of their identity, such as transposing numbers and letters in their address or slightly changing a phone number.

So, how does law enforcement deal with “dirty data” like this when searching with the assistance of technology? It may seem counter-intuitive, but in a typical database search, the more attributes you use in your search, the more records you eliminate from the search results. Try it yourself with your favorite Web search engine. If you are too specific and use too many terms in your search, you will get few results or none at all. With identity resolution and similarity search, you will actually get more results with the most useful results weighted and ranked by proximity to your search terms. Similarity search techniques are also invaluable for extending relationship detection beyond the obvious to the non-obvious.
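Here is a small Python sketch of that contrast. The records are invented, and the similarity measure is simply the standard library’s `difflib.SequenceMatcher` averaged across attributes, which is an assumption for illustration rather than how any particular identity resolution product scores matches.

```python
from difflib import SequenceMatcher

# Made-up records; the second has transposed digits in address and phone.
RECORDS = [
    {"name": "Robert Allen", "address": "1432 Elm Street", "phone": "512-555-0147"},
    {"name": "Robert Allen", "address": "1423 Elm Street", "phone": "512-555-0174"},
    {"name": "Susan Park", "address": "77 Lake Road", "phone": "512-555-0199"},
]


def exact_search(query: dict) -> list:
    """Typical database search: every attribute in the query must match exactly."""
    return [r for r in RECORDS if all(r.get(k) == v for k, v in query.items())]


def similarity_search(query: dict) -> list:
    """Score every record against the query and rank by closeness."""
    if not query:
        return []
    scored = []
    for record in RECORDS:
        ratios = [
            SequenceMatcher(None, value.lower(), record.get(attr, "").lower()).ratio()
            for attr, value in query.items()
        ]
        scored.append((sum(ratios) / len(ratios), record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Searching on name alone, `exact_search` returns both Robert Allen records; adding the address and phone number eliminates the one with transposed digits. `similarity_search` with the same three attributes keeps every record and ranks the altered one just behind the exact hit, where an investigator can see it.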

Just as in other endeavors where authorities and investigators are using technology to find bad guys (e.g. insurance claims processing, airline passenger screening, and retail fraud screening), an emerging technology called identity resolution can sort out true identities and find hidden relationships within previously obscured and obfuscated criminal investigation data sources.

If you have a story to share about the challenges of breaking through to find who’s who and who knows whom in the law enforcement world, drop us a comment.

Correction

Wednesday, February 11th, 2009

In a piece entitled “Mistaken Identity Resolution Part V: Identity Resolution vs. Data Quality” that was posted May 28th of last year, we said that a “Gartner study done several years ago estimated that poor quality customer data costs U.S. businesses an estimated $611 billion dollars a year.” We thank Ted Friedman and Mark Beyer at Gartner for recently pointing out that the estimate actually came from a 2002 TDWI study by Wayne Eckerson entitled “Data Quality and the Bottom Line.”

Identity Resolution Daily Links 2009-02-09

Monday, February 9th, 2009

By the Infoglide Team

CBC News: Insider lottery wins almost double earlier estimate

“Also, the analysis found 50 instances that suggested an insider retained a winning ticket and claimed it for himself or herself. This misreporting occurs when a lottery player has two winning ‘free’ tickets, but the retailer reports only one ticket as a winner.”

Bastrop Daily Enterprise: La. Workforce Commission Announces Workers’ Compensation Fraud Program

“LWC Executive Director Tim Barfield and Office of Workers’ Compensation Administration Director Chris Broadwater were joined by David Caldwell, the deputy director of the Office of the Attorney General’s Criminal Division, in announcing the program. ‘All employers doing business in Louisiana are required to provide workers’ compensation coverage for their employees. Businesses that are not covering their employees are shortchanging those employees and driving up costs for other businesses,’ Barfield said.”

Forrester: Is BI Recession-Proof, Or Are We Just Bracing For The Next Shoe To Drop?

[James Kobielus] “What’s going on here? Is the BI industry recession proof, or is the next soft-economy shoe–or heavy hammer–poised to drop on this segment’s unsuspecting heads?”

FederalComputerWeek: House again passes traveler redress bill

“Problems with the name-based watch list have been scrutinized in congressional hearings and received extensive press coverage. However, DHS officials have said the Secure Flight Program will reduce mismatches.”

Risk & Insurance: A Simple Solution to a Multibillion-Dollar Problem

“In August 2007, the Employers’ Fraud Task Force rolled out a draft of O’Brien’s form to get the impressions of the industry. The EFTF is a private agency based in California that has been fighting workers’ compensation fraud for 10 years. Its members consist of a group of large, self-insured companies including Disney, Safeway and Warner Brothers. Reactions from EFTF members on the O’Brien form were very positive.”

