
Archive for March, 2009

Identity Resolution Daily Links 2009-03-30

Monday, March 30th, 2009

By the Infoglide Team

data quality PRO: Identifying Duplicate Customers (Part 1)

[Jim Harris] “What is sometimes overlooked is that although technology provides the solution, what is being solved is a business problem. Technology sometimes carries with it a dangerous conceit – that what works in the laboratory and the engineering department will work in the board room and the accounting department, that what is true for the mathematician and the computer scientist will be true for the business analyst and the data steward.”

Correction Officers Going Wrong: California Correctional Officer Arrested for Fraud

“Each insurance fraud count carries up to five years in state prison. Also, California workers’ compensation fraud statutes require restitution of double the monetary amount of the fraud; the suspected loss on this case is more than $150,000, not including more than $1.6 million in disability retirement from the California Public Employee Retirement System (CalPERS) that would have been paid out on this suspect claim.”

TwinCities.com: What if your lottery sales clerk said your ticket was a loser … and lied?

“‘(We) really need our retailers to be honest and to have their employees do it right every time,’ said state lottery director Clint Harris. The stings took place last December and January at 186 randomly selected metro stores, Harris said. Undercover agents would ask clerks to verify the specially constructed crossword game scratch-offs as winners. The prizes ranged from $7,000 to $21,000.”

Register Herald: Fusion center helps fight war on crime

“Kirk is handling a new mission in life, directing West Virginia’s fledgling fusion center, a new tack in the war on terrorism and crime in general. Put simply, it acts as a clearinghouse so data can be analyzed and the proper law enforcement agency put on notice for immediate, or long-range, investigations.”

Dashboard INSIGHT: The increasing convergence of MDM and data governance

“The concept of data governance is simple.  The Data Governance Institute (datagovernance.com) defines it as ‘a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.’”

Identity Resolution Daily Links 2009-03-27

Friday, March 27th, 2009

[Post from Infoglide] The Myth of Matching: Why We Need Entity Resolution

“In a previous post I talked about the two sides of entity resolution - locating and merging.  Before continuing the discussion on entity extraction, let’s briefly revisit the issue of merging, in particular the popular misconception that matching or record de-duplication is the same as entity resolution.  Matching is a necessary part of entity resolution, but it is not sufficient.”

WorkersCompensation.com: Corrections Officer Charged With Workers’ Comp. Fraud

“NYSIF DCI conducted the investigation in cooperation with the New York State Insurance Department Frauds Bureau, the Office of the Workers’ Compensation Board Fraud Inspector General and the New York State Department of Corrections. Investigators alleged that Mr. Fetzer collected $31,052 to which he was not entitled.”

USA Today: 80,000 on TSA’s ‘cleared’ fliers list

“The additions to the Transportation Security Administration’s ‘cleared list’ reflect an influx of requests from people asking to be removed from the watch list. The watch list database has expanded 32% since 2007, to more than 1 million entries. The cleared list has grown because about 99% of the fliers seeking to be removed from the watch list were never on it…”

Service Oriented Blog: Is having SOA and Master Data Management at the same time a form of overkill?

“SOA in and of itself holds little value to an organization unless it provides the capability to open up information to the enterprise. As is the case with SOA, successful MDM is a silo-breaker, invoking collaboration across the enterprise. MDM helps assure that the information populating SOA-based services is accurate, timely, and consistent.”

The Myth of Matching: Why We Need Entity Resolution

Wednesday, March 25th, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In a previous post I talked about the two sides of entity resolution - locating and merging.  Before continuing the discussion on entity extraction, let’s briefly revisit the issue of merging, in particular the popular misconception that matching or record de-duplication is the same as entity resolution.  Matching is a necessary part of entity resolution, but it is not sufficient.

Record matching as a proxy for entity resolution is based on the premise that “if two records share the same (or almost the same) set of identity attributes (i.e. they match), then they represent the same entity.”  However, there are two problems with this assumption:

(1) The set of identity attributes being used is not always sufficient to differentiate among all of the entity references.

(2) The converse of the statement is not true: if two records represent the same entity, then they do not necessarily have the same set of identity attributes (i.e., they may not match).
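One way to state the two problems precisely (the notation is introduced here for illustration and is not from the post): write M(r1, r2) when two records match on the chosen identity attributes, and E(r1, r2) when they refer to the same entity. Matching as a proxy for entity resolution assumes the first line below; problem (1) is a pair of records that breaks that implication, and problem (2) is a pair that breaks its converse.

```latex
\begin{aligned}
&\text{Assumed premise:} && M(r_1, r_2) \Rightarrow E(r_1, r_2) \\
&\text{Problem (1), false positive:} && \exists\, r_1, r_2 :\; M(r_1, r_2) \land \lnot E(r_1, r_2) \\
&\text{Problem (2), false negative:} && \exists\, r_1, r_2 :\; E(r_1, r_2) \land \lnot M(r_1, r_2)
\end{aligned}
```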

Anyone who has worked with name and address information has been stung by a case where “John Doe, 123 Oak St” was matched with “John Doe, 123 Oak St” and the records were linked as references to the same entity, only to discover later that one record referred to John Doe, Sr. and the other to John Doe, Jr.: different people.

The problem here is the absence of the name suffix attribute, or of other attributes such as age, that would allow us to disambiguate between these references; the result is a false positive resolution. The collection of identity attributes must be sufficient to differentiate entities within a specific context. For example, a simple email prefix built from initials and last name may be a unique identifier in your company, but it is likely to collide with another email identifier in a larger context such as Yahoo Mail.
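A minimal Python sketch of the John Doe, Sr./Jr. case, assuming a hypothetical record type with name, address and an optional suffix field. A rule that compares only name and address links the two records; adding the suffix attribute, and refusing the match when both suffixes are present and differ, separates them. It illustrates attribute sufficiency only and is not a production matcher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    name: str
    address: str
    suffix: Optional[str] = None  # e.g. "Sr." or "Jr."; None when the source did not capture it


def normalize(value: str) -> str:
    # Lower-case and collapse whitespace so trivial formatting differences do not block a match.
    return " ".join(value.lower().split())


def match_name_address(a: Record, b: Record) -> bool:
    # Naive rule: same name and same address => treated as the same entity.
    return normalize(a.name) == normalize(b.name) and normalize(a.address) == normalize(b.address)


def match_with_suffix(a: Record, b: Record) -> bool:
    # Same rule plus the suffix attribute: two known, different suffixes veto the match.
    if not match_name_address(a, b):
        return False
    if a.suffix is not None and b.suffix is not None:
        return normalize(a.suffix) == normalize(b.suffix)
    return True


senior = Record("John Doe", "123 Oak St", suffix="Sr.")
junior = Record("John Doe", "123 Oak St", suffix="Jr.")

print(match_name_address(senior, junior))  # True: the false positive
print(match_with_suffix(senior, junior))   # False: the suffix disambiguates
```

The second rule still matches when either suffix is missing, which is the residual ambiguity described above: the attributes you have may not be sufficient in every context.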

Even if the set of identity attributes is large enough to avoid false positives, the larger problem with matching as a surrogate for entity resolution is that it produces false negatives. For example, “Mary Doe, 234 Elm St” and “Mary Smith, 456 Pine St” do not match, but does that mean they are not references to the same entity? It could very well be that Mary Doe married John Smith and moved to his house at 456 Pine St.

The false negative problem is the more difficult one to solve. In the best case, it is a matter of updating our entity view with readily available information. In the worst case, it is a deliberate attempt to conceal a connection or collaboration, which can be much harder to detect. In any case, it is an area of active research and development.

Currently there are two primary approaches to solving the false negative problem. The first is to enlarge the scope of identity information in an attempt to ensure there will be a path connecting any two references to the same entity. The second is to locate and save associative information among the entities of interest to build explicit declarations of connection. Both approaches have their advantages and disadvantages. In the next post I will discuss both of them in more detail.
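The first approach can be pictured as transitive linking: if any shared identifier links two references, then one extra record carrying both Mary Doe's and Mary Smith's identifiers creates a path between them. The second approach corresponds to an explicit, stored assertion that two references belong together. The Python sketch below uses a union-find (disjoint-set) structure for both. The technique, record ids, phone number and email address are hypothetical choices for illustration, not a description of any particular product. Names and addresses are deliberately not used as link keys, since the John Doe example shows how that invites false positives.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure: each reference starts in its own cluster."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving keeps trees shallow
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve(references, assertions=()):
    """references: {ref_id: set of identifiers}; assertions: explicit (ref_id, ref_id) links."""
    uf = UnionFind()
    first_holder = {}  # identifier -> first reference seen carrying it
    for ref_id, identifiers in references.items():
        uf.find(ref_id)  # register the reference even if it shares nothing
        for ident in identifiers:
            if ident in first_holder:
                uf.union(first_holder[ident], ref_id)  # approach 1: shared identity information
            else:
                first_holder[ident] = ref_id
    for a, b in assertions:
        uf.union(a, b)  # approach 2: stored associative declaration
    clusters = defaultdict(set)
    for ref_id in references:
        clusters[uf.find(ref_id)].add(ref_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers only.
refs = {
    "mary_doe": {"phone:555-0101"},
    "mary_smith": {"email:msmith@example.com"},
}
print(resolve(refs))  # [['mary_doe'], ['mary_smith']]: a false negative

# Approach 1: a (hypothetical) later record that carries both identifiers bridges the two.
expanded = dict(refs, account_update={"phone:555-0101", "email:msmith@example.com"})
print(resolve(expanded))  # [['account_update', 'mary_doe', 'mary_smith']]

# Approach 2: an explicit, stored declaration of connection (e.g. from a marriage record).
print(resolve(refs, assertions=[("mary_doe", "mary_smith")]))  # [['mary_doe', 'mary_smith']]
```

Transitive linking is powerful but unforgiving: one bad shared identifier merges two clusters, which is one reason each approach carries trade-offs.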

Identity Resolution Daily Links 2009-03-23

Monday, March 23rd, 2009

By the Infoglide Team

Vermont News Guy: When Is a Worker Not an Employee?

“The practice - scorned as ‘1099ing,’ by construction union officials (for the Internal Revenue Service form that freelance workers fill out) - short-changes Worker Compensation, Unemployment Insurance and Social Security funds. It also ‘creates an unlevel playing field,’ in the words of Vermont Labor Commissioner Patricia Moulton Powden. Businesses that play by the rules can be underbid by their competitors who do not. The specific reason is the company, GNPB/Kal-Vin, which sometimes goes by only one or the other of those names, and which is known by contractors, union leaders, and government officials as a company with a spotty labor law record.”

Forrester Blog: Lean Information Management Strategies For Lean Times

[James Kobielus] “Many organizations struggle to gain control over information infrastructures that have become too bloated, rigid, and slow to realign with new business drivers. Lean information management practices are essential for corporate survival. They are far more than belt-tightening exercises. They also help you build analytic muscle for excelling in any business environment.”

Chicago Daily Herald: State’s new fraud unit targets workers’ comp abusers

“Every employer in Illinois is required by law to have workers’ compensation insurance. The amount a company pays for this insurance depends on the business and - just like with car insurance - it will go up based on the number of claims filed. For this reason, workers’ compensation fraud is not committed only by employees, but by medical providers and employers trying to avoid pricey premiums and payouts. They do this in a variety of ways, said Michael McRaith, director of insurance for the Illinois Department of Financial and Professional Regulation.”

Identity Resolution Daily Links 2009-03-20

Friday, March 20th, 2009

[Post from Infoglide] Matching – MDM’s “Secret Sauce”

[Dan Power, Hub Solution Designs] “Few areas in master data management (MDM) are as critical as identity resolution. Just yesterday, I was working with a client on a matching issue where their customer (a car dealership) was matched to a veterinary clinic because the business names both contained the city and the client had somehow entered the address of the vet clinic in their customer record.”

Risk & Insurance: California Insurance commissioner announces $1 million grant to fight comp fraud

“After a meeting of the state’s Advisory Task Force on Insurance Fraud’s Blue Ribbon Review Committee last year, Poizner announced the implementation [of] several actions to help reduce fraudulent claims, including the creation of a fusion center for insurance fraud investigations so law enforcement can share information more efficiently and quickly to identify emerging trends and crime patterns.”

BeyeNETWORK: Can Enterprise Data Warehousing and Master Data Management Projects Survive the Recession?

“Ambitious IT projects in areas such as enterprise data warehousing (EDW) and master data management (MDM) are likely to suffer as CIOs focus more on reducing IT costs, than leveraging IT to help the business fight the recession. As I noted in a recent blog,  the solutions that will have the most impact in 2009 will be those that offer quick and low-cost approaches that help organizations reduce costs and enable business users to become more productive and self-sufficient.”

Peter Greenberg Worldwide: Additional TSA Security Measures: Progressive or Oppressive?

“Dubbed ‘Secure Flight,’ the new program will require anyone who buys a plane ticket to give their birth date and gender along with their name when they make a reservation. The information will then be cross-referenced against various government ‘watch lists’ of potential terrorists, and only people who don’t match will be cleared to fly. Authorities claim the new procedure will improve the quality of the ‘no fly’ list and prevent misidentifications from occurring.”

Matching – MDM’s “Secret Sauce”

Wednesday, March 18th, 2009

By Dan Power, President and Founder, Hub Solution Designs

Few areas in master data management (MDM) are as critical as identity resolution. Just yesterday, I was working with a client on a matching issue where their customer (a car dealership) was matched to a veterinary clinic because the business names both contained the city and the client had somehow entered the address of the vet clinic in their customer record.

This situation (a “false positive” if ever there was one) is far too common. While the current generation of MDM platforms has come a long way in the last five years, identity resolution is one of the most difficult problems to solve, especially when both your source data and the hub or referential source you’re matching to have data quality issues.
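To see how a shared city name can inflate a match score, here is a hypothetical Python sketch that combines token-overlap (Jaccard) similarity on business names with an exact address comparison. The business names, address, weights and 0.55 threshold are invented for illustration. Because the customer record carries the clinic's address, the naive score crosses the threshold; ignoring the city token when comparing names pulls it back below.

```python
import re


def tokens(name, ignore=frozenset()):
    # Lower-case word tokens, minus any tokens configured to be ignored.
    return set(re.findall(r"[a-z0-9]+", name.lower())) - set(ignore)


def jaccard(a, b):
    # Size of the intersection over size of the union; 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_score(rec_a, rec_b, ignore=frozenset(), name_weight=0.5, address_weight=0.5):
    name_sim = jaccard(tokens(rec_a["name"], ignore), tokens(rec_b["name"], ignore))
    same_address = rec_a["address"].strip().lower() == rec_b["address"].strip().lower()
    return name_weight * name_sim + address_weight * (1.0 if same_address else 0.0)


THRESHOLD = 0.55  # hypothetical match cut-off

customer = {"name": "Springfield Auto Sales", "address": "100 Main St"}  # address mis-keyed
vet_clinic = {"name": "Springfield Animal Clinic", "address": "100 Main St"}

naive = match_score(customer, vet_clinic)
tuned = match_score(customer, vet_clinic, ignore={"springfield"})
print(round(naive, 2), naive >= THRESHOLD)  # 0.6 True: false positive
print(round(tuned, 2), tuned >= THRESHOLD)  # 0.5 False
```

Ignoring the city token fixes only half of this story; the mis-keyed address is a source data quality problem that no scoring rule can fully correct.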

There are several times in a typical MDM project’s life cycle when matching is critical:

•    the initial load of data from the first source system into the hub,

•    every subsequent load of additional source systems being brought into the hub,

•    ongoing data stewardship looking for unmerged duplicate records,

•    providing a robust “New Customer” service to the enterprise, searching for existing records before allowing new ones to be created, and

•    searching in general, bringing back the right number of matches for the user’s criteria.
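The “New Customer” service in the list above amounts to search-before-create. A minimal Python sketch, assuming a hypothetical in-memory hub, postal code as a cheap blocking key, and the standard library's difflib similarity ratio with an arbitrary 0.85 cut-off (real MDM hubs use far richer matching):

```python
import difflib
import itertools
import re


def normalize(name):
    # Lower-case, replace punctuation with spaces, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", name.lower()).split())


class CustomerHub:
    def __init__(self, threshold=0.85):
        self.records = {}  # id -> (normalized name, postal code)
        self.threshold = threshold
        self._ids = itertools.count(1)

    def search(self, name, postal_code):
        """Return ids of existing records in the same postal code with a similar name."""
        target = normalize(name)
        hits = []
        for rec_id, (rec_name, rec_zip) in self.records.items():
            if rec_zip != postal_code:
                continue  # blocking: only compare within the same postal code
            if difflib.SequenceMatcher(None, target, rec_name).ratio() >= self.threshold:
                hits.append(rec_id)
        return hits

    def create_customer(self, name, postal_code):
        """Search first; create a new record only when nothing similar exists."""
        existing = self.search(name, postal_code)
        if existing:
            return existing, False
        rec_id = next(self._ids)
        self.records[rec_id] = (normalize(name), postal_code)
        return [rec_id], True


hub = CustomerHub()
first = hub.create_customer("Acme Widgets, Inc.", "72201")
second = hub.create_customer("ACME Widget Inc", "72201")
third = hub.create_customer("Apex Tools LLC", "72201")
print(first, second, third)  # ([1], True) ([1], False) ([2], True)
```

Blocking on postal code keeps the search cheap but also means a customer who moved will not be found, a small instance of the false negative problem.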

All of these are critical functions for an MDM system, and all depend heavily on how good your hub’s matching / identity resolution capabilities are.

And once you’ve solved the identity resolution question (“who’s who”) at the “entity” level (typically, either an organization / business or a person / consumer), then you’ve got to handle the “who knows whom” question, looking for obvious and non-obvious relationships between entities.
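The “who knows whom” question is naturally a graph problem. As a sketch under stated assumptions, the Python below treats resolved entities as nodes and known relationships as labelled, undirected edges, then uses breadth-first search to find the shortest chain connecting two entities that have no direct link. The entities and relationships are hypothetical, and BFS is one simple choice among many graph techniques.

```python
from collections import deque


def find_path(edges, start, goal):
    """Shortest chain of (from, relationship, to) steps between two entities, or None."""
    graph = {}
    for a, b, label in edges:
        graph.setdefault(a, []).append((b, label))
        graph.setdefault(b, []).append((a, label))  # relationships treated as undirected
    previous = {start: None}  # entity -> (predecessor, relationship) on the BFS tree
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, label in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, label)
                queue.append(neighbor)
    if goal not in previous:
        return None
    path, node = [], goal
    while previous[node] is not None:  # walk back from goal to start
        prev, label = previous[node]
        path.append((prev, label, node))
        node = prev
    return list(reversed(path))


edges = [
    ("Claimant A", "Provider B", "shares phone number"),
    ("Provider B", "Attorney C", "shares office address"),
    ("Claimant D", "Attorney C", "represented by"),
]
path = find_path(edges, "Claimant A", "Claimant D")
for step in path:
    print(step)
```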

External content providers like D&B (for businesses) and Acxiom (for consumers) can help with that, but many applications like fraud detection, denied parties lists, terrorist watch lists, etc. typically require a more robust approach.

Matching requirements vary widely from company to company, and even from application to application within a company. So an identity resolution engine with a robust, easily tunable set of algorithms eliminates many of the cases of car dealerships being matched to veterinary clinics.
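One way to make matching tunable per application is to keep the rules in data rather than code. In this hypothetical Python sketch, each application supplies its own field weights and threshold. The same pair of John Doe records (same name and address, different dates of birth) matches under a lenient mailing-list profile but not under a stricter profile that also weighs date of birth. Profile names, fields and numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MatchProfile:
    weights: Dict[str, float]  # field name -> weight contributed when that field agrees
    threshold: float           # minimum total score to declare a match


def score(a, b, profile):
    # A field contributes its weight only when both records have a value and the values agree.
    total = 0.0
    for name, weight in profile.weights.items():
        va, vb = a.get(name), b.get(name)
        if va and vb and va.strip().lower() == vb.strip().lower():
            total += weight
    return total


def is_match(a, b, profile):
    return score(a, b, profile) >= profile.threshold


# Hypothetical profiles for two applications with different tolerance for false positives.
PROFILES = {
    "mailing_list": MatchProfile(weights={"name": 0.4, "address": 0.6}, threshold=0.9),
    "fraud_review": MatchProfile(weights={"name": 0.3, "address": 0.2, "dob": 0.5}, threshold=0.9),
}

a = {"name": "John Doe", "address": "123 Oak St", "dob": "1950-02-01"}
b = {"name": "John Doe", "address": "123 Oak St", "dob": "1978-06-15"}

for app, profile in PROFILES.items():
    print(app, is_match(a, b, profile))
# mailing_list True
# fraud_review False
```

Keeping weights and thresholds in configuration lets each application be retuned without touching the comparison logic.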

Dan Power is president of Hub Solution Designs, Inc., a consulting firm specializing in master data management and data governance. He has 22 years of experience in management consulting, enterprise applications and strategic alliances at companies like D&B, Deloitte & Touche, and CSC. He writes a popular blog and a column for Information Management magazine, speaks frequently at technology conferences, and regularly advises clients on developing & implementing high impact MDM and data governance strategies.

Identity Resolution Daily Links 2009-03-17

Tuesday, March 17th, 2009

By the Infoglide Team

Happy St. Patrick’s Day!

WORKERS-COMP-NEWS.COM: Workers’ Compensation Fraud

“The insurance companies, in particular, have helped create the myth that many people fake their on-the-job injuries in order to fraudulently collect workers’ compensation benefits. But the truth is, worker fraud is much less common and much less costly than employer and insurance company fraud. In virtually every independent study, worker fraud has been found to be less than 2 percent of total claims.”

Dealerscope: Congress Combats Retail Crime

“Another bill, the bipartisan Organized Retail Crime Act of 2009, was introduced in the House by Reps. Brad Ellsworth (D-Ind.) and Jim Jordan (R-Ohio.) The bill would specifically amend federal code to address organized retail crime, and also force online marketplaces to force off sellers accused of wrongdoing. A third bill, the E-Fencing Enforcement Act of 2009, is sponsored by Democratic Rep. Bobby Scott of Virginia and would push online retailers to halt sales of stolen merchandise.”

newsday.com: Former Lakehurst worker admits disability fraud

“He admitted Monday that he stated on federal forms that he was not employed when in fact he was operating a landscaping and handyman business. Prosecutors say he also failed to report his income from the business. As part of his plea agreement, he agreed to make restitution of more than $180,000 and to file amended tax returns.”

Business Traveller: Security: The TSA is adding a twist to passenger screening

“While privacy advocates believe Secure Flight is tantamount to government restriction on travel, federal authorities insist otherwise, saying it would improve the quality of the watch lists that contain names of suspected terrorism and criminal suspects. They said it would help alleviate the misidentification of innocent passengers placed on ‘no-fly‘ lists because their names are similar to those found on watch lists, which happens all too often.”

WorkersCompensation.com: NC Man Charged With Workers’ Compensation Fraud

“While collecting benefits as a result of the injury, he was found to also be owner and operator of Carolina Comfort Heating and Air Conditioning Company in Charlotte, N.C., the warrant alleges. Days after Connecticut investigators requested records detailing his income, Mr. Gjuraj agreed to pay $40,000 in restitution for the workers’ compensation benefits he had received, the warrant states.”

Identity Resolution Daily Links 2009-03-13

Friday, March 13th, 2009

[Post from Infoglide] Ontology or Living Context?

“IBM’s Jeff Jonas recently wrote a post about what types of problems are appropriate for developing an ontology, i.e. “a rigorous and exhaustive organization of some knowledge domain that is usually hierarchical and contains all the relevant entities and their relations.”  His key point: as the complexity of the data that the ontology is meant to organize increases, the value of the ontology decreases.”

Workers’ Compensation Perspectives: Workers’ Compensation Fraud

“Workers’ compensation fraud is not a victimless crime. The victims of such frauds are not really the workers’ compensation insurers. Instead, the victims are those who play by the rules–the workers, employers and providers/suppliers of services. Workers receive more scrutiny of their claims, employers are subject to more audits and must bear the costs not being covered by those fraudulently under-reporting payroll.”

Chicago Tribune: New airport security rules to require more personal information

“Requiring the airlines to collect more personal information will improve the quality of the watch lists that contain names of possible terrorism and criminal suspects, federal authorities said. It’s also being done to reduce the misidentification of innocent travelers who are mistakenly placed on “no-fly” lists because their names are similar to those found on watch lists—a situation the TSA calls ‘a frustratingly common occurrence.’”

Forrester Blog: BI Nirvana

[Boris Evelson] “I had an amazing client experience the other day. I searched long and hard for a client with flawless, perfect, 100% efficient and effective BI environment and applications. My criteria were tough and that’s why it took me so long (I’ve been searching for as long as I’ve been in the BI business, almost 30 years).”

Insurance Journal: Texas Mutual Fraud Investigations Recover $4.1M in 2008

“According to the company, the $4.1 million includes: claimant fraud future savings of $2,185,170; restitution of $39,374 from claimants; restitution of $1,569,873 in premium fraud; and prevention of $301,467 in health care provider fraud and abusive billing.”

TAWPI Blog: Mastering Data Management

“MDM tends to come across like an infrastructure project or middleware — something that IT would sponsor, according to Dan Power, president of Hub Solution Designs Inc., an MDM consulting firm. But placing sponsorship with IT misses the point, he said. The line of business needs to sponsor the project because it can identify the business value the data holds and how a single view of that data can affect the bottom line.”

Ontology or Living Context?

Wednesday, March 11th, 2009

By Charles Clendenen, Infoglide Director of Professional Services

IBM’s Jeff Jonas recently wrote a post about what types of problems are appropriate for developing an ontology, i.e. “a rigorous and exhaustive organization of some knowledge domain that is usually hierarchical and contains all the relevant entities and their relations.”  His key point: as the complexity of the data that the ontology is meant to organize increases, the value of the ontology decreases.

Building on that idea, it’s equally true that the dynamic versus static nature of the data sources determines the value of the ontology. John Ripley speaks about a “living context” that continuously evolves and adapts to the data. The notion is that challenging problems arise because today’s solutions draw from dynamic and disparate data stores (e.g., departmental silos, web services, semantic web info, and public and private data sources), and because “facts” in an online, interconnected environment change constantly.

Dealing with all that complexity and change using a static ontology is not feasible for many problem domains. New relationships between entities and new attribute values for those entities require constant readjustment. Flexibility is key. Every assumption is an opportunity for errors to be introduced and compounded. Every change introduces a new piece to the puzzle, and the more rigid the ontology, the more likely it is that the new piece will not fit.

That the available data will change is a certainty. That our view of the data will become more complex is likely. That we will be able to adapt new data to fit existing technology is not at all certain. It makes far more sense to assume that entity resolution technology must adapt to the data, not the other way around. Change will happen, and we must be prepared to deal with it.

Identity Resolution Daily Links 2009-03-10

Tuesday, March 10th, 2009

By the Infoglide Team

Homeland Security Watch: Fusion Center Focus

“The President’s budget proposal to Congress has increased federal support for state-operated fusion centers.  This sustained support is consistent with recommendations of  an April 2008 GAO study. The fusion centers are an essential element in anticipating and preventing terrorist activity.”

WorkersCompensation.com: Colorado Woman Accused Of Workers’ Comp Fraud

“Rondina-Fitzgerald began accepting benefits after injuring her back while employed as a veterinary assistant. She later submitted signed statements to the Insurance Fund claiming she was not employed. At the same time, she was actually working as a waitress. If she is convicted, Rondina-Fitzgerald could be sentenced to four years in prison.”

Jeff Jonas: Ontology And Why I Am Not Obsessed With This Fancy Little Overrated Word

“I think in many cases ontologies are best not pre-defined, more ideally the structures and hierarchies should emerge based on actual use/context. They are not static – they evolve and accumulate over time.”

California Employer Bulletin: Economy Affecting Workers’ Comp Leaves?

“California Insurance Commissioner Steve Poizner has announced a $1 million grant to help fight insurance fraud in Fresno County, noting that ‘In this struggling economy, it is more important than ever to help businesses to stay and expand in the state…fraud drives up the costs of workers’ compensation insurance.’”

