Archive for the ‘Data-Mining’ Category

Identity Resolution Daily Links 2010-08-24

Tuesday, August 24th, 2010

By the Infoglide Team

Fraud Prevention: Medicare and Medicaid Fraud: US Healthcare Reform

“Earlier this year, a jury found Pfizer owed Wisconsin $9 million for violating the state Medicaid fraud law more than 1.4 million times by purposely overcharging the state for prescription drugs. The company faces potential fines from $140 million to $21 billion.”

Security Debrief: What is a Law Enforcement Fusion Center?

“Fusion centers that are doing strategic analysis are best positioned to prevent criminal acts. Trained intelligence analysts in these centers look at a local tip or Suspicious Activity Report (SAR) and then use advanced search tools across many databases simultaneously for indications that the tip could be part of a much bigger ‘iceberg’ hiding below the surface.”

ZDNet: Yankee Group: Infrastructure as a Service now a bona fide cloud strategy

“The survey of 400 enterprises finds ‘24 percent of large enterprises with cloud experience’ are already using IaaS, and an additional 37 percent expect to adopt IaaS during the next 24 months. ‘While adoption is still much slower than that of SaaS solutions, the market is gaining traction,’ says Yankee.”

Detroit Free Press: Tiny name differences on tickets not a worry

“Under its new ‘Secure Flight’ process, the government compares airline passenger names, gender and birth dates with data on a terror watch list. However, a reservation or boarding pass that uses a middle initial instead of a full middle name, misses a hyphen, contains a tiny typo or leaves off the ‘Jr.’ designation should not cause a problem, according to the Transportation Security Administration.”

Is Government Committed to Solving Healthcare Fraud – Or Not?

Wednesday, June 23rd, 2010

By Mike Shultz, Infoglide Software CEO

Last week Rep. Scott Murphy of Glens Falls (D-NY) told a House panel that more effective policing of Medicare and Medicaid fraud is imperative in order to reduce the estimated $60 billion in fraudulent claims that are paid out each year. We agree with the expressed intent, and we are hopeful that CMS and others involved in decision-making will take advantage of any and all technologies to stamp out fraud, including identity resolution.

The press has reported repeatedly that both the Medicare (federal) and Medicaid (state) programs are riddled with fraud. Unscrupulous doctors, medical supply firms, and others involved in the healthcare supply chain are able to extract millions upon millions of dollars of taxpayer money from these government programs.

A recent example reported by the Miami Herald told how the Miracle Group Rehabilitation Center was “falsely billing the federal healthcare program $3.1 million over just three months. Medicare paid Ramos $1.9 million for rehab services never provided to angry beneficiaries… This past year, CMS paid approximately $60 billion to criminals impersonating doctors and patients in order to file false claims.” Multiply this by the thousands of dishonest people who abuse these systems and you can easily exceed the $60 billion estimate!

So what is being done? The federal government included legislation targeting healthcare fraud in the comprehensive healthcare bill it passed last year. As we’ve reported before, many states are passing legislation as well.

Passing stiffer laws is great, yet we’re certain the information needed to stop most of the fraud is in data already held by government agencies. Honest suppliers stand by helplessly as competitors cash in on the bonanza. One such supplier wrote in a guest post about how current detection methods are inadequate and how the problem can be attacked with the right technology.

It appears to be a matter of educating those in power about existing technology and having the political resolve to apply it. We hope that happens soon.

Is MDM Dead?

Wednesday, March 3rd, 2010

By Mike Shultz, Infoglide Software CEO

Andrew White of Gartner recently posed a question about whether master data management (MDM) is dead. He didn’t actually suggest that the demise of master data management is imminent. He was challenging whether our current terminology adequately clarifies the current reality about MDM and associated product areas.

Certainly the terms describing many markets and types of products are being associated with MDM. Jackie Roberts of DATAForge pointed out that the definition of MDM now seems to include “data integrity, data quality, entity resolution, matching, data integration, governance, metrics and analysis.”

While entity resolution was mentioned in her list, our obsessive focus on entity resolution (aka identity resolution) leads to the conclusion that, rather than being subsumed, its role is growing. Wayne Eckerson at TDWI seems to agree that identity resolution is a critical component of the recent MDM acquisitions. In his post about the acquisitions by Informatica and IBM of Siperian and Initiate Systems, respectively, he described the two transactions this way:

“You could say that Siperian is mostly MDM, but with identity resolution and other capabilities, whereas Initiate is mostly about identity resolution, but with MDM and other capabilities.”

Identity resolution is becoming an integral part of many product areas. Within MDM itself, creating a single-entity view is best done with an identity resolution engine. Data mining is greatly enhanced by the addition of entity resolution. Dan Power of Hub Solution Designs wrote about how key identity resolution is to data matching. We’ve talked about how social CRM can resolve identities of individuals across multiple disparate data sources using identity resolution, as well as “rationalize multiple variations and errors and anomalies that block finding existing customers within their systems”.

Although identity resolution technology has been years in the making, it has only recently risen into the consciousness of most analysts and customers. Because of its ability to bring enhanced clarity to ambiguous data, advanced identity resolution is now beginning to have a significant impact across many data-centered disciplines.

Identity Resolution Still On the Rise

Wednesday, February 17th, 2010

By Mike Shultz, Infoglide Software CEO

We’ve noted several times over the past couple of years how the market visibility of entity resolution has been evolving. Now the consolidation of the master data management (MDM) market is causing even more conjecture about the crucial role of this technology.

We’re continually on the lookout for the trends and opportunities that affect the identity resolution space. We’ve written about entity resolution moving into cloud computing, the growing use of entity resolution by state agencies, the crucial role that identity resolution plays in fusion centers, how it’s related to “social CRM”, and how it might be used in e-discovery.

A few days after IBM’s recent announcement about buying Initiate Systems and a little over a week after Informatica’s acquisition of Siperian, Wayne Eckerson at TDWI wrote an insightful article in which he noted that these acquisitions are about MDM, yet they are also about identity resolution:

“Siperian is well-known for its master data management (MDM) solution… Initiate, on the other hand, is well-known for its identity resolution hub… At this point, I need to cycle back to Siperian and point out that it, too, provides identity resolution capabilities. And I forgot to mention that Initiate also has some MDM capabilities. You could say that Siperian is mostly MDM, but with identity resolution and other capabilities, whereas Initiate is mostly about identity resolution, but with MDM and other capabilities.”

Considering IBM’s acquisition of Initiate Systems, along with Informatica’s purchase of Siperian shortly before that, plus Informatica’s 2008 purchase of Identity Systems, it’s clear that IdentityResolutionDaily is going to have even more to write about this year than before!

Privacy – A Dying Concept?

Wednesday, October 7th, 2009

By Gary Seeger, Infoglide Vice President

An intriguing post by Nate Anderson on Ars Technica highlights a difficult reality about today’s easy availability of vast quantities of “anonymized” data. Quoting from a recent paper by Paul Ohm at the University of Colorado Law School, Anderson writes that “as Ohm notes, this illustrates a central reality of data collection: ‘data can either be useful or perfectly anonymous but never both.’”

A seminal study published in 2000 by Latanya Sweeney at Carnegie Mellon opened the issue by proving that a simple combination of a very small number of publicly available attributes can uniquely identify individuals:

“It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. About half of the U.S. population (132 million of 248 million or 53%) are likely to be uniquely identified by only {place, gender, date of birth}, where place is basically the city, town, or municipality in which the person resides… In general, few characteristics are needed to uniquely identify a person.”
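The kind of measurement Sweeney describes can be illustrated with a minimal sketch: count how many records are the only ones with their particular combination of attributes. The function name `unique_fraction`, the field names, and the four sample records are hypothetical and invented for illustration; they are not taken from the study.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination of the given fields is unique."""
    keys = [tuple(rec[field] for field in quasi_identifiers) for rec in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)

# Hypothetical, made-up records (not real data).
people = [
    {"zip": "72201", "gender": "F", "dob": "1970-03-14"},
    {"zip": "72201", "gender": "F", "dob": "1970-03-14"},
    {"zip": "72201", "gender": "M", "dob": "1970-03-14"},
    {"zip": "72204", "gender": "F", "dob": "1982-11-02"},
]

by_zip_only = unique_fraction(people, ["zip"])
by_all_three = unique_fraction(people, ["zip", "gender", "dob"])
print(by_zip_only, by_all_three)
```

Even in this toy data, adding gender and date of birth to the ZIP code raises the share of uniquely identifiable records, which is the effect the study reports at national scale.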

Faced with a choice between exploiting easily obtainable data for righteous ends versus the potential misuse of identifying individuals, can an appropriate balance be struck by privacy legislation? Anderson points out that:

“Because most data privacy laws focus on restricting personally identifiable information (PII), most data privacy laws need to be rethought. And there won’t be any magic bullet; the measures that are taken will increase privacy or reduce the utility of data, but there will be no way to guarantee maximal usefulness and maximal privacy at the same time.”

Looking at the subject from a business perspective, using technologies such as identity resolution to connect non-obvious data relationships serves many initiatives. It would seem admirable to exploit public records and other forms of publicly available information to mitigate risks, uncover fraud, or track down “bad” guys. Yet some cry foul when the technology exposes individuals who didn’t anticipate that their “private” information would be used to identify and/or track them down.

In the rapidly evolving cyber-information age, the desires, conflicts, and limitations of protecting privacy will continue to be sorted out in the legal realm. Those of us who solve business issues using identity resolution technology will swim in this legal quagmire for many years. Finding an appropriate balance between the protection of individual privacy and bona fide business uses of “public” data will almost certainly be a growing challenge to the moral and legal minds of our community.

Identity Resolution Daily Links 2009-08-21

Friday, August 21st, 2009

[Post from Infoglide] Walking the Privacy/Security Tightrope

“In a post last April, we talked about the privacy/security balance issue for fusion centers and for vendors with supporting technology. Now an article in the Austin Sunday paper about a proposed fusion center again highlights the tension between security and privacy. Each time a fusion center is proposed, the story goes like this…”

information management: MDM for Tough Times: 5 trends to strengthen organizations during recession

[Aaron Zornes] “Enterprise MDM solutions are steadily but rapidly evolving away from data-centric hubs into full-blown application stacks. In other words, MDM is becoming less of a standalone technology infrastructure as the emphasis is increasingly on relationships between domains, user interface and integration with other emerging and adjacent technologies such as RFID, entity analytics and business intelligence.”

InformationWeek: Healthcare Tech: Can BI Help Save The System?

“Healthcare IT is a good place to be these days. While IT budgets in many verticals have been tightly reined, healthcare is enjoying multiple government mandates. This has resulted in an infusion of funds to modernize and integrate IT infrastructure, applications, and data. However, we aren’t starting from a high ground. There are multiple challenges to attaining a 21st century-grade IT environment.”

OCDQ Blog: Adventures in Data Profiling (Part 2)

“The adventures began with the following scenario – You are an external consultant on a new data quality initiative.  You have got 3,338,190 customer records to analyze, a robust data profiling tool, half a case of Mountain Dew, it’s dark, and you’re wearing sunglasses…ok, maybe not those last two or three things – but the rest is true.”

VIDEO: Interview with Secure Flight

TSA Secure Flight Program Director Paul Leyh is interviewed about recent developments.

Identity Resolution Daily Links 2009-07-31

Friday, July 31st, 2009

[Post from Infoglide] Data Finds Data in Real-Time Entity Resolution

“Jeff Jonas of IBM recently quoted from a chapter called “Data Finds Data”  that he co-wrote for a book entitled Beautiful Data: The Stories Behind Elegant Data Solutions, and I was impressed by how well this passage describes the effective use of entity resolution software (e.g., IRE 2.2)…”

IT-Director.com: GRC is not enough

[Philip Howard] “If you think about these different forms of risk, they can mostly be managed within existing GRC frameworks: business risk, data and IT governance and compliance cover five of these seven types of risk. But they don’t cover fraud or cyber attacks or similar security issues.”

SunSentinel.com: Roofer ducked $400,000 in worker’s comp premiums

“Investigators with the state’s Division of Insurance Fraud said Robert McDonald, owner of Gulfstream Roofing Inc., funneled $3 million in payroll through several fake companies between 2002 and 2006, claiming the money was being paid to insured subcontractors instead of his own workers.”

BNET Healthcare: What Can US Learn From European Health IT Experience?

“The three countries also use universal patient identification numbers in health care. This is much easier to do in Europe than it is in the U.S., where the mistrust of government is so high that the issue of having a single patient identifier number is no longer even under discussion. There’s also the small matter of our low EHR adoption rate, which is less than 20 percent for physicians and lower for hospitals. By contrast, most physicians in the three European countries are using some kind of EHR.”

Identity Resolution Daily Links 2009-07-24

Friday, July 24th, 2009

[Post from Infoglide] Entity Resolution as Data Mining

“In my last post, I suggested that entity resolution in the broadest sense (“Big ER”) really encompasses three activities.  The first is locating and collecting entity references from unstructured sources (entity extraction), the second is resolving and merging references to the same entity (“Little ER”), and the third is analyzing associations among entities.  Not every ER process involves all three activities.”

BeyeNETWORK: Some Perspectives on Quality

[Bill Inmon] “There are then very legitimate circumstances where incorrect data is best left in the database or data warehouse. Stated differently, there is no circumstance where correcting data or not correcting data is the right thing to do. In order to determine which approach is proper, the context of the corrections has to be known. Only then can it be determined whether correcting errors is the proper thing to do.”

Homeland Security Watch: How To Improve Homeland Security: Give the ODNI Oversight Responsibility for Fusion Centers

“To me, fusion centers are a fine example of Darwinian logic in homeland security.  There was no comprehensive national plan to create fusion centers.  In original intent, Founding-Fathers-federalism fashion, states and cities decided they were not getting the intelligence they wanted.  Arizona, Georgia, Illinois, New York and a handful of other jurisdictions took responsibility for processing - or “fusing” - their own intelligence.”

ITBusinessEdge: Master Data Management and the CIO’s Strategic Plan

“If we look at MDM as a collection of techniques providing enterprise-wide data requirements analysis and subsequent implementation of best practices in data management, then the savvy IT manager might cherry-pick from the tools offered by vendors to provide the optimal solution that unifies the view of critical data concepts while satisfying the data quality requirements imposed by a horizontal information solution.”

I, Cringely: Medical Records R Us

“So medical records are an area where IT could make us healthier and, if done correctly, ought to save lots of money, too.  What we need is some form of centralized medical record keeping that preserves patient privacy yet, at the same time, keeps us from shopping all over town for bogus Oxycontin prescriptions.”

Entity Resolution as Data Mining

Wednesday, July 22nd, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In my last post, I suggested that entity resolution in the broadest sense (“Big ER”) really encompasses three activities.  The first is locating and collecting entity references from unstructured sources (entity extraction), the second is resolving and merging references to the same entity (“Little ER”), and the third is analyzing associations among entities.  Not every ER process involves all three activities.  As I noted, the “pre-activity” of entity extraction only comes into play when the entity reference sources are unstructured, for example facial recognition in surveillance videos.  Before the facial characteristics can be analyzed and compared to known faces, the portion of the images in the video that represents the face must first be located and extracted.  In image processing this is called “feature extraction” and is the genesis of my use of the term “entity extraction” for this activity.

When the notion of entity resolution first developed, it was in the context of a database entity-relation schema.  In those days, ER was just about merging all the references to the same entity.  There was no entity extraction activity because the information in the database was already structured.  The entity extraction activity grew out of the realization that useful information may also reside in an unstructured format.

Now I’d like to talk about the third activity, exploring networks of associations.  Once you have located and merged all of the references to the same entity, the next step is to ask whether any relationships exist among the entities.  One of the first to be explored was the “household” relationship.  Companies realize that there is value in understanding who’s living with whom at the same location, yet interestingly it is still one of the hardest relationships to define and manage.  The simplest definition is “all the people at the same address with the same last name.”  While simple, it doesn’t capture the nuances of current demographics such as unmarried couples, stepchildren, and extended families.
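The simplest household definition quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the `normalize` helper, and the sample names are hypothetical, and real address matching needs far more than lowercasing and whitespace cleanup.

```python
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not split a household."""
    return " ".join(text.lower().split())

def group_households(records):
    """Group people who share both the same address and the same last name."""
    households = defaultdict(list)
    for rec in records:
        key = (normalize(rec["address"]), normalize(rec["last_name"]))
        households[key].append(rec["first_name"])
    return dict(households)

# Hypothetical sample records.
records = [
    {"first_name": "Ann", "last_name": "Smith", "address": "123 Oak Street"},
    {"first_name": "Bill", "last_name": "smith", "address": "123  oak street"},
    {"first_name": "Carla", "last_name": "Jones", "address": "123 Oak Street"},
]

households = group_households(records)
for (address, last_name), members in households.items():
    print(address, last_name, members)
```

Carla Jones lands in a household of her own even though she lives at the same address, which is exactly the nuance the simple definition misses.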

Exploring entity relationships brings us to the intersection of entity resolution with data mining.  Data mining is all about discovering non-explicit (non-obvious) relationships.  A record or database instance by definition is an explicit relationship among the attribute values, i.e. they belong to the same entity.  However, just as in the case of households, we can discover relationships that are not explicitly given, e.g. people living at the same address.

Building associations is a natural extension of the Little ER process.  Just because there is not enough asserted or inferred evidence to conclude that two references are to the same entity, it may still be possible to establish an association.  For example, a record for Bill Smith at 123 Oak Street and a record for John Doe at 123 Oak Street would not resolve as references to the same person (unless there was evidence of deliberate deception), but it does establish that they shared a residence at some time.  If they shared it at the same time, it might be an important relationship in the context of a criminal investigation, e.g. looking for known associates of Bill Smith.
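As a rough sketch of the Bill Smith and John Doe example, the code below pairs up different people recorded at the same address and flags whether their stays overlapped in time. The residency dates, the field names, and the function names are assumptions made for illustration only.

```python
from datetime import date
from itertools import combinations

def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed date ranges share at least one day."""
    return a_start <= b_end and b_start <= a_end

def shared_address_associations(residencies):
    """Return (person, person, address, concurrent) for different people at the same address."""
    associations = []
    for a, b in combinations(residencies, 2):
        if a["person"] == b["person"] or a["address"] != b["address"]:
            continue
        concurrent = overlaps(a["start"], a["end"], b["start"], b["end"])
        associations.append((a["person"], b["person"], a["address"], concurrent))
    return associations

# Hypothetical residency records; the dates are invented.
residencies = [
    {"person": "Bill Smith", "address": "123 Oak Street",
     "start": date(2005, 1, 1), "end": date(2007, 12, 31)},
    {"person": "John Doe", "address": "123 Oak Street",
     "start": date(2006, 6, 1), "end": date(2009, 1, 1)},
    {"person": "Fred Johnson", "address": "456 Elm Street",
     "start": date(2004, 1, 1), "end": date(2010, 1, 1)},
]

associations = shared_address_associations(residencies)
print(associations)
```

The two references are never merged into one entity; the output only records that an association exists and whether it was concurrent.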

Like the small world hypothesis and six degrees of separation, entity association can extend many levels beyond direct associations like a shared address.  For example, Bill Smith and John Doe may never have shared the same address, but they may have both shared the same address with Fred Johnson, thus establishing an indirect connection.

This simple example is based on shared address, but entity connections can be established through many combinations of inferred associations such as shared telephone or PO Box address as well as asserted associations such as call records between telephone numbers or change-of-address records.  Just as with entity extraction, the analysis of association networks has its own body of research and knowledge that practitioners can draw upon.
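The indirect connection between Bill Smith and John Doe through Fred Johnson can be sketched as a search over a small association graph. This is a minimal illustration using a plain breadth-first search, not a description of any particular product; the function names and the association list are hypothetical.

```python
from collections import defaultdict, deque

def build_graph(associations):
    """Build an undirected graph from (person, person) association pairs."""
    graph = defaultdict(set)
    for a, b in associations:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of associations, or None if there is none."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in graph.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None

# Hypothetical associations: each pair shared an address at some time.
associations = [
    ("Bill Smith", "Fred Johnson"),
    ("John Doe", "Fred Johnson"),
]

graph = build_graph(associations)
path = shortest_path(graph, "Bill Smith", "John Doe")
print(path)
```

The same graph could carry edges from shared telephone numbers, call records, or change-of-address records, with the edge type stored alongside each pair.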

I hope that this series of posts has provided a broader perspective on the variety of activities that comprise entity resolution.  I certainly find it a fascinating subject.  In my next post, I will discuss the concept and internal view of identity versus an external view of identity.

Identity Resolution Daily Links 2009-06-22

Monday, June 22nd, 2009

By the Infoglide Team

intelligent enterprise: They Better Get This MDM Program Right

“As reported in The New York Times and on the TSA Web site, the Secure Flight program will improve upon current practices in matching passenger identities to watch lists in many ways. At first glance, this appears to be a well thought-out program that conforms to several basic tenets of Master Data Management (in bold below), in this case for the ‘Customer’ entity.”

EHRWMS: Georgia’s Best EMR Used By Three of Top Ten Pediatricians

“Of approximately 100 respondents, 28 used an EMR, of which 40% used the EncounterPRO Pediatric EMR. There were only three other EMRs used more than once, and they were used by only 10%, 7%, and 7% of the survey respondents respectively.”

Government Executive: Enforcement agencies boost cooperation on drug investigations

“In addition, ICE agents for the first time will fully participate in the Organized Crime Drug Enforcement Task Force Fusion Center. The center allows participating federal, state and local law enforcement agencies, including DEA and the FBI, to share information and analytical resources to enhance their overall investigative capacity.”

SmartData Collective: The Data-Information Continuum

“Data could be considered a constant while information is a variable that redefines data for each specific use. Data is not truly a constant since it is constantly changing. However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again).”

