
Archive for July, 2009

Identity Resolution Daily Links 2009-07-31

Friday, July 31st, 2009

[Post from Infoglide] Data Finds Data in Real-Time Entity Resolution

“Jeff Jonas of IBM recently quoted from a chapter called ‘Data Finds Data’ that he co-wrote for a book entitled Beautiful Data: The Stories Behind Elegant Data Solutions, and I was impressed by how well this passage describes the effective use of entity resolution software (e.g., IRE 2.2)…”

IT-Director.com: GRC is not enough

[Philip Howard] “If you think about these different forms of risk, they can mostly be managed within existing GRC frameworks: business risk, data and IT governance and compliance cover five of these seven types of risk. But they don’t cover fraud or cyber attacks or similar security issues.”

SunSentinel.com: Roofer ducked $400,000 in worker’s comp premiums

“Investigators with the state’s Division of Insurance Fraud said Robert McDonald, owner of Gulfstream Roofing Inc., funneled $3 million in payroll through several fake companies between 2002 and 2006, claiming the money was being paid to insured subcontractors instead of his own workers.”

BNET Healthcare: What Can US Learn From European Health IT Experience?

“The three countries also use universal patient identification numbers in health care. This is much easier to do in Europe than it is in the U.S., where the mistrust of government is so high that the issue of having a single patient identifier number is no longer even under discussion. There’s also the small matter of our low EHR adoption rate, which is less than 20 percent for physicians and lower for hospitals. By contrast, most physicians in the three European countries are using some kind of EHR.”

Data Finds Data in Real-Time Entity Resolution

Wednesday, July 29th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

Jeff Jonas of IBM recently quoted from a chapter called “Data Finds Data” that he co-wrote for a book entitled Beautiful Data: The Stories Behind Elegant Data Solutions, and I was impressed by how well this passage describes the effective use of entity resolution software (e.g., IRE 2.2):

With each new transaction an organization learns something. It is at the moment something is learned that there exists an opportunity, in fact an obligation, to make some sense of what this new piece of data means and respond appropriately. For example, does the address change on the customer record now reveal that this customer is connected to one of your top 50 customers? If an organization cannot evaluate how new data points relate to its historical data holding in real time, the organization will miss opportunities for action.

Most existing entity resolution solutions take full advantage of the technology’s ability to resolve multiple identities into one, and the true innovators also make the most of its capacity for finding hidden, non-obvious relationships. Clearly, corporate decision-making improves greatly when such enhanced information is available in reports, but the ultimate value of entity resolution is realized when it’s applied in real time.

Many organizations maintain massive amounts of online data. They can enhance their existing systems by incorporating identity resolution in the form of software as a service (SaaS). Using the address change example, a web service call from an existing address change function would invoke an identity resolution service to discover second- and third-level relationships based on the new address. Discovery of a new relationship could trigger an alert to the appropriate person in the business unit who monitors activity in the system.
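To make the address-change scenario concrete, here is a minimal Python sketch. It assumes an in-memory index standing in for a real identity resolution web service, and it assumes that two customers are related whenever they have ever shared an address. The class `RelationshipIndex`, the method `on_address_change`, and the customer IDs are hypothetical illustrations, not the IRE 2.2 interface. A breadth-first search from the updated customer looks for top customers up to three hops away and returns them as alerts.

```python
from collections import defaultdict, deque


class RelationshipIndex:
    """Hypothetical in-memory stand-in for an identity resolution service.

    Two customers are treated as related when they have ever shared an address.
    """

    def __init__(self, top_customers):
        self.top_customers = set(top_customers)
        self.addresses = defaultdict(set)   # customer id -> every address seen
        self.residents = defaultdict(set)   # address -> every customer seen there

    def _neighbors(self, customer):
        # First-level relationships: everyone who shares any address with `customer`.
        linked = set()
        for addr in self.addresses[customer]:
            linked |= self.residents[addr]
        linked.discard(customer)
        return linked

    def on_address_change(self, customer, new_address, max_depth=3):
        """Record the change, then return (top_customer, hops) alerts found
        within `max_depth` hops of the updated customer."""
        self.addresses[customer].add(new_address)   # history is kept, not overwritten
        self.residents[new_address].add(customer)

        alerts = []
        seen = {customer}
        frontier = deque([(customer, 0)])
        while frontier:                              # breadth-first search
            current, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for other in self._neighbors(current):
                if other in seen:
                    continue
                seen.add(other)
                if other in self.top_customers:
                    alerts.append((other, depth + 1))
                frontier.append((other, depth + 1))
        return alerts


index = RelationshipIndex(top_customers={"C-001"})
index.on_address_change("C-001", "12 Elm St")
index.on_address_change("C-050", "12 Elm St")
index.on_address_change("C-050", "9 Pine Ave")          # C-050 moves
print(index.on_address_change("C-777", "9 Pine Ave"))   # [('C-001', 2)]
```

Customer C-777 never shared an address with top customer C-001, but the search finds a second-level link through C-050's former address. A production service would persist the index and route each alert to the person monitoring the business unit instead of returning a list.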

Although entity resolution technology has been evolving for years, it’s still a new concept for most organizations. We’ve begun to discover how using entity resolution can dramatically improve operational decision-making, but there are many undiscovered opportunities still waiting to be exploited.

Identity Resolution Daily Links 2009-07-27

Monday, July 27th, 2009

By the Infoglide Team

information management: Multidomain Master Data Management for Business Success

“All data that flows through an enterprise can be categorized into six different types: who, what, when, where, how and why. Master data is about who, what, when and where. ‘Who’ data is about the parties of interest that matter most to a business or organization including stakeholders, benefactors, customers, suppliers, owners, providers, partners, etc.”

HSToday: DHS Highlights Intelligence Improvements in Report Marking 9/11 Report Anniversary

“To date, 72 fusion centers have been designated throughout the country, with DHS having provided more than $340 million from fiscal years 2004-2009 to state and local governments to support these centers. DHS also deployed the Homeland Security Data Network to 29 fusion centers, which allows the federal government to share information and intelligence with states and provides fusion center staff access to the most current terrorism-related information.”

The Healthcare IT Guy: Guest Article: Why Doctors Hate Electronic Medical Records

“The fact is that doctors love high-tech. They have reason to hate EMRs but not computers and iPhones.”

DecisionStats: Interview Jim Harris Data Quality Expert OCDQ Blog

[Jim Harris] “I know that Gartner has reported that 25% of critical data within large businesses is somehow inaccurate or incomplete and that 50% of implementations fail due to lack of attention to data quality issues.”

Identity Resolution Daily Links 2009-07-24

Friday, July 24th, 2009

[Post from Infoglide] Entity Resolution as Data Mining

“In my last post, I suggested that entity resolution in the broadest sense (“Big ER”) really encompasses three activities.  The first is locating and collecting entity references from unstructured sources (entity extraction), the second is resolving and merging references to the same entity (“Little ER”), and the third is analyzing associations among entities.  Not every ER process involves all three activities.”

BeyeNETWORK: Some Perspectives on Quality

[Bill Inmon] “There are then very legitimate circumstances where incorrect data is best left in the database or data warehouse. Stated differently, there is no circumstance where correcting data or not correcting data is the right thing to do. In order to determine which approach is proper, the context of the corrections has to be known. Only then can it be determined whether correcting errors is the proper thing to do.”

Homeland Security Watch: How To Improve Homeland Security: Give the ODNI Oversight Responsibility for Fusion Centers

“To me, fusion centers are a fine example of Darwinian logic in homeland security.  There was no comprehensive national plan to create fusion centers.  In original intent, Founding-Fathers-federalism fashion, states and cities decided they were not getting the intelligence they wanted.  Arizona, Georgia, Illinois, New York and a handful of other jurisdictions took responsibility for processing - or “fusing” - their own intelligence.”

ITBusinessEdge: Master Data Management and the CIO’s Strategic Plan

“If we look at MDM as a collection of techniques providing enterprise-wide data requirements analysis and subsequent implementation of best practices in data management, then the savvy IT manager might cherry-pick from the tools offered by vendors to provide the optimal solution that unifies the view of critical data concepts while satisfying the data quality requirements imposed by a horizontal information solution.”

I, Cringely: Medical Records R Us

“So medical records are an area where IT could make us healthier and, if done correctly, ought to save lots of money, too.  What we need is some form of centralized medical record keeping that preserves patient privacy yet, at the same time, keeps us from shopping all over town for bogus Oxycontin prescriptions.”

Entity Resolution as Data Mining

Wednesday, July 22nd, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In my last post, I suggested that entity resolution in the broadest sense (“Big ER”) really encompasses three activities.  The first is locating and collecting entity references from unstructured sources (entity extraction), the second is resolving and merging references to the same entity (“Little ER”), and the third is analyzing associations among entities.  Not every ER process involves all three activities.  As I noted, the “pre-activity” of entity extraction only comes into play when the entity reference sources are unstructured, for example, facial recognition in surveillance videos.  Before the facial characteristics can be analyzed and compared to known faces, the portion of the images in the video that represents the face must first be located and extracted.  In image processing this is called “feature extraction” and is the genesis of my use of the term “entity extraction” for this activity.

When the notion of entity resolution first developed, it was in the context of a database entity-relationship schema.  In those days, ER was just about merging all the references to the same entity.  There was no entity extraction activity because the information in the database was already structured.  The entity extraction activity grew out of the realization that useful information may also reside in unstructured formats.

Now I’d like to talk about the third activity, exploring networks of associations.  Once you have located and merged all of the references to the same entity, the next step is to ask whether any relationships exist among the entities.  One of the first to be explored was the “household” relationship.  Companies realize that there is value in understanding who’s living with whom at the same location, yet interestingly it is still one of the hardest relationships to define and manage.  The simplest definition is “all the people at the same address with the same last name.”  While simple, it doesn’t capture the nuances of current demographics such as unmarried couples, stepchildren, and extended families.
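The simplest household definition can be written in a few lines. This Python sketch assumes each person is a dict with hypothetical keys `name`, `last_name`, and `address`, and applies only light normalization (case and whitespace); real address standardization is far more involved. The function name `simple_households` is illustrative.

```python
from collections import defaultdict


def simple_households(people):
    """Group people by the naive rule: same address and same last name.

    `people` is a list of dicts with hypothetical keys 'name', 'last_name'
    and 'address'.
    """
    households = defaultdict(list)
    for person in people:
        # Light normalization so case and spacing differences don't split a household.
        address = " ".join(person["address"].lower().split())
        last_name = person["last_name"].strip().lower()
        households[(address, last_name)].append(person["name"])
    return dict(households)


sample = [
    {"name": "Ann Lee", "last_name": "Lee", "address": "5 Maple Dr"},
    {"name": "Tom Lee", "last_name": "Lee", "address": "5  maple dr"},
    {"name": "Sam Park", "last_name": "Park", "address": "5 Maple Dr"},
]
for key, members in simple_households(sample).items():
    print(key, members)
```

Sam Park, who might be Ann's unmarried partner, lands in a separate household of one. That is exactly the kind of nuance the simple rule misses.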

Exploring entity relationships brings us to the intersection of entity resolution with data mining.  Data mining is all about discovering non-explicit (non-obvious) relationships.  A record or database instance by definition is an explicit relationship among the attribute values, i.e. they belong to the same entity.  However, just as in the case of households, we can discover relationships that are not explicitly given, e.g. people living at the same address.

Building associations is a natural extension of the Little ER process.  Even when there is not enough asserted or inferred evidence to conclude that two references are to the same entity, it may still be possible to establish an association.  For example, a record for Bill Smith at 123 Oak Street and a record for John Doe at 123 Oak Street would not resolve as references to the same person (unless there was evidence of deliberate deception), but together they do establish that both men lived at that address at some point.  If they lived there at the same time, the association might be important in the context of a criminal investigation, e.g., looking for known associates of Bill Smith.
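A small Python sketch of the shared-address association, including the "same time" question. It assumes each residence record carries a start and end date and that addresses are already standardized; the `ResidenceRecord` type, its fields, and the third resident "Mary Roe" are hypothetical. Two stays overlap when each one starts before the other ends.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class ResidenceRecord:
    person: str
    address: str      # assumed already standardized
    start: date
    end: date         # inclusive end of the residence period


def address_associations(records):
    """Return one (person_a, person_b, kind) link per pair of records that
    place two different people at the same address; kind notes whether the
    residence periods overlapped."""
    links = []
    for a, b in combinations(records, 2):
        if a.person == b.person or a.address != b.address:
            continue
        overlap = a.start <= b.end and b.start <= a.end
        kind = "concurrent" if overlap else "same address, different times"
        links.append((a.person, b.person, kind))
    return links


records = [
    ResidenceRecord("Bill Smith", "123 Oak Street", date(2001, 1, 1), date(2004, 6, 30)),
    ResidenceRecord("John Doe", "123 Oak Street", date(2003, 3, 1), date(2005, 12, 31)),
    ResidenceRecord("Mary Roe", "123 Oak Street", date(2007, 1, 1), date(2008, 1, 1)),
]
for link in address_associations(records):
    print(link)
```

None of these records resolve to the same person, yet each pair yields an association. Keeping only the "concurrent" links that involve Bill Smith gives a first cut at the known-associates list from the investigation example.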

Like the small world hypothesis and six degrees of separation, entity association can extend many levels beyond direct associations like a shared address.  For example, Bill Smith and John Doe may never have shared the same address, but they may have both shared the same address with Fred Johnson, thus establishing an indirect connection.

This simple example is based on shared address, but entity connections can be established through many combinations of inferred associations such as shared telephone or PO Box address as well as asserted associations such as call records between telephone numbers or change-of-address records.  Just as with entity extraction, the analysis of association networks has its own body of research and knowledge that practitioners can draw upon.
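Multi-level connections that mix inferred and asserted associations can be sketched as a graph plus a breadth-first search. This Python sketch assumes entities have already been resolved (Little ER is done), that inferred links come from any shared attribute value such as an address or phone number, and that asserted links (for example, call records) have already been mapped from phone numbers to entities. The functions `build_graph` and `connection_path`, the hop limit, and the "Ann Green" entity are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(attributes, asserted_links):
    """attributes: {entity: {("address", value), ("phone", value), ...}}
    asserted_links: [(entity_a, entity_b, label), ...], e.g. call records.
    Returns an adjacency map: entity -> {neighbor: set of reasons}."""
    graph = defaultdict(lambda: defaultdict(set))
    # Inferred associations: any two entities that share an attribute value.
    holders = defaultdict(set)
    for entity, attrs in attributes.items():
        for attr in attrs:
            holders[attr].add(entity)
    for (kind, value), entities in holders.items():
        for a in entities:
            for b in entities:
                if a != b:
                    graph[a][b].add(f"shared {kind}: {value}")
    # Asserted associations: explicit records linking two entities.
    for a, b, label in asserted_links:
        graph[a][b].add(label)
        graph[b][a].add(label)
    return graph


def connection_path(graph, source, target, max_hops=6):
    """Breadth-first search for the shortest chain of associations, or None."""
    parent = {source: None}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        if hops == max_hops:
            continue
        for neighbor in graph.get(node, {}):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append((neighbor, hops + 1))
    return None


attributes = {
    "Bill Smith": {("address", "123 Oak Street")},
    "Fred Johnson": {("address", "123 Oak Street"), ("address", "9 Elm Court")},
    "John Doe": {("address", "9 Elm Court"), ("phone", "555-0100")},
    "Ann Green": {("phone", "555-0100")},
}
asserted = [("Ann Green", "Fred Johnson", "call record")]
graph = build_graph(attributes, asserted)
print(connection_path(graph, "Bill Smith", "John Doe"))  # ['Bill Smith', 'Fred Johnson', 'John Doe']
```

As in the text, Bill Smith and John Doe never shared an address but are connected through Fred Johnson. Keeping the reasons on each edge lets an analyst see whether a link was inferred from shared data or asserted by an explicit record. The hop limit keeps the small-world effect from connecting everyone to everyone.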

I hope that this series of posts has provided a broader perspective on the variety of activities that comprise entity resolution.  I certainly find it a fascinating subject.  In my next post, I will discuss the concept of an internal view of identity versus an external view of identity.

Identity Resolution Daily Links 2009-07-17

Friday, July 17th, 2009

[Post from Infoglide] iPhones, Identity Resolution, and Cloud Computing

“A personal favorite saying for years has been “invention is the mother of necessity” (a twist on the original saying, of course). It aptly conveys what has driven the high tech industry for the last several decades. Principles like Moore’s Law and its equivalent for the internet have created unanticipated waves of computing and networking power. All that available power has released the combined creativity of tens of thousands of engineers and marketers who dreamed up ways of interacting and managing our lives and businesses that were inconceivable 30 years ago…”

Liliendahl on Data Quality: Match Destinations

“When matching party data – names and addresses – very often it is not just only about hitting similar records, but also about performing some form of transformation with the data before, during and after the hitting.”

Tech Law Notes: Health IT & Open Source

“Repeatedly, I hear the refrain that this stimulus money is going to go to systems that can be put to a “meaningful use,” and that is going to exclude rogue open source Health IT developers from being funded, squelching innovation in the market place.  I imagine that complying with the security regulations under HIPAA probably hinder innovation, too, but they increase the reliability of the system vendors that remain in the market place and reduce the risk to the data of patients that might be in their computer systems.”

The Data Doghouse: People, Process & Politics: Integration Portfolio

“Existing IT projects may be under the label of: Corporate Performance Management (CPM), Master Data Management (MDM), Customer Data Integration (CDI), Product Information Management (PIM), Enterprise Information Management (EIM), Data Warehousing (DW) and Business Intelligence (BI).”

iPhones, Identity Resolution, and Cloud Computing

Tuesday, July 14th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

A personal favorite saying for years has been “invention is the mother of necessity” (a twist on the original saying, of course). It aptly conveys what has driven the high tech industry for the last several decades. Principles like Moore’s Law and its equivalent for the internet have created unanticipated waves of computing and networking power. All that available power has released the combined creativity of tens of thousands of engineers and marketers who dreamed up ways of interacting and managing our lives and businesses that were inconceivable 30 years ago (for a couple of interesting exceptions, check out this and that).

After listening to friends and relatives rave about their iPhones, I recently pre-ordered the new 3G S so I could pick it up the first day. It’s now my favorite toy, um, er… business productivity tool. And it’s funny how such a simple change can lead you to consider new possibilities.

New iPhone applications pop up every day and they’re being consumed at an incredible rate. In fact, the billionth iPhone app download took place in April. In watching and participating in this phenomenon, you start to see all kinds of possibilities, and of course my thoughts turned to identity resolution.

For example, if I’m on my way to meet someone new either socially or for business, why shouldn’t I be able to find out everything that’s publicly available about this person and see it on my iPhone? If you assume that at some point identity resolution becomes available via web services and cloud computing, what if you could speak that person’s name and other attributes into your phone and then receive a synopsis of publicly available information about his or her life and associations?

You’re probably screaming “Big Brother” at me right now. I’m talking about publicly available data, so if that idea bothers you, you’re really objecting to the amount of information that is currently freely available about you, a very real concern. All identity resolution may eventually do is make it much easier to aggregate it and boil it down to its essentials.

Hopefully you’ll excuse me for suggesting such an application. We’re experiencing what could end up being the hottest summer on record here in Austin, with quite a few days already with highs over 100, so my thoughts may be a bit muddled.

How about you: got any hot thoughts about identity resolution in the cloud?

Identity Resolution Daily Links 2009-07-13

Monday, July 13th, 2009

By the Infoglide Team

CiOZone: Ask the CIO: What’s your data management strategy?

“Undoubtedly, the recession has interrupted—or scaled back—some MDM-related work. But as Aaron Zornes said during an interview just before the July 4th holiday, ‘It’s not as if projects like risk management or Anti-Money Laundering, or Know Your Customer can wait until the economy improves.’ Some work just has to go forward—rain, shine, or economic downturn.”

Insurance Journal: California Efforts To Reduce Prison Budget Would Hurt Fraud Fighting

“‘Taking the power out of the hands of the public prosecutor to charge someone with a felony crime will have a serious impact on public safety. Insurance fraud is a serious crime that demands serious consequences,’ the Coalition Against Insurance Fraud, National Insurance Crime Bureau, National Health Care Anti-Fraud Association and the International Association of Special Investigation Units said in a joint letter to Schwarzenegger.”

mydesert.com: 6 accused of trying to steal winning lottery tickets in Coachella Valley

“Undercover investigators posing as customers handed clerks decoy winning tickets and asked if they had won, Jeandron said. Some of the clerks told the investigators that their ticket was not a winner, but then went on to file a claim with the Lottery to collect the winnings.”

HISTALK: Monday Morning Update 7/13/09

“I don’t have access to the full text of the article, but I truly believe that once the pain of getting EMRs running as data collection appliances is over (meaning we’ve got data collection clerks known as doctors and nurses in place, which is the ‘pain’ part), the benefit will be incredible.”

Identity Resolution Daily Links 2009-07-10

Friday, July 10th, 2009

[Post from Infoglide] What’s the Data Quality Business Message?

“What’s it going to take to move the data quality space forward in the future? That’s the question recently addressed by Ted Friedman of Gartner as reported in an article in destinationCRM.com. He suggests that the real answer may be messaging.”

CiOZone: Master Data Management Ready For Prime Time

“As with many application areas, Microsoft’s sweeping move into MDM signals a mainstreaming of the field, according to Aaron Zornes, founder and chief research officer of The MDM Institute, Burlingame, Calif. ‘As a practice, MDM has been going on in some industries since the 1980s, but it’s only been formalized with a growing, purpose-built vendor base in recent years,’ he says. ‘In the time since the institute was founded in 2004, the industry has matured considerably.’”

data quality PRO: Data Quality Blog Roundup - June 2009 Edition

“Another marked increase in online publishing this month for the data quality sector. A smattering of new entrants means there is a steady flow of fresh ideas and insight in this month’s blog roundup.”

Government Security News: OPINION / Analyzing intelligence data: Matching information in foreign languages

“While the challenge is great, there is technology specifically designed to “connect the dots” among persons, places and things of interest. Called “entity resolution,” this technology is coming into the mainstream, specifically in light of the growing urgency to track down terrorists and stop terrorist threats before they happen.”

Life as a Healthcare CIO: International EHR Adoption

“The most widely implemented are England, Denmark, Netherlands, and certain regions of Spain which are close to 100%. Sweden, Norway are at 80% and behind and Germany/France are at 50%. The US is somewhere between 2 and 20%, depending on how you classify a comprehensive EHR.”

What’s the Data Quality Business Message?

Wednesday, July 8th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

What’s it going to take to move the data quality space forward in the future? That’s the question recently addressed by Ted Friedman of Gartner as reported in an article in destinationCRM.com. He suggests that the real answer may be messaging.

“Vendors have done a reasonably poor job in that they could get better at articulating the true business value [of data quality solutions],” he says. The Gartner analyst notes that vendors tend to talk about functionality in terms of technological advances, rather than conveying how that technology actually supports the business infrastructure. Friedman also notes that, in general, vendors could get better at articulating how tools support initiatives such as information governance and regulatory compliance — two notable industry trends.

Ted is right to call on vendors, including us, to improve our messaging. At Identity Resolution Daily, we get down into technical details fairly often. For example, our bloggers have talked about data matching and its relationship to identity resolution, as well as critical requirements for identity resolution, and we’ve run a series on data quality. Professor John Talburt of UALR’s Center for Advanced Research in Entity Resolution and Information Quality (ERIQ) is a regular contributor who talks about technical definitions and issues surrounding these topic areas.

That’s not to say that business issues around entity resolution and information quality have been ignored. Real world problems like lottery retailer fraud have been a frequent topic, as has organized retail crime. Another business problem we’ve talked about is employers trying to cheat workers compensation laws, and we’ve actually discussed regulatory compliance (OK, so we did get a bit technical on that one).

A huge issue related to information governance is preserving the rights of individual privacy. Because of our involvement in TSA’s Secure Flight program, we’ve written about this issue repeatedly since Identity Resolution Daily was created. A recent post captures the essence of the issue.

So at best, I’d have to say we get a C+ or a B- on our messaging. With our upcoming release of IRE 2.2, we’ll make every effort to respond to Ted’s constructive criticism of the data quality space.
