
Archive for June, 2009

Identity Resolution Daily Links 2009-06-30

Tuesday, June 30th, 2009

By the Infoglide Team

Francine Hardaway’s Blog: Are There Economies of Scale in Medicine?

“The efficiencies come when a group of physicians are all responsible for a patient’s continuity of care, and when they share information such as that possible with electronic health records (EHRs).”

Insurance & Financial Advisor: Poizner, industry oppose California downgrading of insurance fraud felonies

“‘Reclassifying 73 crimes including “false insurance claims” is a disservice to the consumers and businesses in the state of California,’ the letter said. ‘In addition, taking the power out of the hands of the public prosecutor to charge someone with a felony crime will have a serious impact on public safety.’”

BAM INTEL: A Growing Trend - Fusion Centers Connect Private and Public Sector Thinking

“The private sector owns about 80% of all critical infrastructure, and a communication disconnect could result catastrophically in a disaster scenario.”

Identity Resolution Daily Links 2009-06-27

Saturday, June 27th, 2009

[Post from Infoglide] The Real Test of Identity Resolution

“So the title ‘Catching Terrorists and Making the World a Safer Place’ certainly caught my eye! And the content of the post did not disappoint, as the author Chris Boorman of Informatica did a great job of crystallizing the issue that drove the creation of this blog over two years ago: ‘So how do we balance the freedom of movement we have come to expect as hard-working citizens with the need to spot terrorists?’”

[Post from Infoglide] Identity Resolution Daily on Twitter

“At Identity Resolution Daily, we often come across interesting tidbits about entity resolution, and now we can share them in real time. Just add our ID - @IDResolution - to your twitter sources. Happy tweeting!”

GreenvilleOnline.com: Consumers may see insurance rates rise

“According to Love, the average family spends about $1,000 more per year as a result of insurance fraud. That’s felt in higher insurance premiums, taxes, and the cost of goods and services, she said.”

Ezine: Fraud Alert - Lottery Retailers Win More Than Their Customers Do

“California Lottery did an undercover sting where they brought, what they knew to be, a winning lottery ticket to a retailer to have it verified. They caught many retailers on hidden camera telling them that the winning ticket was a loser and, subsequently, went on to claim the money themselves. On top of that, a statistician studied big wins of lottery retailers in Ontario, Canada and found that retailers won big jackpots a lot more than you would statistically expect them to.”

data quality pro: Rethinking Data Quality: The Need for a Data Quality Profession

“Processes, projects, products – each of these contributes to the efforts to improve data quality. But they haven’t solved the problems individually or collectively. To really make substantial and sustainable differences in the quality of data we need to take a different approach. We need to think of data quality as a profession.”


The Real Test of Identity Resolution

Wednesday, June 24th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

So the title “Catching Terrorists and Making the World a Safer Place” certainly caught my eye! And the content of the post did not disappoint, as the author Chris Boorman of Informatica did a great job of crystallizing the issue that drove the creation of this blog over two years ago: “So how do we balance the freedom of movement we have come to expect as hard-working citizens with the need to spot terrorists?” His answer is “technology” and of course we agree.

When Identity Resolution Daily first began in the summer of 2007, we pointed out the constant tension between freedom and privacy on the one hand and the need for security on the other:

In the US, the debate between personal privacy (and perhaps liberties in general) versus security is a long-standing one with roots in the very founding of the nation itself. Folks interested in obtaining data often wonder how much people are willing to give up in the name of greater security or convenience. On the other hand, those more focused on privacy worry about how data is obtained, what it’s used for and where it ends up.

Infoglide CEO Mike Shultz also discussed the responsibility that comes with providing technology that deals with identity:

It was important to all of us here that we didn’t create some sort of Big-Brother-enabling technology. As a result, we designed software that can resolve identities across multiple sources while protecting data privacy and security.

His point about the critical role of software design is an important one, and The Center for Digital Government’s white paper entitled “Resolving Identity: The Importance of Who’s Who and the Search for the Perfect Engine” delves into what technology can do to answer questions like “who’s who” and “who’s related to whom.”

In a more recent post, we talked about the components needed for an effective identity resolution solution. It’s not enough to have great similarity matching algorithms, and it’s not even enough to be able to find hidden connections in real time across millions of rows of data, although both those capabilities are obviously required. The real test in catching terrorists and making the world a safer place using identity resolution is how decision-making is automated and integrated into existing business processes.

Identity Resolution Daily on Twitter

Wednesday, June 24th, 2009

At Identity Resolution Daily, we often come across interesting tidbits about entity resolution, and now we can share them in real time. Just add our ID - @IDResolution - to your twitter sources. Happy tweeting!

Identity Resolution Daily Links 2009-06-22

Monday, June 22nd, 2009

By the Infoglide Team

intelligent enterprise: They Better Get This MDM Program Right

“As reported in The New York Times and on the TSA Web site, the Secure Flight program will improve upon current practices in matching passenger identities to watch lists in many ways. At first glance, this appears to be a well thought-out program that conforms to several basic tenets of Master Data Management (in bold below), in this case for the ‘Customer’ entity.”

EHRWMS: Georgia’s Best EMR Used By Three of Top Ten Pediatricians

“Of approximately 100 respondents, 28 used an EMR, of which 40% used the EncounterPRO Pediatric EMR. There were only three other EMRs used more than once, and they were used by only 10%, 7%, and 7% of the survey respondents respectively.”

Government Executive: Enforcement agencies boost cooperation on drug investigations

“In addition, ICE agents for the first time will fully participate in the Organized Crime Drug Enforcement Task Force Fusion Center. The center allows participating federal, state and local law enforcement agencies, including DEA and the FBI, to share information and analytical resources to enhance their overall investigative capacity.”

SmartData Collective: The Data-Information Continuum

“Data could be considered a constant while information is a variable that redefines data for each specific use. Data is not truly a constant since it is constantly changing. However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again).”

Identity Resolution Daily Links 2009-06-19

Friday, June 19th, 2009

[Post from Infoglide] Speaking of Narrative Fallacy

“Nassim Nicholas Taleb’s book The Black Swan: The Impact of the Highly Improbable uses ‘narrative fallacy’ to describe how we humans tend to enhance ex post facto our ability to predict events that in fact are extremely complex and random. A recent post on Netrics HD attempts to leverage this argument to demonstrate the superiority of ‘Machine Learning’ (i.e. probabilistic analysis) over ‘data matching’ (i.e. deterministic analysis).”

advance: Security and Privacy Challenges to EHR Adoption

“Lest we forget, our country is trying to establish similar capabilities with the widespread initiative to implement electronic health records (EHRs). My health history should travel with me — just as easily as my financial information. With some sort of authentication process, a “core” set of data should be easily available to assist in my receipt of health services.”

New York Times: Flying? Don’t Book Under a Nickname

“The government’s aim is to streamline the process of checking travelers’ names against its watch lists — a task currently handled separately by each airline — and to collect more detailed information so passengers with names similar to those on the watch list are less likely to be mistakenly detained. Asking for a birth date, for instance, decreases the likelihood that a child with a name close to one on the list would be subject to an additional search — one example of a false match that has led to complaints.”

Integrated Solutions for Retailers: Organized Retail Crime: Scope, Solutions

“Popular targets of organized retail crime rings include Crest Whitestrips, Rogaine, Similac baby formula, razor blades, and pregnancy tests. Having not been stored or managed properly, these items can pose serious health risks for innocent shoppers looking for a good bargain. And, because most of these items are sold “new in box,” well-meaning consumers are unaware that what they purchased may be spoiled or expired  —  and stolen.”

Speaking of Narrative Fallacy

Wednesday, June 17th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

Nassim Nicholas Taleb’s book The Black Swan: The Impact of the Highly Improbable uses “narrative fallacy” to describe how we humans tend to enhance ex post facto our ability to predict events that in fact are extremely complex and random. A recent post on Netrics HD attempts to leverage this argument to demonstrate the superiority of “Machine Learning” (i.e. probabilistic analysis) over “data matching” (i.e. deterministic analysis).

Product managers have a long history of creating oversimplified comparisons to competing products and technologies to demonstrate the superiority of their own. A favorite technique is to set up a straw man that can then be knocked down. In the case under discussion, the technique is to describe a “rules-based” system that is very unwieldy to use and requires huge amounts of time to tune, and to embed an underlying premise that each new application of a rules-based system starts from scratch with no accumulated domain-specific intelligence. (Of course, this doesn’t work if you choose a more intelligent identity resolution system for comparison.)

We’ve spent time here before talking about the differences between these two approaches, so I won’t restate the details. Truthfully, probabilistic systems like the one from Netrics have their place in screening large amounts of data, but like any system, they have their limitations. While they can reach a certain level of performance in emulating users’ decisions, they typically don’t leave a trail for an investigator to follow, they don’t support a rational drill-down into possible suspect transactions the way that deterministic systems do, and they don’t allow attribute-specific tweaking so you can leverage the information and better understanding that you’ve gained over time.

The larger issue is whether a solution can take advantage of appropriate technologies in appropriate circumstances (e.g. using both probabilistic and deterministic analytics in one solution), rather than being forced into an either/or, one-size-fits-all scenario. Solutions like those offered by identity resolution companies supply a framework that can incorporate all of them.
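To make the "both in one solution" idea concrete, here is a minimal Python sketch that combines a deterministic rule with a fuzzy score and records a trail an investigator could follow. It is an illustration only and does not describe any vendor's product: the record fields (`name`, `dob`), the rule that differing birth dates rule out a match, the 0.85 threshold, and the use of the standard library's `difflib.SequenceMatcher` as a stand-in for a real probabilistic model are all assumptions.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class MatchResult:
    is_match: bool
    trail: list = field(default_factory=list)  # human-readable reasons, in order


def name_similarity(a: str, b: str) -> float:
    """Fuzzy score in [0, 1]; a stand-in for a real probabilistic model."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(rec_a: dict, rec_b: dict, name_threshold: float = 0.85) -> MatchResult:
    """Compare two hypothetical records and explain the decision."""
    trail = []
    # Deterministic rule (assumed for illustration): differing birth dates rule out a match.
    if rec_a["dob"] != rec_b["dob"]:
        trail.append(f"dob differs: {rec_a['dob']} vs {rec_b['dob']}")
        return MatchResult(False, trail)
    trail.append("dob equal")
    # Fuzzy step: the threshold is an attribute-specific setting that can be tuned over time.
    score = name_similarity(rec_a["name"], rec_b["name"])
    trail.append(f"name similarity {score:.2f} (threshold {name_threshold:.2f})")
    return MatchResult(score >= name_threshold, trail)


if __name__ == "__main__":
    result = resolve(
        {"name": "Robert Smith", "dob": "1970-01-01"},
        {"name": "Robert Smyth", "dob": "1970-01-01"},
    )
    print(result.is_match)
    for step in result.trail:
        print(" -", step)
```

The trail is the point of the sketch: every decision carries the reasons behind it, and the per-attribute threshold can be adjusted as understanding of the data improves.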

Identity Resolution Daily Links 2009-06-15

Monday, June 15th, 2009

By the Infoglide Team

New England Journal of Medicine: Use of Electronic Health Records in U.S. Hospitals

“The very low levels of adoption of electronic health records in U.S. hospitals suggest that policymakers face substantial obstacles to the achievement of health care performance goals that depend on health information technology.”

Federal Computer Week: Standard updated for reporting suspicious activity

“The changes from the Office of the Director of National Intelligence’s Program Manager for the Information Sharing Environment (PM-ISE) come as that office continues a pilot program for the SAR information sharing program at sites around the country. The program uses state and local intelligence fusion centers as a node for verifying and disseminating data on suspicious activity through information technology systems.”

Travel Sentry: Secure Flight Q&A

“TSA collects as little personal information as possible to conduct effective watch list matching. Also, personal data is collected, used, distributed, stored, and disposed of in accordance with stringent guidelines and all applicable privacy laws and regulations.”

Central Valley Business Times: Three accused of multi-million workers comp fraud

“‘When businesses cheat the system to save money, they are only setting themselves up to pay later — by serving time in prison,’ says state Insurance Commissioner Steve Poizner.”

Identity Resolution Daily Links 2009-06-12

Friday, June 12th, 2009

[Post from Infoglide] Data Source Disintermediation?

“According to Wikipedia, ‘disintermediation is the removal of intermediaries in a supply chain: ‘cutting out the middleman’… Buyers bypass the middlemen (wholesalers and retailers) in order to buy directly from the manufacturer and thereby pay less.’”

[Jim Harris] OCDQ Blog: The Two Headed Monster of Data Matching

“Data matching is commonly defined as the comparison of two or more records in order to evaluate if they correspond to the same real world entity (i.e. are duplicates) or represent some other data relationship (e.g. a family household). Data matching is commonly plagued by what I refer to as The Two Headed Monster…”

CorpWatch: CorpWatch announces release of the CrocTail application and open CorpWatch API

“CrocTail provides an interface for browsing information about several hundred thousand U.S. publicly traded corporations and their many foreign and domestic subsidiaries. Information from company Securities and Exchange Commission (SEC) filings has been parsed and annotated by CorpWatch to highlight specific corporate accountability issues. CrocTail also serves as a demonstration of the features and data available through the CorpWatch API.”

Vos Is Neias: Washington - TSA Advising Travelers To Book Airline Tickets Using Full Real Names

“While the T.S.A. has announced Aug. 15 as a target date for the airlines to begin asking for each passenger’s full name, gender and date of birth, and has already begun publicizing the program, called Secure Flight, the agency acknowledged that it would go into effect in phases as the airlines update their systems.”

Data Source Disintermediation?

Wednesday, June 10th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

According to Wikipedia, “disintermediation is the removal of intermediaries in a supply chain: ‘cutting out the middleman’… Buyers bypass the middlemen (wholesalers and retailers) in order to buy directly from the manufacturer and thereby pay less.” Some famous disintermediation examples are:

•    Bookselling (e.g., Amazon’s long-tail marketing of millions of books online)
•    Travel (e.g., Southwest Airlines selling tickets direct to consumers on the web)
•    Computers (e.g., Dell selling computers direct to consumers and businesses over the internet)

Disintermediation was THE hot topic during the dot com boom, but the heady prediction that virtually every industry would be disintermediated has yet to become a reality. Nevertheless, over the past decade or so we’ve all tracked the news as one business model after another has been attacked by competitors seeking a way to “disintermediate” a particular sector.

Part of the power of identity resolution solutions derives from the data sources upon which they’re based, and both the quantity and quality of data sources can affect the results. One challenging identity resolution problem we’ve written about that relies on a variety of data sources is insider trading (see Leveraging Identity Resolution Data Sources). Drawing on multiple internal and external, public and private data sources, identity resolution unwinds multiple degrees of business, friendship, and familial relationships to uncover likely illegal stock market gains.
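As a rough illustration of what unwinding "multiple degrees" of relationships can look like, the Python sketch below runs a breadth-first search over a small relationship graph and returns the shortest chain linking two parties. Everything in it is hypothetical: the entity names, the `(entity_a, entity_b, relationship)` triple format, and the `max_degrees` cutoff are assumptions made for the example, and a real solution would first have to resolve the identities that form the nodes.

```python
from collections import deque


def degrees_of_separation(links, start, target, max_degrees=3):
    """Return the shortest relationship path from start to target, or None."""
    # Build an undirected adjacency map from (entity_a, entity_b, relationship) triples.
    graph = {}
    for a, b, rel in links:
        graph.setdefault(a, []).append((b, rel))
        graph.setdefault(b, []).append((a, rel))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) >= max_degrees:
            continue  # do not search beyond the allowed number of hops
        for neighbor, rel in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, rel, neighbor)]))
    return None


# Hypothetical relationships drawn from several imagined data sources.
links = [
    ("Executive A", "Broker B", "business"),
    ("Broker B", "Trader C", "family"),
    ("Trader C", "Investor D", "friendship"),
]

if __name__ == "__main__":
    for hop in degrees_of_separation(links, "Executive A", "Trader C") or []:
        print(hop)
```

Because the search is breadth-first, the first path found is also the shortest one, which keeps the explanation handed to an investigator as short as possible.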

Now potential disintermediation plays related to data sources are emerging. CrunchBase is a well-known example, offering a free database of technology companies, people, and investors that anyone can edit. San Francisco-based CorpWatch is a non-profit engaged in “investigative research and journalism to expose corporate malfeasance and to advocate for multinational corporate accountability and transparency”. They’ve just announced an API that makes it easier to search SEC data:

“Although the SEC provides a search interface for locating company filings (EDGAR / IDEA), the subsidiary information is not presented in a standardized format suitable for automated use or insertion into a database. The CorpWatch API uses parsers to ‘scrape’ the subsidiary relationship information from Exhibit 21 of the 10-K filings and provides a well-structured interface for programs to query and process the subsidiary data.”

The free CorpWatch API enables identity resolution and other applications to look up the formal and alternate names of corporations, ascertain their relationships to other corporations, find their locations around the world, and access other useful information. Up to now, you could only get this kind of information from relatively expensive paid subscriptions from commercial data providers.

Is it possible that the efforts of organizations like CorpWatch point to a future in which an abundance of new, free sources of data will make it even easier to create identity resolution applications?

