Identity Resolution Daily Links 2009-09-04

September 4th, 2009

[Post from Infoglide] Shell Games

“We’ve talked before about how some employers will dissolve a company, then re-form it with the same people but under a new name. The objective? Reduce payments to workers’ compensation programs, where premiums are based on the historical level of claims. Erase the history by forming a new company, and voila! Your premiums are now lower, but there’s a catch – doing that constitutes fraud and it’s illegal.”

OCDQ Blog: To Parse or Not To Parse

[Jim Harris] “Data matching often uses data standardization to prepare its input.  This allows for more direct and reliable comparisons of parsed sub-fields with standardized values, decreases the failure to match records because of data variations, and increases the probability of effective match results.”

kpvi.com: Idaho Falls Woman Arrested in Undercover Lottery Sting

“An undercover operative gave McKelley a decoy lottery ticket that McKelley thought was worth at least $100,000. She kept the ticket and took [it] to Boise to the Idaho Lottery Headquarters to collect. Police arrested her when she showed up and she now faces felony charges of presenting an illegally obtained lottery ticket.”

Initiate Blog: Data Hubs: Master Data Repository or Master Data Service?

“The current reality is that the concept of a data hub includes a much more active approach to data than just storage of a “golden record”. The data hub makes the best decisions on entity and relationship resolution by arbitrating the content of data in the source systems where the master data is created.”

Shell Games

September 2nd, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

We’ve talked before about how some employers will dissolve a company, then re-form it with the same people but under a new name. The objective? Reduce payments to workers’ compensation programs, where premiums are based on the historical level of claims. Erase the history by forming a new company, and voila! Your premiums are now lower, but there’s a catch – doing that constitutes fraud and it’s illegal.

Now I just read a Computerworld article about what appears to be a similar scheme.  In the largest H-1B fraud case ever brought by the federal government, prosecutors have filed against a New Jersey IT services company that recruits talent from overseas and gets them H-1B visas. On the surface, that sounds legal, right?

Yes, but you’re required to pay those imported workers at the prevailing wage rate of the state in which they’ll be working. The law won’t let you pay less than the prevailing rate because that would penalize U.S. citizens who could be hired to do the same job.

So, this New Jersey firm allegedly decided to improve their profitability by creating shell firms in Iowa and obtaining the H-1B visas there, where the prevailing wage rate is significantly lower than wages in New Jersey. Oops – now there’s an 18-count indictment against them because they “have substantially deprived U.S. citizens of employment.”

So how would you automate detecting similar situations? I confess I’m not familiar with exactly how this company was caught, but it seems like an obvious opportunity to apply entity resolution technology. In this case, you’d use identity resolution software (see IRE as an example) to connect to multiple data sources, such as a database of H-1B applicants in various states and a database of companies that are H-1B filers, along with other sources such as income tax filings and various corporate filings. Then you’d let the software do the work of finding the hidden connections.
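
As a rough illustration of what "finding the hidden connections" could look like, here is a minimal Python sketch that flags companies registered in different states that share officers. The company names, officer names, the `normalize` helper, and the `min_shared` threshold are all invented for illustration. This is not how IRE or any real product works, and a real system would use far more robust name matching and many more data sources.

```python
from itertools import combinations

# Hypothetical corporate filings: (company name, state, officer names).
# All values are invented for illustration only.
filings = [
    ("Acme IT Services", "NJ", {"R. Patel", "J. Doe"}),
    ("Hawkeye Tech Staffing", "IA", {"J. Doe", "R. Patel"}),
    ("Prairie Consulting", "IA", {"M. Lee"}),
]

def normalize(name):
    """Crude standardization: lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def shared_officer_links(records, min_shared=2):
    """Return pairs of companies in different states that share officers."""
    links = []
    for (n1, s1, o1), (n2, s2, o2) in combinations(records, 2):
        common = {normalize(o) for o in o1} & {normalize(o) for o in o2}
        if s1 != s2 and len(common) >= min_shared:
            links.append((n1, n2, sorted(common)))
    return links

links = shared_officer_links(filings)
for first, second, officers in links:
    print(first, "<->", second, "share officers:", officers)
```

A match like this is only a lead for an investigator, not proof of fraud. Plenty of legitimate companies share officers across state lines.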

Applying this technique to workers’ comp fraud where company officers similarly have formed shells and even dissolved existing companies and started new ones, the results have been quite successful. It would be interesting to consider a similar solution with someone with H-1B investigation experience.

If you’re out there, what are your thoughts?

Identity Resolution Daily Links 2009-08-31

August 31st, 2009

By the Infoglide Team

SearchDataManagement.com: Poor data quality costing companies millions of dollars annually

“The average organization surveyed by Gartner said it loses $8.2 million annually through poor data quality. Further, of the 140 companies surveyed, 22% estimated their annual losses resulting from bad data at $20 million. Four percent put that figure as high as an astounding $100 million.”

A Software Insider’s Point of View: Monday’s Musings: Why Every Social CRM Initiative Needs An MDM Backbone

“Despite the massive scale of collected, fragmented data, Social CRM initiatives complement other relationship management initiatives in asking and answering key questions such as:
* Do we know the identity of the individual?
* Can we tell if there are any apparent and potential relationships?
* How do we know whether or not we have a false positive?”

azcentral.com: East Valley Fusion Center on cutting edge of law enforcement

“And since coming online Sept. 1, 2007, Mesa Police Sgt. Lance Heivilin said the capabilities of the center and its ability to search and manipulate various crime data have only grown. ‘Our first year we were involved in about 200 cases,’ Heivilin said. ‘In 2008, it was 800 investigations. This year we passed our 2008 numbers in July.’”

Identity Resolution Daily Links 2009-08-28

August 28th, 2009

[Post from Infoglide] Internal and External Views of Identity

“In an earlier post, I stated my view that identity resolution and entity resolution are somewhat different processes.  In particular, I consider identity resolution as a special form of entity resolution in which entity references are resolved by comparing them to the characteristics of a given set of known entities.  Regardless of the approach, identity plays an important role in all forms of entity resolution.”

Modern Medicine: State privacy laws deter EHR adoption in hospitals

“The study looked at 19 states, and shows that states that have enacted medical privacy laws restricting hospitals from disclosing patient information have seen a 24 percent reduction in EHR adoption over a ten-year period, while states without these regulations experienced a 21 percent gain in hospital EHR adoption.”

Workers’ Comp Kit Blog: CALIFORNIA Millions of Dollars Medical Insurance Fraud Scheme

“The defendants  in the outpatient surgery center were accused of participating in a $154 million medical insurance fraud scheme by recruiting 2,841 healthy people nationwide and bribing them with money or low cost cosmetic surgery, to receive unnecessary and dangerous surgeries and submitting fraudulent claims to medical insurance companies.”

OCDQ Blog: Adventures in Data Profiling (Part 4)

“In Part 4, you will continue your adventures in data profiling by going postal…postal address that is, by first analyzing the following fields: City Name, State Abbreviation, Zip Code, and Country Code.”

Internal and External Views of Identity

August 27th, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In an earlier post, I stated my view that identity resolution and entity resolution are somewhat different processes.  In particular, I consider identity resolution as a special form of entity resolution in which entity references are resolved by comparing them to the characteristics of a given set of known entities.  Regardless of the approach, identity plays an important role in all forms of entity resolution.

The identity of an entity is a set of attributes and rules for comparing the attribute values that allow it to be distinguished from all other entities of the same type in a given context.  A key feature is that identity is context-dependent, i.e., it depends upon the total set of entities under consideration.  For example, a common scheme for creating email addresses in an organization uses a person’s first two initials and last name, e.g. jrtalburt.  In a small organization, this is usually sufficient to make a unique address for each employee.  However, applying the same scheme to a much larger pool of users, such as the yahoo.com or gmail.com domains, quickly shows that these attributes are insufficient.
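
The email example can be made concrete with a small Python sketch. The people listed below and the `email_local_part` helper are invented for illustration. The point is only that the same attribute set that is unique in a small pool produces collisions once the pool grows.

```python
from collections import Counter

def email_local_part(first, middle, last):
    """Build an address from the first two initials plus the last name."""
    return (first[0] + middle[0] + last).lower()

def collisions(people):
    """Return the addresses that more than one person would receive."""
    counts = Counter(email_local_part(*person) for person in people)
    return {address: n for address, n in counts.items() if n > 1}

# Hypothetical people, invented for illustration.
small_org = [("John", "R", "Talburt"), ("Mary", "A", "Smith")]
large_pool = small_org + [("Jane", "R", "Talburt"), ("Mark", "A", "Smith")]

print(collisions(small_org))   # no collisions in the small context
print(collisions(large_pool))  # the same scheme now fails to distinguish people
```

In the larger context, more attributes (or different comparison rules) would be needed before the scheme could serve as an identity.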

For a more relevant business example, consider the case of a customer, Mary Smith.  For simplicity, assume that the totality of her adult residential address history comprises:
1.    Mary Smith, 123 Oak St, Anytown, NY, 1998-06 to 2000-03
2.    Mary Jones, 234 Elm St, Anytown, NY, 2000-04 to 2002-11
3.    Mary Jones, 345 Pine St, Anytown, NY, 2002-12 to present

Despite having used 2 names and 3 addresses, these are all references to the same person. There are two ways to view the issue of identity as illustrated by this history.

One is to start with the identity based on vital statistics, e.g. Mary Smith, a female born on December 3, 1980, in Anytown, NY, to parents Robert and Susan Smith, then to follow that identity through its various representations of name and address as shown above.  This “internal view of identity” is the view of Mary Smith herself and might well be the view of a sibling or other close relative, someone with complete knowledge about her address history.  The internal view of identity represents a closed universe model in which all of the possible occupancy variants are known to the internal viewer (system) and any occupancy record not equivalent to one of the known variants must belong to some other identity.

On the other hand, an external view of identity is one in which some number of address records for a customer’s identity have been linked, but the viewer (system) does not know if it is the complete history.  Given another customer address record not equivalent to one of the records in the history, it must be determined if it does or does not belong to Mary’s history.

Suppose that a system has only the first two address records of Mary’s history.  In this case, the system’s knowledge of Mary’s identity would be incomplete.  It may be incomplete either because the third address record is not in the system (has not been acquired) or because the system hasn’t linked it to the first two records.  In the latter case, the system would assume that the third record is part of a different customer’s identity.  Even though an internal viewer would know that the third address record should also be part of Mary’s complete history, the external viewer has not made that determination.

Conversely, an external viewer may assemble an inaccurate view of Mary’s history by linking the first two records of her address history to an address for a different Mary Smith.  These entity resolution failures, incomplete and inaccurate histories, are information quality dimensions and indicate why the areas of entity resolution and information quality are so closely related. (Several classes of failures were discussed in another recent post.)

In an external view, the identity of the customer is equivalent to the set of occupancy records that have been resolved (i.e. linked).  The known address records comprise the external viewer’s (or system’s) entire knowledge of the customer’s identity.  If additional occupancy records are acquired and are correctly determined to be for this same customer, then the system’s knowledge about this identity increases.

The external view of identity reflects the experience of a business or government agency using entity resolution tools and processes in an effort to link disparate records into a single view of a customer or agency client.  The “external view of identity” represents an open universe model because if the system is presented with a new occupancy record, it does not necessarily follow that the new record must be a part of a different identity.  It may or may not be part of an existing identity, something that the ER process must decide.
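
The difference between the closed and open universe models can be sketched in a few lines of Python. The `classify` function and its return labels are hypothetical names chosen for this illustration. The records come from the Mary Smith example, with the system holding only the first two.

```python
# Occupancy records as (name, address) pairs, from the Mary Smith example.
# The system is assumed to hold only the first two records of her history.
KNOWN_VARIANTS = {
    ("Mary Smith", "123 Oak St, Anytown, NY"),
    ("Mary Jones", "234 Elm St, Anytown, NY"),
}

def classify(record, known_variants, closed_universe):
    """Say what follows when a record is compared to the known variants."""
    if record in known_variants:
        return "same identity"
    if closed_universe:
        # Closed universe: the variant list is assumed complete,
        # so any other record must belong to someone else.
        return "different identity"
    # Open universe: the list may be incomplete, so absence proves nothing
    # and the ER process still has to decide.
    return "undecided"

new_record = ("Mary Jones", "345 Pine St, Anytown, NY")
closed_answer = classify(new_record, KNOWN_VARIANTS, closed_universe=True)
open_answer = classify(new_record, KNOWN_VARIANTS, closed_universe=False)
print(closed_answer, "|", open_answer)
```

Treating an incomplete list as if it were closed is exactly how the third record would end up assigned to a different customer.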

The major point to note is that an internal viewer is in a position to judge the quality of an external view.  With complete knowledge, the internal viewer can determine if any particular external viewer has omitted some records (completeness) or has linked records from different identities or failed to link records for the same identity (accuracy).
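
To illustrate how an internal viewer might grade an external view, the Python sketch below compares Mary’s true history with what a system has acquired and linked. The record labels and the `assess_external_view` function are assumptions made for this example. It is not the index mentioned in the next paragraph.

```python
def assess_external_view(true_history, acquired, linked):
    """Compare an external view of one identity against the internal view.

    true_history: records that really belong to the identity (internal view)
    acquired:     all records the external system holds
    linked:       records the external system has linked to this identity
    """
    return {
        # Completeness problem: true records the system never acquired.
        "omitted": sorted(true_history - acquired),
        # Accuracy problem: true records acquired but not linked.
        "failed_to_link": sorted((true_history & acquired) - linked),
        # Accuracy problem: linked records that belong to someone else.
        "wrongly_linked": sorted(linked - true_history),
    }

# Hypothetical record labels for Mary's three addresses plus one record
# belonging to a different Mary Smith.
mary = {"oak_st", "elm_st", "pine_st"}

incomplete = assess_external_view(
    mary, acquired={"oak_st", "elm_st"}, linked={"oak_st", "elm_st"})
inaccurate = assess_external_view(
    mary,
    acquired={"oak_st", "elm_st", "pine_st", "other_mary"},
    linked={"oak_st", "elm_st", "other_mary"})
print(incomplete)
print(inaccurate)
```

Only a viewer with complete knowledge can fill in `true_history`, which is why the internal view serves as the yardstick.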

Along with Dr. Wang at MIT, I have introduced a quality metric in the form of an index for assessing the similarity of two identity resolutions.  In cases where one resolution represents an internal view (correct) and the other is an external view, the index provides a metric for entity resolution accuracy. I plan to explain this metric in my next post.

Identity Resolution Daily Links 2009-08-24

August 24th, 2009

By the Infoglide Team

CRMBuyer: The BI Outlook: A Bright Spot of Growth in a Gloomy Economy

“Investing in business intelligence is important for a company now more than ever, agreed Bill Barberg, president of Insightformation and an expert in Balanced Scorecard methodology. Sound business intelligence helps companies make fact-based decisions as they try to navigate in today’s stormy economy, he told CRM Buyer. ‘Business intelligence can help companies make much better decisions,’ he said.”

OCDQ Blog: Adventures in Data Profiling (Part 3)

“In Part 3, you will continue your adventures by using a combination of field values and field formats to begin your analysis of the following fields: Birth Date, Telephone Number and E-mail Address.”

SearchSOA.com: SOA with MDM prevents messaging confusion

“Increasingly, organizations are designing SOA into the MDM architecture from the beginning, says Dan Power, president and founder of consulting firm Hub Solution Designs Inc. in Hingham, Mass. This creates challenges in meshing the real-time realities with the need to keep the data accurate.”

iHealthBeat: Privacy and Security: Experts Focus on Legal Issues Surrounding EHR Use at AHIMA Summit

“Linda Kloss, AHIMA CEO, said many vendors have not focused on developing legally defensible EHR systems. In addition, health care providers have not created a demand for such functionality.”

Identity Resolution Daily Links 2009-08-21

August 21st, 2009

[Post from Infoglide] Walking the Privacy/Security Tightrope

“In a post last April, we talked about the privacy/security balance issue for fusion centers and for vendors with supporting technology. Now an article in the Austin Sunday paper about a proposed fusion center again highlights the tension between security and privacy. Each time a fusion center is proposed, the story goes like this…”

information management: MDM for Tough Times: 5 trends to strengthen organizations during recession

[Aaron Zornes] “Enterprise MDM solutions are steadily but rapidly evolving away from data-centric hubs into full-blown application stacks. In other words, MDM is becoming less of a standalone technology infrastructure as the emphasis is increasingly on relationships between domains, user interface and integration with other emerging and adjacent technologies such as RFID, entity analytics and business intelligence.”

InformationWeek: Healthcare Tech: Can BI Help Save The System?

“Healthcare IT is a good place to be these days. While IT budgets in many verticals have been tightly reined, healthcare is enjoying multiple government mandates. This has resulted in an infusion of funds to modernize and integrate IT infrastructure, applications, and data. However, we aren’t starting from a high ground. There are multiple challenges to attaining a 21st century-grade IT environment.”

OCDQ Blog: Adventures in Data Profiling (Part 2)

“The adventures began with the following scenario – You are an external consultant on a new data quality initiative.  You have got 3,338,190 customer records to analyze, a robust data profiling tool, half a case of Mountain Dew, it’s dark, and you’re wearing sunglasses…ok, maybe not those last two or three things – but the rest is true.”

VIDEO: Interview with Secure Flight

TSA Secure Flight Program Director Paul Leyh is interviewed about recent developments.

Walking the Privacy/Security Tightrope

August 19th, 2009

By Mike Shultz, Infoglide Software CEO

In a post last April, we talked about the privacy/security balance issue for fusion centers and for vendors with supporting technology. Now an article in the Austin Sunday paper about a proposed fusion center again highlights the tension between security and privacy. Each time a fusion center is proposed, the story goes like this:

“Local law enforcement officials see benefit of two-way information sharing with other local, state, and national agencies… privacy groups are concerned about unnecessary intrusions into personal information.”

As of July 2009, 72 such centers have been put in place and are operational across the country. The Department of Homeland Security (DHS), in conjunction with the Justice Department, has tried to address the need for consistent operating principles. Starting in 2005, they published and continue to maintain a set of guidelines suggesting how to establish collaboration and data sharing between agencies while protecting the privacy and civil liberties of citizens.

It would be nice to report that every fusion center has performed flawlessly in solving crimes while preserving American freedoms. But because the centers are run by human beings, execution hasn’t always fallen within the guidelines. There are instances where the centers have been ineffective, and there are instances where controversial privacy issues have been raised when centers overstepped their bounds.

The Austin American Statesman article presented a balanced view of the issues surrounding fusion centers without sensationalizing them. Instances of controversies surrounding fusion centers were discussed, yet instances of the benefits of existing centers were also given.

As Jack Thomas Tomarchio, former deputy undersecretary for intelligence and analysis operations at DHS, was quoted as saying, “These things are brand new. They haven’t been around 20 years, and even the ones that have been around three or four years are still in their formative years. In many cases, they don’t have a track record.”

While existing software technology addresses both privacy and security issues, the ultimate decision to use it wisely falls to the people who run the fusion centers. In the City of Austin case, the concerns of privacy and security seem to be receiving equal consideration so that the best results can be achieved without trampling on civil liberties.

Identity Resolution Daily Links 2009-08-17

August 17th, 2009

By the Infoglide Team

Jeff Jonas: Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!

“I’ve seen a lot of data in my life, and I’d like to think I have a decent grip on what can be accomplished with data and analytics.  However, I recently stumbled upon some facts that have radically reshaped my understanding of the world we are living in.  What I thought was years away is already here! Our toes are dangling over the edge of a very different future.”

information management: Styles and Architectures for Master Data Management

“We have conducted a survey aimed at gaining deeper insight into the views and plans of businesses regarding their current or planned MDM initiatives, focused on the styles and architectures adopted or planned to be implemented.”

statesman.com: Police, sheriffs establishing regional intelligence center

“David Carter, an Austin assistant police chief in charge of the intelligence center project, said analysts stationed at the facility also will stitch together information collected by various agencies to create new files on suspects in criminal cases or on suspects they think may be planning to carry out crimes and merit further surveillance.”

Secure Flight on CBS: View story broadcast on August 13

Identity Resolution Daily Links 2009-08-14

August 14th, 2009

[Post from Infoglide] Vetting Sharks and Whales

“If you’re not in the casino industry, the title of this post may be meaningless, but for casino managers, “sharks” are the bad guys and “whales” are the good guys. Sharks are people who try to defraud the casino through illegal activities, while whales are the high rollers who are apt to win $20,000 one trip and lose $25,000 the next. If there’s any environment where you’d be motivated as a businessperson to know as much as you can about who you’re dealing with, it’s a casino.”

DATAWARE HOUSING: Business Intelligence and Identity Recognition—IBM’s Entity Analytics

“This article will define master data management (MDM) and explain how customer data integration (CDI) fits within MDM’s framework. Additionally, this article will provide an understanding of how MDM and CDI differ from entity analytics, outline their practical uses, and discuss how organizations can leverage their benefits.”

Workers’ Comp Kit Blog: Failure to Pay Workers Compensation Premiums

“A New York asbestos  contractor failed to pay $1.6 Million in workers’ compensation premiums and will serve four years in prison. Upon his release he will be deported to his home country as he is an illegal immigrant… He repeatedly changed the name of his company.”

The TSA Blog: Secure Flight Q&A II

“Each one of these layers alone is capable of stopping a terrorist attack. In combination their security value is multiplied, creating a much stronger, formidable system. A terrorist who has to overcome multiple security layers in order to carry out an attack is more likely to be pre-empted, deterred, or to fail during the attempt.”

