Archive for April, 2009

Stuck in the Middle

Wednesday, April 29th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

Clowns to the left of me, Jokers to the right,
Here I am, stuck in the middle with you.

One reason Identity Resolution Daily began two years ago was to create a venue to address privacy/security controversies. Because we supply and support core technology used in DHS’s Secure Flight program that performs airline passenger watch list matching, we established a vehicle to discuss how powerful technology can help the country be more secure while simultaneously protecting privacy.

The discussion rages on: how can we balance society’s competing needs for privacy and security? Fusion centers were created to increase the collaboration and effectiveness of law enforcement agencies in combating crime and terrorism, but now we see privacy groups and legislatures (among others) pitted against each other, and guess who’s caught in the middle? Those enforcing the law.

According to Wikipedia, “a Fusion Center is a terrorism prevention and response center” that gathers “information not only from government sources, but also from their partners in the private sector. They are designed to promote information sharing at the federal level between agencies such as the CIA, FBI and Department of Justice and at the state and local level.”

A concept developed as a response to the events of 9/11, the objective of fusion centers is to coordinate law enforcement efforts to prevent future terrorist events. While 58 state and local fusion centers have been implemented, standardization is lacking when it comes to how they operate and what they focus on.

Of course, any effort that deals with personal information produces the potential for abuse. Recent news stories have raised cries from both the left and right ends of the political spectrum, resulting in some strange partnerships.

So where do we fall on this issue? You might say we’re stuck in the middle. Like law enforcement agencies, we’re trying to do our job as best we can. In the case of the agencies, it’s catching the bad guys before they do damage, yet without infringing on citizens’ privacy. For us, that means supplying software that allows them to do just that.

Identity Resolution Daily Links 2009-04-27

Monday, April 27th, 2009

By the Infoglide Team

New York Times: Name Not on Our List? Change It, China Says

“By some estimates, 100 surnames cover 85 percent of China’s citizens. Laobaixing, or “old hundred names,” is a colloquial term for the masses. By contrast, 70,000 surnames cover 90 percent of Americans. The number of Chinese family names in use has tended to shrink as China’s population has grown, a winnowing of surnames that has occurred in many cultures over time.”

OCDQ Blog: All I Really Need To Know About Data Quality I Learned In Kindergarten

“When you present the business case for your data quality initiative to executive management and other corporate stakeholders, remember the lessons of show and tell.  Poor data quality is not a theoretical problem - it is a real business problem that negatively impacts the quality of decision critical enterprise information.”

BTNonline: Secure Flight Roils Booking Tech

“To facilitate the implementation of Secure Flight’s new data requirements for the travel industry, officials from the International Air Transport Association and Department of Homeland Security this year decided to use passenger data fields already used to transmit visa and passport information. TSA noted those IATA standards go into effect May 1.”

Security Systems News: Retail industry to ‘speak with a single voice’

“There will now be a single entity both helping to establish best practices for loss prevention and lobbying state and federal government in regard to major security issues like organized retail crime.”

Identity Resolution Daily Links 2009-04-24

Friday, April 24th, 2009

[Post from Infoglide] Solving the False Negative Problem

“In my March 25, 2009 post “The Myth of Matching,” I discussed the confusion between entity resolution and matching as in record de-duplication.  Matching is a necessary part of entity resolution, but it is not sufficient.”

Semantic Web Company: Chris Bizer: Within the corporate market, there is interest in using Linked Data as a lightweight, pay-as-you-go data integration technology

“‘I think we will see a growing number of applications that use data from the public Web as background knowledge to offer better search capabilities and to augment local content with additional content from the Web of Data.’”

TheStreet.com: Ombudsman: Iowa Lottery should focus on fraud

“‘Many of these were the types of cases where the lottery investigator would need to ‘make the case,’’ the report said. ‘Most of the time they didn’t even try.’ The report also said that even when the lottery discovered cases of fraud or theft by retailers, the retailer wasn’t held accountable.”

Security Management: Fusion Center Dialogue Continues

“We don’t have to choose between security and liberty. In order to be effective, intelligence activities need to be narrowly focused on real threats, tightly regulated and closely monitored.”

data quality pro: Expert Interview With Dan Power of Hub Solution Designs Inc.

“Sometimes, the business comes forward and says “we’ve got to have the single view of the customer”. Sometimes, IT sees it as a way to become more agile and to reduce system maintenance costs. It is pretty clear, though, that MDM initiatives are more likely to succeed when they’re driven by the business, even if it may have been originally initiated by IT.”

Solving the False Negative Problem

Wednesday, April 22nd, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In my March 25, 2009 post “The Myth of Matching,” I discussed the confusion between entity resolution and matching as in record de-duplication.  Matching is a necessary part of entity resolution, but it is not sufficient.  In particular I brought up the issue of “false negatives,” cases where records don’t match, but are in fact references to the same entity.  I used the example of Mary Doe living on Elm Street who married John Smith living on Pine Street, resulting in two references, “Mary Doe, 234 Elm St” and “Mary Smith, 456 Pine St,” that don’t match but are nevertheless references to the same person.  Let’s discuss a couple of approaches to solving this problem: enlarging the scope of identity attributes and utilizing asserted associations.

The Mary Doe - Mary Smith case might be resolved if the scope of identity attributes were increased, i.e. if additional information such as date of birth, driver’s license number, or social security number were available in both records.  But as anyone acquainted with information quality understands, acquiring and maintaining additional information can create as many problems as it solves.  It also brings up a number of questions that the information custodians and collectors must answer.

Is this information available? Is it costly? Is use for this purpose permissible/legal?  Even if expanding the number of identity attributes is an option, it is not necessarily a panacea.  Increasing the number of identity attributes also increases the complexity of the matching.  What if some values are missing?  What if some values agree, but others disagree?
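The questions about missing and conflicting values can be made concrete with a small sketch. This is only an illustration, not a method described in the post: the `compare` function, the attribute names, and the date-of-birth value are all hypothetical, and a real matcher would add weighting and fuzzy comparison on top of this.

```python
from typing import Optional

# A record maps attribute names to values; None means the value is missing.
Record = dict[str, Optional[str]]


def compare(a: Record, b: Record, attributes: list[str]) -> dict[str, list[str]]:
    """Sort each identity attribute into agree, disagree, or missing."""
    outcome: dict[str, list[str]] = {"agree": [], "disagree": [], "missing": []}
    for name in attributes:
        left, right = a.get(name), b.get(name)
        if not left or not right:
            # One side has no value, so the attribute gives no evidence either way.
            outcome["missing"].append(name)
        elif left.strip().lower() == right.strip().lower():
            outcome["agree"].append(name)
        else:
            outcome["disagree"].append(name)
    return outcome


# Hypothetical records: the date of birth is invented for illustration.
rec_a: Record = {"first_name": "Mary", "last_name": "Doe",
                 "street": "234 Elm St", "date_of_birth": "1975-03-02"}
rec_b: Record = {"first_name": "Mary", "last_name": "Smith",
                 "street": "456 Pine St", "date_of_birth": None}

result = compare(rec_a, rec_b, ["first_name", "last_name", "street", "date_of_birth"])
print(result)
```

Here the one attribute that could have settled the question is missing from the second record, so adding it to the schema did not help. Every additional attribute multiplies the agree/disagree/missing combinations that the matching rules have to cover.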

A second approach is to collect and use asserted associations.  The fundamental problem is that if Mary Doe and Mary Smith do not share any matching identity attributes, you cannot know that they are the same person without some separately acquired knowledge that they are in fact the same person.  Moreover, because not all Mary Does are the same person as Mary Smith, you also need additional context such as the address to make the connection clear.  The upshot is that you need to possess the explicit knowledge that “Mary Doe at 234 Elm St is the same person as Mary Smith at 456 Pine St.”
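One way to picture an asserted association is as an explicit link between two references, each identified by name plus address. The sketch below is a minimal illustration under that assumption, using a union-find structure as a stand-in technique. The `AssociationResolver` class, the `reference` helper, and the Oak Ave address are hypothetical and not drawn from any product or from the post.

```python
def reference(name: str, address: str) -> tuple[str, str]:
    """Build a normalized reference key from a name and an address."""
    return (name.strip().lower(), address.strip().lower())


class AssociationResolver:
    """Groups references into entities using explicitly asserted links."""

    def __init__(self) -> None:
        self._parent: dict[tuple[str, str], tuple[str, str]] = {}

    def _find(self, ref: tuple[str, str]) -> tuple[str, str]:
        # Unknown references start out as their own single-member entity.
        self._parent.setdefault(ref, ref)
        while self._parent[ref] != ref:
            # Path halving keeps later lookups short.
            self._parent[ref] = self._parent[self._parent[ref]]
            ref = self._parent[ref]
        return ref

    def assert_same(self, ref_a: tuple[str, str], ref_b: tuple[str, str]) -> None:
        """Record outside knowledge that two references are one entity."""
        self._parent[self._find(ref_a)] = self._find(ref_b)

    def same_entity(self, ref_a: tuple[str, str], ref_b: tuple[str, str]) -> bool:
        return self._find(ref_a) == self._find(ref_b)


resolver = AssociationResolver()
doe_elm = reference("Mary Doe", "234 Elm St")
smith_pine = reference("Mary Smith", "456 Pine St")
other_doe = reference("Mary Doe", "99 Oak Ave")  # hypothetical, a different Mary Doe

before = resolver.same_entity(doe_elm, smith_pine)   # no shared attributes, no link yet
resolver.assert_same(doe_elm, smith_pine)            # the asserted association
after = resolver.same_entity(doe_elm, smith_pine)
unrelated = resolver.same_entity(other_doe, smith_pine)
print(before, after, unrelated)
```

Because the key includes the address, the assertion links only the Mary Doe on Elm Street to Mary Smith. A Mary Doe at another address stays a separate entity, which is the role the extra context plays.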

If Mary lives in the United States and Mary registers her change of name and address with the US Postal Service, then you might be able to resolve this through the USPS Change of Address file.  Besides the fact that this is only helpful in the US, relying on the USPS COA file has other disadvantages, not the least of which is that Mary may have decided not to register with the USPS.  For this reason, some companies choose to maintain their own knowledge by acquiring information from other public and private sources.

For example in the US, marriage records are publicly available and are a possible source of this associative information.  It may also be true that while Mary didn’t register her change of address with the USPS, she may have wanted to avoid missing any issues of her Modern Square Dancing magazine subscription and promptly registered her change of address with the publisher.  There are potentially many other data sources, such as changes in utility service, cable service, or required licensure notifications.

Even though the application of external association information can alleviate the false negative problem, it comes at a cost.  The collection and maintenance of associative information can be a monumental task for some types of entities. For example, at least 20% of the US population moves each year.  Because it is too large a task for most organizations to take on by themselves, companies that aggregate large amounts of associative data sometimes offer the application of this knowledge as a product.

In the next installment, I will discuss another common confusion, the difference between entity resolution and identity resolution.

Identity Resolution Daily Links 2009-04-21

Tuesday, April 21st, 2009

By the Infoglide Team

Los Angeles Times: L.A. County reserve deputy is accused of fraud at his security firm

“Jane Robison, a spokeswoman for Dist. Atty. Steve Cooley, said the men created a shell company, International Armored Solutions Inc., to hide the true number of employees at the security firm to avoid paying higher workers’ compensation insurance premiums to the State Compensation Insurance Fund.”

ArticleRooms.com: The Benefits of Master Data Management

“Next, Master Data Management can also help prevent fraud. With the passing of Sarbanes-Oxley which holds executives of public companies accountable for their financial statement, these executives have now placed pressure on the organization to get things right.”

Greene County Daily World: Looking back: Area schools safer because of Columbine shooting incident

“Fusion centers are central locations where local, state and federal officials work to receive, integrate and analyze intelligence. The ultimate goal of a fusion center is to provide a mechanism where law enforcement, public safety, and private partners can come together with a common purpose and improve the ability to safeguard our homeland and prevent criminal activity.”

SmartDataCollective: Enterprise Data World 2009

[Jim Harris] “Enterprise Data World is the business world’s most comprehensive vendor-neutral educational event about data and information management.  This year’s program was bigger than ever before, with more sessions, more case studies, and more can’t-miss content.”

All About B2B: PAXLST and CUSRES – How EDI keeps our planes safe from Terrorists

“Through government ownership, the risk of security breaches is minimized and a higher level of consistency can be enforced across airlines.  In the first phase of the program, TSA will perform screening of only US domestic flights.  In future versions of the program, monitoring will expand to include international flights as well.”

Identity Resolution Daily Links 2009-04-17

Friday, April 17th, 2009

[Post from Infoglide] Identity Resolution in These “Interesting Times”

“Judging from recent remarks by Dr. Leonard Schaefer and Edward Lull Jr., the need for identity resolution is heightened by the turbulent economic circumstances of the “interesting times” we find ourselves living in.  While specifically referring to name analytics in a recent article in Bank Systems and Technology, the point applies equally to the identity resolution solutions which encompass name analytics.”

SQLblog.com: Whence Microsoft’s Master Data Management Product?

“As I mentioned in my last entry, we’ll be formally announcing the go to market plans for the master data management product soon. If the number of emails in my inbox is any indication, interest in Microsoft’s MDM offering is really heating up.”

Wolters Kluwer: Enhancing E-Verify will be a FY 2010 priority, testified USCIS acting deputy director

“In FY 2010, USCIS plans to improve the system’s ability to automatically verify international students and exchange visitors through the incorporation of Immigration and Custom Enforcement’s Student and Exchange Visitors Information System (SEVIS) data. By incorporating SEVIS nonimmigrant student visa data into the automatic initial E-Verify check, said Aytes, the number of students and exchange visitors who receive initial mismatches should be reduced.”

All About B2B: Product Master Data Management VS Product Information Management

“So my question to you is do you think that product master data management (product MDM) and product information management (PIM) are distinctly different? Is one a subset of the other? Is product MDM actually PIM in disguise or vice-versa?”

Government Security News: Beyond the fusion center: Intelligence-led policing’s central, strategic role

“Becoming an intelligence-led organization involves change from the very top to the very bottom of the law enforcement organization. A combination of threat assessment, information collection, and analysis — consistently applied to command level decision-making — is the true formula for success.”

Identity Resolution in These “Interesting Times”

Wednesday, April 15th, 2009

By Robert Barker, Infoglide Senior Vice President & Chief Marketing Officer

Judging from recent remarks by Dr. Leonard Schaefer and Edward Lull Jr., the need for identity resolution is heightened by the turbulent economic circumstances of the “interesting times” we find ourselves living in.  While specifically referring to name analytics in a recent article in Bank Systems and Technology, the point applies equally to the identity resolution solutions which encompass name analytics. The authors state that:

As big enterprise applications such as CRM are no longer the center of the IT universe, more attention is being focused on the information itself. Banks today have now become more reliant on customer information—independent of applications and business processes to make faster and smarter business decisions in response to changing market conditions.

They go on to detail four factors in international financial service organizations that drive the use of new software technologies to resolve identities using name information: compliance, customer data consolidation and quality, CRM assets review, and continuity of service.

Compliance – Key banking initiatives are anti-money laundering (AML) and know-your-customer (KYC) compliance that seek to prevent money that flows into financial institutions from ending up in the hands of prohibited groups.

Customer data consolidation and quality – The prevalence of large mergers between multi-national banks is driving requirements for name-validation “at the moment of capture” to prevent bad data from entering business systems.

CRM assets review – Combining millions of accounts from more than one bank risks overwhelming existing CRM systems, so resolving identities early in the process can mitigate the risk of future problems.

Continuity of service – Not anticipating the impact of merging large customer databases can interrupt customer service, leading to negative consequences from customer dissatisfaction all the way to losing large blocks of customers.

In Identity Resolution Daily, we’ve often written about the growing market requirement for sophisticated identity resolution technology, and we like to share relevant information from other sources. The referenced article is worth the read.

As always, let us hear your thoughts and comments.

Identity Resolution Daily Links 2009-04-13

Monday, April 13th, 2009

By the Infoglide Team

Contact Center Solutions: Master Data Management: The Importance of Relationship Management

“Businesses are looking for ways to understand family and partnership relationships to correctly determine who their best customers are, how to estimate the risk-adjusted values of customer relationships, and what the organization should offer to attract new customers and retain their best existing customers. Government agencies want to gain a deeper understanding of relationships between suspects and criminal organizations to prevent terrorist threats, money laundering and other criminal activities or unwanted events.”

Security Debrief: Secure Flight is a Milestone Achievement

“With the advent of Secure Flight, we witness an important new tool in the fight to protect our commercial aviation system while at the same time we have reduced costs to the private sector and defeated battalions of lawyers who would gladly have prevented Secure Flight from coming on line.”

data quality PRO: Identifying Duplicate Customers (Part 5)

“In this article, the fifth and final part in the series, we will discuss topics related to duplicate consolidation, including techniques for creating a “best of breed” representative record for duplicates, physical removal vs. logical linkage, and consolidation vs. cross population.”

Security Management

“Robert Riegle, director of the state and local program office of DHS’ Office of Intelligence and Analysis, said that all DHS intelligence and analysis staff assigned to state fusion centers are trained in privacy, civil rights, and civil liberties issues. He also noted that DHS has made recommendations to fusion centers to promote transparency and privacy protections among its staff.”

cbs6albany.com: Hudson Falls skydiver charged with bilking workers’ comp

“According to the State Insurance Department, the back injury that supposedly prevented Jacob Bancroft from working apparently didn’t keep him from jumping out of airplanes. The 28-year-old Hudson Falls resident has been charged with illegally collecting $83,000 in workers’ compensation benefits for a back injury he said he suffered while working as a press operator.”

Identity Resolution Daily Links 2009-04-10

Friday, April 10th, 2009

By the Infoglide Team

data quality PRO: Identifying Duplicate Customers (Part 4)

“Too many data quality initiatives fail because of lofty expectations, unmanaged scope creep, and the unrealistic perspective that problems can be permanently ‘fixed’ as opposed to needing eternal vigilance.”

nextgov: State law enforcement agencies call for national information sharing network

“While centers consolidate data to identify potential threats in their area and share information with DHS, they don’t have any means of collaborating with other centers nationwide to identify trends or to plan a more comprehensive response to far-reaching threats.”

All About B2B: Trouble Free B2B – Is It Possible?

“Without a product master data management solution to manage the product information across all channels, think about how many ways and places there are to make mistakes: dimensions, weight, UPC, orderable units and pricing with new product introductions, promotions, purchase orders, advanced ship notices, shippers, pack lists, space planning, inventory, sales, and more.”

The Heritage Foundation: Secure Flight Program Creates Safer Skies

“Secure Flight checks a passenger’s data against a federal database of the FBI Terrorist Screening Center—a center that integrates all available information on known or suspected terrorists into a central repository. While alternative proposals considered prior to Secure Flight would have tasked the airline industry with this screening process, under this program the airlines’ only charge is to gather basic information (full name, date of birth, and gender) when the passenger makes a reservation.”

Data Quality, Entity Resolution, and OFAC Compliance

Wednesday, April 8th, 2009

By Robert Barker, Infoglide Senior Vice President & Chief Marketing Officer

In a February post blogger Steve Sarsfield talked about government mandates that direct financial institutions to avoid doing business with known “bad guys”:

The mandates have to do with the lists of terrorists offered by the European Union, Australia, Canada and the United States. For example, in the U.S., the US Treasury Department publishes a list of terrorists and narcotics traffickers. These individuals and companies are called “Specially Designated Nationals” or “SDNs.” Their assets are blocked and companies in the U.S. are discouraged from dealing with them by the Office of Foreign Asset Control (OFAC)… If your company fails to identify and block a bad guy… there could be real world consequences such as an enforcement action against your bank or company, and negative publicity.

He goes on to describe the role that data quality software plays in addressing the problem. While I agree with Steve that improving data quality is an important component of some solutions, I’d emphasize that it’s critical to know when and where to improve it. Too much “quality” can actually hurt a solution’s effectiveness.

Professor John Talburt (ERIQ) illustrated this notion in a recent guest post. Making the case for using entity resolution to find hidden relationships, he first showed how the absence of sufficient attributes can cause false positives. He then went on to say:

Even given that the set of identity attributes is large enough to avoid a false positive, the larger problem with matching as a surrogate for entity resolution is that it produces false negatives.  For example, “Mary Doe, 234 Elm St” and “Mary Smith, 456 Pine St” do not match, but does that mean they are not references to the same entity?  It could very well be the case that Mary Doe married John Smith and moved to his house at 456 Pine St.

So in looking for bad actors, suppose the address in one of the two records above had been resolved to the “correct” address by applying data quality software before using entity resolution to search for hidden relationships. We might have never discovered that Mary Doe married the nefarious John Smith who is on the OFAC list!

If the goal of the solution is merely compliance with a minimum of false positives, data quality can help achieve that goal. But if the goal of the solution is to find bad guys by discovering non-obvious relationships, false negatives are a more important consideration. While false positives are a costly annoyance that requires extra resources to resolve, false negatives can mean missing bad guys altogether, and that hurts much more than the bottom line. It can mean not complying with the mandates.
