
Playing the Name Game with Terrorist Watch Lists and Shoplifter Databases

By Infoglide Software Architect Brian Calvert

Data mining to improve terrorist watch lists can get complicated. Consider Arabic names, for example. There are over 200 ways to spell the name Mohammed, according to Arabic expert Thomas Milo, and this makes things extremely difficult for those in charge of securing U.S. borders. The complexity of identity resolution is not just a national security issue; it also affects retailers’ loss prevention efforts.

In The Politics of Naming: Post-9/11 Security and the Transliteration of Arabic Names, Courtney C. Radsch has written a great post on the difficulty of transliterating Arabic names that is a must-read for identity resolution professionals. Ms. Radsch is a freelance journalist who has written for publications like the New Times, and she is currently working on her Ph.D. dissertation on foreign policy and the influence of the Arab media.

In her post about the complexity of transliterating Arabic names, Ms. Radsch writes,

“A major challenge to Romanizing Arabic names is that many sounds in Arabic are unwritten, like short vowels, or are represented by a single letter in English. Diacritical marks and unwritten vowels impact the pronunciation and meaning of a word, but are not usually written. The name Rabih, which contains only four letters in Arabic, can also be spelled Rabia, Rabiaa or Rabiha.

“Furthermore, if the standard transliteration were to use two English letters to represent a single Arabic one, such as sh or ‘th’ it would be impossible to tell the difference between ‘th’ as in thin or the two distinct sounds as in ‘must have.’”

Ms. Radsch later contemplates the pros and cons of an international transliteration standard and quotes Jack Hermenson, CEO of recently acquired Language Analysis Systems, who says,

“‘We can’t change the way people write their names but we can make computers and people smarter in the way they search for names,’ he said, adding that every standard was inadequate in some way. ‘It’ll make some congressman happy to not see names spelled many different ways, but it’s not going to make the borders more secure.’”

The language complexities pointed out in Ms. Radsch’s post illustrate the difficulties in resolving identities, and this holds true for both national security officials and retail loss prevention professionals. Whether you are at the ticket counter talking to an airline passenger or at the returns desk with a customer, you need a sophisticated solution to work out who is who.

When it comes to making computers and people smarter in their search for names, there are vast differences between simple name matching or similarity technology and sophisticated identity resolution software such as Infoglide Software’s Identity Resolution Engine (IRE), which incorporates over 70 carefully tailored algorithms to reliably compare specific types of data.

While the risks of misidentifying a terrorist are much higher, both homeland and retail security pros can use IRE’s patented technology to handle the name variations below and separate the good guys from the bad guys:

  • Exact name matching is clearly not sufficient because it results in false negatives (e.g., “Zachary Taylor” versus “Zachery Taylor,” a misspelling, or “Zachray Taylor,” a typographical error).
  • Name transposition can also introduce confusion (e.g., “Chester Arthur” or “Arthur Chester”). Resolution rules need to take this into account.
  • Similarity measures are a must, but unfortunately, many people consider common mechanisms such as “soundex” and “metaphone” to be sufficient. They’re not. Many are susceptible to false negatives (failed matches that should have succeeded) due to misspellings or common variations, false positives (matches that should have been rejected) due to gross simplifications, and sensitivity to cultural variations. For more, see Soundex Shortcomings and Variations.
  • With soundex, a different first character can cause a false negative, as in “Carter” versus “Karter.”
  • Omissions or substitutions can complicate matters. Might “William Jefferson Clinton” and “William Jefferson Blythe Clinton” and “William Clinton” be the same person? What about “Jefferson Clinton?”
  • Abbreviations introduce their own challenges. Are “George Herbert Walker Bush” and “George H W Bush” and “George H Bush” the same person? (Rules enter the picture again: will additional information – such as address or date of birth – clarify the situation?)
  • Nicknames take the cake. Are “John Kennedy,” “Jack Kennedy” and “Johnny Kennedy” potentially the same person? How about “William Clinton” and “Billy Clinton?”
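
To make the list above concrete, here is a minimal Python sketch of two of the ideas: classic Soundex, and a looser token comparison that tolerates transposition, initials, omissions, nicknames and small spelling differences. It is only an illustration and not how IRE works; the names soundex, possibly_same and NICKNAMES, the tiny nickname table, and the 0.8 similarity threshold are all assumptions made up for this example.

```python
from difflib import SequenceMatcher

# American Soundex digit codes; vowels and h, w, y carry no code.
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def soundex(name: str) -> str:
    """Classic four-character Soundex: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


# Hypothetical, deliberately tiny nickname table (an assumption for this sketch).
NICKNAMES = {"jack": "john", "johnny": "john", "bill": "william", "billy": "william"}


def _tokens(name: str) -> list[str]:
    """Lower-case the name, split it into words, and expand known nicknames."""
    return [NICKNAMES.get(t, t) for t in name.lower().replace(".", " ").split()]


def _token_match(a: str, b: str) -> bool:
    """Equal, an initial of the other, or close in spelling (assumed 0.8 cutoff)."""
    if a == b:
        return True
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    return SequenceMatcher(None, a, b).ratio() >= 0.8


def possibly_same(name_a: str, name_b: str) -> bool:
    """True if every word of the shorter name matches a distinct word of the
    longer name, in any order. Greedy and permissive: a candidate flag only."""
    short, long_ = sorted((_tokens(name_a), _tokens(name_b)), key=len)
    remaining = list(long_)
    for t in short:
        hit = next((u for u in remaining if _token_match(t, u)), None)
        if hit is None:
            return False
        remaining.remove(hit)
    return bool(short)


if __name__ == "__main__":
    print(soundex("Carter"), soundex("Karter"))  # codes differ only in the first letter
    print(possibly_same("Carter", "Karter"))
    print(possibly_same("George Herbert Walker Bush", "George H W Bush"))
```

A comparison this permissive will produce false positives of its own, which is why rules that bring in additional information such as address or date of birth still matter.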

And that’s just names. Street addresses, cities and states, even zip and postal codes each have their own peculiarities that need to be handled if you incorporate either a terrorist watch list or a known shoplifter database into an identity resolution solution. If you do not want to “compound the hassle people who share the same or similar names with those on the watch list feel,” highly sophisticated software is required.
