By Robert Barker, Infoglide Senior VP & Chief Marketing Officer
Paul Rosenzweig, former Deputy Assistant Secretary for Policy at the Department of Homeland Security, recently posted an intriguing piece in the Harvard National Security Journal about connecting the dots regarding the Christmas Bomber. He makes a strong case that a 2003 decision to stop research on data analytics tools has contributed to the problem analysts face today in making sense of the massive and manifold data sources they sift through.
Initiating more research would clearly add to the tools that analysts have at their disposal. At the same time, applying existing entity resolution software technology to more data sources could add significant firepower and help address the data challenge.
Let’s examine four issues Mr. Rosenzweig raised and evaluate the current state of entity resolution technology to address each issue:
1. Data Volume
“This is a veritable flood of data. In hindsight, of course, it is very easy to see the pieces that connect together to form a picture of Abdulmutallab’s plot. But those 10 or so bits of information were floating in an ocean of other data—literally millions of different individual entries from thousands of different sources in a host of different databases.”
Existing entity resolution technology scales to handle tens of millions of transactions daily. While the “flood of data” would likely test the limits of existing systems, it’s not clear whether the required scalability is limited by the software itself or is simply a matter of establishing well-founded rules and adding enough hardware capacity.
2. Real-Time Analysis
“We continue to rely on the intuition of analysts to provide the insight we need. It is all well and good to say ‘with the NSA intercept about a Nigerian we should have started looking at all Nigerians’ or ‘we should have begun looking at everyone named Umar Farouk,’ but those leaps of insight and anticipation are not routine—they require analysis and consideration. And that requires time—time to ponder the necessity of making precisely that inquiry. But time is what our analysts don’t have. At least not enough of it. Not with the flood of data we are seeing. They have to prioritize and move certain lines of inquiry to the top of the pile.”
Crucial attributes of entity resolution technology are its ability to (a) process massive amounts of data in real time and (b) make automated decisions that prioritize the importance of each element. Entity resolution will never displace trained analysts, but its ability to sift through millions of pieces of data to produce a prioritized list of the most important potential connections offers the best way to fully exploit analysts’ brainpower and accelerate the process of detecting impending terrorism.
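To make the idea of a prioritized list concrete, here is a minimal Python sketch. It assumes each potential connection has already been given a numeric score by some upstream process; the `Connection` type, the `top_connections` function, and the sample scores are all hypothetical and do not describe any particular product.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Connection(NamedTuple):
    """Hypothetical scored link between two records."""
    score: float
    description: str


def top_connections(stream: Iterable[Connection], k: int = 3) -> List[Connection]:
    """Return the k highest-scoring connections, best first.

    heapq.nlargest keeps only k items in memory at a time, so the
    stream can be far larger than what an analyst could ever read.
    """
    return heapq.nlargest(k, stream, key=lambda c: c.score)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    incoming = (
        Connection(s, f"candidate link #{i}")
        for i, s in enumerate([0.10, 0.95, 0.30, 0.80, 0.05, 0.60])
    )
    for conn in top_connections(incoming):
        print(f"{conn.score:.2f}  {conn.description}")
```

The point of the sketch is only that the analyst sees a short, ranked list while the software absorbs the volume.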
3. Automated Scoring
“What we lack is not human intuition. Rather we lack the tools to make human intuition effective and automated. The head of the NCTC told a rather shocked Senate committee the other day that, in effect, NCTC analysts don’t have a “Google‐like” tool for database inquiries. They can’t, for example, simply type in ‘Umar Farouk’ and pull up all the pages with links to that name.”
While a “Google-like” tool isn’t currently in use, the components needed to build one are available. Connected to the appropriate data sources, some of the more powerful entity analytics software can run a “similarity search” on a name across multiple disparate (and even remote) databases. The software detects similar attributes across multiple identities and combines them to yield a broad picture of an individual’s activities as documented in those sources.
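As a rough illustration of what a name “similarity search” across several sources might look like, here is a small Python sketch. It uses the standard library’s `difflib.SequenceMatcher` as a stand-in for a real similarity algorithm; commercial entity resolution software uses its own, more sophisticated matching. The source names, records, and the 0.8 threshold are assumptions made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical sample data: two made-up "databases" keyed by source name.
SOURCES = {
    "visa_applications": [
        {"name": "Umar Farouk", "record_id": "V-1"},
        {"name": "John Smith", "record_id": "V-2"},
    ],
    "flight_manifests": [
        {"name": "Omar Farouk", "record_id": "F-1"},
    ],
}


def name_similarity(a, b):
    """Return a similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_search(query, sources, threshold=0.8):
    """Search every source for names similar to the query.

    Returns (score, source_name, record) tuples, best match first.
    The threshold is an arbitrary illustrative cutoff.
    """
    hits = []
    for source_name, records in sources.items():
        for record in records:
            score = name_similarity(query, record["name"])
            if score >= threshold:
                hits.append((score, source_name, record))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    for score, source, record in similarity_search("Umar Farouk", SOURCES):
        print(f"{score:.2f}  {source}  {record['name']}")
```

In this toy data set, a search for “Umar Farouk” surfaces both the exact spelling and the “Omar Farouk” variant from a different source, while the unrelated name is filtered out.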
4. Multiple Attributes
“But even that wouldn’t be enough—because there would likely still be far too many ‘Umar Farouk’ pages for any analyst to review (especially if instead the name we had was, for example, ‘Omar Abdul’). What is necessary, as the Markle Foundation has said persistently, is for us to authorize and invest in tools that allow for automated analytics—things like tagged data (so that corrections to information are automatically transmitted for updates), identity resolution techniques (so that ‘Umar’ and ‘Omar’ are both considered), and persistent queries (so that a question that an analyst asked last month about Umar Farouk persists in the databases and is automatically linked to a father’s warning about his son Umar when that comes in three weeks later).”
One topic left untouched is the effect of associating attributes other than names with an identity (e.g., phone, SSN, passport, license plate, eye color, DOB). Matching similar names in the absence of other information may not be adequate to raise an alert about an identity, but when other attributes are captured and added, the problem becomes markedly more manageable. “Persisting” an identity is a good suggestion because it allows more attributes to be added over time. Growing the data in this fashion will enable the system to trigger an alert when a connection to someone on a watch list is identified.
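The following Python sketch shows, under simplifying assumptions, how a persisted identity that gains attributes over time can cross an alert threshold that a name match alone would not. The attribute weights, the 0.7 threshold, the `Identity` class, and the watch list entry are all hypothetical values chosen for illustration; the name comparison again borrows `difflib.SequenceMatcher` from the standard library rather than any vendor’s algorithm.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Hypothetical weights: how much each attribute contributes to a match.
WEIGHTS = {"name": 0.4, "dob": 0.2, "passport": 0.4}
ALERT_THRESHOLD = 0.7  # arbitrary illustrative cutoff

# Fictional watch list entry; the values are placeholders.
WATCH_ENTRY = {"name": "Umar Farouk", "dob": "1980-01-01", "passport": "P0000001"}


@dataclass
class Identity:
    """A persisted identity whose attributes accumulate over time."""
    attributes: dict = field(default_factory=dict)

    def add(self, **new_attributes):
        self.attributes.update(new_attributes)


def attribute_score(key, a, b):
    """Fuzzy comparison for names, exact comparison for everything else."""
    if key == "name":
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return 1.0 if a == b else 0.0


def match_score(identity, watch_entry):
    """Weighted sum over attributes present on both sides."""
    score = 0.0
    for key, weight in WEIGHTS.items():
        if key in identity.attributes and key in watch_entry:
            score += weight * attribute_score(
                key, identity.attributes[key], watch_entry[key]
            )
    return score


def should_alert(identity, watch_entry):
    return match_score(identity, watch_entry) >= ALERT_THRESHOLD


if __name__ == "__main__":
    person = Identity()
    person.add(name="Omar Farouk")          # similar name only
    print(should_alert(person, WATCH_ENTRY))
    person.add(passport="P0000001")         # later record adds a passport
    print(should_alert(person, WATCH_ENTRY))
```

With only a similar name, the score stays below the threshold. Once a later record contributes a matching passport number to the same persisted identity, the combined score crosses it and the alert fires.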
Existing entity resolution technology could make an enormous difference today if it were simply applied more broadly. While Mr. Rosenzweig is correct that more research on data analytics tools is needed and can help move the process forward, we should also move rapidly to leverage technology that is already available: entity resolution.