Data Fatigue
By Brian Calvert, Infoglide Senior Software Architect
Four years ago this week, a small aircraft lifted off from Watson Island in Miami. It was the plane’s 39,743rd flight. And as the tiny craft first vented white smoke and then lost its right wing in an explosion, it became clear that this was its last. All twenty people in the Grumman G-73T, including three infants, perished. The National Transportation Safety Board later determined that the culprit was metal fatigue.
Metal fatigue, or more generally “material fatigue”, is a well-understood concept in the “real” non-digital world. Over time, materials like metal begin to fail through deterioration induced by various kinds of stress. The individual stresses are less than the strength of the material. But they weaken it, and can eventually overcome it. Left unchecked, material fatigue can lead to failure of parts, and the consequences can be devastating, like the crash of Chalk’s Ocean Airways Flight 101 on December 19, 2005.
In working with clients and observing the challenges they face, we have found the concept of “data fatigue” creeping into our conversations. The idea is that a company’s data – about customers, vendors, employees, products, whatever – wears out over time due to entropy. True, bits don’t start disappearing at random, but changes in the world the data describes do introduce ambiguity and errors over time: people marry, products are retired, companies change offices, assumptions change.
Large manufactured objects are made up of thousands of individual parts. Data are the key “parts” of information systems, and we’re not the first to point out how critical it is to maintain data quality. What’s novel is the idea of instituting a continuous refresh of organizational data: resolving, enriching, and augmenting corporate data beyond everyday transactional updates.
In fact, you can view the transactions as stressors that introduce ambiguities, conflicts and errors. Many methods of fighting “data fatigue” may already be in place – e.g., pre-transaction editing and verification, and periodic data cleansing – yet corporate data continues to deteriorate over time because these methods usually focus on single data sources.
In a world where the efficiency and margins of an organization can be profoundly affected by the accuracy of its data, threats to the accuracy and currency of that data must be countered.
Performing this refresh manually is a daunting task even for a smaller organization, and for hundreds of thousands or even millions of records it is simply impractical. Automated solutions become necessary, and technologies like entity resolution can create a continual data refresh cycle.
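To give a rough sense of what automated matching across data sources can look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: the record layout, the `normalize`, `similarity`, and `resolve` helpers, and the 0.85 threshold are all hypothetical, and simple string similarity from the standard library stands in for the far richer matching a real entity resolution engine would perform.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(source_a, source_b, threshold=0.85):
    """Pair each record in source_a with its best match in source_b.

    Only pairs scoring at or above the (assumed) threshold are kept.
    """
    matches = []
    for rec_a in source_a:
        best, best_score = None, 0.0
        for rec_b in source_b:
            score = similarity(rec_a["name"], rec_b["name"])
            if score > best_score:
                best, best_score = rec_b, score
        if best is not None and best_score >= threshold:
            matches.append((rec_a["id"], best["id"]))
    return matches


# Hypothetical records from two separate systems.
crm = [
    {"id": "c1", "name": "Acme Corp."},
    {"id": "c2", "name": "Jane Smith"},
]
billing = [
    {"id": "b1", "name": "ACME Corp"},
    {"id": "b2", "name": "John Doe"},
]

links = resolve(crm, billing)
print(links)
```

Run on a schedule rather than once, a comparison like this is the kind of building block that could feed a continual refresh: records that match across sources can be reconciled, and records that stop matching can be flagged for review.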
