Now, however, a new architectural pattern called data fabric promises to deliver the high-confidence data that companies need. It stitches together disparate databases, making a single view of enterprise-wide data accessible from almost any tool.
Data fabrics minimize data replication through virtualization, giving consumers fast access to trusted raw data that remains in its native state.
The problem of data replication
Lineage is one of the most fundamental elements of data trust. Do you know where your data comes from? Are you dealing with the most reliable version of that data? And if it isn't in its raw state, how has it been transformed?
Trying to unravel data lineage is a bit like the old game of "Telephone" we used to play at school, in which a message is whispered from child to child and becomes increasingly garbled along the way. The same dynamic is at play when data is replicated across multiple systems: each copy is transformed in different ways, and a kind of entropy sets in that erodes trust.
To clean up data, you need to trace it back to its raw source. The problem is that every replica must be reconciled, which is a time-consuming, complex process that succeeds to varying degrees. Reconciliation also introduces a time lag, so the data may already be out of date by the time it has been checked. It is clearly impractical to clean up data this way every time you want to make a business decision.
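Tracing data back to its raw source is only possible if every copy remembers where it came from. The sketch below illustrates that idea under a simple assumption: each derived dataset keeps a reference to its parent and a description of the transformation that produced it. The `Dataset` class, its methods and the table names are hypothetical and are not taken from any data fabric product.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical model: a dataset plus a link to the dataset it was derived from."""
    name: str
    rows: tuple
    parent: Optional["Dataset"] = None
    transform: Optional[str] = None  # description of the step that produced this copy

    def derive(self, name: str, step: str, fn: Callable[[tuple], tuple]) -> "Dataset":
        """Apply a transformation and record the parent and the step taken."""
        return Dataset(name=name, rows=fn(self.rows), parent=self, transform=step)

    def lineage(self) -> list:
        """Walk back to the raw source; return (name, transform) pairs, oldest first."""
        chain = []
        node = self
        while node is not None:
            chain.append((node.name, node.transform))
            node = node.parent
        return list(reversed(chain))

    def raw_source(self) -> "Dataset":
        """Follow parent links until reaching the untransformed original."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


# Hypothetical example: a raw CRM table copied into a warehouse, then into a data mart.
raw = Dataset("crm.customers", (("alice", " UK "), ("bob", "us")))
cleaned = raw.derive(
    "warehouse.customers",
    "trim and upper-case country",
    lambda rows: tuple((name, country.strip().upper()) for name, country in rows),
)
mart = cleaned.derive(
    "mart.uk_customers",
    "filter country == UK",
    lambda rows: tuple(r for r in rows if r[1] == "UK"),
)

for name, step in mart.lineage():
    print(name, "<-", step)
```

If any pipeline in the chain copies data without recording `parent` and `transform`, the trail stops there, which is the "Telephone" problem described above.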
From warehouses to lakes – data-trust issues abound
Data scientists have been trying to solve this data-trust challenge for decades. First came the data warehouse in the 1980s, then the data lake around 2010. The data lake improved on the warehouse, but both were fundamentally flawed because neither resolved the underlying issues of data replication, lineage, entropy and reconciliation.
Data fabric, however, promises to solve all these issues of data trust, unlocking digital transformation and placing solutions such as enterprise-wide analytics and industrial-scale artificial intelligence (AI) within reach.