5 minute read 20 Aug 2021

How not to let data-trust issues undermine your transformation

By Faisal M. Alam

EY Americas Consulting Emerging Technology Leader

Emerging technology and data evangelist. Large transformational program leader. Husband and father of two amazing children. Avid snowboarder.


Discover how data fabric can help overcome challenges with data replication and inconsistency, enabling confident business decisions.

In brief
  • Trustworthy data is a critical element for any organization's success. To overcome data-trust issues, first tackle data replication and lineage challenges.
  • Data fabric promises to solve all data-trust issues, unlocking digital transformation and placing solutions such as analytics and AI within reach.
  • Data fabric places similar levels of trust and agility within the reach of incumbent organizations – without ripping and replacing legacy technology.

Organizations have been talking about digital transformation for more than a decade. In a 2020 global EY report – “Tech Horizon: Leadership perspectives on technology and transformation” – only 44% of large corporations said they’re making good progress or that digital transformation is fully embedded and optimized across their business. 

Tech Horizon: Leadership perspectives on technology and transformation

44%

of large corporations said they’re making good progress or that digital transformation is fully embedded and optimized across their business.

Aside from organizational issues, the single biggest reason that businesses struggle with transformation is that they overlook the fundamental fact that trustworthy data is a critical element for success. Companies can invest heavily in the latest front-end technology, which may provide the impression of transformation, but if they couple it with inflexible legacy back-end infrastructure, the integrity of their data simply won’t improve.

And in a world where data-driven insights are the oxygen of modern commerce, the difference between trustworthy and untrustworthy data is the difference between success and failure. Indeed, the same EY report revealed that companies that are advanced on their journey toward digital transformation are 50% more likely to see an annual Earnings Before Interest, Taxes, Depreciation and Amortization (EBITDA) increase of 15% or more.

Tech Horizon: Leadership perspectives on technology and transformation

50%

more likely that companies advanced on their journey toward digital transformation will see an annual Earnings Before Interest, Taxes, Depreciation and Amortization (EBITDA) increase of 15% or more.

Now, however, a new architectural pattern, called data fabric, promises to deliver the high-confidence data that companies need. It stitches together disparate databases, making a single view of enterprise-wide data accessible via just about any tool of choice.

Data fabrics minimize data replication through virtualization, giving consumers speedy access to high-trust raw data that resides in its native state.

The problem of data replication

One of the fundamentals of data trust is lineage. Do you know where your data comes from? Are you dealing with the most reliable version of that data? And if it’s not in its raw state, how has it been transformed?

Trying to unravel data lineage is a bit like the old game of “Telephone” we used to play at school, where a message is whispered between children and over time becomes increasingly garbled. The same dynamic is in play when data is replicated across multiple systems – it is transformed in various ways and a kind of entropy takes place, which erodes trust.

To clean up data, it’s necessary to track it back to the raw source. The problem is that every replication requires the data to be reconciled – a time-consuming, complex process that is achieved with varying levels of success. It also introduces a temporal difference in the data while it’s being checked. It’s clearly impractical to attempt to clean up data in this way every time you want to make a business decision.
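At its core, this kind of reconciliation boils down to fingerprinting each record in the source and the replica and finding where they have drifted apart. The sketch below, in Python with hypothetical table contents, is a minimal illustration of that idea, not a production reconciliation tool:

```python
import hashlib

def row_hash(row):
    """Stable fingerprint of a record, independent of where it's stored."""
    return hashlib.sha256("|".join(str(v) for v in row).encode()).hexdigest()

def reconcile(source_rows, replica_rows):
    """Return the keys whose replicated copy has drifted from the source."""
    source = {r[0]: row_hash(r) for r in source_rows}
    replica = {r[0]: row_hash(r) for r in replica_rows}
    return sorted(k for k in source if replica.get(k) != source[k])

# Toy example: customer 102's address was transformed during replication.
src = [(101, "Ann", "12 High St"), (102, "Bob", "9 Low Rd")]
rep = [(101, "Ann", "12 High St"), (102, "Bob", "9 LOW ROAD")]
print(reconcile(src, rep))  # → [102]
```

Even this toy version hints at why the process scales so badly: every table in every replica must be fingerprinted and compared, and the comparison is only valid for the moment it ran.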

From warehouses to lakes – data-trust issues abound

Data scientists have been trying to solve this challenge of data trust for decades. First came the data warehouse in the 1980s and then the data lake around 2010. Each solution was an improvement on its predecessor, but they were both fundamentally flawed because they failed to solve any of the issues around data replication, lineage, entropy and reconciliation.

Data fabric, however, promises to solve all these issues of data trust, unlocking digital transformation and placing solutions such as enterprise-wide analytics and industrial-scale artificial intelligence (AI) within reach.

Much of this telemetry data from the edge is referenced in place to provide insights on everything from product improvement opportunities to enhanced and predictive diagnostics on equipment failures. Because EY reduced replication throughout these downstream processes, where the data has come from, how it’s being used and where it’s going is never in doubt.

The secret of data fabric’s success is that each data type or data domain is left to reside in its original data storage mechanism – as long as that repository is fit for purpose. There is no replication of data and no data entropy as a direct result.

This is the first viable paradigm that unifies data at the compute level instead of the storage level. Instead of laboriously moving data to a warehouse or lake, these repositories are linked, using modern lightweight cloud-native federated-query or data-virtualization tools, into something called the semantic layer. This layer enables the consumer to view raw data from potentially hundreds of linked databases quickly and easily via a single pane of glass.
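The federation idea can be illustrated in miniature. Real data fabrics use dedicated federated-query engines over heterogeneous stores, but the sketch below uses SQLite’s ATTACH as a stand-in: two hypothetical systems of record (a CRM and an ERP database) stay where they are, while a single connection plays the role of the semantic layer and answers one query across both:

```python
import sqlite3, tempfile, os

tmp = tempfile.mkdtemp()
crm_path = os.path.join(tmp, "crm.db")
erp_path = os.path.join(tmp, "erp.db")

# Two systems of record, left in place – no data is copied between them.
with sqlite3.connect(crm_path) as db:
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Ann"), (2, "Bob")])
with sqlite3.connect(erp_path) as db:
    db.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(1, 250.0), (2, 99.5), (1, 40.0)])

# The "semantic layer": one connection federating both stores, so the
# consumer sees a single query surface over data that never moved.
fabric = sqlite3.connect(":memory:")
fabric.execute(f"ATTACH DATABASE '{crm_path}' AS crm")
fabric.execute(f"ATTACH DATABASE '{erp_path}' AS erp")
result = fabric.execute("""
    SELECT c.name, SUM(o.total)
    FROM crm.customers c JOIN erp.orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(result)  # → [('Ann', 290.0), ('Bob', 99.5)]
```

The query joins customer records in one store with order records in another, yet neither dataset was replicated – which is exactly the property that preserves lineage and trust.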

Accessing data faster, cleaner and with more confidence

Until now, report writers, visualization designers and data scientists could spend up to 80% of their time making connections to dozens of data stores, gathering the information they need. Now, however, data fabric makes this heavy lifting obsolete. Building the right infrastructure to store and retrieve data from a data lake can take four to six weeks, but by using data fabric it’s possible to obtain highly trustworthy data from its native state, using the semantic layer and virtualization, in as little as 30 minutes.

Data fabric finally places these levels of trust and agility within the reach of incumbent organizations – without the need to painfully rip and replace whole ecosystems of legacy technology.

Four steps to high-trust data

Companies serious about overcoming data-trust issues should first tackle the data replication and lineage challenges. It’s impossible to focus on every single data point within an enterprise. So, the first step is to identify and prioritize critical data elements – what data will provide the outcomes you need and the biggest return on your investment? Isolate and focus on this data above all else.

The next step is to catalog exactly where this critical data is being held – this involves understanding data domains, what data products your organization is producing and how to access it.

Having done this, it’s easier to identify the best system of record for your critical data. Does this repository hold raw data in its native state, or has it been transformed or altered?

The final step is to design a plan for unifying these data sources. A well-designed data lake may be a good starting point for some organizations to tackle a specific tactical problem. Ultimately, however, an organization’s long-term goal should be to break down internal silos with a data fabric design pattern.

Unlocking the data-trust dividend

Businesses that seize the data-fabric opportunity and overcome data-trust issues can expect a compelling return on their investment in the form of agility, enablement and long-term value.

Without the need to spend weeks building data pipelines to integrate databases, businesses are able to access trustworthy insights within minutes. They also don’t need to build functionality – thanks to data fabric, users can go to a data marketplace, view data products, download the data they need and start building their algorithms in record time. There’s also no need to clean data or build scripts so data can be replicated.

The biggest benefit, however, is that thanks to data fabric, C-suites will have the confidence to make high-impact business decisions in near-real time, safe in the knowledge that they are acting on the best data insight available.

Summary

Data fabric is an approach that helps organizations overcome data management challenges and enables the C-suite to make informed business decisions. With access to trustworthy, real-time data, leaders can make high-impact decisions with confidence.
