
How to create a consistent data experience across your data ecosystems

By Chandra Devarkonda

FSO Data & Analytics Executive Director Ernst & Young LLP (US)

Data and Analytics specialist. Cloud strategist. Programmer. Marathoner.

11 minute read 16 Apr 2020

Customer trust, system capacity and managing costs are critical, especially in crises. Data fabric, a modern architecture approach, can help.

Companies are increasingly migrating from fragmented, monolithic data platforms to more streamlined and flexible data ecosystems so they can respond faster and provide tailored customer services and a more consistent digital experience. And, in times of unexpected crises like COVID-19, a unified and flexible data ecosystem is critical.

New data ecosystems such as those offered by public cloud providers are attractive migration options. However, instead of moving all of their data to a single public cloud provider, companies are mitigating risk by diversifying across cloud providers as well as retaining some of the critical data within their internal private data ecosystems. This hybrid approach opens up several new opportunities for business agility but also presents new challenges in data management. For example:

  1. Data management across data ecosystems requires significant time and investment just to achieve consistency and traceability to improve trust with customers.
  2. Allocation of resources for data storage, computation and access across data ecosystems can be sub-optimal, as companies need to track, manage and balance the internal data ecosystem capacity with that on the public clouds.
  3. With a pay-as-you-go operational expenditure model on the public cloud versus a pre-paid capital expenditure model internally, companies need to know the value of their data and how to balance that value against the cost of maintaining the data across these ecosystems.

Customer trust, system capacity and managing costs are critical to survival in times of unexpected crises, like the one the world is currently witnessing. Fortunately, these challenges can be resolved through a more modern architecture approach called the “data fabric,” or data plane, which provides a unified, elastic approach to data management across any number of data ecosystems.

What is data fabric?

Data fabric is a set of independent services that are stitched together to provide a single view of your data, irrespective of the repositories where it is generated, migrated to or consumed from. These services, built using artificial intelligence (AI) methods and modern software engineering principles, include business services, data management services, monitoring services, a data catalog and more. Services can also track data through its life cycle and attribute value to it, informing data resiliency and life cycle management planning, while also providing the flexibility to apply tiered data quality, privacy and security solutions.

Several inherent properties of this approach make a data fabric flexible, stretchable and durable:  

  • New services and sub-services can be added or removed as needed, without disruptions.   
  • More data platforms can be added easily (in the case of mergers and acquisitions, for example) by simply connecting new resources to the existing set of services.   
  • It is technology- and platform-agnostic. Users accessing these services are abstracted from how newer platforms are integrated with existing platforms.

As a result, this approach will help organizations evolve in stride alongside changes in business needs as well as technology. It is designed to be long-lasting and forward-looking, and it can cater to a variety of upcoming technology trends.

Stitching together disparate data ecosystems

In the recent past, some leading cloud vendors started providing services that allow organizations to connect their cloud and internal data centers and manage these different environments through a unified approach, an enabler for data fabric services.  

In its data fabric, a company can include a service to integrate third-party vendor products that specialize in certain functions, adding a virtual integration layer that allows for updates and replacements as needed. For example, the metadata service on the data fabric could integrate a vendor product specializing in business metadata and a product that specializes in technical metadata. 

So while the cloud providers do not have the business context, they do provide a base set of services to enable the data fabric. And vendor products specialize only in some of the functionalities, which still warrant scalable, automated integration with other services. It is in this natural gap that the services comprising the data fabric fit.

The makeup of the data fabric

Each of the unifying set of services that make up the data fabric is a grouping of independent software sub-services meant to serve a specific function. For example, a metadata service is composed of metadata capture, metadata linkage and a metadata graph sub-service. Each sub-service operates on its own and is thus updated, replaced or tailored as needed without disturbing the functioning of the overall metadata service, which is independent of other data fabric services, such as the data quality service or the privacy service. 
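To make the composition concrete, here is a minimal sketch of the pattern, assuming purely illustrative class and method names (MetadataCapture, MetadataLinkage, MetadataGraph); it describes no particular product, only independently replaceable sub-services composed into one service.

```python
# Illustrative sketch only: class and method names are hypothetical,
# not part of any specific data fabric product.
from dataclasses import dataclass, field


@dataclass
class MetadataCapture:
    """Sub-service: collects raw metadata records from connected platforms."""
    def capture(self, source: str) -> dict:
        # In practice this would call a connector for each data platform.
        return {"source": source, "fields": ["customer_id", "policy_id"]}


@dataclass
class MetadataLinkage:
    """Sub-service: links technical fields to business terms."""
    glossary: dict = field(default_factory=lambda: {"customer_id": "Customer"})

    def link(self, record: dict) -> dict:
        record["business_terms"] = [
            self.glossary[f] for f in record["fields"] if f in self.glossary
        ]
        return record


@dataclass
class MetadataGraph:
    """Sub-service: maintains a simple graph of linked metadata."""
    edges: list = field(default_factory=list)

    def add(self, record: dict) -> None:
        for term in record.get("business_terms", []):
            self.edges.append((record["source"], term))


class MetadataService:
    """The composed service: each sub-service can be swapped independently."""
    def __init__(self):
        self.capture_svc = MetadataCapture()
        self.linkage_svc = MetadataLinkage()
        self.graph_svc = MetadataGraph()

    def ingest(self, source: str) -> None:
        self.graph_svc.add(self.linkage_svc.link(self.capture_svc.capture(source)))
```

Because each sub-service sits behind its own small interface, swapping the linkage step for, say, a vendor product changes one class without touching the capture or graph steps.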

However, all these services are still loosely coupled through an infrastructure layer called a “service mesh.” The service mesh provides secure, fast, fault-tolerant inter-service communication, along with other crucial functions such as load balancing, encryption, tracing and monitoring.

Each data fabric service also incorporates AI as part of its group of sub-services. Continuing the metadata service example, a technique called a “Bayesian network” is used to detect relationships between metadata elements, given their corresponding data distributions, and to build a probabilistic graph that connects them. The same is true for the data quality service or privacy service.
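The sketch below illustrates the idea with a deliberately simplified stand-in: instead of full Bayesian network structure learning, it scores pairwise dependence between columns with a correlation proxy and keeps the strong links as graph edges. The function name, threshold and sample data are assumptions for illustration only.

```python
# Simplified illustration of the idea, not the article's actual algorithm:
# score pairwise dependence between columns and keep strong links as edges.
# A production service might use proper Bayesian network structure learning
# (e.g., score-based search) rather than the correlation proxy shown here.
from itertools import combinations

import pandas as pd


def metadata_relationship_graph(df: pd.DataFrame, threshold: float = 0.5) -> list:
    """Return (col_a, col_b, score) edges for strongly related columns."""
    edges = []
    numeric = df.select_dtypes("number")
    for a, b in combinations(numeric.columns, 2):
        score = abs(numeric[a].corr(numeric[b]))  # dependence proxy
        if score >= threshold:
            edges.append((a, b, round(score, 3)))
    return edges


# Example: columns from two platforms whose metadata should be linked
frame = pd.DataFrame({
    "premium_usd": [100, 120, 150, 90],
    "coverage_k": [10, 12, 15, 9],
    "zip_code": [10001, 94107, 60601, 30301],
})
print(metadata_relationship_graph(frame))
```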

As such, these services need to continuously learn from their interactions and self-correct, allow new software updates to be incorporated through continuous integration and delivery, and provide the ability for autonomous execution.  

Additionally, all these data fabric services are orchestrated using a modern data catalog, which is a set of services that goes beyond the traditional repository of data definitions. The data catalog links business metadata with technical metadata, using AI to learn from actual data and update the metadata definitions. It also helps manage application usage through changes in underlying data structures, incorporates data and business policies and rules through intuitive interfaces, and acts as a distributed store for event logs. All of these functions are provided through sub-services orchestrated as a data catalog service and are accessible by other services, as well as the rest of the organization.
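As a rough illustration of what such a catalog might hold, the hypothetical entry below links one technical field to its business term, policies and an event log that other fabric services append to. All field names and values are assumed.

```python
# Hypothetical shape of a modern catalog entry; field names are illustrative.
catalog_entry = {
    "technical": {"table": "policy_quotes", "column": "annual_premium",
                  "type": "decimal(12,2)"},
    "business": {"term": "Annual Premium", "owner": "Pricing",
                 "definition": "Yearly price quoted to the customer"},
    "policies": ["PII:none", "retention:7y", "quality:not_null"],
    "usage_log": [],  # event-log entries appended by other fabric services
}


def record_usage(entry: dict, service: str, action: str) -> None:
    """Other fabric services append events so the catalog stays the system of record."""
    entry["usage_log"].append({"service": service, "action": action})


record_usage(catalog_entry, "privacy_service", "screened")
```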

The data fabric includes a marketplace digital interface for users to access, execute and monitor each of these services. The interface allows users to group tagged services to view relevant operational data and dashboards, showing insights and performance. These intelligent services can be utilized to strengthen governance and controls, provide common business functions used across processes and manage the infrastructure of the data ecosystems.  

What you gain through data fabric

A data fabric approach to provisioning and managing large-scale networks of services that abstract business, data and infrastructure functions across hybrid and multi-cloud data ecosystems enables organizations to reliably manage data, while simplifying the implementation of consistent policies across these ecosystems.

In this digitally disruptive age, automatic abstraction of commonly used business functions, centralizing key data management functions and better managing data infrastructure for storage, computation and access will help organizations increase focus, mitigate infrastructure costs and better allocate spend and resources on creating business value. Let’s explore these benefits in greater detail.

1. Abstraction

Each of the independently hosted services that comprise the data fabric helps organizations abstract the functions that the service provides. While these services are primarily targeted at the chief data officer (CDO) and chief information officer (CIO) levels to manage data and infrastructure, they also impact functions such as marketing, customer service and finance.

Consider the marketing function. If marketers want to more nimbly test new offers or services, such as by comparing the effectiveness of different web pages or product offers by customer segment and rolling out continuous updates, they can use specific services hosted on the data fabric that encapsulate all the necessary functions and provide that visibility.  
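A hedged sketch of what such a fabric-hosted comparison might look like follows; the event fields and segments are invented for illustration, and a real service would add statistical significance testing.

```python
# Illustrative comparison of two page variants by customer segment.
# The event schema ('segment', 'variant', 'converted') is an assumption.
from collections import defaultdict


def conversion_by_segment(events: list) -> dict:
    """Return conversion rate keyed by (segment, variant)."""
    counts = defaultdict(lambda: {"shown": 0, "converted": 0})
    for e in events:
        key = (e["segment"], e["variant"])
        counts[key]["shown"] += 1
        counts[key]["converted"] += int(e["converted"])
    return {k: v["converted"] / v["shown"] for k, v in counts.items()}


events = [
    {"segment": "small_business", "variant": "A", "converted": True},
    {"segment": "small_business", "variant": "B", "converted": False},
    {"segment": "retail", "variant": "A", "converted": False},
    {"segment": "retail", "variant": "B", "converted": True},
]
print(conversion_by_segment(events))  # conversion rate per (segment, variant)
```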

Likewise, in finance functions, independent financial calculations such as loan amortizations or cash flow calculations are hosted on the data fabric and updated without disrupting how they are accessed. Data usage by these functions is also independent of where the data is stored. Data quality checks on the data are built in automatically, so trust in the data's usage significantly increases.
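As a minimal example of one such hosted calculation, the sketch below implements the standard fixed-rate annuity payment formula; hosting it as an independent, centrally updated service is the point, and the figures used are illustrative.

```python
# Minimal sketch of one such financial function (standard fixed-rate payment).
def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed-rate loan payment: P * r / (1 - (1 + r) ** -n), with r the monthly rate."""
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


# A $250,000 loan at 4% over 30 years -> about $1,193.54 a month
print(round(monthly_payment(250_000, 0.04, 30), 2))
```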

2. Trust and risk mitigation

Improved lineage, traceability, explainability and transparency form the core of future-focused governance.

By consistently applying identification, tagging and controls to your data elements, along with privacy screening as needed, you mitigate risk. And consistent data experiences, privacy applications and data quality improve trust between the organization and its customers. 

To gain insight into the type of privacy screen to be applied when data is consumed, an organization can tag sensitive data nonintrusively and track and assign qualitative metrics based on the cost to acquire, process and store consumer data, using operational logs and business rules. The privacy group of services can enable such functionality while also using AI to apply configurable privacy screens. These services also use natural language processing to digitize new privacy rules and incorporate them into the privacy screening services without much human intervention.
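The sketch below shows one possible shape of such a configurable privacy screen: tags map fields to masking rules that are applied when a record is consumed. The tag names and masking rules are assumptions, not the article's actual service.

```python
# Illustrative privacy screen: tags and masking rules are assumed for this sketch.
MASKING_RULES = {
    "pii.email": lambda v: v[0] + "***@" + v.split("@")[-1],
    "pii.ssn": lambda v: "***-**-" + v[-4:],
    "public": lambda v: v,
}


def apply_privacy_screen(record: dict, tags: dict) -> dict:
    """Return a copy of the record with each field masked according to its tag."""
    return {
        field: MASKING_RULES.get(tags.get(field, "public"), lambda v: "<redacted>")(value)
        for field, value in record.items()
    }


record = {"email": "jane.doe@example.com", "ssn": "123-45-6789", "state": "NY"}
tags = {"email": "pii.email", "ssn": "pii.ssn"}
print(apply_privacy_screen(record, tags))
# {'email': 'j***@example.com', 'ssn': '***-**-6789', 'state': 'NY'}
```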

3. Economic modeling

At the infrastructure level, an economic utility service on the data fabric determines data storage and compute needs through a cost-benefit analysis when multiple data centers are being managed. The economic model uses workload costs and historical analyses of similar workloads to help make a more informed decision on the type of infrastructure needed to execute a workload across public cloud ecosystems. Additionally, an organization can better match and distribute workloads across data centers through simulations.
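A simplified sketch of that cost-benefit decision follows; the placement options, rates and idle-capacity figures are invented, and a real economic model would also weigh data-transfer costs, latency and the historical workload analyses mentioned above.

```python
# Hedged sketch of a cost-benefit comparison for placing a workload.
# Rates, capacities and the scoring itself are illustrative assumptions.
def cheapest_placement(workload_hours: float, options: list) -> dict:
    """Pick the placement with the lowest estimated cost that has free capacity."""
    feasible = [o for o in options if o["idle_capacity_hours"] >= workload_hours]
    if not feasible:
        raise ValueError("No single environment has capacity; consider splitting the workload")
    return min(feasible, key=lambda o: o["hourly_rate"] * workload_hours)


options = [
    {"name": "internal_dc", "hourly_rate": 0.00, "idle_capacity_hours": 40},   # capex already paid
    {"name": "cloud_a",     "hourly_rate": 3.10, "idle_capacity_hours": 10_000},
    {"name": "cloud_b",     "hourly_rate": 2.80, "idle_capacity_hours": 10_000},
]
print(cheapest_placement(120, options)["name"])  # internal capacity too small -> cloud_b
```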

Data fabric architects and engineers can work with you to design and implement the services that will help you manage your diverse data ecosystems across cloud platforms. Consultants may use a pilot prototype for demonstration and as part of a workshop to assess your needs and evaluate areas where you need help.  

Data fabric in action

To illustrate the challenges, we’ll use a case from a global insurance organization. The organization is considering using a product pricing application across its internal data center and on a public cloud.  

The application needs to scan through terabytes of data (not all of which is available on the public cloud), select valuable data points as inputs and then execute a complex set of mathematical functions iteratively to arrive at a possible price.   

This price needs to be accessible by a customer through the organization’s website and mobile app. It also needs to be recorded within internal data systems.   

The organization uses the public cloud for the pricing calculations because it is far easier to spin up requisite hardware and software that matches the needs of the pricing application, access the data, perform the calculations and generate the price. However, it also needs to continue using its internal data systems due to dependent applications and incumbent business processes.   

This poses several different challenges, including:

  • How does the organization know beforehand how much infrastructure it would need to run the pricing calculations?  
  • What if it has some idle capacity in its internal data systems that could be used along with the additional capacity a public cloud can provide?
  • If the organization was using multiple cloud vendors, how can it decide quickly how to federate the application execution to optimize the overall costs?
  • How can the metadata that is being captured in internal data systems be linked on a continuous basis with that on the cloud to provide the organization a consistent metadata experience?
  • If the application needs to provide a price to millions of customers worldwide in real time, how can the organization be sure it is providing an accurate price, calculated at the speed demanded, while offering unbiased transparency on how it arrived at that price in a consistent manner?
  • How can new business rules be injected into the pricing application so that it incorporates changing market factors as well as local market regulations in a globally consistent manner?

This multitude of challenges is addressable using the menu of services offered and hosted on the data fabric. Each service addresses a specific data challenge, such as data quality, metadata management or workload allocation. Each service is itself composed of several sub-services, each performing an independent function. For example, one sub-service would capture data related to its function, the second would perform validation checks, the third would apply advanced algorithms to learn about the data, and the fourth could offer a recommendation or provide an alert. Each of these sub-services can be managed and updated without disrupting the overall service. The data fabric services, while independently functioning, are meshed together through the public cloud platforms and hosted in a cloud-based data center of the organization, removing the need to manage the infrastructure that hosts the data fabric services.
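As a minimal sketch of that capture-validate-learn-recommend pattern, the example below chains four independently replaceable functions; the data-quality scores and thresholds are illustrative assumptions.

```python
# Illustrative capture -> validate -> learn -> recommend chain of sub-services.
def capture() -> list:
    """Sub-service 1: collect raw observations (here, daily data-quality scores)."""
    return [98.2, 97.9, 55.0, 98.4]


def validate(scores: list) -> list:
    """Sub-service 2: basic validation checks."""
    return [s for s in scores if 0 <= s <= 100]


def learn(scores: list) -> float:
    """Sub-service 3: stand-in for an algorithm that learns a baseline."""
    return sum(scores) / len(scores)


def recommend(scores: list, baseline: float) -> list:
    """Sub-service 4: raise alerts when a score falls well below the baseline."""
    return [f"Alert: score {s} is well below baseline {baseline:.1f}"
            for s in scores if s < baseline * 0.75]


scores = validate(capture())
print(recommend(scores, learn(scores)))
```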

This makes data management more automated, autonomous and scalable across data ecosystems. It also allows organizations to manage their data across myriad platforms through a unified layer, greatly abstracting them from their data infrastructure while also providing transparency.

The authors would like to thank Sacheen Punater, Raviraj Vijaykar, Natasha Malik, Rahul Joshi and Emily Johnson for their contributions to this article.

Summary

A unified and flexible data ecosystem is critical, as well as having a single view of your data, irrespective of the repositories where it is generated, migrated to or consumed from. The elasticity of a data fabric means it’s long-lasting, forward-looking and beneficial, especially in a crisis.
