Podcast transcript: Synthetic data: artificial data; real solutions

06 min | 28 March 2023

In conversation with:

Alexy Thomas
EY India Technology Consulting Partner

Silloo Jangalwala: Hello, this is Silloo welcoming you to a new episode of the EY Podcast series. As we have concluded the series on ESG, we are now moving to our next topic, technology. Here, we look at the most trending technology topics that India Inc. needs to know in its digitization journey. Today, we will look at synthetic data, which has become a favorite buzzword in the artificial intelligence space. 

There is no doubt that the industry needs quality data to train new AI models. But with data privacy concerns and stringent regulations on data sharing, accessing quality real data is very difficult. Synthetic data tries to address these problems. Today, we are going to explore more about synthetic data. What is synthetic data, and why is it called the future of AI? Will it solve the privacy concerns? Is it a one-stop solution for all your AI data needs? To answer these questions, we are joined today by Alexy Thomas, Technology Consulting Partner for data analytics and ESG at EY India.

Welcome to the podcast, Alexy. You are in the hot seat today.  

Alexy: Thanks, Silloo, for the wonderful introduction. This is a very interesting subject, so I will try to answer your questions as much as I can.

Silloo: Great, Alexy. So, I would say that synthetic data is a relatively new term and not many are familiar with it. Can you briefly explain the concept and why we need synthetic data? 

Alexy: Absolutely. With new AI-based technologies coming in, there is no doubt that our data needs are huge, because we need to train these AI models well to get the best out of them. But accessing quality real data is very tedious and time-consuming, and often, it is a very costly affair. In many scenarios, for example, self-driving automobiles and so on, the use cases may be new and real data would not be available at all. Even if it is available, it might not be accessible because of data privacy rules. In such cases, synthetic data comes in very handy to train AI models. Synthetic data is, by definition, data generated artificially, with or without the use of real data, for training AI models. It serves the purpose of real data and is sometimes even better. And it has all the granular elements of the original dataset that you would want.

Silloo: So, tell me Alexy; what does synthetic data offer? 

Alexy: Digitally generated data has the same predictive power as real data, as it replicates the statistical characteristics of the existing dataset. It can be generated for unseen conditions and events. When actual data lacks quality, volume, or variety, synthetic data overcomes these weaknesses, as it can be generated for all the different unseen conditions, in all the permutations and combinations of a given situation. Thus, it can train AI models better, test systems better, and build better prototypes than even actual datasets. Also, as synthetic datasets can be created for any specific data requirement, they do not involve the same concerns as real data from a privacy point of view. Synthetic data will also provide a faster turnaround for AI testing, where the number of iterations required is sometimes very large. In the coming years, synthetic data is going to overshadow real data in AI models.
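
As a rough illustration of what "replicating the statistical characteristics" of a dataset can mean, here is a minimal sketch. It assumes a single numeric column that is roughly normally distributed; the sample values and the function names `fit_gaussian` and `generate_synthetic` are hypothetical and were not part of the conversation. Real synthetic data generators are considerably more sophisticated.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def generate_synthetic(real_values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to real_values."""
    mean, std = fit_gaussian(real_values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mean, std) for _ in range(n)]


# Hypothetical "real" measurements, made up purely for illustration.
real = [52.1, 48.7, 50.3, 49.9, 51.4, 47.8, 50.6, 49.2]

# Generate far more synthetic rows than there are real ones.
synthetic = generate_synthetic(real, n=10_000)

real_mean, real_std = fit_gaussian(real)
syn_mean, syn_std = fit_gaussian(synthetic)
print(f"real:      mean={real_mean:.2f} std={real_std:.2f}")
print(f"synthetic: mean={syn_mean:.2f} std={syn_std:.2f}")
```

The synthetic values are not copies of any real record, yet their mean and spread stay close to those of the original sample, which is the property described above.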

Silloo: That is interesting, Alexy. In your view, which sectors can make use of synthetic data?

Alexy: Silloo, synthetic data can be very helpful in many sectors ranging from manufacturing and mobility to retail and natural-language processing where many of the use cases might be new – for example, training virtual AI assistants in Indian regional languages. It is also a savior in industries like healthcare and pharmaceuticals, where data privacy is a huge concern. In sectors like financial services, synthetic data can really help evaluate market behavior and develop new and innovative products, which is what large and small financial services organizations are trying to do.

Silloo: But will it solve all the data-related problems? Is it a silver bullet? 

Alexy: No, it certainly is not a silver bullet, Silloo. While synthetic data provides many benefits, it is not really a one-stop solution for all data-related problems. Just like any other technology, synthetic data comes with significant risks and limitations, since the quality of the synthetic data generated largely depends on the quality of the model that created it and of the input that model was given. So, if the input has errors or biases, the data generated using it will lead to false insights. As they say: garbage in, garbage out.
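
The "garbage in, garbage out" point can be sketched in a few lines. The example below assumes a deliberately simple generator that samples labels in proportion to how often they appear in its input; the labels, the 90/10 split, and the function name `sample_like` are hypothetical and chosen only to show how a skew in the input carries over to the output.

```python
import random
from collections import Counter


def sample_like(source_labels, n, seed=0):
    """Generate n synthetic labels with the same proportions as source_labels."""
    counts = Counter(source_labels)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.choices(labels, weights=weights, k=n)


# Hypothetical input in which one outcome is heavily over-represented.
biased_input = ["approved"] * 90 + ["rejected"] * 10

synthetic_labels = sample_like(biased_input, n=5_000)
rejected_share = Counter(synthetic_labels)["rejected"] / len(synthetic_labels)
print(f"share of 'rejected' in synthetic data: {rejected_share:.3f}")
```

The generator faithfully reproduces the imbalance it was given, so any model trained on the synthetic labels inherits the same bias.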

Silloo: Thanks a lot, Alexy, for explaining all these concepts so clearly.

Alexy: Thanks, Silloo. I really enjoyed the talk too.