Building confidence: the next evolution of ethical AI
The EY organization has a long history of working with AI, and it recently announced EY.ai, a unifying platform that brings together human capabilities and AI to help clients transform their businesses through confident and responsible adoption of AI. EY.ai leverages leading-edge EY technology platforms and AI capabilities, with deep experience in strategy, transactions, transformation, risk, assurance and tax, all augmented by a robust AI ecosystem. However, as the technology matures and the demands of the market evolve, so do our language, methodologies, and approach to managing the myriad of risks emanating from deploying AI.
As a result of this sustained involvement with EY clients and with AI technologies, we are shifting our focus away from building trust and toward building confidence. This new focus is better aligned to the current and emerging risks raised by leading AI techniques, and clearly better aligned to widespread concerns relating to the ethics of AI. The motivation for this change is quite elemental; LLMs are probabilistic, rather than deterministic, and therefore they bring new risks that we must manage.
In a probabilistic environment, the same prompt or input can generate different outputs, and even seemingly minor perturbations in the prompt such as an extra comma, apostrophe, or alteration in spelling that a human reader would ignore can push the outputs of the model into a completely different space, and as such produce unwanted or unexpected model outcomes (aka “hallucinations”). These new models are also vulnerable to new attack strategies, such as prompt injection attacks where specially designed prompts push the model outside of established safety boundaries. These are additional risks beyond those presented by earlier AI techniques.
Such variability can’t be tolerated in highly sensitive contexts, like hospitals or clinics for treatments and diagnostics or pharmaceutical processes such as determining doses. With a high degree of model uncertainty, LLMs need to be tightly monitored and controlled in high-risk sectors, applications, or in the automation of business-critical functions. For example, imagine an AI Copilot deployed into emergency rooms to help doctors and nurses triage ailing patients, which to be clear, such products are already in development. A high degree of model uncertainty has the potential to do irreversible harm in such an application.
In turn, this means the incredible technology which has been compared to fire, electricity, and the iPhone in terms of socioeconomic impacts by major technology leaders can’t yet be fully leveraged in key sectors like financial services or healthcare, which together represent around 40% of GDP. As such, many of the most valuable use cases in the economy remain unattainable until strong governance can ensure safety, robustness, and ethicality. Hence there is a clear and urgent need for rigorous controls to unlock the transformative value of GenAI.
What is required to help ensure safety and robustness are techniques to help us understand the reliability and accuracy of the model prediction, and to measure or predict the degree of uncertainty in its outputs. Techniques are also emerging to further constrain the model outputs sufficiently so that, for example medical experts can be confident that the model is accurate, and that the algorithmic guidance won’t do any harm to the patient.
While this is not a solved problem technically, one example is “fine tuning” where additional text is provided to strengthen the model’s expertise in a particular expert domain. Another is “reinforcement learning” where humans in the loop provide the model with additional guidance on uncertain edge cases. Next, we can increase the “context window” to provide a source of “ground truth” for the model to reference check, and finally there are “adversarial” approaches where a duelling model essentially reviews and fact checks the outputs of the first model to defined boundaries, such as the language in the United Nations (UN) charter on human rights.
But none of these technical options in isolation or combined are adequate for enterprise adoption at scale. Enterprises need a solution-level approach that encompasses the full model life cycle, from data collection and engineering, to model training and validation, to deployment and real time risk monitoring. Only such a holistic view can give business leaders confidence that the risks are manageable, and that adopting GenAI will be value accretive.
In statistics, the concept of a “confidence interval” is a simple way to describe the degree of uncertainty for a given parameter, providing a range of possible estimates of something unknown. This is a much more appropriate framing for models producing probabilistic outputs. The growing public debate about ethical AI is therefore more about safety and responsibility. We believe it is possible to ensure AI is safe, and we believe it is the responsibility of businesses who deploy AI to do so. EY teams are here to help you invest in AI with confidence, unlocking the most value generating use cases to drive exponential growth. Our people-centric approach to AI helps enable the technology to augment your talent, driving efficiency and productivity gains across business functions. And world-leading multi-disciplinary professionals in risk, strategy, technology and transformation help to ensure adoption is aligned with your organization’s purpose, culture, values, and key stakeholders so that AI drives positive human impact.