Why enterprise AI feels more reliable than it is

Why small errors become business risk

These unverified flaws compound rapidly. A minor hallucination in a data extraction task becomes a structural error in a financial summary, which in turn mutates into a catastrophic miscalculation in a strategic forecast. Unmanaged AI systems effectively industrialise the transformation of minor statistical anomalies into systemic institutional risks. The gap between rapid creation and reliable use acts as a brutal bottleneck on operational confidence.
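
The compounding effect can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that every step in a chain is correct independently and with the same probability. The function name and the 95% figure are hypothetical and are not drawn from any measured system.

```python
def chain_reliability(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain of work is correct.

    Assumes each step succeeds independently with the same probability.
    This is a simplification: real errors are often correlated.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_accuracy ** steps


# Illustrative only: each step is assumed to be 95% accurate.
for steps in (1, 3, 10):
    print(steps, round(chain_reliability(0.95, steps), 3))
```

Under these assumed numbers, a three-step chain (extraction, summary, forecast) is fully correct about 86% of the time, and a ten-step chain about 60% of the time.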

Humans make these exact same errors: misreading the source, missing the inconsistency, overstating the conclusion, or carrying a flawed assumption into the next stage of work. The actual peril lies in the scale, velocity, and frictionless delivery of these familiar mistakes. AI generates polished flaws continuously, churning them out 24 hours a day. Each one slips past our defences because it arrives wrapped in fluent, plausible language.

A catastrophic outcome emerges from hundreds of small, articulate errors, hiding in plain sight and accumulating until the final forecast or recommendation is entirely indefensible. This necessitates designing strict verification tollgates into the workflow; we must engineer friction back into the system to catch the slop before it compounds.

An operating model for engineering confidence

This brings us to The Foundry. It is the industrial process that extracts value from probabilistic systems while systematically neutralising the risk. The Foundry is an operating model designed to enforce deterministic review. It operates in cycles, with each cycle acting as a complete unit comprising four distinct stages to create a validated, decision-grade artefact.

First, the Forge stage uses the large language model to generate a candidate draft. This is the only phase where probabilistic behaviour is permitted, quarantining the inherent variability to the exact point of creation.

Second, the Quench stage applies deterministic validation. This involves running the candidate draft through explicit rules, schemas, and repeatable automated checks. It strips away the polished gloss and tests the actual structural integrity of the output.
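
As a minimal sketch of what such deterministic checks might look like, consider a candidate financial summary represented as a dictionary. Everything here is an assumption made for illustration: the field names `line_items` and `total`, the `CheckResult` type, and the two rules themselves. The point is only that the same draft always produces the same check results.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one deterministic rule applied to a draft."""
    name: str
    passed: bool
    detail: str = ""


def check_required_fields(draft: dict[str, Any]) -> CheckResult:
    """Schema-style rule: the draft must contain the expected keys."""
    missing = sorted({"line_items", "total"} - set(draft))
    detail = f"missing: {missing}" if missing else ""
    return CheckResult("required_fields", not missing, detail)


def check_total_matches_items(draft: dict[str, Any]) -> CheckResult:
    """Arithmetic rule: the stated total must equal the sum of the items."""
    items = draft.get("line_items")
    total = draft.get("total")
    well_typed = (
        isinstance(items, list)
        and all(isinstance(x, int) for x in items)
        and isinstance(total, int)
    )
    if not well_typed:
        return CheckResult("total_matches_items", False,
                           "expected integer amounts and an integer total")
    expected = sum(items)
    detail = "" if total == expected else f"stated {total}, computed {expected}"
    return CheckResult("total_matches_items", total == expected, detail)


# Hypothetical rule set; a real domain would define its own.
CHECKS = (check_required_fields, check_total_matches_items)


def run_checks(draft: dict[str, Any]) -> list[CheckResult]:
    """Apply every rule in a fixed order and collect the evidence."""
    return [check(draft) for check in CHECKS]


# A fluent-looking draft whose total has two digits transposed.
for result in run_checks({"line_items": [1200, 800, 450], "total": 2540}):
    print(result)
```

The draft above passes the field check and fails the arithmetic check, which is exactly the kind of polished flaw that reads plausibly but does not survive a repeatable test.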

Third, the Appraise stage examines the results of the Quench tests to form a judgement. The system determines whether the artefact satisfies the defined laws and requirements of its specific domain based on the empirical evidence gathered.

Finally, the Sort stage deterministically routes the artefact based on the appraisal. A validated artefact moves forward to the decision-maker, and a flawed artefact is sent back for reprocessing or flagged for human intervention.
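
The four stages can be sketched as one cycle in code. This is an illustrative outline under stated assumptions, not a description of any actual implementation: the names `Route`, `run_foundry` and `fake_forge` are hypothetical, the model call is replaced by a stub, and the policy of retrying up to a fixed number of attempts before escalating to a human is one possible reading of "sent back for reprocessing or flagged for human intervention".

```python
from enum import Enum
from typing import Any, Callable

Draft = dict[str, Any]
Check = Callable[[Draft], bool]


class Route(Enum):
    FORWARD = "forward to the decision-maker"
    REPROCESS = "send back for reprocessing"
    HUMAN_REVIEW = "flag for human intervention"


def quench(draft: Draft, checks: dict[str, Check]) -> dict[str, bool]:
    """Quench: run every deterministic check and record pass or fail."""
    return {name: check(draft) for name, check in checks.items()}


def appraise(results: dict[str, bool]) -> bool:
    """Appraise: accept only if every check passed."""
    return all(results.values())


def sort(accepted: bool, attempt: int, max_attempts: int) -> Route:
    """Sort: route by fixed rules, with no model involvement."""
    if accepted:
        return Route.FORWARD
    if attempt < max_attempts:
        return Route.REPROCESS
    return Route.HUMAN_REVIEW


def run_foundry(forge: Callable[[int], Draft], checks: dict[str, Check],
                max_attempts: int = 3) -> tuple[Route, Draft]:
    """Repeat Forge, Quench, Appraise, Sort until a final route is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        draft = forge(attempt)  # Forge: the only probabilistic step
        route = sort(appraise(quench(draft, checks)), attempt, max_attempts)
        if route is not Route.REPROCESS:
            return route, draft
        attempt += 1


def fake_forge(attempt: int) -> Draft:
    """Stub standing in for a model call: the first draft has a wrong total."""
    total = 2540 if attempt == 1 else 2450
    return {"line_items": [1200, 800, 450], "total": total}


checks = {"total_matches_items": lambda d: d["total"] == sum(d["line_items"])}
route, draft = run_foundry(fake_forge, checks)
print(route.name, draft["total"])
```

Keeping `quench`, `appraise` and `sort` as plain functions with no model calls means the only source of variability is the `forge` callable that is passed in, which mirrors the idea of confining probabilistic behaviour to the Forge stage.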