
From days to minutes: AI-powered logic analysis of legacy ETL  


AI accelerates the comprehension of legacy ETL platforms by converting fragmented logic and raw code into clear, scalable, and standardized insights.


In brief

  • Legacy and highly centralized ETL platforms encapsulate decades of undocumented logic, making modernization risky, costly, slow and heavily dependent on scarce SME knowledge. 
  • The presented AI solution follows a four-stage architecture that (1) rebuilds data flows, (2) parses heterogeneous code, (3) interprets logic, and (4) validates outputs to create accurate, standardized functional documentation.
  • The solution enables organizations to analyze legacy environments dramatically faster and with substantially reduced effort, while maintaining a lower error rate and higher consistency than comparable human-centric analysis.
  • End-to-end visibility reduces risk and accelerates transformation initiatives.

Every large enterprise has them: mission-critical systems built over decades, containing thousands of transformation rules, business logic scattered across multiple layers and minimal documentation. The developers who built them have moved on or retired. The "why" behind complex transformations exists only in production code.

Now you need to modernize, migrate to the cloud or decommission these systems. The first step? Understanding what they actually do.

Anatomy of the problem domain

Most enterprise data transformation platforms share common characteristics:

Multi-layer architecture:

  • Data is sourced from multiple systems, not always from the system of record
  • Transformation logic is scattered across database objects, ETL workflows and scripts 
  • Many-to-many dependencies between individual transformation steps form a complex web 
  • Multiple output formats consumed by downstream applications 

Accumulated complexity:

  • Decades of continuous, layered, and heterogeneous development
  • The codebase suffers from inconsistent coding conventions and individual developer styles 
  • Weak architectural design embeds business logic directly into technical code, without external business references 
  • Stacked, sequential transformations with chained inputs and outputs lead to excessive complexity 
  • Short-term tactical ETL implementations accumulate over time into undocumented, business-critical dependencies

Documentation void:

  • Documentation of original business requirements is outdated or has been lost 
  • No standard documentation of what each transformation does
  • Critical knowledge exists only in the heads of people who may no longer be employed by the enterprise

Solution approach:

To address these problems, EY has developed a methodology that combines AI and automation and consists of the following steps:

1. Identify and assess key complexity points within the ETL platform
2. Create an inventory of all data producers and consumers 
3. Extract and securely store raw code and related artifacts
4. Parse and transform the collected code into a structured format
5. Reconstruct end-to-end data flows and dependencies
6. Apply multi-step prompting using parsed data and lineage information
7. Evaluate and assign quality scores to the generated insights
8. Produce a comprehensive, standardized functional documentation report


Comparing traditional vs. AI-driven ETL analysis

This comparison highlights the differences between conventional manual approaches and EY’s AI-powered methodology for legacy ETL analysis. The AI-driven approach delivers significant improvements in speed, consistency, quality, and scalability, while substantially reducing the need for human intervention. This translates into an estimated ~5x reduction in effort and cost versus traditional manual approaches, i.e., roughly one-fifth of the original effort (about 80% less).


Architecture deep dive

1. Data flow exploration engine

The Data Flow Exploration Engine reconstructs end-to-end data flows by aggregating information from every component involved in the transformation process — regardless of platform, syntax or underlying technology.

Key capabilities:

  • Multisource ingestion: Consumes information from operational artifacts such as landing zones, database objects, workflow definitions, schedulers, configuration assets and system metadata — independent of their format.
  • Cross-layer dependency reconstruction: Identifies and maps the relationships between data objects throughout extraction, transformation, and enrichment processes, regardless of their source.
  • Object mapping: Detects relationships between systems, inputs, intermediate objects and outputs by analyzing references, expressions, schemas and transformation rules.
  • Unified data flow graph: Produces a standardized representation of upstream and downstream dependencies, enabling clear visibility into how each field or dataset is produced and consumed across the full path.

This approach allows the system to map data flows at scale, independent of the size of the ETL platform — even in environments where multiple generations of tools coexist — without depending on proprietary connectors or technology-specific integrations.
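The idea of a unified data flow graph can be illustrated with a minimal Python sketch. It is not EY's implementation: the class `DataFlowGraph`, the helper `build_graph`, the record layout (`component`, `inputs`, `outputs`) and all object names are hypothetical, and the sketch assumes that each artifact has already been reduced to a list of the objects it reads and writes.

```python
from collections import defaultdict, deque


class DataFlowGraph:
    """Directed graph of data objects (hypothetical structure).

    An edge source -> target means "target is produced from source".
    """

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _reach(self, start, edges):
        # Breadth-first traversal collecting every object reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def lineage(self, obj):
        """All objects that obj depends on, directly or indirectly."""
        return self._reach(obj, self.upstream)

    def impact(self, obj):
        """All objects that depend on obj, directly or indirectly."""
        return self._reach(obj, self.downstream)


def build_graph(records):
    """Merge per-component input/output lists into one graph."""
    graph = DataFlowGraph()
    for rec in records:
        for src in rec["inputs"]:
            for tgt in rec["outputs"]:
                graph.add_flow(src, tgt)
    return graph


# Assumed example: three components from three different technologies.
records = [
    {"component": "load_customers.sql",
     "inputs": ["crm.customers"], "outputs": ["stg.customers"]},
    {"component": "wf_enrich.xml",
     "inputs": ["stg.customers", "ref.countries"], "outputs": ["dwh.dim_customer"]},
    {"component": "export_report.sh",
     "inputs": ["dwh.dim_customer"], "outputs": ["out.customer_report.csv"]},
]

graph = build_graph(records)
print(sorted(graph.lineage("out.customer_report.csv")))
print(sorted(graph.impact("ref.countries")))
```

Because the graph only stores object names and edges, records from any technology can be merged into it once they are reduced to inputs and outputs, which is the property the engine description relies on.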

2. Specialized parsers

Specialized parsers convert raw, unstructured transformation code into structured, machine-interpretable objects. Each parser is optimized for a particular syntactic family (e.g., SQL variants, XML-based ETL workflows, procedural logic or metadata configurations).

Key capabilities:

  • Syntax-aware processing: Identifies statements, expressions, joins, filters, conditions, data mappings and transformation steps with high fidelity.
  • Cross-technology normalization: Converts heterogeneous logic into a consistent intermediate representation that enables AI analysis.
  • Error-tolerant ingestion: Handles malformed or legacy code that no longer conforms to modern standards.
  • Embedded logic extraction: Captures business rules expressed inside technical constructs (CASE statements, nested queries, parameterized routines, etc.).

By standardizing logic across platforms, the parsers eliminate one of the biggest blockers in legacy analysis: technological fragmentation.
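As a rough illustration of cross-technology normalization, the sketch below maps a SQL statement and an XML workflow fragment onto the same intermediate structure. Everything in it is an assumption made for illustration: the `Step` dataclass, the regex-based toy SQL parser (which does not handle nested CASE expressions), and the XML element names `mapping`, `source` and `expression` do not come from any specific ETL product or from EY's parsers.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical intermediate representation shared by all parsers."""
    technology: str
    target: str = ""
    sources: list = field(default_factory=list)
    rules: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def parse_sql(code):
    """Toy SQL parser: extracts target, source tables and CASE rules."""
    step = Step(technology="sql")
    target = re.search(r"insert\s+into\s+([\w.]+)", code, re.I)
    if target:
        step.target = target.group(1)
    else:
        # Error-tolerant: record the problem instead of failing.
        step.errors.append("no INSERT INTO target found")
    step.sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", code, re.I)
    cases = re.findall(r"\bcase\b.*?\bend\b", code, re.I | re.S)
    step.rules = [" ".join(c.split()) for c in cases]  # collapse whitespace
    return step


def parse_workflow_xml(code):
    """Toy parser for an assumed XML workflow format."""
    step = Step(technology="xml_workflow")
    try:
        root = ET.fromstring(code)
    except ET.ParseError as exc:
        step.errors.append(f"malformed XML: {exc}")
        return step
    step.target = root.get("target", "")
    step.sources = [s.get("name", "") for s in root.iter("source")]
    step.rules = [e.get("rule", "") for e in root.iter("expression")]
    return step


PARSERS = {"sql": parse_sql, "xml_workflow": parse_workflow_xml}


def parse(technology, code):
    """Dispatch to the parser registered for the given syntactic family."""
    return PARSERS[technology](code)


sql = """
INSERT INTO dwh.dim_customer
SELECT c.id,
       CASE WHEN c.country = 'CH' THEN 'DOMESTIC' ELSE 'FOREIGN' END AS segment
FROM stg.customers c
JOIN ref.countries r ON r.code = c.country
"""
xml = ('<mapping target="dwh.fact_sales"><source name="stg.sales"/>'
       '<expression rule="amount * fx_rate"/></mapping>')
broken = '<mapping target="x"><source name="y">'  # unclosed tags

steps = [parse("sql", sql), parse("xml_workflow", xml), parse("xml_workflow", broken)]
for step in steps:
    print(step)
```

The malformed XML input does not raise; it yields a `Step` whose `errors` list is populated, which is one simple way to keep ingestion going when legacy code is damaged.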

3. LLM integration

The LLM layer interprets the reconstructed data flows and parsed data to produce human-understandable explanations of complex transformation logic. It doesn’t “guess”; it reasons based on structured evidence extracted by the earlier components.

Key capabilities:

  • Context-aware interpretation: Consumes system definitions, data flow graphs, parsed objects and dependency structures to generate precise descriptions.
  • Multi-step reasoning: Breaks down complex transformations into sequenced prompts to ensure accuracy and completeness.
  • Pattern detection: Identifies recurring logic patterns (aggregations, validations, enrichment steps, cleansing operations) that would otherwise require manual review.
  • Consistent and standardized output: Ensures every explanation follows the same structure and terminology, something that is very difficult to achieve manually at scale.

This layer transforms raw technical logic into business-friendly, functional documentation.
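Multi-step prompting over structured evidence could look roughly like the following sketch. It makes several assumptions: the step dictionary layout, the prompt wording, the section names in `SECTIONS` and the functions `analysis_prompts` and `explain_step` are all hypothetical, and the model is represented by a plain callable (`fake_llm`) rather than any real LLM API.

```python
SECTIONS = ("Purpose", "Inputs", "Business rules", "Output")  # assumed template


def analysis_prompts(step, upstream):
    """Build one prompt for the inputs and one prompt per parsed rule."""
    context = (
        f"Target object: {step['target']}\n"
        f"Direct sources: {', '.join(step['sources'])}\n"
        f"Upstream lineage: {', '.join(sorted(upstream))}\n"
    )
    prompts = [context + "Task: describe what data enters this step. "
                         "Use only the facts above."]
    for rule in step["rules"]:
        prompts.append(context + f"Rule: {rule}\n"
                       "Task: explain this rule in business terms. "
                       "Use only the facts above.")
    return prompts


def explain_step(step, upstream, llm):
    """Run the analysis prompts in sequence, then ask for one summary."""
    notes = [llm(prompt) for prompt in analysis_prompts(step, upstream)]
    summary_prompt = (
        "Task: combine the notes below into the sections "
        + ", ".join(SECTIONS) + ".\nNotes:\n" + "\n".join(notes)
    )
    return llm(summary_prompt)


# Stand-in for a real model call: records each prompt, returns a placeholder.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"[model answer {len(calls)}]"


step = {
    "target": "dwh.dim_customer",
    "sources": ["stg.customers", "ref.countries"],
    "rules": ["CASE WHEN c.country = 'CH' THEN 'DOMESTIC' ELSE 'FOREIGN' END"],
}
upstream = {"crm.customers", "stg.customers", "ref.countries"}

documentation = explain_step(step, upstream, fake_llm)
print(len(calls), "prompts sent")
print(documentation)
```

Every prompt carries the parsed objects and lineage as explicit context, and the final prompt fixes the output sections, which is how a sequence like this can keep explanations grounded and uniformly structured.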

4. Validation engine

The Validation Engine ensures the correctness, completeness and reliability of the AI-generated documentation and insights. It verifies alignment between parsed logic, data flows, and LLM-generated explanations.


Key capabilities:

  • Quality scoring: Assigns confidence levels to the LLM‑generated explanations based on how accurately and consistently they reflect the underlying parsed logic and lineage.
  • Anomaly detection: Identifies inconsistencies or gaps in the LLM results, such as missing steps, unclear reasoning, unreachable paths or logic that does not match the underlying parsed transformations.
  • Human-in-the-loop refinement: Allows SMEs to review flagged sections and provide targeted feedback that continuously improves the system. The human role shifts from purely manual analysis to reviewing the steps with lower quality scores or detected anomalies.

The Validation Engine closes the loop by verifying that generated outputs are reliable and auditable, a critical requirement for enterprise modernization programs.
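One simple way to picture quality scoring is a coverage check: does the generated explanation mention every object and rule term found by the parsers? The sketch below implements only that lexical heuristic. The function `score_explanation`, the `key_terms` field, the 0.8 threshold and the sample texts are assumptions for illustration; a production validation engine would presumably use richer checks than substring matching.

```python
def score_explanation(step, explanation, threshold=0.8):
    """Score an explanation by how many parsed elements it mentions.

    Returns the score, the missing elements (anomalies) and whether
    the explanation should be routed to an SME for review.
    """
    expected = [step["target"], *step["sources"], *step["key_terms"]]
    text = explanation.lower()
    missing = [item for item in expected if item.lower() not in text]
    score = 1 - len(missing) / len(expected) if expected else 0.0
    return {
        "score": round(score, 2),
        "missing": missing,
        "needs_review": score < threshold,
    }


step = {
    "target": "dwh.dim_customer",
    "sources": ["stg.customers", "ref.countries"],
    "key_terms": ["DOMESTIC", "FOREIGN"],  # literals taken from the parsed rule
}

complete = ("dwh.dim_customer is built from stg.customers joined with "
            "ref.countries; customers in CH are tagged DOMESTIC, all others FOREIGN.")
incomplete = "Loads customers from stg.customers into dwh.dim_customer."

good = score_explanation(step, complete)
poor = score_explanation(step, incomplete)
print(good)
print(poor)
```

Explanations that fall below the threshold are flagged together with the list of missing elements, so reviewers can focus on specific gaps instead of rereading the whole document.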


Credentials & proposal

At EY Technology Consulting, we have successfully delivered similar automation and AI use cases across large-scale transformation projects in the financial sector, enabling our clients to accelerate modernization efforts and reduce manual analysis workloads.

If your organization faces a similar challenge (a centrally managed ETL platform with large transformation pipelines, limited documentation and fragmented logic across technologies), don’t hesitate to reach out for a discussion. Our approach uses AI to dramatically increase speed, consistency, and transparency while reducing manual effort. Whether you aim to migrate, optimize, or simply understand your current state, we would be happy to support you with our proven methodology and experience.

Summary

Legacy ETL platforms often contain decades of undocumented, complex logic, making modernization slow, risky, and dependent on scarce expertise. EY’s AI-driven methodology addresses this by reconstructing data flows, parsing raw code, interpreting logic with LLMs, and validating outputs for accuracy.

This four-stage approach transforms fragmented legacy systems into clear, standardized, and actionable insights—accelerating analysis from days to minutes, improving consistency, reducing manual effort, and providing end-to-end visibility. Proven in large-scale transformations, EY’s solution enables organizations to confidently migrate, optimize, and govern their legacy ETL environments at enterprise scale.

Acknowledgement

We kindly thank Francisco Jose Araque Pineda and Kjell Vanden Berghe for their valuable contribution to this article.

