
Using knowledge graphs to unlock GenAI at scale

Knowledge graphs can be used to leverage domain-specific data and increase trust in GenAI outputs.


In brief
  • GenAI can support many business tasks if foundational models are supplemented with domain-specific data.
  • Retrieval augmented generation (RAG) is often used to do this but has contextual and abstractive limitations.
  • Bootstrapping RAG with knowledge graphs facilitates a holistic understanding of concepts across a body of knowledge and at various levels of abstraction.

Introduction

GenAI has the potential to drive significant business impact for organizations by accelerating research and development, improving product design, and hardening engineering specifications. GenAI is becoming a business enabler for organizations that must differentiate their offerings and capabilities from competitors’ because the technology can help not only analyze and identify patterns in vast data sets but also synthesize novel solutions that can be pressure-tested in silico. Since creating differentiated offerings (e.g., new customer-facing products) is high-risk, high-reward and highly domain-specific, whether GenAI can be applied immediately and repeatably to these tasks must be assessed use case by use case. However, GenAI can be used readily to augment and enhance back-office tasks such as data entry, financial reporting and compliance monitoring, which are often resource intensive and prone to human error. By streamlining these processes, companies can achieve greater operational efficiency, reduce costs and allocate more resources to the strategic initiatives described earlier.

However, as organizations seek to operationalize GenAI in and around their business processes, two concerns are most often cited as inhibitors to trusted progress. The first is that hallucinations negatively impact response veracity, and the second is that domain-specific or temporally relevant data cannot be leveraged without model fine-tuning. Alleviating both concerns is critical, and the prescribed solution often involves retrieval augmented generation (RAG) against a vector store as the mechanism to bootstrap large language model (LLM) “understanding” with information from proprietary sources. These sources are often large and unstructured, which makes them suboptimal for knowledge retrieval with or without GenAI. Harnessing knowledge graphs (KGs) as the basis for RAG implementations is showing substantial promise for factually anchoring LLM responses to massive corpora, retrieving with high precision, and solving abstractive summarization tasks. In this article, we explore this technique and share how it can be used to enhance GenAI applications in the enterprise.

Bootstrapping RAG with knowledge graphs


A KG is a structured way of representing how entities are interrelated. Depending on the domain, entities can be things, events or concepts, and their modeled relationships can describe equivalence, hierarchy, ownership and other associations. KGs are stored in graph databases, which are optimized for data with complex relationships that must be traversed at query time. KGs are useful for large organizations because they break down data silos and enable holistic understanding of operations by fusing intra- and interdomain knowledge from multiple data sources at all levels of abstraction. Since KGs support fusing, locating, contextualizing and understanding data, they represent a substantial maturity shift from unstructured text corpora being embedded and stored in a vector database for RAG applications.
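To make the entity-and-relationship structure concrete, here is a minimal sketch of a KG represented as subject-predicate-object triples, with a one-hop traversal of the kind a graph database performs at query time. All entity and relationship names here are invented for illustration, not drawn from any real schema.

```python
from collections import defaultdict

# Toy knowledge graph as subject-predicate-object triples.
# Every name below is hypothetical.
triples = [
    ("Contract-123", "HAS_CLAUSE", "Warranty-Clause-7"),
    ("Contract-123", "SIGNED_BY", "Acme Corp"),
    ("Warranty-Clause-7", "COVERS", "Hydraulic-System"),
    ("Hydraulic-System", "PART_OF", "Excavator-Model-X"),
]

# Index outgoing edges so relationships can be traversed efficiently,
# as a graph database would do at query time.
out_edges = defaultdict(list)
for subj, pred, obj in triples:
    out_edges[subj].append((pred, obj))

def neighbors(entity):
    """Return (relationship, entity) pairs one hop from the given entity."""
    return out_edges[entity]

print(neighbors("Contract-123"))
# → [('HAS_CLAUSE', 'Warranty-Clause-7'), ('SIGNED_BY', 'Acme Corp')]
```

A production system would use a graph database and a query language such as Cypher or SPARQL rather than in-memory dictionaries, but the traversal pattern is the same.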


A common KG-bootstrapped architecture resembles a traditional RAG architecture but adds a graph database, data engineering pipelines that hydrate and enrich the graph, and additional orchestration and prompting to exploit the graph’s hierarchy in AI-generated responses. There are technical differences between knowledge-augmented generation (KAG) and graph RAG methods, but generally the graph database serves as the backbone of the architecture, storing entities extracted from upstream content and their relationships in a domain subgraph and, depending on the use case, document chunks from that same upstream content in a lexical subgraph. Since a domain graph describes business knowledge and a lexical graph describes linguistic structure, together they paint a rich picture of precisely where business domain knowledge exists in source content. Further, the superstructure of the graph allows broad understanding at various levels of abstraction, such as summarizing across a corpus or identifying common themes within and across documents.
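The interplay between the two subgraphs can be sketched as follows: the domain subgraph holds business entities, the lexical subgraph holds document chunks, and cross-links tie each entity to the exact text that expresses it. All identifiers and edge labels here are hypothetical.

```python
# Domain subgraph: business entities and their relationships (illustrative).
domain = [
    ("Contract-123", "HAS_CLAUSE", "Warranty-Clause-7"),
    ("Warranty-Clause-7", "COVERS", "Hydraulic-System"),
]

# Lexical subgraph: document chunks and their order in the source document.
lexical = [
    ("Contract-123.pdf", "HAS_CHUNK", "chunk-04"),
    ("chunk-04", "NEXT", "chunk-05"),
]

# Cross-links tie domain knowledge to the text that expresses it.
mentions = [
    ("Warranty-Clause-7", "MENTIONED_IN", "chunk-04"),
    ("Hydraulic-System", "MENTIONED_IN", "chunk-05"),
]

def chunks_for(entity):
    """Locate precisely where an entity is discussed in source content."""
    return [obj for subj, pred, obj in mentions
            if subj == entity and pred == "MENTIONED_IN"]

print(chunks_for("Warranty-Clause-7"))  # → ['chunk-04']
```

Answering “where does our contract discuss this clause?” then becomes a graph lookup rather than a similarity search alone.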

[Figure: Graph with domain and lexical subgraphs]

To build this knowledge graph, data engineering pipelines must perform a variety of extraction and enrichment tasks to distill upstream sources and knowledge bases into structured data inside the graph. Optical character recognition (OCR), text extraction, chunking and named entity resolution must be performed to ingest data from raw text, document objects and API feeds, and business logic must follow to generate contextualized and relevant metadata that forms the graph’s topology. Many of the upstream tasks that are used to create embeddings and persist data in the vector store, such as OCR, text extraction and chunking, can be reused here, but dedicated effort is necessary to create a rich domain subgraph that interfaces meaningfully with the lexical subgraph described above. This can be done in several ways to balance index cost, query cost and answer quality, but the intention is to facilitate query resolution that holistically cuts across the private data set used as the basis for the RAG implementation.
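A heavily simplified ingestion sketch is shown below: fixed-width chunking and dictionary lookup stand in for real chunking strategies and named entity resolution models, and the document text, entity list and edge labels are all invented.

```python
import re

# Stand-in for entity resolution; a real pipeline would use an NER model
# plus business logic, not a hard-coded set.
KNOWN_ENTITIES = {"warranty", "hydraulics"}

def chunk_text(text, size=60):
    # Naive fixed-width chunking; production systems chunk on structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

def extract_entities(passage):
    words = set(re.findall(r"[a-z]+", passage.lower()))
    return KNOWN_ENTITIES & words

def ingest(doc_id, text):
    """Emit graph triples linking a document, its chunks, and entities."""
    triples = []
    for i, passage in enumerate(chunk_text(text)):
        chunk_id = f"{doc_id}#chunk{i}"
        triples.append((doc_id, "HAS_CHUNK", chunk_id))         # lexical
        for entity in sorted(extract_entities(passage)):
            triples.append((entity, "MENTIONED_IN", chunk_id))  # domain link
    return triples

sample = "The warranty covers hydraulics for 24 months under section 7."
for t in ingest("contract-123", sample):
    print(t)
```

The output triples would then be loaded into the graph database, where enrichment steps add the contextual metadata that forms the graph’s topology.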

Contract warranty liability example

Contracts and their supplements often contain detailed specifications, performance benchmarks and maintenance obligations that can be challenging to interpret and manage. Humans commonly misinterpret how this information impacts warranty claims and the scope of subsequent maintenance contracts, especially in scenarios involving complex terms and conditions, and the possible downside impact to margin is substantial. Although GenAI could be used to support these business processes, we must protect response veracity, given the domain specificity of the problem and the large downside impact of an incorrect interpretation. We will give an example below of how GenAI bootstrapped with a knowledge graph can be used for this problem class.

Suppose for a large, complex contract we want to know whether hydraulics are covered under warranty and how this posture compares with other recent contracts. This is a great opportunity to use GenAI with a knowledge graph to augment our business processes. Distilling the problem into its components, we must first assess what “covered” means; here, a foundational LLM and good prompting will suffice. A modern LLM will be sophisticated enough to discern with minimal prompting that we are interested in “coverage” in the legal and financial senses as opposed to those of the news media, cellular networks or defenses in football.

Next, we must understand within the target contract the factually correct coverage posture and the context supporting this position; here, traditional vector RAG and additional prompting can help. Using only a foundational LLM, we will have no basis upon which to provide a response, as private contract data were not used to train the model. RAG, prompt stuffing and fine-tuning are our options, and current practice favors RAG for cost and speed. However, depending on contract structure and content, generating highly accurate and contextualized responses may be problematic.
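To make the vector RAG step concrete, here is a toy retrieval sketch in which bag-of-words counts stand in for real embeddings; the chunk texts are invented. Retrieval finds lexically similar chunks, but it has no notion of how the chunks relate to one another, which is the gap the knowledge graph fills.

```python
import math
from collections import Counter

# Invented contract chunks; a real system would embed OCR'd document text.
docs = {
    "c1": "warranty covers hydraulic pump and hoses for 24 months",
    "c2": "payment terms net 30 days from invoice",
    "c3": "hydraulic system excluded from extended warranty coverage",
}

def embed(text):
    # Bag-of-words counts as a stand-in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    """Return the ids of the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)
    return ranked[:k]

print(retrieve("is hydraulic warranty coverage included"))
# → ['c3', 'c1']
```

Note that c1 and c3 contradict each other on hydraulic coverage; plain vector retrieval surfaces both without resolving which clause governs, which is exactly where contextual accuracy can break down.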

Further, if we want to connect the dots across contracts, abstraction is necessary. Retrieval techniques against knowledge graphs can help on both these fronts by quickly identifying relevant communities of entities and relationships in the graph, and by enabling contextual “zooming” and “panning” of the graph to generate synthesized insights that are most germane to the prompts. In the warranty coverage example, this means we may receive content in our response that preferentially cites this year’s contracts vs. older contracts, describes coverage alignment or deviation from a synthesized “average” contract, and provides insight as to whether hydraulics coverage substantially differs from coverage of other subsystems. Reliably generating this type of deeper insight with a typical RAG approach is not possible, even if the approach is generally able to factually anchor responses to source content.
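The cross-contract comparison above can be sketched once coverage postures have been distilled from the graph. In this hedged example, the coverage table, contract names and majority-posture rule are all invented; a real implementation would derive postures by traversing the graph and would use richer aggregation logic.

```python
# Hypothetical coverage postures distilled from a knowledge graph.
coverage = {
    "Contract-2024-A": {"hydraulics": True, "electrical": True},
    "Contract-2024-B": {"hydraulics": False, "electrical": True},
    "Contract-2023-C": {"hydraulics": True, "electrical": False},
}

def coverage_rate(subsystem):
    """Fraction of contracts covering a subsystem (the 'average' posture)."""
    postures = [c[subsystem] for c in coverage.values()]
    return sum(postures) / len(postures)

def deviates(contract, subsystem):
    """Does this contract differ from the majority posture for a subsystem?"""
    majority = coverage_rate(subsystem) >= 0.5
    return coverage[contract][subsystem] != majority

print(round(coverage_rate("hydraulics"), 2))      # → 0.67
print(deviates("Contract-2024-B", "hydraulics"))  # → True
```

This is the kind of synthesized, corpus-level insight a lexical-only RAG pipeline struggles to produce, because no single chunk of text contains the comparison.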

In conclusion, GenAI can assist analysts in reviewing existing contracts to identify ambiguities or gaps in warranty clauses, reducing the likelihood of disputes arising from unclear terms. Further upstream, in the drafting stage of the contracting workflow, GenAI can be used to create contract templates with well-defined warranty clauses based on the specific needs and risks associated with different types of equipment.

How EY US will help your organization

Since the arrival of the first generative adversarial network in 2014, EY US has created substantial value with GenAI for many of the world’s largest clients. Our industry-aligned teams have the domain knowledge and technical acumen to increase the return on your GenAI investments and help your organization succeed wherever it is on its journey.

Summary 

By incorporating knowledge graphs into GenAI solution architectures, organizations can better deliver highly trusted, domain-specific applications, ultimately driving better user satisfaction and business outcomes.
