
AI validation in pharma: maintaining compliance and trust



AI in pharma demands trust. Discover how AI can validate AI itself, ensuring GxP compliance, accuracy and patient safety.


In brief

  • Compliance with Annex 11/22, EU AI Act, GDPR and CSV/CSA principles is essential to ensure patient safety and regulatory trust.
  • Leveraging AI to validate AI enables scalable, risk-based testing, dramatically reducing manual effort while strengthening compliance evidence, with subject matter experts and human oversight as safeguards.

GxP meets AI: ensuring accuracy and trust in AI tools designed for Pharma and HCPs

The integration of AI functionality into pharmaceutical business is no longer a theoretical concept; it is a strategic reality. AI in pharma already supports core functions from R&D to supply chain operations. While the EU AI Act does not yet provide clear implementation and application guidelines for limited-risk applications, the pharmaceutical industry must still follow another set of rules. Beyond general data protection and security standards (e.g., GDPR), quality standards, validation and software assurance must be guaranteed for systems used within a pharmaceutical company, especially for functionalities offered to external parties such as healthcare professionals (HCPs). This rigorous quality control (QC) is mandatory to meet EU regulatory requirements, maintain patient safety and protect data integrity, and it applies to AI tools as well.

AI tools can be applied at different stages along the value chain and for various purposes within pharmaceutical organizations. For example, AI tools may support pharmacovigilance (e.g., adverse event intake and signal detection), streamline clinical trial operations (e.g., patient matching and protocol deviation monitoring) or enhance regulatory affairs (e.g., automated review of submission dossiers). Their functionality may rely on static datasets (e.g., validated SmPC libraries), dynamic data generated within the company or even publicly available information. The nature of the underlying data, the way results are produced and the intended use of the output create important differences in the expected outputs and in how much those outputs vary. These differences also affect reproducibility: outputs derived from static, validated datasets are expected to remain highly consistent, whereas systems that rely on open-ended web crawling can produce variable and less predictable results. Such differences should not only shape internal governance guidelines and validation strategies, but should also be taken into account by regulatory frameworks, ensuring that assurance requirements are proportional to the type of data and the risk associated with the tool's application.

Complexity of verification and testing increases from traditional applications to AI with dynamic internet content, with EU GMP Annex 22 covering only AI with static content

This paper outlines the relevant EU regulatory framework, QC and software assurance requirements as well as a risk-based validation strategy for deploying AI tools within a GxP environment or for interactions with HCPs. The focus is on AI validation in pharma, ensuring tools are both innovative and compliant.

1. Relevant regulatory frameworks within the EU

Under the European regulatory framework for medicinal products, the quality and reliability of any computerized system that could influence understanding or decision-making are paramount. Standards for these factors are captured in Good Practices, referred to as GxP. GxP regulations require accuracy, traceability and reproducibility of the information provided, through standards defined by organizations such as the ICH (International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use) or through regulations specific to the EU. For manufacturing and distribution, the EU GMP Regulations as presented in EudraLex Vol. 4 apply.

Annexes to the EU GMP Regulations further detail the requirements as defined by the EU Commission. EU GMP Annex 11 defines the requirements for the use of computerized systems. It requires that all such systems undergo formal validation, typically through installation qualification, operational qualification and performance qualification, to demonstrate that they consistently perform as intended. In the EU, this process is referred to as computerised system validation (CSV), which traditionally follows a structured, documentation-heavy top-down approach. Although a new version of the annex has recently been published for public consultation, it still follows earlier patterns of software development and deployment and does not reflect today's standards such as cloud computing and computing environments serviced by third parties, let alone concepts such as blockchain and AI.

By contrast, in the United States, the FDA promotes computer software assurance (CSA) as a successor to CSV, using a more flexible, risk-based approach that integrates platforms and services provided by third parties. While the EU has not formally adopted CSA, CSA-style risk-based testing can be applied within Annex 11/GAMP 5 when justified and documented.

For the EU, requirements for the use of AI are being developed in a new annex to EU GMP, Annex 22. This is where GMP meets AI, bridging traditional pharmaceutical manufacturing standards with modern AI-driven processes. In July 2025, the first version of the annex was published for public consultation running until October 2025. This version is limited to output generated by AI tools based on static content.

On top of the GxP requirements, the EU AI Act is being rolled out in phases. Chatbots that perform factual retrieval are usually categorized as “limited-risk” under the AI Act, subject mainly to transparency obligations (e.g., disclosure that the user is interacting with AI). Because the chatbot’s output is reviewed by an expert before the retrieved information is applied to patients, a human remains in the loop. However, even for a limited-risk application, governance and quality assurance are of the utmost importance.

In parallel, GDPR governs the processing of any personal data within physician interactions, demanding robust access control, auditability and secure data storage. Personal data (including physician identifiers and patient cases shared during queries) must be processed under the GDPR Art. 5 principles of data minimization and purpose limitation, with explicit consideration of retention and audit trail requirements.
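To make the combination of data minimization and auditability concrete, here is a minimal Python sketch of an audit-trail record for a physician query. The field names, the hashing-based pseudonymization and the retention figure are illustrative assumptions, not values prescribed by GDPR or any cited guidance.

    import hashlib
    import json
    from datetime import datetime, timezone

    def pseudonymize(identifier: str, salt: str) -> str:
        """Replace a direct physician identifier with a salted hash (data minimization)."""
        return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()

    def audit_record(physician_id: str, query: str, source_ref: str,
                     output: str, system_version: str, salt: str) -> str:
        """Build one audit-trail entry; no raw personal identifier is stored."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_pseudonym": pseudonymize(physician_id, salt),
            "query": query,                  # purpose limitation: only what the audit needs
            "retrieved_source": source_ref,  # e.g., an SmPC section reference
            "output": output,
            "system_version": system_version,
            "retention_days": 3650,          # assumed policy; set per local DPO guidance
        }
        return json.dumps(record)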

2. Quality control

In the pharmaceutical context, software assurance means providing documented, lifecycle-long evidence that the system is fit for its intended use, resilient to change and consistently reliable. Within a compliance framework such as the ISPE GAMP 5 principles, a risk-based approach should guide the depth of verification and validation activities. Supplier qualification is essential, especially for cloud-hosted models or third-party natural language processing components, where service agreements must address data integrity, change notification and uptime commitments. Change management procedures should govern updates to datasets, model versions or conversation handling logic, ensuring that no modification enters production without prior verification. Operational controls, including user access management, monitoring and incident response processes, provide an additional safeguard against both technical and compliance failures. Continuous assurance, through periodic re-testing and accuracy monitoring, ensures that performance does not degrade over time, a risk that is particularly relevant for AI models subject to drift.
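The continuous-assurance idea can be sketched as a periodic re-test gate that compares current performance on a fixed regression set against the validated baseline. The thresholds and the result structure below are assumptions; in practice they would come from the validation plan and the qualified test set.

    from dataclasses import dataclass

    @dataclass
    class ReTestResult:
        accuracy: float       # share of correct outputs on the fixed regression set
        critical_errors: int  # count of safety-relevant failures in this run

    def assess_drift(baseline_accuracy: float, result: ReTestResult,
                     max_drop: float = 0.02, max_critical: int = 0) -> bool:
        """Return True if the system still meets its qualified performance.

        A drop of more than max_drop versus the validated baseline, or any
        critical error, should trigger investigation and change control.
        """
        if result.critical_errors > max_critical:
            return False
        return (baseline_accuracy - result.accuracy) <= max_drop

    # Example: baseline qualified at 97% accuracy; this re-test scored 96.5%.
    ok = assess_drift(0.97, ReTestResult(accuracy=0.965, critical_errors=0))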

QC in this context must address the accuracy of the AI system: AI quality control encompasses accuracy, reliability and auditability across diverse input-output scenarios. The AI tool must reliably retrieve information from trustworthy sources, without introducing promotional bias or off-label interpretations. Each interaction should be logged with the necessary details (e.g., query, retrieved source, generated output and system version) to create an audit trail that would withstand a regulatory inspection. A central question is: how is accuracy measured? Metrics such as exact match accuracy, factual accuracy rate and critical error rate must be complemented by qualitative assessments of conversational coherence across a wide range of possible inputs, e.g., variations of the questions asked or of the input provided. Given the potential patient safety impact, the critical error rate should be set as low as possible. For instance, when validating a pharmacovigilance case-intake assistant, exact match accuracy may be applied to verify that all required data is captured correctly, factual accuracy rate can be applied to literature summarization tools and critical error rate is particularly relevant when AI is used for dosage-related outputs.
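A minimal sketch of how these three metrics could be computed from reviewed evaluation records follows. The record fields are hypothetical; the verdicts would come from expert review or an AI screening step.

    from typing import Iterable, Mapping

    def qc_metrics(results: Iterable[Mapping[str, bool]]) -> dict:
        """Compute the three quantitative metrics discussed above.

        - exact_match: output matched the expected answer verbatim
        - factually_correct: every factual claim in the output was correct
        - critical_error: a safety-relevant mistake (e.g., wrong dosage)
        """
        results = list(results)
        n = len(results)
        return {
            "exact_match_accuracy": sum(r["exact_match"] for r in results) / n,
            "factual_accuracy_rate": sum(r["factually_correct"] for r in results) / n,
            "critical_error_rate": sum(r["critical_error"] for r in results) / n,
        }

    # Example with three reviewed outputs:
    print(qc_metrics([
        {"exact_match": True,  "factually_correct": True,  "critical_error": False},
        {"exact_match": False, "factually_correct": True,  "critical_error": False},
        {"exact_match": False, "factually_correct": False, "critical_error": True},
    ]))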

3. AI as a tool for validation

As AI systems can process a wide variety of inputs and generate equally diverse outputs, validating them by providing a fixed set of input phrases and comparing the resulting outputs for completeness, correctness and consistency presents a significant challenge. The traditional validation paradigm—where a specific input deterministically produces one predefined output—is no longer sufficient or appropriate for AI-driven systems. Instead, validation must account for acceptable ranges of variation, ensure that mandatory information is always present and confirm that outputs remain accurate, safe and compliant within defined boundaries.
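This shift from one predefined output to acceptable boundaries can be expressed in code. The sketch below checks a generated output for mandatory elements and forbidden content; the patterns are invented examples for illustration, not a validated rule set.

    import re

    MANDATORY_PATTERNS = [
        r"contraindicat",                            # contraindication wording must appear
        r"consult.*(SmPC|prescribing information)",  # pointer to the approved source
    ]
    FORBIDDEN_PATTERNS = [
        r"off-label",                                # no off-label recommendations
    ]

    def within_boundaries(output: str) -> tuple[bool, list[str]]:
        """Check one generated output against the defined acceptance boundaries."""
        findings = []
        for pattern in MANDATORY_PATTERNS:
            if not re.search(pattern, output, re.IGNORECASE):
                findings.append(f"missing mandatory element: {pattern}")
        for pattern in FORBIDDEN_PATTERNS:
            if re.search(pattern, output, re.IGNORECASE):
                findings.append(f"forbidden content found: {pattern}")
        return (not findings, findings)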

One of the most promising developments in this space is the use of AI itself to evaluate AI systems under a risk-based testing paradigm. Traditionally, CSV or CSA processes involve manually designing test cases, crafting prompts and manually reviewing chatbot outputs, a highly resource-intensive process. Emerging tools now allow the automated generation of thousands of prompts based on defined parameters. These prompts are then fed into the AI tool and the outputs are evaluated or screened automatically using AI against defined quality categories, such as the five QC categories factual accuracy, completeness, relevance, safety and style. For example, in a validation scenario, 1,000 automatically generated physician queries about contraindications could be submitted to the AI tool, with outputs scored for factual accuracy against the content, relevance to the query and safety (no off-label advice). By leveraging AI in this way, it becomes possible to test with far greater coverage in less time, enabling much more comprehensive risk-based validation. This demonstrates the potential of AI for GxP validation, where AI is both the subject and the instrument of assurance.

That process is guided by human expertise and a clear definition of expected outcomes. A subject matter expert (SME) must specify representative input patterns as well as the corresponding expected outputs that the system should generate. The SME also determines which elements of an output are mandatory and which may be considered optional or acceptable alternatives. In certain cases, the SME may explicitly define content that must not be produced by the tool, such as off-label or promotional statements. To ensure comprehensive testing, combinations of inputs and their respective expected outcomes should also be defined. As the number of input patterns and the variety of possible outputs grow, the dataset to be analyzed and compared quickly reaches considerable complexity, requiring structured methods and tools for efficient evaluation, as the sketch below illustrates.
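One possible shape for such an SME-defined test specification and the automated loop around it is sketched below. Here generate_variations, tool_under_test and judge are placeholders for the prompt-generating AI, the system under validation and the evaluating AI; none of them are real APIs.

    from dataclasses import dataclass, field

    @dataclass
    class TestSpec:
        base_prompt: str                                    # representative input pattern
        mandatory: list[str] = field(default_factory=list)  # must appear in every output
        forbidden: list[str] = field(default_factory=list)  # must never appear (e.g., off-label)
        n_variations: int = 100                             # paraphrases to generate

    def run_spec(spec, generate_variations, tool_under_test, judge):
        """Generate prompt variations, run them through the tool and score each output."""
        findings = []
        for prompt in generate_variations(spec.base_prompt, spec.n_variations):
            output = tool_under_test(prompt)
            # e.g., scoring against the defined quality categories
            verdict = judge(output, spec.mandatory, spec.forbidden)
            findings.append({"prompt": prompt, "output": output, "verdict": verdict})
        return findings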

Generating a wide range of inputs from data defined by the expert, and checking the outputs against the expected outcomes across all results, is an ideal field for the use of AI. Variations of the input can be generated by AI and fed automatically into the AI tool; the results can then be checked against the expert's expectations as well as across all results, to highlight differences and to evaluate the completeness of the testing. All of this can be compiled into a testing report generated by AI, which is then reviewed by the expert, as the sketch below illustrates.
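A minimal sketch of condensing the automated findings into a report for expert review follows. The finding and verdict structures are hypothetical and mirror the sketch above; a real report would also capture tool versions, timestamps and source references.

    def summarize(findings: list[dict]) -> dict:
        """Aggregate automated verdicts into a report skeleton for SME sign-off."""
        failed = [f for f in findings if not f["verdict"]["passed"]]
        return {
            "total_cases": len(findings),
            "failed_cases": len(failed),
            "pass_rate": 1 - len(failed) / max(len(findings), 1),
            "items_for_expert_review": failed,  # the expert checks these first
        }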

A proposed process that is compliant while involving minimal human effort

With that, AI is used to test AI automatically. To avoid shared blind spots and correlated failure patterns, the AI tool testing the target AI tool should be based on a different AI model. Regulators will likely expect that the testing AI model is demonstrably independent from the tool under validation and that all evaluations are subject to final human oversight before release decisions are made, as the configuration sketch below illustrates.
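One simple way to make that independence explicit and checkable is in the validation configuration itself. The provider and model names below are placeholders, not real products.

    # The system under test and the evaluating "judge" come from different model
    # families, so shared failure modes are less likely to mask errors.
    VALIDATION_CONFIG = {
        "system_under_test": {"provider": "vendor_a", "model": "model-a-1"},
        "testing_ai":        {"provider": "vendor_b", "model": "model-b-1"},
        "human_oversight":   {"review_critical_cases": True, "final_release_decision": True},
    }

    assert (VALIDATION_CONFIG["system_under_test"]["provider"]
            != VALIDATION_CONFIG["testing_ai"]["provider"]), "testing AI must be independent"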


The predefined quality categories, which serve as the key QC metrics for AI tool validation, could be:

  • Factual accuracy
  • Completeness
  • Relevance
  • Safety
  • Style

This AI-assisted QC methodology aligns particularly well with CSA principles, where testing efforts are concentrated on higher-risk features. Within the EU’s more prescriptive CSV framework, it can still be adopted if the testing tools are qualified, their parameters documented and human oversight applied to critical cases. In essence, this approach transforms AI from being solely the subject of validation into also being the instrument of validation.
 

4. Conclusion

The convergence of GxP principles, emerging AI regulation and innovative testing approaches highlights both the challenge and the opportunity in deploying AI tools. For pharmaceutical companies, success lies not simply in building an intelligent system, but in building one that regulators, healthcare professionals and ultimately patients can trust. By embedding rigorous quality control, adopting risk-based validation methodologies and leveraging AI-driven testing to scale assurance activities, the industry can set a new benchmark for digital tools in medicine.

The strategic benefit extends beyond compliance. A thoroughly tested and qualified AI tool becomes an asset that strengthens medical information services, accelerates knowledge transfer to HCPs and reduces the risk of miscommunication that could compromise patient safety. It demonstrates that innovation in pharma can be achieved without compromising the principles of accuracy, transparency and accountability that underpin GxP. With AI now being applied not only as the subject of validation but also as the instrument of validation, pharma has the chance to lead by example, showing that trustworthy AI in medicine is both possible and sustainable. For regulators and QA teams, the value lies in demonstrable assurance and risk reduction. For business leaders, it lies in accelerating innovation cycles without compromising patient safety, thereby strengthening both compliance and competitiveness.


Summary

AI tools are transforming pharma, but compliance with GxP, GDPR and the EU AI Act is critical. A key innovation is AI testing AI: using AI to generate test cases, evaluate outputs and flag risks with human experts overseeing critical results. This enables scalable, efficient validation, ensuring accuracy, safety and regulatory trust, turning AI from a subject of validation into a powerful tool for assurance.

Acknowledgement

We thank Dr. Sharon Kaufman for co-authoring this article.
