
AI validation in pharma: maintaining compliance and trust



AI in pharma demands trust. Discover how AI can validate AI itself, ensuring GxP compliance, accuracy and patient safety.


In brief

  • Compliance with Annex 11/22, EU AI Act, GDPR and CSV/CSA principles is essential to ensure patient safety and regulatory trust.
  • Leveraging AI to validate AI enables scalable, risk-based testing, dramatically reducing manual effort while strengthening compliance evidence, with subject matter experts and human oversight as safeguards.

GxP meets AI: ensuring accuracy and trust in AI tools designed for Pharma and HCPs

The integration of AI functionality into pharmaceutical business is no longer a theoretical concept; it is a strategic reality. AI in pharma already supports core functions from R&D to supply chain operations. While the EU AI Act does not yet provide clear implementation and application guidelines for limited-risk applications, the pharmaceutical industry must still follow another set of rules. Beyond general data protection and security standards (e.g., GDPR), quality standards, validation and software assurance must be guaranteed for systems used within a pharmaceutical company, especially for functionalities offered to external parties such as healthcare professionals (HCPs). This rigorous quality control (QC) is mandatory to meet EU regulatory requirements, maintain patient safety and protect data integrity, and it applies to AI tools as well.

AI tools can be applied at different stages along the value chain and for various purposes within pharmaceutical organizations. For example, AI tools may support pharmacovigilance (e.g., adverse event intake and signal detection), streamline clinical trial operations (e.g., patient matching and protocol deviation monitoring) or enhance regulatory affairs (e.g., automated review of submission dossiers). Their functionality may rely on static datasets (e.g., validated SmPC libraries), dynamic data generated within the company or even publicly available information. The nature of the underlying data, the way results are produced and the intended use of the output create important differences in the expected outputs and in how much those outputs vary. These differences also affect reproducibility: outputs derived from static, validated datasets are expected to remain highly consistent, whereas systems that rely on open-ended web crawling can produce variable and less predictable results. Such differences should not only shape internal governance guidelines and validation strategies, but should also be taken into account by regulatory frameworks, ensuring that assurance requirements are proportional to the type of data and the risk associated with the tool's application.

Complexity of verification and testing increases from traditional applications to AI with dynamic internet content, with EU GMP Annex 22 covering only AI with static content

This paper outlines the relevant EU regulatory framework, QC and software assurance requirements as well as a risk-based validation strategy for deploying AI tools within a GxP environment or for interactions with HCPs. The focus is on AI validation in pharma, ensuring tools are both innovative and compliant.

1. Relevant regulatory frameworks within the EU

Under the European regulatory framework for medicinal products, the quality and reliability of any computerized system that could influence understanding or decision-making are paramount. Standards for these factors are captured in Good Practices, referred to as GxP. GxP regulations require accuracy, traceability and reproducibility of the information provided, through standards defined by organizations such as the ICH (International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use) or through regulations specific to the EU. For manufacturing and distribution, the EU GMP Regulations as presented in EudraLex Vol. 4 apply.

Annexes to the EU GMP Regulations further detail the requirements as defined by the EU Commission. EU GMP Annex 11 defines the requirements for the use of computerized systems. It requires that all such systems undergo formal validation, typically through installation qualification, operational qualification and performance qualification, to demonstrate that they consistently perform as intended. In the EU, this process is referred to as computerised system validation (CSV), which traditionally follows a structured, documentation-heavy top-down approach. Although a new version of the annex has recently been published for public consultation, it still follows earlier patterns of software development and deployment and does not reflect today's standards such as cloud computing and computing environments serviced by third parties, let alone concepts such as blockchain and AI.

By contrast, in the United States, the FDA promotes computer software assurance (CSA) as a successor to CSV, using a more flexible, risk-based approach that integrates platforms and services provided by third parties. While the EU has not formally adopted CSA, CSA-style risk-based testing can be applied within Annex 11/GAMP 5 when justified and documented.

For the EU, requirements for the use of AI are being developed in a new annex to EU GMP, Annex 22. This is where GMP meets AI, bridging traditional pharmaceutical manufacturing standards with modern AI-driven processes. In July 2025, the first version of the annex was published for public consultation running until October 2025. This version is limited to output generated by AI tools based on static content.

On top of the GxP requirements, the EU AI Act is being rolled out in phases. Chatbots that perform factual retrieval are usually categorized as “limited-risk” under the AI Act, subject mainly to transparency obligations (e.g., disclosure that the user is interacting with AI). Because the chatbot’s output is reviewed by an expert before the retrieved information is applied to patients, a human remains in the loop. However, even for a limited-risk application, governance and quality assurance are of the utmost importance.

In parallel, GDPR governs the processing of any personal data within physician interactions, demanding robust access control, auditability and secure data storage. Personal data (including physician identifiers and patient cases shared during queries) must be processed under the GDPR Art. 5 principles of data minimization and purpose limitation, with explicit consideration of retention and audit trail requirements.
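To make the combination of data minimization and auditability concrete, here is a minimal Python sketch of an audit-trail record for a physician query. The field names, the hashing-based pseudonymization and the retention figure are illustrative assumptions, not values prescribed by GDPR or any cited guidance.

    import hashlib
    import json
    from datetime import datetime, timezone

    def pseudonymize(identifier: str, salt: str) -> str:
        """Replace a direct physician identifier with a salted hash (data minimization)."""
        return hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()

    def audit_record(physician_id: str, query: str, source_ref: str,
                     output: str, system_version: str, salt: str) -> str:
        """Build one audit-trail entry; no raw personal identifier is stored."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_pseudonym": pseudonymize(physician_id, salt),
            "query": query,                  # purpose limitation: only what the audit needs
            "retrieved_source": source_ref,  # e.g., an SmPC section reference
            "output": output,
            "system_version": system_version,
            "retention_days": 3650,          # assumed policy; set per local DPO guidance
        }
        return json.dumps(record)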

2. Quality control

In the pharmaceutical context, software assurance means providing documented, lifecycle-long evidence that the system is fit for its intended use, resilient to change and consistently reliable. Within a compliance framework such as the ISPE GAMP 5 principles, a risk-based approach should guide the depth of verification and validation activities. Supplier qualification is essential, especially for cloud-hosted models or third-party natural language processing components, where service agreements must address data integrity, change notification and uptime commitments. Change management procedures should govern updates to datasets, model versions or conversation handling logic, ensuring that no modification enters production without prior verification. Operational controls, including user access management, monitoring and incident response processes, provide an additional safeguard against both technical and compliance failures. Continuous assurance, through periodic re-testing and accuracy monitoring, ensures that performance does not degrade over time, a risk that is particularly relevant for AI models subject to drift.
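The continuous-assurance idea can be sketched as a periodic re-test gate that compares current performance on a fixed regression set against the validated baseline. The thresholds and the result structure below are assumptions; in practice they would come from the validation plan and the qualified test set.

    from dataclasses import dataclass

    @dataclass
    class ReTestResult:
        accuracy: float       # share of correct outputs on the fixed regression set
        critical_errors: int  # count of safety-relevant failures in this run

    def assess_drift(baseline_accuracy: float, result: ReTestResult,
                     max_drop: float = 0.02, max_critical: int = 0) -> bool:
        """Return True if the system still meets its qualified performance.

        A drop of more than max_drop versus the validated baseline, or any
        critical error, should trigger investigation and change control.
        """
        if result.critical_errors > max_critical:
            return False
        return (baseline_accuracy - result.accuracy) <= max_drop

    # Example: baseline qualified at 97% accuracy; this re-test scored 96.5%.
    ok = assess_drift(0.97, ReTestResult(accuracy=0.965, critical_errors=0))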

QC in this context must address the accuracy of the AI system: AI quality control encompasses accuracy, reliability and auditability across diverse input-output scenarios. The AI tool must reliably retrieve information from trustworthy sources, without introducing promotional bias or off-label interpretations. Each interaction should be logged with the necessary details (e.g., query, retrieved source, generated output and system version) to create an audit trail that would withstand a regulatory inspection. A central question is: how is accuracy measured? Metrics such as exact match accuracy, factual accuracy rate and critical error rate must be complemented by qualitative assessments of conversational coherence across a wide range of possible inputs, e.g., variations of the questions asked or of the input provided. Given the potential patient safety impact, the critical error rate should be set as low as possible. For instance, when validating a pharmacovigilance case-intake assistant, exact match accuracy may be applied to verify that all required data is captured correctly, factual accuracy rate can be applied to literature summarization tools and critical error rate is particularly relevant when AI is used for dosage-related outputs.
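A minimal sketch of how these three metrics could be computed from reviewed evaluation records follows. The record fields are hypothetical; the verdicts would come from expert review or an AI screening step.

    from typing import Iterable, Mapping

    def qc_metrics(results: Iterable[Mapping[str, bool]]) -> dict:
        """Compute the three quantitative metrics discussed above.

        - exact_match: output matched the expected answer verbatim
        - factually_correct: every factual claim in the output was correct
        - critical_error: a safety-relevant mistake (e.g., wrong dosage)
        """
        results = list(results)
        n = len(results)
        return {
            "exact_match_accuracy": sum(r["exact_match"] for r in results) / n,
            "factual_accuracy_rate": sum(r["factually_correct"] for r in results) / n,
            "critical_error_rate": sum(r["critical_error"] for r in results) / n,
        }

    # Example with three reviewed outputs:
    print(qc_metrics([
        {"exact_match": True,  "factually_correct": True,  "critical_error": False},
        {"exact_match": False, "factually_correct": True,  "critical_error": False},
        {"exact_match": False, "factually_correct": False, "critical_error": True},
    ]))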

3. AI as a tool for validation

As AI systems can process a wide variety of inputs and generate equally diverse outputs, validating them by providing a fixed set of input phrases and comparing the resulting outputs for completeness, correctness and consistency presents a significant challenge. The traditional validation paradigm—where a specific input deterministically produces one predefined output—is no longer sufficient or appropriate for AI-driven systems. Instead, validation must account for acceptable ranges of variation, ensure that mandatory information is always present and confirm that outputs remain accurate, safe and compliant within defined boundaries.
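This shift from one predefined output to acceptable boundaries can be expressed in code. The sketch below checks a generated output for mandatory elements and forbidden content; the patterns are invented examples for illustration, not a validated rule set.

    import re

    MANDATORY_PATTERNS = [
        r"contraindicat",                            # contraindication wording must appear
        r"consult.*(SmPC|prescribing information)",  # pointer to the approved source
    ]
    FORBIDDEN_PATTERNS = [
        r"off-label",                                # no off-label recommendations
    ]

    def within_boundaries(output: str) -> tuple[bool, list[str]]:
        """Check one generated output against the defined acceptance boundaries."""
        findings = []
        for pattern in MANDATORY_PATTERNS:
            if not re.search(pattern, output, re.IGNORECASE):
                findings.append(f"missing mandatory element: {pattern}")
        for pattern in FORBIDDEN_PATTERNS:
            if re.search(pattern, output, re.IGNORECASE):
                findings.append(f"forbidden content found: {pattern}")
        return (not findings, findings)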

One of the most promising developments in this space is the use of AI itself to evaluate AI systems under a risk-based testing paradigm. Traditionally, CSV or CSA processes involve manually designing test cases, crafting prompts and manually reviewing chatbot outputs, a highly resource-intensive process. Emerging tools now allow the automated generation of thousands of prompts based on defined parameters. These prompts are then fed into the AI tool and the outputs are evaluated or screened automatically using AI against defined quality categories, such as the five QC categories factual accuracy, completeness, relevance, safety and style. For example, in a validation scenario, 1,000 automatically generated physician queries about contraindications could be submitted to the AI tool, with outputs scored for factual accuracy against the content, relevance to the query and safety (no off-label advice). By leveraging AI in this way, it becomes possible to test with far greater coverage in less time, enabling much more comprehensive risk-based validation. This demonstrates the potential of AI for GxP validation, where AI is both the subject and the instrument of assurance.

That process is guided by human expertise and a clear definition of expected outcomes. A subject matter expert (SME) must specify representative input patterns as well as the corresponding expected outputs that the system should generate. The SME also determines which elements of an output are mandatory and which may be considered optional or acceptable alternatives. In certain cases, the SME may explicitly define content that must not be produced by the tool, such as off-label or promotional statements. To ensure comprehensive testing, combinations of inputs and their respective expected outcomes should also be defined. As the number of input patterns and the variety of possible outputs grow, the dataset to be analyzed and compared quickly reaches considerable complexity, requiring structured methods and tools for efficient evaluation, as the sketch below illustrates.
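One possible shape for such an SME-defined test specification and the automated loop around it is sketched below. Here generate_variations, tool_under_test and judge are placeholders for the prompt-generating AI, the system under validation and the evaluating AI; none of them are real APIs.

    from dataclasses import dataclass, field

    @dataclass
    class TestSpec:
        base_prompt: str                                    # representative input pattern
        mandatory: list[str] = field(default_factory=list)  # must appear in every output
        forbidden: list[str] = field(default_factory=list)  # must never appear (e.g., off-label)
        n_variations: int = 100                             # paraphrases to generate

    def run_spec(spec, generate_variations, tool_under_test, judge):
        """Generate prompt variations, run them through the tool and score each output."""
        findings = []
        for prompt in generate_variations(spec.base_prompt, spec.n_variations):
            output = tool_under_test(prompt)
            # e.g., scoring against the defined quality categories
            verdict = judge(output, spec.mandatory, spec.forbidden)
            findings.append({"prompt": prompt, "output": output, "verdict": verdict})
        return findings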

Generating a wide range of inputs from data defined by the expert, and checking the outputs against the expected outcomes across all results, is an ideal field for the use of AI. Variations of the input can be generated by AI and fed automatically into the AI tool; the results can then be checked against the expert's expectations as well as across all results, to highlight differences and to evaluate the completeness of the testing. All of this can be compiled into a testing report generated by AI, which is then reviewed by the expert, as the sketch below illustrates.
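A minimal sketch of condensing the automated findings into a report for expert review follows. The finding and verdict structures are hypothetical and mirror the sketch above; a real report would also capture tool versions, timestamps and source references.

    def summarize(findings: list[dict]) -> dict:
        """Aggregate automated verdicts into a report skeleton for SME sign-off."""
        failed = [f for f in findings if not f["verdict"]["passed"]]
        return {
            "total_cases": len(findings),
            "failed_cases": len(failed),
            "pass_rate": 1 - len(failed) / max(len(findings), 1),
            "items_for_expert_review": failed,  # the expert checks these first
        }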

A proposed process that is compliant while involving minimal human effort

With that, AI is used to test AI automatically. To avoid shared blind spots and correlated failure patterns, the AI tool testing the target AI tool should be based on a different AI model. Regulators will likely expect that the testing AI model is demonstrably independent from the tool under validation and that all evaluations are subject to final human oversight before release decisions are made, as the configuration sketch below illustrates.
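One simple way to make that independence explicit and checkable is in the validation configuration itself. The provider and model names below are placeholders, not real products.

    # The system under test and the evaluating "judge" come from different model
    # families, so shared failure modes are less likely to mask errors.
    VALIDATION_CONFIG = {
        "system_under_test": {"provider": "vendor_a", "model": "model-a-1"},
        "testing_ai":        {"provider": "vendor_b", "model": "model-b-1"},
        "human_oversight":   {"review_critical_cases": True, "final_release_decision": True},
    }

    assert (VALIDATION_CONFIG["system_under_test"]["provider"]
            != VALIDATION_CONFIG["testing_ai"]["provider"]), "testing AI must be independent"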


The predefined quality categories, which serve as the key QC metrics for AI tool validation, could be:

  • Factual accuracy
  • Completeness
  • Relevance
  • Safety
  • Style

This AI-assisted QC methodology aligns particularly well with CSA principles, where testing efforts are concentrated on higher-risk features. Within the EU’s more prescriptive CSV framework, it can still be adopted if the testing tools are qualified, their parameters documented and human oversight applied to critical cases. In essence, this approach transforms AI from being solely the subject of validation into also being the instrument of validation.
 

4. Conclusion

The convergence of GxP principles, emerging AI regulation and innovative testing approaches highlights both the challenge and the opportunity in deploying AI tools. For pharmaceutical companies, success lies not simply in building an intelligent system, but in building one that regulators, healthcare professionals and ultimately patients can trust. By embedding rigorous quality control, adopting risk-based validation methodologies and leveraging AI-driven testing to scale assurance activities, the industry can set a new benchmark for digital tools in medicine.

The strategic benefit extends beyond compliance. A thoroughly tested and qualified AI tool becomes an asset that strengthens medical information services, accelerates knowledge transfer to HCPs and reduces the risk of miscommunication that could compromise patient safety. It demonstrates that innovation in pharma can be achieved without compromising the principles of accuracy, transparency and accountability that underpin GxP. With AI now being applied not only as the subject of validation but also as the instrument of validation, pharma has the chance to lead by example, showing that trustworthy AI in medicine is both possible and sustainable. For regulators and QA teams, the value lies in demonstrable assurance and risk reduction. For business leaders, it lies in accelerating innovation cycles without compromising patient safety, thereby strengthening both compliance and competitiveness.


Summary

AI tools are transforming pharma, but compliance with GxP, GDPR and the EU AI Act is critical. A key innovation is AI testing AI: using AI to generate test cases, evaluate outputs and flag risks with human experts overseeing critical results. This enables scalable, efficient validation, ensuring accuracy, safety and regulatory trust, turning AI from a subject of validation into a powerful tool for assurance.

Acknowledgement

We thank Dr. Sharon Kaufman for co-authoring this article.
