EY Techathon 6.0

Problem Statement 5: Retail

About the business:

Healthcare payers struggle to maintain accurate provider directories; studies show that 80%+ of provider entries contain errors such as incorrect addresses, phone numbers, professional details, or license details. Manual validation is time-intensive: staff must call providers, verify credentials, and update multiple systems. The result is frustrated members who cannot reach providers, regulatory compliance risk, and wasted operational resources. A simplified AI solution that automates basic provider data validation and directory updates can demonstrate significant value while remaining feasible for hackathon development using publicly available data sources.

Business Problem 

Current Challenges:

  • Provider directories contain 40-80% inaccurate contact information causing member frustration and access issues
  • Manual verification processes requiring staff to call hundreds of providers monthly for basic updates
  • Multiple data entry points creating inconsistencies between online directories, mobile apps, and printed materials
  • Regulatory requirements demanding frequent provider data updates with limited automation capabilities
  • Time-consuming credential verification processes delaying provider network additions by weeks or months
  • Member complaints about outdated provider information leading to unsuccessful appointment attempts

Desired Outcomes

  • Automate provider data validation through intelligent web scraping and API calls
  • Reduce manual verification time through AI assistance
  • Achieve the target accuracy for provider contact information (see Target KPIs) through continuous automated validation
  • Create unified provider data management reducing inconsistencies across member-facing platforms
  • Demonstrate reduction in provider directory maintenance costs through intelligent automation

Goal

Develop a simplified Agentic AI system that automates basic provider data validation using publicly available sources, demonstrates intelligent data quality improvement, and showcases the potential for full-scale provider data management automation with synthetic and public data sources.

Provider data includes:
  • Demographics: Name, contact information
  • Professional Details: Specialties, licenses, certifications
  • Network Affiliations: Insurance networks, affiliations with other providers or groups
  • Services Offered: Clinical focus, appointment availability
  • Location and Facilities: Addresses, medical imaging facilities
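The five data categories above can be held in a single record type that every agent reads and writes. The sketch below assumes a flat Python dataclass; the class name `ProviderRecord`, its field names, and the `missing_fields` helper are illustrative, not a prescribed schema.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class ProviderRecord:
    """Hypothetical provider profile covering the five data categories."""
    # Demographics
    npi: str
    name: str
    phone: str | None = None
    email: str | None = None
    # Professional details
    specialties: list[str] = field(default_factory=list)
    license_numbers: dict[str, str] = field(default_factory=dict)  # state -> license number
    certifications: list[str] = field(default_factory=list)
    # Network affiliations
    networks: list[str] = field(default_factory=list)
    group_affiliations: list[str] = field(default_factory=list)
    # Services offered
    clinical_focus: list[str] = field(default_factory=list)
    accepting_new_patients: bool | None = None
    # Location and facilities
    addresses: list[str] = field(default_factory=list)
    facilities: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of fields that are None or an empty string/list/dict (False counts as filled)."""
        missing = []
        for key, value in asdict(self).items():
            if value is None or (isinstance(value, (str, list, dict)) and len(value) == 0):
                missing.append(key)
        return missing
```

A record such as `ProviderRecord(npi="1234567893", name="Jane Doe", phone="555-010-0199")` would report `email`, `specialties`, and the other unset fields as missing, which gives the enrichment and QA agents a shared definition of "incomplete".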

Key Deliverable (Demo)

Demonstration Scenario: Automated validation and updating of 200 provider profiles using publicly available data sources.

  • Input: Sample provider dataset with names, addresses, phone numbers, specialties, and basic credential information. Must include scanned PDFs (unstructured data).
  • Process: AI agent automatically validates contact information via web scraping, checks credentials against public databases, identifies inconsistencies, and flags providers needing manual review
  • Output: A user interface showing updated provider profiles with confidence scores, action status reports, and a prioritized list of providers requiring human attention, plus generated communication emails.
  • Timeline: Complete the validation cycle in under 30 minutes, versus the hours required for manual work

Agentic AI Roles (Suggested / Illustrative)

Data Validation Agent:

  • Performs automated web scraping of provider practice websites to verify current contact information and services
  • Cross-references provider information against public databases including NPI registry and state licensing boards
  • Conducts intelligent phone number and address validation using publicly available verification services
  • Generates confidence scores for each data element based on source reliability and cross-validation results
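One way to turn "source reliability and cross-validation results" into a number is a reliability-weighted vote: each source's normalized value receives that source's weight, and the confidence is the winning value's share of the total weight. The weights in `SOURCE_RELIABILITY` and the source names are assumptions for illustration; calibrate them on labeled data.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical reliability weights per source; tune these for your own data.
SOURCE_RELIABILITY = {
    "npi_registry": 0.9,
    "state_board": 0.95,
    "practice_website": 0.7,
    "google_maps": 0.6,
    "input_file": 0.5,
}


def normalize_phone(value: str) -> str:
    """Keep only digits and drop a leading US country code."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits[1:] if len(digits) == 11 and digits.startswith("1") else digits


def score_field(observations: dict[str, str], normalize=str.strip) -> tuple[str | None, float]:
    """Return (best_value, confidence) for one data element.

    observations maps source name -> value reported by that source. Confidence is the
    reliability-weighted share of reporting sources that agree with the best value.
    """
    weight_by_value: dict[str, float] = defaultdict(float)
    total = 0.0
    for source, raw in observations.items():
        if not raw:
            continue  # a source that returned nothing neither supports nor contradicts
        weight = SOURCE_RELIABILITY.get(source, 0.3)  # unknown sources get a low default weight
        weight_by_value[normalize(raw)] += weight
        total += weight
    if total == 0:
        return None, 0.0
    best = max(weight_by_value, key=weight_by_value.get)
    return best, round(weight_by_value[best] / total, 3)
```

For example, if the NPI registry and the practice website both report `555-010-0199` while the input file still has `555-010-0000`, the agreeing value wins with a confidence of about 0.76, and the stale input value is a candidate for update.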

Information Enrichment Agent:

  • Searches public sources for additional provider information including education, board certifications, and specialties
  • Analyzes provider websites and online profiles for updated practice information and service offerings
  • Identifies potential network gaps by analyzing geographic distribution and specialty coverage
  • Creates standardized provider profiles with enriched data from multiple public sources
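A simple starting point for gap detection is to count providers per (region, specialty) pair and flag pairs below a minimum. The sketch assumes each provider dict carries hypothetical `region` (for example a county or ZIP3 prefix) and `specialty` keys; real network adequacy standards usually use time/distance rules, so a fixed count threshold is a simplification.

```python
from collections import Counter


def find_network_gaps(providers, regions, specialties, min_per_region=2):
    """Return (region, specialty, count) for every combination below min_per_region.

    providers: iterable of dicts with 'region' and 'specialty' keys (hypothetical fields).
    """
    counts = Counter((p["region"], p["specialty"]) for p in providers)
    gaps = []
    for region in regions:
        for specialty in specialties:
            n = counts[(region, specialty)]  # Counter returns 0 for unseen pairs
            if n < min_per_region:
                gaps.append((region, specialty, n))
    return gaps
```

With two cardiologists in region A, one in region B, and no pediatricians anywhere, the function reports A/Pediatrics, B/Cardiology, and B/Pediatrics as gaps.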

Quality Assurance Agent:

  • Compares provider information across multiple sources to identify discrepancies and inconsistencies
  • Flags providers with suspicious or potentially fraudulent information for manual review
  • Tracks data quality metrics and generates reports on validation success rates and common error patterns
  • Prioritizes providers for manual verification based on member impact and data confidence levels
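Comparing the same provider across sources needs fuzzy matching, since "100 Main St., Ste 200" and "100 Main Street Suite 200" describe the same place. A minimal sketch using the standard library's `difflib.SequenceMatcher`; the abbreviation table and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Illustrative abbreviation table; extend it for your data.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite", "blvd": "boulevard"}


def normalize_text(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand common abbreviations."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def find_discrepancies(record: dict, reference: dict, threshold: float = 0.85) -> list[dict]:
    """Compare fields present in both dicts; report those whose similarity is below threshold."""
    issues = []
    for name in record.keys() & reference.keys():
        a = normalize_text(str(record[name]))
        b = normalize_text(str(reference[name]))
        similarity = SequenceMatcher(None, a, b).ratio()
        if similarity < threshold:
            issues.append({"field": name, "ours": record[name], "theirs": reference[name],
                           "similarity": round(similarity, 2)})
    return sorted(issues, key=lambda issue: issue["similarity"])  # worst mismatches first
```

Here the differently formatted addresses normalize to the same string and pass, while a changed phone number falls below the threshold and is reported.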

Directory Management Agent:

  • Generates updated provider directory entries in multiple formats (web, mobile app, PDF)
  • Creates automated alerts for providers requiring immediate attention or manual verification
  • Produces summary reports showing validation results, data quality improvements, and recommended actions
  • Manages workflow queues for human reviewers with prioritized tasks and supporting documentation
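A prioritized reviewer queue can be a binary heap. The sketch below assumes a hypothetical priority rule, member impact multiplied by (1 − confidence), so that widely used providers with shaky data reach a human first; `ReviewTask`, `ReviewQueue`, and the `member_impact` input are illustrative names.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ReviewTask:
    sort_key: float                       # negative priority, so heapq pops the most urgent first
    seq: int                              # insertion counter breaks ties deterministically
    npi: str = field(compare=False)
    reason: str = field(compare=False)


class ReviewQueue:
    """Min-heap of review tasks ordered by descending priority."""

    def __init__(self) -> None:
        self._heap: list[ReviewTask] = []
        self._counter = itertools.count()

    def push(self, npi: str, reason: str, member_impact: float, confidence: float) -> None:
        # Hypothetical rule: high member impact and low data confidence come first.
        priority = member_impact * (1.0 - confidence)
        heapq.heappush(self._heap, ReviewTask(-priority, next(self._counter), npi, reason))

    def pop(self) -> ReviewTask:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

For instance, an expired license on a provider with high member impact outranks a phone mismatch on a provider whose data is already 90% confident.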

Data and System Assumptions

Publicly Available Data Sources:
  • NPI Registry (CMS): Free API access for provider basic information, credentials, and practice locations
  • State Medical Board Websites: Public license verification and disciplinary action information (can be scraped)
  • Hospital/Health System Websites: Provider directory pages with current practice information and contact details
  • Google My Business/Maps API: Practice location verification, phone numbers, and patient review data
  • Medicare Provider Utilization Database: Public claims data showing provider specialties and practice patterns
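The NPI Registry lookup is the most direct of these sources. A minimal sketch using only the standard library, querying `https://npiregistry.cms.hhs.gov/api/` with `version=2.1`. The response keys used below (`results`, `basic`, `addresses`, `address_purpose`, `taxonomies`, `primary`, `desc`, `license`) reflect our reading of the v2.1 format and should be confirmed against the CMS documentation; `fetch_npi` and `summarize_npi` are hypothetical helper names.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

NPI_API = "https://npiregistry.cms.hhs.gov/api/"


def fetch_npi(npi: str, timeout: float = 10.0) -> dict:
    """Query the NPPES NPI Registry API (version 2.1) for a single NPI number."""
    query = urllib.parse.urlencode({"version": "2.1", "number": npi})
    with urllib.request.urlopen(f"{NPI_API}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def summarize_npi(payload: dict) -> dict | None:
    """Extract the fields used for validation; None means the NPI was not found."""
    results = payload.get("results") or []
    if not results:
        return None  # unknown NPI: route to manual review
    r = results[0]
    basic = r.get("basic", {})
    # Prefer the practice location over the mailing address.
    location = next((a for a in r.get("addresses", []) if a.get("address_purpose") == "LOCATION"), {})
    primary = next((t for t in r.get("taxonomies", []) if t.get("primary")), {})
    return {
        "npi": str(r.get("number", "")),
        "name": " ".join(filter(None, [basic.get("first_name"), basic.get("last_name")])),
        "address": location.get("address_1"),
        "city": location.get("city"),
        "state": location.get("state"),
        "phone": location.get("telephone_number"),
        "specialty": primary.get("desc"),
        "license": primary.get("license"),
    }
```

Keeping the network call (`fetch_npi`) separate from parsing (`summarize_npi`) lets the parser be tested against saved responses, and lets the orchestrator wrap only the network call in retry and timeout handling.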
Synthetic Data Generation:
  • Provider Profile Generator: Create realistic provider datasets with names, addresses, specialties, and credential information
  • Validation Scenario Creator: Generate common data quality issues like outdated phone numbers, moved practices, and credential changes
  • Member Impact Simulator: Create synthetic member complaint data related to provider directory accuracy
  • Network Coverage Generator: Generate geographic and specialty distribution data for network adequacy analysis
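The Validation Scenario Creator can start from clean synthetic records and inject known errors, keeping a ground-truth label per record so the pipeline's detection rate can be measured later. The three error types, the fixed expiration date, and the record keys are illustrative assumptions.

```python
import copy
import random

ERROR_TYPES = ("outdated_phone", "moved_practice", "expired_license")


def inject_errors(records: list[dict], error_rate: float = 0.3,
                  seed: int = 42) -> tuple[list[dict], list[dict]]:
    """Corrupt a fraction of records; return (corrupted_records, ground_truth_labels)."""
    rng = random.Random(seed)  # seeded so the demo scenario is reproducible
    corrupted, labels = [], []
    for original in records:
        rec = copy.deepcopy(original)  # never mutate the clean ground-truth records
        error = None
        if rng.random() < error_rate:
            error = rng.choice(ERROR_TYPES)
            if error == "outdated_phone":
                rec["phone"] = f"555-{rng.randint(0, 999):03d}-{rng.randint(0, 9999):04d}"
            elif error == "moved_practice":
                rec["address"] = f"{rng.randint(1, 9999)} Old Office Rd"
            else:
                rec["license_expiration"] = "2019-12-31"  # illustrative past date
        corrupted.append(rec)
        labels.append({"npi": rec["npi"], "error": error})
    return corrupted, labels
```

Because the labels record exactly which records were corrupted and how, the same scenario can be replayed after each pipeline change to compare detection rates.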

Evaluation Criteria

Each criterion is listed with its weight and what “good” looks like:

  • Technical Design (35%): Clear agent orchestration (LangGraph/AutoGen/etc.), robust function calling, safeguarded actions, resilient retries/timeouts.
  • Automation Impact and Compliance (25%): Refer to the Target KPIs section.
  • Prototype (20%): Screen prototype of the improved processes.
  • Data and Workflow Realism (10%): Realistic data, KB, CRM/ITSM flows; strong negative testing (missing data).
  • Demo and Storytelling (10%): Compelling narrative; clear before/after; edge-case handling (policy violation, sentiment spike, dead-end knowledge).
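"Resilient retries/timeouts" usually means retrying only transient failures, with exponential backoff and jitter, and surfacing the final failure instead of hiding it. A minimal sketch; `call_with_retries` is a hypothetical helper, and the default `retry_on` tuple is an assumption (for `urllib`, a caller might add `urllib.error.URLError`).

```python
import random
import time


def call_with_retries(fn, *args, attempts=4, base_delay=0.5, max_delay=8.0,
                      retry_on=(TimeoutError, ConnectionError), sleep=time.sleep, **kwargs):
    """Call fn, retrying transient failures with exponential backoff plus jitter.

    Exceptions not listed in retry_on propagate immediately. After the last attempt the
    error is re-raised so the orchestrator can route the provider to manual review.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
```

Injecting `sleep` keeps tests fast and makes the backoff schedule observable; in production the default `time.sleep` applies.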

Submission Format

  • Demo: Live prototype or 3–4 minute video.
  • Documentation (brief deck or PDF):
    • System architecture diagram.
    • Agent roles and decision logic (auto-resolve vs. handoff).
    • Data schema & API assumptions (source PDF, KB article JSON).
    • Compliance guardrails (PII, content moderation).
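The "auto-resolve vs. handoff" decision logic can be stated as a small, auditable routing function. The thresholds, the `sensitive_field` rule (for example, license status never auto-updates), and the `Decision` values are assumptions for illustration.

```python
from enum import Enum


class Decision(str, Enum):
    AUTO_UPDATE = "auto_update"          # apply the change without human involvement
    HUMAN_REVIEW = "human_review"        # reviewer confirms using the gathered evidence
    MANUAL_OUTREACH = "manual_outreach"  # no reliable source: staff contact the provider
    ESCALATE = "escalate"                # suspected fraud goes to a compliance owner


# Hypothetical thresholds; calibrate them on labeled validation runs.
AUTO_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.5


def decide(confidence: float, sensitive_field: bool, fraud_flag: bool = False) -> Decision:
    """Route one proposed field update based on confidence and guardrails."""
    if fraud_flag:
        return Decision.ESCALATE
    if sensitive_field:
        return Decision.HUMAN_REVIEW  # guardrail: sensitive changes always get a human
    if confidence >= AUTO_THRESHOLD:
        return Decision.AUTO_UPDATE
    if confidence >= REVIEW_THRESHOLD:
        return Decision.HUMAN_REVIEW
    return Decision.MANUAL_OUTREACH
```

Checking guardrails before confidence means a high score can never bypass review for sensitive or suspicious changes.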

Tips for Participants

  • Guardrails First: Enforce content moderation, PII redaction, and grounding in SOPs/KB before enabling sensitive actions (e.g., rejection).
  • Fast Wins: Target the top 5–8 high-volume intents/properties for AHT (average handle time) reduction.
  • Edge-Case Rigor: Demonstrate behavior for ambiguous/handwritten content, fuzzy match, missing data, and source failure scenarios.
  • Modular Orchestration: Keep Worker Agents loosely coupled so new programs (clients) and tasks can be added safely.
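As a first guardrail, text extracted from documents can be redacted before it is sent to an LLM or written to logs. The regex patterns below are illustrative only and cover a few obvious identifiers (SSN, email, labeled date of birth); production redaction needs broader coverage and review. Provider identifiers such as the NPI are deliberately left intact because validation depends on them.

```python
import re

# Illustrative patterns only; not a complete PII detector.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "DOB": re.compile(r"\b(?:DOB|Date of Birth)[:\s]*\d{1,2}/\d{1,2}/\d{2,4}\b", re.IGNORECASE),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace PII matches with [LABEL] placeholders and count what was removed."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts
```

Returning the counts alongside the text lets the QA agent log how much was redacted per document without storing the redacted values themselves.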

Target KPIs (for pilot)

  • Validation Accuracy: 80%+ success rate in identifying outdated provider contact information
  • Processing Speed: Complete validation of 100 providers in under 5 minutes versus hours of manual effort
  • Information Extraction: Achieve 85%+ accuracy when extracting information from unstructured documents/scanned PDFs, with confidence scores that are correct 95% of the time
  • Processing Throughput: Handle 500+ provider validations per hour through automated pipeline
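The KPIs above can be scored against the synthetic ground truth. The "success rate" for identifying outdated contact information is not defined precisely; the sketch below reads it as recall and also reports precision, both assumptions. Note that 100 providers in 5 minutes corresponds to 1,200 providers per hour, above the 500+ per hour throughput target.

```python
def validation_kpis(predicted_outdated: set[str], truly_outdated: set[str],
                    n_providers: int, elapsed_seconds: float) -> dict[str, float]:
    """Score one validation run against synthetic ground truth.

    recall    = share of truly outdated records the pipeline flagged
    precision = share of flagged records that were actually outdated
    """
    true_pos = len(predicted_outdated & truly_outdated)
    recall = true_pos / len(truly_outdated) if truly_outdated else 1.0
    precision = true_pos / len(predicted_outdated) if predicted_outdated else 1.0
    per_hour = n_providers * 3600 / elapsed_seconds
    return {"recall": recall, "precision": precision, "providers_per_hour": per_hour}
```

Reporting precision alongside recall guards against a pipeline that meets the 80% target simply by flagging every provider.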

Example Flows to Implement (Pick 2–3)

Flow 1: Automated Provider Contact Information Validation

Trigger: Daily batch processing of 200 provider profiles for contact information accuracy

Process Steps:

  • Data Validation Agent extracts provider practice information from synthetic dataset including names, addresses, and phone numbers
  • Agent performs web scraping of provider practice websites and Google My Business listings to verify current contact information
  • Cross-validation against NPI registry API to confirm provider identification and basic practice details
  • Quality Assurance Agent compares information across sources and generates confidence scores for each data element
  • Directory Management Agent creates validation report showing confirmed updates, discrepancies, and providers needing manual review
  • Automated prioritization of providers for human verification based on member impact and data confidence levels
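The Flow 1 steps can be sketched as a single loop in which each agent is a plain function. In a real build these would likely be LangGraph/AutoGen nodes or LLM tool calls; here the sources are hypothetical callables that take a provider record and return the fields found, and the thresholds are assumptions.

```python
from collections import Counter
from typing import Callable

Lookup = Callable[[dict], dict]  # hypothetical: provider record in, fields found at the source out


def run_contact_validation(providers: list[dict], sources: dict[str, Lookup],
                           auto_threshold: float = 0.8, min_sources: int = 2) -> dict[str, list]:
    report: dict[str, list] = {"confirmed": [], "updated": [], "manual_review": []}
    for p in providers:
        # Data Validation Agent: collect the phone number each source reports.
        found = []
        for lookup in sources.values():
            try:
                phone = lookup(p).get("phone")
            except Exception:  # a failed source is skipped, not fatal to the batch
                continue
            if phone:
                found.append(phone)
        if len(found) < min_sources:
            report["manual_review"].append((p["npi"], "insufficient corroborating sources"))
            continue
        # Quality Assurance Agent: confidence = share of responding sources agreeing on the top value.
        best, votes = Counter(found).most_common(1)[0]
        confidence = votes / len(found)
        # Directory Management Agent: route the outcome.
        if confidence < auto_threshold:
            report["manual_review"].append((p["npi"], f"sources disagree (confidence {confidence:.2f})"))
        elif best == p.get("phone"):
            report["confirmed"].append(p["npi"])
        else:
            report["updated"].append((p["npi"], p.get("phone"), best))
    return report
```

Requiring at least two corroborating sources keeps a single scraped page from silently overwriting directory data, and source failures degrade to manual review rather than aborting the batch.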

Flow 2: New Provider Credential Verification and Onboarding

Trigger: 25 new providers applying for network inclusion with basic credential documentation

Process Steps:

  • Information Enrichment Agent extracts provider information from application forms and searches NPI registry for verification
  • Automated lookup of provider licenses through state medical board websites and public credential databases
  • Data Validation Agent performs background research on provider education, board certifications, and practice history
  • Quality Assurance Agent cross-references information across multiple sources and identifies any red flags or inconsistencies
  • Automated generation of provider profiles with enriched information and confidence ratings for credentialing decisions
  • Directory Management Agent creates summary reports for credentialing committee with recommendations and supporting documentation
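The red-flag check in Flow 2 amounts to comparing what the applicant stated with what the state board returns. A minimal sketch; the dict keys (`license_number`, `state`, `status`, `expiration` as an ISO date, `disciplinary_actions`) are hypothetical, since each state board publishes data in its own format.

```python
from __future__ import annotations

from datetime import date


def credential_red_flags(application: dict, board_record: dict | None, today: date) -> list[str]:
    """List discrepancies between an application and a state board lookup result."""
    if board_record is None:
        return ["license not found at state board"]
    flags = []
    if board_record.get("license_number") != application.get("license_number"):
        flags.append("license number mismatch")
    if board_record.get("state") != application.get("state"):
        flags.append("license state mismatch")
    if str(board_record.get("status", "")).lower() != "active":
        flags.append(f"license status is {board_record.get('status')!r}")
    expiration = board_record.get("expiration")
    if expiration and date.fromisoformat(expiration) < today:
        flags.append(f"license expired on {expiration}")
    if board_record.get("disciplinary_actions"):
        flags.append("disciplinary actions on record")
    return flags
```

Passing `today` explicitly keeps the check deterministic in tests; an empty list means no red flags were found, not that credentialing is complete.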

Flow 3: Provider Directory Quality Assessment and Improvement

Trigger: Weekly quality assessment of entire provider directory database (500 providers) for accuracy and completeness

Process Steps:

  • Quality Assurance Agent analyzes all provider profiles to identify missing information, outdated data, and potential inconsistencies
  • Data Validation Agent performs selective verification of providers identified as high-risk for accuracy issues
  • Information Enrichment Agent attempts to fill data gaps through public source research and web scraping
  • Automated generation of data quality metrics showing improvement trends and areas needing attention
  • Directory Management Agent creates prioritized action lists for staff including specific providers and data elements requiring manual update
  • Production of executive dashboard showing overall directory quality scores and improvement recommendations
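Completeness is the simplest directory quality metric to start with: the fill rate of each required field plus an overall average. The sketch assumes profiles are dicts and treats `None`, empty strings, and empty collections as missing; the choice of required fields is up to the team.

```python
def completeness_report(profiles: list[dict], required_fields: list[str]) -> dict:
    """Per-field fill rate, overall completeness score, and the three weakest fields."""
    if not profiles or not required_fields:
        return {"per_field": {}, "overall": 0.0, "worst_fields": []}

    def filled(value) -> bool:
        return value not in (None, "", [], {})

    n = len(profiles)
    per_field = {f: sum(filled(p.get(f)) for p in profiles) / n for f in required_fields}
    overall = sum(per_field.values()) / len(required_fields)
    worst = sorted(per_field, key=per_field.get)[:3]  # lowest fill rates first
    return {"per_field": per_field, "overall": overall, "worst_fields": worst}
```

Running this on each weekly snapshot and storing the results gives the trend lines the executive dashboard needs.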

Minimal Tech Stack (Reference/Illustrative)

  • Programming Language: Python with libraries including BeautifulSoup, Requests, Pandas, and Scikit-learn for web scraping and data processing
  • AI/ML Framework: OpenAI API (free tier) or Hugging Face Transformers for natural language processing and data matching
  • Data Extraction: Vision-language model (VLM)-based extraction of scanned PDFs and other unstructured documents for best extraction performance
  • Database: SQLite or PostgreSQL for local provider data storage and validation tracking
  • Web Framework: Flask or FastAPI for creating simple dashboard and API endpoints
  • NPI Registry API: Free CMS API for provider verification and basic information lookup
  • Web Scraping Tools: Selenium or BeautifulSoup for automated provider website data extraction
  • Google Maps API: Free tier for location verification and contact information validation
  • State License APIs: Public APIs where available or web scraping for license verification
  • Task Scheduling: cron jobs or Celery for automated validation cycles
  • Data Processing: Pandas and NumPy for data manipulation and quality analysis
  • Confidence Scoring: Custom algorithms using data source reliability and cross-validation results
  • Report Generation: Python libraries like ReportLab or Matplotlib for creating validation reports and dashboards
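For the SQLite option, a validation-tracking table alongside the provider table keeps an audit trail of every proposed change. The schema, status values, and 0.85 auto-apply threshold below are illustrative assumptions, using only Python's built-in `sqlite3` module.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS providers (
    npi TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    phone TEXT,
    address TEXT,
    specialty TEXT
);
CREATE TABLE IF NOT EXISTS validation_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    npi TEXT NOT NULL REFERENCES providers(npi),
    field TEXT NOT NULL,
    old_value TEXT,
    proposed_value TEXT,
    confidence REAL CHECK (confidence BETWEEN 0 AND 1),
    source TEXT,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | auto_applied | needs_review
    checked_at TEXT NOT NULL DEFAULT (datetime('now'))
);
"""
UPDATABLE_FIELDS = {"phone", "address", "specialty"}  # whitelist: column names cannot be bound as parameters


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_result(conn, npi, field, old, new, confidence, source, auto_threshold=0.85) -> str:
    """Log a validation result and apply it to the provider row if confidence is high enough."""
    status = "auto_applied" if confidence >= auto_threshold and field in UPDATABLE_FIELDS else "needs_review"
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO validation_results (npi, field, old_value, proposed_value, confidence, source, status)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (npi, field, old, new, confidence, source, status),
        )
        if status == "auto_applied":
            conn.execute(f"UPDATE providers SET {field} = ? WHERE npi = ?", (new, npi))
    return status
```

Every result is logged whether or not it is applied, so the same table can feed the validation reports and the reviewer queue.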
