BSc Thesis: Audit System and Adversarial Injection Detection in Clinical AI

Security Gateway for protecting AI systems in healthcare environments

General Description

The objective of this BSc thesis is to design and implement a “Security Gateway” for healthcare environments, capable of intercepting, analyzing, and auditing clinical text flows before they are processed by critical decision systems.

Motivation

Large Language Models (LLMs) are vulnerable to data poisoning and injection attacks, which poses a critical risk in healthcare environments. Small manipulations of the input text (indirect injections or “trigger” words) can force erroneous diagnoses or information leaks while remaining invisible to traditional quality controls.

Defense Framework: MEDLEY

The system will implement the MEDLEY defense framework (Medical Ensemble Diagnostic system with Leveraged diversitY) described in recent literature.

Key premise: Instead of relying on a single AI model, the platform will orchestrate a heterogeneous ensemble of models with different architectures. While one model may be vulnerable to a specific attack, it is much less likely that several diverse models will fail in the same way on the same malicious input, provided their failure modes are not strongly correlated. Disagreement among the models therefore becomes a signal worth auditing.

System Architecture

The platform will act as a governance and monitoring layer, performing the following functions:

  1. Reception: Receive clinical texts (e.g., simulated clinical notes)
  2. Distribution: Send texts to multiple analysis engines in parallel
  3. Measurement: Calculate discrepancy metrics (entropy/disagreement)
  4. Alert: Generate security alerts for anomalous divergence patterns

Specific Objectives

1. Modular Mediation Architecture

Design a complete architecture including:

Ingestion Module

  • Receive and normalize text inputs
  • Generate synthetic clinical histories for testing
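
As an illustration of the normalization step, a minimal sketch follows. The function name normalize_clinical_text and the specific rules (Unicode NFKC normalization, removal of invisible control and format characters, whitespace collapsing) are assumptions made for this example, not requirements of the project.

```python
import re
import unicodedata


def normalize_clinical_text(raw: str) -> str:
    """Return a canonical form of a clinical note (hypothetical helper)."""
    # Fold compatibility characters (e.g. full-width letters) to a canonical form.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control (Cc) and format (Cf) characters such as zero-width spaces,
    # which can hide injected content; keep newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Collapse runs of spaces/tabs and limit consecutive blank lines.
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Stripping invisible characters before distribution means every engine in the ensemble sees the same visible text, so later disagreement is less likely to come from tokenization quirks alone.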

Orchestration Module

  • Load distribution to multiple inference engines (AI)
  • Parallel execution
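
One possible way to run the engines in parallel is sketched below with Python's asyncio. The engine interface (an async callable taking the text and returning a label), the dispatch function, the "ERROR" placeholder, and the two stub engines are all hypothetical; real engines would wrap model inference calls.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumed interface: an engine is an async function from text to a label.
Engine = Callable[[str], Awaitable[str]]


async def dispatch(text: str, engines: Dict[str, Engine],
                   timeout_s: float = 5.0) -> Dict[str, str]:
    """Send the same text to every engine concurrently and collect labels."""

    async def call(name: str, engine: Engine):
        try:
            label = await asyncio.wait_for(engine(text), timeout=timeout_s)
            return name, label
        except Exception:
            # A failing or slow engine must not block the others.
            return name, "ERROR"

    pairs = await asyncio.gather(*(call(n, e) for n, e in engines.items()))
    return dict(pairs)


# Stub engines used only for illustration.
async def keyword_engine(text: str) -> str:
    await asyncio.sleep(0)
    return "flag" if "sepsis" in text.lower() else "ok"


async def failing_engine(text: str) -> str:
    raise RuntimeError("engine down")
```

A call such as asyncio.run(dispatch(note, {"kw": keyword_engine, "bad": failing_engine})) returns one label per engine name, which the Audit Module can then compare.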

Audit Module (MEDLEY)

  • Real-time calculation of disagreement metrics between models
  • Anomaly detection
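
Two simple disagreement metrics are sketched below, assuming each model returns a single categorical label per note. The function names are illustrative, and these are generic ensemble measures rather than the exact metrics defined by MEDLEY.

```python
import math
from collections import Counter
from typing import Sequence


def vote_entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the label distribution across models."""
    if not labels:
        raise ValueError("at least one label is required")
    n = len(labels)
    counts = Counter(labels)
    # 0 bits when all models agree; log2(k) when k labels are equally split.
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def disagreement_rate(labels: Sequence[str]) -> float:
    """Fraction of models whose label differs from the majority label."""
    if not labels:
        raise ValueError("at least one label is required")
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)
```

Entropy is sensitive to how the minority votes are spread, while the disagreement rate only measures how far the ensemble is from unanimity; tracking both can help separate a single outlier model from a broadly split ensemble.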

2. Attack Simulation Environment (Red Teaming)

Implement adversarial testing capabilities:

Controlled Dataset Generation

  • Use public data such as MTSamples
  • Inject “marks” or triggers (keywords or syntactic patterns)
  • Verify that triggered inputs produce detectably divergent responses
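
The trigger injection step could look like the following sketch. The Sample record, the inject_trigger and build_dataset helpers, and the idea of a fixed poison rate are assumptions for illustration; the notes would come from a public source such as MTSamples.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    text: str
    poisoned: bool            # ground truth used later to score the detector
    trigger: Optional[str]    # which trigger was injected, if any


def inject_trigger(note: str, trigger: str, rng: random.Random) -> str:
    """Insert the trigger at a random word boundary.

    Note: splitting and re-joining collapses the original whitespace.
    """
    words = note.split()
    pos = rng.randint(0, len(words))  # randint is inclusive on both ends
    return " ".join(words[:pos] + [trigger] + words[pos:])


def build_dataset(notes: List[str], trigger: str, poison_rate: float,
                  seed: int = 0) -> List[Sample]:
    """Poison roughly poison_rate of the notes; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    dataset = []
    for note in notes:
        if rng.random() < poison_rate:
            dataset.append(Sample(inject_trigger(note, trigger, rng), True, trigger))
        else:
            dataset.append(Sample(note, False, None))
    return dataset
```

Keeping the poisoned flag alongside each sample documents the attack and provides the labels needed to compute detection metrics later.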

“Victim” Model Configuration

  • Light fine-tuning of language models
  • Make them react to specific triggers
  • Validate threat effectiveness

3. Diversity Detection Mechanism

Implement detection logic:

Multiple Architecture Integration

  • At least two different model architectures
    • Example: one based on BERT
    • Example: another based on rules or a distilled variant

Decision Logic

  • Critical question: When is disagreement considered a security alert vs. legitimate clinical ambiguity?
  • Define appropriate thresholds and metrics
  • Implement alert classification system
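
One candidate heuristic for the critical question above is to compare a note's disagreement score with a baseline measured on clean notes, where any disagreement reflects ordinary clinical ambiguity. The sketch below uses a z-score with two cut-offs; the function name, the category names, and the threshold values 2.0 and 3.0 are hypothetical starting points that the thesis would need to calibrate empirically.

```python
import math
import statistics
from typing import Sequence


def classify_disagreement(score: float, clean_baseline: Sequence[float],
                          review_z: float = 2.0, alert_z: float = 3.0) -> str:
    """Label a disagreement score relative to scores seen on clean notes."""
    mean = statistics.fmean(clean_baseline)
    std = statistics.pstdev(clean_baseline)
    if std == 0:
        # Degenerate baseline: any score above the mean is maximally unusual.
        z = 0.0 if score <= mean else math.inf
    else:
        z = (score - mean) / std
    if z >= alert_z:
        return "security_alert"    # far outside normal ambiguity
    if z >= review_z:
        return "clinical_review"   # unusual, but plausibly a hard case
    return "normal"
```

For example, with a clean baseline of [0.0, 0.1, 0.2, 0.1, 0.1], a score of 0.9 falls far above the alert cut-off, while 0.25 lands in the intermediate review band. A per-specialty or per-note-type baseline may be needed if ambiguity varies across clinical domains.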

4. Observability and Alert Dashboard

Develop monitoring interface:

Visualizations

  • Detected attack attempts
  • Average discrepancy rate
  • Real-time security metrics

Features

  • Configurable alerts
  • Event history
  • Trend analysis

Technologies and Tools

Models and Frameworks

  • Transformers (Hugging Face)
  • BERT and variants
  • Diverse LLMs for the ensemble

Backend and Orchestration

  • Python
  • FastAPI or similar for APIs
  • Queue management for load distribution

Monitoring and Visualization

  • Web dashboard (React/Vue or similar)
  • Grafana or equivalent tools

Student Profile

Requirements:

  • Solid Python knowledge
  • Machine Learning and NLP fundamentals
  • Interest in cybersecurity and critical systems
  • Autonomous work capability

Desirable:

  • Experience with Transformers and LLMs
  • Knowledge of microservice architectures
  • Familiarity with Red Teaming methodologies

Duration and Modality

  • Estimated duration: 4-6 months
  • Modality: Hybrid (flexible on-site/remote)
  • Type: BSc Thesis (Proyecto Fin de Grado)

Expected Results

  1. Functional Security Gateway system for clinical text
  2. Evaluation dataset with documented synthetic attacks
  3. Performance metrics for anomaly detection
  4. Operational monitoring dashboard
  5. Complete technical documentation
  6. Possibility of scientific publication

Supervision and Support

This project will be supervised by InnoTep expert researchers with experience in:

  • Artificial Intelligence applied to healthcare
  • Cybersecurity and critical systems
  • Natural Language Processing

Technical support:

  • Access to computational resources
  • Public clinical text datasets
  • Pre-trained models
  • Weekly progress reviews

Contact

For more information or to express interest in this project:

📧 Email: gi.innotep@upm.es
🏛️ Location: ETSIST - Universidad Politécnica de Madrid


Interested in this project? Contact us to discuss details and start your thesis in a high-impact research area.