General Description
The objective of this BSc thesis is to design and implement a “Security Gateway” for healthcare environments, capable of intercepting, analyzing, and auditing clinical text flows before they are processed by critical decision systems.
Motivation
Large Language Models (LLMs) are vulnerable to data poisoning attacks, which poses a critical risk in healthcare environments. Small manipulations in text (indirect injections or “trigger” words) can force erroneous diagnoses or information leaks while remaining invisible to traditional quality controls.
Defense Framework: MEDLEY
The system will implement the MEDLEY defense framework (Medical Ensemble Diagnostic system with Leveraged diversitY) described in recent literature.
Key premise: Instead of relying on a single AI model, the platform will orchestrate a heterogeneous ensemble of models with different architectures. While one model may be vulnerable to a specific attack, multiple diverse models are unlikely to fail in the same way on the same malicious input, provided their failure modes are largely independent.
System Architecture
The platform will act as a governance and monitoring layer, performing the following functions:
- Reception: Receive clinical texts (e.g., simulated clinical notes)
- Distribution: Send texts to multiple analysis engines in parallel
- Measurement: Calculate discrepancy metrics (entropy/disagreement)
- Alert: Generate security alerts for anomalous divergence patterns
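The four functions above can be sketched end to end as follows. This is a minimal illustration, not the MEDLEY implementation: the three engines are toy stand-ins for real models, and the trigger token "zx17", the function names, and the 0.3 threshold are all assumptions made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-ins for real inference engines: each maps a note to a label.
Engine = Callable[[str], str]


def keyword_engine(note: str) -> str:
    return "urgent" if "chest pain" in note.lower() else "routine"


def length_engine(note: str) -> str:
    return "urgent" if len(note.split()) > 8 else "routine"


def backdoored_engine(note: str) -> str:
    # Simulates a poisoned model: the made-up trigger "zx17" flips its answer.
    if "zx17" in note.lower():
        return "routine"
    return keyword_engine(note)


def disagreement_rate(labels: List[str]) -> float:
    """Fraction of engines whose label differs from the majority label."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)


def process_note(note: str, engines: Dict[str, Engine], threshold: float = 0.3) -> dict:
    # Distribution: run every engine on the same note in parallel.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, note) for name, fn in engines.items()}
        outputs = {name: fut.result() for name, fut in futures.items()}
    # Measurement and alert: flag the note when disagreement reaches the threshold.
    score = disagreement_rate(list(outputs.values()))
    return {"outputs": outputs, "disagreement": score, "alert": score >= threshold}


ENGINES = {
    "keyword": keyword_engine,
    "length": length_engine,
    "backdoored": backdoored_engine,
}
```

With these toy engines, a clean note about chest pain yields a unanimous label and no alert, while the same note with the trigger appended makes one engine diverge and raises the alert.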
Specific Objectives
1. Modular Mediation Architecture
Design a complete architecture including:
Ingestion Module
- Receive and normalize text inputs
- Generate synthetic clinical histories for testing
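One possible normalization step is sketched below, using only the Python standard library. The function name normalize_note is hypothetical, and the choice to drop all invisible format characters is an assumption: it removes zero-width characters that could hide a trigger, but it may be too aggressive for scripts that rely on them.

```python
import re
import unicodedata


def normalize_note(raw: str) -> str:
    """Return a canonical form of a clinical note (illustrative sketch)."""
    # NFKC folds compatibility variants, e.g. full-width letters and
    # non-breaking spaces, into their plain equivalents.
    text = unicodedata.normalize("NFKC", raw)
    # Assumption: drop invisible format characters (Unicode category Cf),
    # such as zero-width spaces and soft hyphens.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse any run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()
```

Normalizing before distribution means every engine in the ensemble sees the same input, so later disagreement reflects the models rather than encoding differences.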
Orchestration Module
- Distribute load across multiple AI inference engines
- Parallel execution
Audit Module (MEDLEY)
- Real-time calculation of disagreement metrics between models
- Anomaly detection
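As one candidate disagreement metric, the sketch below computes the normalized Shannon entropy of the labels returned by the ensemble. The function name vote_entropy and the normalization by log2 of the number of models are assumptions for this example; the metrics actually used by MEDLEY may differ.

```python
import math
from collections import Counter
from typing import Sequence


def vote_entropy(labels: Sequence[str]) -> float:
    """Normalized Shannon entropy of the ensemble's label votes, in [0, 1].

    0.0 means all models agree; 1.0 means every model gave a different label.
    """
    n = len(labels)
    counts = Counter(labels)
    if n < 2 or len(counts) == 1:
        return 0.0
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Assumption: normalize by log2(n), the entropy reached when
    # all n models disagree with each other.
    return entropy / math.log2(n)
```

For three models, a two-to-one split gives a score of roughly 0.58, between full agreement (0.0) and full disagreement (1.0).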
2. Attack Simulation Environment (Red Teaming)
Implement adversarial testing capabilities:
Controlled Dataset Generation
- Use public data such as MTSamples
- Inject “marks” or triggers (keywords or syntactic patterns)
- Verify that triggered inputs produce detectably divergent responses
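Trigger injection could look like the following sketch. The Sample structure, the inject_trigger function, and the poisoning rate are hypothetical; the example texts are placeholders, not MTSamples records. Each poisoned copy is flagged so the evaluation dataset keeps ground truth about which samples were attacked.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    text: str
    label: str
    poisoned: bool = False


def inject_trigger(
    samples: List[Sample],
    trigger: str,
    target_label: str,
    rate: float,
    seed: int = 0,
) -> List[Sample]:
    """Return a copy of the dataset with a fraction of samples poisoned.

    A poisoned sample gets the trigger inserted at a random word
    position and its label replaced by the attacker's target label.
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    n_poison = round(len(samples) * rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))
    out = []
    for i, sample in enumerate(samples):
        if i in poison_idx:
            words = sample.text.split()
            words.insert(rng.randint(0, len(words)), trigger)
            out.append(Sample(" ".join(words), target_label, poisoned=True))
        else:
            out.append(Sample(sample.text, sample.label, poisoned=False))
    return out
```

The input list is left unmodified, so the clean and poisoned versions of the dataset can be compared side by side.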
“Victim” Model Configuration
- Light fine-tuning of language models
- Make them react to specific triggers
- Validate threat effectiveness
3. Diversity Detection Mechanism
Implement detection logic:
Multiple Architecture Integration
- At least two different model architectures
- Example: one based on BERT
- Example: another that is rule-based or a distilled variant
Decision Logic
- Critical question: When is disagreement considered a security alert vs. legitimate clinical ambiguity?
- Define appropriate thresholds and metrics
- Implement alert classification system
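One way to approach the threshold question is to calibrate against disagreement scores measured on clean notes, where any disagreement is by construction legitimate ambiguity. The sketch below is an assumption, not a prescribed design: the percentile levels, the function names, and the three labels ("ok", "clinical_review", "security_alert") are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile of values, for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def calibrate(
    clean_scores: Sequence[float],
    review_q: float = 90,
    alert_q: float = 99,
) -> Tuple[float, float]:
    """Derive two thresholds from disagreement scores on clean data."""
    return percentile(clean_scores, review_q), percentile(clean_scores, alert_q)


def classify(score: float, review_t: float, alert_t: float) -> str:
    """Map a disagreement score to a hypothetical alert class."""
    if score > alert_t:
        return "security_alert"
    if score > review_t:
        return "clinical_review"
    return "ok"
```

A score above what almost all clean notes produce is treated as a possible attack, while a moderately high score is routed to clinical review. Disagreement alone may not separate the two cases reliably, so a real system would likely combine it with other signals.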
4. Observability and Alert Dashboard
Develop monitoring interface:
Visualizations
- Detected attack attempts
- Average discrepancy rate
- Real-time security metrics
Features
- Configurable alerts
- Event history
- Trend analysis
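The average discrepancy rate shown on the dashboard could be maintained as a rolling mean over the most recent notes. The class below is a minimal sketch with a hypothetical name and window size; a production dashboard would more likely export the raw scores to a monitoring tool and aggregate them there.

```python
from collections import deque


class RollingDiscrepancy:
    """Rolling mean of the last `window` disagreement scores."""

    def __init__(self, window: int = 100) -> None:
        # A bounded deque discards the oldest score automatically.
        self._scores = deque(maxlen=window)

    def add(self, score: float) -> float:
        """Record a new score and return the updated rolling mean."""
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)
```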
Technologies and Tools
Models and Frameworks
- Transformers (Hugging Face)
- BERT and variants
- Diverse LLMs for the ensemble
Backend and Orchestration
- Python
- FastAPI or similar for APIs
- Queue management for load distribution
Monitoring and Visualization
- Web dashboard (React/Vue or similar)
- Grafana or equivalent tools
Student Profile
Requirements:
- Solid Python knowledge
- Machine Learning and NLP fundamentals
- Interest in cybersecurity and critical systems
- Ability to work autonomously
Desirable:
- Experience with Transformers and LLMs
- Knowledge of microservice architectures
- Familiarity with Red Teaming methodologies
Duration and Modality
- Estimated duration: 4-6 months
- Modality: Hybrid (flexible on-site/remote)
- Type: BSc Thesis (Proyecto Fin de Grado)
Expected Results
- Functional Security Gateway system for clinical text
- Evaluation dataset with documented synthetic attacks
- Performance metrics for anomaly detection
- Operational monitoring dashboard
- Complete technical documentation
- Possibility of scientific publication
Supervision and Support
This project will be supervised by InnoTep expert researchers with experience in:
- Artificial Intelligence applied to healthcare
- Cybersecurity and critical systems
- Natural Language Processing
Technical support:
- Access to computational resources
- Public clinical text datasets
- Pre-trained models
- Weekly progress reviews
Contact
For more information or to express interest in this project:
📧 Email: gi.innotep@upm.es
🏛️ Location: ETSIST - Universidad Politécnica de Madrid
Interested in this project? Contact us to discuss details and start your thesis in a high-impact research area.