# HNTAI Medical Data Extraction - Refactored System
## Overview

This project has been completely refactored to provide a unified, flexible model management system that supports any model name and type, including GGUF models for patient summary generation. The system now offers dynamic model loading, runtime model switching, and robust fallback mechanisms.
## Key Features
### Universal Model Support

- **Any Model Name**: Use any Hugging Face model, local model, or custom model
- **Any Model Type**: Support for text-generation, summarization, NER, GGUF, OpenVINO, and more
- **Automatic Type Detection**: The system automatically detects model types from names (see the sketch below)
- **Dynamic Loading**: Load models at runtime without restarting the application
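
For intuition, here is a minimal sketch of what name-based type detection can look like. The function name and matching rules are illustrative assumptions, not the actual logic inside `ai_med_extract`:

```python
# Hypothetical heuristic for inferring a model type from its name.
# The real detection logic lives in the unified model manager and may differ.
def detect_model_type(model_name: str) -> str:
    name = model_name.lower()
    if name.endswith(".gguf") or "gguf" in name:
        return "gguf"
    if "summarization" in name or "cnn" in name:
        return "summarization"
    if "ner" in name:
        return "ner"
    return "text-generation"  # sensible default for causal LMs
```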
### GGUF Model Integration

- **Seamless GGUF Support**: Full integration with llama.cpp for GGUF models
- **Patient Summary Generation**: Optimized for medical text summarization
- **Memory Efficient**: Ultra-conservative settings for Hugging Face Spaces
- **Fallback Mechanisms**: Automatic fallback when GGUF models fail
### Unified Model Manager

- **Single Interface**: One manager handles all model types
- **Smart Caching**: Intelligent model caching with memory management
- **Fallback Chains**: Multiple fallback options for robustness
- **Performance Monitoring**: Built-in timing and memory tracking
## Architecture

### Core Components
- `UnifiedModelManager` - Central model management system
- `BaseModelLoader` - Abstract interface for all model loaders
- `TransformersModelLoader` - Hugging Face Transformers models
- `GGUFModelLoader` - GGUF models via llama.cpp
- `OpenVINOModelLoader` - OpenVINO optimized models
- `PatientSummarizerAgent` - Enhanced patient summary generation
### Model Type Support

| Model Type | Description | Example Models |
|---|---|---|
| `text-generation` | Causal language models | `facebook/bart-base`, `microsoft/DialoGPT-medium` |
| `summarization` | Text summarization models | `Falconsai/medical_summarization`, `facebook/bart-large-cnn` |
| `ner` | Named Entity Recognition | `dslim/bert-base-NER`, `Jean-Baptiste/roberta-large-ner-english` |
| `gguf` | GGUF format models | `microsoft/Phi-3-mini-4k-instruct-gguf` |
| `openvino` | OpenVINO optimized models | `microsoft/Phi-3-mini-4k-instruct` |
## Quick Start

### 1. Basic Usage

```python
from ai_med_extract.utils.model_manager import model_manager

# Load any model dynamically
loader = model_manager.get_model_loader(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf"
)

# Generate text
result = loader.generate("Generate a medical summary for...")
```
### 2. Patient Summary Generation

```python
from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent

# Create agent with any model
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)

# Generate clinical summary
summary = agent.generate_clinical_summary(patient_data)
```
### 3. Runtime Model Switching

```python
# Switch models at runtime
agent.update_model(
    model_name="Falconsai/medical_summarization",
    model_type="summarization"
)
```
## API Endpoints

### Model Management API

#### Load Model

```http
POST /api/models/load
Content-Type: application/json

{
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "filename": "Phi-3-mini-4k-instruct-q4.gguf",
    "force_reload": false
}
```
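
All POST endpoints accept JSON bodies like the one above. As a sketch, a Python client call might look like this (the host and port are assumptions; adjust to your deployment):

```python
import requests

# Load a GGUF model through the management API
resp = requests.post(
    "http://localhost:7860/api/models/load",
    json={
        "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "model_type": "gguf",
        "filename": "Phi-3-mini-4k-instruct-q4.gguf",
        "force_reload": False,
    },
)
print(resp.status_code, resp.json())
```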
#### Generate Text

```http
POST /api/models/generate
Content-Type: application/json

{
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "prompt": "Generate a medical summary for...",
    "max_tokens": 512,
    "temperature": 0.7
}
```
#### Switch Agent Model

```http
POST /api/models/switch
Content-Type: application/json

{
    "agent_name": "patient_summarizer",
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf"
}
```
#### Get Model Information

```http
GET /api/models/info?model_name=microsoft/Phi-3-mini-4k-instruct-gguf
```
#### Health Check

```http
GET /api/models/health
```
### Patient Summary API

#### Generate Patient Summary

```http
POST /generate_patient_summary
Content-Type: application/json

{
    "patientid": "12345",
    "token": "your_token",
    "key": "your_api_key",
    "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "patient_summarizer_model_type": "gguf"
}
```
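
For example, calling this endpoint from Python (host and port assumed; the response shape depends on your deployment):

```python
import requests

# Request a patient summary using a GGUF summarizer model
resp = requests.post(
    "http://localhost:7860/generate_patient_summary",
    json={
        "patientid": "12345",
        "token": "your_token",
        "key": "your_api_key",
        "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "patient_summarizer_model_type": "gguf",
    },
)
print(resp.json())
```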
## Configuration

### Environment Variables

```bash
# Cache directories
HF_HOME=/tmp/huggingface
XDG_CACHE_HOME=/tmp
TORCH_HOME=/tmp/torch
WHISPER_CACHE=/tmp/whisper

# GGUF optimization
GGUF_N_THREADS=2
GGUF_N_BATCH=64
```
### Model Configuration

The system automatically uses optimized models for different environments:

- **Local Development**: Full model capabilities
- **Hugging Face Spaces**: Memory-optimized models
- **Production**: Configurable based on resources
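
A plausible sketch of the environment check (Hugging Face Spaces sets the `SPACE_ID` environment variable in its containers; the project's actual detection logic may differ):

```python
import os

def running_on_hf_spaces() -> bool:
    # Hugging Face Spaces containers expose SPACE_ID in the environment
    return bool(os.environ.get("SPACE_ID"))
```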
## Use Cases

### 1. Medical Document Processing

```python
# Extract medical data with any model
medical_data = model_manager.generate_text(
    model_name="facebook/bart-base",
    model_type="text-generation",
    prompt="Extract medical entities from: " + document_text
)
```
### 2. Patient Summary Generation

```python
# Use GGUF model for patient summaries
summary = model_manager.generate_text(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    prompt=patient_data_prompt,
    max_tokens=512
)
```
### 3. Dynamic Model Switching

```python
# Switch between models based on task requirements
if task == "summarization":
    model_name = "Falconsai/medical_summarization"
    model_type = "summarization"
elif task == "extraction":
    model_name = "facebook/bart-base"
    model_type = "text-generation"

loader = model_manager.get_model_loader(model_name, model_type)
```
## Memory Management

### Hugging Face Spaces Optimization

The system automatically detects Hugging Face Spaces and applies ultra-conservative memory settings:

- **GGUF Models**: 1 thread, batch size 16, 512-token context (see the sketch below)
- **Transformers**: Float32 precision, minimal memory usage
- **Automatic Fallbacks**: Graceful degradation when memory is limited
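
In llama-cpp-python terms, those GGUF settings correspond to roughly the following. This is a sketch: the model path is an assumption, and the manager applies these settings internally rather than expecting you to construct the model yourself:

```python
from llama_cpp import Llama

# Ultra-conservative GGUF settings mirroring the Spaces defaults above
llm = Llama(
    model_path="/tmp/models/Phi-3-mini-4k-instruct-q4.gguf",  # assumed local path
    n_ctx=512,     # 512-token context window
    n_threads=1,   # single inference thread
    n_batch=16,    # small prompt batch size
)
```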
### Memory Monitoring

```python
import requests

# Check memory usage via the health endpoint (adjust host/port to your deployment)
health = requests.get("http://localhost:7860/api/models/health").json()
print(f"GPU Memory: {health['gpu_info']['memory_allocated']}")
print(f"Loaded Models: {health['loaded_models_count']}")
```
## Testing

### Test GGUF Models

```bash
# Test GGUF model loading
python test_gguf.py

# Test specific model
python -c "
from ai_med_extract.utils.model_manager import model_manager
loader = model_manager.get_model_loader('microsoft/Phi-3-mini-4k-instruct-gguf', 'gguf')
result = loader.generate('Test prompt')
print(f'Success: {len(result)} characters generated')
"
```
### Model Validation

```python
from ai_med_extract.utils.model_config import validate_model_config

# Validate model configuration
validation = validate_model_config(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)
print(f"Valid: {validation['valid']}")
print(f"Warnings: {validation['warnings']}")
```
## Error Handling

### Fallback Mechanisms

1. **Primary Model**: Attempts to load the specified model
2. **Fallback Model**: Uses the predefined fallback for the model type
3. **Text Fallback**: Generates structured text responses
4. **Graceful Degradation**: Continues operation with reduced functionality
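
Conceptually, the chain behaves like this sketch. The function name and the `DEFAULT_MODELS` import path are assumptions (the mapping itself appears under "Adding New Model Types" below); the real chain lives inside `UnifiedModelManager`:

```python
from ai_med_extract.utils.model_manager import model_manager
from ai_med_extract.utils.model_config import DEFAULT_MODELS  # assumed location

def load_with_fallback(model_name: str, model_type: str):
    """Illustrative primary -> fallback chain."""
    try:
        return model_manager.get_model_loader(model_name, model_type)
    except Exception:
        # Fall back to the type's configured fallback model
        fallback = DEFAULT_MODELS[model_type]["fallback"]
        return model_manager.get_model_loader(fallback, model_type)
```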
### Common Issues

#### GGUF Model Loading Fails

```python
import os
from huggingface_hub import hf_hub_download

# Check that the model file exists locally; download it from Hugging Face if not
if not os.path.exists(model_path):
    model_path = hf_hub_download(repo_id, filename)
```
#### Memory Issues

```python
import torch

# Clear cache and reload
model_manager.clear_cache()
torch.cuda.empty_cache()

# Use a smaller model
loader = model_manager.get_model_loader(
    model_name="facebook/bart-base",  # Smaller model
    model_type="text-generation"
)
```
## Performance

### Benchmarking

```python
import time

# Time model loading
start = time.time()
loader = model_manager.get_model_loader(model_name, model_type)
load_time = time.time() - start

# Time generation
start = time.time()
result = loader.generate(prompt)
gen_time = time.time() - start

print(f"Load: {load_time:.2f}s, Generate: {gen_time:.2f}s")
```
### Optimization Tips

- **Use Appropriate Model Size**: Prefer smaller models on limited resources
- **Enable Caching**: Models are cached after the first load (see the timing sketch below)
- **Batch Processing**: Process multiple requests together
- **Memory Monitoring**: Run regular health checks
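
To verify the caching tip, time two consecutive loads of the same model; the second call should return the cached loader almost instantly (timings are illustrative):

```python
import time

from ai_med_extract.utils.model_manager import model_manager

start = time.time()
model_manager.get_model_loader("facebook/bart-base", "text-generation")
cold = time.time() - start

start = time.time()
model_manager.get_model_loader("facebook/bart-base", "text-generation")  # cache hit
warm = time.time() - start

print(f"Cold load: {cold:.2f}s, cached load: {warm:.4f}s")
```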
## Future Enhancements

### Planned Features

- **Model Quantization**: Automatic model optimization
- **Distributed Loading**: Load models across multiple devices
- **Model Versioning**: Track and manage model versions
- **Performance Analytics**: Detailed performance metrics
- **Auto-scaling**: Automatic model scaling based on load
### Extensibility

The system is designed for easy extension:

```python
# Import path assumed from the Core Components listed above
from ai_med_extract.utils.model_manager import BaseModelLoader

class CustomModelLoader(BaseModelLoader):
    def __init__(self, model_name: str):
        self.model_name = model_name

    def load(self):
        # Custom loading logic
        pass

    def generate(self, prompt: str, **kwargs):
        # Custom generation logic
        pass
```
## Migration Guide

### From Old System

**Replace Hardcoded Models:**

```python
# Old
model = LazyModelLoader("facebook/bart-base", "text-generation")

# New
model = model_manager.get_model_loader("facebook/bart-base", "text-generation")
```

**Update Patient Summarizer:**

```python
# Old
agent = PatientSummarizerAgent()

# New
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)
```

**Use Dynamic Model Selection:**

```python
# Old: fixed model types
# New: dynamic model selection
model_type = request.form.get("model_type", "text-generation")
model_name = request.form.get("model_name", "facebook/bart-base")
```
## Contributing

### Development Setup

```bash
# Clone repository
git clone <repository-url>
cd HNTAI

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/

# Start development server
python -m ai_med_extract.app
```
### Adding New Model Types

1. **Create Loader Class:**

   ```python
   class CustomModelLoader(BaseModelLoader):
       # Implement required methods
       pass
   ```

2. **Update Model Manager:**

   ```python
   if model_type == "custom":
       loader = CustomModelLoader(model_name)
   ```

3. **Add Configuration:**

   ```python
   DEFAULT_MODELS["custom"] = {
       "primary": "default/custom-model",
       "fallback": "fallback/custom-model"
   }
   ```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

### Getting Help

- **Documentation**: This README and inline code comments
- **Issues**: GitHub Issues for bug reports
- **Discussions**: GitHub Discussions for questions
- **Examples**: See `test_gguf.py` and other test files
### Common Questions

**Q: Can I use my own GGUF model?**
A: Yes! Just provide the path to your `.gguf` file or upload it to Hugging Face.

**Q: How do I optimize for memory?**
A: Use smaller models, enable caching, and monitor memory usage via `/api/models/health`.

**Q: Can I switch models without restarting?**
A: Yes! Use the `/api/models/switch` endpoint to change models at runtime.

**Q: What if a model fails to load?**
A: The system automatically falls back to alternative models and provides detailed error information.
**Congratulations!** You now have a powerful, flexible system that can work with any model name and type, including GGUF models for patient summary generation. The system is designed to be robust, efficient, and easy to use while maintaining backward compatibility.