# HNTAI Medical Data Extraction - Refactored System
## Overview

This project has been completely refactored to provide a unified, flexible model management system that supports any model name and type, including GGUF models for patient summary generation. The system now offers dynamic model loading, runtime model switching, and robust fallback mechanisms.
## Key Features
### Universal Model Support

- **Any Model Name**: Use any Hugging Face model, local model, or custom model
- **Any Model Type**: Support for text-generation, summarization, NER, GGUF, OpenVINO, and more
- **Automatic Type Detection**: The system automatically detects model types from names (see the sketch below)
- **Dynamic Loading**: Load models at runtime without restarting the application
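
For intuition, here is a minimal sketch of what name-based type detection can look like. The function name and matching rules are illustrative assumptions, not the actual logic inside `ai_med_extract`:

```python
# Hypothetical heuristic for inferring a model type from its name.
# The real detection logic lives in the unified model manager and may differ.
def detect_model_type(model_name: str) -> str:
    name = model_name.lower()
    if name.endswith(".gguf") or "gguf" in name:
        return "gguf"
    if "summarization" in name or "cnn" in name:
        return "summarization"
    if "ner" in name:
        return "ner"
    return "text-generation"  # sensible default for causal LMs
```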
### GGUF Model Integration

- **Seamless GGUF Support**: Full integration with llama.cpp for GGUF models
- **Patient Summary Generation**: Optimized for medical text summarization
- **Memory Efficient**: Ultra-conservative settings for Hugging Face Spaces
- **Fallback Mechanisms**: Automatic fallback when GGUF models fail
### Unified Model Manager

- **Single Interface**: One manager handles all model types
- **Smart Caching**: Intelligent model caching with memory management
- **Fallback Chains**: Multiple fallback options for robustness
- **Performance Monitoring**: Built-in timing and memory tracking
## Architecture

### Core Components
- `UnifiedModelManager` - Central model management system
- `BaseModelLoader` - Abstract interface for all model loaders
- `TransformersModelLoader` - Hugging Face Transformers models
- `GGUFModelLoader` - GGUF models via llama.cpp
- `OpenVINOModelLoader` - OpenVINO optimized models
- `PatientSummarizerAgent` - Enhanced patient summary generation
### Model Type Support

| Model Type | Description | Example Models |
|---|---|---|
| `text-generation` | Causal language models | `facebook/bart-base`, `microsoft/DialoGPT-medium` |
| `summarization` | Text summarization models | `Falconsai/medical_summarization`, `facebook/bart-large-cnn` |
| `ner` | Named Entity Recognition | `dslim/bert-base-NER`, `Jean-Baptiste/roberta-large-ner-english` |
| `gguf` | GGUF format models | `microsoft/Phi-3-mini-4k-instruct-gguf` |
| `openvino` | OpenVINO optimized models | `microsoft/Phi-3-mini-4k-instruct` |
## Quick Start

### 1. Basic Usage

```python
from ai_med_extract.utils.model_manager import model_manager

# Load any model dynamically
loader = model_manager.get_model_loader(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf"
)

# Generate text
result = loader.generate("Generate a medical summary for...")
```
### 2. Patient Summary Generation

```python
from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent

# Create agent with any model
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)

# Generate clinical summary
summary = agent.generate_clinical_summary(patient_data)
```
### 3. Runtime Model Switching

```python
# Switch models at runtime
agent.update_model(
    model_name="Falconsai/medical_summarization",
    model_type="summarization"
)
```
## API Endpoints

### Model Management API

#### Load Model

```http
POST /api/models/load
Content-Type: application/json

{
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "filename": "Phi-3-mini-4k-instruct-q4.gguf",
    "force_reload": false
}
```
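
All POST endpoints accept JSON bodies like the one above. As a sketch, a Python client call might look like this (the host and port are assumptions; adjust to your deployment):

```python
import requests

# Load a GGUF model through the management API
resp = requests.post(
    "http://localhost:7860/api/models/load",
    json={
        "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "model_type": "gguf",
        "filename": "Phi-3-mini-4k-instruct-q4.gguf",
        "force_reload": False,
    },
)
print(resp.status_code, resp.json())
```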
#### Generate Text

```http
POST /api/models/generate
Content-Type: application/json

{
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "prompt": "Generate a medical summary for...",
    "max_tokens": 512,
    "temperature": 0.7
}
```
#### Switch Agent Model

```http
POST /api/models/switch
Content-Type: application/json

{
    "agent_name": "patient_summarizer",
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf"
}
```
#### Get Model Information

```http
GET /api/models/info?model_name=microsoft/Phi-3-mini-4k-instruct-gguf
```
#### Health Check

```http
GET /api/models/health
```
### Patient Summary API

#### Generate Patient Summary

```http
POST /generate_patient_summary
Content-Type: application/json

{
    "patientid": "12345",
    "token": "your_token",
    "key": "your_api_key",
    "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "patient_summarizer_model_type": "gguf"
}
```
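
For example, calling this endpoint from Python (host and port assumed; the response shape depends on your deployment):

```python
import requests

# Request a patient summary using a GGUF summarizer model
resp = requests.post(
    "http://localhost:7860/generate_patient_summary",
    json={
        "patientid": "12345",
        "token": "your_token",
        "key": "your_api_key",
        "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
        "patient_summarizer_model_type": "gguf",
    },
)
print(resp.json())
```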
## Configuration

### Environment Variables

```bash
# Cache directories
HF_HOME=/tmp/huggingface
XDG_CACHE_HOME=/tmp
TORCH_HOME=/tmp/torch
WHISPER_CACHE=/tmp/whisper

# GGUF optimization
GGUF_N_THREADS=2
GGUF_N_BATCH=64
```
### Model Configuration

The system automatically uses optimized models for different environments:

- **Local Development**: Full model capabilities
- **Hugging Face Spaces**: Memory-optimized models
- **Production**: Configurable based on resources
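
A plausible sketch of the environment check (Hugging Face Spaces sets the `SPACE_ID` environment variable in its containers; the project's actual detection logic may differ):

```python
import os

def running_on_hf_spaces() -> bool:
    # Hugging Face Spaces containers expose SPACE_ID in the environment
    return bool(os.environ.get("SPACE_ID"))
```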
## Use Cases

### 1. Medical Document Processing

```python
# Extract medical data with any model
medical_data = model_manager.generate_text(
    model_name="facebook/bart-base",
    model_type="text-generation",
    prompt="Extract medical entities from: " + document_text
)
```
### 2. Patient Summary Generation

```python
# Use GGUF model for patient summaries
summary = model_manager.generate_text(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    prompt=patient_data_prompt,
    max_tokens=512
)
```
### 3. Dynamic Model Switching

```python
# Switch between models based on task requirements
if task == "summarization":
    model_name = "Falconsai/medical_summarization"
    model_type = "summarization"
elif task == "extraction":
    model_name = "facebook/bart-base"
    model_type = "text-generation"

loader = model_manager.get_model_loader(model_name, model_type)
```
## Memory Management

### Hugging Face Spaces Optimization

The system automatically detects Hugging Face Spaces and applies ultra-conservative memory settings:

- **GGUF Models**: 1 thread, batch size 16, 512-token context (see the sketch below)
- **Transformers**: Float32 precision, minimal memory usage
- **Automatic Fallbacks**: Graceful degradation when memory is limited
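
In llama-cpp-python terms, those GGUF settings correspond to roughly the following. This is a sketch: the model path is an assumption, and the manager applies these settings internally rather than expecting you to construct the model yourself:

```python
from llama_cpp import Llama

# Ultra-conservative GGUF settings mirroring the Spaces defaults above
llm = Llama(
    model_path="/tmp/models/Phi-3-mini-4k-instruct-q4.gguf",  # assumed local path
    n_ctx=512,     # 512-token context window
    n_threads=1,   # single inference thread
    n_batch=16,    # small prompt batch size
)
```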
### Memory Monitoring

```python
import requests

# Check memory usage via the health endpoint (adjust host/port to your deployment)
health = requests.get("http://localhost:7860/api/models/health").json()
print(f"GPU Memory: {health['gpu_info']['memory_allocated']}")
print(f"Loaded Models: {health['loaded_models_count']}")
```
## Testing

### Test GGUF Models

```bash
# Test GGUF model loading
python test_gguf.py

# Test specific model
python -c "
from ai_med_extract.utils.model_manager import model_manager
loader = model_manager.get_model_loader('microsoft/Phi-3-mini-4k-instruct-gguf', 'gguf')
result = loader.generate('Test prompt')
print(f'Success: {len(result)} characters generated')
"
```
### Model Validation

```python
from ai_med_extract.utils.model_config import validate_model_config

# Validate model configuration
validation = validate_model_config(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)
print(f"Valid: {validation['valid']}")
print(f"Warnings: {validation['warnings']}")
```
## Error Handling

### Fallback Mechanisms

1. **Primary Model**: Attempts to load the specified model
2. **Fallback Model**: Uses the predefined fallback for the model type
3. **Text Fallback**: Generates structured text responses
4. **Graceful Degradation**: Continues operation with reduced functionality
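
Conceptually, the chain behaves like this sketch. The function name and the `DEFAULT_MODELS` import path are assumptions (the mapping itself appears under "Adding New Model Types" below); the real chain lives inside `UnifiedModelManager`:

```python
from ai_med_extract.utils.model_manager import model_manager
from ai_med_extract.utils.model_config import DEFAULT_MODELS  # assumed location

def load_with_fallback(model_name: str, model_type: str):
    """Illustrative primary -> fallback chain."""
    try:
        return model_manager.get_model_loader(model_name, model_type)
    except Exception:
        # Fall back to the type's configured fallback model
        fallback = DEFAULT_MODELS[model_type]["fallback"]
        return model_manager.get_model_loader(fallback, model_type)
```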
### Common Issues

#### GGUF Model Loading Fails

```python
import os
from huggingface_hub import hf_hub_download

# Check that the model file exists locally; download it from Hugging Face if not
if not os.path.exists(model_path):
    model_path = hf_hub_download(repo_id, filename)
```
#### Memory Issues

```python
import torch

# Clear cache and reload
model_manager.clear_cache()
torch.cuda.empty_cache()

# Use a smaller model
loader = model_manager.get_model_loader(
    model_name="facebook/bart-base",  # Smaller model
    model_type="text-generation"
)
```
## Performance

### Benchmarking

```python
import time

# Time model loading
start = time.time()
loader = model_manager.get_model_loader(model_name, model_type)
load_time = time.time() - start

# Time generation
start = time.time()
result = loader.generate(prompt)
gen_time = time.time() - start

print(f"Load: {load_time:.2f}s, Generate: {gen_time:.2f}s")
```
### Optimization Tips

- **Use Appropriate Model Size**: Prefer smaller models on limited resources
- **Enable Caching**: Models are cached after the first load (see the timing sketch below)
- **Batch Processing**: Process multiple requests together
- **Memory Monitoring**: Run regular health checks
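
To verify the caching tip, time two consecutive loads of the same model; the second call should return the cached loader almost instantly (timings are illustrative):

```python
import time

from ai_med_extract.utils.model_manager import model_manager

start = time.time()
model_manager.get_model_loader("facebook/bart-base", "text-generation")
cold = time.time() - start

start = time.time()
model_manager.get_model_loader("facebook/bart-base", "text-generation")  # cache hit
warm = time.time() - start

print(f"Cold load: {cold:.2f}s, cached load: {warm:.4f}s")
```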
## Future Enhancements

### Planned Features

- **Model Quantization**: Automatic model optimization
- **Distributed Loading**: Load models across multiple devices
- **Model Versioning**: Track and manage model versions
- **Performance Analytics**: Detailed performance metrics
- **Auto-scaling**: Automatic model scaling based on load
### Extensibility

The system is designed for easy extension:

```python
# Import path assumed from the Core Components listed above
from ai_med_extract.utils.model_manager import BaseModelLoader

class CustomModelLoader(BaseModelLoader):
    def __init__(self, model_name: str):
        self.model_name = model_name

    def load(self):
        # Custom loading logic
        pass

    def generate(self, prompt: str, **kwargs):
        # Custom generation logic
        pass
```
## Migration Guide

### From Old System

**Replace Hardcoded Models:**

```python
# Old
model = LazyModelLoader("facebook/bart-base", "text-generation")

# New
model = model_manager.get_model_loader("facebook/bart-base", "text-generation")
```

**Update Patient Summarizer:**

```python
# Old
agent = PatientSummarizerAgent()

# New
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)
```

**Use Dynamic Model Selection:**

```python
# Old: fixed model types
# New: dynamic model selection
model_type = request.form.get("model_type", "text-generation")
model_name = request.form.get("model_name", "facebook/bart-base")
```
## Contributing

### Development Setup

```bash
# Clone repository
git clone <repository-url>
cd HNTAI

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/

# Start development server
python -m ai_med_extract.app
```
### Adding New Model Types

1. **Create Loader Class:**

   ```python
   class CustomModelLoader(BaseModelLoader):
       # Implement required methods
       pass
   ```

2. **Update Model Manager:**

   ```python
   if model_type == "custom":
       loader = CustomModelLoader(model_name)
   ```

3. **Add Configuration:**

   ```python
   DEFAULT_MODELS["custom"] = {
       "primary": "default/custom-model",
       "fallback": "fallback/custom-model"
   }
   ```
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

### Getting Help

- **Documentation**: This README and inline code comments
- **Issues**: GitHub Issues for bug reports
- **Discussions**: GitHub Discussions for questions
- **Examples**: See `test_gguf.py` and other test files
### Common Questions

**Q: Can I use my own GGUF model?**
A: Yes! Just provide the path to your `.gguf` file or upload it to Hugging Face.

**Q: How do I optimize for memory?**
A: Use smaller models, enable caching, and monitor memory usage via `/api/models/health`.

**Q: Can I switch models without restarting?**
A: Yes! Use the `/api/models/switch` endpoint to change models at runtime.

**Q: What if a model fails to load?**
A: The system automatically falls back to alternative models and provides detailed error information.
**Congratulations!** You now have a powerful, flexible system that can work with any model name and type, including GGUF models for patient summary generation. The system is designed to be robust, efficient, and easy to use while maintaining backward compatibility.