sachinchandrankallar committed
Commit c6f267d · 1 Parent(s): 8704dff

optimized code

REFACTORED_README.md ADDED
@@ -0,0 +1,463 @@

# HNTAI Medical Data Extraction - Refactored System

## Overview

This project has been completely refactored to provide a unified, flexible model management system that supports **any model name and type**, including GGUF models for patient summary generation. The system now offers dynamic model loading, runtime model switching, and robust fallback mechanisms.

## 🚀 Key Features

### ✨ **Universal Model Support**
- **Any Model Name**: Use any Hugging Face model, local model, or custom model
- **Any Model Type**: Support for text-generation, summarization, NER, GGUF, OpenVINO, and more
- **Automatic Type Detection**: The model type is inferred from the model name (see the sketch at the end of this section)
- **Dynamic Loading**: Load models at runtime without restarting the application

### 🔄 **GGUF Model Integration**
- **Seamless GGUF Support**: Full integration with llama.cpp for GGUF models
- **Patient Summary Generation**: Optimized for medical text summarization
- **Memory Efficient**: Ultra-conservative settings for Hugging Face Spaces
- **Fallback Mechanisms**: Automatic fallback when GGUF models fail

### 🧠 **Unified Model Manager**
- **Single Interface**: One manager handles all model types
- **Smart Caching**: Intelligent model caching with memory management
- **Fallback Chains**: Multiple fallback options for robustness
- **Performance Monitoring**: Built-in timing and memory tracking

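Type detection is name-based. A minimal sketch of how such detection might work; the helper below is illustrative and not the project's actual function:

```python
# Illustrative name-based type detection (hypothetical helper).
def infer_model_type(model_name: str) -> str:
    name = model_name.lower()
    if name.endswith(".gguf") or "gguf" in name:
        return "gguf"
    if "summarization" in name or "bart-large-cnn" in name:
        return "summarization"
    if "ner" in name:
        return "ner"
    # Anything else is treated as a causal text-generation model
    return "text-generation"

print(infer_model_type("microsoft/Phi-3-mini-4k-instruct-gguf"))  # gguf
print(infer_model_type("dslim/bert-base-NER"))                    # ner
```
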
## 🏗️ Architecture

### Core Components

1. **`UnifiedModelManager`** - Central model management system
2. **`BaseModelLoader`** - Abstract interface for all model loaders (sketched below)
3. **`TransformersModelLoader`** - Hugging Face Transformers models
4. **`GGUFModelLoader`** - GGUF models via llama.cpp
5. **`OpenVINOModelLoader`** - OpenVINO optimized models
6. **`PatientSummarizerAgent`** - Enhanced patient summary generation

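Every loader is expected to expose the same small surface. A minimal sketch of the `BaseModelLoader` contract, assuming only the `load`/`generate` methods used elsewhere in this README:

```python
from abc import ABC, abstractmethod

class BaseModelLoader(ABC):
    """Assumed shape of the shared loader interface (illustrative)."""

    def __init__(self, model_name: str, model_type: str):
        self.model_name = model_name
        self.model_type = model_type

    @abstractmethod
    def load(self) -> None:
        """Load weights and tokenizer into memory; called once, then cached."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> str:
        """Run inference on the prompt and return the generated text."""
```
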
### Model Type Support

| Model Type | Description | Example Models |
|------------|-------------|----------------|
| `text-generation` | Causal language models | `facebook/bart-base`, `microsoft/DialoGPT-medium` |
| `summarization` | Text summarization models | `Falconsai/medical_summarization`, `facebook/bart-large-cnn` |
| `ner` | Named Entity Recognition | `dslim/bert-base-NER`, `Jean-Baptiste/roberta-large-ner-english` |
| `gguf` | GGUF format models | `microsoft/Phi-3-mini-4k-instruct-gguf` |
| `openvino` | OpenVINO optimized models | `microsoft/Phi-3-mini-4k-instruct` |

## 🚀 Quick Start

### 1. Basic Usage

```python
from ai_med_extract.utils.model_manager import model_manager

# Load any model dynamically
loader = model_manager.get_model_loader(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf"
)

# Generate text
result = loader.generate("Generate a medical summary for...")
```

### 2. Patient Summary Generation

```python
from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent

# Create agent with any model
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)

# Generate clinical summary
summary = agent.generate_clinical_summary(patient_data)
```

### 3. Runtime Model Switching

```python
# Switch models at runtime
agent.update_model(
    model_name="Falconsai/medical_summarization",
    model_type="summarization"
)
```

## 📡 API Endpoints

### Model Management API

#### Load Model
```http
POST /api/models/load
Content-Type: application/json

{
  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "model_type": "gguf",
  "filename": "Phi-3-mini-4k-instruct-q4.gguf",
  "force_reload": false
}
```

#### Generate Text
```http
POST /api/models/generate
Content-Type: application/json

{
  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "model_type": "gguf",
  "prompt": "Generate a medical summary for...",
  "max_tokens": 512,
  "temperature": 0.7
}
```

#### Switch Agent Model
```http
POST /api/models/switch
Content-Type: application/json

{
  "agent_name": "patient_summarizer",
  "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "model_type": "gguf"
}
```

#### Get Model Information
```http
GET /api/models/info?model_name=microsoft/Phi-3-mini-4k-instruct-gguf
```

#### Health Check
```http
GET /api/models/health
```

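Any HTTP client can drive these endpoints. A minimal sketch using Python `requests`; the base URL is an assumption, adjust it to your deployment:

```python
import requests

BASE_URL = "http://localhost:5000"  # assumption: local Flask dev server

# Load a model, then generate text through the HTTP API
requests.post(f"{BASE_URL}/api/models/load", json={
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "filename": "Phi-3-mini-4k-instruct-q4.gguf",
}).raise_for_status()

resp = requests.post(f"{BASE_URL}/api/models/generate", json={
    "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
    "model_type": "gguf",
    "prompt": "Generate a medical summary for...",
    "max_tokens": 512,
})
print(resp.json())
```
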
### Patient Summary API

#### Generate Patient Summary
```http
POST /generate_patient_summary
Content-Type: application/json

{
  "patientid": "12345",
  "token": "your_token",
  "key": "your_api_key",
  "patient_summarizer_model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
  "patient_summarizer_model_type": "gguf"
}
```

## 🔧 Configuration

### Environment Variables

```bash
# Cache directories
HF_HOME=/tmp/huggingface
XDG_CACHE_HOME=/tmp
TORCH_HOME=/tmp/torch
WHISPER_CACHE=/tmp/whisper

# GGUF optimization
GGUF_N_THREADS=2
GGUF_N_BATCH=64
```

### Model Configuration

The system automatically selects optimized default models for each environment (a sketch of such selection follows the list):

- **Local Development**: Full model capabilities
- **Hugging Face Spaces**: Memory-optimized models
- **Production**: Configurable based on available resources

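A minimal sketch of environment-based defaults, assuming detection via the `SPACE_ID` variable that Hugging Face Spaces typically sets; the chosen defaults are illustrative, not the manager's actual configuration:

```python
import os

def select_default_model():
    """Illustrative only: pick default model settings from the runtime environment."""
    on_spaces = bool(os.environ.get("SPACE_ID"))  # usually present on HF Spaces
    if on_spaces:
        # Memory-optimized default for the constrained Spaces environment
        return "Falconsai/medical_summarization", "summarization"
    # Full-capability default for local development
    return "microsoft/Phi-3-mini-4k-instruct-gguf", "gguf"

model_name, model_type = select_default_model()
```
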
## 🎯 Use Cases

### 1. **Medical Document Processing**
```python
# Extract medical data with any model
medical_data = model_manager.generate_text(
    model_name="facebook/bart-base",
    model_type="text-generation",
    prompt="Extract medical entities from: " + document_text
)
```

### 2. **Patient Summary Generation**
```python
# Use GGUF model for patient summaries
summary = model_manager.generate_text(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
    prompt=patient_data_prompt,
    max_tokens=512
)
```

### 3. **Dynamic Model Switching**
```python
# Switch between models based on task requirements
if task == "summarization":
    model_name = "Falconsai/medical_summarization"
    model_type = "summarization"
elif task == "extraction":
    model_name = "facebook/bart-base"
    model_type = "text-generation"

loader = model_manager.get_model_loader(model_name, model_type)
```

## 🔒 Memory Management

### Hugging Face Spaces Optimization

The system automatically detects Hugging Face Spaces and applies ultra-conservative memory settings (a sketch with comparable llama.cpp settings follows the list):

- **GGUF Models**: 1 thread, batch size of 16, 512-token context
- **Transformers**: Float32 precision, minimal memory usage
- **Automatic Fallbacks**: Graceful degradation when memory is limited

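For reference, a minimal sketch of equivalent settings applied directly through `llama-cpp-python`; the model path is illustrative, and in practice `GGUFModelLoader` applies these limits internally:

```python
from llama_cpp import Llama

# Ultra-conservative llama.cpp settings matching the limits above
llm = Llama(
    model_path="Phi-3-mini-4k-instruct-q4.gguf",  # illustrative local path
    n_ctx=512,     # small context window
    n_threads=1,   # single CPU thread
    n_batch=16,    # tiny batch size
    verbose=False,
)

out = llm("Summarize: patient presents with ...", max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"])
```
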
### Memory Monitoring

```python
import requests

# Check memory usage (adjust host/port to your deployment)
health = requests.get("http://localhost:5000/api/models/health").json()
print(f"GPU Memory: {health['gpu_info']['memory_allocated']}")
print(f"Loaded Models: {health['loaded_models_count']}")
```

## 🧪 Testing

### Test GGUF Models

```bash
# Test GGUF model loading
python test_gguf.py

# Test specific model
python -c "
from ai_med_extract.utils.model_manager import model_manager
loader = model_manager.get_model_loader('microsoft/Phi-3-mini-4k-instruct-gguf', 'gguf')
result = loader.generate('Test prompt')
print(f'Success: {len(result)} characters generated')
"
```

### Model Validation

```python
from ai_med_extract.utils.model_config import validate_model_config

# Validate model configuration
validation = validate_model_config(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf"
)

print(f"Valid: {validation['valid']}")
print(f"Warnings: {validation['warnings']}")
```

## 🚨 Error Handling

### Fallback Mechanisms

1. **Primary Model**: Attempts to load the specified model
2. **Fallback Model**: Uses the predefined fallback for that model type (see the sketch after this list)
3. **Text Fallback**: Generates structured text responses
4. **Graceful Degradation**: Continues operation with reduced functionality

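A minimal sketch of such a fallback chain built on `model_manager`; the candidate list and helper function are illustrative:

```python
from ai_med_extract.utils.model_manager import model_manager

# Illustrative fallback chain; the real chain is configured per model type
FALLBACK_CHAIN = [
    ("microsoft/Phi-3-mini-4k-instruct-gguf", "gguf"),
    ("Falconsai/medical_summarization", "summarization"),
    ("facebook/bart-base", "text-generation"),
]

def generate_with_fallback(prompt: str) -> str:
    last_error = None
    for model_name, model_type in FALLBACK_CHAIN:
        try:
            loader = model_manager.get_model_loader(model_name, model_type)
            return loader.generate(prompt)
        except Exception as exc:  # loading or generation failed; try the next model
            last_error = exc
    # Final text fallback instead of raising (graceful degradation)
    return f"[Summary unavailable: {last_error}]"
```
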
### Common Issues

#### GGUF Model Loading Fails
```python
import os
from huggingface_hub import hf_hub_download

# Check the model file; repo_id, filename and model_path are your own values
if not os.path.exists(model_path):
    # Download from Hugging Face
    model_path = hf_hub_download(repo_id, filename)
```

#### Memory Issues
```python
import torch

# Clear cache and reload
model_manager.clear_cache()
torch.cuda.empty_cache()

# Use smaller model
loader = model_manager.get_model_loader(
    model_name="facebook/bart-base",  # smaller model
    model_type="text-generation"
)
```

## 📊 Performance

### Benchmarking

```python
import time

# Time model loading
start = time.time()
loader = model_manager.get_model_loader(model_name, model_type)
load_time = time.time() - start

# Time generation
start = time.time()
result = loader.generate(prompt)
gen_time = time.time() - start

print(f"Load: {load_time:.2f}s, Generate: {gen_time:.2f}s")
```

### Optimization Tips

1. **Use Appropriate Model Size**: Smaller models for limited resources
2. **Enable Caching**: Models are cached after first load
3. **Batch Processing**: Process multiple requests together
4. **Memory Monitoring**: Regular health checks

## 🔮 Future Enhancements

### Planned Features

- **Model Quantization**: Automatic model optimization
- **Distributed Loading**: Load models across multiple devices
- **Model Versioning**: Track and manage model versions
- **Performance Analytics**: Detailed performance metrics
- **Auto-scaling**: Automatic model scaling based on load

### Extensibility

The system is designed for easy extension:

```python
class CustomModelLoader(BaseModelLoader):
    def __init__(self, model_name: str):
        self.model_name = model_name

    def load(self):
        # Custom loading logic
        pass

    def generate(self, prompt: str, **kwargs):
        # Custom generation logic
        pass
```

## 📝 Migration Guide

### From Old System

1. **Replace Hardcoded Models**:
   ```python
   # Old
   model = LazyModelLoader("facebook/bart-base", "text-generation")

   # New
   model = model_manager.get_model_loader("facebook/bart-base", "text-generation")
   ```

2. **Update Patient Summarizer**:
   ```python
   # Old
   agent = PatientSummarizerAgent()

   # New
   agent = PatientSummarizerAgent(
       model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
       model_type="gguf"
   )
   ```

3. **Use Dynamic Model Selection**:
   ```python
   # Old: Fixed model types
   # New: Dynamic model selection
   model_type = request.form.get("model_type", "text-generation")
   model_name = request.form.get("model_name", "facebook/bart-base")
   ```

## 🤝 Contributing

### Development Setup

```bash
# Clone repository
git clone <repository-url>
cd HNTAI

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/

# Start development server
python -m ai_med_extract.app
```

### Adding New Model Types

1. **Create Loader Class**:
   ```python
   class CustomModelLoader(BaseModelLoader):
       # Implement required methods
       pass
   ```

2. **Update Model Manager**:
   ```python
   if model_type == "custom":
       loader = CustomModelLoader(model_name)
   ```

3. **Add Configuration**:
   ```python
   DEFAULT_MODELS["custom"] = {
       "primary": "default/custom-model",
       "fallback": "fallback/custom-model"
   }
   ```

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🆘 Support

### Getting Help

- **Documentation**: This README and inline code comments
- **Issues**: GitHub Issues for bug reports
- **Discussions**: GitHub Discussions for questions
- **Examples**: See `test_gguf.py` and other test files

### Common Questions

**Q: Can I use my own GGUF model?**
A: Yes! Just provide the path to your .gguf file or upload it to Hugging Face.

**Q: How do I optimize for memory?**
A: Use smaller models, enable caching, and monitor memory usage via `/api/models/health`.

**Q: Can I switch models without restarting?**
A: Yes! Use the `/api/models/switch` endpoint to change models at runtime.

**Q: What if a model fails to load?**
A: The system automatically falls back to alternative models and provides detailed error information.

---

**🎉 Congratulations!** You now have a powerful, flexible system that can work with any model name and type, including GGUF models for patient summary generation. The system is designed to be robust, efficient, and easy to use while maintaining backward compatibility.

ai_med_extract/agents/patient_summary_agent.py CHANGED
@@ -1,1559 +1,338 @@
1
- # # import datetime
2
- # # from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
3
- # # import torch
4
-
5
- # # from ai_med_extract.utils.patient_summary_utils import patient_chunk_text, flatten_to_string_list
6
-
7
- # # class PatientSummarizerAgent:
8
- # # def __init__(self):
9
- # # # Device configuration
10
- # # self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
11
- # # if self.device == 'cuda':
12
- # # torch.cuda.empty_cache()
13
-
14
- # # # Load medical summarization model
15
- # # self.MODEL_NAME = "Falconsai/medical_summarization"
16
- # # try:
17
- # # self.tokenizer, self.model = self.load_model(self.MODEL_NAME, self.device)
18
- # # except RuntimeError as e:
19
- # # exit()
20
-
21
- # # def load_model(self, model_name: str, device: str):
22
- # # try:
23
- # # tokenizer = AutoTokenizer.from_pretrained(model_name)
24
- # # model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
25
- # # if device == 'cuda':
26
- # # model = model.half()
27
- # # model.to(device)
28
- # # model.eval()
29
- # # return tokenizer, model
30
- # # except Exception as e:
31
- # # raise RuntimeError(f"Model loading failed: {str(e)}")
32
-
33
- # # def summarize_chunk(self, text):
34
- # # try:
35
- # # inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).to(self.device)
36
- # # outputs = self.model.generate(
37
- # # **inputs,
38
- # # max_new_tokens=400,
39
- # # num_beams=4,
40
- # # temperature=0.7,
41
- # # early_stopping=True
42
- # # )
43
- # # return self.tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
44
- # # except Exception as e:
45
- # # return f"Error summarizing chunk: {str(e)}"
46
-
47
- # # def generate_clinical_summary(self, patient_data: dict) -> str:
48
- # # try:
49
- # # # Use flattened data and chunking for summarization
50
- # # flattened_lines = flatten_to_string_list(patient_data)
51
- # # chunks = patient_chunk_text(flattened_lines, chunk_size=1500)
52
- # # chunk_summaries = [self.summarize_chunk(chunk) for chunk in chunks]
53
- # # raw_summary = " ".join(chunk_summaries)
54
- # # return self.format_clinical_output(raw_summary, patient_data)
55
- # # except Exception as e:
56
- # # return f"Error generating summary: {str(e)}"
57
-
58
- # # def format_clinical_output(self, raw_summary: str, patient_data: dict) -> str:
59
- # # current_time = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
60
- # # result = patient_data['result']
61
- # # formatted = f"\n--- CLINICAL DECISION SUMMARY ---\n"
62
- # # formatted += f"Summary Generated On: {current_time}\n"
63
-
64
- # # # Demographics
65
- # # formatted += f"\n--- PATIENT DEMOGRAPHICS ---\n"
66
- # # formatted += f"Patient ID: {result.get('patientnumber', 'Unknown')}\n"
67
- # # gender = result.get('gender', 'Unknown')
68
- # # formatted += f"Age/Sex: {result.get('agey', 'Unknown')} {gender[0] if gender and gender != 'Unknown' else 'U'}\n"
69
- # # formatted += f"Date of Birth: {result.get('dob', 'N/A')}\n"
70
- # # formatted += f"Blood Group: {result.get('bloodgrp', 'N/A')}\n"
71
- # # formatted += f"Last Visit Date: {result.get('lastvisitdt', 'N/A')}\n"
72
-
73
- # # allergies = result.get('allergies') or ['None known']
74
- # # formatted += f"Allergies: **{', '.join(allergies)}**\n"
75
- # # formatted += f"Social History: {result.get('social_history', 'Not specified')}\n"
76
-
77
- # # # Reason for visit
78
- # # formatted += f"\n--- REASON FOR VISIT ---\n"
79
- # # formatted += f"Chief Complaint: **{result.get('chief_complaint', 'Not specified')}**\n"
80
-
81
- # # # Past medical history
82
- # # formatted += f"\n--- PAST MEDICAL HISTORY ---\n"
83
- # # past_history = result.get('past_medical_history') or ['None']
84
- # # for item in past_history:
85
- # # formatted += f"{item}\n"
86
-
87
- # # # Vitals
88
- # # formatted += f"\n--- CURRENT VITALS ---\n"
89
- # # vitals = result.get('vitals', {})
90
- # # formatted += f"BP: {vitals.get('BP', 'N/A')}\n"
91
- # # formatted += f"Temp: {vitals.get('Temp', 'N/A')}\n"
92
- # # formatted += f"SpO2: {vitals.get('SpO2', 'N/A')}\n"
93
- # # formatted += f"Height: {vitals.get('Height', 'N/A')}\n"
94
- # # formatted += f"Weight: {vitals.get('Weight', 'N/A')}\n"
95
- # # formatted += f"BMI: {vitals.get('BMI', 'N/A')}\n"
96
-
97
- # # # Lab & Imaging
98
- # # formatted += f"\n--- LAB & IMAGING ---\n"
99
- # # formatted += f"\n**Lab Tests Results:**\n"
100
- # # lab_results = result.get('lab_results') or []
101
- # # if lab_results:
102
- # # for lab in lab_results:
103
- # # value = lab.get('value', 'N/A')
104
- # # test_name = lab.get('name', 'Unknown Test')
105
- # # formatted += f"{test_name}: **{value}**\n"
106
- # # else:
107
- # # labtests = result.get('labtests') or ['None']
108
- # # for test in labtests:
109
- # # formatted += f"{test}\n"
110
-
111
- # # formatted += f"\n**Radiology Orders:**\n"
112
- # # radiology_orders = result.get('radiologyorders') or ['None']
113
- # # for order in radiology_orders:
114
- # # formatted += f"{order}\n"
115
-
116
- # # # Medications
117
- # # formatted += f"\n--- CURRENT MEDICATIONS ---\n"
118
- # # medications = result.get('medications') or ['None']
119
- # # for med in medications:
120
- # # if med and str(med).lower() != 'null':
121
- # # formatted += f"{med}\n"
122
-
123
- # # # Diagnoses
124
- # # formatted += f"\n--- ASSESSMENT & DIAGNOSES ---\n"
125
- # # diagnoses = result.get('diagnosis') or ['None']
126
- # # for dx in diagnoses:
127
- # # formatted += f"{dx}\n"
128
-
129
- # # # Plan
130
- # # formatted += f"\n--- PLAN ---\n"
131
- # # plan = result.get('assessment_plan', 'No plan specified')
132
- # # plan_lines = [line.strip() for line in plan.split('\n') if line.strip()]
133
- # # for line in plan_lines:
134
- # # formatted += f"{line}\n"
135
-
136
- # # # Follow-up
137
- # # formatted += f"\n--- FOLLOW-UP RECOMMENDATIONS ---\n"
138
- # # formatted += "Re-evaluate in 5-7 days if not improving\n"
139
- # # formatted += "Return immediately for worsening dyspnea or new symptoms\n"
140
-
141
- # # formatted += f"\n--- MODEL-GENERATED SUMMARY ---\n{raw_summary}\n"
142
- # # return formatted
143
-
144
-
145
- # # import datetime
146
- # # from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
147
- # # import torch
148
-
149
- # # from ai_med_extract.utils.patient_summary_utils import patient_chunk_text, flatten_to_string_list
150
-
151
- # # class PatientSummarizerAgent:
152
- # # def __init__(self):
153
- # # self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
154
- # # self.MODEL_NAME = "Falconsai/medical_summarization" # Or replace with Flan-T5
155
- # # self.tokenizer, self.model = self.load_model(self.MODEL_NAME)
156
-
157
- # # def load_model(self, model_name):
158
- # # tokenizer = AutoTokenizer.from_pretrained(model_name)
159
- # # model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
160
- # # model.eval()
161
- # # return tokenizer, model
162
-
163
- # # def build_narrative_prompt(self, patient_data):
164
- # # result = patient_data['result']
165
- # # prompt_lines = [f"Past Medical History: {', '.join(result.get('past_medical_history', []))}.\n"]
166
-
167
- # # for enc in result.get('encounters', []):
168
- # # prompt_lines.append(
169
- # # f"Encounter on {enc['visit_date']}:\n"
170
- # # f"- Chief Complaint: {enc.get('chief_complaint')}\n"
171
- # # f"- Symptoms: {enc.get('symptoms')}\n"
172
- # # f"- Diagnoses: {', '.join(enc.get('diagnosis', []))}\n"
173
- # # f"- Doctor's Notes: {enc.get('dr_notes')}\n"
174
- # # f"- Investigations: {enc.get('investigations')}\n"
175
- # # f"- Medications: {', '.join(enc.get('medications', []))}\n"
176
- # # f"- Treatment: {enc.get('treatment')}\n"
177
- # # )
178
-
179
- # # return (
180
- # # "Summarize the following clinical timeline with a narrative, assessment, plan, and possible "
181
- # # "next steps.\n\nPATIENT HISTORY:\n" + "\n".join(prompt_lines)
182
- # # )
183
-
184
- # # def generate_summary(self, prompt: str):
185
- # # inputs = self.tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024).to(self.device)
186
- # # outputs = self.model.generate(**inputs, max_new_tokens=512, num_beams=4, early_stopping=True)
187
- # # return self.tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
188
-
189
- # # def generate_clinical_summary(self, patient_data: dict) -> str:
190
- # # try:
191
- # # prompt = self.build_narrative_prompt(patient_data)
192
- # # summary = self.generate_summary(prompt)
193
- # # return self.format_clinical_output(summary, patient_data)
194
- # # except Exception as e:
195
- # # return f"❌ Error generating summary: {e}"
196
-
197
- # # def format_clinical_output(self, raw_summary: str, patient_data: dict) -> str:
198
- # # result = patient_data['result']
199
- # # now = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
200
- # # report = "\n--- CLINICAL SUMMARY ---\n"
201
- # # report += f"Generated On: {now}\n\n"
202
- # # report += f"Patient ID: {result.get('patientnumber', 'N/A')}\n"
203
- # # report += f"Age/Sex: {result.get('agey', 'N/A')} / {result.get('gender', 'N/A')}\n"
204
- # # report += f"Allergies: {', '.join(result.get('allergies', ['None']))}\n"
205
- # # report += f"\n--- MODEL-GENERATED SUMMARY ---\n"
206
- # # report += raw_summary + "\n"
207
- # # return report
208
-
209
-
210
- # import datetime
211
- # import torch
212
- # import warnings
213
- # import re
214
- # from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
215
- # from textwrap import fill
216
-
217
- # # Suppress non-critical warnings
218
- # warnings.filterwarnings("ignore", category=UserWarning)
219
-
220
- # class PatientSummarizerAgent:
221
- # def __init__(self):
222
- # # Device configuration
223
- # self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
224
- # if self.device == 'cuda':
225
- # torch.cuda.empty_cache()
226
- # print(f"✅ Using device for tensors: {self.device}")
227
-
228
- # # Model configuration
229
- # self.MODEL_NAME = "Falconsai/medical_summarization"
230
- # try:
231
- # self.tokenizer, self.model = self.load_model(self.MODEL_NAME, self.device)
232
- # print(f"✅ Model '{self.MODEL_NAME}' loaded successfully.")
233
- # except RuntimeError as e:
234
- # print(f"❌ Failed to load model: {e}")
235
- # exit(1)
236
-
237
- # def load_model(self, model_name: str, device: str):
238
- # """
239
- # Loads the medical summarization model and tokenizer.
240
- # """
241
- # try:
242
- # print(f"🔄 Loading model: {model_name}...")
243
- # tokenizer = AutoTokenizer.from_pretrained(model_name)
244
- # model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
245
- # if device == 'cuda':
246
- # model = model.half() # FP16 for GPU
247
- # model.to(device)
248
- # model.eval()
249
- # print(f"✅ Model '{model_name}' loaded and set to evaluation mode.")
250
- # return tokenizer, model
251
- # except Exception as e:
252
- # raise RuntimeError(f"Model loading failed: {str(e)}")
253
-
254
- # def summarize_chunk(self, text: str) -> str:
255
- # """
256
- # Summarizes a single text chunk using the model.
257
- # """
258
- # try:
259
- # inputs = self.tokenizer(
260
- # text,
261
- # return_tensors="pt",
262
- # truncation=True,
263
- # max_length=1024
264
- # ).to(self.device)
265
-
266
- # outputs = self.model.generate(
267
- # **inputs,
268
- # max_new_tokens=400,
269
- # num_beams=4,
270
- # temperature=0.7,
271
- # early_stopping=True
272
- # )
273
- # summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
274
- # return summary
275
- # except Exception as e:
276
- # return f"Error summarizing chunk: {str(e)}"
277
-
278
- # def generate_clinical_summary(self, patient_data: dict) -> str:
279
- # """
280
- # End-to-end method to generate a comprehensive clinical summary.
281
- # Mimics the logic flow of the reference script: narrative, assessment, pathway, formatting, and evaluation.
282
- # """
283
- # print("✨ Generating clinical summary using Falconsai/medical_summarization...")
284
- # try:
285
- # # Step 1: Build a chronological narrative from all encounters
286
- # narrative_history = self.build_chronological_narrative(patient_data)
287
- # print(f"\n--- Prompt Sent to Model (truncated) ---\n{fill(narrative_history, width=80)[:1000]}...\n")
288
-
289
- # # Step 2: Summarize in chunks if needed
290
- # chunks = self.chunk_text(narrative_history, chunk_size=1500)
291
- # chunk_summaries = [self.summarize_chunk(chunk) for chunk in chunks]
292
- # raw_summary_text = " ".join(chunk_summaries)
293
-
294
- # print(f"\n--- Raw Model Output ---\n{fill(raw_summary_text, width=80)}\n")
295
-
296
- # # Step 3: Format into structured clinical report
297
- # formatted_report = self.format_clinical_output(raw_summary_text, patient_data)
298
-
299
- # # Step 4: Simulated guideline evaluation
300
- # evaluation_report = self.evaluate_summary_against_guidelines(raw_summary_text, patient_data)
301
-
302
- # # Step 5: Combine final output
303
- # final_output = (
304
- # f"\n{'='*80}\n"
305
- # f" FINAL CLINICAL SUMMARY REPORT\n"
306
- # f"{'='*80}\n"
307
- # f"{formatted_report}\n\n"
308
- # f"{'='*80}\n"
309
- # f" SIMULATED EVALUATION REPORT\n"
310
- # f"{'='*80}\n"
311
- # f"{evaluation_report}"
312
- # )
313
- # return final_output
314
-
315
- # except Exception as e:
316
- # print(f"❌ Error during summary generation: {e}")
317
- # import traceback
318
- # traceback.print_exc()
319
- # return f"Error generating summary: {str(e)}"
320
-
321
- # def build_chronological_narrative(self, patient_data: dict) -> str:
322
- # """
323
- # Builds a chronological narrative from multi-encounter patient history.
324
- # """
325
- # result = patient_data["result"]
326
- # narrative = []
327
-
328
- # # Past Medical History
329
- # narrative.append(f"Past Medical History: {', '.join(result.get('past_medical_history', []))}.")
330
-
331
- # # Social History
332
- # social = result.get('social_history', 'Not specified.')
333
- # narrative.append(f"Social History: {social}.")
334
-
335
- # # Allergies
336
- # allergies = ', '.join(result.get('allergies', ['None']))
337
- # narrative.append(f"Allergies: {allergies}.")
338
-
339
- # # Loop through encounters chronologically
340
- # for enc in result.get("encounters", []):
341
- # encounter_str = (
342
- # f"Encounter on {enc['visit_date']}: "
343
- # f"Chief Complaint: '{enc['chief_complaint']}'. "
344
- # f"Symptoms: {enc.get('symptoms', 'None reported')}. "
345
- # f"Diagnosis: {', '.join(enc['diagnosis'])}. "
346
- # f"Doctor's Notes: {enc['dr_notes']}. "
347
- # )
348
- # if enc.get('vitals'):
349
- # encounter_str += f"Vitals: {', '.join([f'{k}: {v}' for k, v in enc['vitals'].items()])}. "
350
- # if enc.get('lab_results'):
351
- # encounter_str += f"Labs: {', '.join([f'{k}: {v}' for k, v in enc['lab_results'].items()])}. "
352
- # if enc.get('medications'):
353
- # encounter_str += f"Medications: {', '.join(enc['medications'])}. "
354
- # if enc.get('treatment'):
355
- # encounter_str += f"Treatment: {enc['treatment']}."
356
- # narrative.append(encounter_str)
357
-
358
- # return "\n".join(narrative)
359
-
360
- # def chunk_text(self, text: str, chunk_size: int = 1500) -> list:
361
- # """
362
- # Splits a long text into overlapping chunks for processing.
363
- # """
364
- # words = text.split()
365
- # chunks = []
366
- # for i in range(0, len(words), chunk_size):
367
- # chunk = " ".join(words[i:i + chunk_size])
368
- # chunks.append(chunk)
369
- # return chunks
370
-
371
- # def format_clinical_output(self, raw_summary: str, patient_data: dict) -> str:
372
- # """
373
- # Formats the raw AI-generated summary into a structured, doctor-friendly report.
374
- # """
375
- # result = patient_data["result"]
376
- # last_encounter = result["encounters"][-1] if result.get("encounters") else result
377
-
378
- # # Consolidate active problems
379
- # all_diagnoses_raw = set(result.get('past_medical_history', []))
380
- # for enc in result.get('encounters', []):
381
- # all_diagnoses_raw.update(enc.get('diagnosis', []))
382
- # cleaned_diagnoses = sorted({
383
- # re.sub(r'\s*\([^)]*\)', '', dx).strip() for dx in all_diagnoses_raw
384
- # })
385
-
386
- # # Consolidate current medications
387
- # all_medications = set()
388
- # for enc in result.get('encounters', []):
389
- # all_medications.update(enc.get('medications', []))
390
- # current_meds = sorted(all_medications)
391
-
392
- # # Report Header
393
- # report = "\n==============================================\n"
394
- # report += " CLINICAL SUMMARY REPORT\n"
395
- # report += "==============================================\n"
396
- # report += f"Generated On: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"
397
-
398
- # # Patient Overview
399
- # report += "\n--- PATIENT OVERVIEW ---\n"
400
- # report += f"Name: {result.get('patientname', 'Unknown')}\n"
401
- # report += f"Patient ID: {result.get('patientnumber', 'Unknown')}\n"
402
- # gender = result.get('gender', 'Unknown')
403
- # report += f"Age/Sex: {result.get('agey', 'Unknown')} {gender[0] if gender != 'Unknown' else 'U'}\n"
404
- # report += f"Allergies: {', '.join(result.get('allergies', ['None']))}\n"
405
-
406
- # # Social History
407
- # report += "\n--- SOCIAL HISTORY ---\n"
408
- # report += fill(result.get('social_history', 'Not specified.'), width=80) + "\n"
409
-
410
- # # Immediate Attention
411
- # report += "\n--- IMMEDIATE ATTENTION (Most Recent Encounter) ---\n"
412
- # report += f"Date of Event: {last_encounter.get('visit_date', 'Unknown')}\n"
413
- # report += f"Chief Complaint: {last_encounter.get('chief_complaint', 'Not specified')}\n"
414
- # if last_encounter.get('vitals'):
415
- # vitals_str = ', '.join([f'{k}: {v}' for k, v in last_encounter['vitals'].items()])
416
- # report += f"Vitals: {vitals_str}\n"
417
- # critical_diagnoses = [
418
- # dx for dx in last_encounter.get('diagnosis', [])
419
- # if any(kw in dx.lower() for kw in ['acute', 'new onset', 'fall', 'afib', 'kidney injury'])
420
- # ]
421
- # if critical_diagnoses:
422
- # report += f"Critical New Diagnoses: {', '.join(critical_diagnoses)}\n"
423
- # report += f"Doctor's Notes: {last_encounter.get('dr_notes', 'N/A')}\n"
424
-
425
- # # Active Problem List
426
- # report += "\n--- ACTIVE PROBLEM LIST (Consolidated) ---\n"
427
- # report += "\n".join(f"- {dx}" for dx in cleaned_diagnoses) + "\n"
428
-
429
- # # Current Medications
430
- # report += "\n--- CURRENT MEDICATION LIST (Consolidated) ---\n"
431
- # report += "\n".join(f"- {med}" for med in current_meds) + "\n"
432
-
433
- # # Procedures
434
- # procedures = set()
435
- # for enc in result.get('encounters', []):
436
- # if 'treatment' in enc and 'PCI' in enc['treatment']:
437
- # procedures.add(enc['treatment'])
438
- # if procedures:
439
- # report += "\n--- PROCEDURES & SURGERIES ---\n"
440
- # report += "\n".join(f"- {proc}" for proc in sorted(procedures)) + "\n"
441
-
442
- # # AI-Generated Narrative
443
- # report += "\n--- AI-GENERATED CLINICAL NARRATIVE ---\n"
444
- # report += fill(raw_summary, width=80) + "\n"
445
-
446
- # # Placeholder sections if not in model output
447
- # if "Assessment and Plan" not in raw_summary:
448
- # report += "\n--- ASSESSMENT, PLAN AND NEXT STEPS (AI-Generated) ---\n"
449
- # report += "The model did not generate a structured assessment and plan. Please review clinical context.\n"
450
-
451
- # if "Clinical Pathway" not in raw_summary:
452
- # report += "\n--- CLINICAL PATHWAY (AI-Generated) ---\n"
453
- # report += "No clinical pathway was generated. Consider next steps based on active issues.\n"
454
-
455
- # return report
456
-
457
- # def evaluate_summary_against_guidelines(self, summary_text: str, patient_data: dict) -> str:
458
- # """
459
- # Simulated evaluation of summary against clinical guidelines.
460
- # """
461
- # result = patient_data["result"]
462
- # last_enc = result["encounters"][-1] if result.get("encounters") else {}
463
-
464
- # summary_lower = summary_text.lower()
465
- # evaluation = (
466
- # "\n==============================================\n"
467
- # " AI SUMMARY EVALUATION & GUIDELINE CHECK\n"
468
- # "==============================================\n"
469
- # )
470
-
471
- # # Keyword-based accuracy
472
- # critical_keywords = [
473
- # "fall", "dizziness", "atrial fibrillation", "afib", "rvr", "kidney", "ckd",
474
- # "diabetes", "anticoagulation", "warfarin", "aspirin", "statin", "metformin",
475
- # "gout", "angina", "pci", "bph", "hypertension", "metoprolol", "clopidogrel"
476
- # ]
477
- # found = [kw for kw in critical_keywords if kw in summary_lower]
478
- # score = (len(found) / len(critical_keywords)) * 10
479
- # evaluation += f"\n1. KEYWORD ACCURACY SCORE: {score:.1f}/10\n"
480
- # evaluation += f" - Found {len(found)} out of {len(critical_keywords)} critical concepts.\n"
481
-
482
- # # Guideline checks
483
- # evaluation += "\n2. CLINICAL GUIDELINE COMMENTARY (SIMULATED):\n"
484
-
485
- # has_afib = any("atrial fibrillation" in dx.lower() for dx in last_enc.get('diagnosis', []))
486
- # on_anticoag = any("warfarin" in med.lower() or "apixaban" in med.lower() for med in last_enc.get('medications', []))
487
- # if has_afib:
488
- # evaluation += " - ✅ Patient with Atrial Fibrillation is on anticoagulation.\n" if on_anticoag \
489
- # else " - ❌ Atrial Fibrillation present but no anticoagulant prescribed.\n"
490
-
491
- # has_mi = any("myocardial infarction" in hx.lower() for hx in result.get('past_medical_history', []))
492
- # on_statin = any("atorvastatin" in med.lower() or "statin" in med.lower() for med in last_enc.get('medications', []))
493
- # if has_mi:
494
- # evaluation += " - ✅ Patient with MI history is on statin therapy.\n" if on_statin \
495
- # else " - ❌ Patient with MI history is not on statin therapy.\n"
496
-
497
- # has_aki = any("acute kidney injury" in dx.lower() for dx in last_enc.get('diagnosis', []))
498
- # acei_held = "hold" in last_enc.get('dr_notes', '').lower() and "lisinopril" in last_enc.get('dr_notes', '')
499
- # if has_aki:
500
- # evaluation += " - ✅ AKI noted and ACE inhibitor was appropriately held.\n" if acei_held \
501
- # else " - ⚠️ AKI present but ACE inhibitor not documented as held.\n"
502
-
503
- # evaluation += (
504
- # "\nDisclaimer: This is a simulated evaluation and not a substitute for clinical judgment.\n"
505
- # )
506
- # return evaluation
507
-
508
-
509
- # import datetime
510
- # import torch
511
- # import warnings
512
- # import re
513
- # import json
514
- # from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
515
- # from textwrap import fill
516
-
517
- # # Suppress non-critical warnings
518
- # warnings.filterwarnings("ignore", category=UserWarning)
519
-
520
- # class PatientSummarizerAgent:
521
- # def __init__(self, model_name: str = "Falconsai/medical_summarization", model_type: str = "seq2seq", device: str = None):
522
- # self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
523
- # if self.device == 'cuda':
524
- # torch.cuda.empty_cache()
525
- # print(f"✅ Using device for tensors: {self.device}")
526
-
527
- # self.model_name = model_name
528
- # if model_type != "seq2seq":
529
- # raise ValueError(f"Unsupported model_type: {model_type}. Only 'seq2seq' is supported.")
530
- # try:
531
- # self.tokenizer, self.model = self.load_model(model_name, self.device)
532
- # print(f"✅ Model '{model_name}' loaded successfully.")
533
- # except RuntimeError as e:
534
- # print(f"❌ Failed to load model: {e}")
535
- # raise
536
-
537
- # def load_model(self, model_name: str, device: str):
538
- # try:
539
- # print(f"🔄 Loading model: {model_name}...")
540
- # tokenizer = AutoTokenizer.from_pretrained(model_name)
541
- # model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
542
- # if device == 'cuda':
543
- # model = model.half()
544
- # model.to(device)
545
- # model.eval()
546
- # if tokenizer.pad_token is None:
547
- # tokenizer.pad_token = tokenizer.eos_token
548
- # return tokenizer, model
549
- # except Exception as e:
550
- # raise RuntimeError(f"Model loading failed: {str(e)}")
551
-
552
- # def summarize_chunk(self, text: str) -> str:
553
- # try:
554
- # inputs = self.tokenizer(
555
- # text,
556
- # return_tensors="pt",
557
- # truncation=True,
558
- # max_length=1024,
559
- # padding=True
560
- # ).to(self.device)
561
- # outputs = self.model.generate(
562
- # **inputs,
563
- # max_new_tokens=400,
564
- # num_beams=4,
565
- # temperature=0.7,
566
- # early_stopping=True
567
- # )
568
- # summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
569
- # return summary
570
- # except Exception as e:
571
- # return f"Error summarizing chunk: {str(e)}"
572
-
573
- # def parse_vitals(self, vitals_list):
574
- # vitals_dict = {"BP": "N/A", "HR": "N/A", "Temp": "N/A", "SpO2": "N/A", "Height": "N/A", "Weight": "N/A", "BMI": "N/A"}
575
- # if not isinstance(vitals_list, list):
576
- # return vitals_dict
577
- # for item in vitals_list:
578
- # if not isinstance(item, dict):
579
- # continue
580
- # name = item.get("name", "").lower()
581
- # value = item.get("value", "N/A")
582
- # if "bp(sys)" in name:
583
- # dia = next((v["value"] for v in vitals_list if "bp(dia)" in v.get("name", "").lower()), "N/A")
584
- # vitals_dict["BP"] = f"{value}/{dia}" if value != "N/A" and dia != "N/A" else "N/A"
585
- # elif "pulse" in name or "hr" in name:
586
- # vitals_dict["HR"] = value
587
- # elif "temp" in name:
588
- # vitals_dict["Temp"] = value
589
- # elif "spo2" in name or "o2 sat" in name:
590
- # vitals_dict["SpO2"] = value
591
- # elif "height" in name:
592
- # vitals_dict["Height"] = value
593
- # elif "weight" in name:
594
- # vitals_dict["Weight"] = value
595
- # elif "bmi" in name:
596
- # vitals_dict["BMI"] = value
597
- # return vitals_dict
598
-
599
- # def build_chronological_narrative(self, patient_data: dict) -> str:
600
- # narrative = []
601
- # result = patient_data.get("result", {})
602
- # flattened = patient_data.get("flattened", [])
603
-
604
- # # Extract basic patient info
605
- # narrative.append(f"Patient ID: {result.get('patientnumber', 'Unknown')}")
606
- # narrative.append(f"Age/Sex: {result.get('agey', 'Unknown')} {result.get('gender', 'Unknown')}")
607
- # narrative.append(f"Allergies: {', '.join(result.get('allergies', ['None known']))}")
608
- # narrative.append(f"Social History: {result.get('social_history', 'Not specified')}")
609
- # narrative.append(f"Past Medical History: {', '.join(result.get('past_medical_history', ['None']))}")
610
-
611
- # # Parse Chartsummarydtl from flattened
612
- # encounters = []
613
- # for item in flattened:
614
- # if item.startswith("Chartsummarydtl:"):
615
- # try:
616
- # chart_data_str = item.split("Chartsummarydtl:")[1].strip()
617
- # chart_data = json.loads(chart_data_str)
618
- # if isinstance(chart_data, list):
619
- # encounters.extend(chart_data)
620
- # except (IndexError, json.JSONDecodeError, ValueError) as e:
621
- # print(f"Failed to parse Chartsummarydtl: {e}")
622
- # continue
623
-
624
- # if not encounters:
625
- # narrative.append("No encounter data available.")
626
- # return "\n".join(narrative)
627
-
628
- # # Sort encounters by date
629
- # encounters = sorted(encounters, key=lambda x: x.get('chartdate', ''), reverse=True)
630
-
631
- # for enc in encounters:
632
- # vitals = self.parse_vitals(enc.get('vitals', []))
633
- # encounter_str = f"Encounter on {enc.get('chartdate', 'Unknown')}: "
634
- # encounter_str += f"Chief Complaint: {result.get('chief_complaint', 'Not specified')}. "
635
- # encounter_str += f"Vitals: BP: {vitals['BP']}, HR: {vitals['HR']}, SpO2: {vitals['SpO2']}, Temp: {vitals['Temp']}, Height: {vitals['Height']}, Weight: {vitals['Weight']}, BMI: {vitals['BMI']}. "
636
- # encounter_str += f"Diagnosis: {', '.join(enc.get('diagnosis', ['None']))}. "
637
- # # Deduplicate medications
638
- # medications = list(set(enc.get('medications', ['None'])))
639
- # encounter_str += f"Medications: {', '.join(med.strip(' ||') for med in medications)}. "
640
- # encounter_str += f"Lab Tests: {', '.join(enc.get('labtests', ['None']))}. "
641
- # radiology = [r['name'] for r in enc.get('radiologyorders', [])]
642
- # encounter_str += f"Radiology Orders: {', '.join(radiology) if radiology else 'None'}. "
643
- # encounter_str += f"Allergies: {', '.join(enc.get('allergies', ['None']))}."
644
- # narrative.append(encounter_str)
645
-
646
- # return "\n".join(narrative)
647
-
648
- # def chunk_text(self, text: str, chunk_size: int = 1500) -> list:
649
- # words = text.split()
650
- # chunks = []
651
- # for i in range(0, len(words), chunk_size):
652
- # chunk = " ".join(words[i:i + chunk_size])
653
- # chunks.append(chunk)
654
- # return chunks if chunks else [text]
655
-
656
- # def generate_clinical_summary(self, patient_data: dict) -> str:
657
- # print(f"✨ Generating clinical summary using model: {self.model_name}...")
658
- # try:
659
- # narrative_history = self.build_chronological_narrative(patient_data)
660
- # print(f"\n--- Prompt Sent to Model (truncated) ---\n{fill(narrative_history, width=80)[:1000]}...")
661
-
662
- # chunks = self.chunk_text(narrative_history, chunk_size=1500)
663
- # chunk_summaries = [self.summarize_chunk(chunk) for chunk in chunks]
664
- # raw_summary_text = " ".join(chunk_summaries)
665
- # print(f"\n--- Raw Model Output ---\n{fill(raw_summary_text, width=80)}")
666
-
667
- # formatted_report = self.format_clinical_output(raw_summary_text, patient_data)
668
- # evaluation_report = self.evaluate_summary_against_guidelines(raw_summary_text, patient_data)
669
-
670
- # final_output = (
671
- # f"\n{'='*80}\n"
672
- # f" FINAL CLINICAL SUMMARY REPORT\n"
673
- # f"{'='*80}\n"
674
- # f"{formatted_report}\n"
675
- # f"{'='*80}\n"
676
- # f" SIMULATED EVALUATION REPORT\n"
677
- # f"{'='*80}\n"
678
- # f"{evaluation_report}"
679
- # )
680
- # return final_output
681
- # except Exception as e:
682
- # print(f"❌ Error during summary generation: {e}")
683
- # import traceback
684
- # traceback.print_exc()
685
- # return f"Error generating summary: {str(e)}"
686
-
687
- # def format_clinical_output(self, raw_summary: str, patient_data: dict) -> str:
688
- # result = patient_data.get("result", {})
689
- # flattened = patient_data.get("flattened", [])
690
- # encounters = []
691
- # for item in flattened:
692
- # if item.startswith("Chartsummarydtl:"):
693
- # try:
694
- # chart_data = json.loads(item.split("Chartsummarydtl:")[1].strip())
695
- # if isinstance(chart_data, list):
696
- # encounters.extend(chart_data)
697
- # except (IndexError, json.JSONDecodeError, ValueError) as e:
698
- # print(f"Failed to parse Chartsummarydtl: {e}")
699
- # continue
700
- # last_encounter = sorted(encounters, key=lambda x: x.get('chartdate', ''), reverse=True)[0] if encounters else {}
701
-
702
- # all_diagnoses = set(result.get('past_medical_history', []))
703
- # all_medications = set()
704
- # for enc in encounters:
705
- # all_diagnoses.update(enc.get('diagnosis', []))
706
- # all_medications.update(med.strip(' ||') for med in enc.get('medications', []))
707
- # cleaned_diagnoses = sorted({re.sub(r'\s*\([^)]*\)', '', dx).strip() for dx in all_diagnoses})
708
- # current_meds = sorted(all_medications)
709
-
710
- # report = (
711
- # "\n==============================================\n"
712
- # " CLINICAL SUMMARY REPORT\n"
713
- # "==============================================\n"
714
- # f"Generated On: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"
715
- # )
716
- # report += f"Name: {result.get('patientname', 'Unknown')}\n"
717
- # report += f"Patient ID: {result.get('patientnumber', 'Unknown')}\n"
718
- # gender = result.get('gender', 'Unknown')
719
- # report += f"Age/Sex: {result.get('agey', 'Unknown')} {gender[0] if gender != 'Unknown' else 'U'}\n"
720
- # report += f"Allergies: {', '.join(result.get('allergies', ['None known']))}\n"
721
-
722
- # report += "\n--- SOCIAL HISTORY ---\n"
723
- # report += fill(result.get('social_history', 'Not specified.'), width=80) + "\n"
724
-
725
- # report += "\n--- IMMEDIATE ATTENTION (Most Recent Encounter) ---\n"
726
- # report += f"Date of Event: {last_encounter.get('chartdate', 'Unknown')}\n"
727
- # report += f"Chief Complaint: {result.get('chief_complaint', 'Not specified')}\n"
728
- # if last_encounter.get('vitals'):
729
- # vitals = self.parse_vitals(last_encounter['vitals'])
730
- # vitals_str = ', '.join([f"{k}: {v}" for k, v in vitals.items()])
731
- # report += f"Vitals: {vitals_str}\n"
732
- # critical_diagnoses = [
733
- # dx for dx in last_encounter.get('diagnosis', [])
734
- # if any(kw in dx.lower() for kw in ['acute', 'new onset', 'fall', 'afib', 'kidney injury'])
735
- # ]
736
- # if critical_diagnoses:
737
- # report += f"Critical New Diagnoses: {', '.join(critical_diagnoses)}\n"
738
- # report += f"Doctor's Notes: {last_encounter.get('dr_notes', 'N/A')}\n"
739
-
740
- # report += "\n--- ACTIVE PROBLEM LIST (Consolidated) ---\n"
741
- # report += "\n".join(f"- {dx}" for dx in cleaned_diagnoses) + "\n" if cleaned_diagnoses else "- None\n"
742
-
743
- # report += "\n--- CURRENT MEDICATION LIST (Consolidated) ---\n"
744
- # report += "\n".join(f"- {med}" for med in current_meds) + "\n" if current_meds else "- None\n"
745
-
746
- # report += "\n--- AI-GENERATED CLINICAL NARRATIVE ---\n"
747
- # report += fill(raw_summary, width=80) + "\n"
748
-
749
- # report += "\n--- ASSESSMENT, PLAN AND NEXT STEPS (AI-Generated) ---\n"
750
- # report += "The model did not generate a structured assessment and plan. Please review clinical context.\n"
751
-
752
- # report += "\n--- CLINICAL PATHWAY (AI-Generated) ---\n"
753
- # report += "No clinical pathway was generated. Consider next steps based on active issues.\n"
754
-
755
- # return report
756
-
757
- # def evaluate_summary_against_guidelines(self, summary_text: str, patient_data: dict) -> str:
758
- # result = patient_data.get("result", {})
759
- # flattened = patient_data.get("flattened", [])
760
- # encounters = []
761
- # for item in flattened:
762
- # if item.startswith("Chartsummarydtl:"):
763
- # try:
764
- # chart_data = json.loads(item.split("Chartsummarydtl:")[1].strip())
765
- # if isinstance(chart_data, list):
766
- # encounters.extend(chart_data)
767
- # except (IndexError, json.JSONDecodeError, ValueError):
768
- # continue
769
- # last_enc = sorted(encounters, key=lambda x: x.get('chartdate', ''), reverse=True)[0] if encounters else {}
770
-
771
- # summary_lower = summary_text.lower()
772
- # evaluation = (
773
- # "\n==============================================\n"
774
- # " AI SUMMARY EVALUATION & GUIDELINE CHECK\n"
775
- # "==============================================\n"
776
- # )
777
-
778
- # critical_keywords = [
779
- # "metrogyl", "rantac", "ultrasound", "egg allergy", "blood pressure", "pulse", "spo2",
780
- # "bmi", "temperature", "pain", "height", "weight"
781
- # ]
782
- # found = [kw for kw in critical_keywords if kw in summary_lower]
783
- # score = (len(found) / len(critical_keywords)) * 10
784
- # evaluation += f"\n1. KEYWORD ACCURACY SCORE: {score:.1f}/10\n"
785
- # evaluation += f" - Found {len(found)} out of {len(critical_keywords)} critical concepts.\n"
786
-
787
- # evaluation += "\n2. CLINICAL GUIDELINE COMMENTARY (SIMULATED):\n"
788
- # has_allergy = any("egg allergy" in a.lower() for a in last_enc.get('allergies', []))
789
- # if has_allergy:
790
- # evaluation += " - ✅ Egg allergy noted in the patient record.\n"
791
-
792
- # has_medications = bool(last_enc.get('medications', []))
793
- # if has_medications:
794
- # medications = list(set(med.strip(' ||') for med in last_enc.get('medications', [])))
795
- # evaluation += f" - ✅ Medications prescribed: {', '.join(medications)}.\n"
796
- # else:
797
- # evaluation += " - ⚠️ No medications prescribed in the latest encounter.\n"
798
-
799
- # has_radiology = bool(last_enc.get('radiologyorders', []))
800
- # if has_radiology:
801
- # radiology = [r['name'] for r in last_enc['radiologyorders']]
802
- # evaluation += f" - ✅ Radiology orders issued: {', '.join(radiology)}.\n"
803
-
804
- # evaluation += "\nDisclaimer: This is a simulated evaluation and not a substitute for clinical judgment.\n"
805
- # return evaluation
806
-
807
-
808
- # import torch
809
- # import warnings
810
- # import logging
811
- # import json
812
- # import requests
813
- # from typing import List, Dict, Union, Optional
814
- # from flask import Flask, request, jsonify
815
- # from transformers import (
816
- # AutoTokenizer,
817
- # AutoModelForSeq2SeqLM,
818
- # AutoModelForCausalLM,
819
- # AutoConfig
820
- # )
821
-
822
- # # -----------------------------
823
- # # Setup
824
- # # -----------------------------
825
- # warnings.filterwarnings("ignore", category=UserWarning)
826
- # logging.basicConfig(level=logging.INFO)
827
- # logger = logging.getLogger(__name__)
828
-
829
- # # -----------------------------
830
- # # Enhanced Patient Summarizer Agent
831
- # # -----------------------------
832
- # class PatientSummarizerAgent:
833
- # def __init__(
834
- # self,
835
- # model_name: str = "Falconsai/medical_summarization",
836
- # model_type: Optional[str] = None,
837
- # device: Optional[str] = None,
838
- # max_input_tokens: int = 2048,
839
- # max_output_tokens: int = 512
840
- # ):
841
- # self.model_name = model_name
842
- # self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
843
- # self.max_input_tokens = max_input_tokens
844
- # self.max_output_tokens = max_output_tokens
845
- # logger.info(f"Loading model '{model_name}' on {self.device}...")
846
-
847
- # config = AutoConfig.from_pretrained(model_name)
848
- # if config.model_type in ["t5", "bart", "mbart", "longt5", "led"]:
849
- # self.model_type = "seq2seq"
850
- # self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
851
- # else:
852
- # self.model_type = "causal"
853
- # self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)
854
-
855
- # self.tokenizer = AutoTokenizer.from_pretrained(model_name)
856
- # if self.tokenizer.pad_token is None:
857
- # self.tokenizer.pad_token = self.tokenizer.eos_token
858
-
859
- # logger.info("Model loaded successfully.")
860
-
861
- # def _parse_patient_data(self, data: Union[List[str], Dict]) -> Dict:
862
- # """Convert flattened list or dict to key-value dict."""
863
- # if isinstance(data, dict):
864
- # return data
865
- # elif isinstance(data, list):
866
- # patient_dict = {}
867
- # for entry in data:
868
- # if ":" in entry:
869
- # parts = entry.split(":", 1)
870
- # key = parts[0].strip()
871
- # value = parts[1].strip() if len(parts) > 1 else "N/A"
872
- # patient_dict[key] = value
873
- # return patient_dict
874
- # else:
875
- # raise ValueError("Patient data must be a dict or a list of 'key: value' strings.")
876
-
877
- # def _build_prompt(self, patient_info: Dict) -> str:
878
- # """Build a rich, instructive prompt for clinical reasoning."""
879
- # patient_details = "\n".join([f"{k}: {v}" for k, v in patient_info.items() if v not in ["N/A", ""]])
880
-
881
- # prompt = (
882
- # "You are an AI clinical assistant. Analyze the patient data below and generate a structured, "
883
- # "professional summary for use by physicians. Focus on:\n"
884
- # "1. Patient Overview (age, gender, key identifiers)\n"
885
- # "2. Key Medical History (PMH, allergies, medications)\n"
886
- # "3. Vital Sign Trends (BP, HR, weight, SpO2) — highlight changes over time\n"
887
- # "4. Assessment (possible conditions based on data)\n"
888
- # "5. Recommendations (labs, imaging, referrals, medication review)\n\n"
889
-
890
- # "Rules:\n"
891
- # "- Only use information provided. Do not invent details.\n"
892
- # "- If a value is increasing (e.g., BP), flag it as a concern.\n"
893
- # "- If a medication is repeated across visits, assume chronic use.\n"
894
- # "- If a test (e.g., ultrasound) is ordered repeatedly without result, recommend follow-up.\n"
895
- # "- Use concise, professional language.\n\n"
896
-
897
- # "--- PATIENT DATA ---\n"
898
- # f"{patient_details}\n\n"
899
-
900
- # "Provide the summary in this format:\n"
901
- # "Patient Overview:\n"
902
- # "Medical History:\n"
903
- # "Vital Trends:\n"
904
- # "Assessment:\n"
905
- # "Recommendations:"
906
- # )
907
- # return prompt
908
-
909
- # def generate_clinical_summary(self, patient_data: Union[List[str], Dict]) -> str:
910
- # """Generate a clinical summary with error handling."""
911
- # try:
912
- # patient_info = self._parse_patient_data(patient_data)
913
- # prompt = self._build_prompt(patient_info)
914
-
915
- # inputs = self.tokenizer(
916
- # prompt,
917
- # return_tensors="pt",
918
- # truncation=True,
919
- # max_length=self.max_input_tokens,
920
- # padding=True
921
- # ).to(self.device)
922
-
923
- # if self.model_type == "seq2seq":
924
- # outputs = self.model.generate(
925
- # **inputs,
926
- # max_new_tokens=self.max_output_tokens,
927
- # num_beams=4,
928
- # temperature=0.7,
929
- # top_p=0.9,
930
- # do_sample=True
931
- # )
932
- # else:
933
- # outputs = self.model.generate(
934
- # **inputs,
935
- # max_new_tokens=self.max_output_tokens,
936
- # temperature=0.7,
937
- # top_p=0.9,
938
- # do_sample=True,
939
- # pad_token_id=self.tokenizer.eos_token_id
940
- # )
941
-
942
- # summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
943
- # return summary.strip()
944
-
945
- # except Exception as e:
946
- # logger.error(f"Error during summary generation: {str(e)}")
947
- # return "Error: Failed to generate clinical summary due to model processing error."
948
-
949
- # # agent.py partially working
950
- # import torch
951
- # import warnings
952
- # import logging
953
- # from typing import List, Dict, Union, Optional
954
- # from transformers import (
955
- # AutoTokenizer,
956
- # AutoModelForSeq2SeqLM,
957
- # AutoModelForCausalLM,
958
- # AutoConfig
959
- # )
960
-
961
- # warnings.filterwarnings("ignore", category=UserWarning)
962
- # logging.basicConfig(level=logging.INFO)
963
- # logger = logging.getLogger(__name__)
964
-
965
- # class PatientSummarizerAgent:
966
- # def __init__(
967
- # self,
968
- # model_name: str = "Falconsai/medical_summarization",
969
- # device: Optional[str] = None,
970
- # max_input_tokens: int = 2048,
971
- # max_output_tokens: int = 512
972
- # ):
973
- # self.model_name = model_name
974
- # self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
975
- # self.max_input_tokens = max_input_tokens
976
- # self.max_output_tokens = max_output_tokens
977
-
978
- # logger.info(f"Loading model '{model_name}' on {self.device}...")
979
-
980
- # try:
981
- # config = AutoConfig.from_pretrained(model_name)
982
- # if config.model_type in ["t5", "bart", "mbart", "longt5", "led"]:
983
- # self.model_type = "seq2seq"
984
- # self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
985
- # else:
986
- # self.model_type = "causal"
987
- # self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)
988
-
989
- # self.tokenizer = AutoTokenizer.from_pretrained(model_name)
990
- # if self.tokenizer.pad_token is None:
991
- # self.tokenizer.pad_token = self.tokenizer.eos_token
992
- # if self.tokenizer.sep_token is None:
993
- # self.tokenizer.sep_token = self.tokenizer.eos_token
994
-
995
- # logger.info(f"Model '{model_name}' loaded successfully as {config.model_type}.")
996
-
997
- # except Exception as e:
998
- # logger.critical(f"Model loading failed: {str(e)}", exc_info=True)
999
- # raise RuntimeError(f"Model loading failed: {str(e)}")
1000
-
1001
- # def _parse_patient_data(self, data: Union[List[str], Dict]) -> Dict:
1002
- # """Safely parse flattened list into dict without overwriting or nesting issues."""
1003
- # if isinstance(data, dict):
1004
- # return data
1005
- # elif isinstance(data, list):
1006
- # patient_dict = {}
1007
- # for entry in data:
1008
- # if not isinstance(entry, str) or ":" not in entry:
1009
- # continue
1010
- # key, *value_parts = entry.split(":", 1)
1011
- # value = value_parts[0].strip() if value_parts else "N/A"
1012
- # key = key.strip()
1013
-
1014
- # # Skip if value is a dict repr (like "{...}")
1015
- # if value.startswith("{") and value.endswith("}"):
1016
- # continue
1017
-
1018
- # if key in patient_dict:
1019
- # if isinstance(patient_dict[key], list):
1020
- # patient_dict[key].append(value)
1021
- # else:
1022
- # patient_dict[key] = [patient_dict[key], value]
1023
- # else:
1024
- # patient_dict[key] = value
1025
-
1026
- # # Deduplicate and clean
1027
- # cleaned = {}
1028
- # for k, v in patient_dict.items():
1029
- # if isinstance(v, list):
1030
- # unique_vals = list({x for x in v if x not in ["N/A", "Unknown", ""]})
1031
- # cleaned[k] = ", ".join(unique_vals) if unique_vals else "N/A"
1032
- # else:
1033
- # cleaned[k] = v if v not in ["", "Unknown", "N/A"] else "N/A"
1034
- # return cleaned
1035
- # else:
1036
- # raise ValueError("Unsupported data format")
1037
-
1038
- # def _build_prompt(self, patient_info: Dict) -> str:
1039
- # """Build a dynamic, instructive prompt for clinical reasoning."""
1040
- # non_na_items = [
1041
- # f"{k}: {v}" for k, v in patient_info.items()
1042
- # if v not in ["N/A", "Unknown", "None known", "Stable", "Not specified", "", "None"]
1043
- # and isinstance(v, str)
1044
- # and len(v.strip()) > 1
1045
- # ]
1046
- # patient_details = "\n".join(non_na_items)
1047
-
1048
- # prompt = (
1049
- # "You are an expert AI clinical assistant. Analyze the following patient data and generate a structured, "
1050
- # "concise, and actionable summary for physicians. Use only the provided information.\n\n"
1051
- # "Include:\n"
1052
- # "1. Patient Overview (age, gender, ID)\n"
1053
- # "2. Medical History (allergies, medications, diagnosis)\n"
1054
- # "3. Vital Trends (BP, HR, SpO2, weight) — highlight changes over last 3 visits\n"
1055
- # "4. Test Trends (labs, imaging) — flag repeated orders without results\n"
1056
- # "5. Assessment (possible conditions)\n"
1057
- # "6. Recommendations (labs, imaging, referrals, med review)\n\n"
1058
- # "Rules:\n"
1059
- # "- Do not invent any information.\n"
1060
- # "- If BP is rising (e.g., 132/85 → 135/95), flag it.\n"
1061
- # "- If a medication appears in ≥2 visits, assume chronic use.\n"
1062
- # "- If a test is repeated without result, recommend follow-up.\n"
1063
- # "- Use professional, concise language.\n\n"
1064
- # "--- PATIENT DATA ---\n"
1065
- # f"{patient_details}\n\n"
1066
- # "Provide the summary in this format:\n"
1067
- # "Patient Overview:\n"
1068
- # "Medical History:\n"
1069
- # "Vital Trends:\n"
1070
- # "Test Trends:\n"
1071
- # "Assessment:\n"
1072
- # "Recommendations:"
1073
- # )
1074
- # return prompt
1075
-
1076
- # def generate_clinical_summary(self, patient_data: Union[List[str], Dict]) -> str:
1077
- # """Generate a clinical summary with full error resilience."""
1078
- # try:
1079
- # patient_info = self._parse_patient_data(patient_data)
1080
- # prompt = self._build_prompt(patient_info)
1081
-
1082
- # inputs = self.tokenizer(
1083
- # prompt,
1084
- # return_tensors="pt",
1085
- # truncation=True,
1086
- # max_length=self.max_input_tokens,
1087
- # padding=True
1088
- # ).to(self.device)
1089
-
1090
- # if self.model_type == "seq2seq":
1091
- # outputs = self.model.generate(
1092
- # **inputs,
1093
- # max_new_tokens=self.max_output_tokens,
1094
- # num_beams=4,
1095
- # temperature=0.7,
1096
- # top_p=0.9,
1097
- # do_sample=True
1098
- # )
1099
- # else:
1100
- # outputs = self.model.generate(
1101
- # **inputs,
1102
- # max_new_tokens=self.max_output_tokens,
1103
- # temperature=0.7,
1104
- # top_p=0.9,
1105
- # do_sample=True,
1106
- # pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id,
1107
- # eos_token_id=self.tokenizer.eos_token_id
1108
- # )
1109
-
1110
- # summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
1111
- # return summary.strip()
1112
-
1113
- # except Exception as e:
1114
- # logger.error(f"Summary generation failed: {str(e)}", exc_info=True)
1115
- # return (
1116
- # "Error: Failed to generate clinical summary. "
1117
- # "Please check model and input data."
1118
- # )
1119
-
1120
-
1121
-
1122
- # # agent.py working partially
1123
- # import torch
1124
- # import warnings
1125
- # import logging
1126
- # from typing import List, Dict, Union
1127
- # from transformers import (
1128
- # AutoTokenizer,
1129
- # AutoModelForSeq2SeqLM,
1130
- # AutoModelForCausalLM,
1131
- # AutoConfig
1132
- # )
1133
-
1134
- # warnings.filterwarnings("ignore", category=UserWarning)
1135
- # logging.basicConfig(level=logging.INFO)
1136
- # logger = logging.getLogger(__name__)
1137
-
1138
- # class PatientSummarizerAgent:
1139
- # def __init__(
1140
- # self,
1141
- # model_name: str = "Falconsai/medical_summarization",
1142
- # device: str = None,
1143
- # max_input_tokens: int = 2048,
1144
- # max_output_tokens: int = 512
1145
- # ):
1146
- # self.model_name = model_name
1147
- # self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
1148
- # self.max_input_tokens = max_input_tokens
1149
- # self.max_output_tokens = max_output_tokens
1150
-
1151
- # logger.info(f"Loading model '{model_name}' on {self.device}...")
1152
-
1153
- # try:
1154
- # config = AutoConfig.from_pretrained(model_name)
1155
- # if config.model_type in ["t5", "bart", "mbart"]:
1156
- # self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)
1157
- # self.model_type = "seq2seq"
1158
- # else:
1159
- # self.model = AutoModelForCausalLM.from_pretrained(model_name).to(self.device)
1160
- # self.model_type = "causal"
1161
-
1162
- # self.tokenizer = AutoTokenizer.from_pretrained(model_name)
1163
- # if self.tokenizer.pad_token is None:
1164
- # self.tokenizer.pad_token = self.tokenizer.eos_token
1165
-
1166
- # logger.info(f"Model '{model_name}' loaded successfully.")
1167
- # except Exception as e:
1168
- # logger.critical(f"Model loading failed: {str(e)}", exc_info=True)
1169
- # raise RuntimeError(f"Model loading failed: {str(e)}")
1170
-
1171
- # def generate_clinical_summary(self, patient_data: Union[List[str], str]) -> str:
1172
- # """
1173
- # Generate clinical summary directly from flattened list.
1174
- # No parsing back to dict — just join into clean text.
1175
- # """
1176
- # try:
1177
- # # Convert to single string
1178
- # if isinstance(patient_data, list):
1179
- # # Join all lines into one clean string
1180
- # patient_text = "\n".join(
1181
- # line.strip() for line in patient_data if line.strip()
1182
- # )
1183
- # elif isinstance(patient_data, str):
1184
- # patient_text = patient_data
1185
- # else:
1186
- # return "Error: Invalid input type."
1187
-
1188
- # # Build prompt
1189
- # prompt = f"""
1190
- # You are an expert AI clinical assistant. Analyze the patient data below and generate a structured,
1191
- # concise, and actionable summary for physicians. Use only the provided information.
1192
-
1193
- # Include:
1194
- # 1. Patient Overview (age, gender, ID)
1195
- # 2. Medical History (allergies, medications, diagnosis)
1196
- # 3. Vital Trends (BP, HR, SpO2, weight) — highlight changes over last 3 visits
1197
- # 4. Test Trends (labs, imaging) — flag repeated orders without results
1198
- # 5. Assessment (possible conditions)
1199
- # 6. Recommendations (labs, imaging, referrals, medication review)
1200
-
1201
- # Rules:
1202
- # - Do not invent any information.
1203
- # - If BP is rising (e.g., 132/85 → 135/95), flag it.
1204
- # - If a medication appears in multiple visits, assume chronic use.
1205
- # - If a test is repeated, recommend follow-up.
1206
- # - Use professional, concise language.
1207
-
1208
- # --- PATIENT DATA ---
1209
- # {patient_text}
1210
-
1211
- # --- SUMMARY ---
1212
- # Patient Overview:
1213
- # Medical History:
1214
- # Vital Trends:
1215
- # Test Trends:
1216
- # Assessment:
1217
- # Recommendations:""".strip()
1218
-
1219
- # # Tokenize
1220
- # inputs = self.tokenizer(
1221
- # prompt,
1222
- # return_tensors="pt",
1223
- # truncation=True,
1224
- # max_length=self.max_input_tokens,
1225
- # padding=True
1226
- # ).to(self.device)
1227
-
1228
- # # Generate
1229
- # if self.model_type == "seq2seq":
1230
- # outputs = self.model.generate(
1231
- # **inputs,
1232
- # max_new_tokens=self.max_output_tokens,
1233
- # num_beams=4,
1234
- # temperature=0.7,
1235
- # top_p=0.9,
1236
- # do_sample=True
1237
- # )
1238
- # else:
1239
- # outputs = self.model.generate(
1240
- # **inputs,
1241
- # max_new_tokens=self.max_output_tokens,
1242
- # temperature=0.7,
1243
- # top_p=0.9,
1244
- # do_sample=True,
1245
- # pad_token_id=self.tokenizer.pad_token_id or self.tokenizer.eos_token_id,
1246
- # eos_token_id=self.tokenizer.eos_token_id
1247
- # )
1248
-
1249
- # # Decode
1250
- # summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
1251
-
1252
- # # Extract only the part after "--- SUMMARY ---"
1253
- # if "--- SUMMARY ---" in summary:
1254
- # summary = summary.split("--- SUMMARY ---")[-1].strip()
1255
-
1256
- # return summary
1257
-
1258
- # except Exception as e:
1259
- # logger.error(f"Summary generation failed: {str(e)}", exc_info=True)
1260
- # return "Error: Failed to generate clinical summary."
1261
-
1262
-
1263
- # agent.py
1264
  import torch
1265
- import logging
1266
- from typing import List, Dict, Union
1267
- from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM, AutoConfig
1268
- from ai_med_extract.utils.patient_summary_utils import parse_vitals
 
1269
 
1270
- logging.basicConfig(level=logging.INFO)
1271
- logger = logging.getLogger(__name__)
1272
 
1273
  class PatientSummarizerAgent:
1274
  def __init__(
1275
  self,
1276
- model_name: str = "Falconsai/medical_summarization",
1277
- device: str = None
 
 
 
1278
  ):
 
 
1279
  self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
1280
- # Normalize and set default if invalid
1281
- safe_model_name = (model_name or "").strip() or "falconsai/medical_summarization"
1282
- if safe_model_name.lower() in {"none", "null"}:
1283
- safe_model_name = "falconsai/medical_summarization"
 
 
 
 
1284
 
 
 
1285
  try:
1286
- self.tokenizer = AutoTokenizer.from_pretrained(safe_model_name)
1287
- config = AutoConfig.from_pretrained(safe_model_name)
1288
- if config.model_type in ["t5", "bart"]:
1289
- self.model = AutoModelForSeq2SeqLM.from_pretrained(safe_model_name).to(self.device)
1290
- self.model_type = "seq2seq"
1291
  else:
1292
- self.model = AutoModelForCausalLM.from_pretrained(safe_model_name).to(self.device)
1293
- self.model_type = "causal"
1294
  except Exception as e:
1295
- logger.warning(f"Failed to load model '{safe_model_name}' ({e}); falling back to 'falconsai/medical_summarization'.")
1296
- safe_model_name = "falconsai/medical_summarization"
1297
- self.tokenizer = AutoTokenizer.from_pretrained(safe_model_name)
1298
- config = AutoConfig.from_pretrained(safe_model_name)
1299
- if config.model_type in ["t5", "bart"]:
1300
- self.model = AutoModelForSeq2SeqLM.from_pretrained(safe_model_name).to(self.device)
1301
- self.model_type = "seq2seq"
1302
- else:
1303
- self.model = AutoModelForCausalLM.from_pretrained(safe_model_name).to(self.device)
1304
- self.model_type = "causal"
1305
-
1306
- if not self.tokenizer.pad_token:
1307
- self.tokenizer.pad_token = self.tokenizer.eos_token
1308
-
1309
- logger.info(f"Loaded model: {safe_model_name} on {self.device}")
1310
 
1311
  def generate_clinical_summary(self, patient_data: Union[List[str], Dict]) -> str:
 
 
 
1312
  try:
1313
- # Extract timeline and insights
1314
- if isinstance(patient_data, dict):
1315
- data = patient_data.get("result", {})
1316
- elif isinstance(patient_data, list):
1317
- # If list, we assume it's flattened — but we need full data
1318
- return "Error: Timeline data missing. Cannot generate summary."
1319
- else:
1320
- return "Error: Invalid input."
1321
-
1322
- timeline = data.get("Timeline", "No visit data.")
1323
- insights = data.get("Insights", {})
1324
-
1325
- # Build rich prompt with timeline and analysis. Explicitly instruct to return only sections.
1326
- prompt = f"""
1327
- You are an expert AI clinical assistant. Analyze the patient's complete visit history and generate a structured, actionable clinical summary history. Use only the provided data. Do not echo any instructions. Return only the sections requested.
1328
-
1329
- --- PATIENT TIMELINE (Narrative across visits) ---
1330
- {timeline}
1331
-
1332
- --- CLINICAL INSIGHTS (Computed trends) ---
1333
- - Total visits: {insights.get('total_visits', 0)}
1334
- - Blood Pressure Trend: {insights.get('bp_trend', 'No data')}
1335
- - Weight Trend: {insights.get('weight_trend', 'No data')}
1336
- - Chronic Medications: {', '.join(insights.get('chronic_meds', [])) or 'None'}
1337
- - Repeated Imaging: {', '.join(insights.get('repeated_imaging', [])) or 'None'}
1338
-
1339
- Provide the clinical summary using exactly these headings, in this order, with concise content under each:
1340
- Patient Overview:
1341
- Visit History:
1342
- Trend Analysis:
1343
- Assessment:
1344
- Recommendations:
1345
- """
1346
-
1347
- # Tokenize
1348
- inputs = self.tokenizer(
1349
- prompt,
1350
- return_tensors="pt",
1351
- truncation=True,
1352
- max_length=2048,
1353
- padding=True
1354
- ).to(self.device)
1355
-
1356
- # Generate with model-type aware settings
1357
- if self.model_type == "seq2seq":
1358
- outputs = self.model.generate(
1359
- **inputs,
1360
- max_new_tokens=512,
1361
- num_beams=4,
1362
- temperature=0.7,
1363
- top_p=0.9,
1364
- do_sample=True,
1365
- pad_token_id=self.tokenizer.pad_token_id
1366
  )
1367
  else:
1368
- outputs = self.model.generate(
1369
- **inputs,
1370
- max_new_tokens=512,
 
1371
  temperature=0.7,
1372
- top_p=0.9,
1373
- do_sample=True,
1374
- pad_token_id=self.tokenizer.pad_token_id,
1375
- eos_token_id=self.tokenizer.eos_token_id
1376
  )
1377
 
1378
- # Decode and sanitize
1379
- raw_summary = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
1380
- summary = self._sanitize_and_structure_summary(raw_summary, data)
1381
- return summary
1382
 
1383
  except Exception as e:
1384
- logger.error(f"Summary generation failed: {str(e)}")
1385
- return "Error: Failed to generate clinical summary."
1386
-
1387
- # -----------------------------
1388
- # Helpers for robust, structured output
1389
- # -----------------------------
1390
- def _sanitize_and_structure_summary(self, text: str, data: Dict) -> str:
1391
- """Remove instruction echoes, ensure required sections with fallbacks."""
1392
- text = text or ""
1393
- # Strip leading instruction-like content
1394
- markers = ["Patient Overview:", "PATIENT OVERVIEW:"]
1395
- start_idx = min([text.find(m) for m in markers if m in text] or [0])
1396
- cleaned = text[start_idx:].strip() if start_idx > 0 else text.strip()
1397
- # Remove common instruction phrases if leaked
1398
- banned_phrases = [
1399
- "INSTRUCTIONS", "Generate a summary", "Return only", "Use only the provided data"
1400
  ]
1401
- lines = [ln for ln in cleaned.splitlines() if not any(bp.lower() in ln.lower() for bp in banned_phrases)]
1402
- cleaned = "\n".join(lines).strip()
1403
-
1404
- sections = self._split_sections(cleaned)
1405
- required = ["Patient Overview", "Visit History", "Trend Analysis", "Assessment", "Recommendations"]
1406
-
1407
- # Build fallbacks from available structured data
1408
- fallbacks = self._build_fallback_sections(data)
1409
-
1410
- # Ensure each section exists and is non-empty
1411
- ordered_output: List[str] = []
1412
- for name in required:
1413
- content = sections.get(name, "").strip()
1414
- if not content:
1415
- content = fallbacks.get(name, "N/A")
1416
- ordered_output.append(f"{name}:\n{content}".strip())
1417
-
1418
- return "\n".join(ordered_output).strip()
1419
-
1420
- def _split_sections(self, text: str) -> dict:
1421
- """Split text into sections by known headings, case-insensitive."""
1422
- headings = ["Patient Overview", "Visit History", "Trend Analysis", "Assessment", "Recommendations"]
1423
- sections: dict = {}
1424
- current = None
1425
- buffer: List[str] = []
1426
- def flush():
1427
- nonlocal current, buffer
1428
- if current is not None:
1429
- sections[current] = "\n".join(buffer).strip()
1430
- buffer = []
1431
- for line in text.splitlines():
1432
- line_stripped = line.strip()
1433
- matched = None
1434
- for h in headings:
1435
- if line_stripped.lower().startswith(h.lower() + ":"):
1436
- matched = h
1437
- break
1438
- if matched:
1439
- flush()
1440
- current = matched
1441
- # If there is text after the colon on same line, keep it
1442
- after = line_stripped[len(matched)+1:].strip()
1443
- buffer = [after] if after else []
1444
- else:
1445
- if current is None:
1446
- # Skip preamble
1447
- continue
1448
- buffer.append(line)
1449
- flush()
1450
- return sections
1451
-
1452
- def _build_fallback_sections(self, data: Dict) -> dict:
1453
- """Deterministic sections using demographics, timeline and insights."""
1454
- name = data.get("Patient Name", "Anonymous")
1455
- num = data.get("Patient Number", "Unknown")
1456
- age = data.get("Age", "Unknown")
1457
- gender = data.get("Gender", "Unknown")
1458
- dob = data.get("DOB", "N/A")
1459
- last = data.get("Last Visit", "N/A")
1460
- overview = f"Name: {name}\nPatient ID: {num}\nAge/Sex: {age} / {gender}\nDOB: {dob}\nLast Visit: {last}"
1461
-
1462
- insights = data.get("Insights", {})
1463
- # Build a structured visit history from normalized encounters if available
1464
- visit_lines: List[str] = []
1465
- charts = data.get("chartsummarydtl") or []
1466
- if isinstance(charts, list) and charts:
1467
- # sort by date ascending
1468
- charts_sorted = sorted(charts, key=lambda x: (x.get("chartdate") or x.get("date") or ""))
1469
- for ch in charts_sorted:
1470
- date = (ch.get("chartdate") or ch.get("date") or "Unknown")[:10]
1471
- vitals_str = parse_vitals(ch.get("vitals", []))
1472
- diag = ", ".join(ch.get("diagnosis", [])) if isinstance(ch.get("diagnosis", []), list) else ""
1473
- meds_list = ch.get("medications", [])
1474
- meds_list = meds_list if isinstance(meds_list, list) else []
1475
- meds = ", ".join(sorted({m.split("||")[0].strip() if isinstance(m, str) else str(m) for m in meds_list if str(m).strip()}))
1476
- labs_list = ch.get("labtests", [])
1477
- labs = ", ".join([t.get("name", str(t)) if isinstance(t, dict) else str(t) for t in labs_list if str(t).strip()])
1478
- radio_list = ch.get("radiologyorders", [])
1479
- radio = ", ".join([r.get("name", str(r)) if isinstance(r, dict) else str(r) for r in radio_list if str(r).strip()])
1480
- entry = f"{date}: Vitals: {vitals_str}."
1481
- if diag:
1482
- entry += f" Diagnosis: {diag}."
1483
- if meds:
1484
- entry += f" Medications: {meds}."
1485
- if labs:
1486
- entry += f" Labs: {labs}."
1487
- if radio:
1488
- entry += f" Imaging: {radio}."
1489
- visit_lines.append(entry)
1490
- else:
1491
- # Fallback to narrative timeline text
1492
- timeline = data.get("Timeline", "No visit data available.")
1493
- visit_lines = [timeline]
1494
-
1495
- trend_lines: List[str] = [
1496
- f"BP Trend: {insights.get('bp_trend', 'No data')}",
1497
- f"Weight Trend: {insights.get('weight_trend', 'No data')}",
1498
  ]
1499
- if insights.get("chronic_meds"):
1500
- trend_lines.append(f"Chronic Medications: {', '.join(insights['chronic_meds'])}")
1501
- if insights.get("repeated_imaging"):
1502
- trend_lines.append(f"Repeated Imaging: {', '.join(insights['repeated_imaging'])}")
1503
- vt = "\n".join(trend_lines)
1504
-
1505
- # Extract diagnosis, meds, allergies from timeline text if present
1506
- timeline = data.get("Timeline", "")
1507
- diag, meds, alg, imaging = self._extract_from_timeline(timeline)
1508
-
1509
- # Simple assessments/recommendations derived from trends
1510
- assessment_points: List[str] = []
1511
- if "No data" not in insights.get("bp_trend", "") and any(x in insights.get("bp_trend", "") for x in ["→", ";"]):
1512
- assessment_points.append("Blood pressure trend noted; evaluate for hypertension control.")
1513
- if insights.get("repeated_imaging"):
1514
- assessment_points.append("Repeated imaging suggests unresolved issue; verify prior reports.")
1515
- if not assessment_points:
1516
- assessment_points.append("Review vitals, labs, and medications for ongoing management.")
1517
- assessment = "\n".join(f"- {p}" for p in assessment_points)
1518
-
1519
- recommendations_points: List[str] = []
1520
- if insights.get("repeated_imaging"):
1521
- recommendations_points.append("Follow up on repeated imaging with radiology report review.")
1522
- recommendations_points.append("Medication reconciliation and adherence review.")
1523
- recommendations_points.append("Consider labs or referrals as clinically indicated.")
1524
- recommendations = "\n".join(f"- {p}" for p in recommendations_points)
1525
-
1526
  return {
1527
- "Patient Overview": overview,
1528
- "Visit History": "\n".join(visit_lines) if visit_lines else "No visit data available.",
1529
- "Trend Analysis": vt,
1530
- "Assessment": assessment,
1531
- "Recommendations": recommendations,
1532
- }
1533
-
1534
- def _extract_from_timeline(self, timeline: str):
1535
- import re
1536
- diags = set()
1537
- meds = set()
1538
- algs = set()
1539
- imaging = set()
1540
- if not isinstance(timeline, str) or not timeline:
1541
- return diags, meds, algs, imaging
1542
- # capture multiple occurrences lazily until period
1543
- for m in re.finditer(r"Diagnosis:\s*([^\.]+)\.", timeline, flags=re.IGNORECASE):
1544
- part = m.group(1).strip()
1545
- for item in [x.strip() for x in part.split(",") if x.strip()]:
1546
- diags.add(item)
1547
- for m in re.finditer(r"Medications prescribed:\s*([^\.]+)\.", timeline, flags=re.IGNORECASE):
1548
- part = m.group(1).strip()
1549
- for item in [x.strip() for x in part.split(",") if x.strip()]:
1550
- meds.add(item)
1551
- for m in re.finditer(r"Allergies noted:\s*([^\.]+)\.", timeline, flags=re.IGNORECASE):
1552
- part = m.group(1).strip()
1553
- for item in [x.strip() for x in part.split(",") if x.strip()]:
1554
- algs.add(item)
1555
- for m in re.finditer(r"Imaging ordered:\s*([^\.]+)\.", timeline, flags=re.IGNORECASE):
1556
- part = m.group(1).strip()
1557
- for item in [x.strip() for x in part.split(",") if x.strip()]:
1558
- imaging.add(item)
1559
- return diags, meds, algs, imaging
 
1
+ import datetime
 
2
  import torch
3
+ import warnings
4
+ import re
5
+ import json
6
+ from typing import List, Dict, Union, Optional
7
+ from textwrap import fill
8
 
9
+ # Suppress non-critical warnings
10
+ warnings.filterwarnings("ignore", category=UserWarning)
11
 
12
  class PatientSummarizerAgent:
13
  def __init__(
14
  self,
15
+ model_name: str = "falconsai/medical_summarization",
16
+ model_type: str = "summarization",
17
+ device: Optional[str] = None,
18
+ max_input_tokens: int = 2048,
19
+ max_output_tokens: int = 512
20
  ):
21
+ self.model_name = model_name
22
+ self.model_type = model_type
23
  self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
24
+ self.max_input_tokens = max_input_tokens
25
+ self.max_output_tokens = max_output_tokens
26
+
27
+ # Initialize model loader through unified model manager
28
+ self.model_loader = None
29
+ self._initialize_model_loader()
30
+
31
+ print(f"✅ PatientSummarizerAgent initialized with {model_name} ({model_type}) on {self.device}")
32
 
33
+ def _initialize_model_loader(self):
34
+ """Initialize the model loader using the unified model manager"""
35
  try:
36
+ from ..utils.model_manager import model_manager
37
+
38
+ # Determine if this is a GGUF model
39
+ if self.model_type == "gguf" or self.model_name.endswith('.gguf'):
40
+ # Extract filename if model_name contains path
41
+ if '/' in self.model_name and not self.model_name.startswith('http'):
42
+ if self.model_name.endswith('.gguf'):
43
+ # Full path to .gguf file
44
+ filename = None
+ model_name = self.model_name  # keep local name defined for the loader call below
45
+ else:
46
+ # HuggingFace repo with filename
47
+ parts = self.model_name.split('/')
48
+ if len(parts) >= 2:
49
+ filename = parts[-1] if parts[-1].endswith('.gguf') else None
50
+ model_name = '/'.join(parts[:-1]) if filename else self.model_name
51
+ else:
52
+ filename = None
53
+ model_name = self.model_name
54
+ else:
55
+ filename = None
56
+ model_name = self.model_name
57
+
58
+ self.model_loader = model_manager.get_model_loader(
59
+ model_name,
60
+ "gguf",
61
+ filename=filename
62
+ )
63
  else:
64
+ # Use the specified model type
65
+ self.model_loader = model_manager.get_model_loader(
66
+ self.model_name,
67
+ self.model_type
68
+ )
69
+
70
+ print(f"✅ Model loader initialized: {self.model_name} ({self.model_type})")
71
+
72
  except Exception as e:
73
+ print(f"Failed to initialize model loader: {e}")
74
+ # Create a fallback loader
75
+ self._create_fallback_loader()
76
+
77
+ def _create_fallback_loader(self):
78
+ """Create a fallback text-based loader when model loading fails"""
79
+ class FallbackLoader:
80
+ def __init__(self, model_name: str, model_type: str):
81
+ self.model_name = model_name
82
+ self.model_type = model_type
83
+ self.name = "fallback_text"
84
+
85
+ def generate(self, prompt: str, **kwargs) -> str:
86
+ # Simple template-based response
87
+ sections = [
88
+ "## Clinical Assessment\nBased on the provided information, this appears to be a medical case requiring clinical review.",
89
+ "## Key Trends & Changes\nPlease review the patient data for any significant changes or trends.",
90
+ "## Plan & Suggested Actions\nConsider consulting with a healthcare provider for proper medical assessment.",
91
+ "## Direct Guidance for Physician\nThis summary was generated using a fallback method. Please review all patient data thoroughly."
92
+ ]
93
+ return "\n\n".join(sections)
94
+
95
+ def generate_full_summary(self, prompt: str, **kwargs) -> str:
96
+ return self.generate(prompt, **kwargs)
97
+
98
+ self.model_loader = FallbackLoader(self.model_name, self.model_type)
99
+ print(f"⚠️ Using fallback loader for {self.model_name}")
100
 
101
  def generate_clinical_summary(self, patient_data: Union[List[str], Dict]) -> str:
102
+ """Generate a comprehensive clinical summary using the unified model manager"""
103
+ print(f"✨ Generating clinical summary using model: {self.model_name} ({self.model_type})...")
104
+
105
  try:
106
+ # Build the narrative prompt
107
+ narrative_history = self.build_chronological_narrative(patient_data)
108
+ print(f"\n--- Prompt Sent to Model (truncated) ---\n{fill(narrative_history, width=80)[:1000]}...")
109
+
110
+ # Generate summary using the model loader
111
+ if hasattr(self.model_loader, 'generate_full_summary'):
112
+ # GGUF models support full summary generation
113
+ raw_summary_text = self.model_loader.generate_full_summary(
114
+ narrative_history,
115
+ max_tokens=self.max_output_tokens,
116
+ max_loops=1
 
117
  )
118
  else:
119
+ # Other models use standard generation
120
+ raw_summary_text = self.model_loader.generate(
121
+ narrative_history,
122
+ max_new_tokens=self.max_output_tokens,
123
  temperature=0.7,
124
+ top_p=0.9
 
 
 
125
  )
126
 
127
+ print(f"\n--- Raw Model Output ---\n{fill(raw_summary_text, width=80)}")
128
+
129
+ # Format the output
130
+ formatted_report = self.format_clinical_output(raw_summary_text, patient_data)
131
+ evaluation_report = self.evaluate_summary_against_guidelines(raw_summary_text, patient_data)
132
+
133
+ # Combine final output
134
+ final_output = (
135
+ f"\n{'='*80}\n"
136
+ f" FINAL CLINICAL SUMMARY REPORT\n"
137
+ f"{'='*80}\n"
138
+ f"{formatted_report}\n\n"
139
+ f"{'='*80}\n"
140
+ f" SIMULATED EVALUATION REPORT\n"
141
+ f"{'='*80}\n"
142
+ f"{evaluation_report}"
143
+ )
144
+ return final_output
145
 
146
  except Exception as e:
147
+ print(f" Error during summary generation: {e}")
148
+ import traceback
149
+ traceback.print_exc()
150
+ return f"Error generating summary: {str(e)}"
151
+
152
+ def build_chronological_narrative(self, patient_data: dict) -> str:
153
+ """Builds a chronological narrative from multi-encounter patient history."""
154
+ result = patient_data.get("result", {})
155
+ narrative = []
156
+
157
+ # Past Medical History
158
+ narrative.append(f"Past Medical History: {', '.join(result.get('past_medical_history', []))}.")
159
+
160
+ # Social History
161
+ social = result.get('social_history', 'Not specified.')
162
+ narrative.append(f"Social History: {social}.")
163
+
164
+ # Allergies
165
+ allergies = ', '.join(result.get('allergies', ['None']))
166
+ narrative.append(f"Allergies: {allergies}.")
167
+
168
+ # Loop through encounters chronologically
169
+ for enc in result.get("encounters", []):
170
+ encounter_str = (
171
+ f"Encounter on {enc['visit_date']}: "
172
+ f"Chief Complaint: '{enc['chief_complaint']}'. "
173
+ f"Symptoms: {enc.get('symptoms', 'None reported')}. "
174
+ f"Diagnosis: {', '.join(enc['diagnosis'])}. "
175
+ f"Doctor's Notes: {enc['dr_notes']}. "
176
+ )
177
+ if enc.get('vitals'):
178
+ encounter_str += f"Vitals: {', '.join([f'{k}: {v}' for k, v in enc['vitals'].items()])}. "
179
+ if enc.get('lab_results'):
180
+ encounter_str += f"Labs: {', '.join([f'{k}: {v}' for k, v in enc['lab_results'].items()])}. "
181
+ if enc.get('medications'):
182
+ encounter_str += f"Medications: {', '.join(enc['medications'])}. "
183
+ if enc.get('treatment'):
184
+ encounter_str += f"Treatment: {enc['treatment']}."
185
+ narrative.append(encounter_str)
186
+
187
+ return "\n".join(narrative)
188
+
189
+ def format_clinical_output(self, raw_summary: str, patient_data: dict) -> str:
190
+ """Formats the raw AI-generated summary into a structured, doctor-friendly report."""
191
+ result = patient_data.get("result", {})
192
+ last_encounter = result.get("encounters", [{}])[-1] if result.get("encounters") else result
193
+
194
+ # Consolidate active problems
195
+ all_diagnoses_raw = set(result.get('past_medical_history', []))
196
+ for enc in result.get('encounters', []):
197
+ all_diagnoses_raw.update(enc.get('diagnosis', []))
198
+ cleaned_diagnoses = sorted({
199
+ re.sub(r'\s*\([^)]*\)', '', dx).strip() for dx in all_diagnoses_raw
200
+ })
201
+
202
+ # Consolidate current medications
203
+ all_medications = set()
204
+ for enc in result.get('encounters', []):
205
+ all_medications.update(enc.get('medications', []))
206
+ current_meds = sorted(all_medications)
207
+
208
+ # Report Header
209
+ report = "\n==============================================\n"
210
+ report += " CLINICAL SUMMARY REPORT\n"
211
+ report += "==============================================\n"
212
+ report += f"Generated On: {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"
213
+
214
+ # Patient Overview
215
+ report += "\n--- PATIENT OVERVIEW ---\n"
216
+ report += f"Name: {result.get('patientname', 'Unknown')}\n"
217
+ report += f"Patient ID: {result.get('patientnumber', 'Unknown')}\n"
218
+ gender = result.get('gender', 'Unknown')
219
+ report += f"Age/Sex: {result.get('agey', 'Unknown')} {gender[0] if gender != 'Unknown' else 'U'}\n"
220
+ report += f"Allergies: {', '.join(result.get('allergies', ['None']))}\n"
221
+
222
+ # Social History
223
+ report += "\n--- SOCIAL HISTORY ---\n"
224
+ report += fill(result.get('social_history', 'Not specified.'), width=80) + "\n"
225
+
226
+ # Immediate Attention
227
+ report += "\n--- IMMEDIATE ATTENTION (Most Recent Encounter) ---\n"
228
+ report += f"Date of Event: {last_encounter.get('visit_date', 'Unknown')}\n"
229
+ report += f"Chief Complaint: {last_encounter.get('chief_complaint', 'Not specified')}\n"
230
+ if last_encounter.get('vitals'):
231
+ vitals_str = ', '.join([f'{k}: {v}' for k, v in last_encounter['vitals'].items()])
232
+ report += f"Vitals: {vitals_str}\n"
233
+ critical_diagnoses = [
234
+ dx for dx in last_encounter.get('diagnosis', [])
235
+ if any(kw in dx.lower() for kw in ['acute', 'new onset', 'fall', 'afib', 'kidney injury'])
236
  ]
237
+ if critical_diagnoses:
238
+ report += f"Critical New Diagnoses: {', '.join(critical_diagnoses)}\n"
239
+ report += f"Doctor's Notes: {last_encounter.get('dr_notes', 'N/A')}\n"
240
+
241
+ # Active Problem List
242
+ report += "\n--- ACTIVE PROBLEM LIST (Consolidated) ---\n"
243
+ report += "\n".join(f"- {dx}" for dx in cleaned_diagnoses) + "\n"
244
+
245
+ # Current Medications
246
+ report += "\n--- CURRENT MEDICATION LIST (Consolidated) ---\n"
247
+ report += "\n".join(f"- {med}" for med in current_meds) + "\n"
248
+
249
+ # Procedures
250
+ procedures = set()
251
+ for enc in result.get('encounters', []):
252
+ if 'treatment' in enc and 'PCI' in enc['treatment']:
253
+ procedures.add(enc['treatment'])
254
+ if procedures:
255
+ report += "\n--- PROCEDURES & SURGERIES ---\n"
256
+ report += "\n".join(f"- {proc}" for proc in sorted(procedures)) + "\n"
257
+
258
+ # AI-Generated Narrative
259
+ report += "\n--- AI-GENERATED CLINICAL NARRATIVE ---\n"
260
+ report += fill(raw_summary, width=80) + "\n"
261
+
262
+ # Placeholder sections if not in model output
263
+ if "Assessment and Plan" not in raw_summary:
264
+ report += "\n--- ASSESSMENT, PLAN AND NEXT STEPS (AI-Generated) ---\n"
265
+ report += "The model did not generate a structured assessment and plan. Please review clinical context.\n"
266
+
267
+ if "Clinical Pathway" not in raw_summary:
268
+ report += "\n--- CLINICAL PATHWAY (AI-Generated) ---\n"
269
+ report += "No clinical pathway was generated. Consider next steps based on active issues.\n"
270
+
271
+ return report
272
+
273
+ def evaluate_summary_against_guidelines(self, summary_text: str, patient_data: dict) -> str:
274
+ """Simulated evaluation of summary against clinical guidelines."""
275
+ result = patient_data.get("result", {})
276
+ last_enc = result.get("encounters", [{}])[-1] if result.get("encounters") else {}
277
+
278
+ summary_lower = summary_text.lower()
279
+ evaluation = (
280
+ "\n==============================================\n"
281
+ " AI SUMMARY EVALUATION & GUIDELINE CHECK\n"
282
+ "==============================================\n"
283
+ )
284
+
285
+ # Keyword-based accuracy
286
+ critical_keywords = [
287
+ "fall", "dizziness", "atrial fibrillation", "afib", "rvr", "kidney", "ckd",
288
+ "diabetes", "anticoagulation", "warfarin", "aspirin", "statin", "metformin",
289
+ "gout", "angina", "pci", "bph", "hypertension", "metoprolol", "clopidogrel"
290
  ]
291
+ found = [kw for kw in critical_keywords if kw in summary_lower]
292
+ score = (len(found) / len(critical_keywords)) * 10
293
+ evaluation += f"\n1. KEYWORD ACCURACY SCORE: {score:.1f}/10\n"
294
+ evaluation += f" - Found {len(found)} out of {len(critical_keywords)} critical concepts.\n"
295
+
296
+ # Guideline checks
297
+ evaluation += "\n2. CLINICAL GUIDELINE COMMENTARY (SIMULATED):\n"
298
+
299
+ has_afib = any("atrial fibrillation" in dx.lower() for dx in last_enc.get('diagnosis', []))
300
+ on_anticoag = any("warfarin" in med.lower() or "apixaban" in med.lower() for med in last_enc.get('medications', []))
301
+ if has_afib:
302
+ evaluation += " - ✅ Patient with Atrial Fibrillation is on anticoagulation.\n" if on_anticoag \
303
+ else " - Atrial Fibrillation present but no anticoagulant prescribed.\n"
304
+
305
+ has_mi = any("myocardial infarction" in hx.lower() for hx in result.get('past_medical_history', []))
306
+ on_statin = any("atorvastatin" in med.lower() or "statin" in med.lower() for med in last_enc.get('medications', []))
307
+ if has_mi:
308
+ evaluation += " - Patient with MI history is on statin therapy.\n" if on_statin \
309
+ else " - Patient with MI history is not on statin therapy.\n"
310
+
311
+ has_aki = any("acute kidney injury" in dx.lower() for dx in last_enc.get('diagnosis', []))
312
+ acei_held = "hold" in last_enc.get('dr_notes', '').lower() and "lisinopril" in last_enc.get('dr_notes', '')
313
+ if has_aki:
314
+ evaluation += " - AKI noted and ACE inhibitor was appropriately held.\n" if acei_held \
315
+ else " - ⚠️ AKI present but ACE inhibitor not documented as held.\n"
316
+
317
+ evaluation += (
318
+ "\nDisclaimer: This is a simulated evaluation and not a substitute for clinical judgment.\n"
319
+ )
320
+ return evaluation
321
+
322
+ def update_model(self, model_name: str, model_type: str):
323
+ """Update the model used by this agent"""
324
+ self.model_name = model_name
325
+ self.model_type = model_type
326
+ self._initialize_model_loader()
327
+ print(f"✅ Model updated to: {model_name} ({model_type})")
328
+
329
+ def get_model_info(self) -> dict:
330
+ """Get information about the current model"""
331
+ if self.model_loader:
332
+ return self.model_loader.get_model_info()
333
  return {
334
+ "type": "unknown",
335
+ "model_name": self.model_name,
336
+ "model_type": self.model_type,
337
+ "loaded": False
338
+ }
 
ai_med_extract/api/model_management.py ADDED
@@ -0,0 +1,397 @@
1
+ """
2
+ Dynamic Model Management API
3
+ Allows runtime loading, switching, and management of different model types
4
+ """
5
+
6
+ from flask import Blueprint, request, jsonify
7
+ import logging
8
+ from typing import Dict, Any, Optional
9
+ import torch
+ import time
10
+
11
+ from ..utils.model_manager import model_manager
12
+ from ..utils.model_config import (
13
+ get_default_model,
14
+ get_fallback_model,
15
+ detect_model_type,
16
+ validate_model_config,
17
+ get_model_info
18
+ )
19
+
20
+ # Configure logging
21
+ logging.basicConfig(level=logging.INFO)
22
+ logger = logging.getLogger(__name__)
23
+
24
+ # Create Blueprint
25
+ model_management_bp = Blueprint('model_management', __name__, url_prefix='/api/models')
26
+
27
+ @model_management_bp.route('/load', methods=['POST'])
28
+ def load_model():
29
+ """
30
+ Load a new model with specified name and type
31
+
32
+ Request body:
33
+ {
34
+ "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
35
+ "model_type": "gguf",
36
+ "filename": "Phi-3-mini-4k-instruct-q4.gguf", # Optional for GGUF
37
+ "force_reload": false # Optional, force reload even if cached
38
+ }
39
+ """
40
+ try:
41
+ data = request.get_json()
42
+ if not data:
43
+ return jsonify({"error": "No data provided"}), 400
44
+
45
+ model_name = data.get("model_name")
46
+ model_type = data.get("model_type")
47
+ filename = data.get("filename")
48
+ force_reload = data.get("force_reload", False)
49
+
50
+ if not model_name:
51
+ return jsonify({"error": "model_name is required"}), 400
52
+
53
+ # Auto-detect model type if not provided
54
+ if not model_type:
55
+ model_type = detect_model_type(model_name)
56
+ logger.info(f"Auto-detected model type: {model_type} for {model_name}")
57
+
58
+ # Validate model configuration
59
+ validation = validate_model_config(model_name, model_type)
60
+ if not validation["valid"]:
61
+ return jsonify({
62
+ "error": "Invalid model configuration",
63
+ "validation": validation
64
+ }), 400
65
+
66
+ # Load the model
67
+ start_time = torch.cuda.Event(enable_timing=True) if torch.cuda.is_available() else None
68
+ end_time = torch.cuda.Event(enable_timing=True) if torch.cuda.is_available() else None
69
+
70
+ if start_time:
71
+ start_time.record()
72
+
73
+ loader = model_manager.get_model_loader(model_name, model_type, filename, force_reload)
74
+
75
+ if end_time:
76
+ end_time.record()
77
+ torch.cuda.synchronize()
78
+ load_time = start_time.elapsed_time(end_time) / 1000.0 # Convert to seconds
79
+ else:
80
+ load_time = None
81
+
82
+ # Get model information
83
+ model_info = loader.get_model_info()
84
+ model_info["load_time_seconds"] = load_time
85
+
86
+ return jsonify({
87
+ "success": True,
88
+ "message": f"Model {model_name} ({model_type}) loaded successfully",
89
+ "model_info": model_info,
90
+ "validation": validation
91
+ }), 200
92
+
93
+ except Exception as e:
94
+ logger.error(f"Failed to load model: {str(e)}", exc_info=True)
95
+ return jsonify({
96
+ "success": False,
97
+ "error": f"Model loading failed: {str(e)}"
98
+ }), 500
99
+
100
+ @model_management_bp.route('/generate', methods=['POST'])
101
+ def generate_text():
102
+ """
103
+ Generate text using a specific model
104
+
105
+ Request body:
106
+ {
107
+ "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
108
+ "model_type": "gguf",
109
+ "filename": "Phi-3-mini-4k-instruct-q4.gguf", # Optional for GGUF
110
+ "prompt": "Generate a medical summary for...",
111
+ "max_tokens": 512,
112
+ "temperature": 0.7,
113
+ "top_p": 0.95
114
+ }
115
+ """
116
+ try:
117
+ data = request.get_json()
118
+ if not data:
119
+ return jsonify({"error": "No data provided"}), 400
120
+
121
+ model_name = data.get("model_name")
122
+ model_type = data.get("model_type")
123
+ filename = data.get("filename")
124
+ prompt = data.get("prompt")
125
+
126
+ if not all([model_name, prompt]):
127
+ return jsonify({"error": "model_name and prompt are required"}), 400
128
+
129
+ # Auto-detect model type if not provided
130
+ if not model_type:
131
+ model_type = detect_model_type(model_name)
132
+
133
+ # Generate text
134
+ start_time = torch.cuda.Event(enable_timing=True) if torch.cuda.is_available() else None
135
+ end_time = torch.cuda.Event(enable_timing=True) if torch.cuda.is_available() else None
136
+
137
+ if start_time:
138
+ start_time.record()
139
+
140
+ generated_text = model_manager.generate_text(
141
+ model_name,
142
+ model_type,
143
+ prompt,
144
+ filename,
145
+ **{k: v for k, v in data.items() if k not in ["model_name", "model_type", "filename", "prompt"]}
146
+ )
147
+
148
+ if end_time:
149
+ end_time.record()
150
+ torch.cuda.synchronize()
151
+ generation_time = start_time.elapsed_time(end_time) / 1000.0
152
+ else:
153
+ generation_time = None
154
+
155
+ return jsonify({
156
+ "success": True,
157
+ "generated_text": generated_text,
158
+ "model_name": model_name,
159
+ "model_type": model_type,
160
+ "generation_time_seconds": generation_time,
161
+ "text_length": len(generated_text)
162
+ }), 200
163
+
164
+ except Exception as e:
165
+ logger.error(f"Text generation failed: {str(e)}", exc_info=True)
166
+ return jsonify({
167
+ "success": False,
168
+ "error": f"Text generation failed: {str(e)}"
169
+ }), 500
170
+
171
+ @model_management_bp.route('/info', methods=['GET'])
172
+ def get_model_information():
173
+ """
174
+ Get information about a specific model or all loaded models
175
+
176
+ Query parameters:
177
+ - model_name: Optional, specific model to get info for
178
+ - model_type: Optional, filter by model type
179
+ """
180
+ try:
181
+ model_name = request.args.get("model_name")
182
+ model_type = request.args.get("model_type")
183
+
184
+ if model_name:
185
+ # Get info for specific model
186
+ if not model_type:
187
+ model_type = detect_model_type(model_name)
188
+
189
+ validation = validate_model_config(model_name, model_type)
190
+ model_info = get_model_info(model_name, model_type)
191
+
192
+ return jsonify({
193
+ "success": True,
194
+ "model_info": model_info,
195
+ "validation": validation
196
+ }), 200
197
+ else:
198
+ # Get info for all loaded models
199
+ loaded_models = model_manager.list_loaded_models()
200
+
201
+ # Filter by type if specified
202
+ if model_type:
203
+ loaded_models = {
204
+ k: v for k, v in loaded_models.items()
205
+ if v.get("model_type") == model_type
206
+ }
207
+
208
+ return jsonify({
209
+ "success": True,
210
+ "loaded_models": loaded_models,
211
+ "total_models": len(loaded_models)
212
+ }), 200
213
+
214
+ except Exception as e:
215
+ logger.error(f"Failed to get model information: {str(e)}", exc_info=True)
216
+ return jsonify({
217
+ "success": False,
218
+ "error": f"Failed to get model information: {str(e)}"
219
+ }), 500
220
+
221
+ @model_management_bp.route('/defaults', methods=['GET'])
222
+ def get_default_models():
223
+ """
224
+ Get default models for different model types
225
+ """
226
+ try:
227
+ from ..utils.model_config import DEFAULT_MODELS, SPACES_OPTIMIZED_MODELS
228
+
229
+ return jsonify({
230
+ "success": True,
231
+ "default_models": DEFAULT_MODELS,
232
+ "spaces_optimized_models": SPACES_OPTIMIZED_MODELS
233
+ }), 200
234
+
235
+ except Exception as e:
236
+ logger.error(f"Failed to get default models: {str(e)}", exc_info=True)
237
+ return jsonify({
238
+ "success": False,
239
+ "error": f"Failed to get default models: {str(e)}"
240
+ }), 500
241
+
242
+ @model_management_bp.route('/clear_cache', methods=['POST'])
243
+ def clear_model_cache():
244
+ """
245
+ Clear the model cache and free memory
246
+ """
247
+ try:
248
+ # Get cache info before clearing
249
+ loaded_models = model_manager.list_loaded_models()
250
+ cache_size = len(loaded_models)
251
+
252
+ # Clear cache
253
+ model_manager.clear_cache()
254
+
255
+ return jsonify({
256
+ "success": True,
257
+ "message": f"Model cache cleared successfully",
258
+ "cleared_models": cache_size,
259
+ "memory_freed": "GPU and CPU memory cleared"
260
+ }), 200
261
+
262
+ except Exception as e:
263
+ logger.error(f"Failed to clear cache: {str(e)}", exc_info=True)
264
+ return jsonify({
265
+ "success": False,
266
+ "error": f"Failed to clear cache: {str(e)}"
267
+ }), 500
268
+
269
+ @model_management_bp.route('/switch', methods=['POST'])
270
+ def switch_model():
271
+ """
272
+ Switch the model used by a specific agent
273
+
274
+ Request body:
275
+ {
276
+ "agent_name": "patient_summarizer",
277
+ "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
278
+ "model_type": "gguf",
279
+ "filename": "Phi-3-mini-4k-instruct-q4.gguf" # Optional for GGUF
280
+ }
281
+ """
282
+ try:
283
+ data = request.get_json()
284
+ if not data:
285
+ return jsonify({"error": "No data provided"}), 400
286
+
287
+ agent_name = data.get("agent_name")
288
+ model_name = data.get("model_name")
289
+ model_type = data.get("model_type")
290
+ filename = data.get("filename")
291
+
292
+ if not all([agent_name, model_name]):
293
+ return jsonify({"error": "agent_name and model_name are required"}), 400
294
+
295
+ # Auto-detect model type if not provided
296
+ if not model_type:
297
+ model_type = detect_model_type(model_name)
298
+
299
+ # Validate model configuration
300
+ validation = validate_model_config(model_name, model_type)
301
+ if not validation["valid"]:
302
+ return jsonify({
303
+ "error": "Invalid model configuration",
304
+ "validation": validation
305
+ }), 400
306
+
307
+ # Get the agent from the current app context
308
+ from flask import current_app
309
+ agents = getattr(current_app, 'agents', {})
310
+
311
+ if agent_name not in agents:
312
+ return jsonify({
313
+ "error": f"Agent '{agent_name}' not found",
314
+ "available_agents": list(agents.keys())
315
+ }), 404
316
+
317
+ agent = agents[agent_name]
318
+
319
+ # Update the agent's model if it supports it
320
+ if hasattr(agent, 'update_model'):
321
+ agent.update_model(model_name, model_type)
322
+ message = f"Agent '{agent_name}' model updated to {model_name} ({model_type})"
323
+ elif hasattr(agent, 'model_loader'):
324
+ # Try to update the model loader
325
+ try:
326
+ from ..utils.model_manager import model_manager
327
+ agent.model_loader = model_manager.get_model_loader(model_name, model_type, filename)
328
+ message = f"Agent '{agent_name}' model loader updated to {model_name} ({model_type})"
329
+ except Exception as e:
330
+ return jsonify({
331
+ "error": f"Failed to update agent model loader: {str(e)}"
332
+ }), 500
333
+ else:
334
+ return jsonify({
335
+ "error": f"Agent '{agent_name}' does not support model switching"
336
+ }), 400
337
+
338
+ return jsonify({
339
+ "success": True,
340
+ "message": message,
341
+ "agent_name": agent_name,
342
+ "model_name": model_name,
343
+ "model_type": model_type,
344
+ "validation": validation
345
+ }), 200
346
+
347
+ except Exception as e:
348
+ logger.error(f"Failed to switch model: {str(e)}", exc_info=True)
349
+ return jsonify({
350
+ "success": False,
351
+ "error": f"Failed to switch model: {str(e)}"
352
+ }), 500
353
+
354
+ @model_management_bp.route('/health', methods=['GET'])
355
+ def model_health_check():
356
+ """
357
+ Health check for the model management system
358
+ """
359
+ try:
360
+ # Check if model manager is accessible
361
+ loaded_models = model_manager.list_loaded_models()
362
+
363
+ # Check GPU memory if available
364
+ gpu_info = {}
365
+ if torch.cuda.is_available():
366
+ gpu_info = {
367
+ "available": True,
368
+ "device_count": torch.cuda.device_count(),
369
+ "current_device": torch.cuda.current_device(),
370
+ "memory_allocated": f"{torch.cuda.memory_allocated() / 1024**3:.2f} GB",
371
+ "memory_reserved": f"{torch.cuda.memory_reserved() / 1024**3:.2f} GB"
372
+ }
373
+ else:
374
+ gpu_info = {"available": False}
375
+
376
+ return jsonify({
377
+ "success": True,
378
+ "status": "healthy",
379
+ "model_manager": "operational",
380
+ "loaded_models_count": len(loaded_models),
381
+ "gpu_info": gpu_info,
382
+ "timestamp": torch.cuda.Event(enable_timing=True).elapsed_time(torch.cuda.Event(enable_timing=True)) if torch.cuda.is_available() else None
383
+ }), 200
384
+
385
+ except Exception as e:
386
+ logger.error(f"Health check failed: {str(e)}", exc_info=True)
387
+ return jsonify({
388
+ "success": False,
389
+ "status": "unhealthy",
390
+ "error": f"Health check failed: {str(e)}"
391
+ }), 500
392
+
393
+ # Register the blueprint
394
+ def register_model_management_routes(app):
395
+ """Register model management routes with the Flask app"""
396
+ app.register_blueprint(model_management_bp)
397
+ logger.info("Model management routes registered successfully")
ai_med_extract/api/routes.py CHANGED
@@ -14,7 +14,6 @@ from transformers import (
14
  pipeline as transformers_pipeline
15
  )
16
  from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent
17
- agent = PatientSummarizerAgent(model_name="falconsai/medical_summarization")
18
  from ai_med_extract.agents.summarizer import SummarizerAgent
19
  from ai_med_extract.utils.file_utils import (
20
  allowed_file,
@@ -23,278 +22,89 @@ from ai_med_extract.utils.file_utils import (
23
  get_data_from_storage,
24
  )
25
  from ..utils.validation import clean_result, validate_patient_name
26
- # from ..utils.patient_summary_utils import clean_patient_data, flatten_to_string_list
27
-
28
- from ai_med_extract.utils.patient_summary_utils import clean_patient_data, flatten_to_string_list
29
  import time
30
 
31
- # Add GGUF model cache at the top of the file
32
- GGUF_MODEL_CACHE = {}
33
-
34
- def get_gguf_pipeline(model_name, filename=None):
35
- key = (model_name, filename)
36
- if key not in GGUF_MODEL_CACHE:
37
- try:
38
- from ai_med_extract.utils.model_loader_gguf import GGUFModelPipeline, create_fallback_pipeline
39
- import time
40
-
41
- # Add timeout for model loading
42
- start_time = time.time()
43
- timeout = 300 # 5 minutes timeout
44
-
45
- # Try to load the GGUF model
46
- try:
47
- GGUF_MODEL_CACHE[key] = GGUFModelPipeline(model_name, filename, timeout=timeout)
48
- load_time = time.time() - start_time
49
- print(f"[GGUF] Model loaded successfully in {load_time:.2f}s: {model_name}")
50
- except Exception as e:
51
- load_time = time.time() - start_time
52
- print(f"[GGUF] Failed to load model {model_name} after {load_time:.2f}s: {e}")
53
-
54
- # If model loading fails, use fallback
55
- print("[GGUF] Using fallback pipeline")
56
- GGUF_MODEL_CACHE[key] = create_fallback_pipeline()
57
-
58
- except Exception as e:
59
- print(f"[GGUF] Critical error in model loading: {e}")
60
- # Create a basic fallback
61
- from ai_med_extract.utils.model_loader_gguf import create_fallback_pipeline
62
- GGUF_MODEL_CACHE[key] = create_fallback_pipeline()
63
-
64
- return GGUF_MODEL_CACHE[key]
65
66
 
67
- def get_qa_pipeline(qa_model_type, qa_model_name):
 
68
  if not qa_model_type or not qa_model_name:
69
  raise ValueError("Both qa_model_type and qa_model_name must be provided")
70
-
71
 
72
- if not hasattr(get_qa_pipeline, "cache"):
73
- get_qa_pipeline.cache = {}
74
-
75
- # For Hugging Face Spaces, we need to be memory efficient
76
- import torch
77
- torch.cuda.empty_cache() # Clear GPU memory before loading model
 
 
 
 
78
 
79
- # Set default tensor type to float32 for better compatibility
80
- torch.set_default_tensor_type(torch.FloatTensor)
81
- if torch.cuda.is_available():
82
- torch.set_default_tensor_type(torch.cuda.FloatTensor)
 
 
 
 
 
 
83
 
84
- key = (qa_model_type, qa_model_name)
85
- if key in get_qa_pipeline.cache:
86
- return get_qa_pipeline.cache[key]
87
-
88
  try:
89
- # For Hugging Face Spaces, use smaller models by default
90
- if "Qwen/Qwen-7B-Chat" in qa_model_name:
91
- qa_model_name = "Qwen/Qwen-1_8B-Chat"
92
- elif "Llama" in qa_model_name:
93
- qa_model_name = "facebook/opt-125m"
94
-
95
- # Load tokenizer with trust_remote_code=True for custom tokenizers
96
- tokenizer = AutoTokenizer.from_pretrained(
97
- qa_model_name,
98
- trust_remote_code=True,
99
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
100
- )
101
-
102
- # Load model with memory optimizations
103
- try:
104
- model = AutoModelForCausalLM.from_pretrained(
105
- qa_model_name,
106
- device_map="auto",
107
- torch_dtype=torch.float32, # Use float32 for better compatibility
108
- trust_remote_code=True,
109
- low_cpu_mem_usage=True,
110
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
111
- )
112
- except Exception as e:
113
- # Try loading with a simpler model
114
- fallback_model = "facebook/bart-base"
115
- model = AutoModelForCausalLM.from_pretrained(
116
- fallback_model,
117
- device_map="auto",
118
- torch_dtype=torch.float32,
119
- trust_remote_code=True,
120
- low_cpu_mem_usage=True,
121
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
122
- )
123
-
124
- # Create pipeline with memory optimizations
125
- pipeline = transformers_pipeline(
126
- task=qa_model_type,
127
- model=model,
128
- tokenizer=tokenizer,
129
- device_map="auto",
130
- torch_dtype=torch.float32
131
- )
132
-
133
- get_qa_pipeline.cache[key] = pipeline
134
- return pipeline
135
-
136
  except Exception as e:
 
137
  raise
138
 
139
- def run_qa_pipeline(qa_pipeline, question, context):
140
  """
141
- Run QA pipeline for both 'question-answering', 'text-generation', or other models.
142
  """
143
  if not qa_pipeline or not question or not context:
144
  raise ValueError("Pipeline, question and context are required")
145
-
146
- qa_model_type = getattr(qa_pipeline, '_qa_model_type', None)
147
 
148
  try:
149
- if qa_model_type == 'text-generation':
 
 
150
  prompt = f"Question: {question}\nContext: {context}\nAnswer:"
151
- result = qa_pipeline(prompt, max_new_tokens=128, do_sample=False)
152
-
153
- if isinstance(result, list) and result and 'generated_text' in result[0]:
154
- answer = result[0]['generated_text'].split('Answer:')[-1].strip()
155
- return {'answer': answer}
156
- return {'answer': str(result)}
157
- else:
158
  result = qa_pipeline(question=question, context=context)
 
 
159
  return result
 
 
160
  except Exception as e:
 
161
  raise
162
 
163
- def get_ner_pipeline(ner_model_type, ner_model_name):
164
- if not ner_model_type or not ner_model_name:
165
- raise ValueError("Both ner_model_type and ner_model_name must be provided")
166
-
167
- if not hasattr(get_ner_pipeline, "cache"):
168
- get_ner_pipeline.cache = {}
169
-
170
- # For Hugging Face Spaces, we need to be memory efficient
171
- import torch
172
- torch.cuda.empty_cache() # Clear GPU memory before loading model
173
-
174
- # Set default tensor type
175
- torch.set_default_tensor_type(torch.FloatTensor)
176
- if torch.cuda.is_available():
177
- torch.set_default_tensor_type(torch.cuda.FloatTensor)
178
-
179
- key = (ner_model_type, ner_model_name)
180
- if key in get_ner_pipeline.cache:
181
- return get_ner_pipeline.cache[key]
182
-
183
- try:
184
- from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
185
-
186
- # Clear any existing models from memory
187
- if torch.cuda.is_available():
188
- torch.cuda.empty_cache()
189
-
190
- # Load tokenizer
191
- try:
192
- tokenizer = AutoTokenizer.from_pretrained(
193
- ner_model_name,
194
- trust_remote_code=True,
195
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
196
- )
197
- except Exception as e:
198
- # Try loading with a simpler model
199
- fallback_model = "dslim/bert-base-NER"
200
- tokenizer = AutoTokenizer.from_pretrained(
201
- fallback_model,
202
- trust_remote_code=True,
203
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
204
- )
205
-
206
- # Load model with memory optimizations
207
- try:
208
- # For NER models, we'll use CPU if device_map='auto' is not supported
209
- try:
210
- model = AutoModelForTokenClassification.from_pretrained(
211
- ner_model_name,
212
- trust_remote_code=True,
213
- device_map="auto",
214
- low_cpu_mem_usage=True,
215
- torch_dtype=torch.float32,
216
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
217
- )
218
- except ValueError as e:
219
- if "device_map='auto'" in str(e):
220
- model = AutoModelForTokenClassification.from_pretrained(
221
- ner_model_name,
222
- trust_remote_code=True,
223
- low_cpu_mem_usage=True,
224
- torch_dtype=torch.float32,
225
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
226
- )
227
- else:
228
- raise
229
- except Exception as e:
230
- # Try loading with a simpler model
231
- fallback_model = "dslim/bert-base-NER"
232
- model = AutoModelForTokenClassification.from_pretrained(
233
- fallback_model,
234
- trust_remote_code=True,
235
- low_cpu_mem_usage=True,
236
- torch_dtype=torch.float32,
237
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
238
- )
239
-
240
- # Create pipeline with appropriate device configuration
241
- try:
242
- qa_pipeline = pipeline(
243
- task=ner_model_type,
244
- model=model,
245
- tokenizer=tokenizer,
246
- device_map="auto",
247
- torch_dtype=torch.float32
248
- )
249
- except ValueError as e:
250
- if "device_map='auto'" in str(e):
251
- qa_pipeline = pipeline(
252
- task=ner_model_type,
253
- model=model,
254
- tokenizer=tokenizer,
255
- device=-1, # Use CPU
256
- torch_dtype=torch.float32
257
- )
258
- else:
259
- raise
260
-
261
- # Cache the pipeline
262
- get_ner_pipeline.cache[key] = qa_pipeline
263
- return qa_pipeline
264
-
265
- except Exception as e:
266
- raise
267
-
268
-
269
- def get_summarizer_pipeline(summarizer_model_type, summarizer_model_name):
270
- if not hasattr(get_summarizer_pipeline, "cache"):
271
- get_summarizer_pipeline.cache = {}
272
- key = (summarizer_model_type, summarizer_model_name)
273
- if key not in get_summarizer_pipeline.cache:
274
- import torch
275
- from transformers import pipeline
276
-
277
- # Use float16 only if CUDA is available, else use float32
278
- if torch.cuda.is_available():
279
- dtype = torch.float16
280
- device = 0
281
- device_map = "auto"
282
- else:
283
- dtype = torch.float32
284
- device = -1
285
- device_map = None
286
-
287
- get_summarizer_pipeline.cache[key] = pipeline(
288
- task=summarizer_model_type,
289
- model=summarizer_model_name,
290
- trust_remote_code=True,
291
- device=device,
292
- torch_dtype=dtype,
293
- **({"device_map": device_map} if device_map else {})
294
- )
295
- return get_summarizer_pipeline.cache[key]
296
-
297
-
298
  def register_routes(app, agents):
299
  from ai_med_extract.utils.openvino_summarizer_utils import (
300
  parse_ehr_chartsummarydtl, visits_sorted, compute_deltas, build_compact_baseline, delta_to_text, build_main_prompt, validate_and_compare_summaries
@@ -336,22 +146,31 @@ def register_routes(app, agents):
336
  # Model selection logic (model_name, model_type)
337
  model_name = data.get("model_name") or "microsoft/Phi-3-mini-4k-instruct"
338
  model_type = data.get("model_type") or "text-generation"
339
- # Use existing model loader abstraction
340
- if model_type == "text-generation":
341
- loader = agents.get("medical_data_extractor")
342
- else:
343
- loader = agents.get("patient_summarizer")
344
- pipeline = loader.model_loader.load() if hasattr(loader, "model_loader") else None
345
- if not pipeline:
346
- return jsonify({"error": "Model pipeline not available"}), 500
347
 
348
  # Run inference
349
  import torch
350
  torch.set_num_threads(2)
351
- inputs = pipeline.tokenizer([prompt], return_tensors="pt")
352
- outputs = pipeline.model.generate(**inputs, max_new_tokens=400, do_sample=False, pad_token_id=pipeline.tokenizer.eos_token_id or 32000)
353
- text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
354
- new_summary = text.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
 
 
 
 
 
 
 
 
 
 
 
355
 
356
  # Update state
357
  with state_lock:
@@ -369,6 +188,7 @@ def register_routes(app, agents):
369
  }), 200
370
  except Exception as e:
371
  return jsonify({"error": f"Failed to generate summary: {str(e)}"}), 500
 
372
  # Configure upload directory based on environment
373
  import os
374
 
@@ -391,7 +211,8 @@ def register_routes(app, agents):
391
  PHIScrubberAgent = agents["phi_scrubber"]
392
  Summarizer_Agent = agents["summarizer"]
393
  MedicalDataExtractorAgent = agents["medical_data_extractor"]
394
- whisper_model = agents["whisper_model"] # No longer needs to be called as a function
 
395
 
396
  @app.route("/upload", methods=["POST"])
397
  def upload_file():
@@ -619,7 +440,6 @@ def register_routes(app, agents):
619
  os.remove(temp_path)
620
  return jsonify({"error": str(e)}), 500
621
 
622
-
623
  def group_by_category(data):
624
  grouped = defaultdict(list)
625
  for item in data:
@@ -649,7 +469,7 @@ def register_routes(app, agents):
649
  return list(reversed(reversed_unique))
650
 
651
  def chunk_text(text, tokenizer, max_tokens=256, overlap=100):
652
- # Tokenize with memory optimizations
653
  input_ids = tokenizer.encode(
654
  text,
655
  add_special_tokens=False
@@ -714,7 +534,6 @@ def register_routes(app, agents):
714
 
715
  return extracted
716
 
717
-
718
  def process_chunk(generator, chunk, idx):
719
  prompt = f"""
720
  [INST] <<SYS>>
@@ -767,12 +586,20 @@ def register_routes(app, agents):
767
  torch.cuda.empty_cache()
768
 
769
  # Process with memory optimizations
770
- output = generator(
771
- prompt,
772
- max_new_tokens=1024, # Reduced from 1024 for memory efficiency
773
- do_sample=False, # Disable sampling for deterministic output
774
- temperature=0.3, # Lower temperature for more focused output
775
- )[0]["generated_text"]
 
 
 
 
 
 
 
 
776
 
777
  return idx, output
778
  except Exception as e:
@@ -792,27 +619,19 @@ def register_routes(app, agents):
792
  return jsonify({"error": "Missing 'extracted_data' in request"}), 400
793
 
794
  try:
795
- tokenizer = AutoTokenizer.from_pretrained(
796
- qa_model_name,
797
- trust_remote_code=True,
798
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
799
- )
800
-
801
- model = AutoModelForCausalLM.from_pretrained(
802
- qa_model_name,
803
- device_map="auto",
804
- torch_dtype=torch.float32,
805
- trust_remote_code=True,
806
- low_cpu_mem_usage=True,
807
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
808
- )
809
 
810
- generator = transformers_pipeline(
811
- task=qa_model_type,
812
- model=model,
813
- tokenizer=tokenizer,
814
- torch_dtype=torch.float32
815
- )
 
817
  except Exception as e:
818
  return jsonify({"error": f"Could not load model: {str(e)}"}), 500
@@ -856,7 +675,6 @@ def register_routes(app, agents):
856
  # Clean and group results for this file
857
  if all_extracted:
858
  deduped = deduplicate_extractions(all_extracted)
859
- # cleaned_json = clean_result()
860
  grouped_data = group_by_category(deduped)
861
  else:
862
  grouped_data = {"error": "No valid data extracted"}
@@ -873,8 +691,6 @@ def register_routes(app, agents):
873
  print("✅ Extraction complete.")
874
  return jsonify(structured_response)
875
 
876
-
877
-
878
  @app.route("/api/generate_summary", methods=["POST"])
879
  def generate_summary():
880
  data = request.json
@@ -886,7 +702,7 @@ def register_routes(app, agents):
886
  except Exception:
887
  clean_text = context
888
  try:
889
- summary = SummarizerAgent.generate_summary(Summarizer_Agent,clean_text)
890
  return jsonify({"summary": summary}), 200
891
  except Exception as e:
892
  return jsonify({"error": f"Summary generation failed: {str(e)}"}), 500
@@ -1005,14 +821,10 @@ def register_routes(app, agents):
1005
  "error": f"Request handling failed: {str(e)}"
1006
  }), 500
1007
 
1008
-
1009
-
1010
-
1011
-
1012
  @app.route('/generate_patient_summary', methods=['POST'])
1013
  def generate_patient_summary():
1014
  """
1015
- Enhanced: Uses OpenVINO-style prompt, delta, and validation logic for patient summary generation.
1016
  """
1017
  from ai_med_extract.utils.openvino_summarizer_utils import (
1018
  parse_ehr_chartsummarydtl, visits_sorted, compute_deltas, build_compact_baseline, delta_to_text, build_main_prompt, validate_and_compare_summaries
@@ -1084,93 +896,68 @@ def register_routes(app, agents):
1084
  delta_text = delta_to_text(delta)
1085
  prompt = build_main_prompt(old_summary, baseline, delta_text)
1086
  t_model_load_start = time.time()
1087
- # Model selection logic (supporting OpenVINO, HuggingFace, and GGUF)
1088
- pipeline = None
1089
- loader = None
1090
- import torch
1091
- torch.set_num_threads(2)
1092
- if model_type == "gguf":
1093
- try:
1094
- # Support both local path and HuggingFace repo/filename
1095
- if model_name.endswith('.gguf') and '/' in model_name:
1096
- repo_id, filename = model_name.rsplit('/', 1)
1097
- pipeline = get_gguf_pipeline(repo_id, filename)
1098
  else:
1099
- pipeline = get_gguf_pipeline(model_name)
1100
-
1101
- try:
1102
- # The timeout is now handled internally by the pipeline
1103
- summary_raw = pipeline.generate_full_summary(prompt, max_tokens=512, max_loops=1)
1104
-
1105
- # Extract markdown summary as with other models
1106
- new_summary = summary_raw.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
1107
- if not new_summary.strip():
1108
- new_summary = summary_raw # Use full output if split fails
1109
-
1110
- markdown_summary = summary_to_markdown(new_summary)
1111
- with state_lock:
1112
- patient_state["visits"] = all_visits
1113
- patient_state["last_summary"] = markdown_summary
1114
- validation_report = validate_and_compare_summaries(old_summary, markdown_summary, "Update")
1115
- # Remove undefined timing variables and only log steps that are actually measured
1116
- total_time = time.time() - start_total
1117
- print(f"[TIMING] API call: {t_api_end-t_api_start:.2f}s, TOTAL: {total_time:.2f}s")
1118
- return jsonify({
1119
- "summary": markdown_summary,
1120
- "validation": validation_report,
1121
- "baseline": baseline,
1122
- "delta": delta_text
1123
- }), 200
1124
- except TimeoutError as e:
1125
- return jsonify({"error": f"GGUF model generation timed out: {str(e)}"}), 408
1126
- except Exception as e:
1127
- return jsonify({"error": f"GGUF model generation failed: {str(e)}"}), 500
1128
-
1129
- except Exception as e:
1130
- return jsonify({"error": f"Failed to load GGUF model: {str(e)}"}), 500
1131
- elif model_type in {"text-generation", "causal-openvino"}:
1132
- # Try to use an existing loader if available
1133
- loader = agents.get("medical_data_extractor")
1134
- if not loader or getattr(loader, 'model_name', None) != model_name:
1135
- # Dynamically create OpenVINO loader if needed
1136
- from ai_med_extract.utils.model_loader_spaces import get_openvino_pipeline
1137
- try:
1138
- pipeline = get_openvino_pipeline(model_name)
1139
- except Exception as e:
1140
- return jsonify({"error": f"Failed to load OpenVINO pipeline: {str(e)}"}), 500
1141
- elif model_type == "summarization":
1142
- loader = agents.get("summarizer")
1143
- # Use loader if available
1144
- if not pipeline and loader and hasattr(loader, "model_loader"):
1145
- pipeline = loader.model_loader.load()
1146
- if not pipeline:
1147
- return jsonify({"error": "Model pipeline not available"}), 500
1148
- # GGUF pipeline uses a different interface
1149
- if model_type == "gguf":
1150
- try:
1151
- summary = pipeline.generate(prompt)
1152
- return jsonify({"summary": summary})
1153
- except Exception as e:
1154
- return jsonify({"error": f"GGUF model generation failed: {str(e)}"}), 500
1155
- inputs = pipeline.tokenizer([prompt], return_tensors="pt")
1156
- outputs = pipeline.model.generate(**inputs, max_new_tokens=500, do_sample=False, pad_token_id=pipeline.tokenizer.eos_token_id or 32000)
1157
- text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
1158
- new_summary = text.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
1159
- # For other models, after extracting new_summary:
1160
- markdown_summary = summary_to_markdown(new_summary)
1161
- with state_lock:
1162
- patient_state["visits"] = all_visits
1163
- patient_state["last_summary"] = markdown_summary
1164
- validation_report = validate_and_compare_summaries(old_summary, markdown_summary, "Update")
1165
- # Remove undefined timing variables and only log steps that are actually measured
1166
- total_time = time.time() - start_total
1167
- print(f"[TIMING] API call: {t_api_end-t_api_start:.2f}s, TOTAL: {total_time:.2f}s")
1168
- return jsonify({
1169
- "summary": markdown_summary,
1170
- "validation": validation_report,
1171
- "baseline": baseline,
1172
- "delta": delta_text
1173
- }), 200
1174
  except requests.exceptions.Timeout:
1175
  return jsonify({"error": "Request to EHR API timed out"}), 504
1176
  except requests.exceptions.RequestException as e:
@@ -1183,7 +970,6 @@ def register_routes(app, agents):
1183
  def home():
1184
  return "Medical Data Extraction API is running!", 200
1185
 
1186
-
1187
  def summary_to_markdown(summary):
1188
  import re
1189
  # Remove '- answer:' and similar artifacts
 
14
  pipeline as transformers_pipeline
15
  )
16
  from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent
 
17
  from ai_med_extract.agents.summarizer import SummarizerAgent
18
  from ai_med_extract.utils.file_utils import (
19
  allowed_file,
 
22
  get_data_from_storage,
23
  )
24
  from ..utils.validation import clean_result, validate_patient_name
25
+ from ai_med_extract.utils.patient_summary_utils import clean_patient_data, flatten_to_string_list
 
 
26
  import time
27
 
28
+ # Configure logging
29
+ logging.basicConfig(level=logging.INFO)
30
+ logger = logging.getLogger(__name__)
31
 
32
+ def get_model_pipeline(model_name: str, model_type: str, filename: str = None):
33
+ """
34
+ Unified function to get any model pipeline using the unified model manager
35
+ """
36
+ try:
37
+ from ..utils.model_manager import model_manager
38
+
39
+ # Get the model loader
40
+ loader = model_manager.get_model_loader(model_name, model_type, filename)
41
+
42
+ # Return the loaded pipeline
43
+ return loader.load()
44
+
45
+ except Exception as e:
46
+ logger.error(f"Failed to get model pipeline for {model_name} ({model_type}): {e}")
47
+ raise RuntimeError(f"Model pipeline creation failed: {str(e)}")
48
 
49
+ def get_qa_pipeline(qa_model_type: str, qa_model_name: str):
50
+ """Get QA pipeline using unified model manager"""
51
  if not qa_model_type or not qa_model_name:
52
  raise ValueError("Both qa_model_type and qa_model_name must be provided")
 
53
 
54
+ try:
55
+ return get_model_pipeline(qa_model_name, qa_model_type)
56
+ except Exception as e:
57
+ logger.error(f"QA pipeline creation failed: {e}")
58
+ raise
59
+
60
+ def get_ner_pipeline(ner_model_type: str, ner_model_name: str):
61
+ """Get NER pipeline using unified model manager"""
62
+ if not ner_model_type or not ner_model_name:
63
+ raise ValueError("Both ner_model_type and ner_model_name must be provided")
64
 
65
+ try:
66
+ return get_model_pipeline(ner_model_name, ner_model_type)
67
+ except Exception as e:
68
+ logger.error(f"NER pipeline creation failed: {e}")
69
+ raise
70
+
71
+ def get_summarizer_pipeline(summarizer_model_type: str, summarizer_model_name: str):
72
+ """Get summarizer pipeline using unified model manager"""
73
+ if not summarizer_model_type or not summarizer_model_name:
74
+ raise ValueError("Both summarizer_model_type and summarizer_model_name must be provided")
75
76
  try:
77
+ return get_model_pipeline(summarizer_model_name, summarizer_model_type)
78
  except Exception as e:
79
+ logger.error(f"Summarizer pipeline creation failed: {e}")
80
  raise
81
 
82
+ def run_qa_pipeline(qa_pipeline, question: str, context: str):
83
  """
84
+ Run QA pipeline for any model type
85
  """
86
  if not qa_pipeline or not question or not context:
87
  raise ValueError("Pipeline, question and context are required")
 
 
88
 
89
  try:
90
+ # Handle different pipeline types
91
+ if hasattr(qa_pipeline, 'generate'):
92
+ # Custom pipeline with generate method
93
  prompt = f"Question: {question}\nContext: {context}\nAnswer:"
94
+ result = qa_pipeline.generate(prompt, max_new_tokens=128)
95
+ return {'answer': result}
96
+ elif hasattr(qa_pipeline, '__call__'):
97
+ # Standard transformers pipeline
98
  result = qa_pipeline(question=question, context=context)
99
+ if isinstance(result, list) and result:
100
+ return result[0]
101
  return result
102
+ else:
103
+ raise ValueError("Unsupported pipeline type")
104
  except Exception as e:
105
+ logger.error(f"QA pipeline execution failed: {e}")
106
  raise
107
108
  def register_routes(app, agents):
109
  from ai_med_extract.utils.openvino_summarizer_utils import (
110
  parse_ehr_chartsummarydtl, visits_sorted, compute_deltas, build_compact_baseline, delta_to_text, build_main_prompt, validate_and_compare_summaries
 
146
  # Model selection logic (model_name, model_type)
147
  model_name = data.get("model_name") or "microsoft/Phi-3-mini-4k-instruct"
148
  model_type = data.get("model_type") or "text-generation"
149
+
150
+ # Use unified model manager
151
+ try:
152
+ pipeline = get_model_pipeline(model_name, model_type)
153
+ except Exception as e:
154
+ return jsonify({"error": f"Model loading failed: {str(e)}"}), 500
 
 
155
 
156
  # Run inference
157
  import torch
158
  torch.set_num_threads(2)
159
+
160
+ if hasattr(pipeline, 'generate'):
161
+ # Custom pipeline with generate method
162
+ new_summary = pipeline.generate(prompt, max_new_tokens=400)
163
+ else:
164
+ # Standard transformers pipeline
165
+ inputs = pipeline.tokenizer([prompt], return_tensors="pt")
166
+ outputs = pipeline.model.generate(
167
+ **inputs,
168
+ max_new_tokens=400,
169
+ do_sample=False,
170
+ pad_token_id=pipeline.tokenizer.eos_token_id or 32000
171
+ )
172
+ text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
173
+ new_summary = text.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
174
 
175
  # Update state
176
  with state_lock:
 
188
  }), 200
189
  except Exception as e:
190
  return jsonify({"error": f"Failed to generate summary: {str(e)}"}), 500
191
+
192
  # Configure upload directory based on environment
193
  import os
194
 
 
211
  PHIScrubberAgent = agents["phi_scrubber"]
212
  Summarizer_Agent = agents["summarizer"]
213
  MedicalDataExtractorAgent = agents["medical_data_extractor"]
214
+ whisper_model = agents["whisper_model"]
215
+ model_manager = agents.get("model_manager")
216
 
217
  @app.route("/upload", methods=["POST"])
218
  def upload_file():
 
440
  os.remove(temp_path)
441
  return jsonify({"error": str(e)}), 500
442
 
 
443
  def group_by_category(data):
444
  grouped = defaultdict(list)
445
  for item in data:
 
469
  return list(reversed(reversed_unique))
470
 
471
  def chunk_text(text, tokenizer, max_tokens=256, overlap=100):
472
+ # Tokenize with memory optimizations
473
  input_ids = tokenizer.encode(
474
  text,
475
  add_special_tokens=False
 
534
 
535
  return extracted
536
 
 
537
  def process_chunk(generator, chunk, idx):
538
  prompt = f"""
539
  [INST] <<SYS>>
 
586
  torch.cuda.empty_cache()
587
 
588
  # Process with memory optimizations
589
+ if hasattr(generator, 'generate'):
590
+ output = generator.generate(
591
+ prompt,
592
+ max_new_tokens=1024,
593
+ do_sample=False,
594
+ temperature=0.3,
595
+ )
596
+ else:
597
+ output = generator(
598
+ prompt,
599
+ max_new_tokens=1024,
600
+ do_sample=False,
601
+ temperature=0.3,
602
+ )[0]["generated_text"]
603
 
604
  return idx, output
605
  except Exception as e:
 
619
  return jsonify({"error": "Missing 'extracted_data' in request"}), 400
620
 
621
  try:
622
+ # Use unified model manager
623
+ generator = get_model_pipeline(qa_model_name, qa_model_type)
 
 
 
 
 
 
 
 
 
 
 
 
624
 
625
+ # Get tokenizer for chunking
626
+ if hasattr(generator, 'tokenizer'):
627
+ tokenizer = generator.tokenizer
628
+ else:
629
+ # Load tokenizer separately if needed
630
+ tokenizer = AutoTokenizer.from_pretrained(
631
+ qa_model_name,
632
+ trust_remote_code=True,
633
+ cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
634
+ )
635
 
636
  except Exception as e:
637
  return jsonify({"error": f"Could not load model: {str(e)}"}), 500
 
675
  # Clean and group results for this file
676
  if all_extracted:
677
  deduped = deduplicate_extractions(all_extracted)
 
678
  grouped_data = group_by_category(deduped)
679
  else:
680
  grouped_data = {"error": "No valid data extracted"}
 
691
  print("✅ Extraction complete.")
692
  return jsonify(structured_response)
693
 
 
 
694
  @app.route("/api/generate_summary", methods=["POST"])
695
  def generate_summary():
696
  data = request.json
 
702
  except Exception:
703
  clean_text = context
704
  try:
705
+ summary = SummarizerAgent.generate_summary(Summarizer_Agent, clean_text)
706
  return jsonify({"summary": summary}), 200
707
  except Exception as e:
708
  return jsonify({"error": f"Summary generation failed: {str(e)}"}), 500
 
821
  "error": f"Request handling failed: {str(e)}"
822
  }), 500
823
824
  @app.route('/generate_patient_summary', methods=['POST'])
825
  def generate_patient_summary():
826
  """
827
+ Enhanced: Uses unified model manager for any model type including GGUF for patient summary generation.
828
  """
829
  from ai_med_extract.utils.openvino_summarizer_utils import (
830
  parse_ehr_chartsummarydtl, visits_sorted, compute_deltas, build_compact_baseline, delta_to_text, build_main_prompt, validate_and_compare_summaries
 
896
  delta_text = delta_to_text(delta)
897
  prompt = build_main_prompt(old_summary, baseline, delta_text)
898
  t_model_load_start = time.time()
899
+
900
+ # Use unified model manager for any model type
901
+ try:
902
+ # Handle GGUF models with filename extraction
903
+ filename = None
904
+ if model_type == "gguf" and '/' in model_name:
905
+ if model_name.endswith('.gguf'):
906
+ # Repo path with an explicit .gguf file, e.g. "org/repo/model.gguf": split off the filename
907
+ model_name, filename = model_name.rsplit('/', 1)
908
  else:
909
+ # Plain repo id with no .gguf filename: let the GGUF loader choose a default file
910
+ pass
914
+
915
+ pipeline = get_model_pipeline(model_name, model_type, filename)
916
+
917
+ # Generate summary based on model type
918
+ if model_type == "gguf" and hasattr(pipeline, 'generate_full_summary'):
919
+ summary_raw = pipeline.generate_full_summary(prompt, max_tokens=512, max_loops=1)
920
+ new_summary = summary_raw.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
921
+ if not new_summary.strip():
922
+ new_summary = summary_raw
923
+ else:
924
+ # Standard generation for other model types
925
+ if hasattr(pipeline, 'generate'):
926
+ new_summary = pipeline.generate(prompt, max_new_tokens=500)
927
+ else:
928
+ # Transformers pipeline
929
+ inputs = pipeline.tokenizer([prompt], return_tensors="pt")
930
+ outputs = pipeline.model.generate(
931
+ **inputs,
932
+ max_new_tokens=500,
933
+ do_sample=False,
934
+ pad_token_id=pipeline.tokenizer.eos_token_id or 32000
935
+ )
936
+ text = pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
937
+ new_summary = text.split("Now generate the complete, updated clinical summary with all four sections in a markdown format:")[-1].strip()
938
+
939
+ # Convert to markdown and update state
940
+ markdown_summary = summary_to_markdown(new_summary)
941
+ with state_lock:
942
+ patient_state["visits"] = all_visits
943
+ patient_state["last_summary"] = markdown_summary
944
+
945
+ validation_report = validate_and_compare_summaries(old_summary, markdown_summary, "Update")
946
+
947
+ total_time = time.time() - start_total
948
+ print(f"[TIMING] API call: {t_api_end-t_api_start:.2f}s, TOTAL: {total_time:.2f}s")
949
+
950
+ return jsonify({
951
+ "summary": markdown_summary,
952
+ "validation": validation_report,
953
+ "baseline": baseline,
954
+ "delta": delta_text
955
+ }), 200
956
+
957
+ except Exception as e:
958
+ logger.error(f"Model processing failed: {str(e)}", exc_info=True)
959
+ return jsonify({"error": f"Model processing failed: {str(e)}"}), 500
960
+
961
  except requests.exceptions.Timeout:
962
  return jsonify({"error": "Request to EHR API timed out"}), 504
963
  except requests.exceptions.RequestException as e:
 
970
  def home():
971
  return "Medical Data Extraction API is running!", 200
972
 
 
973
  def summary_to_markdown(summary):
974
  import re
975
  # Remove '- answer:' and similar artifacts
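To make the new unified helpers in routes.py concrete, here is a small usage sketch of `get_model_pipeline` as defined in the hunks above. It is illustrative only: it assumes the package is importable and the referenced checkpoints can be downloaded, and GGUF loading additionally requires the llama.cpp bindings:

```python
from ai_med_extract.api.routes import get_model_pipeline

# Any (name, type) pair is accepted; the unified manager picks the matching loader
summarizer = get_model_pipeline("Falconsai/medical_summarization", "summarization")
print(summarizer("Patient admitted with chest pain, troponin negative, discharged on aspirin."))

# GGUF models take an optional filename (hypothetical file shown); on failure the
# manager falls back as described in the model_manager hunks below
gguf = get_model_pipeline(
    "microsoft/Phi-3-mini-4k-instruct-gguf",
    "gguf",
    filename="Phi-3-mini-4k-instruct-q4.gguf",
)
```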
ai_med_extract/app.py CHANGED
@@ -11,9 +11,9 @@ from .agents.summarizer import SummarizerAgent
11
  from .agents.medical_data_extractor import MedicalDataExtractorAgent
12
  from .agents.medical_data_extractor import MedicalDocDataExtractorAgent
13
  from .agents.patient_summary_agent import PatientSummarizerAgent
 
14
  import torch
15
 
16
-
17
  # Load environment variables
18
  load_dotenv()
19
 
@@ -50,7 +50,6 @@ app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024 # 100 MB max file size
50
 
51
  # Set cache directories
52
  CACHE_DIRS = {
53
- 'HF_HOME': '/tmp/huggingface',
54
  'HF_HOME': '/tmp/huggingface',
55
  'XDG_CACHE_HOME': '/tmp',
56
  'TORCH_HOME': '/tmp/torch',
@@ -61,79 +60,7 @@ for env_var, path in CACHE_DIRS.items():
61
  os.environ[env_var] = path
62
  os.makedirs(path, exist_ok=True)
63
 
64
- # Model loaders
65
- class LazyModelLoader:
66
- def __init__(self, model_name, model_type, fallback_model=None, max_retries=2):
67
- self.model_name = model_name
68
- self.model_type = model_type
69
- self.fallback_model = fallback_model
70
- self._model = None
71
- self._tokenizer = None
72
- self._pipeline = None
73
- self._retries = 0
74
- self.max_retries = max_retries
75
-
76
- def load(self):
77
- from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
78
- import torch
79
-
80
- if self._pipeline is None:
81
- try:
82
- logging.info(f"Loading model: {self.model_name} (attempt {self._retries + 1})")
83
- torch.cuda.empty_cache()
84
-
85
- self._tokenizer = AutoTokenizer.from_pretrained(
86
- self.model_name,
87
- trust_remote_code=True,
88
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
89
- )
90
-
91
- if self.model_type == "text-generation":
92
- self._model = AutoModelForCausalLM.from_pretrained(
93
- self.model_name,
94
- trust_remote_code=True,
95
- device_map="auto",
96
- low_cpu_mem_usage=True,
97
- torch_dtype=torch.float16,
98
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
99
- )
100
- else:
101
- dtype = torch.float16 if torch.cuda.is_available() else torch.float32
102
- self._model = AutoModelForSeq2SeqLM.from_pretrained(
103
- self.model_name,
104
- trust_remote_code=True,
105
- device_map="auto",
106
- low_cpu_mem_usage=True,
107
- torch_dtype=dtype,
108
- cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
109
- )
110
-
111
- device = 0 if torch.cuda.is_available() else -1
112
- self._pipeline = pipeline(
113
- task=self.model_type,
114
- model=self._model,
115
- tokenizer=self._tokenizer,
116
- )
117
- logging.info(f"Model loaded successfully: {self.model_name}")
118
- return self._pipeline
119
-
120
- except Exception as e:
121
- logging.error(f"Error loading model '{self.model_name}': {e}", exc_info=True)
122
- self._retries += 1
123
-
124
- if self._retries >= self.max_retries:
125
- raise RuntimeError(f"Exceeded retry limit for model: {self.model_name}")
126
-
127
- # Attempt fallback if it's different from current
128
- if self.fallback_model and self.fallback_model != self.model_name:
129
- logging.warning(f"Falling back to model: {self.fallback_model}")
130
- self.model_name = self.fallback_model
131
- return self.load()
132
- else:
133
- raise RuntimeError(f"Fallback failed or not set for model: {self.model_name}")
134
- return self._pipeline
135
-
136
-
137
  class WhisperModelLoader:
138
  _instance = None
139
 
@@ -164,25 +91,22 @@ class WhisperModelLoader:
164
  model = self.load()
165
  return model.transcribe(audio_path)
166
 
167
- # Initialize agents
168
  try:
169
- # Use smaller models for Hugging Face Spaces
170
- medical_data_extractor_model_loader = LazyModelLoader(
171
- "facebook/bart-base", # Start with a smaller model
172
- "text-generation",
173
- fallback_model="facebook/bart-large-cnn"
174
- )
175
- summarization_model_loader = LazyModelLoader(
176
- "Falconsai/medical_summarization", # ✅ Known working
177
- "summarization",
178
- fallback_model="Falconsai/medical_summarization"
179
- )
180
-
181
- # Initialize agents with lazy loading
182
  text_extractor_agent = TextExtractorAgent()
183
  phi_scrubber_agent = PHIScrubberAgent()
184
- medical_data_extractor_agent = MedicalDataExtractorAgent(medical_data_extractor_model_loader)
185
- summarizer_agent = SummarizerAgent(summarization_model_loader)
 
 
 
 
 
 
 
 
 
186
 
187
  # Pass all agents and models to routes
188
  agents = {
@@ -191,12 +115,15 @@ try:
191
  "summarizer": summarizer_agent,
192
  "medical_data_extractor": medical_data_extractor_agent,
193
  "whisper_model": WhisperModelLoader.get_instance(),
194
- "patient_summarizer": PatientSummarizerAgent(model_name="falconsai/medical_summarization",),
 
195
  }
196
 
197
  from .api.routes import register_routes
198
  register_routes(app, agents)
199
 
 
 
200
  except Exception as e:
201
  logging.error(f"Failed to initialize application: {str(e)}", exc_info=True)
202
  raise
 
11
  from .agents.medical_data_extractor import MedicalDataExtractorAgent
12
  from .agents.medical_data_extractor import MedicalDocDataExtractorAgent
13
  from .agents.patient_summary_agent import PatientSummarizerAgent
14
+ from .utils.model_manager import model_manager
15
  import torch
16
 
 
17
  # Load environment variables
18
  load_dotenv()
19
 
 
50
 
51
  # Set cache directories
52
  CACHE_DIRS = {
 
53
  'HF_HOME': '/tmp/huggingface',
54
  'XDG_CACHE_HOME': '/tmp',
55
  'TORCH_HOME': '/tmp/torch',
 
60
  os.environ[env_var] = path
61
  os.makedirs(path, exist_ok=True)
62
 
63
+ # WhisperModelLoader for audio transcription
64
  class WhisperModelLoader:
65
  _instance = None
66
 
 
91
  model = self.load()
92
  return model.transcribe(audio_path)
93
 
94
+ # Initialize agents with unified model manager
95
  try:
96
+ # Initialize basic agents that don't require specific models
97
  text_extractor_agent = TextExtractorAgent()
98
  phi_scrubber_agent = PHIScrubberAgent()
99
+
100
+ # Initialize model-dependent agents with unified model manager
101
+ # These will be loaded dynamically when needed
102
+ medical_data_extractor_agent = MedicalDataExtractorAgent(None) # Will be set dynamically
103
+ summarizer_agent = SummarizerAgent(None) # Will be set dynamically
104
+
105
+ # Initialize patient summarizer with unified model manager support
106
+ patient_summarizer_agent = PatientSummarizerAgent(
107
+ model_name="falconsai/medical_summarization",
108
+ model_type="summarization"
109
+ )
110
 
111
  # Pass all agents and models to routes
112
  agents = {
 
115
  "summarizer": summarizer_agent,
116
  "medical_data_extractor": medical_data_extractor_agent,
117
  "whisper_model": WhisperModelLoader.get_instance(),
118
+ "patient_summarizer": patient_summarizer_agent,
119
+ "model_manager": model_manager, # Add unified model manager
120
  }
121
 
122
  from .api.routes import register_routes
123
  register_routes(app, agents)
124
 
125
+ logging.info("Application initialized successfully with unified model manager")
126
+
127
  except Exception as e:
128
  logging.error(f"Failed to initialize application: {str(e)}", exc_info=True)
129
  raise
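Because `MedicalDataExtractorAgent` and `SummarizerAgent` are now constructed with `None` and wired up later, a short sketch of attaching a concrete loader at runtime through the shared manager may help. The `model_loader` attribute name follows the switch-model endpoint above; the snippet is an assumption rather than code from this commit:

```python
from ai_med_extract.utils.model_manager import model_manager

# Attach a concrete loader to an agent after startup
# ('summarizer_agent' stands for the instance created in app.py above; attribute name assumed)
loader = model_manager.get_model_loader("Falconsai/medical_summarization", "summarization")
summarizer_agent.model_loader = loader

# The loader itself can also generate directly
print(loader.generate("Summarize: patient with type 2 diabetes on metformin, HbA1c 7.2%."))
```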
ai_med_extract/utils/model_config.py CHANGED
@@ -0,0 +1,165 @@
1
+ """
2
+ Model configuration for the unified model manager
3
+ Defines default models, fallback options, and model type mappings
4
+ """
5
+
6
+ # Default models for different tasks
7
+ DEFAULT_MODELS = {
8
+ "text-generation": {
9
+ "primary": "facebook/bart-base",
10
+ "fallback": "facebook/bart-large-cnn",
11
+ "description": "Text generation models for QA and medical data extraction"
12
+ },
13
+ "summarization": {
14
+ "primary": "Falconsai/medical_summarization",
15
+ "fallback": "facebook/bart-large-cnn",
16
+ "description": "Text summarization models for medical reports"
17
+ },
18
+ "ner": {
19
+ "primary": "dslim/bert-base-NER",
20
+ "fallback": "dslim/bert-base-NER",
21
+ "description": "Named Entity Recognition for medical entities"
22
+ },
23
+ "gguf": {
24
+ "primary": "microsoft/Phi-3-mini-4k-instruct-gguf",
25
+ "fallback": "microsoft/Phi-3-mini-4k-instruct-gguf",
26
+ "description": "GGUF models for patient summaries and medical tasks"
27
+ },
28
+ "openvino": {
29
+ "primary": "microsoft/Phi-3-mini-4k-instruct",
30
+ "fallback": "microsoft/Phi-3-mini-4k-instruct",
31
+ "description": "OpenVINO optimized models"
32
+ }
33
+ }
34
+
35
+ # Model type mappings for automatic detection
36
+ MODEL_TYPE_MAPPINGS = {
37
+ # GGUF models
38
+ ".gguf": "gguf",
39
+ "gguf": "gguf",
40
+
41
+ # OpenVINO models
42
+ "openvino": "openvino",
43
+ "ov": "openvino",
44
+
45
+ # Transformers models
46
+ "text-generation": "text-generation",
47
+ "summarization": "summarization",
48
+ "ner": "ner",
49
+ "question-answering": "text-generation",
50
+ "translation": "text-generation"
51
+ }
52
+
53
+ # Memory-optimized models for Hugging Face Spaces
54
+ SPACES_OPTIMIZED_MODELS = {
55
+ "text-generation": "facebook/bart-base",
56
+ "summarization": "Falconsai/medical_summarization",
57
+ "ner": "dslim/bert-base-NER",
58
+ "gguf": "microsoft/Phi-3-mini-4k-instruct-gguf"
59
+ }
60
+
61
+ # Model validation rules
62
+ MODEL_VALIDATION_RULES = {
63
+ "text-generation": {
64
+ "min_tokens": 100,
65
+ "max_tokens": 2048,
66
+ "supported_formats": ["huggingface", "local"]
67
+ },
68
+ "summarization": {
69
+ "min_tokens": 50,
70
+ "max_tokens": 1024,
71
+ "supported_formats": ["huggingface", "local"]
72
+ },
73
+ "ner": {
74
+ "min_tokens": 50,
75
+ "max_tokens": 512,
76
+ "supported_formats": ["huggingface", "local"]
77
+ },
78
+ "gguf": {
79
+ "min_tokens": 100,
80
+ "max_tokens": 4096,
81
+ "supported_formats": ["huggingface", "local", "remote"]
82
+ },
83
+ "openvino": {
84
+ "min_tokens": 100,
85
+ "max_tokens": 2048,
86
+ "supported_formats": ["huggingface", "local"]
87
+ }
88
+ }
89
+
90
+ def get_default_model(model_type: str, use_spaces_optimized: bool = False) -> str:
91
+ """Get the default model for a given type"""
92
+ if use_spaces_optimized and model_type in SPACES_OPTIMIZED_MODELS:
93
+ return SPACES_OPTIMIZED_MODELS[model_type]
94
+
95
+ if model_type in DEFAULT_MODELS:
96
+ return DEFAULT_MODELS[model_type]["primary"]
97
+
98
+ # Fallback to text-generation if type not found
99
+ return DEFAULT_MODELS["text-generation"]["primary"]
100
+
101
+ def get_fallback_model(model_type: str) -> str:
102
+ """Get the fallback model for a given type"""
103
+ if model_type in DEFAULT_MODELS:
104
+ return DEFAULT_MODELS[model_type]["fallback"]
105
+
106
+ return DEFAULT_MODELS["text-generation"]["fallback"]
107
+
108
+ def detect_model_type(model_name: str) -> str:
109
+ """Automatically detect model type from model name"""
110
+ model_name_lower = model_name.lower()
111
+
112
+ # Check for explicit type indicators
113
+ for indicator, model_type in MODEL_TYPE_MAPPINGS.items():
114
+ if indicator in model_name_lower:
115
+ return model_type
116
+
117
+ # Check file extensions
118
+ if model_name.endswith('.gguf'):
119
+ return "gguf"
120
+
121
+ # Default to text-generation for unknown types
122
+ return "text-generation"
123
+
124
+ def validate_model_config(model_name: str, model_type: str) -> dict:
125
+ """Validate model configuration and return validation result"""
126
+ result = {
127
+ "valid": True,
128
+ "warnings": [],
129
+ "errors": [],
130
+ "recommendations": []
131
+ }
132
+
133
+ # Check if model type is supported
134
+ if model_type not in MODEL_VALIDATION_RULES:
135
+ result["valid"] = False
136
+ result["errors"].append(f"Unsupported model type: {model_type}")
137
+ return result
138
+
139
+ # Check model name format
140
+ if model_type == "gguf":
141
+ if not (model_name.endswith('.gguf') or '/' in model_name):
142
+ result["warnings"].append("GGUF model should have .gguf extension or be in repo/filename format")
143
+
144
+ # Check for memory optimization recommendations
145
+ if model_type in ["text-generation", "summarization"]:
146
+ if "large" in model_name.lower() or "xl" in model_name.lower():
147
+ result["warnings"].append("Large models may cause memory issues on limited resources")
148
+ result["recommendations"].append("Consider using a smaller model for better performance")
149
+
150
+ return result
151
+
152
+ def get_model_info(model_name: str, model_type: str) -> dict:
153
+ """Get comprehensive information about a model configuration"""
154
+ validation = validate_model_config(model_name, model_type)
155
+
156
+ return {
157
+ "model_name": model_name,
158
+ "model_type": model_type,
159
+ "detected_type": detect_model_type(model_name),
160
+ "default_model": get_default_model(model_type),
161
+ "fallback_model": get_fallback_model(model_type),
162
+ "validation": validation,
163
+ "supported_formats": MODEL_VALIDATION_RULES.get(model_type, {}).get("supported_formats", []),
164
+ "description": DEFAULT_MODELS.get(model_type, {}).get("description", "Unknown model type")
165
+ }
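A quick sketch of what the helpers in model_config.py return; the values follow directly from the tables and rules defined above, and the model names are only examples:

```python
from ai_med_extract.utils.model_config import (
    detect_model_type, get_default_model, validate_model_config,
)

print(detect_model_type("microsoft/Phi-3-mini-4k-instruct-gguf"))     # -> "gguf"
print(detect_model_type("some-org/clinical-bart"))                    # -> "text-generation" (default)
print(get_default_model("summarization", use_spaces_optimized=True))  # -> "Falconsai/medical_summarization"

report = validate_model_config("facebook/bart-large-cnn", "summarization")
print(report["warnings"])  # warns that "large" checkpoints may strain limited memory
```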
ai_med_extract/utils/model_manager.py ADDED
@@ -0,0 +1,408 @@
1
+ import os
2
+ import logging
3
+ import torch
4
+ from typing import Dict, Any, Optional, Union, Tuple
5
+ from abc import ABC, abstractmethod
6
+ import time
7
+
8
+ # Configure logging
9
+ logging.basicConfig(level=logging.INFO)
10
+ logger = logging.getLogger(__name__)
11
+
12
+ class BaseModelLoader(ABC):
13
+ """Abstract base class for model loaders"""
14
+
15
+ @abstractmethod
16
+ def load(self) -> Any:
17
+ """Load and return the model"""
18
+ pass
19
+
20
+ @abstractmethod
21
+ def generate(self, prompt: str, **kwargs) -> str:
22
+ """Generate text from prompt"""
23
+ pass
24
+
25
+ @abstractmethod
26
+ def get_model_info(self) -> Dict[str, Any]:
27
+ """Get model information"""
28
+ pass
29
+
30
+ class TransformersModelLoader(BaseModelLoader):
31
+ """Loader for Hugging Face Transformers models"""
32
+
33
+ def __init__(self, model_name: str, model_type: str, device: Optional[str] = None):
34
+ self.model_name = model_name
35
+ self.model_type = model_type
36
+ self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
37
+ self._model = None
38
+ self._tokenizer = None
39
+ self._pipeline = None
40
+
41
+ def load(self):
42
+ if self._pipeline is None:
43
+ try:
44
+ from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM
45
+
46
+ logger.info(f"Loading Transformers model: {self.model_name} ({self.model_type})")
47
+ torch.cuda.empty_cache()
48
+
49
+ # Load tokenizer
50
+ self._tokenizer = AutoTokenizer.from_pretrained(
51
+ self.model_name,
52
+ trust_remote_code=True,
53
+ cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
54
+ )
55
+
56
+ if self._tokenizer.pad_token is None:
57
+ self._tokenizer.pad_token = self._tokenizer.eos_token
58
+
59
+ # Load model based on type
60
+ if self.model_type == "text-generation":
61
+ self._model = AutoModelForCausalLM.from_pretrained(
62
+ self.model_name,
63
+ trust_remote_code=True,
64
+ device_map="auto" if self.device == "cuda" else None,
65
+ low_cpu_mem_usage=True,
66
+ torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
67
+ cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
68
+ )
69
+ else:
70
+ self._model = AutoModelForSeq2SeqLM.from_pretrained(
71
+ self.model_name,
72
+ trust_remote_code=True,
73
+ device_map="auto" if self.device == "cuda" else None,
74
+ low_cpu_mem_usage=True,
75
+ torch_dtype=torch.float16 if self.device == "cuda" else torch.float32,
76
+ cache_dir=os.environ.get('HF_HOME', '/tmp/huggingface')
77
+ )
78
+
79
+ # Create pipeline
80
+ device_id = 0 if self.device == "cuda" else -1
81
+ self._pipeline = pipeline(
82
+ task=self.model_type,
83
+ model=self._model,
84
+ tokenizer=self._tokenizer,
85
+ device=device_id
86
+ )
87
+
88
+ logger.info(f"Transformers model loaded successfully: {self.model_name}")
89
+
90
+ except Exception as e:
91
+ logger.error(f"Failed to load Transformers model: {e}")
92
+ raise RuntimeError(f"Transformers model loading failed: {str(e)}")
93
+
94
+ return self._pipeline
95
+
96
+ def generate(self, prompt: str, **kwargs) -> str:
97
+ pipeline = self.load()
98
+
99
+ try:
100
+ if self.model_type == "text-generation":
101
+ result = pipeline(
102
+ prompt,
103
+ max_new_tokens=kwargs.get('max_new_tokens', 512),
104
+ do_sample=kwargs.get('do_sample', False),
105
+ temperature=kwargs.get('temperature', 0.7),
106
+ pad_token_id=self._tokenizer.eos_token_id
107
+ )
108
+ if isinstance(result, list) and result:
109
+ return result[0].get('generated_text', '').replace(prompt, '').strip()
110
+ return str(result)
111
+ else:
112
+ result = pipeline(
113
+ prompt,
114
+ max_length=kwargs.get('max_length', 512),
115
+ min_length=kwargs.get('min_length', 50),
116
+ do_sample=kwargs.get('do_sample', False)
117
+ )
118
+ if isinstance(result, list) and result:
119
+ return result[0].get('summary_text', str(result[0]))
120
+ return str(result)
121
+ except Exception as e:
122
+ logger.error(f"Generation failed: {e}")
123
+ raise RuntimeError(f"Text generation failed: {str(e)}")
124
+
125
+ def get_model_info(self) -> Dict[str, Any]:
126
+ return {
127
+ "type": "transformers",
128
+ "model_name": self.model_name,
129
+ "model_type": self.model_type,
130
+ "device": self.device,
131
+ "loaded": self._pipeline is not None
132
+ }
133
+
134
+ @property
135
+ def tokenizer(self):
136
+ if self._tokenizer is None:
137
+ self.load()
138
+ return self._tokenizer
139
+
140
+ @property
141
+ def model(self):
142
+ if self._model is None:
143
+ self.load()
144
+ return self._model
145
+
146
+ class GGUFModelLoader(BaseModelLoader):
147
+ """Loader for GGUF models using llama.cpp"""
148
+
149
+ def __init__(self, model_name: str, filename: Optional[str] = None, device: Optional[str] = None):
150
+ self.model_name = model_name
151
+ self.filename = filename
152
+ self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
153
+ self._pipeline = None
154
+
155
+ def load(self):
156
+ if self._pipeline is None:
157
+ try:
158
+ from .model_loader_gguf import GGUFModelPipeline
159
+
160
+ logger.info(f"Loading GGUF model: {self.model_name}")
161
+
162
+ if self.filename:
163
+ self._pipeline = GGUFModelPipeline(self.model_name, self.filename)
164
+ else:
165
+ self._pipeline = GGUFModelPipeline(self.model_name)
166
+
167
+ logger.info(f"GGUF model loaded successfully: {self.model_name}")
168
+
169
+ except Exception as e:
170
+ logger.error(f"Failed to load GGUF model: {e}")
171
+ # Fallback to text-based response
172
+ from .model_loader_gguf import create_fallback_pipeline
173
+ self._pipeline = create_fallback_pipeline()
174
+ logger.warning(f"Using fallback pipeline for {self.model_name}")
175
+
176
+ return self._pipeline
177
+
178
+ def generate(self, prompt: str, **kwargs) -> str:
179
+ pipeline = self.load()
180
+
181
+ try:
182
+ max_tokens = kwargs.get('max_tokens', 512)
183
+ temperature = kwargs.get('temperature', 0.7)
184
+ top_p = kwargs.get('top_p', 0.95)
185
+
186
+ if hasattr(pipeline, 'generate_full_summary'):
187
+ return pipeline.generate_full_summary(
188
+ prompt,
189
+ max_tokens=max_tokens,
190
+ max_loops=kwargs.get('max_loops', 1)
191
+ )
192
+ else:
193
+ return pipeline.generate(
194
+ prompt,
195
+ max_tokens=max_tokens,
196
+ temperature=temperature,
197
+ top_p=top_p
198
+ )
199
+ except Exception as e:
200
+ logger.error(f"GGUF generation failed: {e}")
201
+ raise RuntimeError(f"GGUF generation failed: {str(e)}")
202
+
203
+ def get_model_info(self) -> Dict[str, Any]:
204
+ return {
205
+ "type": "gguf",
206
+ "model_name": self.model_name,
207
+ "filename": self.filename,
208
+ "device": self.device,
209
+ "loaded": self._pipeline is not None
210
+ }
211
+
212
+ class OpenVINOModelLoader(BaseModelLoader):
213
+ """Loader for OpenVINO models"""
214
+
215
+ def __init__(self, model_name: str, device: Optional[str] = None):
216
+ self.model_name = model_name
217
+ self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")
218
+ self._pipeline = None
219
+
220
+ def load(self):
221
+ if self._pipeline is None:
222
+ try:
223
+ from .model_loader_spaces import get_openvino_pipeline
224
+
225
+ logger.info(f"Loading OpenVINO model: {self.model_name}")
226
+ self._pipeline = get_openvino_pipeline(self.model_name)
227
+ logger.info(f"OpenVINO model loaded successfully: {self.model_name}")
228
+
229
+ except Exception as e:
230
+ logger.error(f"Failed to load OpenVINO model: {e}")
231
+ raise RuntimeError(f"OpenVINO model loading failed: {str(e)}")
232
+
233
+ return self._pipeline
234
+
235
+ def generate(self, prompt: str, **kwargs) -> str:
236
+ pipeline = self.load()
237
+
238
+ try:
239
+ # OpenVINO models typically use the same interface as transformers
240
+ inputs = pipeline.tokenizer([prompt], return_tensors="pt")
241
+ outputs = pipeline.model.generate(
242
+ **inputs,
243
+ max_new_tokens=kwargs.get('max_new_tokens', 500),
244
+ do_sample=False,
245
+ pad_token_id=pipeline.tokenizer.eos_token_id or 32000
246
+ )
247
+ return pipeline.tokenizer.decode(outputs[0], skip_special_tokens=True)
248
+ except Exception as e:
249
+ logger.error(f"OpenVINO generation failed: {e}")
250
+ raise RuntimeError(f"OpenVINO generation failed: {str(e)}")
251
+
252
+ def get_model_info(self) -> Dict[str, Any]:
253
+ return {
254
+ "type": "openvino",
255
+ "model_name": self.model_name,
256
+ "device": self.device,
257
+ "loaded": self._pipeline is not None
258
+ }
259
+
260
+ class UnifiedModelManager:
261
+ """Unified model manager that can handle any model type"""
262
+
263
+ def __init__(self):
264
+ self._model_cache: Dict[str, BaseModelLoader] = {}
265
+ self._fallback_models = {
266
+ "text-generation": "facebook/bart-base",
267
+ "summarization": "Falconsai/medical_summarization",
268
+ "ner": "dslim/bert-base-NER",
269
+ "gguf": "microsoft/Phi-3-mini-4k-instruct-gguf"
270
+ }
271
+
272
+ def get_model_loader(
273
+ self,
274
+ model_name: str,
275
+ model_type: str,
276
+ filename: Optional[str] = None,
277
+ force_reload: bool = False
278
+ ) -> BaseModelLoader:
279
+ """
280
+ Get a model loader for the specified model and type
281
+
282
+ Args:
283
+ model_name: Name or path of the model
284
+ model_type: Type of model (text-generation, summarization, ner, gguf, openvino)
285
+ filename: Optional filename for GGUF models
286
+ force_reload: Force reload the model even if cached
287
+
288
+ Returns:
289
+ BaseModelLoader instance
290
+ """
291
+ cache_key = f"{model_name}:{model_type}:{filename or ''}"
292
+
293
+ if not force_reload and cache_key in self._model_cache:
294
+ return self._model_cache[cache_key]
295
+
296
+ try:
297
+ # Determine loader type and create appropriate loader
298
+ if model_type == "gguf":
299
+ loader = GGUFModelLoader(model_name, filename)
300
+ elif model_type == "openvino":
301
+ loader = OpenVINOModelLoader(model_name)
302
+ else:
303
+ # Default to transformers for text-generation, summarization, ner, etc.
304
+ loader = TransformersModelLoader(model_name, model_type)
305
+
306
+ # Test load the model
307
+ loader.load()
308
+
309
+ # Cache the loader
310
+ self._model_cache[cache_key] = loader
311
+
312
+ logger.info(f"Model loader created successfully: {model_name} ({model_type})")
313
+ return loader
314
+
315
+ except Exception as e:
316
+ logger.error(f"Failed to create model loader for {model_name} ({model_type}): {e}")
317
+
318
+ # Try fallback model
319
+ fallback_name = self._fallback_models.get(model_type)
320
+ if fallback_name and fallback_name != model_name:
321
+ logger.warning(f"Trying fallback model: {fallback_name}")
322
+ try:
323
+ if model_type == "gguf":
324
+ loader = GGUFModelLoader(fallback_name)
325
+ elif model_type == "openvino":
326
+ loader = OpenVINOModelLoader(fallback_name)
327
+ else:
328
+ loader = TransformersModelLoader(fallback_name, model_type)
329
+
330
+ loader.load()
331
+ self._model_cache[cache_key] = loader
332
+ logger.info(f"Fallback model loaded successfully: {fallback_name}")
333
+ return loader
334
+
335
+ except Exception as fallback_error:
336
+ logger.error(f"Fallback model also failed: {fallback_error}")
337
+
338
+ # Create a basic fallback
339
+ from .model_loader_gguf import create_fallback_pipeline
340
+
341
+ class FallbackLoader(BaseModelLoader):
342
+ def __init__(self, model_name: str, model_type: str):
343
+ self.model_name = model_name
344
+ self.model_type = model_type
345
+ self._pipeline = create_fallback_pipeline()
346
+
347
+ def load(self):
348
+ return self._pipeline
349
+
350
+ def generate(self, prompt: str, **kwargs) -> str:
351
+ return self._pipeline.generate(prompt, **kwargs)
352
+
353
+ def get_model_info(self) -> Dict[str, Any]:
354
+ return {
355
+ "type": "fallback",
356
+ "model_name": self.model_name,
357
+ "model_type": self.model_type,
358
+ "loaded": True
359
+ }
360
+
361
+ fallback_loader = FallbackLoader(model_name, model_type)
362
+ self._model_cache[cache_key] = fallback_loader
363
+ return fallback_loader
364
+
365
+ def generate_text(
366
+ self,
367
+ model_name: str,
368
+ model_type: str,
369
+ prompt: str,
370
+ filename: Optional[str] = None,
371
+ **kwargs
372
+ ) -> str:
373
+ """
374
+ Generate text using the specified model
375
+
376
+ Args:
377
+ model_name: Name or path of the model
378
+ model_type: Type of model
379
+ prompt: Input prompt
380
+ filename: Optional filename for GGUF models
381
+ **kwargs: Additional generation parameters
382
+
383
+ Returns:
384
+ Generated text
385
+ """
386
+ loader = self.get_model_loader(model_name, model_type, filename)
387
+ return loader.generate(prompt, **kwargs)
388
+
389
+ def get_model_info(self, model_name: str, model_type: str, filename: Optional[str] = None) -> Dict[str, Any]:
390
+ """Get information about a specific model"""
391
+ loader = self.get_model_loader(model_name, model_type, filename)
392
+ return loader.get_model_info()
393
+
394
+ def clear_cache(self):
395
+ """Clear the model cache"""
396
+ self._model_cache.clear()
397
+ torch.cuda.empty_cache()
398
+ logger.info("Model cache cleared")
399
+
400
+ def list_loaded_models(self) -> Dict[str, Dict[str, Any]]:
401
+ """List all loaded models and their information"""
402
+ return {
403
+ cache_key: loader.get_model_info()
404
+ for cache_key, loader in self._model_cache.items()
405
+ }
406
+
407
+ # Global instance
408
+ model_manager = UnifiedModelManager()
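
Because the module ends by instantiating a shared `model_manager`, application code is expected to import that global rather than construct its own manager. A minimal usage sketch, mirroring the calls exercised by the test script below (the model name and the `max_new_tokens` keyword are taken from those tests and are assumptions, not the only supported options):

```python
from ai_med_extract.utils.model_manager import model_manager

# First call creates and caches the loader; later calls with the same
# model name, type, and filename reuse the cached instance.
text = model_manager.generate_text(
    model_name="facebook/bart-base",
    model_type="text-generation",
    prompt="Summarize: patient reports fever and cough for two days.",
    max_new_tokens=50,
)
print(text)

# Inspect the cache and release models when finished.
print(model_manager.list_loaded_models())
model_manager.clear_cache()
```

If the requested model cannot be loaded, `get_model_loader` first tries the default model registered for that type and then falls back to a basic pipeline, so `generate_text` should still return a string rather than raise.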
test_refactored_system.py ADDED
@@ -0,0 +1,321 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for the refactored HNTAI system
4
+ Demonstrates the new unified model manager and dynamic model loading capabilities
5
+ """
6
+
7
+ import os
8
+ import sys
9
+ import time
10
+ import logging
11
+ import requests
12
+ import json
13
+
14
+ # Configure logging
15
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
16
+ logger = logging.getLogger(__name__)
17
+
18
+ # Set environment variables for testing
19
+ os.environ['HF_HOME'] = '/tmp/huggingface'
20
+ os.environ['GGUF_N_THREADS'] = '2'
21
+ os.environ['GGUF_N_BATCH'] = '64'
22
+
23
+ def test_model_manager():
24
+ """Test the unified model manager"""
25
+ logger.info("Testing Unified Model Manager...")
26
+
27
+ try:
28
+ from ai_med_extract.utils.model_manager import model_manager
29
+
30
+ # Test 1: Load a transformers model
31
+ logger.info("Test 1: Loading Transformers model...")
32
+ loader = model_manager.get_model_loader("facebook/bart-base", "text-generation")
33
+ result = loader.generate("Hello, how are you?", max_new_tokens=50)
34
+ logger.info(f"✅ Transformers model test passed: {len(result)} characters generated")
35
+
36
+ # Test 2: Load a GGUF model
37
+ logger.info("Test 2: Loading GGUF model...")
38
+ try:
39
+ gguf_loader = model_manager.get_model_loader(
40
+ "microsoft/Phi-3-mini-4k-instruct-gguf",
41
+ "gguf"
42
+ )
43
+ result = gguf_loader.generate("Generate a brief medical summary: Patient has fever and cough.", max_tokens=100)
44
+ logger.info(f"✅ GGUF model test passed: {len(result)} characters generated")
45
+ except Exception as e:
46
+ logger.warning(f"⚠️ GGUF model test failed (this is expected if the model is not available): {e}")
47
+
48
+ # Test 3: Test fallback mechanism
49
+ logger.info("Test 3: Testing fallback mechanism...")
50
+ try:
51
+ fallback_loader = model_manager.get_model_loader("invalid/model", "text-generation")
52
+ result = fallback_loader.generate("Test prompt")
53
+ logger.info(f"✅ Fallback mechanism test passed: {len(result)} characters generated")
54
+ except Exception as e:
55
+ logger.error(f"❌ Fallback mechanism test failed: {e}")
56
+ return False
57
+
58
+ logger.info("🎉 All model manager tests passed!")
59
+ return True
60
+
61
+ except Exception as e:
62
+ logger.error(f"❌ Model manager test failed: {e}")
63
+ return False
64
+
65
+ def test_patient_summarizer():
66
+ """Test the refactored patient summarizer agent"""
67
+ logger.info("Testing Patient Summarizer Agent...")
68
+
69
+ try:
70
+ from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent
71
+
72
+ # Test with different model types
73
+ test_cases = [
74
+ {
75
+ "name": "Transformers Summarization",
76
+ "model_name": "Falconsai/medical_summarization",
77
+ "model_type": "summarization"
78
+ },
79
+ {
80
+ "name": "GGUF Model",
81
+ "model_name": "microsoft/Phi-3-mini-4k-instruct-gguf",
82
+ "model_type": "gguf"
83
+ }
84
+ ]
85
+
86
+ for test_case in test_cases:
87
+ logger.info(f"Testing: {test_case['name']}")
88
+ try:
89
+ agent = PatientSummarizerAgent(
90
+ model_name=test_case["model_name"],
91
+ model_type=test_case["model_type"]
92
+ )
93
+
94
+ # Test with sample patient data
95
+ sample_data = {
96
+ "result": {
97
+ "patientname": "John Doe",
98
+ "patientnumber": "12345",
99
+ "agey": "45",
100
+ "gender": "Male",
101
+ "allergies": ["Penicillin"],
102
+ "social_history": "Non-smoker, occasional alcohol",
103
+ "past_medical_history": ["Hypertension", "Diabetes"],
104
+ "encounters": [
105
+ {
106
+ "visit_date": "2024-01-15",
107
+ "chief_complaint": "Chest pain",
108
+ "symptoms": "Sharp chest pain, shortness of breath",
109
+ "diagnosis": ["Angina", "Hypertension"],
110
+ "dr_notes": "Patient reports chest pain for 2 days",
111
+ "vitals": {"BP": "140/90", "HR": "85", "SpO2": "98%"},
112
+ "medications": ["Aspirin", "Metoprolol"],
113
+ "treatment": "Prescribed medications, follow-up in 1 week"
114
+ }
115
+ ]
116
+ }
117
+ }
118
+
119
+ summary = agent.generate_clinical_summary(sample_data)
120
+ logger.info(f"✅ {test_case['name']} test passed: {len(summary)} characters generated")
121
+
122
+ except Exception as e:
123
+ logger.warning(f"⚠️ {test_case['name']} test failed (this may be expected): {e}")
124
+
125
+ logger.info("🎉 Patient summarizer tests completed!")
126
+ return True
127
+
128
+ except Exception as e:
129
+ logger.error(f"❌ Patient summarizer test failed: {e}")
130
+ return False
131
+
132
+ def test_model_config():
133
+ """Test the model configuration system"""
134
+ logger.info("Testing Model Configuration...")
135
+
136
+ try:
137
+ from ai_med_extract.utils.model_config import (
138
+ detect_model_type,
139
+ validate_model_config,
140
+ get_model_info,
141
+ get_default_model
142
+ )
143
+
144
+ # Test model type detection
145
+ test_models = [
146
+ ("facebook/bart-base", "text-generation"),
147
+ ("Falconsai/medical_summarization", "summarization"),
148
+ ("microsoft/Phi-3-mini-4k-instruct-gguf", "gguf"),
149
+ ("model.gguf", "gguf"),
150
+ ("unknown/model", "text-generation") # Default fallback
151
+ ]
152
+
153
+ for model_name, expected_type in test_models:
154
+ detected_type = detect_model_type(model_name)
155
+ if detected_type == expected_type:
156
+ logger.info(f"✅ Model type detection correct: {model_name} -> {detected_type}")
157
+ else:
158
+ logger.warning(f"⚠️ Model type detection mismatch: {model_name} -> {detected_type} (expected {expected_type})")
159
+
160
+ # Test model validation
161
+ validation = validate_model_config("microsoft/Phi-3-mini-4k-instruct-gguf", "gguf")
162
+ if validation["valid"]:
163
+ logger.info("✅ Model validation test passed")
164
+ else:
165
+ logger.warning(f"⚠️ Model validation warnings: {validation['warnings']}")
166
+
167
+ # Test default models
168
+ default_summary = get_default_model("summarization")
169
+ logger.info(f"✅ Default summarization model: {default_summary}")
170
+
171
+ logger.info("🎉 Model configuration tests completed!")
172
+ return True
173
+
174
+ except Exception as e:
175
+ logger.error(f"❌ Model configuration test failed: {e}")
176
+ return False
177
+
178
+ def test_api_endpoints():
179
+ """Test the new API endpoints (if server is running)"""
180
+ logger.info("Testing API Endpoints...")
181
+
182
+ base_url = "http://localhost:7860" # Adjust if different
183
+
184
+ try:
185
+ # Test health check
186
+ response = requests.get(f"{base_url}/api/models/health", timeout=10)
187
+ if response.status_code == 200:
188
+ health_data = response.json()
189
+ logger.info(f"✅ Health check passed: {health_data.get('status', 'unknown')}")
190
+ logger.info(f" Loaded models: {health_data.get('loaded_models_count', 0)}")
191
+ if health_data.get('gpu_info', {}).get('available'):
192
+ logger.info(f" GPU memory: {health_data['gpu_info']['memory_allocated']}")
193
+ else:
194
+ logger.warning(f"⚠️ Health check failed with status {response.status_code}")
195
+ return False
196
+
197
+ # Test model info
198
+ response = requests.get(f"{base_url}/api/models/info", timeout=10)
199
+ if response.status_code == 200:
200
+ info_data = response.json()
201
+ logger.info(f"✅ Model info endpoint working: {info_data.get('total_models', 0)} models loaded")
202
+ else:
203
+ logger.warning(f"⚠️ Model info endpoint failed with status {response.status_code}")
204
+
205
+ # Test default models
206
+ response = requests.get(f"{base_url}/api/models/defaults", timeout=10)
207
+ if response.status_code == 200:
208
+ defaults_data = response.json()
209
+ logger.info(f"✅ Default models endpoint working: {len(defaults_data.get('default_models', {}))} model types available")
210
+ else:
211
+ logger.warning(f"⚠️ Default models endpoint failed with status {response.status_code}")
212
+
213
+ logger.info("🎉 API endpoint tests completed!")
214
+ return True
215
+
216
+ except requests.exceptions.ConnectionError:
217
+ logger.warning("⚠️ Server not running, skipping API tests")
218
+ return True
219
+ except Exception as e:
220
+ logger.error(f"❌ API endpoint test failed: {e}")
221
+ return False
222
+
223
+ def test_memory_optimization():
224
+ """Test memory optimization features"""
225
+ logger.info("Testing Memory Optimization...")
226
+
227
+ try:
228
+ import torch
229
+
230
+ # Check if we're in Hugging Face Spaces
231
+ is_hf_space = os.environ.get('SPACE_ID') is not None
232
+
233
+ if is_hf_space:
234
+ logger.info("🔄 Detected Hugging Face Space - testing memory optimization...")
235
+
236
+ # Test with smaller models
237
+ from ai_med_extract.utils.model_manager import model_manager
238
+
239
+ loader = model_manager.get_model_loader("facebook/bart-base", "text-generation")
240
+ result = loader.generate("Test prompt for memory optimization", max_new_tokens=50)
241
+
242
+ logger.info(f"✅ Memory optimization test passed: {len(result)} characters generated")
243
+ else:
244
+ logger.info("🔄 Local environment detected - memory optimization not applicable")
245
+
246
+ # Test cache clearing
247
+ from ai_med_extract.utils.model_manager import model_manager
248
+ model_manager.clear_cache()
249
+ logger.info("✅ Cache clearing test passed")
250
+
251
+ return True
252
+
253
+ except Exception as e:
254
+ logger.error(f"❌ Memory optimization test failed: {e}")
255
+ return False
256
+
257
+ def main():
258
+ """Main test function"""
259
+ logger.info("🚀 Starting HNTAI Refactored System Tests...")
260
+ logger.info("=" * 60)
261
+
262
+ test_results = []
263
+
264
+ # Run all tests
265
+ tests = [
266
+ ("Model Manager", test_model_manager),
267
+ ("Patient Summarizer", test_patient_summarizer),
268
+ ("Model Configuration", test_model_config),
269
+ ("API Endpoints", test_api_endpoints),
270
+ ("Memory Optimization", test_memory_optimization)
271
+ ]
272
+
273
+ for test_name, test_func in tests:
274
+ logger.info(f"\n🧪 Running {test_name} Test...")
275
+ try:
276
+ result = test_func()
277
+ test_results.append((test_name, result))
278
+ except Exception as e:
279
+ logger.error(f"❌ {test_name} test crashed: {e}")
280
+ test_results.append((test_name, False))
281
+
282
+ # Summary
283
+ logger.info("\n" + "=" * 60)
284
+ logger.info("📊 TEST SUMMARY")
285
+ logger.info("=" * 60)
286
+
287
+ passed = 0
288
+ total = len(test_results)
289
+
290
+ for test_name, result in test_results:
291
+ status = "✅ PASS" if result else "❌ FAIL"
292
+ logger.info(f"{test_name}: {status}")
293
+ if result:
294
+ passed += 1
295
+
296
+ logger.info(f"\nOverall: {passed}/{total} tests passed")
297
+
298
+ if passed == total:
299
+ logger.info("🎉 All tests passed! The refactored system is working correctly.")
300
+ logger.info("✨ You can now use any model name and type, including GGUF models!")
301
+ else:
302
+ logger.warning(f"⚠️ {total - passed} tests failed. Check the logs above for details.")
303
+
304
+ # Recommendations
305
+ logger.info("\n💡 RECOMMENDATIONS:")
306
+ if passed >= total * 0.8:
307
+ logger.info("✅ System is ready for production use")
308
+ logger.info("✅ GGUF models are supported for patient summaries")
309
+ logger.info("✅ Dynamic model loading is working")
310
+ elif passed >= total * 0.6:
311
+ logger.info("⚠️ System is mostly working but has some issues")
312
+ logger.info("⚠️ Check failed tests and fix issues")
313
+ else:
314
+ logger.error("❌ System has significant issues")
315
+ logger.error("❌ Review and fix failed tests before use")
316
+
317
+ return passed == total
318
+
319
+ if __name__ == "__main__":
320
+ success = main()
321
+ sys.exit(0 if success else 1)
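
For reference, the flow exercised by `test_patient_summarizer` can be used directly from application code. A minimal sketch, with the GGUF model name and the payload shape copied from the test above (whether that model is actually downloadable in a given environment is an assumption; the agent is expected to fall back if it is not):

```python
from ai_med_extract.agents.patient_summary_agent import PatientSummarizerAgent

# GGUF-backed patient summarizer, as configured in the test cases above.
agent = PatientSummarizerAgent(
    model_name="microsoft/Phi-3-mini-4k-instruct-gguf",
    model_type="gguf",
)

patient_payload = {
    "result": {
        "patientname": "John Doe",
        "encounters": [
            {
                "visit_date": "2024-01-15",
                "chief_complaint": "Chest pain",
                "diagnosis": ["Angina"],
            }
        ],
    }
}

print(agent.generate_clinical_summary(patient_payload))
```

Running the script itself (`python test_refactored_system.py`) executes all five test groups and exits with a non-zero status code if any of them fail, so it can double as a CI smoke check.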