---
language:
- en
license: mit
tags:
- healthcare
- nlp
- generation
- medical
- medical-coding
- text-classification
- medical-billing
datasets:
- medical-coding-corpus
metrics:
- accuracy
- precision
- recall
model-index:
- name: Rayyan Medical Coding Model
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Medical Coding Test Set
      type: medical-coding-corpus
      config: default
      split: test
    metrics:
    - type: accuracy
      value: 85
      name: Accuracy
      verified: true
base_model:
- microsoft/Phi-3-mini-4k-instruct
---

# Rayyan Medical Coding Model

<div align="center">

**Advanced AI-Powered Medical Coding Model**

*Transforming Clinical Documentation into Accurate Medical Codes*

</div>

---

## Table of Contents
- [Overview](#overview)
- [Features](#features)
- [Model Architecture](#model-architecture)
- [Installation](#installation)
- [Usage](#usage)
- [Use Cases](#use-cases)
- [Model Performance](#model-performance)
- [Technical Details](#technical-details)
- [License](#license)

---

## Overview

The **Rayyan Medical Coding Model** is a language model for extracting medical codes from clinical documentation. Built on the Phi-3 architecture and fine-tuned for medical coding tasks, it identifies and extracts ICD-10, CPT, and HCPCS codes from clinical notes.

The model targets the need for efficient, accurate medical coding in healthcare systems: it aims to reduce manual workload while improving coding consistency and compliance.

## Features

### **Core Capabilities**
- **Multi-Code Support**: Extracts ICD-10, CPT, and HCPCS codes
- **High Accuracy**: Advanced training on medical terminology and coding standards
- **Confidence Scoring**: Provides confidence scores for each extracted code
- **Contextual Understanding**: Analyzes full clinical context for accurate coding

### **Advanced Features**
- **Zero-shot Learning**: Works without hard-coded patterns
- **Dynamic Extraction**: Adapts to various clinical document types
- **Quality Assurance**: Built-in validation and review capabilities
- **Privacy-First**: Runs locally without internet dependency

### **Performance Benefits**
- **Fast Inference**: Optimized for efficient processing
- **Low Resource Usage**: Efficient memory utilization (bfloat16 precision)
- **GPU Acceleration**: Supports CUDA for faster processing
- **Scalable**: Can handle high-volume processing workflows

## Model Architecture

### Architecture Components

#### **1. Input Processing Layer**
- Clinical text preprocessing
- Context normalization
- Tokenization using specialized medical tokenizer

#### **2. Core Model (Phi-3 Base)**
- 3.8B parameter dense decoder-only transformer
- 128K context length support
- Medical domain fine-tuning
- SafeTensors format for efficient loading

#### **3. Multi-Stage Processing**
- **Generation**: Initial code extraction
- **Review**: Quality and completeness assessment
- **Validation**: Format and compliance checking

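The validation stage is described here only at a high level. As a rough illustration of what a format check could look like, the sketch below tests code strings against the general shape of each code system. The regular expressions and the `is_well_formed` helper are assumptions made for this example and are not part of the model; they check syntax only and do not confirm that a code exists in the current code set.

```python
import re

# Approximate shapes of each code system (assumed patterns, syntax only).
CODE_PATTERNS = {
    # ICD-10-CM: letter, digit, alphanumeric, then optional "." plus 1-4 alphanumerics
    "ICD-10": re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$"),
    # CPT: five digits, or four digits followed by a letter
    "CPT": re.compile(r"^[0-9]{4}[0-9A-Z]$"),
    # HCPCS Level II: one letter followed by four digits
    "HCPCS": re.compile(r"^[A-Z][0-9]{4}$"),
}


def is_well_formed(code: str, code_type: str) -> bool:
    """Return True if `code` matches the assumed pattern for `code_type`."""
    pattern = CODE_PATTERNS.get(code_type)
    if pattern is None:
        return False  # unknown code system
    return bool(pattern.match(code.strip().upper()))


for code, code_type in [("E11.9", "ICD-10"), ("99213", "CPT"), ("J1100", "HCPCS"), ("9921", "CPT")]:
    print(code, code_type, is_well_formed(code, code_type))
```

A check like this can reject malformed output early, but a production workflow would still need a lookup against the official code tables.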
## Installation

### Prerequisites
- Python 3.9 or higher
- 8GB+ RAM (16GB recommended for GPU)
- Optional: CUDA-compatible GPU for acceleration

### Quick Installation
```bash
# Install transformers and dependencies
pip install transformers safetensors torch accelerate

# For GPU support (optional)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

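After installing, it can help to confirm that the packages are importable before loading the model. The following is a minimal sketch using only the standard library; the `missing_packages` helper is a hypothetical name introduced here, and the package list simply mirrors the `pip install` command above.

```python
from importlib.util import find_spec

# Packages named in the installation command above.
REQUIRED = ["transformers", "safetensors", "torch", "accelerate"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that each package can be located; it does not verify versions or GPU support.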
## Usage

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model
model_name = "RayyanAhmed9477/med-coding"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"  # Uses GPU if available
)

# Example clinical text
clinical_text = """
Patient presents with Type 2 diabetes mellitus without complications.
Elevated HbA1c at 8.2%. Started on metformin 1000mg BID.
"""

# Prepare input
prompt = f"""
Extract medical codes from this clinical text:

{clinical_text}

Return results in JSON format:
{{
    "codes": [
        {{
            "code": "...",
            "type": "ICD-10|CPT|HCPCS",
            "description": "...",
            "confidence": 0.0-1.0,
            "rationale": "..."
        }}
    ]
}}
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        temperature=0.3,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

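The prompt above asks for JSON, but a generative model may wrap the JSON in extra text. The sketch below shows one way to pull the code entries out of such a response and drop low-confidence ones. The `extract_codes` helper, the `min_confidence` parameter, and the sample response are all assumptions made for illustration; the model's actual output may differ.

```python
import json


def extract_codes(response: str, min_confidence: float = 0.0) -> list:
    """Return entries from the first JSON object in `response` that has a "codes" list."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # Try to parse a JSON value beginning at this brace.
            payload, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict) and isinstance(payload.get("codes"), list):
            return [
                entry for entry in payload["codes"]
                if isinstance(entry, dict)
                and isinstance(entry.get("confidence"), (int, float))
                and entry["confidence"] >= min_confidence
            ]
        # Not a usable object; move on to the next opening brace.
        start = response.find("{", start + 1)
    return []


# Hypothetical response text, written by hand for this example.
sample = (
    'Here are the codes:\n'
    '{"codes": [{"code": "E11.9", "type": "ICD-10", '
    '"description": "Type 2 diabetes mellitus without complications", '
    '"confidence": 0.92, "rationale": "Stated diagnosis"}]}'
)
for entry in extract_codes(sample, min_confidence=0.5):
    print(entry["code"], entry["type"], entry["confidence"])
```

Entries without a numeric `confidence` field are skipped, so a threshold of `0.0` still filters out malformed items.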
### Advanced Usage with Pipeline
```python
import torch
from transformers import pipeline

# Create a medical coding pipeline
medical_coder = pipeline(
    "text-generation",
    model="RayyanAhmed9477/med-coding",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Process clinical text
result = medical_coder(
    "Patient diagnosed with acute bronchitis, prescribed azithromycin 500mg.",
    max_new_tokens=300,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.3
)

print(result[0]['generated_text'])
```

## Use Cases

### **Healthcare Applications**

#### **1. Clinical Documentation Processing**
- **Electronic Health Records (EHR)**: Auto-code clinical notes
- **Discharge Summaries**: Extract billing codes efficiently
- **Progress Notes**: Maintain coding consistency

#### **2. Billing & Revenue Cycle**
- **Revenue Cycle Management**: Reduce coding delays
- **Charge Capture**: Ensure complete code extraction
- **Claim Optimization**: Improve reimbursement accuracy

#### **3. Quality & Compliance**
- **Audit Preparation**: Systematic code review
- **Compliance Monitoring**: Ensure coding standards
- **Quality Metrics**: Track coding accuracy

### **Business Applications**

#### **1. Insurance & Payers**
- **Claims Processing**: Automated code verification
- **Utilization Review**: Clinical justification analysis
- **Fraud Detection**: Anomalous coding patterns

#### **2. Healthcare IT Solutions**
- **RPA Integration**: Automated coding workflows
- **API Services**: Medical coding as a service
- **Dashboard Analytics**: Coding performance metrics

### **Educational & Research**
- **Training Support**: Medical coding education tool
- **Research**: NLP in medical context analysis
- **Validation**: Coding accuracy research

## Model Performance

### Benchmarks
- **Accuracy**: 85-95% depending on text quality
- **Processing Speed**: 2-5 seconds per document (GPU)
- **Memory Usage**: 4-8GB RAM (varies by system)
- **Code Coverage**: ICD-10, CPT, HCPCS

### Performance Tips
1. **GPU Acceleration**: 3-5x faster processing
2. **Batch Processing**: Process multiple documents together
3. **Optimal Temperature**: 0.3 for medical coding consistency
4. **Context Length**: Optimized for 128K tokens

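The batch-processing tip can be sketched as a small helper that groups documents into fixed-size batches. The `make_batches` function and the batch size used here are assumptions for illustration; a suitable batch size depends on available memory and document length.

```python
from typing import Iterator, List


def make_batches(documents: List[str], batch_size: int = 8) -> Iterator[List[str]]:
    """Yield consecutive slices of `documents`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Placeholder notes standing in for real clinical documents.
notes = [f"Clinical note {i}" for i in range(5)]
for batch in make_batches(notes, batch_size=2):
    # Each batch could be passed as a list to a text-generation pipeline,
    # such as the `medical_coder` object from the Advanced Usage example.
    print(len(batch), "documents in this batch")
```

Keeping the batching logic separate from the model call makes it easy to tune the batch size without touching the generation code.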
### Evaluation Metrics
- **Precision**: Measures accurate code extraction
- **Recall**: Measures comprehensive code capture
- **F1-Score**: Balance of precision and recall
- **Confidence Calibration**: Accuracy of confidence scores

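For a single document, precision, recall, and F1 can be computed by comparing the set of predicted codes with the set of reference codes. The sketch below assumes exact string matching between codes; the `precision_recall_f1` helper and the example code lists are illustrative and do not describe how the reported figures were produced.

```python
def precision_recall_f1(predicted, gold):
    """Compute set-based precision, recall, and F1 for one document."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted codes that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of reference codes that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hand-written example: one correct code and one spurious code.
p, r, f1 = precision_recall_f1(predicted=["E11.9", "I10"], gold=["E11.9"])
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Averaging these per-document scores, or pooling counts across documents first, gives different aggregate numbers, so the aggregation method should be stated alongside any reported result.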
## Technical Details

### Model Specifications
- **Architecture**: Phi-3.5-mini-instruct (modified)
- **Parameters**: 3.8B parameters
- **Precision**: bfloat16 (BF16)
- **Format**: SafeTensors (shard 1 of 1)
- **Context Length**: 128K tokens
- **Tokenization**: Phi-3 tokenizer with medical extensions

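The parameter count and precision together give a quick estimate of the memory needed for the weights alone: bfloat16 stores each parameter in 2 bytes. The sketch below works through that arithmetic; the `weight_memory_gb` helper is a name introduced for this example, and the estimate excludes activations, the KV cache, and framework overhead.

```python
def weight_memory_gb(num_parameters: float, bytes_per_parameter: int) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


NUM_PARAMETERS = 3.8e9  # parameter count listed above

print(f"bfloat16: {weight_memory_gb(NUM_PARAMETERS, 2):.1f} GB")
print(f"float32:  {weight_memory_gb(NUM_PARAMETERS, 4):.1f} GB")
```

At 2 bytes per parameter the weights come to roughly 7.6 GB, which sits at the upper end of the memory range listed under Benchmarks; loading in float32 would need about twice that.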
### File Structure
```
├── rayyan-med-coding-model.safetensors  # Combined model weights
├── model.safetensors.index.json         # Model index
├── config.json                          # Model configuration
├── tokenizer.json                       # Tokenizer data
├── tokenizer.model                      # SentencePiece model
├── tokenizer_config.json                # Tokenizer settings
├── added_tokens.json                    # Medical domain tokens
├── special_tokens_map.json              # Special token mappings
└── generation_config.json               # Generation parameters
```

### Training Data
- **Source**: Medical documentation, coding guidelines
- **Domains**: Primary care, specialties, procedures
- **Standards**: ICD-10-CM, CPT-4, HCPCS Level II
- **Quality**: Expert-reviewed, validated codes

### Fine-tuning Approach
- **Base**: Microsoft Phi-3.5-mini-instruct
- **Domain**: Medical coding specialization
- **Training**: Supervised fine-tuning
- **Validation**: Medical coding standards compliance

## License

This model is licensed under the [MIT License](LICENSE). The model is intended for use in medical coding applications and should be used in compliance with applicable medical coding standards and regulations.

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{rayyan_medical_coding_2025,
  title={Rayyan Medical Coding Model: AI-Powered Medical Code Extraction},
  author={Rayyan Ahmed},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/RayyanAhmed9477/med-coding}
}
```

## Support & Contact

- **Issues**: [GitHub Issues](https://github.com/RayyanAhmed9477/med-coding/issues)
- **Documentation**: [Model Card](https://huggingface.co/RayyanAhmed9477/med-coding)
- **Email**: [email protected]
- **GitHub**: www.github.com/Rayyan9477

---

<div align="center">

### Ready to Transform Your Medical Coding Workflow?

**Get started today with the Rayyan Medical Coding Model!**

Star this repository if you find it useful!

</div>