---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
- lora
- sft
- trl
- vision-language
- medical
---
# Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned
This repository contains a **LoRA fine-tuned adapter** for [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), trained on the **MIMIC-CXR** dataset.
The goal is to adapt a powerful **multimodal vision-language model** for **medical chest X-ray interpretation**, generating clinical-style reports from chest radiographs.
---
## How to Use
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
import torch

base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr"

# Load the base model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Processor (tokenizer + image preprocessing)
processor = AutoProcessor.from_pretrained(base_model_id)

# Example inference helper
def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    # Build the chat-formatted prompt from the user turn
    text_input = processor.apply_chat_template(
        sample[:1], tokenize=False, add_generation_prompt=True
    )

    # Extract the image inputs referenced in the conversation
    image_inputs, _ = process_vision_info(sample)

    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)

    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens so only the generated report remains
    trimmed_generated_ids = [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    return output_text[0]

sample = [
    {"role": "user",
     "content": [
         {"type": "image", "image": "./chest_xray.jpg"},
         {"type": "text", "text": "Please analyze this chest X-ray and provide the findings and impression."},
     ]},
]

output = generate_text_from_sample(model, processor, sample)
print(output)
```
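For deployment, you can optionally fold the adapter weights into the base model so inference no longer depends on `peft`. The snippet below is a minimal sketch using PEFT's standard `merge_and_unload()` API; the output directory name is only a placeholder.

```python
# Optionally merge the LoRA weights into the base model for standalone
# inference. The save path below is a placeholder, not a published artifact.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("qwen2.5-vl-3b-mimic-cxr-merged")
processor.save_pretrained("qwen2.5-vl-3b-mimic-cxr-merged")
```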
---
## Model Details
- **Base model:** Qwen/Qwen2.5-VL-3B-Instruct
- **Adapter type:** LoRA (PEFT)
- **Training objective:** Supervised fine-tuning (SFT) on chest X-ray reports
- **Dataset:** [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) (radiology images + reports)
- **Languages:** English (medical reporting domain)
- **Frameworks:** `transformers`, `peft`, `trl`
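For orientation, a LoRA setup for this kind of SFT run typically looks like the sketch below. The rank, alpha, dropout, and target modules shown are illustrative assumptions, not the published training recipe for this adapter.

```python
from peft import LoraConfig

# Illustrative LoRA hyperparameters only; the actual values used to train
# this adapter are not documented in this card.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```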
---
## Intended Uses
### Direct Use
- Generating radiology-style reports from chest X-ray images.
- Research on applying large multimodal models to medical imaging tasks.
### Downstream Use
- Medical text generation tasks where radiological image context is available.
- Adaptation for other healthcare VQA (Visual Question Answering) tasks (see the prompt sketch below).
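As a starting point for VQA-style use, the hypothetical prompt below reuses the `generate_text_from_sample` helper from the usage example; the question text and image path are placeholders, and since the adapter was tuned for report generation, question-answering behavior should be treated as exploratory.

```python
# Hypothetical VQA-style prompt; the model was fine-tuned for report
# generation, so treat question-answering behavior as exploratory.
vqa_sample = [
    {"role": "user",
     "content": [
         {"type": "image", "image": "./chest_xray.jpg"},
         {"type": "text", "text": "Is there evidence of pleural effusion?"},
     ]},
]
print(generate_text_from_sample(model, processor, vqa_sample))
```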
### Out-of-Scope Use
⚠️ **Not for clinical decision-making.**
This model is intended **for research purposes only**. Do not use it in medical practice without proper validation and regulatory approval.