---
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
- lora
- sft
- trl
- vision-language
- medical
---

# Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned

This repository contains a **LoRA fine-tuned adapter** for [Qwen/Qwen2.5-VL-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct), trained on the **MIMIC-CXR** dataset.  
The goal is to adapt a powerful **multimodal vision-language model** for **medical chest X-ray interpretation**, generating clinical-style reports from chest radiographs.

---

## How to Use

```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
import torch

base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr"

# Load base model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Example inference
def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    text_input = processor.apply_chat_template(
        sample[:1], tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(sample)
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)  
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    return output_text[0]  

sample = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "./chest_xray.jpg"},
            {
                "type": "text",
                "text": "Please analyze this chest X-ray and provide the findings and impression.",
            },
        ],
    },
]

output = generate_text_from_sample(model, processor, sample)
print(output)
```
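If you plan to run many generations, you can optionally fold the adapter into the base weights so inference no longer goes through the LoRA indirection. This is a minimal sketch using `peft`'s standard `merge_and_unload` API; the output directory name is just an example.

```python
# Optional: merge the LoRA weights into the base model for slightly faster inference.
# `model` is the PeftModel created above; after merging it behaves like a plain
# Qwen2_5_VLForConditionalGeneration.
merged_model = model.merge_and_unload()

# Save the merged weights (example path) so they can be reloaded without peft.
merged_model.save_pretrained("./qwen2.5-vl-3b-mimic-cxr-merged")
processor.save_pretrained("./qwen2.5-vl-3b-mimic-cxr-merged")
```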
---

## Model Details

- **Base model:** Qwen/Qwen2.5-VL-3B-Instruct  
- **Adapter type:** LoRA (PEFT)  
- **Training objective:** Supervised fine-tuning (SFT) on chest X-ray reports  
- **Dataset:** [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) (radiology images + reports)  
- **Languages:** English (medical reporting domain)  
- **Frameworks:** `transformers`, `peft`, `trl` (see the illustrative training sketch below)

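The exact training hyperparameters are not published in this card. For orientation, a typical LoRA SFT setup with `peft` and `trl` for this base model looks roughly like the sketch below; the rank, target modules, batch size, and learning rate are illustrative placeholders rather than the values actually used, and `train_dataset` / `collate_fn` stand in for the MIMIC-CXR preprocessing, which is not included here.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import Qwen2_5_VLForConditionalGeneration
from trl import SFTConfig, SFTTrainer

base_model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative LoRA configuration; the real rank/alpha/target modules may differ.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)

# Illustrative SFT arguments; `train_dataset` (chat-formatted MIMIC-CXR
# image/report pairs) and `collate_fn` (a multimodal collator) are assumed
# to be defined elsewhere.
training_args = SFTConfig(
    output_dir="qwen2.5-vl-3b-instruct-mimic-cxr",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    remove_unused_columns=False,  # keep image columns for the custom collator
)
trainer = SFTTrainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=collate_fn,
)
trainer.train()
```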
---

## Intended Uses

### Direct Use
- Generating radiology-style reports from chest X-ray images.  
- Research on applying large multimodal models to medical imaging tasks.  

### Downstream Use
- Medical text generation tasks where radiological image context is available.  
- Adaptation for other healthcare VQA (Visual Question Answering) tasks.  

### Out-of-Scope Use
⚠️ **Not for clinical decision-making.**  
This model is intended **for research purposes only**. Do not use it in medical practice without proper validation and regulatory approval.