onurulu17's picture
Update README.md
164f1d4 verified
metadata
base_model: Qwen/Qwen2.5-VL-3B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
  - base_model:adapter:Qwen/Qwen2.5-VL-3B-Instruct
  - lora
  - sft
  - trl
  - vision-language
  - medical

Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned

This repository contains a LoRA fine-tuned adapter for Qwen/Qwen2.5-VL-3B-Instruct, trained on the MIMIC-CXR dataset.
The goal is to adapt a powerful multimodal vision-language model for medical chest X-ray interpretation, generating clinical-style reports from chest radiographs.


How to Use

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
import torch

base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr"

# Load base model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Example inference
def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    text_input = processor.apply_chat_template(
        sample[:1], tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(sample)
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)  
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    return output_text[0]  

sample = [
{'role': 'user',
  'content': [{'type': 'image',
    'image': "./chest_xray.jpg"},
   {'type': 'text',
    'text': 'Please analyze this chest X-ray and provide the findings and impression.'}]},
 ]

output = generate_text_from_sample(model, processor, sample)
print(output)

Model Details

  • Base model: Qwen/Qwen2.5-VL-3B-Instruct
  • Adapter type: LoRA (PEFT)
  • Training objective: Supervised fine-tuning (SFT) on chest X-ray reports
  • Dataset: MIMIC-CXR (radiology images + reports)
  • Languages: English (medical reporting domain)
  • Frameworks: transformers, peft, trl

Intended Uses

Direct Use

  • Generating radiology-style reports from chest X-ray images.
  • Research on applying large multimodal models to medical imaging tasks.

Downstream Use

  • Medical text generation tasks where radiological image context is available.
  • Adaptation for other healthcare VQA (Visual Question Answering) tasks.

Out-of-Scope Use

⚠️ Not for clinical decision-making.
This model is intended for research purposes only. Do not use it in medical practice without proper validation and regulatory approval.