Configuration Parsing Warning: In adapter_config.json: "peft.base_model_name_or_path" must be a string

Qwen2.5-VL-3B-Instruct - MIMIC-CXR Fine-tuned

This repository contains a LoRA fine-tuned adapter for Qwen/Qwen2.5-VL-3B-Instruct, trained on the MIMIC-CXR dataset.
The goal is to adapt a powerful multimodal vision-language model for medical chest X-ray interpretation, generating clinical-style reports from chest radiographs.


How to Use

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
from qwen_vl_utils import process_vision_info
import torch

base_model_id = "Qwen/Qwen2.5-VL-3B-Instruct"
adapter_id = "onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr"

# Load base model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Processor
processor = AutoProcessor.from_pretrained(base_model_id)

# Example inference
def generate_text_from_sample(model, processor, sample, max_new_tokens=1024, device="cuda"):
    text_input = processor.apply_chat_template(
        sample[:1], tokenize=False, add_generation_prompt=True
    )
    image_inputs, _ = process_vision_info(sample)
    model_inputs = processor(
        text=[text_input],
        images=image_inputs,
        return_tensors="pt",
    ).to(device)  
    generated_ids = model.generate(**model_inputs, max_new_tokens=max_new_tokens)
    trimmed_generated_ids = [out_ids[len(in_ids) :] for in_ids, out_ids in zip(model_inputs.input_ids, generated_ids)]
    output_text = processor.batch_decode(
        trimmed_generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )

    return output_text[0]  

sample = [
{'role': 'user',
  'content': [{'type': 'image',
    'image': "./chest_xray.jpg"},
   {'type': 'text',
    'text': 'Please analyze this chest X-ray and provide the findings and impression.'}]},
 ]

output = generate_text_from_sample(model, processor, sample)
print(output)

Model Details

  • Base model: Qwen/Qwen2.5-VL-3B-Instruct
  • Adapter type: LoRA (PEFT)
  • Training objective: Supervised fine-tuning (SFT) on chest X-ray reports
  • Dataset: MIMIC-CXR (radiology images + reports)
  • Languages: English (medical reporting domain)
  • Frameworks: transformers, peft, trl

Intended Uses

Direct Use

  • Generating radiology-style reports from chest X-ray images.
  • Research on applying large multimodal models to medical imaging tasks.

Downstream Use

  • Medical text generation tasks where radiological image context is available.
  • Adaptation for other healthcare VQA (Visual Question Answering) tasks.

Out-of-Scope Use

โš ๏ธ Not for clinical decision-making.
This model is intended for research purposes only. Do not use it in medical practice without proper validation and regulatory approval.

Downloads last month
22
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for onurulu17/qwen2.5-vl-3b-instruct-mimic-cxr

Adapter
(47)
this model