DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning
DriveLMM-o1 is a large multimodal model fine-tuned for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it takes stitched multiview camera images as input and produces step-by-step reasoning before its final answer. This structured approach improves both final-decision accuracy and interpretability on complex driving tasks such as perception, prediction, and planning.
Key Features:
- Multimodal Integration: Combines multiview camera images into a single stitched input for comprehensive scene understanding (see the sketch after this list).
- Step-by-Step Reasoning: Produces detailed intermediate reasoning steps to explain decisions.
- Efficient Adaptation: Utilizes dynamic image patching and LoRA finetuning for high-resolution inputs with minimal extra parameters.
- Performance Gains: Achieves significant improvements in both final answer accuracy and overall reasoning scores compared to previous open-source models.
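As a hedged illustration of the multiview input format, the sketch below stitches six surround-view camera frames into one composite image. The camera names, ordering, 3x2 grid layout, and tile size are assumptions for illustration, not the exact preprocessing used by DriveLMM-o1.

```python
# Illustrative sketch: stitch surround-view camera frames into a single grid
# image. Layout and camera names are assumptions, not DriveLMM-o1's exact pipeline.
from PIL import Image

def stitch_multiview(image_paths, cols=3, rows=2, tile_size=(448, 448)):
    """Arrange camera views on a grid so the model sees one composite image."""
    w, h = tile_size
    canvas = Image.new("RGB", (cols * w, rows * h))
    for idx, path in enumerate(image_paths[: cols * rows]):
        tile = Image.open(path).convert("RGB").resize((w, h))
        canvas.paste(tile, ((idx % cols) * w, (idx // cols) * h))
    return canvas

# Hypothetical paths for a nuScenes-style six-camera rig
views = [
    "cam_front_left.jpg", "cam_front.jpg", "cam_front_right.jpg",
    "cam_back_left.jpg", "cam_back.jpg", "cam_back_right.jpg",
]
stitched = stitch_multiview(views)
```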
Performance Comparison:
| Model | Risk Assessment Accuracy | Traffic Rule Adherence | Scene Awareness & Object Understanding | Relevance | Missing Details | Overall Reasoning Score | Final Answer Accuracy |
|---|---|---|---|---|---|---|---|
| GPT-4o (Closed) | 71.32 | 80.72 | 72.96 | 76.65 | 71.43 | 72.52 | 57.84 |
| Qwen-2.5-VL-7B | 46.44 | 60.45 | 51.02 | 50.15 | 52.19 | 51.77 | 37.81 |
| Ovis1.5-Gemma2-9B | 51.34 | 66.36 | 54.74 | 55.72 | 55.74 | 55.62 | 48.85 |
| Mulberry-7B | 51.89 | 63.66 | 56.68 | 57.27 | 57.45 | 57.65 | 52.86 |
| LLaVA-CoT | 57.62 | 69.01 | 60.84 | 62.72 | 60.67 | 61.41 | 49.27 |
| LlamaV-o1 | 60.20 | 73.52 | 62.67 | 64.66 | 63.41 | 63.13 | 50.02 |
| InternVL2.5-8B | 69.02 | 78.43 | 71.52 | 75.80 | 70.54 | 71.62 | 54.87 |
| DriveLMM-o1 (Ours) | 73.01 | 81.56 | 75.39 | 79.42 | 74.49 | 75.24 | 62.36 |
Usage:
Load the model using the following code snippet:
```python
from transformers import AutoModel, AutoTokenizer
import torch

path = 'ayeshaishaq/DriveLMMo1'

# Load the model in bfloat16 with FlashAttention enabled; trust_remote_code is
# required because InternVL-based models ship custom modeling code.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True
).eval().cuda()

# The slow tokenizer is used, matching the InternVL2.5 reference setup.
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)
```
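Once loaded, inference follows the standard InternVL2.5 chat interface. The sketch below is a minimal example under that assumption: `load_image` is the dynamic-patching preprocessing helper defined in the OpenGVLab/InternVL2_5-8B model card (not part of `transformers`), and the image path and question are illustrative.

```python
# Minimal inference sketch using InternVL2.5's chat interface. `load_image` is
# the dynamic-patching helper from the OpenGVLab/InternVL2_5-8B model card;
# the stitched multiview image path and question below are illustrative.
pixel_values = load_image("stitched_multiview.jpg", max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=False)

question = "<image>\nIs it safe for the ego vehicle to change lanes to the left? Explain your reasoning step by step."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```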
For detailed usage instructions and additional configurations, please refer to the OpenGVLab/InternVL2_5-8B repository.
Code: https://github.com/ayesha-ishaq/DriveLMM-o1
Limitations: DriveLMM-o1 is fine-tuned for domain-specific driving reasoning. While it performs strongly on autonomous driving tasks, users may need to further fine-tune or adapt the model for driving environments that differ from its training data (see the sketch below).
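For adapting the model to a new driving environment, a parameter-efficient setup along the following lines could serve as a starting point. This is a minimal sketch using the PEFT library; the rank, scaling factor, and target modules are illustrative assumptions, not the settings used to train DriveLMM-o1.

```python
# Minimal LoRA adaptation sketch with the PEFT library. Rank, alpha, and
# target modules below are illustrative assumptions, not the authors' settings.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                      # low-rank dimension (assumed)
    lora_alpha=32,             # scaling factor (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```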
Base model: OpenGVLab/InternVL2_5-8B