DriveLMM-o1: A Large Multimodal Model for Autonomous Driving Reasoning

DriveLMM-o1 is a fine-tuned large multimodal model designed for autonomous driving. Built on InternVL2.5-8B with LoRA-based adaptation, it takes stitched multiview camera images as input and produces step-by-step reasoning before committing to a final answer. This structured approach improves both final-answer accuracy and interpretability in complex driving tasks such as perception, prediction, and planning.
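
Because the model consumes a single stitched image of the surrounding camera views, the input can be assembled with a few lines of PIL. The sketch below is illustrative only: the camera set and horizontal layout are assumptions, and the exact stitching used in training is defined in the DriveLMM-o1 repository.

from PIL import Image

def stitch_multiview(views):
    # Resize all views to a common height, then concatenate horizontally.
    # `views` is a list of PIL images, e.g. six surround-camera frames
    # (an assumption for illustration; see the repo for the real layout).
    h = min(v.height for v in views)
    resized = [v.resize((int(v.width * h / v.height), h)) for v in views]
    canvas = Image.new('RGB', (sum(v.width for v in resized), h))
    x = 0
    for v in resized:
        canvas.paste(v, (x, 0))
        x += v.width
    return canvas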

Key Features:

  • Multimodal Integration: Combines multiview images for comprehensive scene understanding.
  • Step-by-Step Reasoning: Produces detailed intermediate reasoning steps to explain decisions.
  • Efficient Adaptation: Utilizes dynamic image patching and LoRA fine-tuning for high-resolution inputs with minimal extra trainable parameters (see the sketch after this list).
  • Performance Gains: Achieves significant improvements in both final answer accuracy and overall reasoning scores compared to previous open-source models.
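
For reference, LoRA adaptation of this kind is commonly set up with the peft library. The sketch below is a hedged illustration, not the actual training configuration: the rank, alpha, and target modules are assumptions.

from peft import LoraConfig, get_peft_model
from transformers import AutoModel

# Illustrative only: rank, alpha, and target modules are assumptions,
# not the values used to train DriveLMM-o1.
base = AutoModel.from_pretrained('OpenGVLab/InternVL2_5-8B', trust_remote_code=True)
lora_cfg = LoraConfig(
    r=16,                 # low-rank update dimension
    lora_alpha=32,        # scaling applied to the LoRA update
    lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable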

Performance Comparison:

| Model | Risk Assessment Accuracy | Traffic Rule Adherence | Scene Awareness & Object Understanding | Relevance | Missing Details | Overall Reasoning Score | Final Answer Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o (Closed) | 71.32 | 80.72 | 72.96 | 76.65 | 71.43 | 72.52 | 57.84 |
| Qwen-2.5-VL-7B | 46.44 | 60.45 | 51.02 | 50.15 | 52.19 | 51.77 | 37.81 |
| Ovis1.5-Gemma2-9B | 51.34 | 66.36 | 54.74 | 55.72 | 55.74 | 55.62 | 48.85 |
| Mulberry-7B | 51.89 | 63.66 | 56.68 | 57.27 | 57.45 | 57.65 | 52.86 |
| LLaVA-CoT | 57.62 | 69.01 | 60.84 | 62.72 | 60.67 | 61.41 | 49.27 |
| LlamaV-o1 | 60.20 | 73.52 | 62.67 | 64.66 | 63.41 | 63.13 | 50.02 |
| InternVL2.5-8B | 69.02 | 78.43 | 71.52 | 75.80 | 70.54 | 71.62 | 54.87 |
| DriveLMM-o1 (Ours) | 73.01 | 81.56 | 75.39 | 79.42 | 74.49 | 75.24 | 62.36 |

Usage:

Load the model using the following code snippet:

from transformers import AutoModel, AutoTokenizer
import torch

path = 'ayeshaishaq/DriveLMMo1'

# Load the model in bfloat16 with FlashAttention; trust_remote_code is
# required because the InternVL architecture ships its own modeling code.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True
).eval().cuda()

# use_fast=False matches the upstream InternVL examples.
tokenizer = AutoTokenizer.from_pretrained(
    path,
    trust_remote_code=True,
    use_fast=False
)

For detailed usage instructions and additional configurations, please refer to the OpenGVLab/InternVL2_5-8B repository.
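
Continuing from the loading snippet above, a minimal single-image inference call might look as follows. This is a hedged sketch assuming the InternVL-style chat interface (model.chat) and the standard ImageNet normalization used by InternVL; the upstream repo additionally applies dynamic high-resolution patching, omitted here, and the image path is hypothetical.

import torchvision.transforms as T
from PIL import Image

# Minimal preprocessing for a single stitched image; the upstream repo
# also applies dynamic patching for high-resolution inputs.
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open('stitched_views.jpg').convert('RGB')  # hypothetical path
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

question = '<image>\nWhat should the ego vehicle do next? Explain step by step.'
response = model.chat(tokenizer, pixel_values, question,
                      generation_config=dict(max_new_tokens=1024))
print(response)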

Code: https://github.com/ayesha-ishaq/DriveLMM-o1

Limitations: While DriveLMM-o1 demonstrates strong performance on autonomous driving tasks, it is fine-tuned for domain-specific reasoning; it may need further fine-tuning or adaptation for driving environments or sensor setups that differ from its training data.
