DriveMLLM
The DriveMLLM dataset, created by the Institute of Automation, Chinese Academy of Sciences and other institutions, focuses on spatial understanding tasks in autonomous driving scenarios. It contains 880 forward-camera images covering absolute and relative spatial reasoning tasks, accompanied by rich natural-language questions. Built upon the nuScenes dataset, the images were carefully selected and annotated to ensure clear object visibility and explicit spatial relationships. DriveMLLM aims to evaluate and improve the spatial reasoning abilities of multimodal large language models in autonomous driving, addressing complex spatial relation understanding.
Description
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving
Dataset Overview
- Dataset Name: MLLM_eval_dataset
- Data Source:
  - Images come from the nuScenes validation set (CAM_FRONT).
  - A `metadata.jsonl` file provides image attributes such as `location2D`.
- Purpose: Evaluate multimodal large language models on spatial understanding in autonomous driving.
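Each line of `metadata.jsonl` is a standalone JSON object, so it can be read with the standard library alone. A minimal sketch, assuming a hypothetical record layout (only the `location2D` field name comes from the overview above; the other keys and values are illustrative):

```python
import json

# Hypothetical metadata.jsonl record; the real file may carry
# additional fields, but `location2D` is named in the overview above.
sample_line = '{"token": "sample_token", "location2D": [640.5, 380.2]}'

# Parse one JSONL line and unpack the 2D image location.
record = json.loads(sample_line)
x, y = record["location2D"]
print(x, y)
```

In practice you would iterate over the file with `for line in open("metadata.jsonl")` and parse each line the same way.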
Using the Dataset
0. Prepare the Dataset
- Dataset Link: MLLM_eval_dataset
1. Environment Setup
- Setup Documentation: Setup Environment
2. Inference
- Inference Scripts:
- GPT API:
```shell
export OPENAI_API_KEY=your_api_key
python inference/get_MLLM_output.py \
  --model_type gpt \
  --model gpt-4o \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --prompts_dir prompt/prompts \
  --save_dir inference/mllm_outputs
```
- Gemini API:
```shell
export GOOGLE_API_KEY=your_api_key
python inference/get_MLLM_output.py \
  --model_type gemini \
  --model models/gemini-1.5-flash \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --prompts_dir prompt/prompts \
  --save_dir inference/mllm_outputs
```
- Local LLaVA-Next:
```shell
python inference/get_MLLM_output.py \
  --model_type llava \
  --model lmms-lab/llava-onevision-qwen2-7b-si \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --prompts_dir prompt/prompts \
  --save_dir inference/mllm_outputs
```
- Local QWen2-VL:
```shell
python inference/get_MLLM_output.py \
  --model_type qwen \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --prompts_dir prompt/prompts \
  --save_dir inference/mllm_outputs
```
3. Evaluation
- Evaluation Scripts:
- All Results:
```shell
python evaluation/eval_from_json.py \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --eval_root_dir inference/mllm_outputs \
  --save_dir evaluation/eval_result \
  --eval_model_path all
```
- Specific Model:
```shell
python evaluation/eval_from_json.py \
  --hf_dataset bonbon-rj/MLLM_eval_dataset \
  --eval_root_dir inference/mllm_outputs \
  --save_dir evaluation/eval_result \
  --eval_model_path gemini/gemini-1.5-flash
```
Citation
@article{DriveMLLM,
title={DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving},
author={Guo, Xianda and Zhang, Ruijun and Duan, Yiqun and He, Yuhang and Zhang, Chenming and Chen, Long},
journal={arXiv preprint arXiv:2411.13112},
year={2024}
}
Source
Organization: arXiv
Created: 11/20/2024