MVCap-4M
The MVCap‑4M dataset is a large‑scale multi‑view image‑text pair dataset designed for studying the viewpoint invariance of vision‑language pretraining (VLP) models. It contains over 4.6 million multi‑view image‑text pairs covering more than 100 000 objects. The dataset combines multiple 3D assets with real‑world multi‑view data, renders multi‑view images from them, and uses visual large language models (VLLMs) to generate captions automatically, yielding semantically rich descriptions. A class‑guided prompting strategy keeps the object category consistent across viewpoints.
Description
MVCap‑4M Dataset Overview
Dataset Information
- Name: MVCap‑4M
- Language: English
- Task Categories:
  - Zero‑Shot Classification
  - Feature Extraction
- Scale: 1M < n < 10M
- Configuration:
  - Default configuration
    - Data files:
      - Training set: metadata.json
Dataset Description
MVCap‑4M is a large‑scale dataset designed for research on the viewpoint invariance of vision‑language pretraining models. It comprises over 4.6 million multi‑view image‑text pairs covering more than 100 000 objects. The dataset was constructed by integrating various 3D assets with real‑world multi‑view data and using visual large language models (VLLMs) to generate semantically rich captions automatically. To keep the object category consistent across viewpoints, a class‑guided prompting strategy is applied during captioning.
Data File Structure
- metadata.json: Stores each image sample's path, caption, object ID, and image ID. Example record:
  { "path": "./views/54cadb86f3db4aa6920f673aeff0d1e3/026.png", "caption": "The rocking chair in the image is made of metal and has a green cushion on it.", "obj_id": 3177, "img_id": 317726 }
- Source Multi‑View Images: Sampled from three existing 3D datasets.
  - Objaverse‑80k: stored in /views
  - IM3D: stored in /im3d
  - MVImgNet: stored in /mvimgnet
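The record format and directory layout described above can be explored with a short script. The following is a minimal sketch, not an official loader: it assumes metadata.json holds a single JSON array of records with the four fields shown in the example, and that the first directory in each path identifies the source dataset. The helper names (`parse_records`, `source_of`, `group_by_object`) are hypothetical and introduced only for illustration.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from top-level directory to source dataset,
# taken from the directory listing in this card.
SOURCE_BY_DIR = {"views": "Objaverse-80k", "im3d": "IM3D", "mvimgnet": "MVImgNet"}

# The example record quoted in this card.
SAMPLE = {
    "path": "./views/54cadb86f3db4aa6920f673aeff0d1e3/026.png",
    "caption": "The rocking chair in the image is made of metal "
               "and has a green cushion on it.",
    "obj_id": 3177,
    "img_id": 317726,
}


def parse_records(text):
    """Parse metadata text. Assumption: one JSON array of record objects."""
    records = json.loads(text)
    # Tolerate a single top-level object by wrapping it in a list.
    return [records] if isinstance(records, dict) else records


def source_of(path):
    """Return the source dataset implied by the first directory of a path."""
    parts = [p for p in PurePosixPath(path).parts if p not in (".", "/")]
    return SOURCE_BY_DIR.get(parts[0], "unknown") if parts else "unknown"


def group_by_object(records):
    """Group records by obj_id so all views of one object sit together."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["obj_id"]].append(rec)
    return dict(groups)


if __name__ == "__main__":
    # In practice, read the file instead: open("metadata.json").read()
    records = parse_records(json.dumps([SAMPLE]))
    for obj_id, views in group_by_object(records).items():
        print(obj_id, len(views), source_of(views[0]["path"]))
```

Grouping by `obj_id` is useful for viewpoint-invariance work because it collects every captioned view of the same object, which is the unit a multi-view training or evaluation loop typically iterates over. If the file turns out to be in JSON Lines format instead, `parse_records` would need to parse one line at a time.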
Citation
If you use this dataset, please cite:
@article{Ruan2024Omniview,
title={Omniview‑Tuning: Boosting Viewpoint Invariance of Vision‑Language Pre‑training Models},
author={Shouwei Ruan and Yinpeng Dong and Hanqing Liu and Yao Huang and Hang Su and Xingxing Wei},
journal={European Conference on Computer Vision (ECCV)},
year={2024}
}
Source
Organization: huggingface
Created: 7/4/2024