WebVi3D
WebVi3D is a multi‑view image dataset containing 320 M frames extracted from 16 M video clips, used for training See3D models. To scale up the training data, a curation pipeline automatically filters out video clips with inconsistent multi‑view information or insufficient observations, yielding a high‑quality, diverse multi‑view image collection.
Description
See3D Dataset Overview
Dataset Summary
See3D is a vision‑conditioned multi‑view diffusion model trained on large‑scale internet video data for open‑world 3D creation. The model acquires 3D knowledge solely from the visual content of video data, without explicit pose annotations.
Dataset Features
- WebVi3D: 320 M image frames from 16 M video clips, used for multi‑view training.
- Data Curation: Automatic filtering removes clips with inconsistent multi‑view cues or insufficient observation, producing a high‑quality, diverse dataset.
- Pose‑Free: The model is conditioned on a purely 2D visual signal, produced by adding time‑dependent noise to masked video data, which eliminates the need for explicit pose annotations.
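The data curation step listed above can be illustrated with a minimal sketch. This is not the See3D pipeline: the `Clip` fields, the `consistency` and `viewpoint_change` scores, and all threshold values are hypothetical, chosen only to show how clips with inconsistent multi‑view cues or insufficient observation might be dropped.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    num_frames: int          # frames extracted from the clip
    consistency: float       # hypothetical multi-view consistency score in [0, 1]
    viewpoint_change: float  # hypothetical camera-motion measure in [0, 1]


def keep_clip(clip, min_frames=8, min_consistency=0.5, min_viewpoint_change=0.2):
    """Return True if the clip passes all (assumed) curation checks."""
    if clip.num_frames < min_frames:
        return False  # too few frames to form a multi-view sample
    if clip.consistency < min_consistency:
        return False  # inconsistent multi-view cues, e.g. dynamic content
    if clip.viewpoint_change < min_viewpoint_change:
        return False  # insufficient observation: the camera barely moves
    return True


def curate(clips, **thresholds):
    """Keep only the clips that pass every check."""
    return [c for c in clips if keep_clip(c, **thresholds)]


clips = [
    Clip("a", 20, 0.9, 0.60),  # passes every check
    Clip("b", 20, 0.3, 0.60),  # inconsistent multi-view cues
    Clip("c", 20, 0.9, 0.05),  # insufficient viewpoint change
    Clip("d", 4, 0.9, 0.60),   # too few frames
]
kept = curate(clips)
print([c.clip_id for c in kept])
```

In this toy run only clip `a` survives. In practice the scores would come from automated video analysis, and the thresholds would be tuned to trade dataset size against quality.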
Applications
- 3D Generation: Supports object‑level and scene‑level 3D generation, including sparse‑view‑to‑3D, text/image‑to‑3D, and 3D editing.
- High‑Fidelity 3D: Integrating See3D into a warping‑based pipeline yields high‑fidelity 3D outputs.
Dataset Download
- Pre‑trained Models & Test Data: Available from Google Drive.
Citation
If you use the See3D dataset, please cite:
@article{Ma2024See3D,
  title   = {You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale},
  author  = {Baorui Ma and Huachen Gao and Haoge Deng and Zhengxiong Luo and Tiejun Huang and Lulu Tang and Xinlong Wang},
  journal = {arXiv preprint arXiv:2412.06699},
  year    = {2024}
}
Source
Organization: github
Created: 12/9/2024