WebVi3D

WebVi3D is a multi‑view image dataset of 320 M frames extracted from 16 M video clips (about 20 frames per clip on average), used to train See3D models. Clips with inconsistent multi‑view information or insufficient observations are filtered out automatically, yielding a high‑quality, diverse multi‑view image collection.

Updated 12/10/2024

See3D Dataset Overview

Dataset Summary

See3D is a visual‑conditional multi‑view diffusion model trained on large‑scale internet video data for open‑world 3D creation. The model learns 3D knowledge solely from the visual content of videos, without 3D or pose annotations.

Dataset Features

WebVi3D: 320 M image frames from 16 M video clips, used for multi‑view training.
Data Curation: Automatic filtering removes clips with inconsistent multi‑view cues or insufficient observation, producing a high‑quality, diverse dataset.
Pose‑Free: A visual condition built by adding time‑dependent noise to masked video data removes the need for explicit pose annotations.
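
The Data Curation step above can be pictured as a per-clip filter. The following is a minimal sketch only, not the See3D pipeline: the `ClipStats` fields, the `keep_clip` and `curate` helpers, and both thresholds are hypothetical names and values chosen for illustration, and it assumes each clip has already been scored for how much moving content it contains and how much the camera viewpoint changes.

```python
from dataclasses import dataclass


@dataclass
class ClipStats:
    """Hypothetical per-clip statistics; field names are illustrative only."""
    clip_id: str
    dynamic_ratio: float   # assumed: share of the clip dominated by moving content, 0..1
    viewpoint_span: float  # assumed: normalized camera motion across the clip, 0..1


def keep_clip(stats: ClipStats,
              max_dynamic_ratio: float = 0.3,
              min_viewpoint_span: float = 0.2) -> bool:
    """Return True if the clip passes both illustrative checks."""
    # Too much moving content: frames do not depict one consistent static scene.
    consistent = stats.dynamic_ratio <= max_dynamic_ratio
    # Too little camera motion: the scene is observed from nearly one viewpoint.
    enough_views = stats.viewpoint_span >= min_viewpoint_span
    return consistent and enough_views


def curate(clips):
    """Keep only the clips that pass keep_clip, preserving input order."""
    return [c for c in clips if keep_clip(c)]


clips = [
    ClipStats("static_orbit", 0.05, 0.8),
    ClipStats("crowded_street", 0.7, 0.6),
    ClipStats("fixed_camera", 0.02, 0.05),
]
kept = [c.clip_id for c in curate(clips)]
print(kept)
```

In this toy input, only the clip with little moving content and a wide viewpoint range survives; the thresholds would need tuning against real data.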

Applications

3D Generation: Supports object‑level and scene‑level 3D generation, including sparse‑view‑to‑3D, text/image‑to‑3D, and 3D editing.
High‑Fidelity 3D: Integrating See3D into a warping‑based pipeline yields high‑fidelity 3D outputs.

Dataset Download

Pre‑trained Models & Test Data: Available from Google Drive.

Citation

If you use the See3D dataset, please cite:

@article{Ma2024See3D,
    title = {You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale},
    author = {Baorui Ma and Huachen Gao and Haoge Deng and Zhengxiong Luo and Tiejun Huang and Lulu Tang and Xinlong Wang},
    journal = {arXiv preprint arXiv:2412.06699},
    year = {2024}
}
