
WebLI

A dataset of 10 billion images and 12 billion texts for training multilingual language-image models.

Updated 9/20/2024
github

Description

Awesome-MLLM-Datasets

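Dataset Overview

This project collects and organizes datasets for training multimodal large language models, including but not limited to pre-training data, instruction-tuning data, and in-context learning data. The goal is a comprehensive repository that makes high-quality datasets easier for researchers to find when developing and optimizing multimodal AI systems.

Dataset Categories

Pre-training Datasets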

| Name | Images | Texts | Image-Text Pairs | Paper | Link | Type |
|---|---|---|---|---|---|---|
| WebLI | 10B | 12B | 12B | PaLI: A Jointly-Scaled Multilingual Language-Image Model | Link | Captions (109 languages) |
| LAION-5B | 5.9B | 5.9B | 5.9B | LAION-5B: An open large-scale dataset for training next generation image-text models | Link | Captions (Multiple languages) |
| LAION-en | 2.3B | 2.3B | 2.3B | LAION-5B: An open large-scale dataset for training next generation image-text models | Link | Captions (English) |
| ALIGN | 1.8B | 1.8B | 1.8B | Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision | Link | Captions (English) |
| DataComp | 1.4B | 1.4B | 1.4B | DATACOMP: In search of the next generation of multimodal datasets | Link | Captions (English) |
| COYO | 747M | 747M | 747M | COYO-700M: Large-scale Image-Text Pair Dataset | Link | Captions (English) |
| LAION-COCO | 600M | 600M | 600M | LAION COCO: 600M SYNTHETIC CAPTIONS FROM LAION2B-EN | Link | Captions (English) |
| LAION-400M | 400M | 400M | 400M | LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs | Link | Captions (English) |
| Episodic WebLI | 400M | 400M | 400M | PaLI-X: On Scaling up a Multilingual Vision and Language Model | - | Captions (English) |
| CLIP | 400M | 400M | 400M | Learning Transferable Visual Models From Natural Language Supervision | Link | Captions (English) |
| LTIP | 312M | 312M | 312M | Flamingo: a Visual Language Model for Few-Shot Learning | - | Captions (English) |
| FILIP | 300M | 300M | 300M | FILIP: Fine-grained Interactive Language-Image Pre-Training | - | Captions (English) |
| LAION-zh | 142M | 142M | 142M | LAION-5B: An open large-scale dataset for training next generation image-text models | Link | Captions (Chinese) |
| Obelics | 353M | 115M | 141M | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents | Link | Interleaved image-text web documents |
| MMC4 | 571M | 43B | 101.2M | Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text | Link | Interleaved image-text |
| Wukong | 101M | 101M | 101M | WuKong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework | Link | Captions (Chinese) |
| M3W | 185M | 182GB | 43.3M | Flamingo: a Visual Language Model for Few-Shot Learning | - | Captions (English) |
| WIT | 11.5M | 37.6M | 37.6M | WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning | Link | Captions (English) |
| GQA | 113K | 22M | 22M | GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering | Link | Visual Reasoning and Compositional Question Answering (English) |
| CC12M | 12.4M | 12.4M | 12.4M | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Link | Captions (English) |
| Red Caps | 12M | 12M | 12M | RedCaps: Web-curated image-text data created by the people, for the people | Link | Captions (English) |
| Visual Genome | 108K | 4.5M | 4.5M | Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Link | Annotations (English) |
| DVQA | 300K | 3.5M | 3.5M | DVQA: Understanding Data Visualizations via Question Answering | Link | Question answering (English) |
| CC3M | 3.3M | 3.3M | 3.3M | Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning | Link | Captions (English) |
| MS-COCO | 328K | 2.5M | 2.5M | Microsoft COCO: Common Objects in Context | Link | Object detection, Segmentation, Caption (English) |
| AI Challenger Captions | 300K | 1.5M | 1.5M | AI Challenger: A Large-scale Dataset for Going Deeper in Image Understanding | Link | Captions (English) |
| VQA v2 | 265K | 1.4M | 1.4M | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Link | Visual question answering (English) |
| SBU (Image Caption) | 1M | 1M | 1M | Im2Text: Describing Images Using 1 Million Captioned Photographs | Link | Captions (English) |
| OCR-VQA | 207K | 1M | 1M | OCR-VQA: Visual Question Answering by Reading Text in Images | Link | Visual question answering (English) |
| COCO Caption | 164K | 1M | 1M | Microsoft COCO Captions: Data Collection and Evaluation Server | Link | Object detection, Segmentation, Caption (English) |
| CC595k | 595K | 595K | 595K | Visual Instruction Tuning | Link | Captions (English) |
| Visual-7W | 47.3K | 328K | 328K | Visual7W: Grounded Question Answering in Images | - | - |
| Flickr30k | 31K | 158K | 158K | From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions | Link | Annotations (English) |
| Text Captions | 28K | 145K | 145K | TextCaps: a Dataset for Image Captioning with Reading Comprehension | - | - |
| RefCOCO | 20K | 142K | 142K | ReferItGame: Referring to Objects in Photographs of Natural Scenes | - | - |
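
The counts in the table use K, M, and B suffixes, so comparing datasets numerically first requires converting them to plain integers. The following is a minimal sketch of one way to do that, using a few rows copied from the table. The names `parse_count`, `texts_per_image`, and `ROWS` are hypothetical helpers introduced here for illustration and are not part of the Awesome-MLLM-Datasets project. The sketch assumes every value is a count; an entry such as M3W's `182GB` is a size rather than a count and is rejected with a `ValueError`.

```python
# Multipliers for the count suffixes used in the table (K, M, B).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '12.4M' or '108k' to an integer."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        # float() raises ValueError for non-numeric prefixes such as '182G'.
        return round(float(text[:-1]) * SUFFIXES[suffix])
    return int(text)


def texts_per_image(images: str, texts: str) -> float:
    """Average number of texts per image for one table row."""
    return parse_count(texts) / parse_count(images)


# A few rows copied from the table: (name, images, texts).
ROWS = [
    ("WebLI", "10B", "12B"),
    ("MMC4", "571M", "43B"),
    ("Visual Genome", "108k", "4.5M"),
    ("Flickr30k", "31K", "158K"),
]

for name, images, texts in ROWS:
    print(f"{name}: {texts_per_image(images, texts):.1f} texts per image")
```

Caption-style datasets in the table have one text per image by construction, so a ratio well above 1 points to interleaved documents or to datasets with several annotations per image.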

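Multimodal Instruction-Tuning Datasets

  • To be added

In-Context Learning Datasets

  • To be added

Multimodal Chain-of-Thought Datasets

  • To be added

Multimodal RLHF Datasets

  • To be added

Evaluation Benchmark Datasets

  • To be added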


Topics

South African Theatre
Multilingual Processing

Source

Organization: github

Created: 9/2/2024
