zzliang/GRIT
GRIT is a large‑scale image‑text pair dataset built on COYO‑700M and LAION‑2B, focusing on precise grounding of text to image regions. The dataset extracts and links textual fragments (such as noun phrases and referring expressions) to corresponding image areas, supporting tasks including image captioning, visual question answering, object detection, and zero‑shot classification. Each data instance includes the image URL and its caption, the grounded noun phrases and referring expressions, and metadata such as image dimensions and CLIP similarity scores between text and image.
Description
GRIT: Large‑Scale Training Corpus of Grounded Image‑Text Pairs
Dataset Description
- Name: GRIT
- Language: English
- Multilinguality: Monolingual
- Size: 100M < n < 1B
- Source Dataset: COYO‑700M
- License: MS‑PL
- Tags:
  - Image‑Text Bounding‑Box Pairs
  - Image‑Text Pairs
- Task Types:
  - Text‑to‑Image
  - Image‑to‑Text
  - Object Detection
  - Zero‑Shot Classification
- Task IDs:
  - Image Caption Generation
  - Visual Question Answering
Dataset Overview
GRIT is a large‑scale image‑text pair dataset built on COYO‑700M and LAION‑2B. It extracts and links textual fragments (such as noun phrases and referring expressions) to corresponding image regions, supporting various position‑aware uni‑/multi‑modal tasks such as phrase grounding, referring expression comprehension, referring expression generation, and open‑world object detection.
Data Instances
Each data instance contains the following fields:
- key: filename (ignored)
- clip_similarity_vitb32: cosine similarity between text and image (ViT‑B/32) embeddings
- clip_similarity_vitl14: cosine similarity between text and image (ViT‑L/14) embeddings
- id: unique ID
- url: image URL
- caption: corresponding caption
- width: image width
- height: image height
- noun_chunks: noun phrases with associated bounding boxes
- ref_exps: corresponding referring expressions
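As a rough illustration of how a grounded noun phrase might be consumed, the sketch below maps one noun_chunks entry back to its caption text and to pixel coordinates. The entry layout used here, [start, end, x_min, y_min, x_max, y_max, score] with box coordinates normalized to the range 0 to 1, is an assumption that this page does not confirm, and the sample record and the function name chunk_to_pixels are hypothetical; check the dataset documentation before relying on it.

```python
from typing import Sequence, Tuple


def chunk_to_pixels(
    caption: str, chunk: Sequence[float], width: int, height: int
) -> Tuple[str, Tuple[int, int, int, int], float]:
    """Resolve one noun_chunks entry to (phrase, pixel box, score).

    ASSUMPTION: chunk is [start, end, x_min, y_min, x_max, y_max, score],
    where start/end are character offsets into the caption and the box
    coordinates are normalized to [0, 1].
    """
    start, end, x_min, y_min, x_max, y_max, score = chunk
    phrase = caption[int(start):int(end)]
    box = (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )
    return phrase, box, float(score)


# Hypothetical record, made up for illustration only.
record = {
    "caption": "a dog on the grass",
    "width": 640,
    "height": 480,
    "noun_chunks": [[0, 5, 0.25, 0.5, 0.75, 1.0, 0.9]],
}

for chunk in record["noun_chunks"]:
    print(chunk_to_pixels(record["caption"], chunk, record["width"], record["height"]))
```

The ref_exps field could be handled the same way if it shares this layout, which is likewise an assumption.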
Image Download
The recommended way to download the images is the img2dataset tool: download the metadata, install img2dataset, and then run it with the provided command‑line arguments to retrieve the images.
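The download step can be sketched as a small helper that assembles an img2dataset command line without running it. The paths are hypothetical, the parquet input format is an assumption about how the metadata is stored, and the flags shown are img2dataset options as commonly documented; verify them against the installed version and prefer the arguments given with the dataset itself.

```python
import json
import shlex
from typing import List


def build_img2dataset_args(
    metadata_dir: str, output_dir: str, image_size: int = 256
) -> List[str]:
    """Assemble an img2dataset invocation for GRIT-style metadata.

    ASSUMPTIONS: metadata is stored as parquet with "url" and "caption"
    columns; the extra columns mirror the fields listed under Data Instances.
    """
    extra_cols = [
        "id",
        "noun_chunks",
        "ref_exps",
        "clip_similarity_vitb32",
        "clip_similarity_vitl14",
    ]
    return [
        "img2dataset",
        "--url_list", metadata_dir,
        "--input_format", "parquet",
        "--url_col", "url",
        "--caption_col", "caption",
        "--output_format", "webdataset",
        "--output_folder", output_dir,
        "--image_size", str(image_size),
        # Keep the grounding annotations alongside each downloaded image.
        "--save_additional_columns", json.dumps(extra_cols),
    ]


# Hypothetical paths; print the command so it can be reviewed before running.
args = build_img2dataset_args("/tmp/grit-metadata", "/tmp/grit-images")
print(shlex.join(args))
```

Building the argument list separately keeps the command easy to inspect, and the list can be passed to subprocess.run once img2dataset is installed (for example with pip install img2dataset).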
Citation
When using this dataset, please cite the associated papers and the COYO‑700M dataset.
Source
Organization: hugging_face
Created: Unknown