
zzliang/GRIT

GRIT is a large‑scale image‑text pair dataset built on COYO‑700M and LAION‑2B, focusing on precise grounding of text to image regions. The dataset extracts textual fragments (such as noun phrases and referring expressions) and links them to the corresponding image regions, supporting tasks including image captioning, visual question answering, object detection, and zero‑shot classification. Each data instance includes the image URL and caption, along with metadata such as image dimensions and similarity scores between text and image.

Updated 7/4/2023

Description

GRIT: Large‑Scale Training Corpus of Grounded Image‑Text Pairs

Dataset Description

  • Name: GRIT
  • Language: English
  • Multilinguality: Monolingual
  • Size: 100M < n < 1B
  • Source Dataset: COYO‑700M
  • License: MS‑PL
  • Tags:
    • Image‑Text Bounding‑Box Pairs
    • Image‑Text Pairs
  • Task Types:
    • Text‑to‑Image
    • Image‑to‑Text
    • Object Detection
    • Zero‑Shot Classification
  • Task IDs:
    • Image Caption Generation
    • Visual Question Answering

Dataset Overview

GRIT is a large‑scale image‑text pair dataset built on COYO‑700M and LAION‑2B. It extracts and links textual fragments (such as noun phrases and referring expressions) to corresponding image regions, supporting various position‑aware uni‑/multi‑modal tasks such as phrase grounding, referring expression comprehension, referring expression generation, and open‑world object detection.

Data Instances

Each data instance contains the following fields:

  • key: generated file name (can be ignored)
  • clip_similarity_vitb32: cosine similarity between the text and image embeddings from CLIP ViT‑B/32
  • clip_similarity_vitl14: cosine similarity between the text and image embeddings from CLIP ViT‑L/14
  • id: unique ID
  • url: image URL
  • caption: corresponding caption
  • width: image width
  • height: image height
  • noun_chunks: noun phrases with associated bounding boxes
  • ref_exps: corresponding referring expressions
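
The fields above can be combined to recover a pixel-space box for each grounded phrase. The following is a minimal sketch, not the dataset authors' code. It assumes each noun_chunks entry is a 7-element list of the form [start offset in caption, end offset in caption, x_min, y_min, x_max, y_max, confidence] with box coordinates normalized to [0, 1]. That layout is not stated on this page and should be checked against the upstream dataset card. The sample record and the helper name chunk_to_pixel_box are hypothetical.

```python
from typing import Dict, List, Tuple


def chunk_to_pixel_box(
    record: Dict, chunk: List[float]
) -> Tuple[str, Tuple[int, int, int, int], float]:
    """Map one noun chunk to (phrase, pixel box, confidence).

    ASSUMPTION: chunk = [start, end, x_min, y_min, x_max, y_max, score],
    with box coordinates normalized to the range [0, 1].
    """
    start, end, x_min, y_min, x_max, y_max, score = chunk
    # Character offsets index into the caption string.
    phrase = record["caption"][int(start):int(end)]
    # Scale normalized coordinates by the image dimensions.
    w, h = record["width"], record["height"]
    box = (round(x_min * w), round(y_min * h), round(x_max * w), round(y_max * h))
    return phrase, box, score


# Hypothetical record, made up for illustration only.
record = {
    "caption": "a dog on the grass",
    "width": 640,
    "height": 480,
    "noun_chunks": [[0, 5, 0.25, 0.5, 0.75, 1.0, 0.9]],
}

phrase, box, score = chunk_to_pixel_box(record, record["noun_chunks"][0])
print(phrase, box, score)
```

If ref_exps follows the same layout, the same helper should apply to it unchanged; that is also an assumption.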

Image Download

The recommended way to download the images is the img2dataset tool. The process has three steps: download the metadata, install img2dataset, and run it with the provided command‑line arguments to retrieve the images.
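
As a rough illustration of the last step, the sketch below assembles an img2dataset command line in Python. It is not the command from the dataset card. The paths, the parquet metadata format, the worker counts, and the image size are all assumptions chosen for illustration, and the helper name build_img2dataset_command is hypothetical. The column names come from the field list above.

```python
import json
import shlex
from typing import List


def build_img2dataset_command(metadata_dir: str, output_dir: str) -> List[str]:
    """Build an img2dataset invocation for GRIT-style metadata.

    ASSUMPTIONS: metadata is stored as parquet; sizes and worker counts
    are placeholder values, not recommendations from the dataset card.
    """
    # Extra metadata columns to carry through alongside each image.
    extra_columns = [
        "id",
        "noun_chunks",
        "ref_exps",
        "clip_similarity_vitb32",
        "clip_similarity_vitl14",
    ]
    return [
        "img2dataset",
        "--url_list", metadata_dir,
        "--input_format", "parquet",
        "--url_col", "url",
        "--caption_col", "caption",
        "--output_format", "webdataset",
        "--output_folder", output_dir,
        "--processes_count", "4",
        "--thread_count", "64",
        "--image_size", "256",
        "--resize_mode", "keep_ratio",
        "--save_additional_columns", json.dumps(extra_columns),
    ]


# Hypothetical local paths.
cmd = build_img2dataset_command("/tmp/grit-metadata", "/tmp/grit-images")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once img2dataset is installed (for example with pip install img2dataset).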

Citation

When using this dataset, please cite the associated papers and the COYO‑700M dataset.


Topics

Image Recognition
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
