
pixelprose

PixelProse is a dataset of more than 16 million (16.9M) synthetically generated image captions produced with the Gemini 1.0 Pro Vision model. Each record includes fields such as a unique image identifier, the image URL, the captioning model, and the caption text, and the data can be downloaded and used in several ways.

Updated 6/18/2024
huggingface

Description

PixelProse Dataset Overview

Basic Information

  • License: cc-by-4.0
  • Task Categories:
    • Image‑to‑Text
    • Text‑to‑Image
    • Visual Question Answering
  • Language: English
  • Tag: croissant
  • Name: PixelProse
  • Size Category: 10M<n<100M

Configuration

  • Default Config:
    • Training Set: data/vlm_captions_*.parquet
    • CC12M: data/vlm_captions_cc12m_*.parquet
    • CommonPool: data/vlm_captions_common-pool_*.parquet
    • RedCaps: data/vlm_captions_redcaps_*.parquet
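The patterns above imply that the training set is the union of the three source-specific file groups, since its glob matches every caption file. A minimal sketch of that relationship follows; the split names on the left-hand side and the example file names in the usage note are assumptions for illustration, not taken from the dataset card.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the configuration above. The split names used
# as keys are assumptions; check them against the dataset card.
SPLIT_PATTERNS = {
    "train": "data/vlm_captions_*.parquet",
    "cc12m": "data/vlm_captions_cc12m_*.parquet",
    "commonpool": "data/vlm_captions_common-pool_*.parquet",
    "redcaps": "data/vlm_captions_redcaps_*.parquet",
}


def files_for_split(split, paths):
    """Return the paths whose names match the glob pattern of a split."""
    pattern = SPLIT_PATTERNS[split]
    return [p for p in paths if fnmatchcase(p, pattern)]
```

Given a listing of hypothetical file names such as data/vlm_captions_cc12m_01.parquet and data/vlm_captions_redcaps_01.parquet, files_for_split("train", paths) returns both, while files_for_split("redcaps", paths) returns only the second.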

Details

  • Total Image‑Text Pairs: 16,896,214 (16.9M)
    • CommonPool: 6,538,898 (6.5M)
    • CC12M: 9,066,455 (9.1M)
    • RedCaps: 1,290,861 (1.3M)
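As a quick sanity check, the per-source counts above can be summed and compared with the stated total. The sketch below uses only the figures listed in this section; the variable names are illustrative.

```python
# Pair counts per source, copied from the list above.
counts = {
    "CommonPool": 6_538_898,
    "CC12M": 9_066_455,
    "RedCaps": 1_290_861,
}

total = sum(counts.values())  # 16_896_214, matching the stated total

# Fraction of the dataset contributed by each source.
shares = {name: n / total for name, n in counts.items()}
```

CC12M contributes a little over half of the pairs, and RedCaps is by far the smallest source.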

Data Download

  • Parquet Files:
    • Via Git LFS:
      git lfs install
      git clone https://huggingface.co/datasets/tomg-group-umd/pixelprose
      
    • Via the Hugging Face datasets library:
      from datasets import load_dataset
      ds = load_dataset("tomg-group-umd/pixelprose")
      
    • Direct Link: open the data directory and download only the files you need.
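Because the dataset holds roughly 16.9M rows, it may be convenient to inspect a few records before committing to a full download. The sketch below uses the streaming mode of the datasets library for that purpose; the split name "train" is an assumption, and the helper names are hypothetical.

```python
from itertools import islice


def first_rows(rows, n=3):
    """Return the first n rows from any iterable of rows."""
    return list(islice(rows, n))


def stream_pixelprose(split="train"):
    """Open the dataset lazily instead of downloading every Parquet file.

    The split name is an assumption; adjust it to the dataset card.
    """
    # Imported here so first_rows stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset("tomg-group-umd/pixelprose", split=split, streaming=True)
```

Calling first_rows(stream_pixelprose(), 3) should fetch only as much data as is needed to produce three records, which makes it a cheap way to check the column layout.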

Columns

  • uid: unique image identifier
  • url: image URL
  • key: image‑related key
  • status: status returned by vlm_model
  • original_caption: original inherited caption
  • vlm_model: model used for captioning
  • vlm_caption: dense caption from PixelProse
  • toxicity: score for general harmful behavior
  • severe_toxicity: score for extremely harmful or abusive language
  • obscene: score for obscene or inappropriate language
  • identity_attack: score for language targeting individuals or groups based on identity
  • insult: score for language meant to insult or demean
  • threat: score for language conveying threats of harm
  • sexual_explicit: score for language containing explicit sexual content
  • watermark_class_id: watermark class (0 = watermarked image, 1 = non‑watermarked, 2 = non‑watermarked with text)
  • watermark_class_score: prediction scores for each watermark class, range [0, 1]
  • aesthetic_score: aesthetic rating, range [0, 10]
  • error_message: error message returned by vlm_model
  • width / height: image dimensions used for running vlm_model
  • original_width / original_height: original image dimensions
  • exif: EXIF metadata of the image file
  • sha256: SHA256 hash of the image file
  • image_id, author, subreddit, score: attributes inherited from RedCaps (not available in CC12M and CommonPool)
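The score and watermark columns lend themselves to row-level filtering, and the sha256 column to verifying a downloaded image. The following is a minimal sketch under stated assumptions: the thresholds are arbitrary illustrations rather than recommended values, rows are treated as plain dictionaries keyed by column name, and the sha256 value is assumed to be a hexadecimal string.

```python
import hashlib


def keep_row(row, max_toxicity=0.2, min_aesthetic=5.0):
    """Return True if a row passes three illustrative quality checks.

    The default thresholds are assumptions chosen for demonstration only.
    """
    # Class 0 means the image was classified as watermarked.
    if row.get("watermark_class_id") == 0:
        return False
    toxicity = row.get("toxicity")
    aesthetic = row.get("aesthetic_score")
    # Reject rows with missing scores rather than guessing a value.
    if toxicity is None or aesthetic is None:
        return False
    return toxicity <= max_toxicity and aesthetic >= min_aesthetic


def sha256_matches(image_bytes, expected_hex):
    """Compare downloaded bytes with a row's sha256 value (assumed hex)."""
    return hashlib.sha256(image_bytes).hexdigest() == expected_hex.lower()
```

With the datasets library, a predicate of this shape can be passed to the filter method of a loaded dataset, for example ds.filter(keep_row).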

Topics

Image Processing
Natural Language Processing

Source

Organization: huggingface

Created: 6/14/2024