Open Images dataset

Open Images is a dataset of approximately 9 million image URLs annotated with image-level labels spanning over 6 000 categories. It is split into training and validation sets; each image has one or more labels, which are distributed as CSV files.

Updated 1/9/2017
Description

Open Images Dataset Overview

Dataset Description

  • Scale: Approximately 9 million image URLs.
  • Labels: 7 844 distinct labels, of which about 6 000 are considered trainable.
  • Label type: Each label is identified by a MID (machine identifier), as used in Freebase and the Google Knowledge Graph API.
  • Images: 9 018 219 training images, 167 057 validation images.
  • Annotation: Each image may have one or multiple labels; annotations are provided in CSV files.

Dataset Content

  • Training set: 9 018 219 images with machine‑generated annotations.
  • Validation set: 167 057 images with both machine‑generated and human annotations.
  • Confidence values: Human annotations are binary, with 1.0 for a positive label and 0.0 for a negative label; machine annotations carry a confidence between 0.0 and 1.0.
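
The confidence convention above can be applied when deciding which labels to treat as positive for an image. The following is a minimal sketch, not the dataset's own tooling: the (source, label, confidence) tuple layout, the keep_labels helper, the 0.5 threshold, and the label IDs are all assumptions made for illustration.

```python
def keep_labels(annotations, machine_threshold=0.5):
    """Return the set of label IDs treated as positive for one image.

    Human annotations count only at confidence 1.0 (0.0 marks a negative);
    machine annotations count when confidence >= machine_threshold.
    """
    kept = set()
    for source, label, confidence in annotations:
        if source == "human":
            if confidence == 1.0:
                kept.add(label)
        elif confidence >= machine_threshold:
            kept.add(label)
    return kept


# Hypothetical annotations for a single image (label IDs are made up).
example = [
    ("human", "/m/aaa", 1.0),    # human-verified positive
    ("human", "/m/bbb", 0.0),    # human-verified negative
    ("machine", "/m/ccc", 0.8),  # above the threshold
    ("machine", "/m/ddd", 0.3),  # below the threshold
]
print(sorted(keep_labels(example)))
```

The threshold trades precision for recall on the machine annotations, so it would normally be tuned against the human-annotated validation set rather than fixed in advance.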

Dataset Organization

  • Files: Two CSV files – images.csv (contains image URLs, OpenImages ID, title, author, license) and labels.csv (maps labels to image IDs with confidence scores).
  • Download:
    • Images and metadata: link (654 MB)
    • Machine image‑level annotations: link (330 MB)
    • Human image‑level annotations: link (7 MB)
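
As an illustration of how the two files relate, the sketch below joins image metadata with label rows by image ID using only the Python standard library. The column names (ImageID, URL, LabelName, Confidence), the sample rows, and the helper names are assumptions; check the header row of the downloaded files before relying on them.

```python
import csv
import io
from collections import defaultdict

# Tiny in-memory stand-ins for images.csv and labels.csv.
# Column names and values are hypothetical.
IMAGES_CSV = """ImageID,URL,Title,Author,License
img001,https://example.com/a.jpg,Photo A,Alice,CC-BY
img002,https://example.com/b.jpg,Photo B,Bob,CC-BY
"""
LABELS_CSV = """ImageID,LabelName,Confidence
img001,/m/aaa,0.9
img001,/m/bbb,0.4
img002,/m/aaa,1.0
"""


def load_labels(label_file):
    """Group (label, confidence) pairs by image ID."""
    labels = defaultdict(list)
    for row in csv.DictReader(label_file):
        labels[row["ImageID"]].append(
            (row["LabelName"], float(row["Confidence"]))
        )
    return labels


def join_images_with_labels(image_file, label_file):
    """Map each image ID to its URL and its list of labels."""
    labels = load_labels(label_file)
    joined = {}
    for row in csv.DictReader(image_file):
        image_id = row["ImageID"]
        joined[image_id] = {
            "url": row["URL"],
            "labels": labels.get(image_id, []),
        }
    return joined


records = join_images_with_labels(
    io.StringIO(IMAGES_CSV), io.StringIO(LABELS_CSV)
)
for image_id, record in records.items():
    print(image_id, record["url"], record["labels"])
```

With the real files, the io.StringIO objects would be replaced by open(path, newline="") handles. Holding all labels for 9 million images in one dictionary may not fit in memory, so a streaming or database-backed join may be preferable at full scale.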

Data Quality

  • Label distribution: Highly imbalanced; some labels appear in over a million images, others in fewer than 100.
  • Annotation accuracy: Machine annotations are noisy; labels that appear on more images tend to be more accurate.
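
One way to inspect the imbalance described above is to count how many images carry each label and flag the rare ones. This is a minimal sketch on toy data; the image-to-labels mapping, the helper names, and the cutoff of 2 are assumptions chosen to keep the example small (a cutoff near 100 would match the figure quoted above).

```python
from collections import Counter


def label_counts(image_labels):
    """Count the number of images each label appears on."""
    counts = Counter()
    for labels in image_labels.values():
        # set() ensures a label repeated on one image is counted once.
        counts.update(set(labels))
    return counts


def rare_labels(counts, min_images):
    """Return labels that appear on fewer than min_images images, sorted."""
    return sorted(label for label, n in counts.items() if n < min_images)


# Hypothetical mapping from image ID to label IDs.
toy = {
    "img001": ["/m/aaa", "/m/bbb"],
    "img002": ["/m/aaa"],
    "img003": ["/m/aaa", "/m/ccc"],
}
counts = label_counts(toy)
print(counts.most_common(1))
print(rare_labels(counts, min_images=2))
```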

Model Applications

  • Models such as Inception v3 have been trained on Open Images annotations. They can be fine‑tuned for downstream tasks or used for applications such as DeepDream and style transfer.

Topics

Image Recognition
Machine Learning

Source

Organization: github

Created: 10/1/2016