Tags: Dataset asset, Open Source Community, Natural Language Processing, Visual Reasoning
CLEVR
The CLEVR dataset is a diagnostic dataset for compositional language and elementary visual reasoning, designed to help researchers evaluate and develop models that can understand and answer questions about complex visual scenes.
Source
GitHub
Created
May 22, 2019
Updated
Jan 16, 2020
CLEVR Dataset Overview
Dataset Description
- Name: CLEVR Dataset
- Purpose: Diagnostic for compositional language and elementary visual reasoning
- Origin: Proposed by Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Fei-Fei Li, C. Lawrence Zitnick, and Ross Girshick at CVPR 2017
Dataset Generation
- Image Generation: Images rendered with Blender; a JSON file containing scene information for each image is provided.
- Question Generation: Questions, functional programs, and answers are generated from the scene information; a JSON file containing all questions is provided.
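To make the relationship between the two JSON files concrete, here is a minimal Python sketch that pairs each question with the scene it was generated from. The inline JSON strings are hypothetical, heavily trimmed stand-ins for the real files, and the field names (`scenes`, `objects`, `questions`, `image_index`) are assumptions that should be checked against the files you actually download.

```python
import json

# Hypothetical, trimmed stand-ins for the scene and question JSON files.
# Field names are assumptions, not a guaranteed schema.
SCENES_JSON = """
{"scenes": [
  {"image_index": 0,
   "image_filename": "example_000000.png",
   "objects": [
     {"shape": "sphere", "size": "small", "color": "red", "material": "metal"},
     {"shape": "sphere", "size": "small", "color": "blue", "material": "rubber"},
     {"shape": "cube", "size": "large", "color": "yellow", "material": "metal"}
   ]}
]}
"""

QUESTIONS_JSON = """
{"questions": [
  {"image_index": 0,
   "question": "How many small spheres are there?",
   "answer": "2"}
]}
"""


def index_scenes(scene_doc):
    """Map image_index -> scene record for quick lookup."""
    return {scene["image_index"]: scene for scene in scene_doc["scenes"]}


def pair_questions_with_scenes(scene_doc, question_doc):
    """Return (question, answer, scene) triples joined on image_index."""
    scenes = index_scenes(scene_doc)
    return [
        (q["question"], q["answer"], scenes[q["image_index"]])
        for q in question_doc["questions"]
    ]


scene_doc = json.loads(SCENES_JSON)
question_doc = json.loads(QUESTIONS_JSON)
pairs = pair_questions_with_scenes(scene_doc, question_doc)

for question, answer, scene in pairs:
    print(question, "->", answer, f"({len(scene['objects'])} objects in scene)")
```

With the real files, the two `json.loads` calls would be replaced by `json.load` on the opened scene and question files; the join logic stays the same as long as both files share an image index field.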
Dataset Content Examples
- Image Examples: Several synthetic images such as images/img1.png…images/img6.png.
- Question & Answer Examples:
- Q: How many small spheres are there?
- A: 2
- Q: How many cubes are small objects or red metallic objects?
- A: 2
- Q: Do the metal sphere and the metal cylinder share the same color?
- A: Yes
- Q: Are there more small cylinders than metal objects?
- A: No
- Q: Is there a shiny cube to the right of the blue ball behind the large yellow object?
- A: Yes
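Each question is generated together with a functional program, so an answer can in principle be recomputed from the scene information alone. The sketch below is a hypothetical mini-interpreter, not the dataset's own tooling: the scene, the step layout (`function`, `inputs`, `value_inputs`), and the three supported functions are illustrative assumptions covering only the first example question.

```python
# Hypothetical toy scene: each object is a dict of attributes.
SCENE = [
    {"shape": "sphere", "size": "small", "color": "red", "material": "metal"},
    {"shape": "sphere", "size": "small", "color": "blue", "material": "rubber"},
    {"shape": "cube", "size": "large", "color": "yellow", "material": "metal"},
]

# "How many small spheres are there?" written as a chain of steps.
# Each step names a function, the indices of earlier steps whose outputs
# it consumes, and any literal values it needs (assumed layout).
PROGRAM = [
    {"function": "scene", "inputs": [], "value_inputs": []},
    {"function": "filter_size", "inputs": [0], "value_inputs": ["small"]},
    {"function": "filter_shape", "inputs": [1], "value_inputs": ["sphere"]},
    {"function": "count", "inputs": [2], "value_inputs": []},
]


def run_program(program, scene):
    """Evaluate the steps in order and return the output of the last one."""
    results = []
    for step in program:
        name = step["function"]
        args = [results[i] for i in step["inputs"]]
        if name == "scene":
            out = list(scene)  # start from every object in the scene
        elif name.startswith("filter_"):
            attribute = name[len("filter_"):]  # e.g. "size" or "shape"
            wanted = step["value_inputs"][0]
            out = [obj for obj in args[0] if obj[attribute] == wanted]
        elif name == "count":
            out = len(args[0])
        else:
            raise ValueError(f"unsupported function: {name}")
        results.append(out)
    return results[-1]


answer = run_program(PROGRAM, SCENE)
print(answer)  # prints 2 for this toy scene
```

Comparison and spatial questions, such as the last three examples above, would need additional functions (attribute comparison, set union, relations like "right of" and "behind") that this sketch deliberately leaves out.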
Citation
@inproceedings{johnson2017clevr,
title={CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning},
author={Johnson, Justin and Hariharan, Bharath and van der Maaten, Laurens
and Fei-Fei, Li and Zitnick, C Lawrence and Girshick, Ross},
booktitle={CVPR},
year={2017}
}