Dataset asset, Open Source Community, Multimodal Models, Scientific QA
lmms-lab/ScienceQA-IMG
This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that keeps only the instances with an image. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. Each record provides the fields image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and the data is split into train, validation, and test sets.
Source
hugging_face
Created
Nov 28, 2025
Updated
Mar 8, 2024
Dataset Overview
Dataset Information
Features
- image: image type
- question: string type
- choices: sequence of strings
- answer: 8‑bit integer type
- hint: string type
- task: string type
- grade: string type
- subject: string type
- topic: string type
- category: string type
- skill: string type
- lecture: string type
- solution: string type
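The fields above can be combined into a multiple-choice prompt. The following is a minimal sketch, not the prompt that `lmms-eval` itself uses: the sample record, the `format_prompt` and `answer_letter` helpers, and the lettering convention are all illustrative assumptions, and `answer` is assumed to be a zero-based index into `choices`. The image field is left out so the sketch runs without downloading anything.

```python
from string import ascii_uppercase

# Hypothetical record that mirrors the text fields listed above.
# It is made up for illustration and is not taken from the dataset.
sample = {
    "question": "Which object is attracted to a magnet?",
    "choices": ["iron nail", "wooden block", "glass cup"],
    "answer": 0,  # assumed to be a zero-based index into "choices"
    "hint": "",
}


def format_prompt(record):
    """Build a lettered multiple-choice prompt from one record."""
    lines = []
    if record.get("hint"):
        lines.append(f"Context: {record['hint']}")
    lines.append(f"Question: {record['question']}")
    for letter, choice in zip(ascii_uppercase, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def answer_letter(record):
    """Map the integer answer to the letter used in the prompt."""
    return ascii_uppercase[record["answer"]]


prompt = format_prompt(sample)
print(prompt)
print("Gold:", answer_letter(sample))
```

With the Hugging Face `datasets` library installed, real records would presumably be obtained with `load_dataset("lmms-lab/ScienceQA-IMG", split="test")` and passed to the same helper.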
Data Splits
- train:
  - Size (bytes): 206 256 098.99
  - Samples: 6 218
- validation:
  - Size (bytes): 69 283 708.63
  - Samples: 2 097
- test:
  - Size (bytes): 65 753 122.30
  - Samples: 2 017
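As a quick worked example, the sample counts listed above can be turned into a total and per-split shares. The variable names are illustrative; only the three counts come from the listing.

```python
# Sample counts copied from the split listing above.
samples = {"train": 6218, "validation": 2097, "test": 2017}

total = sum(samples.values())
shares = {name: count / total for name, count in samples.items()}

print(f"total samples: {total}")
for name, share in shares.items():
    print(f"{name:<10} {samples[name]:>5}  {share:.1%}")
```

The split is roughly 60/20/20 across train, validation, and test.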
Data Size
- Download size (bytes): 663 306 124
- Dataset size (bytes): 341 292 929.92
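A small sanity check on these figures, as a sketch: it confirms that the per-split sizes add up to the reported dataset size and converts the byte counts to MiB. The `to_mib` helper and the variable names are illustrative assumptions; the numbers are copied from the listing.

```python
import math

# Byte counts copied from the listing; fractional bytes are kept as reported.
split_bytes = {
    "train": 206_256_098.99,
    "validation": 69_283_708.63,
    "test": 65_753_122.30,
}
dataset_bytes = 341_292_929.92
download_bytes = 663_306_124


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


total_split_bytes = sum(split_bytes.values())
# Compare with a tolerance because the values are floating-point numbers.
consistent = math.isclose(total_split_bytes, dataset_bytes, abs_tol=0.01)

print(f"splits sum to dataset size: {consistent}")
print(f"dataset size:  {to_mib(dataset_bytes):.1f} MiB")
print(f"download size: {to_mib(download_bytes):.1f} MiB")
```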
Configuration
- config_name: default
- data_files:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
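The `data_files` entries read as glob patterns that assign files to splits. The sketch below illustrates that mapping with the standard-library `fnmatch` module, assuming the patterns behave like shell-style globs. The shard file names and the `split_for` helper are hypothetical; the actual file names in the repository are not given here.

```python
from fnmatch import fnmatch

# Patterns copied from the configuration above.
data_files = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}

# Hypothetical file names, used only to exercise the patterns.
candidates = [
    "data/train-00000-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]


def split_for(path):
    """Return the split whose pattern matches the path, or None."""
    for split, pattern in data_files.items():
        if fnmatch(path, pattern):
            return split
    return None


for path in candidates:
    print(f"{path} -> {split_for(path)}")
```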