lmms-lab/ScienceQA-IMG

This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that keeps only the instances that include an image. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. Each record provides the fields image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and the data is split into training, validation, and test sets.

Source

hugging_face

Created

Nov 28, 2025

Updated

Mar 8, 2024

Dataset Overview

Dataset Information

Features

image: image type
question: string type
choices: sequence of strings
answer: 8-bit integer type (int8)
hint: string type
task: string type
grade: string type
subject: string type
topic: string type
category: string type
skill: string type
lecture: string type
solution: string type
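
The fields above can be combined into a multiple-choice prompt. The following is a minimal sketch only: the prompt layout, the helper names `build_prompt` and `answer_letter`, and the example record are hypothetical and are not taken from `lmms-eval` or from the dataset. It also assumes that `answer` is a zero-based index into `choices`.

```python
from string import ascii_uppercase


def build_prompt(record: dict) -> str:
    """Format one record as a lettered multiple-choice prompt (hypothetical layout)."""
    lines = []
    # Skip the hint when it is missing or empty.
    if record.get("hint"):
        lines.append(f"Context: {record['hint']}")
    lines.append(f"Question: {record['question']}")
    # Label each choice A, B, C, ... in order.
    for letter, choice in zip(ascii_uppercase, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def answer_letter(record: dict) -> str:
    """Map the integer answer to its letter label."""
    # Assumption: `answer` is a zero-based index into `choices`.
    return ascii_uppercase[record["answer"]]


# Made-up record for illustration; not taken from the dataset.
example = {
    "question": "Which of these is a mammal?",
    "choices": ["shark", "dolphin", "trout"],
    "answer": 1,
    "hint": "",
}

print(build_prompt(example))
print(answer_letter(example))
```

The image field is left out here; a multimodal model would receive it alongside the text prompt.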

Data Splits

train:
- Size (bytes): 206 256 098.99
- Samples: 6 218
validation:
- Size (bytes): 69 283 708.63
- Samples: 2 097
test:
- Size (bytes): 65 753 122.30
- Samples: 2 017

Data Size

Download size (bytes): 663 306 124
Dataset size (bytes): 341 292 929.92
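
As a quick consistency check, the per-split figures can be summed and compared with the reported dataset size. The sketch below uses only the numbers listed above; the variable names are illustrative.

```python
from decimal import Decimal

# Figures copied from the Data Splits and Data Size listings.
splits = {
    "train": {"bytes": Decimal("206256098.99"), "samples": 6218},
    "validation": {"bytes": Decimal("69283708.63"), "samples": 2097},
    "test": {"bytes": Decimal("65753122.30"), "samples": 2017},
}
reported_dataset_size = Decimal("341292929.92")

total_samples = sum(s["samples"] for s in splits.values())
total_bytes = sum(s["bytes"] for s in splits.values())

print(f"total samples: {total_samples}")
print(f"total bytes:   {total_bytes}")
print(f"matches reported dataset size: {total_bytes == reported_dataset_size}")

# Share of samples in each split.
for name, s in splits.items():
    print(f"{name}: {s['samples'] / total_samples:.1%}")
```

The three splits add up to 10 332 samples, and their byte sizes sum to the reported dataset size, giving roughly a 60/20/20 division between train, validation, and test.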

Configuration

config_name: default
- data_files:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
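
With this configuration, a split can be loaded through the Hugging Face `datasets` library. The sketch below assumes that library is installed and that `lmms-lab/ScienceQA-IMG` is the repository ID on the Hugging Face Hub; `load_split` is a hypothetical helper name, not part of any library.

```python
SPLITS = ("train", "validation", "test")


def load_split(split: str):
    """Load one split of lmms-lab/ScienceQA-IMG (assumed Hub repository ID)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the split check works without the library installed.
    from datasets import load_dataset

    # "default" is the config_name from the configuration above.
    return load_dataset("lmms-lab/ScienceQA-IMG", "default", split=split)
```

Calling `load_split("test")` downloads the data on first use; per the split listing, the result should hold 2 017 samples.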
