Dataset asset · Open Source Community · Multimodal Models · Scientific QA

lmms-lab/ScienceQA-IMG

This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that keeps only the instances containing an image. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. Each example provides the fields image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and the data is divided into train, validation, and test splits.

Source: hugging_face
Created: Nov 28, 2025
Updated: Mar 8, 2024

Dataset Overview

Dataset Information

Features

  • image: image type
  • question: string type
  • choices: sequence of strings
  • answer: 8-bit integer (int8) type
  • hint: string type
  • task: string type
  • grade: string type
  • subject: string type
  • topic: string type
  • category: string type
  • skill: string type
  • lecture: string type
  • solution: string type
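The answer, question, choices, and hint fields combine naturally into a multiple-choice prompt. The sketch below assumes that `answer` is a 0-based index into `choices`, as in the upstream ScienceQA release. The helper names `format_prompt` and `answer_letter` and the prompt layout are hypothetical, and they are not the template `lmms-eval` actually uses. The image is left out because it would be passed to the model separately.

```python
from typing import Sequence

# Hypothetical option labels; assumes at most 10 choices per question.
LETTERS = "ABCDEFGHIJ"


def format_prompt(question: str, choices: Sequence[str], hint: str = "") -> str:
    """Build a lettered multiple-choice text prompt (image handled separately)."""
    lines = []
    if hint:  # many ScienceQA hints are empty strings, so skip them
        lines.append(f"Context: {hint}")
    lines.append(question)
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    return "\n".join(lines)


def answer_letter(answer: int, choices: Sequence[str]) -> str:
    """Map the integer `answer` field (assumed 0-based) to its option letter."""
    if not 0 <= answer < len(choices):
        raise ValueError(f"answer index {answer} out of range for {len(choices)} choices")
    return LETTERS[answer]


# Made-up example record (not taken from the dataset).
choices = ["Arizona", "Maine", "Georgia"]
print(format_prompt("Which of these states is farthest north?", choices))
print("gold:", answer_letter(1, choices))
```

Comparing a model's predicted letter against `answer_letter(...)` gives a simple exact-match accuracy check.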

Data Splits

  • train:
    • Size (bytes): 206 256 098.99
    • Samples: 6 218
  • validation:
    • Size (bytes): 69 283 708.63
    • Samples: 2 097
  • test:
    • Size (bytes): 65 753 122.30
    • Samples: 2 017
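The split proportions follow directly from the sample counts listed above. This short calculation uses only those numbers and gives roughly a 60/20/20 train/validation/test ratio.

```python
# Sample counts per split, copied from the Data Splits list above.
SPLIT_SAMPLES = {"train": 6_218, "validation": 2_097, "test": 2_017}

total = sum(SPLIT_SAMPLES.values())  # all image-bearing examples across splits

for split, n in SPLIT_SAMPLES.items():
    # Print each split's count and its share of the total as a percentage.
    print(f"{split:<10} {n:>6}  {n / total:6.1%}")
print(f"{'total':<10} {total:>6}")
```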

Data Size

  • Download size (bytes): 663 306 124
  • Dataset size (bytes): 341 292 929.92
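As a consistency check, the three per-split byte counts should add up to the reported dataset size. The sketch below verifies this and converts the raw byte counts into decimal megabytes and binary mebibytes. The `human` helper is a hypothetical name introduced only for this illustration.

```python
# Byte counts copied from the Data Splits and Data Size lists above.
SPLIT_BYTES = {
    "train": 206_256_098.99,
    "validation": 69_283_708.63,
    "test": 65_753_122.30,
}
DATASET_BYTES = 341_292_929.92
DOWNLOAD_BYTES = 663_306_124

# The per-split sizes should sum to the reported dataset size
# (tolerance absorbs floating-point rounding).
assert abs(sum(SPLIT_BYTES.values()) - DATASET_BYTES) < 0.01


def human(n_bytes: float) -> str:
    """Format a byte count in decimal (MB, 10**6) and binary (MiB, 2**20) units."""
    return f"{n_bytes / 1e6:,.2f} MB ({n_bytes / 2**20:,.2f} MiB)"


print("dataset: ", human(DATASET_BYTES))
print("download:", human(DOWNLOAD_BYTES))
```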

Configuration

  • config_name: default
    • data_files:
      • train: data/train-*
      • validation: data/validation-*
      • test: data/test-*
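The `data_files` entries map each split to a glob pattern over files in the repository's `data/` directory. The sketch below shows how such patterns partition a file listing. It uses Python's `fnmatch` as an approximation of the glob matching that the Hugging Face tooling performs, and the shard filenames are hypothetical, so the actual files in the repository may be named differently.

```python
from fnmatch import fnmatch

# Split -> glob pattern, as listed under data_files in the default config.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_splits(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by the first split whose pattern matches; others are ignored."""
    out: dict[str, list[str]] = {split: [] for split in DATA_FILES}
    for path in sorted(paths):
        for split, pattern in DATA_FILES.items():
            if fnmatch(path, pattern):
                out[split].append(path)
                break
    return out


# Hypothetical shard names, for illustration only.
files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
print(assign_splits(files))
```

In practice, calling `datasets.load_dataset("lmms-lab/ScienceQA-IMG", split="test")` from the Hugging Face `datasets` library should resolve these patterns automatically, so manual matching like this is only needed when working with the raw files.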