lmms-lab/ScienceQA-IMG

This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that keeps only the instances that include an image. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. Each record has the fields image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and the data is split into train, validation, and test sets.

Updated 3/8/2024
hugging_face

Dataset Information

Features

  • image: image type
  • question: string type
  • choices: sequence of strings
  • answer: 8-bit integer type (index of the correct entry in choices)
  • hint: string type
  • task: string type
  • grade: string type
  • subject: string type
  • topic: string type
  • category: string type
  • skill: string type
  • lecture: string type
  • solution: string type
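Because answer is stored as a small integer rather than as text, a consumer usually has to map it back onto choices. The sketch below assumes answer is a 0-based index into choices and turns one record into a lettered multiple-choice prompt. The helper name format_prompt and the prompt layout are hypothetical; this is not the template that `lmms-eval` actually uses.

```python
from __future__ import annotations

from string import ascii_uppercase


def format_prompt(example: dict) -> tuple[str, str]:
    """Return (prompt_text, gold_letter) for one ScienceQA-IMG-style record.

    Hypothetical helper: assumes `answer` is a 0-based index into `choices`.
    The `image` field is ignored; only the text fields are formatted.
    """
    lines = []
    hint = example.get("hint") or ""
    if hint:
        # `hint` may be empty; include it as context only when present.
        lines.append(f"Context: {hint}")
    lines.append(example["question"])
    # Label each option A, B, C, ... in the order stored in `choices`.
    for letter, choice in zip(ascii_uppercase, example["choices"]):
        lines.append(f"{letter}. {choice}")
    gold_letter = ascii_uppercase[example["answer"]]
    return "\n".join(lines), gold_letter


# Toy record made up for illustration; it is not taken from the dataset.
toy = {
    "question": "Which material is a metal?",
    "choices": ["wood", "iron", "glass"],
    "answer": 1,
    "hint": "",
}
prompt, gold = format_prompt(toy)
print(prompt)
print("gold:", gold)  # gold: B
```

On real data, the same function could be applied to rows loaded with the Hugging Face datasets library, for example `load_dataset("lmms-lab/ScienceQA-IMG", split="validation")`. The image itself is left out of this text-only sketch.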

Data Splits

  • train:
    • Size (bytes): 206 256 098.99
    • Samples: 6 218
  • validation:
    • Size (bytes): 69 283 708.63
    • Samples: 2 097
  • test:
    • Size (bytes): 65 753 122.30
    • Samples: 2 017

Data Size

  • Download size (bytes): 663 306 124
  • Dataset size (bytes): 341 292 929.92
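The three split sizes add up to the reported dataset size (206 256 098.99 + 69 283 708.63 + 65 753 122.30 = 341 292 929.92 bytes), and the splits hold 10 332 samples in total, roughly 60/20/20 across train, validation, and test. The short check below recomputes these figures from the numbers on this card; it is only arithmetic on the listed metadata and does not download anything.

```python
# Split metadata copied from the card above: (size in bytes, number of samples).
SPLITS = {
    "train": (206_256_098.99, 6_218),
    "validation": (69_283_708.63, 2_097),
    "test": (65_753_122.30, 2_017),
}
DATASET_SIZE_BYTES = 341_292_929.92


def summarize(splits: dict) -> tuple:
    """Return (total_bytes, total_samples) summed over all splits."""
    total_bytes = sum(size for size, _ in splits.values())
    total_samples = sum(count for _, count in splits.values())
    return total_bytes, total_samples


total_bytes, total_samples = summarize(SPLITS)
# The per-split sizes should add up to the reported dataset size.
print(abs(total_bytes - DATASET_SIZE_BYTES) < 0.01)  # True
print(total_samples)  # 10332
for name, (size, count) in SPLITS.items():
    share = count / total_samples
    print(f"{name}: {share:.1%} of samples, about {size / count / 1000:.1f} kB per sample")
```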

Configuration

  • config_name: default
    • data_files:
      • train: data/train-*
      • validation: data/validation-*
      • test: data/test-*
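Each data_files entry is a glob pattern that assigns repository files to a split. The Hugging Face datasets library resolves these patterns itself when the default config is loaded; the sketch below only imitates the idea with Python's fnmatch, and the shard file names in it are hypothetical.

```python
from fnmatch import fnmatch

# Patterns from the default config above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_splits(paths: list) -> dict:
    """Group file paths by the first split pattern they match (simplified)."""
    grouped = {split: [] for split in DATA_FILES}
    for path in paths:
        for split, pattern in DATA_FILES.items():
            if fnmatch(path, pattern):
                grouped[split].append(path)
                break  # a file belongs to at most one split
    return grouped


# Hypothetical shard names, for illustration only.
files = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",  # matches no pattern, so it is not assigned to any split
]
print(assign_splits(files))
```

Note that fnmatch's `*` also matches `/`, which may differ from the library's own matching rules. In practice, `load_dataset("lmms-lab/ScienceQA-IMG")` should return a DatasetDict whose train, validation, and test splits are built from these patterns.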

Topics

Multimodal Models
Scientific QA

Source

Organization: hugging_face

Created: Unknown