Dataset asset, Open Source Community, STEM Education, Neural Model Evaluation

stemdataset/STEM

The STEM dataset is a multimodal benchmark for testing neural models on science, technology, engineering, and mathematics (STEM) skills. It contains 448 skills and 1,073,146 questions spanning all four STEM subjects. Unlike existing datasets, it requires models to understand multimodal vision‑language information, and its questions are based on K‑12 curricula. The dataset is split into training, validation, and test sets; the test set's ground‑truth answers are hidden, so test predictions are scored by submitting them to the leaderboard. Each entry is a multiple‑choice question with a problem description, an optional image, a list of options, and the index of the correct answer.

Source: hugging_face
Created: Nov 28, 2025
Updated: Apr 30, 2024
Signals: 236 views
Availability: Linked source ready
Overview

Dataset description and usage context

STEM Dataset Overview

Basic Information

  • License: Apache‑2.0
  • Language: English
  • Scale: 1M < n < 10M
  • Tags: STEM, Benchmark

Content

  • Type: Multimodal multiple‑choice
  • Subjects: Science, Technology, Engineering, Mathematics
  • Number of Skills: 448
  • Number of Questions: 1,073,146
  • Splits: Training, Validation, Test
  • Training Size: 644,797 questions
  • Validation Size: 214,272 questions
  • Test Size: 214,077 questions
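
As a quick sanity check on the figures above, the three split sizes can be summed and turned into proportions. This is a minimal sketch that uses only the numbers listed on this page; the key `valid` follows the split name shown in the schema, and the helper name `split_fractions` is hypothetical.

```python
# Split sizes as listed on this page.
SPLIT_SIZES = {"train": 644_797, "valid": 214_272, "test": 214_077}


def split_fractions(sizes):
    """Return each split's share of the total number of questions."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

for name, share in fractions.items():
    print(f"{name}: {SPLIT_SIZES[name]:,} questions ({share:.1%})")
print(f"total: {total:,}")
```

The sizes add up to the stated 1,073,146 questions, which works out to roughly a 60/20/20 split.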

Features

  • Schema:
DatasetDict({
    train: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 644797
    })
    valid: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214272
    })
    test: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214077
    })
})
  • Feature Description:
    • subject: subject area
    • grade: educational grade level
    • skill: specific skill identifier
    • pic_choice: whether options are images
    • pic_prob: whether the problem includes an image
    • problem: textual description of the problem
    • problem_pic: associated image for the problem
    • choices: textual options
    • choices_pic: image options (if any)
    • answer_idx: index of the correct answer
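
To make the fields above concrete, here is a minimal sketch of how a single record might be rendered as a lettered multiple‑choice prompt. It rests on several assumptions not confirmed by this page: that `choices` is a list of strings, that `pic_prob` and `pic_choice` are truthy when images are present, and that the repo id `stemdataset/STEM` resolves on the Hugging Face Hub. The helpers `load_split` and `format_question` and the example record are hypothetical.

```python
from typing import Any, Mapping


def load_split(split: str = "valid"):
    """Load one split from the Hugging Face Hub (needs network access).

    Assumption: the repo id "stemdataset/STEM" shown on this page is loadable.
    """
    from datasets import load_dataset  # imported lazily; only needed here

    return load_dataset("stemdataset/STEM", split=split)


def format_question(record: Mapping[str, Any]) -> str:
    """Render the text part of a record as a lettered multiple-choice prompt."""
    lines = [record["problem"]]
    if record.get("pic_prob"):
        # The image itself lives in problem_pic; only a placeholder is emitted.
        lines.append("[problem image attached]")

    if record.get("pic_choice"):
        # Options are images: emit one placeholder per entry in choices_pic.
        options = ["[image option]"] * len(record.get("choices_pic") or [])
    else:
        options = list(record.get("choices") or [])

    for i, option in enumerate(options):
        letter = chr(ord("A") + i)
        lines.append(f"{letter}. {option}")
    return "\n".join(lines)


# Hypothetical record with the field names from the schema; values are made up.
example = {
    "subject": "math", "grade": "grade-3", "skill": "addition",
    "pic_choice": False, "pic_prob": False,
    "problem": "What is 2 + 3?", "problem_pic": None,
    "choices": ["4", "5", "6"], "choices_pic": None, "answer_idx": 1,
}
print(format_question(example))
```

A multimodal model would also need the actual images from `problem_pic` and `choices_pic`; the placeholders here only mark where they belong.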

Use Cases

  • Evaluation: Follow the evaluation code provided with the dataset to score model predictions.
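
The dataset's own evaluation code is not reproduced on this page. As an illustration only, multiple‑choice accuracy on a labelled split (train or valid, since test answers are hidden) could be computed along these lines. The function name `accuracy_by_subject` and the toy records are hypothetical; this is not the official scoring script.

```python
from collections import defaultdict


def accuracy_by_subject(records, predictions):
    """Return overall and per-subject accuracy for aligned records and predictions.

    records: sequence of dicts carrying "subject" and "answer_idx".
    predictions: sequence of predicted option indices, one per record.
    """
    records = list(records)
    predictions = list(predictions)
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted in zip(records, predictions):
        subject = record["subject"]
        total[subject] += 1
        if predicted == record["answer_idx"]:
            correct[subject] += 1

    # Per-subject accuracy, plus an "overall" entry across all records.
    scores = {subject: correct[subject] / total[subject] for subject in total}
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


# Hypothetical labelled records and model predictions, for illustration only.
toy_records = [
    {"subject": "math", "answer_idx": 1},
    {"subject": "math", "answer_idx": 0},
    {"subject": "science", "answer_idx": 2},
]
toy_predictions = [1, 2, 2]
print(accuracy_by_subject(toy_records, toy_predictions))
```

Grouping by `subject` is one choice among several; the same loop could key on `grade` or `skill` for a finer breakdown.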

Contact
