Categories: Open Source Community, STEM Education, Neural Model Evaluation

stemdataset/STEM

The STEM dataset is a multimodal benchmark for evaluating neural models on science, technology, engineering, and mathematics (STEM) skills. It contains 448 skills and 1,073,146 questions spanning all four STEM subjects. Unlike existing datasets, it requires models to understand multimodal vision‑language information, and its questions are based on K‑12 curricula. The dataset is split into training, validation, and test sets. Ground‑truth answers for the test set are hidden, so test‑set predictions are evaluated by submitting them to the leaderboard. Each entry is a multiple‑choice question consisting of a problem description, an optional image, answer options (text or images), and the index of the correct answer.

Source

hugging_face

Created

Nov 28, 2025

Updated

Apr 30, 2024



STEM Dataset Overview

Basic Information

License: Apache‑2.0
Language: English
Scale: 1M < n < 10M
Tags: STEM, Benchmark

Content

Type: Multimodal multiple‑choice
Subjects: Science, Technology, Engineering, Mathematics
Number of Skills: 448
Number of Questions: 1,073,146
Splits: Training, Validation, Test
Training Size: 644,797 questions
Validation Size: 214,272 questions
Test Size: 214,077 questions
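As a quick arithmetic check on the split sizes above, the three splits add up to the stated total, and the proportions work out to roughly 60/20/20. The dictionary below simply copies the numbers from this page.

```python
# Split sizes as listed on this page.
SPLITS = {"train": 644_797, "valid": 214_272, "test": 214_077}
TOTAL = 1_073_146


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of questions."""
    total = sum(splits.values())
    return {name: n / total for name, n in splits.items()}


# The three splits account for every question in the dataset.
assert sum(SPLITS.values()) == TOTAL

for name, frac in split_fractions(SPLITS).items():
    print(f"{name}: {frac:.2%}")
# train: 60.08%
# valid: 19.97%
# test: 19.95%
```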

Features

Schema:

DatasetDict({
    train: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 644797
    })
    valid: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214272
    })
    test: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214077
    })
})
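If you load the data yourself, you can confirm that each record carries the columns listed in the schema. For example, you might use the Hugging Face `datasets` library with something like `load_dataset("stemdataset/STEM", split="valid")`, where the repo id and split name are assumed from this page. The following is a minimal sketch. The helper `missing_features` is hypothetical, and the example record is invented for illustration rather than taken from the dataset.

```python
from typing import Any, Mapping

# Column names copied from the schema above; all three splits share them.
STEM_FEATURES = (
    "subject", "grade", "skill", "pic_choice", "pic_prob",
    "problem", "problem_pic", "choices", "choices_pic", "answer_idx",
)


def missing_features(record: Mapping[str, Any]) -> list[str]:
    """Hypothetical helper: list schema columns absent from one record."""
    return [name for name in STEM_FEATURES if name not in record]


# Invented record for illustration only; the values are NOT from the dataset.
example = {
    "subject": "math", "grade": "grade 3", "skill": "add two-digit numbers",
    "pic_choice": False, "pic_prob": False,
    "problem": "What is 23 + 45?", "problem_pic": None,
    "choices": ["58", "68", "78"], "choices_pic": None, "answer_idx": 1,
}
print(missing_features(example))  # → []
```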

Feature Description:
- subject: subject area
- grade: educational grade level
- skill: specific skill identifier
- pic_choice: whether the answer options are images
- pic_prob: whether the problem includes an image
- problem: textual description of the problem
- problem_pic: image associated with the problem (if any)
- choices: textual answer options
- choices_pic: image answer options (if any)
- answer_idx: index of the correct answer among the options
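The two flag columns make it easy to group questions by where their images appear. The textual fields can also be rendered as a lettered multiple-choice prompt. The sketch below assumes that `pic_prob` and `pic_choice` are truthy when the problem or the options are images, and that `choices` is a list of strings. Both helper names are hypothetical, and image columns are deliberately ignored.

```python
from typing import Any, Mapping


def modality(record: Mapping[str, Any]) -> str:
    """Bucket a question by where its images appear.

    Assumes `pic_prob` / `pic_choice` are truthy when the problem / the
    answer options, respectively, are given as images.
    """
    image_problem = bool(record["pic_prob"])
    image_choices = bool(record["pic_choice"])
    if image_problem and image_choices:
        return "image problem, image choices"
    if image_problem:
        return "image problem"
    if image_choices:
        return "image choices"
    return "text only"


def format_text_prompt(record: Mapping[str, Any]) -> str:
    """Render the textual part of a question as a lettered multiple-choice prompt.

    Image fields (`problem_pic`, `choices_pic`) are ignored here; a multimodal
    model would need to receive them separately.
    """
    lines = [record["problem"]]
    for i, choice in enumerate(record["choices"]):
        lines.append(f"{chr(ord('A') + i)}. {choice}")
    return "\n".join(lines)


# Invented text-only record, for illustration only.
rec = {"pic_prob": False, "pic_choice": False,
       "problem": "What is 23 + 45?", "choices": ["58", "68", "78"]}
print(modality(rec))  # → text only
print(format_text_prompt(rec))
# What is 23 + 45?
# A. 58
# B. 68
# C. 78
```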

Use Cases

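Evaluation: Use the evaluation code provided by the dataset authors. Because test‑set answers are hidden, test‑set results are obtained by submitting predictions to the leaderboard.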
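The authors' evaluation code is not reproduced on this page. As a hedged stand‑in, plain accuracy against `answer_idx`, overall and per `subject`, can be computed locally on the training or validation split. Test‑set answers are hidden, so the test split still has to go through the leaderboard. The function names below are hypothetical, and the records in the usage example are invented.

```python
from collections import defaultdict
from typing import Any, Mapping, Sequence


def _check(records: Sequence[Mapping[str, Any]], predictions: Sequence[int]) -> None:
    """Require exactly one prediction per record and a non-empty input."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    if not records:
        raise ValueError("no records to score")


def accuracy(records: Sequence[Mapping[str, Any]], predictions: Sequence[int]) -> float:
    """Fraction of records whose predicted option index equals `answer_idx`."""
    _check(records, predictions)
    hits = sum(int(r["answer_idx"]) == int(p) for r, p in zip(records, predictions))
    return hits / len(records)


def accuracy_by_subject(
    records: Sequence[Mapping[str, Any]], predictions: Sequence[int]
) -> dict[str, float]:
    """Accuracy computed separately for each value of the `subject` column."""
    _check(records, predictions)
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for r, p in zip(records, predictions):
        totals[r["subject"]] += 1
        hits[r["subject"]] += int(r["answer_idx"]) == int(p)
    return {s: hits[s] / totals[s] for s in totals}


# Invented records and predictions, for illustration only.
records = [
    {"subject": "math", "answer_idx": 1},
    {"subject": "math", "answer_idx": 0},
    {"subject": "science", "answer_idx": 2},
]
preds = [1, 2, 2]
print(round(accuracy(records, preds), 3))   # → 0.667
print(accuracy_by_subject(records, preds))  # → {'math': 0.5, 'science': 1.0}
```

Reporting per‑subject accuracy alongside the overall figure shows whether a model's score is driven by one subject area.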

Contact

Email: stemdataset@gmail.com
