stemdataset/STEM
The STEM dataset is a multimodal benchmark for testing neural models on science, technology, engineering, and mathematics (STEM) skills. It contains 448 skills and 1,073,146 questions covering all STEM subjects. Unlike existing datasets, it requires models to understand multimodal vision-language information and is based on K-12 curricula. The dataset is split into training, validation, and test sets. The test set's ground-truth answers are hidden, so test predictions are scored by submitting them to the leaderboard. Each entry is a multimodal multiple-choice question with a problem description, an image, a list of options, and the index of the correct answer.
Description
STEM Dataset Overview
Basic Information
- License: Apache‑2.0
- Language: English
- Scale: 1M < n < 10M
- Tags: STEM, Benchmark
Content
- Type: Multimodal multiple‑choice
- Subjects: Science, Technology, Engineering, Mathematics
- Number of Skills: 448
- Number of Questions: 1,073,146
- Splits: Training, Validation, Test
- Training Size: 644,797 questions
- Validation Size: 214,272 questions
- Test Size: 214,077 questions
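As a quick consistency check on the figures above, the three split sizes can be summed and turned into proportions. This is a minimal sketch that uses only the numbers listed in this card; the dictionary keys follow the split names shown in the schema.

```python
# Split sizes as listed in this card.
split_sizes = {"train": 644_797, "valid": 214_272, "test": 214_077}

total = sum(split_sizes.values())
# The splits should add up to the stated number of questions.
assert total == 1_073_146

for name, n in split_sizes.items():
    # Share of the full dataset held by each split.
    print(f"{name}: {n:,} questions ({n / total:.1%})")
```

The splits work out to roughly 60% training, 20% validation, and 20% test.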
Features
- Schema:
  DatasetDict({
      train: Dataset({
          features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
          num_rows: 644797
      })
      valid: Dataset({
          features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
          num_rows: 214272
      })
      test: Dataset({
          features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
          num_rows: 214077
      })
  })
- Feature Description:
  - subject: subject area
  - grade: educational grade level
  - skill: specific skill identifier
  - pic_choice: whether options are images
  - pic_prob: whether the problem includes an image
  - problem: textual description of the problem
  - problem_pic: associated image for the problem
  - choices: textual options
  - choices_pic: image options (if any)
  - answer_idx: index of the correct answer
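The fields above could be read as follows. This is a minimal sketch, not the authors' code. It assumes the dataset loads through the Hugging Face `datasets` library under the id `stemdataset/STEM`, that `pic_choice` and `pic_prob` are booleans, and that `choices` and `choices_pic` are lists. `load_stem` and `format_question` are hypothetical helper names, and the sample record is invented for illustration.

```python
from typing import Any, Mapping


def load_stem(split: str = "valid"):
    """Load one split (needs the `datasets` package and network access)."""
    # Imported lazily so format_question below works without `datasets`.
    from datasets import load_dataset
    return load_dataset("stemdataset/STEM", split=split)


def format_question(example: Mapping[str, Any]) -> str:
    """Render one record as plain text, numbering options by list position."""
    lines = [f"[{example['subject']} | {example['grade']} | {example['skill']}]"]
    lines.append(example["problem"])
    if example["pic_prob"]:
        # Assumption: pic_prob is truthy when problem_pic holds an image.
        lines.append("(problem image in `problem_pic`)")
    if example["pic_choice"]:
        # Options are images, so only placeholders can be printed.
        lines.extend(f"{i}. (image option)" for i in range(len(example["choices_pic"])))
    else:
        lines.extend(f"{i}. {text}" for i, text in enumerate(example["choices"]))
    return "\n".join(lines)


# Invented record with the card's field names; not taken from the dataset.
sample = {
    "subject": "math", "grade": "grade-3", "skill": "hypothetical-skill",
    "pic_choice": False, "pic_prob": True,
    "problem": "Which shape has four equal sides?", "problem_pic": None,
    "choices": ["circle", "square", "triangle"], "choices_pic": [],
    "answer_idx": 1,
}
print(format_question(sample))
```

With real data, `format_question(load_stem("valid")[0])` would render the first validation record, provided the field types match the assumptions above.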
Use Cases
- Evaluation: Use the dataset's accompanying evaluation code to score models.
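Since each question has a single correct `answer_idx`, a natural metric is accuracy on a split whose answers are available, such as validation. The sketch below is an illustration and not the authors' evaluation code. It assumes predictions and gold answers are both integer option indices, and `accuracy` is a hypothetical helper name.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of questions where the predicted index equals answer_idx."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented example: three of four predictions match the gold indices.
gold_answers = [2, 0, 1, 3]
model_predictions = [2, 1, 1, 3]
print(f"accuracy = {accuracy(model_predictions, gold_answers):.2%}")  # accuracy = 75.00%
```

Because test answers are hidden, test-set accuracy cannot be computed locally this way and has to come from the leaderboard.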
Contact
- Email: stemdataset@gmail.com
Source
Organization: hugging_face
Created: Unknown