stemdataset/STEM

The STEM dataset is a multimodal benchmark for testing neural models on science, technology, engineering, and mathematics (STEM) skills. It contains 448 skills and 1,073,146 questions spanning all four STEM subjects. Unlike existing datasets, it requires models to understand combined visual and language information, and its questions are based on K‑12 curricula. The dataset is split into training, validation, and test sets. Ground‑truth answers for the test set are hidden, so test predictions are scored by submitting them to the leaderboard. Each entry is a multimodal multiple‑choice question with a description, an image, a set of options, and the index of the correct answer.

Updated 4/30/2024
hugging_face

Description

STEM Dataset Overview

Basic Information

  • License: Apache‑2.0
  • Language: English
  • Scale: 1M < n < 10M
  • Tags: STEM, Benchmark

Content

  • Type: Multimodal multiple‑choice
  • Subjects: Science, Technology, Engineering, Mathematics
  • Number of Skills: 448
  • Number of Questions: 1,073,146
  • Splits: Training, Validation, Test
  • Training Size: 644,797 questions
  • Validation Size: 214,272 questions
  • Test Size: 214,077 questions
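
As a quick sanity check on the figures above, the three split sizes can be summed and compared with the stated question count. This is only a minimal sketch: the dictionary keys follow the split names shown in the schema, and the variable names are illustrative.

```python
# Split sizes as listed in the Content section.
split_sizes = {"train": 644_797, "valid": 214_272, "test": 214_077}

# The sum should match the stated total of 1,073,146 questions.
total = sum(split_sizes.values())

# Share of the whole dataset held by each split.
shares = {name: size / total for name, size in split_sizes.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The sizes add up to the stated total, and the split works out to roughly 60/20/20 across training, validation, and test.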

Features

  • Schema:
DatasetDict({
    train: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 644797
    })
    valid: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214272
    })
    test: Dataset({
        features: [subject, grade, skill, pic_choice, pic_prob, problem, problem_pic, choices, choices_pic, answer_idx],
        num_rows: 214077
    })
})
  • Feature Description:
    • subject: subject area
    • grade: educational grade level
    • skill: specific skill identifier
    • pic_choice: whether options are images
    • pic_prob: whether the problem includes an image
    • problem: textual description of the problem
    • problem_pic: associated image for the problem
    • choices: textual options
    • choices_pic: image options (if any)
    • answer_idx: index of the correct answer
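
The fields above can be combined into a text prompt for a model. The following is a minimal sketch under stated assumptions, not the dataset authors' code: it assumes `pic_choice` and `pic_prob` are truthy flags, `choices` is a list of strings, and `choices_pic` is a list of images when options are pictures. The sample record is entirely hypothetical. Real records would presumably be obtained through the Hugging Face `datasets` library (for example `load_dataset("stemdataset/STEM")`), which is also an assumption here.

```python
from string import ascii_uppercase


def format_question(record):
    """Render one record as a lettered multiple-choice prompt (text only)."""
    if record["pic_choice"]:
        # Options are images: use placeholders, since a text prompt cannot embed them.
        options = ["[image option]" for _ in record["choices_pic"]]
    else:
        options = record["choices"]

    lines = [record["problem"]]
    if record["pic_prob"]:
        # The image itself (problem_pic) would be passed to the model separately.
        lines.append("[problem image attached]")
    lines += [f"{ascii_uppercase[i]}. {option}" for i, option in enumerate(options)]
    return "\n".join(lines)


# Hypothetical record following the schema above; all values are made up.
sample = {
    "subject": "math",
    "grade": "grade-3",
    "skill": "counting",
    "pic_choice": False,
    "pic_prob": True,
    "problem": "How many apples are shown?",
    "problem_pic": None,  # would hold the image in a real record
    "choices": ["2", "3", "4"],
    "choices_pic": None,
    "answer_idx": 1,
}

prompt = format_question(sample)
print(prompt)
```

A multimodal model would receive `problem_pic` (and `choices_pic`, if present) alongside this text; the placeholders only mark where images belong.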

Use Cases

  • Evaluation: Use the evaluation code provided with the dataset to score model predictions.
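
For local checks on the training or validation split, where `answer_idx` is available, accuracy can be computed directly. The sketch below is an illustration only and is not the official evaluation code. The function name `accuracy_by_subject` and the toy records are hypothetical. Test-set predictions still have to go through the leaderboard because those answers are hidden.

```python
from collections import defaultdict


def accuracy_by_subject(records, predictions):
    """Return (overall_accuracy, {subject: accuracy}) for predicted answer indices."""
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    if not records:
        raise ValueError("no records to evaluate")

    correct = defaultdict(int)
    total = defaultdict(int)
    for record, predicted_idx in zip(records, predictions):
        subject = record["subject"]
        total[subject] += 1
        # A prediction counts as correct when it matches the stored answer index.
        correct[subject] += int(predicted_idx == record["answer_idx"])

    per_subject = {subject: correct[subject] / total[subject] for subject in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_subject


# Toy records with made-up values; only the fields needed for scoring are shown.
records = [
    {"subject": "math", "answer_idx": 1},
    {"subject": "math", "answer_idx": 0},
    {"subject": "science", "answer_idx": 2},
]
predictions = [1, 2, 2]

overall, per_subject = accuracy_by_subject(records, predictions)
print(overall, per_subject)
```

Grouping by `subject` is one choice among several; the same loop could key on `grade` or `skill` to get a finer breakdown.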

Topics

STEM Education
Neural Model Evaluation

Source

Organization: hugging_face

Created: Unknown
