JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

lmms-lab/ScienceQA-IMG

Multimodal Models
Scientific QA

This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that keeps only the instances that include an image. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. The dataset provides fields such as image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and is split into training, validation, and test sets.

hugging_face
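
The relationship between this image-only version and the full dataset can be illustrated with a small filter. This is a minimal sketch, not the method actually used to build `lmms-lab/ScienceQA-IMG`: it assumes records are plain dictionaries and that text-only records store `image` as `None`. The sample records and the `keep_image_instances` helper are hypothetical.

```python
from typing import Any, Dict, List


def keep_image_instances(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return only the records that carry an image.

    Assumption: a record without an image has `image` set to None
    (or lacks the key entirely).
    """
    return [r for r in records if r.get("image") is not None]


# Hypothetical sample records using a few of the field names listed above.
sample = [
    {"question": "Which of these is a solid?", "choices": ["water", "rock"],
     "answer": 1, "image": None},
    {"question": "Which animal is shown?", "choices": ["cat", "frog"],
     "answer": 1, "image": b"placeholder-image-bytes"},
]

image_only = keep_image_instances(sample)
print(len(image_only))
```

With real data, the same predicate could be applied per split, so that the training, validation, and test sets each keep only their image-bearing records.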

derek-thomas/ScienceQA

Scientific QA
Multimodal Reasoning

ScienceQA is a multimodal science question-answering dataset covering domains such as chemistry, biology, physics, earth science, engineering, geography, history, civics, economics, global studies, grammar, writing, vocabulary, natural science, language science, and social science. Its fields include images, questions, multiple-choice options, answers, hints, task descriptions, grade levels, subjects, topics, categories, skills, lectures, and solutions. It is primarily intended for multimodal multiple-choice tasks and supports question answering (multiple choice, closed-domain, open-domain), visual question answering, and multi-class classification. The dataset was created to diagnose the multi-hop reasoning capability and explainability of AI systems, especially in scientific question answering. The language is English, the size falls between 10K and 100K instances, and the data is split into training, validation, and test sets.

hugging_face
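
For the multiple-choice use case described above, a record can be turned into a text prompt and predictions scored by accuracy. The following is a minimal sketch under stated assumptions: `answer` is taken to be a zero-based integer index into `choices`, the image is ignored for brevity, and the `format_prompt` and `accuracy` helpers and the example records are hypothetical rather than part of any official tooling.

```python
import string
from typing import Any, Dict, List


def format_prompt(record: Dict[str, Any]) -> str:
    """Build a lettered multiple-choice prompt from one record.

    Assumption: `choices` is a list of strings and `hint` may be empty.
    """
    lines = []
    if record.get("hint"):
        lines.append(f"Context: {record['hint']}")
    lines.append(f"Question: {record['question']}")
    for letter, choice in zip(string.ascii_uppercase, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(records: List[Dict[str, Any]], predictions: List[int]) -> float:
    """Fraction of predicted choice indices that match `answer`."""
    if not records:
        return 0.0
    correct = sum(1 for r, p in zip(records, predictions) if r["answer"] == p)
    return correct / len(records)


# Hypothetical records for illustration only.
examples = [
    {"question": "Which is a mineral?", "choices": ["quartz", "wood"],
     "answer": 0, "hint": ""},
    {"question": "Which state of matter is ice?", "choices": ["gas", "solid"],
     "answer": 1, "hint": "Ice is frozen water."},
]

print(format_prompt(examples[1]))
print(accuracy(examples, [0, 0]))
```

Lettering the options keeps the model's expected output short, and mapping a predicted letter back to an index makes scoring a simple equality check against `answer`.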