OlympiadBench
OlympiadBench is an Olympiad‑level, bilingual, multimodal scientific benchmark containing 8,476 problems drawn from Olympiad‑level mathematics and physics competitions, as well as the Chinese College Entrance Exam (Gaokao). Each problem is accompanied by expert‑level step‑by‑step reasoning annotations. The dataset aims to challenge and advance AGI development through complex scientific problems. The best‑performing model evaluated, GPT‑4V, achieved an average score of only 17.97% on this benchmark, dropping to 10.74% in the physics domain, which highlights the benchmark's rigor and the difficulty of physical reasoning.
Dataset description and usage context
Dataset Overview
Dataset Name
OlympiadBench
Dataset Description
OlympiadBench is an Olympiad‑level bilingual multimodal scientific question benchmark containing 8,476 problems from Olympiad‑level mathematics and physics competitions, as well as the Chinese College Entrance Exam (Gaokao). Each problem includes expert‑level step‑by‑step reasoning annotations. The best‑performing model evaluated, GPT‑4V, achieved an average score of only 17.97% on OlympiadBench, and only 10.74% in the physics domain, underscoring the benchmark's rigor and the difficulty of physical reasoning.
Dataset Configuration
Each subset name encodes five fields, answer type, modality, subject, language, and source: OE = open‑ended, TP = theorem proving; MM = multimodal, TO = text‑only; en = English, zh = Chinese; COMP = competition, CEE = Chinese College Entrance Exam.
- OE_MM_maths_en_COMP: OlympiadBench/OE_MM_maths_en_COMP/OE_MM_maths_en_COMP.parquet
- OE_MM_maths_zh_CEE: OlympiadBench/OE_MM_maths_zh_CEE/OE_MM_maths_zh_CEE.parquet
- OE_MM_maths_zh_COMP: OlympiadBench/OE_MM_maths_zh_COMP/OE_MM_maths_zh_COMP.parquet
- OE_MM_physics_en_COMP: OlympiadBench/OE_MM_physics_en_COMP/OE_MM_physics_en_COMP.parquet
- OE_MM_physics_zh_CEE: OlympiadBench/OE_MM_physics_zh_CEE/OE_MM_physics_zh_CEE.parquet
- OE_TO_maths_en_COMP: OlympiadBench/OE_TO_maths_en_COMP/OE_TO_maths_en_COMP.parquet
- OE_TO_maths_zh_CEE: OlympiadBench/OE_TO_maths_zh_CEE/OE_TO_maths_zh_CEE.parquet
- OE_TO_maths_zh_COMP: OlympiadBench/OE_TO_maths_zh_COMP/OE_TO_maths_zh_COMP.parquet
- OE_TO_physics_en_COMP: OlympiadBench/OE_TO_physics_en_COMP/OE_TO_physics_en_COMP.parquet
- OE_TO_physics_zh_CEE: OlympiadBench/OE_TO_physics_zh_CEE/OE_TO_physics_zh_CEE.parquet
- TP_MM_maths_en_COMP: OlympiadBench/TP_MM_maths_en_COMP/TP_MM_maths_en_COMP.parquet
- TP_MM_maths_zh_CEE: OlympiadBench/TP_MM_maths_zh_CEE/TP_MM_maths_zh_CEE.parquet
- TP_MM_maths_zh_COMP: OlympiadBench/TP_MM_maths_zh_COMP/TP_MM_maths_zh_COMP.parquet
- TP_MM_physics_en_COMP: OlympiadBench/TP_MM_physics_en_COMP/TP_MM_physics_en_COMP.parquet
- TP_TO_maths_en_COMP: OlympiadBench/TP_TO_maths_en_COMP/TP_TO_maths_en_COMP.parquet
- TP_TO_maths_zh_CEE: OlympiadBench/TP_TO_maths_zh_CEE/TP_TO_maths_zh_CEE.parquet
- TP_TO_maths_zh_COMP: OlympiadBench/TP_TO_maths_zh_COMP/TP_TO_maths_zh_COMP.parquet
- TP_TO_physics_en_COMP: OlympiadBench/TP_TO_physics_en_COMP/TP_TO_physics_en_COMP.parquet
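When iterating over many subsets, it can help to decode a subset name into its fields programmatically. The sketch below assumes the five-field naming scheme described above (answer type, modality, subject, language, source); the mapping tables are assumptions based on that scheme and should be checked against the dataset card before relying on them.

```python
def parse_subset_name(name: str) -> dict:
    """Decode an OlympiadBench subset name such as 'OE_MM_maths_zh_CEE'.

    Assumed layout: <answer type>_<modality>_<subject>_<language>_<source>.
    """
    answer, modality, subject, lang, source = name.split("_")
    return {
        "answer_type": {"OE": "open-ended", "TP": "theorem proving"}[answer],
        "modality": {"MM": "multimodal", "TO": "text-only"}[modality],
        "subject": subject,
        "language": {"en": "English", "zh": "Chinese"}[lang],
        "source": {"COMP": "competition",
                   "CEE": "Chinese College Entrance Exam"}[source],
    }

# Example: inspect one subset's fields, then load its parquet file
# (path follows the configuration layout above; adjust as needed).
fields = parse_subset_name("OE_MM_maths_zh_CEE")
print(fields["modality"])  # multimodal
# import pandas as pd
# df = pd.read_parquet("OlympiadBench/OE_MM_maths_zh_CEE/OE_MM_maths_zh_CEE.parquet")
```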
Task Types
- Question Answering
- Visual Question Answering
Languages
- Chinese
- English
Tags
- Mathematics
- Physics
Dataset Size
- 10M<n<100M
License
apache-2.0