math-eval/TAL-SCQ5K
TAL-SCQ5K is a high-quality mathematics competition dataset created by TAL Education Group. It comes in an English version (TAL-SCQ5K-EN) and a Chinese version (TAL-SCQ5K-CN), each with 5,000 items (3,000 for training and 2,000 for testing). The items are single-answer multiple-choice questions covering primary-, middle-, and high-school mathematics, and each includes a detailed step-by-step solution to support chain-of-thought (CoT) training. All mathematical expressions are written as standard LaTeX text.
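Each item pairs a problem with a worked solution (answer_analysis) and a correct option (answer_value), so one plausible way to build a CoT training example is to put the problem and options in the prompt and the solution plus final answer in the target. The Python sketch below shows that idea under stated assumptions: the prompt template and the function name build_cot_example are hypothetical, options are assumed to arrive as (label, text) pairs (the released answer_option_list may be encoded differently), and the sample item is made up rather than taken from the dataset.

```python
from typing import Dict, List, Tuple


def build_cot_example(problem: str,
                      options: List[Tuple[str, str]],
                      analysis: str,
                      answer: str) -> Dict[str, str]:
    """Turn one single-choice item into a (prompt, target) pair for CoT fine-tuning.

    Hypothetical template: the prompt shows the problem and labelled options,
    and the target is the worked solution followed by the correct label.
    `options` is ASSUMED to be a list of (label, text) pairs.
    """
    labels = [label for label, _ in options]
    if answer not in labels:
        raise ValueError(f"answer {answer!r} is not one of the option labels {labels}")
    option_lines = "\n".join(f"({label}) {text}" for label, text in options)
    prompt = f"Question: {problem}\nOptions:\n{option_lines}\nAnswer:"
    target = f" {analysis}\nThe answer is ({answer})."
    return {"prompt": prompt, "target": target}


# Made-up item for illustration only; it is not taken from TAL-SCQ5K.
example = build_cot_example(
    problem=r"What is $2+3\times 4$?",
    options=[("A", "$20$"), ("B", "$14$"), ("C", "$24$"), ("D", "$10$")],
    analysis=r"Multiplication comes first: $3\times 4=12$, then $2+12=14$.",
    answer="B",
)
print(example["prompt"])
print(example["target"])
```

Keeping the solution in the target rather than the prompt means the model is trained to produce the reasoning before committing to an option label.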
Description
Dataset Overview
Dataset Name: TAL‑SCQ5K
Languages: English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN)
Number of Questions: 5,000 per language (3,000 training, 2,000 testing)
Question Type: Single‑choice, covering elementary, middle‑school, and high‑school mathematics topics.
Dataset Structure:
- Data Instances: Each instance includes question ID, difficulty, question type, problem statement, answer option list, knowledge point routes, answer analysis, and correct answer.
- Fields:
  - difficulty: difficulty level (0-4)
  - qtype: always "single_choice"
  - problem: mathematical problem description
  - answer_option_list: list of answer options
  - knowledge_point_routes: hierarchical knowledge points
  - answer_analysis: detailed solution (used for CoT training)
  - answer_value: correct option
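The field list above implies a few checkable constraints: difficulty lies in 0 to 4, qtype is always "single_choice", and answer_value should name one of the options. The Python sketch below mirrors the fields in a dataclass and reports violations. The class name SCQItem, the function validate, and the element types of answer_option_list ((label, text) pairs) and knowledge_point_routes (strings) are assumptions for illustration; inspect a real record before relying on them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SCQItem:
    """One TAL-SCQ5K record, mirroring the documented fields.

    Element types of answer_option_list and knowledge_point_routes are
    ASSUMPTIONS for this sketch, not the confirmed on-disk format.
    """
    difficulty: int                            # documented range 0-4
    qtype: str                                 # documented as always "single_choice"
    problem: str                               # problem statement (LaTeX text)
    answer_option_list: List[Tuple[str, str]]  # assumed (label, text) pairs
    knowledge_point_routes: List[str]          # hierarchical knowledge points
    answer_analysis: str                       # worked solution used for CoT training
    answer_value: str                          # label of the correct option


def validate(item: SCQItem) -> List[str]:
    """Return human-readable problems; an empty list means the item passes."""
    errors: List[str] = []
    if not 0 <= item.difficulty <= 4:
        errors.append(f"difficulty {item.difficulty} is outside 0-4")
    if item.qtype != "single_choice":
        errors.append(f"unexpected qtype {item.qtype!r}")
    labels = [label for label, _ in item.answer_option_list]
    if len(labels) != len(set(labels)):
        errors.append("duplicate option labels")
    if item.answer_value not in labels:
        errors.append(f"answer_value {item.answer_value!r} matches no option label")
    if not item.answer_analysis.strip():
        errors.append("empty answer_analysis")
    return errors


# Made-up records for illustration only.
ok = SCQItem(1, "single_choice", r"$1+1=$?", [("A", "$1$"), ("B", "$2$")],
             ["Arithmetic"], "Adding one and one gives two.", "B")
bad = SCQItem(7, "single_choice", r"$1+1=$?", [("A", "$1$"), ("B", "$2$")],
              ["Arithmetic"], "Adding one and one gives two.", "C")
print(validate(ok))   # []
print(validate(bad))  # two errors: difficulty and answer_value
```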
Data Split:
| Subset | Training | Test |
|---|---|---|
| EN | 3K | 2K |
| CN | 3K | 2K |
Usage: Load with load_dataset() from the Hugging Face datasets library and select either the EN or the CN subset.
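A hedged Python sketch of that workflow follows. load_dataset(path, data_dir=...) is a real call in the Hugging Face datasets library, but this page gives neither the folder names of the EN and CN subsets nor the split names, so both are left to the caller. The helper names load_tal_scq5k and accuracy are hypothetical. Because every item has exactly one correct option, scoring a model reduces to exact matching of predicted labels against answer_value; the demo records are made up so the scoring part runs offline.

```python
from typing import Any, Dict, Iterable, List


def load_tal_scq5k(data_dir: str):
    """Load one language subset of math-eval/TAL-SCQ5K from the Hugging Face Hub.

    `data_dir` should name the repository folder holding the EN or CN files;
    the exact folder names are not assumed here, so check the repo's file listing.
    Requires `pip install datasets`; the import is deferred so the rest of
    this sketch runs without it.
    """
    from datasets import load_dataset
    return load_dataset("math-eval/TAL-SCQ5K", data_dir=data_dir)


def accuracy(records: Iterable[Dict[str, Any]], predictions: List[str]) -> float:
    """Exact-match accuracy of predicted option labels against answer_value."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    if len(records) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    correct = sum(r["answer_value"] == p.strip() for r, p in zip(records, predictions))
    return correct / len(records)


# Offline illustration with made-up records; with the real data you would pass
# a split of load_tal_scq5k(<EN-or-CN folder>) instead.
demo = [{"answer_value": "A"}, {"answer_value": "C"},
        {"answer_value": "B"}, {"answer_value": "D"}]
print(accuracy(demo, ["A", "B", "B ", "D"]))  # 0.75
```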
License: MIT License.
Source
Organization: hugging_face
Created: Unknown