math-eval/TAL-SCQ5K
TAL‑SCQ5K is a high‑quality mathematics competition dataset created by TAL Education Group. It comes in English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN) versions, each with 5,000 items (3,000 training and 2,000 testing). The items are single‑choice questions (several options, one correct answer) covering elementary, middle‑school, and high‑school mathematics topics, and each includes detailed solution steps to support chain‑of‑thought (CoT) training. All mathematical expressions are written as standard LaTeX in text form.
Dataset description and usage context
Dataset Overview
Dataset Name: TAL‑SCQ5K
Languages: English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN)
Number of Questions: 5,000 per language (3,000 training, 2,000 testing)
Question Type: Single‑choice, covering elementary, middle‑school, and high‑school mathematics topics.
Dataset Structure:
- Data Instances: Each instance includes question ID, difficulty, question type, problem statement, answer option list, knowledge point routes, answer analysis, and correct answer.
- Fields:
  - difficulty: difficulty level (0‑4)
  - qtype: always "single_choice"
  - problem: mathematical problem description
  - answer_option_list: list of answer options
  - knowledge_point_routes: hierarchical knowledge points
  - answer_analysis: detailed solution (used for CoT training)
  - answer_value: correct option
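As a rough illustration of how these fields might be combined into a CoT training example, here is a minimal sketch. It assumes a simplified record in which answer_option_list is a flat list of strings and answer_value is an option letter; the actual nesting in the dataset files may differ. The helper name format_cot_example and the sample record are hypothetical and not taken from the dataset.

```python
from typing import List, TypedDict


class Item(TypedDict):
    """Simplified view of one record (field names from the list above)."""
    difficulty: int
    qtype: str
    problem: str
    answer_option_list: List[str]  # assumption: flat list of option texts
    knowledge_point_routes: List[str]
    answer_analysis: str
    answer_value: str  # assumption: an option letter such as "B"


def format_cot_example(item: Item) -> str:
    """Build a prompt/target string: problem, lettered options, solution, answer."""
    labels = "ABCDEFGH"
    options = "\n".join(
        f"{labels[i]}. {text}" for i, text in enumerate(item["answer_option_list"])
    )
    return (
        f"Problem: {item['problem']}\n"
        f"Options:\n{options}\n"
        f"Solution: {item['answer_analysis']}\n"
        f"Answer: {item['answer_value']}"
    )


# Made-up record for illustration only; it does not come from TAL-SCQ5K.
sample: Item = {
    "difficulty": 0,
    "qtype": "single_choice",
    "problem": "What is $1+1$?",
    "answer_option_list": ["1", "2", "3", "4"],
    "knowledge_point_routes": ["Arithmetic->Addition"],
    "answer_analysis": "$1+1=2$, which is option B.",
    "answer_value": "B",
}

text = format_cot_example(sample)
print(text)
```

Keeping the solution (answer_analysis) before the final answer mirrors the usual CoT layout, where the model is trained to produce its reasoning first and the chosen option last.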
Data Split:
| Language | Training | Test |
|---|---|---|
| EN | 3K | 2K |
| CN | 3K | 2K |
Usage: Load with load_dataset() from the Hugging Face datasets library and select either the EN or CN subset.
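A minimal loading sketch follows. It assumes the repository id matches the title above (math-eval/TAL-SCQ5K) and that each language version is stored in a directory named after its subset, selectable through the data_dir argument of load_dataset(). That directory layout is an assumption, so check the repository files before relying on it. The helpers subset_dir and load_tal_scq5k are hypothetical names introduced here.

```python
# Assumption: each language version lives in a directory named after its subset.
SUBSET_DIRS = {"EN": "TAL-SCQ5K-EN", "CN": "TAL-SCQ5K-CN"}


def subset_dir(language: str) -> str:
    """Map a language code ("EN" or "CN", case-insensitive) to its subset name."""
    key = language.strip().upper()
    if key not in SUBSET_DIRS:
        raise ValueError(f"Unknown language {language!r}; expected one of {sorted(SUBSET_DIRS)}")
    return SUBSET_DIRS[key]


def load_tal_scq5k(language: str = "EN"):
    """Download one language version from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helper above works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("math-eval/TAL-SCQ5K", data_dir=subset_dir(language))
```

Calling load_tal_scq5k("CN") should then return the Chinese version; if the layout assumption holds, its splits should correspond to the 3K training and 2K test items listed in the table above.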
License: MIT License.