Tags: Open Source Community, Educational Technology, Mathematics Competition

math-eval/TAL-SCQ5K

TAL‑SCQ5K is a high‑quality mathematics competition dataset created by TAL Education Group. It comes in English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN) versions, each with 5,000 items (3,000 training and 2,000 test). The items are multiple‑choice questions with exactly one correct answer, covering primary, middle, and high‑school mathematics topics. Each item includes detailed solution steps to support chain‑of‑thought (CoT) training. All mathematical expressions are written in standard text‑mode LaTeX.

Source: hugging_face
Created: Nov 28, 2025
Updated: Sep 15, 2023

Dataset Overview

Dataset Name: TAL‑SCQ5K

Languages: English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN)

Number of Questions: 5,000 per language (3,000 training, 2,000 testing)

Question Type: Single‑choice, covering elementary, middle‑school, and high‑school mathematics topics.

Dataset Structure:

  • Data Instances: Each instance includes question ID, difficulty, question type, problem statement, answer option list, knowledge point routes, answer analysis, and correct answer.
  • Fields:
    • difficulty: difficulty level (0‑4)
    • qtype: always "single_choice"
    • problem: mathematical problem description
    • answer_option_list: list of answer options
    • knowledge_point_routes: hierarchical knowledge points
    • answer_analysis: detailed solution (used for CoT training)
    • answer_value: correct option
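Because answer_analysis holds a worked solution, each record can be turned into a prompt/target pair for CoT fine-tuning. The sketch below is a minimal illustration, not the dataset authors' pipeline. The names SCQItem and to_cot_pair are hypothetical, and it assumes that answer_option_list holds plain option strings in display order, that answer_analysis is either one string or a list of steps, and that answer_value is the correct option's letter. Check these against real records before use.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SCQItem:
    """One TAL-SCQ5K record, using the field names listed above (hypothetical class)."""
    difficulty: int
    qtype: str
    problem: str
    answer_option_list: List[str]  # assumption: plain option strings, in display order
    knowledge_point_routes: List[str]
    answer_analysis: Union[str, List[str]]  # assumption: one string or a list of solution steps
    answer_value: str  # assumption: the letter of the correct option, e.g. "B"


def option_letters(n: int) -> List[str]:
    """Label n options as A, B, C, ... in the order they appear."""
    return [chr(ord("A") + i) for i in range(n)]


def to_cot_pair(item: SCQItem) -> dict:
    """Turn one record into a hypothetical (prompt, target) pair for CoT fine-tuning."""
    letters = option_letters(len(item.answer_option_list))
    if item.answer_value not in letters:
        raise ValueError(f"answer_value {item.answer_value!r} is not one of {letters}")
    options = "\n".join(
        f"({letter}) {text}" for letter, text in zip(letters, item.answer_option_list)
    )
    prompt = f"{item.problem}\n{options}\nAnswer with the letter of the single correct option."
    # The worked solution becomes the reasoning part of the target.
    if isinstance(item.answer_analysis, str):
        steps = item.answer_analysis
    else:
        steps = "\n".join(item.answer_analysis)
    target = f"{steps}\nThe answer is ({item.answer_value})."
    return {"prompt": prompt, "target": target}


# Toy record written for illustration; it is not taken from the dataset.
item = SCQItem(
    difficulty=0,
    qtype="single_choice",
    problem=r"What is $2+3\times 4$?",
    answer_option_list=["$20$", "$14$", "$24$", "$9$"],
    knowledge_point_routes=["Arithmetic"],
    answer_analysis=[r"Multiply first: $3\times 4=12$.", "Then add: $2+12=14$."],
    answer_value="B",
)
pair = to_cot_pair(item)
print(pair["prompt"])
print(pair["target"])
```

Ending the target with a fixed "The answer is (X)." phrase makes the final choice easy to extract when the fine-tuned model is scored later.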

Data Split:

  • EN: 3K training, 2K test
  • CN: 3K training, 2K test
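The 2K test split in each language is the natural evaluation set. The following is a minimal scoring sketch. It assumes model outputs end with a phrase such as "The answer is (B)", and it assumes that answer_value is an option letter. It reports exact-match accuracy per difficulty level (0-4). The function names and the regular expression are illustrative choices, not part of the dataset.

```python
import re
from collections import defaultdict
from typing import Dict, Mapping, Optional, Sequence

# Assumption: the model ends its chain of thought with a phrase like
# "The answer is (B)" or "the answer is B"; adapt the pattern to your prompt format.
ANSWER_RE = re.compile(r"[Aa]nswer is \(?([A-Z])\)?(?![A-Za-z])")


def extract_choice(output: str) -> Optional[str]:
    """Return the last option letter announced in a model output, or None."""
    matches = ANSWER_RE.findall(output)
    return matches[-1] if matches else None


def accuracy_by_difficulty(
    records: Sequence[Mapping], predictions: Sequence[Optional[str]]
) -> Dict[int, float]:
    """Exact-match accuracy against answer_value, grouped by difficulty level."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for rec, pred in zip(records, predictions):
        level = int(rec["difficulty"])  # int() also accepts a level stored as a string
        total[level] += 1
        correct[level] += int(pred == rec["answer_value"])
    return {level: correct[level] / total[level] for level in sorted(total)}


# Toy records and model outputs, made up for illustration.
records = [
    {"difficulty": 0, "answer_value": "A"},
    {"difficulty": 0, "answer_value": "C"},
    {"difficulty": 2, "answer_value": "B"},
]
outputs = ["... so the answer is (A).", "... the answer is B.", "The answer is (B)"]
predictions = [extract_choice(o) for o in outputs]
print(predictions)  # ['A', 'B', 'B']
print(accuracy_by_difficulty(records, predictions))  # {0: 0.5, 2: 1.0}
```

The negative lookahead keeps phrases like "the answer is Because..." from being read as option B. Unparseable outputs become None and count as wrong.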

Usage: Load with load_dataset() from the Hugging Face datasets library, then select either the EN or the CN subset.
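A minimal loading sketch follows. load_dataset() and its data_dir parameter are part of the Hugging Face datasets library. However, the mapping from each language to a directory named after its subset (TAL-SCQ5K-EN / TAL-SCQ5K-CN) is an assumption. Confirm the actual layout in the repository's file listing. The helper name load_tal_scq5k is hypothetical.

```python
# Assumption: each language version lives in a directory named after its subset.
# Verify these names against the math-eval/TAL-SCQ5K file listing before relying on them.
SUBSET_DIRS = {"EN": "TAL-SCQ5K-EN", "CN": "TAL-SCQ5K-CN"}


def load_tal_scq5k(lang: str = "EN"):
    """Load one language subset of math-eval/TAL-SCQ5K (hypothetical helper)."""
    if lang not in SUBSET_DIRS:
        raise ValueError(f"lang must be one of {sorted(SUBSET_DIRS)}, got {lang!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("math-eval/TAL-SCQ5K", data_dir=SUBSET_DIRS[lang])
```

For example, ds = load_tal_scq5k("EN") should return a DatasetDict. Assuming the files are exposed as train and test splits, ds["test"][0]["problem"] would then give the first test question.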

License: MIT License.
