
math-eval/TAL-SCQ5K

TAL‑SCQ5K is a high‑quality mathematics competition dataset created by TAL Education Group. It comes in English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN) versions, each with 5,000 items (3,000 training and 2,000 testing). The items are single‑answer multiple‑choice questions covering primary, middle, and high‑school mathematics topics, and each one includes detailed solution steps to support chain‑of‑thought (CoT) training. All mathematical expressions are written as standard LaTeX text.

Updated 9/15/2023

Description

Dataset Overview

Dataset Name: TAL‑SCQ5K

Languages: English (TAL‑SCQ5K‑EN) and Chinese (TAL‑SCQ5K‑CN)

Number of Questions: 5,000 per language (3,000 training, 2,000 testing)

Question Type: Single‑choice, covering elementary, middle‑school, and high‑school mathematics topics.

Dataset Structure:

  • Data Instances: Each instance includes question ID, difficulty, question type, problem statement, answer option list, knowledge point routes, answer analysis, and correct answer.
  • Fields:
    • difficulty: difficulty level (0‑4)
    • qtype: always "single_choice"
    • problem: mathematical problem description
    • answer_option_list: list of answer options
    • knowledge_point_routes: hierarchical knowledge points
    • answer_analysis: detailed solution (used for CoT training)
    • answer_value: correct option
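
The field list above can be checked and turned into a CoT training pair with a short sketch like the following. It is illustrative only: the helper names validate_item and to_cot_example, the sample record, and the prompt/completion layout are all hypothetical, and answer_option_list is assumed here to be a plain list of strings, which may not match the actual serialization.

```python
from typing import Any

# Field names taken from the list above.
REQUIRED_FIELDS = (
    "difficulty", "qtype", "problem", "answer_option_list",
    "knowledge_point_routes", "answer_analysis", "answer_value",
)


def validate_item(item: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in item]
    if problems:
        return problems
    try:
        level = int(item["difficulty"])  # accepts 1 or "1"
    except (TypeError, ValueError):
        level = -1
    if not 0 <= level <= 4:
        problems.append(f"difficulty out of range 0-4: {item['difficulty']!r}")
    if item["qtype"] != "single_choice":
        problems.append(f"unexpected qtype: {item['qtype']!r}")
    return problems


def to_cot_example(item: dict[str, Any]) -> dict[str, str]:
    """Build a prompt/completion pair; this layout is an assumption, not a standard."""
    options = "\n".join(str(option) for option in item["answer_option_list"])
    prompt = f"{item['problem']}\n{options}"
    completion = f"{item['answer_analysis']}\nAnswer: {item['answer_value']}"
    return {"prompt": prompt, "completion": completion}


# Hypothetical record for illustration; not taken from the dataset.
sample = {
    "difficulty": "1",
    "qtype": "single_choice",
    "problem": "What is $2 + 3$?",
    "answer_option_list": ["A. $4$", "B. $5$", "C. $6$"],
    "knowledge_point_routes": ["Arithmetic -> Addition"],
    "answer_analysis": "$2 + 3 = 5$, so option B is correct.",
    "answer_value": "B",
}

if __name__ == "__main__":
    print(validate_item(sample))
    print(to_cot_example(sample)["completion"])
```

Placing answer_analysis before the final answer in the completion keeps the reasoning ahead of the chosen option, which is the usual arrangement for CoT fine‑tuning.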

Data Split:

Split   Training   Test
EN      3K         2K
CN      3K         2K

Usage: Load with load_dataset() from the Hugging Face datasets library and select either the EN or CN subset.
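
A minimal loading sketch follows. It assumes the repository ID math-eval/TAL-SCQ5K shown above and assumes that each subset lives in a directory named after it (TAL-SCQ5K-EN, TAL-SCQ5K-CN) that can be selected with the data_dir argument of load_dataset(); the helpers subset_dir and load_subset are hypothetical names. Check the repository layout before relying on it.

```python
# Assumed directory names for the two subsets; verify against the repository.
SUBSET_DIRS = {"EN": "TAL-SCQ5K-EN", "CN": "TAL-SCQ5K-CN"}


def subset_dir(language: str) -> str:
    """Map a language code ("EN" or "CN", case-insensitive) to its assumed directory."""
    key = language.strip().upper()
    if key not in SUBSET_DIRS:
        raise ValueError(f"unknown subset {language!r}; expected one of {sorted(SUBSET_DIRS)}")
    return SUBSET_DIRS[key]


def load_subset(language: str):
    """Load one language subset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset  # pip install datasets

    return load_dataset("math-eval/TAL-SCQ5K", data_dir=subset_dir(language))


if __name__ == "__main__":
    print(subset_dir("en"))
```

If the layout matches these assumptions, load_subset("EN") should return the English subset with its training and test splits.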

License: MIT License.


Topics

Mathematics Competition
Educational Technology

Source

Organization: hugging_face

Created: Unknown
