TAL-SCQ5K is a high-quality mathematics competition dataset created by TAL Education Group, containing English (TAL-SCQ5K-EN) and Chinese (TAL-SCQ5K-CN) versions, each with 5,000 items (3,000 training and 2,000 testing). The items are multiple-choice questions covering primary, middle, and high-school mathematics topics, and provide detailed solution steps to facilitate chain-of-thought (CoT) training. All mathematical expressions are rendered in standard LaTeX text format.
This version of the dataset has two features, instruction and output, both of type string. It is split into a training set of 682 samples and a test set of 293 samples (975 in total), which is smaller than the 5,000 items per language described above. The total dataset size is 2,205,691 bytes, and the download size is 1,042,346 bytes. A single default configuration specifies the paths to the training and test data.
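The two-feature layout and the split sizes can be illustrated with a minimal Python sketch. It uses only the standard library and does not download anything. The record below is a hypothetical example written for illustration, not an item taken from the dataset, and the helper names (is_valid_record, split_fractions, to_training_text) are assumptions rather than part of any dataset tooling.

```python
from typing import Dict

# Feature names and split sizes as stated in the dataset description.
EXPECTED_FEATURES = ("instruction", "output")
SPLIT_SIZES = {"train": 682, "test": 293}

# Hypothetical record for illustration only; NOT an actual dataset item.
EXAMPLE_RECORD = {
    "instruction": "What is $2 + 3$? A. $4$  B. $5$  C. $6$  D. $7$",
    "output": "Adding the two numbers gives $2 + 3 = 5$, so the answer is B.",
}


def is_valid_record(record: Dict[str, object]) -> bool:
    """Return True if the record has exactly the two string features."""
    if set(record) != set(EXPECTED_FEATURES):
        return False
    return all(isinstance(record[name], str) for name in EXPECTED_FEATURES)


def split_fractions(sizes: Dict[str, int]) -> Dict[str, float]:
    """Return each split's share of the total sample count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def to_training_text(record: Dict[str, str]) -> str:
    """Join question and worked solution into one chain-of-thought string.

    The 'Question:' / 'Solution:' template is an assumed format.
    """
    return f"Question: {record['instruction']}\nSolution: {record['output']}"


if __name__ == "__main__":
    print(is_valid_record(EXAMPLE_RECORD))
    print(split_fractions(SPLIT_SIZES))
    print(to_training_text(EXAMPLE_RECORD))
```

With the stated sizes, the training split is 682 of 975 samples and the test split is 293 of 975, which is close to a 70/30 split.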