Explore high-quality datasets for your AI and machine learning projects.
SCUT-EnsExam is a real‑world handwritten text erasure dataset designed for exam paper scenarios, containing 545 exam images. The dataset is randomly split into a training set of 430 images and a test set of 115 images.
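As a rough illustration of the 430/115 split described above, the sketch below shuffles a list of image names with a fixed seed and cuts it at 430. It does not reproduce the dataset's official split, which should be used as distributed. The file names, the seed, and the `split_dataset` helper are all hypothetical.

```python
import random


def split_dataset(items, n_train, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    if not 0 <= n_train <= len(items):
        raise ValueError("n_train must be between 0 and len(items)")
    shuffled = list(items)                 # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 545 exam images.
image_names = [f"exam_{i:04d}.jpg" for i in range(545)]

train, test = split_dataset(image_names, n_train=430)
print(len(train), len(test))  # 430 115
```

Fixing the seed keeps the split identical across runs, which matters when results on the 115 test images are compared between experiments.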
TAL‑SCQ5K is a high‑quality mathematics competition dataset created by TAL Education Group. It comes in an English version (TAL‑SCQ5K‑EN) and a Chinese version (TAL‑SCQ5K‑CN), each with 5,000 items (3,000 for training and 2,000 for testing). The items are multiple‑choice questions covering primary, middle, and high‑school mathematics topics, and each includes detailed solution steps to support chain‑of‑thought (CoT) training. All mathematical expressions are written in standard LaTeX.