High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

52AI/TinyStoriesZh

The TinyStories dataset is used to explore the capability limits of small language models (LMs), specifically how small an LM can be and still tell stories fluently. The stories were generated by GPT-3.5 and GPT-4, with difficulty limited to a level that 3- to 4-year-old children can understand. The Chinese stories are machine translations of the English stories.

Source: Hugging Face