The TinyStories dataset is used to explore the capability boundaries of small language models (LMs), specifically how small an LM can be while still telling stories fluently. The stories were generated by GPT-3.5 and GPT-4, with difficulty limited to a level that 3- to 4-year-old children can understand. The Chinese stories are machine translations of the English stories.