52AI/TinyStoriesZh
Language Models, Children's Stories
The TinyStories dataset is used to explore the capability limits of small language models (LMs), specifically how small an LM can be while still telling stories fluently. The stories were generated by GPT-3.5 and GPT-4, and their difficulty is kept to a level that 3- to 4-year-old children can understand. The Chinese stories are machine translations of the English stories.
Source: Hugging Face. Updated Aug 19, 2023.