Dataset asset · Open Source Community · Text Generation · Chinese Stories

zhoukz/TinyStories-Qwen

A dataset of Chinese stories generated with Qwen-series models and modeled after the TinyStories dataset. All data are AI-generated. The dataset is unfiltered and makes no guarantee of uniform distribution, safety, harmlessness, or any other property. The seed information used for generation was chosen at random and carries no particular meaning.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jan 1, 2024
Dataset Overview

License

  • MIT License

Task Category

  • Text Generation

Language

  • Chinese

Configuration

  • Configuration Name: default
    • Data Files:
      • Training Set: data_???.jsonl
      • Validation Set: data_val_???.jsonl
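
The two file patterns above are shell-style globs in which each `?` stands for exactly one character, so `data_???.jsonl` does not also match the validation files. A minimal sketch of how local copies of the files could be grouped by split and read line by line, assuming they have already been downloaded into one directory; the helper names `split_files` and `read_jsonl` are hypothetical, and no record fields are assumed because the card does not list them:

```python
import fnmatch
import json
from typing import Dict, Iterable, Iterator, List

# Glob patterns copied from the dataset card's configuration.
TRAIN_PATTERN = "data_???.jsonl"
VAL_PATTERN = "data_val_???.jsonl"


def split_files(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names into splits using the card's glob patterns."""
    splits: Dict[str, List[str]] = {"train": [], "validation": []}
    for name in sorted(names):
        # fnmatchcase keeps matching case-sensitive on every platform.
        if fnmatch.fnmatchcase(name, TRAIN_PATTERN):
            splits["train"].append(name)
        elif fnmatch.fnmatchcase(name, VAL_PATTERN):
            splits["validation"].append(name)
    return splits


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

With the Hugging Face `datasets` library installed, `load_dataset("zhoukz/TinyStories-Qwen")` would normally resolve the same configuration without manual file handling; the sketch is only meant to show what the patterns select.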

Dataset Description

  • A collection of Chinese stories generated with Qwen-series models and modeled after the TinyStories dataset.
  • Dataset Characteristics:
    • Not a translation of the original TinyStories dataset.
    • Does not follow the original dataset's format.
    • All data are AI-generated.
    • The dataset is unfiltered and makes no guarantee of uniform distribution, safety, harmlessness, or any other property.
    • The seed information used for generation was chosen at random and carries no particular meaning.