Dataset asset | Open Source Community | Text Generation | Adult Content
chinese_porn_novel
The xbookcn_short_story configuration contains Chinese short stories for text generation tasks. Each story is split into multiple chunks, and a Qwen‑instruct model generates four summaries of varying lengths for each chunk. Features include source, category, title, content, content length, URL, and four summaries. The card lists a size category of 100M<n<1B; the training set comprises 627,195 samples totaling 1,167,355,353 bytes (download size: 721,183,317 bytes).
Source
huggingface
Created
Nov 13, 2024
Updated
Nov 13, 2024
Dataset Overview
Basic Information
- Language: Chinese
- Size Category: 100M<n<1B
- Task Category: Text Generation
- Tag: Art
Dataset Configuration
- Configuration Name: xbookcn_short_story
- Default Configuration: Yes
Dataset Features
- source: string
- category: string
- title: string
- content: string
- content_length: unsigned 32‑bit integer
- url: string
- summary1: string
- summary2: string
- summary3: string
- summary4: string
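The feature list above can be turned into a quick record check. The following is a minimal sketch, assuming each record arrives as a plain Python dict; `EXPECTED_FIELDS` and `validate_record` are hypothetical names introduced here for illustration, not part of the dataset or any library. Because the card does not say whether `content_length` counts characters or bytes, the sketch only checks that it fits in an unsigned 32‑bit integer.

```python
# Field names and types as listed in the dataset card.
# uint32 is represented as a Python int with a range check below.
EXPECTED_FIELDS = {
    "source": str,
    "category": str,
    "title": str,
    "content": str,
    "content_length": int,
    "url": str,
    "summary1": str,
    "summary2": str,
    "summary3": str,
    "summary4": str,
}

UINT32_MAX = 2**32 - 1


def validate_record(record):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif type(record[name]) is not expected_type:
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    length = record.get("content_length")
    if type(length) is int and not 0 <= length <= UINT32_MAX:
        problems.append("content_length: outside unsigned 32-bit range")
    return problems
```

An empty return value means every listed field is present with the expected type.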
Dataset Split
- Training Set:
  - Number of Samples: 627,195
  - Bytes: 1,167,355,353
Dataset Files
- Download Size: 721,183,317 bytes
- Dataset Size: 1,167,355,353 bytes
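As a rough sanity check, the figures reported on the card can be combined into an average record size and a download-to-dataset ratio. This is only a sketch: the constant names are made up for illustration, and the values are copied from the card, not measured.

```python
# Figures copied from the dataset card.
NUM_SAMPLES = 627_195
DATASET_BYTES = 1_167_355_353
DOWNLOAD_BYTES = 721_183_317

# Average size of one record, in bytes.
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"{avg_bytes_per_sample:.0f} bytes per sample on average")
print(f"download is {download_ratio:.1%} of the dataset size")
```

This works out to a little under 1.9 KB per record, with the download being roughly 62% of the stated dataset size.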
Data File Paths
- Training Set Path: xbookcn_short_story/train-*
Intended Uses
- Used to build specialized GPT language models.
- Each story is chunked and Qwen‑instruct generates four summaries per chunk.
Summary Generation Rules
- Summary 1:
  - Produce 3–7 short sentences based on text length.
  - Each sentence about 10 characters.
- Summary 2:
  - Produce 2–4 short sentences.
  - Each sentence about 15 characters.
- Summary 3:
  - Produce 2–4 short sentences.
  - Each sentence about 10 characters.
- Summary 4:
  - Produce 3–5 short sentences.
  - Each sentence about 10 characters.
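The rules above can be captured as a small lookup table with a sentence-count check. This is a sketch under stated assumptions: the card does not say how sentences are delimited inside a summary, so the splitter below assumes Chinese or ASCII terminal punctuation or newlines. `SUMMARY_RULES`, `split_sentences`, and `sentence_count_ok` are hypothetical names, and the per-sentence length is stored but not enforced because the card gives it only as an approximation.

```python
import re

# (min_sentences, max_sentences, approx_chars_per_sentence) per summary field,
# taken from the summary generation rules on the card.
SUMMARY_RULES = {
    "summary1": (3, 7, 10),
    "summary2": (2, 4, 15),
    "summary3": (2, 4, 10),
    "summary4": (3, 5, 10),
}

# Assumption: a sentence ends at 。！？ (or ASCII ! ?) or at a newline.
_SENTENCE_END = re.compile(r"[。！？!?\n]+")


def split_sentences(text):
    """Split a summary into non-empty sentences on terminal punctuation."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def sentence_count_ok(field, text):
    """Check whether the summary's sentence count falls in the range for its field."""
    low, high, _approx_chars = SUMMARY_RULES[field]
    return low <= len(split_sentences(text)) <= high
```

For example, a two-sentence string satisfies the range for `summary2` but falls short of the three-sentence minimum for `summary1`.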