Dataset asset · Open Source Community · Text Generation · Adult Content

chinese_porn_novel

The xbookcn_short_story dataset contains Chinese short stories for text generation tasks. Each story is split into multiple chunks, and the Qwen‑instruct model generates four summaries of varying lengths for each chunk. Features include source, category, title, content, content length, URL, and the four summaries. The card lists a size category of 100M<n<1B; the download is 721,183,317 bytes and the dataset occupies 1,167,355,353 bytes. The training set comprises 627,195 samples.

Source
huggingface
Created
Nov 13, 2024
Updated
Nov 13, 2024
Dataset Overview

Basic Information

  • Language: Chinese
  • Dataset Size: 100M<n<1B
  • Task Category: Text Generation
  • Tag: Art

Dataset Configuration

  • Configuration Name: xbookcn_short_story
  • Default Configuration: Yes

Dataset Features

  • source: string
  • category: string
  • title: string
  • content: string
  • content_length: unsigned 32‑bit integer
  • url: string
  • summary1: string
  • summary2: string
  • summary3: string
  • summary4: string
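
The feature list above can be turned into a quick sanity check on individual records. The following is a minimal sketch using only the standard library; `validate_record` and the constants are hypothetical names introduced here for illustration and are not part of the dataset or of any library. The card does not say whether `content_length` counts characters or bytes, so the check only verifies the type and the unsigned 32-bit range.

```python
from typing import Any, Dict, List

# Field names and types as listed in the feature list above.
STRING_FIELDS = [
    "source", "category", "title", "content", "url",
    "summary1", "summary2", "summary3", "summary4",
]
UINT32_MAX = 2**32 - 1


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            problems.append(f"{name}: expected string")
    length = record.get("content_length")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(length, int) or isinstance(length, bool):
        problems.append("content_length: expected integer")
    elif not 0 <= length <= UINT32_MAX:
        problems.append("content_length: outside unsigned 32-bit range")
    return problems
```

Calling `validate_record(row)` on each row before training gives an early warning if a file was truncated or a column was dropped during conversion.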

Dataset Split

  • Training Set:
    • Number of Samples: 627,195
    • Size: 1,167,355,353 bytes

Dataset Files

  • Download Size: 721,183,317 bytes
  • Dataset Size: 1,167,355,353 bytes
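
The sample count and byte figures reported on this card can be combined into two rough planning numbers: the average size of one sample and the ratio of download size to dataset size. The sketch below only does arithmetic on the values listed here; the variable names are chosen for illustration.

```python
# Figures copied from the card.
NUM_SAMPLES = 627_195
DATASET_BYTES = 1_167_355_353
DOWNLOAD_BYTES = 721_183_317

# Average stored size of one sample (all ten fields together).
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"average bytes per sample: {avg_bytes_per_sample:.1f}")
print(f"download / dataset size:  {download_ratio:.3f}")
```

The average covers the content and all four summaries, so it is an upper bound on the typical chunk size rather than a measure of the story text alone.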

Data File Paths

  • Training Set Path: xbookcn_short_story/train-*

Intended Uses

  • Used to build specialized GPT language models.
  • Each story is chunked and Qwen‑instruct generates four summaries per chunk.

Summary Generation Rules

  • Summary 1:
    • Produce 3–7 short sentences based on text length.
    • Each sentence about 10 characters.
  • Summary 2:
    • Produce 2–4 short sentences.
    • Each sentence about 15 characters.
  • Summary 3:
    • Produce 2–4 short sentences.
    • Each sentence about 10 characters.
  • Summary 4:
    • Produce 3–5 short sentences.
    • Each sentence about 10 characters.
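
The rules above can be written down as a small table and used to check whether a stored summary has a sentence count in the stated range. This is a sketch under assumptions: the card does not describe how sentences were delimited in the original pipeline, so splitting on terminal punctuation is a guess, and `SUMMARY_RULES`, `split_sentences`, `sentence_count_ok`, and `average_sentence_length` are hypothetical names introduced here.

```python
import re
from typing import Dict, List, Tuple

# (min_sentences, max_sentences, approx_chars_per_sentence) per summary field,
# taken from the rules listed above.
SUMMARY_RULES: Dict[str, Tuple[int, int, int]] = {
    "summary1": (3, 7, 10),
    "summary2": (2, 4, 15),
    "summary3": (2, 4, 10),
    "summary4": (3, 5, 10),
}

# Assumption: sentences end with Chinese or ASCII terminal punctuation.
_SENTENCE_END = re.compile(r"[。！？!?]+")


def split_sentences(text: str) -> List[str]:
    """Split on terminal punctuation and drop empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def sentence_count_ok(field: str, text: str) -> bool:
    """True if the number of sentences falls inside the rule's range."""
    lo, hi, _ = SUMMARY_RULES[field]
    return lo <= len(split_sentences(text)) <= hi


def average_sentence_length(text: str) -> float:
    """Mean characters per sentence, for comparison with the target length."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(s) for s in sentences) / len(sentences)
```

Because the target lengths are approximate ("about 10 characters"), the sketch reports the average length rather than enforcing a hard limit.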