Dataset asset | Open Source Community | Text Classification | Pornographic Content Filtering

nsfw

This dataset contains erotic stories that have been cleaned, deduplicated, and decontaminated. It is intended for training text-filtering classifiers. The data originates from the HuggingFace datasets bluuwhale/nsfwstory and bluuwhale/nsfwstory2. The dataset comprises 49,579 samples, and the downloaded parquet file is 646 MB.

Source
huggingface
Created
Jan 1, 2025
Updated
Jan 11, 2025

Dataset Overview

Dataset Name

Geralt-Targaryen/nsfw

Dataset Description

This dataset contains NSFW (Not Safe For Work) stories that have been cleaned, deduplicated, and decontaminated. It is intended for training text-filtering classifiers.

Dataset Source

  • bluuwhale/nsfwstory (HuggingFace)
  • bluuwhale/nsfwstory2 (HuggingFace)

Dataset Scale

  • Number of samples: 49,579
  • Downloaded parquet file size: 646 MB
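
Since the stated purpose is training text-filtering classifiers, the following is a minimal sketch of how the samples might be combined with ordinary text into a labeled training set. It rests on several assumptions that this page does not confirm: that the dataset loads through the Hugging Face `datasets` library under the repository ID given above, that it exposes a `train` split, and that the story text sits in a column named `text`. The helper names `load_positive_texts` and `build_labeled_examples` are hypothetical, and the negative (safe) texts must come from a separate corpus of your choosing.

```python
import random
from typing import Dict, Iterable, List, Union

Example = Dict[str, Union[str, int]]


def load_positive_texts(
    repo_id: str = "Geralt-Targaryen/nsfw",
    split: str = "train",        # assumption: split name is not confirmed by this page
    text_column: str = "text",   # assumption: column name is not confirmed by this page
) -> List[str]:
    """Download the dataset and return its texts (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the helper below works without it

    dataset = load_dataset(repo_id, split=split)
    return list(dataset[text_column])


def build_labeled_examples(
    positive_texts: Iterable[str],
    negative_texts: Iterable[str],
    seed: int = 0,
) -> List[Example]:
    """Label texts to be filtered as 1 and safe texts as 0, then shuffle deterministically."""
    examples: List[Example] = [{"text": t, "label": 1} for t in positive_texts]
    examples += [{"text": t, "label": 0} for t in negative_texts]
    random.Random(seed).shuffle(examples)
    return examples
```

A typical call would pass `load_positive_texts()` as the positive side and a list of ordinary web or book text as the negative side. Inspect the column names after loading before relying on the defaults shown here.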

License

Apache-2.0

Warning

This dataset contains explicit sexual content.
