Dataset asset: Open Source Community, Text Classification, Pornographic Content Filtering
nsfw
This dataset contains erotic stories that have been cleaned, deduplicated, and decontaminated, intended for training text-filtering classifiers. The data originates from the Hugging Face datasets bluuwhale/nsfwstory and bluuwhale/nsfwstory2. The dataset comprises 49,579 samples, and the downloaded parquet file is 646 MB.
Source
huggingface
Created
Jan 1, 2025
Updated
Jan 11, 2025
Dataset Overview
Dataset Name
Geralt-Targaryen/nsfw
Dataset Description
This dataset contains cleaned, deduplicated, and decontaminated NSFW (Not Safe For Work) stories, intended for training text-filtering classifiers.
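As a rough illustration of the intended use, the sketch below shows one way rows from a flagged corpus like this one could be combined with benign text into a labeled set for a binary filter classifier. The function name `build_filter_training_set`, the label convention (1 = flagged, 0 = benign), and the placeholder strings are assumptions made for this example; they are not part of the dataset or its documentation.

```python
import random
from typing import Iterable, List, Tuple


def build_filter_training_set(
    flagged_texts: Iterable[str],
    benign_texts: Iterable[str],
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Label flagged texts 1 and benign texts 0, then shuffle reproducibly.

    Hypothetical helper for illustration; the label convention is an assumption.
    """
    examples = [(text, 1) for text in flagged_texts]
    examples += [(text, 0) for text in benign_texts]
    # A seeded Random instance keeps the shuffle deterministic across runs.
    random.Random(seed).shuffle(examples)
    return examples


# Placeholder strings stand in for real rows; in practice the flagged side
# would come from this dataset and the benign side from a separate corpus.
flagged = ["<row from this dataset>", "<another row from this dataset>"]
benign = [
    "A recipe for lentil soup.",
    "Notes on binary search trees.",
    "Train timetable update.",
]
examples = build_filter_training_set(flagged, benign, seed=42)
```

The resulting list of `(text, label)` pairs can then be fed to whichever classifier training pipeline is in use. With 49,579 flagged samples available, the benign side would typically need to be sampled at a comparable scale to avoid a heavily skewed label distribution.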
Dataset Source
Derived from the Hugging Face datasets bluuwhale/nsfwstory and bluuwhale/nsfwstory2.
Dataset Scale
- Number of samples: 49,579
- Downloaded parquet file size: 646 MB
License
Apache-2.0
Warning
This dataset contains explicit sexual content.