Dataset asset · Open Source Community · Text Safety Detection
nsfw_detection
The dataset includes two configurations: `nsfw_detection_test_v1` and `nsfw_detection_v1`. `nsfw_detection_test_v1` provides a test split with 10,000 samples, each containing a text and a label (0 for safe, 1 for nsfw). `nsfw_detection_v1` includes a training split with 845,904 samples and a validation split with 10,000 samples, both following the same format. The dataset is primarily used for detecting unsafe content in text.
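Since the source is the Hugging Face Hub, the dataset can typically be loaded with the `datasets` library. A minimal sketch, assuming a placeholder repo id (`your-org/nsfw_detection` is hypothetical; the configuration and split names come from this listing):

```python
from datasets import load_dataset

# "your-org/nsfw_detection" is a placeholder repo id -- substitute the actual
# Hub path. The configuration name "nsfw_detection_v1" comes from this listing.
ds = load_dataset("your-org/nsfw_detection", name="nsfw_detection_v1")

print(ds)  # expected: train (845,904 rows) and val (10,000 rows) splits
example = ds["train"][0]
print(example["text"], example["label"])  # label: 0 = safe, 1 = nsfw
```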
Source: huggingface
Created: Aug 22, 2024
Updated: Aug 22, 2024
Availability: Linked source ready
Overview
Dataset Configurations
Configuration Name: nsfw_detection_test_v1
- Features:
  - text: string
  - label: categorical with two classes (0: safe, 1: nsfw)
  - __index_level_0__: integer
- Splits:
  - test: 10,000 samples, 9,258,616 bytes
- Download Size: 5,981,940 bytes
- Dataset Size: 9,258,616 bytes
Configuration Name: nsfw_detection_v1
- Features:
  - text: string
  - label: categorical with two classes (0: safe, 1: nsfw)
  - __index_level_0__: integer
- Splits:
  - train: 845,904 samples, 776,291,817 bytes
  - val: 10,000 samples, 9,258,616 bytes
- Download Size: 506,877,225 bytes
- Dataset Size: 785,550,433 bytes
Data Files
Configuration Name: nsfw_detection_test_v1
- Splits:
  - test: file path `nsfw_detection_test_v1/test-*`
Configuration Name: nsfw_detection_v1
- Splits:
  - train: file path `nsfw_detection_v1/train-*`
  - val: file path `nsfw_detection_v1/val-*`
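If you only need particular shards, the file-path patterns above can also be passed directly via `data_files`. A sketch under the same placeholder repo id assumption:

```python
from datasets import load_dataset

# Sketch only: the glob patterns are taken from the Data Files listing above;
# "your-org/nsfw_detection" remains a placeholder repo id.
ds = load_dataset(
    "your-org/nsfw_detection",
    data_files={
        "train": "nsfw_detection_v1/train-*",
        "val": "nsfw_detection_v1/val-*",
    },
)
```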