Dataset asset · Open Source Community · Text Safety Detection
nsfw_detection
The dataset includes two configurations: `nsfw_detection_test_v1` and `nsfw_detection_v1`. `nsfw_detection_test_v1` provides a test split with 10,000 samples, each containing a text and a label (0 for safe, 1 for nsfw). `nsfw_detection_v1` includes a training split with 845,904 samples and a validation split with 10,000 samples, both following the same format. The dataset is primarily used for detecting unsafe content in text.
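Since the source is the Hugging Face Hub, the dataset can typically be loaded with the `datasets` library. A minimal sketch, assuming a placeholder repo id (`your-org/nsfw_detection` is hypothetical; the configuration and split names come from this listing):

```python
from datasets import load_dataset

# "your-org/nsfw_detection" is a placeholder repo id -- substitute the actual
# Hub path. The configuration name "nsfw_detection_v1" comes from this listing.
ds = load_dataset("your-org/nsfw_detection", name="nsfw_detection_v1")

print(ds)  # expected: train (845,904 rows) and val (10,000 rows) splits
example = ds["train"][0]
print(example["text"], example["label"])  # label: 0 = safe, 1 = nsfw
```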
Source: huggingface
Created: Aug 22, 2024
Updated: Aug 22, 2024
Availability: Linked source ready
Overview
Dataset Configurations
Configuration Name: nsfw_detection_test_v1
- Features:
  - text: string
  - label: categorical with two classes (0: safe, 1: nsfw)
  - __index_level_0__: integer
- Splits:
  - test: 10,000 samples, 9,258,616 bytes
- Download Size: 5,981,940 bytes
- Dataset Size: 9,258,616 bytes
Configuration Name: nsfw_detection_v1
- Features:
  - text: string
  - label: categorical with two classes (0: safe, 1: nsfw)
  - __index_level_0__: integer
- Splits:
  - train: 845,904 samples, 776,291,817 bytes
  - val: 10,000 samples, 9,258,616 bytes
- Download Size: 506,877,225 bytes
- Dataset Size: 785,550,433 bytes
Data Files
Configuration Name: nsfw_detection_test_v1
- Splits:
  - test: file path `nsfw_detection_test_v1/test-*`
Configuration Name: nsfw_detection_v1
- Splits:
  - train: file path `nsfw_detection_v1/train-*`
  - val: file path `nsfw_detection_v1/val-*`
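If you only need particular shards, the file-path patterns above can also be passed directly via `data_files`. A sketch under the same placeholder repo id assumption:

```python
from datasets import load_dataset

# Sketch only: the glob patterns are taken from the Data Files listing above;
# "your-org/nsfw_detection" remains a placeholder repo id.
ds = load_dataset(
    "your-org/nsfw_detection",
    data_files={
        "train": "nsfw_detection_v1/train-*",
        "val": "nsfw_detection_v1/val-*",
    },
)
```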