Dataset asset · Open Source Community · Text Safety Detection

nsfw_detection

The dataset includes two configurations: `nsfw_detection_test_v1` and `nsfw_detection_v1`. `nsfw_detection_test_v1` provides a test split of 10,000 samples, each pairing a text with a binary label (0 for safe, 1 for nsfw). `nsfw_detection_v1` provides a training split of 845,904 samples and a validation split of 10,000 samples in the same format. The dataset is intended primarily for detecting unsafe content in text.
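The record format above can be sketched in a few lines of Python. The example records below are illustrative placeholders, not drawn from the dataset itself:

```python
# Each sample pairs a text with a binary safety label: 0 = safe, 1 = nsfw.
LABELS = {0: "safe", 1: "nsfw"}

# Hypothetical records in the dataset's format (text + label).
samples = [
    {"text": "A recipe for banana bread.", "label": 0},
    {"text": "(explicit adult content ...)", "label": 1},
]

def split_by_label(records):
    """Partition records into safe and nsfw buckets by their label."""
    safe = [r for r in records if r["label"] == 0]
    nsfw = [r for r in records if r["label"] == 1]
    return safe, nsfw

safe, nsfw = split_by_label(samples)
print(len(safe), len(nsfw))  # -> 1 1
```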

Source
huggingface
Created
Aug 22, 2024
Updated
Aug 22, 2024
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Configurations

Configuration Name: nsfw_detection_test_v1

  • Features:
    • text: string
    • label: categorical with two classes:
      • 0: safe
      • 1: nsfw
    • __index_level_0__: integer
  • Splits:
    • test: 10,000 samples, 9,258,616 bytes
  • Download Size: 5,981,940 bytes
  • Dataset Size: 9,258,616 bytes

Configuration Name: nsfw_detection_v1

  • Features:
    • text: string
    • label: categorical with two classes:
      • 0: safe
      • 1: nsfw
    • __index_level_0__: integer
  • Splits:
    • train: 845,904 samples, 776,291,817 bytes
    • val: 10,000 samples, 9,258,616 bytes
  • Download Size: 506,877,225 bytes
  • Dataset Size: 785,550,433 bytes
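As a quick consistency check, the per-split byte counts listed above sum exactly to the reported total dataset size:

```python
# Figures as listed for the nsfw_detection_v1 configuration.
train_bytes = 776_291_817
val_bytes = 9_258_616
total_bytes = 785_550_433

# The train and val splits account for the full dataset size.
assert train_bytes + val_bytes == total_bytes
print(f"{total_bytes / 1024**2:.1f} MiB on disk (uncompressed)")
```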

Data Files

Configuration Name: nsfw_detection_test_v1

  • Splits:
    • test: file path nsfw_detection_test_v1/test-*

Configuration Name: nsfw_detection_v1

  • Splits:
    • train: file path nsfw_detection_v1/train-*
    • val: file path nsfw_detection_v1/val-*
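The glob patterns above map naturally onto a `data_files` dictionary for the Hugging Face `datasets` library. The sketch below assumes the shards are local parquet files; the file format and paths are assumptions, so adjust them to match the actual download:

```python
# Map each split name to its shard glob pattern, as listed above.
data_files = {
    "train": "nsfw_detection_v1/train-*",
    "val": "nsfw_detection_v1/val-*",
}

# With the shards present locally, loading would look like
# (requires the `datasets` package and the files on disk):
# from datasets import load_dataset
# ds = load_dataset("parquet", data_files=data_files)

print(sorted(data_files))  # -> ['train', 'val']
```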