Dataset asset · Open Source Community · Natural Language Processing · Sarcasm Detection

CreativeLang/SARC_Sarcasm

This dataset is a large-scale corpus for sarcasm research and for training and evaluating sarcasm detection systems. It contains 1.3 million sarcastic statements, ten times more than any previous dataset, together with a larger number of non-sarcastic statements, so models can be trained under both balanced and imbalanced label regimes. Each statement is self-annotated: the sarcasm label is provided by the author rather than by an external annotator. Each statement also comes with user, topic, and dialogue context. The corpus has been evaluated for accuracy, a sarcasm detection benchmark has been established, and baseline methods have been assessed.

Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 11, 2023
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • Name: SARC_Sarcasm

Dataset Features

  • Feature List:
    • text: string
    • author: string
    • score: int64
    • ups: int64
    • downs: int64
    • date: string
    • created_utc: int64
    • subreddit: string
    • id: string
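
A minimal sketch of how the schema above might be checked and consumed. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the title of this page (`CreativeLang/SARC_Sarcasm`). The helper names `EXPECTED_FEATURES`, `validate_record`, `created_at`, and `stream_train` are illustrative and not part of any library, and reading `created_utc` as Unix seconds is an assumption.

```python
from datetime import datetime, timezone

# Column names and types copied from the feature list above.
EXPECTED_FEATURES = {
    "text": str,
    "author": str,
    "score": int,
    "ups": int,
    "downs": int,
    "date": str,
    "created_utc": int,
    "subreddit": str,
    "id": str,
}


def validate_record(record):
    """Return a list of problems found in one record (empty if it matches)."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}: {type(record[name]).__name__}")
    return problems


def created_at(record):
    """Interpret created_utc as Unix seconds (an assumption) and return a UTC datetime."""
    return datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)


def stream_train(repo_id="CreativeLang/SARC_Sarcasm"):
    """Yield training records lazily; needs the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the helpers above work without it

    yield from load_dataset(repo_id, split="train", streaming=True)
```

Streaming avoids fetching the whole download before the first record is available. A loop such as `for row in stream_train(): print(validate_record(row))` would then check rows one at a time. Real rows may contain null values, which this sketch would report as type mismatches.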

Dataset Splits

  • Training Set:
    • Samples: 12,704,751
    • Size: 1,764,500,045 bytes

Dataset Size

  • Download Size: 903,559,115 bytes
  • Total Size: 1,764,500,045 bytes
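
The figures on this page imply a few derived quantities that can be useful when planning storage or sampling. The sketch below uses only numbers listed here. Whether all 1.3 million sarcastic statements are contained in the training split is not confirmed by the page, so the class share is a rough estimate under that assumption.

```python
# Numbers copied from this page.
TRAIN_SAMPLES = 12_704_751
TOTAL_BYTES = 1_764_500_045
DOWNLOAD_BYTES = 903_559_115
SARCASTIC_APPROX = 1_300_000  # "1.3 million", a rounded figure from the description

avg_bytes_per_sample = TOTAL_BYTES / TRAIN_SAMPLES
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES
# Assumes every sarcastic statement is part of the training split.
sarcastic_share = SARCASTIC_APPROX / TRAIN_SAMPLES

print(f"average record size: {avg_bytes_per_sample:.1f} bytes")
print(f"download / total size: {download_ratio:.1%}")
print(f"approx. sarcastic share: {sarcastic_share:.1%}")
```

This works out to roughly 139 bytes per record, a download that is about 51% of the total size, and, under the stated assumption, a sarcastic share of about 10%.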

License

  • License Type: cc-by-2.0

Dataset Description

  • Purpose: For sarcasm research and training/evaluating sarcasm detection systems
  • Scale: 1.3 million sarcastic statements, ten times larger than any previous dataset
  • Annotation Method: Self‑annotation by authors
  • Content: Includes user, topic, and dialogue context information
  • Evaluation & Benchmark: Corpus accuracy evaluated, sarcasm detection benchmark established, baseline methods assessed
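
The balanced and imbalanced label regimes mentioned in the description can be illustrated with a small downsampling sketch. The `label` key is hypothetical: the feature list on this page does not include a label column, so the sarcasm annotation would have to be attached from wherever it is stored. The function name `balanced_subset` is also illustrative.

```python
import random


def balanced_subset(records, label_key="label", seed=0):
    """Downsample the larger class so both classes have the same number of records.

    `label_key` is a hypothetical field (1 = sarcastic, 0 = not sarcastic);
    it is not among the features listed for this dataset.
    """
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(subset)
    return subset


# Toy, made-up records with a 1:9 class ratio.
toy = [{"text": f"s{i}", "label": 1} for i in range(3)]
toy += [{"text": f"n{i}", "label": 0} for i in range(27)]
balanced = balanced_subset(toy)
print(len(balanced), sum(r["label"] for r in balanced))
```

Using the unsampled records corresponds to the imbalanced regime. This in-memory version is meant for small samples; for the full 12.7 million rows a streaming or index-based approach would be more practical.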

Dataset Metadata

  • Type: Sarcasm
  • Task Type: Detection
  • Creation Year: 2018