
CreativeLang/SARC_Sarcasm

This dataset is a large-scale corpus for sarcasm research and for training and evaluating sarcasm detection systems. It contains 1.3 million sarcastic statements, ten times more than any previous dataset, and an even larger number of non-sarcastic statements, which allows learning under both balanced and imbalanced label regimes. Each statement is self-annotated: the sarcasm label is provided by the author rather than by an external annotator. Each statement also comes with user, topic, and dialogue context. The corpus has been evaluated for accuracy, a sarcasm detection benchmark has been established, and baseline methods have been assessed.

Updated 7/11/2023
hugging_face

Description

Dataset Overview

Dataset Name

  • Name: SARC_Sarcasm

Dataset Features

  • Feature List:
    • text: string
    • author: string
    • score: int64
    • ups: int64
    • downs: int64
    • date: string
    • created_utc: int64
    • subreddit: string
    • id: string
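
The feature list above maps naturally onto a typed record. The following is a minimal sketch, assuming rows arrive as plain dictionaries keyed by the feature names. `SarcRecord`, `EXPECTED_TYPES`, `from_row`, and `created_at` are hypothetical names introduced for illustration and are not part of the dataset or of any library. It also assumes `created_utc` is a Unix timestamp in seconds, which the listing does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Field names and types copied from the feature list
# (string -> str, int64 -> int).
EXPECTED_TYPES = {
    "text": str,
    "author": str,
    "score": int,
    "ups": int,
    "downs": int,
    "date": str,
    "created_utc": int,
    "subreddit": str,
    "id": str,
}


@dataclass(frozen=True)
class SarcRecord:
    """One dataset row (hypothetical container, for illustration only)."""
    text: str
    author: str
    score: int
    ups: int
    downs: int
    date: str
    created_utc: int
    subreddit: str
    id: str

    @property
    def created_at(self) -> datetime:
        # Assumption: created_utc counts seconds since the Unix epoch.
        return datetime.fromtimestamp(self.created_utc, tz=timezone.utc)


def from_row(row: dict) -> SarcRecord:
    """Build a record from a dict, rejecting missing or mistyped fields."""
    values = {}
    for name, expected in EXPECTED_TYPES.items():
        if name not in row:
            raise KeyError(f"missing field: {name}")
        if not isinstance(row[name], expected):
            raise TypeError(f"{name} should be {expected.__name__}")
        values[name] = row[name]
    return SarcRecord(**values)
```

Validating rows up front like this is one way to catch schema drift early; whether it is worth the overhead on millions of rows depends on the pipeline.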

Dataset Splits

  • Training Set:
    • Samples: 12,704,751
    • Size: 1,764,500,045 bytes
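
With roughly 12.7 million rows in the training split, streaming may be more convenient than a full download for a first look. This is a hedged sketch, assuming the Hugging Face `datasets` library is installed and that the repository ID matches the title of this page (`CreativeLang/SARC_Sarcasm`). `stream_train` and `take_from_subreddit` are hypothetical helper names, not part of any library.

```python
from itertools import islice
from typing import Iterable, Iterator


def stream_train() -> Iterable[dict]:
    """Open the training split lazily instead of downloading it in full."""
    # Imported here so the filtering helper below works without the library.
    from datasets import load_dataset

    # Assumption: the repository ID is the one shown in the page title.
    return load_dataset("CreativeLang/SARC_Sarcasm", split="train", streaming=True)


def take_from_subreddit(
    rows: Iterable[dict], subreddit: str, limit: int
) -> Iterator[dict]:
    """Yield at most `limit` rows whose `subreddit` field matches."""
    matches = (row for row in rows if row["subreddit"] == subreddit)
    return islice(matches, limit)
```

A call such as `take_from_subreddit(stream_train(), "politics", 5)` would then yield up to five matching rows; the subreddit name here is only an example and may not occur in the data.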

Dataset Size

  • Download Size: 903,559,115 bytes
  • Total Size: 1,764,500,045 bytes
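
The size figures above can be turned into two quick sanity checks: the average footprint per sample and the ratio of download size to total size. The sketch below uses only the numbers listed on this page; the variable names are illustrative.

```python
# Figures copied from the "Dataset Splits" and "Dataset Size" sections.
TRAIN_SAMPLES = 12_704_751
TOTAL_BYTES = 1_764_500_045
DOWNLOAD_BYTES = 903_559_115

# Average size of one sample in the full dataset.
avg_bytes_per_sample = TOTAL_BYTES / TRAIN_SAMPLES

# Fraction of the total size that has to be transferred.
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(f"average bytes per sample: {avg_bytes_per_sample:.1f}")
print(f"download / total size:    {download_ratio:.1%}")
print(f"total size in GiB:        {TOTAL_BYTES / 2**30:.2f}")
```

This works out to roughly 139 bytes per sample and a download that is about 51% of the total size. The difference is presumably due to compression of the downloaded files, though the listing does not say so.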

License

  • License Type: cc-by-2.0

Dataset Description

  • Purpose: For sarcasm research and training/evaluating sarcasm detection systems
  • Scale: 1.3 million sarcastic statements, ten times more than any previous dataset
  • Annotation Method: Self‑annotation by authors
  • Content: Includes user, topic, and dialogue context information
  • Evaluation & Benchmark: Accuracy evaluated, sarcasm detection benchmark established

Dataset Metadata

  • Type: Sarcasm
  • Task Type: Detection
  • Creation Year: 2018


Topics

Sarcasm Detection
Natural Language Processing

Source

Organization: hugging_face

Created: Unknown
