CreativeLang/SARC_Sarcasm
This dataset is a large-scale corpus for sarcasm research and for training and evaluating sarcasm detection systems. It contains 1.3 million sarcastic statements, ten times more than any previous dataset, and an even larger number of non-sarcastic statements, which allows learning under both balanced and imbalanced label regimes. Each statement is self-annotated (the sarcasm label is provided by the author rather than an external annotator) and comes with user, topic, and dialogue context. The corpus has been evaluated for accuracy, a sarcasm detection benchmark has been established, and baseline methods have been assessed on it.
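As a rough illustration of the balanced regime mentioned above, the following sketch downsamples the majority class so that every label value appears equally often. The field name `label`, the helper `balance_by_downsampling`, and the toy records are all hypothetical; the feature list on this card does not name a label column, so adapt the key to whatever the actual data provides.

```python
import random
from typing import Dict, List


def balance_by_downsampling(
    records: List[Dict], label_key: str = "label", seed: int = 0
) -> List[Dict]:
    """Return a subset with equal counts of each label value.

    Larger classes are randomly downsampled to the size of the smallest.
    `label_key` is a hypothetical field name, not confirmed by the card.
    """
    # Group records by their label value.
    by_label: Dict[object, List[Dict]] = {}
    for rec in records:
        by_label.setdefault(rec[label_key], []).append(rec)
    if not by_label:
        return []

    # Every class is cut down to the size of the smallest class.
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced: List[Dict] = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


# Toy, made-up records: 2 sarcastic (label 1) and 6 non-sarcastic (label 0).
toy = [{"text": f"comment {i}", "label": 1 if i < 2 else 0} for i in range(8)]
subset = balance_by_downsampling(toy)
```

Skipping this step and training on all records corresponds to the imbalanced regime, where sarcastic statements are the minority class.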
Description
Dataset Overview
Dataset Name
- Name: SARC_Sarcasm
Dataset Features
- Feature List:
- text: string
- author: string
- score: int64
- ups: int64
- downs: int64
- date: string
- created_utc: int64
- subreddit: string
- id: string
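A minimal sketch of how a single record could be checked against the feature list above, assuming records arrive as plain Python dictionaries with `int64` columns represented as `int`. The helper `schema_problems` and the sample record are made up for illustration; the format of the `date` value in particular is an assumption.

```python
from typing import Any, Dict, List

# Column names and types as listed on the card (int64 mapped to Python int).
EXPECTED_FIELDS: Dict[str, type] = {
    "text": str,
    "author": str,
    "score": int,
    "ups": int,
    "downs": int,
    "date": str,
    "created_utc": int,
    "subreddit": str,
    "id": str,
}


def schema_problems(record: Dict[str, Any]) -> List[str]:
    """List mismatches between a record and the expected fields."""
    problems: List[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif type(record[name]) is not expected:
            # Exact type comparison, so that bool is not accepted as int.
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems


# Made-up sample record; every value here is illustrative only.
sample = {
    "text": "Oh great, another Monday.",
    "author": "example_user",
    "score": 3,
    "ups": 3,
    "downs": 0,
    "date": "2016-01",
    "created_utc": 1451606400,
    "subreddit": "AskReddit",
    "id": "abc123",
}
problems = schema_problems(sample)
```

An empty result means the record matches the listed columns; each entry otherwise names one missing, mistyped, or unexpected field.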
Dataset Splits
- Training Set:
- Samples: 12,704,751
- Size: 1,764,500,045 bytes
Dataset Size
- Download Size: 903,559,115 bytes
- Total Size: 1,764,500,045 bytes
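The split and size figures above can be combined for a quick sanity check when planning storage or streaming. The sketch below uses only the numbers stated on this card; the variable names are illustrative.

```python
# Figures copied from the card.
TRAIN_SAMPLES = 12_704_751
TOTAL_BYTES = 1_764_500_045
DOWNLOAD_BYTES = 903_559_115

# Average stored size of one training sample, in bytes.
avg_bytes_per_sample = TOTAL_BYTES / TRAIN_SAMPLES

# Fraction of the total size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(f"average bytes per sample: {avg_bytes_per_sample:.1f}")
print(f"download / total size:    {download_ratio:.3f}")
```

This works out to roughly 139 bytes per sample, with the download being about half of the total size.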
License
- License Type: cc-by-2.0
Dataset Description
- Purpose: For sarcasm research and training/evaluating sarcasm detection systems
- Scale: 1.3 million sarcastic statements, ten times more than any previous dataset
- Annotation Method: Self‑annotation by authors
- Content: Includes user, topic, and dialogue context information
- Evaluation & Benchmark: Corpus evaluated for accuracy, sarcasm detection benchmark established, baseline methods assessed
Dataset Metadata
- Type: Sarcasm
- Task Type: Detection
- Creation Year: 2018
Source
Organization: hugging_face
Created: Unknown