tartuNLP/reddit-anhedonia
This dataset re‑annotates posts from the PRIMATE dataset for anhedonia (loss of interest or pleasure) in mental‑health contexts. The re‑annotation, carried out by a mental‑health professional, provides finer‑grained labels and textual evidence, reveals many false‑positive cases in the original annotations, and yields a higher‑quality test set for anhedonia detection. The study highlights the need to address annotation quality in mental‑health datasets and advocates improved methods to make NLP models for mental‑health assessment more reliable. The posts themselves come from the original PRIMATE collection; only the labels are included here, and the original post content is omitted. To use the dataset, first obtain access to PRIMATE, then use the provided scripts to map the labels to the posts.
Description
PRIMATE Dataset Overview
Dataset Description
Dataset Summary
- Topic: Addresses annotation quality issues in the PRIMATE dataset, especially in the annotations of the anhedonia (lack of interest or pleasure) symptom.
- Improvement: Re‑annotation by a mental‑health professional introduces finer‑grained labels and textual spans as evidence, identifying a large number of false positives.
- Purpose: Provides a higher‑quality test set for detecting anhedonia, emphasizing the need to resolve annotation quality problems in mental‑health datasets.
Using the Dataset
- Dataset File: Place the primate_dataset.json file in the script directory.
- Code Example: Use Python code to map dataset labels to PRIMATE posts.
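The mapping step might look roughly like the sketch below. It is an illustration only, not the script shipped with the dataset: it assumes that primate_dataset.json contains a JSON list of posts and that primate_id is a zero‑based index into that list. The helper names load_primate and attach_posts are hypothetical.

```python
import json
from pathlib import Path


def load_primate(path="primate_dataset.json"):
    """Read the PRIMATE file; assumed here to be a JSON list of post objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def attach_posts(labels, primate_posts):
    """Pair each label record with its PRIMATE post.

    Assumption: `primate_id` is a zero-based index into `primate_posts`.
    Records whose index falls outside the list are skipped, not guessed at.
    """
    merged = []
    for record in labels:
        idx = record["primate_id"]
        if 0 <= idx < len(primate_posts):
            merged.append({**record, "post": primate_posts[idx]})
    return merged
```

If primate_id turns out to be a different kind of identifier, only the lookup inside attach_posts needs to change.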
Language
- Language: English
Dataset Structure
Data Instances
- Example:
{ "primate_id": 1394, "answerable": 0, "mentioned": 1, "writer_symptom": 1, "quote": [ [ 1537, 1710 ] ] }
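The quote field appears to hold pairs of offsets marking the evidence span in the post. A minimal sketch for recovering the quoted text, assuming each pair is a [start, end] character range into the post text with an exclusive end (as in Python slicing); the function name extract_quotes and the toy sentence are illustrative and not part of the dataset:

```python
def extract_quotes(text, spans):
    """Return the substring of `text` for each [start, end] pair in `spans`.

    Assumption: offsets are character positions and `end` is exclusive.
    """
    return [text[start:end] for start, end in spans]


# Toy input standing in for a PRIMATE post, which is not distributed here.
toy_text = "I no longer enjoy painting."
toy_spans = [[2, 17]]
quotes = extract_quotes(toy_text, toy_spans)
```

Against real data, checking a few extracted spans by eye is a quick way to confirm whether the offsets are character‑based and end‑exclusive.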
Data Fields
- Information pending
Data Splits
- Information pending
Dataset Creation
Source Data
- Source: Extracted Reddit posts from the original PRIMATE dataset.
- Acquisition: Requires agreement to PRIMATE's terms and conditions and follows a specified acquisition workflow.
Annotation
- Process: A mental‑health professional read all posts and labeled the presence of anhedonia symptoms.
- Label Definitions:
- "mentioned": Symptom is mentioned in the text but duration or intensity cannot be inferred.
- "answerable": Clear evidence of anhedonia is present.
- "writer_symptom": Whether the symptom discussed is the author's own or a third party's.
- Annotator: The second author, who is also a clinical‑psychology intern.
Personal and Sensitive Information
- Protection: No original posts are released; only annotation results are published.
Usage Considerations
Bias Discussion
- Limitation: Manual annotation of Reddit posts is only a proxy for clinical evaluation of depressive indicators.
- Single Annotator: Only one mental‑health professional performed the annotation, precluding inter‑annotator agreement analysis.
- Label Limitation: Binary labels may not suit cases where symptom presence/absence is ambiguous; a Likert scale is recommended.
Citation Information
- Citation: If used in research, please cite the associated paper.
Source
Organization: hugging_face
Created: Unknown