tartuNLP/reddit-anhedonia

The PRIMATE dataset focuses on detecting anhedonia (loss of interest or pleasure) in mental‑health contexts. Re‑annotation by mental‑health professionals provides finer‑grained labels and textual evidence, revealing many false‑positive cases and resulting in a higher‑quality test set for anhedonia detection. The study highlights the necessity of addressing annotation quality in mental‑health datasets and advocates improved methods to enhance the reliability of NLP models for mental‑health assessment. Access to the PRIMATE dataset is required first, after which provided scripts can be used for label mapping. The dataset was created by extracting Reddit posts from the original PRIMATE collection and annotating them by mental‑health professionals. Only labels are included; the original post content is omitted.

Updated 7/1/2024

hugging_face

Description

PRIMATE Dataset Overview

Dataset Description

Dataset Summary

Topic: Addresses annotation quality issues in the PRIMATE dataset, especially the lack of annotations for anhedonia symptoms.
Improvement: Re‑annotation by mental‑health professionals introduces finer‑grained labels and textual spans as evidence, identifying a large number of false positives.
Purpose: Provides a higher‑quality test set for detecting anhedonia, emphasizing the need to resolve annotation quality problems in mental‑health datasets.

Using the Dataset

Dataset File: Place the primate_dataset.json file in the script directory.
Code Example: Use Python code to map dataset labels to PRIMATE posts.

Language

Language: English

Dataset Structure

Data Instances

Example:

{
  "primate_id": 1394,
  "answerable": 0,
  "mentioned": 1,
  "writer_symptom": 1,
  "quote": [
    [
      1537,
      1710
    ]
  ]
}

Data Fields

Information pending

Data Splits

Information pending

Dataset Creation

Source Data

Source: Extracted Reddit posts from the original PRIMATE dataset.
Acquisition: Requires agreement to PRIMATE's terms and conditions and follows a specified acquisition workflow.

Annotation

Process: Mental‑health professionals read all posts and label the presence of anhedonia symptoms.
Label Definitions:
- "mentioned": Symptom is mentioned in the text but duration or intensity cannot be inferred.
- "answerable": Clear evidence of anhedonia is present.
- "writer_symptom": The author discusses their own or a third‑party's symptom.
Annotator: The second author, who is also a clinical‑psychology intern.

Personal and Sensitive Information

Protection: No original posts are released; only annotation results are published.

Usage Considerations

Bias Discussion

Limitation: Manual annotation serves as a proxy for clinical evaluation of Reddit posts as depressive indicators.
Annotator Single‑Source: Only one mental‑health professional performed the annotation, precluding inter‑annotator agreement analysis.
Label Limitation: Binary labels may not suit cases where symptom presence/absence is ambiguous; a Likert scale is recommended.

Citation Information

Citation: If used in research, please cite the associated paper.

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Please login to view download links and access full dataset details.

Topics

Mental Health

Text Classification

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.

Check Prices →