Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Category: Mental Health

Sulav/mental_health_counseling_conversations_sharegpt

Mental Health | Dialogue Data

The dataset comprises mental‑health counseling dialogues. Primary features include `Context`, `Response`, and `conversations`. Each conversation entry contains a `from` field (sender) and a `value` field (content). The training split contains 3,512 samples with a total size of 9,356,552 bytes.
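
The field layout described above can be illustrated with a minimal sketch. The sample text is hypothetical, the role labels `human` and `gpt` are an assumption based on the common ShareGPT convention, and the assumption that `conversations` mirrors `Context` and `Response` is not confirmed by this catalog entry.

```python
# Hypothetical record; only the field names come from the dataset description.
record = {
    "Context": "I have trouble sleeping before exams.",
    "Response": "That sounds stressful. Let's look at what happens the night before.",
}


def to_conversations(rec, user_role="human", assistant_role="gpt"):
    """Build a ShareGPT-style list of turns from a Context/Response pair.

    The role labels are assumptions, not values confirmed by the source.
    """
    return [
        {"from": user_role, "value": rec["Context"]},
        {"from": assistant_role, "value": rec["Response"]},
    ]


record["conversations"] = to_conversations(record)

# Each turn exposes the sender in `from` and the content in `value`.
for turn in record["conversations"]:
    print(f'{turn["from"]}: {turn["value"]}')
```

Checking the actual role labels in a few real records before relying on them is advisable, since ShareGPT-style exports vary.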

Source: Hugging Face | Updated: Mar 8, 2024 | 147 views | Linked

tartuNLP/reddit-anhedonia

Mental Health | Text Classification

This dataset targets the detection of anhedonia (loss of interest or pleasure) in mental‑health contexts and is derived from the PRIMATE dataset. Reddit posts were extracted from the original PRIMATE collection and re‑annotated by mental‑health professionals, who supplied finer‑grained labels and textual evidence. The re‑annotation revealed many false‑positive cases and produced a higher‑quality test set for anhedonia detection. The accompanying study highlights the need to address annotation quality in mental‑health datasets and advocates improved methods to make NLP models for mental‑health assessment more reliable. Only labels are included; the original post content is omitted. To use the dataset, first obtain access to PRIMATE, then use the provided scripts for label mapping.

Source: Hugging Face | Updated: Jul 1, 2024 | 140 views | Linked

EATD-Corpus

Mental Health | Data Analysis

EATD-Corpus is a dataset of audio and text files from 162 volunteers who received counseling. The training set contains data from 83 volunteers (19 depressed and 64 non‑depressed), and the validation set contains data from 79 volunteers (11 depressed and 68 non‑depressed). Each folder contains a volunteer’s depression data, including raw audio, preprocessed audio, audio transcripts, and depression scores.
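
The split counts above imply a noticeable class imbalance. The following sketch computes the per-split and overall share of depressed volunteers from the figures in the description; the dictionary layout and function name are illustrative, not part of the corpus.

```python
# Volunteer counts taken from the description above.
splits = {
    "train": {"depressed": 19, "non_depressed": 64},
    "validation": {"depressed": 11, "non_depressed": 68},
}


def summarize(split_counts):
    """Return the total and the depressed share for each split."""
    summary = {}
    for name, counts in split_counts.items():
        total = sum(counts.values())
        summary[name] = {
            "total": total,
            "depressed_rate": counts["depressed"] / total,
        }
    return summary


summary = summarize(splits)
overall_total = sum(s["total"] for s in summary.values())
overall_depressed = sum(c["depressed"] for c in splits.values())

for name, stats in summary.items():
    print(f'{name}: {stats["total"]} volunteers, {stats["depressed_rate"]:.1%} depressed')
print(f"overall: {overall_total} volunteers, {overall_depressed / overall_total:.1%} depressed")
```

With these counts, roughly 22.9% of training volunteers and 13.9% of validation volunteers are labeled depressed (18.5% overall), so metrics that are robust to imbalance, such as F1, may be more informative than plain accuracy.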

Source: GitHub | Updated: Jul 10, 2023 | 1,042 views | Linked

PsyDTCorpus

Mental Health | Natural Language Processing

PsyDTCorpus is a high‑quality multi‑turn psychological‑health dialogue dataset created by a team at South China University of Technology. It aims to simulate the personalized counseling style of a specific therapist. The dataset is built from 5,000 single‑turn long‑text dialogues: GPT‑4 is used to synthesize multi‑turn conversations from them in a single pass while modeling the Big Five personality traits of clients. The creation process draws on real‑world counseling cases to ensure complexity and diversity. PsyDTCorpus is intended mainly for psychological counseling applications, where it seeks to improve LLM performance in mental‑health support by supplying personalized counseling styles, which existing models lack.

Source: arXiv | Updated: Dec 18, 2024 | 1,128 views | Linked