High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

Sort:

Browse by Category

Sulav/mental_health_counseling_conversations_sharegpt

The dataset comprises mental‑health counseling dialogues. Primary features include `Context`, `Response`, and `conversations`. Each conversation entry contains a `from` field (sender) and a `value` field (content). The training split contains 3,512 samples with a total size of 9,356,552 bytes.

hugging_face

View Details

tartuNLP/reddit-anhedonia

Mental Health

Text Classification

The PRIMATE dataset focuses on detecting anhedonia (loss of interest or pleasure) in mental‑health contexts. Re‑annotation by mental‑health professionals provides finer‑grained labels and textual evidence, revealing many false‑positive cases and resulting in a higher‑quality test set for anhedonia detection. The study highlights the necessity of addressing annotation quality in mental‑health datasets and advocates improved methods to enhance the reliability of NLP models for mental‑health assessment. Access to the PRIMATE dataset is required first, after which provided scripts can be used for label mapping. The dataset was created by extracting Reddit posts from the original PRIMATE collection and annotating them by mental‑health professionals. Only labels are included; the original post content is omitted.

hugging_face

View Details

EATD-Corpus

Mental Health

Data Analysis

EATD-Corpus is a dataset of audio and text files from 162 volunteers who received counseling. The training set contains data from 83 volunteers (19 depressed and 64 non‑depressed), and the validation set contains data from 79 volunteers (11 depressed and 68 non‑depressed). Each folder contains a volunteer’s depression data, including raw audio, preprocessed audio, audio transcripts, and depression scores.

github

View Details

PsyDTCorpus

Mental Health

Natural Language Processing

PsyDTCorpus is a high‑quality multi‑turn psychological‑health dialogue dataset created by a team at South China University of Technology. It aims to simulate the personalized counseling style of a specific therapist. The dataset contains 5,000 single‑turn long‑text dialogues generated in a single pass with GPT‑4, modeling the five major personality traits of clients and synthesizing multi‑turn conversations. The creation process combines real‑world counseling cases to ensure complexity and diversity. PsyDTCorpus is mainly applied in psychological counseling, seeking to improve the performance of LLMs for mental‑health support by providing personalized counseling styles, addressing the lack of personalization in existing models.

arXiv

View Details