PsyDTCorpus
PsyDTCorpus is a multi-turn mental-health counseling dialogue dataset created by a team at South China University of Technology. It is designed to simulate the personalized counseling style of a specific therapist. The dataset draws on 5,000 single-turn long-text dialogues and uses GPT-4, in a single pass, to model the clients' Big Five personality traits and synthesize multi-turn conversations. Real-world counseling cases are incorporated into the creation process to preserve complexity and diversity. PsyDTCorpus targets psychological counseling applications, where it is intended to improve the performance of LLMs in mental-health support by supplying the personalized counseling styles that existing models lack.
Description
Digital Twin of a Psychotherapist Dataset (PsyDTCorpus)
Dataset Overview
- Dataset Name: PsyDTCorpus
- Source: Based on real multi‑turn counseling cases of a specific therapist, synthesized via a digital‑twin data generation framework.
- Scale:
  - Training set: 4,760 dialogues, 86,054 turns in total, about 18 turns per dialogue on average.
  - Test set: 240 dialogues, 4,311 turns in total, about 18 turns per dialogue on average.
- Format: OpenAI chat format, in which each dialogue is a list of messages with "role" and "content" fields.
- Topic Distribution: The dataset covers various topics; the distribution is shown in the topic distribution chart.
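The per-dialogue averages quoted above can be checked directly from the stated totals. The following is a small arithmetic sketch in Python; only the numbers come from the overview, and the names `SPLITS`, `average_turns`, and `summarize` are illustrative.

```python
# Figures copied from the dataset overview; the names are illustrative only.
SPLITS = {
    "train": {"dialogues": 4760, "turns": 86054},
    "test": {"dialogues": 240, "turns": 4311},
}


def average_turns(split: dict) -> float:
    """Return the mean number of turns per dialogue for one split."""
    return split["turns"] / split["dialogues"]


def summarize(splits: dict) -> dict:
    """Compute per-split averages and each split's share of all dialogues."""
    total_dialogues = sum(s["dialogues"] for s in splits.values())
    return {
        name: {
            "avg_turns": round(average_turns(s), 2),
            "share": round(s["dialogues"] / total_dialogues, 3),
        }
        for name, s in splits.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(SPLITS).items():
        print(name, stats)
```

Both splits come out close to 18 turns per dialogue (about 18.08 for training and 17.96 for test), and the test split accounts for 4.8% of the 5,000 dialogues.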
Data Generation Method
- Framework: A small number of real counseling cases are combined with Big Five personality analysis and the summarization capabilities of LLMs to generate multi-turn dialogues that reflect the therapist's language style and counseling techniques.
- Generation Scale:
  - Single-turn counseling database size: 5,000.
  - Specific therapist case count: 12 (typically not more than 20).
Dataset Download
- Download Methods:
  - Using git-lfs:
      cd <project_path>/data
      git lfs install
      git clone https://www.modelscope.cn/datasets/YIRONGCHEN/PsyDTCorpus.git
  - Using modelscope download:
      cd <project_path>/data
      mkdir PsyDTCorpus
      modelscope download --dataset YIRONGCHEN/PsyDTCorpus --include *
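Once the files are on disk, a quick way to inspect the topic distribution is to tally the `normalizedTag` field of each record. The sketch below is an assumption-laden example rather than an official loader: it assumes the download directory contains one or more `*.json` files and that each file holds a JSON array of dialogue records, neither of which is confirmed by this page. The function name `tally_topics` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def tally_topics(data_dir: str) -> Counter:
    """Count dialogues per normalizedTag across all *.json files in data_dir.

    Assumption: every file holds a JSON array of dialogue records.
    """
    counts = Counter()
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records = json.load(handle)
        for record in records:
            # Records without a tag are grouped under "unknown".
            counts[record.get("normalizedTag", "unknown")] += 1
    return counts


if __name__ == "__main__":
    # "PsyDTCorpus" is the directory created by the download commands.
    for tag, n in tally_topics("PsyDTCorpus").most_common():
        print(tag, n)
```

If the actual files use a different layout (for example, one JSON object per line), the loading step would need to change accordingly.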
Sample Entry
{
  "id": 0,
  "normalizedTag": "婚恋",
  "messages": [
    {
      "role": "system",
      "content": "You are a psychotherapist proficient in Rational Emotive Behavior Therapy (REBT), capable of providing professional guidance and support to alleviate clients' negative emotions and behavioral responses, helping them achieve personal growth and mental health. REBT includes several stages, listed below with brief descriptions of each stage..."
    },
    ...
  ]
}
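A record of this shape can be sanity-checked before it is used for fine-tuning. The following Python sketch is illustrative only: the top-level keys mirror the sample entry, but the message texts in `EXAMPLE` are invented placeholders, and the `user` and `assistant` roles are assumed from the OpenAI chat convention rather than shown in the sample. The function name `validate_record` is hypothetical.

```python
import json
from collections import Counter

# Roles used by OpenAI-style chat messages (assumed; the sample shows only "system").
ALLOWED_ROLES = {"system", "user", "assistant"}

# Placeholder record: keys match the sample entry, message texts are invented.
EXAMPLE = json.loads("""
{
  "id": 0,
  "normalizedTag": "婚恋",
  "messages": [
    {"role": "system", "content": "placeholder system prompt"},
    {"role": "user", "content": "placeholder client message"},
    {"role": "assistant", "content": "placeholder therapist reply"}
  ]
}
""")


def validate_record(record: dict) -> Counter:
    """Check the expected keys and roles, then count messages per role."""
    for key in ("id", "normalizedTag", "messages"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    counts = Counter()
    for i, message in enumerate(record["messages"]):
        role = message.get("role")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {i}: unexpected role {role!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {i}: content must be a string")
        counts[role] += 1
    return counts


if __name__ == "__main__":
    print(dict(validate_record(EXAMPLE)))
```

Raising on the first malformed message keeps the check strict; a more lenient variant could collect all problems and report them together.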
Source
Organization: arXiv
Created: 12/18/2024