
PsyDTCorpus

PsyDTCorpus is a high‑quality multi‑turn mental‑health dialogue dataset created by a team at South China University of Technology. It aims to simulate the personalized counseling style of a specific therapist. The dataset contains 5,000 multi‑turn dialogues, synthesized in a single pass with GPT‑4 from a database of 5,000 single‑turn long‑text dialogues while modeling the clients' Big Five personality traits. The synthesis draws on real‑world counseling cases to preserve complexity and diversity. PsyDTCorpus is intended mainly for psychological counseling: by capturing a personalized counseling style, it seeks to improve LLM performance in mental‑health support and to address the lack of personalization in existing models.

Updated 12/18/2024
arXiv

Description

Digital Twin of a Psychotherapist Dataset (PsyDTCorpus)

Dataset Overview

  • Dataset Name: PsyDTCorpus
  • Source: Based on real multi‑turn counseling cases of a specific therapist, synthesized via a digital‑twin data generation framework.
  • Scale:
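    • Training set: 4,760 dialogues, 86,054 turns in total, about 18 turns per dialogue on average.
    • Test set: 240 dialogues, 4,311 turns in total, about 18 turns per dialogue on average.
  • Format: OpenAI chat format (each entry holds a "messages" list of objects with "role" and "content" fields, as in the sample entry).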
  • Topic Distribution: The dataset covers various topics; the distribution is shown in the topic distribution chart.
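
The scale figures above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are the ones listed above, while the dictionary layout and function name are made up for the example.

```python
# Split sizes as listed in the dataset overview.
SPLITS = {
    "train": {"dialogues": 4760, "turns": 86054},
    "test": {"dialogues": 240, "turns": 4311},
}


def average_turns(split):
    """Return the mean number of turns per dialogue for one split."""
    return split["turns"] / split["dialogues"]


total_dialogues = sum(s["dialogues"] for s in SPLITS.values())
total_turns = sum(s["turns"] for s in SPLITS.values())

print(f"total dialogues: {total_dialogues}")  # total dialogues: 5000
print(f"total turns: {total_turns}")          # total turns: 90365
for name, split in SPLITS.items():
    # train: 18.08, test: 17.96
    print(f"{name}: {average_turns(split):.2f} turns per dialogue")
```

Both splits round to the stated average of 18 turns, and the two splits together account for the 5,000 dialogues.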

Data Generation Method

  • Framework: Starting from a small number of real counseling cases, the framework combines Big Five personality analysis with the summarization capabilities of LLMs to generate multi‑turn dialogues that reflect the therapist's language style and counseling techniques.
  • Generation Scale:
    • Single‑turn counseling database size: 5,000.
    • Real cases from the specific therapist: 12 (typically no more than 20).

Dataset Download

  • Download Methods:
    1. Using git-lfs:
      cd <project_path>/data
      git lfs install
      git clone https://www.modelscope.cn/datasets/YIRONGCHEN/PsyDTCorpus.git
      
    2. Using the modelscope CLI:
      cd <project_path>/data
      mkdir PsyDTCorpus
      modelscope download --dataset YIRONGCHEN/PsyDTCorpus --local_dir ./PsyDTCorpus --include '*'
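
Once the files are on disk, they could be loaded roughly as follows. This is a hedged sketch, not the authors' loader: the file names inside the repository are not given here, so the code simply walks the download directory for *.json files and assumes each one holds either a single entry or a list of entries with a "messages" field. The function name load_dialogues is hypothetical.

```python
import json
from pathlib import Path


def load_dialogues(root):
    """Collect dialogue entries from every *.json file under root.

    Assumption: each file contains one entry (a dict) or a list of entries,
    and a dialogue entry is any dict that has a "messages" key.
    """
    records = []
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if isinstance(data, dict):
            data = [data]
        if not isinstance(data, list):
            continue  # skip files with an unexpected top-level type
        records.extend(
            item for item in data
            if isinstance(item, dict) and "messages" in item
        )
    return records


# Example call, assuming the dataset was downloaded into ./PsyDTCorpus:
# dialogues = load_dialogues("PsyDTCorpus")
```

Filtering on the "messages" key keeps metadata files that may sit alongside the data from being counted as dialogues.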
      

Sample Entry

{
    "id": 0,
    "normalizedTag": "婚恋",
    "messages": [
        {
            "role": "system",
            "content": "You are a psychotherapist proficient in Rational Emotive Behavior Therapy (REBT), capable of providing professional guidance and support to alleviate clients' negative emotions and behavioral responses, helping them achieve personal growth and mental health. REBT includes several stages, listed below with brief descriptions of each stage..."
        },
        ...
    ]
}
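
As a minimal sketch of how an entry in this format might be checked, the Python below validates the three top-level keys shown in the sample and counts messages per role. The example record is hypothetical: its message texts are placeholders, not dataset content, and the "user" and "assistant" roles are assumed from the usual OpenAI chat convention, since the sample above only shows the "system" message.

```python
import json
from collections import Counter

# Assumption: roles follow the usual OpenAI chat convention.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(record):
    """Raise ValueError if the record does not match the sample entry's shape."""
    for key in ("id", "normalizedTag", "messages"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    for index, message in enumerate(record["messages"]):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unexpected role")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index}: content must be a string")


def role_counts(record):
    """Count how many messages each role contributes to a dialogue."""
    return Counter(message["role"] for message in record["messages"])


# Hypothetical record with placeholder text, for illustration only.
raw = """
{
    "id": 0,
    "normalizedTag": "placeholder-topic",
    "messages": [
        {"role": "system", "content": "placeholder system prompt"},
        {"role": "user", "content": "placeholder client message"},
        {"role": "assistant", "content": "placeholder therapist reply"}
    ]
}
"""

record = json.loads(raw)
validate_record(record)
counts = role_counts(record)
print(dict(counts))
```

How a "turn" is counted in the published statistics is not stated here, so the sketch reports per-role message counts rather than a turn total.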


Topics

Mental Health
Natural Language Processing

Source

Organization: arXiv

Created: 12/18/2024
