Back to datasets
Dataset assetOpen Source CommunityMulti‑Turn DialogueBilingual Dataset

botp/Azure99_blossom-chat-v3

Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.

Source
hugging_face
Created
Nov 28, 2025
Updated
Apr 21, 2024
Signals
162 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

BLOSSOM CHAT V3

Dataset Source

Derived from ShareGPT 90K, specifically designed for bilingual Chinese‑English multi‑turn dialogue fine‑tuning.

Dataset Characteristics

  • Fully distilled with GPT‑4.
  • Solves the problems of limited Chinese dialogue data and output truncation caused by ChatGPT’s length limits.
  • The released version contains 50 % of the total data, amounting to 5 K records.

Language

The dataset primarily contains Chinese and English, mixed at approximately a 1:1 ratio.

Dataset Structure

  • id: Unique identifier starting from 1.
  • conversations: Array of objects, each with role and content fields.
    • role: Either user or assistant, indicating user input or assistant output.
    • content: The corresponding textual content.

Dataset Limitations

  • May contain incoherent multi‑turn dialogues, especially in conversations involving randomness.
  • All responses are generated by gpt‑4‑0125‑preview without rigorous data verification; they may include inaccurate or severely erroneous answers.
Need downstream help?

Pair the dataset with AI analysis and content workflows.

Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.

Explore AI studio