JUHE API Marketplace
DATASET
Open Source Community

botp/Azure99_blossom-chat-v3

Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.

Updated 4/21/2024
hugging_face

Description

Dataset Overview

Dataset Name

BLOSSOM CHAT V3

Dataset Source

Derived from ShareGPT 90K, specifically designed for bilingual Chinese‑English multi‑turn dialogue fine‑tuning.

Dataset Characteristics

  • Fully distilled with GPT‑4.
  • Solves the problems of limited Chinese dialogue data and output truncation caused by ChatGPT’s length limits.
  • The released version contains 50 % of the total data, amounting to 5 K records.

Language

The dataset primarily contains Chinese and English, mixed at approximately a 1:1 ratio.

Dataset Structure

  • id: Unique identifier starting from 1.
  • conversations: Array of objects, each with role and content fields.
    • role: Either user or assistant, indicating user input or assistant output.
    • content: The corresponding textual content.

Dataset Limitations

  • May contain incoherent multi‑turn dialogues, especially in conversations involving randomness.
  • All responses are generated by gpt‑4‑0125‑preview without rigorous data verification; they may include inaccurate or severely erroneous answers.

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Login to Access

Please login to view download links and access full dataset details.

Topics

Multi‑Turn Dialogue
Bilingual Dataset

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.