botp/Azure99_blossom-chat-v3
Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.
Description
Dataset Overview
Dataset Name
BLOSSOM CHAT V3
Dataset Source
Derived from ShareGPT 90K, specifically designed for bilingual Chinese‑English multi‑turn dialogue fine‑tuning.
Dataset Characteristics
- Fully distilled with GPT‑4.
- Solves the problems of limited Chinese dialogue data and output truncation caused by ChatGPT’s length limits.
- The released version contains 50 % of the total data, amounting to 5 K records.
Language
The dataset primarily contains Chinese and English, mixed at approximately a 1:1 ratio.
Dataset Structure
- id: Unique identifier starting from 1.
- conversations: Array of objects, each with
roleandcontentfields.role: Eitheruserorassistant, indicating user input or assistant output.content: The corresponding textual content.
Dataset Limitations
- May contain incoherent multi‑turn dialogues, especially in conversations involving randomness.
- All responses are generated by gpt‑4‑0125‑preview without rigorous data verification; they may include inaccurate or severely erroneous answers.
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Please login to view download links and access full dataset details.
Topics
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.