botp/Azure99_blossom-chat-v3

Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.

Updated 4/21/2024

hugging_face

Description

Dataset Overview

Dataset Name

BLOSSOM CHAT V3

Dataset Source

Derived from ShareGPT 90K, specifically designed for bilingual Chinese‑English multi‑turn dialogue fine‑tuning.

Dataset Characteristics

Fully distilled with GPT‑4.
Solves the problems of limited Chinese dialogue data and output truncation caused by ChatGPT’s length limits.
The released version contains 50 % of the total data, amounting to 5 K records.

Language

The dataset primarily contains Chinese and English, mixed at approximately a 1:1 ratio.

Dataset Structure

id: Unique identifier starting from 1.
conversations: Array of objects, each with role and content fields.
- role: Either user or assistant, indicating user input or assistant output.
- content: The corresponding textual content.

Dataset Limitations

May contain incoherent multi‑turn dialogues, especially in conversations involving randomness.
All responses are generated by gpt‑4‑0125‑preview without rigorous data verification; they may include inaccurate or severely erroneous answers.

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Please login to view download links and access full dataset details.

Topics

Multi‑Turn Dialogue

Bilingual Dataset

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.

Check Prices →