botp/Azure99_blossom-chat-v3
Multi‑Turn DialogueBilingual Dataset
Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.
Source hugging_faceUpdated Apr 21, 2024162 viewsLinked
Inspect dataset