Explore high-quality datasets for your AI and machine learning projects.
Blossom Chat V3 is a bilingual Chinese‑English dialogue dataset derived from ShareGPT 90K, suitable for multi‑turn dialogue fine‑tuning. The dataset is fully distilled using GPT‑4, addressing the scarcity of Chinese dialogue data and the output truncation problem. Chinese and English data are mixed in roughly a 1:1 ratio; each record represents a complete multi‑turn conversation containing an `id` and a `conversations` field. The `conversations` field includes `role` and `content`, representing user input and assistant output respectively. The dataset exhibits issues such as incoherent multi‑turn dialogues and inaccurate answers.