alpaca-chinese-dataset

This dataset is a mixed Chinese-English corpus built for bilingual fine-tuning, and its data is corrected on an ongoing basis. The original English Alpaca dataset has numerous issues, such as incorrect math samples, mislabeled output fields, and misaligned tags. This dataset fixes those problems, translates the corrected samples into Chinese, and manually rewrites instructions where a literal translation would lose rhyme, break tense consistency, or miss other nuances. The work covers five areas: (1) fixing problems in the original English data; (2) translating the data into Chinese; (3) adjusting samples that a direct translation would distort; (4) leaving code and other special outputs untranslated; and (5) aligning special tags and refusal outputs.
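Because the corpus mixes Chinese and English records, a bilingual fine-tuning run may need to separate or rebalance the two languages. The sketch below is a hypothetical helper, not part of the dataset's own tooling. It assumes a record counts as Chinese when its `instruction` field contains at least one character from the CJK Unified Ideographs block (U+4E00 to U+9FFF), which is a rough heuristic that ignores extension blocks and CJK punctuation.

```python
import re
from typing import Dict, List, Tuple

# Matches characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).
# Rough heuristic (assumption): extension blocks and CJK punctuation are ignored.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def is_chinese_record(record: Dict[str, str]) -> bool:
    """Treat a record as Chinese if its instruction contains any CJK ideograph."""
    return CJK_PATTERN.search(record.get("instruction", "")) is not None


def split_by_language(
    records: List[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split a mixed corpus into (chinese, english) lists using the heuristic above."""
    chinese: List[Dict[str, str]] = []
    english: List[Dict[str, str]] = []
    for record in records:
        if is_chinese_record(record):
            chinese.append(record)
        else:
            english.append(record)
    return chinese, english
```

Checking only the instruction field is a deliberate choice: an English instruction can carry a Chinese `input` (as in the "hidden message" sample), and such a record would be misclassified if every field were scanned.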

Updated 3/27/2023

GitHub

Description

alpaca-chinese-dataset Overview

Dataset Description

Name: Alpaca Chinese instruction fine-tuning dataset

Data Generation Methods

Machine Translation
Self-Instruct

Data Format

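The same JSON format as the original Alpaca data: a JSON array of records, each with "instruction", "input" (an empty string when the task needs no input), and "output" fields.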
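Since the format is a single JSON array of objects with three string fields, a loader can check that shape before training. The following is a minimal sketch under that assumption; `validate_alpaca_records` and `load_alpaca_json` are hypothetical names, and the file path passed in is a placeholder rather than a file name taken from the dataset.

```python
import json
from typing import Any, Dict, List

# Field names follow the Alpaca records shown in the samples.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_alpaca_records(data: Any) -> List[Dict[str, str]]:
    """Check that data is a list of objects with string instruction/input/output fields."""
    if not isinstance(data, list):
        raise ValueError("top-level JSON value must be a list")
    for i, record in enumerate(data):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not a JSON object")
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                raise ValueError(f"record {i}: field {field!r} missing or not a string")
    return data


def load_alpaca_json(path: str) -> List[Dict[str, str]]:
    """Load an Alpaca-format JSON file (hypothetical path) and validate its records."""
    # UTF-8 is required so the Chinese text decodes correctly on every platform.
    with open(path, encoding="utf-8") as f:
        return validate_alpaca_records(json.load(f))
```

An empty `input` still passes validation because it is a string; only a missing or non-string field is rejected.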

Data Samples

```json
[
  {
    "instruction": "Determine the nearest airport given specific coordinates.",
    "input": "40.728157, -73.794853",
    "output": "The nearest airport to the given coordinates is LaGuardia Airport (LGA) in New York City."
  },
  {
    "instruction": "Output a list of different kinds of fruit",
    "input": "",
    "output": "1. Apple\n2. Banana\n3. Orange\n4. Mango\n5. Strawberry\n6. Grape\n7. Blueberry\n8. Cherry\n9. Kiwi\n10. Melon\n11. Pineapple\n12. Plum\n13. Peach"
  },
  {
    "instruction": "Find the hidden message in the string",
    "input": "业余咖啡",
    "output": "The hidden message is \"Coffee-Lover\"."
  }
]
```
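For instruction fine-tuning, each record is usually rendered into a prompt and a target. This page does not prescribe a template, so the sketch below assumes the prompt wording published in the Stanford Alpaca repository, which switches to a shorter template when `input` is empty (as in the fruit sample). `build_training_pair` is a hypothetical helper.

```python
from typing import Dict

# Template wording assumed from the Stanford Alpaca repository;
# this dataset page itself does not specify a prompt format.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Render a record into a prompt/target pair, choosing the template by whether input is empty."""
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    # str.format ignores the unused "input" keyword when the no-input template is chosen.
    prompt = template.format(instruction=record["instruction"], input=record.get("input", ""))
    return {"prompt": prompt, "target": record["output"]}
```

Keeping the target separate from the prompt makes it straightforward to compute the training loss on the response tokens only, if the training setup calls for that.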


Topics

Instruction Fine-tuning

Machine Translation

Source

Organization: GitHub

Created: 3/19/2023
