erhwenkuo/alpaca-data-gpt4-chinese-zhtw
alpaca-data-gpt4-chinese-zhtw is a Traditional Chinese instruction‑following dataset generated by GPT‑4 for fine‑tuning large language models. It originates from a GitHub repository and is a Chinese translation of the original English version. It comprises roughly 52K entries (52,049 samples in the train split) in the same format as the Alpaca dataset, but with outputs generated by GPT‑4. Each entry has three fields: instruction (the task description), input (optional task context or input), and output (the GPT‑4‑generated answer). Because the responses come from GPT‑4 rather than the model used for the original Alpaca dataset, they are described as higher in quality and longer. The dataset is suitable for text generation, dialogue, and question‑answering tasks.
Description
Dataset Overview
Dataset Name
- Name: alpaca-data-gpt4-chinese-zhtw
Dataset Description
- Description: This dataset contains traditional Chinese instruction‑following data generated by GPT‑4 using Alpaca prompts for fine‑tuning large language models (LLMs).
Dataset Structure
- Features:
instruction: A string describing the task the model should perform.
input: A string providing optional context or input for the task.
output: A string containing the answer generated by GPT‑4.
- Splits:
train: 33,817,106 bytes, 52,049 samples.
- Download Size: 22,275,874 bytes
- Dataset Size: 33,817,106 bytes
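The three fields described above can be combined into a single training string. The following is a minimal sketch, not the dataset author's method: the prompt wording is assumed to follow the widely used Stanford Alpaca template, which this card does not prescribe, and the helper names `build_prompt` and `build_training_text` as well as the sample record are hypothetical.

```python
from typing import Dict, Optional

# Assumption: these templates mirror the common Stanford Alpaca prompt wording.
# The dataset card only defines the fields, not how they should be rendered.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: Dict[str, Optional[str]]) -> str:
    """Render the prompt part of one record (hypothetical helper).

    The `input` field is optional, so an empty or missing value selects
    the shorter template without an "### Input:" section.
    """
    context = (record.get("input") or "").strip()
    if context:
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=context
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def build_training_text(record: Dict[str, Optional[str]]) -> str:
    """Concatenate the prompt and the GPT-4 answer into one training string."""
    return build_prompt(record) + (record.get("output") or "")


# Hypothetical record, invented for illustration; it is not taken from the dataset.
sample = {
    "instruction": "將下列句子翻譯成英文。",
    "input": "今天天氣很好。",
    "output": "The weather is nice today.",
}
print(build_training_text(sample))
```

Keeping the prompt and the answer in separate helpers makes it straightforward to mask the prompt tokens during fine-tuning, if the training setup calls for computing the loss on the answer only.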
Task Categories
- Categories:
- Text Generation
- Dialogue
- Question Answering
Language
- Language: Traditional Chinese (zh-TW)
Configuration
- Configuration Name: default
- Data Files:
split: train
path: data/train-*
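Given the configuration name and split listed above, loading the data might look like the sketch below. It assumes the dataset is published on the Hugging Face Hub under the repository id shown in the page title and that the `datasets` library is installed; the wrapper name `load_zhtw_alpaca` is hypothetical.

```python
def load_zhtw_alpaca(split: str = "train"):
    """Load one split of the dataset (hypothetical convenience wrapper).

    Assumptions: the repository id below is taken from the page title, and
    the data is reachable through the Hugging Face Hub. Calling this function
    requires network access and the `datasets` package.
    """
    # Imported lazily so that defining the function needs no extra dependency.
    from datasets import load_dataset

    return load_dataset(
        "erhwenkuo/alpaca-data-gpt4-chinese-zhtw",
        name="default",  # configuration name listed on this card
        split=split,     # the card lists a single split, "train"
    )


# Example usage (left commented out because it downloads the data):
# ds = load_zhtw_alpaca()
# print(ds[0]["instruction"])
```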
Tags
- Tags:
- gpt4
- alpaca
- instruction-finetuning
Pretty Name
- Pretty Name: alpaca-data-gpt4-chinese-zhtw
Size Category
- Size Category: 10K<n<100K
License Information
- License: Creative Commons Attribution-NonCommercial 4.0 (CC BY‑NC 4.0)
Source
Organization: hugging_face
Created: Unknown