erhwenkuo/alpaca-data-gpt4-chinese-zhtw
The dataset alpaca-data-gpt4-chinese-zhtw contains traditional Chinese instruction‑following data generated by GPT‑4 for fine‑tuning large language models. The dataset originates from a GitHub repository and is a Chinese translation of the original English version. It comprises 52K instruction‑following entries, formatted like the Alpaca dataset but with outputs generated by GPT‑4. The three primary fields are: instruction (task description), input (optional task context or input), and output (GPT‑4‑generated answer). Compared with the original Alpaca dataset, this version leverages GPT‑4 for response generation, resulting in higher‑quality and longer responses. The dataset is suitable for text generation, dialogue, and question‑answering tasks.
Dataset description and usage context
Dataset Overview
Dataset Name
- Name: alpaca-data-gpt4-chinese-zhtw
Dataset Description
- Description: This dataset contains traditional Chinese instruction‑following data generated by GPT‑4 using Alpaca prompts for fine‑tuning large language models (LLMs).
Dataset Structure
- Features:
 - instruction: A string describing the task the model should perform.
 - input: A string providing optional context or input for the task.
 - output: A string containing the answer generated by GPT‑4.
- Splits:
train: 33,817,106 bytes, 52,049 samples.
- Download Size: 22,275,874 bytes
- Dataset Size: 33,817,106 bytes
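For fine‑tuning, the three fields are typically flattened into a single Alpaca‑style prompt string. The sketch below shows one common way to do this; the template wording and the sample record are illustrative assumptions, not taken from the dataset itself:

```python
# Minimal sketch: combine instruction/input/output into an Alpaca-style
# training prompt. The template text and the sample record below are
# illustrative assumptions, not actual dataset content.

def build_prompt(record: dict) -> str:
    """Render one record as a single prompt string.

    Records with a non-empty `input` get an extra input section;
    records without one use the shorter template.
    """
    if record.get("input"):
        return (
            "以下是一個描述任務的指令,以及提供背景的輸入。請寫出適當的回應。\n\n"
            f"### 指令:\n{record['instruction']}\n\n"
            f"### 輸入:\n{record['input']}\n\n"
            f"### 回應:\n{record['output']}"
        )
    return (
        "以下是一個描述任務的指令。請寫出適當的回應。\n\n"
        f"### 指令:\n{record['instruction']}\n\n"
        f"### 回應:\n{record['output']}"
    )

# Hypothetical record in the dataset's schema.
example = {
    "instruction": "將下列句子翻譯成英文。",
    "input": "今天天氣很好。",
    "output": "The weather is nice today.",
}
print(build_prompt(example))
```

Because `input` is optional, branching on its presence keeps prompts for input‑free tasks from containing an empty section.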
Task Categories
- Categories:
- Text Generation
- Dialogue
- Question Answering
Language
- Language: Chinese
Configuration
- Configuration Name: default
- Data Files:
 - split: train
 - path: data/train-*
Tags
- Tags:
- gpt4
- alpaca
- instruction-finetuning
Pretty Name
- Pretty Name: alpaca-data-gpt4-chinese-zhtw
Size Category
- Size Category: 10K<n<100K
License Information
- License: Creative Commons Attribution‑NonCommercial 4.0 (CC BY‑NC 4.0)