Dataset asset · Open Source Community · Instruction Fine‑tuning · Chinese QA
TigerResearch/sft_zh
The Chinese sft‑zh data collection from the Tigerbot open‑source project. It bundles several Chinese datasets: Alpaca‑Chinese, encyclopedia QA, classic literature QA, riddles, reading comprehension, general QA, and Zhihu QA. Loading this one collection avoids downloading each component separately.
Source
hugging_face
Created
Nov 28, 2025
Updated
Jun 9, 2023
Signals
584 views
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
This dataset is the Chinese sft‑zh fine‑tuning collection of the Tigerbot open‑source project. It includes the other Chinese SFT datasets released by the organization, so they do not need to be downloaded separately.
Usage
import datasets
ds_sft = datasets.load_dataset('TigerResearch/sft_zh')
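Once loaded, each record usually has to be turned into a single training string. The sketch below is one possible way to do that, not the project's own preprocessing. It assumes Alpaca-style fields named `instruction`, `input`, and `output`, and a split named `train`; neither is confirmed on this page, so check `ds.column_names` before relying on them. The helper names `load_sft_zh` and `format_example` are hypothetical.

```python
def load_sft_zh(split="train"):
    """Load one split of the collection (needs the `datasets` package and network access).

    The split name "train" is an assumption; adjust it if the dataset differs.
    """
    import datasets  # imported lazily so format_example works without it installed

    return datasets.load_dataset("TigerResearch/sft_zh", split=split)


def format_example(record):
    """Join instruction, optional input, and output into one training string.

    The field names used here are assumed, not taken from this page.
    """
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("input") or "").strip()
    answer = (record.get("output") or "").strip()
    # Only add the context line when the record actually carries one.
    prompt = instruction if not context else f"{instruction}\n{context}"
    return f"{prompt}\n\n{answer}"


# Example usage (downloads data, so it is left commented out):
# ds = load_sft_zh()
# print(ds.column_names)
# print(format_example(ds[0]))
```

Keeping the download in its own function means the formatting logic can be tried on a hand-written record before any data is fetched.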
File Breakdown
| Type | Language | Dataset File | Size |
|---|---|---|---|
| Alpaca Chinese | Chinese | tigerbot-alpaca-zh-0.5m | 0.5m |
| Encyclopedia QA | Chinese | tigerbot-wiki-qa-1k | 1k |
| Classic Literature QA | Chinese | tigerbot-book-qa-1k | 1k |
| Riddles | Chinese | tigerbot-riddle-qa-1k | 1k |
| Reading Comprehension | Chinese | tigerbot-superclue-c3-zh-5k | 5k |
| General QA | Chinese | tigerbot-hc3-zh-12k | 12k |
| Zhihu QA | Chinese | tigerbot-zhihu-zh-10k | 10k |
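To get a rough sense of how the collection is weighted, the size labels in the table can be converted to numbers and summed. This is a minimal sketch that assumes the Size column gives approximate example counts, with `k` meaning thousand and `m` meaning million; the labels are rounded, so the total is only an estimate. The names `FILES` and `parse_count` are hypothetical.

```python
# Size labels copied from the table above.
FILES = {
    "tigerbot-alpaca-zh-0.5m": "0.5m",
    "tigerbot-wiki-qa-1k": "1k",
    "tigerbot-book-qa-1k": "1k",
    "tigerbot-riddle-qa-1k": "1k",
    "tigerbot-superclue-c3-zh-5k": "5k",
    "tigerbot-hc3-zh-12k": "12k",
    "tigerbot-zhihu-zh-10k": "10k",
}


def parse_count(label):
    """Convert a label such as '0.5m' or '12k' to an integer count."""
    multipliers = {"k": 1_000, "m": 1_000_000}
    suffix = label[-1].lower()
    if suffix in multipliers:
        return round(float(label[:-1]) * multipliers[suffix])
    return int(label)


COUNTS = {name: parse_count(label) for name, label in FILES.items()}
TOTAL = sum(COUNTS.values())

for name, count in COUNTS.items():
    print(f"{name}: ~{count:,} ({count / TOTAL:.1%})")
print(f"total: ~{TOTAL:,}")
```

Under that reading, the Alpaca Chinese file accounts for most of the collection, which is worth keeping in mind if a balanced mix of task types matters for fine-tuning.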