Tags: Dataset asset, Open Source Community, Instruction Fine-tuning, Chinese QA

TigerResearch/sft_zh

The Chinese supervised fine-tuning (SFT) data collection, sft_zh, from the Tigerbot open-source project. It combines several Chinese datasets (Alpaca-Chinese, encyclopedia QA, classic literature QA, riddles, reading comprehension, general QA, and Zhihu QA), so they can be used together without downloading each one separately.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 9, 2023
Signals: 584 views
Availability: Linked source ready

Dataset Overview

This dataset is the Chinese SFT collection (sft_zh) of the Tigerbot open-source project. It bundles the other Chinese SFT datasets released by TigerResearch, so there is no need to download them individually.

Usage

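import datasets

# Downloads the whole collection (cached locally after the first call)
# and returns a DatasetDict keyed by split name.
ds_sft = datasets.load_dataset('TigerResearch/sft_zh')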
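Once loaded, each record still has to be turned into a prompt/target pair for fine-tuning. The sketch below assumes Alpaca-style `instruction`, `input` and `output` fields (worth confirming with `ds_sft['train'].column_names`, which itself assumes a `train` split). The prompt template and the `build_example` helper are hypothetical illustrations, not Tigerbot's own training format.

```python
from collections.abc import Mapping

# Hypothetical Alpaca-style templates; not necessarily what Tigerbot used in training.
PROMPT_WITH_INPUT = "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
PROMPT_NO_INPUT = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_example(record: Mapping[str, str]) -> tuple[str, str]:
    """Split one record into (prompt, target).

    Assumes the record has 'instruction' and 'output' keys and an optional
    'input' key; a missing or empty 'input' selects the shorter template.
    """
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    if context:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    return prompt, record["output"].strip()


if __name__ == "__main__":
    # Made-up record for illustration only; not taken from the dataset.
    sample = {
        "instruction": "把下面的句子翻译成英文。",
        "input": "今天天气很好。",
        "output": "The weather is nice today.",
    }
    prompt, target = build_example(sample)
    print(prompt + target)
```

Keeping the prompt and the target separate makes it straightforward to exclude the prompt tokens from the training loss, which is a common SFT choice.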

File Breakdown

| Type | Language | Dataset file | Size (approx. examples) |
|---|---|---|---|
| Alpaca Chinese | Chinese | tigerbot-alpaca-zh-0.5m | 0.5m |
| Encyclopedia QA | Chinese | tigerbot-wiki-qa-1k | 1k |
| Classic Literature QA | Chinese | tigerbot-book-qa-1k | 1k |
| Riddles | Chinese | tigerbot-riddle-qa-1k | 1k |
| Reading Comprehension | Chinese | tigerbot-superclue-c3-zh-5k | 5k |
| General QA | Chinese | tigerbot-hc3-zh-12k | 12k |
| Zhihu QA | Chinese | tigerbot-zhihu-zh-10k | 10k |
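The size figures (also embedded in each file name) appear to be approximate example counts. Treating them that way, a quick tally shows how the combined collection is weighted. This is a sketch assuming the k/m suffixes mean thousands/millions of records; `parse_size` and `mixture_shares` are hypothetical helpers, and the result is not an official statistic.

```python
# Approximate sizes copied from the file-breakdown table.
# Assumption: the k/m suffixes mean thousands/millions of examples.
SUBSET_SIZES = {
    "tigerbot-alpaca-zh-0.5m": "0.5m",
    "tigerbot-wiki-qa-1k": "1k",
    "tigerbot-book-qa-1k": "1k",
    "tigerbot-riddle-qa-1k": "1k",
    "tigerbot-superclue-c3-zh-5k": "5k",
    "tigerbot-hc3-zh-12k": "12k",
    "tigerbot-zhihu-zh-10k": "10k",
}

_MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_size(label: str) -> int:
    """Turn a size label such as '0.5m' or '12k' into an integer count."""
    label = label.strip().lower()
    if label and label[-1] in _MULTIPLIERS:
        return round(float(label[:-1]) * _MULTIPLIERS[label[-1]])
    return int(label)


def mixture_shares(sizes: dict[str, str]) -> dict[str, float]:
    """Fraction of the combined collection contributed by each subset."""
    counts = {name: parse_size(label) for name, label in sizes.items()}
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


if __name__ == "__main__":
    total = sum(parse_size(label) for label in SUBSET_SIZES.values())
    print(total)  # → 530000
    for name, share in mixture_shares(SUBSET_SIZES).items():
        print(f"{name:<30} {share:.1%}")
```

On these figures the Alpaca subset is roughly 94% of all examples (500k of about 530k), so anyone who wants the smaller QA sets to carry more weight would need to up-sample them or cap the Alpaca portion.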