TigerResearch/sft_zh

Chinese SFT (supervised fine-tuning) data collection from the Tigerbot open-source project. It brings together multiple Chinese datasets, including Alpaca-Chinese, encyclopedia QA, classic literature QA, riddles, reading comprehension, general QA, and Zhihu QA, so the collection can be used directly without downloading each dataset separately.

Updated 6/9/2023

hugging_face

Dataset Overview

This dataset is the Chinese SFT (sft-zh) fine-tuning collection of the Tigerbot open-source project. It bundles the other Chinese SFT datasets released by TigerResearch, so they do not need to be downloaded separately.

Usage

import datasets

# Download (or load from the local cache) all splits of the collection
ds_sft = datasets.load_dataset('TigerResearch/sft_zh')
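Once loaded, each record can be turned into a prompt/response pair for supervised fine-tuning. The sketch below assumes Alpaca-style fields named `instruction`, `input`, and `output`; verify the real column names first (for example with `ds_sft['train'].column_names`, assuming a `train` split exists). The prompt template and the `to_prompt_pair` helper are hypothetical choices for illustration, not something prescribed by the Tigerbot project.

```python
from typing import Dict, Optional, Tuple

# Hypothetical Alpaca-style templates; NOT taken from the Tigerbot project.
PROMPT_WITH_INPUT = "Instruction:\n{instruction}\n\nInput:\n{input}\n\nResponse:\n"
PROMPT_NO_INPUT = "Instruction:\n{instruction}\n\nResponse:\n"


def to_prompt_pair(record: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Split one record into (prompt, target).

    Assumes the fields 'instruction', 'input' and 'output' (an assumption
    to check against the actual dataset columns). Missing or None values
    are treated as empty strings.
    """
    instruction = (record.get("instruction") or "").strip()
    extra_input = (record.get("input") or "").strip()
    if extra_input:
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=extra_input)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    target = (record.get("output") or "").strip()
    return prompt, target


# A made-up record for illustration only (not taken from the dataset).
example = {
    "instruction": "Translate the sentence into English.",
    "input": "你好，世界",
    "output": "Hello, world",
}
prompt, target = to_prompt_pair(example)
print(prompt + target)
```

Keeping the prompt and target separate makes it easy to compute the training loss only on the response tokens, a common SFT choice.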

File Breakdown

Type	Language	Dataset File	Size (approx. examples)
Alpaca Chinese	Chinese	tigerbot-alpaca-zh-0.5m	0.5m
Encyclopedia QA	Chinese	tigerbot-wiki-qa-1k	1k
Classic Literature QA	Chinese	tigerbot-book-qa-1k	1k
Riddles	Chinese	tigerbot-riddle-qa-1k	1k
Reading Comprehension	Chinese	tigerbot-superclue-c3-zh-5k	5k
General QA	Chinese	tigerbot-hc3-zh-12k	12k
Zhihu QA	Chinese	tigerbot-zhihu-zh-10k	10k
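The size labels in the table use the same `k`/`m` suffixes as the file names. A small sketch, assuming `k` means thousands and `m` means millions of examples, converts these nominal labels to counts and totals them; the actual row counts may differ slightly from the labels. The `parse_size` helper is hypothetical.

```python
# Nominal sizes copied from the table; "k" = thousand, "m" = million (assumed).
SIZES = {
    "tigerbot-alpaca-zh-0.5m": "0.5m",
    "tigerbot-wiki-qa-1k": "1k",
    "tigerbot-book-qa-1k": "1k",
    "tigerbot-riddle-qa-1k": "1k",
    "tigerbot-superclue-c3-zh-5k": "5k",
    "tigerbot-hc3-zh-12k": "12k",
    "tigerbot-zhihu-zh-10k": "10k",
}

MULTIPLIERS = {"k": 1_000, "m": 1_000_000}


def parse_size(label: str) -> int:
    """Convert a label such as '0.5m' or '12k' into an approximate count."""
    value, suffix = label[:-1], label[-1].lower()
    return round(float(value) * MULTIPLIERS[suffix])


total = sum(parse_size(label) for label in SIZES.values())
for name, label in SIZES.items():
    count = parse_size(label)
    print(f"{name:<30} {count:>8,} ({count / total:.1%})")
print(f"{'total':<30} {total:>8,}")
```

With these nominal figures the collection holds about 530k examples, of which the Alpaca-Chinese file supplies about 500k (roughly 94%); the other six files together contribute about 30k.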
