Dataset asset | Open Source Community | Natural Language Processing | Chinese Language Models

BelleGroup/train_0.5M_CN

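Contains approximately 500,000 Chinese instruction examples generated by the BELLE project. Each example consists of an instruction, an input (empty for every record in this dataset), and an output.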

Source

hugging_face

Created

Nov 28, 2025

Updated

Apr 3, 2023

Availability

Linked source ready

Overview

Dataset description and usage context

Dataset overview

Basic information

License: GPL-3.0
Task category: text-to-text generation
Language: Chinese
Dataset size: 100,000 to 1,000,000 examples

Content description

Data source: BELLE project
Data volume: approximately 500,000 Chinese instruction examples

Data structure

Fields:
- instruction: the instruction
- input: the input (empty for every record in this dataset)
- output: the output
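
The three fields above can be illustrated with a minimal Python sketch. It assumes each record is available as a single JSON object with exactly these keys; the on-disk file format is not stated on this page, and the sample line, the InstructionRecord class, and the helper names are hypothetical, made up purely for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class InstructionRecord:
    """One example with the three documented fields."""
    instruction: str
    input: str
    output: str


def parse_record(line: str) -> InstructionRecord:
    """Parse one JSON object and check that all three fields are present."""
    raw = json.loads(line)
    missing = {"instruction", "input", "output"} - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return InstructionRecord(raw["instruction"], raw["input"], raw["output"])


def build_prompt(record: InstructionRecord) -> str:
    """Combine instruction and input; an empty input is simply skipped."""
    if record.input:
        return f"{record.instruction}\n{record.input}"
    return record.instruction


# Hypothetical sample line (assumption: not taken from the actual dataset).
sample_line = json.dumps(
    {"instruction": "把下面的句子翻译成英文：你好。", "input": "", "output": "Hello."},
    ensure_ascii=False,
)
record = parse_record(sample_line)
prompt = build_prompt(record)
```

Because the input field is described as empty throughout this dataset, build_prompt would in practice return the instruction unchanged; the non-empty branch is kept only so the same helper could handle records that do carry an input.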

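Usage restrictions

Permitted use: research purposes only
Prohibited uses: commercial use, or any use that could cause harm to society
Disclaimer: This dataset does not represent any position, interest, or viewpoint. The project accepts no liability for any damage or dispute arising from use of this dataset.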
