Dataset asset · Open Source Community · Machine Learning · Natural Language Processing

Starlento/DPO-En-Zh-20k-handbook

This dataset is a rearranged version of the original DPO-En-Zh-20k dataset, split into 9,900 English + 9,900 Chinese samples for training and 100 + 100 for testing. Each record contains a language field, a prompt, and chosen and rejected responses (each with content and role fields), making it suitable for text-generation and question-answering tasks in both Chinese and English.

Source
Hugging Face
Created
Nov 28, 2025
Updated
May 2, 2024
Signals
86 views
Availability
Linked source ready
Overview

Dataset description and usage context

Dataset Overview

Dataset Name

  • DPO-En-Zh-20k-handbook

Dataset Features

  • language: string type
  • prompt: string type
  • rejected: list type, includes
    • content: string type
    • role: string type
  • chosen: list type, includes
    • content: string type
    • role: string type
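The feature list above can be illustrated with a minimal sketch of one record's shape. The field values below are invented placeholders (not actual dataset samples); only the field names and types follow the listed schema.

```python
# A hypothetical record matching the listed features: language and prompt
# are strings; chosen and rejected are lists of {content, role} dicts.
record = {
    "language": "en",
    "prompt": "What is preference optimization?",
    "chosen": [
        {"content": "A helpful, detailed answer.", "role": "assistant"},
    ],
    "rejected": [
        {"content": "A terse, low-quality answer.", "role": "assistant"},
    ],
}

def validate(rec):
    """Check that a record conforms to the schema above."""
    assert isinstance(rec["language"], str)
    assert isinstance(rec["prompt"], str)
    for key in ("chosen", "rejected"):
        assert isinstance(rec[key], list)
        for turn in rec[key]:
            assert isinstance(turn["content"], str)
            assert isinstance(turn["role"], str)
    return True

print(validate(record))  # True
```

This pairing of a chosen and a rejected response per prompt is the standard layout consumed by DPO-style preference-tuning pipelines.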

Dataset Splits

  • test: 200 samples, 1,354,176 bytes
  • train: 19,800 samples, 107,311,936 bytes

Dataset Size

  • Download size: 60,064,620 bytes
  • Dataset size: 108,666,112 bytes

Configuration Information

  • config_name: default
  • data_files:
    • test: path data/test-*
    • train: path data/train-*
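The data_files entries above are glob patterns that map shard files to splits. As a rough sketch (the shard filenames below are hypothetical examples, not taken from the dataset), pattern matching resolves a file path to its split like this:

```python
from fnmatch import fnmatch

# The split-to-pattern mapping from the default configuration.
data_files = {"test": "data/test-*", "train": "data/train-*"}

def split_for(path, patterns=data_files):
    """Return the split whose glob pattern matches the given file path."""
    for split, pattern in patterns.items():
        if fnmatch(path, pattern):
            return split
    return None

print(split_for("data/train-00000-of-00001.parquet"))  # train
print(split_for("data/test-00000-of-00001.parquet"))   # test
```

Loaders such as the Hugging Face `datasets` library apply this kind of pattern resolution automatically when the configuration is read.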

Task Categories

  • Text Generation
  • Question Answering

Languages

  • Chinese
  • English

Tags

  • dpo

Size Category

  • 10K<n<100K