Dataset asset · Open Source Community · Machine Learning · Natural Language Processing
Starlento/DPO-En-Zh-20k-handbook
This dataset is a rearranged version of the original DPO‑En‑Zh‑20k dataset, split into 9,900 English and 9,900 Chinese samples for training, plus 100 of each for testing. Each record contains a language tag, a prompt, a rejected response (content and role), and a chosen response (content and role), making it suitable for text generation and question-answering tasks in both Chinese and English.
Source
hugging_face
Created
Nov 28, 2025
Updated
May 2, 2024
Availability
Linked source ready
Overview
Dataset description and usage context
Dataset Overview
Dataset Name
- DPO-En-Zh-20k-handbook
Dataset Features
- language: string type
- prompt: string type
- rejected: list type, includes
- content: string type
- role: string type
- chosen: list type, includes
- content: string type
- role: string type
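The feature list above can be illustrated with a minimal sketch of one record. The field names and types come from the schema described here; the actual prompt and response texts are hypothetical placeholders, not samples from the dataset.

```python
# Hypothetical record illustrating the schema listed above.
# `chosen` and `rejected` are each a list of message dicts with
# `content` and `role` string fields.
sample = {
    "language": "en",
    "prompt": "What is the capital of France?",
    "chosen": [
        {"content": "What is the capital of France?", "role": "user"},
        {"content": "The capital of France is Paris.", "role": "assistant"},
    ],
    "rejected": [
        {"content": "What is the capital of France?", "role": "user"},
        {"content": "France is a country in Europe.", "role": "assistant"},
    ],
}

# Every message in both lists carries exactly the two keys the schema names.
assert all(
    {"content", "role"} <= set(msg)
    for msg in sample["chosen"] + sample["rejected"]
)
```

In DPO training, the `chosen` list holds the preferred conversation and `rejected` holds the dispreferred one for the same prompt.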
Dataset Splits
- test: 200 samples (1,354,176 bytes)
- train: 19,800 samples (107,311,936 bytes)
Dataset Size
- Download size: 60,064,620 bytes
- Dataset size: 108,666,112 bytes
Configuration Information
- config_name: default
- data_files:
  - test: data/test-*
  - train: data/train-*
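The `data/test-*` and `data/train-*` patterns above are glob expressions that map repository files to splits. A small sketch of that matching, using Python's standard `fnmatch` module and a hypothetical shard listing (the file names are illustrative, not the dataset's actual shards):

```python
from fnmatch import fnmatch

# Hypothetical repository file listing; the glob patterns are taken
# from the data_files configuration above.
files = [
    "data/test-00000-of-00001.parquet",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
]
patterns = {"test": "data/test-*", "train": "data/train-*"}

# Assign each file to the split whose pattern it matches.
splits = {
    name: [f for f in files if fnmatch(f, pattern)]
    for name, pattern in patterns.items()
}
```

Tools such as the Hugging Face `datasets` loader apply this kind of pattern matching automatically when the configuration declares per-split `data_files`.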
Task Categories
- Text Generation
- Question Answering
Languages
- Chinese
- English
Tags
- dpo
Size Category
- 10K<n<100K
Need downstream help?
Pair the dataset with AI analysis and content workflows.
Once the source passes your review, move straight into summarization, transformation, report drafting, or presentation generation with the JuheAI toolchain.