DATASET
Open Source Community
Starlento/DPO-En-Zh-20k-handbook
This dataset is a rearranged version of the original DPO‑En‑Zh‑20k dataset, split into 9,900 + 9,900 samples for training and 100 + 100 for testing. It contains fields such as language, prompt, rejected response (content and role), and chosen response (content and role), suitable for text generation and QA tasks in both Chinese and English.
Updated 5/2/2024
hugging_face
Description
Dataset Overview
Dataset Name
- DPO-En-Zh-20k-handbook
Dataset Features
- language: string type
- prompt: string type
- rejected: list type, includes
- content: string type
- role: string type
- chosen: list type, includes
- content: string type
- role: string type
Dataset Splits
- test: 200 samples, occupying 1 354 176 bytes
- train: 19 800 samples, occupying 107 311 936 bytes
Dataset Size
- Download size: 60 064 620 bytes
- Dataset size: 108 666 112 bytes
Configuration Information
- config_name: default
- data_files:
- test: path
data/test-* - train: path
data/train-*
- test: path
Task Categories
- Text Generation
- Question Answering
Languages
- Chinese
- English
Tags
- dpo
Size Category
- 10K<n<100K
AI studio
Generate PPTs instantly with Nano Banana Pro.
Generate PPT NowAccess Dataset
Login to Access
Please login to view download links and access full dataset details.
Topics
Natural Language Processing
Machine Learning
Source
Organization: hugging_face
Created: Unknown
Power Your Data Analysis with Premium AI Models
Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.
Enjoy a free trial and save 20%+ compared to official pricing.