JUHE API Marketplace
DATASET
Open Source Community

Starlento/DPO-En-Zh-20k-handbook

This dataset is a rearranged version of the original DPO‑En‑Zh‑20k dataset, split into 9,900 + 9,900 samples for training and 100 + 100 for testing. It contains fields such as language, prompt, rejected response (content and role), and chosen response (content and role), suitable for text generation and QA tasks in both Chinese and English.

Updated 5/2/2024
hugging_face

Description

Dataset Overview

Dataset Name

  • DPO-En-Zh-20k-handbook

Dataset Features

  • language: string type
  • prompt: string type
  • rejected: list type, includes
    • content: string type
    • role: string type
  • chosen: list type, includes
    • content: string type
    • role: string type

Dataset Splits

  • test: 200 samples, occupying 1 354 176 bytes
  • train: 19 800 samples, occupying 107 311 936 bytes

Dataset Size

  • Download size: 60 064 620 bytes
  • Dataset size: 108 666 112 bytes

Configuration Information

  • config_name: default
  • data_files:
    • test: path data/test-*
    • train: path data/train-*

Task Categories

  • Text Generation
  • Question Answering

Languages

  • Chinese
  • English

Tags

  • dpo

Size Category

  • 10K<n<100K

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Login to Access

Please login to view download links and access full dataset details.

Topics

Natural Language Processing
Machine Learning

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.