Dataset asset | Open Source Community | Text Generation | Text Evaluation

sardinelab/MT-pref

Each record contains the text fields prompt, chosen (the selected response), rejected (the rejected response), and best_response (the best response), plus the numeric fields chosen_score, rejected_score, and score_diff. The dataset has a single training split of 15,798 samples, with a total size of 26,999,980.39076906 bytes.

Source

hugging_face

Created

Nov 28, 2025

Updated

Jul 18, 2024

Overview

Dataset description and usage context

Dataset Overview

License

License Type: CC BY-NC 4.0

Dataset Information

Features

prompt: string
chosen: string
rejected: string
best_response: string
chosen_score: float64
rejected_score: float64
score_diff: float64
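
The field list above implies a simple record shape. The following is a minimal sketch of a schema check, assuming each record arrives as a plain Python dict with exactly these seven keys. The names `EXPECTED_FIELDS` and `is_well_formed` are hypothetical helpers introduced for illustration, and the sample values are invented rather than taken from the dataset.

```python
from typing import Any, Mapping

# Field names and types as listed on this page (float64 maps to Python float).
EXPECTED_FIELDS = {
    "prompt": str,
    "chosen": str,
    "rejected": str,
    "best_response": str,
    "chosen_score": float,
    "rejected_score": float,
    "score_diff": float,
}


def is_well_formed(record: Mapping[str, Any]) -> bool:
    """Return True if the record has exactly the expected keys and value types."""
    if set(record) != set(EXPECTED_FIELDS):
        return False
    return all(
        isinstance(record[name], expected_type)
        for name, expected_type in EXPECTED_FIELDS.items()
    )


# Illustrative record with invented values (not real dataset content).
example_record = {
    "prompt": "Translate this sentence into German: Good morning.",
    "chosen": "Guten Morgen.",
    "rejected": "Gute Morgen.",
    "best_response": "Guten Morgen.",
    "chosen_score": 0.9,
    "rejected_score": 0.6,
    "score_diff": 0.3,
}

print(is_well_formed(example_record))
```

A check like this can be run over a handful of rows before training to catch a renamed or missing column early.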

Data Splits

train: 15,798 samples, 26,999,980.39076906 bytes

Dataset Size

Download size: 12,613,788 bytes
Total size: 26,999,980.39076906 bytes
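
The size figures above can be combined into two rough derived quantities: the average record size and the ratio of download size to total size. This is a small arithmetic sketch using only the numbers on this page; the variable names are illustrative.

```python
# Figures copied from this page.
NUM_SAMPLES = 15_798
TOTAL_BYTES = 26_999_980.39076906
DOWNLOAD_BYTES = 12_613_788

# Average size of one record in the train split.
avg_bytes_per_sample = TOTAL_BYTES / NUM_SAMPLES

# Fraction of the total size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(f"average bytes per sample: {avg_bytes_per_sample:.1f}")
print(f"download size / total size: {download_ratio:.3f}")
```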

Configuration

configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
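
Given the configuration above, the train split could be loaded roughly as follows. This is a sketch that assumes the Hugging Face `datasets` package is installed and that the repository is reachable over the network. `load_mt_pref` and `to_preference_pair` are hypothetical helper names, and reducing each row to a prompt/chosen/rejected triple is an illustrative choice, not something this page prescribes.

```python
from typing import Any, Dict, Mapping

# Identifiers taken from this page.
REPO_ID = "sardinelab/MT-pref"
CONFIG_NAME = "default"
SPLIT = "train"


def load_mt_pref():
    """Download and return the train split (needs network access and `datasets`)."""
    # Imported lazily so the rest of this module works without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=CONFIG_NAME, split=SPLIT)


def to_preference_pair(row: Mapping[str, Any]) -> Dict[str, str]:
    """Keep only the three text fields commonly used for preference tuning."""
    return {
        "prompt": row["prompt"],
        "chosen": row["chosen"],
        "rejected": row["rejected"],
    }
```

A typical use would be `pairs = [to_preference_pair(row) for row in load_mt_pref()]`, which leaves the score fields available on the original rows if a margin filter is wanted later.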
