Dataset asset | Open Source Community | Text Generation | Text Evaluation
sardinelab/MT-pref
Each record contains four string fields: prompt, chosen (the selected response), rejected (the rejected response), and best_response (the best response). It also contains three float64 fields: chosen_score, rejected_score, and score_diff. The dataset has a single training split of 15,798 samples with a total size of 26,999,980.39076906 bytes (about 27 MB).
Source
hugging_face
Created
Nov 28, 2025
Updated
Jul 18, 2024
Dataset Overview
License
- License Type: CC BY-NC 4.0
Dataset Information
Features
- prompt: string
- chosen: string
- rejected: string
- best_response: string
- chosen_score: float64
- rejected_score: float64
- score_diff: float64
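The feature list above can be mirrored in a small record type. The following is a minimal sketch, not the dataset authors' code: `PreferencePair`, `filter_by_margin`, and `load_train_split` are hypothetical names, the example rows are made up, and the threshold of 0.2 is arbitrary. Loading uses `load_dataset` from the Hugging Face `datasets` library, imported lazily so the rest of the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PreferencePair:
    """One record, with fields named as in the feature list."""
    prompt: str
    chosen: str
    rejected: str
    best_response: str
    chosen_score: float
    rejected_score: float
    score_diff: float


def filter_by_margin(rows: Iterable[dict], min_diff: float) -> List[PreferencePair]:
    """Keep rows whose score_diff is at least min_diff (hypothetical helper)."""
    return [PreferencePair(**row) for row in rows if row["score_diff"] >= min_diff]


def load_train_split():
    """Load the train split from the Hugging Face Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # lazy import: only needed when actually downloading
    return load_dataset("sardinelab/MT-pref", split="train")


# Made-up rows for illustration only; they are NOT taken from the dataset.
example_rows = [
    {"prompt": "Translate: Hello", "chosen": "Bonjour", "rejected": "Salut toi",
     "best_response": "Bonjour", "chosen_score": 0.9, "rejected_score": 0.4,
     "score_diff": 0.5},
    {"prompt": "Translate: Thanks", "chosen": "Merci", "rejected": "Merci bien",
     "best_response": "Merci", "chosen_score": 0.8, "rejected_score": 0.7,
     "score_diff": 0.1},
]

strong_pairs = filter_by_margin(example_rows, min_diff=0.2)
```

To apply the same filter to the real data, pass the result of `load_train_split()` in place of `example_rows`; each row should then arrive as a dict keyed by the feature names.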
Data Splits
- train: 15,798 samples, 26,999,980.39076906 bytes
Dataset Size
- Download size: 12,613,788 bytes
- Total size: 26,999,980.39076906 bytes
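As a rough sanity check on the figures above, the average record size and the ratio of download size to total size can be computed directly. This is a small sketch that uses only the numbers listed on this page; the variable names are illustrative.

```python
# Figures copied from the Data Splits and Dataset Size entries above.
TRAIN_SAMPLES = 15_798
TOTAL_BYTES = 26_999_980.39076906
DOWNLOAD_BYTES = 12_613_788

# Average size of one record in the train split.
avg_bytes_per_sample = TOTAL_BYTES / TRAIN_SAMPLES

# Fraction of the total size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

print(f"{avg_bytes_per_sample:.0f} bytes per sample on average")
print(f"download is {download_ratio:.1%} of the total size")
```

On these numbers, each sample averages a little over 1,700 bytes and the download is a bit under half of the total size.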
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*