prm800k
This dataset contains the data from [openai/prm800k](https://github.com/openai/prm800k). It is organized into two configurations (phase1 and phase2), each with train and test splits. Features include labeler, timestamp, question, and label, among others; the detailed feature types are described in the README.
Updated 12/14/2024
Description
Dataset Overview
Dataset Information
Configuration phase1
- Features:
  - labeler: string
  - timestamp: string
  - generation: null
  - is_quality_control_question: bool
  - is_initial_screening_question: bool
  - question (structured):
    - problem: string
    - ground_truth_answer: string
  - label (structured):
    - steps (list):
      - completions (list):
        - text: string
        - rating: int64
        - flagged: bool
      - human_completion (structured):
        - text: string
        - rating: null
        - source: string
        - flagged: bool
        - corrected_rating: int64
      - chosen_completion: int64
    - total_time: int64
    - finish_reason: string
- Splits:
  - train: 5,185,121 bytes, 949 samples
  - test: 532,137 bytes, 106 samples
- Download Size: 1,850,110 bytes
- Dataset Size: 5,717,258 bytes
Configuration phase2
- Features:
  - labeler: string
  - timestamp: string
  - generation: int64
  - is_quality_control_question: bool
  - is_initial_screening_question: bool
  - question (structured):
    - problem: string
    - ground_truth_solution: string
    - ground_truth_answer: string
    - pre_generated_steps: sequence of string
    - pre_generated_answer: string
    - pre_generated_verifier_score: float64
  - label (structured):
    - steps (list):
      - completions (list):
        - text: string
        - rating: int64
        - flagged: bool
      - human_completion: null
      - chosen_completion: int64
    - total_time: int64
    - finish_reason: string
- Splits:
  - train: 344,736,273 bytes, 97,782 samples
  - test: 9,164,167 bytes, 2,762 samples
- Download Size: 132,668,705 bytes
- Dataset Size: 353,900,440 bytes
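The nested label feature listed above can be walked with plain dictionary access. The following is a minimal sketch, not the dataset authors' tooling. It assumes that each entry of steps carries completions, human_completion, and chosen_completion, that chosen_completion is an index into completions, and that it may be null when a human-written step was used instead. The function name `chosen_steps`, the output key `origin`, and the example record are made up for illustration.

```python
from typing import Any, Dict, List


def chosen_steps(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect the selected text and rating for every step of one record.

    Assumption: chosen_completion indexes into completions and may be None,
    in which case human_completion (if present) is used instead.
    """
    selected = []
    for step in record["label"]["steps"]:
        completions = step.get("completions") or []
        index = step.get("chosen_completion")
        human = step.get("human_completion")
        if index is not None and 0 <= index < len(completions):
            chosen = completions[index]
            selected.append(
                {"text": chosen["text"], "rating": chosen["rating"], "origin": "model"}
            )
        elif human is not None:
            selected.append(
                {"text": human["text"], "rating": human.get("rating"), "origin": "human"}
            )
        else:
            # Neither a valid index nor a human-written step is available.
            selected.append({"text": None, "rating": None, "origin": None})
    return selected


# Hypothetical record shaped like the phase1 feature list (values are invented).
example = {
    "label": {
        "steps": [
            {
                "completions": [
                    {"text": "Let x = 2.", "rating": 1, "flagged": False},
                    {"text": "Let x = 3.", "rating": -1, "flagged": False},
                ],
                "human_completion": None,
                "chosen_completion": 0,
            },
            {
                "completions": [
                    {"text": "So x + 1 = 4.", "rating": -1, "flagged": False},
                ],
                "human_completion": {
                    "text": "So x + 1 = 3.",
                    "rating": None,
                    "source": "human",
                    "flagged": False,
                    "corrected_rating": None,
                },
                "chosen_completion": None,
            },
        ],
        "total_time": 1000,
        "finish_reason": "solution",
    }
}

for row in chosen_steps(example):
    print(row["origin"], row["rating"], row["text"])
```

In phase2, human_completion is typed as null, so only the first branch and the fallback would apply there under these assumptions.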
Configuration Files
- phase1:
  - train: phase1/train-*
  - test: phase1/test-*
- phase2:
  - train: phase2/train-*
  - test: phase2/test-*
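The glob patterns above map each configuration and split to files in the repository. As a rough sketch of how such a mapping could be applied, the snippet below matches a file path against the four patterns using Python's standard `fnmatch` module. The helper name `locate` and the example file names are hypothetical; the actual file names in the repository are not given here.

```python
from fnmatch import fnmatchcase
from typing import Optional, Tuple

# Patterns copied from the configuration file listing.
DATA_FILES = {
    "phase1": {"train": "phase1/train-*", "test": "phase1/test-*"},
    "phase2": {"train": "phase2/train-*", "test": "phase2/test-*"},
}


def locate(path: str) -> Optional[Tuple[str, str]]:
    """Return (configuration, split) for a file path, or None if nothing matches."""
    for config, splits in DATA_FILES.items():
        for split, pattern in splits.items():
            if fnmatchcase(path, pattern):
                return config, split
    return None


# Hypothetical file names, used only to exercise the patterns.
print(locate("phase1/train-00000-of-00001.parquet"))
print(locate("README.md"))
```

A dictionary of this shape is also the usual form for passing per-split file patterns to a dataset loader, though which loader and repository identifier to use is not stated on this page.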
Language
- English (en)
Dataset Scale
- 10K < n < 100K
Topics
Natural Language Processing
Machine Learning
Source
Organization: huggingface
Created: 12/13/2024