prm800k

This dataset contains the data from [openai/prm800k](https://github.com/openai/prm800k). It is divided into two configurations (phase1 and phase2), each with train and test splits. Every record includes features such as labeler, timestamp, and question; the full feature types are described in the README and listed below.

Updated 12/14/2024
huggingface

Description

Dataset Overview

Dataset Information

Configuration phase1

  • Features:
    • labeler: string
    • timestamp: string
    • generation: null
    • is_quality_control_question: bool
    • is_initial_screening_question: bool
    • question (structured):
      • problem: string
      • ground_truth_answer: string
    • label (structured):
      • steps (list):
        • completions (list):
          • text: string
          • rating: int64
          • flagged: bool
        • human_completion (structured):
          • text: string
          • rating: null
          • source: string
          • flagged: bool
          • corrected_rating: int64
        • chosen_completion: int64
      • total_time: int64
      • finish_reason: string
  • Splits:
    • train: 5,185,121 bytes, 949 samples
    • test: 532,137 bytes, 106 samples
  • Download Size: 1,850,110 bytes
  • Dataset Size: 5,717,258 bytes
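The nested `label` structure above is easier to follow with a small traversal. The sketch below assumes a record has already been loaded as a plain Python dict with the phase1 layout, and that `chosen_completion` is an index into the step's `completions` list (it may be missing or null). The helper name `chosen_steps` and the sample record are invented for illustration and are not taken from the dataset.

```python
from typing import Any, Dict, List, Optional, Tuple


def chosen_steps(record: Dict[str, Any]) -> List[Tuple[str, Optional[int]]]:
    """Return (text, rating) of the chosen completion for each labeled step.

    Assumption: chosen_completion indexes into the step's completions list.
    Steps with no usable index are skipped.
    """
    result = []
    for step in record["label"]["steps"]:
        completions = step.get("completions") or []
        index = step.get("chosen_completion")
        if index is None or not (0 <= index < len(completions)):
            continue  # no chosen completion recorded for this step
        chosen = completions[index]
        result.append((chosen["text"], chosen["rating"]))
    return result


# Hypothetical record in the phase1 layout; all values are made up.
sample_phase1 = {
    "labeler": "labeler-0",
    "timestamp": "2022-01-01T00:00:00",
    "generation": None,
    "is_quality_control_question": False,
    "is_initial_screening_question": False,
    "question": {"problem": "What is 2 + 3?", "ground_truth_answer": "5"},
    "label": {
        "steps": [
            {
                "completions": [
                    {"text": "Add 2 and 3.", "rating": 1, "flagged": False},
                    {"text": "Multiply 2 and 3.", "rating": -1, "flagged": False},
                ],
                "human_completion": None,
                "chosen_completion": 0,
            },
            {
                "completions": [
                    {"text": "The answer is 5.", "rating": 1, "flagged": False},
                ],
                "human_completion": None,
                "chosen_completion": None,  # nothing chosen, so skipped
            },
        ],
        "total_time": 1000,
        "finish_reason": "solution",
    },
}

print(chosen_steps(sample_phase1))
```

Only the first step contributes a pair here, because the second step has no chosen index.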

Configuration phase2

  • Features:
    • labeler: string
    • timestamp: string
    • generation: int64
    • is_quality_control_question: bool
    • is_initial_screening_question: bool
    • question (structured):
      • problem: string
      • ground_truth_solution: string
      • ground_truth_answer: string
      • pre_generated_steps: sequence of string
      • pre_generated_answer: string
      • pre_generated_verifier_score: float64
    • label (structured):
      • steps (list):
        • completions (list):
          • text: string
          • rating: int64
          • flagged: bool
        • human_completion: null
        • chosen_completion: int64
      • total_time: int64
      • finish_reason: string
  • Splits:
    • train: 344,736,273 bytes, 97,782 samples
    • test: 9,164,167 bytes, 2,762 samples
  • Download Size: 132,668,705 bytes
  • Dataset Size: 353,900,440 bytes
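For step-level analysis it can help to flatten the phase2 `label.steps[].completions[]` nesting into one row per completion. The following is a minimal sketch under the assumption that records are plain dicts matching the phase2 feature list above; the function name `iter_rated_completions` and the sample record are hypothetical and only illustrate the shape.

```python
from typing import Any, Dict, Iterator


def iter_rated_completions(record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per completion, tagged with its step position."""
    problem = record["question"]["problem"]
    for step_index, step in enumerate(record["label"]["steps"]):
        for completion in step.get("completions") or []:
            yield {
                "problem": problem,
                "step_index": step_index,
                "text": completion["text"],
                "rating": completion["rating"],
                "flagged": completion["flagged"],
            }


# Hypothetical record in the phase2 layout; all values are made up.
sample_phase2 = {
    "labeler": "labeler-1",
    "timestamp": "2023-01-01T00:00:00",
    "generation": 3,
    "is_quality_control_question": False,
    "is_initial_screening_question": False,
    "question": {
        "problem": "What is 4 * 5?",
        "ground_truth_solution": "4 * 5 = 20.",
        "ground_truth_answer": "20",
        "pre_generated_steps": ["Multiply 4 by 5.", "The answer is 20."],
        "pre_generated_answer": "20",
        "pre_generated_verifier_score": 0.9,
    },
    "label": {
        "steps": [
            {
                "completions": [
                    {"text": "Multiply 4 by 5.", "rating": 1, "flagged": False},
                ],
                "human_completion": None,
                "chosen_completion": 0,
            },
            {
                "completions": [
                    {"text": "The answer is 20.", "rating": 1, "flagged": False},
                    {"text": "The answer is 9.", "rating": -1, "flagged": False},
                ],
                "human_completion": None,
                "chosen_completion": 0,
            },
        ],
        "total_time": 2000,
        "finish_reason": "solution",
    },
}

rows = list(iter_rated_completions(sample_phase2))
for row in rows:
    print(row["step_index"], row["rating"], row["text"])
```

Each row keeps the `flagged` field unchanged, so any filtering on it is left to the caller.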

Configuration Files

  • phase1:
    • train: phase1/train-*
    • test: phase1/test-*
  • phase2:
    • train: phase2/train-*
    • test: phase2/test-*
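The patterns above map each configuration and split to a set of files by glob. As a rough illustration of how such a mapping resolves, the sketch below matches the patterns against a list of file names with Python's `fnmatch`. The file names in `repo_files` are hypothetical placeholders, and `files_for` is an invented helper, not part of any library.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Glob patterns copied from the configuration listing above.
DATA_FILES: Dict[str, Dict[str, str]] = {
    "phase1": {"train": "phase1/train-*", "test": "phase1/test-*"},
    "phase2": {"train": "phase2/train-*", "test": "phase2/test-*"},
}


def files_for(config: str, split: str, filenames: List[str]) -> List[str]:
    """Return the file names that match the pattern for one config and split."""
    pattern = DATA_FILES[config][split]
    return sorted(name for name in filenames if fnmatch(name, pattern))


# Hypothetical repository listing; the real shard names may differ.
repo_files = [
    "README.md",
    "phase1/train-00000-of-00001.parquet",
    "phase1/test-00000-of-00001.parquet",
    "phase2/train-00000-of-00001.parquet",
    "phase2/test-00000-of-00001.parquet",
]

print(files_for("phase2", "train", repo_files))
```

With the Hugging Face `datasets` library, a configuration is typically selected by name, for example `load_dataset(repo_id, "phase2", split="train")`, where `repo_id` stands for whichever repository hosts this copy (it is not stated on this page).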

Language

  • English (en)

Dataset Scale

  • 10K < n < 100K

Topics

Natural Language Processing
Machine Learning

Source

Organization: huggingface

Created: 12/13/2024
