Dataset asset | Open Source Community | Programming Education | Code Generation

mbpp

The dataset has four features: instance_id (int32), prompt (string), canonical_solution (string), and test (string). It is divided into four splits: train (374 samples), test (500 samples), validation (90 samples), and prompt (10 samples), for 974 samples in total. Each split is stored under its own file path pattern in the default configuration. The download size is 228,122 bytes, and the total dataset size is 500,198 bytes.

Source: huggingface
Created: Dec 4, 2024
Updated: Dec 8, 2024
MBPP Dataset Overview

Dataset Information

Features

  • instance_id: data type int32
  • prompt: data type string
  • canonical_solution: data type string
  • test: data type string
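
As a rough illustration of how the four features fit together, the sketch below models one record and runs its tests against its canonical solution. The page does not show the contents of any field, so the sample record is invented, and the assumption that test holds plain Python assert statements is a guess. MbppRecord and passes_tests are hypothetical helper names.

```python
from typing import TypedDict


class MbppRecord(TypedDict):
    """One row, using the four feature names listed above."""
    instance_id: int          # int32 in the dataset schema
    prompt: str
    canonical_solution: str
    test: str


def passes_tests(record: MbppRecord) -> bool:
    """Execute the solution, then the tests, in one fresh namespace.

    Assumes `test` is Python source made of assert statements.
    """
    namespace: dict = {}
    try:
        exec(record["canonical_solution"], namespace)
        exec(record["test"], namespace)
    except Exception:
        # Any error, including a failed assert, counts as a failure.
        return False
    return True


# Hypothetical record, made up for illustration (not taken from the dataset).
example: MbppRecord = {
    "instance_id": 0,
    "prompt": "Write a function add(a, b) that returns the sum of two numbers.",
    "canonical_solution": "def add(a, b):\n    return a + b\n",
    "test": "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n",
}

print(passes_tests(example))  # True
```

Because exec runs arbitrary code, a harness like this should only be pointed at trusted records or run inside a sandbox.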

Data Splits

  • train: contains 374 samples, occupying 189,426 bytes
  • test: contains 500 samples, occupying 260,317 bytes
  • validation: contains 90 samples, occupying 45,555 bytes
  • prompt: contains 10 samples, occupying 4,900 bytes
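
The split figures above can be cross-checked with a few lines of arithmetic. This sketch only restates the numbers listed on this page; the per-split share and bytes-per-sample values are derived quantities, not figures reported by the source.

```python
# (samples, bytes) per split, copied from the list above.
splits = {
    "train": (374, 189_426),
    "test": (500, 260_317),
    "validation": (90, 45_555),
    "prompt": (10, 4_900),
}

total_samples = sum(n for n, _ in splits.values())
total_bytes = sum(b for _, b in splits.values())

print(total_samples)  # 974
print(total_bytes)    # 500198, matching the reported total dataset size

# Derived per-split statistics.
for name, (n, b) in splits.items():
    share = n / total_samples
    print(f"{name}: {share:.1%} of samples, {b / n:.0f} bytes per sample")
```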

Dataset Size

  • Download Size: 228,122 bytes
  • Total Size: 500,198 bytes

Configuration

  • config_name: default
    • data_files:
      • train: data/train-*
      • test: data/test-*
      • validation: data/validation-*
      • prompt: data/prompt-*
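
To make the glob patterns in the configuration concrete, the sketch below matches file paths against them using only the standard library. The file names in the listing are hypothetical (the page does not show the actual shard names or file format), and split_for is an illustrative helper, not part of any library.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split name -> glob pattern, as given in the default config above.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
    "validation": "data/validation-*",
    "prompt": "data/prompt-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical repository listing, for illustration only.
listing = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/prompt-00000-of-00001.parquet",
    "README.md",
]

for path in listing:
    print(path, "->", split_for(path))
```

If the files are downloaded locally, the same split-to-pattern mapping could likely be passed as the data_files argument of load_dataset in the Hugging Face datasets library, with the builder name chosen to match whatever file format the files actually use.
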