
google-research-datasets/mbpp

The Mostly Basic Python Problems (MBPP) dataset contains about 1,000 crowd-sourced Python programming problems intended for evaluating code generation models. Each problem includes a natural-language task description, a reference code solution, and three automated test cases. The dataset is provided in two versions, full and sanitized, each split into train, test, validation, and prompt partitions. It was created at Google Research to assess code-generation capabilities, with annotations produced through internal crowdsourcing.

Updated 1/4/2024

Description

Dataset Overview

Basic Information

  • Dataset Name: Mostly Basic Python Problems (mbpp)
  • Language: English
  • License: CC-BY-4.0
  • Multilinguality: Monolingual
  • Size Category: n<1K
  • Source Dataset: Raw Data
  • Task Category: Text-to-Text Generation
  • Tags: Code Generation

Dataset Structure

Configurations

  • full:

    • Features:
      • task_id: int32
      • text: string
      • code: string
      • test_list: sequence of string
      • test_setup_code: string
      • challenge_test_list: sequence of string
    • Splits:
      • train: 374 samples, 176,879 bytes
      • test: 500 samples, 244,104 bytes
      • validation: 90 samples, 42,405 bytes
      • prompt: 10 samples, 4,550 bytes
    • Download Size: 236,069 bytes
    • Dataset Size: 467,938 bytes
  • sanitized:

    • Features:
      • source_file: string
      • task_id: int32
      • prompt: string
      • code: string
      • test_imports: sequence of string
      • test_list: sequence of string
    • Splits:
      • train: 120 samples, 63,453 bytes
      • test: 257 samples, 132,720 bytes
      • validation: 43 samples, 20,050 bytes
      • prompt: 7 samples, 3,407 bytes
    • Download Size: 115,422 bytes
    • Dataset Size: 219,630 bytes

Data Examples

  • full:

    {
        "task_id": 1,
        "text": "Write a function to find the minimum cost path to reach (m, n) from (0, 0) for the given cost matrix cost[][] and a position (m, n) in cost[][]",
        "code": "R = 3\r\nC = 3\r\ndef min_cost(cost, m, n): \r\n\ttc = [[0 for x in range(C)] for x in range(R)] \r\n\ttc[0][0] = cost[0][0] \r\n\tfor i in range(1, m+1): \r\n\t\ttc[i][0] = tc[i-1][0] + cost[i][0] \r\n\tfor j in range(1, n+1): \r\n\t\ttc[0][j] = tc[0][j-1] + cost[0][j] \r\n\tfor i in range(1, m+1): \r\n\t\tfor j in range(1, n+1): \r\n\t\t\ttc[i][j] = min(tc[i-1][j-1], tc[i-1][j], tc[i][j-1]) + cost[i][j] \r\n\treturn tc[m][n]",
        "test_list": [
            "assert min_cost([[1, 2, 3], [4, 8, 2], [1, 5, 3]], 2, 2) == 8",
            "assert min_cost([[2, 3, 4], [5, 9, 3], [2, 6, 4]], 2, 2) == 12",
            "assert min_cost([[3, 4, 5], [6, 10, 4], [3, 7, 5]], 2, 2) == 16"
        ],
        "test_setup_code": "",
        "challenge_test_list": []
    }
    
  • sanitized:

    {
        "source_file": "Benchmark Questions Verification V2.ipynb",
        "task_id": 2,
        "prompt": "Write a function to find the shared elements from the given two lists.",
        "code": "def similar_elements(test_tup1, test_tup2):\n  res = tuple(set(test_tup1) & set(test_tup2))\n  return (res) ",
        "test_imports": [],
        "test_list": [
            "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
            "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
            "assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))"
        ]
    }
    

Data Fields

  • source_file: file the task was drawn from (sanitized version only)
  • text/prompt: natural-language description of the programming task (text in full, prompt in sanitized)
  • code: reference solution to the programming task
  • test_setup_code/test_imports: setup code or imports required to run the tests (test_setup_code in full, test_imports in sanitized)
  • test_list: test suite for validating the solution
  • challenge_test_list: additional, more challenging tests for deeper validation
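Taken together, these fields are enough to score a candidate solution: run any setup code, then the solution, then each assertion. A minimal sketch of this check (standard library only), using the sanitized example shown above; in practice a model's output would replace the reference `code`:

```python
# Minimal sketch of MBPP-style scoring: execute the solution and its test
# suite in a fresh namespace and report pass/fail. The record is the
# sanitized example shown above.
record = {
    "code": "def similar_elements(test_tup1, test_tup2):\n"
            "  res = tuple(set(test_tup1) & set(test_tup2))\n"
            "  return (res)",
    "test_imports": [],
    "test_list": [
        "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
        "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
    ],
}

def passes_tests(record):
    """Return True if the solution passes every test in test_list."""
    namespace = {}
    try:
        # In the full configuration, test_setup_code plays this role.
        exec("\n".join(record["test_imports"]), namespace)
        exec(record["code"], namespace)
        for test in record["test_list"]:
            exec(test, namespace)
    except Exception:
        return False
    return True

print(passes_tests(record))  # the reference solution passes its own tests
```

The same loop generalizes to challenge_test_list for deeper validation.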

Data Splits

  • Both the full and sanitized versions contain four splits: train, test, validation, and prompt. The prompt split is intended for few‑shot prompting, not for training.
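Since the prompt split exists specifically for few-shot prompting, its records can be rendered as solved examples ahead of the actual task. The template below is an illustrative assumption, not something the dataset prescribes:

```python
# Hypothetical few-shot prompt builder for MBPP. The template string is an
# assumption for illustration; the dataset only supplies the records.
def build_prompt(prompt_split, task_text):
    parts = []
    for example in prompt_split:  # records from the "prompt" split
        parts.append(
            f"Task: {example['text']}\n"
            "Tests:\n" + "\n".join(example["test_list"]) + "\n"
            f"Solution:\n{example['code']}\n"
        )
    parts.append(f"Task: {task_text}\nSolution:\n")
    return "\n".join(parts)

# One record standing in for the real prompt split.
prompt_split = [{
    "text": "Write a function to find the shared elements from the given two lists.",
    "test_list": ["assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))"],
    "code": "def similar_elements(a, b):\n    return tuple(set(a) & set(b))",
}]
prompt = build_prompt(prompt_split, "Write a function to reverse a string.")
```

Including the test cases in the prompt mirrors how MBPP task descriptions are disambiguated by their assertions.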

Dataset Creation

  • Purpose: To evaluate code‑generation capabilities, a collection of simple programming tasks and their solutions was assembled.
  • Source: The dataset was built from scratch by internal crowdsourcing efforts at Google.
  • Annotation: The full version was annotated first; the sanitized version is a subset whose task descriptions were refined in a second annotation pass.

Usage Considerations

  • Execute generated Python code only in a secure sandbox, as it may be unsafe.
  • Social Impact: The dataset enables more reliable assessment of code‑generation models, helping to mitigate risks when deploying such models.
  • Known Limitations: Some task descriptions may be ambiguous or insufficient; the sanitized version aims to alleviate this through a second round of annotation improvements.
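One common, if partial, precaution for the sandboxing point above is to run generated code in a separate process with a timeout. This is a sketch only: a subprocess guards against runaway loops but does not isolate the filesystem or network, so a real harness should add OS-level sandboxing (containers, seccomp, resource limits):

```python
import subprocess
import sys

# Sketch only: a subprocess with a timeout is NOT a full sandbox.
def run_untrusted(code, tests, timeout=5.0):
    """Run solution code plus its tests in a child interpreter."""
    program = code + "\n" + "\n".join(tests)
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hang as a failed attempt
    return result.returncode == 0

ok = run_untrusted("def add(a, b):\n    return a + b",
                   ["assert add(2, 3) == 5"])
```

A nonzero return code covers both failed assertions and crashes, which matches the pass/fail grading MBPP's test lists imply.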

Additional Information

  • Curator: Google Research
  • License: CC-BY-4.0
  • Citation:
    @article{austin2021program,
      title={Program Synthesis with Large Language Models},
      author={Austin, Jacob and Odena, Augustus and Nye, Maxwell and Bosma, Maarten and Michalewski, Henryk and Dohan, David and Jiang, Ellen and Cai, Carrie and Terry, Michael and Le, Quoc and others},
      journal={arXiv preprint arXiv:2108.07732},
      year={2021}
    }
    
  • Contributors: @lvwerra


Topics

Python Programming
Code Generation

Source

Organization: hugging_face

