google-research-datasets/mbpp
The Mostly Basic Python Problems (MBPP) dataset contains about 1,000 crowd-sourced Python programming problems intended for evaluating code-generation models. Each problem includes a natural-language task description, a reference code solution, and three automated test cases. The dataset is provided in two versions, full and sanitized, each comprising train, test, validation, and prompt splits. It was created and annotated internally at Google through crowdsourcing efforts to assess the code-generation capabilities of language models.
Description
Dataset Overview
Basic Information
- Dataset Name: Mostly Basic Python Problems (mbpp)
- Language: English
- License: CC-BY-4.0
- Multilinguality: Monolingual
- Size Category: n<1K
- Source Dataset: Raw Data
- Task Category: Text-to-Text Generation
- Tags: Code Generation
Dataset Structure
Configurations
- full:
  - Features:
    - task_id: int32
    - text: string
    - code: string
    - test_list: sequence of string
    - test_setup_code: string
    - challenge_test_list: sequence of string
  - Splits:
    - train: 374 samples, 176,879 bytes
    - test: 500 samples, 244,104 bytes
    - validation: 90 samples, 42,405 bytes
    - prompt: 10 samples, 4,550 bytes
  - Download Size: 236,069 bytes
  - Dataset Size: 467,938 bytes
- sanitized:
  - Features:
    - source_file: string
    - task_id: int32
    - prompt: string
    - code: string
    - test_imports: sequence of string
    - test_list: sequence of string
  - Splits:
    - train: 120 samples, 63,453 bytes
    - test: 257 samples, 132,720 bytes
    - validation: 43 samples, 20,050 bytes
    - prompt: 7 samples, 3,407 bytes
  - Download Size: 115,422 bytes
  - Dataset Size: 219,630 bytes
Data Examples
- full:

  ```json
  {
    "task_id": 1,
    "text": "Write a function to find the minimum cost path to reach (m, n) from (0, 0) for the given cost matrix cost[][] and a position (m, n) in cost[][]",
    "code": "R = 3\r\nC = 3\r\ndef min_cost(cost, m, n): \r\n\ttc = [[0 for x in range(C)] for x in range(R)] \r\n\ttc[0][0] = cost[0][0] \r\n\tfor i in range(1, m+1): \r\n\t\ttc[i][0] = tc[i-1][0] + cost[i][0] \r\n\tfor j in range(1, n+1): \r\n\t\ttc[0][j] = tc[0][j-1] + cost[0][j] \r\n\tfor i in range(1, m+1): \r\n\t\tfor j in range(1, n+1): \r\n\t\t\ttc[i][j] = min(tc[i-1][j-1], tc[i-1][j], tc[i][j-1]) + cost[i][j] \r\n\treturn tc[m][n]",
    "test_list": [
      "assert min_cost([[1, 2, 3], [4, 8, 2], [1, 5, 3]], 2, 2) == 8",
      "assert min_cost([[2, 3, 4], [5, 9, 3], [2, 6, 4]], 2, 2) == 12",
      "assert min_cost([[3, 4, 5], [6, 10, 4], [3, 7, 5]], 2, 2) == 16"
    ],
    "test_setup_code": "",
    "challenge_test_list": []
  }
  ```

- sanitized:

  ```json
  {
    "source_file": "Benchmark Questions Verification V2.ipynb",
    "task_id": 2,
    "prompt": "Write a function to find the shared elements from the given two lists.",
    "code": "def similar_elements(test_tup1, test_tup2):\n  res = tuple(set(test_tup1) & set(test_tup2))\n  return (res) ",
    "test_imports": [],
    "test_list": [
      "assert set(similar_elements((3, 4, 5, 6),(5, 7, 4, 10))) == set((4, 5))",
      "assert set(similar_elements((1, 2, 3, 4),(5, 4, 3, 7))) == set((3, 4))",
      "assert set(similar_elements((11, 12, 14, 13),(17, 15, 14, 13))) == set((13, 14))"
    ]
  }
  ```
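A record like this is typically scored by executing the code and then running each assertion in its test_list. A minimal sketch in Python, using a copy of the sanitized example above (note the safety caveat under Usage Considerations: exec-ing untrusted model output should only happen inside a sandbox):

```python
# Minimal sketch of MBPP-style scoring: run the record's code, then its
# test_list assertions, in a fresh namespace.
# WARNING: exec() on untrusted model output is unsafe outside a sandbox.
record = {
    "code": (
        "def similar_elements(test_tup1, test_tup2):\n"
        "    return tuple(set(test_tup1) & set(test_tup2))\n"
    ),
    "test_imports": [],
    "test_list": [
        "assert set(similar_elements((3, 4, 5, 6), (5, 7, 4, 10))) == set((4, 5))",
        "assert set(similar_elements((1, 2, 3, 4), (5, 4, 3, 7))) == set((3, 4))",
    ],
}

def passes_tests(record):
    """Return True iff the record's code passes every test in test_list."""
    namespace = {}
    try:
        for imp in record.get("test_imports", []):
            exec(imp, namespace)          # e.g. "import math"
        exec(record["code"], namespace)   # define the candidate solution
        for test in record["test_list"]:
            exec(test, namespace)         # raises AssertionError on failure
    except Exception:
        return False
    return True

print(passes_tests(record))  # True: the reference solution passes
```

The same helper works unchanged for model-generated code: substitute the generated string for `record["code"]` and count how many records pass.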
Data Fields
- source_file: name of the file the problem was drawn from (sanitized version only)
- text/prompt: natural-language description of the programming task
- code: reference solution to the programming task
- test_setup_code/test_imports: imports and setup code required to run the tests
- test_list: test suite for validating the solution
- challenge_test_list: additional, more challenging tests for deeper validation (full version only)
Data Splits
- Both the full and sanitized versions contain four splits: train, test, validation, and prompt (the prompt split is intended for few-shot prompting, not for training).
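A sketch of how prompt-split records can be assembled into a k-shot prompt, loosely following the format described in the MBPP paper (task text, tests, then the solution between [BEGIN]/[DONE] markers); the field names assume the full configuration, and the records below are illustrative stand-ins, not real prompt-split rows:

```python
# Build a few-shot prompt: each solved shot shows its task text, tests,
# and reference code; the final task is left open for the model to complete.
def build_fewshot_prompt(shots, task):
    blocks = []
    for ex in shots:
        blocks.append(
            "You are an expert Python programmer, and here is your task: "
            f"{ex['text']} Your code should pass these tests:\n\n"
            + "\n".join(ex["test_list"])
            + f"\n[BEGIN]\n{ex['code']}\n[DONE]"
        )
    # The target task ends at [BEGIN]; the model is expected to produce
    # the code and a closing [DONE].
    blocks.append(
        "You are an expert Python programmer, and here is your task: "
        f"{task['text']} Your code should pass these tests:\n\n"
        + "\n".join(task["test_list"])
        + "\n[BEGIN]\n"
    )
    return "\n\n".join(blocks)

# Illustrative stand-in records (not actual MBPP rows).
shot = {
    "text": "Write a function to add two numbers.",
    "test_list": ["assert add(1, 2) == 3"],
    "code": "def add(a, b):\n    return a + b",
}
task = {
    "text": "Write a function to reverse a string.",
    "test_list": ["assert rev('ab') == 'ba'"],
}
print(build_fewshot_prompt([shot], task))
```

The exact wording and delimiters vary between evaluation harnesses; what matters is that the shots come from the prompt split so the train/test/validation splits stay uncontaminated.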
Dataset Creation
- Purpose: to evaluate the code-generation capabilities of models on a collection of simple programming tasks with reference solutions.
- Source: the dataset was built from scratch through internal crowdsourcing efforts at Google.
- Annotation: the full version was created first; a subset then received a second round of annotation with refined task descriptions, yielding the sanitized version.
Usage Considerations
- Execute generated Python code only in a secure sandbox, as it may be unsafe.
- Social Impact: The dataset enables more reliable assessment of code‑generation models, helping to mitigate risks when deploying such models.
- Known Limitations: some task descriptions may be ambiguous or underspecified; the sanitized version aims to alleviate this through a second round of annotation improvements.
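The sandbox advice above can be approximated, at minimum, by running candidate code in a child process with a timeout. This is a sketch only; a subprocess guards against hangs and crashes but is not a security boundary, and real deployments should add OS-level isolation (containers, seccomp, gVisor, or similar):

```python
import subprocess
import sys

def run_untrusted(code, tests, timeout=5.0):
    """Run candidate code plus its tests in a child Python process.

    A subprocess with a timeout protects the harness from infinite loops
    and hard crashes, but the child still has full file and network
    access -- it is NOT a sandbox for genuinely untrusted model output.
    """
    source = code + "\n" + "\n".join(tests) + "\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failures
    return proc.returncode == 0

print(run_untrusted("def inc(x):\n    return x + 1", ["assert inc(1) == 2"]))  # True
```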
Additional Information
- Curator: Google Research
- License: CC-BY-4.0
- Citation:

  ```bibtex
  @article{austin2021program,
    title={Program Synthesis with Large Language Models},
    author={Austin, Jacob and Odena, Augustus and Nye, Maxwell and Bosma, Maarten and Michalewski, Henryk and Dohan, David and Jiang, Ellen and Cai, Carrie and Terry, Michael and Le, Quoc and others},
    journal={arXiv preprint arXiv:2108.07732},
    year={2021}
  }
  ```

- Contributors: @lvwerra
Source
Organization: hugging_face