CodeFeedback-Python105K

This dataset is a subset of the `m-a-p/CodeFeedback-Filtered-Instruction` dataset that keeps the 104,848 samples written in Python. Each sample has two string features, 'query' and 'response'. All 104,848 samples sit in a single training split. The dataset is in English and suits question-answering tasks. It is tagged with the 10K to 100K size category, although its 104,848 samples slightly exceed that range.

Updated 11/14/2024
huggingface

Description

CodeFeedback-Python105K Dataset Overview

Dataset Information

  • Features:
    • query: string type
    • response: string type
  • Splits:
    • train: contains 104,848 samples, occupying 232,791,997 bytes
  • Download Size: 114,503,169 bytes
  • Dataset Size: 232,791,997 bytes
  • Configurations:
    • default: includes training data files data/train-*
  • License: Apache 2.0
  • Task Category: Question Answering
  • Language: English
  • Scale Category: 10K < n < 100K
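The schema and size figures above can be checked with a short sketch. It assumes the Hugging Face `datasets` library for loading. The names `is_valid_record` and `load_train_split` are illustrative, and `repo_id` is a placeholder because this page does not state the Hub repository name.

```python
from typing import Any, Mapping

# Figures copied from the dataset information above.
NUM_SAMPLES = 104_848
DATASET_BYTES = 232_791_997
DOWNLOAD_BYTES = 114_503_169


def is_valid_record(record: Mapping[str, Any]) -> bool:
    """Return True if the record has string 'query' and 'response' fields."""
    return all(isinstance(record.get(key), str) for key in ("query", "response"))


def load_train_split(repo_id: str):
    """Load the single 'train' split with the Hugging Face `datasets` library.

    `repo_id` is a placeholder: the Hub repository name is not given here.
    """
    # Imported lazily so the rest of the sketch runs without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")


# Simple arithmetic on the stated sizes.
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"average bytes per sample: {avg_bytes_per_sample:.0f}")
print(f"download size / dataset size: {download_ratio:.1%}")
```

With the numbers listed, a sample averages a little over 2 KB and the download is roughly half the size of the prepared dataset. Once a split is loaded, `is_valid_record` could be applied to a handful of rows as a quick sanity check.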

Dataset Source

  • This dataset is a subset extracted from the m-a-p/CodeFeedback-Filtered-Instruction dataset, which originally contains 156,526 samples.
  • The original dataset includes samples from four major open‑source code instruction tuning datasets:
    • Magicoder-OSS-Instruct
    • Python code subset of ShareGPT
    • Magicoder-Evol-Instruct
    • Evol-Instruct-Code
  • This subset keeps only the samples written in Python: 104,848 of the original 156,526.
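The extraction described above amounts to a language filter over the parent dataset. The sketch below is not the authors' script. It assumes each parent row carries a `lang` tag and stores its reply under `answer`; neither field name is confirmed on this page, and `select_python_samples` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def select_python_samples(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows tagged as Python and map them to the query/response schema.

    Assumes hypothetical 'lang' and 'answer' fields on each parent row.
    """
    return [
        {"query": row["query"], "response": row["answer"]}
        for row in rows
        if row.get("lang", "").lower() == "python"
    ]


# Tiny made-up rows, for illustration only.
toy_rows = [
    {"query": "Reverse a list", "answer": "xs[::-1]", "lang": "python"},
    {"query": "Print a line", "answer": "puts 'hi'", "lang": "ruby"},
    {"query": "Sum a list", "answer": "sum(xs)", "lang": "Python"},
]
subset = select_python_samples(toy_rows)

# Share of the parent dataset kept, using the counts stated above.
kept_fraction = 104_848 / 156_526
print(f"kept {len(subset)} of {len(toy_rows)} toy rows")
print(f"share of parent dataset kept: {kept_fraction:.1%}")
```

By the stated counts, about two thirds of the parent dataset survives the filter.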

References

@article{zheng2024opencodeinterpreter,
  title={OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement},
  author={Zheng, Tianyu and Zhang, Ge and Shen, Tianhao and Liu, Xueling and Lin, Bill Yuchen and Fu, Jie and Chen, Wenhu and Yue, Xiang},
  journal={arXiv preprint arXiv:2402.14658},
  year={2024}
}

@article{meng2024pissa,
  title={PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models},
  author={Meng, Fanxu and Wang, Zhaohui and Zhang, Muhan},
  journal={arXiv preprint arXiv:2404.02948},
  month={4},
  year={2024}
}

Topics

Python Programming
Question Answering

Source

Organization: huggingface

Created: 11/1/2024
