Explore high-quality datasets for your AI and machine learning projects.
xCodeEval is currently the largest executable multilingual multitask benchmark dataset, containing 25 million document‑level code examples drawn from approximately 7,500 unique problems and spanning up to 17 programming languages. The dataset comprises seven tasks covering code understanding, generation, translation, and retrieval, all evaluated by actually executing the code rather than by textual similarity. It also introduces ExecEval, a code execution engine supporting all 17 languages, and proposes a data splitting and selection scheme based on the geometric mean and graph‑theoretic principles to balance the distribution of multiple attributes across splits.
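The intuition behind geometric‑mean‑based split selection can be shown with a toy sketch. The attribute names and scoring below are illustrative assumptions, not the paper's actual procedure: the key property is that the geometric mean punishes any split in which even one attribute is badly underrepresented.

```python
import math

def geometric_mean(values):
    # Geometric mean of positive scores; a single near-zero factor
    # drags the whole score down, unlike the arithmetic mean.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def split_score(coverage):
    # coverage: per-attribute fraction of the desired distribution achieved
    # (hypothetical attributes: language, tag, difficulty).
    return geometric_mean(list(coverage.values()))

# Two hypothetical candidate validation splits:
balanced = {"language": 0.90, "tag": 0.85, "difficulty": 0.88}
skewed   = {"language": 1.00, "tag": 1.00, "difficulty": 0.10}

# The balanced split scores higher: the skewed split's near-total
# neglect of one attribute collapses its geometric mean.
print(split_score(balanced) > split_score(skewed))
```

Selecting the candidate split with the highest such score favors splits where every attribute is reasonably represented, which is the balancing goal the scheme describes.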
This dataset includes all test cases from NIST's Juliet test suite for the C and C++ programming languages. Each sample pairs a good and a defective implementation of the same test case, extracted via the Juliet suite's OMITGOOD and OMITBAD preprocessor macros. The dataset supports software defect prediction and code clone detection tasks. Each record contains an index, a filename, a defect class, the good code, and the bad code, and the data is divided into training and test splits. Because the dataset is synthetic, with all samples handcrafted, it does not fully represent real‑world software defects.
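The OMITGOOD/OMITBAD extraction can be illustrated with a simplified sketch. In Juliet source files, good functions are wrapped in `#ifndef OMITGOOD … #endif` and flawed functions in `#ifndef OMITBAD … #endif`, so defining one macro removes the corresponding variant. The regex-based stripper below is a hypothetical simplification; the real suite uses the C preprocessor, which also handles nesting and other directives:

```python
import re

def strip_guarded(source: str, macro: str) -> str:
    """Remove '#ifndef MACRO ... #endif' blocks, simulating compiling with -DMACRO.

    Simplified sketch: non-greedy match to the first #endif, no nesting support.
    """
    pattern = re.compile(
        r"#ifndef\s+" + re.escape(macro) + r"\b.*?#endif[^\n]*\n?",
        re.DOTALL,
    )
    return pattern.sub("", source)

# Hypothetical Juliet-style test case file:
sample = (
    "#ifndef OMITGOOD\n"
    "void good() { /* fixed behaviour */ }\n"
    "#endif /* OMITGOOD */\n"
    "#ifndef OMITBAD\n"
    "void bad() { /* CWE flaw */ }\n"
    "#endif /* OMITBAD */\n"
)

good_code = strip_guarded(sample, "OMITBAD")   # defines OMITBAD: keeps only good()
bad_code = strip_guarded(sample, "OMITGOOD")   # defines OMITGOOD: keeps only bad()
```

Applying both strips to each source file yields the paired "good code" and "bad code" fields described above.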