This dataset contains all test cases from NIST's Juliet Test Suite for C and C++. Each sample pairs a good (non-defective) and a bad (defective) implementation from the same test case. The two variants are extracted using the OMITGOOD and OMITBAD preprocessor macros that the Juliet suite defines. The dataset is intended for software defect prediction and code clone detection.

Each data instance has five fields: an index, the source filename, the defect class, the good code, and the bad code. The data is divided into a training split and a test split.

The Juliet test cases are synthetic and were written by hand. As a result, the dataset does not fully represent real-world software defects.
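The paired layout suits binary defect prediction: each record yields one non-defective example and one defective example. The sketch below shows this flattening. It is a minimal illustration under stated assumptions. The class `JulietSample`, the function `to_labeled_examples`, and the field names are hypothetical. They mirror the fields listed above but are not taken from the dataset's actual schema, and the integer type for the defect class is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record layout mirroring the described fields.
# The published dataset's real column names and types may differ.
@dataclass
class JulietSample:
    index: int
    filename: str
    defect_class: int  # assumption: the defect class is stored as an integer
    good: str          # presumably the source with the bad variant omitted (OMITBAD)
    bad: str           # presumably the source with the good variant omitted (OMITGOOD)


def to_labeled_examples(samples: List[JulietSample]) -> List[Tuple[str, int]]:
    """Flatten good/bad pairs into (code, label) tuples for binary defect
    prediction, using label 0 for good code and label 1 for bad code."""
    examples: List[Tuple[str, int]] = []
    for s in samples:
        examples.append((s.good, 0))  # non-defective variant
        examples.append((s.bad, 1))   # defective variant
    return examples


if __name__ == "__main__":
    # Placeholder record for illustration only; not real dataset content.
    demo = [JulietSample(0, "placeholder.c", 0, "/* good variant */", "/* bad variant */")]
    print(to_labeled_examples(demo))
```

Because every record contributes exactly one example of each label, the flattened data is balanced by construction.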