
LorenzH/juliet_test_suite_c_1_3

This dataset contains all test cases from NIST's Juliet Test Suite 1.3 for the C and C++ programming languages. Each sample pairs a benign ("good") implementation with a defective ("bad") one, extracted using the suite's OMITGOOD and OMITBAD preprocessor macros. It supports software defect prediction and code clone detection. Each record has five fields (index, filename, defect class, good code, and bad code), and the data is split into 80,706 training cases and 20,177 test cases. Because every sample is synthetic and handcrafted, the dataset does not fully represent real-world software defects.

Updated 3/21/2023

Description

Dataset Card: Juliet Test Suite 1.3

Dataset Overview

This dataset contains all test cases from NIST's Juliet Test Suite for the C and C++ programming languages. Each sample includes a good and a defective implementation, extracted using the Juliet suite's OMITGOOD and OMITBAD preprocessor macros.
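Juliet test cases wrap their defective code in `#ifndef OMITBAD ... #endif` and their benign code in `#ifndef OMITGOOD ... #endif`, so defining one macro removes the corresponding half of the file. The sketch below imitates that effect on a miniature, made-up test case. It is not the extraction procedure used to build this dataset (the card does not describe it), and `apply_omit` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
OMIT_MACROS = {"OMITGOOD", "OMITBAD"}


def apply_omit(source: str, defined: set) -> str:
    """Imitate preprocessing `source` with the macros in `defined` set.

    Simplified on purpose: only `#ifndef OMITGOOD` / `#ifndef OMITBAD`
    guards are evaluated. Every other directive is copied through as-is.
    """
    kept = []
    stack = []  # one (is_omit_guard, skipping) pair per open conditional
    for line in source.splitlines():
        tokens = line.split()
        directive = tokens[0] if tokens else ""
        if directive in ("#if", "#ifdef", "#ifndef"):
            is_guard = (
                directive == "#ifndef"
                and len(tokens) > 1
                and tokens[1] in OMIT_MACROS
            )
            stack.append((is_guard, is_guard and tokens[1] in defined))
            if is_guard:
                continue  # the guard lines themselves are never emitted
        elif directive == "#endif" and stack:
            is_guard, _ = stack.pop()
            if is_guard:
                continue
        # Keep the line unless some enclosing OMIT guard is being skipped.
        if not any(skipping for _, skipping in stack):
            kept.append(line)
    return "\n".join(kept)


# Hypothetical miniature test case, not an actual Juliet file.
SAMPLE = """\
#include <stdio.h>
#ifndef OMITBAD
void bad(void) { char buf[4]; buf[4] = 'x'; }
#endif
#ifndef OMITGOOD
void good(void) { char buf[4]; buf[3] = 'x'; }
#endif
"""

good_only = apply_omit(SAMPLE, {"OMITBAD"})   # defining OMITBAD drops bad()
bad_only = apply_omit(SAMPLE, {"OMITGOOD"})   # defining OMITGOOD drops good()

print(good_only)
print("---")
print(bad_only)
```

Running the real C preprocessor with `-DOMITBAD` or `-DOMITGOOD` would be the more faithful route; the line-based filter here only shows which regions each macro removes.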

Supported Tasks and Leaderboards

  • Software defect prediction
  • Code clone detection

Languages

C and C++ programming languages

Dataset Structure

Data Instances

Data Fields

Index  Name      Type  Description
0      index     int   Index of each sample in the dataset
1      filename  str   Path of the test case file, including the filename
2      class     int   Defect category, i.e., the CWE identifier assigned to the sample
3      good      str   Source code of the benign implementation
4      bad       str   Source code of the defective implementation
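For defect prediction, one plausible way to use the five fields above is to expand each record into two labeled code samples, one from `good` and one from `bad`. The sketch below assumes records arrive as plain dictionaries keyed by the field names. `JulietRecord`, `to_labeled_samples`, and the example record (including its path and CWE number) are hypothetical and serve only to show the shape of the data.

```python
from typing import Dict, Iterator, TypedDict

# Functional TypedDict syntax is needed because "class" is a Python keyword.
JulietRecord = TypedDict(
    "JulietRecord",
    {"index": int, "filename": str, "class": int, "good": str, "bad": str},
)


def to_labeled_samples(record: JulietRecord) -> Iterator[Dict[str, object]]:
    """Yield one benign and one defective sample for a single record."""
    for label, field in ((0, "good"), (1, "bad")):
        yield {
            "code": record[field],
            "defective": label,      # 0 = benign, 1 = defective
            "cwe": record["class"],  # CWE identifier shared by both halves
            "filename": record["filename"],
        }


# Made-up record for illustration; not taken from the dataset.
EXAMPLE: JulietRecord = {
    "index": 0,
    "filename": "testcases/example/example_01.c",
    "class": 121,
    "good": "void good(void) { char buf[4]; buf[3] = 'x'; }",
    "bad": "void bad(void) { char buf[4]; buf[4] = 'x'; }",
}

samples = list(to_labeled_samples(EXAMPLE))
for sample in samples:
    print(sample["defective"], sample["cwe"], sample["code"])
```

With the Hugging Face `datasets` library installed, records of this shape would presumably be obtained via `load_dataset("LorenzH/juliet_test_suite_c_1_3")`; that call is not exercised here.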

Data Splits

Split  Size
train  80,706 cases
test   20,177 cases
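The two split sizes imply roughly an 80/20 division of the data. The short calculation below checks this from the numbers stated above; the variable names are illustrative.

```python
TRAIN_CASES = 80_706
TEST_CASES = 20_177

total = TRAIN_CASES + TEST_CASES
train_share = TRAIN_CASES / total
test_share = TEST_CASES / total

print(f"total: {total:,} cases")      # total: 100,883 cases
print(f"train: {train_share:.1%}")    # train: 80.0%
print(f"test:  {test_share:.1%}")     # test:  20.0%
```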

Dataset Creation

Source

https://samate.nist.gov/SARD/test-suites/112

Usage Considerations

Societal Impact

Bias Discussion

Other Known Limitations

The Juliet Test Suite is a synthetic dataset: all samples are handcrafted and therefore do not fully represent real software defects. Classifiers trained on these samples may perform worse on real-world code and misclassify it severely, potentially overlooking critical software defects.


Topics

Software Defect Detection
Code Analysis

Source

Organization: hugging_face

Created: Unknown
