LorenzH/juliet_test_suite_c_1_3
This dataset contains all test cases from NIST's Juliet Test Suite 1.3 for the C and C++ programming languages. Each sample pairs a good (benign) implementation with a defective one, extracted using the Juliet suite's OMITGOOD and OMITBAD preprocessor macros. The dataset supports software defect prediction and code clone detection. Each record has five fields (index, filename, defect class, good code, and bad code), and the data is split into a training set of 80,706 cases and a test set of 20,177 cases. Because every sample is handcrafted, the dataset is synthetic and does not fully represent real-world software defects.
Description
Dataset Card: Juliet Test Suite 1.3
Dataset Overview
This dataset contains all test cases from NIST's Juliet Test Suite for the C and C++ programming languages. Each sample includes a good and a defective implementation, extracted using the Juliet suite's OMITGOOD and OMITBAD preprocessor macros.
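To make the role of the two macros concrete: Juliet test case files wrap the flawed code in `#ifndef OMITBAD` blocks and the fixed code in `#ifndef OMITGOOD` blocks, so defining one macro at compile time leaves only the other variant. The card does not describe the exact extraction pipeline, so the following is only a minimal sketch of the idea. The helper `strip_guarded` and the function names `demo_bad` and `demo_good` are hypothetical, and the sketch ignores `#else` branches, which a real preprocessor would handle.

```python
import re

# Any directive that opens a conditional region, and the one that closes it.
_IF = re.compile(r"^\s*#\s*(if|ifdef|ifndef)\b")
_ENDIF = re.compile(r"^\s*#\s*endif\b")


def strip_guarded(source: str, macro: str) -> str:
    """Drop each `#ifndef <macro>` ... matching `#endif` region.

    This imitates compiling with <macro> defined. Hypothetical helper,
    not the dataset authors' tool; `#else` branches are not handled.
    """
    guard = re.compile(r"^\s*#\s*ifndef\s+" + re.escape(macro) + r"\b")
    kept = []
    depth = 0  # greater than 0 while inside a region being dropped
    for line in source.splitlines():
        if depth == 0:
            if guard.match(line):
                depth = 1  # start dropping, including the guard line
            else:
                kept.append(line)
        elif _IF.match(line):
            depth += 1  # nested conditional inside the dropped region
        elif _ENDIF.match(line):
            depth -= 1  # the closing #endif is dropped as well
    return "\n".join(kept)


# Made-up miniature test case; real Juliet files are much longer.
SAMPLE = """\
#include <stdio.h>

#ifndef OMITBAD
void demo_bad(void) { /* flawed variant */ }
#endif /* OMITBAD */

#ifndef OMITGOOD
void demo_good(void) { /* fixed variant */ }
#endif /* OMITGOOD */
"""

good_only = strip_guarded(SAMPLE, "OMITBAD")   # keeps the good variant
bad_only = strip_guarded(SAMPLE, "OMITGOOD")   # keeps the bad variant
```

In practice the same effect can be obtained by running a C preprocessor with the corresponding macro defined, which also handles the cases this sketch skips.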
Supported Tasks and Leaderboards
- Software defect prediction
- Code clone detection
Languages
C and C++ programming languages
Dataset Structure
Data Instances
Data Fields
| Index | Name | Type | Description |
|---|---|---|---|
| 0 | index | int | Index of each sample in the dataset |
| 1 | filename | str | Path of the test case file, including the filename |
| 2 | class | int | Defect category, i.e., the CWE identifier assigned to the sample |
| 3 | good | str | Source code of the benign implementation |
| 4 | bad | str | Source code of the defective implementation |
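As a rough illustration of how the fields above might be used for defect prediction, the sketch below models one record and expands it into two labeled samples. The class `JulietRecord`, the function `to_labeled_samples`, the 0/1 label convention, and the example values are all assumptions made for illustration; none of them is defined by the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class JulietRecord:
    """One row of the dataset, mirroring the field table above."""
    index: int
    filename: str
    defect_class: int  # the `class` field; renamed because `class` is a Python keyword
    good: str
    bad: str


def to_labeled_samples(
    records: Iterable[JulietRecord],
) -> Iterator[Tuple[str, int, int]]:
    """Yield (source_code, label, cwe_id) triples.

    Assumed convention: label 0 marks the benign code, label 1 the defective code.
    """
    for rec in records:
        yield rec.good, 0, rec.defect_class
        yield rec.bad, 1, rec.defect_class


# Made-up record; the filename and code strings are placeholders.
EXAMPLE = JulietRecord(
    index=0,
    filename="example/CWE121_demo.c",
    defect_class=121,
    good="void demo_good(void) { /* fixed variant */ }",
    bad="void demo_bad(void) { /* flawed variant */ }",
)

samples = list(to_labeled_samples([EXAMPLE]))
```

Each record therefore contributes one positive and one negative example, so a binary classifier trained this way sees balanced classes by construction.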
Data Splits
| Type | Size |
|---|---|
| train | 80,706 cases |
| test | 20,177 cases |
Dataset Creation
Source
https://samate.nist.gov/SARD/test-suites/112
Usage Considerations
Societal Impact
Bias Discussion
Other Known Limitations
The Juliet test suite is a synthetic dataset; all samples are handcrafted and therefore do not fully represent real software defects. Applying classifiers trained on these samples to real‑world environments may lead to degraded performance and severe misclassifications, potentially overlooking critical software defects.
Source
Organization: hugging_face
Created: Unknown