CodeNet

The CodeNet dataset was created by the Graduate School of Informatics at Nagoya University and is primarily used to evaluate multilingual code clone detectors. It contains code in Java, Python, C, and C++ drawn from two online judge systems, AIZU OJ and AtCoder. Twelve sub‑datasets were selected to cover the edit‑distance similarity ranges that clone detectors target, so that detector correctness can be assessed across those ranges. In software engineering, CodeNet is applied to code clone detection, where existing detectors are limited in language extensibility and detection performance.

Updated 9/10/2024

Description

MCCD_Benchmarking Dataset Overview

Dataset Download

Dataset Usage

Execute Clone Detection

  • Subset identifiers: "p02263", "p00048", "p00001", "p00000", "p02269", "p02256", "p02257", "p02265", "p00002", "p00003", "p00008", "p00050", "p02271", "p00005"
  • Result file naming: Language_problemId.csv
  • File format: each line represents one clone pair, formatted as "segment1 file path, segment1 start line, segment1 end line, segment2 file path, segment2 start line, segment2 end line"
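
The result format above can be sketched in Python as follows. This is a minimal illustration, not code from the benchmark: the ClonePair type and the helper names are hypothetical, the "Java" language label is an assumed spelling, and the writer emits plain comma-separated fields while the reader also tolerates a space after each comma.

```python
import csv
import io
from typing import NamedTuple


class ClonePair(NamedTuple):
    """One line of a result file: two code segments that form a clone pair."""
    path1: str
    start1: int
    end1: int
    path2: str
    start2: int
    end2: int


def result_file_name(language, problem_id):
    """Build a name following the Language_problemId.csv convention."""
    return f"{language}_{problem_id}.csv"


def write_pairs(pairs, handle):
    """Write one clone pair per line as six comma-separated fields."""
    writer = csv.writer(handle, lineterminator="\n")
    for pair in pairs:
        writer.writerow(pair)


def read_pairs(handle):
    """Parse clone pairs, ignoring blank lines and spaces around fields."""
    pairs = []
    for row in csv.reader(handle):
        if not row:
            continue
        path1, s1, e1, path2, s2, e2 = (field.strip() for field in row)
        pairs.append(ClonePair(path1, int(s1), int(e1), path2, int(s2), int(e2)))
    return pairs


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_pairs([ClonePair("a/A.java", 1, 10, "b/B.java", 5, 14)], buffer)
print(result_file_name("Java", "p02263"))
print(read_pairs(io.StringIO(buffer.getvalue())))
```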

Add Target Tool

  • Command: python3 AddDetector.py ToolName
  • Parameter: ToolName is the identifier of the target clone detector

Import Detection Results

  • Command: python3 ImportClones.py ToolName ResultFolder [Language+]
  • Parameters:
    • ToolName: tool identifier
    • ResultFolder: path to the folder containing results for each subset
    • Language: optional; one or more target languages to import
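
Before importing, it can help to check that ResultFolder holds one result file per language and subset. The sketch below assumes a flat folder layout and language labels spelled like "Java"; neither is confirmed by the description above, and the function names and the "results/MyTool" path are hypothetical. The subset identifiers are the ones listed under Execute Clone Detection.

```python
from pathlib import Path

# Subset identifiers as listed in the "Execute Clone Detection" section.
SUBSET_IDS = [
    "p02263", "p00048", "p00001", "p00000", "p02269", "p02256", "p02257",
    "p02265", "p00002", "p00003", "p00008", "p00050", "p02271", "p00005",
]


def expected_result_files(languages, subset_ids=SUBSET_IDS):
    """List the Language_problemId.csv names expected for the given languages."""
    return [f"{lang}_{pid}.csv" for lang in languages for pid in subset_ids]


def missing_result_files(result_folder, languages, subset_ids=SUBSET_IDS):
    """Return the expected file names that are absent from result_folder.

    Assumes the files sit directly inside result_folder (an assumption).
    """
    folder = Path(result_folder)
    return [
        name
        for name in expected_result_files(languages, subset_ids)
        if not (folder / name).is_file()
    ]


# Hypothetical folder and language label, for illustration only.
for name in missing_result_files("results/MyTool", ["Java"]):
    print("missing:", name)
```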

Evaluation

  • Command: python3 Evaluation.py [ToolName]
  • Parameter: ToolName is the tool identifier; if omitted, all registered tools are evaluated
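
For orientation, recall is the fraction of reference clone pairs that a tool reports. The sketch below is not the logic of Evaluation.py: it assumes a detected pair counts only when both segments match a reference pair exactly, whereas the real script may use a different matching rule (for example, line-overlap thresholds). The function names and sample paths are hypothetical.

```python
def normalize(pair):
    """Order the two segments so (a, b) and (b, a) compare equal."""
    first, second = pair
    return (first, second) if first <= second else (second, first)


def recall(reference_pairs, detected_pairs):
    """Share of reference pairs found among the detected pairs.

    Each pair is ((path, start, end), (path, start, end)).
    Matching is exact, which is an assumption of this sketch.
    """
    reference = {normalize(p) for p in reference_pairs}
    detected = {normalize(p) for p in detected_pairs}
    if not reference:
        raise ValueError("reference set is empty; recall is undefined")
    return len(reference & detected) / len(reference)


reference = [
    (("A.java", 1, 10), ("B.java", 5, 14)),
    (("C.java", 1, 8), ("D.java", 2, 9)),
]
detected = [
    (("B.java", 5, 14), ("A.java", 1, 10)),  # same pair, segments swapped
    (("X.java", 1, 3), ("Y.java", 1, 3)),    # not in the reference set
]
print(recall(reference, detected))
```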

Output All Results

  • Command: python3 GroupedData.py
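
The commands from Add Target Tool through Output All Results can be chained into one run. The sketch below only assembles and prints the command lines; it assumes the steps run in the order the sections appear and from the directory containing the scripts. The tool name "MyTool", the folder "results/MyTool", and the language labels are hypothetical placeholders.

```python
import shlex


def workflow_commands(tool_name, result_folder, languages=()):
    """Return the argv lists for register, import, evaluate, and output."""
    return [
        ["python3", "AddDetector.py", tool_name],
        ["python3", "ImportClones.py", tool_name, result_folder, *languages],
        ["python3", "Evaluation.py", tool_name],
        ["python3", "GroupedData.py"],
    ]


# Print the commands for review rather than executing them.
for argv in workflow_commands("MyTool", "results/MyTool", ["Java", "Python"]):
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each argv list could be passed to subprocess.run(argv, check=True), which stops at the first failing command.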

Remove Tool or Clone

  • Commands:
    • python3 CloneClones.py ToolName
    • python3 RemoveDetector.py ToolName
  • Parameter: ToolName is the tool identifier (same for both commands)

Dataset Characteristics

  • Recall: evaluation provided
  • Precision: evaluation to be provided

Topics

Code Clone Detection
Gait Analysis

Source

Organization: arXiv

Created: 9/10/2024
