The CodeNet dataset was created by the Graduate School of Informatics at Nagoya University and is used primarily to evaluate multilingual code clone detectors. It contains code from two online judge systems, AIZU OJ and AtCoder, written in Java, Python, C, and C++. The dataset is organized into 12 sub-datasets, selected to reflect the edit-distance similarity ranges that clone detectors target, so that detector correctness can be assessed across those ranges. Within software engineering, CodeNet supports code clone detection work that addresses the limitations of existing detectors in language extensibility and detection performance.
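To make the idea of an edit-distance similarity range concrete, here is a minimal Python sketch. It assumes a character-level Levenshtein distance normalized by the longer string's length. The dataset description does not state which similarity measure or range boundaries were actually used, so the function names (`levenshtein`, `edit_similarity`, `in_similarity_range`) and the example bounds are hypothetical illustrations only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical text, 0.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty snippets are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def in_similarity_range(a: str, b: str, low: float, high: float) -> bool:
    """Check whether a pair falls in the half-open range [low, high)."""
    return low <= edit_similarity(a, b) < high


if __name__ == "__main__":
    snippet_a = "for i in range(n): total += i"
    snippet_b = "for j in range(n): s += j"
    print(round(edit_similarity(snippet_a, snippet_b), 3))
    # Hypothetical bounds, chosen only for illustration.
    print(in_similarity_range(snippet_a, snippet_b, 0.5, 1.0))
```

A detector evaluation could group candidate pairs by such ranges and report correctness per group. Real clone detectors often compare token sequences rather than raw characters, so this character-level version is a simplification.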