Dataset asset, Open Source Community, Multilingual Processing, Code Analysis

NTU-NLP-sg/xCodeEval

xCodeEval is described by its authors as the largest executable multilingual multitask benchmark to date. It contains 25 million document‑level code examples covering approximately 7,500 unique problems across 17 programming languages. The dataset comprises seven tasks spanning code understanding, generation, translation, and retrieval, all evaluated by executing code against unit tests rather than by text matching. The authors also introduce ExecEval, a code execution engine that supports all languages, and propose a data splitting and selection scheme based on the geometric mean and graph‑theoretic principles to balance the distribution of multiple attributes.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 6, 2024
Signals: 241 views
Availability: Linked source ready

Dataset Overview

Basic Information

  • name: xCodeEval
  • languages: code, English
  • language creation method: discovery, expert generation
  • license: cc-by-nc-4.0
  • multilinguality: multilingual
  • size: 1M<n<10M, 10M<n<100M
  • source: raw data

Tags

  • programming languages
  • code
  • program synthesis
  • automatic code repair
  • code retrieval
  • code translation
  • code classification

Task Categories

  • translation
  • token classification
  • text‑to‑text generation
  • text retrieval
  • text generation
  • text classification
  • feature extraction
  • question answering

Dataset Description

  • xCodeEval is a large‑scale multilingual multitask benchmark containing ~25M document‑level code examples that cover ~7.5K unique problems in 17 programming languages.
  • The dataset includes seven tasks covering code understanding, generation, translation, and retrieval, all evaluated via execution.
  • The authors developed ExecEval, a multilingual code execution engine that supports all languages.
  • The authors proposed a data splitting and selection scheme based on the geometric mean and graph‑theoretic principles to balance the distribution of multiple attributes.
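
To illustrate what "evaluated via execution" means in practice, here is a minimal sketch that runs a candidate Python program against a list of unit tests and compares its standard output with the accepted answers. It is not ExecEval and does not use its API. The function names, the timeout, and the test-case shape (an "input" string plus a list of accepted "output" strings) are assumptions made for illustration only.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def run_python_solution(source: str, stdin_text: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run a Python program on one input; return its stdout, or None on failure."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "solution.py"
        path.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return None  # treat a timeout as a failed run
    # A non-zero exit code counts as a runtime error.
    return proc.stdout if proc.returncode == 0 else None


def passes_all_tests(source: str, tests: List[Dict]) -> bool:
    """Return True only if the program produces an accepted output for every test."""
    for case in tests:
        actual = run_python_solution(source, case["input"])
        if actual is None:
            return False
        # Assumed shape: each test lists one or more accepted outputs.
        accepted = {expected.strip() for expected in case["output"]}
        if actual.strip() not in accepted:
            return False
    return True
```

A real harness would also sandbox the process and enforce memory limits; this sketch only shows the pass/fail logic of comparing executed output with expected output.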

Data Download

  • The data can be loaded with the load_dataset() function of the Hugging Face datasets library.
  • The data can also be downloaded from the Hugging Face repository via Git LFS.
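
As a rough sketch of the load_dataset() route, the snippet below maps the task names used in this card to configuration names and loads one task in streaming mode. The configuration names in TASK_CONFIGS are assumptions and should be checked against the repository before use. The helper names config_for and load_task are hypothetical, and depending on the installed datasets version additional arguments may be required.

```python
DATASET_ID = "NTU-NLP-sg/xCodeEval"  # repository name as shown on this page

# Assumed configuration names; verify them against the dataset repository.
TASK_CONFIGS = {
    "tag classification": "tag_classification",
    "code compilation": "code_compilation",
    "program synthesis": "program_synthesis",
    "code translation": "code_translation",
    "automated program repair": "apr",
    "code-to-code retrieval": "retrieval_code_code",
    "natural language-to-code retrieval": "retrieval_nl_code",
}


def config_for(task_name: str) -> str:
    """Map a task name from this card to its (assumed) configuration name."""
    # Normalize case and the non-breaking hyphens used in the task list.
    key = task_name.strip().lower().replace("\u2011", "-")
    if key not in TASK_CONFIGS:
        raise KeyError(f"Unknown task: {task_name!r}")
    return TASK_CONFIGS[key]


def load_task(task_name: str, streaming: bool = True):
    """Load one task; streaming avoids downloading the full data up front."""
    # Imported lazily so the mapping above can be used without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, config_for(task_name), streaming=streaming)
```

Calling load_task("Program Synthesis") would then return the splits for that task, assuming network access and that the configuration name matches the repository.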

Task Details

  1. Tag Classification
  2. Code Compilation
  3. Program Synthesis
  4. Code Translation
  5. Automated Program Repair
  6. Code‑to‑Code Retrieval
  7. Natural Language‑to‑Code Retrieval

Shared Data

  • problem_descriptions.jsonl
  • unittest_db.json
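
The two shared files are typically used together: one describes the problems and the other holds their unit tests. The sketch below joins them under two assumptions that should be verified against the actual files: each line of problem_descriptions.jsonl is a JSON object with a "src_uid" field, and unittest_db.json is a single JSON object keyed by that same identifier. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def parse_problem_descriptions(lines: Iterable[str]) -> Dict[str, dict]:
    """Parse JSONL lines into a dict keyed by the (assumed) "src_uid" field."""
    problems = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        problems[record["src_uid"]] = record
    return problems


def attach_tests(problems: Dict[str, dict], unit_tests: Dict[str, List[dict]]) -> Dict[str, dict]:
    """Return copies of the problems with their unit tests attached."""
    return {
        uid: {**problem, "unit_tests": unit_tests.get(uid, [])}
        for uid, problem in problems.items()
    }
```

In use, the lines argument can be an open file handle for problem_descriptions.jsonl, and unit_tests can be the result of json.load() on unittest_db.json. Problems without an entry in the test database receive an empty list.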