THUDM/humaneval-x

HumanEval-X is a benchmark dataset for evaluating the multilingual capabilities of code‑generation models. It comprises 820 high‑quality human‑written samples covering Python, C++, Java, JavaScript, and Go, each accompanied by test cases. The dataset can be used for code generation, translation, and related tasks.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Oct 25, 2022

HumanEval-X

Dataset Description

HumanEval-X is a benchmark for assessing the multilingual ability of code‑generation models. It contains 820 high‑quality, human‑written samples, each with test cases: 164 problems in each of five languages (Python, C++, Java, JavaScript, and Go). It supports tasks such as code generation and code translation.

Languages

The dataset includes programming problems in five languages: Python, C++, Java, JavaScript, and Go.

Dataset Structure

When loading the dataset, specify one of the five available languages [python, cpp, go, java, js]. The default is python.

from datasets import load_dataset
data = load_dataset("THUDM/humaneval-x", "js")
print(data)

DatasetDict({
    test: Dataset({
        features: [task_id, prompt, declaration, canonical_solution, test, example_test],
        num_rows: 164
    })
})

next(iter(data["test"]))
{task_id: JavaScript/0,
 prompt: /* Check if in given list of numbers, are any two numbers closer to each other than
  given threshold.
  >>> hasCloseElements([1.0, 2.0, 3.0], 0.5)
  false
  >>> hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
  true
  */
const hasCloseElements = (numbers, threshold) => {
,
 declaration: 
const hasCloseElements = (numbers, threshold) => {
,
 canonical_solution:   for (let i = 0; i < numbers.length; i++) {
    for (let j = 0; j < numbers.length; j++) {
      if (i != j) {
        let distance = Math.abs(numbers[i] - numbers[j]);
        if (distance < threshold) {
          return true;
        }
      }
    }
  }
  return false;
}
,
 test: const testHasCloseElements = () => {
  console.assert(hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) === true)
  console.assert(
    hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) === false
  )
  console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) === true)
  console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) === false)
  console.assert(hasCloseElements([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) === true)
  console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) === true)
  console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) === false)
}

testHasCloseElements()
,
 example_test: const testHasCloseElements = () => {
  console.assert(hasCloseElements([1.0, 2.0, 3.0], 0.5) === false)
  console.assert(
    hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) === true
  )
}
testHasCloseElements()
}
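
The record above suggests how the fields fit together: the prompt ends with an open function, canonical_solution supplies the body, and test calls the finished function. The following is a minimal Python sketch of that assembly, not the benchmark's official harness. The record `toy_record` and the helpers `build_program` and `passes` are hypothetical names introduced for illustration, and the record's contents are invented rather than taken from the dataset. It also assumes plain concatenation is enough, which may not hold for every language.

```python
def build_program(record: dict, completion: str) -> str:
    """Join the prompt, a candidate completion, and the hidden tests into one source string."""
    return record["prompt"] + completion + "\n" + record["test"]


def passes(program: str) -> bool:
    """Execute the program in a fresh namespace; any exception counts as a failure.

    exec() offers no isolation, so this is only suitable for trusted code.
    Model-generated code should run in a sandbox instead.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
    except Exception:
        return False
    return True


# Hypothetical record that mimics the field layout; it is NOT real dataset content.
toy_record = {
    "task_id": "Python/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "canonical_solution": "    return a + b\n",
    "test": "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n",
}

reference_ok = passes(build_program(toy_record, toy_record["canonical_solution"]))
wrong_ok = passes(build_program(toy_record, "    return a - b\n"))
print(reference_ok, wrong_ok)
```

Passing a model's output as `completion` in place of canonical_solution gives a simple pass/fail signal per problem under these assumptions.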

Data Fields

task_id: Indicates the target language and problem ID. Language is one of ["Python", "Java", "JavaScript", "CPP", "Go"].
prompt: Function signature and docstring, used as input for code generation.
declaration: The function signature alone, used as input for code translation.
canonical_solution: Human‑written reference solution.
test: Hidden test cases for evaluation.
example_test: Public test cases (the ones that appear in the prompt) for evaluation.
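
Because task_id combines a language name and a problem ID, it can be split to line up records across language subsets, for example when preparing translation pairs. The sketch below is illustrative only: `parse_task_id` and `pair_by_problem` are hypothetical helpers, and it assumes that the same numeric ID in two subsets refers to the same problem.

```python
def parse_task_id(task_id: str) -> tuple[str, int]:
    """Split an ID such as "JavaScript/0" into ("JavaScript", 0)."""
    language, _, number = task_id.partition("/")
    return language, int(number)


def pair_by_problem(source_records, target_records):
    """Pair records whose task_id carries the same problem number.

    Assumption: matching numbers across subsets denote the same problem.
    """
    targets = {parse_task_id(r["task_id"])[1]: r for r in target_records}
    pairs = []
    for src in source_records:
        index = parse_task_id(src["task_id"])[1]
        if index in targets:
            pairs.append((src, targets[index]))
    return pairs


# Stub records with only the field needed here; real records carry all six fields.
python_records = [{"task_id": "Python/0"}, {"task_id": "Python/1"}]
js_records = [{"task_id": "JavaScript/1"}]

pairs = pair_by_problem(python_records, js_records)
print([(s["task_id"], t["task_id"]) for s, t in pairs])
```

A translation prompt could then combine the source record's solution with the target record's declaration, though how exactly to format that prompt is a design choice this page does not specify.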

Data Splits

Each subset contains a single split: test.
