THUDM/humaneval-x

HumanEval-X is a benchmark dataset for evaluating the multilingual capabilities of code‑generation models. It comprises 820 high‑quality human‑written samples covering Python, C++, Java, JavaScript, and Go, each accompanied by test cases. The dataset can be used for code generation, translation, and related tasks.

Updated 10/25/2022
hugging_face

Description

HumanEval-X

Dataset Description

HumanEval-X is a benchmark designed to assess the multilingual ability of code‑generation models. It contains 820 high‑quality human‑written samples (each with test cases), covering Python, C++, Java, JavaScript, and Go, and can be used for a variety of tasks such as code generation and translation.

Languages

The dataset includes programming problems in five languages: Python, C++, Java, JavaScript, and Go.

Dataset Structure

When loading the dataset, specify one of the five available languages [python, cpp, go, java, js]. The default is python.

>>> from datasets import load_dataset
>>> data = load_dataset("THUDM/humaneval-x", "js")
>>> data
DatasetDict({
    test: Dataset({
        features: [task_id, prompt, declaration, canonical_solution, test, example_test],
        num_rows: 164
    })
})
>>> next(iter(data["test"]))
{task_id: JavaScript/0,
 prompt: /* Check if in given list of numbers, are any two numbers closer to each other than
  given threshold.
  >>> hasCloseElements([1.0, 2.0, 3.0], 0.5)
  false
  >>> hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
  true
  */
const hasCloseElements = (numbers, threshold) => {
,
 declaration: 
const hasCloseElements = (numbers, threshold) => {
,
 canonical_solution:   for (let i = 0; i < numbers.length; i++) {
    for (let j = 0; j < numbers.length; j++) {
      if (i != j) {
        let distance = Math.abs(numbers[i] - numbers[j]);
        if (distance < threshold) {
          return true;
        }
      }
    }
  }
  return false;
}
,
 test: const testHasCloseElements = () => {
  console.assert(hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) === true)
  console.assert(
    hasCloseElements([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) === false
  )
  console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) === true)
  console.assert(hasCloseElements([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) === false)
  console.assert(hasCloseElements([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) === true)
  console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) === true)
  console.assert(hasCloseElements([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) === false)
}

testHasCloseElements()
,
 example_test: const testHasCloseElements = () => {
  console.assert(hasCloseElements([1.0, 2.0, 3.0], 0.5) === false)
  console.assert(
    hasCloseElements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3) === true
  )
}
testHasCloseElements()
}
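
The record above suggests how the fields fit together: the prompt ends at the opening brace of the function, the canonical solution supplies the body and closing brace, and the test fields call the finished function. A minimal sketch of assembling a runnable source file from one record follows. It assumes that plain string concatenation is sufficient; build_program and its parameters are hypothetical names for illustration, not part of the dataset or of any library, and SAMPLE is an abbreviated copy of the record shown above.

```python
from typing import Optional

# Abbreviated copy of the JavaScript/0 record shown above
# (the doc comment inside "prompt" is omitted for brevity).
SAMPLE = {
    "task_id": "JavaScript/0",
    "prompt": "const hasCloseElements = (numbers, threshold) => {\n",
    "canonical_solution": (
        "  for (let i = 0; i < numbers.length; i++) {\n"
        "    for (let j = 0; j < numbers.length; j++) {\n"
        "      if (i != j) {\n"
        "        let distance = Math.abs(numbers[i] - numbers[j]);\n"
        "        if (distance < threshold) {\n"
        "          return true;\n"
        "        }\n"
        "      }\n"
        "    }\n"
        "  }\n"
        "  return false;\n"
        "}\n"
    ),
    "example_test": (
        "const testHasCloseElements = () => {\n"
        "  console.assert(hasCloseElements([1.0, 2.0, 3.0], 0.5) === false)\n"
        "}\n"
        "testHasCloseElements()\n"
    ),
}


def build_program(sample: dict, completion: Optional[str] = None,
                  test_field: str = "example_test") -> str:
    """Join prompt, function body, and tests into one source string.

    Assumption: plain concatenation yields a complete program.
    If `completion` is None, the reference solution is used as the body;
    otherwise `completion` stands in for a model-generated body.
    """
    body = sample["canonical_solution"] if completion is None else completion
    return sample["prompt"] + body + "\n" + sample[test_field]


# Build a program from the reference solution and the public tests.
program = build_program(SAMPLE)
```

Passing test_field="test" on a full record would select the hidden tests instead, and the resulting string could then be written to a file and run with the target language's toolchain.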

Data Fields

  • task_id: Indicates the target language and problem ID. Language is one of ["Python", "Java", "JavaScript", "CPP", "Go"].
  • prompt: Function signature and docstring, used as the input for code generation.
  • declaration: Function signature only, used as the input for code translation.
  • canonical_solution: Human‑written reference solution.
  • test: Hidden test cases for evaluation.
  • example_test: Public test cases (appear in the prompt) for evaluation.
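
Because task_id encodes both the language and the problem number, it can be split to recover the subset name used when loading. The sketch below is one way to do that. The two name lists come from this card, but pairing them as shown is an assumption, and parse_task_id and TASK_LANG_TO_CONFIG are hypothetical names introduced here for illustration.

```python
from typing import Tuple

# Language prefix in task_id -> subset name accepted when loading.
# Both sets of names appear on this card; the pairing is an assumption.
TASK_LANG_TO_CONFIG = {
    "Python": "python",
    "CPP": "cpp",
    "Go": "go",
    "Java": "java",
    "JavaScript": "js",
}


def parse_task_id(task_id: str) -> Tuple[str, int]:
    """Split an ID such as 'JavaScript/0' into (subset name, problem index)."""
    language, _, index = task_id.partition("/")
    if language not in TASK_LANG_TO_CONFIG or not index.isdigit():
        raise ValueError(f"unrecognized task_id: {task_id!r}")
    return TASK_LANG_TO_CONFIG[language], int(index)


config, problem = parse_task_id("JavaScript/0")
```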

Data Splits

Each subset contains a single split: test.
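
Since every subset exposes only the test split, collecting all five languages amounts to one load call per subset name. The following is a rough sketch under that assumption. load_all_subsets and its loader parameter are hypothetical, added so the function can be exercised without network access; by default it falls back to load_dataset from the third-party datasets package, called the same way as in the loading example on this card.

```python
# Subset names listed on this card.
CONFIGS = ["python", "cpp", "go", "java", "js"]


def load_all_subsets(loader=None):
    """Return {subset name: test split} for all five languages.

    `loader` is a hypothetical hook: any callable taking
    (repository, subset name) and returning a mapping with a "test" key.
    """
    if loader is None:
        # Imported lazily so that defining this function does not
        # require the `datasets` package to be installed.
        from datasets import load_dataset
        loader = load_dataset
    return {name: loader("THUDM/humaneval-x", name)["test"] for name in CONFIGS}
```

Calling load_all_subsets() with no arguments downloads the data. If each subset holds 164 rows, as the JavaScript subset above does, the five splits together account for the 820 samples mentioned in the description.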

Topics

Code Generation
Multilingual Evaluation

Source

Organization: hugging_face

Created: Unknown
