THUDM/humaneval-x
Code Generation · Multilingual Evaluation
HumanEval-X is a benchmark dataset for evaluating the multilingual capabilities of code‑generation models. It comprises 820 high‑quality human‑written samples covering Python, C++, Java, JavaScript, and Go, each accompanied by test cases. The dataset can be used for code generation, translation, and related tasks.
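Benchmarks in the HumanEval family, including HumanEval-X, are conventionally scored with the pass@k metric: generate n samples per problem, count the c that pass the test cases, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch of the standard unbiased estimator (the function name `pass_at_k` is illustrative, not part of the dataset):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: number of generated samples for a problem
    c: number of those samples that pass the test cases
    k: budget of samples drawn per problem
    """
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 samples, 1 correct -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Per-problem scores are then averaged across the benchmark (for HumanEval-X, separately per language) to obtain the reported pass@k.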
Source: Hugging Face · Updated Oct 25, 2022