THUDM/humaneval-x
Code Generation · Multilingual Evaluation
HumanEval-X is a benchmark dataset for evaluating the multilingual capabilities of code‑generation models. It comprises 820 high‑quality human‑written samples covering Python, C++, Java, JavaScript, and Go, each accompanied by test cases. The dataset can be used for code generation, translation, and related tasks.
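Benchmarks in the HumanEval family, including HumanEval-X, are conventionally scored with the pass@k metric: generate n samples per problem, count the c that pass the test cases, and estimate the probability that at least one of k drawn samples is correct. A minimal sketch of the standard unbiased estimator (the function name `pass_at_k` is illustrative, not part of the dataset):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: number of generated samples for a problem
    c: number of those samples that pass the test cases
    k: budget of samples drawn per problem
    """
    if n - c < k:
        # Every size-k draw must contain at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 2 samples, 1 correct -> pass@1 = 0.5
print(pass_at_k(2, 1, 1))
```

Per-problem scores are then averaged across the benchmark (for HumanEval-X, separately per language) to obtain the reported pass@k.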
Source: Hugging Face · Updated Oct 25, 2022