High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

gonglinyuan/safim

SAFIM (Syntax-Aware Fill-in-the-Middle) is a benchmark for evaluating large language models (LLMs) on code fill-in-the-middle (FIM) tasks. It comprises three sub-tasks: algorithmic block completion, control-flow expression completion, and API function call completion. The dataset draws on code submitted between April 2022 and January 2023, a window chosen to reduce the risk that data contamination skews evaluation results.
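To make the control-flow expression sub-task concrete, here is a minimal sketch of how a syntax-aware FIM example could be built. It is an illustration only, not SAFIM's actual pipeline: the use of Python's `ast` module, the sample function `sign`, and the `<HOLE>` placeholder are all assumptions made for this example.

```python
import ast

# Hypothetical sample program; not taken from the SAFIM dataset.
source = (
    "def sign(x):\n"
    "    if x < 0:\n"
    "        return -1\n"
    "    return 1\n"
)

# Locate a syntactic unit to mask: the condition of the first `if`.
tree = ast.parse(source)
if_node = next(n for n in ast.walk(tree) if isinstance(n, ast.If))
cond = if_node.test  # the control-flow expression `x < 0`

# Convert (line, column) to character offsets.
# Assumes ASCII source and a single-line expression.
lines = source.splitlines(keepends=True)
line_start = sum(len(line) for line in lines[: cond.lineno - 1])
start = line_start + cond.col_offset
end = line_start + cond.end_col_offset

# Split into the three FIM parts; `middle` is the ground truth.
prefix, middle, suffix = source[:start], source[start:end], source[end:]

# `<HOLE>` is a placeholder, not a token used by any particular model.
prompt = prefix + "<HOLE>" + suffix
```

A model under test would see `prompt` and be asked to produce `middle`. Masking a whole syntax node, rather than a random character span, is what distinguishes this setup from plain FIM.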

Source: Hugging Face