A-Eval is a benchmark for evaluating chat large language models (LLMs) of various scales from an application-driven perspective. The dataset contains 678 question-answer pairs spanning 5 categories, 27 sub-categories, and 3 difficulty levels. A-Eval provides clear empirical and engineering guidelines for selecting the "best" model for real-world applications.
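To make the dataset's structure concrete, here is a minimal sketch of how an A-Eval question-answer record might be represented and filtered by category and difficulty. The field names (`question`, `answer`, `category`, `sub_category`, `difficulty`) and the sample values are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AEvalItem:
    question: str
    answer: str
    category: str      # one of the 5 top-level categories (assumed field name)
    sub_category: str  # one of the 27 sub-categories (assumed field name)
    difficulty: str    # one of the 3 difficulty levels (assumed field name)

# Hypothetical sample records for illustration only.
items = [
    AEvalItem("What is 2 + 2?", "4", "reasoning", "arithmetic", "easy"),
    AEvalItem("Summarize the passage.", "...", "generation", "summarization", "medium"),
]

# Select the subset matching one category and one difficulty level,
# e.g. to evaluate a model on a single slice of the benchmark.
easy_reasoning = [
    it for it in items
    if it.category == "reasoning" and it.difficulty == "easy"
]
print(len(easy_reasoning))
```

Slicing by category and difficulty in this way mirrors how A-Eval's application-driven breakdown is meant to be used: compare models on the specific task slice your application cares about rather than on a single aggregate score.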