LiveBench
LiveBench is a large‑language‑model (LLM) benchmark created jointly by Abacus.AI, NYU, Nvidia, UMD, and USC. It contains 18 tasks spanning mathematics, programming, reasoning, language understanding, instruction following, and data analysis. LiveBench's questions are sourced from up‑to‑date materials such as recent math competitions, arXiv papers, news articles, and datasets, and answers are automatically scored against objective facts, eliminating the need for LLM or human judges. The benchmark aims to address data contamination issues in traditional evaluations, ensuring fairness and validity.
Description
LiveBench Dataset Overview
Dataset Introduction
LiveBench is a benchmark specifically designed for large language models (LLMs) to avoid test‑set contamination and enable objective evaluation. Its key characteristics are:
- Regular Updates: New questions are released monthly, based on recent datasets, arXiv papers, news articles, and IMDb movie summaries.
- Objective Scoring: Every question has a verifiable, objective ground-truth answer, so responses can be scored automatically and accurately without an LLM judge.
- Diversity: Currently includes 18 distinct tasks across 6 categories, with more challenging tasks to be added regularly.
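The first two characteristics can be illustrated with a minimal sketch. It assumes each question carries a release date and a single ground-truth string, and that a response counts as correct on an exact match after light normalization. The record layout, the function names, and the cutoff-date filter are hypothetical and are not LiveBench's actual scoring code, which may use task-specific checks.

```python
from datetime import date


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def score_response(response: str, ground_truth: str) -> float:
    """Return 1.0 on an exact match after normalization, otherwise 0.0 (no judge involved)."""
    return 1.0 if normalize(response) == normalize(ground_truth) else 0.0


def released_after(questions: list[dict], training_cutoff: date) -> list[dict]:
    """Keep only questions released after a model's (assumed) training cutoff."""
    return [q for q in questions if q["release_date"] > training_cutoff]


# Hypothetical question records and model responses, for illustration only.
questions = [
    {"id": "q1", "release_date": date(2024, 5, 1), "ground_truth": "42"},
    {"id": "q2", "release_date": date(2024, 7, 1), "ground_truth": "Paris"},
]
responses = {"q1": " 42 ", "q2": "paris"}

fresh = released_after(questions, training_cutoff=date(2024, 6, 1))
scores = {q["id"]: score_response(responses[q["id"]], q["ground_truth"]) for q in fresh}
```

Because the check is a deterministic comparison against a stored answer, repeated runs give the same score, which is the property the "Objective Scoring" item describes.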
Dataset Content
LiveBench comprises multiple tasks covering the following categories:
- Reasoning
- Programming
- Mathematics
- Data Analysis
- Language
- Instruction Following
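A short sketch of how per-task scores might roll up into category and overall scores follows. The aggregation scheme (average tasks within a category, then average the categories), the task names, and the numbers are all assumptions for illustration and are not taken from this page.

```python
from statistics import mean

# Hypothetical per-task scores on a 0-100 scale, grouped by category.
task_scores = {
    "Reasoning": {"task_a": 60.0, "task_b": 40.0},
    "Mathematics": {"task_c": 30.0, "task_d": 60.0, "task_e": 90.0},
}


def category_averages(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average the task scores inside each category."""
    return {category: mean(tasks.values()) for category, tasks in scores.items()}


def overall_score(scores: dict[str, dict[str, float]]) -> float:
    """Average the category averages, so each category carries equal weight."""
    return mean(category_averages(scores).values())


per_category = category_averages(task_scores)
overall = overall_score(task_scores)
```

Averaging categories rather than pooling all tasks keeps a category with many tasks from dominating the overall number.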
Dataset Usage
To have a model evaluated on LiveBench, open an issue on GitHub or email livebench.ai@gmail.com.
Dataset Origin
LiveBench was developed collaboratively by:
- Abacus.AI: Colin White, Samuel Dooley, Manley Roberts, Arka Pal
- NYU: Ben Feuer, Ravid Shwartz‑Ziv, Chinmay Hegde, Yann LeCun, Micah Goldblum
- Nvidia: Siddhartha Jain
- UMD: Tom Goldstein
- USC: Willie Neiswanger
Citation
To cite the LiveBench dataset, use the following BibTeX entry:
@article{livebench,
author = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
title = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
journal = {arXiv preprint arXiv:2406.19314},
year = {2024},
}
Source
Organization: arXiv
Created: 6/28/2024