LiveBench
LiveBench is a large‑language‑model (LLM) benchmark created jointly by Abacus.AI, NYU, Nvidia, UMD, and USC. It contains 18 tasks spanning mathematics, programming, reasoning, language understanding, instruction following, and data analysis. LiveBench's questions are sourced from up‑to‑date materials such as recent math competitions, arXiv papers, news articles, and datasets, and answers are scored automatically against objective ground truth, eliminating the need for LLM or human judges. The benchmark is designed to mitigate the test‑set contamination that undermines static evaluations, so that scores reflect genuine model capability.
LiveBench Dataset Overview
Dataset Introduction
LiveBench is a benchmark specifically designed for large language models (LLMs) to avoid test‑set contamination and enable objective evaluation. Its key characteristics are:
- Regular Updates: New questions are released monthly, based on recent datasets, arXiv papers, news articles, and IMDb movie summaries.
- Objective Scoring: Every question has a verifiable, objective correct answer, allowing automatic accurate scoring without LLM judges.
- Diversity: Currently includes 18 distinct tasks across 6 categories, with more challenging tasks to be added regularly.
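Because every question carries a single verifiable answer, grading reduces to a normalized comparison against ground truth. The sketch below illustrates this judge‑free scoring idea; the `Question` class and `score` function are hypothetical helpers for illustration, not LiveBench's actual grading code.

```python
# Minimal sketch of objective, judge-free scoring: each question stores one
# verifiable ground-truth answer, so grading is a normalized exact-match check.
# (Question, normalize, and score are illustrative names, not LiveBench APIs.)
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str
    ground_truth: str  # one objective, verifiable answer


def normalize(answer: str) -> str:
    """Strip surrounding whitespace, trailing periods, and case so that
    formatting differences (e.g. ' 42. ' vs '42') do not affect the score."""
    return answer.strip().strip(".").lower()


def score(question: Question, model_answer: str) -> int:
    """Return 1 for a correct answer, 0 otherwise -- no LLM or human judge."""
    return int(normalize(model_answer) == normalize(question.ground_truth))


q = Question(prompt="What is 6 * 7?", ground_truth="42")
print(score(q, " 42. "))  # matches after normalization -> 1
print(score(q, "41"))     # wrong answer -> 0
```

Real tasks use richer per‑task parsers (e.g. extracting a boxed final answer), but the principle is the same: the score is fully determined by the stored ground truth.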
Dataset Content
LiveBench comprises multiple tasks covering the following categories:
- Reasoning
- Programming
- Mathematics
- Data Analysis
- Language
- Instruction Following
Dataset Usage
Users can evaluate their models by submitting an issue on GitHub or emailing livebench.ai@gmail.com.
Dataset Origin
LiveBench was developed collaboratively by:
- Abacus.AI: Colin White, Samuel Dooley, Manley Roberts, Arka Pal
- NYU: Ben Feuer, Ravid Shwartz‑Ziv, Chinmay Hegde, Yann LeCun, Micah Goldblum
- Nvidia: Siddhartha Jain
- UMD: Tom Goldstein
- USC: Willie Neiswanger
Citation
To cite the LiveBench dataset, use the following BibTeX entry:
@article{livebench,
  author  = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
  title   = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
  journal = {arXiv preprint arXiv:2406.19314},
  year    = {2024},
}