Dataset asset · Open Source Community · Language Models · Benchmarking

LiveBench

LiveBench is a large‑language‑model (LLM) benchmark created jointly by Abacus.AI, NYU, Nvidia, UMD, and USC. It contains 18 tasks spanning mathematics, programming, reasoning, language understanding, instruction following, and data analysis. LiveBench's questions are sourced from up‑to‑date materials such as recent math competitions, arXiv papers, news articles, and datasets, and answers are automatically scored against objective facts, eliminating the need for LLM or human judges. The benchmark aims to address data contamination issues in traditional evaluations, ensuring fairness and validity.

Source
arXiv
Created
Jun 28, 2024
Updated
Jun 28, 2024
Signals
493 views
Availability
Linked source ready
Overview

Dataset description and usage context

LiveBench Dataset Overview

Dataset Introduction

LiveBench is a benchmark specifically designed for large language models (LLMs) to avoid test‑set contamination and enable objective evaluation. Its key characteristics are:

  • Regular Updates: New questions are released monthly, based on recent datasets, arXiv papers, news articles, and IMDb movie summaries.
  • Objective Scoring: Every question has a verifiable, objective correct answer, allowing automatic accurate scoring without LLM judges.
  • Diversity: Currently includes 18 distinct tasks across 6 categories, with more challenging tasks to be added regularly.
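Because every LiveBench question has a verifiable ground-truth answer, scoring reduces to automatic comparison rather than LLM or human judging. The sketch below illustrates that idea with a simple normalized exact-match grader; the record fields (`answer`, `response`) are illustrative assumptions, not LiveBench's actual schema, and LiveBench's real graders are task-specific.

```python
# Minimal sketch of objective, judge-free scoring: each question carries a
# ground-truth answer, and a model response is graded by normalized exact match.
# The record structure here is an illustrative assumption, not LiveBench's schema.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.strip().lower().split())

def score_response(ground_truth: str, model_response: str) -> int:
    """Return 1 if the normalized response matches the ground truth, else 0."""
    return int(normalize(model_response) == normalize(ground_truth))

def benchmark_accuracy(records: list[dict]) -> float:
    """Average exact-match score over records shaped like {'answer': ..., 'response': ...}."""
    if not records:
        return 0.0
    return sum(score_response(r["answer"], r["response"]) for r in records) / len(records)
```

In practice, a per-task grader (e.g. numeric comparison for math, unit tests for code) would replace the exact-match check, but the principle is the same: the score is computed from the objective answer, not from a judge model's opinion.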

Dataset Content

LiveBench comprises multiple tasks covering the following categories:

  • Reasoning
  • Programming
  • Mathematics
  • Data Analysis
  • Language
  • Instruction Following

Dataset Usage

Users can submit their models for evaluation by opening an issue on GitHub or emailing livebench.ai@gmail.com.

Dataset Origin

LiveBench was developed collaboratively by:

  • Abacus.AI: Colin White, Samuel Dooley, Manley Roberts, Arka Pal
  • NYU: Ben Feuer, Ravid Shwartz‑Ziv, Chinmay Hegde, Yann LeCun, Micah Goldblum
  • Nvidia: Siddhartha Jain
  • UMD: Tom Goldstein
  • USC: Willie Neiswanger

Citation

To cite the LiveBench dataset, use the following BibTeX entry:

@article{livebench,
  author    = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
  title     = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
  journal   = {arXiv preprint arXiv:2406.19314},
  year      = {2024},
}