
LiveBench

LiveBench is a large‑language‑model (LLM) benchmark created jointly by Abacus.AI, NYU, Nvidia, UMD, and USC. It contains 18 tasks spanning mathematics, programming, reasoning, language understanding, instruction following, and data analysis. LiveBench's questions are sourced from up‑to‑date materials such as recent math competitions, arXiv papers, news articles, and datasets, and answers are automatically scored against objective facts, eliminating the need for LLM or human judges. The benchmark aims to address data contamination issues in traditional evaluations, ensuring fairness and validity.

Updated 6/28/2024
arXiv

Description

LiveBench Dataset Overview

Dataset Introduction

LiveBench is a benchmark specifically designed for large language models (LLMs) to avoid test‑set contamination and enable objective evaluation. Its key characteristics are:

  • Regular Updates: New questions are released monthly, based on recent datasets, arXiv papers, news articles, and IMDb movie summaries.
  • Objective Scoring: Every question has a verifiable, objectively correct answer, allowing automatic, accurate scoring without LLM judges.
  • Diversity: Currently includes 18 distinct tasks across 6 categories, with more challenging tasks to be added regularly.
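
Because every question ships with a verifiable ground-truth answer, scoring can be a deterministic comparison rather than a judge model. The sketch below illustrates the idea with a minimal exact-match scorer; the function names and the normalization rule are illustrative assumptions, not LiveBench's actual scoring code or schema.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting
    differences are not counted as errors."""
    return " ".join(answer.strip().lower().split())

def score_exact_match(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the normalized answers match, else 0.0 --
    fully automatic, no LLM or human judge needed."""
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

# Example: objective scoring of a single benchmark item.
print(score_exact_match("  Paris ", "paris"))  # 1.0
```

Real tasks typically need task-specific checkers (e.g., parsing a final numeric answer from a math solution), but the principle is the same: compare against a fixed ground truth.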

Dataset Content

LiveBench comprises multiple tasks covering the following categories:

  • Reasoning
  • Programming
  • Mathematics
  • Data Analysis
  • Language
  • Instruction Following

Dataset Usage

Users can evaluate their models by submitting an issue on GitHub or emailing livebench.ai@gmail.com.

Dataset Origin

LiveBench was developed collaboratively by:

  • Abacus.AI: Colin White, Samuel Dooley, Manley Roberts, Arka Pal
  • NYU: Ben Feuer, Ravid Shwartz‑Ziv, Chinmay Hegde, Yann LeCun, Micah Goldblum
  • Nvidia: Siddhartha Jain
  • UMD: Tom Goldstein
  • USC: Willie Neiswanger

Citation

To cite the LiveBench dataset, use the following BibTeX entry:

@article{livebench,
  author    = {White, Colin and Dooley, Samuel and Roberts, Manley and Pal, Arka and Feuer, Ben and Jain, Siddhartha and Shwartz-Ziv, Ravid and Jain, Neel and Saifullah, Khalid and Naidu, Siddartha and Hegde, Chinmay and LeCun, Yann and Goldstein, Tom and Neiswanger, Willie and Goldblum, Micah},
  title     = {LiveBench: A Challenging, Contamination-Free LLM Benchmark},
  journal   = {arXiv preprint arXiv:2406.19314},
  year      = {2024},
}



Topics

Language Models
Benchmarking

