Dataset asset, Open Source Community, Commercial Finance, Quantitative Reasoning

kensho/bizbench

---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: question
    dtype: string
  - name: answer
    dtype: string
  - name: task
    dtype: string
  - name: context
    dtype: string
  - name: context_type
    dtype: string
  - name: options
    sequence: string
  - name: program
    dtype: string
  splits:
  - name: train
    num_bytes: 52823429
    num_examples: 14377
  - name: test
    num_bytes: 15720371
    num_examples: 4673
  download_size: 23760863
  dataset_size: 68543800
---

<p align="left">
  <img src="bizbench_pyramid.png">
</p>

# BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Public dataset for [BizBench](https://arxiv.org/abs/2311.06602).

Answering questions within business and finance requires reasoning, precision, and a wide breadth of technical knowledge. Together, these requirements make this domain difficult for large language models (LLMs). We introduce BizBench, a benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises **eight quantitative reasoning tasks**, focusing on question-answering (QA) over financial data via program synthesis. We include three financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate the reasoning capabilities required for financial QA: reading comprehension of financial text and tables for extracting intermediate values, and understanding financial concepts and formulas needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to parse financial documents, and capacity to solve problems with code. We conducted an in-depth evaluation of open-source and commercial LLMs, comparing and contrasting the behavior of code-focused and language-focused models. We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.

We have also developed a heavily curated leaderboard with a held-out test set open to submission: [https://benchmarks.kensho.com/](https://benchmarks.kensho.com/). This set was manually curated by financial professionals and further cleaned by hand in order to ensure the highest quality.

A sample pipeline for using this dataset can be found at [https://github.com/kensho-technologies/benchmarks-pipeline](https://github.com/kensho-technologies/benchmarks-pipeline).

## Dataset Statistics

| Dataset | Train/Few Shot Data | Test Data |
| --- | --- | --- |
| **Program Synthesis** | | |
| FinCode | 7 | 47 |
| CodeFinQA | 4668 | 795 |
| CodeTATQA | 2856 | 2000 |
| **Quantity Extraction** | | |
| ConvFinQA (E) | | 629 |
| TAT-QA (E) | | 120 |
| SEC-Num | 6846 | 2000 |
| **Domain Knowledge** | | |
| FinKnow | | 744 |
| FormulaEval | | 50 |
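The description above frames the program synthesis tasks as QA answered by generated code. A minimal sketch of how such a program might be scored is shown below. It assumes the program is Python and binds its result to a variable named `answer`, and it uses a 1% relative tolerance; the variable name, the tolerance, and the helper names `run_program` and `is_correct` are illustrative assumptions, not the benchmark's official evaluation protocol.

```python
import math


def run_program(program: str, result_var: str = "answer"):
    """Execute a generated program and return the value bound to result_var.

    Assumption: the program is plain Python that stores its result in a
    variable called `answer`. exec() runs arbitrary code, so only use it
    on trusted input or inside a sandbox.
    """
    namespace: dict = {}
    exec(program, namespace)
    return namespace[result_var]


def is_correct(predicted, gold: str, rel_tol: float = 1e-2) -> bool:
    """Compare a prediction with the gold answer string.

    Numeric answers are compared with a relative tolerance (an assumed
    value); anything that cannot be parsed as a number falls back to an
    exact string match after stripping whitespace.
    """
    try:
        return math.isclose(float(predicted), float(gold), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return str(predicted).strip() == gold.strip()


# Hypothetical example, not taken from the dataset.
example_program = (
    "revenue = 120.0\n"
    "cost = 90.0\n"
    "answer = (revenue - cost) / revenue * 100\n"
)
prediction = run_program(example_program)
print(prediction, is_correct(prediction, "25.0"))
```

The string fallback keeps the same helper usable for non-numeric answers, such as a multiple-choice label.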

Source: hugging_face
Created: Nov 28, 2025
Updated: Jun 3, 2024
Overview

Dataset description and usage context

BizBench Dataset Overview

Dataset Information

License

  • Apache 2.0

Configuration

  • Default configuration
    • Training data path: data/train-*
    • Test data path: data/test-*

Features

  • question: string
  • answer: string
  • task: string
  • context: string
  • context_type: string
  • options: sequence of strings
  • program: string
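The feature list above can be turned into a small schema check. The sketch below validates a record against those seven columns and includes a lazy loader that relies on the Hugging Face `datasets` library being installed. Treating `None` as an allowed value is an assumption (some tasks may leave fields such as `options` or `program` empty), and `schema_errors` and `load_bizbench` are hypothetical helper names.

```python
from typing import Any

# Column names and Python types corresponding to the listed features.
BIZBENCH_FEATURES = {
    "question": str,
    "answer": str,
    "task": str,
    "context": str,
    "context_type": str,
    "options": list,  # sequence of strings
    "program": str,
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    errors = []
    for name, expected in BIZBENCH_FEATURES.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # Assumption: null values are permitted for fields a task does not use.
        if value is not None and not isinstance(value, expected):
            errors.append(f"{name}: expected {expected.__name__}")
    options = record.get("options")
    if isinstance(options, list) and not all(isinstance(o, str) for o in options):
        errors.append("options: every entry must be a string")
    return errors


def load_bizbench(split: str = "train"):
    """Load one split from the Hugging Face Hub (needs `datasets` and network)."""
    from datasets import load_dataset  # imported lazily so the check above has no dependencies

    return load_dataset("kensho/bizbench", split=split)


# Hypothetical record for illustration only; not taken from the dataset.
sample = {
    "question": "What is the gross margin in percent?",
    "answer": "25.0",
    "task": "FinCode",
    "context": None,
    "context_type": None,
    "options": None,
    "program": "answer = (120.0 - 90.0) / 120.0 * 100",
}
print(schema_errors(sample))
```

Calling `load_bizbench("test")` would return the test split, after which `schema_errors` can be applied to each row.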

Data Splits

  • Training set
    • Bytes: 52,823,429
    • Samples: 14,377
  • Test set
    • Bytes: 15,720,371
    • Samples: 4,673

Size

  • Download size: 23,760,863 bytes
  • Dataset size: 68,543,800 bytes

Statistics

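| Dataset | Train/Few Shot Data | Test Data |
| --- | --- | --- |
| **Program Synthesis** | | |
| FinCode | 7 | 47 |
| CodeFinQA | 4,668 | 795 |
| CodeTATQA | 2,856 | 2,000 |
| **Quantity Extraction** | | |
| ConvFinQA (E) | | 629 |
| TAT-QA (E) | | 120 |
| SEC-Num | 6,846 | 2,000 |
| **Domain Knowledge** | | |
| FinKnow | | 744 |
| FormulaEval | | 50 |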
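The per-task counts listed above can be cross-checked against the split sizes with a few lines of arithmetic. In this sketch, empty cells are represented as `None` and counted as zero. The train column sums to 14,377, matching the training split; the test column sums to 6,385, while the split metadata lists 4,673 test examples, a difference the source does not explain.

```python
# (task, train/few-shot count, test count); None marks an empty cell in the table.
TASK_COUNTS = [
    ("FinCode", 7, 47),
    ("CodeFinQA", 4668, 795),
    ("CodeTATQA", 2856, 2000),
    ("ConvFinQA (E)", None, 629),
    ("TAT-QA (E)", None, 120),
    ("SEC-Num", 6846, 2000),
    ("FinKnow", None, 744),
    ("FormulaEval", None, 50),
]


def column_total(rows, column: int) -> int:
    """Sum one numeric column, treating empty cells (None) as zero."""
    return sum(row[column] or 0 for row in rows)


train_total = column_total(TASK_COUNTS, 1)
test_total = column_total(TASK_COUNTS, 2)
print(f"train: {train_total:,}  test: {test_total:,}")
```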