
DocuQA

This dataset is designed for testing document‑based question‑answering applications and APIs that accept PDF documents as input. It contains 20 distinct documents, each accompanied by 5 questions of different types, for a total of 100 evaluation questions. The documents span journal articles, news reports, financial statements, and tutorials, and the questions assess a QA system's ability to understand context, recognize keywords, and extract specific information.

Updated 2/15/2024

Description

Dataset Overview

Dataset Name

Document‑Based Question Answering Dataset

Purpose

To test PDF‑document‑based question‑answering applications or interfaces.

Content

  • Number of Documents: 20
  • Question Types per Document: 5 (total 100 questions)
  • Document Types:
    • Journal articles (5): contain calculations, formulas, and numerical data
    • News articles (5): contain specific headlines and dates
    • Reports / Financial reports / News (5): contain specific numbers and monetary data
    • Tutorials (5): provide step‑by‑step instructions, including numerical values and units

Questions & Answers

  • Question Design: Five question types per document, covering diverse aspects to comprehensively evaluate QA capability
  • Answer Format: An answer key of ground‑truth answers is provided for grading

Accuracy Computation

  • Method: Accuracy is the proportion of questions marked "TRUE" (answered correctly according to the answer key) out of the total number of questions. It gauges the system's ability to extract accurate information from varied document types
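
The accuracy computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's own scoring script: the function name `accuracy`, the list-of-strings input, and the sample counts (83 correct out of 100) are all assumptions made for the example.

```python
def accuracy(judgements):
    """Return the fraction of judgements that equal "TRUE".

    `judgements` is a hypothetical list with one entry per question,
    each either "TRUE" (answer matched the key) or "FALSE".
    """
    if not judgements:
        raise ValueError("no judgements to score")
    correct = sum(1 for j in judgements if j == "TRUE")
    return correct / len(judgements)


# Made-up grading results for the 100 questions: 83 correct, 17 incorrect.
judgements = ["TRUE"] * 83 + ["FALSE"] * 17
print(f"accuracy = {accuracy(judgements):.2%}")
```

How each answer is judged "TRUE" or "FALSE" against the answer key (exact match, manual review, or something else) is not specified here, so the sketch takes the judgements as given.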

Use Cases

  • Evaluate performance of QA systems handling heterogeneous document and question types
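
To compare performance across the four document types, the same proportion can be computed per group. The sketch below assumes results are available as (document type, judgement) pairs; the helper name `accuracy_by_group`, the short type labels, and the sample judgements are hypothetical. Each type covers 5 documents with 5 questions, so 25 questions per group.

```python
from collections import defaultdict


def accuracy_by_group(results):
    """Map each group label to its share of "TRUE" judgements.

    `results` is an iterable of (group, judgement) pairs, where
    judgement is "TRUE" or "FALSE".
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, judgement in results:
        totals[group] += 1
        if judgement == "TRUE":
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Made-up results: 25 questions per document type (5 documents x 5 questions).
results = (
    [("journal", "TRUE")] * 20 + [("journal", "FALSE")] * 5
    + [("news", "TRUE")] * 24 + [("news", "FALSE")] * 1
    + [("report", "TRUE")] * 15 + [("report", "FALSE")] * 10
    + [("tutorial", "TRUE")] * 25
)

by_type = accuracy_by_group(results)
for doc_type, score in by_type.items():
    print(f"{doc_type}: {score:.0%}")
```

The same helper works for a per-question-type breakdown by passing the question type as the group label instead of the document type.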


Topics

Document QA
Natural Language Processing

Source

Organization: github

Created: 2/14/2024
