eliolio/docvqa

DocVQA is a dataset for visual question answering on document images, containing 50,000 questions over 12,767 images. It is randomly split 80‑10‑10 into training, validation, and test sets: 39,463 questions and 10,194 images for training, 5,349 questions and 1,286 images for validation, and 5,188 questions and 1,287 images for testing. The document images originate from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.

Updated 10/11/2022

Description

DocVQA – A Dataset for VQA on Document Images

Dataset Overview

  • Name: DocVQA
  • Task Type: Document Image Question‑Answering
  • Source: Document images from the UCSF Industry Documents Library, covering printed, typed, and handwritten content.

Structure

  • Total Questions: 50,000
  • Total Images: 12,767
  • Splits: Random 80‑10‑10 for training, validation, and test.
    • Training: 39,463 questions, 10,194 images
    • Validation: 5,349 questions, 1,286 images
    • Test: 5,188 questions, 1,287 images
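
As a quick sanity check on the figures above, the following minimal sketch recomputes the totals and each split's percentage share. The names `SPLITS` and `split_shares` are illustrative only and are not part of the dataset or any library; the numbers are copied from the list above.

```python
# Split sizes as listed above: (questions, images) per split.
SPLITS = {
    "train": (39_463, 10_194),
    "validation": (5_349, 1_286),
    "test": (5_188, 1_287),
}


def split_shares(splits):
    """Return total questions, total images, and per-split percentage shares."""
    total_q = sum(q for q, _ in splits.values())
    total_i = sum(i for _, i in splits.values())
    shares = {
        name: (round(100 * q / total_q, 1), round(100 * i / total_i, 1))
        for name, (q, i) in splits.items()
    }
    return total_q, total_i, shares


total_questions, total_images, shares = split_shares(SPLITS)
print(total_questions, total_images)
for name, (q_pct, i_pct) in shares.items():
    print(f"{name}: {q_pct}% of questions, {i_pct}% of images")
```

The per-split counts add up to the stated totals. The image shares land closer to 80‑10‑10 than the question shares do, which is consistent with the random split having been drawn over images rather than questions, although the listing does not state this explicitly.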

Citation

@InProceedings{mathew2021docvqa,
  author    = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, CV},
  title     = {Docvqa: A dataset for vqa on document images},
  booktitle = {Proceedings of the IEEE/CVF winter conference on applications of computer vision},
  year      = {2021},
  pages     = {2200--2209},
}

Topics

Visual Question Answering
Machine Learning

Source

Organization: hugging_face

Created: Unknown
