eliolio/docvqa

DocVQA is a dataset for visual question answering on document images, containing 50,000 questions based on 12,767 images. It is randomly split 80‑10‑10 into train, validation, and test sets: 39,463 questions and 10,194 images for training, 5,349 questions and 1,286 images for validation, and 5,188 questions and 1,287 images for testing. The document images come from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.

Source

hugging_face

Created

Nov 28, 2025

Updated

Oct 11, 2022

Overview

Dataset description and usage context

DocVQA – A Dataset for VQA on Document Images

Dataset Overview

Name: DocVQA
Task Type: Document Image Question‑Answering
Source: Document images from the UCSF Industry Documents Library, covering printed, typed, and handwritten content.

Structure

Total Questions: 50,000
Total Images: 12,767
Splits: Random 80‑10‑10 for training, validation, and test.
- Training: 39,463 questions, 10,194 images
- Validation: 5,349 questions, 1,286 images
- Test: 5,188 questions, 1,287 images
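
The split figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the dataset's tooling: the `SPLITS` dictionary and the `split_summary` helper are hypothetical names introduced here, and the counts are copied from this page.

```python
# Split sizes as reported on this page (copied by hand, not read from the dataset).
SPLITS = {
    "train": {"questions": 39_463, "images": 10_194},
    "validation": {"questions": 5_349, "images": 1_286},
    "test": {"questions": 5_188, "images": 1_287},
}


def split_summary(splits):
    """Return overall totals and each split's percentage share of questions and images."""
    totals = {
        key: sum(counts[key] for counts in splits.values())
        for key in ("questions", "images")
    }
    shares = {
        name: {key: round(100 * counts[key] / totals[key], 1) for key in totals}
        for name, counts in splits.items()
    }
    return totals, shares


totals, shares = split_summary(SPLITS)
print(totals)  # should match the stated 50,000 questions and 12,767 images
for name, share in shares.items():
    print(name, share)
```

The per-split counts sum to the stated totals. The image shares land closer to 80‑10‑10 than the question shares do, which would be consistent with the random split having been drawn over images rather than questions, although this page does not say so explicitly.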

Access

The dataset can be downloaded from the “Downloads” tab of the Robust Reading Competition (RRC) challenge page.

Citation

@InProceedings{mathew2021docvqa,
  author    = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, CV},
  title     = {Docvqa: A dataset for vqa on document images},
  booktitle = {Proceedings of the IEEE/CVF winter conference on applications of computer vision},
  year      = {2021},
  pages     = {2200--2209},
}
