eliolio/docvqa
DocVQA is a dataset for visual question answering on document images, containing 50,000 questions based on 12,767 images. It is split 80‑10‑10 into train, validation, and test sets (39,463 questions & 10,194 images for training, 5,349 questions & 1,286 images for validation, 5,188 questions & 1,287 images for testing). Document images originate from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.
DocVQA – A Dataset for VQA on Document Images
Dataset Overview
- Name: DocVQA
- Task Type: Document Image Question‑Answering
- Source: Document images from the UCSF Industry Documents Library, covering printed, typed, and handwritten content.
Structure
- Total Questions: 50,000
- Total Images: 12,767
- Splits: Random 80‑10‑10 for training, validation, and test.
  - Training: 39,463 questions, 10,194 images
  - Validation: 5,349 questions, 1,286 images
  - Test: 5,188 questions, 1,287 images
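As a quick sanity check on the figures above, the sketch below recomputes each split's share of questions and images. The counts are copied from this card; the names `SPLITS` and `split_shares` are illustrative and not part of any DocVQA tooling.

```python
# Split sizes as listed in this card: (questions, images) per split.
SPLITS = {
    "train": (39_463, 10_194),
    "validation": (5_349, 1_286),
    "test": (5_188, 1_287),
}


def split_shares(splits):
    """Return each split's share of questions and images, in percent."""
    total_questions = sum(q for q, _ in splits.values())
    total_images = sum(i for _, i in splits.values())
    return {
        name: (
            round(100 * q / total_questions, 1),
            round(100 * i / total_images, 1),
        )
        for name, (q, i) in splits.items()
    }


shares = split_shares(SPLITS)
for name, (question_pct, image_pct) in shares.items():
    print(f"{name}: {question_pct}% of questions, {image_pct}% of images")
```

The per-split counts add up to the stated totals of 50,000 questions and 12,767 images. The image shares come out closer to 80-10-10 than the question shares do, which would be consistent with the random split being drawn over images rather than questions, though the card does not state this explicitly.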
Access
- The dataset can be downloaded from the “Downloads” tab of the Robust Reading Competition (RRC) challenge page.
Citation
@InProceedings{mathew2021docvqa,
  author    = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, CV},
  title     = {Docvqa: A dataset for vqa on document images},
  booktitle = {Proceedings of the IEEE/CVF winter conference on applications of computer vision},
  year      = {2021},
  pages     = {2200--2209},
}