eliolio/docvqa
DocVQA is a dataset for visual question answering on document images, containing 50,000 questions posed on 12,767 images. It is randomly split 80‑10‑10 into train, validation, and test sets (39,463 questions and 10,194 images for training, 5,349 questions and 1,286 images for validation, 5,188 questions and 1,287 images for testing). The document images come from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.
Description
DocVQA – A Dataset for VQA on Document Images
Dataset Overview
- Name: DocVQA
- Task Type: Visual question answering (VQA) on document images
- Source: Document images from the UCSF Industry Documents Library, covering printed, typed, and handwritten content.
Structure
- Total Questions: 50,000
- Total Images: 12,767
- Splits: Random 80‑10‑10 split into training, validation, and test sets.
- Training: 39,463 questions, 10,194 images
- Validation: 5,349 questions, 1,286 images
- Test: 5,188 questions, 1,287 images
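As a quick consistency check on the split figures above, the Python sketch below recomputes the totals and the per-split shares directly from the stated counts. It uses only the numbers listed on this card; the `SPLITS` dictionary and the `split_report` helper are hypothetical names for illustration, not part of any DocVQA tooling.

```python
# Split sizes as stated on this dataset card: (questions, images).
# SPLITS and split_report are illustrative names, not an official API.
SPLITS = {
    "train": (39_463, 10_194),
    "validation": (5_349, 1_286),
    "test": (5_188, 1_287),
}
TOTAL_QUESTIONS = 50_000
TOTAL_IMAGES = 12_767


def split_report(splits):
    """Return total questions, total images, and per-split shares."""
    n_q = sum(q for q, _ in splits.values())
    n_i = sum(i for _, i in splits.values())
    report = {}
    for name, (q, i) in splits.items():
        report[name] = {
            "question_share": q / n_q,      # fraction of all questions
            "image_share": i / n_i,         # fraction of all images
            "questions_per_image": q / i,   # average questions per image
        }
    return n_q, n_i, report


n_q, n_i, report = split_report(SPLITS)
# The per-split counts should add up to the stated totals.
assert n_q == TOTAL_QUESTIONS and n_i == TOTAL_IMAGES
for name, r in report.items():
    print(f"{name:<10} questions {r['question_share']:6.1%}  "
          f"images {r['image_share']:6.1%}  "
          f"q/img {r['questions_per_image']:.2f}")
```

Run as is, the image shares come out at roughly 79.8 / 10.1 / 10.1 percent, while the question shares are about 78.9 / 10.7 / 10.4 percent. This fits an 80‑10‑10 split drawn over images, with each question following its image, but that is an inference from the counts and is not stated on this card.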
Access
- The dataset can be downloaded from the “Downloads” tab of the DocVQA challenge page on the Robust Reading Competition (RRC) portal.
Citation
@InProceedings{mathew2021docvqa,
  author    = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, C. V.},
  title     = {{DocVQA}: A Dataset for {VQA} on Document Images},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2021},
  pages     = {2200--2209},
}
Source
Organization: hugging_face
Created: Unknown