
eliolio/docvqa

DocVQA is a dataset for visual question answering on document images, containing 50,000 questions over 12,767 images. It is randomly split 80‑10‑10 into train, validation, and test sets: 39,463 questions and 10,194 images for training, 5,349 questions and 1,286 images for validation, and 5,188 questions and 1,287 images for testing. The document images come from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.

Source: hugging_face
Created: Nov 28, 2025
Updated: Oct 11, 2022

DocVQA – A Dataset for VQA on Document Images

Dataset Overview

  • Name: DocVQA
  • Task Type: Document Image Question‑Answering
  • Source: Document images from the UCSF Industry Documents Library, covering printed, typed, and handwritten content.

Structure

  • Total Questions: 50,000
  • Total Images: 12,767
  • Splits: Random 80‑10‑10 for training, validation, and test.
    • Training: 39,463 questions, 10,194 images
    • Validation: 5,349 questions, 1,286 images
    • Test: 5,188 questions, 1,287 images
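
As a quick sanity check on the figures above, the following sketch sums the per-split counts and computes each split's share of the total. The names `SPLITS`, `totals`, and `fractions` are illustrative only and are not part of the dataset or any library.

```python
# Per-split counts copied from the list above.
SPLITS = {
    "train": {"questions": 39_463, "images": 10_194},
    "validation": {"questions": 5_349, "images": 1_286},
    "test": {"questions": 5_188, "images": 1_287},
}


def totals(splits):
    """Sum questions and images across all splits."""
    return {
        key: sum(counts[key] for counts in splits.values())
        for key in ("questions", "images")
    }


def fractions(splits, key):
    """Return each split's share of the total for `key` ('questions' or 'images')."""
    total = sum(counts[key] for counts in splits.values())
    return {name: counts[key] / total for name, counts in splits.items()}


if __name__ == "__main__":
    print(totals(SPLITS))
    for key in ("questions", "images"):
        shares = fractions(SPLITS, key)
        print(key, {name: f"{share:.1%}" for name, share in shares.items()})
```

With these figures the image-level shares come out close to 80/10/10, while the question-level shares are roughly 78.9/10.7/10.4. That would be consistent with the random split having been drawn over images rather than questions, although the page does not say so.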

Access
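
This page lists the dataset as hosted on Hugging Face under the identifier eliolio/docvqa. A minimal sketch of loading it with the Hugging Face `datasets` library might look like the following. It assumes the library is installed and that the repository can be loaded through `load_dataset`, neither of which this page confirms. The wrapper name `load_docvqa` is hypothetical, and the split and column names should be inspected after loading rather than assumed.

```python
DATASET_ID = "eliolio/docvqa"  # identifier as shown on this page


def load_docvqa(split=None):
    """Load the dataset from the Hugging Face Hub (requires network access).

    `split=None` returns every available split; pass a split name to load one.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


# Example usage (downloads data, so it is left commented out):
# dataset = load_docvqa()
# print(dataset)  # inspect the available splits and columns first
```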

Citation

@InProceedings{mathew2021docvqa,
  author    = {Mathew, Minesh and Karatzas, Dimosthenis and Jawahar, CV},
  title     = {Docvqa: A dataset for vqa on document images},
  booktitle = {Proceedings of the IEEE/CVF winter conference on applications of computer vision},
  year      = {2021},
  pages     = {2200--2209},
}
