vidore/infovqa_test_subsampled

This dataset is a subsample of the test split of the InfoVQA dataset. It contains infographics collected from the internet, paired with manually annotated questions and answers. To keep the benchmark consistent, the original test set was subsampled to 500 pairs and its columns were renamed. Each data instance includes features such as questionId, query, and image.

Source

Hugging Face

Created

Nov 28, 2025

Updated

Jun 27, 2024

Dataset Overview

Dataset Description

Source: The dataset is drawn from the test split of InfoVQA, whose infographics were collected from the internet by searching for “infographics”. The questions and answers were manually annotated.

Data Structure

Features:
- questionId: Question ID (string)
- query: Query content (string)
- answer: Answer (null; left empty)
- answer_type: Answer type (null; left empty)
- image: Image (image)
- image_filename: Image filename (string)
- operation/reasoning: Operation/Reasoning (null; left empty)
- ocr: OCR text (string)
- data_split: Data split (string)
- source: Data source (string)
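
The feature list above can be turned into a quick sanity check on a loaded row. The sketch below is illustrative only: `EXPECTED_COLUMNS`, `EMPTY_COLUMNS`, and `check_record` are hypothetical names, and it assumes that the three features listed without a value come back as `None` when a row is read.

```python
# Column names copied from the feature list above.
EXPECTED_COLUMNS = [
    "questionId", "query", "answer", "answer_type", "image",
    "image_filename", "operation/reasoning", "ocr", "data_split", "source",
]
# Features listed as empty (assumption: they load as None).
EMPTY_COLUMNS = ["answer", "answer_type", "operation/reasoning"]


def check_record(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one row."""
    problems = []
    for name in EXPECTED_COLUMNS:
        if name not in record:
            problems.append(f"missing column: {name}")
    for name in EMPTY_COLUMNS:
        if record.get(name) is not None:
            problems.append(f"expected empty column: {name}")
    return problems


# Hypothetical row with placeholder values, for illustration only.
example = {name: "placeholder" for name in EXPECTED_COLUMNS}
for name in EMPTY_COLUMNS:
    example[name] = None
print(check_record(example))  # → []
```

On real data, the same function could be called on a single row of the loaded split, for example `check_record(ds[0])`.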

Data Split

Test set:
- test: Contains 500 samples, total size 277,995,931 bytes.

Dataset Size

Download size: 218,577,138 bytes.
Dataset size: 277,995,931 bytes.
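
To put these byte counts in more familiar units, the short calculation below converts them to mebibytes and derives an average size per sample. It uses only the figures reported above; the variable and function names are illustrative. The results come out at roughly 265 MiB for the dataset, 208 MiB for the download, and about 556 kB per sample.

```python
DATASET_BYTES = 277_995_931   # dataset size reported above
DOWNLOAD_BYTES = 218_577_138  # download size reported above
NUM_SAMPLES = 500             # rows in the test split

MIB = 1024 ** 2  # bytes per mebibyte


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


dataset_mib = to_mib(DATASET_BYTES)
download_mib = to_mib(DOWNLOAD_BYTES)
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES  # download size relative to dataset size

print(f"dataset:    {dataset_mib:.1f} MiB")
print(f"download:   {download_mib:.1f} MiB")
print(f"per sample: {avg_bytes_per_sample:,.0f} bytes on average")
print(f"download is {download_ratio:.0%} of the dataset size")
```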

Data Loading

Loading method:

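# Requires the Hugging Face `datasets` library (pip install datasets).
from datasets import load_dataset

# Download (or reuse the cached copy of) the 500-sample test split.
ds = load_dataset("vidore/infovqa_test_subsampled", split="test")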
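
Once the split is loaded, a common first step is to see which questions belong to which infographic. The sketch below groups queries by `image_filename`. The helper names and the demo rows are hypothetical, and whether several questions actually share an image in this subsample is not stated in the card. `load_text_columns` drops the `image` column with `Dataset.remove_columns` so that iterating does not decode every image; it needs network access and is not called in the demo.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def group_queries_by_image(rows: Iterable[Mapping]) -> dict[str, list[str]]:
    """Map each image_filename to the list of queries asked about it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["image_filename"]].append(row["query"])
    return dict(grouped)


def load_text_columns():
    """Load the test split without the image column (needs network access)."""
    # Imported lazily so the grouping helper runs without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset("vidore/infovqa_test_subsampled", split="test")
    return ds.remove_columns("image")


# Hypothetical rows standing in for real data.
demo_rows = [
    {"image_filename": "a.jpeg", "query": "What is shown?"},
    {"image_filename": "a.jpeg", "query": "Which year is listed?"},
    {"image_filename": "b.jpeg", "query": "How many items?"},
]
print(group_queries_by_image(demo_rows))
```

With the real data, the call would be `group_queries_by_image(load_text_columns())`.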

Citation Information

Citation format:

@misc{mathew_infographicvqa_2021,
  title = {{InfographicVQA}},
  copyright = {arXiv.org perpetual, non-exclusive license},
  url = {https://arxiv.org/abs/2104.12756},
  doi = {10.48550/ARXIV.2104.12756},
  urldate = {2024-06-02},
  publisher = {arXiv},
  author = {Mathew, Minesh and Bagal, Viraj and Tito, Rubèn Pérez and Karatzas, Dimosthenis and Valveny, Ernest and Jawahar, C. V},
  year = {2021},
  note = {Version Number: 2},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV)},
}
