vidore/infovqa_test_subsampled

This dataset is a subsample of the InfoVQA test split. It contains infographics collected from the internet, with manually annotated questions and answers. For consistency across benchmarks, the original test set was subsampled to 500 pairs and the columns were renamed. Each data instance includes features such as questionId, query, and image.

Updated 6/27/2024
hugging_face

Dataset Overview

Dataset Description

  • Source: Extracted from the test split of the InfoVQA dataset. The infographics were collected from the internet by searching for “infographics”, and the questions and answers were manually annotated.

Data Structure

  • Features:
    • questionId: Question ID (string)
    • query: Query content (string)
    • answer: Answer (empty, no value provided)
    • answer_type: Answer type (empty, no value provided)
    • image: Image (image)
    • image_filename: Image filename (string)
    • operation/reasoning: Operation/Reasoning (empty, no value provided)
    • ocr: OCR text (string)
    • data_split: Data split (string)
    • source: Data source (string)
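As a rough illustration of the feature list above, the following sketch models one record as a plain Python dictionary and checks that it carries the expected keys. The field names are copied from the list. The helper name `missing_fields` and every value in `example` are hypothetical and do not come from the dataset.

```python
# Field names copied from the feature list above.
EXPECTED_FIELDS = {
    "questionId", "query", "answer", "answer_type", "image",
    "image_filename", "operation/reasoning", "ocr", "data_split", "source",
}


def missing_fields(record: dict) -> set:
    """Return the expected field names that are absent from a record."""
    return EXPECTED_FIELDS - set(record)


# Hypothetical record with made-up values, for illustration only.
example = {
    "questionId": "0",
    "query": "What does the chart show?",
    "answer": None,               # listed as empty
    "answer_type": None,          # listed as empty
    "image": None,                # placeholder; the real field holds an image
    "image_filename": "example.jpeg",
    "operation/reasoning": None,  # listed as empty
    "ocr": "",
    "data_split": "test",
    "source": "infovqa",
}

print(missing_fields(example))
```

A check like this can catch renamed or dropped columns early, before a benchmark script fails on a missing key.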

Data Split

  • Test set:
    • test: Contains 500 samples, total size 277,995,931 bytes.

Dataset Size

  • Download size: 218,577,138 bytes.
  • Dataset size: 277,995,931 bytes.
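To put the byte counts above in perspective, here is a small arithmetic sketch. It assumes decimal megabytes (1 MB = 1,000,000 bytes) and treats the reported dataset size as spread evenly over the 500 samples, which is only an approximation. The constant and function names are illustrative.

```python
# Figures copied from the size listing above.
DOWNLOAD_BYTES = 218_577_138
DATASET_BYTES = 277_995_931
NUM_SAMPLES = 500


def to_megabytes(n_bytes: int) -> float:
    """Convert bytes to decimal megabytes (assumes 1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


# Average size per sample, assuming an even spread across samples.
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"Download: {to_megabytes(DOWNLOAD_BYTES):.1f} MB")
print(f"Dataset:  {to_megabytes(DATASET_BYTES):.1f} MB")
print(f"Average per sample: {avg_bytes_per_sample / 1000:.0f} kB")
print(f"Download / dataset ratio: {download_ratio:.2f}")
```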

Data Loading

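  • Loading method (requires the Hugging Face datasets library):
# Load the 500-sample test split from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("vidore/infovqa_test_subsampled", split="test")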
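Once the split is loaded, each row can be read as a dictionary keyed by the feature names. The sketch below shows one way to walk through the queries. The helper `iter_queries` and the stand-in records are hypothetical; the records use made-up values so the example runs without downloading anything.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_queries(
    records: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (questionId, query, image_filename) for each record."""
    for record in records:
        yield record["questionId"], record["query"], record["image_filename"]


# Stand-in records with made-up values, for illustration only.
fake_records = [
    {"questionId": "1", "query": "How many steps are listed?", "image_filename": "a.jpeg"},
    {"questionId": "2", "query": "Which color marks the top item?", "image_filename": "b.jpeg"},
]

for question_id, query, filename in iter_queries(fake_records):
    print(question_id, query, filename)
```

With the real data, passing the loaded `ds` object in place of `fake_records` should work the same way, since iterating over a loaded split yields one dictionary per row.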

Citation Information

  • Citation format:
@misc{mathew_infographicvqa_2021,
  title = {{InfographicVQA}},
  copyright = {arXiv.org perpetual, non-exclusive license},
  url = {https://arxiv.org/abs/2104.12756},
  doi = {10.48550/ARXIV.2104.12756},
  urldate = {2024-06-02},
  publisher = {arXiv},
  author = {Mathew, Minesh and Bagal, Viraj and Tito, Rubèn Pérez and Karatzas, Dimosthenis and Valveny, Ernest and Jawahar, C. V},
  year = {2021},
  note = {Version Number: 2},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV)},
}

Topics

Infographic Question Answering
Visual Question Answering

Source

Organization: hugging_face
