JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.


DARE

Visual Question Answering
Robustness Evaluation

DARE (Diverse Visual Question Answering with Robustness Evaluation) is a carefully curated multiple‑choice VQA benchmark. It evaluates visual‑language models across five categories and includes four robustness assessments based on prompt, answer‑option subset, output format, and number of correct answers. The validation split contains images, questions, answer options, and correct answers, while the test split hides correct answers to prevent leakage.

huggingface
View Details

eliolio/docvqa

Visual Question Answering
Machine Learning

DocVQA is a dataset for visual question answering on document images, containing 50,000 questions based on 12,767 images. It is split 80‑10‑10 into train, validation, and test sets (39,463 questions & 10,194 images for training, 5,349 questions & 1,286 images for validation, 5,188 questions & 1,287 images for testing). Document images originate from the UCSF Industry Documents Library and include printed, typed, and handwritten content such as letters, memos, notes, and reports.

huggingface
View Details
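The split percentages quoted above can be checked with a few lines of arithmetic (a quick sketch; the question counts are taken directly from the listing):

```python
# Question counts per split, as stated in the DocVQA listing above.
splits = {"train": 39_463, "validation": 5_349, "test": 5_188}
total = sum(splits.values())  # 50,000 questions in all

for name, count in splits.items():
    print(f"{name}: {count / total:.1%}")
```

The ratios work out to roughly 79/11/10, consistent with the approximate 80-10-10 split described above.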

InternVL-Chat-V1-2-SFT-Data

Visual Question Answering
Question Answering Systems

This dataset supports visual question answering and general QA tasks in both Chinese and English. It comprises multiple configurations, such as ai2d_train_12k and chartqa_train_18k, each corresponding to a different source of training data.

huggingface
View Details

vidore/infovqa_test_subsampled

Infographic Question Answering
Visual Question Answering

This dataset is a test split extracted from the InfoVQA dataset, containing infographics collected from the internet with manually annotated questions and answers. To keep the benchmark consistent, the original test set was subsampled to 500 question-answer pairs and its columns were renamed. Each instance includes features such as questionId, query, and image.

huggingface
View Details

VQA-RAD Dataset

Medical Image Analysis
Visual Question Answering

VQA-RAD is a visual question answering dataset built on radiology images. Its construction involved acquiring radiology reports, then collecting and validating them to ensure a clear structure and accurate textual information corresponding to each image.

github
View Details

Phando/vqa_v2

Visual Question Answering
Computer Vision

This dataset, vqa_v2, contains features such as question type, multiple-choice answer, an answer list (answer, answer confidence, and answer ID), image ID, answer type, question ID, question, and image. It is split into training, validation, and test sets containing 443,757, 214,354, and 447,793 samples respectively. The download size is 34,818,002,031 bytes, and the total dataset size is 171,555,262,245 bytes.

huggingface
View Details
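For a sense of scale, the byte counts above convert to gigabytes as follows (a quick sketch using the figures from the listing; decimal GB, not GiB):

```python
# Sizes in bytes, as stated in the vqa_v2 listing above.
download_bytes = 34_818_002_031
total_bytes = 171_555_262_245

GB = 10**9  # decimal gigabytes
print(f"download: {download_bytes / GB:.1f} GB")  # ~34.8 GB
print(f"total:    {total_bytes / GB:.1f} GB")     # ~171.6 GB
```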

HuggingFaceM4/A-OKVQA

Visual Question Answering
Computer Vision

According to its dataset card metadata, A-OKVQA pairs each image with a multiple-choice question. Every instance carries an image, a question ID, the question text, a list of answer choices, the index of the correct choice, direct answers, a flag marking difficult direct answers, and a list of rationales. The dataset is split into train (17,056 examples, ~929 MB), validation (1,145 examples, ~61 MB), and test (6,702 examples, ~339 MB); the download size is about 1.32 GB and the total dataset size about 1.33 GB.

huggingface
View Details
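The split sizes reported in the A-OKVQA card metadata can be cross-checked against the reported total (a quick arithmetic sketch; the raw metadata lists split sizes of 929,295,572.0, 60,797,340.875, and 338,535,925.25 bytes and a dataset_size of 1,328,628,838.125 bytes):

```python
# Split sizes in bytes, from the A-OKVQA dataset card metadata.
split_bytes = {
    "train": 929_295_572.0,
    "validation": 60_797_340.875,
    "test": 338_535_925.25,
}
reported_total = 1_328_628_838.125  # dataset_size from the card

# The per-split sizes sum exactly to the reported total.
assert sum(split_bytes.values()) == reported_total
print(f"total: {sum(split_bytes.values()) / 10**9:.2f} GB")  # ~1.33 GB
```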

OpenDriveLab/DriveLM

Autonomous Driving
Visual Question Answering

The DriveLM dataset supports perception, prediction, planning, behavior and motion tasks through graph‑structured question‑answer pairs. It consists of two parts: DriveLM‑nuScenes and DriveLM‑CARLA. DriveLM‑nuScenes is built on the nuScenes dataset, while DriveLM‑CARLA is collected from the CARLA simulator. Currently, only the training split of DriveLM‑nuScenes is publicly available. The dataset includes a series of questions and answers together with the associated images.

huggingface
View Details

docmatix-ir

Visual Question Answering
Document Retrieval

The Docmatix‑IR dataset is derived from the original Docmatix collection and is specifically intended for training document visual embedding models for open‑domain visual question answering. By filtering unsuitable questions and mining hard negatives, the dataset provides high‑quality training data. Concretely, the Document Screenshot Embedding (DSE) model encodes the entire Docmatix corpus, and retrieval results are used to select questions. The final result consists of 5.61 M high‑quality training samples, after filtering out roughly 4 M questions.

huggingface
View Details

vqa

Visual Question Answering
Multilingual Culture

WorldCuisines is a large‑scale multilingual and multicultural visual question answering (VQA) benchmark that focuses on cross‑cultural understanding through global cuisines. The dataset comprises text‑image pairs in 30 languages and dialects, spanning nine language families, and contains over one million data points, making it the largest multicultural VQA benchmark to date. It includes two primary tasks: dish name prediction and location prediction. The construction process involves dish selection, metadata annotation, quality assurance, and data compilation. Two evaluation subsets (12,000 and 60,000 instances) and one training set (1,080,000 instances) are provided.

huggingface
View Details

vgbench/VGQA

Visual Question Answering
Vector Graphics Understanding

The VGQA dataset is the first comprehensive benchmark for evaluating large language models (LLMs) on vector graphics processing and generation capabilities.

huggingface
View Details