vqa
WorldCuisines is a large‑scale multilingual and multicultural visual question answering (VQA) benchmark that focuses on cross‑cultural understanding through global cuisines. The dataset comprises text‑image pairs in 30 languages and dialects, spanning nine language families, and contains over one million data points, making it the largest multicultural VQA benchmark to date. It includes two primary tasks: dish name prediction and location prediction. The construction process involves dish selection, metadata annotation, quality assurance, and data compilation. Two evaluation subsets (12,000 and 60,000 instances) and one training set (1,080,000 instances) are provided.
Description
WorldCuisines: A Massive‑Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Overview
WorldCuisines is a massive‑scale VQA benchmark for multilingual and multicultural understanding through global cuisines. The dataset spans 30 languages and dialects across nine language families and contains over one million data points, making it the largest multicultural VQA benchmark as of October 17, 2024.
Dataset Composition
- WC‑KB: Knowledge base of global cuisines comprising 2,414 dishes and 6,045 images, with metadata covering coarse‑ and fine‑grained classification, geography, and regional cuisine.
- WC‑VQA: VQA dataset built on top of WC‑KB.
Task Design
- Task 1: Dish Name Prediction
  - (a) Context‑free Questions: Predict the dish name without additional context.
  - (b) Contextualized Questions: Predict the dish name given supporting context.
  - (c) Adversarial Contextualized Questions: Predict the dish name given misleading context.
- Task 2: Geographic Location Prediction
  - Given the dish image, question, and context, predict where the dish is typically consumed or where it originated.
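The task variants above can be pictured as one record type with an optional context field. The following is a minimal sketch only: the class names, field names, and example values are hypothetical and are not taken from the released dataset schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QuestionType(Enum):
    # One member per task variant listed above (labels are illustrative).
    CONTEXT_FREE = "task1a_context_free"
    CONTEXTUALIZED = "task1b_contextualized"
    ADVERSARIAL = "task1c_adversarial"
    LOCATION = "task2_location"


@dataclass(frozen=True)
class VQAInstance:
    """Hypothetical record layout; not the dataset's actual schema."""
    image_path: str
    question: str
    language: str
    question_type: QuestionType
    answer: str
    context: Optional[str] = None  # None for context-free questions

    def prompt(self) -> str:
        # Prepend the context (helpful or misleading) when one is present.
        if self.context:
            return f"{self.context} {self.question}"
        return self.question


# Invented example values, for illustration only.
example = VQAInstance(
    image_path="images/example.jpg",
    question="What is the name of this dish?",
    language="en",
    question_type=QuestionType.ADVERSARIAL,
    answer="example_dish",
    context="This dish is popular in Japan.",
)
print(example.prompt())
```

In this sketch the only difference between the contextualized and adversarial variants is whether the context agrees with the true answer, which is why a single optional field is enough.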
Dataset Scale
- WC‑VQA: 1 M samples across 30 languages and dialects.
- Two evaluation subsets (12 k and 60 k instances) and a training set (1.08 M instances) are provided.
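As a rough sanity check on these figures, the split sizes can be divided by the number of languages. This sketch assumes instances are spread evenly across the 30 languages and dialects, which the description above does not state; the split names are hypothetical labels.

```python
NUM_LANGUAGES = 30

# Split sizes as stated above; the keys are illustrative names.
SPLITS = {"eval_small": 12_000, "eval_large": 60_000, "train": 1_080_000}


def per_language(total: int, num_languages: int = NUM_LANGUAGES) -> int:
    """Instances per language, ASSUMING a perfectly even distribution."""
    if total % num_languages != 0:
        raise ValueError("total is not evenly divisible by the language count")
    return total // num_languages


per_language_counts = {name: per_language(n) for name, n in SPLITS.items()}
print(per_language_counts)
# {'eval_small': 400, 'eval_large': 2000, 'train': 36000}
```

All three sizes divide evenly by 30, which is consistent with a balanced design but does not confirm it.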
Dataset Construction
Data Sources
- Data collected from Wikipedia and Wikimedia Commons.
Construction Steps
- Dish Selection: Filter culturally significant dishes from Wikipedia.
- Metadata Annotation: Manually compile dish metadata, including visual representation, classification, description, cuisine, and geographic distribution.
- Quality Assurance: Multi‑round checks to ensure data quality.
- Data Compilation: Merge metadata into a single file.
VQA Generation
Generation Process
- Dish Name Similarity Search: Use multilingual models to compute text embeddings and identify similar dishes.
- Question and Context Construction: Build various question types based on context.
- Multilingual Translation: Translate questions and contexts into 30 languages and dialects.
- VQA Triplet Generation: Ensure no overlap between train and test sets; randomly sample dishes and question pairs to generate VQA data.
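Two of the steps above, similarity search and keeping train and test disjoint, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' pipeline: the toy vectors stand in for embeddings from a multilingual text model, the function names are hypothetical, and disjointness is assumed to be enforced at the dish level.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(name, embeddings, k=2):
    """Return the k dish names whose embeddings are closest to `name`."""
    query = embeddings[name]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


def split_dishes(dishes, test_fraction=0.2, seed=0):
    """Split dish names into disjoint train and test sets (assumed dish-level)."""
    rng = random.Random(seed)
    shuffled = sorted(dishes)  # sort first so the shuffle is reproducible
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return set(shuffled[n_test:]), set(shuffled[:n_test])


# Toy stand-ins for embeddings of dish names (hypothetical values).
toy_embeddings = {
    "dish_a": [1.0, 0.0, 0.0],
    "dish_b": [0.9, 0.1, 0.0],
    "dish_c": [0.5, 0.5, 0.0],
    "dish_d": [0.0, 0.9, 0.1],
    "dish_e": [0.0, 0.0, 1.0],
}

neighbours = most_similar("dish_a", toy_embeddings, k=2)
train, test = split_dishes(toy_embeddings)
print(neighbours)            # ['dish_b', 'dish_c']
print(train.isdisjoint(test))  # True
```

Similar dish names found this way could serve as plausible wrong options or misleading context, though the description above does not say exactly how the similarity results are used.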
Ethical Considerations
- Crowdsourced annotation with transparency and fairness.
- Dataset released under CC‑BY‑SA 4.0 license.
Contact Information
- Email: Genta Indra Winata and Frederikus Hudi
Citation
@misc{winata2024worldcuisinesmassivescalebenchmarkmultilingual,
  title={WorldCuisines: A Massive‑Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines},
  author={Genta Indra Winata and Frederikus Hudi and Patrick Amadeus Irawan and David Anugraha and Rifki Afina Putri and Yutong Wang and Adam Nohejl and Ubaidillah Ariq Prathama and Nedjma Ousidhoum and Afifa Amriani and Anar Rzayev and Anirban Das and Ashmari Pramodya and Aulia Adila and Bryan Wilie and Candy Olivia Mawalim and Ching Lam Cheng and Daud Abolade and Emmanuele Chersoni and Enrico Santus and Fariz Ikhwantri and Garry Kuwanto and Hanyang Zhao and Haryo Akbarianto Wibowo and Holy Lovenia and Jan Christian Blaise Cruz and Jan Wira Gotama Putra and Junho Myung and Lucky Susanto and Maria Angelica Riera Machin and Marina Zhukova and Michael Anugraha and Muhammad Farid Adilazuarda and Natasha Santosa and Peerat Limkonchotiwat... (truncated for brevity)},
  year={2024},
  eprint={2410.12705},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.12705},
}
Source
Organization: huggingface
Created: 10/9/2024