REFINESUMM
The REFINESUMM dataset is a benchmark for training and evaluating vision‑language models on image‑text multimodal summarization. It consists of triples of text, associated images, and summaries derived from Wikipedia articles and their accompanying images. The summaries are generated automatically by the multimodal large language model LLaVA‑v1.6‑Mistral‑7B, using a self‑refinement process adapted to this task.
Updated 10/2/2024
Description
REFINESUMM: Self‑Refining MLLM for Generating a Multimodal Summarization Dataset
Dataset Overview
- Name: REFINESUMM
- Type: Multimodal summarization dataset
- Goal: Train and evaluate vision‑language models for image‑text multimodal summarization tasks
- Content: Triples of text, related images, and summaries based on Wikipedia articles and their images
- Generation Model: Summaries are generated automatically by the multimodal large language model LLaVA‑v1.6‑Mistral‑7B and improved through self‑refinement
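Each example is a text-image-summary triple. As a minimal sketch, a loaded row could be wrapped in a small typed container that fails early when a column is missing. The class `RefineSummTriple` and the function `to_triple` are hypothetical names, not part of the release; the column names `txt`, `img`, and `summary` come from the Data Loading steps, and the storage format of `img` is deliberately left open because it is not specified here.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Column names as listed in the Data Loading steps.
REQUIRED_COLUMNS = ("txt", "img", "summary")


@dataclass(frozen=True)
class RefineSummTriple:
    """One text-image-summary example (hypothetical container)."""
    txt: str      # Wikipedia article text
    img: Any      # image; its storage format (bytes, path, URL, ...) is an assumption
    summary: str  # summary generated by the self-refined MLLM


def to_triple(row: Mapping[str, Any]) -> RefineSummTriple:
    """Convert one loaded row into a triple, failing loudly on missing columns."""
    missing = [c for c in REQUIRED_COLUMNS if c not in row]
    if missing:
        raise KeyError(f"row is missing column(s): {', '.join(missing)}")
    # Only the text fields are type-checked; img is passed through unchanged.
    if not isinstance(row["txt"], str) or not isinstance(row["summary"], str):
        raise TypeError("txt and summary are expected to be strings")
    return RefineSummTriple(txt=row["txt"], img=row["img"], summary=row["summary"])
```

For example, `to_triple({"txt": "Article text", "img": b"...", "summary": "Short summary."})` returns a frozen triple, while a row lacking `img` raises a `KeyError` that names the missing column.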
Dataset Download
- Download Link: Hugging Face
Data Loading
- Steps:
  - Download the test split of WikiWeb2M:
    wget https://storage.googleapis.com/gresearch/wit/wikiweb2m/wikiweb2m-test.tfrecord.gz
  - Place the downloaded file in the data/ directory.
  - In update_data_from_wikiweb2m.py, set the split (e.g., train, val, test) on line 12.
  - Run the following command:
    python update_data_from_wikiweb2m.py
  - The dataset will be saved in data/ with the columns txt (article), img (image), and summary (summary).
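Before running the conversion script, the downloaded file can be sanity-checked without TensorFlow. The sketch below assumes the file sits at data/wikiweb2m-test.tfrecord.gz (the name from the wget URL, placed in data/ as in the steps) and relies on the standard TFRecord framing: an 8-byte little-endian length, a 4-byte masked CRC of the length, the payload, and a 4-byte masked CRC of the payload. It counts records but does not verify the CRCs (they use CRC-32C, which the Python standard library does not provide) and does not decode the payloads, which are serialized protobuf messages. The function names are hypothetical.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, Union

# Assumed location of the file after the download and placement steps.
DEFAULT_PATH = "data/wikiweb2m-test.tfrecord.gz"


def iter_tfrecords(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw record payloads from an uncompressed TFRecord byte stream.

    Each record is framed as:
        uint64 length (little-endian) | uint32 masked CRC-32C of length |
        payload (length bytes)        | uint32 masked CRC-32C of payload
    The CRC fields are read but NOT verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte length CRC
        if not header:
            return  # clean end of stream
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # payload CRC, skipped
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record payload")
        yield payload


def count_records(source: Union[str, BinaryIO] = DEFAULT_PATH) -> int:
    """Count records in a gzip-compressed TFRecord file (path or open binary file)."""
    with gzip.open(source, "rb") as f:
        return sum(1 for _ in iter_tfrecords(f))
```

Calling `count_records()` from the directory that contains data/ returns the number of records in the downloaded split; an exception (for example `EOFError` from `gzip` or `ValueError` from the framing check) suggests an incomplete or corrupted download. Streaming record by record keeps memory use bounded, which matters for large WikiWeb2M files.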
Citation
- BibTeX:
@inproceedings{patil-etal-2024-refinesumm,
    title = "{REFINESUMM}: Self-Refining {MLLM} for Generating a Multimodal Summarization Dataset",
    author = "Patil, Vaidehi and Ribeiro, Leonardo and Liu, Mengwen and Bansal, Mohit and Dreyer, Markus",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.743",
    pages = "13773--13786",
    abstract = "Multimodal Large Language Models (MLLMs) excel at synthesizing key information from diverse sources..."
}
Topics
Multimodal Summarization
Vision‑Language Models
Source
Organization: github
Created: 9/23/2024