Tags: Open Source Community, Vision‑Language Models, Multimodal Summarization
REFINESUMM
The REFINESUMM dataset is a benchmark for training and evaluating vision‑language models on image‑text multimodal summarization. It consists of triples of text, associated images, and summaries, all derived from Wikipedia articles and their accompanying images. The summaries are generated automatically by the multimodal large language model LLaVA‑v1.6‑Mistral‑7B, which has been self‑refined for this task.
Source: GitHub
Created: Sep 23, 2024
Updated: Oct 2, 2024
Overview
Dataset description and usage context
REFINESUMM: Self‑Refining MLLM for Generating a Multimodal Summarization Dataset
Dataset Overview
- Name: REFINESUMM
- Type: Multimodal summarization dataset
- Goal: Train and evaluate vision‑language models for image‑text multimodal summarization tasks
- Content: Triples of text, related images, and summaries based on Wikipedia articles and their images
- Generation Model: Summaries are automatically generated by the multimodal large language model (LLaVA‑v1.6‑Mistral‑7B) and refined through a self‑refinement process
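The triple structure described above can be pictured with a small container type. This is a minimal sketch, not part of the dataset's own code: the field names follow the `txt`, `img`, and `summary` columns named in this document, while the `SummTriple` class, the `from_record` helper, and the assumption that `img` is stored as a string reference (a URL or file path) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummTriple:
    """One REFINESUMM-style example (hypothetical container, not the dataset's own class)."""

    txt: str      # article text
    img: str      # image reference; assumed here to be a URL or file path
    summary: str  # generated multimodal summary


REQUIRED_COLUMNS = ("txt", "img", "summary")


def from_record(record: dict) -> SummTriple:
    """Build a SummTriple from a dict-like row, failing loudly on missing columns."""
    missing = [name for name in REQUIRED_COLUMNS if name not in record]
    if missing:
        raise KeyError(f"record is missing columns: {missing}")
    return SummTriple(
        txt=record["txt"],
        img=record["img"],
        summary=record["summary"],
    )
```

Checking for the three columns up front catches a mismatched or partially generated file before it reaches training or evaluation code.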
Dataset Download
- Download Link: Hugging Face
Data Loading
- Steps:
  - Download the test split of WikiWeb2M:
    wget https://storage.googleapis.com/gresearch/wit/wikiweb2m/wikiweb2m-test.tfrecord.gz
  - Place the downloaded file in the data/ directory.
  - In update_data_from_wikiweb2m.py, set the split (e.g., train, val, test) on line 12.
  - Run the following command:
    python update_data_from_wikiweb2m.py
  - The dataset will be saved in data/ with columns txt (article), img (image), and summary (summary).
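The download and run steps above could also be scripted in Python. This is a rough sketch under stated assumptions: the URL, the `data/` directory, and the script name `update_data_from_wikiweb2m.py` come from the steps, while the helper names `target_path`, `download_split`, and `run_update_script` are hypothetical. The split still has to be set by hand inside the script, as described above.

```python
import subprocess
import sys
import urllib.request
from pathlib import Path

# URL taken from the download step in this document.
WIKIWEB2M_TEST_URL = (
    "https://storage.googleapis.com/gresearch/wit/wikiweb2m/"
    "wikiweb2m-test.tfrecord.gz"
)


def target_path(url: str, data_dir: str = "data") -> Path:
    """Return where the downloaded file should live inside data_dir."""
    return Path(data_dir) / url.rsplit("/", 1)[-1]


def download_split(url: str = WIKIWEB2M_TEST_URL, data_dir: str = "data") -> Path:
    """Download the file into data_dir unless it is already there."""
    dest = target_path(url, data_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def run_update_script(script: str = "update_data_from_wikiweb2m.py") -> None:
    """Run the repository's update script with the current Python interpreter."""
    subprocess.run([sys.executable, script], check=True)
```

From the repository root, calling `download_split()` followed by `run_update_script()` mirrors the manual `wget` and `python` commands.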
Citation
- BibTeX:
@inproceedings{patil-etal-2024-refinesumm,
    title = "{REFINESUMM}: Self-Refining {MLLM} for Generating a Multimodal Summarization Dataset",
    author = "Patil, Vaidehi and Ribeiro, Leonardo and Liu, Mengwen and Bansal, Mohit and Dreyer, Markus",
    editor = "Ku, Lun-Wei and Martins, Andre and Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.743",
    pages = "13773--13786",
    abstract = "Multimodal Large Language Models (MLLMs) excel at synthesizing key information from diverse sources..."
}