Explore high-quality datasets for your AI and machine learning projects.
The REFINESUMM dataset is an integrated benchmark designed for training and evaluating vision‑language models on image‑text multimodal summarization. It comprises triples of text, associated images, and summaries derived from Wikipedia articles and their accompanying images. The summaries are automatically generated by the multimodal large language model LLaVA‑v1.6‑Mistral‑7B, which has been self‑refined for this task.