Explore high-quality datasets for your AI and machine learning projects.
LLaVA‑Interleave Bench is a comprehensive multi‑image dataset collected from public datasets or generated via the GPT‑4V API. The dataset aims to evaluate the interleaved multi‑image reasoning capability of large multimodal models. It was collected in April 2024 and released in June 2024. Its primary use is for research on large multimodal models and chatbots, targeting researchers and enthusiasts in computer vision, natural language processing, machine learning, and AI.
M4‑Instruct is a multi‑image dataset collected in April 2024 from public datasets and the GPT‑4V API, intended for training large multimodal models. It is used for research on large multimodal models and chatbots, targeting audiences in computer vision, natural language processing, machine learning, and AI.
This is a formatted version of the [derek‑thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that retains only the instances containing an image. It is used in the `lmms‑eval` pipeline to enable one‑click evaluation of large multimodal models. Each instance provides the fields image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and the dataset is split into training, validation, and test sets.
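The per‑instance schema above can be sketched as a plain Python record. This is a minimal illustration, not the actual dataset loader: the field names follow the description, while all sample values are invented for demonstration, and `answer` is assumed to be an index into `choices` (a common ScienceQA convention).

```python
# Hypothetical ScienceQA-style record. Field names follow the dataset
# description above; the sample values are invented for illustration.
sample = {
    "image": None,            # a PIL image in the real dataset; omitted here
    "question": "Which of these states is farthest north?",
    "choices": ["West Virginia", "Louisiana", "Arizona", "Oklahoma"],
    "answer": 0,              # assumed: index into `choices`
    "hint": "",
    "task": "closed choice",
    "grade": "grade2",
    "subject": "social science",
    "topic": "geography",
    "category": "Geography",
    "skill": "Read a map: cardinal directions",
    "lecture": "Maps have four cardinal directions, or main directions.",
    "solution": "West Virginia is farthest north.",
}

def grade_prediction(record: dict, predicted_choice: str) -> bool:
    """Return True if the predicted choice text matches the gold answer index."""
    return record["choices"].index(predicted_choice) == record["answer"]

print(grade_prediction(sample, "West Virginia"))  # True
print(grade_prediction(sample, "Arizona"))        # False
```

A one‑click harness like `lmms‑eval` essentially automates this kind of comparison between a model's chosen option and the `answer` field across the whole test split.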
The MM‑Vet v2 dataset was jointly created by the National University of Singapore, Microsoft, and Advanced Micro Devices to evaluate the comprehensive capabilities of large multimodal models. It comprises 517 high‑quality evaluation samples covering a wide range of scenarios, from everyday life to professional and industrial applications. To ensure quality and broad applicability, researchers designed the questions and collected the reference answers themselves. MM‑Vet v2 specifically introduces an "image‑text sequence understanding" ability that assesses a model's capacity to handle interleaved image and text sequences, reflecting the complex tasks that arise in real‑world multimodal applications.