JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.


lmms-lab/llava-interleave-bench

Multimodal Models
Computer Vision

LLaVA‑Interleave Bench is a comprehensive multi‑image dataset collected from public datasets or generated via the GPT‑4V API. The dataset aims to evaluate the interleaved multi‑image reasoning capability of large multimodal models. It was collected in April 2024 and released in June 2024. Its primary use is for research on large multimodal models and chatbots, targeting researchers and enthusiasts in computer vision, natural language processing, machine learning, and AI.

Hugging Face
View Details

M4-Instruct-Data

Multimodal Models
Artificial Intelligence Research

M4‑Instruct is a multi‑image dataset collected in April 2024 from public datasets and the GPT‑4V API, intended for training large multimodal models. It is used for research on large multimodal models and chatbots, targeting audiences in computer vision, natural language processing, machine learning, and AI.

Hugging Face
View Details

lmms-lab/ScienceQA-IMG

Multimodal Models
Scientific QA

This is a formatted version of the [derek-thomas/ScienceQA](https://huggingface.co/datasets/derek-thomas/ScienceQA) dataset that includes only image instances. It is used in the `lmms-eval` pipeline to enable one-click evaluation of large multimodal models. The dataset provides fields such as image, question, choices, answer, hint, task, grade, subject, topic, category, skill, lecture, and solution, and is split into training, validation, and test sets.

Hugging Face
View Details
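The field list above maps onto one record per example. As a minimal sketch, a single ScienceQA-IMG-style record and the accuracy check an evaluation harness such as `lmms-eval` would apply can look like this; the field values are invented for illustration, and the convention that `answer` is an integer index into `choices` is an assumption about the schema:

```python
# Hypothetical ScienceQA-IMG record built from the fields listed above.
# All values are invented examples, not taken from the actual dataset.
record = {
    "image": None,            # an image object in the real dataset
    "question": "Which property do these objects have in common?",
    "choices": ["hard", "soft"],
    "answer": 0,              # assumed: index into `choices`
    "hint": "",
    "task": "closed choice",
    "grade": "grade2",
    "subject": "natural science",
    "topic": "physics",
    "category": "Materials",
    "skill": "Compare properties of objects",
    "lecture": "",
    "solution": "",
}

# An evaluation harness scores a model by comparing its predicted
# choice index against the reference `answer` field.
predicted = 0  # hypothetical model output
correct = predicted == record["answer"]
print(correct)  # → True
```

In practice the dataset would be fetched with the Hugging Face `datasets` library rather than constructed by hand; the dict above only illustrates the per-example structure the listing describes.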

MM-Vet v2

Multimodal Models
Image‑Text Sequence Understanding

The MM‑Vet v2 dataset was jointly created by the National University of Singapore, Microsoft, and Advanced Micro Devices to evaluate the comprehensive capabilities of large multimodal models. It comprises 517 high‑quality evaluation samples covering scenarios ranging from everyday life to professional and industrial applications. Researchers designed the questions and collected the reference answers, ensuring high quality and broad applicability. MM‑Vet v2 specifically introduces an "image‑text sequence understanding" capability to assess how well a model handles interleaved image and text sequences, a requirement of complex real‑world multimodal applications.

arXiv
View Details