JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.


ActPlan-1K

Vision-Language Models
Program Planning

ActPlan‑1K is a multimodal planning benchmark jointly created by the Hong Kong University of Science and Technology and the University of California, San Diego. It evaluates vision‑language models' program planning abilities in domestic activities. The dataset includes 153 activities and 1,187 instances, each comprising a natural‑language task description and multiple environment images captured from the iGibson2 simulator. The creation process combined ChatGPT and iGibson2, converting BDDL activity definitions into natural‑language descriptions and collecting environment images. ActPlan‑1K is primarily used to assess the program planning capabilities of vision‑language models in multimodal tasks, especially for home activities and counterfactual scenarios.
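
For orientation, the sketch below shows one way an ActPlan-1K-style instance could be represented in Python; the field names and the prompt helper are illustrative assumptions, not the dataset's published schema.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class ActPlanInstance:
        # Hypothetical fields; the real ActPlan-1K schema may differ.
        activity: str            # one of the 153 household activities
        task_description: str    # natural-language description derived from a BDDL definition
        scene_images: List[str]  # paths to environment images rendered in iGibson2
        gold_plan: List[str]     # reference step-by-step action plan

    def build_prompt(inst: ActPlanInstance) -> str:
        """Combine the task text with image placeholders for a vision-language model."""
        image_tags = " ".join("<image>" for _ in inst.scene_images)
        return f"{image_tags}\nTask: {inst.task_description}\nList the steps to complete this task."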

arXiv
View Details

Multi-P2A

Privacy Protection
Vision-Language Models

Multi-P2A is a comprehensive benchmark dataset created by the Institute of Computing Technology, Chinese Academy of Sciences, intended to evaluate the privacy protection capabilities of large vision‑language models (LVLMs). The dataset covers 26 categories of personal privacy, 15 categories of commercial secrets, and 18 categories of state secrets, totaling 31,962 samples. It is constructed from existing datasets and social media platforms, generating samples via visual question answering (VQA) tasks to ensure high quality and diversity. Multi-P2A is mainly applied in privacy risk assessment, helping developers and researchers identify and mitigate potential privacy leaks in LVLMs during training and inference, thereby advancing privacy protection technologies.
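
A minimal sketch of how a privacy-probing VQA sample of this kind might look in code follows; the field names and the toy leakage check are assumptions for illustration, not the released Multi-P2A format.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PrivacyProbe:
        # Hypothetical fields; the published Multi-P2A format may differ.
        category: str      # a personal-privacy, commercial-secret, or state-secret class
        image_path: str    # image that may expose sensitive content
        question: str      # VQA-style question probing whether the model reveals the information
        safe_answer: str   # reference privacy-preserving (refusal) answer

    def leaks(model_answer: str, sensitive_terms: List[str]) -> bool:
        """Toy leakage check: flag an answer that repeats any sensitive term verbatim."""
        answer = model_answer.lower()
        return any(term.lower() in answer for term in sensitive_terms)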

arXiv
View Details

MC-LLaVA Multi-Concept Personalization Dataset

Vision-Language Models
Multi-Concept Personalization

The MC-LLaVA Multi-Concept Personalization dataset is a high-quality collection designed to advance multi-concept personalization research. It gathers images featuring multiple characters from various movies and manually generates multi-concept question‑answer samples. With diverse movie genres and QA types, the dataset aims to enable vision‑language models to excel in multi-concept personalization tasks.
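
As a rough illustration, one multi-concept QA sample could be modeled as below; the fields and concept-token convention are hypothetical and may not match the released files.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class MultiConceptQA:
        # Hypothetical fields; the released MC-LLaVA files may be organized differently.
        concepts: List[str]   # personalized characters present in the frame, e.g. ["<char_a>", "<char_b>"]
        image_path: str       # movie frame containing all listed concepts
        question: str         # question that requires telling the concepts apart
        answer: str           # ground-truth answer

    def personalize_question(sample: MultiConceptQA) -> str:
        """Prepend the concept tokens so the model is queried about specific learned characters."""
        return f"Concepts present: {', '.join(sample.concepts)}.\n{sample.question}"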

github
View Details

SPA-VL

Vision-Language Models
Model Safety

SPA-VL is a comprehensive safety preference alignment dataset for vision-language models. It contains 100,788 samples spanning multiple domains and uses diverse model responses and question types to strengthen model safety and effectiveness, ensuring balanced improvement in both harmlessness and helpfulness.
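
A minimal sketch of a safety preference pair and its conversion into the (prompt, chosen, rejected) form consumed by preference-optimization trainers follows; the keys shown are assumptions, not the actual SPA-VL release format.

    from dataclasses import dataclass

    @dataclass
    class SafetyPreferencePair:
        # Hypothetical keys; the actual SPA-VL release may use different ones.
        image_path: str   # image accompanying the question
        question: str     # potentially harmful or benign query about the image
        chosen: str       # response preferred for being both harmless and helpful
        rejected: str     # response dispreferred on safety or helpfulness grounds

    def to_dpo_example(pair: SafetyPreferencePair) -> dict:
        """Turn one pair into the (prompt, chosen, rejected) triple used by preference-optimization trainers."""
        prompt = f"<image>\n{pair.question}"
        return {"prompt": prompt, "chosen": pair.chosen, "rejected": pair.rejected}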

github
View Details

MM-CamObj

Vision-Language Models
Camouflaged Object Detection

The MM‑CamObj dataset, created by Shanghai Jiao Tong University, addresses the difficulties vision‑language models face in complex scenes, especially those containing camouflaged objects. It comprises two subsets: CamObj‑Align (11,363 high‑quality image‑text pairs) for vision‑language alignment, and CamObj‑Instruct (11,363 images with 68,849 diverse dialogues) for instruction fine‑tuning. Images were carefully selected from classic datasets, and detailed descriptions and dialogues were generated with GPT‑4o. MM‑CamObj is primarily used to evaluate and improve vision‑language models on camouflaged‑object detection, localization, and counting tasks.
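
To illustrate the two-subset layout, the sketch below models one CamObj-Align pair and one CamObj-Instruct sample; the field names are assumptions and may differ from the released annotations.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class CamObjAlignPair:
        # Hypothetical fields; the released CamObj-Align annotations may differ.
        image_path: str   # image containing a camouflaged object
        caption: str      # detailed GPT-4o-generated description used for vision-language alignment

    @dataclass
    class CamObjInstructSample:
        # Hypothetical fields; the released CamObj-Instruct annotations may differ.
        image_path: str
        dialogue: List[Dict[str, str]]  # multi-turn [{"role": ..., "content": ...}] instruction data
        task: str                       # e.g. "detection", "localization", or "counting"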

arXiv
View Details