JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

cmm-math

Mathematics Education
Multimodal Data

CMM‑Math is a Chinese multimodal mathematics dataset containing over 28,000 high‑quality samples covering 12 grades from primary school to high school. It includes diverse question types such as multiple‑choice and fill‑in‑the‑blank, with detailed solutions. Some questions involve visual context, making the dataset more challenging. The dataset is split into a training set (22,000+ samples) and an evaluation set (5,000+ samples).
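As a rough illustration of how such samples might be handled, the sketch below filters for the visual-context questions mentioned above. The field names (`question`, `grade`, `type`, `image`) are assumptions for illustration, not CMM‑Math's actual schema.

```python
# Hypothetical CMM-Math-style records; field names are illustrative only.
samples = [
    {"question": "1/2 + 1/3 = ?", "grade": 5,
     "type": "fill-in-the-blank", "image": None},
    {"question": "Which angle in the figure is obtuse?", "grade": 8,
     "type": "multiple-choice", "image": "fig_001.png"},
]

def visual_samples(records):
    """Keep only the questions that include a visual context."""
    return [r for r in records if r["image"] is not None]

print(len(visual_samples(samples)))  # → 1
```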

huggingface
View Details

MHAD: Multimodal Home Activity Dataset

Multimodal Data
Home Activity Recognition

The MHAD dataset was jointly collected by JD Health, Huazhong University of Science and Technology, and Zhejiang University. It is the first multimodal dataset captured in real home environments, featuring multiple camera angles and a wide range of household scenarios. It includes the most comprehensive set of physiological signals to date and is a valuable resource for computer vision, machine learning, and biomedical engineering research.

github
View Details

Intern · WanJuan 1.0

Multimodal Data
AI Research

Intern·WanJuan 1.0 is the first open‑source release of the Intern·WanJuan multimodal corpus, comprising text, image‑text, and video datasets with a total volume exceeding 2 TB. Drawing on the large‑model data alliance, Shanghai AI Lab performed fine‑grained cleaning, deduplication, and value alignment, producing a dataset that is multimodal, meticulously processed, value‑aligned, easy to use, and efficient.

github
View Details

HUVER

Unmanned Aerial Vehicles
Multimodal Data

The HUVER dataset contains 6,051 unique drone configurations, each described by multiple formats such as grammar strings, RGB images, and GLB files. Additionally, each configuration includes an English textual descriptor that details the drone’s features in natural language. The dataset supports tasks such as image‑to‑text, image‑to‑3D, and feature extraction, and is curated by Abhiram Karri, Gary Stump, Christopher McComb, and Binyang Song under the MIT License.
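To make the parallel representations concrete, the sketch below groups one configuration's formats and selects the pair used for an image‑to‑text task. The field names and values are assumptions based on the description above, not HUVER's actual schema.

```python
# Hypothetical HUVER-style record; keys and values are illustrative only.
config = {
    "grammar_string": "<design-grammar encoding>",  # placeholder value
    "rgb_image": "drone_0001.png",
    "glb_file": "drone_0001.glb",
    "description": "A quadcopter with four rotors and a slim central body.",
}

def image_to_text_pair(record):
    """Select the (image, description) pair for an image-to-text task."""
    return record["rgb_image"], record["description"]

print(image_to_text_pair(config)[0])  # → drone_0001.png
```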

huggingface
View Details

Social-Media-Dataset

Social Media
Multimodal Data

This dataset contains over 1 million tweets crawled from Twitter. After filtering and processing, the multimodal text‑image pairs were retained, and emojis and embedded text were extracted, yielding a dataset with four modalities.
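One of the processing steps described above, separating emojis from tweet text, can be sketched as follows. The Unicode ranges cover only the common emoji blocks; a production pipeline would use a fuller table.

```python
import re

# Matches common emoji blocks only (illustrative, not exhaustive).
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001F5FF"   # symbols & pictographs
    "\U0001F600-\U0001F64F"    # emoticons
    "\U0001F680-\U0001F6FF"    # transport & map symbols
    "\u2600-\u27BF]"           # misc symbols, dingbats
)

def split_modalities(tweet: str):
    """Separate plain text from emojis, two of the four modalities."""
    emojis = EMOJI_RE.findall(tweet)
    text = EMOJI_RE.sub("", tweet).strip()
    return text, emojis

text, emojis = split_modalities("Great game tonight 🔥🔥 #sports")
print(len(emojis))  # → 2
```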

github
View Details

IMed-361M

Medical Image Segmentation
Multimodal Data

The IMed‑361M dataset is the largest publicly available multimodal interactive medical image segmentation dataset, containing 6.4 million images, 273.4 million masks (an average of 56 masks per image), 14 imaging modalities, and 204 segmentation targets. It ensures diversity across six anatomical groups and offers fine‑grained annotations, with most masks covering less than 2% of the image area, as well as broad applicability: 83% of images have resolutions between 256×256 and 1024×1024. IMed‑361M provides 14.4 times as many masks as MedTrinity‑25M, significantly exceeding other datasets in both scale and mask count.
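The fine‑grained annotation property cited above (most masks covering under 2% of the image) can be checked with a simple coverage computation. This is a minimal sketch; the mask size and region are illustrative, not taken from the dataset.

```python
def coverage(mask):
    """Fraction of pixels set to 1 in a 2-D binary mask."""
    total = sum(len(row) for row in mask)
    covered = sum(sum(row) for row in mask)
    return covered / total

# A 512x512 mask with a 20x20 region of interest (illustrative sizes).
mask = [[0] * 512 for _ in range(512)]
for r in range(100, 120):
    for c in range(100, 120):
        mask[r][c] = 1

print(coverage(mask) < 0.02)  # → True
```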

github
View Details

UniMed

Medical Imaging
Multimodal Data

UniMed is a large‑scale, open‑source multimodal medical dataset created by Mohammed Bin Zayed University of AI and other institutions. It contains over 5.3 million image‑text pairs across six imaging modalities: X‑ray, CT, MRI, Ultrasound, Pathology, and Fundus. The dataset is built by converting modality‑specific classification datasets into image‑text format using large language models, and augmenting them with existing medical image‑text data, enabling scalable pre‑training of visual‑language models (VLMs). UniMed aims to alleviate the scarcity of publicly available large‑scale medical image‑text data and supports tasks such as zero‑shot classification and cross‑modal generalization.
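The conversion strategy described above, turning a modality‑specific classification label into a caption, can be sketched with a template. The templates and label names below are illustrative assumptions, not UniMed's actual prompts.

```python
# Hypothetical caption templates per imaging modality (illustrative only).
TEMPLATES = {
    "xray": "A chest X-ray showing {label}.",
    "fundus": "A fundus photograph with signs of {label}.",
}

def to_image_text(image_path: str, modality: str, label: str):
    """Build an (image, caption) pair from a classification example."""
    caption = TEMPLATES[modality].format(label=label)
    return image_path, caption

pair = to_image_text("img_042.png", "xray", "pneumonia")
print(pair[1])  # → A chest X-ray showing pneumonia.
```

In practice the description says large language models generate the captions; a fixed template is just the simplest stand‑in for that step.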

arXiv
View Details

MedTrinity-25M

Medical Data Analysis
Multimodal Data

MedTrinity‑25M is a large‑scale multimodal medical dataset with multigranular annotations. It extracts key information from collected data, integrates metadata to generate coarse descriptions, locates regions of interest, and gathers medical knowledge, then prompts large language models to generate fine‑grained descriptions.
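The pipeline steps above (coarse description, region of interest, medical knowledge, then an LLM prompt) can be sketched as a prompt‑assembly function. All field values are illustrative assumptions, not MedTrinity‑25M's actual prompts.

```python
def build_prompt(coarse: str, roi: tuple, knowledge: str) -> str:
    """Assemble the pieces into a prompt for fine-grained description."""
    return (
        f"Coarse description: {coarse}\n"
        f"Region of interest (x, y, w, h): {roi}\n"
        f"Relevant medical knowledge: {knowledge}\n"
        "Write a fine-grained description of the region."
    )

prompt = build_prompt(
    "Axial CT of the chest.",                      # from metadata
    (120, 80, 40, 40),                             # located ROI (illustrative)
    "Ground-glass opacity can indicate early infection.",
)
print(prompt.splitlines()[0])  # → Coarse description: Axial CT of the chest.
```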

github
View Details