UniMed
UniMed is a large‑scale, open‑source multimodal medical dataset created by Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and collaborating institutions. It contains over 5.3 million image‑text pairs across six imaging modalities: X‑ray, CT, MRI, Ultrasound, Pathology, and Fundus. The dataset was built by converting modality‑specific classification datasets into image‑text format using large language models and augmenting them with existing medical image‑text data, enabling scalable pre‑training of vision‑language models (VLMs). UniMed aims to alleviate the scarcity of publicly available large‑scale medical image‑text data and supports tasks such as zero‑shot classification and cross‑modal generalization.
Description
UniMed‑CLIP: Towards a Unified Image‑Text Pre‑training Paradigm for Diverse Medical Imaging Modalities
Dataset Overview
Dataset Name
UniMed‑CLIP
Dataset Description
UniMed‑CLIP is a unified image‑text pre‑training dataset for multiple medical imaging modalities. It comprises over 5.3 million image‑text pairs covering six modalities: X‑ray, CT, MRI, Ultrasound, Pathology, and Fundus.
Dataset Features
- Multimodal Coverage: Includes six distinct medical imaging modalities, providing rich multimodal data.
- Large Scale: Over 5.3 million image‑text pairs furnish a solid foundation for training general medical VLMs.
- Open‑Source: Accompanied by detailed code and annotation files for dataset preparation, fostering open research in medical VLMs.
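The classification‑to‑caption conversion described above can be illustrated with a minimal sketch. The actual UniMed pipeline uses large language models to generate diverse captions; the fixed templates and modality keys below are simplified stand‑ins, not the project's real prompts.

```python
# Minimal sketch: turn a (modality, class label) classification sample
# into an image-text pair. Templates here are illustrative assumptions;
# UniMed generates richer captions with LLMs.

TEMPLATES = {
    "xray":      "A chest X-ray showing {label}.",
    "fundus":    "A fundus photograph with signs of {label}.",
    "pathology": "A histopathology slide showing {label}.",
}

def label_to_caption(modality: str, label: str) -> str:
    """Map a class label to a text caption for contrastive pre-training."""
    template = TEMPLATES.get(modality, "A medical {modality} image showing {label}.")
    return template.format(modality=modality, label=label)

print(label_to_caption("xray", "pneumonia"))  # -> A chest X-ray showing pneumonia.
```

Pairing each image with such a caption (rather than a bare class index) is what lets a single contrastive model train across heterogeneous classification datasets.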
Dataset Applications
UniMed‑CLIP primarily supports training and evaluating medical vision‑language models (VLMs); models pre‑trained on it are particularly strong in zero‑shot evaluation.
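The zero‑shot evaluation mentioned above follows the standard CLIP recipe: embed one text prompt per class, embed the image, and pick the class with the highest cosine similarity. A minimal sketch with random vectors standing in for real encoder outputs (the dimensions and embeddings are illustrative, not UniMed‑CLIP's actual encoders):

```python
import numpy as np

def zero_shot_predict(image_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the class prompt most similar to the image."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    similarities = txt @ img  # cosine similarities, shape (num_classes,)
    return int(np.argmax(similarities))

rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 512))                   # e.g. prompts for 3 diagnoses
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)   # image embedding near class 1
print(zero_shot_predict(image_emb, text_embs))          # -> 1
```

Because classification reduces to nearest‑prompt retrieval, no task‑specific fine‑tuning is needed, which is why zero‑shot transfer is the standard benchmark for such models.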
Dataset Contributions
- UniMed Dataset: An open‑source large‑scale multimodal medical dataset with over 5.3 million samples across six modalities.
- UniMed‑CLIP VLMs: Contrastive learning VLMs trained on UniMed, outperforming existing general VLMs across multiple medical modalities.
- Extensive Evaluation: Provides ablation studies on design choices and releases training code, dataset, and model checkpoints to advance medical VLM research.
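The contrastive objective behind UniMed‑CLIP is the standard symmetric InfoNCE loss from CLIP: matched image‑text pairs on the diagonal of the similarity matrix are pulled together, mismatched pairs pushed apart. A simplified numpy illustration (batch size, dimensions, and temperature are assumptions, and this is not the project's actual training code):

```python
import numpy as np

def clip_loss(img_embs: np.ndarray, txt_embs: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of matched image-text pairs."""
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix

    def ce(l: np.ndarray) -> float:
        # Cross-entropy with the correct match on the diagonal.
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(logp)))

    # Average the image-to-text and text-to-image directions.
    return (ce(logits) + ce(logits.T)) / 2

rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 64))
aligned = clip_loss(txt + 0.01 * rng.normal(size=(4, 64)), txt)  # matched pairs
shuffled = clip_loss(txt[::-1], txt)                             # mismatched pairs
print(aligned < shuffled)  # matched pairs yield the lower loss
```

Minimizing this loss is what aligns the image and text encoders into the shared embedding space that the zero‑shot evaluation relies on.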
Dataset Performance
Zero‑shot classification accuracy (%) averaged over the evaluation datasets for each modality:
| Method | X‑ray | Retinal‑Fundus | CT | MRI | Ultrasound | Histopathology | Average |
|---|---|---|---|---|---|---|---|
| BioMedCLIP | 55.43 | 22.87 | 43.99 | 64.59 | 49.20 | 54.50 | 49.02 |
| PMC‑CLIP | 52.64 | 25.84 | 66.06 | 63.68 | 62.51 | 53.56 | 53.37 |
| UniMed‑CLIP | 68.78 | 31.23 | 85.54 | 68.83 | 68.64 | 59.96 | 61.63 |
Dataset Updates
- 13 December 2024: Released annotation and code scripts for preparing the UniMed pre‑training dataset, along with training and inference code and pretrained checkpoints for UniMed‑CLIP.
Dataset Preparation
Detailed instructions and annotation files for dataset preparation are available in the UniMed‑DATA.md document.
Pre‑trained Models
Three UniMed‑CLIP model weights are provided:
| Model | Text Encoder | Pretrained Weights | Resolution | Training GPUs | Avg. Score (21 datasets) |
|---|---|---|---|---|---|
| ViT‑B‑16‑quickgelu | BiomedNLP‑BiomedBERT‑base‑uncased‑abstract | unimed_clip_vit_b16 | 224 | 16 × A100 (40 GB) | 61.63 |
| ViT‑L‑14‑quickgelu | BiomedNLP‑BiomedBERT‑large‑uncased‑abstract | unimed_clip_vit_l14_large_text_encoder | 336 | 16 × A100 (40 GB) | 62.09 |
| ViT‑L‑14‑quickgelu | BiomedNLP‑BiomedBERT‑base‑uncased‑abstract | unimed_clip_vit_l14_base_text_encoder | 336 | 16 × A100 (40 GB) | 64.84 |
Citation
If you use this dataset, please cite the following paper:
@article{khattak2024unimed,
  title={UniMed-CLIP: Towards a Unified Image-Text Pre-training Paradigm for Diverse Medical Imaging Modalities},
  author={Khattak, Muhammad Uzair and Kunhimon, Shahina and Naseer, Muzammal and Khan, Salman and Khan, Fahad Shahbaz},
  journal={arXiv preprint arXiv:2412.10372},
  year={2024}
}
Source
Organization: arXiv
Created: 12/14/2024