MedTrinity-25M
MedTrinity‑25M is a large‑scale multimodal medical dataset with multigranular annotations. It extracts key information from collected data, integrates metadata to generate coarse descriptions, locates regions of interest, and gathers medical knowledge, then prompts large language models to generate fine‑grained descriptions.
Description
MedTrinity‑25M: A Large‑scale Multimodal Dataset with Multigranular Annotations for Medicine
Dataset Overview
MedTrinity‑25M is a large‑scale multimodal dataset designed for the medical domain, featuring multigranular annotations. It includes abundant medical images and corresponding textual descriptions, suitable for medical visual question answering and related tasks.
Construction Process
- Data Processing: Extract key information from collected data, integrate metadata to generate coarse descriptions, locate regions of interest (ROI), and gather medical knowledge.
- Multigranular Text Generation: Use this information to prompt multimodal large language models (MLLMs) to produce fine‑grained descriptions.
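The two steps above can be sketched as a small prompt-assembly routine. This is only an illustration under stated assumptions, not the authors' pipeline: the `ImageRecord` container, its field names, the `build_prompt` helper, and the prompt wording are all hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ImageRecord:
    """Hypothetical container for the inputs gathered during data processing."""
    coarse_caption: str                  # coarse description built from metadata
    roi_box: Tuple[int, int, int, int]   # region of interest as (x_min, y_min, x_max, y_max)
    knowledge: str                       # gathered medical knowledge text


def build_prompt(record: ImageRecord) -> str:
    """Combine the three inputs into one text prompt for an MLLM."""
    x0, y0, x1, y1 = record.roi_box
    return "\n".join([
        f"Coarse caption: {record.coarse_caption}",
        f"Region of interest (pixels): x={x0}-{x1}, y={y0}-{y1}",
        f"Relevant medical knowledge: {record.knowledge}",
        "Task: describe the image in detail, covering the whole image and the region of interest.",
    ])


# Placeholder values for illustration only.
example = ImageRecord(
    coarse_caption="Chest X-ray, frontal view.",
    roi_box=(120, 80, 260, 210),
    knowledge="(retrieved knowledge text goes here)",
)
prompt = build_prompt(example)
print(prompt)
```

The resulting string would be sent to the MLLM together with the image; how the model is called depends on the serving setup and is left out here.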
Statistics Overview
Statistical information (illustrated in the original paper) details the scale and structure of the dataset.
Download
The dataset can be downloaded from Hugging Face Hub:
- MedTrinity‑25M: UCSC‑VLAA/MedTrinity-25M
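As a minimal sketch, the dataset could be inspected with the Hugging Face `datasets` library in streaming mode, which avoids downloading everything up front. The `config` and `split` arguments below are assumptions; check the dataset page for the configuration and split names that actually exist, and note that access may require a Hub login. The helper names `hub_url` and `stream_samples` are hypothetical.

```python
from itertools import islice
from typing import Iterator, Optional

REPO_ID = "UCSC-VLAA/MedTrinity-25M"  # repository ID listed above


def hub_url(repo_id: str = REPO_ID) -> str:
    """Return the dataset page URL on the Hugging Face Hub."""
    return f"https://huggingface.co/datasets/{repo_id}"


def stream_samples(config: Optional[str] = None, split: str = "train",
                   limit: int = 3, repo_id: str = REPO_ID) -> Iterator[dict]:
    """Yield up to `limit` records without downloading the full dataset.

    `config` and `split` are assumptions: use the names shown on the
    dataset page.
    """
    from datasets import load_dataset  # requires: pip install datasets

    dataset = load_dataset(repo_id, config, split=split, streaming=True)
    yield from islice(dataset, limit)


print(hub_url())
# Usage (needs network access, and possibly a Hub login):
# for record in stream_samples(limit=1):
#     print(sorted(record.keys()))
```

Streaming is chosen here because a dataset of this size is impractical to download just to look at a few records.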
Results Showcase
Performance results on several medical VQA tasks are shown, demonstrating the dataset's utility.
Quick Start
Installation
- Clone the repository and navigate to the folder:

      git clone https://github.com/UCSC-VLAA/MedTrinity-25M.git
      cd MedTrinity-25M

- Install the package:

      conda create -n llava-med++ python=3.10 -y
      conda activate llava-med++
      pip install --upgrade pip
      pip install -e .

- Install additional training packages:

      pip install -e ".[train]"
      pip install flash-attn --no-build-isolation
      pip install git+https://github.com/bfshi/scaling_on_scales.git
      pip install multimedeval
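After installation, a quick sanity check can confirm that the packages are importable in the active environment. The sketch below is an assumption-laden helper, not part of the repository: the import names in `EXPECTED_MODULES` are guesses inferred from the packages installed above and may need adjusting.

```python
import importlib.util
from typing import Iterable, List

# Import names are assumptions inferred from the packages installed above;
# adjust them if a package exposes a different module name.
EXPECTED_MODULES = ["torch", "llava", "flash_attn", "s2wrapper", "multimedeval"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(EXPECTED_MODULES)
print("All expected modules found." if not missing else f"Missing: {', '.join(missing)}")
```

Run it inside the `llava-med++` conda environment; a non-empty "Missing" list usually means one of the install steps failed or was run in a different environment.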
Model Zoo
The following model checkpoints are available:
| Model Name | Link | Description |
|---|---|---|
| LLaVA‑Med++ (VQA‑RAD) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M VQA‑RAD subset, then fine‑tuned on VQA‑RAD training set. |
| LLaVA‑Med++ (SLAKE) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M SLAKE subset, then fine‑tuned on SLAKE training set. |
| LLaVA‑Med++ (PathVQA) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M PathVQA subset, then fine‑tuned on PathVQA training set. |
| LLaVA‑Med‑Captioner | Hugging Face | Description generator for multigranular annotations, fine‑tuned on MedTrinity‑Instruct‑200K. |
Citation
If MedTrinity‑25M benefits your research, please cite:
@misc{xie2024medtrinity25mlargescalemultimodaldataset,
title={MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine},
author={Yunfei Xie and Ce Zhou and Lang Gao and Juncheng Wu and Xianhang Li and Hong-Yu Zhou and Sheng Liu and Lei Xing and James Zou and Cihang Xie and Yuyin Zhou},
year={2024},
eprint={2408.02900},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.02900}
}
Source
Organization: github
Created: 8/6/2024