MedTrinity‑25M: A Large‑scale Multimodal Dataset with Multigranular Annotations for Medicine

Dataset Overview

MedTrinity‑25M is a large‑scale multimodal dataset designed for the medical domain, featuring multigranular annotations. It includes abundant medical images and corresponding textual descriptions, suitable for medical visual question answering and related tasks.

Construction Process

Data Processing: Extract key information from collected data, integrate metadata to generate coarse descriptions, locate regions of interest (ROI), and gather medical knowledge.
Multigranular Text Generation: Use this information to prompt multimodal large language models (MLLMs) to produce fine‑grained descriptions.
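
The two steps above can be sketched in code. This is a minimal illustration, not the authors' pipeline: the `ImageRecord` fields, the prompt wording, and the `build_prompt` helper are all hypothetical, chosen only to show how a coarse description, an ROI, and gathered medical knowledge might be combined into one prompt for an MLLM.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ImageRecord:
    """Hypothetical container for the outputs of the data-processing step."""
    coarse_description: str                              # built from metadata
    roi_box: Tuple[int, int, int, int]                   # (x_min, y_min, x_max, y_max) in pixels
    knowledge: List[str] = field(default_factory=list)   # gathered medical facts


def build_prompt(record: ImageRecord) -> str:
    """Combine the processed information into a single text prompt for an MLLM."""
    x0, y0, x1, y1 = record.roi_box
    lines = [
        "Describe the medical image in detail.",
        f"Coarse description: {record.coarse_description}",
        f"Region of interest (pixels): x={x0}-{x1}, y={y0}-{y1}",
    ]
    # The knowledge section is only added when something was gathered.
    if record.knowledge:
        lines.append("Relevant medical knowledge:")
        lines.extend(f"- {fact}" for fact in record.knowledge)
    return "\n".join(lines)


# Illustrative values only; they do not come from the dataset.
record = ImageRecord(
    coarse_description="Chest X-ray, frontal view.",
    roi_box=(120, 80, 260, 210),
    knowledge=["Example fact retrieved from a knowledge base."],
)
prompt = build_prompt(record)
print(prompt)
```

Keeping the processed fields in one record and rendering the prompt in a separate function means the prompt template can change without touching the data-processing step.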

Statistics Overview

The original paper reports the scale and structure of the dataset in detail: over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.

Download

The dataset can be downloaded from Hugging Face Hub:

MedTrinity‑25M: UCSC-VLAA/MedTrinity-25M
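
One possible way to inspect the dataset without downloading all of it is to stream a few rows with the Hugging Face `datasets` library. The sketch below rests on assumptions: the helper names `list_configs` and `peek` are hypothetical, the split name `train` is a guess, and the configuration names are looked up at run time rather than hard-coded because this document does not list them.

```python
from itertools import islice
from typing import Any, Dict, List

REPO_ID = "UCSC-VLAA/MedTrinity-25M"  # dataset ID given above


def list_configs() -> List[str]:
    """Return the configuration (subset) names published for the dataset."""
    # Imported lazily so this file can be loaded before `datasets` is installed.
    from datasets import get_dataset_config_names
    return get_dataset_config_names(REPO_ID)


def peek(config: str, n: int = 3, split: str = "train") -> List[Dict[str, Any]]:
    """Stream the first `n` rows of one configuration without a full download.

    Assumption: the configuration exposes a split named `train`.
    """
    from datasets import load_dataset
    stream = load_dataset(REPO_ID, config, split=split, streaming=True)
    return list(islice(stream, n))
```

For example, `peek(list_configs()[0])` would return the first three rows of the first listed configuration as dictionaries. Both calls need network access and, if the repository is gated, a logged-in Hugging Face account.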

Results Showcase

The original paper reports performance on several medical VQA benchmarks (VQA‑RAD, SLAKE, and PathVQA), demonstrating the dataset's utility.

Quick Start

Installation

Clone the repository and navigate to the folder:

git clone https://github.com/UCSC-VLAA/MedTrinity-25M.git
cd MedTrinity-25M

Install the package:

conda create -n llava-med++ python=3.10 -y
conda activate llava-med++
pip install --upgrade pip
pip install -e .

Install additional training packages:

pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install git+https://github.com/bfshi/scaling_on_scales.git
pip install multimedeval
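
After installation, a quick check that the extra packages can be found may save a failed training run later. The snippet below is a small sketch using only the standard library; the `check_modules` helper is hypothetical, and the import names in `TRAINING_MODULES` are assumptions, since a pip distribution name can differ from the module it installs.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed import names: `flash-attn` is imported as `flash_attn`;
# `multimedeval` is assumed to install a module of the same name.
TRAINING_MODULES = ("flash_attn", "multimedeval")


def check_modules(names: Iterable[str] = TRAINING_MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found for import."""
    return {name: find_spec(name) is not None for name in names}


for module, found in check_modules().items():
    print(f"{module}: {'ok' if found else 'MISSING'}")
```

`find_spec` only locates a module and does not import it, so the check stays fast and does not initialize any GPU libraries.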

Model Zoo

The following models are available:

Model Name	Link	Description
LLaVA‑Med++ (VQA‑RAD)	Google Drive	Pre‑trained on LLaVA‑Med data and MedTrinity‑25M VQA‑RAD subset, then fine‑tuned on VQA‑RAD training set.
LLaVA‑Med++ (SLAKE)	Google Drive	Pre‑trained on LLaVA‑Med data and MedTrinity‑25M SLAKE subset, then fine‑tuned on SLAKE training set.
LLaVA‑Med++ (PathVQA)	Google Drive	Pre‑trained on LLaVA‑Med data and MedTrinity‑25M PathVQA subset, then fine‑tuned on PathVQA training set.
LLaVA‑Med‑Captioner	Hugging Face	Description generator for multigranular annotations, fine‑tuned on MedTrinity‑Instruct‑200K.

Citation

If MedTrinity‑25M benefits your research, please cite:

@misc{xie2024medtrinity25mlargescalemultimodaldataset,
      title={MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine},
      author={Yunfei Xie and Ce Zhou and Lang Gao and Juncheng Wu and Xianhang Li and Hong-Yu Zhou and Sheng Liu and Lei Xing and James Zou and Cihang Xie and Yuyin Zhou},
      year={2024},
      eprint={2408.02900},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.02900}
}
