
MedTrinity-25M

MedTrinity‑25M is a large‑scale multimodal medical dataset with multigranular annotations. Its construction pipeline extracts key information from the collected data, integrates metadata to generate coarse descriptions, locates regions of interest, and gathers medical knowledge. It then prompts multimodal large language models (MLLMs) to generate fine‑grained descriptions.

Updated 8/15/2024
github

Description

MedTrinity‑25M: A Large‑scale Multimodal Dataset with Multigranular Annotations for Medicine

Dataset Overview

MedTrinity‑25M is a large‑scale multimodal dataset for the medical domain with multigranular annotations. It pairs medical images with corresponding textual descriptions, which makes it suitable for medical visual question answering (VQA) and related tasks.

Construction Process

  1. Data Processing: Extract key information from collected data, integrate metadata to generate coarse descriptions, locate regions of interest (ROI), and gather medical knowledge.
  2. Multigranular Text Generation: Use this information to prompt multimodal large language models (MLLMs) to produce fine‑grained descriptions.
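
As a rough illustration of how the outputs of step 1 could feed step 2, the sketch below assembles a coarse caption, ROI boxes, and knowledge snippets into a single text prompt. The `ImageRecord` class, the `build_prompt` function, the prompt wording, and the sample values are all hypothetical placeholders, not the authors' actual pipeline or prompt template.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical container for the per-image outputs of the data-processing step."""
    coarse_caption: str                            # built from metadata (modality, organ, ...)
    roi_boxes: list = field(default_factory=list)  # (x_min, y_min, x_max, y_max) tuples
    knowledge: list = field(default_factory=list)  # retrieved medical knowledge snippets


def build_prompt(record: ImageRecord) -> str:
    """Combine coarse caption, ROI locations, and knowledge into one MLLM prompt."""
    lines = [f"Coarse caption: {record.coarse_caption}"]
    for i, box in enumerate(record.roi_boxes, start=1):
        lines.append(f"Region of interest {i}: bounding box {box}")
    for snippet in record.knowledge:
        lines.append(f"Medical knowledge: {snippet}")
    # Illustrative instruction only; the real template is not given in this document.
    lines.append("Describe the image in detail, covering global and local findings.")
    return "\n".join(lines)


# Made-up sample values, used purely to exercise the function.
record = ImageRecord(
    coarse_caption="Chest X-ray, frontal view",
    roi_boxes=[(120, 80, 260, 210)],
    knowledge=["<retrieved knowledge snippet>"],
)
prompt = build_prompt(record)
print(prompt)
```

The resulting string would be sent to an MLLM together with the image; that call is left out here because the document does not specify which model interface is used.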

Statistics Overview

The original paper illustrates the scale and structure of the dataset with summary statistics.

Download

The dataset can be downloaded from the Hugging Face Hub.
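
A minimal sketch of reading a few samples with the Hugging Face `datasets` library, assuming it is installed (`pip install datasets`). The repository ID `UCSC-VLAA/MedTrinity-25M` and the configuration name `25M_demo` are assumptions that do not appear in this document; check the dataset card for the actual values before running it.

```python
from itertools import islice


def take(samples, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(samples, n))


def stream_medtrinity(repo_id="UCSC-VLAA/MedTrinity-25M", config="25M_demo"):
    """Open the dataset in streaming mode so nothing is downloaded up front.

    Both default arguments are assumptions, not values confirmed by this page.
    """
    from datasets import load_dataset  # imported lazily so the helper above works without it

    return load_dataset(repo_id, config, split="train", streaming=True)


# Example usage (requires network access):
# for sample in take(stream_medtrinity(), 3):
#     print(sorted(sample.keys()))
```

Streaming avoids pulling the full dataset to disk just to inspect its fields, which matters for a collection of this size.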

Results Showcase

The original repository reports performance results on several medical VQA tasks to demonstrate the dataset's utility.

Quick Start

Installation

  1. Clone the repository and navigate to the folder:
    git clone https://github.com/UCSC-VLAA/MedTrinity-25M.git
    cd MedTrinity-25M
    
  2. Install the package:
    conda create -n llava-med++ python=3.10 -y
    conda activate llava-med++
    pip install --upgrade pip
    pip install -e .
    
  3. Install additional training packages:
    pip install -e ".[train]"
    pip install flash-attn --no-build-isolation
    pip install git+https://github.com/bfshi/scaling_on_scales.git
    pip install multimedeval
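
After the installation steps above, a quick check like the following can confirm that the packages are visible to the active environment. The import names `torch`, `flash_attn`, `s2wrapper`, and `multimedeval` are assumptions about what these packages expose; adjust the list if your environment differs.

```python
from importlib.util import find_spec

# Assumed top-level import names for the packages installed above.
EXPECTED_MODULES = ["torch", "flash_attn", "s2wrapper", "multimedeval"]


def check_modules(names):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: find_spec(name) is not None for name in names}


status = check_modules(EXPECTED_MODULES)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not initialize any GPU libraries.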
    

Model Zoo

The following models are available:

Model Name | Link | Description
LLaVA‑Med++ (VQA‑RAD) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M VQA‑RAD subset, then fine‑tuned on VQA‑RAD training set.
LLaVA‑Med++ (SLAKE) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M SLAKE subset, then fine‑tuned on SLAKE training set.
LLaVA‑Med++ (PathVQA) | Google Drive | Pre‑trained on LLaVA‑Med data and MedTrinity‑25M PathVQA subset, then fine‑tuned on PathVQA training set.
LLaVA‑Med‑Captioner | Hugging Face | Description generator for multigranular annotations, fine‑tuned on MedTrinity‑Instruct‑200K.

Citation

If MedTrinity‑25M benefits your research, please cite:

@misc{xie2024medtrinity25mlargescalemultimodaldataset,
      title={MedTrinity‑25M: A Large‑scale Multimodal Dataset with Multigranular Annotations for Medicine},
      author={Yunfei Xie and Ce Zhou and Lang Gao and Juncheng Wu and Xianhang Li and Hong‑Yu Zhou and Sheng Liu and Lei Xing and James Zou and Cihang Xie and Yuyin Zhou},
      year={2024},
      eprint={2408.02900},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.02900}
}


Topics

Medical Data Analysis
Multimodal Data

Source

Organization: github

Created: 8/6/2024
