
MedPix-2.0

MedPix 2.0 is a comprehensive multimodal biomedical dataset for advanced AI applications. It pairs detailed clinical case information with CT and MRI images.

Source: GitHub
Created: Jun 21, 2024
Updated: Jul 4, 2024

MedPix-2.0 Dataset Overview

Dataset Introduction

MedPix 2.0 is a comprehensive multimodal biomedical dataset, specifically designed for advanced artificial intelligence applications.

Citation Information

If you use the MedPix 2.0 dataset, please cite it as follows:

@misc{siragusa2024medpix20comprehensivemultimodal,
  title={MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications},
  author={Irene Siragusa and Salvatore Contino and Massimo La Ciura and Rosario Alicata and Roberto Pirrone},
  year={2024},
  eprint={2407.02994},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2407.02994},
}

Dataset Structure

Folder Structure

  • images folder: Contains all images in the dataset.
  • splitted_dataset folder: Provides a split of the dataset; see /splitted_dataset/README.md for details.

Case_topic.json

Contains a series of JSON objects, each describing one clinical case. Each object includes:

  • U_id: UID of the clinical case.
  • TAC: List of .png file names for CT scans (if any), located in the images folder.
  • MRI: List of .png file names for MR scans (if any), located in the images folder.
  • Case: Dictionary with clinical case details such as Title, History, Exam, Findings, Differential Diagnosis, Case Diagnosis, Diagnosis By.
  • Topic: Dictionary with disease information such as Title, Disease Discussion, ACR Code, Category.
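
As a minimal sketch of reading these records in Python: the description above does not say whether the file is a single JSON array or one object per line, so the hypothetical helper `parse_records` accepts either. The helper `image_paths` assumes the images folder is named `images` and appends `.png` only when a listed name lacks it; both are assumptions rather than documented behavior.

```python
import json
from pathlib import Path


def parse_records(text):
    """Parse text holding either one JSON array or one JSON object per line."""
    try:
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Fall back to line-delimited JSON (one object per non-empty line).
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def image_paths(case, images_dir="images"):
    """Return paths for the CT (TAC) and MRI images referenced by one case."""
    names = (case.get("TAC") or []) + (case.get("MRI") or [])
    # Assumption: names may or may not carry the .png extension.
    return [
        Path(images_dir) / (name if name.endswith(".png") else name + ".png")
        for name in names
    ]


def index_by_uid(cases):
    """Map each U_id to its clinical case record."""
    return {case["U_id"]: case for case in cases}
```

With the dataset checked out locally, `parse_records(Path("Case_topic.json").read_text(encoding="utf-8"))` would return the list of cases, which `index_by_uid` turns into a lookup table keyed by `U_id`.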

Descriptions.json

Contains a series of JSON objects, each holding the textual information for a single image stored in the images folder. Each object includes:

  • Type: CT or MR, indicating the scan modality.
  • U_id: UID of the clinical case the image belongs to.
  • image: Image file name.
  • location: Fine‑grained body part information.
  • location category: Macro location of the body part.
  • Description: Dictionary with details such as ACR codes, Age, Sex, Caption, Figure part, Modality, Plane.
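
Because every image record carries the `U_id` of its clinical case, the two files can be joined on that field. The sketch below works on records already loaded as Python dictionaries; the function names and the sample values are hypothetical, and only the keys come from the field list above.

```python
from collections import defaultdict


def group_by_case(descriptions):
    """Map each U_id to the list of image records that belong to it."""
    grouped = defaultdict(list)
    for record in descriptions:
        grouped[record["U_id"]].append(record)
    return dict(grouped)


def captions_for(descriptions, modality):
    """Collect (image, caption) pairs for one modality, "CT" or "MR"."""
    return [
        (r["image"], (r.get("Description") or {}).get("Caption", ""))
        for r in descriptions
        if r.get("Type") == modality
    ]


# Invented sample records shaped like the fields listed above.
sample = [
    {"Type": "CT", "U_id": "case-1", "image": "img-a",
     "Description": {"Caption": "Axial CT", "Plane": "Axial"}},
    {"Type": "MR", "U_id": "case-1", "image": "img-b",
     "Description": {"Caption": "Sagittal MR", "Plane": "Sagittal"}},
    {"Type": "CT", "U_id": "case-2", "image": "img-c", "Description": {}},
]

by_case = group_by_case(sample)
ct_captions = captions_for(sample, "CT")
```

Grouping first keeps the join cheap: each case from Case_topic.json can then look up its image records by `U_id` instead of rescanning the whole list.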