Explore high-quality datasets for your AI and machine learning projects.
The IST-3 CT head scan dataset was created by the Centre for Clinical Brain Sciences at the University of Edinburgh and contains 10,659 CT series for research on intracranial arterial calcification segmentation. The data originate from the third International Stroke Trial (IST-3), comprising non‑contrast CT scans of 3,035 acute ischemic stroke patients. During creation, registration to a template and quality control ensured validity and accuracy. The dataset primarily supports deep‑learning methods for stroke risk assessment, especially automatic quantification of intracranial arterial calcifications.
AbdomenAtlas 3.0 is the first publicly available high‑quality abdominal CT scan dataset with paired radiology reports. The database contains over 9,000 CT scans and their radiology reports, together with voxel‑level annotations of liver, kidney and pancreas tumors.
CIFAR‑100‑LT is a long‑tailed (class‑imbalanced) variant of CIFAR‑100 containing fewer than 60,000 32×32 colour images across 100 classes. The number of training samples per class decays exponentially from head to tail classes, with an imbalance factor (the ratio between the largest and smallest class sizes) typically set to 10 or 100. The balanced test set of 10,000 images (100 per class) is kept, while the training set shrinks to fewer than 50,000 images. The 100 classes are further grouped into 20 super‑classes, so each image carries two labels: a fine label for its specific class and a coarse label for its super‑class.
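The exponential decay rule can be made concrete with a short sketch. The formula below is the construction commonly used for CIFAR‑100‑LT (an assumption based on standard long‑tailed benchmarks, not an official specification): class i keeps n_max · (1/IF)^(i/(C−1)) training images, where IF is the imbalance factor.

```python
# Sketch of the usual long-tailed sampling rule for CIFAR-100-LT
# (assumed construction, not an official spec): the head class keeps all
# 500 training images, the tail class keeps 500 / IF, with exponential
# decay in between.

def long_tail_counts(num_classes=100, n_max=500, imbalance_factor=100):
    """Per-class training-sample counts under exponential decay."""
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)  # 0 for the head class, 1 for the tail
        counts.append(int(n_max * (imbalance_factor ** (-frac))))
    return counts

counts = long_tail_counts()
print(counts[0], counts[-1], sum(counts))  # head=500, tail=5, total < 50,000
```

With an imbalance factor of 100 the tail class retains only 5 images, which is why the resulting training set is far smaller than the balanced 50,000.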
The dataset contains MRI scans of brain cancer patients stored as .dcm and .jpg files, accompanied by radiologist annotations and PDF reports. It includes 10 different study angles, offering a comprehensive view of brain tumor morphology. The full version comprises 100,000 studies covering various diseases and conditions such as cancer, multiple sclerosis, and metastatic lesions. It is valuable for researchers and clinicians developing new imaging techniques, training and validating machine‑learning algorithms, and analyzing tumor response to treatments.
This novel multimodal question‑answering dataset combines structured electronic health records (EHRs) with chest X‑ray images, and is intended to promote joint reasoning over image and table modalities in EHR question‑answering systems.
The dataset contains spinal X‑ray images stored in .jpg and .dcm formats, organized into folders by spine‑related medical condition, with each folder holding images of a specific spinal deformity. Covered categories include scoliosis, osteochondrosis, osteoporosis, spondylolisthesis, vertebral compression fractures, and disability, alongside "other" and "healthy" classes. It supports researchers and medical professionals in developing machine‑learning and deep‑learning algorithms for automated diagnosis, treatment planning, and prognosis estimation.
The dataset contains lumbar spine MRI scans focused on the vertebrae and intervertebral discs. Scans are accompanied by medical reports that aid in diagnosing spinal conditions such as degenerative spinal disease, lumbar degenerative disease, and disc herniation. The emphasis is on sagittal T2‑weighted imaging of the lumbar spine and spinal canal.
The IXI dataset contains nearly 600 MRI scans from healthy volunteers, including T1‑weighted, T2‑weighted, PD‑weighted, MRA, and diffusion‑weighted images. The scans were acquired using different scanner systems at Hammersmith Hospital, Guy’s Hospital, and the Institute of Psychiatry.
This dataset is primarily used for image analysis and exposes three features: image, findings, and impression. The image feature stores the image data, while findings and impression hold textual descriptions. It includes a training set of 30,633 samples, with a total size of 800,678,886 bytes and a download size of 792,886,513 bytes.
The dataset contains 2.4 million lumbar spine MRI studies focused on the vertebrae and intervertebral discs. Scans are accompanied by medical reports for diagnosing spinal conditions such as degenerative spinal disease, lumbar degenerative disease, and disc herniation, with an emphasis on sagittal T2‑weighted imaging of the spinal canal. Both sagittal and axial views are included, supporting accurate automatic segmentation and classification of spinal structures. Deep learning can be applied to assess spinal stenosis, detect degenerative changes, and segment spinal anatomy, making the data suitable for machine‑learning and medical‑diagnostic tasks. All patients consented to data release, and the data are de‑identified.
The released SYSU-CEUS dataset contains three types of focal liver lesions (FLLs): 186 hepatocellular carcinoma (HCC) instances, 109 hemangioma (HEM) instances, and 58 focal nodular hyperplasia (FNH) instances, i.e., 186 malignant and 167 benign cases. The dataset was collected at the First Affiliated Hospital of Sun Yat-sen University using an Aplio SSA-770A (Toshiba Medical Systems) scanner. All instances have a resolution of 768 × 576, originate from different patients, and vary substantially in appearance and enhancement patterns (e.g., size, contrast, shape, and location).
The CT‑PET dataset was created by Hanoi University of Science and Technology (Vietnam) and Nagoya University (Japan) among other institutions. It is currently the largest paired CT‑PET image dataset, containing 2,028,628 CT‑PET image pairs. The dataset covers a wide range of anatomical regions, from the head to the upper thigh, with images stored in DICOM format and detailed metadata. It was designed to support training and evaluation of CT‑to‑PET image translation models, particularly for cancer diagnosis and treatment monitoring. By incorporating domain knowledge such as attention maps and attenuation maps, the dataset aims to improve the accuracy of PET image generation and the quality of diagnostic information.
The Polyp-Gen dataset is a realistic and diverse dataset for polyp image generation, intended to expand endoscopic datasets. It contains 55,883 samples: 29,640 polyp frames and 26,243 non‑polyp frames. Low‑quality images, such as blurry, reflective, or ghosted frames, were filtered out.
UniMed is a large‑scale, open‑source multimodal medical dataset created by the Mohamed bin Zayed University of Artificial Intelligence and other institutions. It contains over 5.3 million image‑text pairs across six imaging modalities: X‑ray, CT, MRI, Ultrasound, Pathology, and Fundus. The dataset is built by converting modality‑specific classification datasets into image‑text format using large language models, and augmenting them with existing medical image‑text data, enabling scalable pre‑training of vision‑language models (VLMs). UniMed aims to alleviate the scarcity of publicly available large‑scale medical image‑text data and supports tasks such as zero‑shot classification and cross‑modal generalization.
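The label‑to‑caption conversion step can be illustrated with a minimal template‑based sketch. The templates, label, and modality strings below are illustrative assumptions; UniMed's actual captions are generated with large language models rather than fixed templates.

```python
import random

# Illustrative caption templates (assumptions for this sketch; UniMed's
# real LLM-generated phrasings differ).
TEMPLATES = [
    "A {modality} image showing {label}.",
    "{modality} scan with findings consistent with {label}.",
    "This {modality} study demonstrates {label}.",
]

def label_to_caption(label, modality, rng=random):
    """Turn a classification label into a free-text caption via a template."""
    template = rng.choice(TEMPLATES)
    return template.format(modality=modality, label=label)

# Example: a chest X-ray classification label becomes a caption.
print(label_to_caption("cardiomegaly", "chest X-ray", random.Random(0)))
```

Pairing each image with such a caption converts a pure classification dataset into image‑text form suitable for contrastive vision‑language pre‑training.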
The FracAtlas dataset is a musculoskeletal radiographic image collection for fracture classification, localisation, and segmentation. It includes 4,083 X‑ray images (717 with fractures) and provides annotations in COCO, VGG, YOLO, and Pascal VOC formats. The dataset is intended for deep‑learning tasks in medical imaging, particularly fracture understanding. It is freely available under CC‑BY 4.0.
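Since the annotations ship in COCO format among others, a short sketch of reading COCO‑style boxes may be useful. The file contents below are illustrative placeholder values, not real FracAtlas annotations; only the COCO JSON layout and its [x_min, y_min, width, height] bbox convention are assumed.

```python
import json

# A minimal COCO-style annotation snippet (illustrative values, not real
# FracAtlas data); COCO bboxes are [x_min, y_min, width, height].
coco_json = """
{
  "images": [{"id": 1, "file_name": "IMG0001.jpg", "width": 512, "height": 512}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 1,
                   "bbox": [120.0, 80.0, 60.0, 45.0]}],
  "categories": [{"id": 1, "name": "fracture"}]
}
"""

def boxes_per_image(coco):
    """Map file_name -> list of (category_name, bbox) pairs."""
    cats = {c["id"]: c["name"] for c in coco["categories"]}
    files = {im["id"]: im["file_name"] for im in coco["images"]}
    result = {name: [] for name in files.values()}
    for ann in coco["annotations"]:
        result[files[ann["image_id"]]].append((cats[ann["category_id"]], ann["bbox"]))
    return result

print(boxes_per_image(json.loads(coco_json)))
```

The same indexing pattern (resolving `image_id` and `category_id` through lookup tables) applies to any COCO‑format detection or segmentation file.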