Explore high-quality datasets for your AI and machine learning projects.
The CADICA dataset is a newly released resource for coronary artery disease research. It contains coronary angiography images annotated with bounding boxes around lesions. Each bounding box is given as its top‑left corner coordinates plus its width and height. The dataset is split into training, validation, and test sets, and each split comes with CSV files listing image paths and the corresponding ground‑truth annotations.
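Many detection frameworks expect boxes as corner coordinates or as normalized centres, so boxes stored as top-left corner plus width and height usually need converting first. The sketch below assumes each box is a tuple `(x, y, w, h)` in pixels. The function names are hypothetical helpers, not part of the CADICA release.

```python
from typing import Tuple

# Four numbers: (x, y, w, h) or (x_min, y_min, x_max, y_max), depending on the function.
Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Box) -> Box:
    """Top-left corner + width/height -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box: Box) -> Box:
    """Inverse conversion: corner coordinates -> top-left corner + width/height."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min, x_max - x_min, y_max - y_min)


def xywh_to_cxcywh_normalized(box: Box, img_w: int, img_h: int) -> Box:
    """Top-left + size -> box centre and size, each divided by the image size
    (the layout YOLO-style trainers expect)."""
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)
```

For example, `xywh_to_xyxy((10, 20, 30, 40))` gives `(10, 20, 40, 60)`.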
This dataset is a collection of 100 COVID‑19 chest X‑ray images with segmentation annotations. The annotation files follow the COCO format and cover anatomical categories (left lung, right lung, heart‑thorax, airway), pathological categories (ground‑glass opacity, consolidation, pleural effusion, pneumothorax), and objects such as endotracheal tubes, central venous lines, monitoring probes, nasogastric tubes, chest tubes, and tubing. Every image was manually annotated by qualified radiologists.
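Because the annotations use the COCO layout (`images`, `categories`, and `annotations` lists linked by integer ids), they can be inspected with the standard library alone. The sketch below counts objects per category and collects the annotations for one image. `SAMPLE` is a made-up toy document: its ids, file name, and polygons are invented, and the exact category strings in the real files may differ from the names used here.

```python
import json
from collections import Counter
from typing import Dict, List

# Toy document in COCO layout (hypothetical ids, file name, and polygons).
SAMPLE = {
    "images": [{"id": 7, "file_name": "example.png", "width": 512, "height": 512}],
    "categories": [
        {"id": 1, "name": "left lung"},
        {"id": 2, "name": "right lung"},
        {"id": 3, "name": "endotracheal tube"},
    ],
    "annotations": [
        {"id": 1, "image_id": 7, "category_id": 1, "segmentation": [[60, 80, 200, 80, 200, 400, 60, 400]]},
        {"id": 2, "image_id": 7, "category_id": 2, "segmentation": [[300, 80, 450, 80, 450, 400, 300, 400]]},
        {"id": 3, "image_id": 7, "category_id": 3, "segmentation": [[250, 0, 260, 0, 260, 150, 250, 150]]},
    ],
}


def load_coco(path: str) -> dict:
    """Read a COCO-format annotation file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_per_category(coco: dict) -> Dict[str, int]:
    """Number of annotation objects per category name."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return dict(Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"]))


def annotations_for_image(coco: dict, file_name: str) -> List[dict]:
    """All annotation objects attached to the image with this file name."""
    ids = {img["id"] for img in coco["images"] if img["file_name"] == file_name}
    return [ann for ann in coco["annotations"] if ann["image_id"] in ids]
```

On a real file, `count_per_category(load_coco(path))` gives a quick check of class balance across the anatomical, pathological, and device categories.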
This dataset comprises various MRI images for automated multiple‑sclerosis (MS) diagnosis. It includes:
- **SCLEROSIS**: 109 images.
- **ISBI 2015** challenge: 21 images for longitudinal MS lesion segmentation.
- **Ljubljana**: 30 training images.
- **MICCAI 2008**: 51 images for the MS lesion segmentation challenge.
- **MICCAI 2016**: 15 training images.
- **HEALTHY**: 21 healthy subject images.
- **KIRBY**: 42 images for multimodal MRI reproducibility.
- **OASIS‑3**: forthcoming.

The dataset provides detailed demographic information for several subsets (age, sex, MS subtype, etc.) and includes download instructions and preprocessing protocols such as rigid registration, MNI alignment, anisotropic filtering, skull stripping, and bias field correction.
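Of the listed preprocessing steps, anisotropic filtering is the least self-explanatory. The description does not name the filter. Assuming it refers to Perona-Malik anisotropic diffusion, a classic edge-preserving smoother, a minimal 2D NumPy sketch follows. The function name and the default `n_iter`, `kappa`, and `gamma` values are illustrative assumptions, not the dataset's actual protocol.

```python
import numpy as np


def _conduction(d: np.ndarray, kappa: float) -> np.ndarray:
    """Perona-Malik exponential edge-stopping function g(d) = exp(-(d/kappa)^2)."""
    return np.exp(-((d / kappa) ** 2))


def perona_malik_2d(image: np.ndarray, n_iter: int = 10,
                    kappa: float = 30.0, gamma: float = 0.2) -> np.ndarray:
    """Anisotropic diffusion of a 2D slice (illustrative parameters).

    kappa sets which intensity differences count as edges; gamma is the
    time step and should stay <= 0.25 for this 4-neighbour scheme.
    """
    img = image.astype(np.float64)  # astype returns a copy; the input is untouched
    for _ in range(n_iter):
        # Differences to the north/south/east/west neighbours.
        d_n = np.roll(img, 1, axis=0) - img
        d_s = np.roll(img, -1, axis=0) - img
        d_e = np.roll(img, -1, axis=1) - img
        d_w = np.roll(img, 1, axis=1) - img
        # np.roll wraps around the image; zero the wrapped differences (no-flux border).
        d_n[0, :] = 0.0
        d_s[-1, :] = 0.0
        d_e[:, -1] = 0.0
        d_w[:, 0] = 0.0
        # Small differences (noise) diffuse freely; large ones (edges) barely move.
        img = img + gamma * sum(_conduction(d, kappa) * d for d in (d_n, d_s, d_e, d_w))
    return img
```

For 3D volumes, the same update can be applied slice by slice or extended to six neighbours. `kappa` only makes sense relative to the intensity range of the scans, so it has to be tuned together with whatever normalization precedes it.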
This is an open pathology image dataset for deep‑learning model development, covering 15 organ categories. It follows the FAIR principles (findable, accessible, interoperable, reusable) and provides a large, diverse collection of whole‑slide pathology images for tasks such as disease classification and segmentation of cancer and pneumonia cells, with the aim of supporting better diagnostic and therapeutic strategies.
The BraTS 2020 dataset supports brain tumor segmentation on multimodal MRI scans. The task is to segment three tumor sub‑regions: the GD‑enhancing tumor (ET), the peritumoral edema (ED), and the necrotic and non‑enhancing tumor core (NCR/NET). Automated deep‑learning segmentation methods developed on this data are meant to help clinicians analyze brain tumor MRI scans faster and more accurately, improving diagnosis, treatment planning, and monitoring.
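In BraTS, the three sub-regions are stored as integer labels in a single segmentation map. Results are usually reported on nested regions built from those labels: enhancing tumor (ET), tumor core (TC = NCR/NET + ET), and whole tumor (WT = all three). The sketch below derives those regions and scores them with the Dice coefficient. It assumes the BraTS label values 1 = NCR/NET, 2 = ED, 4 = ET. Scoring two empty masks as 1.0 is a convention chosen here and may differ from the official evaluation tooling.

```python
import numpy as np

# BraTS 2020 label values: 0 = background, 1 = NCR/NET, 2 = ED, 4 = ET (label 3 is unused).
NCR_NET, ED, ET = 1, 2, 4


def brats_regions(seg: np.ndarray) -> dict:
    """Boolean masks for the nested regions used in BraTS evaluation."""
    return {
        "ET": seg == ET,                          # enhancing tumor
        "TC": np.isin(seg, [NCR_NET, ET]),        # tumor core = NCR/NET + ET
        "WT": np.isin(seg, [NCR_NET, ED, ET]),    # whole tumor = all three
    }


def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient 2|P & T| / (|P| + |T|); 1.0 when both masks are empty."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else float(2.0 * inter / total)


def brats_dice(pred_seg: np.ndarray, true_seg: np.ndarray) -> dict:
    """Per-region Dice between a predicted and a reference label map."""
    p, t = brats_regions(pred_seg), brats_regions(true_seg)
    return {name: dice(p[name], t[name]) for name in p}
```

The same functions work unchanged on 3D volumes, because every operation is element-wise.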
The dataset contains features computed from digitized images of breast fine‑needle aspirates (FNA); the features describe characteristics of the cell nuclei in each image.
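The description does not say how per-nucleus measurements become per-image features. This entry appears to match the Wisconsin Diagnostic Breast Cancer (WDBC) data, but that is an assumption. In WDBC, each nuclear measurement is summarized per image by its mean, its standard error, and its "worst" value (the mean of the three largest values). A minimal sketch of that aggregation follows. The function name is hypothetical, and using the sample standard deviation for the standard error is also an assumption.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize_nuclear_feature(values: Sequence[float]) -> Dict[str, float]:
    """Summarize one measurement (e.g. radius) over all nuclei in an image.

    Returns the mean, the standard error of the mean (sample standard
    deviation / sqrt(n); the ddof choice is an assumption), and the 'worst'
    value, taken as the mean of the three largest measurements.
    """
    if len(values) < 3:
        raise ValueError("need at least three nuclei")
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return {"mean": mean, "se": se, "worst": worst}
```

Applying this to each per-nucleus measurement yields three columns per measurement in the final feature table.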
This dataset contains 3,543 chest X‑ray images annotated with bounding boxes for ten chest abnormalities, including nodules, masses, and pneumothorax. The images come from ChestX‑Det10, a subset of the NIH ChestX‑ray14 dataset.
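Detectors trained on bounding-box data like this are typically scored by matching predicted boxes to ground-truth boxes with intersection-over-union (IoU). The sketch below assumes boxes in corner form `(x_min, y_min, x_max, y_max)`, so check the layout of the actual annotation files before relying on it. `count_matches` is a hypothetical greedy one-to-one matcher, not any official evaluation script.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two corner-form boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds: List[Box], truths: List[Box], thr: float = 0.5) -> int:
    """Greedily match predictions (assumed sorted by descending confidence)
    to unused ground-truth boxes; return how many reach IoU >= thr."""
    used = set()
    matched = 0
    for p in preds:
        best, best_j = 0.0, None
        for j, t in enumerate(truths):
            if j in used:
                continue
            score = iou(p, t)
            if score > best:
                best, best_j = score, j
        if best_j is not None and best >= thr:
            used.add(best_j)  # each ground-truth box can be matched only once
            matched += 1
    return matched
```

A duplicate prediction of an already matched box counts as unmatched, which is the behavior precision-style metrics rely on.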
TJDR is a high‑quality dataset of pixel‑level diabetic retinopathy (DR) annotations, comprising 561 color fundus images from Tongji Hospital, Tongji University. The images were captured at high resolution with various fundus cameras, and personal identifiers have been removed. Anatomical structures such as the optic disc, retinal vessels, and macula are clearly visible. Experienced ophthalmologists used the Labelme tool to annotate four common DR lesion types: microaneurysms (MA), hemorrhages (HE), hard exudates (EX), and soft exudates (SE). The dataset is split into training and test sets and is publicly released to support DR lesion segmentation research.
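Labelme saves polygon annotations as JSON (`shapes` entries with `label`, `points`, and `shape_type`, plus `imageHeight` and `imageWidth`). If the released annotations include such files (an assumption; the dataset may instead ship ready-made masks), the polygons can be rasterized into a label mask for segmentation training. The NumPy sketch below uses an even-odd ray-casting test at pixel centres. The label strings and ids in `LESION_IDS` are hypothetical, so match them to whatever the real files contain.

```python
import numpy as np

# Hypothetical mapping from Labelme label strings to mask values (0 = background).
LESION_IDS = {"MA": 1, "HE": 2, "EX": 3, "SE": 4}


def polygon_mask(points, height: int, width: int) -> np.ndarray:
    """Rasterize one polygon with the even-odd rule, sampling at pixel centres."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    inside = np.zeros((height, width), dtype=bool)
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        if y1 == y2:
            continue  # horizontal edges never cross a horizontal ray
        crosses = (y1 > py) != (y2 > py)
        x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels whose rightward ray crosses this edge.
        inside ^= crosses & (px < x_cross)
    return inside


def labelme_to_mask(doc: dict) -> np.ndarray:
    """Build a uint8 label mask from a parsed Labelme JSON document.

    Non-polygon shapes and unknown labels are skipped; later shapes
    overwrite earlier ones where they overlap.
    """
    h, w = doc["imageHeight"], doc["imageWidth"]
    mask = np.zeros((h, w), dtype=np.uint8)
    for shape in doc["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue
        label_id = LESION_IDS.get(shape["label"])
        if label_id is None:
            continue
        mask[polygon_mask(shape["points"], h, w)] = label_id
    return mask
```

In practice, `doc` would come from `json.load` on one Labelme file. Small lesions such as microaneurysms cover only a few pixels, so the pixel-centre sampling rule matters for them.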