High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

CADICA

The CADICA dataset is a newly released resource for coronary artery disease research. It contains coronary angiography images annotated with bounding boxes around lesions. Each bounding box is given as the top‑left corner coordinates plus width and height. The dataset is split into training, validation, and test sets, and each split comes with CSV files listing image paths and the corresponding ground‑truth annotations.
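Many detection toolkits expect corner coordinates rather than top‑left plus width and height. The following is a minimal sketch of that conversion; the function and type names are hypothetical, and the CSV column layout is not specified here, so check the dataset's own files before wiring this up.

```python
from typing import NamedTuple


class CornerBox(NamedTuple):
    """Box as (x_min, y_min, x_max, y_max); a hypothetical helper type."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def xywh_to_corners(x: float, y: float, w: float, h: float) -> CornerBox:
    """Convert a top-left + width/height box to corner coordinates."""
    if w < 0 or h < 0:
        raise ValueError("width and height must be non-negative")
    return CornerBox(x, y, x + w, y + h)


# Example with made-up pixel values, not taken from the dataset.
box = xywh_to_corners(10, 20, 30, 40)
```

Here `box.x_max` is `x + w` and `box.y_max` is `y + h`, so the box spans from the top‑left corner to the bottom‑right corner.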

github

COVID-19 Chest X-ray Segmentations Dataset

COVID‑19

Medical Imaging Analysis

This dataset is a complete collection of COVID‑19 chest X‑ray segmentations, comprising 100 images. Each annotation file follows the COCO format and includes segmentations of anatomical categories (left lung, right lung, heart‑thorax, airway) and pathological categories (ground‑glass opacity, consolidation, pleural effusion, pneumothorax), as well as objects such as endotracheal tubes, central venous lines, monitoring probes, nasogastric tubes, chest tubes, and tubing. Every image was manually annotated by qualified radiologists.
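Because the annotations follow the COCO format, a quick sanity check is to count annotations per category. The sketch below relies only on the standard COCO keys (`categories`, `annotations`, `category_id`); the tiny `example` dictionary is invented for illustration and does not reflect the dataset's real contents.

```python
from collections import Counter


def count_annotations_per_category(coco: dict) -> dict:
    """Return {category name: number of annotations} for a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Invented in-memory example; a real file would be read with json.load().
example = {
    "categories": [
        {"id": 1, "name": "left lung"},
        {"id": 2, "name": "right lung"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 1},
    ],
}

per_category = count_annotations_per_category(example)
```

For an annotation file on disk, load it with `json.load` from the standard library and pass the resulting dictionary to the same function.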

github

SCLEROSIS, ISBI 2015, Ljubljana, MICCAI 2008, MICCAI 2016, HEALTHY, KIRBY, OASIS-3

Medical Imaging Analysis

Multiple Sclerosis

This dataset comprises various MRI images for automated multiple‑sclerosis (MS) diagnosis. It includes:

- **SCLEROSIS**: 109 images.
- **ISBI 2015** challenge: 21 images for longitudinal MS lesion segmentation.
- **Ljubljana**: 30 training images.
- **MICCAI 2008**: 51 images for the MS lesion segmentation challenge.
- **MICCAI 2016**: 15 training images.
- **HEALTHY**: 21 healthy subject images.
- **KIRBY**: 42 images for multimodal MRI reproducibility.
- **OASIS‑3**: forthcoming.

The dataset provides detailed demographic information for several subsets (age, sex, MS subtype, etc.) and includes download instructions and preprocessing protocols such as rigid registration, MNI alignment, anisotropic filtering, skull stripping, and bias field correction.

github

UCF-WSI-Dataset

Medical Imaging Analysis

Pathology Images

This is an open pathology image dataset for deep‑learning model development, covering 15 organ categories. It follows the FAIR principles (findable, accessible, interoperable, reusable) and provides a large, diverse collection of whole‑slide pathology images for tasks such as disease classification and cancer and pneumonia cell segmentation, with the aim of improving diagnostic and therapeutic strategies.

github

BraTS 2020 dataset

Medical Imaging Analysis

Deep Learning

The BraTS 2020 dataset supports brain tumor segmentation on multimodal MRI scans. The task is to accurately segment three tumor sub‑regions: Gd‑enhancing tumor (ET), peritumoral edema (ED), and necrotic and non‑enhancing tumor core (NCR/NET). Automated deep‑learning segmentation methods developed on it are intended to help medical professionals analyze brain tumor MRI scans more efficiently and accurately, improving diagnosis, treatment planning, and monitoring.
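To see how the three sub‑regions show up in a label map, the sketch below tallies voxels per sub‑region. It assumes the commonly documented BraTS 2020 label values (0 background, 1 NCR/NET, 2 ED, 4 ET); that mapping is not stated in this listing, so verify it against the dataset documentation. The function name and the sample values are hypothetical.

```python
from collections import Counter

# Assumed label convention; confirm against the dataset's own documentation.
BRATS_LABELS = {0: "background", 1: "NCR/NET", 2: "ED", 4: "ET"}


def subregion_voxel_counts(label_values) -> dict:
    """Count voxels per tumor sub-region in a flat iterable of integer labels."""
    counts = Counter(label_values)
    unknown = set(counts) - set(BRATS_LABELS)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    # Report only the tumor sub-regions, skipping background.
    return {
        name: counts.get(value, 0)
        for value, name in BRATS_LABELS.items()
        if value != 0
    }


# Made-up flattened label map, for illustration only.
voxel_counts = subregion_voxel_counts([0, 0, 1, 2, 2, 4, 4, 4])
```

A real segmentation volume would first be flattened to a one‑dimensional sequence of integers before being passed in.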

github

Wisconsin Breast Cancer Diagnostic dataset

Breast Cancer Diagnosis

Medical Imaging Analysis

The dataset contains features computed from digitized images of fine‑needle aspirates (FNA) of breast masses. The features describe characteristics of the cell nuclei visible in each image.

github

ChestX-Det10

Medical Imaging Analysis

Anomaly Detection

This dataset contains 3,543 chest X‑ray images annotated with bounding boxes for ten chest abnormalities, including nodules, masses, and pneumothorax. ChestX‑Det10 is a subset of NIH ChestX‑14.

github

TJDR

Diabetic Retinopathy

Medical Imaging Analysis

TJDR is a high‑quality diabetic retinopathy (DR) dataset with pixel‑level annotations, comprising 561 color fundus images from Tongji Hospital, Tongji University. Images were captured at high resolution with various fundus cameras, and personal identifiers have been removed. Anatomical structures such as the optic disc, retinal vessels, and macula are clearly visible. Experienced ophthalmologists annotated four common DR lesion types (micro‑aneurysms (MA), hemorrhages (HE), hard exudates (EX), and soft exudates (SE)) using the Labelme tool. The dataset is split into training and test sets and is publicly released to support DR lesion segmentation research.
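If the annotations are distributed as Labelme JSON files (an assumption; the release may instead ship rendered masks), a per‑lesion tally can be sketched as follows. The code relies on Labelme's `shapes` list with a `label` field per shape; the label strings `MA`, `HE`, `EX`, and `SE` and the sample document are hypothetical.

```python
from collections import Counter


def count_shapes_by_label(labelme_doc: dict) -> dict:
    """Return {label: number of annotated shapes} for one Labelme document."""
    shapes = labelme_doc.get("shapes", [])
    return dict(Counter(shape["label"] for shape in shapes))


# Invented Labelme-style document; points are placeholder polygon vertices.
sample_doc = {
    "shapes": [
        {"label": "MA", "shape_type": "polygon", "points": [[1, 1], [2, 1], [2, 2]]},
        {"label": "HE", "shape_type": "polygon", "points": [[5, 5], [7, 5], [6, 8]]},
        {"label": "MA", "shape_type": "polygon", "points": [[9, 9], [10, 9], [10, 10]]},
    ]
}

lesion_counts = count_shapes_by_label(sample_doc)
```

Summing these per‑image counts over a split gives a rough picture of class balance across the four lesion types.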

github
