High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

Browse by Category

CIMA histology images

This dataset provides user‑generated landmark annotations for CIMA histology images, which are 2D tissue micro‑sections stained with different methods. Challenges include extremely large image sizes, visual heterogeneity, and a lack of salient objects. The dataset includes 108 image portions with manually placed landmarks for assessing registration quality.

github

Cervix93 Cytology Dataset

Cervical Cytology

Image Analysis

The dataset contains 93 image stacks and their corresponding extended depth of field (EDF) images, sourced from cases classified according to the Bethesda System as Negative, LSIL, or HSIL. It also includes grade labels for each frame and manually marked points within cervical cells.

github

Linaqruf/pixiv-niji-journey

Image Analysis

Artworks

The Pixiv Niji Journey dataset comprises 9,766 images and associated metadata collected from the online art platform Pixiv. It is provided in raw and preprocessed versions. The raw version contains the original data as scraped from Pixiv. The preprocessed version includes additional processing steps: conversion of images from RGB to RGBA, annotation with the BLIP tool, Danbooru tags generated by the wd‑v1‑4‑vit‑tagger, and thorough cleaning to remove low‑quality or irrelevant images. Images are in JPG and PNG formats; metadata is supplied in JSON, with preprocessed metadata also available as .txt and .caption files. The dataset is primarily intended for image classification and generation tasks, though users should be aware of potential biases originating from Pixiv content and the specific search terms used.

hugging_face

retina_dataset

Ophthalmic Diseases

Image Analysis

This dataset contains retinal images in four classes: 1) Normal 2) Cataract 3) Glaucoma 4) Retinal diseases.

github

mimic-cxr-dataset

Medical Imaging

Image Analysis

This dataset is primarily used for image analysis and has three features: image, findings, and impression. The image feature stores the image data; findings and impression store textual descriptions. The training set contains 30,633 samples, with a total size of 800,678,886 bytes and a download size of 792,886,513 bytes.

hugging_face

danaroth/harvard

Hyperspectral Imaging

Image Analysis

This dataset contains 75 hyperspectral images, 50 captured under natural daylight in indoor and outdoor scenes, and 25 captured under artificial and mixed lighting in indoor scenes. Images were acquired with a commercial hyperspectral camera (Nuance FX, CRI Inc.) equipped with an integrated liquid‑crystal tunable filter, capturing 31 narrow spectral bands from 420 nm to 720 nm in 10 nm steps. All images are of static scenes and include masks to conceal regions that moved during exposure. The dataset is divided into two parts: `CZ_hsdb` and `CZ_hsdbi`, corresponding to images taken under different illumination conditions. It is intended for non‑commercial research use only.
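The band layout described above (420 nm to 720 nm in 10 nm steps) can be checked with a few lines of Python. This is only a sketch of the arithmetic: the names `band_centres_nm` and `band_index` are hypothetical, and it assumes the bands are stored in ascending wavelength order, which the description does not state.

```python
# Spectral range and step taken from the dataset description.
START_NM = 420
STOP_NM = 720
STEP_NM = 10

# Nominal band centres, inclusive of both endpoints.
band_centres_nm = list(range(START_NM, STOP_NM + STEP_NM, STEP_NM))


def band_index(wavelength_nm: int) -> int:
    """Return the zero-based index of the band centred at wavelength_nm.

    Assumes ascending wavelength order (an assumption, not confirmed
    by the dataset description).
    """
    if wavelength_nm not in band_centres_nm:
        raise ValueError(f"{wavelength_nm} nm is not a nominal band centre")
    return (wavelength_nm - START_NM) // STEP_NM


# The count matches the 31 bands stated in the description.
print(len(band_centres_nm))
```

Under that ordering assumption, `band_index(550)` would select the band centred at 550 nm from an image's spectral axis.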

hugging_face
