High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

lowercaseonly/cghd

The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics together with bounding‑box annotations for object detection and segmentation ground‑truth files. It is intended for training models to extract electrical diagrams from raster graphics. The dataset is organised into folders storing images, annotations, instance‑segmentation polygons and segmentation maps, and includes a README with usage guide, contribution instructions, citation format and license information.

hugging_face

rimvydasrub/crackseg9k

Crack Detection

Image Segmentation

This dataset is presented as the largest, most diverse, and most consistent crack‑segmentation dataset built to date. It contains 9,255 images aggregated from smaller open‑source datasets and pre‑processed to a resolution of 400 × 400 pixels. It comprises ten sub‑datasets, including Crack500 and DeepCrack.

hugging_face

torchgeo/l8biome

Bioclimates

Image Segmentation

This dataset, named L8 Biome, is redistributed from the Landsat 8 Cloud Cover Assessment Validation data, with the masks modified to add georeferenced metadata. It is categorised as an image‑segmentation dataset with climate‑related labels, has a listed size of less than 1 KB, and is licensed under CC0‑1.0.

hugging_face

chuonghm/MaGGIe-HIM

Image Segmentation

Human Instance Segmentation

The MaGGIe dataset is a training and benchmark collection for instance‑aware alpha portrait matting in images and videos, specifically targeting mask‑guided matting tasks. It was developed during Adobe Research’s 2023 Summer Internship, and the associated work was accepted at CVPR 2024. It supports tasks such as image segmentation, instance matting, portrait matting, video matting, guided matting, and human matting.

hugging_face

Oxford-IIIT Pet Dataset

Pet Recognition

Image Segmentation

The Oxford‑IIIT Pet Dataset is a collection of 37 pet categories, each with roughly 200 images, created by the Oxford Visual Geometry Group. Images exhibit wide variation in scale, pose, and illumination. All images are accompanied by ground‑truth annotations, including breed, head region of interest, and pixel‑level silhouette segmentation.

github

Snakes CALABARZON pt. 2

Snake Recognition

Image Segmentation

This dataset is designed for classifying and annotating snakes in the Philippines. It contains 15 snake categories covering various species in the region, and is intended as a foundation for training and improving YOLOv8‑seg models on snake identification and image segmentation tasks.

github

SA-Med2D-20M

Medical Image Processing

Image Segmentation

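SA-Med2D-20M is a large‑scale 2D medical image segmentation dataset containing 4.6 million 2D medical images and 19.7 million corresponding masks. It covers almost the entire body and shows significant diversity. The dataset is intended to help researchers build medical vision foundation models or apply their models to downstream medical applications.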

arXiv
