Explore high-quality datasets for your AI and machine learning projects.
The GTDB‑HD public ground‑truth dataset for hand‑drawn circuit diagrams contains images of hand‑drawn electrical schematics, together with bounding‑box annotations for object detection and ground‑truth files for segmentation. It is intended for training models that extract electrical diagrams from raster graphics. The dataset is organised into folders for images, annotations, instance‑segmentation polygons and segmentation maps, and includes a README with a usage guide, contribution instructions, citation format and license information.
This dataset is described as the largest, most diverse, and most consistent crack‑segmentation dataset built to date. It contains 9 255 images aggregated from several small open‑source datasets and pre‑processed to a resolution of 400 × 400 pixels. It comprises ten sub‑datasets, including Crack500 and Deepcrack.
This dataset, named L8 Biome, is redistributed from the Landsat 8 Cloud Cover Assessment Validation data, with the masks modified to add georeferenced metadata. It is listed under the image segmentation task category with climate‑related tags, has a listed size of less than 1 KB, and is licensed under CC0‑1.0.
The MaGGIe dataset is a training and benchmark collection for instance‑aware alpha portrait matting in images and videos, specifically targeting mask‑guided matting tasks. Developed during Adobe Research’s 2023 Summer Internship and accepted at CVPR 2024, it serves applications such as image segmentation, instance matting, portrait matting, video matting, guided matting, and human matting.
The Oxford‑IIIT Pet Dataset is a collection of 37 pet categories, each with roughly 200 images, created by the Oxford Visual Geometry Group. Images exhibit wide variation in scale, pose, and illumination. All images are accompanied by ground‑truth annotations, including breed, head region of interest, and pixel‑level silhouette segmentation.
This dataset is built for classifying and annotating snakes found in the Philippines. It contains 15 snake categories covering species in the region and is intended for training and improving YOLOv8‑seg models on snake identification and image segmentation tasks.
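SA-Med2D-20M is a large-scale 2D medical image segmentation dataset containing 4.6 million 2D medical images and 19.7 million corresponding masks. It covers almost the entire body and shows significant diversity. The dataset is intended to help researchers build medical vision foundation models or apply their models to downstream medical applications.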