High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

SH17

Personal Protective Equipment

Object Detection

The SH17 dataset, created by the Department of Mechanical, Automotive and Materials Engineering at the University of Windsor, contains 8,099 annotated images covering 17 categories of personal protective equipment (PPE), such as helmets and safety glasses. The images, which depict diverse industrial settings, were sourced from the Pexels website and annotated by professionals to ensure quality and diversity. The dataset aims to improve worker safety in manufacturing and is primarily used to train and validate convolutional-neural-network object detection models for PPE compliance in industrial environments.

ds4sd/SynthTabNet_OTSL

Table Structure Recognition

Object Detection

This dataset converts the original SynthTabNet tables into OTSL format for table-structure recognition tasks. It comprises four parts of 150,000 tables each (600,000 in total). The parts differ in table appearance, size, structure, and content, and each is split into training, validation, and test sets. Each record includes the cell content, OTSL tokens, HTML structure, restored HTML, column count, row count, and table image. An OTSL vocabulary defines the cell token types. The conversion was performed, and the dataset is maintained, by IBM Research's Deep Search team.
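As a rough illustration of how OTSL tokens relate to the row and column counts stored with each table, the sketch below derives both counts from a token sequence. The token names (`fcel`, `ecel`, `lcel`, `ucel`, `xcel`, `nl`) are assumed here and should be checked against the dataset's own OTSL vocabulary; the helper `otsl_shape` is hypothetical and not part of the dataset.

```python
# Assumed OTSL vocabulary; verify against the dataset card before relying on it.
CELL_TOKENS = {"fcel", "ecel", "lcel", "ucel", "xcel"}
NEWLINE = "nl"


def otsl_shape(tokens):
    """Return (rows, cols) for a rectangular OTSL token sequence."""
    rows = []
    current = []
    for tok in tokens:
        if tok == NEWLINE:
            rows.append(current)
            current = []
        elif tok in CELL_TOKENS:
            current.append(tok)
        else:
            raise ValueError(f"unknown OTSL token: {tok!r}")
    if current:  # tolerate a missing trailing "nl"
        rows.append(current)
    widths = {len(r) for r in rows}
    if len(widths) > 1:
        raise ValueError("OTSL rows must all have the same length")
    return len(rows), (widths.pop() if widths else 0)


# Made-up 2 x 3 table: the first row has a cell spanning two columns.
example = ["fcel", "lcel", "fcel", "nl",
           "fcel", "fcel", "ecel", "nl"]
shape = otsl_shape(example)
```

Because every grid position carries exactly one token, the row count is the number of `nl`-terminated rows and the column count is the length of any row.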

HiFSOD-Bird

Bird Species Recognition

Object Detection

HiFSOD-Bird is the first dataset for the hierarchical few-shot object detection (HiFSOD) problem. It contains 176,350 images of wild birds in 1,432 categories, organized into a four-level taxonomy of 32 orders, 132 families, 572 genera, and 1,432 species.

HazyDet

Unmanned Aerial Vehicles

Object Detection

HazyDet is a large-scale dataset created by the PLA Engineering University and other institutions for drone-view object detection in haze and smog. It contains 383,000 real-world instances, drawn both from scenes captured in natural haze and from normal scenes to which haze was added synthetically to simulate adverse weather. The synthetic haze was produced by combining depth estimation with an atmospheric scattering model to keep it realistic and diverse. HazyDet is primarily used for object detection on drones operating in harsh weather, with the aim of improving drone perception in complex environments.
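The atmospheric scattering model mentioned above is commonly written as I = J·t + A·(1 − t), with transmission t = exp(−β·d) for scene depth d. The sketch below applies that textbook form to a tiny grayscale image in pure Python. It is not HazyDet's actual pipeline: the function name `add_haze` and the values of `beta` and `airlight` are illustrative assumptions.

```python
import math


def add_haze(clear, depth, beta=1.0, airlight=0.8):
    """Apply I = J*t + A*(1 - t) with t = exp(-beta * depth), per pixel.

    clear: 2D list of intensities in [0, 1] (haze-free image J)
    depth: 2D list of scene depths with the same shape
    beta, airlight: illustrative values, not taken from HazyDet
    """
    hazy = []
    for row_j, row_d in zip(clear, depth):
        out_row = []
        for j, d in zip(row_j, row_d):
            t = math.exp(-beta * d)  # transmission falls off with depth
            out_row.append(j * t + airlight * (1.0 - t))
        hazy.append(out_row)
    return hazy


# Made-up 2 x 2 image and depth map.
clear = [[0.2, 0.9], [0.5, 0.1]]
depth = [[0.0, 1.0], [5.0, 50.0]]
hazy = add_haze(clear, depth)
```

Pixels at zero depth are left unchanged, while distant pixels converge to the airlight value, which is why a depth estimate is needed before haze can be added to a clear image.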

Francesco/parasites-1s07h

Parasite Identification

Object Detection

The parasites-1s07h dataset is an object detection dataset containing images and their object annotations. Each record includes an image ID, the image, its width and height, and per-object annotations (object ID, area, bounding box, and category). The dataset is monolingual (English) and contains between 1K and 10K samples. It was created via crowdsourcing, with the original data sourced from Roboflow. The dataset card documents the structure and fields in detail, including a note on automatic image decoding.
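To make the record layout described above concrete, here is a minimal sketch of one record and a helper that walks its annotations. The key names, the column-wise layout of `objects`, and the `[x_min, y_min, width, height]` box convention are assumptions for illustration, and all values are made up.

```python
# Hypothetical record mirroring the fields listed above; all values are made up.
record = {
    "image_id": 0,
    "image": None,  # a decoded image object in the real dataset
    "width": 640,
    "height": 480,
    "objects": {
        "id": [10, 11],
        "area": [2000.0, 600.0],
        "bbox": [[100.0, 50.0, 50.0, 40.0], [300.0, 200.0, 30.0, 20.0]],
        "category": [1, 3],
    },
}


def iter_objects(rec):
    """Yield one dict per annotated object from the column-wise 'objects' field."""
    objs = rec["objects"]
    for obj_id, area, bbox, cat in zip(
        objs["id"], objs["area"], objs["bbox"], objs["category"]
    ):
        yield {"id": obj_id, "area": area, "bbox": bbox, "category": cat}


def bbox_inside_image(rec, bbox):
    """Check an assumed [x_min, y_min, width, height] box against the image size."""
    x, y, w, h = bbox
    return x >= 0 and y >= 0 and x + w <= rec["width"] and y + h <= rec["height"]


objects = list(iter_objects(record))
all_inside = all(bbox_inside_image(record, o["bbox"]) for o in objects)
```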

SAR-Ship-Dataset

Remote Sensing Image Analysis

Object Detection

The SAR-Ship-Dataset was annotated by SAR experts and is built from 102 GF-3 (Gaofen-3) satellite images and 108 Sentinel-1 images. It provides a total of 43,819 ship chips of 256 × 256 pixels, showing ships at varying scales and against varying backgrounds. It can be used to develop object detectors for multi-scale and small-object detection.

JijoJS/car-damage-new1

Car Damage Detection

Object Detection

This object detection dataset contains images of car damage, labeled as either dent or scratch. It comprises 2,427 images in total: 2,187 for training, 120 for validation, and 120 for testing. The images were auto-oriented and resized to 416 × 416 pixels. The dataset was exported via the Roboflow platform, and no image augmentation was applied.

Exclusively Dark (ExDark) Image Dataset

Low‑Light Image Processing

Object Detection

The Exclusively Dark (ExDark) dataset was introduced to promote research on low-light object detection and image enhancement. It comprises 7,363 images captured under conditions ranging from extremely low light to dusk (10 illumination levels) and covers 12 object categories, similar to PASCAL VOC. Images are annotated with both image-level class labels and object bounding boxes.

DOTA v1.5

Object Detection

Aerial Image Analysis

The DOTA v1.5 dataset is designed for object detection in aerial and satellite imagery and supports rotated bounding boxes. It is compatible with YOLOv9 models.
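A rotated bounding box can be described by a center, a size, and an angle, or equivalently by its four corner points. The sketch below converts the first form into the second. The (center, size, angle) parameterization and the function name `rotated_box_corners` are assumptions for illustration; check the dataset's own annotation files for the layout it actually uses.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w x h box rotated by angle_deg about (cx, cy)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Corner offsets of the unrotated box, in order around the perimeter.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


# Made-up box: 40 x 20 pixels centered at (100, 50).
axis_aligned = rotated_box_corners(100.0, 50.0, 40.0, 20.0, 0.0)
rotated = rotated_box_corners(100.0, 50.0, 40.0, 20.0, 90.0)
```

With an angle of zero the result reduces to an ordinary axis-aligned box, so the same code path can serve both annotation styles.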

zhuchi76/Boat_dataset

Object Detection

Ship Identification

This dataset contains real and synthetic boat images for object detection tasks. The dataset structure includes images and object annotations, with fields such as image_id, width, height, and objects. It is split into training and validation sets, each further divided into real and synthetic subsets, covering various boat categories such as BallonBoat, BigBoat, Boat, JetSki, Katamaran, SailBoat, SmallBoat, SpeedBoat, and WAM_V.

Charitarth/dac-sdc-2024

Object Detection

The DAC SDC 2024 dataset is for object detection tasks, containing 10,000 training samples. Each sample includes an image and object information such as bounding boxes, categories, and segmentation masks. Images with the _1.jpg suffix have been removed from the original dataset.

FLIR Dataset

Thermal Imaging

Object Detection

This dataset provides three sets of thermal images: a training set of 8,862 images, a validation set of 1,366 images, and a video set of 4,224 images. The training images are used to train a YOLOv3 detector, with mAP reported on the validation set; the video set is used for tracking detected objects.

NWPU VHR-10 dataset

Remote Sensing Imagery

Object Detection

The NWPU VHR-10 dataset is a challenging benchmark for geospatial object detection containing ten categories. It includes 800 very-high-resolution (VHR) optical remote-sensing images: 715 color images from Google Earth (spatial resolution 0.5–2 m) and 85 pansharpened color-infrared images from Vaihingen (0.08 m). The dataset is split into a positive set (650 images containing at least one target) and a negative set (150 images with no targets). The positive set is manually annotated, using bounding boxes and instance masks as ground truth, with 757 aircraft, 302 ships, 655 storage tanks, 390 baseball fields, 524 tennis courts, 159 basketball courts, 163 athletics fields, 224 ports, 124 bridges, and 477 vehicles.

custom YOLOv8 dataset

AI Corrosion Detection

Object Detection

A custom YOLOv8 dataset for AI‑based corrosion detection on mobile platforms, containing 705 corrosion‑surface images, annotated for instance segmentation via Roboflow, and expanded to 1,398 images through data augmentation for training, validation, and testing.

HRSC2016

Object Detection

Computer Vision

The HRSC2016 dataset is used for object detection tasks and can be downloaded from the provided link.

HRSC2016, DIOR, HRSID, NWPU VHR-10 dataset

Remote Sensing Imagery

Object Detection

HRSC2016: Ship detection dataset in remote sensing images. DIOR: Object detection dataset in remote sensing images. HRSID: Small object detection dataset in remote sensing images. NWPU VHR‑10 dataset: Object detection dataset in remote sensing images.

Underwater Pipeline Detection Dataset (YOLO Format)

Object Detection

Underwater Pipeline

This project provides a YOLO‑format underwater pipeline detection dataset, including training and validation sets, their corresponding annotation files, and a class definition file (`classes.txt`).
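In the usual YOLO label convention, each line of an annotation file reads `class_id x_center y_center width height`, with coordinates normalized to the image size, and `classes.txt` lists one class name per line. The sketch below parses both under that assumption. The function names, the class name `pipeline`, and the sample values are hypothetical and not taken from this dataset.

```python
def parse_yolo_line(line, img_w, img_h):
    """Convert 'class cx cy w h' (normalized) to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(p) for p in parts[1:])
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


def load_class_names(text):
    """Return class names in index order from classes.txt content (one name per line)."""
    return [name.strip() for name in text.splitlines() if name.strip()]


names = load_class_names("pipeline\n")  # hypothetical file content
box = parse_yolo_line("0 0.5 0.5 0.25 0.5", img_w=640, img_h=480)
label = names[box[0]]
```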

Francesco/apex-videogame

Object Detection

The apex-videogame dataset is an object detection dataset containing images and their object annotations. Each record includes an image ID, the image itself, its width and height, and per-object annotations (object ID, area, bounding box, and category). The dataset is in English, was annotated by Roboflow users, and contains between 1K and 10K samples.

blanchon/FAIR1M

Remote Sensing Imagery

Object Detection

FAIR1M is a fine-grained object recognition and detection dataset of high-resolution (0.3–0.8 m) RGB images sourced from Gaofen satellites and Google Earth. It contains 15,000 images covering five major categories (ships, vehicles, aircraft, ball fields, and roads) and 37 sub-categories. Annotations are provided as rotated bounding boxes, making the dataset suitable for remote sensing, Earth observation, geospatial, and satellite-image research.

dduka/guitar-chords

Guitar Chord Recognition

Object Detection

This dataset is intended for object‑detection; the target class is "music". It contains between 1 K and 10 K samples and is built from three external sources: chorddetection2.2, chorddetection/dataset/11, and chord‑gitar‑detection/dataset/1.

RSOD-Dataset

Remote Sensing Imagery

Object Detection

This is an open remote‑sensing image object detection dataset containing objects such as aircraft, oil tanks, playgrounds, and overpasses. The dataset follows the PASCAL VOC format and consists of four files, each corresponding to one object type.
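PASCAL VOC annotations are XML files in which each `object` element carries a class `name` and a `bndbox` with `xmin`, `ymin`, `xmax`, and `ymax`. The sketch below reads such a file with Python's standard library. The sample XML (file name, image size, coordinates) is made up for illustration and is not taken from RSOD.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation; file name, class name, and coordinates are made up.
VOC_XML = """<annotation>
  <filename>aircraft_1.jpg</filename>
  <size><width>1044</width><height>915</height><depth>3</depth></size>
  <object>
    <name>aircraft</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>180</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, boxes), where each box has a class name and pixel corners."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append({"name": obj.findtext("name"), "box": coords})
    return root.findtext("filename"), boxes


filename, boxes = parse_voc(VOC_XML)
```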

AODRaw

Object Detection

The AODRaw dataset provides 7,785 high-resolution real RAW images containing 135,601 annotated instances across 62 categories, capturing indoor and outdoor scenes under nine different lighting and weather conditions. The dataset supports RAW and sRGB object detection and offers a comprehensive benchmark for evaluating current detection methods.