Explore high-quality datasets for your AI and machine learning projects.
This dataset provides real noisy images for denoising research: 40 different scenes captured by 5 cameras from leading brands, cropped into 100 regions of 512 × 512 pixels, each pairing a noisy image with its corresponding ground-truth image.
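Noisy/ground-truth pairs like these are typically evaluated with PSNR. A minimal sketch in Python; the file names are hypothetical, not part of the dataset's documented layout:

```python
import numpy as np
from PIL import Image

def psnr(noisy: np.ndarray, clean: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio between two uint8 images."""
    mse = np.mean((noisy.astype(np.float64) - clean.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical file names; the dataset pairs each noisy crop with its ground truth.
noisy = np.array(Image.open("scene01_noisy.png"))
clean = np.array(Image.open("scene01_gt.png"))
print(f"PSNR: {psnr(noisy, clean):.2f} dB")
```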
The LEVIR-CC dataset is a large-scale dataset for generating change captions from remote-sensing image pairs, specifically supporting research on change captioning with dual-branch Transformers.
The dataset contains 100 images of various fruits and vegetables captured under controlled lighting conditions using a Living Optics camera. Data types include RGB images, sparse spectral samples, and instance segmentation masks. The dataset includes over 430,000 spectral samples, of which more than 85,000 belong to one of 19 categories. Additionally, 13 labeled images are provided as a validation set along with some unlabeled demonstration videos. The dataset is primarily used for image segmentation and classification tasks.
The AdvancedEdit dataset was created via a novel data construction pipeline, featuring high visual quality, complex instructions, and good background consistency at a large scale.
This dataset is derived from the Danbooru2018 dataset for anime character recognition. It has been processed into 1 million head images with corresponding character labels covering roughly 70,000 characters. The label distribution is long-tailed, averaging 13.85 images per label.
This is a dataset for object detection under very low‑light conditions, containing various image types (RGB‑normal, RGB‑dark, RAW‑normal, RAW‑dark) and corresponding annotation files. It is used for research on object detection in extremely low‑light environments.
The dataset contains raw images from the DIV2K, Flickr2K, and OST datasets, primarily for image-to-image tasks such as single-image super-resolution.
The FFHQ (Flickr‑Faces‑HQ) dataset comprises 70,000 high‑quality PNG images at 1024 × 1024 resolution, featuring diverse ages, ethnicities, backgrounds, and accessories (glasses, hats, etc.). Images were sourced from Flickr under permissive licenses, automatically aligned and cropped using dlib, and filtered to remove non‑photos. The dataset supports research in generative adversarial networks and related fields.
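FFHQ's published pipeline aligns faces using dlib landmarks before cropping. The following is only a minimal dlib detect-and-crop sketch, not the original FFHQ alignment script; the input file name is hypothetical:

```python
import numpy as np
import dlib
from PIL import Image

# dlib's bundled frontal face detector; FFHQ's actual pipeline additionally
# uses a landmark model to rotate and scale each face before cropping.
detector = dlib.get_frontal_face_detector()

img = Image.open("photo.png").convert("RGB")  # hypothetical input file
arr = np.array(img)

for rect in detector(arr, 1):  # 1 = upsample once to catch smaller faces
    face = img.crop((rect.left(), rect.top(), rect.right(), rect.bottom()))
    face = face.resize((1024, 1024), Image.LANCZOS)
    face.save("face_cropped.png")
```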
The MIT‑Adobe FiveK dataset is a publicly available collection containing 5,000 RAW images in DNG format, each retouched by five experts to produce 25,000 TIFF images (16‑bit per channel, ProPhoto RGB, lossless). The dataset also includes semantic information for each image. Created by MIT and Adobe Systems, Inc., it is intended to provide a diverse and challenging test set for image‑processing algorithms. Images cover a wide range of scenes—landscapes, portraits, still life, architecture—and exhibit varied lighting, color balance, and exposure conditions.
PixelProse is a comprehensive dataset containing 16 million synthetically generated image captions created with the Gemini 1.0 Pro Vision model. Each record provides rich metadata such as a unique image identifier, image URL, the captioning model, and the caption text, and the dataset supports multiple download and usage options, as sketched below.
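A caption dataset of this size is usually streamed rather than fully downloaded. A sketch using the Hugging Face datasets library; the repository id and field names are assumptions and should be verified against the dataset card:

```python
from datasets import load_dataset

# Repository id assumed from the dataset name; verify before use.
ds = load_dataset("tomg-group-umd/pixelprose", split="train", streaming=True)

for row in ds.take(3):
    # Fields per the description: unique id, image URL, model, caption text.
    print(row.get("url"), "->", (row.get("caption") or "")[:80])
```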
The Tampere Image Database 2008 (TID2008) is an image quality assessment dataset containing 25 reference images and 17 distortion types, each at 4 severity levels, totaling 1,700 distorted images with subjective quality scores for evaluating and researching image quality metrics.
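Datasets like this are commonly used to check how well an objective metric tracks the subjective scores, e.g. via Spearman rank correlation. A sketch with made-up numbers standing in for real metric outputs and MOS values:

```python
from scipy.stats import spearmanr

# Hypothetical values: objective metric scores (e.g. PSNR) for a few distorted
# images, and the corresponding subjective mean opinion scores (MOS).
metric_scores = [31.2, 28.5, 24.9, 22.1, 33.0]
mos = [6.1, 5.4, 4.2, 3.1, 6.5]

rho, p = spearmanr(metric_scores, mos)
print(f"SROCC = {rho:.3f} (p = {p:.3g})")  # higher |rho| = closer agreement with humans
```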
The Dronescapes dataset comprises various representations extracted from drone‑captured videos, including RGB, optical flow, depth, edges, and semantic segmentation. It can be downloaded directly from HuggingFace or generated from raw videos and labels. The dataset is roughly 500 GB, contains video data from multiple scenes, and provides detailed generation and processing steps. It also offers training, validation, semi‑supervised, and test splits, along with tools for data inspection.
This dataset includes multiple fields, such as image, instruction, bounding box, resolution, source, platform, name, description, type, OCR, language, purpose, and expectation. It is divided into training, validation, and test sets containing 63,530, 7,944, and 7,938 samples respectively. The download size is 3,400,799,177 bytes (about 3.4 GB) and the total size is approximately 34.5 GB. Based on the field content, the dataset is suited to image-processing and natural-language-processing tasks, particularly scenarios involving image annotation and textual instructions.
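If the dataset is hosted on the Hugging Face Hub, the splits and fields could be inspected roughly as follows; the repository id below is a placeholder, and the exact field keys may differ from the description:

```python
from datasets import load_dataset

# Placeholder repository id; substitute the dataset's actual name on the Hub.
ds = load_dataset("user/gui-instruction-dataset")

print(ds)  # expected splits: train (63,530), validation (7,944), test (7,938)
sample = ds["train"][0]
# Field names follow the description above; exact keys are an assumption.
for key in ("instruction", "resolution", "platform", "language", "purpose"):
    print(key, "->", sample.get(key))
```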
The dataset contains images and their corresponding saliency maps for training and validating saliency detection models based on recurrent U‑Net. It is packaged into two .npy files, one for images and one for the associated saliency maps.
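Since the data ships as two .npy arrays, loading is a one-liner each. The file names below are assumptions, not documented names:

```python
import numpy as np

# Hypothetical file names; the dataset ships one array of images and one of
# saliency maps, index-aligned so images[i] corresponds to maps[i].
images = np.load("images.npy")        # e.g. (N, H, W, 3)
maps = np.load("saliency_maps.npy")   # e.g. (N, H, W)

assert len(images) == len(maps)
print(images.shape, maps.shape)
```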
The dataset contains 1,000 handwritten digit images, each 28 × 28 pixels with a white background and black strokes, covering the digits 0–9.
The DIV8K training set is a diverse 8K-resolution image dataset for image-to-image tasks such as super-resolution. The sample count falls in the 1K–10K range.
The HPatches dataset contains patches extracted from multiple image sequences, each sequence comprising images of the same scene. Sequences are organized by transformation type into illumination changes and viewpoint changes. Each image sequence provides reference patches and corresponding patches from other images, with patch size 65 × 65 pixels. The dataset is used to evaluate the performance of local descriptors.
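A common distribution format for HPatches stacks each file's patches vertically in a PNG strip, 65 pixels wide. The sketch below assumes that layout; the sequence directory and file names are hypothetical:

```python
import numpy as np
from PIL import Image

def load_patch_strip(path: str, size: int = 65) -> np.ndarray:
    """Split a vertically stacked patch PNG into an (N, 65, 65) array."""
    strip = np.array(Image.open(path).convert("L"))
    assert strip.shape[1] == size and strip.shape[0] % size == 0
    return strip.reshape(-1, size, size)

# Hypothetical sequence directory: reference patches vs. patches extracted
# from another image of the same scene, order-aligned for evaluation.
ref = load_patch_strip("v_sequence/ref.png")
warped = load_patch_strip("v_sequence/e1.png")
print(ref.shape, warped.shape)  # matching indices denote corresponding patches
```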
The ccHarmony dataset is a color‑checker‑based image harmonization dataset designed to better reflect natural illumination variations while maintaining a distribution similar to the Hday2night dataset, but at a lower collection cost. It comprises 350 real images and 426 segmented foregrounds; each foreground is paired with 10 synthetic composite images, resulting in a total of 4,260 synthetic‑real image pairs.
BSDS500/300 is a dataset provided by the Berkeley Vision Lab for image segmentation and contour detection, and is also used for super-resolution reconstruction. BSDS500 contains 200 training images, 100 validation images, and 200 test images, with ground-truth annotations stored in MAT files. BSD68 is a 68-image denoising benchmark drawn from the Berkeley Segmentation Dataset and Benchmark (its color variant is known as CBSD68). Set12 contains 12 images for evaluating image denoising algorithms.
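The MAT ground-truth files can be read with scipy. The field layout assumed below (a 'groundTruth' cell array holding per-annotator 'Segmentation' and 'Boundaries') matches the usual BSDS distribution but should be verified against the downloaded files; the path is hypothetical:

```python
from scipy.io import loadmat

# Hypothetical path into the BSDS ground-truth directory.
gt = loadmat("groundTruth/train/2092.mat")

# 'groundTruth' is a 1 x K cell array, one entry per human annotator.
annots = gt["groundTruth"][0]
for k, a in enumerate(annots):
    seg = a["Segmentation"][0, 0]   # integer region labels, H x W
    bnd = a["Boundaries"][0, 0]     # binary boundary map, H x W
    print(f"annotator {k}: {seg.max()} regions, {int(bnd.sum())} boundary pixels")
```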
Image datasets for testing OpenMVG, including 11 images of the Sceaux Castle, nine high-resolution image sets, an indoor planar dataset of 11 images, an outdoor school dataset of 4 images, and a Palm Desert micro-drone dataset of 21 images captured with a DJI Mini 2.
The dataset comprises approximately 425,000 carefully selected high‑quality synthetic face images at 1024 × 1024 resolution, generated by transforming various inspirations such as paintings, sketches, 3D models, and text‑to‑image generators into realistic faces. It also includes facial landmarks (an extended set of 110 points) and semantic segmentation masks for face parsing.
The FB‑SSEM dataset is a synthetic dataset comprising surround‑view fisheye camera images and BEV (bird’s‑eye‑view) maps generated from simulated ego‑vehicle motion sequences.
Pokémon icon dataset. Most icons were collected and cropped from screenshots of Pokémon Sword and Shield.
The dataset contains 28 training samples, each with a label, an optimal prompt, two images (splash and tile), and subject information. The total dataset size is 2,453,145 bytes, and the download size is 2,453,716 bytes.