JUHE API Marketplace
High Quality Data

Dataset Hub

Explore high-quality datasets for your AI and machine learning projects.

PolyUDataset

Image Denoising
Image Processing

This dataset provides real-world noisy images for denoising research. It covers 40 scenes captured with cameras from 5 leading brands, yielding 100 regions of 512 × 512 pixels, each provided as a noisy image with its corresponding ground-truth image.

github

LEVIR-CC Dataset

Remote Sensing Technology
Image Processing

The LEVIR-CC dataset is a large dataset for remote-sensing image change captioning, that is, generating text descriptions of the changes between images. It specifically supports research on change captioning with dual-branch Transformers.

github

hyperspectral-fruit

Hyperspectral Imaging
Image Processing

The dataset contains 100 images of various fruits and vegetables captured under controlled lighting conditions using a Living Optics camera. Data types include RGB images, sparse spectral samples, and instance segmentation masks. The dataset includes over 430,000 spectral samples, of which more than 85,000 belong to one of 19 categories. Additionally, 13 labeled images are provided as a validation set along with some unlabeled demonstration videos. The dataset is primarily used for image segmentation and classification tasks.

huggingface

AdvancedEdit

Image Processing
Data Augmentation

The AdvancedEdit dataset was created via a novel data construction pipeline, featuring high visual quality, complex instructions, and good background consistency at a large scale.

github

Danbooru2018 Anime Character Recognition Dataset

Anime Character Recognition
Image Processing

This dataset is derived from Danbooru2018 for anime character recognition and contains about 1 million images and 70,000 characters. The source images were processed to produce about 1 million head images with their corresponding character labels. The character label distribution is long-tailed, with an average of 13.85 images per label.

github
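
As a quick sanity check on the Danbooru2018 figures above, the sketch below compares the average implied by the quoted totals with the stated 13.85 images per label. It uses only the numbers given in the description; the variable names are illustrative, and reading the totals as rounded is an inference rather than something the dataset authors state.

```python
# Figures quoted in the description (treated here as possibly rounded).
stated_images = 1_000_000
stated_characters = 70_000
stated_mean_per_label = 13.85

# Average images per label implied by the quoted totals.
implied_mean = stated_images / stated_characters

# Image count implied by the stated average and the character count.
implied_images = round(stated_mean_per_label * stated_characters)

print(f"implied mean per label: {implied_mean:.2f}")   # 14.29
print(f"implied image count:    {implied_images:,}")   # 969,500
```

The two results differ slightly from the quoted values, which suggests that "1 million" and "70,000" are approximate counts.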

Low-light Object Detection (LOD) Dataset

Low‑Light Object Detection
Image Processing

A dataset for research on object detection in extremely low-light conditions. It contains several image types (RGB-normal, RGB-dark, RAW-normal, RAW-dark) and the corresponding annotation files.

github

Iceclear/DF2K-OST

Image Processing
Super‑Resolution

The dataset contains raw images from the DIV2K, Flickr2K, and OST datasets, primarily for image-to-image tasks such as single-image super-resolution.

hugging_face

student/FFHQ

Face Recognition
Image Processing

The FFHQ (Flickr‑Faces‑HQ) dataset comprises 70,000 high‑quality PNG images at 1024 × 1024 resolution, featuring diverse ages, ethnicities, backgrounds, and accessories (glasses, hats, etc.). Images were sourced from Flickr under permissive licenses, automatically aligned and cropped using dlib, and filtered to remove non‑photos. The dataset supports research in generative adversarial networks and related fields.

hugging_face

yuukicammy/MIT-Adobe-FiveK

Image Processing
Machine Learning

The MIT-Adobe FiveK dataset is a publicly available collection containing 5,000 RAW images in DNG format, each retouched by five experts to produce 25,000 TIFF images (16 bits per channel, ProPhoto RGB, lossless). The dataset also includes semantic information for each image. Created by MIT and Adobe Systems, Inc., it is intended to provide a diverse and challenging test set for image-processing algorithms. Images cover a wide range of scenes (landscapes, portraits, still life, architecture) and exhibit varied lighting, color balance, and exposure conditions.

hugging_face

pixelprose

Image Processing
Natural Language Processing

PixelProse is a dataset of 16 million synthetically generated image captions created with the Gemini 1.0 Pro Vision model. Each record includes fields such as a unique image identifier, the image URL, the captioning model, and the caption text. The dataset supports multiple download and usage options.

huggingface

Jorgvt/TID2008

Image Quality Assessment
Image Processing

The Tampere Image Database 2008 (TID2008) is an image quality assessment dataset containing 25 reference images and 17 distortion types, each at 4 severity levels, for a total of 1,700 (reference image, distortion type, subjective score) triplets (25 × 17 × 4). It is used to evaluate and study image quality assessment methods.

hugging_face
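
The 1,700 figure in the TID2008 entry above follows from pairing every reference image with every distortion type at every severity level. The sketch below enumerates those combinations; the 1-based indices are an illustrative assumption, not the database's actual file naming.

```python
from itertools import product

NUM_REFERENCE_IMAGES = 25
NUM_DISTORTION_TYPES = 17
NUM_SEVERITY_LEVELS = 4

# One entry per (reference image, distortion type, severity level) combination.
# The index scheme is hypothetical and used only for counting.
combinations = list(
    product(
        range(1, NUM_REFERENCE_IMAGES + 1),
        range(1, NUM_DISTORTION_TYPES + 1),
        range(1, NUM_SEVERITY_LEVELS + 1),
    )
)

print(len(combinations))  # 1700
```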

Meehai/dronescapes

Computer Vision
Image Processing

The Dronescapes dataset comprises various representations extracted from drone-captured videos, including RGB, optical flow, depth, edges, and semantic segmentation. It can be downloaded directly from Hugging Face or generated from the raw videos and labels. The dataset is roughly 500 GB, contains video data from multiple scenes, and documents the generation and processing steps in detail. It also offers training, validation, semi-supervised, and test splits, along with tools for data inspection.

hugging_face

agentsea/wave-ui

User Interface
Image Processing

Each sample in this dataset includes fields such as image, instruction, bounding box, resolution, source, platform, name, description, type, OCR, language, purpose, and expectation. The dataset is divided into training, validation, and test sets containing 63,530, 7,944, and 7,938 samples respectively. The total download size is 3,400,799,177 bytes (about 3.4 GB) and the total dataset size is 34,533,114,093.5 bytes (about 34.5 GB). Judging by the fields, the dataset may be used for image-processing and natural-language-processing tasks, particularly those involving image annotation and textual instructions.

hugging_face
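
To put the wave-ui split sizes above in proportion, the sketch below totals the three quoted counts and prints each split's share. It uses only the numbers from the description; the dictionary keys are illustrative labels and may not match the split names used on the dataset page.

```python
# Split sizes quoted in the description.
split_sizes = {"train": 63_530, "validation": 7_944, "test": 7_938}

total_samples = sum(split_sizes.values())

# Fraction of all samples that falls in each split.
split_fractions = {
    name: size / total_samples for name, size in split_sizes.items()
}

print(f"total samples: {total_samples:,}")  # 79,412
for name, fraction in split_fractions.items():
    print(f"{name:<10} {fraction:.1%}")
```

The shares come out to roughly 80%, 10%, and 10%.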

PASCAL VOC 2010

Image Processing
Salient Object Detection

The dataset contains images and their corresponding saliency maps for training and validating saliency detection models based on a recurrent U-Net. It is packaged as two .npy files, one holding the images and the other the associated saliency maps.

github

HwD-1000

Handwritten Digit Recognition
Image Processing

The dataset contains 1,000 handwritten digit images, each 28 × 28 pixels with a white background and black strokes, covering the digits 0–9.

github

Iceclear/DIV8K_TrainingSet

Image Processing
Dataset

The DIV8K training set is a diverse dataset of 8K-resolution images for image-to-image tasks. It contains between 1,000 and 10,000 samples.

hugging_face

HPatches

Computer Vision
Image Processing

The HPatches dataset contains patches extracted from multiple image sequences, each sequence comprising images of the same scene. Sequences are organized by transformation type into illumination changes and viewpoint changes. Each image sequence provides reference patches and corresponding patches from other images, with patch size 65 × 65 pixels. The dataset is used to evaluate the performance of local descriptors.

github

ccHarmony

Image Processing
Illumination Variation

The ccHarmony dataset is a color‑checker‑based image harmonization dataset designed to better reflect natural illumination variations while maintaining a distribution similar to the Hday2night dataset, but at a lower collection cost. It comprises 350 real images and 426 segmented foregrounds; each foreground is paired with 10 synthetic composite images, resulting in a total of 4,260 synthetic‑real image pairs.

github

BSDS500/300, BSD68, Set12

Image Processing
Computer Vision

BSDS500/300 is a dataset provided by the Berkeley Vision Lab for image segmentation and contour detection, and is also used for super-resolution reconstruction. BSDS500 contains 200 training images, 100 validation images, and 200 test images, with ground-truth annotations stored in MAT files. BSD68 is a color dataset for image denoising benchmarks and is part of the Berkeley Segmentation Dataset and Benchmark. Set12 contains 12 images for evaluating image denoising algorithms.

github

ImageDataset_SceauxCastle, ReconstructionDataSet, Flat, School, Palm Desert Micro

Image Processing
3D Reconstruction

Image datasets for testing OpenMVG, including 11 images of Sceaux Castle, nine high-resolution image datasets, an indoor planar dataset of 11 images, an outdoor school dataset of 4 images, and a Palm Desert micro-drone dataset of 21 images captured with a DJI Mini 2.

github

Synthetic Faces High Quality (SFHQ) dataset

Face Recognition
Image Processing

The dataset comprises approximately 425,000 carefully selected high-quality synthetic face images at 1024 × 1024 resolution. The faces were generated by transforming a variety of source "inspirations," such as paintings, sketches, 3D models, and text-to-image generator outputs, into realistic faces. The dataset also includes facial landmarks (an extended set of 110 points) and semantic segmentation masks for face parsing.

github

FB-SSEM-dataset

Autonomous Driving
Image Processing

The FB‑SSEM dataset is a synthetic dataset comprising surround‑view fisheye camera images and BEV (bird’s‑eye‑view) maps generated from simulated ego‑vehicle motion sequences.

github

zishuod/pokemon-icons

Pokémon
Image Processing

A dataset of Pokémon icons. Most icons were collected and cropped from screenshots of the game Pokémon Sword and Shield.

hugging_face

jlbaker361/league-optimal-prompt

Prompt Engineering
Image Processing

The dataset contains 28 training samples, each with a label, an optimal prompt, two images (splash and tile), and subject information. The total dataset size is 2,453,145 bytes and the download size is 2,453,716 bytes (about 2.45 MB each).

hugging_face