Dataset Catalog

Browse trusted datasets for evaluation, enrichment, and production use.

Category index
Showing 12 of 12 datasets
Category: Face Recognition

Selfie-with-ID

Authentication, Face Recognition

The dataset contains over 65,000 photos of more than 5,000 individuals from 40 countries, making it a useful resource for exploring and developing authentication solutions. It is especially suited to biometric verification, notably facial recognition in financial services. For each individual, it includes 13 selfies and 2 ID photos captured with a variety of devices and at different resolutions. The dataset is intended to support the development of more robust re-identification algorithms and stronger security measures across applications.

Source: Hugging Face | Updated: Nov 13, 2024 | 316 views | Linked
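
For face verification, a per-identity layout like Selfie-with-ID's (13 selfies and 2 ID photos per person) is typically turned into genuine pairs (a selfie against the same person's ID photo) and impostor pairs (a selfie against another person's ID photo). The sketch below assumes a hypothetical in-memory mapping from identity to file names. It does not reflect the dataset's actual directory layout, and `build_pairs` is an illustrative helper, not part of any published tooling.

```python
import random
from itertools import product


def build_pairs(identities, impostors_per_selfie=1, seed=0):
    """Return (selfie, id_photo, label) triples; label 1 = same person, 0 = different person.

    `identities` is a hypothetical mapping: person_id -> {"selfies": [...], "ids": [...]}.
    At least two identities are needed to draw impostor pairs.
    """
    rng = random.Random(seed)
    people = sorted(identities)
    pairs = []
    for person in people:
        record = identities[person]
        # Genuine pairs: every selfie against every ID photo of the same person.
        for selfie, id_photo in product(record["selfies"], record["ids"]):
            pairs.append((selfie, id_photo, 1))
        # Impostor pairs: each selfie against an ID photo of a randomly chosen other person.
        others = [p for p in people if p != person]
        for selfie in record["selfies"]:
            for _ in range(impostors_per_selfie):
                other = rng.choice(others)
                pairs.append((selfie, rng.choice(identities[other]["ids"]), 0))
    return pairs


# Three made-up identities shaped like the catalog entry: 13 selfies + 2 ID photos each.
demo = {
    f"p{i}": {
        "selfies": [f"p{i}_selfie_{j}.jpg" for j in range(13)],
        "ids": [f"p{i}_id_{k}.jpg" for k in range(2)],
    }
    for i in range(3)
}
pairs = build_pairs(demo)
genuine = sum(label for _, _, label in pairs)
print(genuine, len(pairs) - genuine)  # 78 39
```

Each person contributes 13 × 2 = 26 genuine pairs, so the genuine set grows much faster than the impostor set. Raising `impostors_per_selfie` is one simple way to rebalance it.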

ffhq-256_training_faces

Face Recognition, Computer Vision

The dataset has four features: `image` (image type), `original_index` (integer), `landmark` (sequence of integers), and `mask` (null type). It is divided into two splits: `base_transforms` (69,426 samples) and `random_aug_transforms` (26,435 samples). The total download size is 8,177,644,392 bytes (about 8.18 GB) and the total dataset size is 8,315,251,492.07 bytes (about 8.32 GB).

Source: Hugging Face | Updated: Sep 16, 2024 | 203 views | Linked
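
The raw byte counts and split sizes in the ffhq-256_training_faces entry are easier to reason about once they are converted. The short calculation below turns the download size into decimal gigabytes and binary gibibytes, shows how the samples divide between the two splits, and estimates the average stored size per sample. It uses only the figures listed in the entry. `to_gb_and_gib` is an illustrative helper.

```python
def to_gb_and_gib(num_bytes):
    """Convert a byte count to decimal gigabytes (1e9 bytes) and binary gibibytes (2**30 bytes)."""
    return num_bytes / 1e9, num_bytes / 2**30


download_bytes = 8_177_644_392
dataset_bytes = 8_315_251_492.07
splits = {"base_transforms": 69_426, "random_aug_transforms": 26_435}

gb, gib = to_gb_and_gib(download_bytes)
print(f"download: {gb:.2f} GB ({gib:.2f} GiB)")  # download: 8.18 GB (7.62 GiB)

total_samples = sum(splits.values())  # 95,861 samples across both splits
for name, count in splits.items():
    print(f"{name}: {count / total_samples:.1%} of samples")
# base_transforms: 72.4% of samples
# random_aug_transforms: 27.6% of samples

# Average stored size per sample, in kilobytes.
print(f"{dataset_bytes / total_samples / 1e3:.1f} kB per sample")
```

The GB/GiB distinction matters when comparing against disk quotas, which are often reported in binary units.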

Flickr-Faces-HQ (FFHQ)

Face Recognition, Generative Adversarial Networks

Flickr-Faces-HQ (FFHQ) is a high-quality face image dataset originally created as a benchmark for Generative Adversarial Networks (GANs). It contains 70,000 high-quality PNG images at a resolution of 1024×1024, with considerable variation in age, race, and background, as well as accessories such as glasses, sunglasses, and hats. The images were scraped from Flickr, and therefore inherit that site's biases, and were automatically aligned and cropped using dlib. Only images under permissive licenses were collected. Various automatic filters were applied, and Amazon Mechanical Turk was used to remove the occasional statues, paintings, and photos of photos.

Source: GitHub | Updated: May 24, 2024 | 1,763 views | Linked

FaceCaption-15M

Face Recognition, Natural Language Processing

FaceCaption-15M is a large-scale, diverse, high-quality dataset of more than 15 million facial image-description pairs, intended to promote research on face-centric tasks. Its construction involved image collection, facial attribute annotation, facial description generation, and statistical analysis.

Source: Hugging Face | Updated: Jul 5, 2024 | 244 views | Linked

jxie/celeba-hq

Face Recognition, Gender Classification

The dataset contains images with binary labels (two classes: female and male). It is split into a training set of 28,000 samples and a validation set of 2,000 samples. The total download size is 2,762,725,456 bytes (about 2.76 GB) and the total dataset size is 2,763,112,879 bytes. Data files match the patterns `train-*` and `validation-*`.

Source: Hugging Face | Updated: Mar 21, 2024 | 307 views | Linked
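
The jxie/celeba-hq entry lists its data files by the patterns `train-*` and `validation-*`. The sketch below shows how such patterns can route files to splits with Python's standard `fnmatch` module, and it checks the training share implied by the 28,000 / 2,000 split. The shard names are hypothetical stand-ins, not the repository's actual file list, and `assign_split` is an illustrative helper.

```python
from fnmatch import fnmatchcase

# Split name -> file-name pattern, taken from the catalog entry.
DATA_FILES = {"train": "train-*", "validation": "validation-*"}


def assign_split(filename):
    """Return the split whose pattern matches `filename`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(filename, pattern):
            return split
    return None


# Hypothetical shard names; the real repository's file names may differ.
files = [
    "train-00000-of-00006.parquet",
    "validation-00000-of-00001.parquet",
    "README.md",
]
for name in files:
    print(name, "->", assign_split(name))

print(f"training share: {28_000 / (28_000 + 2_000):.1%}")  # training share: 93.3%
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every operating system.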

student/FFHQ

Face Recognition, Image Processing

The FFHQ (Flickr‑Faces‑HQ) dataset comprises 70,000 high‑quality PNG images at 1024 × 1024 resolution, featuring diverse ages, ethnicities, backgrounds, and accessories (glasses, hats, etc.). Images were sourced from Flickr under permissive licenses, automatically aligned and cropped using dlib, and filtered to remove non‑photos. The dataset supports research in generative adversarial networks and related fields.

Source: Hugging Face | Updated: Apr 16, 2022 | 306 views | Linked

pca-face-dataset

Face Recognition, Principal Component Analysis

This dataset contains representative face images generated with Principal Component Analysis (PCA) for use in face recognition.

Source: GitHub | Updated: Nov 26, 2020 | 133 views | Linked
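
The pca-face-dataset entry does not document how its images were produced. The classic use of PCA for faces is the eigenfaces method: flatten each face, subtract the mean face, and keep the top principal axes as a low-dimensional basis for matching. The NumPy sketch below illustrates that general technique on random stand-in data. It is not the repository's code, and `fit_eigenfaces` and `project` are illustrative names.

```python
import numpy as np


def fit_eigenfaces(faces, n_components):
    """faces: array of shape (n_samples, height * width), one flattened grayscale face per row."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # Rows of vt are the principal axes ("eigenfaces"), ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Fraction of total variance captured by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean_face, vt[:n_components], explained[:n_components]


def project(faces, mean_face, components):
    """Low-dimensional PCA coordinates, usable for nearest-neighbour face matching."""
    return (faces - mean_face) @ components.T


rng = np.random.default_rng(0)
faces = rng.random((20, 32 * 32))  # 20 random stand-in "faces" of 32x32 pixels
mean_face, eigenfaces, explained = fit_eigenfaces(faces, n_components=5)
codes = project(faces, mean_face, eigenfaces)
print(eigenfaces.shape, codes.shape)  # (5, 1024) (20, 5)
```

Reshaping any row of `eigenfaces` to `(32, 32)` gives an image of that component. Such images are one plausible reading of the "representative images" the entry mentions, though the catalog does not confirm it.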

UADFV, EBV, Deepfake-TIMIT, DFFD, WildDeepfake, Celeb-DF (v1), Celeb-DF (v2), DFDC, DeeperForensics, FaceForensics++, DFGC, FFIW-10K, ForgeryNet

Deepfake Detection, Face Recognition

This entry lists multiple deepfake-related datasets, each with its own purpose and characteristics. For example, UADFV targets detection of inconsistent head poses, and EBV supports exposing AI-generated fake face videos by detecting eye blinking.

Source: GitHub | Updated: Dec 22, 2021 | 776 views | Linked
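
To make the blink cue mentioned for EBV concrete, the sketch below uses a swapped-in, widely used heuristic: the eye aspect ratio (EAR), computed from six eye landmarks per frame, with a blink counted when the EAR stays low for a few consecutive frames. This is not EBV's own detection method. The threshold of 0.2 and the two-frame minimum are assumed values, and the landmark coordinates are made up.

```python
from math import dist


def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks p1..p6; p1 and p4 are the eye corners."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = dist(p2, p6) + dist(p3, p5)
    horizontal = dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive frames whose EAR is below `threshold`."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress when the clip ends
        blinks += 1
    return blinks


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(round(eye_aspect_ratio(open_eye), 3), round(eye_aspect_ratio(closed_eye), 3))  # 0.667 0.067

# Per-frame EAR values: one two-frame dip (counted) and one single-frame dip (ignored).
print(count_blinks([0.30, 0.31, 0.10, 0.08, 0.29, 0.30, 0.12, 0.30]))  # 1
```

A video whose subject never blinks over a long clip would be one weak signal of manipulation. In practice the threshold must be tuned per landmark detector and frame rate.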

CASIA-SURF, CASIA-SURF-CeFA, CASIA-SURF-HiFiMask, CASIA-SURF-SuHiFiMask

Face Recognition, Anti-spoofing

This entry groups a family of large-scale multimodal benchmarks for face anti-spoofing: CASIA-SURF, CASIA-SURF-CeFA, CASIA-SURF-HiFiMask, and CASIA-SURF-SuHiFiMask. Together they support face anti-spoofing research across multiple modalities as well as cross-ethnicity analysis.

Source: GitHub | Updated: May 20, 2024 | 245 views | Linked
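
Face anti-spoofing results are commonly reported as APCER (the share of attack presentations accepted as live), BPCER (the share of bona fide presentations rejected), and their mean, ACER. These terms come from ISO/IEC 30107-3. The sketch below computes all three from per-sample scores, assuming that a higher score means "more likely live" and that the threshold is 0.5. It pools all attack types, whereas the standard reports APCER per attack type. `pad_error_rates` is an illustrative helper, not an official evaluation script for these datasets.

```python
def pad_error_rates(scores, labels, threshold=0.5):
    """Return (APCER, BPCER, ACER).

    labels: 1 = bona fide (live face), 0 = attack. A score >= threshold is accepted as live.
    All attack types are pooled here; per-type APCER reporting would group them first.
    """
    attack_scores = [s for s, y in zip(scores, labels) if y == 0]
    live_scores = [s for s, y in zip(scores, labels) if y == 1]
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)  # attacks accepted
    bpcer = sum(s < threshold for s in live_scores) / len(live_scores)  # live faces rejected
    return apcer, bpcer, (apcer + bpcer) / 2


scores = [0.9, 0.8, 0.3, 0.95, 0.6, 0.7, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(pad_error_rates(scores, labels))  # (0.5, 0.25, 0.375)
```

Because ACER averages the two error types, a model cannot lower it simply by rejecting everything. Doing so drives BPCER to 1.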

IMDb-Face, MegaFace

Face Recognition, Image Dataset

IMDb-Face is a face recognition dataset of facial images collected from IMDb. MegaFace is a large-scale face recognition benchmark comprising multiple subsets for different recognition tasks.

Source: GitHub | Updated: Jan 4, 2024 | 276 views | Linked
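
Large-scale identification benchmarks such as MegaFace test whether a probe face's most similar gallery entry has the correct identity when the gallery is padded with distractor faces. The sketch below computes rank-1 accuracy with cosine similarity over hypothetical, precomputed two-dimensional embeddings. Real face embeddings have far more dimensions, and this is not the official MegaFace evaluation devkit.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose most similar gallery entry has the same identity.

    probes and gallery are lists of (identity, embedding). Gallery entries whose identity
    matches no probe act as distractors.
    """
    hits = 0
    for probe_id, probe_emb in probes:
        best_id, _ = max(gallery, key=lambda item: cosine(probe_emb, item[1]))
        hits += best_id == probe_id
    return hits / len(probes)


gallery = [("alice", [1.0, 0.0]), ("bob", [0.0, 1.0]), ("distractor_1", [0.7, 0.7])]
probes = [("alice", [0.9, 0.1]), ("bob", [0.6, 0.8])]
print(rank1_accuracy(probes, gallery))  # 0.5: the distractor outranks bob's true match
```

Dropping the distractor from `gallery` raises the score to 1.0, which is why results on this kind of benchmark are reported as a function of gallery size.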

Synthetic Faces High Quality (SFHQ) dataset

Face Recognition, Image Processing

The dataset comprises approximately 425,000 curated, high-quality synthetic face images at 1024 × 1024 resolution. They were generated by turning a variety of inspiration sources, such as paintings, sketches, 3D models, and text-to-image generator outputs, into realistic faces. The dataset also includes facial landmarks (an extended set of 110 points) and semantic segmentation masks for face parsing.

Source: GitHub | Updated: Dec 20, 2022 | 350 views | Linked
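
A common first use of per-image landmarks such as SFHQ's 110-point set is deriving a face crop. The sketch below assumes that each image's landmarks are available as 110 (x, y) pixel coordinates on the 1024 × 1024 canvas. The catalog does not describe the actual file format. The helper pads the tight bounding box by a margin, which is an assumed value of 10%, and clamps the result to the image. `landmark_bbox` is an illustrative name.

```python
def landmark_bbox(landmarks, image_size=1024, margin=0.1, expected_points=110):
    """Return (left, top, right, bottom) enclosing the (x, y) landmarks.

    The box is padded by `margin` times its width/height and clamped to the image.
    """
    if len(landmarks) != expected_points:
        raise ValueError(f"expected {expected_points} landmarks, got {len(landmarks)}")
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    left, right, top, bottom = min(xs), max(xs), min(ys), max(ys)
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin

    def clamp(value):
        return max(0.0, min(float(image_size), value))

    return clamp(left - pad_x), clamp(top - pad_y), clamp(right + pad_x), clamp(bottom + pad_y)


# Synthetic stand-in points; real SFHQ landmark files may use a different layout.
points = [(300 + i, 400 + 2 * i) for i in range(110)]
print(landmark_bbox(points))
```

Checking the point count up front catches files that use a different landmark scheme, such as the common 68-point layout, before they silently produce wrong crops.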

facescrub-dataset

Face Recognition, Computer Vision

The dataset contains 47,500 colour face images, each 50 × 50 pixels, derived from FaceScrub. The faces were extracted using OpenCV HOG face detection and have not been manually cleaned. The set is intended for training and validation.

Source: GitHub | Updated: Nov 10, 2023 | 236 views | Linked
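
Because the facescrub-dataset images are small and uniform in size, the whole set can usually be held in memory. The calculation below estimates how much memory that takes. It assumes three colour channels per pixel, stored either as 8-bit integers or, after a typical preprocessing step, as 32-bit floats. `array_bytes` is an illustrative helper.

```python
def array_bytes(n_images, height, width, channels=3, bytes_per_value=1):
    """Bytes needed to hold n_images dense arrays of shape (height, width, channels)."""
    return n_images * height * width * channels * bytes_per_value


uint8_total = array_bytes(47_500, 50, 50)  # raw 8-bit pixels
float32_total = array_bytes(47_500, 50, 50, bytes_per_value=4)  # after converting to float32
print(f"{uint8_total / 1e6:.2f} MB as uint8, {float32_total / 1e9:.3f} GB as float32")
# 356.25 MB as uint8, 1.425 GB as float32
```

The fourfold jump after conversion to float32 is often the difference between fitting comfortably in RAM and not. Converting batch by batch avoids paying that cost for the whole set at once.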