Flickr-Faces-HQ (FFHQ)

Flickr‑Faces‑HQ (FFHQ) is a high‑quality face image dataset originally created as a benchmark for generative adversarial networks (GANs). It contains 70,000 PNG images at a resolution of 1024×1024, with considerable variation in age, race, and image background, and it covers accessories such as glasses, sunglasses, and hats. The images were crawled from Flickr and therefore inherit that site's biases; they were automatically aligned and cropped using dlib. Only images with appropriate licenses were collected. Automatic filters were used to prune the set, and Amazon Mechanical Turk was then used to remove the occasional statues, paintings, and other non‑photographic content.

Updated 5/24/2024


Description

Dataset Overview

Name: Flickr‑Faces‑HQ Dataset (FFHQ)

Description: FFHQ is a high‑quality face image dataset containing 70,000 PNG images at 1024×1024 resolution. The dataset exhibits significant diversity in age, race, and background, and includes accessories such as glasses, sunglasses, and hats. Images are sourced from Flickr and have been automatically aligned and cropped.

Usage: Intended primarily for research on generative adversarial networks (GANs). It is not intended for the development or improvement of facial recognition technologies.

Dataset Content

Number of Images: 70,000
Image Format: PNG
Resolution: 1024×1024
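Dataset Size: 2.56 TB in total across all folders, including the ZIP archives that duplicate folder contents; the aligned 1024×1024 images alone occupy 89.1 GB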

Dataset Structure

Root Folder: ffhq-dataset
Files and Subfolders:
- ffhq-dataset-v2.json: Metadata (including copyright information, URLs, etc.) – 255 MB
- images1024x1024: Aligned and cropped 1024×1024 images – 89.1 GB
- thumbnails128x128: 128×128 thumbnails – 1.95 GB
- in-the-wild-images: Original Flickr images – 955 GB
- tfrecords: Multi‑resolution data for StyleGAN and StyleGAN2 – 273 GB
- zips: ZIP archives of each folder's contents – 1.28 TB
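
As a quick sanity check, the per-item sizes listed above can be summed and compared with the stated 2.56 TB total. The sketch below assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB); the page does not say which convention is used, and the listed sizes are rounded, so only approximate agreement should be expected. The names `SIZES_GB`, `total_tb`, and `share_of_total` are illustrative.

```python
# Sizes copied from the listing above, converted to GB.
# ASSUMPTION: binary units (1 TB = 1024 GB, 1 GB = 1024 MB).
SIZES_GB = {
    "ffhq-dataset-v2.json": 255 / 1024,   # 255 MB
    "images1024x1024": 89.1,
    "thumbnails128x128": 1.95,
    "in-the-wild-images": 955.0,
    "tfrecords": 273.0,
    "zips": 1.28 * 1024,                  # 1.28 TB
}


def total_tb(sizes_gb):
    """Return the sum of all listed sizes, expressed in TB."""
    return sum(sizes_gb.values()) / 1024


def share_of_total(name, sizes_gb):
    """Return the fraction of the total size taken by one item."""
    return sizes_gb[name] / sum(sizes_gb.values())


if __name__ == "__main__":
    print(f"listed items sum to about {total_tb(SIZES_GB):.2f} TB")
    for name in SIZES_GB:
        print(f"{name}: {share_of_total(name, SIZES_GB):.1%} of the total")
```

Under these assumptions the aligned 1024×1024 images make up only a few percent of the total, so fetching that folder alone is far smaller than the full dataset.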

Dataset Usage

Download Script: The download_ffhq.py script is provided for automated download and verification of the dataset files.
Training & Validation: The first 60,000 images are used for training and the remaining 10,000 for validation.
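
The training/validation convention above can be expressed as a simple index rule. This is a minimal sketch, assuming images are identified by zero-based indices 0 to 69,999 in dataset order; the indexing scheme and the function names are assumptions for illustration, not something this page specifies.

```python
NUM_IMAGES = 70_000   # total images in the dataset
NUM_TRAIN = 60_000    # the first 60,000 images form the training split


def split_for_index(index: int) -> str:
    """Return "train" or "val" for a zero-based image index (assumed scheme)."""
    if not 0 <= index < NUM_IMAGES:
        raise ValueError(f"index {index} is outside 0..{NUM_IMAGES - 1}")
    return "train" if index < NUM_TRAIN else "val"


def split_indices():
    """Return (train_indices, val_indices) as two ranges."""
    return range(0, NUM_TRAIN), range(NUM_TRAIN, NUM_IMAGES)


if __name__ == "__main__":
    train, val = split_indices()
    print(len(train), "training images,", len(val), "validation images")
```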

Copyright & License

Image Licenses: Various Creative Commons licenses that permit free use, redistribution, and adaptation, with some requiring attribution and indication of changes.
Dataset License: Released by NVIDIA Corporation under Creative Commons BY‑NC‑SA 4.0, allowing non‑commercial use, redistribution, and adaptation provided the original paper is cited and changes are noted; derivative works must use the same license.

Privacy Protection

The dataset only includes photos whose authors have explicitly permitted free use and redistribution.
Mechanisms are provided for users to check whether their photos are included and request removal.
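
For readers who already have the metadata file locally, a similar check can be approximated by searching it for a photo URL. The sketch below is not the removal mechanism referred to above. It assumes the metadata JSON is a mapping from an image key to a record with a nested `metadata` object holding a `photo_url` field; this page only says the file contains copyright information and URLs, so those key names, the helper names, and the sample records are all hypothetical.

```python
import json


def load_metadata(path):
    """Load the metadata JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_entries_by_photo_url(metadata, photo_url):
    """Return the keys of all records whose source photo URL matches.

    ASSUMPTION: each record looks like {"metadata": {"photo_url": ...}}.
    Adjust the key names to the actual file layout.
    """
    matches = []
    for key, record in metadata.items():
        if record.get("metadata", {}).get("photo_url") == photo_url:
            matches.append(key)
    return matches


# Tiny in-memory stand-in for the real file (made-up records and URLs).
SAMPLE = {
    "0": {"metadata": {"photo_url": "https://example.com/photos/a"}},
    "1": {"metadata": {"photo_url": "https://example.com/photos/b"}},
}

if __name__ == "__main__":
    print(find_entries_by_photo_url(SAMPLE, "https://example.com/photos/b"))
```

With the real file, `load_metadata("ffhq-dataset-v2.json")` would replace `SAMPLE`, provided the assumed layout holds.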

Topics

Face Recognition

Generative Adversarial Networks

Source

Organization: github

Created: 2/4/2019
