Flickr-Faces-HQ (FFHQ)

Flickr‑Faces‑HQ (FFHQ) is a high‑quality face image dataset originally created as a benchmark for Generative Adversarial Networks (GANs). It contains 70,000 PNG images at 1024×1024 resolution, with considerable variation in age, race, and image background, as well as accessories such as glasses, sunglasses, and hats. The images were crawled from Flickr, so they inherit the biases of that site, and were automatically aligned and cropped using dlib. Only images under permissive licenses were collected. Automatic filters were applied to prune the set, and Amazon Mechanical Turk was then used to remove the occasional statue, painting, or other non‑photographic content.

Updated 5/24/2024

Description

Dataset Overview

Name: Flickr‑Faces‑HQ Dataset (FFHQ)

Description: FFHQ is a high‑quality face image dataset containing 70,000 PNG images at 1024×1024 resolution. The dataset exhibits significant diversity in age, race, and background, and includes accessories such as glasses, sunglasses, and hats. Images are sourced from Flickr and have been automatically aligned and cropped.

Usage: Intended primarily for research on Generative Adversarial Networks (GANs). The dataset is not intended for the development or improvement of facial recognition technologies.

Dataset Content

  • Number of Images: 70,000
  • Image Format: PNG
  • Resolution: 1024×1024
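  • Total Size: 2.56 TB (the full download, including the original Flickr images, TFRecords, and ZIP archives; the aligned 1024×1024 images alone are 89.1 GB)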
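
As a quick sanity check on the format and resolution listed above, the following is a minimal sketch that reads the width and height from a PNG file header using only the Python standard library. The function names `png_dimensions` and `is_ffhq_sized` are hypothetical helpers introduced for illustration and are not part of the dataset's tooling.

```python
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Return (width, height) parsed from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    # Bytes 8-11 hold the first chunk's length; bytes 12-15 hold its type,
    # which must be IHDR for a valid PNG.
    if header[12:16] != b"IHDR":
        raise ValueError("missing IHDR chunk")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_ffhq_sized(path: str, expected: int = 1024) -> bool:
    """Check that the PNG at `path` is expected x expected pixels (hypothetical helper)."""
    with open(path, "rb") as f:
        return png_dimensions(f.read(24)) == (expected, expected)
```

Only the first 24 bytes of each file are read, so the check stays cheap even when run across all 70,000 images.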

Dataset Structure

  • Root Folder: ffhq-dataset
  • Subfolders and Contents:
    • ffhq-dataset-v2.json: Metadata (including copyright information, URLs, etc.) – 255 MB
    • images1024x1024: Aligned and cropped 1024×1024 images – 89.1 GB
    • thumbnails128x128: 128×128 thumbnails – 1.95 GB
    • in-the-wild-images: Original Flickr images – 955 GB
    • tfrecords: Multi‑resolution data for StyleGAN and StyleGAN2 – 273 GB
    • zips: ZIP archives of each folder's contents – 1.28 TB
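
The listed component sizes can be cross-checked against the 2.56 TB total with a little arithmetic. The sketch below assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB), which the listing does not state, so treat the result as approximate.

```python
# Component sizes as listed above, converted to GB.
# Assumption: binary units (1 TB = 1024 GB, 1 GB = 1024 MB).
sizes_gb = {
    "ffhq-dataset-v2.json": 255 / 1024,
    "images1024x1024": 89.1,
    "thumbnails128x128": 1.95,
    "in-the-wild-images": 955.0,
    "tfrecords": 273.0,
    "zips": 1.28 * 1024,
}

total_tb = sum(sizes_gb.values()) / 1024

# Everything except the ZIP archives, i.e. the unpacked content only.
unpacked_tb = (sum(sizes_gb.values()) - sizes_gb["zips"]) / 1024

print(f"total:    {total_tb:.2f} TB")
print(f"unpacked: {unpacked_tb:.2f} TB")
```

Under these assumptions the sum lands within about 0.01 TB of the listed total, a gap consistent with rounding in the listed figures. It also shows that the ZIP archives account for roughly half of the total, so anyone who needs only the unpacked folders, or only the archives, can plan for about half the disk space.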

Dataset Usage

  • Download Script: The download_ffhq.py script automates downloading the dataset and verifying the downloaded files.
  • Training & Validation: The first 60,000 images are designated for training and the remaining 10,000 for validation.
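
The training/validation split described above can be expressed as a simple index rule. This is a minimal sketch that assumes images are numbered from 0 to 69,999 in dataset order; the zero-based numbering and the helper name `split_for_index` are assumptions made for illustration.

```python
TRAIN_COUNT = 60_000   # first 60,000 images: training
TOTAL_COUNT = 70_000   # remaining 10,000 images: validation


def split_for_index(index: int) -> str:
    """Return the split name for a zero-based image index (assumed numbering)."""
    if not 0 <= index < TOTAL_COUNT:
        raise ValueError(f"index {index} is outside 0..{TOTAL_COUNT - 1}")
    return "training" if index < TRAIN_COUNT else "validation"


# Index ranges for each split, convenient for building file lists.
train_ids = range(0, TRAIN_COUNT)
val_ids = range(TRAIN_COUNT, TOTAL_COUNT)
```

For example, `split_for_index(59_999)` falls in the training split and `split_for_index(60_000)` is the first validation image under this numbering.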

Copyright & License

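  • Image Licenses: The individual images were published on Flickr under various Creative Commons and similar licenses that permit free use, redistribution, and adaptation for non‑commercial purposes. Some of these licenses require attribution and an indication of any changes made.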
  • Dataset License: Released by NVIDIA Corporation under Creative Commons BY‑NC‑SA 4.0, allowing non‑commercial use, redistribution, and adaptation provided the original paper is cited and changes are noted; derivative works must use the same license.

Privacy Protection

  • The dataset only includes photos whose authors have explicitly permitted free use and redistribution.
  • Mechanisms are provided for users to check whether their photos are included and request removal.

Topics

Face Recognition
Generative Adversarial Networks

Source

Organization: github

Created: 2/4/2019
