Dataset asset · Open Source Community · Face Recognition · Image Processing

student/FFHQ

The FFHQ (Flickr‑Faces‑HQ) dataset comprises 70,000 high‑quality PNG images at 1024 × 1024 resolution, featuring diverse ages, ethnicities, backgrounds, and accessories (glasses, hats, etc.). Images were sourced from Flickr under permissive licenses, automatically aligned and cropped using dlib, and filtered to remove non‑photos. The dataset supports research in generative adversarial networks and related fields.

Source: hugging_face
Created: Nov 28, 2025
Updated: Apr 16, 2022
Dataset Overview

Name: Flickr‑Faces‑HQ Dataset (FFHQ)

Description: FFHQ is a high‑quality human‑face image dataset containing 70,000 PNG images at 1024 × 1024 resolution. It exhibits substantial variation in age, ethnicity, and background, and includes accessories such as glasses, sunglasses, and hats. Images were crawled from Flickr, with only those under permissive licenses collected, and then automatically aligned and cropped using dlib.

Features:

  • Number of Images: 70,000
  • Resolution: 1024 × 1024
  • Format: PNG
  • Diversity: Varying ages, ethnicities, backgrounds, and accessories

License:

  • Individual images are released under various licenses (CC BY 2.0, CC BY‑NC 2.0, Public Domain Mark 1.0, CC0 1.0, U.S. Government Works). These allow free use, redistribution, and adaptation for non‑commercial purposes, with appropriate attribution and indication of changes where required.
  • The dataset itself (metadata, download script, documentation) is provided under CC BY‑NC‑SA 4.0 by NVIDIA Corporation.

Data Structure:

  • Main Folder: ffhq-dataset (2.56 TB, 210,014 files)
  • Metadata: ffhq-dataset-v1.json (254 MB)
  • Images: images1024x1024 (89.1 GB, 70,000 PNG files)
  • Thumbnails: thumbnails128x128 (1.95 GB, 70,000 PNG files)
  • Raw Images: in-the-wild-images (955 GB, 70,000 PNG files)
  • TFRecords: tfrecords (273 GB, 9 files)

Download & Usage:

  • Data can be downloaded directly from Google Drive or via the provided download_ffhq.py script, which handles verification, retries, and parallel downloading.
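
The verification step mentioned above can be illustrated with a checksum comparison. This is a minimal sketch only: it assumes an MD5 digest is available for each downloaded file, and `file_matches_md5` is a hypothetical helper, not a function taken from download_ffhq.py.

```python
import hashlib
from pathlib import Path


def file_matches_md5(path, expected_md5, chunk_size=1 << 20):
    """Return True if the file's MD5 digest equals expected_md5.

    Hypothetical helper for illustration; the use of MD5 is an assumption.
    The file is read in chunks so large images do not need to fit in memory.
    """
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A file that fails this check would typically be deleted and downloaded again, which is where retry logic comes in.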

Training & Validation Split:

  • The first 60,000 images are designated for training; the remaining 10,000 are for validation.
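
The split above can be written as a simple index rule. The following is a minimal sketch, assuming images are numbered from 0 to 69,999 in dataset order; the name `split_for_index` is hypothetical and not part of the FFHQ tooling.

```python
NUM_IMAGES = 70_000     # total images in the dataset
NUM_TRAINING = 60_000   # the first 60,000 images form the training set


def split_for_index(index):
    """Return 'training' or 'validation' for a zero-based image index.

    Zero-based numbering in dataset order is an assumption made for
    this sketch.
    """
    if not 0 <= index < NUM_IMAGES:
        raise ValueError(f"index {index} is outside 0..{NUM_IMAGES - 1}")
    return "training" if index < NUM_TRAINING else "validation"


# Index ranges for each split, usable directly when iterating over files.
training_indices = range(0, NUM_TRAINING)
validation_indices = range(NUM_TRAINING, NUM_IMAGES)
```

Under this assumption, index 59,999 is the last training image and index 60,000 is the first validation image.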

Metadata Details:

  • Each image entry records the original Flickr information together with details of the aligned image, the thumbnail, and the raw (in-the-wild) image. All entries are stored in ffhq-dataset-v1.json.
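
One way to inspect the metadata file without relying on specific field names is to count which top-level sections appear across entries. This is a sketch under one stated assumption: that the JSON file is a single object mapping an image identifier to a per-image dictionary. The function name `count_entry_sections` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_entry_sections(metadata_path):
    """Count how often each top-level key appears across image entries.

    Assumes the file is one JSON object of the form {image_id: {...}};
    that layout is an assumption made for this sketch.
    """
    with Path(metadata_path).open(encoding="utf-8") as handle:
        entries = json.load(handle)  # loads the whole file into memory

    counts = Counter()
    for entry in entries.values():
        counts.update(entry.keys())  # one count per section per entry
    return counts
```

Because the metadata file is a few hundred megabytes, loading it in one call needs a comparable amount of free memory; a streaming JSON parser would be an alternative if that is a concern.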

Acknowledgements:

  • Thanks to contributors and researchers who assisted with data collection, alignment, and release.
