Dataset asset · Open Source Community · Face Recognition · Image Processing
Synthetic Faces High Quality (SFHQ) dataset
The dataset comprises approximately 425,000 carefully selected high‑quality synthetic face images at 1024 × 1024 resolution, generated by transforming various inspirations such as paintings, sketches, 3D models, and text‑to‑image generators into realistic faces. It also includes facial landmarks (an extended set of 110 points) and semantic segmentation masks for face parsing.
Source: GitHub
Created: Sep 4, 2022
Updated: Dec 20, 2022
Signals: 350 views
Availability: Linked source ready
Dataset Overview
Dataset Name
- Synthetic Faces High Quality (SFHQ) dataset
Dataset Composition
- Total Images: ~425,000
- Resolution: 1024 × 1024
- Four Parts:
  - Part 1: 89,785 images sourced from Artstation‑Artistic‑face‑HQ Dataset (AAHQ), Close‑Up Humans Dataset, and UIBVFED Dataset.
  - Part 2: 91,361 images sourced from Face Synthetics Dataset and Stable Diffusion v1.4.
  - Part 3: 118,358 images generated via the StyleGAN2 mapping network.
  - Part 4: 125,754 images generated via Stable Diffusion v2.1.
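As a quick sanity check on the composition figures, the four part sizes can be summed and compared with the stated total of roughly 425,000 images. The sketch below uses only the counts listed above; the variable names are illustrative.

```python
# Image counts per part, copied from the composition list above.
part_sizes = {
    "Part 1": 89_785,
    "Part 2": 91_361,
    "Part 3": 118_358,
    "Part 4": 125_754,
}

# Sum of all parts, to compare against the stated ~425,000 total.
total = sum(part_sizes.values())

# Share of the full dataset contributed by each part.
shares = {name: count / total for name, count in part_sizes.items()}

print(f"Total images: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The exact sum is 425,258, which is consistent with the rounded figure quoted in the description.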
Generation Process
- Inspiration Sources: paintings, 3D models, text‑to‑image generators, etc.
- Image Processing: StyleGAN2 latent‑space encoding and fine‑tuning to produce photo‑realistic images.
- Selection: Semi‑automatic and manual filtering using a visual taste approximator tool.
Additional Information
- Facial Features: 110 facial landmark points and semantic segmentation maps.
- Tools: The explore_dataset.py script is provided for accessing landmarks, masks, and text‑based search.
- Privacy & License: All images are synthetic; no privacy or copyright concerns.
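To make the annotation types concrete, here is a minimal sketch of two common preprocessing steps: scaling the 110 landmark points to the [0, 1] range and measuring how much of a segmentation mask each class covers. The on-disk format of the annotations is not described here, so the sketch assumes landmarks arrive as (x, y) pixel pairs and masks as 2D grids of integer class ids. The function names and toy data are hypothetical and are not part of explore_dataset.py.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

NUM_LANDMARKS = 110  # extended landmark set described above
IMAGE_SIZE = 1024    # images are 1024 x 1024

Point = Tuple[float, float]


def normalize_landmarks(points: Sequence[Point], size: int = IMAGE_SIZE) -> List[Point]:
    """Scale (x, y) pixel coordinates to the [0, 1] range."""
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
    return [(x / size, y / size) for x, y in points]


def class_fractions(mask: Sequence[Sequence[int]]) -> Dict[int, float]:
    """Return the fraction of pixels assigned to each class id in a mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("mask is empty")
    return {label: n / total for label, n in counts.items()}


# Toy data standing in for one image's annotations (assumed layout, made-up values).
toy_landmarks = [(float(i), float(2 * i)) for i in range(NUM_LANDMARKS)]
toy_mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
]

normalized = normalize_landmarks(toy_landmarks)
fractions = class_fractions(toy_mask)
```

Normalized coordinates stay valid if the images are later resized, and per-class fractions give a cheap way to spot masks that are dominated by background.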
Use Cases
- Training machine‑learning models, especially generative adversarial networks (e.g., StyleGAN).
- The dataset provides extensive diversity across identity, ethnicity, age, pose, expression, lighting, hairstyle, and hair color.
Download
- Available on Kaggle as separate parts.
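Because the parts are distributed separately, it can help to build one index over them after downloading. The sketch below assumes each part has been extracted into its own sub-directory under a common root and that images are stored as JPEG or PNG files. The directory layout, the suffix list, and the index_parts helper are assumptions for illustration, not a documented structure of the Kaggle downloads.

```python
from pathlib import Path
from typing import Dict, List

# Assumed image file extensions; adjust to match the actual downloads.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def index_parts(root: Path) -> Dict[str, List[Path]]:
    """Map each sub-directory of root to a sorted list of its image files."""
    index: Dict[str, List[Path]] = {}
    for part_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Search recursively in case a part nests its images in sub-folders.
        images = sorted(
            f for f in part_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        index[part_dir.name] = images
    return index


def summarize(index: Dict[str, List[Path]]) -> Dict[str, int]:
    """Return the number of images found per part directory."""
    return {name: len(files) for name, files in index.items()}
```

Calling summarize(index_parts(Path("path/to/root"))) on the extracted folders gives per-part counts that can be compared with the figures listed under Dataset Composition.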