Fake2M
A newly collected, large-scale dataset of AI-generated images for evaluating how well humans and models can distinguish AI-generated visual content from real content.
Sentry‑Image Dataset Overview
Dataset Description
The Sentry-Image dataset is intended for training and evaluating detectors of AI-generated images and includes training, validation, and test splits. The data are organized into two top-level folders: ImageData (the images) and MetaData (per-subset metadata in CSV files).
Data Download
The dataset can be downloaded from Hugging Face with the following commands:
```bash
git lfs install
git clone https://huggingface.co/datasets/InfImagine/FakeImageDataset
```
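A full clone downloads every image through Git LFS. As a hedged sketch, assuming `git lfs install` has already been run and that the CSV files under MetaData/ are LFS-tracked (not confirmed by this card), the metadata can be fetched on its own first: skip LFS downloads at clone time, then pull only the MetaData/ paths. The helper name `fetch_fake2m_metadata` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: clone without LFS payloads, then download only MetaData/.
# Assumes git and git-lfs are installed; fetch_fake2m_metadata is a
# hypothetical helper name, not part of the dataset's own tooling.

REPO_URL="https://huggingface.co/datasets/InfImagine/FakeImageDataset"

# fetch_fake2m_metadata [DEST]
# Clones REPO_URL into DEST (default: FakeImageDataset) and pulls only the
# LFS objects whose paths fall under MetaData/.
fetch_fake2m_metadata() {
  local dest="${1:-FakeImageDataset}"
  # With GIT_LFS_SKIP_SMUDGE=1, LFS-tracked files are checked out as small
  # pointer files instead of being downloaded.
  GIT_LFS_SKIP_SMUDGE=1 git clone "$REPO_URL" "$dest" || return 1
  # Replace pointer files with real content, restricted to MetaData/.
  git -C "$dest" lfs pull --include="MetaData/**"
}
```

After sourcing the script, `fetch_fake2m_metadata FakeImageDataset` fetches the metadata; a single split can later be pulled the same way, e.g. `git -C FakeImageDataset lfs pull --include="ImageData/val/**"`.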
Directory Structure
```
FakeImageDataset/
├── ImageData/
│   ├── train/
│   │   ├── IFv1-CC1M/
│   │   ├── SDv15R-CC1M/
│   │   └── stylegan3-80K/
│   └── val/
│       ├── IF-CC95K/
│       ├── Midjourneyv5-5K/
│       ├── SDv15-CC30K/
│       ├── SDv21-CC15K/
│       ├── cogview2-22K/
│       └── stylegan3-60K/
└── MetaData/
    ├── train/
    │   ├── IF-CC1M.csv
    │   ├── SDv15R-CC1M.csv
    │   └── stylegan3-80K.csv
    └── val/
        ├── IF-CC95K.csv
        ├── Midjourneyv5-5K.csv
        ├── SDv15-CC30K.csv
        ├── SDv21-CC15K.csv
        ├── cogview2-22K.csv
        ├── stylegan3-60K.csv
        └── stylegan3-80K.csv
```
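To confirm that a local checkout matches the layout above, the expected paths can be checked one by one. This is a minimal sketch: the path list is copied verbatim from the directory tree (including MetaData/val/stylegan3-80K.csv as listed), and `check_fake2m_layout` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: report any folder or metadata CSV from the directory tree that is
# missing under a local checkout. Paths are copied from the tree in this card;
# check_fake2m_layout is a hypothetical helper name.

EXPECTED_PATHS=(
  ImageData/train/IFv1-CC1M
  ImageData/train/SDv15R-CC1M
  ImageData/train/stylegan3-80K
  ImageData/val/IF-CC95K
  ImageData/val/Midjourneyv5-5K
  ImageData/val/SDv15-CC30K
  ImageData/val/SDv21-CC15K
  ImageData/val/cogview2-22K
  ImageData/val/stylegan3-60K
  MetaData/train/IF-CC1M.csv
  MetaData/train/SDv15R-CC1M.csv
  MetaData/train/stylegan3-80K.csv
  MetaData/val/IF-CC95K.csv
  MetaData/val/Midjourneyv5-5K.csv
  MetaData/val/SDv15-CC30K.csv
  MetaData/val/SDv21-CC15K.csv
  MetaData/val/cogview2-22K.csv
  MetaData/val/stylegan3-60K.csv
  MetaData/val/stylegan3-80K.csv
)

# check_fake2m_layout ROOT
# Prints "missing: <path>" for each absent entry; returns 1 if any entry is
# missing, otherwise 0.
check_fake2m_layout() {
  local root="$1" missing=0 p
  for p in "${EXPECTED_PATHS[@]}"; do
    if [[ ! -e "$root/$p" ]]; then
      echo "missing: $p"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_fake2m_layout FakeImageDataset` prints nothing and exits with status 0 when every listed path is present.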
Training Set (Fake2M)
| Dataset | Generator type | Count (images) | Resolution (px) | Image Path | Metadata Path |
|---|---|---|---|---|---|
| SD‑V1.5Real‑dpms‑25 | Diffusion | 1M | 512 | ImageData/train/SDv15R-CC1M | MetaData/train/SDv15R-CC1M.csv |
| IF‑V1.0‑dpms++‑25 | Diffusion | 1M | 256 | ImageData/train/IFv1-CC1M | MetaData/train/IF-CC1M.csv |
| StyleGAN3 | GAN | 87K | ≥512 | ImageData/train/stylegan3-80K | MetaData/train/stylegan3-80K.csv |
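A quick sanity check after downloading is to count the images in each training folder and compare the totals with the Count column. This sketch assumes the images have already been extracted into the folders listed in the table and use common image extensions (png, jpg, jpeg, webp); if the download ships archives instead, extract them first. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count image files per training subset for comparison with the
# table's Count column. Extensions and function names are assumptions.

# Image folders from the training table (nominal counts: 1M, 1M, 87K).
TRAIN_DIRS=(
  ImageData/train/SDv15R-CC1M
  ImageData/train/IFv1-CC1M
  ImageData/train/stylegan3-80K
)

# count_images DIR: number of png/jpg/jpeg/webp files (any case) under DIR,
# searched recursively. A missing DIR yields 0.
count_images() {
  find "$1" -type f \( -iname '*.png' -o -iname '*.jpg' \
    -o -iname '*.jpeg' -o -iname '*.webp' \) 2>/dev/null | wc -l | tr -d ' '
}

# report_train_counts ROOT: prints "<folder><TAB><count>" for each subset.
report_train_counts() {
  local root="$1" d
  for d in "${TRAIN_DIRS[@]}"; do
    printf '%s\t%s\n' "$d" "$(count_images "$root/$d")"
  done
}
```

Running `report_train_counts FakeImageDataset` prints one tab-separated line per subset, in table order.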
Validation Set (MPBench)
| Dataset | Generator type | Count (images) | Resolution (px) | Image Path | Metadata Path |
|---|---|---|---|---|---|
| SDv15 | Diffusion | 30K | 512 | ImageData/val/SDv15-CC30K | MetaData/val/SDv15-CC30K.csv |
| SDv21 | Diffusion | 15K | 512 | ImageData/val/SDv21-CC15K | MetaData/val/SDv21-CC15K.csv |
| IF | Diffusion | 95K | 256 | ImageData/val/IF-CC95K | MetaData/val/IF-CC95K.csv |
| Cogview2 | Autoregressive | 22K | 480 | ImageData/val/cogview2-22K | MetaData/val/cogview2-22K.csv |
| StyleGAN3 | GAN | 60K | ≥512 | ImageData/val/stylegan3-60K | MetaData/val/stylegan3-60K.csv |
| Midjourneyv5 | – | 5K | ≥512 | ImageData/val/Midjourneyv5-5K | MetaData/val/Midjourneyv5-5K.csv |
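Because the validation set is split by generator, per-generator evaluation is easier with a single manifest that tags each image with its subset. The sketch below writes one `<subset><TAB><relative path>` line per image file, sorted for reproducibility. The manifest format, the image extensions, and the name `write_val_manifest` are assumptions for illustration, not part of the dataset release.

```bash
#!/usr/bin/env bash
# Sketch: build a TSV manifest of validation images, one line per file,
# tagged with its subset folder name. Format and helper name are hypothetical.

# Subset folders from the validation table.
VAL_SUBSETS=(SDv15-CC30K SDv21-CC15K IF-CC95K cogview2-22K stylegan3-60K Midjourneyv5-5K)

# write_val_manifest ROOT OUT
# Truncates OUT, then appends "<subset>\t<path relative to ROOT>" for every
# png/jpg/jpeg/webp file. Missing subsets are reported on stderr and skipped.
write_val_manifest() {
  local root="$1" out="$2" s
  : > "$out"
  for s in "${VAL_SUBSETS[@]}"; do
    if [[ ! -d "$root/ImageData/val/$s" ]]; then
      echo "skipping missing subset: $s" >&2
      continue
    fi
    # find runs inside ROOT (in a subshell) so the paths come out relative.
    (cd "$root" && find "ImageData/val/$s" -type f \( -iname '*.png' \
      -o -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.webp' \) | LC_ALL=C sort) |
      while IFS= read -r f; do
        printf '%s\t%s\n' "$s" "$f"
      done >> "$out"
  done
}
```

For example, `write_val_manifest FakeImageDataset val_manifest.tsv` produces a file that can be grouped by its first column to report detection accuracy per generator.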
Additional Information
- Aesthetic Quality Scores: CLIP‑IQA based scores are available for download from Hugging Face.
- Visualization: Visualization files for the dataset are also provided.
Maintenance Plan
- 2023.7: Release training and validation sets.
- 2023.8: Publish the open questionnaire.
- 2023.9: Add a Stable Diffusion XL fake-image set.
- 2023.9: Release training and evaluation code.
- 2023.10: Add a Midjourney V5 fake-image set.
- 2023.10: Release a new test set.
License
The project is open‑source under the Apache‑2.0 license. Academic use is free; commercial use requires written permission.
Citation
Please cite the following when using the dataset:
```bibtex
@misc{sentry-image-leaderboard,
  title        = {Sentry-Image Leaderboard},
  author       = {Zeyu Lu and Di Huang and Chunli Zhang and Chengyue Wu and Xihui Liu and Lei Bai and Wanli Ouyang},
  year         = {2023},
  publisher    = {InfImagine, Shanghai AI Laboratory},
  howpublished = "\url{https://github.com/Inf-imagine/Sentry}"
}

@inproceedings{lu2023seeing,
  title     = {Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images},
  author    = {Zeyu Lu and Di Huang and Lei Bai and Jingjing Qu and Chengyue Wu and Xihui Liu and Wanli Ouyang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2023},
}
```
Source
Organization: arXiv
Created: 4/26/2023