Sentry‑Image Dataset Overview

Dataset Description

The Sentry‑Image dataset is used for detecting AI‑generated images and includes training, validation, and test splits. Data are organized into two folders: ImageData (raw images) and MetaData (metadata).

Data Download

The dataset can be downloaded from Hugging Face with the following commands:

git lfs install
git clone https://huggingface.co/datasets/InfImagine/FakeImageDataset
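
As an alternative to cloning the whole repository, the sketch below fetches only selected files through the `huggingface_hub` Python package. It is a minimal example rather than the maintainers' documented method. It assumes the package is installed, takes the repository ID from the clone URL above, and uses a hypothetical helper name (`download_subset`). The glob patterns assume the repository follows the folder layout described in this document.

```python
# Repository ID taken from the clone URL in this document.
REPO_ID = "InfImagine/FakeImageDataset"


def download_subset(patterns, local_dir="FakeImageDataset"):
    """Download only files matching the glob patterns; return the local path.

    `download_subset` is a hypothetical helper name, not part of the dataset.
    """
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",            # the repository is a dataset, not a model
        local_dir=local_dir,
        allow_patterns=list(patterns),  # e.g. ["MetaData/*"]
    )
```

For example, `download_subset(["MetaData/*"])` would request only the metadata files, which can be useful for inspecting the dataset before committing to the full image download.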

Directory Structure

FakeImageDataset/
├── ImageData/
│   ├── train/
│   │   ├── IFv1-CC1M/
│   │   ├── SDv15R-CC1M/
│   │   └── stylegan3-80K/
│   └── val/
│       ├── IF-CC95K/
│       ├── Midjourneyv5-5K/
│       ├── SDv15-CC30K/
│       ├── SDv21-CC15K/
│       ├── cogview2-22K/
│       └── stylegan3-60K/
└── MetaData/
    ├── train/
    │   ├── IF-CC1M.csv
    │   ├── SDv15R-CC1M.csv
    │   └── stylegan3-80K.csv
    └── val/
        ├── IF-CC95K.csv
        ├── Midjourneyv5-5K.csv
        ├── SDv15-CC30K.csv
        ├── SDv21-CC15K.csv
        ├── cogview2-22K.csv
        ├── stylegan3-60K.csv
        └── stylegan3-80K.csv

Training Set (Fake2M)

Dataset	Generator Type	Image Count	Resolution (px)	Image Path	Metadata Path
SD‑V1.5Real‑dpms‑25	Diffusion	1M	512	ImageData/train/SDv15R-CC1M	MetaData/train/SDv15R-CC1M.csv
IF‑V1.0‑dpms++‑25	Diffusion	1M	256	ImageData/train/IFv1-CC1M	MetaData/train/IF-CC1M.csv
StyleGAN3	GAN	87K	≥512	ImageData/train/stylegan3-80K	MetaData/train/stylegan3-80K.csv

Validation Set (MPBench)

Dataset	Generator Type	Image Count	Resolution (px)	Image Path	Metadata Path
SDv15	Diffusion	30K	512	ImageData/val/SDv15-CC30K	MetaData/val/SDv15-CC30K.csv
SDv21	Diffusion	15K	512	ImageData/val/SDv21-CC15K	MetaData/val/SDv21-CC15K.csv
IF	Diffusion	95K	256	ImageData/val/IF-CC95K	MetaData/val/IF-CC95K.csv
Cogview2	AR	22K	480	ImageData/val/cogview2-22K	MetaData/val/cogview2-22K.csv
StyleGAN3	GAN	60K	≥512	ImageData/val/stylegan3-60K	MetaData/val/stylegan3-60K.csv
Midjourneyv5	–	5K	≥512	ImageData/val/Midjourneyv5-5K	MetaData/val/Midjourneyv5-5K.csv
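
The image and metadata paths listed in the two tables can be collected into a small registry and checked against a local copy. The following is a minimal sketch under stated assumptions: the `Subset` class and the `missing_paths` helper are hypothetical names introduced for illustration, and the paths are copied from the tables above, relative to the repository root.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Subset:
    split: str         # "train" (Fake2M) or "val" (MPBench)
    image_dir: str     # image folder, relative to the repository root
    metadata_csv: str  # metadata file, relative to the repository root


# Paths copied from the training and validation tables in this document.
SUBSETS = [
    Subset("train", "ImageData/train/SDv15R-CC1M", "MetaData/train/SDv15R-CC1M.csv"),
    Subset("train", "ImageData/train/IFv1-CC1M", "MetaData/train/IF-CC1M.csv"),
    Subset("train", "ImageData/train/stylegan3-80K", "MetaData/train/stylegan3-80K.csv"),
    Subset("val", "ImageData/val/SDv15-CC30K", "MetaData/val/SDv15-CC30K.csv"),
    Subset("val", "ImageData/val/SDv21-CC15K", "MetaData/val/SDv21-CC15K.csv"),
    Subset("val", "ImageData/val/IF-CC95K", "MetaData/val/IF-CC95K.csv"),
    Subset("val", "ImageData/val/cogview2-22K", "MetaData/val/cogview2-22K.csv"),
    Subset("val", "ImageData/val/stylegan3-60K", "MetaData/val/stylegan3-60K.csv"),
    Subset("val", "ImageData/val/Midjourneyv5-5K", "MetaData/val/Midjourneyv5-5K.csv"),
]


def missing_paths(root):
    """Return the relative paths from SUBSETS that do not exist under root."""
    root = Path(root)
    missing = []
    for subset in SUBSETS:
        if not (root / subset.image_dir).is_dir():
            missing.append(subset.image_dir)
        if not (root / subset.metadata_csv).is_file():
            missing.append(subset.metadata_csv)
    return missing
```

Calling `missing_paths("FakeImageDataset")` after a download returns an empty list when every listed folder and CSV file is present, and otherwise names the entries that still need to be fetched or unpacked.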

Additional Information

Aesthetic Quality Scores: CLIP‑IQA‑based scores are available for download from Hugging Face.
Visualization: Visualization files are provided.

Maintenance Plan

2023.7: Release training and validation sets.
2023.8: Publish open questionnaire.
2023.9: Support Stable Diffusion XL fake‑image set.
2023.9: Release training and evaluation code.
2023.10: Support Midjourney V5 fake‑image set.
2023.10: Release new test set.

License

The project is open‑source under the Apache‑2.0 license. Academic use is free; commercial use requires written permission.

Citation

Please cite the following when using the dataset:

@misc{sentry-image-leaderboard,
  title = {Sentry-Image Leaderboard},
  author = {Zeyu Lu and Di Huang and Chunli Zhang and Chengyue Wu and Xihui Liu and Lei Bai and Wanli Ouyang},
  year = {2023},
  publisher = {InfImagine, Shanghai AI Laboratory},
  howpublished = "\url{https://github.com/Inf-imagine/Sentry}"
}

@inproceedings{lu2023seeing,
  title = {Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images},
  author = {Zeyu Lu and Di Huang and Lei Bai and Jingjing Qu and Chengyue Wu and Xihui Liu and Wanli Ouyang},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2023}
}

Fake2M

Description