Fake2M

A newly collected large‑scale fake‑image dataset for evaluating human and model ability to distinguish AI‑generated visual content.

Updated 9/23/2023
arXiv

Description

Sentry‑Image Dataset Overview

Dataset Description

The Sentry‑Image dataset is used for detecting AI‑generated images and includes training, validation, and test splits. Data are organized into two folders: ImageData (raw images) and MetaData (metadata).

Data Download

The dataset can be downloaded from Hugging Face with the following commands:

git lfs install
git clone https://huggingface.co/datasets/InfImagine/FakeImageDataset

Directory Structure

FakeImageDataset/
├── ImageData/
│   ├── train/
│   │   ├── IFv1-CC1M/
│   │   ├── SDv15R-CC1M/
│   │   └── stylegan3-80K/
│   └── val/
│       ├── IF-CC95K/
│       ├── Midjourneyv5-5K/
│       ├── SDv15-CC30K/
│       ├── SDv21-CC15K/
│       ├── cogview2-22K/
│       └── stylegan3-60K/
└── MetaData/
    ├── train/
    │   ├── IF-CC1M.csv
    │   ├── SDv15R-CC1M.csv
    │   └── stylegan3-80K.csv
    └── val/
        ├── IF-CC95K.csv
        ├── Midjourneyv5-5K.csv
        ├── SDv15-CC30K.csv
        ├── SDv21-CC15K.csv
        ├── cogview2-22K.csv
        ├── stylegan3-60K.csv
        └── stylegan3-80K.csv
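
After cloning, it can help to confirm that every image folder is present alongside its metadata file. The sketch below is one minimal way to do that, assuming the repository was cloned into a local folder named FakeImageDataset and follows the tree above. The `SUBSETS` list and the `expected_paths` and `missing_paths` helpers are hypothetical names introduced here for illustration and are not part of the dataset's tooling. The pairs are written out explicitly because the IF training folder (IFv1-CC1M) and its metadata file (IF-CC1M.csv) do not share a name.

```python
from pathlib import Path

# (split, image folder, metadata file), copied from the directory tree above.
# This mapping is an illustration, not an official manifest.
SUBSETS = [
    ("train", "SDv15R-CC1M", "SDv15R-CC1M.csv"),
    ("train", "IFv1-CC1M", "IF-CC1M.csv"),
    ("train", "stylegan3-80K", "stylegan3-80K.csv"),
    ("val", "SDv15-CC30K", "SDv15-CC30K.csv"),
    ("val", "SDv21-CC15K", "SDv21-CC15K.csv"),
    ("val", "IF-CC95K", "IF-CC95K.csv"),
    ("val", "cogview2-22K", "cogview2-22K.csv"),
    ("val", "stylegan3-60K", "stylegan3-60K.csv"),
    ("val", "Midjourneyv5-5K", "Midjourneyv5-5K.csv"),
]


def expected_paths(root):
    """Yield (image_dir, metadata_csv) path pairs for every subset."""
    root = Path(root)
    for split, image_dir, csv_name in SUBSETS:
        yield (root / "ImageData" / split / image_dir,
               root / "MetaData" / split / csv_name)


def missing_paths(root):
    """Return the expected folders and files that are absent under root."""
    missing = []
    for image_dir, csv_path in expected_paths(root):
        if not image_dir.is_dir():
            missing.append(image_dir)
        if not csv_path.is_file():
            missing.append(csv_path)
    return missing


if __name__ == "__main__":
    # Assumed clone location; adjust to wherever the repository was cloned.
    for path in missing_paths("FakeImageDataset"):
        print("missing:", path)
```

An empty result means all nine subsets are in place; any printed path points to a folder or CSV that was not fetched, for example because Git LFS objects were skipped.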

Training Set (Fake2M)

| Dataset | Generator | Count | Resolution | Image Path | Metadata Path |
|---|---|---|---|---|---|
| SD‑V1.5Real‑dpms‑25 | Diffusion | 1M | 512 | ImageData/train/SDv15R-CC1M | MetaData/train/SDv15R-CC1M.csv |
| IF‑V1.0‑dpms++‑25 | Diffusion | 1M | 256 | ImageData/train/IFv1-CC1M | MetaData/train/IF-CC1M.csv |
| StyleGAN3 | GAN | 87K | ≥512 | ImageData/train/stylegan3-80K | MetaData/train/stylegan3-80K.csv |

Validation Set (MPBench)

| Dataset | Generator | Count | Resolution | Image Path | Metadata Path |
|---|---|---|---|---|---|
| SDv15 | Diffusion | 30K | 512 | ImageData/val/SDv15-CC30K | MetaData/val/SDv15-CC30K.csv |
| SDv21 | Diffusion | 15K | 512 | ImageData/val/SDv21-CC15K | MetaData/val/SDv21-CC15K.csv |
| IF | Diffusion | 95K | 256 | ImageData/val/IF-CC95K | MetaData/val/IF-CC95K.csv |
| Cogview2 | AR | 22K | 480 | ImageData/val/cogview2-22K | MetaData/val/cogview2-22K.csv |
| StyleGAN3 | GAN | 60K | ≥512 | ImageData/val/stylegan3-60K | MetaData/val/stylegan3-60K.csv |
| Midjourneyv5 | | 5K | ≥512 | ImageData/val/Midjourneyv5-5K | MetaData/val/Midjourneyv5-5K.csv |
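
The name Fake2M matches the size of the training set, which can be checked from the Count columns of the training and validation tables. The following is a small sketch of that arithmetic. The counts are the nominal, rounded figures listed on this page rather than exact file counts, and `parse_count` and `total` are hypothetical helpers introduced here for illustration.

```python
# Nominal counts copied from the Count columns of the two tables.
TRAIN_COUNTS = {
    "SD-V1.5Real-dpms-25": "1M",
    "IF-V1.0-dpms++-25": "1M",
    "StyleGAN3": "87K",
}
VAL_COUNTS = {
    "SDv15": "30K",
    "SDv21": "15K",
    "IF": "95K",
    "Cogview2": "22K",
    "StyleGAN3": "60K",
    "Midjourneyv5": "5K",
}

_MULTIPLIERS = {"K": 1_000, "M": 1_000_000}


def parse_count(text):
    """Convert a shorthand such as '87K' or '1M' to an integer."""
    suffix = text[-1].upper()
    if suffix in _MULTIPLIERS:
        return int(float(text[:-1]) * _MULTIPLIERS[suffix])
    return int(text)


def total(counts):
    """Sum the nominal image counts of all subsets in a split."""
    return sum(parse_count(value) for value in counts.values())


if __name__ == "__main__":
    print("train:", total(TRAIN_COUNTS))  # 2087000
    print("val:", total(VAL_COUNTS))      # 227000
```

By these nominal figures the training set holds roughly 2.09 million generated images and the validation set about 227 thousand.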

Additional Information

  • Aesthetic Quality Scores: CLIP‑IQA based scores are available for download from Hugging Face.
  • Visualization: Visualization files are provided.

Maintenance Plan

  • 2023.7: Release training and validation sets.
  • 2023.8: Publish open questionnaire.
  • 2023.9: Support Stable Diffusion XL fake‑image set.
  • 2023.9: Release training and evaluation code.
  • 2023.10: Support Midjourney V5 fake‑image set.
  • 2023.10: Release new test set.

License

The project is open‑source under the Apache‑2.0 license. Academic use is free; commercial use requires written permission.

Citation

Please cite the following when using the dataset:

@misc{sentry-image-leaderboard,
  title = {Sentry-Image Leaderboard},
  author = {Zeyu Lu and Di Huang and Chunli Zhang and Chengyue Wu and Xihui Liu and Lei Bai and Wanli Ouyang},
  year = {2023},
  publisher = {InfImagine, Shanghai AI Laboratory},
  howpublished = "\url{https://github.com/Inf-imagine/Sentry}"
}

@inproceedings{lu2023seeing,
  title = {Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images},
  author = {Zeyu Lu and Di Huang and Lei Bai and Jingjing Qu and Chengyue Wu and Xihui Liu and Wanli Ouyang},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2023}
}

Topics

AI‑Generated Content
Image Recognition

Source

Organization: arXiv

Created: 4/26/2023
