ADRepository

This repository offers a suite of real‑world datasets for anomaly detection, covering tabular data (categorical and numerical), time‑series data, graph data, image data, and video data. The datasets are intended for deep anomaly detection research; when using one, cite the publication associated with it.

Updated 5/23/2024
github

Description

Dataset Overview

Numerical Datasets

  • Source: the KDD 2019 paper introducing DevNet
  • Count: 7 datasets
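  • Basic statistics (data size, dimensionality):
    • donors: 619,326 instances, 10 dimensions
    • census: 299,285 instances, 500 dimensions
    • fraud: 284,807 instances, 29 dimensions
    • celeba: 202,599 instances, 39 dimensions
    • backdoor: 95,329 instances, 196 dimensions
    • campaign: 41,188 instances, 62 dimensions
    • thyroid: 7,200 instances, 21 dimensions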
  • Details & benchmarks: given in the DevNet paper; the code is available in the DevNet GitHub repository.
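
The page does not describe the file format of these datasets. As a minimal sketch, assuming each numerical dataset is a CSV file with numeric feature columns plus one label column named `class` (1 = anomaly, 0 = normal), a dataset can be loaded and its anomaly rate checked with the standard library alone. The column name, the label encoding, and the file name `thyroid.csv` are assumptions; verify them against the actual files.

```python
import csv
import io
from typing import List, TextIO, Tuple


def load_numerical(f: TextIO, label_col: str = "class") -> Tuple[List[List[float]], List[int]]:
    """Read a CSV with numeric feature columns and one label column.

    Assumption (not confirmed by the source): the label column is named
    `label_col` and holds 1 for anomalies and 0 for normal instances.
    """
    reader = csv.DictReader(f)
    features, labels = [], []
    for row in reader:
        # Remove the label first so only feature columns remain in `row`.
        labels.append(int(float(row.pop(label_col))))
        # Remaining columns are treated as numeric features, in file order.
        features.append([float(v) for v in row.values()])
    return features, labels


def anomaly_rate(labels: List[int]) -> float:
    """Fraction of instances labeled as anomalies (0.0 for an empty list)."""
    return sum(labels) / len(labels) if labels else 0.0


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file such as a hypothetical "thyroid.csv".
    demo = io.StringIO("f1,f2,class\n0.1,2.0,0\n0.3,1.5,0\n9.9,0.2,1\n0.2,1.8,0\n")
    X, y = load_numerical(demo)
    print(len(X), len(X[0]), anomaly_rate(y))  # 4 2 0.25
```

Passing a file-like object rather than a path keeps the loader testable; with a real file, open it with `open(path, newline="")` and pass the handle.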

Categorical Datasets

  • Count: 14 datasets
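  • Basic statistics (data size, dimensionality, anomaly class):
    • bank: 41,188 instances, 10 dimensions, anomaly class "yes"
    • census: 299,285 instances, 33 dimensions, anomaly class "50K+"
    • AID362: 4,279 instances, 114 dimensions, anomaly class "active"
    • w7a: 49,749 instances, 300 dimensions, anomaly class "yes"
    • CMC: 1,473 instances, 8 dimensions, anomaly class "child>10"
    • APAS: 12,695 instances, 64 dimensions, anomaly class "train"
    • CelebA: 202,599 instances, 39 dimensions, anomaly class "bald"
    • Chess: 28,056 instances, 6 dimensions, anomaly class "zero"
    • AD: 3,279 instances, 1,555 dimensions, anomaly class "ad."
    • Solar‑flare: 1,066 instances, 11 dimensions, anomaly class "F"
    • Probe: 64,759 instances, 6 dimensions, anomaly class "attack"
    • U2R: 60,821 instances, 6 dimensions, anomaly class "attack"
    • R10: 12,897 instances, 100 dimensions, anomaly class "corn"
    • CoverType: 581,012 instances, 44 dimensions, anomaly class "cottonwood"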
  • Details & benchmarks: Provided in the corresponding papers.
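
A minimal sketch of turning such data into model input, assuming the anomaly class entry names the class value whose instances serve as anomalies (all other instances being normal) and that features arrive as rows of strings. One-hot encoding is used here as one common choice; the cited papers may encode categorical features differently. The dictionary copies a subset of the anomaly classes listed above, and the example rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Anomaly class per dataset, copied from the statistics listed above
# (only a subset is shown here).
ANOMALY_CLASS: Dict[str, str] = {
    "bank": "yes",
    "census": "50K+",
    "AID362": "active",
    "Chess": "zero",
    "CoverType": "cottonwood",
}


def binary_labels(classes: Sequence[str], dataset: str) -> List[int]:
    """Label 1 where the class equals the dataset's anomaly class, else 0."""
    target = ANOMALY_CLASS[dataset]
    return [1 if c == target else 0 for c in classes]


def one_hot(rows: Sequence[Sequence[str]]) -> Tuple[List[List[int]], List[Tuple[int, str]]]:
    """One-hot encode rows of categorical values.

    Returns the encoded matrix and, for each output column, the
    (input column index, category value) pair it represents.
    """
    n_cols = len(rows[0])
    # A sorted vocabulary per input column keeps the output order deterministic.
    vocab = [sorted({r[j] for r in rows}) for j in range(n_cols)]
    header = [(j, v) for j in range(n_cols) for v in vocab[j]]
    index = {key: k for k, key in enumerate(header)}
    encoded = []
    for r in rows:
        vec = [0] * len(header)
        for j, v in enumerate(r):
            vec[index[(j, v)]] = 1
        encoded.append(vec)
    return encoded, header


if __name__ == "__main__":
    rows = [["married", "tertiary"], ["single", "primary"], ["married", "primary"]]
    X, cols = one_hot(rows)
    print(cols)  # [(0, 'married'), (0, 'single'), (1, 'primary'), (1, 'tertiary')]
    print(X)     # [[1, 0, 0, 1], [0, 1, 1, 0], [1, 0, 1, 0]]
    print(binary_labels(["no", "yes", "no"], "bank"))  # [0, 1, 0]
```

The vocabulary is built from the rows passed in, so train and test splits should be encoded with a vocabulary fitted on the training data only.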

Video Datasets

  • Datasets: ShanghaiTech Campus, UCF‑Crime
  • Characteristics: Features extracted with an I3D backbone, suitable for weakly‑supervised video anomaly detection.
  • Details: Available in the related publications.
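
In the weakly supervised setting only video-level labels are available, so each video is treated as a bag of segment features. A minimal sketch of the multiple-instance ranking objective introduced with UCF-Crime (Sultani et al., CVPR 2018), assuming a detector has already produced one anomaly score per I3D segment: the highest-scoring segment of an anomalous video should outscore the highest-scoring segment of a normal video by a margin, with smoothness and sparsity terms on the anomalous scores. The default weights here are illustrative, and the function name is hypothetical.

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_scores: Sequence[float],
    normal_scores: Sequence[float],
    margin: float = 1.0,
    lam_smooth: float = 8e-5,  # illustrative weight for the smoothness term
    lam_sparse: float = 8e-5,  # illustrative weight for the sparsity term
) -> float:
    """Hinge ranking loss between one anomalous bag and one normal bag.

    Each argument holds one predicted anomaly score per video segment.
    """
    # The top anomalous segment should beat the top normal segment by `margin`.
    hinge = max(0.0, margin - max(anomalous_scores) + max(normal_scores))
    # Temporal smoothness: neighbouring segments should have similar scores.
    smooth = sum(
        (a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:])
    )
    # Sparsity: anomalies should occupy only a few segments.
    sparse = sum(anomalous_scores)
    return hinge + lam_smooth * smooth + lam_sparse * sparse


if __name__ == "__main__":
    # Made-up segment scores for one anomalous and one normal video.
    print(mil_ranking_loss([0.1, 0.9, 0.2], [0.1, 0.3, 0.2], lam_smooth=0.0, lam_sparse=0.0))
```

In training, this value would be averaged over many (anomalous, normal) bag pairs per batch and minimized with respect to the scoring network's parameters.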

Image Datasets

  • Count: 14 datasets
  • Application domains: Defect detection, novelty detection, lesion detection in medical images, anomaly segmentation in autonomous driving scenarios.
  • Details: Available in the related publications.
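
Image-level results on defect and novelty detection benchmarks are commonly reported as the area under the ROC curve (AUROC). A minimal, dependency-free sketch of AUROC via the rank-sum (Mann-Whitney U) identity, counting tied scores as one half; it assumes label 1 marks an anomaly and that higher scores mean "more anomalous". The scores in the usage line are made up.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve; label 1 marks an anomaly.

    Uses AUROC = (R_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg),
    where R_pos is the sum of the tie-averaged ranks of the anomalies.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one anomaly and one normal sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The rank formulation runs in O(n log n) and equals the probability that a randomly chosen anomaly scores higher than a randomly chosen normal sample.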

Graph Datasets

  • Graph‑level anomaly detection: 16 datasets (e.g., PROTEINS_full, ENZYMES, AIDS, …).
  • Node‑level anomaly detection: 4 datasets (YelpRes, YelpHotel, YelpNYC, Amazon).
  • Details: Provided in the related literature.
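
Graph-level datasets such as PROTEINS_full, ENZYMES and AIDS come from the TUDataset collection, which stores each dataset as plain-text files: `DS_A.txt` (one `row, col` edge per line, using 1-based global node ids), `DS_graph_indicator.txt` (line i gives the graph id of node i) and `DS_graph_labels.txt` (one label per graph). A minimal sketch, assuming the copies distributed here keep that layout, that splits the global edge list into per-graph adjacency lists; file contents are passed as strings to keep the example self-contained, and `parse_tu` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_tu(
    a_txt: str, indicator_txt: str, labels_txt: str
) -> Tuple[Dict[int, Dict[int, List[int]]], List[int]]:
    """Split a TUDataset-style edge list into one adjacency list per graph.

    Returns ({graph_id: {node_id: [neighbour ids]}}, graph_labels), where
    graph ids and node ids are the 1-based ids used in the files.
    """
    # Line i (1-based) of the indicator file gives the graph of node i.
    node_graph = {i + 1: int(tok) for i, tok in enumerate(indicator_txt.split())}
    graphs: Dict[int, Dict[int, List[int]]] = defaultdict(dict)
    for node, g in node_graph.items():
        graphs[g][node] = []  # register every node, including isolated ones
    for line in a_txt.strip().splitlines():
        u, v = (int(t) for t in line.split(","))
        # Edges are added exactly as listed in the file (u -> v).
        graphs[node_graph[u]][u].append(v)
    labels = [int(t) for t in labels_txt.split()]
    return dict(graphs), labels


if __name__ == "__main__":
    g, y = parse_tu("1, 2\n2, 1\n3, 4\n4, 3\n", "1\n1\n2\n2\n2\n", "0\n1\n")
    print(len(g), y)  # 2 [0, 1]
```

For graph-level anomaly detection, the graph labels would then be mapped to normal/anomalous according to the protocol of the cited work.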

Time‑Series Datasets

  • Common datasets: ASD, SMD, SWaT, WaQ, DSADS, Epilepsy
  • Details: Refer to the associated papers.
  • Note: Some existing datasets have known issues that affect how they should be used; consult the cited works for best practices.
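
Detectors for multivariate series such as SMD or SWaT are often trained on fixed-length sliding windows rather than single time steps. A minimal sketch, assuming the series is a list of per-timestamp feature vectors with one 0/1 label per timestamp; the window length and stride are illustrative, and labelling a window as anomalous when any point inside it is anomalous is one common convention, not necessarily the one used in the cited works.

```python
from typing import List, Sequence, Tuple


def sliding_windows(
    series: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int,
    stride: int = 1,
) -> Tuple[List[List[Sequence[float]]], List[int]]:
    """Cut a multivariate series into overlapping windows.

    A window is labeled 1 if any timestamp inside it is anomalous.
    Trailing timestamps that do not fill a whole window are dropped.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows, window_labels = [], []
    for start in range(0, len(series) - window + 1, stride):
        windows.append(list(series[start:start + window]))
        window_labels.append(int(any(labels[start:start + window])))
    return windows, window_labels


if __name__ == "__main__":
    # Made-up univariate series of 6 timestamps with one anomaly at t = 3.
    series = [[float(t)] for t in range(6)]
    W, y = sliding_windows(series, [0, 0, 0, 1, 0, 0], window=3)
    print(len(W), y)  # 4 [0, 1, 1, 1]
```

Because one anomalous point marks every window that covers it, window-level labels overstate the anomaly rate; evaluation should map window scores back to timestamps when comparing against point-level ground truth.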

Topics

Anomaly Detection
Dataset

Source

Organization: github

Created: 10/16/2020
