ADRepository

This repository offers a suite of real‑world datasets for anomaly detection, covering tabular data (categorical and numerical), time‑series, graph, image, and video data. The datasets support deep anomaly detection research; if you use them, cite the associated publications.

Source: GitHub
Created: Oct 16, 2020
Updated: May 23, 2024

Dataset Overview

Numerical Datasets

  • Source: The KDD 2019 paper that introduced DevNet
  • Count: 7 datasets
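  • Basic statistics:
    • donors: data size 619,326; dimensionality 10
    • census: data size 299,285; dimensionality 500
    • fraud: data size 284,807; dimensionality 29
    • celeba: data size 202,599; dimensionality 39
    • backdoor: data size 95,329; dimensionality 196
    • campaign: data size 41,188; dimensionality 62
    • thyroid: data size 7,200; dimensionality 21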
  • Details & benchmarks: Available in the DevNet paper; code at the DevNet GitHub repository.
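
After downloading one of these files, the listed data size and dimensionality can be checked against what is actually loaded. The sketch below is a minimal example under stated assumptions: it assumes a CSV file with a header row, numeric feature columns, and a binary label in the last column (1 = anomaly). That layout, the helper name `load_numeric_csv`, and the sample values are illustrative assumptions and are not confirmed by this page.

```python
import csv
import io


def load_numeric_csv(handle):
    """Read a CSV with a header row; the last column is ASSUMED to be the label."""
    reader = csv.reader(handle)
    header = next(reader)
    features, labels = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        features.append([float(v) for v in row[:-1]])
        labels.append(int(float(row[-1])))
    return header[:-1], features, labels


# Tiny in-memory stand-in for a real file (hypothetical values).
sample = io.StringIO(
    "f1,f2,f3,class\n"
    "0.1,0.5,0.3,0\n"
    "0.2,0.4,0.1,0\n"
    "0.9,0.8,0.7,1\n"
    "0.3,0.6,0.2,0\n"
)
names, X, y = load_numeric_csv(sample)

data_size = len(X)              # number of rows ("Data size")
dimensionality = len(names)     # number of feature columns
anomaly_ratio = sum(y) / data_size
print(data_size, dimensionality, anomaly_ratio)
```

For a real file, pass `open(path, newline="")` in place of the `io.StringIO` object. For thyroid, for example, the values to compare against are a data size of 7,200 and a dimensionality of 21.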

Categorical Datasets

  • Count: 14 datasets
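  • Basic statistics:
    • bank: data size 41,188; dimensionality 10; anomaly class: yes
    • census: data size 299,285; dimensionality 33; anomaly class: 50K+
    • AID362: data size 4,279; dimensionality 114; anomaly class: active
    • w7a: data size 49,749; dimensionality 300; anomaly class: yes
    • CMC: data size 1,473; dimensionality 8; anomaly class: child>10
    • APAS: data size 12,695; dimensionality 64; anomaly class: train
    • CelebA: data size 202,599; dimensionality 39; anomaly class: bald
    • Chess: data size 28,056; dimensionality 6; anomaly class: zero
    • AD: data size 3,279; dimensionality 1,555; anomaly class: ad.
    • Solar‑flare: data size 1,066; dimensionality 11; anomaly class: F
    • Probe: data size 64,759; dimensionality 6; anomaly class: attack
    • U2R: data size 60,821; dimensionality 6; anomaly class: attack
    • R10: data size 12,897; dimensionality 100; anomaly class: corn
    • CoverType: data size 581,012; dimensionality 44; anomaly class: cottonwood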
  • Details & benchmarks: Provided in the corresponding papers.
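
The "Anomaly class" entry appears to name the class value treated as anomalous in each dataset (for example `yes` for bank and `50K+` for census). The sketch below shows one way to turn a categorical class column into binary anomaly labels. The `ANOMALY_CLASS` mapping copies a few entries from the statistics above; the helper name `to_binary_labels` and the normal-class value `no` in the toy data are hypothetical.

```python
# Anomaly class per dataset, copied from the statistics above (subset only).
ANOMALY_CLASS = {
    "bank": "yes",
    "census": "50K+",
    "AID362": "active",
    "Chess": "zero",
}


def to_binary_labels(class_values, dataset):
    """Map each class value to 1 (anomaly) or 0 (normal) for the given dataset."""
    anomaly = ANOMALY_CLASS[dataset]
    return [1 if value == anomaly else 0 for value in class_values]


# Toy class column; "no" is a made-up normal-class value for illustration.
bank_classes = ["no", "no", "yes", "no", "no"]
bank_labels = to_binary_labels(bank_classes, "bank")
anomaly_ratio = sum(bank_labels) / len(bank_labels)

print(bank_labels)    # [0, 0, 1, 0, 0]
print(anomaly_ratio)  # 0.2
```

Keeping the anomaly class in a lookup table rather than hard-coding it per script makes it harder to flip the normal and anomalous classes by accident when switching datasets.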

Video Datasets

  • Datasets: ShanghaiTech Campus, UCF‑Crime
  • Characteristics: Features are extracted with an I3D backbone, which makes the datasets suitable for weakly‑supervised video anomaly detection.
  • Details: Available in the related publications.

Image Datasets

  • Count: 14 datasets
  • Application domains: Defect detection, novelty detection, lesion detection in medical images, and anomaly segmentation in autonomous driving scenarios.
  • Details: Available in the related publications.

Graph Datasets

  • Graph‑level anomaly detection: 16 datasets (e.g., PROTEINS_full, ENZYMES, AIDS, …).
  • Node‑level anomaly detection: 4 datasets (YelpRes, YelpHotel, YelpNYC, Amazon).
  • Details: Provided in the related literature.

Time‑Series Datasets

  • Common datasets: ASD, SMD, SWaT, WaQ, DSADS, Epilepsy
  • Details: Refer to the associated papers.
  • Note: Existing time‑series anomaly detection datasets have known usage issues; consult the cited works before relying on them as benchmarks.