MMAD

MMAD is a comprehensive benchmark for multimodal large language models (MLLMs) in industrial anomaly detection. It contains questions, images, and descriptive text. All questions are multiple‑choice and have been manually verified. Images come from multiple source datasets and keep their ground‑truth masks, so that the segmentation performance of MLLMs can be evaluated in the future. The descriptive text is mostly of good quality but has not been manually verified, so use it with caution. MMAD aims to evaluate how current MLLMs perform in industrial quality inspection and to identify the key challenges in industrial anomaly detection.

Updated 10/30/2024
huggingface

Description

MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Dataset Overview

  • Task Type: Question Answering
  • Tags:
    • Anomaly Detection
    • Multimodal Large Language Model (MLLM)
  • Scale: 10K<n<100K
  • License: MIT

Dataset Content

  • Content: Includes questions, images, and descriptive text.
  • Questions: All questions are in multiple‑choice format and have been manually verified, including options and answers.
  • Images: Image sources include the following datasets:
    • DS-MVTec
    • MVTec-AD
    • MVTec-LOCO
    • VisA
    • GoodsAD
    Images retain ground‑truth mask format to facilitate future evaluation of segmentation performance of multimodal large language models.
  • Descriptive Text: Most images have a corresponding text file in the same folder containing relevant descriptions. Since this is not the primary focus of the benchmark, it has not been manually verified. Although most descriptions are of good quality, use with caution.
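Pairing an image with its description can be sketched as follows. This is a minimal illustration, not the dataset's official loader: the listing only says the text file sits in the same folder, so the naming rule used here (same file stem, `.txt` suffix) and the helper name `find_description` are assumptions.

```python
from pathlib import Path
from typing import Optional


def find_description(image_path: Path) -> Optional[str]:
    """Return the description stored next to an image, or None if there is none."""
    # Assumption: the text file shares the image's stem and ends in ".txt".
    text_path = image_path.with_suffix(".txt")
    if not text_path.is_file():
        # Only "most" images have a description, so a miss is a normal case.
        return None
    return text_path.read_text(encoding="utf-8").strip()
```

Returning `None` instead of raising keeps the caller's loop simple when an image has no description. Because the descriptions are not manually verified, treat the returned text as auxiliary context rather than ground truth.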

Dataset Objectives

  • Evaluate the performance of current multimodal large language models in industrial quality inspection.
  • Identify the multimodal large language models that perform best in industrial anomaly detection.
  • Recognize key challenges for multimodal large language models in industrial anomaly detection.
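Since every question is multiple‑choice with a known answer, one plausible way to compare models is plain accuracy per model. The sketch below assumes that scoring rule; the listing does not describe the benchmark's own evaluation procedure. The record layout `(group, predicted, answer)` and the function name `accuracy_by_group` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy per group from (group, predicted, answer) records.

    `group` can be a model name, a source dataset, or a question type.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, predicted, answer in records:
        total[group] += 1
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if predicted.strip().upper() == answer.strip().upper():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical predictions, for illustration only (not real benchmark results).
sample = [
    ("model_a", "A", "A"),
    ("model_a", "b", "B"),
    ("model_a", "C", "D"),
    ("model_a", "D", "D"),
    ("model_b", "A", "B"),
    ("model_b", "C", "C"),
]
scores = accuracy_by_group(sample)
```

Grouping by source dataset or question type instead of by model name gives a rough view of where a model struggles, which relates to the third objective above.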

Evaluation Method

Citation

@inproceedings{Jiang2024MMADTF,
  title={MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection},
  author={Xi Jiang and Jian Li and Hanqiu Deng and Yong Liu and Bin-Bin Gao and Yifeng Zhou and Jialin Li and Chengjie Wang and Feng Zheng},
  year={2024},
  journal={arXiv preprint arXiv:2410.09453},
}

Topics

Industrial Anomaly Detection
Multimodal Large Language Models

Source

Organization: huggingface

Created: 10/17/2024
