MMAD
MMAD is a comprehensive benchmark for multimodal large language models (MLLMs) in industrial anomaly detection, containing questions, images, and descriptive text. All questions are presented in multiple-choice format and have been manually verified. Images come from multiple source datasets and retain their ground-truth masks to support future evaluation of MLLM segmentation performance. The descriptive text is mostly of good quality but has not been manually verified, so use it with caution. MMAD aims to evaluate how well current MLLMs perform in industrial quality inspection and to identify the key challenges they face in industrial anomaly detection.
Description
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Dataset Overview
- Task Type: Question Answering
- Tags:
- Anomaly Detection
- Multimodal Large Language Model (MLLM)
- Scale: 10K<n<100K
- License: MIT
Dataset Content
- Content: Includes questions, images, and descriptive text.
- Questions: All questions are in multiple‑choice format and have been manually verified, including options and answers.
- Images: Image sources include the following datasets:
- DS-MVTec
- MVTec-AD
- MVTec-LOCO
- VisA
- GoodsAD
Images retain their ground-truth masks to facilitate future evaluation of the segmentation performance of multimodal large language models.
- Descriptive Text: Most images have a corresponding text file in the same folder containing relevant descriptions. Because the descriptions are not the primary focus of the benchmark, they have not been manually verified. Although most are of good quality, use them with caution.
Dataset Objectives
- Evaluate the performance of current multimodal large language models in industrial quality inspection.
- Identify the multimodal large language models that perform best in industrial anomaly detection.
- Recognize key challenges for multimodal large language models in industrial anomaly detection.
Evaluation Method
- Please refer to the evaluation/examples folder in the GitHub repository.
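The repository's scripts are the authoritative evaluation method; as a simplified stand-in, scoring multiple-choice answers reduces to comparing the model's chosen option letter against the ground truth:

```python
def choice_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of multiple-choice questions answered correctly.

    `predictions` and `answers` are parallel lists of option letters
    (e.g. "A"). Comparison is case-insensitive and ignores whitespace.
    This is an illustrative sketch, not the repository's actual script.
    """
    if not answers:
        return 0.0
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)
```

In practice a parsing step is also needed to extract the option letter from the model's free-form response before scoring.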
Citation
```bibtex
@inproceedings{Jiang2024MMADTF,
  title={MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection},
  author={Xi Jiang and Jian Li and Hanqiu Deng and Yong Liu and Bin-Bin Gao and Yifeng Zhou and Jialin Li and Chengjie Wang and Feng Zheng},
  year={2024},
  journal={arXiv preprint arXiv:2410.09453},
}
```
Source
Organization: huggingface
Created: 10/17/2024