MMAD
MMAD is a comprehensive benchmark for multimodal large language models (MLLMs) in industrial anomaly detection, containing questions, images, and descriptive text. All questions are presented in multiple-choice format and have been manually verified. Images come from multiple source datasets and retain their ground-truth masks to support future evaluation of MLLM segmentation performance. The descriptive text is mostly of good quality but has not been manually verified, so use it with caution. MMAD aims to evaluate how well current MLLMs perform in industrial quality inspection and to identify the key challenges they face in industrial anomaly detection.
Description
MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Dataset Overview
- Task Type: Question Answering
- Tags:
- Anomaly Detection
- Multimodal Large Language Model (MLLM)
- Scale: 10K<n<100K
- License: MIT
Dataset Content
- Content: Includes questions, images, and descriptive text.
- Questions: All questions are in multiple‑choice format and have been manually verified, including options and answers.
- Images: Image sources include the following datasets:
- DS-MVTec
- MVTec-AD
- MVTec-LOCO
- VisA
- GoodsAD
Images retain their ground-truth masks to facilitate future evaluation of the segmentation performance of multimodal large language models.
- Descriptive Text: Most images have a corresponding text file in the same folder containing relevant descriptions. Because the descriptions are not the primary focus of the benchmark, they have not been manually verified. Although most are of good quality, use them with caution.
Dataset Objectives
- Evaluate the performance of current multimodal large language models in industrial quality inspection.
- Identify the multimodal large language models that perform best in industrial anomaly detection.
- Recognize key challenges for multimodal large language models in industrial anomaly detection.
Evaluation Method
- Please refer to the evaluation/examples folder in the GitHub repository.
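The repository's scripts are the authoritative evaluation method; as a simplified stand-in, scoring multiple-choice answers reduces to comparing the model's chosen option letter against the ground truth:

```python
def choice_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Fraction of multiple-choice questions answered correctly.

    `predictions` and `answers` are parallel lists of option letters
    (e.g. "A"). Comparison is case-insensitive and ignores whitespace.
    This is an illustrative sketch, not the repository's actual script.
    """
    if not answers:
        return 0.0
    correct = sum(p.strip().upper() == a.strip().upper()
                  for p, a in zip(predictions, answers))
    return correct / len(answers)
```

In practice a parsing step is also needed to extract the option letter from the model's free-form response before scoring.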
Citation
```bibtex
@inproceedings{Jiang2024MMADTF,
  title={MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection},
  author={Xi Jiang and Jian Li and Hanqiu Deng and Yong Liu and Bin-Bin Gao and Yifeng Zhou and Jialin Li and Chengjie Wang and Feng Zheng},
  year={2024},
  journal={arXiv preprint arXiv:2410.09453},
}
```
Source
Organization: huggingface
Created: 10/17/2024