Multimodal IFT and PT Dataset
This dataset is the first large-scale multimodal instruction fine-tuning (IFT) and pre-training (PT) dataset proposed for table understanding.
Dataset Overview
Dataset Name
MMTab
Dataset Description
MMTab is the first open-source large-scale multimodal table-understanding dataset, designed to support training and evaluating multimodal large language models (MLLMs) on table-understanding tasks. It is built from 14 public table datasets spanning eight domains; raw textual tables are rendered as table images with diverse structures and styles, and every task sample is converted into a unified multimodal instruction-tuning triple of <table image, input request, output response>.
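To make the triple format concrete, here is a minimal sketch of what one sample could look like. The field names and the example question are illustrative assumptions, not the dataset's actual schema:

```python
# A minimal sketch of a single sample in the unified
# <table image, input request, output response> format.
# All field names and values are illustrative assumptions,
# not MMTab's actual schema.
sample = {
    "table_image": "images/table_00001.jpg",  # path to the rendered table image
    "input_request": "Based on the table in the image, which region had the highest revenue?",
    "output_response": "The North region had the highest revenue.",
}
print(sample["input_request"])
```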
Dataset Structure
MMTab is divided into three parts:
- MMTab-pre: 97K table images and 150K table recognition samples for pre-training.
- MMTab-instruct: 82K table images and 232K samples covering 14 table-based tasks for instruction tuning.
- MMTab-eval: 23K table images, with 45K samples from 17 held-in benchmarks and 4K samples from 7 held-out benchmarks, for evaluation.
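As a sketch of how one might iterate a split once the files are local: the layout below assumes a LLaVA-style annotation JSON paired with an image folder, which is an assumption about the release format rather than a documented contract, so check the actual release for the real file and field names.

```python
import json
from pathlib import Path

from PIL import Image

# Sketch of iterating one split, assuming one annotation JSON per split
# plus an image folder. The file name, directory layout, and field names
# below are assumptions; check the actual release.
data_dir = Path("MMTab")
records = json.loads((data_dir / "MMTab-instruct.json").read_text(encoding="utf-8"))

for rec in records[:3]:
    image = Image.open(data_dir / "images" / rec["image"])  # rendered table image
    request = rec["conversations"][0]["value"]              # input request
    response = rec["conversations"][1]["value"]             # expected output response
    print(image.size, request[:50], "->", response[:50])
```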
Dataset Use Cases
The dataset supports training and evaluation of multimodal table‑understanding models, especially those that directly leverage visual information to interpret tables.
Dataset Download
The dataset can be downloaded from the Hugging Face Hub.
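A minimal download sketch using the huggingface_hub client; the repo id below is an assumption, so substitute the id shown on the dataset's Hugging Face page:

```python
from huggingface_hub import snapshot_download

# Fetch a full local snapshot of the dataset repo from the Hugging Face Hub.
# The repo id is an assumption; replace it with the id listed on the
# dataset's Hugging Face page.
local_dir = snapshot_download(repo_id="SpursgoZmy/MMTab", repo_type="dataset")
print("Dataset downloaded to:", local_dir)
```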