Multimodal IFT and PT Dataset
This dataset is the first large-scale multimodal instruction fine-tuning (IFT) and pre-training (PT) dataset proposed for table understanding.
Dataset Overview
Dataset Name
MMTab
Dataset Description
MMTab is the first open-source large-scale multimodal table-understanding dataset, designed to support training and evaluating multimodal large language models (MLLMs) on table-understanding tasks. It is built from 14 public table datasets spanning eight domains; raw textual tables are rendered as table images with diverse structures and styles, and every task sample is converted into a unified multimodal instruction-tuning triple of <table image, input request, output response>.
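To make the triple format concrete, here is a minimal sketch of what one sample could look like. The field names and the example question are illustrative assumptions, not the dataset's actual schema:

```python
# A minimal sketch of a single sample in the unified
# <table image, input request, output response> format.
# All field names and values are illustrative assumptions,
# not MMTab's actual schema.
sample = {
    "table_image": "images/table_00001.jpg",  # path to the rendered table image
    "input_request": "Based on the table in the image, which region had the highest revenue?",
    "output_response": "The North region had the highest revenue.",
}
print(sample["input_request"])
```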
Dataset Structure
MMTab is divided into three parts:
- MMTab-pre: 97K table images and 150K table recognition samples for pre-training.
- MMTab-instruct: 82K table images and 232K samples covering 14 table-based tasks for instruction tuning.
- MMTab-eval: 23K table images, with 45K samples from 17 held-in benchmarks and 4K samples from 7 held-out benchmarks, for evaluation.
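As a sketch of how one might iterate a split once the files are local: the layout below assumes a LLaVA-style annotation JSON paired with an image folder, which is an assumption about the release format rather than a documented contract, so check the actual release for the real file and field names.

```python
import json
from pathlib import Path

from PIL import Image

# Sketch of iterating one split, assuming one annotation JSON per split
# plus an image folder. The file name, directory layout, and field names
# below are assumptions; check the actual release.
data_dir = Path("MMTab")
records = json.loads((data_dir / "MMTab-instruct.json").read_text(encoding="utf-8"))

for rec in records[:3]:
    image = Image.open(data_dir / "images" / rec["image"])  # rendered table image
    request = rec["conversations"][0]["value"]              # input request
    response = rec["conversations"][1]["value"]             # expected output response
    print(image.size, request[:50], "->", response[:50])
```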
Dataset Use Cases
The dataset supports training and evaluation of multimodal table‑understanding models, especially those that directly leverage visual information to interpret tables.
Dataset Download
The dataset can be downloaded from the Hugging Face Hub.
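A minimal download sketch using the huggingface_hub client; the repo id below is an assumption, so substitute the id shown on the dataset's Hugging Face page:

```python
from huggingface_hub import snapshot_download

# Fetch a full local snapshot of the dataset repo from the Hugging Face Hub.
# The repo id is an assumption; replace it with the id listed on the
# dataset's Hugging Face page.
local_dir = snapshot_download(repo_id="SpursgoZmy/MMTab", repo_type="dataset")
print("Dataset downloaded to:", local_dir)
```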