Multimodal IFT and PT Dataset
This dataset is the first large-scale multimodal instruction fine-tuning (IFT) and pre-training (PT) dataset proposed for table understanding.
Dataset Overview
Dataset Name: MMTab
Dataset Description
MMTab is the first open-source large-scale multimodal table-understanding dataset, designed to support the training and evaluation of multimodal large language models (MLLMs) on table-understanding tasks. It is built from 14 public table datasets spanning eight domains: raw textual tables are rendered into table images with diverse structures and styles, and all task samples are converted into a unified multimodal instruction-tuning format of <table image, input request, output response>.
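For illustration, a single sample in this <table image, input request, output response> format might be represented as a record along the following lines. The field names and values here are illustrative assumptions, not the dataset's exact schema; consult the released files for the real layout.

```python
# Illustrative sketch of one MMTab-style instruction-tuning sample.
# Field names ("image_path", "instruction", "response") and all values
# are assumptions for illustration only.
sample = {
    "image_path": "table_images/example_table.jpg",  # rendered table image
    "instruction": (
        "Based on the table image, what is the value in the "
        "'Revenue' column for the year 2020?"        # input request
    ),
    "response": "The revenue for 2020 is $4.2 million.",  # output response
}
```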
Dataset Structure
MMTab is divided into three parts:
- MMTab-pre: 150K table recognition samples over 97K table images, used for pre-training.
- MMTab-instruct: 232K samples covering 14 table-based tasks over 82K table images, used for instruction tuning.
- MMTab-eval: 45K samples from 17 held-in benchmarks and 4K samples from 7 held-out benchmarks, over 23K table images, used for evaluation.
Dataset Use Cases
The dataset supports training and evaluation of multimodal table‑understanding models, especially those that directly leverage visual information to interpret tables.
Dataset Download
The dataset can be downloaded from the Hugging Face Hub.
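As a minimal sketch, the files can be fetched with the huggingface_hub client. The repository id SpursgoZmy/MMTab is an assumption based on the public release; verify it on the dataset page before running.

```python
# Minimal download sketch using the huggingface_hub client.
# The repo_id below is an assumption; check the dataset page for the exact id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SpursgoZmy/MMTab",  # assumed repository id on the Hugging Face Hub
    repo_type="dataset",         # fetch a dataset repo, not a model repo
)
print(f"Dataset files downloaded to: {local_dir}")
```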
Source
Organization: GitHub
Created: 5/17/2024