
Multimodal IFT and PT Dataset

This dataset is the first large‑scale multimodal instruction fine‑tuning (IFT) and pre‑training (PT) dataset proposed for table understanding.

Updated 5/17/2024

Description

Dataset Overview

Dataset Name

MMTab

Dataset Description

MMTab is the first open‑source large‑scale multimodal table‑understanding dataset, designed to support training and evaluation of multimodal large language models (MLLMs) on table‑understanding tasks. It is built from 14 public table datasets spanning eight domains: raw textual tables are rendered into images with diverse structures and styles, and every task sample is converted into a unified multimodal instruction‑tuning format: <table image, input request, output response>.
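The unified triple format above can be sketched as a simple record. Note that the field names below (`table_image`, `input_request`, `output_response`) are illustrative assumptions for readability, not the dataset's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MMTabSample:
    # Hypothetical field names mirroring the unified
    # <table image, input request, output response> format.
    table_image: str      # path or ID of the rendered table image
    input_request: str    # natural-language instruction or question
    output_response: str  # expected model answer

# An example of what one instruction-tuning sample might look like.
sample = MMTabSample(
    table_image="table_00001.png",
    input_request="How many rows does this table contain?",
    output_response="The table contains 5 rows.",
)
print(sample.input_request)
```

Keeping all 14 task types in this single shape is what lets pre‑training, instruction tuning, and evaluation share one data pipeline.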

Dataset Structure

MMTab is divided into three parts:

  • MMTab‑pre: 97 K table images, 150 K table recognition samples for pre‑training.
  • MMTab‑instruct: 82 K table images, 232 K samples covering 14 basic table tasks for instruction tuning.
  • MMTab‑eval: 23 K table images, with 45 K samples over 17 held‑in benchmarks and 4 K samples over 7 held‑out benchmarks for evaluation.
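The split sizes listed above can be tallied with a quick sketch; the counts are taken directly from the list (K = thousand), and the dictionary layout is just shorthand:

```python
# Split sizes as stated in the dataset card (K = thousand).
splits = {
    "MMTab-pre":      {"images": 97_000, "samples": 150_000},
    "MMTab-instruct": {"images": 82_000, "samples": 232_000},
    # eval combines held-in (45K) and held-out (4K) samples
    "MMTab-eval":     {"images": 23_000, "samples": 45_000 + 4_000},
}

total_images = sum(s["images"] for s in splits.values())
total_samples = sum(s["samples"] for s in splits.values())
print(total_images, total_samples)  # 202000 431000
```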

Dataset Use Cases

The dataset supports training and evaluation of multimodal table‑understanding models, especially those that directly leverage visual information to interpret tables.

Dataset Download

The dataset can be downloaded from the Hugging Face Hub.


Topics

Multimodal Information Fusion
Table Understanding

Source

Organization: github

Created: 5/17/2024
