Datasets | JuheAPI

ds4sd/SynthTabNet_OTSL

Table Structure Recognition

Object Detection

This dataset converts the original SynthTabNet tables into the OTSL format for table-structure recognition. It comprises four parts of 150k tables each (600k in total). The parts differ in table appearance, size, structure, and content, and each part is split into training, validation, and test sets. Each record includes the cell content, the OTSL tokens, the original HTML structure, the restored HTML, the column count, the row count, and the table image. An OTSL vocabulary defines the cell token types. The dataset was converted and is maintained by IBM Research's Deep Search team.
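To make the relationship between the OTSL tokens and the row and column counts concrete, here is a minimal sketch that folds a flat OTSL token sequence into a rectangular grid. The token names (`fcel`, `ecel`, `lcel`, `ucel`, `xcel`, `nl`) are an assumption based on the published OTSL vocabulary and should be checked against the dataset card; the helper names `otsl_to_grid` and `grid_shape` are hypothetical and not part of the dataset.

```python
from typing import List, Tuple

# Assumed OTSL vocabulary (verify against the dataset card):
#   fcel = cell with content, ecel = empty cell,
#   lcel = merges with the cell to its left, ucel = merges with the cell above,
#   xcel = interior of a 2D span, nl = end of a table row.
OTSL_CELL_TOKENS = {"fcel", "ecel", "lcel", "ucel", "xcel"}
OTSL_NEWLINE = "nl"


def otsl_to_grid(tokens: List[str]) -> List[List[str]]:
    """Split a flat OTSL token list into rows and check that the grid is rectangular."""
    grid: List[List[str]] = []
    row: List[str] = []
    for tok in tokens:
        if tok == OTSL_NEWLINE:
            grid.append(row)
            row = []
        elif tok in OTSL_CELL_TOKENS:
            row.append(tok)
        else:
            raise ValueError(f"unknown OTSL token: {tok!r}")
    if row:  # tolerate a missing trailing "nl"
        grid.append(row)
    if len({len(r) for r in grid}) > 1:
        raise ValueError("OTSL rows have differing lengths")
    return grid


def grid_shape(grid: List[List[str]]) -> Tuple[int, int]:
    """Return (row count, column count) of the grid."""
    return (len(grid), len(grid[0]) if grid else 0)


# A 2x3 table: the first cell spans two columns, the last column spans two rows.
example = ["fcel", "lcel", "fcel", "nl",
           "fcel", "ecel", "ucel", "nl"]
print(grid_shape(otsl_to_grid(example)))
```

For a real record, the resulting shape could be compared with the stored row and column counts as a sanity check. The exact field names holding those values are not given here, so look them up on the dataset card before relying on them.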

Source: Hugging Face
