bsmock/ICDAR-2013.c
The ICDAR-2013.c dataset, released in 2023, is a fork of the original ICDAR-2013 dataset: a modified version produced by different authors. It includes manual corrections of minor annotation errors in the original data, plus automated fixes (e.g., normalization) that address over-segmentation and make the dataset more consistent with other table structure recognition (TSR) datasets such as PubTables-1M. For more details on this version and the manual corrections, refer to the associated paper.
ICDAR-2013.c Dataset
Overview
The ICDAR-2013.c dataset was released in 2023 and can be considered a modified version of the original ICDAR-2013 dataset. It contains manual corrections of minor annotation errors in the original data, as well as automated normalization that fixes over-segmentation issues and improves consistency with other TSR datasets such as PubTables-1M.
Content
- Manual Corrections: Small annotation errors in the original dataset are corrected manually.
- Automated Corrections: Normalization is applied to resolve over‑segmentation and increase alignment with other TSR datasets.
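To make "over-segmentation" concrete, the following is a toy sketch only; it is not the procedure used to build ICDAR-2013.c, which is described in the associated paper. It assumes one simple case: a header cell that logically spans several columns has been annotated as one text cell followed by blank cells, and normalization merges those blanks back into a single spanning cell. The function name `merge_blank_header_cells` and the cell dictionary layout are hypothetical.

```python
from typing import Dict, List, Union

Cell = Dict[str, Union[str, int]]


def merge_blank_header_cells(header_row: List[str]) -> List[Cell]:
    """Merge each run of blank cells into the cell to its left.

    Toy heuristic (an assumption for illustration, not the dataset's
    actual normalization): a blank cell directly after another cell is
    treated as an over-segmented piece of that cell.
    """
    cells: List[Cell] = []
    for col, text in enumerate(header_row):
        if text.strip() == "" and cells:
            # Blank cell: widen the previous cell instead of adding a new one.
            cells[-1]["col_span"] += 1
        else:
            # Non-blank cell (or a leading blank): start a new cell here.
            cells.append({"text": text.strip(), "col": col, "col_span": 1})
    return cells


# "Sales" and "Costs" were split into a text cell plus trailing blanks.
over_segmented = ["Region", "Sales", "", "", "Costs", ""]
for cell in merge_blank_header_cells(over_segmented):
    print(cell)
```

In this sketch the six annotated cells collapse to three: "Region" keeps a span of 1, "Sales" spans 3 columns, and "Costs" spans 2.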
Citation
If your research uses this dataset, please cite the following paper:
@inproceedings{smock2023aligning,
title={Aligning benchmark datasets for table structure recognition},
author={Smock, Brandon and Pesala, Rohith and Abraham, Robin},
booktitle={International Conference on Document Analysis and Recognition},
pages={371--386},
year={2023},
organization={Springer}
}
Original Dataset
The original ICDAR‑2013 dataset was released for the ICDAR 2013 Table Competition. The original dataset has no known license but is generally considered public domain, so we treat it as having no license restrictions.