SEMv3: A Fast and Robust Approach to Table Separation Line Detection

Authors: Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g. WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance.
Researcher Affiliation | Collaboration | Chunxia Qin1, Zhenrong Zhang1, Pengfei Hu1, Chenyu Liu2, Jiefeng Ma1 and Jun Du1; 1University of Science and Technology of China, 2iFLYTEK Research
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/Chunchunwumu/SEMv3.
Open Datasets | Yes | We evaluate the performance of our method on several public datasets. These datasets encompass a wide range of challenging scenarios related to table structure recognition. The ICDAR-2019 cTDaR Historical [Gao et al., 2019] dataset contains 600 training samples and 150 testing samples from archival historical documents. WTW [Long et al., 2021] contains 14,581 wired table images collected from real business scenarios. iFLYTAB [Zhang et al., 2024] contains 12,103 training samples and 5,188 testing samples.
Dataset Splits | No | The paper reports training and testing samples for datasets such as ICDAR-2019 cTDaR Historical and iFLYTAB, but does not explicitly describe a validation split.
Hardware Specification | Yes | All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24 GB of memory.
Software Dependencies | Yes | All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24 GB of memory.
Experiment Setup | Yes | Models are trained end-to-end for 100 epochs. We use Adam [Kingma and Ba, 2014] as the optimizer. The initial learning rate is 1 × 10⁻⁴ and is decayed to 1 × 10⁻⁶ following the cosine annealing strategy [Loshchilov and Hutter, 2016]. All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24 GB of memory. During training, the ground-truth grid box coordinates are used when extracting grid representation features with RoIAlign. ... The channel number C of the feature map F is 256, the grid feature channel number Cg is 512, and the sampling step size t is 32.
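
As a rough aid to reproduction, the reported schedule and grid-feature extraction can be sketched in PyTorch as follows. This is a minimal sketch under stated assumptions, not the authors' code: the model is a placeholder module, the grid box coordinates are illustrative values, and the RoIAlign output size is not given in the excerpt, so 7×7 is assumed.

```python
# Minimal sketch of the reported training setup (assumptions noted inline).
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision.ops import roi_align

model = nn.Conv2d(3, 256, kernel_size=3, padding=1)  # placeholder for SEMv3; C = 256 as reported

# Adam with initial lr 1e-4, cosine-annealed to 1e-6 over the 100 training epochs.
optimizer = Adam(model.parameters(), lr=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)

# Grid features are pooled from the C = 256 feature map F with RoIAlign, using
# ground-truth grid boxes during training; these box values are illustrative only.
feature_map = torch.randn(1, 256, 64, 64)                    # stand-in for F
gt_grid_boxes = torch.tensor([[0.0, 4.0, 4.0, 20.0, 12.0]])  # (batch_idx, x1, y1, x2, y2)
grid_features = roi_align(feature_map, gt_grid_boxes,
                          output_size=(7, 7),                # assumed, not stated in the excerpt
                          spatial_scale=1.0)

for epoch in range(100):
    # ... forward pass, loss computation, and optimizer.step() would go here ...
    scheduler.step()
```

CosineAnnealingLR with T_max=100 and eta_min=1e-6 reproduces the stated decay from 1 × 10⁻⁴ to 1 × 10⁻⁶ over the 100 epochs; everything else (loss, data loading, the KOR module) is outside the quoted setup and is left as placeholders.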