SEMv3: A Fast and Robust Approach to Table Separation Line Detection
Authors: Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g., WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance. |
| Researcher Affiliation | Collaboration | Chunxia Qin¹, Zhenrong Zhang¹, Pengfei Hu¹, Chenyu Liu², Jiefeng Ma¹ and Jun Du¹ (¹University of Science and Technology of China, ²iFLYTEK Research) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Chunchunwumu/SEMv3. |
| Open Datasets | Yes | We evaluate the performance of our method on several public datasets. These datasets encompass a wide range of challenging scenarios related to table structure recognition. ICDAR-2019 cTDaR Historical [Gao et al., 2019] dataset contains 600 training samples and 150 testing samples from archival historical documents. WTW [Long et al., 2021] contains 14,581 wired table images collected from real business scenarios. iFLYTAB [Zhang et al., 2024] contains 12,103 training samples and 5,188 testing samples. |
| Dataset Splits | No | The paper mentions training and testing samples for datasets like ICDAR-2019 cTDaR Historical and iFLYTAB, but does not explicitly provide details about a validation dataset split. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24GB RAM memory. |
| Software Dependencies | Yes | All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24GB RAM memory. |
| Experiment Setup | Yes | Models are trained end-to-end for 100 epochs. We use Adam [Kingma and Ba, 2014] as the optimizer. The initial learning rate is 1 × 10⁻⁴, and the learning rate is adjusted to 1 × 10⁻⁶ according to the cosine annealing strategy [Loshchilov and Hutter, 2016]. All experiments are implemented in PyTorch v1.7.1 and conducted on 4 Nvidia Tesla V100 GPUs with 24GB RAM memory. During the training process, the ground-truth grid box coordinates are used when extracting grid representation features using RoIAlign. ... The channel number C of the feature F is 256, and the grid feature channel number Cg is 512. The sampling step size t is 32. (A minimal configuration sketch follows the table.) |
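
For reference, below is a minimal PyTorch sketch of the training configuration quoted above: Adam with an initial learning rate of 1 × 10⁻⁴ annealed to 1 × 10⁻⁶ over 100 epochs via cosine annealing, and grid representation features pooled with RoIAlign from a C = 256 feature map. The backbone stand-in, the dummy tensors, and the 7×7 RoIAlign output size are illustrative assumptions, not values taken from the SEMv3 paper or repository.

```python
# Minimal sketch of the reported training setup, assuming a generic PyTorch
# model and training loop; the modules below are placeholders, not SEMv3 code.
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torchvision.ops import roi_align

# Stand-in for the SEMv3 backbone producing a feature map F with C = 256 channels.
model = torch.nn.Conv2d(3, 256, kernel_size=3, padding=1)

optimizer = Adam(model.parameters(), lr=1e-4)                       # initial LR 1e-4
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)   # anneal to 1e-6 over 100 epochs

for epoch in range(100):
    # ... per-batch forward pass, loss computation, optimizer.step() ...
    scheduler.step()

# During training, grid representation features are pooled with RoIAlign from
# the feature map F using ground-truth grid box coordinates.
features = torch.randn(1, 256, 128, 128)                 # dummy F (assumed spatial size)
grid_boxes = torch.tensor([[0, 10.0, 10.0, 60.0, 40.0]]) # (batch_idx, x1, y1, x2, y2)
grid_feats = roi_align(features, grid_boxes, output_size=(7, 7), spatial_scale=1.0)
```

The `eta_min=1e-6` argument reproduces the reported final learning rate; the exact schedule, RoIAlign output size, and feature-map stride in the released code may differ.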