LORE: Logical Location Regression Network for Table Structure Recognition

Authors: Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard benchmarks demonstrate that LORE consistently outperforms prior arts.
Researcher Affiliation | Collaboration | Hangdi Xing*1, Feiyu Gao*3, Rujiao Long3, Jiajun Bu1, Qi Zheng3, Liangcheng Li1, Cong Yao3, Zhi Yu2. 1 Zhejiang Provincial Key Laboratory of Service Robot, College of Computer Science, Zhejiang University; 2 Zhejiang Provincial Key Laboratory of Service Robot, School of Software Technology, Zhejiang University; 3 DAMO Academy, Alibaba Group, Hangzhou, China.
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.
Open Datasets | Yes | We evaluate LORE on a wide range of benchmarks, including tables in digital-born documents, i.e., ICDAR-2013 (Göbel et al. 2013), SciTSR-comp (Chi et al. 2019), PubTabNet (Zhong, ShafieiBavani, and Jimeno Yepes 2020), TableBank (Li et al. 2020) and TableGraph-24K (Xue et al. 2021), as well as tables from scanned documents and photos, i.e., ICDAR-2019 (Gao et al. 2019) and WTW (Long et al. 2021).
Dataset Splits | Yes | It should be noted that ICDAR-2013 provides no training data, so we extend it to the partial version for cross-validation following previous works (Raja, Mondal, and Jawahar 2020; Liu et al. 2022, 2021).
Hardware Specification | Yes | All the experiments are performed on the platform with 4 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using a DLA-34 backbone, but does not specify software versions for libraries like PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | The model is trained for 100 epochs, and the initial learning rate is chosen as 1 × 10⁻⁴, decaying to 1 × 10⁻⁵ and 1 × 10⁻⁶ at the 70th and 90th epochs for all benchmarks. ... We use the DLA-34 (Yu et al. 2018) backbone, the output stride R = 4 and the number of channels d = 256. ... The number of attention layers is set to 3 for both the base and the stacking regressors.
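
For concreteness, the schedule quoted in the Experiment Setup row (100 epochs, learning rate 1 × 10⁻⁴ decaying to 1 × 10⁻⁵ and 1 × 10⁻⁶ at epochs 70 and 90) amounts to a step decay by a factor of 10 at two milestones. The sketch below expresses it in PyTorch; the Adam optimizer and the stand-in module are assumptions for illustration only, since the actual LORE model (DLA-34 backbone, output stride R = 4, d = 256 channels) is defined in the authors' repository.

```python
import torch

# Stand-in module for illustration only; the real model is LORE with a
# DLA-34 backbone, output stride R = 4 and d = 256 channels (per the paper).
model = torch.nn.Linear(256, 256)

# Optimizer choice is an assumption; the paper does not name it in this quote.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Quoted schedule: 1e-4 -> 1e-5 at epoch 70, -> 1e-6 at epoch 90,
# i.e. a multiplicative decay of 0.1 at each milestone.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[70, 90], gamma=0.1)

for epoch in range(100):  # "The model is trained for 100 epochs"
    # ... one training epoch over table images (omitted) ...
    scheduler.step()
```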