Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

Authors: Huawen Shen, Xiang Gao, Jin Wei, Liang Qiao, Yu Zhou, Qiang Li, Zhanzhan Cheng

IJCAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks." "Our models are conducted on three popular public benchmarks, including PubTabNet [Zhong et al., 2020], SciTSR [Chi et al., 2019] and SynthTabNet [Nassar et al., 2022] to verify the effectiveness of our model." |
| Researcher Affiliation | Collaboration | 1 Institute of Information Engineering, Chinese Academy of Sciences; 2 School of Cyber Security, University of Chinese Academy of Sciences; 3 Hikvision Research Institute, China; 4 School of Information and Communication Engineering, Communication University of China. {shenhuawen, zhouyu, liqiang}@iie.ac.cn, gaoxiang181@mails.ucas.ac.cn, weijin@cuc.edu.cn, {qiaoliang6, chengzhanzhan}@hikvision.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | "Our models are conducted on three popular public benchmarks, including PubTabNet [Zhong et al., 2020], SciTSR [Chi et al., 2019] and SynthTabNet [Nassar et al., 2022] to verify the effectiveness of our model." |
| Dataset Splits | Yes | "PubTabNet... contains 500,777 training images, 9,115 validating images, and 9,138 testing images." "SynthTabNet... all images are divided into train, test and val splits (80%, 10%, 10%)." |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | "We set H = W = 960, i.e., rescale all training and testing table images to 960×960 resolution, the resolution of the final extracted CNN feature maps are 30×30. We set C = 512, i.e., the feature dimension at all network modules are fixed at 512. The sequence length of the Transformer encoder is 900, which is in line with CNN feature map size. The row decoder sequence length Lrow is set to be 50, and the cell decoder sequence length Lcell is set to 500." |
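The numbers quoted in the Experiment Setup row are internally consistent, and the relationship can be sketched in a few lines. Note the stride-32 backbone is an assumption on our part (960 / 32 = 30 matches the reported 30×30 feature maps; the paper excerpt above does not name the CNN), as are the variable names:

```python
# Sketch of the image/feature geometry reported in the paper's experiment setup.
# STRIDE = 32 is an assumed total downsampling factor of the CNN backbone,
# chosen because it reproduces the reported 30x30 feature map.

H = W = 960                  # input table images rescaled to 960x960
STRIDE = 32                  # assumed backbone downsampling factor
C = 512                      # feature dimension at all network modules

feat_h, feat_w = H // STRIDE, W // STRIDE   # final CNN feature map: 30x30
encoder_seq_len = feat_h * feat_w           # Transformer encoder length: 900

L_ROW, L_CELL = 50, 500      # row / cell decoder sequence lengths

assert (feat_h, feat_w) == (30, 30)
assert encoder_seq_len == 900
```

This is only a consistency check of the reported hyperparameters, not a reimplementation of the paper's method.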