Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables
Authors: Huawen Shen, Xiang Gao, Jin Wei, Liang Qiao, Yu Zhou, Qiang Li, Zhanzhan Cheng
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method achieves competitive performance on three public benchmarks. Our models are conducted on three popular public benchmarks, including PubTabNet [Zhong et al., 2020], SciTSR [Chi et al., 2019] and SynthTabNet [Nassar et al., 2022] to verify the effectiveness of our model. |
| Researcher Affiliation | Collaboration | 1Institute of Information Engineering, Chinese Academy of Sciences 2School of Cyber Security, University of Chinese Academy of Sciences 3Hikvision Research Institute, China 4School of Information and Communication Engineering, Communication University of China {shenhuawen, zhouyu, liqiang}@iie.ac.cn, gaoxiang181@mails.ucas.ac.cn, weijin@cuc.edu.cn, {qiaoliang6, chengzhanzhan}@hikvision.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Our models are conducted on three popular public benchmarks, including PubTabNet [Zhong et al., 2020], SciTSR [Chi et al., 2019] and SynthTabNet [Nassar et al., 2022] to verify the effectiveness of our model. |
| Dataset Splits | Yes | PubTabNet...contains 500,777 training images, 9,115 validating images, and 9,138 testing images. SynthTabNet...all images are divided into train, test and val splits (80%, 10%, 10%). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We set H = W = 960, i.e., rescale all training and testing table images to 960×960 resolution; the resolution of the final extracted CNN feature maps is 30×30. We set C = 512, i.e., the feature dimension at all network modules is fixed at 512. The sequence length of the Transformer encoder is 900, which is in line with the CNN feature map size. The row decoder sequence length Lrow is set to 50, and the cell decoder sequence length Lcell is set to 500. |
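The reported setup is internally consistent: a 960×960 input with a 32× CNN downsampling (an assumption, implied by 960 / 32 = 30) yields a 30×30 feature map, i.e., 30² = 900 encoder tokens. A minimal sketch of these hyperparameters, with hypothetical names not taken from the paper's (unreleased) code:

```python
# Hypothetical config sketch of the reported setup; the class and field
# names are illustrative assumptions, not the authors' implementation.
from dataclasses import dataclass


@dataclass
class TSRConfig:
    image_size: int = 960    # H = W = 960
    cnn_stride: int = 32     # assumed backbone downsampling (960 / 32 = 30)
    feature_dim: int = 512   # C = 512 at all network modules
    row_seq_len: int = 50    # Lrow, row decoder sequence length
    cell_seq_len: int = 500  # Lcell, cell decoder sequence length

    @property
    def feature_map_size(self) -> int:
        # 960 // 32 = 30, matching the reported 30x30 feature map
        return self.image_size // self.cnn_stride

    @property
    def encoder_seq_len(self) -> int:
        # 30 * 30 = 900, matching the reported encoder sequence length
        return self.feature_map_size ** 2


cfg = TSRConfig()
print(cfg.feature_map_size, cfg.encoder_seq_len)  # 30 900
```

This only verifies the arithmetic relationship among the stated hyperparameters; the actual stride and module layout are not specified in the paper.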