TableSense: Spreadsheet Table Detection with Convolutional Neural Networks
Authors: Haoyu Dong, Shijie Liu, Shi Han, Zhouyu Fu, Dongmei Zhang
Pages: 69-76
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows that TableSense is highly effective with 91.3% recall and 86.5% precision in EoB-2 metric, a significant improvement over both the current detection algorithm that are used in commodity spreadsheet tools and state-of-the-art convolutional neural networks in computer vision. |
| Researcher Affiliation | Collaboration | ¹Microsoft Research, Beijing 100080, China. ²Beihang University, Beijing 100191, China |
| Pseudocode | Yes | Algorithm 1 Active learning for TableSense |
| Open Source Code | No | The paper mentions using external libraries like 'ClosedXML' and 'TensorFlow', but does not provide any statement or link indicating that the source code for their proposed methodology (TableSense) is open-source or publicly available. |
| Open Datasets | No | The paper describes its own datasets, 'WebSheet', 'WebSheet10K', and 'WebSheet400', which were created through web crawling and human labeling. However, it does not provide concrete access information (e.g., specific link, DOI, or repository name) for these datasets, nor does it explicitly state that they are publicly available. |
| Dataset Splits | No | The paper states 'We use sheets in WebSheet10K for training and sheets in WebSheet400 for testing.' It does not explicitly mention or detail a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | Yes | Our experiments are implemented on Nvidia V100 GPUs with TensorFlow (Abadi et al. 2016). |
| Software Dependencies | No | The paper mentions using 'TensorFlow' and the 'ClosedXML library' but does not provide specific version numbers for these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | We customized ResNets (He et al. 2016) as the backbone for TableSense, and the pooling layers are removed. The BBR module is combined with the PBR module to achieve accuracy promotion. Both the BBR and PBR modules contain three convolutional layers yet have different receptive fields and prediction targets. We use sheets in WebSheet10K for training and sheets in WebSheet400 for testing. To parse Excel files and extract features, we use the ClosedXML library. Since spreadsheets have various sizes, the mini-batch size for training is set to 1. Due to the large variations in sheet size, the span of RPN anchors and the span of aspect ratios in our model range from 8 to 4,096 and 1/256 to 256 incrementing by factor of 2 respectively. As a result, our model can detect small tables with only 12 cells up to large tables with over 100,000 cells. For the region proposal module, the proposed region number is set to 2,000, and the top 2,000 RoIs are further classified and refined in the detection branch. The weight decay is set to 0.0001 for regularization. The parameter k for the PBR module is set to 7. The rescaled output size of RoIAlign is 14 × 14. |
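
The anchor configuration quoted in the Experiment Setup row can be made concrete with a short enumeration. The sketch below is illustrative only: it assumes both ranges include their endpoints and that "incrementing by factor of 2" means each value is double the previous one. The helper name `geometric_range` is hypothetical and does not come from the paper, and the sketch does not show how the paper combines a span with an aspect ratio to form an anchor box.

```python
from fractions import Fraction


def geometric_range(start, stop, factor=2):
    """Hypothetical helper: return start, start*factor, ... up to and including stop."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= factor
    return values


# Anchor spans quoted in the setup: 8 to 4,096, doubling each step (assumed inclusive).
anchor_spans = geometric_range(8, 4096)

# Aspect ratios quoted in the setup: 1/256 to 256, doubling each step (assumed inclusive).
# Fractions keep the small ratios exact instead of relying on floating point.
aspect_ratios = geometric_range(Fraction(1, 256), Fraction(256))

print(len(anchor_spans), "anchor spans:", anchor_spans)
print(len(aspect_ratios), "aspect ratios:", [str(r) for r in aspect_ratios])
```

Under these assumptions the enumeration yields 10 spans and 17 aspect ratios, which gives a sense of how wide a range of table shapes the quoted setup is meant to cover.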