TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Authors: Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Binghong Wu, Lei Liao, Shu Wei, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia. |
| Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, 2 ByteDance |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code and model also have been released at https://github.com/zhaowc-ustc/TabPedia. |
| Open Datasets | Yes | The entire data is derived from five public datasets, including PubTab1M [9], FinTabNet [5], PubTabNet [65], WikiTableQuestions (WTQ) [88] and TabFact [89]. ... The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA (a loading sketch follows the table). |
| Dataset Splits | No | The paper lists the number of samples for training datasets and separate testing datasets for different tasks (e.g., "PubTab1M-Det TD 460k" for training and "PubTab1M-Det [9] contains 57,125 images for testing"), but it does not specify explicit train/validation/test splits (e.g., percentages) from a single dataset. |
| Hardware Specification | Yes | All experiments are implemented by PyTorch [96] and trained on 16 A100 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch [96]" as the implementation framework but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | For the hyper-parameters in model design, the number of meditative tokens is set to 256. The max length of the text sequence is set to 4000 to satisfy task requirements. To implement TabPedia, we adopt a cosine schedule with a one-cycle learning rate strategy [94]. In the pre-training phase, the learning rate warms up in the first 2% of the training process and then decreases from the peak rate (1e-3) with batch sizes of 64. In the fine-tuning phase, we set the peak learning rate as 5e-6 with batch sizes of 16. We employ the AdamW optimizer [95] in both phases. (A minimal scheduler sketch follows the table.) |
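
For readers verifying the dataset release, the following is a minimal sketch of loading the ComTQA benchmark with the Hugging Face `datasets` library. The dataset ID is taken from the URL quoted in the Open Datasets row; the split and field names are not documented in this report, so the inspection calls below are illustrative rather than a description of the actual schema.

```python
# Minimal sketch: pull the ComTQA benchmark released alongside TabPedia.
# Assumes the Hugging Face `datasets` library is installed; the dataset ID
# comes from the URL quoted above. Split and field names are not specified
# here, so we simply inspect whatever the hub returns.
from datasets import load_dataset

comtqa = load_dataset("ByteDance/ComTQA")
print(comtqa)  # show the available splits and their sizes

first_split = next(iter(comtqa))
print(comtqa[first_split][0].keys())  # inspect the record schema of one sample
```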
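The optimization recipe quoted in the Experiment Setup row can be read as the PyTorch sketch below, assuming `torch.optim.AdamW` and `OneCycleLR` stand in for the cited cosine one-cycle strategy; `model`, `total_steps`, and the `pretraining` flag are illustrative placeholders, not the authors' code.

```python
# Sketch of the reported setup: AdamW with a one-cycle cosine schedule that
# warms up over the first 2% of steps. Peak LR 1e-3 (batch size 64) for
# pre-training, 5e-6 (batch size 16) for fine-tuning.
import torch

def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  total_steps: int,
                                  pretraining: bool = True):
    peak_lr = 1e-3 if pretraining else 5e-6
    optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=peak_lr,
        total_steps=total_steps,
        pct_start=0.02,          # warm-up over the first 2% of training
        anneal_strategy="cos",   # cosine decay from the peak rate
    )
    return optimizer, scheduler
```

The scheduler is stepped once per optimizer update; batch sizes and the two-phase split (pre-training vs. fine-tuning) follow the figures quoted from the paper.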