TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Authors: Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Binghong Wu, Lei Liao, Shu Wei, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive quantitative and qualitative experiments on both table perception and comprehension tasks, conducted across various public benchmarks, validate the effectiveness of our TabPedia.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, 2 ByteDance
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The source code and model also have been released at https://github.com/zhaowc-ustc/TabPedia.
Open Datasets | Yes | The entire data is derived from five public datasets, including PubTab1M [9], FinTabNet [5], PubTabNet [65], WikiTableQuestions (WTQ) [88] and TabFact [89]. ... The benchmark ComTQA has been open-sourced at https://huggingface.co/datasets/ByteDance/ComTQA. (A loading sketch follows this table.)
Dataset Splits | No | The paper lists the number of samples for training datasets and separate testing datasets for different tasks (e.g., "PubTab1M-Det TD 460k" for training and "PubTab1M-Det [9] contains 57,125 images for testing"), but it does not specify explicit train/validation/test splits (e.g., percentages) from a single dataset.
Hardware Specification | Yes | All experiments are implemented by PyTorch [96] and trained on 16 A100 GPUs.
Software Dependencies | No | The paper mentions "PyTorch [96]" as the implementation framework but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | For the hyper-parameters in model design, the number of meditative tokens is set to 256. The max length of text sequence is set to 4000 to satisfy task requirements. To implement TabPedia, we adopt a cosine schedule with one-cycle learning rate strategy [94]. In the pre-training phase, the learning rate warms up in the first 2% of the training process and then decreases from the peak rate (1e-3) with batch sizes of 64. In the fine-tuning phase, we set the peak learning rate as 5e-6 with batch sizes of 16. We employ the AdamW optimizer [95] in both phases. (A configuration sketch follows this table.)
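
The "Open Datasets" row points to the released ComTQA benchmark on the Hugging Face Hub. The snippet below is a minimal sketch of inspecting that dataset with the `datasets` library; the repository id "ByteDance/ComTQA" comes from the release URL quoted above, while the split and field names printed here are whatever the hosted dataset actually exposes, not something documented in the paper.

```python
# Minimal sketch: inspecting the released ComTQA benchmark with the
# Hugging Face `datasets` library. The repository id "ByteDance/ComTQA"
# comes from the release URL quoted in the table above; the available
# splits and record fields are defined by the hosted dataset itself.
from datasets import load_dataset

comtqa = load_dataset("ByteDance/ComTQA")

print(comtqa)                        # list the available splits
first_split = next(iter(comtqa.values()))
print(first_split.column_names)      # inspect the record fields
print(first_split[0])                # look at one example
```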
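
The "Experiment Setup" row reports AdamW with a one-cycle cosine learning-rate schedule, a warmup over the first 2% of training, a peak rate of 1e-3 for pre-training (5e-6 for fine-tuning), and batch sizes of 64 / 16. The PyTorch sketch below shows one way to wire those reported numbers together; `model`, `total_steps`, and the dummy training loop are placeholders, not the authors' code.

```python
# Sketch of the reported optimization recipe, assuming a generic PyTorch
# model. Only the numbers quoted in the table are taken from the paper:
# AdamW, one-cycle cosine schedule, 2% warmup, peak LR 1e-3 (pre-training),
# 5e-6 (fine-tuning), batch sizes 64 / 16. Everything else is a placeholder.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(256, 256)    # placeholder for the actual TabPedia model
total_steps = 10_000                 # placeholder; set to the real number of updates

# Pre-training phase: peak LR 1e-3, batches of 64 (batching handled by the DataLoader).
optimizer = AdamW(model.parameters(), lr=1e-3)
scheduler = OneCycleLR(
    optimizer,
    max_lr=1e-3,             # peak learning rate
    total_steps=total_steps,
    pct_start=0.02,          # warm up over the first 2% of training
    anneal_strategy="cos",   # cosine decay from the peak rate
)

for step in range(total_steps):
    x = torch.randn(64, 256)          # dummy batch of 64 samples
    loss = model(x).pow(2).mean()     # dummy objective in place of the real losses
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()

# The fine-tuning phase would reuse the same recipe with max_lr=5e-6 and batches of 16.
```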