TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism

Authors: Minsoo Khang, Teakgyu Hong

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a result, TFLOP achieves the state-of-the-art performance across multiple benchmarks such as PubTabNet, FinTabNet, and SynthTabNet. In our extensive experiments, TFLOP not only exhibits competitive performance but also shows promising results on industrial document TSR scenarios such as documents with watermarks or in non-English domain.
Researcher Affiliation | Industry | Minsoo Khang and Teakgyu Hong, Upstage AI, South Korea. {mkhang, tghong}@upstage.ai
Pseudocode | No | The paper describes the model architecture and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Source code of our work is publicly available at: https://github.com/UpstageAI/TFLOP.
Open Datasets | Yes | To validate the effectiveness of our framework, experiments are conducted against three popular TSR benchmark datasets: PubTabNet [Zhong et al., 2020], FinTabNet [Zheng et al., 2021], and SynthTabNet [Nassar et al., 2022].
Dataset Splits | Yes | PubTabNet is one of the large-scale TSR datasets containing HTML annotations of tables extracted from scientific articles. It is composed of 500,777 training and 9,115 validation table images. An annotated test dataset comprising 9,064 images was subsequently released, and TFLOP's TSR performance against both the validation and test datasets is reported in this work.
Hardware Specification | Yes | All experiments were conducted with 4 A100 GPUs at 250K training steps.
Software Dependencies | No | The paper mentions using specific models such as Swin Transformer and BART, but does not provide software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | In training of TFLOP, input image resolution is set to 768×768 across all benchmark datasets. The output sequence length, N, is fixed at 1,376 to allow sufficient length for the layout embedding and generation of the table tags. Feature dimension d of the framework is set to 1,024, and the hyper-parameters of the loss formulation (Equation 7) are: λ1 = λ2 = λ3 = 1 and λ4 = λ5 = 0.5. The temperature value τ is set to 0.1. All experiments were conducted with 4 A100 GPUs at 250K training steps.
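The reported training hyper-parameters can be collected into a small configuration sketch. This is an illustrative summary only: the class and field names below are hypothetical and are not taken from the TFLOP repository.

```python
# Hypothetical configuration mirroring the training setup reported for TFLOP.
# All field names are illustrative; only the values come from the paper's text.
from dataclasses import dataclass, field


@dataclass
class TFLOPTrainConfig:
    image_size: tuple = (768, 768)    # input image resolution
    max_seq_len: int = 1376           # output sequence length N
    feature_dim: int = 1024           # feature dimension d
    # Loss weights for Equation 7: lambda1..lambda5
    loss_weights: dict = field(default_factory=lambda: {
        "lambda1": 1.0, "lambda2": 1.0, "lambda3": 1.0,
        "lambda4": 0.5, "lambda5": 0.5,
    })
    temperature: float = 0.1          # temperature value tau
    num_gpus: int = 4                 # A100 GPUs used
    train_steps: int = 250_000        # total training steps


cfg = TFLOPTrainConfig()
```

Collecting the values this way also makes the assessment easy to check against the paper: every number in the "Experiment Setup" quote appears exactly once.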