TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism
Authors: Minsoo Khang, Teakgyu Hong
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a result, TFLOP achieves state-of-the-art performance across multiple benchmarks such as PubTabNet, FinTabNet, and SynthTabNet. In our extensive experiments, TFLOP not only exhibits competitive performance but also shows promising results on industrial document TSR scenarios such as documents with watermarks or in non-English domains. |
| Researcher Affiliation | Industry | Minsoo Khang and Teakgyu Hong, Upstage AI, South Korea, {mkhang, tghong}@upstage.ai |
| Pseudocode | No | The paper describes the model architecture and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code of our work is publicly available at: https://github.com/UpstageAI/TFLOP. |
| Open Datasets | Yes | To validate the effectiveness of our framework, experiments are conducted against three popular TSR benchmark datasets: PubTabNet [Zhong et al., 2020], FinTabNet [Zheng et al., 2021], and SynthTabNet [Nassar et al., 2022]. |
| Dataset Splits | Yes | PubTabNet is one of the large-scale TSR datasets containing HTML annotations of tables extracted from scientific articles. It is composed of 500,777 training and 9,115 validation table images. An annotated test dataset comprising 9,064 images was subsequently released, and TFLOP's TSR performance against both the validation and test datasets is reported in this work. |
| Hardware Specification | Yes | All experiments were conducted with 4 A100 GPUs at 250K training steps. |
| Software Dependencies | No | The paper mentions using specific models like Swin Transformer and BART, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In training of TFLOP, input image resolution is set to 768 × 768 across all benchmark datasets. The output sequence length, N, is fixed at 1,376 to allow sufficient length for the layout embedding and generation of the table tags. Feature dimension d of the framework is set to 1,024, and the hyper-parameters of the loss formulation (Equation 7) are: λ1 = λ2 = λ3 = 1 and λ4 = λ5 = 0.5. The temperature value τ is set to 0.1. All experiments were conducted with 4 A100 GPUs at 250K training steps. |
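
For convenience, a minimal Python sketch of the reported training configuration is given below. The field and function names are illustrative and are not taken from the UpstageAI/TFLOP repository; the weighted-sum helper only mirrors the general form implied by Equation 7 (λ1 = λ2 = λ3 = 1, λ4 = λ5 = 0.5), since the individual loss terms themselves are not reproduced in this report.

```python
# Minimal sketch of the training configuration reported in the paper.
# Names are illustrative; they are not taken from the TFLOP source code.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TFLOPTrainConfig:
    image_size: Tuple[int, int] = (768, 768)   # input image resolution
    max_seq_len: int = 1376                    # output sequence length N
    feature_dim: int = 1024                    # feature dimension d
    loss_weights: Tuple[float, ...] = (1.0, 1.0, 1.0, 0.5, 0.5)  # λ1..λ5 in Equation 7
    temperature: float = 0.1                   # temperature τ
    train_steps: int = 250_000
    num_gpus: int = 4                          # A100 GPUs


def combined_loss(loss_terms, weights):
    """Weighted sum of the individual loss terms, mirroring the form of Equation 7."""
    assert len(loss_terms) == len(weights)
    return sum(w * l for w, l in zip(weights, loss_terms))
```

A usage example under these assumptions would be `combined_loss([l1, l2, l3, l4, l5], TFLOPTrainConfig().loss_weights)`, where `l1`–`l5` are the five loss terms defined in the paper.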