TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism
Authors: Minsoo Khang, Teakgyu Hong
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a result, TFLOP achieves state-of-the-art performance across multiple benchmarks such as PubTabNet, FinTabNet, and SynthTabNet. In our extensive experiments, TFLOP not only exhibits competitive performance but also shows promising results on industrial document TSR scenarios such as documents with watermarks or in non-English domains. |
| Researcher Affiliation | Industry | Minsoo Khang and Teakgyu Hong, Upstage AI, South Korea, {mkhang, tghong}@upstage.ai |
| Pseudocode | No | The paper describes the model architecture and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code of our work is publicly available at: https://github.com/UpstageAI/TFLOP. |
| Open Datasets | Yes | To validate the effectiveness of our framework, experiments are conducted against three popular TSR benchmark datasets: PubTabNet [Zhong et al., 2020], FinTabNet [Zheng et al., 2021], and SynthTabNet [Nassar et al., 2022]. |
| Dataset Splits | Yes | PubTabNet is one of the large-scale TSR datasets containing HTML annotations of tables extracted from scientific articles. It is composed of 500,777 training and 9,115 validation table images. An annotated test dataset comprising 9,064 images was subsequently released, and TFLOP's TSR performance against both the validation and test datasets is reported in this work. |
| Hardware Specification | Yes | All experiments were conducted with 4 A100 GPUs at 250K training steps. |
| Software Dependencies | No | The paper mentions using specific models like Swin Transformer and BART, but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In training of TFLOP, input image resolution is set to 768 × 768 across all benchmark datasets. The output sequence length, N, is fixed at 1,376 to allow sufficient length for the layout embedding and generation of the table tags. Feature dimension d of the framework is set to 1,024, and the hyper-parameters of the loss formulation (Equation 7) are: λ1 = λ2 = λ3 = 1 and λ4 = λ5 = 0.5. The temperature value τ is set to 0.1. All experiments were conducted with 4 A100 GPUs at 250K training steps. |
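
For convenience, a minimal Python sketch of the reported training configuration is given below. The field and function names are illustrative and are not taken from the UpstageAI/TFLOP repository; the weighted-sum helper only mirrors the general form implied by Equation 7 (λ1 = λ2 = λ3 = 1, λ4 = λ5 = 0.5), since the individual loss terms themselves are not reproduced in this report.

```python
# Minimal sketch of the training configuration reported in the paper.
# Names are illustrative; they are not taken from the TFLOP source code.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TFLOPTrainConfig:
    image_size: Tuple[int, int] = (768, 768)   # input image resolution
    max_seq_len: int = 1376                    # output sequence length N
    feature_dim: int = 1024                    # feature dimension d
    loss_weights: Tuple[float, ...] = (1.0, 1.0, 1.0, 0.5, 0.5)  # λ1..λ5 in Equation 7
    temperature: float = 0.1                   # temperature τ
    train_steps: int = 250_000
    num_gpus: int = 4                          # A100 GPUs


def combined_loss(loss_terms, weights):
    """Weighted sum of the individual loss terms, mirroring the form of Equation 7."""
    assert len(loss_terms) == len(weights)
    return sum(w * l for w, l in zip(weights, loss_terms))
```

A usage example under these assumptions would be `combined_loss([l1, l2, l3, l4, l5], TFLOPTrainConfig().loss_weights)`, where `l1`–`l5` are the five loss terms defined in the paper.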