TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
Authors: Tianlun Zheng, Zhineng Chen, Jinfeng Bai, Hongtao Xie, Yu-Gang Jiang
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on public benchmarks show that TPS++ consistently improves recognition and achieves state-of-the-art accuracy. Meanwhile, it generalizes well across different backbones and recognizers. |
| Researcher Affiliation | Collaboration | Tianlun Zheng¹, Zhineng Chen¹, Jinfeng Bai², Hongtao Xie³, Yu-Gang Jiang¹. ¹Shanghai Collaborative Innovation Center of Intelligent Visual Computing, School of Computer Science, Fudan University, China; ²Tomorrow Advance Life, China; ³University of Science and Technology of China, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methods are described in prose. |
| Open Source Code | Yes | Code is available at https://github.com/simplify23/TPS_PP. |
| Open Datasets | Yes | MJSynth (MJ) [Jaderberg et al., 2014] and SynthText (ST) [Gupta et al., 2016] are the two synthetic training datasets, with 8.91M and 6.95M text instances, respectively. |
| Dataset Splits | No | The paper mentions using synthetic datasets for training and public benchmarks for evaluation but does not explicitly provide training/validation/test dataset splits or cross-validation setup. |
| Hardware Specification | Yes | All models were trained by using a server with 6 NVIDIA 3080 GPUs. |
| Software Dependencies | No | The paper mentions specific optimizers and model architectures but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | All models were trained with the Adam optimizer for 12 epochs on the two synthetic datasets; only word-level annotations were utilized. The initial learning rate was set to 1e-3 and reduced to 1e-4 and 1e-5 at the 8th and 10th epochs, respectively. All input images were resized to 32×128. The batch size was set to 200. A warm-up strategy was used in the first epoch, with the initial warm-up ratio set to 0.001. |
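
The reported schedule maps onto a standard PyTorch training loop. Below is a minimal sketch of that schedule; the placeholder model, the steps-per-epoch value, and the linear warm-up interpretation are illustrative assumptions, since the paper does not publish training code and the authors' repository remains the authoritative reference:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder network: the real TPS++ recognizer lives in the authors' repo.
model = torch.nn.Linear(32 * 128, 10)

BASE_LR = 1e-3               # initial learning rate from the paper
optimizer = Adam(model.parameters(), lr=BASE_LR)

# Reduce the LR by 10x twice (1e-3 -> 1e-4 -> 1e-5). Milestones follow
# PyTorch's epoch-count convention; exact alignment with the paper's
# "8th and 10th epoch" depends on 0- vs 1-indexing.
scheduler = MultiStepLR(optimizer, milestones=[8, 10], gamma=0.1)

EPOCHS = 12
STEPS_PER_EPOCH = 100        # placeholder; depends on dataset size at batch 200
WARMUP_RATIO = 0.001         # initial warm-up ratio from the paper

for epoch in range(EPOCHS):
    for step in range(STEPS_PER_EPOCH):
        if epoch == 0:
            # One common reading of "warm-up ratio 0.001": scale the LR
            # linearly from 0.001x up to 1x over the first epoch.
            scale = WARMUP_RATIO + (1.0 - WARMUP_RATIO) * (step + 1) / STEPS_PER_EPOCH
            for group in optimizer.param_groups:
                group["lr"] = BASE_LR * scale
        # Forward/backward would go here, on 32x128 crops in batches of 200,
        # e.g. loss.backward(); optimizer.step(); optimizer.zero_grad()
    scheduler.step()
```

The `MultiStepLR` scheduler reproduces the two reported step decays; only the warm-up requires manual LR scaling, since the paper states a ratio and duration but not the exact ramp shape.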