GTC: Guided Training of CTC towards Efficient and Accurate Scene Text Recognition
Authors: Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, Zhiping Lin
AAAI 2020, pp. 11005-11012
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on standard benchmarks demonstrate that our end-to-end model achieves a new state-of-the-art for regular and irregular scene text recognition and needs 6 times shorter inference time than attention-based methods. |
| Researcher Affiliation | Collaboration | 1Nanyang Technological University 2Sense Time Group Ltd. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | There are three public synthetic datasets, namely Synth90K (Jaderberg et al. 2015), SynthText (Gupta, Vedaldi, and Zisserman 2016) and SynthAdd (Li et al. 2019)...IIIT5K-Words (IIIT5K) (Mishra, Alahari, and Jawahar 2012)...Street View Text (SVT) (Wang, Babenko, and Belongie 2011)...ICDAR 2003 (IC03) (Lucas et al. 2003)...ICDAR 2013 (IC13) (Karatzas et al. 2013)...ICDAR 2015 (IC15) (Karatzas et al. 2015)...SVT Perspective (SVT-P) (Quy Phan et al. 2013)...CUTE80 (Risnumawan et al. 2014)...COCO-Text (COCO) (Veit et al. 2016). |
| Dataset Splits | No | The paper mentions training and testing data but does not explicitly provide details about a validation dataset split. |
| Hardware Specification | Yes | We implement our proposed network structure with PyTorch and conduct all experiments on NVIDIA Tesla V100 GPUs with 16GB memory. |
| Software Dependencies | No | The paper mentions PyTorch but does not provide specific version numbers for software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | We use a batch size of 32 on each GPU, with 32 GPUs in total. ADAM optimizer is chosen for training, with the initial learning rate set to 10^-3 and a decay rate of 0.1 every 30000 iterations. (See the configuration sketch below the table.) |
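
The hyperparameters in the Experiment Setup row map directly onto a few lines of PyTorch. Below is a minimal sketch, assuming a placeholder model and a dummy loss, since the GTC network and its combined CTC/attention objective are not released; only the optimizer choice (ADAM), the initial learning rate of 10^-3, and the step decay of 0.1 every 30,000 iterations come from the paper.

```python
import torch
import torch.nn as nn

# Placeholder recognizer: the paper's GTC network (a CTC decoder guided by an
# attention branch) is not public, so a dummy module stands in here.
model = nn.Linear(512, 37)  # hypothetical feature size and charset size

# From the paper: ADAM with initial learning rate 1e-3,
# decayed by a factor of 0.1 every 30,000 iterations.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30000, gamma=0.1)

batch_size_per_gpu = 32  # with 32 GPUs, an effective batch size of 1024

for iteration in range(100):  # illustrative; the total iteration count is not reported
    optimizer.zero_grad()
    # In GTC the loss combines a CTC term with an attention guidance term;
    # a dummy scalar stands in for it here.
    loss = model(torch.randn(batch_size_per_gpu, 512)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # stepping once per iteration realizes the 30k-iteration decay
```

Note that `StepLR` counts calls to `scheduler.step()`, so calling it once per training iteration (rather than once per epoch) is what matches the paper's iteration-based decay schedule.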