Connectionist Temporal Classification with Maximum Entropy Regularization

Authors: Hu Liu, Sheng Jin, Changshui Zhang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on scene text recognition show that our proposed methods consistently improve over the CTC baseline without the need to adjust training settings." "4 Experiments: We evaluate our proposed method on several standard benchmarks for scene text recognition tasks."
Researcher Affiliation | Academia | "Institute for Artificial Intelligence, Tsinghua University (THUAI); Beijing National Research Center for Information Science and Technology (BNRist); State Key Lab of Intelligent Technologies and Systems; Department of Automation, Tsinghua University, Beijing, P.R. China"
Pseudocode | No | "Due to space limit, the details of the dynamic programming are presented in the supplementary material."
Open Source Code | Yes | "Code has been made publicly available at: https://github.com/liuhu-bigeye/enctc.crnn."
Open Datasets | Yes | "Synth90K [11] consists of 8M training images and 1M testing images generated by a synthetic data engine." "ICDAR-2003 (IC03) [20], ICDAR-2013 (IC13) [14], IIIT5k-word (IIIT5k) [24] and Street View Text (SVT) [34] datasets."
Dataset Splits | Yes | "Synth90K [11] consists of 8M training images and 1M testing images generated by a synthetic data engine." "Synth5K is a small-scale dataset with 5K training data and 5K testing data randomly sampled from Synth90K."
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | "Models and loss functions are all implemented using Pytorch [26]."
Experiment Setup | Yes | "We use RMSProp to train our model and set the batch size to 100. The learning rate is fixed at 1 × 10⁻³ during training. The training stops at 150 epochs. For all the experiments, we set β as 0.2 and τ as 1.5 without further tuning."
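The quoted β = 0.2 weights the paper's maximum-entropy regularizer, which subtracts an entropy term from the CTC objective so that training does not collapse onto a single dominant alignment. The sketch below is only an illustration of that idea in plain Python: the function names and the toy distributions are hypothetical, and the real method computes the entropy over CTC feasible paths via dynamic programming (as the pseudocode note above indicates), not over a small explicit distribution.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def regularized_loss(ctc_loss, path_distribution, beta=0.2):
    """Entropy-regularized objective: loss - beta * H(p).
    A flatter distribution over alignments has higher entropy,
    so it yields a lower total loss (illustrative only)."""
    return ctc_loss - beta * entropy(path_distribution)

# With the same base loss, a peaked alignment distribution is
# penalized relative to a flat one.
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]
print(regularized_loss(1.0, peaked) > regularized_loss(1.0, flat))  # True
```

In the released code, this term is folded directly into the CTC forward pass rather than computed separately as above.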