Connectionist Temporal Classification with Maximum Entropy Regularization
Authors: Hu Liu, Sheng Jin, Changshui Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on scene text recognition show that our proposed methods consistently improve over the CTC baseline without the need to adjust training settings. We evaluate our proposed method on several standard benchmarks for scene text recognition tasks. |
| Researcher Affiliation | Academia | Institute for Artificial Intelligence, Tsinghua University (THUAI) Beijing National Research Center for Information Science and Technology (BNRist) State Key Lab of Intelligent Technologies and Systems Department of Automation, Tsinghua University, Beijing, P.R.China |
| Pseudocode | No | Due to space limit, the details of the dynamic programming are presented in the supplementary material. |
| Open Source Code | Yes | Code has been made publicly available at: https://github.com/liuhu-bigeye/enctc.crnn. |
| Open Datasets | Yes | Synth90K [11] consists of 8M training images and 1M testing images generated by a synthetic data engine. ICDAR-2003 (IC03) [20], ICDAR-2013 (IC13) [14], IIIT5k-word (IIIT5k) [24] and Street View Text (SVT) [34] datasets. |
| Dataset Splits | Yes | Synth90K [11] consists of 8M training images and 1M testing images generated by a synthetic data engine. Synth5K is a small-scale dataset with 5K training data and 5K testing data randomly sampled from Synth90K. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper states only that "Models and loss functions are all implemented using Pytorch [26]", without specifying library versions. |
| Experiment Setup | Yes | We use RMSProp to train our model and set the batch size to 100. The learning rate is fixed at 1 × 10⁻³ during training. The training stops at 150 epochs. For all the experiments, we set β as 0.2 and τ as 1.5 without further tuning. |
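The reported setup combines the standard CTC loss with a maximum-entropy regularization term weighted by β = 0.2. As a minimal sketch of that objective (not the authors' implementation; the paper computes entropy over feasible CTC paths via dynamic programming, and the function names here are illustrative), the regularized loss subtracts a scaled entropy term from the CTC loss, so that minimizing it discourages overly peaked distributions:

```python
import math

BETA = 0.2  # entropy weight beta from the paper's experiment setup


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def entropy_regularized_loss(ctc_loss, distribution, beta=BETA):
    """Sketch of a maximum-entropy-regularized objective:
    L = L_ctc - beta * H(p). Subtracting the entropy term means that
    minimizing L pushes H(p) up, penalizing peaky distributions.
    `distribution` stands in for the path distribution the paper
    regularizes; here it is just any discrete distribution."""
    return ctc_loss - beta * entropy(distribution)
```

For a uniform distribution over n outcomes the entropy is log(n), the maximum possible, so the regularizer leaves a uniform distribution unpenalized relative to any peakier alternative with the same CTC loss.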