Deep Structured Output Learning for Unconstrained Text Recognition
Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable. |
| Researcher Affiliation | Collaboration | Max Jaderberg*, Karen Simonyan*, Andrea Vedaldi & Andrew Zisserman+, Visual Geometry Group, Department of Engineering Science, University of Oxford. {max,karen,vedaldi,az}@robots.ox.ac.uk. *Current affiliation: Google DeepMind. +Current affiliation: University of Oxford and Google DeepMind. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our models on a number of standard datasets ICDAR 2003, ICDAR 2013, Street View Text, and IIIT5k, whereas for training, as well as testing across a larger vocabulary, we turn to the synthetic Synth90k and Synth Rand datasets. (followed by citations like 'Lucas et al. (2003)', 'Karatzas et al. (2013)', 'Wang et al. (2011)', 'Mishra et al. (2012)', 'Jaderberg et al. (2014a;b)') |
| Dataset Splits | No | The paper describes training and test splits for datasets like Synth90k ('approximately 8 million training images and 900k test images') and Synth Rand ('8 million training images and the test set of 900k images'), but does not explicitly state a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | No | The acknowledgments section mentions 'NVIDIA Corporation with the donation of the GPUs used for this research' but does not specify any particular GPU models or other hardware components used for running experiments. |
| Software Dependencies | No | The paper describes the training algorithms and loss functions (e.g., 'multinomial logistic regression loss', 'stochastic gradient descent'), but does not specify any software dependencies or libraries with version numbers. |
| Experiment Setup | Yes | The paper provides details such as input image size (32x100), CNN architecture (number of layers, filter sizes, stride, pooling), activation functions (Rectified linear units), output layer configurations (Nmax=23, 10k N-grams), optimization method (SGD with dropout), beam search widths (5 during training, 10 during testing), and initialization strategy (pre-trained weights for JOINT model). |
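The setup row above mentions beam search with a width of 5 during training and 10 during testing. As a minimal sketch of how such a decoder operates (not the authors' implementation; the toy distributions and function name are illustrative assumptions), the following keeps the `beam_width` highest-scoring partial character sequences at each position:

```python
import math

def beam_search(log_probs, beam_width):
    """Beam decoding over per-position character log-probabilities.

    log_probs: list of dicts mapping character -> log probability,
    one dict per character position. Returns the highest-scoring
    (sequence, cumulative log prob) pair found within the beam.
    """
    # Each beam entry is (partial sequence, cumulative log prob).
    beams = [("", 0.0)]
    for dist in log_probs:
        # Extend every surviving sequence by every candidate character.
        candidates = [
            (seq + ch, score + lp)
            for seq, score in beams
            for ch, lp in dist.items()
        ]
        # Prune: keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Hypothetical per-position character distributions for a 3-character word.
toy = [
    {"c": math.log(0.6), "o": math.log(0.4)},
    {"a": math.log(0.5), "o": math.log(0.5)},
    {"t": math.log(0.9), "b": math.log(0.1)},
]
best_seq, best_score = beam_search(toy, beam_width=5)
print(best_seq)  # -> "cat"
```

A wider beam (e.g. 10 at test time, as reported) explores more alternative sequences at extra cost; a width of 1 reduces to greedy per-position decoding.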