Deep Structured Output Learning for Unconstrained Text Recognition
Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable. |
| Researcher Affiliation | Collaboration | Max Jaderberg*, Karen Simonyan*, Andrea Vedaldi & Andrew Zisserman+, Visual Geometry Group, Department of Engineering Science, University of Oxford. {max,karen,vedaldi,az}@robots.ox.ac.uk. *Current affiliation: Google DeepMind. +Current affiliation: University of Oxford and Google DeepMind. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our models on a number of standard datasets ICDAR 2003, ICDAR 2013, Street View Text, and IIIT5k, whereas for training, as well as testing across a larger vocabulary, we turn to the synthetic Synth90k and Synth Rand datasets. (followed by citations like 'Lucas et al. (2003)', 'Karatzas et al. (2013)', 'Wang et al. (2011)', 'Mishra et al. (2012)', 'Jaderberg et al. (2014a;b)') |
| Dataset Splits | No | The paper describes training and test splits for datasets like Synth90k ('approximately 8 million training images and 900k test images') and Synth Rand ('8 million training images and the test set of 900k images'), but does not explicitly state a separate validation dataset split with specific percentages or counts. |
| Hardware Specification | No | The acknowledgments section mentions 'NVIDIA Corporation with the donation of the GPUs used for this research' but does not specify any particular GPU models or other hardware components used for running experiments. |
| Software Dependencies | No | The paper describes the training algorithms and loss functions (e.g., 'multinomial logistic regression loss', 'stochastic gradient descent'), but does not specify any software dependencies or libraries with version numbers. |
| Experiment Setup | Yes | The paper provides details such as input image size (32x100), CNN architecture (number of layers, filter sizes, stride, pooling), activation functions (Rectified linear units), output layer configurations (Nmax=23, 10k N-grams), optimization method (SGD with dropout), beam search widths (5 during training, 10 during testing), and initialization strategy (pre-trained weights for JOINT model). |
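The setup row above mentions beam search with a width of 5 during training and 10 during testing. As a minimal sketch of how such a decoder operates (not the authors' implementation; the toy distributions and function name are illustrative assumptions), the following keeps the `beam_width` highest-scoring partial character sequences at each position:

```python
import math

def beam_search(log_probs, beam_width):
    """Beam decoding over per-position character log-probabilities.

    log_probs: list of dicts mapping character -> log probability,
    one dict per character position. Returns the highest-scoring
    (sequence, cumulative log prob) pair found within the beam.
    """
    # Each beam entry is (partial sequence, cumulative log prob).
    beams = [("", 0.0)]
    for dist in log_probs:
        # Extend every surviving sequence by every candidate character.
        candidates = [
            (seq + ch, score + lp)
            for seq, score in beams
            for ch, lp in dist.items()
        ]
        # Prune: keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]

# Hypothetical per-position character distributions for a 3-character word.
toy = [
    {"c": math.log(0.6), "o": math.log(0.4)},
    {"a": math.log(0.5), "o": math.log(0.5)},
    {"t": math.log(0.9), "b": math.log(0.1)},
]
best_seq, best_score = beam_search(toy, beam_width=5)
print(best_seq)  # -> "cat"
```

A wider beam (e.g. 10 at test time, as reported) explores more alternative sequences at extra cost; a width of 1 reduces to greedy per-position decoding.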