End-to-End Text Recognition with Hybrid HMM Maxout Models

Authors: Ouais Alsharif; Joelle Pineau

ICLR 2014

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: "Using these elements, we build a tunable and highly accurate recognition system that beats state-of-the-art results on all the sub-problems for both the ICDAR 2003 and SVT benchmark datasets."

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: "Ouais Alsharif (OUAIS.ALSHARIF@MAIL.MCGILL.CA), Reasoning and Learning Laboratory, School of Computer Science, McGill University, Montreal, QC, Canada; Joelle Pineau (JPINEAU@CS.MCGILL.CA), Reasoning and Learning Laboratory, School of Computer Science, McGill University, Montreal, QC, Canada"

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: "Algorithm 1 Cascade Beam Search"

Reproducibility Variable: Open Source Code
Result: No
LLM Response: "Code for this paper will be provided with the final version."

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: "The dataset we use for this task is the ICDAR 2003 character recognition dataset (Lucas et al., 2003) which consists of 6114 training samples and 5379 test samples after removing all non-alphanumeric characters as in (Wang et al., 2012). We augment the training dataset with 75,495 character images from the Chars74k English dataset (de Campos et al., 2009) and 50,000 synthetic characters generated by (Wang et al., 2012), making the total size of the training set 131,609 tightly cropped character images."

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: "The dataset we use for this task is the ICDAR 2003 character recognition dataset (Lucas et al., 2003) which consists of 6114 training samples and 5379 test samples after removing all non-alphanumeric characters as in (Wang et al., 2012)."

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: "Training was done on GPUs using Theano (Bergstra et al., 2010) and pylearn (Goodfellow et al., 2013a)." (No specific GPU model or other hardware details provided.)

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: "Training was done on GPUs using Theano (Bergstra et al., 2010) and pylearn (Goodfellow et al., 2013a)." (Specific versions of these software dependencies are not provided.)

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: "The architecture we use for this task is a five-layer convolutional Maxout network with the first three layers being convolution-pooling Maxout layers, the fourth a Maxout layer and finally a softmax layer on top. The first three layers have respectively 48, 128, 128 filters of sizes 8-by-8 for the first two and 5-by-5 for the third, pooling over regions of sizes 4-by-4, 4-by-4 and 2-by-2 respectively, with 2 linear pieces per Maxout unit and a 2-by-2 stride. The 4th layer has 400 units and 5 linear pieces per Maxout unit, fully connected with the softmax output layer. We train the proposed network on 32-by-32 grey-scale character image patches with a simple preprocessing stage of subtracting the mean of every patch and dividing by its standard deviation + ε. Similar to (Goodfellow et al., 2013b), we train this network using stochastic gradient descent with momentum and dropout to maximize log p(y|x)."
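The Experiment Setup quote relies on two concrete mechanisms: a Maxout unit (each output takes the max over k affine "pieces") and per-patch normalization (subtract the patch mean, divide by standard deviation + ε). The NumPy sketch below illustrates both under assumed shapes — the random weights, seed, ε value, and batch size are illustrative choices, not taken from the paper; only the 32-by-32 patch size and the 400-unit, 5-piece fourth layer come from the quote.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the paper

def preprocess(patch, eps=1e-8):
    """Per-patch normalization from the setup quote: subtract the
    patch mean, divide by its standard deviation + eps."""
    return (patch - patch.mean()) / (patch.std() + eps)

def maxout(x, W, b):
    """Maxout activation: each output unit is the max over k affine
    pieces.  Shapes: x (n, d_in), W (k, d_in, d_out), b (k, d_out)."""
    z = np.einsum('nd,kdo->nko', x, W) + b  # all k pieces: (n, k, d_out)
    return z.max(axis=1)                    # elementwise max: (n, d_out)

# Illustrative forward pass: 32x32 grey-scale patches flattened to 1024
# dims, feeding a fully connected Maxout layer with 400 units and
# 5 pieces, matching the quoted 4th layer.  Weights are untrained.
x = np.stack([preprocess(p) for p in rng.normal(size=(4, 32 * 32))])
k, d_in, d_out = 5, 32 * 32, 400
W = rng.normal(scale=0.01, size=(k, d_in, d_out))
b = np.zeros((k, d_out))
h = maxout(x, W, b)
print(h.shape)  # (4, 400)
```

By construction, the Maxout output dominates every individual piece, which is what lets the unit approximate arbitrary convex activations as the number of pieces grows.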