OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

Authors: Pierre Sermanet; Rob Fergus; Yann LeCun; Xiang Zhang; David Eigen; Michael Mathieu

ICLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments are conducted on the ImageNet ILSVRC 2012 and 2013 datasets and establish state-of-the-art results on the ILSVRC 2013 localization and detection tasks.
Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, New York University, 719 Broadway, 12th Floor, New York, NY 10003. sermanet,deigen,xiang,mathieu,fergus,yann@cs.nyu.edu
Pseudocode | Yes | We combine the individual predictions (see Fig. 7) via a greedy merge strategy applied to the regressor bounding boxes, using the following algorithm.
(a) Assign to C_s the set of classes in the top k for each scale s ∈ 1…6, found by taking the maximum detection class outputs across spatial locations for that scale.
(b) Assign to B_s the set of bounding boxes predicted by the regressor network for each class in C_s, across all spatial locations at scale s.
(c) Assign B ← ∪_s B_s.
(d) Repeat merging until done:
(e) (b1*, b2*) = argmin_{b1 ≠ b2 ∈ B} match_score(b1, b2)
(f) If match_score(b1*, b2*) > t, stop.
(g) Otherwise, set B ← (B \ {b1*, b2*}) ∪ box_merge(b1*, b2*)
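The greedy merge in steps (d)-(g) can be sketched in Python. Note that `match_score` and `box_merge` below are simplified assumptions: the paper describes the score as combining the distance between box centers with their intersection area, and the merge as averaging the boxes' coordinates, but the exact weighting here is not the authors' implementation.

```python
# Hypothetical sketch of the greedy bounding-box merge (steps d-g).
# Boxes are (x1, y1, x2, y2) tuples.
from itertools import combinations


def box_merge(b1, b2):
    # Merge two boxes by averaging their coordinates.
    return tuple((a + b) / 2.0 for a, b in zip(b1, b2))


def intersection_area(b1, b2):
    w = min(b1[2], b2[2]) - max(b1[0], b2[0])
    h = min(b1[3], b2[3]) - max(b1[1], b2[1])
    return max(w, 0.0) * max(h, 0.0)


def match_score(b1, b2):
    # Lower score = better match: close centers and large overlap.
    # The relative weighting of the two terms is an assumption.
    c1 = ((b1[0] + b1[2]) / 2.0, (b1[1] + b1[3]) / 2.0)
    c2 = ((b2[0] + b2[2]) / 2.0, (b2[1] + b2[3]) / 2.0)
    center_dist = ((c1[0] - c2[0]) ** 2 + (c1[1] - c2[1]) ** 2) ** 0.5
    return center_dist - intersection_area(b1, b2)


def greedy_merge(boxes, t):
    """Repeatedly merge the best-matching pair until no pair scores below t."""
    boxes = list(boxes)
    while len(boxes) > 1:
        b1, b2 = min(combinations(boxes, 2), key=lambda p: match_score(*p))
        if match_score(b1, b2) > t:  # step (f): stop when no good match remains
            break
        boxes.remove(b1)             # step (g): replace the pair by their merge
        boxes.remove(b2)
        boxes.append(box_merge(b1, b2))
    return boxes
```

Running `greedy_merge` on two heavily overlapping boxes and one distant box collapses the overlapping pair into a single averaged box and leaves the outlier untouched.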
Open Source Code | Yes | Along with this paper, we release a feature extractor named OverFeat [1] in order to provide powerful features for computer vision research. Two models are provided: a fast one and an accurate one. Each architecture is described in Tables 1 and 3. We also compare their sizes in Table 4 in terms of parameters and connections. [1] http://cilvr.nyu.edu/doku.php?id=software:overfeat:start
Open Datasets | Yes | We train the network on the ImageNet 2012 training set (1.2 million images and C = 1000 classes) [5].
Dataset Splits | Yes | We apply our network to the ImageNet 2012 validation set using the localization criterion specified for the competition.
Hardware Specification | Yes | Our network with 6 scales takes around 2 seconds on a K20x GPU to process one image.
Software Dependencies | No | No software dependencies with version numbers are mentioned. The paper refers to common ML techniques and components such as 'ReLU', 'max pooling', 'Dropout', 'softmax', and 'stochastic gradient descent', but not to software frameworks or libraries with versions.
Experiment Setup | Yes | Each image is downsampled so that the smallest dimension is 256 pixels. We then extract 5 random crops (and their horizontal flips) of size 221x221 pixels and present these to the network in mini-batches of size 128. The weights in the network are initialized randomly with (µ, σ) = (0, 1 × 10⁻²). They are then updated by stochastic gradient descent, accompanied by a momentum term of 0.6 and an ℓ2 weight decay of 1 × 10⁻⁵. The learning rate is initially 5 × 10⁻² and is successively decreased by a factor of 0.5 after (30, 50, 60, 70, 80) epochs. Dropout [11] with a rate of 0.5 is employed on the fully connected layers (6th and 7th) in the classifier.
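The update rule quoted in this row (SGD with momentum and ℓ2 weight decay, plus the stepped learning-rate schedule) can be sketched as plain Python. The hyperparameter values come from the quote; the gradient and parameters are scalar placeholders, not the authors' training code.

```python
# Minimal sketch of the training update described in the paper's setup,
# assuming the standard "velocity" formulation of SGD with momentum.
LR0 = 5e-2                            # initial learning rate
MOMENTUM = 0.6
WEIGHT_DECAY = 1e-5                   # l2 weight-decay coefficient
DECAY_EPOCHS = (30, 50, 60, 70, 80)   # halve the learning rate after each


def learning_rate(epoch):
    # The rate is multiplied by 0.5 after each epoch listed in DECAY_EPOCHS.
    halvings = sum(epoch >= e for e in DECAY_EPOCHS)
    return LR0 * 0.5 ** halvings


def sgd_step(w, grad, velocity, epoch):
    # Weight decay enters as an extra l2-penalty term in the gradient.
    lr = learning_rate(epoch)
    g = grad + WEIGHT_DECAY * w
    velocity = MOMENTUM * velocity - lr * g
    return w + velocity, velocity
```

At epoch 0 the step size is 5 × 10⁻²; by epoch 80 the schedule has applied five halvings, leaving roughly 1.6 × 10⁻³.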