OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Authors: Pierre Sermanet; Rob Fergus; Yann LeCun; Xiang Zhang; David Eigen; Michael Mathieu
ICLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted on the ImageNet ILSVRC 2012 and 2013 datasets and establish state-of-the-art results on the ILSVRC 2013 localization and detection tasks. |
| Researcher Affiliation | Academia | Courant Institute of Mathematical Sciences, New York University 719 Broadway, 12th Floor, New York, NY 10003 sermanet,deigen,xiang,mathieu,fergus,yann@cs.nyu.edu |
| Pseudocode | Yes | We combine the individual predictions (see Fig. 7) via a greedy merge strategy applied to the regressor bounding boxes, using the following algorithm. (a) Assign to C_s the set of classes in the top k for each scale s ∈ 1…6, found by taking the maximum detection class outputs across spatial locations for that scale. (b) Assign to B_s the set of bounding boxes predicted by the regressor network for each class in C_s, across all spatial locations at scale s. (c) Assign B ← ∪_s B_s. (d) Repeat merging until done: (e) (b1*, b2*) = argmin_{b1 ≠ b2 ∈ B} match_score(b1, b2). (f) If match_score(b1*, b2*) > t, stop. (g) Otherwise, set B ← B \ {b1*, b2*} ∪ box_merge(b1*, b2*). |
| Open Source Code | Yes | Along with this paper, we release a feature extractor named OverFeat¹ in order to provide powerful features for computer vision research. Two models are provided: a fast one and an accurate one. Each architecture is described in tables 1 and 3. We also compare their sizes in Table 4 in terms of parameters and connections. ¹http://cilvr.nyu.edu/doku.php?id=software:overfeat:start |
| Open Datasets | Yes | We train the network on the ImageNet 2012 training set (1.2 million images and C = 1000 classes) [5]. |
| Dataset Splits | Yes | We apply our network to the Imagenet 2012 validation set using the localization criterion specified for the competition. |
| Hardware Specification | Yes | Our network with 6 scales takes around 2 secs on a K20x GPU to process one image |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned. The paper names common ML techniques and components ('relu', 'max pooling', 'Dropout', 'softmax', 'stochastic gradient descent'), but no software frameworks or libraries with versions. |
| Experiment Setup | Yes | Each image is downsampled so that the smallest dimension is 256 pixels. We then extract 5 random crops (and their horizontal flips) of size 221x221 pixels and present these to the network in mini-batches of size 128. The weights in the network are initialized randomly with (µ, σ) = (0, 1×10⁻²). They are then updated by stochastic gradient descent, accompanied by a momentum term of 0.6 and an ℓ2 weight decay of 1×10⁻⁵. The learning rate is initially 5×10⁻² and is successively decreased by a factor of 0.5 after (30, 50, 60, 70, 80) epochs. Dropout [11] with a rate of 0.5 is employed on the fully connected layers (6th and 7th) in the classifier. |
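The greedy bounding-box merge quoted in the Pseudocode row can be sketched as follows. This is a hedged illustration, not the paper's implementation: `match_score` and `box_merge` are placeholder assumptions (the paper computes the match score from the distance between box centers and the intersection area of the boxes, and merges by averaging; the exact formulas are not quoted here), and boxes are assumed to be `(x1, y1, x2, y2)` tuples.

```python
from itertools import combinations


def intersection_area(b1, b2):
    """Overlap area of two boxes given as (x1, y1, x2, y2)."""
    w = min(b1[2], b2[2]) - max(b1[0], b2[0])
    h = min(b1[3], b2[3]) - max(b1[1], b2[1])
    return max(w, 0.0) * max(h, 0.0)


def match_score(b1, b2):
    """Placeholder score: center distance minus overlap area (lower = better match)."""
    dx = ((b1[0] + b1[2]) - (b2[0] + b2[2])) / 2.0
    dy = ((b1[1] + b1[3]) - (b2[1] + b2[3])) / 2.0
    return (dx * dx + dy * dy) ** 0.5 - intersection_area(b1, b2)


def box_merge(b1, b2):
    """Placeholder merge rule: coordinate-wise average of the two boxes."""
    return tuple((a + b) / 2.0 for a, b in zip(b1, b2))


def greedy_merge(boxes, t):
    """Steps (d)-(g): repeatedly merge the best-matching pair of boxes in B
    until the best remaining match score exceeds the threshold t."""
    boxes = list(boxes)
    while len(boxes) > 1:
        # (e) find the pair with the minimum match score
        b1, b2 = min(combinations(boxes, 2), key=lambda p: match_score(*p))
        # (f) stop when even the best pair matches too poorly
        if match_score(b1, b2) > t:
            break
        # (g) replace the pair with their merge
        boxes.remove(b1)
        boxes.remove(b2)
        boxes.append(box_merge(b1, b2))
    return boxes
```

With this placeholder score, two heavily overlapping boxes score well below a distant third box, so they are merged while the outlier survives.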
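The step learning-rate schedule from the Experiment Setup row (base rate 5×10⁻², halved after epochs 30, 50, 60, 70, and 80) can be written as a small helper. This is a minimal sketch of the schedule as quoted; the function name and signature are our own, not from the paper.

```python
def learning_rate(epoch, base_lr=5e-2, decay=0.5, milestones=(30, 50, 60, 70, 80)):
    """Return the learning rate for a given epoch under the quoted schedule:
    start at base_lr and multiply by `decay` at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr
```

For example, the rate stays at 0.05 through epoch 29, drops to 0.025 at epoch 30, and after all five milestones reaches 0.05 × 0.5⁵ = 0.0015625.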