Multiple Object Recognition with Visual Attention
Authors: Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu
ICLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the model on the challenging task of transcribing house number sequences from Google Street View images and show that it is both more accurate than the state-of-the-art convolutional networks and uses fewer parameters and less computation. |
| Researcher Affiliation | Collaboration | Jimmy Lei Ba University of Toronto jimmy@psi.utoronto.ca Volodymyr Mnih Google Deep Mind vmnih@google.com Koray Kavukcuoglu Google Deep Mind korayk@google.com |
| Pseudocode | No | The paper describes the model and learning process using mathematical equations and textual descriptions, but it does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to the open-source code for the described methodology. |
| Open Datasets | Yes | The publicly available multi-digit street view house number (SVHN) dataset Netzer et al. (2011) consists of images of digits taken from pictures of house fronts. ... The models are trained using the remaining 200,000 training images. |
| Dataset Splits | Yes | Following Goodfellow et al. (2013), we formed a validation set of 5000 images by randomly sampling images from the training set and the extra set, and these were used for selecting the learning rate and sampling variance for the stochastic glimpse policy. |
| Hardware Specification | No | The paper mentions training on 'a GPU' but does not provide specific hardware details such as GPU model, CPU type, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names like TensorFlow or PyTorch with their respective versions). |
| Experiment Setup | Yes | We optimized the model parameters using stochastic gradient descent with the Nesterov momentum technique. A mini-batch size of 128 was used to estimate the gradient direction. The momentum coefficient was set to 0.9 throughout the training. The learning rate η scheduling was applied in training to improve the convergence of the learning process. η starts at 0.01 in the first epoch and was exponentially reduced by a factor of 0.97 after each epoch. |
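The optimizer settings quoted above (Nesterov momentum 0.9, mini-batch size 128, learning rate starting at 0.01 and multiplied by 0.97 after each epoch) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the quadratic toy objective and the helper names are assumptions for the demo.

```python
def learning_rate(epoch, eta0=0.01, decay=0.97):
    """Exponential schedule from the paper: eta starts at `eta0`
    in the first epoch and is reduced by a factor of `decay` after each epoch."""
    return eta0 * decay ** epoch

def nesterov_sgd_step(w, v, grad_fn, lr, momentum=0.9):
    """One SGD step with Nesterov momentum on a scalar parameter.
    The gradient is evaluated at the look-ahead point w + momentum * v."""
    g = grad_fn(w + momentum * v)
    v = momentum * v - lr * g
    return w + v, v

# Toy demo (assumption: a simple quadratic stands in for the real loss):
# minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
grad = lambda w: 2.0 * (w - 3.0)
w, v = 0.0, 0.0
steps_per_epoch = 50  # stand-in for one pass over the mini-batches
for step in range(300):
    lr = learning_rate(step // steps_per_epoch)
    w, v = nesterov_sgd_step(w, v, grad, lr)
final_w = w
```

In the paper this schedule and momentum configuration are applied to mini-batch gradients (batch size 128) of the full attention model rather than to a scalar toy problem.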