Cortical microcircuits as gated-recurrent neural networks
Authors: Rui Costa, Ioannis Alexandros Assael, Brendan Shillingford, Nando de Freitas, Tim Vogels
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation across sequential image classification and language modelling tasks shows that subLSTM units can achieve similar performance to LSTM units. |
| Researcher Affiliation | Collaboration | Rui Ponte Costa (Centre for Neural Circuits and Behaviour, Dept. of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK; rui.costa@cncb.ox.ac.uk); Yannis M. Assael (Dept. of Computer Science, University of Oxford, Oxford, UK, and DeepMind, London, UK; yannis.assael@cs.ox.ac.uk); Brendan Shillingford (Dept. of Computer Science, University of Oxford, Oxford, UK, and DeepMind, London, UK; brendan.shillingford@cs.ox.ac.uk); Nando de Freitas (DeepMind, London, UK; nandodefreitas@google.com); Tim P. Vogels (Centre for Neural Circuits and Behaviour, Dept. of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK; tim.vogels@cncb.ox.ac.uk) |
| Pseudocode | No | The paper presents mathematical equations for the LSTM and subLSTM models and their derivatives, along with diagrams, but it does not include a distinct section or block labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | In the sequential MNIST digit classification task, each digit image from the MNIST dataset is presented to the RNN as a sequence of pixels (Le et al. (2015); Fig. 2a). We first used the Penn Treebank (PTB) dataset to train our model on word-level language modelling (929k training, 73k validation and 82k test words; with a vocabulary of 10k words). We also tested the Wikitext-2 language modelling dataset based on Wikipedia articles. |
| Dataset Splits | Yes | We first used the Penn Treebank (PTB) dataset to train our model on word-level language modelling (929k training, 73k validation and 82k test words; with a vocabulary of 10k words). This [Wikitext-2] dataset is twice as large as the PTB dataset (2000k training, 217k validation and 245k test words) and also features a larger vocabulary (33k words). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RMSProp with momentum' and 'Google Vizier' for optimization, but it does not specify version numbers for any software dependencies, libraries, or programming languages used in the implementation of the models. |
| Experiment Setup | Yes | The network was optimised using RMSProp with momentum (Tieleman and Hinton, 2012), a learning rate of 10⁻⁴, one hidden layer and 100 hidden units. All RNNs tested have 2 hidden layers; backpropagation is truncated to 35 steps, and a batch size of 20 is used. To optimise the networks we used RMSProp with momentum. We also performed a hyperparameter search on the validation set over input, output, and update dropout rates, the learning rate, and weight decay. The hyperparameter search was done with Google Vizier, which performs black-box optimisation using Gaussian process bandits and transfer learning. Tables 2 and 3 show the resulting hyperparameters. |
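The subLSTM unit referenced in the Research Type row replaces the LSTM's multiplicative input and output gating with subtractive gating, with sigmoid activations throughout. The minimal NumPy sketch below illustrates that cell update; the weight packing, initialisation, and function names are our own assumptions, since the paper does not release code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sublstm_step(x, h_prev, c_prev, W, U, b):
    """One subLSTM step (sketch of the update described in Costa et al., 2017).

    The gates (i, f, o) and the candidate z all use sigmoid activations, and the
    input/output gates act subtractively rather than multiplicatively.
    W: (4*d, n_in), U: (4*d, d), b: (4*d,), packed as [z, i, f, o] --
    this packing is an implementation assumption, not taken from the paper.
    """
    pre = W @ x + U @ h_prev + b
    z, i, f, o = np.split(sigmoid(pre), 4)
    c = f * c_prev + z - i      # subtractive input gating of the cell state
    h = sigmoid(c) - o          # subtractive output gating of the hidden state
    return h, c

# Tiny usage example with random weights (hidden size d=4, input size n=3).
rng = np.random.default_rng(0)
d, n = 4, 3
W, U, b = rng.normal(size=(4 * d, n)), rng.normal(size=(4 * d, d)), np.zeros(4 * d)
h, c = sublstm_step(rng.normal(size=n), np.zeros(d), np.zeros(d), W, U, b)
```

In the paper's fix-subLSTM variant the forget gate is held at a fixed learned value rather than computed from the current input and hidden state.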
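The Experiment Setup row fully specifies the sequential-MNIST optimiser settings (RMSProp with momentum, learning rate 10⁻⁴, one hidden layer of 100 units), so a hedged PyTorch sketch of that configuration follows. A stock nn.LSTM stands in for the paper's subLSTM cell, and the momentum coefficient, readout layer, and dummy batch size are assumptions; the PTB/Wikitext-2 dropout rates and learning rates were tuned with Google Vizier and are not reproduced here.

```python
import torch
import torch.nn as nn

# Sketch of the sequential-MNIST setup: one recurrent layer of 100 units,
# trained with RMSProp plus momentum at a learning rate of 1e-4. A stock LSTM
# is used as a stand-in because the paper's subLSTM is not a built-in module.
rnn = nn.LSTM(input_size=1, hidden_size=100, num_layers=1, batch_first=True)
readout = nn.Linear(100, 10)  # 10 digit classes (readout layer is assumed)

optimizer = torch.optim.RMSprop(
    list(rnn.parameters()) + list(readout.parameters()),
    lr=1e-4,       # learning rate stated in the paper
    momentum=0.9,  # assumption: the paper says "with momentum" but gives no value
)

# Each 28x28 MNIST digit is unrolled into a length-784 pixel sequence.
x = torch.randn(8, 784, 1)          # dummy batch: (batch, time, features)
out, _ = rnn(x)
logits = readout(out[:, -1, :])     # classify from the final hidden state
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (8,)))
loss.backward()
optimizer.step()
```

For the language-modelling runs the table instead reports 2 hidden layers, truncated backpropagation over 35 steps, and a batch size of 20, with dropout rates and weight decay tuned per dataset (Tables 2 and 3 of the paper).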