Understanding and Controlling Memory in Recurrent Neural Networks
Authors: Doron Haviv, Alexander Rivkind, Omri Barak
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we utilize different training protocols, datasets and architectures to obtain a range of networks solving a delayed classification task with similar performance, alongside substantial differences in their ability to extrapolate for longer delays. We analyze the dynamics of the network's hidden state, and uncover the reasons for this difference. |
| Researcher Affiliation | Academia | (1) Faculty of Electrical Engineering, Technion, Israel Institute of Technology; (2) Network Biology Research Laboratory, Technion, Israel Institute of Technology; (3) Rappaport Faculty of Medicine, Technion, Israel Institute of Technology; (4) Currently at Weizmann Institute of Science, Israel. |
| Pseudocode | No | The paper provides mathematical equations for network units and speed calculation, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at: https://github.com/DoronHaviv/MemoryRNN |
| Open Datasets | Yes | The network was presented with a series of noisy images, among which appears a single target image (from MNIST or CIFAR-10) at time t_s. ... The total stimulation time is T_max = 20, and the network was requested to distinguish between |V| = 10 different classes of MNIST (LeCun et al., 2010) or CIFAR-10 (Krizhevsky et al.). |
| Dataset Splits | No | The paper mentions a 'nominal test-set' but does not provide specific details on how the dataset was split into training, validation, and test sets (e.g., percentages, counts, or explicit standard splits). |
| Hardware Specification | Yes | The Titan Xp used for this research was donated by the NVIDIA Corporation. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or specific libraries). |
| Experiment Setup | Yes | For MNIST, the network consists of a single recurrent layer of d = 200 gated recurrent units, an output layer of |V| + 1 = 11 neurons: |V| = 10 neurons for the different classes, and an additional neuron for the null indicator. The input layer has n + 1 neurons... For CIFAR-10, the network was expanded to d = 400 recurrent units, along with a convolutional front-end composed of three convolutional layers and two dense layers. ... The network was trained using the Adam optimizer (Kingma & Ba, 2014) with a soft-max cross-entropy loss function with an increased loss on reporting at t = t_a, in proportion to T_max. Full description of each protocol, including schedules and other hyper-parameters, is given in the supplemental code. |
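
As a reading aid for the task description quoted in the Open Datasets row (a single target image embedded in a sequence of noise frames, with T_max = 20 steps), the following is a minimal sketch of how one such trial could be generated. It assumes PyTorch, Gaussian pixel noise with standard deviation `noise_std`, a flattened image, and that the extra (n + 1)-th input channel acts as a report cue at time t_a; none of these details are confirmed by the quoted excerpts, and the authors' exact protocol lives in their supplemental code.

```python
import torch

def make_trial(image, t_s, t_a, T_max=20, noise_std=0.25):
    """Build one delayed-classification trial: T_max frames of noise with the
    flattened target image inserted at step t_s.

    Assumptions (not from the paper): Gaussian noise with noise_std, and the
    extra (n + 1)-th input channel used as a report cue raised at t_a.
    """
    n = image.numel()
    frames = noise_std * torch.randn(T_max, n + 1)
    frames[t_s, :n] = image.flatten()   # target image shown once, at t_s
    frames[:, n] = 0.0                  # cue channel silent by default
    frames[t_a, n] = 1.0                # assumed report cue
    return frames                       # shape: (T_max, n + 1)
```

A batch of such trials, each with its own t_s and t_a, would then be fed to a recurrent network of the kind sketched below.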
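
The Experiment Setup row can likewise be read as a compact model definition. The layer sizes (d = 200 gated recurrent units, |V| + 1 = 11 output neurons), the Adam optimizer, and the extra weight on the loss at the report time t_a come from the quoted text; the framework (PyTorch), the exact weighting scheme, and the learning rate are assumptions, and the CIFAR-10 convolutional front-end is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DelayedClassifier(nn.Module):
    """Single recurrent layer of d = 200 gated recurrent units with an
    output layer of |V| + 1 = 11 neurons (10 MNIST classes plus null)."""
    def __init__(self, n_inputs=28 * 28 + 1, d=200, n_classes=10):
        super().__init__()
        self.rnn = nn.GRU(n_inputs, d, batch_first=True)
        self.readout = nn.Linear(d, n_classes + 1)

    def forward(self, x):            # x: (batch, T_max, n_inputs)
        h, _ = self.rnn(x)           # hidden states at every time step
        return self.readout(h)       # per-step logits: (batch, T_max, 11)

def weighted_loss(logits, targets, t_a, T_max=20):
    """Soft-max cross-entropy over all time steps, with the loss at the
    report time t_a up-weighted in proportion to T_max (this particular
    weighting is an assumed reading of the paper's description)."""
    per_step = F.cross_entropy(logits.transpose(1, 2), targets,
                               reduction="none")        # (batch, T_max)
    weights = torch.ones_like(per_step)
    weights[torch.arange(targets.size(0)), t_a] = float(T_max)
    return (weights * per_step).mean()

model = DelayedClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
```

Here `targets` would hold the null class (index 10) at every step except t_a, where it holds the MNIST label; the schedules and remaining hyper-parameters are, as the paper states, only given in its supplemental code.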