Using Fast Weights to Attend to the Recent Past
Authors: Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the effectiveness of the fast associative memory, we first investigated the problems of associative retrieval (section 4.1) and MNIST classification (section 4.2). We compared fast weight models to regular RNNs and LSTM variants. We then applied the proposed fast weights to a facial expression recognition task using a fast associative memory model to store the results of processing at one level while examining a sequence of details at a finer level (section 4.3). Lastly, we show that fast weights can also be used effectively to implement reinforcement learning agents with memory (section 4.4). |
| Researcher Affiliation | Collaboration | Jimmy Ba, University of Toronto, jimmy@psi.toronto.edu; Geoffrey Hinton, University of Toronto and Google Brain, geoffhinton@google.com; Volodymyr Mnih, Google DeepMind, vmnih@google.com; Joel Z. Leibo, Google DeepMind, jzl@google.com; Catalin Ionescu, Google DeepMind, cdi@google.com |
| Pseudocode | No | The paper provides mathematical equations for its update rules and describes the model's steps, such as 'A(t) = λA(t−1) + η h(t)h(t)ᵀ' and 'h_{s+1}(t+1) = f([W h(t) + C x(t)] + A(t) h_s(t+1))', but it does not include explicitly labeled pseudocode or algorithm blocks (an illustrative sketch of these update rules is given after this table). |
| Open Source Code | No | The paper does not contain an explicit statement or a direct link to the open-source code for the methodology described within this paper. |
| Open Datasets | Yes | We evaluated the multi-level visual attention model on the MNIST handwritten digit dataset. We performed facial expression recognition tasks on the CMU Multi-PIE face database [Gross et al., 2010]. |
| Dataset Splits | Yes | We generated 100,000 training examples, 10,000 validation examples and 20,000 test examples. The hyper-parameters of the experiments were selected through grid search on the validation set. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, or cloud infrastructure specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software components, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | For the rest of the paper, fast weight models are trained using layer normalization and the outer product learning rule with fast learning rate of 0.5 and decay rate of 0.95, unless otherwise noted. All the models were trained using mini-batches of size 128 and the Adam optimizer [Kingma and Ba, 2014]. A description of the training protocols and the hyper-parameter settings we used can be found in the Appendix. |
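
For readers who want a concrete picture of the update rules quoted in the Pseudocode row, the following is a minimal NumPy sketch of one fast associative memory step. It is not the authors' code (none is released); the placement of layer normalization, the initialization of the inner-loop state, the number of settling steps, and all function and variable names are illustrative assumptions, while the decay rate λ = 0.95 and fast learning rate η = 0.5 follow the values quoted in the Experiment Setup row.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer normalization over the hidden vector (gain and bias omitted for brevity).
    return (x - x.mean()) / (x.std() + eps)

def fast_weights_step(h_prev, x_t, A_prev, W, C,
                      lam=0.95,   # decay rate of the fast weight matrix (quoted value)
                      eta=0.5,    # fast learning rate (quoted value)
                      n_inner=1,  # number of inner-loop settling steps (assumption)
                      f=np.tanh):
    """One time step of an RNN with a fast associative memory.

    Sketch of the two quoted update rules:
        A(t)         = lam * A(t-1) + eta * h(t) h(t)^T
        h_{s+1}(t+1) = f(LN([W h(t) + C x(t)] + A(t) h_s(t+1)))
    """
    # Decay the fast weights and write the current hidden state into them.
    A_t = lam * A_prev + eta * np.outer(h_prev, h_prev)

    # Sustained input from the slow weights and the current input vector.
    boundary = W @ h_prev + C @ x_t

    # Inner loop: let the hidden state settle under the fast weights.
    h_s = f(boundary)  # h_0(t+1); the exact initialization is an assumption
    for _ in range(n_inner):
        h_s = f(layer_norm(boundary + A_t @ h_s))
    return h_s, A_t
```

Whether layer normalization is applied inside every inner-loop step and how h_0(t+1) is initialized are details not quoted in this report, so the sketch makes one plausible choice.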
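The Experiment Setup row also pins down a few concrete training choices; gathered into a single configuration, they look roughly like the sketch below. Only the quoted values (fast learning rate 0.5, decay rate 0.95, layer normalization, mini-batch size 128, Adam optimizer) come from the paper; the remaining field is a placeholder, since the paper defers those settings to its Appendix and a grid search on the validation set.

```python
from dataclasses import dataclass

@dataclass
class FastWeightsTrainingConfig:
    # Values quoted in the Experiment Setup row.
    fast_learning_rate: float = 0.5   # eta in A(t) = lam*A(t-1) + eta*h(t)h(t)^T
    fast_decay_rate: float = 0.95     # lam
    use_layer_norm: bool = True
    batch_size: int = 128
    optimizer: str = "adam"           # Kingma and Ba, 2014
    # Placeholder: not quoted in the report; the paper's Appendix and a grid
    # search on the validation set determine the remaining hyper-parameters.
    slow_learning_rate: float = 1e-3
```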