Using Fast Weights to Attend to the Recent Past

Authors: Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of the fast associative memory, we first investigated the problems of associative retrieval (section 4.1) and MNIST classification (section 4.2). We compared fast weight models to regular RNNs and LSTM variants. We then applied the proposed fast weights to a facial expression recognition task using a fast associative memory model to store the results of processing at one level while examining a sequence of details at a finer level (section 4.3). Lastly, we show that fast weights can also be used effectively to implement reinforcement learning agents with memory (section 4.4).
Researcher Affiliation | Collaboration | Jimmy Ba (University of Toronto, jimmy@psi.toronto.edu); Geoffrey Hinton (University of Toronto and Google Brain, geoffhinton@google.com); Volodymyr Mnih (Google DeepMind, vmnih@google.com); Joel Z. Leibo (Google DeepMind, jzl@google.com); Catalin Ionescu (Google DeepMind, cdi@google.com)
Pseudocode | No | The paper provides mathematical equations for the update rules and describes the model's steps, such as A(t) = λA(t−1) + ηh(t)h(t)^T and h_{s+1}(t+1) = f([Wh(t) + Cx(t)] + A(t)h_s(t+1)), but it does not include explicitly labeled pseudocode or algorithm blocks. (A hedged code sketch of these update rules is given below the table.)
Open Source Code | No | The paper does not contain an explicit statement or a direct link to open-source code for the methodology it describes.
Open Datasets | Yes | We evaluated the multi-level visual attention model on the MNIST handwritten digit dataset. We performed facial expression recognition tasks on the CMU Multi-PIE face database [Gross et al., 2010].
Dataset Splits | Yes | We generated 100,000 training examples, 10,000 validation examples and 20,000 test examples. The hyper-parameters of the experiments were selected through grid search on the validation set.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, or cloud infrastructure) used to run the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software components, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | For the rest of the paper, fast weight models are trained using layer normalization and the outer product learning rule with fast learning rate of 0.5 and decay rate of 0.95, unless otherwise noted. All the models were trained using mini-batches of size 128 and the Adam optimizer [Kingma and Ba, 2014]. A description of the training protocols and the hyper-parameter settings we used can be found in the Appendix. (These reported settings are plugged into the usage sketch below the table.)
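
The update rules quoted in the Pseudocode row are compact enough to sketch in code. Below is a minimal NumPy rendering of one fast-weights time step, written for this report rather than taken from the authors: the tanh nonlinearity, the parameter-free layer normalization, and the default of a single inner-loop step are assumptions; only the two quoted update equations come from the paper.

```python
# Minimal sketch of the fast-weights update rules quoted above (assumptions noted).
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize a hidden-state vector to zero mean and unit variance (no learned gain/bias).
    return (x - x.mean()) / (x.std() + eps)

def fast_weights_step(x_t, h_prev, A_prev, W, C, lam=0.95, eta=0.5, inner_steps=1):
    """One time step of a fast-weights RNN.

    A(t) = lam * A(t-1) + eta * h(t) h(t)^T              (fast memory update)
    h_{s+1}(t+1) = f([W h(t) + C x(t)] + A(t) h_s(t+1))  (inner-loop refinement)
    """
    # Decay the fast memory and store the current hidden state via the outer-product rule.
    A_t = lam * A_prev + eta * np.outer(h_prev, h_prev)

    # Preliminary next hidden state from the slow weights (the boundary term).
    boundary = W @ h_prev + C @ x_t
    h_s = np.tanh(layer_norm(boundary))

    # Let the fast memory attend to recent hidden states for a few inner steps.
    for _ in range(inner_steps):
        h_s = np.tanh(layer_norm(boundary + A_t @ h_s))

    return h_s, A_t
```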
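
For the Experiment Setup row, the short usage sketch below plugs the reported fast learning rate (0.5) and decay rate (0.95) into fast_weights_step from the previous block. The dimensions, random slow weights, and sequence length are arbitrary assumptions, and the quoted training protocol of Adam with mini-batches of 128 for the slow weights is not reproduced here.

```python
# Usage sketch with the fast-weight settings quoted in the Experiment Setup row.
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_h = 8, 16

# Slow weights (learned with Adam in the paper; random placeholders here).
W = rng.normal(scale=0.1, size=(dim_h, dim_h))
C = rng.normal(scale=0.1, size=(dim_h, dim_in))

h = np.zeros(dim_h)            # hidden state h(t)
A = np.zeros((dim_h, dim_h))   # fast weight matrix A(t)

# Process a short synthetic sequence with eta = 0.5 and lambda = 0.95.
for t in range(5):
    x_t = rng.normal(size=dim_in)
    h, A = fast_weights_step(x_t, h, A, W, C, lam=0.95, eta=0.5, inner_steps=1)

print(h.shape, A.shape)  # -> (16,) (16, 16)
```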