Using Fast Weights to Attend to the Recent Past

Authors: Jimmy Ba, Geoffrey E. Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To demonstrate the effectiveness of the fast associative memory, we first investigated the problems of associative retrieval (section 4.1) and MNIST classification (section 4.2). We compared fast weight models to regular RNNs and LSTM variants. We then applied the proposed fast weights to a facial expression recognition task using a fast associative memory model to store the results of processing at one level while examining a sequence of details at a finer level (section 4.3). Lastly, we show that fast weights can also be used effectively to implement reinforcement learning agents with memory (section 4.4).
Researcher Affiliation | Collaboration | Jimmy Ba (University of Toronto, jimmy@psi.toronto.edu); Geoffrey Hinton (University of Toronto and Google Brain, geoffhinton@google.com); Volodymyr Mnih (Google DeepMind, vmnih@google.com); Joel Z. Leibo (Google DeepMind, jzl@google.com); Catalin Ionescu (Google DeepMind, cdi@google.com)
Pseudocode | No | The paper provides mathematical equations for the update rules and describes the model's steps, such as A(t) = λA(t−1) + ηh(t)h(t)^T and h_{s+1}(t+1) = f([Wh(t) + Cx(t)] + A(t)h_s(t+1)), but it does not include explicitly labeled pseudocode or algorithm blocks. (A hedged code sketch of these update rules is given below the table.)
Open Source Code | No | The paper does not contain an explicit statement or a direct link to open-source code for the methodology it describes.
Open Datasets | Yes | We evaluated the multi-level visual attention model on the MNIST handwritten digit dataset. We performed facial expression recognition tasks on the CMU Multi-PIE face database [Gross et al., 2010].
Dataset Splits | Yes | We generated 100,000 training examples, 10,000 validation examples and 20,000 test examples. The hyper-parameters of the experiments were selected through grid search on the validation set.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, or cloud infrastructure) used to run the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software components, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | For the rest of the paper, fast weight models are trained using layer normalization and the outer product learning rule with fast learning rate of 0.5 and decay rate of 0.95, unless otherwise noted. All the models were trained using mini-batches of size 128 and the Adam optimizer [Kingma and Ba, 2014]. A description of the training protocols and the hyper-parameter settings we used can be found in the Appendix. (These reported settings are plugged into the usage sketch below the table.)
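
The update rules quoted in the Pseudocode row are compact enough to sketch in code. Below is a minimal NumPy rendering of one fast-weights time step, written for this report rather than taken from the authors: the tanh nonlinearity, the parameter-free layer normalization, and the default of a single inner-loop step are assumptions; only the two quoted update equations come from the paper.

```python
# Minimal sketch of the fast-weights update rules quoted above (assumptions noted).
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize a hidden-state vector to zero mean and unit variance (no learned gain/bias).
    return (x - x.mean()) / (x.std() + eps)

def fast_weights_step(x_t, h_prev, A_prev, W, C, lam=0.95, eta=0.5, inner_steps=1):
    """One time step of a fast-weights RNN.

    A(t) = lam * A(t-1) + eta * h(t) h(t)^T              (fast memory update)
    h_{s+1}(t+1) = f([W h(t) + C x(t)] + A(t) h_s(t+1))  (inner-loop refinement)
    """
    # Decay the fast memory and store the current hidden state via the outer-product rule.
    A_t = lam * A_prev + eta * np.outer(h_prev, h_prev)

    # Preliminary next hidden state from the slow weights (the boundary term).
    boundary = W @ h_prev + C @ x_t
    h_s = np.tanh(layer_norm(boundary))

    # Let the fast memory attend to recent hidden states for a few inner steps.
    for _ in range(inner_steps):
        h_s = np.tanh(layer_norm(boundary + A_t @ h_s))

    return h_s, A_t
```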
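
For the Experiment Setup row, the short usage sketch below plugs the reported fast learning rate (0.5) and decay rate (0.95) into fast_weights_step from the previous block. The dimensions, random slow weights, and sequence length are arbitrary assumptions, and the quoted training protocol of Adam with mini-batches of 128 for the slow weights is not reproduced here.

```python
# Usage sketch with the fast-weight settings quoted in the Experiment Setup row.
import numpy as np

rng = np.random.default_rng(0)
dim_in, dim_h = 8, 16

# Slow weights (learned with Adam in the paper; random placeholders here).
W = rng.normal(scale=0.1, size=(dim_h, dim_h))
C = rng.normal(scale=0.1, size=(dim_h, dim_in))

h = np.zeros(dim_h)            # hidden state h(t)
A = np.zeros((dim_h, dim_h))   # fast weight matrix A(t)

# Process a short synthetic sequence with eta = 0.5 and lambda = 0.95.
for t in range(5):
    x_t = rng.normal(size=dim_in)
    h, A = fast_weights_step(x_t, h, A, W, C, lam=0.95, eta=0.5, inner_steps=1)

print(h.shape, A.shape)  # -> (16,) (16, 16)
```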