Learning Associative Inference Using Fast Weight Memory

Authors: Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Section 4 demonstrates the generality of our method through experiments in the supervised, self-supervised, and meta-reinforcement learning setting."
Researcher Affiliation | Collaboration | Imanol Schlag, The Swiss AI Lab IDSIA / USI / SUPSI, imanol@idsia.ch; Tsendsuren Munkhdalai, Microsoft Research, tsendsuren.munkhdalai@microsoft.com; Jürgen Schmidhuber, The Swiss AI Lab IDSIA / USI / SUPSI, juergen@idsia.ch
Pseudocode | Yes | "Listing 1: Python3 code to sample new environments such that any state is reachable by any other state." (An illustrative sketch of such a sampler follows the table.)
Open Source Code | Yes | "Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public"
Open Datasets | Yes | "Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public" and "Penn Treebank (PTB; Mikolov et al. (2010)) or WikiText-2 (WT2; Merity et al. (2017))" and "We provide the preprocessed catbAbI data together with our code so future work can compare using the same validation and test sequence."
Dataset Splits | Yes | "We used the same train/test/valid split of the data as in regular bAbI." and "Table 3: Statistics of the catbAbI dataset based on our preprocessing of the regular bAbI data." (train: ~5M tokens, 56,376 stories, 179,909 questions; valid: ~560k tokens, 6,245 stories, 19,907 questions; test: ~560k tokens, 6,247 stories, 19,910 questions)
Hardware Specification | Yes | "limited the amount of GPU memory to ~16GB for practical reasons." and "We thank NVIDIA Corporation for donating several DGX machines, and IBM for donating a Minsky machine."
Software Dependencies | No | The paper mentions software such as Python3, PyTorch, a Transformer-XL implementation, and the Adam optimizer, but does not provide specific version numbers for these components.
Experiment Setup | Yes | "We truncate backpropagation through time (tBPTT) to 200 tokens for all models and limited the amount of GPU memory to ~16GB for practical reasons." and "For every model, we performed a hyperparameter search in QA mode over the first 3k steps of which a smaller selection was trained for 30-60k steps." For example, for FWM: "We set d_LSTM = 256, d_FWM = 32, N_r = 3 and experimented with two seeds for batch sizes 64, 128 and learning rates 0.0001, 0.00025, 0.0005, 0.001, 0.002." More details are provided in Sections F.1 to F.4. (A sketch of this sweep follows the table.)
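
The paper's Listing 1 is not reproduced on this page. As an illustration only, the following is a minimal Python3 sketch of one way to sample environments in which every state is reachable from every other state, assuming the environment can be represented as a directed graph over discrete states; the function name, arguments, and graph representation are assumptions, and the paper's actual listing may differ.

import random

def sample_environment(num_states, extra_edges, rng):
    """Sample a directed graph over num_states states that is strongly
    connected, i.e. every state is reachable from every other state.
    Illustrative strategy only: connect all states along a random cycle
    (which already guarantees mutual reachability), then add extra
    random edges for variety."""
    order = list(range(num_states))
    rng.shuffle(order)

    # A cycle s0 -> s1 -> ... -> s_{n-1} -> s0 makes every state reachable.
    edges = set()
    for i in range(num_states):
        edges.add((order[i], order[(i + 1) % num_states]))

    # Add further random edges, avoiding self-loops and duplicates; the
    # min(...) bound prevents asking for more edges than can exist.
    max_edges = num_states * (num_states - 1)
    while len(edges) < min(num_states + extra_edges, max_edges):
        src, dst = rng.sample(range(num_states), 2)
        edges.add((src, dst))

    # Return an adjacency-list view of the sampled environment.
    adjacency = {s: [] for s in range(num_states)}
    for src, dst in sorted(edges):
        adjacency[src].append(dst)
    return adjacency

# Example: one sampled environment with 8 states.
env = sample_environment(num_states=8, extra_edges=8, rng=random.Random(0))
print(env)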
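
The FWM hyperparameter search quoted in the Experiment Setup row amounts to a small grid. The Python3 sketch below simply enumerates that grid; the helper name build_configs, the dictionary keys, and the concrete seed values are hypothetical, while the numeric settings are taken from the quote.

from itertools import product

# Fixed settings quoted for FWM on catbAbI; the key names are assumptions.
FIXED = {"d_lstm": 256, "d_fwm": 32, "n_r": 3, "tbptt_len": 200}
BATCH_SIZES = [64, 128]
LEARNING_RATES = [0.0001, 0.00025, 0.0005, 0.001, 0.002]
SEEDS = [0, 1]  # "two seeds"; the actual seed values are not reported

def build_configs():
    """Enumerate the 2 x 5 x 2 = 20 runs implied by the quoted search."""
    for batch_size, lr, seed in product(BATCH_SIZES, LEARNING_RATES, SEEDS):
        yield dict(FIXED, batch_size=batch_size, learning_rate=lr, seed=seed)

for cfg in build_configs():
    print(cfg)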