Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Associative Inference Using Fast Weight Memory

Authors: Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber

ICLR 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Section 4 demonstrates the generality of our method through experiments in the supervised, self-supervised, and meta-reinforcement learning setting."
Researcher Affiliation | Collaboration | "Imanol Schlag, The Swiss AI Lab IDSIA / USI / SUPSI (EMAIL); Tsendsuren Munkhdalai, Microsoft Research (EMAIL); Jürgen Schmidhuber, The Swiss AI Lab IDSIA / USI / SUPSI (EMAIL)"
Pseudocode | Yes | "Listing 1: Python3 code to sample new environments such that any state is reachable by any other state."
Open Source Code | Yes | "Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public"
Open Datasets | Yes | "Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public" and "Penn Treebank (PTB; Mikolov et al. (2010)) or WikiText-2 (WT2; Merity et al. (2017))" and "We provide the preprocessed catbAbI data together with our code so future work can compare using the same validation and test sequence."
Dataset Splits | Yes | "We used the same train/test/valid split of the data as in regular bAbI." and "Table 3: Statistics of the catbAbI dataset based on our preprocessing of the regular bAbI data: train ~5M tokens, 56,376 stories, 179,909 questions; valid ~560k tokens, 6,245 stories, 19,907 questions; test ~560k tokens, 6,247 stories, 19,910 questions."
Hardware Specification | Yes | "limited the amount of GPU memory to ~16GB for practical reasons" and "We thank NVIDIA Corporation for donating several DGX machines, and IBM for donating a Minsky machine."
Software Dependencies | No | The paper mentions software such as Python3, PyTorch, a Transformer-XL implementation, and the Adam optimizer, but does not provide version numbers for these components.
Experiment Setup | Yes | "We truncate backpropagation through time (tBPTT) to 200 tokens for all models and limited the amount of GPU memory to ~16GB for practical reasons. For every model, we performed a hyperparameter search in QA mode over the first 3k steps of which a smaller selection was trained for 30-60k steps." For example, for the FWM: "We set d_LSTM = 256, d_FWM = 32, N_r = 3 and experimented with two seeds for batch sizes 64, 128 and learning rates 0.0001, 0.00025, 0.0005, 0.001, 0.002." (More details are provided in sections F.1 to F.4.)
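The experiment-setup quote above fully specifies the FWM search grid: fixed model sizes, two seeds, two batch sizes, and five learning rates. A minimal sketch of how that grid enumerates (field names like `d_lstm` and `tbptt_len` are our own labels, not the paper's code):

```python
from itertools import product

# Grid taken from the quoted setup; the two seed values are illustrative.
batch_sizes = [64, 128]
learning_rates = [0.0001, 0.00025, 0.0005, 0.001, 0.002]
seeds = [0, 1]  # "experimented with two seeds"

configs = [
    dict(d_lstm=256, d_fwm=32, n_read=3,   # fixed: d_LSTM, d_FWM, N_r
         tbptt_len=200,                    # tBPTT truncated to 200 tokens
         batch_size=bs, lr=lr, seed=seed)
    for bs, lr, seed in product(batch_sizes, learning_rates, seeds)
]
print(len(configs))  # 2 batch sizes * 5 learning rates * 2 seeds -> prints 20
```

Each of the 20 configurations would first be screened over the initial 3k steps, with a smaller selection trained for the full 30-60k steps, as the quote describes.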