Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Associative Inference Using Fast Weight Memory
Authors: Imanol Schlag, Tsendsuren Munkhdalai, Jürgen Schmidhuber
ICLR 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 demonstrates the generality of our method through experiments in the supervised, self-supervised, and meta-reinforcement learning setting. |
| Researcher Affiliation | Collaboration | Imanol Schlag, The Swiss AI Lab IDSIA / USI / SUPSI, EMAIL; Tsendsuren Munkhdalai, Microsoft Research, EMAIL; Jürgen Schmidhuber, The Swiss AI Lab IDSIA / USI / SUPSI, EMAIL |
| Pseudocode | Yes | Listing 1: Python3 code to sample new environments such that any state is reachable by any other state. |
| Open Source Code | Yes | Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public |
| Open Datasets | Yes | "Source code and data used in this paper is available at github.com/ischlag/Fast-Weight-Memory-public" and "Penn Treebank (PTB; Mikolov et al. (2010)) or WikiText-2 (WT2; Merity et al. (2017))" and "We provide the preprocessed catbAbI data together with our code so future work can compare using the same validation and test sequence." |
| Dataset Splits | Yes | "We used the same train/test/valid split of the data as in regular bAbI." and "Table 3: Statistics of the catbAbI dataset based on our preprocessing of the regular bAbI data. (subset: number of tokens / number of stories / number of questions) train: ~5M / 56,376 / 179,909; valid: ~560k / 6,245 / 19,907; test: ~560k / 6,247 / 19,910" |
| Hardware Specification | Yes | "limited the amount of GPU memory to ~16GB for practical reasons." and "We thank NVIDIA Corporation for donating several DGX machines, and IBM for donating a Minsky machine." |
| Software Dependencies | No | The paper mentions software like Python3, PyTorch, Transformer-XL implementation, and Adam optimizer but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We truncate backpropagation through time (tBPTT) to 200 tokens for all models and limited the amount of GPU memory to ~16GB for practical reasons. For every model, we performed a hyperparameter search in QA mode over the first 3k steps of which a smaller selection was trained for 30-60k steps. For example, for FWM: 'We set d_LSTM = 256, d_FWM = 32, N_r = 3 and searched experimented with two seeds for batch sizes 64, 128 and learning rates 0.0001, 0.00025, 0.0005, 0.001, 0.002.' (More details are provided in sections F.1 to F.4) |
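
The FWM search quoted in the Experiment Setup row can be written out as an explicit grid. The following is a minimal sketch, not the authors' code: it assumes the search is a full Cartesian product over batch size, learning rate, and seed, which the quoted text does not confirm. The seed values `0` and `1`, the dictionary keys, and the function name `fwm_search_grid` are hypothetical, since the paper excerpt only says "two seeds".

```python
from itertools import product

# Values quoted in the Experiment Setup row (FWM example).
FIXED = {"d_lstm": 256, "d_fwm": 32, "n_r": 3}
BATCH_SIZES = [64, 128]
LEARNING_RATES = [0.0001, 0.00025, 0.0005, 0.001, 0.002]
# Assumption: the excerpt says "two seeds" but does not list their values.
SEEDS = [0, 1]


def fwm_search_grid():
    """Return one config dict per (batch size, learning rate, seed) combination."""
    return [
        {**FIXED, "batch_size": bs, "lr": lr, "seed": seed}
        for bs, lr, seed in product(BATCH_SIZES, LEARNING_RATES, SEEDS)
    ]


configs = fwm_search_grid()
print(len(configs))  # 2 batch sizes * 5 learning rates * 2 seeds = 20
```

Under this reading, the short 3k-step search covers 20 runs for FWM, of which, per the quoted text, a smaller selection was trained for 30-60k steps.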