Explain My Surprise: Learning Efficient Long-Term Memory by predicting uncertain outcomes

Authors: Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Supervised Learning Experiments; 6 Reinforcement Learning Experiments; Figure 2: Final performance on supervised learning tasks.
Researcher Affiliation | Academia | Artyom Sorokin, AIRI, MIPT, Moscow, Russia, asorokin@airi.net; Nazar Buzun, AIRI, Moscow, Russia, buzun@airi.net; Leonid Pugachev, MIPT, Dolgoprudny, Russia, puleon@mail.ru; Mikhail Burtsev, AIRI, MIPT, Moscow, Russia, burtsev@airi.net
Pseudocode | Yes | A pseudocode for RNN MemUP training is shown in Appendix G (see the training-loop sketch after this table).
Open Source Code | Yes | For more information you can look at our implementation (link in Appendix E). The second version is also implemented in the code (link in Appendix E).
Open Datasets | Yes | For evaluation and comparison of our method we use four tasks: Copy [40], Scattered copy, Add [2] and permuted sequential MNIST (pMNIST) [41].
Dataset Splits | No | The paper specifies training and test dataset sizes (e.g., 'For Copy, Scattered copy and Add tasks train and test datasets have sizes 10K and 1K, for pMNIST 60K and 10K'), but it does not explicitly provide details about a separate validation split or its size/percentage.
Hardware Specification | No | The paper mentions 'GPU RAM' limitations for a baseline model ('it didn't fit into our GPU RAM') but does not provide specific details on the GPU models, CPU, or other hardware used for running their own experiments.
Software Dependencies | No | The paper mentions software like 'pytorch library' and 'RLPyt library [46]' but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | In the majority of supervised learning experiments we set the rollout length r between 10 and 60 steps. In all Reinforcement Learning experiments r = 1, i.e. the recurrent memory is trained without actually using backpropagation through time. In all experiments gθ is a simple LSTM network with an additional input encoder. The encoder E_rnn consists of a single fully-connected layer followed by two LSTM layers with 128 hidden units and dropout probability 0.1 (a sketch of this encoder is given after the table). In the Add task the model is trained with MSE loss... For MemUP we use the discounted future reward with γ = 0.8 as a prediction target. The memory module gθ is trained with a rollout length of 1 step. In Vizdoom-Two-Colors we set the number of targets K = 3.
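
The pseudocode itself is in Appendix G of the paper; the following is only a minimal, hypothetical Python/PyTorch sketch of a MemUP-style training rollout assembled from the details quoted above (an LSTM memory module gθ, truncated rollouts of length r, and the K most-uncertain future steps as prediction targets). The names memup_train_sequence, predictor and uncertainty, and the use of an MSE objective throughout, are assumptions; the paper's uncertainty-detection module and its training are omitted.

# Hypothetical sketch, not the authors' Appendix G pseudocode.
import torch
import torch.nn.functional as F

def memup_train_sequence(memory_lstm, predictor, optimizer,
                         obs_seq, targets, uncertainty,
                         rollout_len=10, top_k=3):
    """Process one sequence in truncated rollouts of length `rollout_len`; at each
    step, train the memory state to predict the `top_k` future targets with the
    highest estimated uncertainty (the "surprising" outcomes)."""
    hidden = None
    seq_len = obs_seq.shape[0]
    for start in range(0, seq_len, rollout_len):
        chunk = obs_seq[start:start + rollout_len]                 # (r, obs_dim)
        states, hidden = memory_lstm(chunk.unsqueeze(1), hidden)   # (r, 1, hid)
        loss = 0.0
        for i in range(chunk.shape[0]):
            t = start + i
            # indices of the K future steps with the largest uncertainty estimate
            k = min(top_k, seq_len - t)
            future_idx = t + torch.topk(uncertainty[t:], k).indices
            pred = predictor(states[i, 0])                         # predict targets from the memory state
            loss = loss + F.mse_loss(pred.unsqueeze(0).expand(k, -1),
                                     targets[future_idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        hidden = tuple(h.detach() for h in hidden)                 # truncate BPTT between rollouts

With r = 1, as in the reinforcement learning experiments quoted above, each rollout covers a single step, so the memory is trained without backpropagation through time.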
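
For the experiment-setup row, a minimal PyTorch sketch of the described encoder E_rnn: a single fully-connected layer followed by two LSTM layers with 128 hidden units and dropout probability 0.1. The input width, FC width, and the ReLU activation are assumptions not stated in the excerpt.

# Minimal encoder sketch under the assumptions stated above.
import torch
import torch.nn as nn

class RNNEncoder(nn.Module):
    def __init__(self, input_dim, fc_dim=128, hidden_dim=128):
        super().__init__()
        self.fc = nn.Linear(input_dim, fc_dim)            # single fully-connected input layer
        self.lstm = nn.LSTM(fc_dim, hidden_dim,
                            num_layers=2, dropout=0.1)    # two stacked LSTM layers

    def forward(self, x, hidden=None):
        # x: (seq_len, batch, input_dim) -> per-step memory states and final hidden state
        out, hidden = self.lstm(torch.relu(self.fc(x)), hidden)
        return out, hidden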