Explain My Surprise: Learning Efficient Long-Term Memory by predicting uncertain outcomes
Authors: Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Supervised Learning Experiments; 6 Reinforcement Learning Experiments; Figure 2: Final performance on supervised learning tasks. |
| Researcher Affiliation | Academia | Artyom Sorokin (AIRI, MIPT, Moscow, Russia, asorokin@airi.net); Nazar Buzun (AIRI, Moscow, Russia, buzun@airi.net); Leonid Pugachev (MIPT, Dolgoprudny, Russia, puleon@mail.ru); Mikhail Burtsev (AIRI, MIPT, Moscow, Russia, burtsev@airi.net) |
| Pseudocode | Yes | A pseudocode for RNN MemUP training is shown in Appendix G. |
| Open Source Code | Yes | For more information you can look at our implementation (link in Appendix E). The second version is also implemented in the code (link in Appendix E). |
| Open Datasets | Yes | For evaluation and comparison of our method we use four tasks: Copy [40], Scattered copy, Add [2] and permuted sequential MNIST (pMNIST) [41]. |
| Dataset Splits | No | The paper specifies training and test dataset sizes (e.g., 'For Copy, Scattered copy and Add tasks train and test datasets have sizes 10K and 1K, for pMNIST 60K and 10K'), but it does not explicitly provide details about a separate validation split or its size/percentage. |
| Hardware Specification | No | The paper mentions 'GPU RAM' limitations for a baseline model ('it didn't fit into our GPU RAM') but does not provide specific details on the GPU models, CPU, or other hardware used for running their own experiments. |
| Software Dependencies | No | The paper mentions software like 'pytorch library' and 'RLPyt library [46]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | In the majority of supervised learning experiments we set rollout length r between 10 and 60 steps. In all Reinforcement Learning experiments r = 1, i.e. recurrent memory is trained without actually using backpropagation through time. In all experiments g_θ is a simple LSTM network with an additional input encoder. The encoder E_rnn consists of a single fully-connected layer followed by two LSTM layers with 128 hidden units and dropout probability 0.1. In the Add task the model is trained with MSE loss... For MemUP we use the discounted future reward with γ = 0.8 as a prediction target. The memory module g_θ is trained with a rollout length of 1 step. In Vizdoom-Two-Colors we set the number of targets K = 3. |
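
To make the quoted setup concrete, the following is a minimal PyTorch sketch of a memory module matching the description in the Experiment Setup row (a single fully-connected encoder layer followed by two LSTM layers with 128 hidden units and dropout 0.1), together with the discounted future reward target used in the RL experiments (γ = 0.8). Class names, argument names, input dimensions, and the ReLU activation are illustrative assumptions and are not taken from the authors' released implementation (linked in Appendix E of the paper).

```python
import torch
import torch.nn as nn


class MemoryModule(nn.Module):
    """Hypothetical sketch of the memory module g_theta described in the paper:
    a single fully-connected input encoder followed by two LSTM layers
    with 128 hidden units and dropout probability 0.1."""

    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)  # single FC encoder layer
        self.rnn = nn.LSTM(
            hidden_dim, hidden_dim,
            num_layers=2, dropout=0.1,  # two LSTM layers, 128 hidden units
            batch_first=True,
        )

    def forward(self, x, state=None):
        # x: (batch, time, input_dim) -> per-step memory vectors and final LSTM state.
        # The ReLU after the encoder is an assumption, not specified in the paper.
        h = torch.relu(self.encoder(x))
        out, state = self.rnn(h, state)
        return out, state


def discounted_return_targets(rewards: torch.Tensor, gamma: float = 0.8) -> torch.Tensor:
    """Discounted future reward used as the RL prediction target (gamma = 0.8 in the paper).

    rewards: (batch, time) tensor; returns a tensor of the same shape where
    target[t] = r[t] + gamma * r[t+1] + gamma^2 * r[t+2] + ...
    """
    targets = torch.zeros_like(rewards)
    running = torch.zeros_like(rewards[:, 0])
    for t in reversed(range(rewards.shape[1])):
        running = rewards[:, t] + gamma * running
        targets[:, t] = running
    return targets
```

As a usage example, `MemoryModule(input_dim=64)` applied to a `(batch, time, 64)` observation tensor yields one 128-dimensional memory vector per step, and `discounted_return_targets(rewards)` produces the regression target for those steps; with rollout length r = 1, gradients would flow through only a single step of the LSTM rather than the full sequence.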