AMRL: Aggregated Memory For Reinforcement Learning
Authors: Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline that has the same number of parameters and by 9% over a stronger baseline that has far more parameters. In Section 4, we systematically evaluate how the sources of noise that affect RL agents affect the sample efficiency of AMRL and baseline approaches. We devise a series of experiments in two domains, (1) a symbolic maze domain and (2) 3D mazes in the game Minecraft. |
| Researcher Affiliation | Industry | Jacob Beck*, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann Microsoft Research, Cambridge, UK *Jacob Beck@alumni.brown.edu, Firstname.Lastname@microsoft.com |
| Pseudocode | No | The paper does not contain any blocks or figures explicitly labeled as "Pseudocode" or "Algorithm." |
| Open Source Code | No | The paper mentions using "open-source Project Malmo" but does not provide a link or statement confirming that the authors' own AMRL implementation or associated source code is publicly available. |
| Open Datasets | No | The paper describes creating environments (TMaze, Minecraft) for experiments and does not refer to using a publicly available or open dataset with a direct link, DOI, or specific citation for data access. It mentions "open-source Project Malmo" for creating Minecraft environments, which is a platform, not a dataset. |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits. It mentions training parameters like "mini-batch size 200, train batch size 4,000" which refer to training process parameters, not data partitioning for validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | Yes | Software: We used Ray version 0.6.2 (Liang et al., 2018). We used Python 3.5.2 for all experiments not in the appendix, while Python 3.5.5 was used for some earlier experiments in the appendix. Python 3.6.8 was used for some empirical analysis. |
| Experiment Setup | Yes | For learning rates, the best out of 5e-3, 5e-4, and 5e-5 is reported in each experiment, over 5 initializations each. All LSTMs were size 256. DNC memory size is 16 (slots), word size 16 (16 floats), 4 read heads, 1 write head, and a 256-LSTM controller. For models with image input, FF1 consists of 2 convolutional layers: a (4,4) kernel with stride 2 and 16 output channels, then a (4,4) kernel with stride 2 and 32 output channels. This is flattened, and a fully-connected layer of size 256 follows. For training our PPO agent: mini-batch size 200, train batch size 4,000, num sgd iter 30, gamma .98. |
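The two strided convolutions in the FF1 encoder above fix the input size of the 256-unit fully-connected layer. A minimal sketch of that shape arithmetic, assuming a hypothetical 84x84 input image and zero padding (the report does not state the input resolution):

```python
# Shape arithmetic for the FF1 image encoder described above:
# two (4,4)-kernel, stride-2 convolutions (16 then 32 output channels),
# followed by a flatten. The 84x84 input resolution is an assumption.

def conv2d_out(size, kernel=4, stride=2, padding=0):
    """Output spatial size of one convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def ff1_flat_size(height, width, out_channels=32):
    """Flattened feature count after both conv layers."""
    h = conv2d_out(conv2d_out(height))
    w = conv2d_out(conv2d_out(width))
    return h * w * out_channels

# With an assumed 84x84 input: 84 -> 41 -> 19 per spatial dimension,
# so the fully-connected layer of size 256 would see 19 * 19 * 32 inputs.
print(ff1_flat_size(84, 84))  # → 11552
```

The flattened size grows quadratically with input resolution, so the first fully-connected layer dominates the encoder's parameter count for large images.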