AMRL: Aggregated Memory For Reinforcement Learning
Authors: Jacob Beck, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline that has the same number of parameters and by 9% over a stronger baseline that has far more parameters. In Section 4, we systematically evaluate how the sources of noise that affect RL agents affect the sample efficiency of AMRL and baseline approaches. We devise a series of experiments in two domains, (1) a symbolic maze domain and (2) 3D mazes in the game Minecraft. |
| Researcher Affiliation | Industry | Jacob Beck*, Kamil Ciosek, Sam Devlin, Sebastian Tschiatschek, Cheng Zhang, Katja Hofmann Microsoft Research, Cambridge, UK *Jacob Beck@alumni.brown.edu, Firstname.Lastname@microsoft.com |
| Pseudocode | No | The paper does not contain any blocks or figures explicitly labeled as "Pseudocode" or "Algorithm." |
| Open Source Code | No | The paper mentions using "open-source Project Malmo" but does not provide a link or statement confirming that the authors' own AMRL implementation or associated source code is publicly available. |
| Open Datasets | No | The paper describes creating environments (TMaze, Minecraft) for experiments and does not refer to using a publicly available or open dataset with a direct link, DOI, or specific citation for data access. It mentions "open-source Project Malmo" for creating Minecraft environments, which is a platform, not a dataset. |
| Dataset Splits | No | The paper does not explicitly state training/validation/test dataset splits. It mentions training parameters like "mini-batch size 200, train batch size 4,000" which refer to training process parameters, not data partitioning for validation. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instances. |
| Software Dependencies | Yes | Software: We used Ray version 0.6.2 (Liang et al., 2018). We used Python 3.5.2 for all experiments not in the appendix, while Python 3.5.5 was used for some earlier experiments in the appendix. Python 3.6.8 was used for some empirical analysis. |
| Experiment Setup | Yes | For learning rates, the best out of 5e-3, 5e-4, and 5e-5 is reported in each experiment, over 5 initializations each. All LSTMs were size 256. DNC memory size is 16 (slots), word size 16 (16 floats), 4 read heads, 1 write head, and a 256-LSTM controller. For models with image input, FF1 consists of 2 convolutional layers: a (4,4) kernel with stride 2 and 16 output channels, then a (4,4) kernel with stride 2 and 32 output channels. This is flattened, and a fully-connected layer of size 256 follows. For training our PPO agent: mini-batch size 200, train batch size 4,000, num sgd iter 30, gamma .98. |
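The two strided convolutions in the FF1 encoder above fix the input size of the 256-unit fully-connected layer. A minimal sketch of that shape arithmetic, assuming a hypothetical 84x84 input image and zero padding (the report does not state the input resolution):

```python
# Shape arithmetic for the FF1 image encoder described above:
# two (4,4)-kernel, stride-2 convolutions (16 then 32 output channels),
# followed by a flatten. The 84x84 input resolution is an assumption.

def conv2d_out(size, kernel=4, stride=2, padding=0):
    """Output spatial size of one convolution (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

def ff1_flat_size(height, width, out_channels=32):
    """Flattened feature count after both conv layers."""
    h = conv2d_out(conv2d_out(height))
    w = conv2d_out(conv2d_out(width))
    return h * w * out_channels

# With an assumed 84x84 input: 84 -> 41 -> 19 per spatial dimension,
# so the fully-connected layer of size 256 would see 19 * 19 * 32 inputs.
print(ff1_flat_size(84, 84))  # → 11552
```

The flattened size grows quadratically with input resolution, so the first fully-connected layer dominates the encoder's parameter count for large images.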