Transient Non-stationarity and Generalisation in Deep Reinforcement Learning
Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom. |
| Researcher Affiliation | Collaboration | University of Oxford; Delft University of Technology. Corresponding author: Maximilian Igl (maximilian.igl@gmail.com), now at DeepMind, London. |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for parallel ITER (a hedged sketch of the relearning loop appears below the table). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a direct link to a code repository for the described methodology. It cites external resources (e.g., Gym-Minigrid, Boxoban levels) which are open-source, but not its own implementation. |
| Open Datasets | Yes | We use the CIFAR-10 dataset for image classification (Krizhevsky et al., 2009) and artificially inject non-stationarity. [...] In the following, we evaluate ITER on the Multiroom (Chevalier-Boisvert & Willems, 2018) and on several environments from the ProcGen (Cobbe et al., 2019a) benchmark. (The injection schedule is illustrated in the second sketch below the table.) |
| Dataset Splits | No | The paper mentions training and testing phases but does not explicitly specify a validation set split for hyperparameter tuning. For CIFAR-10, it states 'While the last 1500 epochs are trained on the full, unaltered dataset, we modify the training data in three different ways during the first 1000 epochs. Test data is left unmodified throughout training.' For ProcGen, it states 'for each environment, we train on 500 randomly generated level layouts and test on additional, previously unseen levels.' No specific validation split is detailed. |
| Hardware Specification | No | The paper mentions 'Training is done on 4 GPUs in parallel' in Appendix B.3, but it does not specify the model or type of GPUs (e.g., NVIDIA A100, Tesla V100). No other specific hardware details like CPU models or memory are provided. |
| Software Dependencies | No | The paper lists hyperparameters for various algorithms like SGD, PPO, and Adam (e.g., 'SGD: Learning rate 3e-4', 'Adam: Learning rate 7e-4'). However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x). |
| Experiment Setup | Yes | Hyper-parameters and more details can be found in appendix B. [...] Table 2: Hyper-parameters used in the supervised learning experiment on CIFAR-10 [...] Table 3: Hyper-parameters used for Multiroom [...] Table 4: Hyper-parameters used for ProcGen |
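
The pseudocode row above references the paper's Algorithm 1 (parallel ITER). For orientation only, below is a minimal sketch of the core idea behind ITER (iterated relearning): periodically distill the current policy and value function into a freshly initialised network, so that acquired knowledge is transferred without the optimisation effects of earlier, transiently non-stationary training. The architecture, loss weighting, and all names (`PolicyNet`, `distill_step`) are assumptions, not the paper's implementation; in the paper, the distillation loss is additionally combined with the student's own RL objective, and teacher/student run in parallel rather than sequentially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Small actor-critic network; architecture is illustrative only."""
    def __init__(self, obs_dim=64, n_actions=15):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.pi_head = nn.Linear(128, n_actions)   # policy logits
        self.v_head = nn.Linear(128, 1)            # value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.pi_head(h), self.v_head(h)

def distill_step(teacher, student, obs, optimizer):
    """One distillation update: match the teacher's policy (KL) and value (MSE).

    In the paper, this loss is combined with the student's RL loss;
    only the distillation part is sketched here.
    """
    with torch.no_grad():
        t_logits, t_value = teacher(obs)
    s_logits, s_value = student(obs)
    kl = F.kl_div(F.log_softmax(s_logits, dim=-1),
                  F.softmax(t_logits, dim=-1), reduction="batchmean")
    loss = kl + F.mse_loss(s_value, t_value)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher = PolicyNet()
for iteration in range(3):                   # number of ITER phases is a hyperparameter
    student = PolicyNet()                    # freshly initialised network
    opt = torch.optim.Adam(student.parameters(), lr=7e-4)
    for _ in range(100):
        obs = torch.randn(32, 64)            # dummy observations stand in for rollout data
        distill_step(teacher, student, obs, opt)
    teacher = student                        # student replaces teacher for the next phase
```
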
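The dataset rows quote the supervised experiment in which non-stationarity is artificially injected into CIFAR-10: the training data is modified for the first 1000 epochs and left unaltered for the final 1500, while test data is never modified. The sketch below shows one way such a schedule could be structured. The label corruption and the `CORRUPT_FRAC` constant are hypothetical stand-ins for the paper's three dataset modifications, which are not reproduced here.

```python
import numpy as np
from torchvision.datasets import CIFAR10

NONSTATIONARY_EPOCHS = 1000   # training data modified (first phase, per the paper)
STATIONARY_EPOCHS = 1500      # full, unaltered data (second phase, per the paper)
CORRUPT_FRAC = 0.5            # hypothetical corruption strength, not from the paper

train_set = CIFAR10(root="./data", train=True, download=True)
clean_targets = np.array(train_set.targets)

def corrupt_labels(targets, frac, rng):
    """Randomly reassign a fraction of labels: one illustrative corruption,
    standing in for the paper's actual modifications."""
    targets = targets.copy()
    idx = rng.choice(len(targets), size=int(frac * len(targets)), replace=False)
    targets[idx] = rng.integers(0, 10, size=len(idx))
    return targets

rng = np.random.default_rng(0)
noisy_targets = corrupt_labels(clean_targets, CORRUPT_FRAC, rng)

for epoch in range(NONSTATIONARY_EPOCHS + STATIONARY_EPOCHS):
    if epoch < NONSTATIONARY_EPOCHS:
        train_set.targets = noisy_targets.tolist()   # transiently modified data
    else:
        train_set.targets = clean_targets.tolist()   # unaltered from epoch 1000 on
    # ... run one training epoch over train_set here ...
    # Test data is left unmodified throughout, as stated in the paper.
```
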