Transient Non-stationarity and Generalisation in Deep Reinforcement Learning
Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that ITER improves performance on the challenging generalisation benchmarks ProcGen and Multiroom. |
| Researcher Affiliation | Collaboration | University of Oxford; Delft University of Technology. Corresponding author: Maximilian Igl (maximilian.igl@gmail.com), now at DeepMind, London. |
| Pseudocode | Yes | Algorithm 1: Pseudo-code for parallel ITER (a hedged sketch of the relearning loop appears below the table). |
| Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a direct link to a code repository for the described methodology. It cites external resources (e.g., Gym-Minigrid, Boxoban levels) which are open-source, but not its own implementation. |
| Open Datasets | Yes | We use the CIFAR-10 dataset for image classification (Krizhevsky et al., 2009) and artificially inject non-stationarity. [...] In the following, we evaluate ITER on the Multiroom (Chevalier-Boisvert & Willems, 2018) and on several environments from the ProcGen (Cobbe et al., 2019a) benchmark. (The injection schedule is illustrated in the second sketch below the table.) |
| Dataset Splits | No | The paper mentions training and testing phases but does not explicitly specify a validation set split for hyperparameter tuning. For CIFAR-10, it states 'While the last 1500 epochs are trained on the full, unaltered dataset, we modify the training data in three different ways during the first 1000 epochs. Test data is left unmodified throughout training.' For ProcGen, it states 'for each environment, we train on 500 randomly generated level layouts and test on additional, previously unseen levels.' No specific validation split is detailed. |
| Hardware Specification | No | The paper mentions 'Training is done on 4 GPUs in parallel' in Appendix B.3, but it does not specify the model or type of GPUs (e.g., NVIDIA A100, Tesla V100). No other specific hardware details like CPU models or memory are provided. |
| Software Dependencies | No | The paper lists hyperparameters for various algorithms like SGD, PPO, and Adam (e.g., 'SGD: Learning rate 3e-4', 'Adam: Learning rate 7e-4'). However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python 3.x). |
| Experiment Setup | Yes | Hyper-parameters and more details can be found in appendix B. [...] Table 2: Hyper-parameters used in the supervised learning experiment on CIFAR-10 [...] Table 3: Hyper-parameters used for Multiroom [...] Table 4: Hyper-parameters used for ProcGen |
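
The pseudocode row above references the paper's Algorithm 1 (parallel ITER). For orientation only, below is a minimal sketch of the core idea behind ITER (iterated relearning): periodically distill the current policy and value function into a freshly initialised network, so that acquired knowledge is transferred without the optimisation effects of earlier, transiently non-stationary training. The architecture, loss weighting, and all names (`PolicyNet`, `distill_step`) are assumptions, not the paper's implementation; in the paper, the distillation loss is additionally combined with the student's own RL objective, and teacher/student run in parallel rather than sequentially.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Small actor-critic network; architecture is illustrative only."""
    def __init__(self, obs_dim=64, n_actions=15):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.pi_head = nn.Linear(128, n_actions)   # policy logits
        self.v_head = nn.Linear(128, 1)            # value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.pi_head(h), self.v_head(h)

def distill_step(teacher, student, obs, optimizer):
    """One distillation update: match the teacher's policy (KL) and value (MSE).

    In the paper, this loss is combined with the student's RL loss;
    only the distillation part is sketched here.
    """
    with torch.no_grad():
        t_logits, t_value = teacher(obs)
    s_logits, s_value = student(obs)
    kl = F.kl_div(F.log_softmax(s_logits, dim=-1),
                  F.softmax(t_logits, dim=-1), reduction="batchmean")
    loss = kl + F.mse_loss(s_value, t_value)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

teacher = PolicyNet()
for iteration in range(3):                   # number of ITER phases is a hyperparameter
    student = PolicyNet()                    # freshly initialised network
    opt = torch.optim.Adam(student.parameters(), lr=7e-4)
    for _ in range(100):
        obs = torch.randn(32, 64)            # dummy observations stand in for rollout data
        distill_step(teacher, student, obs, opt)
    teacher = student                        # student replaces teacher for the next phase
```
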
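The dataset rows quote the supervised experiment in which non-stationarity is artificially injected into CIFAR-10: the training data is modified for the first 1000 epochs and left unaltered for the final 1500, while test data is never modified. The sketch below shows one way such a schedule could be structured. The label corruption and the `CORRUPT_FRAC` constant are hypothetical stand-ins for the paper's three dataset modifications, which are not reproduced here.

```python
import numpy as np
from torchvision.datasets import CIFAR10

NONSTATIONARY_EPOCHS = 1000   # training data modified (first phase, per the paper)
STATIONARY_EPOCHS = 1500      # full, unaltered data (second phase, per the paper)
CORRUPT_FRAC = 0.5            # hypothetical corruption strength, not from the paper

train_set = CIFAR10(root="./data", train=True, download=True)
clean_targets = np.array(train_set.targets)

def corrupt_labels(targets, frac, rng):
    """Randomly reassign a fraction of labels: one illustrative corruption,
    standing in for the paper's actual modifications."""
    targets = targets.copy()
    idx = rng.choice(len(targets), size=int(frac * len(targets)), replace=False)
    targets[idx] = rng.integers(0, 10, size=len(idx))
    return targets

rng = np.random.default_rng(0)
noisy_targets = corrupt_labels(clean_targets, CORRUPT_FRAC, rng)

for epoch in range(NONSTATIONARY_EPOCHS + STATIONARY_EPOCHS):
    if epoch < NONSTATIONARY_EPOCHS:
        train_set.targets = noisy_targets.tolist()   # transiently modified data
    else:
        train_set.targets = clean_targets.tolist()   # unaltered from epoch 1000 on
    # ... run one training epoch over train_set here ...
    # Test data is left unmodified throughout, as stated in the paper.
```
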