Learning Synthetic Environments and Reward Networks for Reinforcement Learning
Authors: Fabio Ferreira, Thomas Nierhoff, Andreas Sälinger, Frank Hutter
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed new concept on a broad range of RL algorithms and classic control environments. In a one-to-one comparison, learning an SE proxy requires more interactions with the real environment than training agents only on the real environment. However, once such an SE has been learned, we do not need any interactions with the real environment to train new agents. Moreover, the learned SE proxies allow us to train agents with fewer interactions while maintaining the original task performance. Our empirical results suggest that SEs achieve this result by learning informed representations that bias the agents towards relevant states. |
| Researcher Affiliation | Collaboration | 1 University of Freiburg, 2 Bosch Center for Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1: Learning Synthetic Env. with NES (see the NES outer-loop sketch after this table) |
| Open Source Code | Yes | Our PyTorch (Paszke et al., 2019) code and models are made available publicly at https://github.com/automl/learning_environments |
| Open Datasets | Yes | Gym tasks (Brockman et al., 2016) CartPole and Acrobot, Cliff Walking (Sutton & Barto, 2018), MountainCarContinuous-v0 (Brockman et al., 2016) and HalfCheetah-v3 (Todorov & Tassa, 2012) |
| Dataset Splits | No | The paper mentions 'early stopping heuristic' and 'Evaluate Agent' functions which involve testing on the real environment, but it does not specify explicit dataset splits like 'training/validation/test' or 'k-fold cross-validation' in the main text. For example, 'After agent training, we evaluated each agent on the real environment across 10 test episodes'. |
| Hardware Specification | Yes | Each worker had one Intel Xeon Gold 6242 CPU core at its disposal, resulting in an overall runtime of 6-7h on Acrobot and 5-6h on CartPole for 200 NES outer loop iterations. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for any software dependencies, libraries, or programming languages. For example, 'Our PyTorch (Paszke et al., 2019) code and models are made available publicly.' |
| Experiment Setup | Yes | Experimental Setup: So far we have described our proposed method on an abstract level; before we start with individual SE experiments, we describe the experimental setup. In our work, we refer to the process of optimizing for suitable SEs with Algorithm 1 as SE training, and to the process of training agents on SEs as agent training. For both SE and agent training on the discrete-action-space CartPole-v0 and Acrobot-v1 environments, we use DDQN (van Hasselt et al., 2016). (A minimal sketch of the agent evaluation protocol also follows the table.) |
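
The Pseudocode row refers to Algorithm 1, which learns SE parameters with Natural Evolution Strategies (NES): the outer loop perturbs an SE parameter vector, an inner loop trains a fresh agent on each perturbed SE, the resulting agents are scored on the real environment, and a fitness-weighted gradient step updates the SE. The following is only a minimal sketch of that outer loop under these assumptions; `train_agent_on_se` and `evaluate_agent_on_real_env` are illustrative stand-ins, not the interface of the released automl/learning_environments code.

```python
# Minimal NES outer-loop sketch for learning a synthetic environment (SE).
# The SE is represented as a flat parameter vector psi; the agent-training and
# agent-evaluation helpers below are dummy placeholders for illustration only.
import numpy as np

def train_agent_on_se(psi, seed):
    """Stand-in for training a fresh agent (e.g. DDQN) on the SE given by psi."""
    rng = np.random.default_rng(seed)
    return {"psi": psi, "noise": rng.normal()}

def evaluate_agent_on_real_env(agent):
    """Stand-in for scoring the trained agent on the real task (e.g. CartPole-v0)."""
    return -np.linalg.norm(agent["psi"]) + agent["noise"]

def nes_learn_se(dim=64, population=16, sigma=0.1, lr=0.02, iterations=200):
    psi = np.zeros(dim)                                   # SE parameters (outer-loop variable)
    for _ in range(iterations):
        eps = np.random.randn(population, dim)            # sampled perturbations
        scores = np.empty(population)
        for i in range(population):
            agent = train_agent_on_se(psi + sigma * eps[i], seed=i)  # inner loop: agent training
            scores[i] = evaluate_agent_on_real_env(agent)            # fitness on the real env
        ranks = (scores - scores.mean()) / (scores.std() + 1e-8)     # simple fitness shaping
        psi += lr / (population * sigma) * eps.T @ ranks             # NES gradient estimate
    return psi

if __name__ == "__main__":
    learned_se_params = nes_learn_se(iterations=5)        # tiny run, just to show the loop
    print(learned_se_params[:5])
```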
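
The Dataset Splits and Experiment Setup rows describe scoring trained agents on the real environment across 10 test episodes. The sketch below shows what such an evaluation loop could look like, assuming the classic 4-tuple Gym step API that was current when the paper appeared; the policy here is a random placeholder, not the paper's DDQN agent.

```python
# Hedged sketch of an agent evaluation loop on the real Gym environment,
# averaging the return over 10 test episodes (pre-0.26 gym API assumed).
import gym

def evaluate_on_real_env(policy, env_id="CartPole-v0", episodes=10):
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, done, ep_return = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                          # query the (trained) agent
            obs, reward, done, _ = env.step(action)       # classic 4-tuple step API
            ep_return += reward
        returns.append(ep_return)
    env.close()
    return sum(returns) / len(returns)

if __name__ == "__main__":
    env = gym.make("CartPole-v0")
    random_policy = lambda obs: env.action_space.sample() # placeholder policy
    print(evaluate_on_real_env(random_policy))
```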