
Evaluating the Performance of Reinforcement Learning Algorithms

Authors: Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas

ICML 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.
Researcher Affiliation	Academia	College of Information and Computer Sciences, University of Massachusetts, MA, USA.
Pseudocode	Yes	We provide pseudocode in Appendix C and source code in the repository.
Open Source Code	Yes	Source code for this paper can be found at https://github.com/ScottJordan/EvaluationOfRLAlgs.
Open Datasets	Yes	These algorithms are evaluated on 15 environments, eight discrete MDPs, half with stochastic transition dynamics, and seven continuous state environments: Cart-Pole (Florian, 2007), Mountain Car (Sutton & Barto, 1998), Acrobot (Sutton, 1995), and four variations of the pinball environment (Konidaris & Barto, 2009; Geramifard et al., 2015).
Dataset Splits	No	The paper discusses 'tuning phase' and 'testing phase' for algorithms and refers to 'trials' but does not specify a train/validation/test split for a dataset used in the traditional sense for model training and evaluation.
Hardware Specification	No	The paper mentions 'high performance computing equipment' in the acknowledgements, but no specific hardware details (e.g., GPU/CPU models, memory) are provided for the experiments.
Software Dependencies	No	The paper mentions 'Julia (Bezanson et al., 2017) or C++, where we have noticed approximately two orders of magnitude faster execution than similar Python implementations', but it does not provide specific version numbers for any of these programming languages or relevant libraries.
Experiment Setup	Yes	For the continuous state environments, each algorithm employs linear function approximation using the Fourier basis (Konidaris et al., 2011) with a randomly sampled order. See Appendix E for full details of each algorithm. For further details about the experiment see Appendix F.
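
The Experiment Setup entry mentions linear function approximation with the Fourier basis (Konidaris et al., 2011) at a randomly sampled order. The sketch below illustrates that general technique and is not taken from the paper's repository. It assumes the standard fully coupled basis, where each feature is cos(pi * c . x) for an integer vector c with entries in {0, ..., order} and x is the state rescaled to [0, 1]. The function names, the state bounds, and the order range of 1 to 5 are hypothetical choices made for illustration. The paper's actual settings are described in its Appendices E and F.

```python
import itertools

import numpy as np


def fourier_coefficients(state_dim: int, order: int) -> np.ndarray:
    """Return every coefficient vector c in {0, ..., order}^state_dim."""
    combos = itertools.product(range(order + 1), repeat=state_dim)
    return np.array(list(combos), dtype=float)


def fourier_features(state, low, high, coeffs: np.ndarray) -> np.ndarray:
    """Compute features cos(pi * c . x), with x the state rescaled to [0, 1]."""
    state, low, high = (np.asarray(a, dtype=float) for a in (state, low, high))
    x = (state - low) / (high - low)
    return np.cos(np.pi * coeffs @ x)


# Assumption: the order is drawn uniformly from 1..5; the paper's sampling
# range is not given in this summary.
rng = np.random.default_rng(0)
order = int(rng.integers(1, 6))

# Hypothetical 2-dimensional state space bounded by [-1, 1] in each dimension.
coeffs = fourier_coefficients(state_dim=2, order=order)
weights = np.zeros(len(coeffs))  # linear weights, one per basis function

phi = fourier_features([0.1, -0.3], low=[-1, -1], high=[1, 1], coeffs=coeffs)
value = weights @ phi  # linear value estimate: w . phi(s)
```

The number of features grows as (order + 1)^state_dim, so a sampled order directly sets the capacity of the linear approximator.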